====== Special types of hardware (GPUs, KNLs) available & how to access them ======

  * Article written by Siegfried Höfinger (VSC Team)
(last update 2017-04-27 by sh).

====== TOP500 List Nov 2016 ======

^ Rank^Nation ^Machine ^ Performance^Accelerators ^
| 1.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:cn.png}} |Sunway TaihuLight | 93 PFLOPs/s| |
| 2.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:cn.png}} |Tianhe-2 (MilkyWay-2) | 34 PFLOPs/s|Intel Xeon Phi 31S1P |
| 3.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Titan | 18 PFLOPs/s|NVIDIA K20x |
| 4.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Sequoia | 17 PFLOPs/s| |
| 5.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Cori | 14 PFLOPs/s|Intel Xeon Phi 7250 |
| 6.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:jp.png}} |Oakforest-PACS | 14 PFLOPs/s|Intel Xeon Phi 7250 |
| 7.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:jp.png}} |K-computer | 11 PFLOPs/s| |
| 8.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:ch.png}} |Piz Daint | 10 PFLOPs/s|NVIDIA P100 |
| 9.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Mira | 9 PFLOPs/s| |
| 10.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Trinity | 8 PFLOPs/s| |

====== Components on VSC-3 ======

^Model ^#cores^Clock Freq (GHz)^Memory (GB)^Bandwidth (GB/s)^TDP (Watt)^FP32/FP64 (GFLOPs/s)^
|10x GeForce GTX-1080 n25-0[10-20] | | | | | | |
|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:nvidia-gtx-1080.jpg}}|2560 |1.61 |8 |320 |180 |8228/257 |
|4x Tesla K20m n25-00[5-6] | | | | | | |
|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:nvidia-k20m.png}} |2496 |0.71 |5 |208 |195 |3520/1175 |
|4x KNL 7210 n25-05[0-3] | | | | | | |
|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:intel-knl.png}} |64 |1.30 |384 |102 |215 |5000+/2500+ |

====== Working on GPU nodes ======

**Interactive mode**

  1. salloc -N 1 -p gpu --qos=gpu_compute -C gtx1080 --gres=gpu:1  (...perhaps -L intel@vsc)
  2. squeue -u training
  3. srun -n 1 hostname  (...while still on the login node!)
  4. ssh n25-012  (...or whichever node has been assigned)
  5. module load cuda/8.0.27
     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
     nvcc ./matrixMul.cu
     ./a.out
     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
     nvcc matrixMulCUBLAS.cu -lcublas
     ./a.out
  6. nvidia-smi
  7. /opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery

====== Working on GPU nodes cont. ======

**SLURM submission**

  #!/bin/bash
  # usage: sbatch ./gpu_test.scrpt
  #
  #SBATCH -J gtx1080
  #SBATCH -N 1
  #SBATCH --partition=gpu
  #SBATCH --qos=gpu_compute
  #SBATCH -C gtx1080
  #SBATCH --gres=gpu:1
  module purge
  module load cuda/8.0.27
  nvidia-smi
  /opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery

**Exercise/Example/Problem:**
Using interactive mode or batch submission, find out whether ECC is enabled on the GPUs of type gtx1080.

====== Working on KNL nodes ======

**Interactive mode**

  1. salloc -N 1 -p knl --qos=knl -C knl -L intel@vsc
  2. squeue -u training
  3. srun -n 1 hostname
  4. ssh n25-050  (...or whichever node has been assigned)
  5. module purge
  6. module load intel/17.0.2
     cd ~/examples/09_special_hardware/knl
     icc -xHost -qopenmp sample.c
     export OMP_NUM_THREADS=16
     ./a.out

====== Working on KNL nodes cont. ======

**SLURM submission**

  #!/bin/bash
  # usage: sbatch ./knl_test.scrpt
  #
  #SBATCH -J knl
  #SBATCH -N 1
  #SBATCH --partition=knl
  #SBATCH --qos=knl
  #SBATCH -C knl
  #SBATCH -L intel@vsc
  module purge
  module load intel/17.0.1
  cat /proc/cpuinfo
  export OMP_NUM_THREADS=16
  ./a.out

**Exercise/Example/Problem:**
Given our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x, 4x, or some other factor?

====== Real-World Example, AMBER-16 ======

^ Performance^Power Efficiency ^
| {{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:amber16.perf.png}}|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:amber16.powereff.png}} |
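
As a rough point of comparison for the power-efficiency plot, the peak figures from the Components on VSC-3 table can be turned into peak GFLOPs/s per Watt of TDP. The sketch below does only that arithmetic. The function name ''peak_per_watt'' and the comma-separated input layout are made up for illustration, the KNL values are lower bounds (the table lists them as 5000+/2500+), and theoretical peak per TDP Watt is not the same thing as the measured AMBER-16 efficiency shown in the figure.

```shell
#!/bin/bash
# Peak GFLOPs/s per Watt of TDP, using the numbers from the
# "Components on VSC-3" table. Input columns (hypothetical layout):
#   model,FP32 peak (GFLOPs/s),FP64 peak (GFLOPs/s),TDP (Watt)
peak_per_watt() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -F, '{ printf "%s FP32 %.1f FP64 %.1f\n", $1, $2 / $4, $3 / $4 }'
}

# KNL entries are lower bounds ("5000+/2500+" in the table).
peak_per_watt <<'EOF'
GTX-1080,8228,257,180
K20m,3520,1175,195
KNL-7210,5000,2500,215
EOF
```

Run as is, this prints FP32/FP64 values of 45.7/1.4 for the GTX-1080, 18.1/6.0 for the K20m and 23.3/11.6 for the KNL 7210, all in peak GFLOPs/s per Watt.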