====== Special types of hardware (GPUs, KNLs) available & how to access them ======
* Article written by Siegfried Höfinger (VSC Team)
(last update 2017-04-27 by sh).
====== TOP500 List Nov 2016 ======
^ Rank^Nation ^Machine ^ Performance^Accelerators ^
| 1.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:cn.png}} |Sunway TaihuLight | 93 PFLOPs/s| |
| 2.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:cn.png}} |Tianhe-2 (MilkyWay-2) | 34 PFLOPs/s|Intel Xeon Phi 31S1P |
| 3.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Titan | 18 PFLOPs/s|NVIDIA K20x |
| 4.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Sequoia | 17 PFLOPs/s| |
| 5.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Cori | 14 PFLOPs/s|Intel Xeon Phi 7250 |
| 6.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:jp.png}} |Oakforest-PACS | 14 PFLOPs/s|Intel Xeon Phi 7250 |
| 7.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:jp.png}} |K-computer | 11 PFLOPs/s| |
| 8.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:ch.png}} |Piz Daint | 10 PFLOPs/s|NVIDIA P100 |
| 9.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Mira | 9 PFLOPs/s| |
| 10.|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:us.png}} |Trinity | 8 PFLOPs/s| |
====== Components on VSC-3 ======
^Model ^#cores^Clock Freq (GHz)^Memory (GB)^Bandwidth (GB/s)^TDP (Watt)^FP32/FP64 (GFLOPs/s)^
|10x GeForce GTX-1080 n25-0[10-20] | | | | | | |
|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:nvidia-gtx-1080.jpg}}|2560 |1.61 |8 |320 |180 |8228/257 |
|4x Tesla K20m n25-00[5-6] | | | | | | |
|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:nvidia-k20m.png}} |2496 |0.71 |5 |208 |195 |3520/1175 |
|4x KNL 7210 n25-05[0-3] | | | | | | |
|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:intel-knl.png}} |64 |1.30 |384 |102 |215 |5000+/2500+ |
====== Working on GPU nodes ======
**Interactive mode**
1. salloc -N 1 -p gpu --qos=gpu_compute -C gtx1080 --gres=gpu:1 (optionally add -L intel@vsc)
2. squeue -u training
3. srun -n 1 hostname (run this while still on the login node)
4. ssh n25-012 (or whichever node was assigned)
5. module load cuda/8.0.27
cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
nvcc ./matrixMul.cu
./a.out
cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
nvcc matrixMulCUBLAS.cu -lcublas
./a.out
6. nvidia-smi
7. /opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery
====== Working on GPU nodes cont. ======
**SLURM submission**
#!/bin/bash
# usage: sbatch ./gpu_test.scrpt
#
#SBATCH -J gtx1080
#SBATCH -N 1
#SBATCH --partition=gpu
#SBATCH --qos=gpu_compute
#SBATCH -C gtx1080
#SBATCH --gres=gpu:1
module purge
module load cuda/8.0.27
nvidia-smi
/opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery
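The submission script above can be sent to the queue and followed up roughly as sketched here. This is a minimal sketch that assumes the script was saved as ./gpu_test.scrpt in the current directory; job_id_only is a hypothetical helper introduced only for this illustration, not part of SLURM.

```shell
#!/bin/bash
# Sketch: submit a job script and find its output file.
# Assumption: the script above was saved as ./gpu_test.scrpt.

# job_id_only is a hypothetical helper: "sbatch --parsable" prints either
# "<jobid>" or "<jobid>;<cluster>", and only the first field is needed here.
job_id_only() {
    echo "${1%%;*}"
}

if command -v sbatch > /dev/null 2>&1 && [ -f ./gpu_test.scrpt ]; then
    jobid=$(job_id_only "$(sbatch --parsable ./gpu_test.scrpt)")
    echo "submitted job $jobid"
    squeue -j "$jobid"      # lists the job while it is pending or running
    # Without an explicit --output option, SLURM writes stdout and stderr
    # to slurm-<jobid>.out in the directory the job was submitted from.
    echo "output will appear in slurm-${jobid}.out"
else
    echo "sbatch or ./gpu_test.scrpt not available here" >&2
fi
```

Once the job has finished, the output of nvidia-smi and deviceQuery can be read from the slurm-<jobid>.out file.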
**Exercise/Example/Problem:**
Using interactive mode or batch submission, find out whether ECC is enabled on the GPUs of type gtx1080.
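One possible way to approach this exercise is to ask nvidia-smi for the ECC mode of every visible GPU. The sketch below assumes nvidia-smi is on the PATH, which is only the case on the GPU nodes; describe_ecc is a hypothetical helper that merely rewords each output line. The plain command nvidia-smi -q -d ECC prints the same information in long form.

```shell
#!/bin/bash
# Sketch: report the ECC mode of each visible GPU.
# Assumption: nvidia-smi is installed (true on GPU nodes only).

# describe_ecc is a hypothetical helper: it turns one CSV line of the form
# "<gpu name>, <ecc mode>" into a readable sentence.
describe_ecc() {
    local name mode
    name="${1%%,*}"     # text before the first comma
    mode="${1#*, }"     # text after the first ", "
    case "$mode" in
        Enabled)  echo "$name: ECC is enabled" ;;
        Disabled) echo "$name: ECC is disabled" ;;
        *)        echo "$name: ECC state not reported ($mode)" ;;
    esac
}

if command -v nvidia-smi > /dev/null 2>&1; then
    nvidia-smi --query-gpu=name,ecc.mode.current --format=csv,noheader |
        while IFS= read -r line; do
            describe_ecc "$line"
        done
else
    echo "nvidia-smi not found: run this on a GPU node" >&2
fi
```

The same lines can be placed at the end of the batch script instead of being typed interactively.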
====== Working on KNL nodes ======
**Interactive mode**
1. salloc -N 1 -p knl --qos=knl -C knl -L intel@vsc
2. squeue -u training
3. srun -n 1 hostname
4. ssh n25-050 (or whichever node was assigned)
5. module purge
6. module load intel/17.0.2
cd ~/examples/09_special_hardware/knl
icc -xHost -qopenmp sample.c
export OMP_NUM_THREADS=16
./a.out
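The steps above run the example with a single thread count. To see how the program behaves with other values of OMP_NUM_THREADS, a small loop such as the following may help. It is only a sketch: sweep_threads is a hypothetical helper, and the thread counts other than 16 are arbitrary illustrative choices, not recommended settings.

```shell
#!/bin/bash
# Sketch: run a command once per OpenMP thread count.

# sweep_threads is a hypothetical helper. First argument: a space-separated
# list of thread counts. Remaining arguments: the command to run.
sweep_threads() {
    local counts="$1"
    shift
    local t
    for t in $counts; do
        echo "OMP_NUM_THREADS=$t"
        OMP_NUM_THREADS="$t" "$@"   # the variable is set for this run only
    done
}

# Only start the sweep if the compiled example is present.
if [ -x ./a.out ]; then
    sweep_threads "16 32 64" ./a.out
fi
```

Prefixing the call with time, for example time ./a.out as the command, gives a rough impression of how the run time changes with the thread count.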
====== Working on KNL nodes cont. ======
**SLURM submission**
#!/bin/bash
# usage: sbatch ./knl_test.scrpt
#
#SBATCH -J knl
#SBATCH -N 1
#SBATCH --partition=knl
#SBATCH --qos=knl
#SBATCH -C knl
#SBATCH -L intel@vsc
module purge
module load intel/17.0.1
cat /proc/cpuinfo
export OMP_NUM_THREADS=16
./a.out
**Exercise/Example/Problem:**
Given our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x, 4x, or some other factor?
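One way to tackle this exercise is to compare two fields that Linux reports in /proc/cpuinfo on x86 systems: siblings (logical CPUs per package) and cpu cores (physical cores per package). Their ratio is the number of hardware threads per core. The sketch below assumes both fields are present; threads_per_core is a hypothetical helper written for this illustration.

```shell
#!/bin/bash
# Sketch: derive the number of hardware threads per core from cpuinfo.

# threads_per_core is a hypothetical helper. Optional argument: path of a
# cpuinfo-style file (defaults to /proc/cpuinfo).
threads_per_core() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    awk -F': *' '
        /^siblings/  && !s { s = $2 }   # logical CPUs per package
        /^cpu cores/ && !c { c = $2 }   # physical cores per package
        END { if (s > 0 && c > 0) print s / c; else exit 1 }
    ' "$cpuinfo"
}

threads_per_core || echo "siblings / cpu cores fields not found" >&2
```

Run on the KNL node, either interactively or from the batch script, the printed number is the hyperthreading factor asked for above.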
====== Real-World Example, AMBER-16 ======
^ Performance^Power Efficiency ^
| {{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:amber16.perf.png}}|{{pandoc:introduction-to-vsc:09_special_hardware:01_accelerators:amber16.powereff.png}} |