Special types of hardware (GPUs, KNLs) available & how to access them


Article written by Siegfried Höfinger (VSC Team) (last update 2017-04-27 by sh).

TOP500 List Nov 2016

Rank	Machine	Performance	Accelerators
1.	Sunway TaihuLight	93 PFLOP/s	-
2.	Tianhe-2 (MilkyWay-2)	34 PFLOP/s	Intel Xeon Phi 31S1P
3.	Titan	18 PFLOP/s	NVIDIA K20x
4.	Sequoia	17 PFLOP/s	-
5.	Cori	14 PFLOP/s	Intel Xeon Phi 7250
6.	Oakforest-PACS	14 PFLOP/s	Intel Xeon Phi 7250
7.	K-computer	11 PFLOP/s	-
8.	Piz Daint	10 PFLOP/s	NVIDIA P100
9.	Mira	9 PFLOP/s	-
10.	Trinity	8 PFLOP/s	-

Components on VSC-3

Model	#cores	Clock Freq (GHz)	Memory (GB)	Bandwidth (GB/s)	TDP (Watt)	FP32/FP64 (GFLOP/s)
10x GeForce GTX-1080 n25-0[10-20]	2560	1.61	8	320	180	8228/257
4x Tesla K20m n25-00[5-6]	2496	0.71	5	208	195	3520/1175
4x KNL 7210 n25-05[0-3]	64	1.30	384	102	215	5000+/2500+
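
To put the last two columns of the table in perspective, the following sketch derives the FP32:FP64 ratio and the double-precision throughput per watt for each device. All numbers are copied from the table; for the KNL the lower bounds (5000+/2500+) are used, and `flops_summary` is only an illustrative helper name, not a VSC tool.

```shell
#!/bin/bash
# Derive FP32:FP64 ratio and FP64 GFLOP/s per watt from the table values.
# Input columns (comma-separated): model, TDP (W), FP32 (GFLOP/s), FP64 (GFLOP/s).
# flops_summary is a hypothetical helper name introduced for this sketch.
flops_summary() {
    awk -F, '{
        printf "%s: FP32/FP64 = %.1f, FP64 per watt = %.2f GFLOP/s/W\n",
               $1, $3 / $4, $4 / $2
    }'
}

# Values taken from the table; the KNL entries are lower bounds.
flops_summary <<'EOF'
GTX-1080,180,8228,257
Tesla K20m,195,3520,1175
KNL 7210,215,5000,2500
EOF
```

The ratios make the difference between the devices visible at a glance: the GTX-1080 is roughly 32 times faster in single precision than in double precision, whereas the K20m and the KNL stay within a factor of about 3 and 2, respectively.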

Interactive mode (GPU nodes)

1. salloc -N 1 -p gpu --qos=gpu_compute -C gtx1080 --gres=gpu:1  (...possibly also -L intel@vsc)

2. squeue -u training

3. srun -n 1 hostname  (...run this while still on the login node!)

4. ssh n25-012  (...or whichever node was assigned)

5. module load cuda/8.0.27
   cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
   nvcc ./matrixMul.cu
   ./a.out

   cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
   nvcc matrixMulCUBLAS.cu -lcublas
   ./a.out

6. nvidia-smi

7. /opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery
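
The compile-and-run part of step 5 can be wrapped in a small function. This is a minimal sketch: `build_and_run` and the `NVCC` override variable are hypothetical names introduced here, and the only assumption about the environment is that `nvcc` becomes available after loading the CUDA module.

```shell
#!/bin/bash
# Sketch of step 5 as a reusable function.
# build_and_run and the NVCC override are hypothetical names for this example.
build_and_run() {
    local dir=$1 src=$2
    shift 2
    local nvcc=${NVCC:-nvcc}
    # Fail early with a hint if the CUDA module has not been loaded yet.
    if ! command -v "$nvcc" >/dev/null 2>&1; then
        echo "error: $nvcc not found; try 'module load cuda/8.0.27'" >&2
        return 1
    fi
    # Subshell: the caller's working directory is left unchanged.
    # Remaining arguments (e.g. -lcublas) are passed on to the compiler.
    ( cd "$dir" && "$nvcc" "$src" "$@" && ./a.out )
}

# Usage on a GPU node (directories as in step 5):
#   build_and_run ~/examples/09_special_hardware/gpu_gtx1080/matrixMul matrixMul.cu
#   build_and_run ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS matrixMulCUBLAS.cu -lcublas
```

Running the build in a subshell means a failed compile does not leave the interactive session in a different directory.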

SLURM submission (GPU nodes)

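#!/bin/bash
#  usage: sbatch ./gpu_test.scrpt
#
#  one node from the gpu partition, restricted to GTX-1080 nodes, with one GPU
#SBATCH -J gtx1080
#SBATCH -N 1
#SBATCH --partition=gpu
#SBATCH --qos=gpu_compute
#SBATCH -C gtx1080
#SBATCH --gres=gpu:1

# start from a clean environment, then load the CUDA toolkit
module purge
module load cuda/8.0.27

# report the GPUs visible to the job and their properties
nvidia-smi
/opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery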
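
Submitting the script and locating its output could look like the sketch below. It relies on sbatch printing "Submitted batch job <id>" and on SLURM's default output file name `slurm-<id>.out` in the submission directory; `jobid_from_sbatch` is a hypothetical helper name, and the site configuration on VSC-3 may differ.

```shell
#!/bin/bash
# Extract the job ID from sbatch's "Submitted batch job <id>" message.
# jobid_from_sbatch is a hypothetical helper name introduced for this sketch.
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Typical use on the login node (commented out so the sketch can be read
# and tried without a SLURM installation):
#   jobid=$(sbatch ./gpu_test.scrpt | jobid_from_sbatch)
#   squeue -j "$jobid"          # state of this particular job
#   cat "slurm-${jobid}.out"    # default output file, once the job has run
```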

Exercise/Example/Problem: Using interactive mode or batch submission, find out whether ECC is enabled on the GPUs of type gtx1080.
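
One possible way to approach the exercise is to filter the output of the two tools already used above for ECC-related lines. `nvidia-smi -q -d ECC` restricts the query to the ECC section of the report; `ecc_lines` is a hypothetical helper name, and the deviceQuery path is simply the one from the batch script.

```shell
#!/bin/bash
# Keep only the lines from stdin that mention ECC (case-insensitive).
# ecc_lines is a hypothetical helper name introduced for this sketch.
ecc_lines() {
    grep -i 'ecc'
}

# On a GPU node, after 'module load cuda/8.0.27' (commented out here
# because both commands need an NVIDIA driver):
#   nvidia-smi -q -d ECC | ecc_lines
#   /opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery | ecc_lines
```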

Interactive mode (KNL nodes)

1. salloc -N 1 -p knl --qos=knl -C knl -L intel@vsc

2. squeue -u training

3. srun -n 1 hostname

4. ssh n25-050  (...or whichever node was assigned)

5. module purge

6. module load intel/17.0.2
   cd ~/examples/09_special_hardware/knl
   icc -xHost -qopenmp sample.c
   export OMP_NUM_THREADS=16
   ./a.out
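
Step 6 runs the binary with a single thread count. To see how the OpenMP example behaves with other settings, the run can be repeated in a loop, as in the sketch below. `sweep_threads` is a hypothetical helper, and the thread counts in the usage line are arbitrary examples rather than recommended values.

```shell
#!/bin/bash
# Run a command once per thread count, setting OMP_NUM_THREADS each time.
# sweep_threads is a hypothetical helper name introduced for this sketch.
sweep_threads() {
    local cmd=$1
    shift
    local n
    for n in "$@"; do
        echo "== OMP_NUM_THREADS=$n =="
        # The assignment applies to this one invocation only.
        OMP_NUM_THREADS=$n "$cmd"
    done
}

# On a KNL node, after compiling sample.c as in step 6:
#   sweep_threads ./a.out 16 32 64
```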

SLURM submission (KNL nodes)

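#!/bin/bash
#  usage: sbatch ./knl_test.scrpt
#
#  one node from the knl partition; -L requests the intel@vsc license
#SBATCH -J knl
#SBATCH -N 1
#SBATCH --partition=knl
#SBATCH --qos=knl
#SBATCH -C knl
#SBATCH -L intel@vsc

module purge
module load intel/17.0.1
# show the processor details of the allocated node
cat /proc/cpuinfo
# run the OpenMP binary built beforehand (expected in the submission directory)
export OMP_NUM_THREADS=16
./a.out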

Exercise/Example/Problem: Given our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x, 4x, or some other factor?
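
A possible starting point for the exercise is the `cat /proc/cpuinfo` output from the batch script. On x86 Linux that file reports "siblings" (logical CPUs per package) and "cpu cores" (physical cores per package), and their quotient gives the number of hardware threads per core. The sketch below assumes this layout; `threads_per_core` is a hypothetical helper name.

```shell
#!/bin/bash
# Estimate hardware threads per core from /proc/cpuinfo-style input by
# dividing "siblings" (logical CPUs per package) by "cpu cores"
# (physical cores per package).
# threads_per_core is a hypothetical helper name introduced for this sketch.
threads_per_core() {
    awk -F: '
        /^siblings/  { siblings = $2 + 0 }
        /^cpu cores/ { cores = $2 + 0 }
        END { if (cores > 0) print siblings / cores; else exit 1 }
    ' "${1:-/proc/cpuinfo}"
}

# On a KNL node:
#   threads_per_core
# Cross-check with lscpu, which prints a "Thread(s) per core:" line:
#   lscpu | grep 'Thread(s) per core'
```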

Performance	Power Efficiency

TOP500 List Nov 2016

Components on VSC-3

Working on GPU nodes

Working on GPU nodes cont.

Working on KNL nodes

Working on KNL nodes cont.

Real-World Example, AMBER-16