Table of Contents

Special types of hardware (GPUs, KNLs) available & how to access them

TOP500 List Nov 2016

^ Rank ^ Machine ^ Performance ^ Accelerators ^
| 1. | Sunway TaihuLight | 93 PFLOPs/s | |
| 2. | Tianhe-2 (MilkyWay-2) | 34 PFLOPs/s | <html><font color="#cc3300"></html>Intel Xeon Phi 31S1P<html></font></html> |
| 3. | Titan | 18 PFLOPs/s | <html><font color="navy"></html>NVIDIA K20x<html></font></html> |
| 4. | Sequoia | 17 PFLOPs/s | |
| 5. | Cori | 14 PFLOPs/s | <html><font color="#cc3300"></html>Intel Xeon Phi 7250<html></font></html> |
| 6. | Oakforest-PACS | 14 PFLOPs/s | <html><font color="#cc3300"></html>Intel Xeon Phi 7250<html></font></html> |
| 7. | K computer | 11 PFLOPs/s | |
| 8. | Piz Daint | 10 PFLOPs/s | <html><font color="navy"></html>NVIDIA P100<html></font></html> |
| 9. | Mira | 9 PFLOPs/s | |
| 10. | Trinity | 8 PFLOPs/s | |

<HTML> <!--slide 2--> </HTML>

Components on VSC-3

^ Model ^ Nodes ^ #cores ^ Clock Freq (GHz) ^ Memory (GB) ^ Bandwidth (GB/s) ^ TDP (Watt) ^ FP32/FP64 (GFLOPs/s) ^
| <html><font color="navy"></html>10x GeForce GTX-1080<html></font></html> | n25-0[10-20] | 2560 | 1.61 | 8 | 320 | 180 | 8228/257 |
| <html><font color="navy"></html>4x Tesla K20m<html></font></html> | n25-00[5-6] | 2496 | 0.71 | 5 | 208 | 195 | 3520/1175 |
| <html><font color="#cc3300"></html>4x KNL 7210<html></font></html> | n25-05[0-3] | 64 | 1.30 | 384 | 102 | 215 | 5000+/2500+ |

<HTML> <!--slide 3--> </HTML>

Working on GPU nodes

Interactive mode

1. salloc -N 1 -p gpu --qos=gpu_compute  -C gtx1080 --gres=gpu:1  (...perhaps -L intel@vsc)

2. squeue -u training

3. srun -n 1 hostname  (...while still on the login node!)

4. ssh n25-012  (...or whichever node has been assigned)

5. module load cuda/8.0.27    
     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
     nvcc ./matrixMul.cu
     ./a.out

     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
     nvcc matrixMulCUBLAS.cu -lcublas
     ./a.out

6. nvidia-smi

7. /opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/
   1_Utilities/deviceQuery/deviceQuery   

<HTML> <!--slide 4--> </HTML>

Working on GPU nodes cont.

SLURM submission

#!/bin/bash
#  usage: sbatch ./gpu_test.scrpt          
#
#SBATCH -J gtx1080     
#SBATCH -N 1
#SBATCH --partition=gpu         
#SBATCH --qos=gpu_compute
#SBATCH -C gtx1080     
#SBATCH --gres=gpu:1
 
module purge
module load cuda/8.0.27
 
nvidia-smi
/opt/sw/x86_64/glibc-2.17/IntelXeonE51620v3/cuda/8.0.27/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery

<html><font color="navy"></html>Exercise/Example/Problem:<html></font></html> <html><br/></html> Using interactive mode or batch submission, figure out whether ECC is enabled on the GPUs of type gtx1080.

<HTML> <!--slide 5--> </HTML>

Working on KNL nodes

Interactive mode

1. salloc -N 1 -p knl --qos=knl -C knl -L intel@vsc

2. squeue -u training

3. srun -n 1 hostname

4. ssh n25-050  (...or whichever node has been assigned)

5. module purge

6. module load intel/17.0.2
     cd ~/examples/09_special_hardware/knl
     icc -xHost -qopenmp sample.c
     export OMP_NUM_THREADS=16
     ./a.out

<HTML> <!--slide 6--> </HTML>

Working on KNL nodes cont.

SLURM submission

#!/bin/bash
#  usage: sbatch ./knl_test.scrpt          
#
#SBATCH -J knl         
#SBATCH -N 1
#SBATCH --partition=knl         
#SBATCH --qos=knl         
#SBATCH -C knl         
#SBATCH -L intel@vsc
 
module purge
module load intel/17.0.2
cat /proc/cpuinfo
export OMP_NUM_THREADS=16
./a.out          #  assumes a.out was compiled beforehand (see interactive mode)

<html><font color="#cc3300"></html>Exercise/Example/Problem:<html></font></html> <html><br/></html> Given our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x or 4x?

<HTML> <!--slide 7--> </HTML>

Real-World Example, AMBER-16

[Charts: Performance and Power Efficiency]