

Special hardware (GPUs, KNLs, binfs) available & how to use it

  • Article written by Siegfried Höfinger (VSC Team) (last update 2019-01-15 by sh).

TOP500 List November 2018


Rank  Machine            Performance    Accelerators
1.    Summit             144 PFLOPs/s   NVIDIA V100
2.    Sierra              95 PFLOPs/s   NVIDIA V100
3.    Sunway TaihuLight   93 PFLOPs/s   -
4.    Tianhe-2A           62 PFLOPs/s   -
5.    Piz Daint           21 PFLOPs/s   NVIDIA P100
6.    Trinity             20 PFLOPs/s   Intel Xeon Phi 7250/KNL
7.    ABCI                20 PFLOPs/s   NVIDIA V100
8.    SuperMUC-NG         20 PFLOPs/s   -
9.    Titan               18 PFLOPs/s   NVIDIA K20x
10.   Sequoia             17 PFLOPs/s   -


Components on VSC-3

Model                    Nodes                              #cores  Clock Freq (GHz)  Memory (GB)  Bandwidth (GB/s)  TDP (Watt)  FP32/FP64 (GFLOPs/s)
36+50x GeForce GTX-1080  n7[1-3]-[001-004,001-022,001-028]  2560    1.61              8            320               180         8228/257
4x Tesla k20m            n72-02[4-5]                        2496    0.71              5            208               195         3520/1175
2x Tesla m60             n72-023                            2048    1.18              8            160               265         8500/266
4x KNL 7210              n25-05[0-3]                        64      1.30              197+16<384   102               215         5000+/2500+
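
Before requesting any of these nodes it can help to check what is currently free. A minimal sketch using standard SLURM commands, assuming the partition names used in the examples below (gpu_gtx1080single, knl, binf):

for p in gpu_gtx1080single knl binf; do
    echo "=== partition: $p ==="
    sinfo -p "$p" -o "%n %t %c %m"   # node name, state, #cores, memory per node
done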


Working on GPU nodes

Interactive mode

1. salloc -N 1 -p gpu_gtx1080single --qos gpu_gtx1080single --gres gpu:1  (...perhaps -L intel@vsc)

2. squeue -u training

3. srun -n 1 hostname  (...while still on the login node!)

4. ssh n72-012  (...or whichever node has been assigned)

5. module load cuda/9.1.85    
     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
     nvcc ./matrixMul.cu  
     ./a.out 

     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
     nvcc matrixMulCUBLAS.cu -lcublas
     ./a.out

6. nvidia-smi

7. /opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
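
The ssh step is optional; commands can also be sent to the allocated node with srun from within the salloc shell. A minimal sketch, assuming the allocation from step 1 is still active:

srun -n 1 nvidia-smi      # executes on the allocated GPU node, not on the login node
exit                      # leave the salloc shell and release the allocation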


Working on GPU nodes cont.

SLURM submission gpu_test.scrpt

#!/bin/bash
#  usage: sbatch ./gpu_test.scrpt          
#
#SBATCH -J gtx1080     
#SBATCH -N 1
#SBATCH --partition gpu_gtx1080single         
#SBATCH --qos gpu_gtx1080single
#SBATCH --gres gpu:1
 
module purge
module load cuda/9.1.85
 
nvidia-smi
/opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery      

Exercise/Example/Problem: Using interactive mode or batch submission, figure out whether ECC is enabled on the GPUs of type gtx1080.
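
A possible starting point (a hint only, not the full answer): nvidia-smi can report the ECC state directly. A minimal sketch, to be run on the allocated gtx1080 node:

nvidia-smi -q -d ECC      # look for the 'ECC Mode' / 'Current' entries in the output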


Working on KNL nodes

Interactive mode

1. salloc -N 1 -p knl --qos knl -C knl -L intel@vsc

2. squeue -u training

3. srun -n 1 hostname   (...while still on the login node!)

4. ssh n25-050  (...or whichever node has been assigned)

5. module purge

6. module load intel/18
     cd ~/examples/09_special_hardware/knl
     icc -xHost -qopenmp sample.c
     export OMP_NUM_THREADS=16
     ./a.out
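
Thread placement matters on KNL; a minimal sketch of generic OpenMP pinning settings that could be tried on top of step 6 (the value 64 simply mirrors the number of physical cores listed in the table above):

export OMP_NUM_THREADS=64     # one thread per physical core of the KNL 7210
export OMP_PLACES=cores       # pin each thread to a physical core
export OMP_PROC_BIND=spread   # distribute threads evenly over the chip
./a.out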


Working on KNL nodes cont.

SLURM submission knl_test.scrpt

#!/bin/bash
#  usage: sbatch ./knl_test.scrpt          
#
#SBATCH -J knl         
#SBATCH -N 1
#SBATCH --partition knl         
#SBATCH --qos knl         
#SBATCH -C knl         
#SBATCH -L intel@vsc
 
module purge
module load intel/18
cat /proc/cpuinfo
export OMP_NUM_THREADS=16
./a.out        # OpenMP example compiled earlier from sample.c (see the interactive steps)

Exercise/Example/Problem: Given our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x, 4x, whatever-x?
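
A possible starting point (again only a hint): the ratio of logical CPUs to physical cores gives the hyperthreading level. A minimal sketch, to be run on the allocated KNL node:

lscpu | grep -E '^CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket'
grep -c ^processor /proc/cpuinfo    # total number of logical CPUs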


Working on binf nodes

Interactive mode

1. salloc -N 1 -p binf --qos normal_binf -C binf -L intel@vsc

2. squeue -u training

3. srun -n 4 hostname   (...while still on the login node!)

4. ssh binf-11  (...or whichever node has been assigned)

5. module purge

6. module load intel/17 
     cd ~/examples/09_special_hardware/binf
     icc -xHost -qopenmp sample.c
     export OMP_NUM_THREADS=8
     ./a.out
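
The value 8 in step 6 is only an example; a minimal sketch for matching the thread count to the cores actually present on the binf node (nproc is standard coreutils):

export OMP_NUM_THREADS=$(nproc)   # use all logical CPUs of the node
./a.out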


Working on binf nodes cont.

SLURM submission slrm.sbmt.scrpt

#!/bin/bash
#  usage: sbatch ./slrm.sbmt.scrpt          
#
#SBATCH -J gmxbinfs    
#SBATCH -N 2
#SBATCH --partition binf        
#SBATCH --qos normal_binf         
#SBATCH -C binf        
#SBATCH --ntasks-per-node 24
#SBATCH --ntasks-per-core 1
 
module purge
module load intel/17  intel-mkl/2017  intel-mpi/2017  gromacs/5.1.4_binf
 
export I_MPI_PIN=1
export I_MPI_PIN_PROCESSOR_LIST=0-23
export I_MPI_FABRICS=shm:tmi          
export I_MPI_TMI_PROVIDER=psm2        
export OMP_NUM_THREADS=1      
export MDRUN_ARGS=" -dd 0 0 0 -rdd 0 -rcon 0 -dlb yes -dds 0.8  -tunepme -v -nsteps 10000 " 
 
mpirun -np $SLURM_NTASKS gmx_mpi mdrun ${MDRUN_ARGS}  -s hSERT_5HT_PROD.0.tpr  -deffnm hSERT_5HT_PROD.0  -px hSERT_5HT_PROD.0_px.xvg  -pf hSERT_5HT_PROD.0_pf.xvg  -swap hSERT_5HT_PROD.0.xvg


Real-World Example, AMBER-16

Figures: Performance | Power Efficiency