@@ Line 1: / Line 1: @@
+====== Special hardware (GPUs, KNLs, binfs) available & how to use it ======
+  * Article written by Siegfried Höfinger (VSC Team) <html><br></html>(last update 2019-01-15 by sh).
+====== TOP500 List November 2018 ======
+<HTML>
+<!--slide 1-->
+<!--for nations flags see https://www.free-country-flags.com-->
+</HTML>
+^  Rank^Nation                                                                        ^Machine            ^   Performance^Accelerators                                                                    ^
+|    1.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}}  |Summit             |  144 PFLOPs/s|<html><font color="navy"></html>NVIDIA V100<html></font></html>                 |
+|    2.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}}  |Sierra             |   95 PFLOPs/s|<html><font color="navy"></html>NVIDIA V100<html></font></html>                 |
+|    3.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:cn.png?0x24}}  |Sunway TaihuLight  |   93 PFLOPs/s|                                                                                |
+|    4.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:cn.png?0x24}}  |Tianhe-2A          |   62 PFLOPs/s|                                                                                |
+|    5.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:ch.png?0x24}}  |Piz Daint          |   21 PFLOPs/s|<html><font color="navy"></html>NVIDIA P100<html></font></html>                 |
+|    6.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}}  |Trinity            |   20 PFLOPs/s|<html><font color="#cc3300"></html>Intel Xeon Phi 7250/KNL<html></font></html>  |
+|    7.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:jp.png?0x24}}  |ABCI               |   20 PFLOPs/s|<html><font color="navy"></html>NVIDIA V100<html></font></html>                 |
+|    8.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:de.png?0x24}}  |SuperMUC-NG        |   20 PFLOPs/s|                                                                                |
+|    9.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}}  |Titan              |   18 PFLOPs/s|<html><font color="navy"></html>NVIDIA K20x<html></font></html>                 |
+|   10.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}}  |Sequoia            |   17 PFLOPs/s|                                                                                |
+<HTML>
+<!--slide 2-->
+</HTML>
+====== Components on VSC-3 ======
+^Model                                                                                                        ^#cores^Clock Freq (GHz)^Memory (GB)^Bandwidth (GB/s)^TDP (Watt)^FP32/FP64 (GFLOPs/s)^
+|<html><font color="navy"></html>36+50x GeForce GTX-1080 n7[1-3]-[001-004,001-022,001-028]<html></font></html>|      |                |           |                |          |                    |
+|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:nvidia-gtx-1080.jpg}}                         |2560  |1.61            |8          |320             |180       |8228/257            |
+|<html><font color="navy"></html>4x Tesla K20m n72-02[4-5]<html></font></html>                                |      |                |           |                |          |                    |
+|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:nvidia-k20m.png}}                             |2496  |0.71            |5          |208             |195       |3520/1175           |
+|<html><font color="navy"></html>2x Tesla M60 n72-023<html></font></html>                                     |      |                |           |                |          |                    |
+|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:m60.jpeg}}                                    |2048  |1.18            |8          |160             |265       |8500/266            |
+|<html><font color="#cc3300"></html>4x KNL 7210 n25-05[0-3]<html></font></html>                               |      |                |           |                |          |                    |
+|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:intel-knl.png}}                               |64    |1.30            |197+16<384 |102             |215       |5000+/2500+         |
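The FP32 figures for the GTX 1080 and K20m rows are close to a back-of-the-envelope estimate: cores × clock × 2, counting one fused multiply-add per core and cycle as two floating-point operations. The sketch below redoes that arithmetic with the table values. The helper name `peak_gflops` is made up for illustration, and the small deviations from the table presumably come from the rounded clock frequencies.

```shell
#!/bin/bash
# Rough theoretical peak in GFLOP/s: cores * clock (GHz) * FLOPs per core and cycle.
# Assumption: one fused multiply-add per core and cycle, counted as 2 FLOPs.
peak_gflops() {
    local cores=$1 clock_ghz=$2 flops_per_cycle=${3:-2}
    awk -v n="$cores" -v f="$clock_ghz" -v k="$flops_per_cycle" \
        'BEGIN { printf "%.0f\n", n * f * k }'
}

peak_gflops 2560 1.61   # GeForce GTX 1080 row (table lists 8228)
peak_gflops 2496 0.71   # Tesla K20m row (table lists 3520)
```

The rule of thumb is not meant for the M60 or KNL rows, whose layout (two GPUs per board, wide vector units) needs a different factor.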
+<HTML>
+<!--slide 3-->
+</HTML>
+====== Working on GPU nodes ======
+**Interactive mode**
+<code>
+salloc -N 1 -p gpu_gtx1080single --qos gpu_gtx1080single --gres gpu:1   # optionally add: -L intel@vsc
+squeue -u training
+srun -n 1 hostname   # run this while still on the login node
+ssh n72-012          # or whichever node was assigned
+module load cuda/9.1.85
+     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
+     nvcc ./matrixMul.cu
+     ./a.out
+     cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
+     nvcc matrixMulCUBLAS.cu -lcublas
+     ./a.out
+nvidia-smi
+/opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
+</code>
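The `squeue` and `ssh` steps above can be combined once the job is running. As a sketch: `squeue -h -o %N` prints only the node list of each job, which stays empty while a job is still pending, so the hypothetical helper `first_node` below returns the first non-empty line. It assumes a single-node allocation, so that the node list is one plain host name.

```shell
#!/bin/bash
# first_node: print the first non-empty line read from stdin.
# Intended input: output of `squeue -u "$USER" -h -o %N`, where pending
# jobs contribute an empty line because no node has been assigned yet.
first_node() {
    awk 'NF { print $1; exit }'
}

# Usage on the login node (only attempted when SLURM is available):
if command -v squeue >/dev/null 2>&1; then
    node=$(squeue -u "$USER" -h -o %N | first_node)
    echo "assigned node: ${node:-none yet}"
    # ssh "$node"   # uncomment to log in to the assigned node
fi
```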
+<HTML>
+<!--slide 4-->
+</HTML>
+====== Working on GPU nodes cont. ======
+**SLURM submission** [[examples/gpu_gtx1080/gpu_test.scrpt|gpu_test.scrpt]]
+<code bash>
+#!/bin/bash
+#  usage: sbatch ./gpu_test.scrpt
+#
+#SBATCH -J gtx1080
+#SBATCH -N 1
+#SBATCH --partition gpu_gtx1080single
+#SBATCH --qos gpu_gtx1080single
+#SBATCH --gres gpu:1
+module purge
+module load cuda/9.1.85
+nvidia-smi
+/opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
+</code>
+<html><font color="navy"></html>**Exercise/Example/Problem:**<html></font></html> <html><br/></html> Using interactive mode or batch submission, determine whether ECC is enabled on the GTX 1080 GPUs.
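One possible way to approach the exercise from a script is to ask `nvidia-smi` for the `ecc.mode.current` field and classify the answer. This is only a sketch: the helper `classify_ecc` is hypothetical, and the exact string printed for GPUs without ECC support is an assumption that should be checked against the output on the node itself.

```shell
#!/bin/bash
# classify_ecc: map one ECC status string to a short verdict.
# Assumption: the driver reports "Enabled" or "Disabled" when ECC is
# supported and some placeholder such as "[N/A]" otherwise.
classify_ecc() {
    case "$1" in
        Enabled)  echo "ECC on" ;;
        Disabled) echo "ECC off" ;;
        *)        echo "ECC not reported ($1)" ;;
    esac
}

# Query the driver only when nvidia-smi is available (i.e. on a GPU node).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,ecc.mode.current --format=csv,noheader |
    while IFS=, read -r name ecc; do
        # The CSV separator is ", ", so strip the leading blank from the second field.
        echo "$name: $(classify_ecc "${ecc# }")"
    done
fi
```

The same information also appears in the longer report of `nvidia-smi -q`, which may be easier to read interactively.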
+<HTML>
+<!--slide 5-->
+</HTML>
+====== Working on KNL nodes ======
+**Interactive mode**
+<code>
+salloc -N 1 -p knl --qos knl -C knl -L intel@vsc
+squeue -u training
+srun -n 1 hostname   # run this while still on the login node
+ssh n25-050          # or whichever node was assigned
+module purge
+module load intel/18
+     cd ~/examples/09_special_hardware/knl
+     icc -xHost -qopenmp sample.c
+     export OMP_NUM_THREADS=16
+     ./a.out
+</code>
+<HTML>
+<!--slide 6-->
+</HTML>
+====== Working on KNL nodes cont. ======
+**SLURM submission** [[examples/knl/knl_test.scrpt|knl_test.scrpt]]
+<code bash>
+#!/bin/bash
+#  usage: sbatch ./knl_test.scrpt
+#
+#SBATCH -J knl
+#SBATCH -N 1
+#SBATCH --partition knl
+#SBATCH --qos knl
+#SBATCH -C knl
+#SBATCH -L intel@vsc
+module purge
+module load intel/18
+cat /proc/cpuinfo
+export OMP_NUM_THREADS=16
+./a.out
+</code>
+<html><font color="#cc3300"></html>**Exercise/Example/Problem:**<html></font></html> <html><br/></html> For our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x, 4x, or some other factor?
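A possible starting point for this exercise, given that the batch script above already prints `/proc/cpuinfo`: on x86 Linux each processor entry carries a `siblings` field (logical CPUs per socket) and a `cpu cores` field (physical cores per socket), and their ratio is the number of hardware threads per core. The function name `threads_per_core` is made up for this sketch, and it assumes that all sockets report the same values.

```shell
#!/bin/bash
# threads_per_core: read /proc/cpuinfo-style text on stdin and print
# siblings / "cpu cores", both taken from the first processor entry.
threads_per_core() {
    awk -F': *' '
        /^siblings/  && !s { s = $2 }
        /^cpu cores/ && !c { c = $2 }
        END { if (s && c) print s / c; else print "unknown" }
    '
}

# On a compute node, feed it the real file:
if [ -r /proc/cpuinfo ]; then
    threads_per_core < /proc/cpuinfo
fi
```

`lscpu` reports the same quantity as "Thread(s) per core" and can serve as a cross-check.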
+<HTML>
+<!--slide 7-->
+</HTML>
+====== Working on binf nodes ======
+**Interactive mode**
+<code>
+salloc -N 1 -p binf --qos normal_binf -C binf -L intel@vsc
+squeue -u training
+srun -n 4 hostname   # run this while still on the login node
+ssh binf-11          # or whichever node was assigned
+module purge
+module load intel/17
+     cd ~/examples/09_special_hardware/binf
+     icc -xHost -qopenmp sample.c
+     export OMP_NUM_THREADS=8
+     ./a.out
+</code>
+<HTML>
+<!--slide 8-->
+</HTML>
+====== Working on binf nodes cont. ======
+**SLURM submission** [[examples/binf/gromacs-5.1.4_binf/slrm.sbmt.scrpt|slrm.sbmt.scrpt]]
+<code bash>
+#!/bin/bash
+#  usage: sbatch ./slrm.sbmt.scrpt
+#
+#SBATCH -J gmxbinfs
+#SBATCH -N 2
+#SBATCH --partition binf
+#SBATCH --qos normal_binf
+#SBATCH -C binf
+#SBATCH --ntasks-per-node 24
+#SBATCH --ntasks-per-core 1
+module purge
+module load intel/17  intel-mkl/2017  intel-mpi/2017  gromacs/5.1.4_binf
+export I_MPI_PIN=1
+export I_MPI_PIN_PROCESSOR_LIST=0-23
+export I_MPI_FABRICS=shm:tmi
+export I_MPI_TMI_PROVIDER=psm2
+export OMP_NUM_THREADS=1
+export MDRUN_ARGS=" -dd 0 0 0 -rdd 0 -rcon 0 -dlb yes -dds 0.8  -tunepme -v -nsteps 10000 "
+mpirun -np $SLURM_NTASKS gmx_mpi mdrun ${MDRUN_ARGS} \
+    -s hSERT_5HT_PROD.0.tpr \
+    -deffnm hSERT_5HT_PROD.0 \
+    -px hSERT_5HT_PROD.0_px.xvg \
+    -pf hSERT_5HT_PROD.0_pf.xvg \
+    -swap hSERT_5HT_PROD.0.xvg
+</code>
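The numbers in this script are tied together: `-N 2` with `--ntasks-per-node 24` is expected to yield 48 MPI ranks in `$SLURM_NTASKS`, and `I_MPI_PIN_PROCESSOR_LIST=0-23` names exactly 24 logical CPUs, one per rank on each node. A small sanity check along these lines could look as follows. `count_cpus` is a hypothetical helper that only understands comma-separated numbers and `a-b` ranges, not the full Intel MPI list syntax.

```shell
#!/bin/bash
# count_cpus: count the CPUs named by a list such as "0-23" or "0,2,4-7".
# Hypothetical helper; it handles plain numbers and a-b ranges only.
count_cpus() {
    local list=$1 item total=0
    local IFS=,
    for item in $list; do
        case "$item" in
            *-*) total=$(( total + ${item#*-} - ${item%-*} + 1 )) ;;
            *)   total=$(( total + 1 )) ;;
        esac
    done
    echo "$total"
}

# Values copied from the batch script above.
nodes=2
tasks_per_node=24
pin_list="0-23"

echo "total MPI ranks: $(( nodes * tasks_per_node ))"
if [ "$(count_cpus "$pin_list")" -eq "$tasks_per_node" ]; then
    echo "pin list matches --ntasks-per-node"
else
    echo "pin list and --ntasks-per-node disagree"
fi
```

If the task count per node is changed, the pin list has to be adjusted with it, otherwise several ranks may end up sharing a core while others stay idle.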
+<HTML>
+<!--slide 9-->
+</HTML>
+====== Real-World Example, AMBER-16 ======
+^                                                                        Performance^Power Efficiency                                                                       ^
+|  {{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:amber16.perf.png}}|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:amber16.powereff.png}}  |