====== GPUs available & how to use it ======

  * Article written by Siegfried Höfinger (VSC Team) <html><br></html>(last update 2019-01-15 by sh).

===== TOP500 List June 2024 =====

^ Rank^Nation ^Machine ^ Performance^Accelerators ^
| 1.|{{.:us.png?0x24}} | Frontier | 1206 PFLOPs/s | AMD Instinct MI250X |
| 2.|{{.:us.png?0x24}} | Aurora | 1012 PFLOPs/s | Intel Data Center GPU Max |
| 3.|{{.:us.png?0x24}} | Eagle | 561 PFLOPs/s | NVIDIA H100 |
| 4.|{{.:jp.png?0x24}} | Fugaku | 442 PFLOPs/s | |
| 5.| | LUMI | 379 PFLOPs/s | AMD Instinct MI250X |
| 6.|{{.:ch.png?0x24}} | Alps | 270 PFLOPs/s | NVIDIA GH200 Superchip |
| 7.|{{.:it.png?0x24}} | Leonardo | 241 PFLOPs/s | NVIDIA A100 SXM4 |
| 8.| | MareNostrum 5 ACC | 175 PFLOPs/s | NVIDIA H100 |
| 9.|{{.:us.png?0x24}} | Summit | 148 PFLOPs/s | NVIDIA V100 |
| 10.|{{.:us.png?0x24}} | Eos NVIDIA DGX | 121 PFLOPs/s | NVIDIA H100 |

===== TOP500 List November 2018 =====

^ Rank^Nation ^Machine ^ Performance^Accelerators ^
| 1.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}} |Summit | 144 PFLOPs/s|<html><font color="navy"></html>NVIDIA V100<html></font></html> |
| 2.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}} |Sierra | 95 PFLOPs/s|<html><font color="navy"></html>NVIDIA V100<html></font></html> |
| 3.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:cn.png?0x24}} |Sunway TaihuLight | 93 PFLOPs/s| |
| 4.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:cn.png?0x24}} |Tianhe-2A | 62 PFLOPs/s| |
| 5.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:ch.png?0x24}} |Piz Daint | 21 PFLOPs/s|<html><font color="navy"></html>NVIDIA P100<html></font></html> |
| 6.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}} |Trinity | 20 PFLOPs/s|<html><font color="#cc3300"></html>Intel Xeon Phi 7250/KNL<html></font></html> |
| 7.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:jp.png?0x24}} |ABCI | 20 PFLOPs/s|<html><font color="navy"></html>NVIDIA V100<html></font></html> |
| 8.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:de.png?0x24}} |SuperMUC-NG | 20 PFLOPs/s| |
| 9.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}} |Titan | 18 PFLOPs/s|<html><font color="navy"></html>NVIDIA K20x<html></font></html> |
| 10.|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:us.png?0x24}} |Sequoia | 17 PFLOPs/s| |

===== Components on VSC-5 =====

^Model ^#cores ^Clock Freq (GHz)^Memory (GB)^Bandwidth (GB/s)^TDP (Watt)^FP32/FP64 (GFLOPs/s)^
|19x GeForce RTX-2080Ti n375-[001-019] - only in a special project | | | | | | |
|{{:pandoc:introduction-to-vsc:09_special_hardware:rtx-2080.jpg?nolink&200}} |4352 |1.35 |11 |616 |250 |13450/420 |
|45x2 NVIDIA A40 n306[6,7,8]-[001-019,001-019,001-007] | | | | | | |
|{{ :pandoc:introduction-to-vsc:09_special_hardware:a40.jpg?nolink&200|}} |10752 |1.305 |48 |696 |300 |37400/1169 |
|62x2 NVIDIA A100-40GB n307[1-4]-[001-015] | | | | | | |
|{{ :pandoc:introduction-to-vsc:09_special_hardware:a100.jpg?nolink&200|}} |6912 |0.765 |40 |1555 |250 |19500/9700 |
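The FP32/FP64 columns follow directly from core count and clock: peak throughput is roughly 2 FLOPs (one fused multiply-add) per core per cycle. As a sketch for the A40, under the assumption that the quoted 37400 GFLOPs/s refers to a boost clock of about 1.74 GHz rather than the 1.305 GHz base clock listed in the table:

```shell
# Peak FP32 throughput = 2 FLOPs per FMA x #CUDA cores x clock (GHz).
# Assumption: 37400 GFLOPs/s for the A40 corresponds to the ~1.74 GHz
# boost clock, not the 1.305 GHz base clock shown in the table.
awk 'BEGIN { printf "%.0f GFLOPs/s\n", 2 * 10752 * 1.74 }'
```

which lands within rounding distance of the 37400 GFLOPs/s quoted above.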
===== Components on VSC-3 =====

^Model ^#cores^Clock Freq (GHz)^Memory (GB)^Bandwidth (GB/s)^TDP (Watt)^FP32/FP64 (GFLOPs/s)^
|<html><font color="navy"></html>36+50x GeForce GTX-1080 n7[1-3]-[001-004,001-022,001-028]<html></font></html>| | | | | | |
|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:nvidia-gtx-1080.jpg}} |2560 |1.61 |8 |320 |180 |8228/257 |
|<html><font color="navy"></html>4x Tesla K20m n72-02[4-5]<html></font></html> | | | | | | |
|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:nvidia-k20m.png}} |2496 |0.71 |5 |208 |195 |3520/1175 |
|<html><font color="navy"></html>2x Tesla M60 n72-023<html></font></html> | | | | | | |
|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:m60.jpeg}} |2048 |1.18 |8 |160 |265 |8500/266 |
|<html><font color="#cc3300"></html>4x KNL 7210 n25-05[0-3]<html></font></html> | | | | | | |
|{{:pandoc:introduction-to-vsc:09_special_hardware:accelerators:intel-knl.png}} |64 |1.30 |197+16<384 |102 |215 |5000+/2500+ |

===== Working on GPU nodes interactively =====

**Interactive mode**

<code>
1. VSC-5 > salloc -N 1 -p zen2_0256_a40x2 --qos zen2_0256_a40x2 --gres=gpu:2

2. VSC-5 > squeue -u $USER

3. VSC-5 > srun -n 1 hostname (...while still on the login node !)

4. VSC-5 > ssh n3066-012 (...or whichever other node has been assigned)

5. VSC-5 > module load cuda/9.1.85
cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
nvcc ./matrixMul.cu
./a.out

cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
nvcc matrixMulCUBLAS.cu -lcublas
./a.out

6. VSC-5 > nvidia-smi

7. VSC-5 > /opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
</code>
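Once inside the allocation, it is often useful to check which GPUs the job was actually granted. SLURM installations with GPU confinement commonly export ''CUDA_VISIBLE_DEVICES'' inside the job; this is a hedged sketch of that check, not VSC-specific documentation:

```shell
# Print the GPU indices visible to this job; falls back to "unset" when
# run outside an allocation (or when the variable is not exported).
echo "${CUDA_VISIBLE_DEVICES:-unset}"
```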

===== Working on GPU nodes using SLURM =====

**SLURM submission** gpu_test.scrpt

<code bash>
#!/bin/bash
#
# usage: sbatch ./gpu_test.scrpt
#
#SBATCH -J A40
#SBATCH -N 1                         # use -N only if you use both GPUs on the node, otherwise leave this line out
#SBATCH --partition zen2_0256_a40x2
#SBATCH --qos zen2_0256_a40x2
#SBATCH --gres=gpu:2                 # or --gres=gpu:1 if you only want to use half a node

module purge
/opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
</code>
<html><font color="navy"></html>**Exercise/Example/Problem:**<html></font></html> <html><br/></html> Using interactive mode or batch submission, figure out whether ECC is enabled on the GPUs of type gtx1080.
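One way to approach the exercise: ''nvidia-smi -q'' prints an ECC section per GPU. The snippet below parses a canned sample of that output; the sample text and its "Disabled" value are placeholders for illustration, not the actual answer for the gtx1080 nodes:

```shell
# Parse the current ECC mode out of (canned) "nvidia-smi -q"-style output.
sample='    Ecc Mode
        Current                           : Disabled
        Pending                           : Disabled'
printf '%s\n' "$sample" | awk '/Current/ { print $3 }'
```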

===== Working on KNL nodes =====

**Interactive mode**

<code>
1. salloc -N 1 -p knl --qos knl -C knl -L intel@vsc

2. squeue -u training

3. srun -n 1 hostname (...while still on the login node !)

4. ssh n25-050 (...or whichever other node has been assigned)

5. module purge

6. module load intel/18
cd ~/examples/09_special_hardware/knl
icc -xHost -qopenmp sample.c
export OMP_NUM_THREADS=16
./a.out
</code>
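sample.c is an OpenMP code, so the ''export OMP_NUM_THREADS=16'' line above controls how many threads ''./a.out'' spawns. The usual default rule can be sketched in shell, under the assumption that the OpenMP runtime falls back to the number of logical CPUs when the variable is unset:

```shell
# Use OMP_NUM_THREADS when set, otherwise fall back to all logical CPUs,
# mirroring the common OpenMP default.
threads="${OMP_NUM_THREADS:-$(nproc)}"
echo "$threads"
```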

===== Working on KNL nodes using SLURM =====

**SLURM submission** [[examples/knl/knl_test.scrpt|knl_test.scrpt]]

<code bash>
#!/bin/bash
# usage: sbatch ./knl_test.scrpt
#
#SBATCH -J knl
#SBATCH -N 1
#SBATCH --partition knl
#SBATCH --qos knl
#SBATCH -C knl
#SBATCH -L intel@vsc

module purge
module load intel/18
cat /proc/cpuinfo
export OMP_NUM_THREADS=16
./a.out
</code>
<html><font color="#cc3300"></html>**Exercise/Example/Problem:**<html></font></html> <html><br/></html> Given our KNL model, can you determine the current level of hyperthreading, i.e. 2x, 3x, 4x ?
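A possible approach: divide the number of logical processors in /proc/cpuinfo by the number of distinct physical core ids. The sketch below runs on a canned four-entry fragment instead of the real file; the file name cpuinfo.sample and the resulting 4x are illustrative only, not the answer for our nodes:

```shell
# Hyperthreading level = logical processors / distinct physical core ids.
# A canned 4-entry fragment stands in for the real /proc/cpuinfo.
cat > cpuinfo.sample <<'EOF'
processor : 0
core id   : 0
processor : 1
core id   : 0
processor : 2
core id   : 0
processor : 3
core id   : 0
EOF
awk -F': ' '/^processor/ { n++ } /^core id/ { c[$2] = 1 }
            END { k = 0; for (i in c) k++; print n / k "x" }' cpuinfo.sample
```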

===== Working on binf nodes =====

**Interactive mode**

<code>
1. salloc -N 1 -p binf --qos normal_binf -C binf -L intel@vsc

2. squeue -u training

3. srun -n 4 hostname (...while still on the login node !)

4. ssh binf-11 (...or whichever other node has been assigned)

5. module purge

6. module load intel/17
cd ~/examples/09_special_hardware/binf
icc -xHost -qopenmp sample.c
export OMP_NUM_THREADS=8
./a.out
</code>

===== Working on binf nodes using SLURM =====

**SLURM submission** [[examples/binf/gromacs-5.1.4_binf/slrm.sbmt.scrpt|slrm.sbmt.scrpt]]

<code bash>
#!/bin/bash
# usage: sbatch ./slrm.sbmt.scrpt
#
#SBATCH -J gmxbinfs
#SBATCH -N 2
#SBATCH --partition binf
#SBATCH --qos normal_binf
#SBATCH -C binf
#SBATCH --ntasks-per-node 24
#SBATCH --ntasks-per-core 1

module purge
module load intel/17 intel-mkl/2017 intel-mpi/2017 gromacs/5.1.4_binf

export I_MPI_PIN=1
export I_MPI_PIN_PROCESSOR_LIST=0-23
export I_MPI_FABRICS=shm:tmi
export I_MPI_TMI_PROVIDER=psm2
export OMP_NUM_THREADS=1
export MDRUN_ARGS=" -dd 0 0 0 -rdd 0 -rcon 0 -dlb yes -dds 0.8 -tunepme -v -nsteps 10000 "

mpirun -np $SLURM_NTASKS gmx_mpi mdrun ${MDRUN_ARGS} -s hSERT_5HT_PROD.0.tpr -deffnm hSERT_5HT_PROD.0 -px hSERT_5HT_PROD.0_px.xvg -pf hSERT_5HT_PROD.0_pf.xvg -swap hSERT_5HT_PROD.0.xvg
</code>
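The settings in this script have to agree with each other: with ''-N 2'' and ''--ntasks-per-node 24'', ''$SLURM_NTASKS'' resolves to 48 MPI ranks, and the pin list ''0-23'' must supply one CPU per rank on each node. A small consistency check (the shell variable names below are ours, not SLURM's):

```shell
# -N 2 nodes x 24 tasks per node = total MPI ranks ($SLURM_NTASKS).
nodes=2
tasks_per_node=24
echo "total ranks: $(( nodes * tasks_per_node ))"

# The I_MPI_PIN_PROCESSOR_LIST range "0-23" must cover 24 CPUs per node.
pin_list="0-23"
lo="${pin_list%-*}"
hi="${pin_list#*-}"
echo "pinned CPUs per node: $(( hi - lo + 1 ))"
```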

===== Real-World Example, AMBER-16 =====

^ Performance^Power Efficiency ^
| {{.:amber16.perf.png}}|{{.:amber16.powereff.png}} |
| |