====== GPUs available & how to use them ======
===== TOP500 List June 2024 =====
^ Rank ^ Nation ^ Machine ^ Performance ^ Accelerators ^
| 1. | {{.:us.png?0x24}} | Frontier | 1206 PFlop/s | AMD Instinct MI250X |
| 2. | {{.:us.png?0x24}} | Aurora | 1012 PFlop/s | Intel Data Center GPU Max |
| 3. | {{.:us.png?0x24}} | Eagle | 561 PFlop/s | NVIDIA H100 |
| 4. | {{.:jp.png?0x24}} | Fugaku | 442 PFlop/s | |
| 5. | {{.:640px-flag_of_finland.svg.png?nolink&24}} | LUMI | 379 PFlop/s | AMD Instinct MI250X |
| 6. | {{.:ch.png?0x24}} | Alps | 270 PFlop/s | NVIDIA GH200 Superchip |
| 7. | {{.:it.png?0x24}} | Leonardo | 241 PFlop/s | NVIDIA A100 SXM4 |
| 8. | {{.:640px-bandera_de_espana.svg.png?nolink&24}} | MareNostrum 5 ACC | 175 PFlop/s | NVIDIA H100 |
| 9. | {{.:us.png?0x24}} | Summit | 148 PFlop/s | NVIDIA V100 |
| 10. | {{.:us.png?0x24}} | Eos NVIDIA DGX | 121 PFlop/s | NVIDIA H100 |
===== Components on VSC-5 =====
^ Model ^ #Cores ^ Clock Freq (GHz) ^ Memory (GB) ^ Bandwidth (GB/s) ^ TDP (W) ^ FP32/FP64 (GFLOP/s) ^
| 19x GeForce RTX 2080 Ti, nodes n375-[001-019] (only available within special projects) |||||||
| {{:pandoc:introduction-to-vsc:09_special_hardware:rtx-2080.jpg?nolink&200}} | 4352 | 1.35 | 11 | 616 | 250 | 13450/420 |
| 45x2 NVIDIA A40, nodes n306[6,7,8]-[001-019,001-019,001-007] |||||||
| {{ :pandoc:introduction-to-vsc:09_special_hardware:a40.jpg?nolink&200|}} | 10752 | 1.305 | 48 | 696 | 300 | 37400/1169 |
| 62x2 NVIDIA A100-40GB, nodes n307[1-4]-[001-015] |||||||
| {{ :pandoc:introduction-to-vsc:09_special_hardware:a100.jpg?nolink&200|}} | 6912 | 0.765 | 40 | 1555 | 250 | 19500/9700 |
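As a plausibility check, the FP32 peak values in the table follow from CUDA cores × boost clock × 2 (one fused multiply-add, i.e. two floating-point operations, per core per cycle). Note that the boost clocks used below (1.545, 1.740, 1.410 GHz) are assumed from NVIDIA's data sheets; the table lists the lower base clocks:

```shell
# Peak FP32 GFLOP/s ~= CUDA cores * boost clock (GHz) * 2 ops/cycle (FMA).
# Boost clocks are assumed values from NVIDIA data sheets, not from the table.
awk 'BEGIN { printf "RTX 2080 Ti: %.0f GFLOP/s\n", 4352*1.545*2 }'   # ~13448
awk 'BEGIN { printf "A40:         %.0f GFLOP/s\n", 10752*1.740*2 }'  # ~37417
awk 'BEGIN { printf "A100-40GB:   %.0f GFLOP/s\n", 6912*1.410*2 }'   # ~19492
```

The small deviations from the table (13450, 37400, 19500) come from rounding in the published clock and FLOP figures.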
==== Working on GPU nodes Interactively ====
**Interactive mode**
<code>
1. VSC-5 >  salloc -N 1 -p zen2_0256_a40x2 --qos zen2_0256_a40x2 --gres=gpu:2
2. VSC-5 >  squeue -u $USER
3. VSC-5 >  srun -n 1 hostname      # ...while still on the login node!
4. VSC-5 >  ssh n3066-012           # ...or whichever node has been assigned
5. VSC-5 >  module load cuda/9.1.85
            cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMul
            nvcc ./matrixMul.cu
            ./a.out
            cd ~/examples/09_special_hardware/gpu_gtx1080/matrixMulCUBLAS
            nvcc matrixMulCUBLAS.cu -lcublas
            ./a.out
6. VSC-5 >  nvidia-smi
7. VSC-5 >  /opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
</code>
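The steps above can also be condensed: while a ''salloc'' session is active, ''srun'' executes commands directly on the allocated compute node, so the explicit ''ssh'' step is optional. A minimal sketch (partition and QOS names as above; the assigned node name will differ):

```shell
# Request one A40 node interactively with both GPUs ...
salloc -N 1 -p zen2_0256_a40x2 --qos zen2_0256_a40x2 --gres=gpu:2
# ... then run commands on the allocated node via srun, no ssh needed:
srun -n 1 hostname      # prints the name of the allocated compute node
srun -n 1 nvidia-smi    # lists the GPUs visible inside the allocation
exit                    # release the allocation when done
```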
===== Working on GPU nodes using SLURM =====
**SLURM submission script** ''gpu_test.scrpt''
<code bash>
#!/bin/bash
#
# usage: sbatch ./gpu_test.scrpt
#
#SBATCH -J A40
#SBATCH -N 1              # use -N only if you need both GPUs of a node, otherwise leave this line out
#SBATCH --partition zen2_0256_a40x2
#SBATCH --qos zen2_0256_a40x2
#SBATCH --gres=gpu:2      # or --gres=gpu:1 if you only want to use half a node

module purge
module load cuda/9.1.85

nvidia-smi
/opt/sw/x86_64/glibc-2.17/ivybridge-ep/cuda/9.1.85/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery/deviceQuery
</code>
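A typical submit-and-inspect cycle for a batch script like the one above might look as follows (the job ID and output file name will differ per submission):

```shell
sbatch ./gpu_test.scrpt    # queue the job; prints "Submitted batch job <jobid>"
squeue -u $USER            # check queue state: PD = pending, R = running
scontrol show job <jobid>  # full job details, including the assigned node
cat slurm-<jobid>.out      # job output lands here by default
```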
===== Real-World Example: AMBER-16 =====
^ Performance ^ Power Efficiency ^
| {{.:amber16.perf.png}}|{{.:amber16.powereff.png}} |