====== Quick start guide for VSC-5 ======
  
**Status: 2023/01**
  
This page is under construction.
128 physical cores (core-id 0-127) and 256 virtual cores are available.
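To verify this on a node you can use standard Linux tools (nothing VSC-specific); note that inside a Slurm job allocation the reported count may be limited to the cores assigned to the job:

<code>
$ nproc              # logical (virtual) cores visible, 256 on these nodes
$ lscpu | grep -E 'Socket|Core|Thread'
</code>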
  
The A100 GPU nodes have 512GB RAM and their two NVIDIA A100 cards have 40GB RAM each.
60 A100 nodes are installed.
  
The A40 GPU nodes have 256GB RAM and the two NVIDIA A40 cards have 46GB each.
45 A40 nodes are installed.
<code>
$ nvidia-smi
  
===== SLURM =====
For the partition/queue setup see [[doku:vsc5_queue|Queue | Partition setup on VSC-5]].
Type ''sinfo -o %P'' to see the available partitions.
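For example (the output below is only illustrative and uses the partition names that appear in the job scripts further down; the actual list on the cluster may differ):

<code>
$ sinfo -o %P
PARTITION
zen3_0512*
zen3_1024
zen3_2048
cascadelake_0384
zen3_0512_a100x2
</code>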
  
==== Submit a Job ====
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
./my_program
</file>
  
This will submit a job in the default partition (zen3_0512) using the default QoS (zen3_0512).
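As a usage sketch, assuming the script above is saved under a placeholder name such as ''job.sh'', it can be submitted and monitored with the standard Slurm commands:

<code>
$ sbatch job.sh
$ squeue -u $USER
</code>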
  
To submit a job to the cascadelake nodes:
#SBATCH -N 1
#SBATCH --partition=cascadelake_0384
#SBATCH --qos cascadelake_0384
./my_program
</file>
#SBATCH -N 1
#SBATCH --partition=zen3_0512
#SBATCH --qos zen3_0512
./my_program
</file>
  
<file sh zen3_1024.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_1024
#SBATCH --qos zen3_1024
./my_program
</file>
  
<file sh zen3_2048.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_2048
#SBATCH --qos zen3_2048
./my_program
</file>
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos zen3_0512_a100x2
#SBATCH --gres=gpu:2
./my_program
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos zen3_0512_a100x2
#SBATCH --gres=gpu:1
./my_program
  
Official Slurm documentation: https://slurm.schedmd.com
===== Intel MPI =====

When **using Intel MPI on the AMD nodes with mpirun**, please set the following environment variable in your job script to allow for correct process pinning:

<code>
export I_MPI_PIN_RESPECT_CPUSET=0
</code>
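A minimal sketch of where this export could go, following the partition/QoS pattern of the AMD job scripts above; the node count, the program name, and the assumption that an Intel MPI module is already loaded are placeholders:

<code>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 2
#SBATCH --partition=zen3_0512
#SBATCH --qos zen3_0512

# allow Intel MPI to pin processes correctly on the AMD nodes
export I_MPI_PIN_RESPECT_CPUSET=0

mpirun ./my_program
</code>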
  