====== Quick start guide for VSC-5 ======
  
**Status: 2023/01**
  
This page is under construction.
128 physical cores (core-id 0-127) and 256 virtual cores are available.
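To verify this on a node you can use standard Linux tools (nothing VSC-specific); note that inside a Slurm job allocation the reported count may be limited to the cores assigned to the job:

<code>
$ nproc              # logical (virtual) cores visible, 256 on these nodes
$ lscpu | grep -E 'Socket|Core|Thread'
</code>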
  
The A100 GPU nodes have 512GB RAM and their two NVIDIA A100 cards have 40GB RAM each.
60 A100 nodes are installed.
  
The A40 GPU nodes have 256GB RAM and the two NVIDIA A40 cards have 46GB each.
45 A40 nodes are installed.
<code>
$ nvidia-smi
  
===== SLURM =====
For the partition/queue setup see [[doku:vsc5_queue|Queue | Partition setup on VSC-5]].
Type ''sinfo -o %P'' to see the available partitions.
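For example (the output below is only illustrative and uses the partition names that appear in the job scripts further down; the actual list on the cluster may differ):

<code>
$ sinfo -o %P
PARTITION
zen3_0512*
zen3_1024
zen3_2048
cascadelake_0384
zen3_0512_a100x2
</code>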
  
==== Submit a Job ====
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
./my_program
</file>
  
This will submit a job in the default partition (zen3_0512) using the default QoS (zen3_0512).
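As a usage sketch, assuming the script above is saved under a placeholder name such as ''job.sh'', it can be submitted and monitored with the standard Slurm commands:

<code>
$ sbatch job.sh
$ squeue -u $USER
</code>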
  
To submit a job to the cascadelake nodes:
#SBATCH -N 1
#SBATCH --partition=cascadelake_0384
#SBATCH --qos cascadelake_0384
./my_program
</file>
#SBATCH -N 1
#SBATCH --partition=zen3_0512
#SBATCH --qos zen3_0512
./my_program
</file>
  
<file sh zen3_1024.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_1024
#SBATCH --qos zen3_1024
./my_program
</file>
  
<file sh zen3_2048.sh>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_2048
#SBATCH --qos zen3_2048
./my_program
</file>
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos zen3_0512_a100x2
#SBATCH --gres=gpu:2
./my_program
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos zen3_0512_a100x2
#SBATCH --gres=gpu:1
./my_program
  
Official Slurm documentation: https://slurm.schedmd.com
===== Intel MPI =====

When **using Intel MPI on the AMD nodes with mpirun**, please set the following environment variable in your job script to allow for correct process pinning:

<code>
export I_MPI_PIN_RESPECT_CPUSET=0
</code>
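A minimal sketch of where this export could go, following the partition/QoS pattern of the AMD job scripts above; the node count, the program name, and the assumption that an Intel MPI module is already loaded are placeholders:

<code>
#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 2
#SBATCH --partition=zen3_0512
#SBATCH --qos zen3_0512

# allow Intel MPI to pin processes correctly on the AMD nodes
export I_MPI_PIN_RESPECT_CPUSET=0

mpirun ./my_program
</code>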
  