Quick start guide for VSC-5

This version is outdated by a newer approved version.

This version (2023/02/17 17:45) was approved by msiegel. The previously approved version (2023/02/09 21:21) is available.

Status: 2023/01

This page is under construction.

Connecting

Log in to VSC-5 with:

ssh <user>@vsc5.vsc.ac.at

Alternatively, log in to a specific login node (l50 to l56):

ssh <user>@l5[0-6].vsc.ac.at

Data storage

VSC-5 uses the same IBM Spectrum Scale (GPFS) storage as VSC-4, so the same directories exist on both systems. Consequently, your data in $HOME and $DATA is accessible exactly as on VSC-4. There is, however, no BINF filesystem on VSC-5.

Loading Modules & Spack Environments

Different CPU types come with different compilers, so VSC-5 uses the spack feature "environments" to make sure you choose the package built for the right architecture.

On login the default spack environment (zen3) is loaded automatically, so only modules that run on AMD processors are visible with spack find.

On VSC-5 no modules are loaded by default. Load the ones you need yourself with spack load <module> or module load <module>.

Find the official SPACK documentation at https://spack.readthedocs.io/

List Spack Environments

Type spack env list to see which environments are available and which one is active.

$ spack env list
==> 2 environments
    cascadelake  zen3

The current spack environment is also shown in your prompt:

(zen3) [myname@l55 ~]#

Note that if your prompt is changed later, for example when activating a Python environment with conda, the prompt may no longer show the correct spack environment.
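When the prompt can no longer be trusted, spack env status reports the active environment directly. For use in scripts, the hypothetical helper below (not part of the VSC-5 setup) is a sketch that assumes spack's shell integration sets SPACK_ENV to the directory of the active environment, and that this directory is named after the environment (e.g. .../environments/zen3).

```shell
#!/bin/bash
# Hypothetical helper (not provided by VSC-5): print the name of the
# active spack environment, or "none" if no environment is active.
# Assumption: SPACK_ENV holds the environment directory, e.g.
# .../environments/zen3, while an environment is active.
current_spack_env() {
    if [ -n "${SPACK_ENV:-}" ]; then
        basename "$SPACK_ENV"
    else
        echo "none"
    fi
}

# Interactive alternative provided by spack itself:
#   spack env status
```

For example, echo "Active spack environment: $(current_spack_env)" could be added to a job script to log which environment was used.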

When a spack environment is activated, the command spack find -l lists those packages available for the active environment.

The command module avail will also show only those modules that are compatible with the active spack environment.

Change Spack Environment

To look for a package built for another architecture, first switch the spack environment:

$ spacktivate <myenv>
$ spacktivate cascadelake

Only then will spack find show the modules of that environment (here cascadelake).
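The switch-and-look workflow above can be wrapped in a small function. This is a hypothetical sketch, not a VSC-5 tool: it uses spack env activate, the long form of the spacktivate shorthand (aliases are usually not expanded inside scripts), and assumes spack's shell integration is loaded as on the login nodes.

```shell
#!/bin/bash
# Hypothetical helper: list the packages of another spack environment
# (e.g. cascadelake), then switch back to the default zen3 environment.
find_in_env() {
    local env="$1"
    shift
    spack env activate "$env" || return 1
    spack find -l "$@"           # remaining arguments, e.g. a package name
    local rc=$?
    spack env activate zen3      # return to the default environment
    return "$rc"
}
```

For example, find_in_env cascadelake lists everything in the cascadelake environment, and find_in_env cascadelake gcc narrows the listing to one package. Switching back to zen3 keeps the AMD packages visible afterwards.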

Save Spack Environment

The following command creates a load script for the packages of your current spack environment (the option -r also includes their dependencies):

$ spack env loads -r

This creates a file called loads in the environment directory. Sourcing that file in bash will make the environment available to the user. The source loads command can be included in .bashrc files. The loads file may also be copied out of the environment, renamed, etc.
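A minimal sketch of generating the loads file and copying it to a location of your choice is shown below. The function name save_env_loads is hypothetical, and the sketch assumes SPACK_ENV points at the active environment directory, where spack env loads writes the file.

```shell
#!/bin/bash
# Hypothetical helper: generate the "loads" script of the active spack
# environment and copy it to a file of your choice.
# Assumption: SPACK_ENV is the active environment directory, which is
# where "spack env loads" places the loads file.
save_env_loads() {
    local dest="$1"
    if [ -z "${SPACK_ENV:-}" ]; then
        echo "no spack environment is active" >&2
        return 1
    fi
    spack env loads -r || return 1
    cp "$SPACK_ENV/loads" "$dest"
}
```

For example, save_env_loads ~/zen3_loads.sh followed by adding the line source ~/zen3_loads.sh to your .bashrc would restore the same packages in every new shell.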

Load a Module

Please always use spack; see SPACK - a package manager for HPC systems.

Compile Code

A program needs to be compiled for the hardware it will later run on. Programs compiled on VSC-4 will run on the cascadelake_0384 partition, but not on the default partition, which uses AMD processors!
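Since the CPU type decides which build to use, a job script can pick the matching spack environment from the node it runs on. The sketch below is hypothetical: it reads the vendor_id field of /proc/cpuinfo and maps AMD to zen3 and Intel to cascadelake, considering only the two environments listed by spack env list.

```shell
#!/bin/bash
# Hypothetical helper: print the spack environment matching the CPU of
# the current node (zen3 for AMD, cascadelake for Intel).
# Reads /proc/cpuinfo, or the file given as the first argument.
spack_env_for_node() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local vendor
    vendor="$(awk -F': *' '/^vendor_id/ {print $2; exit}' "$cpuinfo")"
    case "$vendor" in
        AuthenticAMD) echo "zen3" ;;
        GenuineIntel) echo "cascadelake" ;;
        *) echo "unknown CPU vendor: $vendor" >&2; return 1 ;;
    esac
}
```

In a job script this could be used as spack env activate "$(spack_env_for_node)" before loading packages, so the same script selects the right environment on both the AMD and the cascadelake_0384 nodes.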

AMD: Zen3

Most nodes of VSC-5, including the login nodes, are based on AMD processors. The spack environment zen3 is loaded automatically, so you can use spack load <mypackage> to load what you need, compile your program, and submit a job via Slurm.

Each node has 2x AMD Epyc CPUs (Milan architecture) with 64 cores each, i.e. 128 physical cores (core-id 0-127) and 256 virtual cores in total.

There are 60 A100 GPU nodes. Each has 512GB RAM and two NVIDIA A100 cards with 40GB memory each.

There are 45 A40 GPU nodes. Each has 256GB RAM and two NVIDIA A40 cards with 46GB memory each.

$ nvidia-smi
Tue Apr 26 15:42:00 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:01:00.0 Off |                  Off |
| N/A   40C    P0    35W / 250W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:81:00.0 Off |                  Off |
| N/A   37C    P0    37W / 250W |      0MiB / 40960MiB |     40%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

Intel: Cascadelake

Programs already compiled on VSC-4 will also run on the cascadelake_0384 partition of VSC-5. Otherwise, log in to a cascadelake_0384 node (or any VSC-4 node), compile your program there, and then submit a job to the Slurm partition cascadelake_0384.

There are 48 nodes with 2x Intel Cascadelake CPUs of 48 cores each, i.e. 96 physical cores (core-id 0-95) and 192 virtual cores per node. Each node has 384GB RAM.

SLURM

For the exact partition/queue setup, see Queue | Partition setup on VSC-5.

Type sinfo -o %P to see the available partitions:

partition	nodes	description
zen2_0256_a40x2		AMD CPU nodes with 2x AMD Epyc (Milan) and 2x NVIDIA A40 and 256GB RAM
jupyter		reserved for the jupyterhub
login5		login nodes, not an actual slurm partition
zen3_2048		AMD CPU nodes with 2x AMD Epyc (Milan) and 2TB RAM
zen3_1024		AMD CPU nodes with 2x AMD Epyc (Milan) and 1TB RAM
zen3_0512*		The default partition. AMD CPU nodes with 2x AMD Epyc (Milan) and 512GB RAM
cascadelake_0384		Intel CPU nodes with 2x Intel Cascadelake and 384GB RAM
zen3_0512_a100x2		AMD CPU nodes with 2x AMD Epyc (Milan) and 2x NVIDIA A100 and 512GB RAM

QoS

The following QoS are available for normal (non-private) projects:

QOS name	gives access to partition	description
zen3_0512	zen3_0512	default
zen3_1024	zen3_1024
zen3_2048	zen3_2048
cascadelake_0384	cascadelake_0384
zen2_0256_a40x2	zen2_0256_a40x2
zen3_0512_a100x2	zen3_0512_a100x2
zen3_0512_devel	zen3_0512	5 nodes on zen3_0512
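The table boils down to a simple rule: apart from zen3_0512_devel, each QoS has the same name as its partition. The hypothetical helper below is a sketch that encodes exactly this table, e.g. for choosing the partition on the sbatch command line; it is not a VSC-5 tool and does not cover private projects.

```shell
#!/bin/bash
# Hypothetical helper: map a QoS name to its partition, following the
# QoS table for normal (non-private) projects on VSC-5.
partition_for_qos() {
    case "$1" in
        zen3_0512_devel)
            echo "zen3_0512" ;;   # 5 nodes on zen3_0512
        zen3_0512|zen3_1024|zen3_2048|cascadelake_0384|zen2_0256_a40x2|zen3_0512_a100x2)
            echo "$1" ;;          # QoS name equals partition name
        *)
            echo "unknown QoS: $1" >&2
            return 1 ;;
    esac
}
```

For example, qos=zen3_1024; sbatch --qos="$qos" --partition="$(partition_for_qos "$qos")" job.sh would submit job.sh with a matching QoS/partition pair; options given on the sbatch command line take precedence over the #SBATCH lines in the script.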

Submit a Job

Submit a job in Slurm using a job script like this minimal example:

defjob.sh

#!/bin/bash
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
./my_program

This will submit a job in the default partition (zen3_0512) using the default QoS (zen3_0512).

To submit a job to the cascadelake nodes:

cascjob.sh

#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=cascadelake_0384
#SBATCH --qos cascadelake_0384
./my_program

Job scripts for the AMD CPU nodes:

zen3_0512.sh

#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_0512
#SBATCH --qos zen3_0512
./my_program

zen3_1024.sh

#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_1024
#SBATCH --qos zen3_1024
./my_program

zen3_2048.sh

#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_2048
#SBATCH --qos zen3_2048
./my_program

Example job script to use both GPUs on a GPU node:

twogpujob.sh

#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH -N 1
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos zen3_0512_a100x2
#SBATCH --gres=gpu:2
./my_program

Example script to use only one GPU on a GPU node:

onegpujob.sh

#!/bin/sh
#SBATCH -J <meaningful name for job>
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos zen3_0512_a100x2
#SBATCH --gres=gpu:1
./my_program

Your job is then constrained to one GPU and does not interfere with a second job on the same node. The other GPU card, which is not assigned to your job, cannot be accessed.
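To check inside a job how many GPUs it may actually use, one option is to inspect CUDA_VISIBLE_DEVICES, which Slurm sets for jobs that request GPUs via --gres; nvidia-smi -L lists the visible devices as well. The helper name count_visible_gpus is hypothetical and the sketch assumes a comma-separated device list.

```shell
#!/bin/bash
# Hypothetical helper: number of GPUs visible to the job, derived from
# CUDA_VISIBLE_DEVICES (set by Slurm for jobs requesting --gres=gpu:N).
count_visible_gpus() {
    local devs="${CUDA_VISIBLE_DEVICES:-}"
    if [ -z "$devs" ]; then
        echo 0
        return 0
    fi
    # One entry per GPU, separated by commas, e.g. "0,1" -> 2
    printf '%s\n' "$devs" | tr ',' '\n' | grep -c .
}
```

Adding echo "GPUs visible: $(count_visible_gpus)" before ./my_program in onegpujob.sh should report 1, and 2 in twogpujob.sh.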

More at Submitting batch jobs (SLURM), but keep in mind that VSC-4 has different partitions than VSC-5!

Official Slurm documentation: https://slurm.schedmd.com

Intel MPI

When using Intel MPI with mpirun on the AMD nodes, set the following environment variable in your job script to allow for correct process pinning:

export I_MPI_PIN_RESPECT_CPUSET=0
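A sketch of a complete MPI job script for two zen3_0512 nodes, with one rank per physical core (2 CPUs x 64 cores = 128 ranks per node), could look as follows. The node count is only illustrative, and the spack package name intel-oneapi-mpi is an assumption; check spack find for the Intel MPI installation actually available on VSC-5.

```shell
#!/bin/bash
#SBATCH -J <meaningful name for job>
#SBATCH -N 2
#SBATCH --ntasks-per-node=128
#SBATCH --partition=zen3_0512
#SBATCH --qos=zen3_0512

# Assumption: Intel MPI is installed as the spack package intel-oneapi-mpi;
# check "spack find" for the name and version available on VSC-5.
spack load intel-oneapi-mpi

# Needed for correct process pinning with Intel MPI on the AMD nodes
export I_MPI_PIN_RESPECT_CPUSET=0

# Slurm sets SLURM_NTASKS to the total number of tasks (here 2 x 128)
mpirun -np "$SLURM_NTASKS" ./my_program
```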
