doku:vsc5_queue [2023/09/27 17:49] mblasch (previous revision: [2023/05/16 12:02] msiegel)
====== Queue | Partition setup on VSC-5 ======

On VSC-5, nodes of the same type of hardware are grouped into partitions. The quality of service (QOS), formerly called //queue//, defines the run time limit of a job and grants access to the partitions.
+ | |||
+ | For submitting jobs to [[doku: | ||
+ | |||
<code bash>
#SBATCH --account=xxxxxx
#SBATCH --partition=xxxxx_xxxx
#SBATCH --qos=xxxxx_xxxx
</code>
+ | |||
+ | Notes: | ||
+ | |||
+ | * Core hours will be charged to the specified account. | ||
+ | * Account, partition, and qos have to fit together | ||
+ | * If the account is not given, the default account will be used. | ||
+ | * If partition and QOS are not given, default values are '' | ||
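As a sketch, a complete job script using these three parameters could look as follows; the account, job name and resource values are placeholders and have to be replaced with your own project's values:

```shell
# Write a minimal VSC-5 job script; account, partition and QOS
# are placeholders that must match your own project.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=xxxxxx
#SBATCH --partition=zen3_0512
#SBATCH --qos=zen3_0512
#SBATCH --ntasks=1

echo "running on $(hostname)"
EOF

# On a VSC-5 login node you would then submit it with:
# sbatch job.sh
```

The ''#SBATCH'' lines are plain comments to bash; SLURM reads them at submission time.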
===== Partitions =====

Nodes of the same type of hardware are grouped into partitions; there are three basic types:

  * Intel CPU nodes: 2x Intel Cascadelake CPUs and 384 GB RAM.
  * AMD CPU nodes: 2x AMD 7713 CPUs and 512 GB, 1 TB or 2 TB RAM.
  * GPU nodes: there are two versions, one with Zen2 CPUs, 256 GB RAM and 2x NVIDIA A40 GPUs, and one with Zen3 CPUs, 512 GB RAM and 2x NVIDIA A100 GPUs.

These are the partitions on VSC-5:
^ Partition ^ Nodes ^ Architecture ^ CPU ^ GPU ^ RAM ^ Use ^
| zen3_0512* | 564 | AMD | 2x AMD 7713 (64 Core/CPU) | No | 512 GB | The default partition |
| zen3_1024 | 120 | AMD | 2x AMD 7713 (64 Core/CPU) | No | 1 TB | High Memory partition |
| zen3_2048 | 20 | AMD | 2x AMD 7713 (64 Core/CPU) | No | 2 TB | Higher Memory partition |
| cascadelake_0384 | 48 | Intel | 2x Intel Cascadelake | No | 384 GB | Directly use programs compiled for VSC-4 |
| zen2_0256_a40x2 | 45 | AMD | 2x AMD 7252 (8 Core/CPU) | 2x NVIDIA A40 | 256 GB | Best for single precision GPU code |
| zen3_0512_a100x2 | 60 | AMD | 2x AMD 7713 (64 Core/CPU) | 2x NVIDIA A100 | 512 GB | Best for double precision GPU code |
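From the table, the total core count of a partition works out as nodes x sockets x cores per socket; for the default partition, for example:

```shell
# zen3_0512: 564 nodes, each with 2x AMD 7713 (64 cores per CPU)
nodes=564
sockets=2
cores_per_socket=64
echo "$((nodes * sockets * cores_per_socket)) cores in zen3_0512"
```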
Type ''sinfo -o %P'' on any node to see all the available partitions.

For the sake of completeness, there are also internally used partitions that cannot be selected manually:
^ Partition ^ Description ^
| login5 | login nodes, not an actual slurm partition |
| rackws5 | |
| _jupyter | variations of zen3, a40 and a100 nodes reserved for the jupyterhub |
===== Quality of service (QOS) =====
The QOS defines the run time limit of a job and grants access to specific partitions.

The QOSs that are assigned to a specific user can be viewed with:
<code>
sacctmgr show user `id -u` withassoc format=user,defaultaccount,account,qos%40s,defaultqos%20s
</code>
All usable QOSs are also shown right after login.
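For scripting, the QOS column can be extracted from such ''sacctmgr'' output with ''awk''; the sample output below is made up, and the column layout is an assumption:

```shell
# Hypothetical sacctmgr output; user, account and QOS names are invented.
cat > assoc.txt <<'EOF'
User Def.Acct Account QOS
70000 p70000 p70000 zen3_0512,zen3_0512_devel
EOF

# Print the QOS list (4th whitespace-separated column) of the data row:
awk 'NR == 2 { print $4 }' assoc.txt
```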
==== QOS, Partitions and Run time limits ====
The following QOS are available for all normal (= non-private) projects:
^ QOS name ^ Gives access to Partition ^ Hard run time limits ^ Description ^
| zen3_0512 | zen3_0512 | 72h (3 days) | Default |
| zen3_1024 | zen3_1024 | 72h (3 days) | High Memory |
| zen3_2048 | zen3_2048 | 72h (3 days) | Higher Memory |
| cascadelake_0384 | cascadelake_0384 | 72h (3 days) | |
| zen2_0256_a40x2 | zen2_0256_a40x2 | 72h (3 days) | GPU Nodes |
| zen3_0512_a100x2 | zen3_0512_a100x2 | 72h (3 days) | GPU Nodes |
| zen3_0512_devel | 5 nodes on zen3_0512 | 10min | Fast Feedback |
==== Idle QOS ====
If a project runs out of compute time, jobs can still be submitted using the //idle// QOS. There is no //idle// QOS on the ''cascadelake_0384'' partition or the GPU partitions.

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^ Description ^
| idle_0512 | zen3_0512 | 24h (1 day) | Projects out of compute time |
| idle_1024 | zen3_1024 | 24h (1 day) | Projects out of compute time |
| idle_2048 | zen3_2048 | 24h (1 day) | Projects out of compute time |
==== Devel QOS ====

The //devel// QOS gives fast feedback when testing job scripts, on up to 5 nodes of the ''zen3_0512'' partition:

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^
| zen3_0512_devel | 5 nodes on zen3_0512 | 10min |
==== Private Projects ====

Private projects come with their own partitions and QOS:

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^
| p....._0... | | |
For submitting jobs to [[doku:slurm|SLURM]], you need the following parameters:

<code bash>
#SBATCH --account=pxxxxx
#SBATCH --partition=zen3_xxxx
#SBATCH --qos=pxxxx_xxxx
</code>
==== Run time ====

The QOS's run time limits can also be requested via the command:

<code>
sacctmgr show qos format=name%20s,maxwall
</code>
If you know how long your job usually runs, you can set the run time limit in SLURM:

<code>
#SBATCH --time=<time>
</code>

Of course this has to be //below// the default QOS's run time limit. Your job might start earlier, which is nice; but after the specified run time has elapsed, the job is killed.
Acceptable time formats include:

  * "minutes"
  * "minutes:seconds"
  * "hours:minutes:seconds"
  * "days-hours"
  * "days-hours:minutes"
  * "days-hours:minutes:seconds"
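To see what such a limit amounts to, a time spec can be converted to seconds; this is a sketch for the "days-hours:minutes:seconds" format only, using a made-up example value:

```shell
# Convert a SLURM time spec like "2-12:30:00" (days-hours:minutes:seconds)
# into a total number of seconds.
spec="2-12:30:00"
days=${spec%%-*}    # part before the dash -> days
rest=${spec#*-}     # part after the dash  -> hours:minutes:seconds
IFS=: read -r hours minutes seconds <<EOF
$rest
EOF
total=$(( (days * 24 + hours) * 3600 + minutes * 60 + seconds ))
echo "$total seconds"   # 2.5 days = 216000 s + 30 min = 217800 seconds
```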
+ | * " | ||