====== Queue | Partition setup on VSC-5 ======

On VSC-5, nodes of the same type of hardware are grouped into partitions. The quality of service (QOS), formerly called //queue//, defines the maximum run time of a job and the number and type of allocatable nodes.

For submitting jobs to [[doku:slurm|SLURM]], three parameters are important: account, partition and QOS.
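
A minimal sketch of these directives in a job script, using the default partition and QOS from the tables below (the account name ''xxxxxx'' is a placeholder for your project account):

<code bash>
#SBATCH --account=xxxxxx
#SBATCH --partition=zen3_0512
#SBATCH --qos=zen3_0512
</code>
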
  * Core hours will be charged to the specified account.
  * Account, partition, and QOS have to fit together.
  * If the account is not given, the default account will be used.
  * If partition and QOS are not given, default values are ''zen3_0512'' for both.
+ | |||
+ | |||
+ | ===== Partitions ===== | ||
+ | |||
+ | Nodes of the same type of hardware are grouped to partitions, there are three basic types: | ||
+ | * Intel CPU nodes: There is only one variant with Cascadelake CPUs and 368GB RAM. | ||
+ | * AMD CPU nodes: They all have Zen3 CPU nodes, but come in three memory versions - 512GB, 1TB and 2TB RAM. | ||
+ | * GPU nodes: there are two versions, one with Zen2 CPUs, 256GB RAM and 2x nVidia A40 GPUs, and one with Zen3 CPUs, 512GB RAM and 2x nVidia A100 GPUs. | ||
+ | |||
+ | These are the partitions on VSC-5: | ||
+ | |||
+ | ^ Partition ^ Nodes ^ Architecture ^ CPU ^ Cores per CPU (physical/ | ||
+ | | zen3_0512* | 564 | AMD | 2x AMD 7713 | 64/128 | No | 512 GB | The default partition | | ||
+ | | zen3_1024 | 120 | AMD | 2x AMD 7713 | 64/128 | No | 1 TB | High Memory partition | | ||
+ | | zen3_2048 | 20 | AMD | 2x AMD 7713 | 64/128 | No | 2 TB | Higher Memory partition | | ||
+ | | cascadelake_0384 | 48 | Intel | 2x Intel Cascadelake | 48/96 | No | 384 GB | Directly use programs compiled for VSC-4 | | ||
+ | | zen2_0256_a40x2 | 45 | AMD | 2x AMD 7252 | 8/16 | 2x NIVIDA A40 | 256 GB | Best for single precision GPU code | | ||
+ | | zen3_0512_a100x2 | 60 | AMD | 2x AMD 7713 | 64/128 | 2x NIVIDA A100 | 512 GB | Best for double precision GPU code | | ||
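
For the GPU partitions, a job script header could look like the following sketch; note that requesting the GPUs explicitly via the generic SLURM ''--gres'' option is an assumption here:

<code bash>
#SBATCH --partition=zen3_0512_a100x2
#SBATCH --qos=zen3_0512_a100x2
#SBATCH --gres=gpu:2    # assumption: request both A100 GPUs of a node
</code>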
+ | |||
+ | Type '' | ||
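
For example, run on a login node (a minimal sketch using standard SLURM options):

<code bash>
# list all partitions with their state and node counts
sinfo

# print only the partition names, one per line
sinfo -o "%P"
</code>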
+ | |||
+ | For the sake of completeness there are internally used //special// partitions, that can not be selected manually: | ||
+ | |||
+ | ^ Partition ^ Description ^ | ||
+ | | login5 | login nodes, not an actual slurm partition | | ||
+ | | rackws5 | GUI login nodes, not an actual slurm partition | | ||
+ | | _jupyter | variations of zen3, a40 and a100 nodes reserved for the jupyterhub | | ||
+ | |||
+ | |||
+ | ===== Quality of service (QOS) ===== | ||
+ | |||
+ | The QOS defines the maximum run time of a job and the number and type of allocate-able nodes. | ||
The QOSs that are assigned to a specific user can be viewed with: | The QOSs that are assigned to a specific user can be viewed with: | ||
+ | |||
< | < | ||
sacctmgr show user `id -u` withassoc format=user, | sacctmgr show user `id -u` withassoc format=user, | ||
</ | </ | ||

All usable QOS are also shown right after login.

==== QOS, Partitions and Run time limits ====
The following QOS are available for all normal (= non-private) projects:

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^ Description ^
| zen3_0512 | zen3_0512 | 72h (3 days) | Default |
| zen3_1024 | zen3_1024 | 72h (3 days) | High Memory |
| zen3_2048 | zen3_2048 | 72h (3 days) | Higher Memory |
| cascadelake_0384 | cascadelake_0384 | 72h (3 days) | |
| zen2_0256_a40x2 | zen2_0256_a40x2 | 72h (3 days) | GPU Nodes |
| zen3_0512_a100x2 | zen3_0512_a100x2 | 72h (3 days) | GPU Nodes |
| zen3_0512_devel | 5 nodes on zen3_0512 | 10min | Fast Feedback |
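
For example, to use programs compiled for VSC-4 on the Cascadelake nodes, partition and QOS are selected together (a sketch; the account line is omitted here):

<code bash>
#SBATCH --partition=cascadelake_0384
#SBATCH --qos=cascadelake_0384
</code>
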
==== Idle QOS ====
If a project runs out of compute time, jobs can still be submitted using the //idle// QOS. There is no //idle// QOS for the GPU or cascadelake partitions:

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^ Description ^
| idle_0512 | zen3_0512 | 24h (1 day) | Projects out of compute time |
| idle_1024 | zen3_1024 | 24h (1 day) | Projects out of compute time |
| idle_2048 | zen3_2048 | 24h (1 day) | Projects out of compute time |
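
A sketch of a job header that uses the idle QOS on the default partition (the account name ''xxxxxx'' is a placeholder):

<code bash>
#SBATCH --account=xxxxxx
#SBATCH --partition=zen3_0512
#SBATCH --qos=idle_0512
</code>
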
==== Devel QOS ====
+ | |||
+ | The //devel// | ||
^ QOS name ^ Gives access to Partition ^ Hard run time limits ^
| zen3_0512_devel | 5 nodes on zen3_0512 | 10min |
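
A sketch of a quick test run with the devel QOS, staying below its 10 minute limit:

<code bash>
#SBATCH --partition=zen3_0512
#SBATCH --qos=zen3_0512_devel
#SBATCH --time=00:05:00    # 5 minutes, below the 10 minute limit
</code>
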
==== Private Projects ====

Private projects come with different QOS:
^ QOS name ^ Gives access to Partition ^ Hard run time limits ^
| p....._0... | | |

For submitting jobs to [[doku:slurm|SLURM]], use the account, partition and QOS of your private project:
<code bash>
#SBATCH --account=pxxxxx
#SBATCH --partition=zen3_xxxx
#SBATCH --qos=pxxxx_xxxx
</code>

==== Run time ====

The QOS's run time limits can also be requested via the command:
<code>
sacctmgr show qos format=name,maxwall
</code>

If you know how long your job usually runs, you can set the run time limit in SLURM:

<code bash>
#SBATCH --time=<time>
</code>

Of course this has to be //below// the QOS's run time limit. Your job might start earlier, which is nice; but after the specified time has elapsed, the job will be killed.

Acceptable time formats include the following; a short example is given after the list:
  * "minutes"
  * "minutes:seconds"
  * "hours:minutes:seconds"
  * "days-hours"
  * "days-hours:minutes"
  * "days-hours:minutes:seconds"
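
As a concrete sketch, a limit of one and a half days can be written in the "days-hours:minutes:seconds" format:

<code bash>
#SBATCH --time=1-12:00:00    # 1 day and 12 hours
</code>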