====== Queue | Partition setup on VSC-4 ======

On VSC-4, nodes of the same type of hardware are grouped into partitions. The quality of service (QOS), formerly called //queue//, defines which partitions a job may use and its maximum run time.

For submitting jobs to SLURM, three parameters are important:

<code bash>
#SBATCH --account=xxxxxx
#SBATCH --partition=skylake_xxxx
#SBATCH --qos=xxxxx_xxxx
</code>

Notes:

  * Core hours will be charged to the specified account.
  * Account, partition, and QOS have to fit together.
  * If the account is not given, the default account will be used.
  * If partition and QOS are not given, the default value ''skylake_0096'' is used for both (see the example script below).
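
A complete job script for the default partition might look like the following sketch; the account ''p7xxxx'', the job name, the resource values, and the program call are placeholders, not real values:

<code bash>
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=p7xxxx
#SBATCH --partition=skylake_0096
#SBATCH --qos=skylake_0096
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# print the node the job was placed on, then start the actual program
hostname
./my_program
</code>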

===== Partitions =====

Nodes of the same type of hardware are grouped into partitions. There are three basic types of compute nodes, all with the same CPU, but with different amounts of memory: 96 GB, 384 GB and 768 GB.

These are the partitions on VSC-4:
^ Partition ^ Nodes ^ Architecture ^ CPU ^ GPU ^ RAM ^ Use ^
| skylake_0096 | 702 | Intel | | No | 96 GB | The default partition |
| skylake_0384 | 78 | Intel | | No | 384 GB | High Memory partition |
| skylake_0768 | 12 | Intel | | No | 768 GB | Higher Memory partition |

Type ''sinfo'' on any login node to see all available partitions.
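
As a quick overview, the partitions with their node counts and memory per node can also be listed with standard ''sinfo'' format options; the columns chosen here are just one possible illustration:

<code bash>
# show each partition with its number of nodes and memory per node (in MB)
sinfo -o "%P %D %m"
</code>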

For the sake of completeness, there are also internally used //special// partitions that cannot be selected manually:
^ Partition ^ Description ^
| login4 | Login nodes, not an actual SLURM partition |
| jupyter | Reserved for the JupyterHub |

===== Quality of service (QOS) =====

The QOS defines which partitions a job may use and its maximum run time.

The QOSs that are assigned to a specific user can be viewed with:

<code>
sacctmgr show user `id -u` withassoc format=user,defaultaccount,account,qos,defaultqos
</code>

All usable QOSs are also shown right after login.

==== QOS, Partitions and Run time limits ====

The following QOS are available for all normal (non-private) projects (see the example below the table):

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^ Description ^
| skylake_0096 | skylake_0096 | 72h (3 days) | Default |
| skylake_0384 | skylake_0384 | 72h (3 days) | High Memory |
| skylake_0768 | skylake_0768 | 72h (3 days) | Higher Memory |
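
For example, a job that needs one of the 384 GB nodes could combine account, partition, and QOS as in this sketch (the account ''p7xxxx'' is a placeholder for your own project):

<code bash>
#SBATCH --account=p7xxxx
#SBATCH --partition=skylake_0384
#SBATCH --qos=skylake_0384
</code>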

==== Idle QOS ====

If a project runs out of compute time, jobs of this project can still be submitted using the //idle// QOS (see the example below the table):

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^ Description ^
| idle_0096 | skylake_0096 | 24h (1 day) | Projects out of compute time |
| idle_0384 | skylake_0384 | 24h (1 day) | Projects out of compute time |
| idle_0768 | skylake_0768 | 24h (1 day) | Projects out of compute time |

==== Devel QOS ====

The //devel// QOS gives fast feedback to the user when their job is running. Connect to the node where the actual job is running to directly monitor your job, as sketched below.

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^
| skylake_0096_devel | 5 nodes on skylake_0096 | 10min |
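
A minimal sketch of how to find out which node a running job occupies and connect to it; the ''squeue'' format string is just one possible choice, and ''<nodename>'' is a placeholder:

<code bash>
# list your running jobs with job id, name, state and allocated node(s)
squeue -u $USER -o "%i %j %T %N"

# connect to one of the listed nodes to watch your processes (e.g. with top)
ssh <nodename>
</code>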

==== Private Projects ====

Private projects come with their own QOS; nevertheless, partition, QOS, and account still have to fit together.

^ QOS name ^ Gives access to Partition ^ Hard run time limits ^
| p....._0... | | |

For submitting jobs to SLURM, the three parameters again have to fit together:

<code bash>
#SBATCH --account=p7xxxx
#SBATCH --partition=skylake_xxxx
#SBATCH --qos=p7xxx_xxxx
</code>

==== Run time ====

The QOS's run time limits can also be requested via the command:

<code>
sacctmgr show qos format=name,maxwall
</code>

If you know how long your job usually runs, you can set the run time limit in SLURM:

<code>
#SBATCH --time=<time>
</code>

Of course this has to be //below// the default QOS's run time limit. Your job might then start earlier, which is nice; but once the specified time has elapsed, the job is killed!

Acceptable time formats include (see the example after the list):

  * "minutes"
  * "minutes:seconds"
  * "hours:minutes:seconds"
  * "days-hours"
  * "days-hours:minutes"
  * "days-hours:minutes:seconds"
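
For instance, a limit of one and a half days could be requested like this in a job script (an illustrative value, below the 72h QOS limit):

<code bash>
# request 1 day and 12 hours of run time (days-hours:minutes:seconds)
#SBATCH --time=1-12:00:00
</code>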