This version (2024/10/24 10:28) is a draft.

Queue | Partition setup on VSC-3+

On VSC-3+, users can select the type of hardware and the quality of service (QOS) for their jobs. Nodes with the same type of hardware are grouped into partitions; the QOS defines the maximum run time of a job and the number and type of allocatable nodes.

Three different types of compute nodes are available: nodes with 64 GB or 256 GB of memory, GPU nodes, and bioinformatics nodes (very high memory).

On VSC-3+, the hardware is grouped into so-called partitions:

partition name | description
vsc3plus_0064 | default, nodes with 64 GB of memory
vsc3plus_0256 | nodes with 256 GB of memory
gpu_xxxx | GPU nodes, partition depending on GPU type
binf | Bioinformatics nodes
jupyter | reserved for the JupyterHub

For the specific GPU partitions, see GPUs on VSC-3

The partitions of the oil-cooled nodes (normal_0064, normal_0128, normal_0256), the Xeon Phi nodes (knl) and the ARM nodes (arm) have been decommissioned and are no longer available.

Access to node partitions is granted by the so-called quality of service (QOS). The QOSs constrain the number of allocatable nodes and limit job wall time. The naming scheme of the QOSs is: <project_type>_<memoryConfig>

The QOSs that are assigned to a specific user can be viewed with:

sacctmgr show user `id -u` withassoc format=user,defaultaccount,account,qos%40s,defaultqos%20s

The default QOS and all usable QOSs are also shown right after login.

In general, QOSs are distinguished by whether they are defined on the ordinary compute nodes (vsc3plus_0064/vsc3plus_0256), on GPUs, on bioinformatics nodes, or on private nodes. A further distinction is whether a project still has computing time available or has already consumed it. In the latter case, the project's jobs run with low priority and a reduced maximum run time limit in the idle queue.

The devel queue (devel_0064) gives users fast feedback on whether their job runs. It is possible to connect to the node where the job is actually running and to monitor it directly, e.g., to check whether the threads/processes are doing what is expected. This is recommended before sending the job to one of the 'computing' queues.

The QOS's hard run time limits

QOS | hard run time limit
vsc3plus_0064 / vsc3plus_0256 | 72h (3 days)
idle_0064 / idle_0256 | 24h (1 day)
GPU queues gpu_….. | 72h (3 days)
normal_binf | 24h (1 day)
private queues p….._0… | up to 240h (10 days)
devel_0064 (up to 4 nodes available) | 10min

The QOS's run time limits can also be queried with the command

sacctmgr show qos  format=name%20s,priority,grpnodes,maxwall,description%40s
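
For use in scripts, the same query can be made machine-readable. The following is a minimal sketch, assuming that sacctmgr is called with its --parsable2 and --noheader options so that each QOS is printed as one name|maxwall line; the helper name qos_maxwall is hypothetical and not part of SLURM:

```shell
#!/bin/bash
# Hypothetical helper: print the MaxWall value of one QOS.
# Reads "name|maxwall" lines on stdin, as produced by
#   sacctmgr --parsable2 --noheader show qos format=name,maxwall
qos_maxwall() {
    # -F'|' splits on the separator used by --parsable2;
    # only the line whose first field equals the requested QOS is printed.
    awk -F'|' -v qos="$1" '$1 == qos { print $2 }'
}

# Intended use on a login node (commented out because it needs sacctmgr):
#   sacctmgr --parsable2 --noheader show qos format=name,maxwall | qos_maxwall devel_0064
```

The returned value uses the same time notation that is accepted by --time.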

SLURM allows setting a run time limit below the default QOS's run time limit. Once the specified time has elapsed, the job is killed:

#SBATCH --time=<time> 

Acceptable time formats include “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes” and “days-hours:minutes:seconds”.
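
To illustrate how these formats are read, the following sketch converts a time string into seconds. The function name slurm_time_to_seconds is hypothetical; it is a small helper for illustration, not a SLURM command, and it does not validate its input:

```shell
#!/bin/bash
# Hypothetical helper: convert a --time style string to seconds.
# Handles: minutes, minutes:seconds, hours:minutes:seconds,
#          days-hours, days-hours:minutes, days-hours:minutes:seconds
slurm_time_to_seconds() {
    local spec=$1 days=0 hours=0 minutes=0 seconds=0
    local a b c
    if [[ $spec == *-* ]]; then
        # With a "days-" prefix, the remaining fields start at hours.
        days=${spec%%-*}
        IFS=: read -r hours minutes seconds <<< "${spec#*-}"
    else
        # Without days, the meaning depends on the number of fields.
        IFS=: read -r a b c <<< "$spec"
        if [[ -n $c ]]; then
            hours=$a; minutes=$b; seconds=$c
        elif [[ -n $b ]]; then
            minutes=$a; seconds=$b
        else
            minutes=$a
        fi
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#${days:-0} * 86400 + 10#${hours:-0} * 3600 \
           + 10#${minutes:-0} * 60 + 10#${seconds:-0} ))
}
```

For example, slurm_time_to_seconds 1-12 prints 129600, i.e. one day and twelve hours.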

Furthermore, it is possible to set a minimum time limit on the job allocation. When jobs with different resource demands are scheduled, it is likely that not all nodes can be filled. Imagine a job that requests more resources than are currently free and thus cannot start. Since it has the highest priority, all other jobs would have to wait. Backfilling is the process of filling those idle nodes with jobs that fit into the unused time gap. This allows a job with a shorter time limit to be scheduled earlier than jobs with higher priority. Estimating this minimum time where possible is highly encouraged, because it also contributes to better cluster utilization:

#SBATCH --time-min=<time>
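
A job script combining both limits might look like the following sketch. The partition and QOS are the defaults named on this page; the job name and the two time values are assumptions chosen only for illustration and should be replaced by estimates for the actual workload:

```shell
#!/bin/bash
# Hypothetical job name, used only for illustration.
#SBATCH --job-name=backfill_demo
#SBATCH --partition=vsc3plus_0064
#SBATCH --qos=vsc3plus_0064
# Upper limit: the job is killed after 24 hours at the latest.
#SBATCH --time=24:00:00
# Lower limit: the job may already start in a gap of at least 4 hours.
#SBATCH --time-min=04:00:00

# SLURM_SUBMIT_DIR and SLURM_JOB_ID are set by SLURM inside a job;
# the fallbacks only keep the script runnable outside of a job.
workdir=${SLURM_SUBMIT_DIR:-$PWD}
echo "job ${SLURM_JOB_ID:-none} running in $workdir"
```

Because the granted time limit can lie anywhere between the two values, a job submitted this way should be able to stop and resume its work, for example by writing checkpoints.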

For submitting jobs, three parameters are important:

#SBATCH --partition=vsc3plus_xxxx
#SBATCH --qos=xxxxx_xxxx
#SBATCH --account=xxxxxx

The core hours will be charged to the specified account. If not specified, the default account (sacctmgr show user `id -u` withassoc format=defaultaccount) will be used.

ordinary projects

For ordinary projects the QOSs are:

QOS name | gives access to partition | description
vsc3plus_0064 | vsc3plus_0064 | default
vsc3plus_0256 | vsc3plus_0256 |
gpu_…. | gpu_xxxx | GPU QOS and GPU partition of the same name
normal_binf | binf |
devel_0064 | 4 nodes on vsc3plus_0064 |

examples

#SBATCH --partition=vsc3plus_0064
#SBATCH --qos=vsc3plus_0064
#SBATCH --account=p7xxxx

#SBATCH --partition=gpu_a40dual
#SBATCH --qos=gpu_a40dual
#SBATCH --account=p7xxxx
  • Note that partition, qos, and account have to fit together.
  • If the account is not given, the default account (sacctmgr show user `id -u` withassoc format=defaultaccount) will be used.
  • If partition and qos are not given, default values are vsc3plus_0064 for both.
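
The pairing of QOS and partition for ordinary projects can be checked before submission with a small script. The following is a sketch under the assumption that only the QOSs listed for ordinary projects are relevant; idle and private QOSs are deliberately not covered, and the function names partition_for_qos and check_pair are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the partition a QOS gives access to,
# following the table for ordinary projects.
partition_for_qos() {
    case "$1" in
        vsc3plus_0064|vsc3plus_0256) echo "$1" ;;             # same name as the QOS
        gpu_*)                       echo "$1" ;;             # GPU QOS and partition share a name
        normal_binf)                 echo "binf" ;;
        devel_0064)                  echo "vsc3plus_0064" ;;  # devel nodes are part of vsc3plus_0064
        *) echo "unknown QOS: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed only if partition and QOS fit together.
check_pair() {
    local partition=$1 qos=$2 expected
    expected=$(partition_for_qos "$qos") || return 1
    [[ $partition == "$expected" ]]
}
```

For example, check_pair vsc3plus_0064 devel_0064 succeeds, while check_pair binf vsc3plus_0064 returns a non-zero exit status.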

private nodes projects

example
#SBATCH --partition=vsc3plus_xxxx
#SBATCH --qos=p7xxx_xxxx
#SBATCH --account=p7xxxx 