====== SLURM ======
  
In contrast to SGE, which was previously used on VSC-1 and VSC-2, the scheduler on VSC-3, VSC-4, and VSC-5 is [[http://slurm.schedmd.com|SLURM - Simple Linux Utility for Resource Management]].
  
==== Basic SLURM commands: ====
  * ''[...]$ sinfo'' gives information on which partitions are available for job submission. Note: what SGE on VSC-2 termed a 'queue' is now called a 'partition' under SLURM.
  * ''[...]$ scontrol'' is used to view the SLURM configuration including: job, job step, node, partition, reservation, and overall system configuration. Without a command entered on the execute line, scontrol operates in an interactive mode and prompts for input. With a command entered on the execute line, scontrol executes that command and terminates.
  * ''[...]$ scontrol show job 567890'' shows information on the job with number 567890.
  * ''[...]$ scontrol show partition'' shows information on available partitions.
  * ''[...]$ squeue'' shows the current list of submitted jobs, their state and resource allocation. [[doku:slurm_job_reason_codes|Here]] is a description of the most important **job reason codes** returned by the squeue command.
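
For illustration, a typical sequence of these commands might look like this (567890 stands for an actual job ID):
<code>
# list all partitions and their current state
sinfo

# show only your own jobs in the queue
squeue -u $USER

# inspect a specific job
scontrol show job 567890

# show the configuration of all partitions
scontrol show partition
</code>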

==== Software Installations and Modules ====

On VSC-4 and VSC-5, spack is used to install and provide modules, see [[doku:spack|SPACK - a package manager for HPC systems]]. The methods described in [[doku:modules]] can still be used for backwards compatibility, but we suggest using spack.
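
A brief sketch of how software provided through spack is typically discovered and loaded (the package name ''gcc'' is only an example):
<code>
# list all packages installed via spack
spack find

# narrow the list down to a specific package
spack find gcc

# load a package into the current shell environment
spack load gcc
</code>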
  
==== Node configuration - hyperthreading ====
  
The compute nodes of VSC-4 are configured with the following parameters in SLURM:
<code>
CoresPerSocket=24
Sockets=2
ThreadsPerCore=2
</code>
The primary nodes of VSC-5 are configured with:
<code>
CoresPerSocket=64
Sockets=2
ThreadsPerCore=2
</code>
This reflects the fact that <html> <font color=#cc3300> hyperthreading </font> </html> is activated on all compute nodes, so that <html> <font color=#cc3300> 96 virtual cores on VSC-4 and 256 virtual cores on VSC-5 </font> </html> may be utilized on each node.
In the batch script hyperthreading is selected by adding the line
<code>
#SBATCH --ntasks-per-core=2
</code>
which allows for 2 tasks per core.
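
For illustration, a minimal job script that requests all virtual cores of a single VSC-4 node might look as follows (the job name and the executable are placeholders):
<code>
#!/bin/bash
#SBATCH --job-name=hyperthreading_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96     # 48 physical cores x 2 hyperthreads
#SBATCH --ntasks-per-core=2      # allow 2 tasks per physical core

# start one MPI task on each virtual core of the allocated node
srun ./my_mpi_program
</code>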
  
Some codes may experience a performance gain from using all virtual cores, e.g., GROMACS seems to profit. But note that using all virtual cores also leads to more communication and may impact the performance of large MPI jobs.
  
**NOTE on accounting**: the project's core-h are always calculated as ''job_walltime * nnodes * ncpus'', where ''ncpus'' is the number of physical cores per node. SLURM's built-in function ''sreport'' yields wrong accounting statistics because (depending on the job script) the multiplier is the number of virtual cores instead of the number of physical cores. You may instead use the accounting script introduced in this [[doku:slurm_sacct|section]].
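
For example, a job with a walltime of 24 hours on two VSC-4 nodes (48 physical cores per node) is accounted as ''24 * 2 * 48 = 2304'' core-h, independent of whether hyperthreading is used.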
  
==== Node allocation policy ====