====== SLURM ======
  
Contrary to SGE, which was previously employed on VSC-1 and VSC-2, the scheduler on VSC-3, VSC-4, and VSC-5 is [[http://slurm.schedmd.com|SLURM - Simple Linux Utility for Resource Management]].
  
==== Basic SLURM commands: ====
  * ''[...]$ sinfo'' gives information on which partitions are available for job submission. Note: What SGE on VSC-2 termed a 'queue' is now called a 'partition' under SLURM.
  * ''[...]$ scontrol'' is used to view the SLURM configuration including: job, job step, node, partition, reservation, and overall system configuration. Without a command entered on the execute line, scontrol operates in an interactive mode and prompts for input. With a command entered on the execute line, scontrol executes that command and terminates.
  * ''[...]$ scontrol show job 567890'' shows information on the job with number 567890.
  * ''[...]$ scontrol show partition'' shows information on available partitions.
  * ''[...]$ squeue'' shows the current list of submitted jobs, their state and resource allocation; see the example session below. [[doku:slurm_job_reason_codes|Here]] is a description of the most important **job reason codes** returned by the squeue command.
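For example, a short session for checking the cluster and your own jobs could look like this (a sketch; job IDs and the resulting output are of course specific to your account):
<code>
[...]$ sinfo                     # which partitions are available?
[...]$ squeue -u `whoami`        # list only your own jobs
[...]$ scontrol show job 567890  # full details of one particular job
</code>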

==== Software Installations and Modules ====
  
On VSC-4 and VSC-5, spack is used to install and provide modules; see [[doku:spack|SPACK - a package manager for HPC systems]]. The methods described in [[doku:modules]] can still be used for backwards compatibility, but we suggest using spack.
  
In order to set environment variables needed for a specific application, the **module** environment may be used:
  
When all required/intended modules have been loaded, user packages may be compiled as usual.
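A typical sequence could look like the following sketch (the module name is only a placeholder; use ''module avail'' to see what is actually installed):
<code>
[...]$ module avail               # list all available modules
[...]$ module load <module-name>  # load a specific module, e.g. a compiler or MPI library
[...]$ module list                # show the currently loaded modules
[...]$ module purge               # unload all modules again
</code>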
==== Node configuration - hyperthreading ====
  
The compute nodes of VSC-4 are configured with the following parameters in SLURM:
<code>
CoresPerSocket=24
Sockets=2
ThreadsPerCore=2
</code>
And the primary nodes of VSC-5 with:
<code>
CoresPerSocket=64
Sockets=2
ThreadsPerCore=2
</code>
This reflects the fact that <color #cc3300> hyperthreading </color> is activated on all compute nodes and <color #cc3300> 96 cores on VSC-4 and 256 cores on VSC-5 </color> may be utilized on each node.
In the batch script hyperthreading is selected by adding the line
<code>
#SBATCH --ntasks-per-core=2
</code>
which allows for 2 tasks per core.
  
Some codes may experience a performance gain from using all virtual cores, e.g., GROMACS seems to profit. But note that using all virtual cores also leads to more communication and may impact the performance of large MPI jobs.
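For example, a job header requesting all virtual cores of two VSC-4 nodes could look like this (a minimal sketch; on VSC-5 the corresponding number would be 256 tasks per node):
<code>
#!/bin/bash
#SBATCH -J ht_job
#SBATCH -N 2
#SBATCH --ntasks-per-node=96     # 48 physical cores x 2 hyperthreads on VSC-4
#SBATCH --ntasks-per-core=2      # enable hyperthreading

mpirun -np 192 a.out
</code>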
  
**NOTE on accounting**: the project's core-h are always calculated as ''job_walltime * nnodes * ncpus'' (with ncpus the number of physical cores per node). SLURM's built-in function ''sreport'' yields wrong accounting statistics because (depending on the job script) the multiplier is the number of virtual cores instead of physical cores. You may instead use the accounting script introduced in this [[doku:slurm_sacct|section]].
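As an illustration: a job running for 10 hours on 2 VSC-4 nodes (48 physical cores each) is accounted with ''10 * 2 * 48 = 960'' core-h, independent of whether hyperthreading (''--ntasks-per-core=2'') was requested.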
  
==== Node allocation policy ====
On VSC-4 & VSC-5 there is a set of nodes that accept jobs which do not require entire exclusive nodes (anything from 1 core to less than a full node). These nodes are set up to accommodate different jobs from different users until they are full, and they are automatically used for such jobs. All other nodes are assigned completely (and exclusively) to a job whenever the ''-N'' argument is used.
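A small job that is content with part of a node can therefore simply request tasks (and, if needed, memory) without the ''-N'' option, for example (a sketch; the program name and numbers are illustrative):
<code>
#!/bin/bash
#SBATCH -J small
#SBATCH --ntasks=4     # only 4 cores; without -N the job may share a node with others
#SBATCH --mem=8G       # memory actually needed by the job

./my_program
</code>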
  
  
  
Depending on the demands of a certain application, the
[[doku:vsc5_queue|partition (grouping hardware according to its type) and
quality of service (QOS; defining the run time etc.)]] can be selected.
Additionally, the run time of a job can be limited in the job script to a value lower than the runtime limit of the selected QOS. This allows for a process called backfilling, possibly leading to a <color #cc3300>shorter waiting time</color> in the queue.
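In the job script, partition, QOS and a (shorter) run time limit are requested with the corresponding SBATCH directives, for example (the partition and QOS names are placeholders; see the linked queue pages for the actual values):
<code>
#SBATCH --partition=<partition>
#SBATCH --qos=<qos>
#SBATCH --time=08:00:00     # request less than the QOS limit to profit from backfilling
</code>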
==== The job submission script ====
  
It is recommended to write the job script using a [[doku:win2vsc&#the_job_filetext_editors_on_the_cluster|text editor]] on the VSC //Linux// cluster or on any Linux/Mac system.
Editors in //Windows// may add additional invisible characters to the job file which render it unreadable and, thus, it cannot be executed.
  
Assume a submission script ''check.slrm''
<code>
#!/bin/bash
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=48
#SBATCH --ntasks-per-core=1
#SBATCH --mail-type=BEGIN    # first have to state the type of event to occur
# when srun is used, you need to set:
  
<srun -l -N2 -n96 a.out >
# or
<mpirun -np 96 a.out>
</code>
  * **-J**     job name,\\ 
  * **-N**     number of nodes requested\\ 
  * **-n, --ntasks=<number>** specifies the number of tasks to run,
  * **--ntasks-per-node**     number of processes run in parallel on a single node \\ 
  * **--mail-user** sends an email to this address
  
In order to send the job to specific queues, see [[doku:vsc4_queue|Queue | Partition setup on VSC-4]] or [[doku:vsc5_queue|Queue | Partition setup on VSC-5]].
==== Job submission ====
    
<code>
[username@l42 ~]$ sbatch check.slrm    # to submit the job
[username@l42 ~]$ squeue -u `whoami`   # to check the status of own jobs
[username@l42 ~]$ scancel  JOBID       # for premature removal, where JOBID
                                       # is obtained from the previous command
</code>
  
==== Hybrid MPI/OMP: ====
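A minimal sketch of such a hybrid setup on two VSC-4 nodes (4 MPI tasks per node with 12 OpenMP threads each; the numbers are purely illustrative and should be adapted to your application):
<code>
#!/bin/bash
#SBATCH -J hybrid
#SBATCH -N 2
#SBATCH --ntasks-per-node=4      # 4 MPI tasks per node
#SBATCH --cpus-per-task=12       # 12 OpenMP threads per task (4 x 12 = 48 physical cores on VSC-4)

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun -np 8 a.out
</code>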
See also the [[https://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/Environment_Variables_Process_Pinning.htm|Intel Environment Variables]].
  
  
#SBATCH -J chk
#SBATCH -N 4
#SBATCH --ntasks-per-node=48
#SBATCH --ntasks-per-core=1
  
scontrol show hostnames $SLURM_NODELIST  > ./nodelist
  
# -r sets the relative node offset within the allocation, so two 2-node job steps run side by side
srun -l -N2 -r0 -n96 job1.scrpt &
srun -l -N2 -r2 -n96 job2.scrpt &
wait

srun -l -N2 -r2 -n96 job3.scrpt &
srun -l -N2 -r0 -n96 job4.scrpt &
wait
  
#SBATCH -J par                      # job name
#SBATCH -N 2                        # number of nodes=2
#SBATCH --ntasks-per-node=48        # uses all cpus of one node
#SBATCH --ntasks-per-core=1
#SBATCH --threads-per-core=1
rm machines_tmp
  
tasks_per_node=48         # change number accordingly
nodes=2                   # change number accordingly
for ((line=1; line<=nodes; line++))
  - continue at 2. for further dependent jobs (see the sketch below)
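Putting the steps above together, a simple dependency chain can be scripted like this (a sketch; ''job1.slrm'' and ''job2.slrm'' stand for your own submission scripts):
<code>
# submit the first job and capture its job id
JOBID=$(sbatch --parsable job1.slrm)

# the second job only starts once the first has finished successfully
sbatch --dependency=afterok:$JOBID job2.slrm
</code>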
  
===== Prolog Error Codes =====