====== SLURM ======
In contrast to SGE, which was employed previously on VSC-1 and VSC-2, the scheduler on VSC-3, VSC-4, and VSC-5 is [[https://slurm.schedmd.com|SLURM (Simple Linux Utility for Resource Management)]].
==== Basic SLURM commands: ====

  * ''sinfo'' gives an overview of the available partitions (queues) and the state of their nodes
  * ''squeue'' lists the jobs currently in the queue; ''squeue -u $USER'' shows only your own jobs
  * ''scontrol show job <job_id>'' displays detailed information on a specific job
  * ''sbatch <job_script>'' submits a batch job to the queue
  * ''scancel <job_id>'' removes a (waiting or running) job from the queue
+ | |||
+ | |||

==== Software Installations and Modules ====

On VSC-4 and VSC-5, spack is used to install software and provide modules, see [[doku:spack|SPACK - a package manager for HPC systems]].
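The installed packages are made available as environment modules. A minimal sketch of the typical commands (the module name below is a placeholder; check ''module avail'' for the names actually installed):
<code>
# list all modules available on the cluster
module avail

# load a module into the current shell environment (placeholder name)
module load openmpi

# show the currently loaded modules
module list
</code>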

==== Node configuration - hyperthreading ====
The compute nodes of VSC-4 are configured with the following parameters in SLURM:
<code>
CoresPerSocket=24
Sockets=2
ThreadsPerCore=2
</code>
And the primary nodes of VSC-5 with:
<code>
CoresPerSocket=64
Sockets=2
ThreadsPerCore=2
</code>
This reflects the fact that hyperthreading is activated on all compute nodes: with two hardware threads per physical core, up to 96 logical CPUs can be used on each VSC-4 node and up to 256 on each VSC-5 node.
In the batch script hyperthreading is selected by adding the line
<code>
#SBATCH --ntasks-per-core=2
</code>
which allows for 2 tasks per core.
Some codes may experience a performance gain from using all virtual cores, e.g., GROMACS seems to profit. Note, however, that using all virtual cores also leads to more communication and may degrade the performance of large MPI jobs.
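As a sketch, a job that uses all virtual cores of a single VSC-4 node could look as follows (''a.out'' is a placeholder for your MPI binary):
<code>
#!/bin/bash
#SBATCH -J ht
#SBATCH -N 1
#SBATCH --ntasks-per-node=96    # 48 physical cores x 2 hardware threads
#SBATCH --ntasks-per-core=2     # enable hyperthreading

srun ./a.out
</code>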
**NOTE on accounting**: accounting is always based on the physical cores of the allocated nodes, i.e., running two tasks per core via hyperthreading does not change the number of core hours charged.
==== Node allocation policy ====
On VSC-4 a set of nodes accepts jobs that do not require entire nodes (anything from a single core up to less than a full node). These nodes accommodate jobs from different users until they are full, and they are used automatically for such jobs. All other nodes are assigned completely to one job at a time.
On VSC-5 this feature is not yet active, so only complete nodes are assigned to jobs.
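A minimal sketch of a job that requests only part of a VSC-4 node (the core and memory numbers are arbitrary examples):
<code>
#!/bin/bash
#SBATCH -J small
#SBATCH --ntasks=4    # only 4 cores, not a full node
#SBATCH --mem=8G      # memory for the whole job

srun ./a.out
</code>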
===== Submit a batch job =====
==== Partition, quality of service and run time ====

Depending on the demands of a certain application, a [[doku:vsc4_queue|partition (grouping hardware according to its type) and quality of service (QOS; defining the run time etc.)]] can be selected. Additionally, a run time limit (''--time'') below the maximum of the chosen QOS can be specified, which may shorten the time the job spends waiting in the queue.
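As a sketch, partition, QOS, and run time are set via the following lines in the job script (the partition and QOS names are placeholders; take valid names from the linked queue setup page):
<code>
#!/bin/bash
#SBATCH -J example
#SBATCH -N 1
#SBATCH --partition=<partition>    # placeholder, see the queue setup page
#SBATCH --qos=<qos>                # placeholder, see the queue setup page
#SBATCH --time=04:00:00            # run time limit of this job

srun ./a.out
</code>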
==== The job submission script ====

It is recommended to write the job script using a text editor on the VSC //Linux// cluster or on another //Linux// system.
Editors in //Windows// may add additional invisible characters to the job file which render it unreadable and, thus, it is not executed.
An example job submission script (''check.slrm''):
<code>
#!/bin/bash
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=48
#SBATCH --ntasks-per-core=1
#SBATCH --mail-type=BEGIN
#SBATCH --mail-user=<email@address>

# when srun is used:
<srun -l -N2 -n96 a.out >
# or
<mpirun -np 96 a.out>
</code>
* **-J** job name
* **-N** number of nodes requested
* **--ntasks-per-node** number of tasks started per node
* **--ntasks-per-core** number of tasks per physical core
* **--mail-type** event(s) triggering an email notification (e.g., BEGIN, END, FAIL, ALL)
* **--mail-user** sends an email to this address

In order to send the job to specific queues, see [[doku:vsc4_queue|Queue/Partition setup on VSC-4]].
==== Job submission ====
<code>
[username@l42 ~]$ sbatch check.slrm     # submit the job
[username@l42 ~]$ squeue -u `whoami`    # check the status of your own jobs
[username@l42 ~]$ scancel <job_id>      # cancel a job; the job_id
                                        # is obtained from the previous command
</code>

==== Hybrid MPI/OMP ====

SLURM Script:
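A minimal sketch of such a script (assuming 2 MPI processes per node, each running 24 OpenMP threads, on VSC-4 nodes; adapt the numbers to your code):
<code>
#!/bin/bash
#SBATCH -J hyb
#SBATCH -N 2
#SBATCH --ntasks-per-node=2     # MPI processes per node
#SBATCH --cpus-per-task=24      # OpenMP threads per MPI process

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./a.out
</code>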
==== Job chain ====
This example is for using a set of 4 nodes to compute a series of jobs in two stages, each of them split into two separate subjobs. \\
<code>
#!/bin/bash
#SBATCH -J chk
#SBATCH -N 4
#SBATCH --ntasks-per-node=48
#SBATCH --ntasks-per-core=1

scontrol show hostnames $SLURM_NODELIST

srun -l -N2 -r0 -n96 job1.scrpt &
srun -l -N2 -r2 -n96 job2.scrpt &
wait
srun -l -N2 -r2 -n96 job3.scrpt &
srun -l -N2 -r0 -n96 job4.scrpt &
wait
</code>
The option ''-r n'' (''--relative=n'') starts a job step on node //n// (counting from 0) of the current allocation, so each pair of subjobs runs on its own set of two nodes.

(The SLURM inherent command //#SBATCH --array=starting_value-end_value:step_width// provides similar functionality; see the Job Arrays section below.)
+ | |||
+ | [[doku: | ||
===== Generating a host machines file =====
In case ''mpirun'' is used with a hostfile, such a machines file can be generated within the job script:
<code>
#!/bin/bash
#SBATCH -J par                   # job name
#SBATCH -N 2                     # number of nodes=2
#SBATCH --ntasks-per-node=48     # uses all cpus of one node
#SBATCH --ntasks-per-core=1
#SBATCH --threads-per-core=1

# list of allocated hostnames, one per line
scontrol show hostnames $SLURM_NODELIST > machines_tmp

rm -f machines        # start with an empty machines file
tasks_per_node=48     # change number accordingly
nodes=2
# repeat each hostname once per task that is to be placed on that node
for ((line=1; line<=nodes; line++)); do
    for ((task=1; task<=tasks_per_node; task++)); do
        sed -n "${line}p" machines_tmp >> machines
    done
done
rm machines_tmp
</code>
Pay attention to the fact that the generated ''machines'' file has to be passed on explicitly, e.g., via ''mpirun -machinefile machines -np 96 ./a.out''.
==== Restarting Failed Jobs ====

Slurm is __no longer__ configured to automatically requeue jobs which were aborted due to node failures. If you do want your jobs to be requeued automatically after a node failure, add the following option to your job script:
<code>
#SBATCH --requeue
</code>
==== Job Arrays ====
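A minimal sketch of a job array (the input file naming is a made-up example; SLURM sets ''SLURM_ARRAY_TASK_ID'' individually for each array task):
<code>
#!/bin/bash
#SBATCH -J arr
#SBATCH -N 1
#SBATCH --array=1-10      # run 10 independent array tasks

# each task processes its own input file, e.g. input_1.dat ... input_10.dat
srun ./a.out input_${SLURM_ARRAY_TASK_ID}.dat
</code>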
To submit dependent jobs, i.e., jobs that may only start after another job has terminated:
  - submit the first job and note its ''<job_id>''
  - submit the next job with ''#SBATCH --dependency=afterany:<job_id>''
  - continue at 2. for further dependent jobs

===== Prolog Error Codes =====

The SLURM prolog reports one of the following error codes if one of its node health checks fails:
<code>
ERROR_MEMORY=200
ERROR_INFINIBAND_HW=201
ERROR_INFINIBAND_SW=202
ERROR_IPOIB=203
ERROR_BEEGFS_SERVICE=204
ERROR_BEEGFS_USER=205
ERROR_BEEGFS_SCRATCH=206
ERROR_NFS=207

ERROR_USER_GROUP=220
ERROR_USER_HOME=221

ERROR_GPFS_START=228
ERROR_GPFS_MOUNT=229
ERROR_GPFS_UNMOUNT=230
</code>