====== SLURM ======

In contrast to SGE, which was employed on VSC-1 and VSC-2, the scheduler on VSC-3, VSC-4, and VSC-5 is [[http://slurm.schedmd.com|SLURM - Simple Linux Utility for Resource Management]].

==== Basic SLURM commands: ====

  * ''sinfo'' shows the available partitions and the state of their nodes
  * ''scontrol show node <node_name>'' shows the detailed configuration of a node
  * ''squeue'' lists the jobs currently in the queue
  * ''sbatch <job_script>'' submits a batch job
  * ''scancel <job_id>'' cancels a job

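For example, a quick look at the partitions, a particular node, and your own jobs (the node name and output format shown here are only illustrative):
<code>
sinfo -o "%P %D %t"              # partitions, node counts, and node states
scontrol show node n4901-001     # detailed configuration of one node (name is an example)
squeue -u $USER                  # all jobs of the current user
</code>
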
==== Software Installations and Modules ====

On VSC-4 and VSC-5, spack is used to install and provide modules, see [[doku:spack|SPACK]].

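As an illustration of the usual module workflow (the module name below is only an example; the actual names provided by spack on VSC differ):
<code>
module avail                           # list all available modules
module load openmpi/4.1.4-gcc-12.2.0   # load one of them (example name)
module list                            # show currently loaded modules
</code>
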
==== Node configuration - hyperthreading ====

The compute nodes of VSC-4 are configured with the following parameters in SLURM:
<code>
CoresPerSocket=24
Sockets=2
ThreadsPerCore=2
</code>
And the primary nodes of VSC-5 with:
<code>
CoresPerSocket=64
Sockets=2
ThreadsPerCore=2
</code>
This reflects the fact that hyperthreading is activated on all compute nodes: 2 x 24 physical cores provide 96 logical cores per VSC-4 node, and 2 x 64 physical cores provide 256 logical cores per VSC-5 node.

In the batch script hyperthreading is selected by adding the line
<code>
#SBATCH --ntasks-per-core=2
</code>
which allows for 2 tasks per core.

Some codes may experience a performance gain from using all virtual cores, e.g., GROMACS seems to profit. But note that using all virtual cores also leads to more communication and may impact the performance of large MPI jobs.
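A minimal sketch of a VSC-4 job header that uses all 96 logical cores of one node (the values follow the node configuration shown above; the program name is a placeholder):
<code>
#!/bin/bash
#SBATCH -J ht_test
#SBATCH -N 1
#SBATCH --ntasks-per-node=96
#SBATCH --ntasks-per-core=2

srun ./my_program
</code>
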
**NOTE on accounting**: core hours are always charged on the basis of the physical cores of a node, regardless of whether hyperthreading (2 tasks per core) is used.
==== Node allocation policy ====

On VSC-4 & VSC-5 there is a set of nodes that accept jobs which do not require entire exclusive nodes; all other nodes are allocated to jobs exclusively.

==== The job submission script ====

It is recommended to write the job script using a text editor on a Linux or Mac system. Editors in //Windows// may add additional invisible characters to the job file which render it unreadable and, thus, prevent it from being executed.

Assume a submission script ''check.slrm'':
<code>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=48
#SBATCH --ntasks-per-core=1
#SBATCH --mail-type=BEGIN

# when srun is used, you need to set:
srun -l -N2 -n96 a.out
# or
mpirun -np 96 a.out
</code>
  * **-J** job name
  * **-N** number of nodes requested
  * **-n, --ntasks=<number>** total number of tasks to run
  * **--ntasks-per-node** number of tasks (processes) started per node
  * **--mail-type** defines when to send a notification email (e.g., BEGIN, END, FAIL, ALL)
  * **--mail-user** sends an email to this address
In order to send the job to specific queues, see [[doku:vsc4_queue|Queue/Partition setup on VSC-4]].
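The partition and quality of service are selected with the usual SLURM directives; the names below are placeholders, the valid combinations are listed on the queue/partition page:
<code>
#SBATCH --partition=<partition_name>
#SBATCH --qos=<qos_name>
</code>
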
==== Job submission ====

<code>
[username@l42 ~]$ sbatch check.slrm      # submit the job
[username@l42 ~]$ squeue -u `whoami`     # check the status of your own jobs
[username@l42 ~]$ scancel <job_id>       # cancel a job; <job_id>
                                         # is obtained from the previous command
</code>
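Once a job has been submitted, it can be inspected in more detail; the job id below is only an example:
<code>
[username@l42 ~]$ scontrol show job 1234567                       # full details of a queued/running job
[username@l42 ~]$ sacct -j 1234567 --format=JobID,State,Elapsed   # accounting information after the job has finished
</code>
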
Several job steps can be run on different subsets of the allocated nodes by giving ''srun'' a relative node offset (''-r''):
<code>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 4
#SBATCH --ntasks-per-node=48
#SBATCH --ntasks-per-core=1

scontrol show hostnames $SLURM_NODELIST
srun -l -N2 -r0 -n96 job1.scrpt &
srun -l -N2 -r2 -n96 job2.scrpt &
wait
srun -l -N2 -r2 -n96 job3.scrpt &
srun -l -N2 -r0 -n96 job4.scrpt &
wait
</code>
(The SLURM inherent command //#SBATCH --array starting_value-end_value:increment// offers an alternative way of submitting many similar jobs as a job array.)

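A minimal sketch of such a job array (the script name, range, and increment are only examples):
<code>
#!/bin/bash
#SBATCH -J array_test
#SBATCH -N 1
#SBATCH --array=1-10:2          # runs the tasks with indices 1,3,5,7,9

./my_program input_${SLURM_ARRAY_TASK_ID}.dat
</code>
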
===== Generating a host machines file =====

Some MPI installations require an explicit machines (host) file that lists every node once per task. Such a file can be created inside the job script from the SLURM node list:
<code>
#!/bin/bash
#
#SBATCH -J par                   # job name
#SBATCH -N 2                     # number of nodes=2
#SBATCH --ntasks-per-node=48     # uses all cpus of one node
#SBATCH --ntasks-per-core=1
#SBATCH --threads-per-core=1

rm -f machines_tmp               # remove a possibly existing old file
tasks_per_node=48                # change number accordingly
nodes=2
for ((line=1; line<=nodes; line++))
do
   for ((task=1; task<=tasks_per_node; task++))
   do
      # append the hostname of node number <line> once per task
      scontrol show hostnames $SLURM_NODELIST | sed -n "${line}p" >> machines_tmp
   done
done
</code>
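Assuming an MPI installation that accepts a machine file (the exact flag name depends on the MPI flavour), the generated file could then be used like this:
<code>
mpirun -np 96 -machinefile machines_tmp ./my_program
</code>
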
===== Job dependencies =====

  - Submit the first job and note its ''<job_id>''.
  - Submit the next job as a dependent job with ''sbatch --dependency=afterany:<job_id> <job_script>'', so that it only starts once the first one has finished.
  - continue at 2. for further dependent jobs
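A sketch of such a chain on the command line (job script names and ids are examples):
<code>
[username@l42 ~]$ sbatch job1.slrm                               # prints e.g. "Submitted batch job 1234567"
[username@l42 ~]$ sbatch --dependency=afterany:1234567 job2.slrm
[username@l42 ~]$ sbatch --dependency=afterany:1234568 job3.slrm
</code>
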
===== Prolog Error Codes =====

The SLURM prolog uses the following error codes:
<code>
ERROR_MEMORY=200
ERROR_INFINIBAND_HW=201
ERROR_INFINIBAND_SW=202
ERROR_IPOIB=203
ERROR_BEEGFS_SERVICE=204
ERROR_BEEGFS_USER=205
ERROR_BEEGFS_SCRATCH=206
ERROR_NFS=207
ERROR_USER_GROUP=220
ERROR_USER_HOME=221

ERROR_GPFS_START=228
ERROR_GPFS_MOUNT=229
ERROR_GPFS_UNMOUNT=230
</code>