 +====== SLURM ======
 +  * Article written by Markus Stöhr (VSC Team) <html><br></html>(last update 2017-10-09 by ms).
 +==== Quickstart ====
 +script [[examples/job-quickstart.sh|examples/05_submitting_batch_jobs/job-quickstart.sh]]:
 +#!/bin/bash
 +#SBATCH -J h5test
 +#SBATCH -N 1
 +module purge
 +module load gcc/5.3 intel-mpi/5 hdf5/1.8.18-MPI
 +cp $VSC_HDF5_ROOT/share/hdf5_examples/c/ph5example.c .
 +mpicc ph5example.c -o ph5example -lhdf5
 +mpirun -np 8  ./ph5example -c -v 
 +$ sbatch job.sh
 +Submitted batch job 5250981
 +check what is going on:
 +squeue -u $USER
 +5250981  mem_0128   h5test   markus  R       0:00      2 n323-[018-019]
 +Output files:
 +try on .h5 files:
 +cancel jobs:
 +scancel <job_id> 
 +scancel -n <job_name>
 +scancel -u $USER
 +===== Basic concepts =====
 +==== Queueing system ====
 +  * job/batch script:
 +    * shell script that does everything needed to run your calculation
 +    * independent of queueing system
 +    * **use simple scripts** (max 50 lines, i.e. put complicated logic elsewhere)
 +    * load modules from scratch (purge, then load)
 +  * tell scheduler where/how to run jobs:
 +    * #nodes
 +    * nodetype
 +    * …
 +  * scheduler manages job allocation to compute nodes
 +==== SLURM: Accounts and Users ====
 +==== SLURM: Partition and Quality of Service ====
 +==== VSC-3 Hardware Types ====
 +^partition    ^  RAM (GB)  ^CPU                          ^  Cores  ^  IB (HCA)  ^  #Nodes  ^
 +|mem_0064*    |  64  |2x Intel E5-2650 v2 @ 2.60GHz|  2x8  |  2xQDR  |  1849  |
 +|mem_0128     |  128  |2x Intel E5-2650 v2 @ 2.60GHz|  2x8  |  2xQDR  |  140  |
 +|mem_0256     |  256  |2x Intel E5-2650 v2 @ 2.60GHz|  2x8  |  2xQDR  |  50  |
 +|vsc3plus_0064|  64  |2x Intel E5-2660 v2 @ 2.20GHz|  2x10  |  1xFDR  |  816  |
 +|vsc3plus_0256|  256  |2x Intel E5-2660 v2 @ 2.20GHz|  2x10  |  1xFDR  |  48  |
 +|binf         |  512 - 1536  |2x Intel E5-2690 v4 @ 2.60GHz|  2x14  |  1xFDR  |  17  |
 +* default partition, QDR: Intel Truescale Infinipath (40Gbit/s), FDR: Mellanox ConnectX-3 (56Gbit/s)
 +effective: 10/2018
 +  * + GPU nodes (see later)
 +  * specify partition in job script:
 +#SBATCH -p <partition>
 +==== Standard QOS ====
 +^partition    ^QOS          ^
 +|mem_0064*    |normal_0064  |
 +|mem_0128     |normal_0128  |
 +|mem_0256     |normal_0256  |
 +|binf         |normal_binf  |
 +  * specify QOS in job script:
 +#SBATCH --qos <QOS>
 +==== VSC-4 Hardware Types ====
 +^partition^  RAM (GB)  ^CPU                             ^  Cores  ^  IB (HCA)  ^  #Nodes  ^
 +|mem_0096*|  96  |2x Intel Platinum 8174 @ 3.10GHz|  2x24  |  1xEDR  |  688  |
 +|mem_0384 |  384  |2x Intel Platinum 8174 @ 3.10GHz|  2x24  |  1xEDR  |  78  |
 +|mem_0768 |  768  |2x Intel Platinum 8174 @ 3.10GHz|  2x24  |  1xEDR  |  12  |
 +* default partition, EDR: Intel Omni-Path (100Gbit/s)
 +effective: 10/2020
 +==== Standard QOS ====
 +^partition^QOS     ^
 +|mem_0384 |mem_0384|
 +|mem_0768 |mem_0768|
 +==== VSC Hardware Types ====
 +  * Display information about partitions and their nodes:
 +sinfo -o %P
 +scontrol show partition mem_0064
 +scontrol show node n301-001
 +==== QOS-Account/Project assignment ====
 +sqos -acc
 +default_account:              p70824
 +        account:              p70824                    
 +    default_qos:         normal_0064                    
 +            qos:          devel_0128                    
 +                            goodluck                    
 +                      gpu_gtx1080amd                    
 +                    gpu_gtx1080multi                    
 +                   gpu_gtx1080single                    
 +                            gpu_k20m                    
 +                             gpu_m60                    
 +                                 knl                    
 +                         normal_0064                    
 +                         normal_0128                    
 +                         normal_0256                    
 +                         normal_binf                    
 +                       vsc3plus_0064                    
 +                       vsc3plus_0256
 +==== QOS-Partition assignment ====
 +            qos_name total  used  free     walltime   priority partitions  
 +         normal_0064  1782  1173   609   3-00:00:00       2000 mem_0064    
 +         normal_0256    15    24    -9   3-00:00:00       2000 mem_0256    
 +         normal_0128    93    51    42   3-00:00:00       2000 mem_0128    
 +          devel_0128    10    20   -10     00:10:00      20000 mem_0128    
 +            goodluck               3-00:00:00       1000 vsc3plus_0256,vsc3plus_0064,amd
 +                 knl               3-00:00:00       1000 knl         
 +         normal_binf    16        11   1-00:00:00       1000 binf        
 +    gpu_gtx1080multi               3-00:00:00       2000 gpu_gtx1080multi
 +   gpu_gtx1080single    50    18    32   3-00:00:00       2000 gpu_gtx1080single
 +            gpu_k20m               3-00:00:00       2000 gpu_k20m    
 +             gpu_m60               3-00:00:00       2000 gpu_m60     
 +       vsc3plus_0064   800   781    19   3-00:00:00       1000 vsc3plus_0064
 +       vsc3plus_0256    48    44       3-00:00:00       1000 vsc3plus_0256
 +      gpu_gtx1080amd               3-00:00:00       2000 gpu_gtx1080amd
 +naming convention:
 +^QOS   ^Partition^
 +|*_0064|mem_0064 |
 +==== Specification in job script ====
 +#SBATCH --account=xxxxxx
 +#SBATCH --qos=xxxxx_xxxx
 +#SBATCH --partition=mem_xxxx
 +If one of these lines is omitted, the corresponding default is used (see previous slides). The default partition is “mem_0064”.
 +==== Sample batch job ====
 +#!/bin/bash
 +#SBATCH -J jobname
 +#SBATCH -N number_of_nodes
 +job is submitted to:
 +  * partition mem_0064
 +  * qos normal_0064
 +  * default account
 +#!/bin/bash
 +#SBATCH -J jobname
 +#SBATCH -N number_of_nodes
 +#SBATCH --partition=mem_xxxx
 +#SBATCH --qos=xxxxx_xxxx
 +#SBATCH --account=xxxxxx
 +  * must be a shell script (the first line has to name the interpreter, e.g. ''%%#!/bin/bash%%'')
 +  * ‘#SBATCH’ for marking SLURM parameters
 +  * environment variables are set by SLURM for use within the script (e.g. ''%%SLURM_JOB_NUM_NODES%%'')
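As an illustration of the last point, the following minimal sketch reads two of the variables that SLURM sets inside a job. The function name ''%%describe_allocation%%'' and the fallback value 1 (used when the script is run outside of SLURM) are assumptions made for this example only.

```shell
#!/bin/bash
# Sketch: report the allocation using variables that SLURM sets inside a job.
# describe_allocation is a hypothetical helper; the fallback of 1 only
# serves to make the script runnable outside of SLURM as well.
describe_allocation() {
    local nodes=${SLURM_JOB_NUM_NODES:-1}
    local tasks=${SLURM_NTASKS:-1}
    echo "$nodes node(s), $tasks task(s)"
}

describe_allocation
```

Inside a job the printed values follow the ''%%-N%%'' and ''%%-n%%'' settings of the job script.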
 +==== Job submission ====
 +  * parameters are specified as in job script
 +  * precedence: sbatch parameters override parameters in job script
 +  * be careful to place SLURM parameters **before** the name of the job script; everything after the script name is passed to the script as arguments
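The effect of the position on the command line can be tried without submitting anything. The sketch below creates a small script (the name ''%%show_args.sh%%'' is hypothetical) and runs it locally with bash, only to show what the script itself receives when options are written after its name. The two sbatch calls are given as comments and are not executed.

```shell
#!/bin/bash
# Sketch: create a small job script (show_args.sh is a hypothetical name)
# that prints the arguments it receives.
tmpdir=$(mktemp -d)
cat > "$tmpdir/show_args.sh" <<'EOF'
#!/bin/bash
#SBATCH -N 1
echo "script arguments: $*"
EOF

# Run it locally, without SLURM, only to show what the script itself sees
# when something is written after the script name.
bash "$tmpdir/show_args.sh" -N 2

# On the cluster:
#   sbatch -N 2 show_args.sh    # -N 2 is read by sbatch and overrides '#SBATCH -N 1'
#   sbatch show_args.sh -N 2    # -N 2 is handed to the script as arguments
```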
 +==== Exercises ====
 +  * try these commands and find out which partition has to be used if you want to run in QOS ‘devel_0128’:
 +sqos -acc
 +  * find out, which nodes are in the partition that allows running in ‘devel_0128’. Further, check how much memory these nodes have:
 +scontrol show partition ...
 +scontrol show node ...
 +  * submit a one node job to QOS devel_0128 with the following commands:
 +==== Bad job practices ====
 +  * job submissions in a loop (takes a long time):
 +for i in {1..1000}
 +do
 +    sbatch job.sh $i
 +done
 +  * loop inside job script (sequential mpirun commands):
 +for i in {1..1000}
 +do
 +    mpirun my_program $i
 +done
 +==== Array jobs ====
 +  * submit/run a series of **independent** jobs via a single SLURM script
 +  * each job in the array gets a unique identifier (SLURM_ARRAY_TASK_ID) based on which various workloads can be organized
 +  * example ([[examples/job_array.sh|job_array.sh]]), 10 jobs, SLURM_ARRAY_TASK_ID=1,2,3…10
 +#!/bin/bash
 +#SBATCH -J array
 +#SBATCH -N 1
 +#SBATCH --array=1-10
 +echo "Hi, this is array job number"  $SLURM_ARRAY_TASK_ID
 +  * independent jobs: 1, 2, 3 … 10
 +VSC-4 >  squeue  -u $USER
 +     406846_[7-10]  mem_0096    array       sh PD       0:00      1 (Resources)
 +          406846_4  mem_0096    array       sh  R    INVALID      1 n403-062
 +          406846_5  mem_0096    array       sh  R    INVALID      1 n403-072
 +          406846_6  mem_0096    array       sh  R    INVALID      1 n404-031
 +VSC-4 >  ls slurm-*
 +slurm-406846_10.out  slurm-406846_3.out  slurm-406846_6.out  slurm-406846_9.out
 +slurm-406846_1.out   slurm-406846_4.out  slurm-406846_7.out
 +slurm-406846_2.out   slurm-406846_5.out  slurm-406846_8.out
 +VSC-4 >  cat slurm-406846_8.out
 +Hi, this is array job number  8
 +  * fine-tuning via builtin variables (SLURM_ARRAY_TASK_MIN, SLURM_ARRAY_TASK_MAX…)
 +  * example of going in chunks of a certain size, e.g. 5, SLURM_ARRAY_TASK_ID=1,6,11,16
 +#SBATCH --array=1-20:5
 +  * example of limiting number of simultaneously running jobs to 2 (perhaps for licences)
 +#SBATCH --array=1-20:5%2
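A common way to organize workloads by SLURM_ARRAY_TASK_ID is to let each array task pick one line from a list of inputs. The following is a minimal sketch of that idea; the helper ''%%input_for_task%%'', the file names and the fallback task number 2 (used outside of SLURM) are assumptions for illustration and not part of the course examples.

```shell
#!/bin/bash
# Sketch: map an array task id to one line of an input list.
# input_for_task is a hypothetical helper: it prints line $1 of file $2.
input_for_task() {
    sed -n "${1}p" "$2"
}

# Hypothetical input list with one entry per array task
list=$(mktemp)
printf '%s\n' a.dat b.dat c.dat > "$list"

# Outside of SLURM the variable is unset, so fall back to task 2
task=${SLURM_ARRAY_TASK_ID:-2}
echo "task $task processes $(input_for_task "$task" "$list")"
```

With ''%%#SBATCH --array=1-3%%'' each of the three tasks would then work on its own file.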
 +==== Single core jobs ====
 +  * use an entire compute node for several independent jobs
 +  * example: [[examples/single_node_multiple_jobs.sh|single_node_multiple_jobs.sh]]:
 +for ((i=1; i<=48; i++)); do
 +   stress --cpu 1 --timeout $i  &
 +done
 +wait
 +  * ‘&’: send process into the background, script can continue
 +  * ‘wait’: waits for all processes in the background, otherwise script would terminate
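The pattern of background processes plus ''%%wait%%'' can be extended to notice failed processes. The sketch below is one possible variant under stated assumptions: ''%%work%%'' is a hypothetical stand-in for the real per-core program (here it only sleeps and writes a file), and 8 processes are started instead of 48 to keep the example small.

```shell
#!/bin/bash
# Sketch: start several independent processes in the background and
# wait for each of them individually to count failures.
outdir=$(mktemp -d)

# Hypothetical stand-in for the real per-core workload
work() {
    sleep 1
    echo "task $1 done" > "$outdir/task_$1.out"
}

pids=()
for ((i=1; i<=8; i++)); do
    work "$i" &          # '&' sends the process into the background
    pids+=("$!")         # remember its process id
done

failed=0
for pid in "${pids[@]}"; do
    # 'wait <pid>' returns the exit status of that process
    wait "$pid" || failed=$((failed + 1))
done
echo "$failed task(s) failed"
```

Waiting on each process id separately gives access to the individual exit codes, which a plain ''%%wait%%'' without arguments does not report.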
 +==== Combination of array & single core job ====
 +  * example: [[examples/combined_array_multiple_jobs.sh|combined_array_multiple_jobs.sh]]:
 +#SBATCH --array=1-144:48
 +j=$((SLURM_ARRAY_TASK_ID + 47))   # last index of this chunk of 48
 +for ((i=$SLURM_ARRAY_TASK_ID; i<=$j; i++)); do
 +   stress --cpu 1 --timeout $i  &
 +done
 +wait
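When the total number of work items is not a multiple of the step size, the last array task has to stop early. The following sketch computes the range handled by one array task; ''%%chunk_range%%'' is a hypothetical helper and the total of 18 items is an assumed value. SLURM_ARRAY_TASK_STEP is the step size given in ''%%--array%%''.

```shell
#!/bin/bash
# Sketch: compute the first and last item handled by one array task.
# chunk_range is a hypothetical helper: <first index> <step> <total items>
chunk_range() {
    local first=$1 step=$2 total=$3
    local last=$(( first + step - 1 ))
    # Do not run past the total number of work items
    if (( last > total )); then
        last=$total
    fi
    echo "$first $last"
}

# Outside of SLURM, fall back to example values (--array=1-20:5, task 16);
# the total of 18 items is an assumption for this example.
read -r first last < <(chunk_range "${SLURM_ARRAY_TASK_ID:-16}" "${SLURM_ARRAY_TASK_STEP:-5}" 18)
echo "this task handles items $first to $last"
```

The total is passed explicitly because SLURM_ARRAY_TASK_MAX holds the highest array index, not the number of work items.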
 +==== Exercises ====
 +  * files are located in folder ''%%examples/05_submitting_batch_jobs%%''
 +  * look into [[examples/job_array.sh|job_array.sh]] and modify it such that the considered range is from 1 to 20 but in steps of 5
 +  * look into [[examples/single_node_multiple_jobs.sh|single_node_multiple_jobs.sh]] and also change it to go in steps of 5
 +  * run [[examples/combined_array_multiple_jobs.sh|combined_array_multiple_jobs.sh]] and check whether the output is reasonable
 +==== Job/process setup ====
 +  * normal jobs:
 +^#SBATCH          ^job environment      ^
 +|-N               |SLURM_JOB_NUM_NODES  |
 +|--ntasks, -n     |SLURM_NTASKS         |
 +  * emails:
 +#SBATCH --mail-user=yourmail@example.com
 +#SBATCH --mail-type=BEGIN,END
 +  * constraints:
 +#SBATCH --time=<time>       # short form: -t <time>
 +#SBATCH --time-min=<time>
 +time format:
 +  * DD-HH[:MM[:SS]]
 +  * backfilling:
 +    * specify ‘--time’ or ‘--time-min’, which are estimates of the runtime of your job
 +    * runtimes shorter than the default (mostly 72h) may enable the scheduler to use idle nodes that are waiting for a larger job
 +  * get the remaining running time for your job:
 +squeue -h -j $SLURM_JOBID -o %L
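The remaining time is printed as text in the form ''%%[days-][hours:]minutes:seconds%%''. To compare it with a threshold inside a job script, it helps to convert it to seconds. The sketch below is one way to do this; ''%%to_seconds%%'' is a hypothetical helper, and values that are not plain times (for example UNLIMITED or INVALID) are rejected with a non-zero exit status.

```shell
#!/bin/bash
# Sketch: convert a SLURM time string such as "2-01:30:00", "01:30:00"
# or "30:00" to seconds. to_seconds is a hypothetical helper.
to_seconds() {
    local t=$1 days=0 a b c
    if [[ $t == *-* ]]; then
        days=${t%%-*}        # part before the dash
        t=${t#*-}            # part after the dash
    fi
    IFS=: read -r a b c <<< "$t"
    [[ -n $b ]] || return 1
    # Anything non-numeric (e.g. UNLIMITED, INVALID) is rejected
    [[ $days$a$b$c =~ ^[0-9]+$ ]] || return 1
    if [[ -z $c ]]; then     # only two fields: minutes:seconds
        c=$b; b=$a; a=0
    fi
    # 10# forces base 10 so that "08" and "09" are not read as octal
    echo $(( 10#$days * 86400 + 10#$a * 3600 + 10#$b * 60 + 10#$c ))
}

to_seconds 2-01:30:00    # prints 178200
```

Inside a job, the output of the squeue command above could be passed to the function, e.g. ''%%to_seconds "$(squeue -h -j $SLURM_JOBID -o %L)"%%'', and the result compared with the time one more iteration is expected to need.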
 +==== Licenses ====
 +VSC-3 >  slic
 +Within the SLURM submit script, add the flags as shown by ‘slic’, e.g. when both Matlab and Mathematica are required:
 +#SBATCH -L matlab@vsc,mathematica@vsc
 +Intel licenses are needed only when compiling code, not for running resulting executables
 +==== Reservation of compute nodes ====
 +  * core-h accounting is done for the entire period of reservation
 +  * contact service@vsc.ac.at
 +  * reservations are named after the project id
 +  * check for reservations:
 +VSC-3 >  scontrol show reservations
 +  * usage:
 +#SBATCH --reservation=
 +==== Exercises ====
 +  * check for available reservations. If there is one available, use it
 +  * specify an email address that notifies you when the job has finished
 +  * run the following matlab code in your job:
 +echo "2+2" | matlab
 +==== MPI + pinning ====
 +  * understand what your code is doing and place the processes correctly
 +  * use only a few processes per node if memory demand is high
 +  * details for pinning: https://wiki.vsc.ac.at/doku.php?id=doku:vsc3_pinning
 +Example: Two nodes with two MPI processes each:
 +=== srun ===
 +#SBATCH -N 2
 +#SBATCH --tasks-per-node=2
 +srun --cpu_bind=map_cpu:0,24 ./my_mpi_program
 +=== mpirun ===
 +#SBATCH -N 2
 +#SBATCH --tasks-per-node=2
 +export I_MPI_PIN_PROCESSOR_LIST=0,24   # Intel MPI syntax 
 +mpirun ./my_mpi_program
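The core list ''%%0,24%%'' in the examples above spreads two processes evenly over 48 cores. A minimal sketch of how such an evenly spaced list could be generated for other process counts follows; ''%%pin_list%%'' is a hypothetical helper. Whether evenly spaced core numbers actually end up on different sockets depends on how the cores are numbered on the node, so the result should be checked against the pinning documentation linked above.

```shell
#!/bin/bash
# Sketch: build a comma-separated list of evenly spaced core ids.
# pin_list is a hypothetical helper: <cores per node> <tasks per node>
pin_list() {
    local cores=$1 tasks=$2 stride i list=""
    (( tasks > 0 && tasks <= cores )) || return 1
    stride=$(( cores / tasks ))
    for ((i=0; i<tasks; i++)); do
        # ${list:+,} adds a comma only if the list is not empty yet
        list+="${list:+,}$(( i * stride ))"
    done
    echo "$list"
}

pin_list 48 2    # prints 0,24
```

The output can be used as the value for ''%%map_cpu:%%'' or ''%%I_MPI_PIN_PROCESSOR_LIST%%''.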
 +==== Job dependencies ====
 +  - Submit the first job and get its <job_id>
 +  - Submit dependent job (and get <job_id>):
 +#!/bin/bash
 +#SBATCH -J jobname
 +#SBATCH -N 2
 +#SBATCH -d afterany:<job_id>
 +srun  ./my_program
 +<HTML><ol start="3" style="list-style-type: decimal;"></HTML>
 +<HTML><li></HTML>continue at 2. for further dependent jobs<HTML></li></HTML><HTML></ol></HTML>
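The steps above can also be scripted. The following sketch submits a list of job scripts as a chain, each one depending on the previous job. The helpers ''%%job_id_from_message%%'' and ''%%submit_chain%%'' are hypothetical names; the job id is taken from the "Submitted batch job" message that sbatch prints, and the dependency is passed on the command line with ''%%--dependency%%'', the long form of ''%%-d%%''. Nothing is submitted by the block itself, it only defines the functions.

```shell
#!/bin/bash
# Sketch: submit job scripts as a chain of dependent jobs.

# Extract the numeric id from sbatch's "Submitted batch job <id>" message.
job_id_from_message() {
    awk '/^Submitted batch job/ {print $4}'
}

# Submit all given scripts; each job starts after the previous one has ended.
submit_chain() {
    local dep="" id script
    for script in "$@"; do
        # ${dep:+...} adds the dependency option only once a previous id exists
        id=$(sbatch ${dep:+--dependency=afterany:$dep} "$script" | job_id_from_message)
        [ -n "$id" ] || return 1
        echo "$script -> $id"
        dep=$id
    done
}
```

A call such as ''%%submit_chain step1.sh step2.sh step3.sh%%'' (hypothetical script names) would then submit three jobs that run one after another.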