  * Article written by Markus Stöhr (VSC Team) <html><br></html>(last update 2017-10-09 by ms).
  
<code>
sbatch job.sh
Submitted batch job 5250981
</code>

check what is going on:
  
<code>
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
5250981  mem_0128   h5test   markus  R       0:00      2 n323-[018-019]
</code>
Output files:
<code>
h5dump
</code>

cancel jobs:

<code>
scancel <job_id>
</code>
or

<code>
scancel -n <job_name>
</code>
or

<code>
scancel -u $USER
</code>
===== Basic concepts =====
    * shell script that does everything needed to run your calculation
    * independent of queueing system
    * **use simple scripts** (max 50 lines, i.e. put complicated logic elsewhere)
    * load modules from scratch (purge, then load), as in the sketch below
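A minimal sketch of such a script body, assuming bash; the module and program names are only placeholders:

<code>
#!/bin/bash
# start from a clean, reproducible environment
module purge
module load gcc              # placeholder module name

./my_calculation             # placeholder executable
</code>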
  
    * #nodes
    * nodetype
    * ...
  
  
  
  
{{.:queueing_basics.png?200}}
  
==== SLURM: Accounts and Users ====
  
{{.:slurm_accounts.png}}
  
  
==== SLURM: Partition and Quality of Service ====
  
{{.:partitions.png}}
  
  
==== VSC-3 Hardware Types ====
  
^partition    ^  RAM (GB)  ^CPU                          ^  Cores  ^  IB (HCA)  ^  #Nodes  ^
|mem_0064*    |     64     |2x Intel E5-2650 v2 @ 2.60GHz|   2x8   |   2xQDR    |   1849   |
|mem_0128     |    128     |2x Intel E5-2650 v2 @ 2.60GHz|   2x8   |   2xQDR    |   140    |
|mem_0256     |    256     |2x Intel E5-2650 v2 @ 2.60GHz|   2x8   |   2xQDR    |    50    |
|vsc3plus_0064|     64     |2x Intel E5-2660 v2 @ 2.20GHz|  2x10   |   1xFDR    |   816    |
|vsc3plus_0256|    256     |2x Intel E5-2660 v2 @ 2.20GHz|  2x10   |   1xFDR    |    48    |
|binf         | 512 - 1536 |2x Intel E5-2690 v4 @ 2.60GHz|  2x14   |   1xFDR    |    17    |
  
* default partition, QDR: Intel Truescale Infinipath (40Gbit/s), FDR: Mellanox ConnectX-3 (56Gbit/s)

effective: 10/2018

  * + GPU nodes (see later)
  * specify partition in job script:
<code>
#SBATCH -p <partition>
</code>
==== Standard QOS ====

^partition    ^QOS          ^
|mem_0064*    |normal_0064  |
|mem_0128     |normal_0128  |
|mem_0256     |normal_0256  |
|vsc3plus_0064|vsc3plus_0064|
|vsc3plus_0256|vsc3plus_0256|
|binf         |normal_binf  |

  * specify QOS in job script:

<code>
#SBATCH --qos <QOS>
</code>

----

==== VSC-4 Hardware Types ====

^partition^  RAM (GB)  ^CPU                             ^  Cores  ^  IB (HCA)  ^  #Nodes  ^
|mem_0096*|     96     |2x Intel Platinum 8174 @ 3.10GHz|  2x24   |   1xEDR    |   688    |
|mem_0384 |    384     |2x Intel Platinum 8174 @ 3.10GHz|  2x24   |   1xEDR    |    78    |
|mem_0768 |    768     |2x Intel Platinum 8174 @ 3.10GHz|  2x24   |   1xEDR    |    12    |

* default partition, EDR: Intel Omni-Path (100Gbit/s)

effective: 10/2020

==== Standard QOS ====

^partition^QOS     ^
|mem_0096*|mem_0096|
|mem_0384 |mem_0384|
|mem_0768 |mem_0768|

----

==== VSC Hardware Types ====

  * Display information about partitions and their nodes:
<code>
sinfo -o %P
scontrol show partition mem_0064
scontrol show node n301-001
</code>
  
==== QOS-Account/Project assignment ====
  
{{.:setup.png?200}}
  
1.+2.:
  
<code>
default_account:              p70824
        account:              p70824

    default_qos:         normal_0064
            qos:          devel_0128
                            goodluck
                      gpu_gtx1080amd
                    gpu_gtx1080multi
                   gpu_gtx1080single
                            gpu_k20m
                             gpu_m60
                                 knl
                         normal_0064
                         normal_0128
                         normal_0256
                         normal_binf
                       vsc3plus_0064
                       vsc3plus_0256
</code>
  
  
==== QOS-Partition assignment ====
  
3.:

<code>
            qos_name total  used  free     walltime   priority partitions
=========================================================================
         normal_0064  1782  1173   609   3-00:00:00       2000 mem_0064
         normal_0256    15    24    -9   3-00:00:00       2000 mem_0256
         normal_0128    93    51    42   3-00:00:00       2000 mem_0128
          devel_0128    10    20   -10     00:10:00      20000 mem_0128
            goodluck     0     0     0   3-00:00:00       1000 vsc3plus_0256,vsc3plus_0064,amd
                 knl     4     1     3   3-00:00:00       1000 knl
         normal_binf    16     5    11   1-00:00:00       1000 binf
    gpu_gtx1080multi     0     0     0   3-00:00:00       2000 gpu_gtx1080multi
   gpu_gtx1080single    50    18    32   3-00:00:00       2000 gpu_gtx1080single
            gpu_k20m     0     0     0   3-00:00:00       2000 gpu_k20m
             gpu_m60     0     0     0   3-00:00:00       2000 gpu_m60
       vsc3plus_0064   800   781    19   3-00:00:00       1000 vsc3plus_0064
       vsc3plus_0256    48    44     4   3-00:00:00       1000 vsc3plus_0256
      gpu_gtx1080amd     0     0     0   3-00:00:00       2000 gpu_gtx1080amd
</code>
naming convention:

^QOS   ^Partition^
|*_0064|mem_0064 |
  
  
  
==== Specification in job script ====
  
<code>
#SBATCH --partition=mem_xxxx
</code>
For omitted lines the corresponding defaults are used. See previous slides; the default partition is "mem_0064".
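For illustration, a sketch of an explicit specification; the account, QOS and partition values below are placeholders:

<code>
#SBATCH --account=p7xxxx
#SBATCH --qos=normal_0064
#SBATCH --partition=mem_0064
</code>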
  
  
  
  * must be a shell script (first line!)
  * '#SBATCH' for marking SLURM parameters
  * environment variables are set by SLURM for use within the script (e.g. ''%%SLURM_JOB_NUM_NODES%%''; see the sketch below)
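A minimal sketch illustrating these points, assuming bash (job name and node count are arbitrary examples):

<code>
#!/bin/bash
#SBATCH -J myjob
#SBATCH -N 2

echo "running on $SLURM_JOB_NUM_NODES nodes"
</code>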
  
  
==== Exercises ====
  
  * try these commands and find out which partition has to be used if you want to run in QOS 'devel_0128':
  
<code>
sqos -acc
</code>
  * find out which nodes are in the partition that allows running in 'devel_0128'. Further, check how much memory these nodes have:
  
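One possible way to check this (a sketch; replace ''%%mem_0128%%'' with the partition identified in the previous step):

<code>
scontrol show partition mem_0128
sinfo -p mem_0128 -o "%N %m"
</code>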
==== Bad job practices ====
  
  * job submissions in a loop (takes a long time):
  
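A minimal sketch of such a (discouraged) submission loop; the parameter range and script name are only illustrative:

<code>
for i in $(seq 1 300)
do
    sbatch job.sh $i
done
</code>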
  
  * loop inside job script (sequential mpirun commands):
  
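A sketch of this pattern, with placeholder program and input names:

<code>
for i in $(seq 1 10)
do
    mpirun ./my_mpi_program input_$i
done
</code>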
  
  
==== Array jobs ====
  
  * submit/run a series of **independent** jobs via a single SLURM script
  * each job in the array gets a unique identifier (SLURM_ARRAY_TASK_ID) based on which the various workloads can be organized
  * example ([[examples/job_array.sh|job_array.sh]]), 10 jobs, SLURM_ARRAY_TASK_ID=1,2,3…10:
  
<code>
#SBATCH -J array
#SBATCH -N 1
#SBATCH --array=1-10
  
echo "Hi, this is array job number"  $SLURM_ARRAY_TASK_ID
sleep $SLURM_ARRAY_TASK_ID
</code>
  * independent jobs: 1, 2, 3 … 10
  
<code>
VSC-4 >  squeue -u $USER
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
     406846_[7-10]  mem_0096    array       sh PD       0:00      1 (Resources)
          406846_4  mem_0096    array       sh  R    INVALID      1 n403-062
          406846_5  mem_0096    array       sh  R    INVALID      1 n403-072
          406846_6  mem_0096    array       sh  R    INVALID      1 n404-031
</code>
  
<code>
VSC-4 >  ls slurm-*
slurm-406846_10.out  slurm-406846_3.out  slurm-406846_6.out  slurm-406846_9.out
slurm-406846_1.out   slurm-406846_4.out  slurm-406846_7.out
slurm-406846_2.out   slurm-406846_5.out  slurm-406846_8.out
</code>
  
<code>
VSC-4 >  cat slurm-406846_8.out
Hi, this is array job number  8
</code>
  
  
  * fine-tuning via builtin variables (SLURM_ARRAY_TASK_MIN, SLURM_ARRAY_TASK_MAX, …)
  * example of going in chunks of a certain size, e.g. 5, SLURM_ARRAY_TASK_ID=1,6,11,16:
  
<code>
#SBATCH --array=1-20:5
</code>
  
  * example of limiting the number of simultaneously running jobs to 2 (e.g. because of licenses):
  
<code>
#SBATCH --array=1-20:5%2
</code>
  
==== Single core jobs ====

  * use an entire compute node for several independent jobs
  * example: [[examples/single_node_multiple_jobs.sh|single_node_multiple_jobs.sh]]:
 + 
<code>
for ((i=1; i<=48; i++))
do
   stress --cpu 1 --timeout $i &
done
wait
</code>
  * '&': send the process into the background, the script can continue
  * 'wait': waits for all processes in the background, otherwise the script would terminate immediately
  
==== Combination of array & single core job ====
  
  * example: [[examples/combined_array_multiple_jobs.sh|combined_array_multiple_jobs.sh]]:
  
<code>
...
#SBATCH --array=1-144:48
  
j=$SLURM_ARRAY_TASK_ID
((j+=47))
  
for ((i=$SLURM_ARRAY_TASK_ID; i<=$j; i++))
do
   stress --cpu 1 --timeout $i &
done
wait
</code>
==== Exercises ====
  
  * files are located in folder ''%%examples/05_submitting_batch_jobs%%''
  * look into [[examples/job_array.sh|job_array.sh]] and modify it such that the considered range is from 1 to 20 but in steps of 5
  * look into [[examples/single_node_multiple_jobs.sh|single_node_multiple_jobs.sh]] and also change it to go in steps of 5
  * run [[examples/combined_array_multiple_jobs.sh|combined_array_multiple_jobs.sh]] and check whether the output is reasonable
  
==== Job/process setup ====
  * normal jobs:
  
^#SBATCH          ^job environment      ^
|-N               |SLURM_JOB_NUM_NODES  |
|--ntasks-per-core|SLURM_NTASKS_PER_CORE|
|--ntasks-per-node|SLURM_NTASKS_PER_NODE|
|--ntasks, -n     |SLURM_NTASKS         |
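For instance, two full nodes with 16 tasks each could be requested like this (values are illustrative):

<code>
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
</code>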
  
  * emails:
<code>
#SBATCH --mail-type=BEGIN,END
</code>
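The recipient is set with ''%%--mail-user%%''; the address below is a placeholder:

<code>
#SBATCH --mail-user=first.last@example.com
</code>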
  * constraints:
  
<code>
#SBATCH -t, --time=<time>
#SBATCH --time-min=<time>
</code>
  
time format:
  
  * DD-HH[:MM[:SS]]
  
  
  
  * backfilling:
    * specify '--time' or '--time-min', which are estimates of the runtime of your job
    * a runtime shorter than the default (mostly 72h) may enable the scheduler to use idle nodes that are waiting for a larger job
  * get the remaining running time for your job:
  
<code>
squeue -h -j $SLURM_JOBID -o %L
</code>
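To benefit from backfilling, request a time limit shorter than the partition default in the job script, e.g. (value illustrative):

<code>
#SBATCH --time=08:00:00
</code>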
  
  
==== Licenses ====
  
{{.:licenses.png}}
  
  
<code>
VSC-3 >  slic
</code>
Within the SLURM submit script add the flags as shown by 'slic', e.g. when both Matlab and Mathematica are required:
  
<code>
#SBATCH -L matlab@vsc,mathematica@vsc
</code>
Intel licenses are needed only when compiling code, not for running the resulting executables.
  
==== Reservation of compute nodes ====
  
  * core-h accounting is done for the entire period of the reservation
  * contact service@vsc.ac.at
  * reservations are named after the project id
  
  
<code>
VSC-3 >  scontrol show reservations
</code>
  * usage:
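A minimal sketch of requesting a reservation in the job script; the reservation name is a placeholder (actual names are listed by ''%%scontrol show reservations%%''):

<code>
#SBATCH --reservation=<reservation_name>
</code>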
  
<code>
echo "2+2" | matlab
</code>
==== MPI + pinning ====
  
  * understand what your code is doing and place the processes correctly
  * details for pinning: https://wiki.vsc.ac.at/doku.php?id=doku:vsc3_pinning
  
Example: Two nodes with two MPI processes each:
  
=== srun ===
<code>
#SBATCH --tasks-per-node=2
  
srun --cpu_bind=map_cpu:0,24 ./my_mpi_program
  
</code>
=== mpirun ===

<code>
#SBATCH --tasks-per-node=2
  
export I_MPI_PIN_PROCESSOR_LIST=0,24   # Intel MPI syntax
mpirun ./my_mpi_program
</code>
  
----
  