==== Brief Introduction ====

=== Login ===
  
<code>
ssh <username>@vsc3.vsc.ac.at
</code>
In the following you will be asked to type //first// your password and //then// your **o**ne-**t**ime **p**assword (OTP; SMS token).
  
[[doku:win2vsc|How to connect from Windows?]]

=== VSC 3 ===

Once you have logged into VSC-3, type:
  * **module avail**    to get an overview of the installed software and available standard tools
  * **module list**    to see what is currently loaded into your session
  * **module unload //xyz//**    to unload a particular package **//xyz//** from your session
  * **module load //xyz//**    to load a particular package **//xyz//** into your session \\ 
== Note: ==

  - the **//xyz//** format corresponds exactly to the output of **module avail**. Thus, in order to load or unload a selected module, copy and paste exactly the name listed by **module avail**.\\ 
  - a list of **module load/unload** directives may also be included in the top part of a job submission script (see the example below).\\ 

When all required/intended modules have been loaded, user packages may be compiled as usual.
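A typical sequence might thus look as follows; the module name **intel-mpi** is purely illustrative, so substitute a name exactly as printed by **module avail** on the system:
<code>
module avail                # list the installed packages and tools
module load intel-mpi       # load a package (name is an example only)
module list                 # check what is currently loaded
module unload intel-mpi     # remove the package from the session again
</code>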

=== SLURM (Simple Linux Utility for Resource Management) ===


In contrast to earlier VSC systems, the scheduler on VSC-3 is [[http://slurm.schedmd.com|SLURM]]. 
For basic information type:
  * **sinfo**   to find out which 'queues' = 'partitions' are available for job submission. Note: what was termed a 'queue' in SGE times is now, under SLURM, called a 'partition' (see the sketch after this list).
  * **scontrol show //partition//**   more or less the same as the previous command, except that **scontrol** provides much more information and also allows basic settings to be modified or reset.
  * **squeue**    to see the current list of submitted jobs, their state and resource allocation.
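A quick check of the partitions and of one's own jobs might look like this; the **-u $USER** filter is an optional extra not required by the commands above:
<code>
sinfo                       # which partitions exist and what state they are in
scontrol show partition     # detailed settings of all partitions
squeue -u $USER             # list only the jobs of the current user
</code>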
==== A simple job submission script ====

vi check.slrm\\ 
<code>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1

mpirun -np 32 a.out

</code>
  * **-J**     some name for the job\\ 
  * **-N**     number of nodes requested (16 cores per node available)\\ 
  * **--ntasks-per-node**     number of processes run in parallel on a single node\\ 
  * **--ntasks-per-core**     number of tasks a single core should work on\\ 
  * **mpirun -np 32 a.out**    standard invocation of some parallel program (a.out) running 32 processes in parallel.
  * **Note:** in SLURM, **srun** is preferred over **mpirun**; an equivalent call to the one on the final line above would be **srun -l -N2 -n32 a.out**, where **-l** just adds task-specific labels to the beginning of all output lines (see the sketch below).
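Written with **srun**, the script above would thus become (an equivalent sketch, nothing else changed):
<code>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1

# launch via srun instead of mpirun; -l labels each output line with its task id
srun -l -N2 -n32 a.out
</code>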

=== Job submission ===

<code>
[username@l31 ~]$ sbatch check.slrm    # to submit the job
[username@l31 ~]$ squeue               # to check the status
[username@l31 ~]$ scancel  JOBID       # for premature removal, where JOBID
                                       # is obtained from the previous command
</code>
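The job ID can also be captured directly at submission time; a small sketch using the standard SLURM options **--parsable** and **-j**, which are not mentioned above:
<code>
JOBID=$(sbatch --parsable check.slrm)   # --parsable prints only the numeric job ID
squeue -j $JOBID                        # status of just this job
scancel $JOBID                          # cancel it again if required
</code>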


=== Another simple job submission script ===

This example uses a set of 4 nodes to run a series of jobs in two stages, each stage consisting of two separate subjobs that run concurrently. \\

vi check.slrm\\ 
<code>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 4
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1

export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib64/libpmi.so

scontrol show hostnames $SLURM_NODELIST  > ./nodelist

srun -l -N2 -r0 -n32 job1.scrpt &
srun -l -N2 -r2 -n32 job2.scrpt &
wait

srun -l -N2 -r2 -n32 job3.scrpt &
srun -l -N2 -r0 -n32 job4.scrpt &
wait

</code>
== Note: ==
the file 'nodelist' is written for information only; \\
it is important to send the jobs into the background (&) and to insert a 'wait' at each synchronization point; \\
with **-r2** one defines an offset into the node list; here, **-r2** means taking nodes number 2 and 3 from the set of four (the list starts with node number 0). The combination of **-N**, **-r** and **-n** thus allows full control over all involved cores and the tasks they are used for (see the sketch below).
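To illustrate the **-N**/**-r**/**-n** combination further, the same 4-node allocation could just as well be split into four single-node subjobs; the script names **taskA.scrpt** etc. are placeholders:
<code>
# four concurrent single-node subjobs, node offsets 0..3, 16 cores each
srun -l -N1 -r0 -n16 taskA.scrpt &
srun -l -N1 -r1 -n16 taskB.scrpt &
srun -l -N1 -r2 -n16 taskC.scrpt &
srun -l -N1 -r3 -n16 taskD.scrpt &
wait
</code>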