==== Brief Introduction ====
=== Login ===
<code>
ssh <username>@vsc3.vsc.ac.at
</code>
You will then be asked to type //first// your password and //then// your **o**ne **t**ime **p**assword (OTP; SMS token).
[[doku:win2vsc|How to connect from Windows?]]
=== VSC 3 ===
Once you have logged into VSC-3, type:
* **module avail** to get a basic idea of what is around in terms of installed software and available standard tools
* **module list** to see what is currently loaded into your session
* **module unload //xyz//** to unload a particular package **//xyz//** from your session
* **module load //xyz//** to load a particular package **//xyz//** into your session \\
== Note: ==
- the format of **//xyz//** corresponds exactly to the output of **module avail**; thus, in order to load or unload a selected module, copy and paste the name exactly as it is listed by **module avail**.\\
- a list of **module load/unload** directives may also be included in the top part of a job submission script.\\
When all required/intended modules have been loaded, user packages may be compiled as usual.
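For example, a typical session might combine these commands as follows (the module names shown are placeholders only; always use the exact names reported by **module avail**):
<code bash>
# see what is installed
module avail

# load a compiler and an MPI library (example names)
module load intel/16
module load intel-mpi/5

# verify what is currently active in this session
module list

# remove a module that is no longer needed
module unload intel-mpi/5
</code>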
=== SLURM (Simple Linux Utility for Resource Management) ===
In contrast to earlier VSC systems, the scheduler on VSC-3 is [[http://slurm.schedmd.com|SLURM]].
For basic information type:
  * **sinfo** to find out which 'queues'='partitions' are available for job submission. Note: what was termed a 'queue' under SGE is called a 'partition' under SLURM.
  * **scontrol show //partition//** gives more or less the same information as the previous command, except that **scontrol** provides much more detail and also allows basic settings to be modified, reset, or abandoned.
* **squeue** to see the current list of submitted jobs, their state and resource allocation.
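A minimal sketch of how these query commands are typically used (the partition name **mem_0064** is only an example; take the actual names from the **sinfo** output):
<code bash>
# overview of all partitions and the state of their nodes
sinfo

# detailed settings of one particular partition (example name)
scontrol show partition mem_0064

# jobs currently in the queue, restricted to your own user
squeue -u $USER
</code>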
==== A simple job submission script ====
vi check.slrm\\
<code bash>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1
mpirun -np 32 a.out
</code>
* **-J** some name for the job\\
* **-N** number of nodes requested (16 cores per node available)\\
* **--ntasks-per-node** number of processes run in parallel on a single node \\
* **--ntasks-per-core** number of tasks a single core should work on\\
  * **mpirun -np 32 a.out** standard invocation of a parallel program (a.out) running 32 processes in parallel.
  * **Note:** under SLURM, **srun** is preferred over **mpirun**, so an equivalent call to the one on the final line above could have been **srun -l -N2 -n32 a.out**, where **-l** just adds task-specific labels to the beginning of all output lines; see the sketch below.
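For illustration, a minimal sketch of the same job script rewritten with **srun** instead of **mpirun** (same hypothetical executable a.out):
<code bash>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1

# launch 32 tasks on the 2 allocated nodes; -l labels each output line with its task id
srun -l -N2 -n32 a.out
</code>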
== Job submission ==
<code>
[username@l31 ~]$ sbatch check.slrm    # to submit the job
[username@l31 ~]$ squeue               # to check the status
[username@l31 ~]$ scancel JOBID        # for premature removal, where JOBID
                                       # is obtained from the previous command
</code>
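If the job ID is needed in a script, it can be captured directly at submission time; a small sketch (the **--parsable** option makes **sbatch** print only the job ID):
<code bash>
# submit and remember the job ID
JOBID=$(sbatch --parsable check.slrm)

# inspect only this job in the queue
squeue -j $JOBID

# cancel it again if necessary
scancel $JOBID
</code>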
=== Another simple job submission script ===
This example uses a set of 4 nodes to compute a series of jobs in two stages, each stage being split into two separate subjobs. \\
vi check.slrm\\
<code bash>
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 4
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1
export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib64/libpmi.so
scontrol show hostnames $SLURM_NODELIST > ./nodelist
srun -l -N2 -r0 -n32 job1.scrpt &
srun -l -N2 -r2 -n32 job2.scrpt &
wait
srun -l -N2 -r2 -n32 job3.scrpt &
srun -l -N2 -r0 -n32 job4.scrpt &
wait
</code>
== Note: ==
  * the file 'nodelist' is written for information only;
  * it is important to send the subjobs into the background (&) and to insert a 'wait' at each synchronization point;
  * **-r2** defines an offset into the node list of the allocation; here it means taking nodes number 2 and 3 from the set of four (the list starts counting at node number 0). Hence, the combination of **-N**, **-r** and **-n** allows full control over all involved cores and the tasks they are going to be used for.
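The line exporting **I_MPI_PMI_LIBRARY** points Intel MPI to SLURM's PMI library, so that **srun** can start the MPI ranks directly and no separate **mpirun** call is needed inside the subjob scripts. Under that assumption, a minimal sketch of what one of the subjob scripts, e.g. job1.scrpt, might contain (**my_mpi_app** is purely an illustrative placeholder):
<code bash>
#!/bin/bash
# job1.scrpt -- started once per task by srun; with I_MPI_PMI_LIBRARY set
# as above, each srun task becomes one MPI rank of the application,
# so the binary is invoked directly rather than through mpirun.
./my_mpi_app
</code>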