Brief Introduction
Login
ssh <username>@vsc3.vsc.ac.at
You will then be asked to type first your password and then your one-time password (OTP; SMS token).
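If you log in frequently, a host alias may save some typing. The following is a minimal sketch of an entry in ~/.ssh/config, assuming an OpenSSH client (standard on Linux and macOS); the alias name vsc3 is arbitrary:
Host vsc3
    HostName vsc3.vsc.ac.at
    User <username>
With this entry in place, ssh vsc3 is equivalent to the full command above; password and OTP are still requested as described.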
How to connect from Windows?
VSC-3
Once you have logged into VSC-3, type:
module avail to get an overview of the installed software packages and the available standard tools
module list to see what is currently loaded into your session
module unload xyz to unload a particular package xyz from your session
module load xyz to load a particular package xyz into your session
Note:
the name xyz must be given exactly as it appears in the output of module avail; thus, in order to load or unload a selected module, copy and paste exactly the name listed by module avail
a list of module load/unload directives may also be included in the top part of a job submission script, as sketched below
When all required/intended modules have been loaded, user packages may be compiled as usual.
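As mentioned in the note above, module directives can simply be placed at the top of a job submission script. The following is a minimal sketch; the module name intel-mpi/5 is a placeholder and must be replaced by an exact name copied from the output of module avail:
#!/bin/bash
#SBATCH -J chk
#SBATCH -N 1
#SBATCH --ntasks-per-node=16
module purge                 # start from a clean environment
module load intel-mpi/5      # placeholder name; copy the exact string from 'module avail'
mpirun -np 16 a.out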
SLURM (Simple Linux Utility for Resource Management)
In contrast to the previous VSC systems, the scheduler on VSC-3 is SLURM.
For basic information type:
sinfo to find out which 'queues'='partitions' are available for job submission. Note: what was termed a 'queue' in SGE times is called a 'partition' under SLURM.
scontrol show partition provides much the same information as the previous command, except that scontrol can display considerably more detail and also allows basic settings to be modified or reset.
squeue to see the current list of submitted jobs, their state and resource allocation (some typical invocations are sketched below).
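A sketch of typical invocations; the output format string and the partition name mem_0064 are merely examples and not necessarily valid on VSC-3:
sinfo                                # overview of all partitions and their node states
sinfo -o "%P %a %l %D %t"            # custom output: partition, availability, time limit, node count, state
scontrol show partition mem_0064     # detailed settings of a single partition
squeue -u $USER                      # list only your own jobs
squeue -j 123456                     # status of one specific job (job ID is an example)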
A simple job submission script
vi check.slrm
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1
mpirun -np 32 a.out
-J some name for the job
-N number of nodes requested (16 cores per node available)
--ntasks-per-node number of processes run in parallel on a single node
--ntasks-per-core number of tasks a single core should work on
mpirun -np 32 a.out standard invocation of some parallel program (a.out) running 32 processes in parallel. Note: in SLURM, srun is preferred over mpirun, so an equivalent call to the one on the final line above could have been srun -l -N2 -n32 a.out, where -l just adds task-specific labels to the beginning of all output lines; a complete variant of the script using srun is sketched below.
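A minimal sketch of such an srun-based variant, otherwise identical to check.slrm above (a.out again stands for the user's own parallel program):
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 2
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1
srun -l -N2 -n32 a.out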
Job submission
[username@l31 ~]$ sbatch check.slrm # to submit the job
[username@l31 ~]$ squeue # to check the status
[username@l31 ~]$ scancel JOBID # for premature removal, where JOBID
# is obtained from the previous command
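On submission, sbatch reports the ID of the new job; this JOBID is what squeue displays and what scancel expects. A sketch of a typical sequence, in which the job ID 123456 is just an example value:
[username@l31 ~]$ sbatch check.slrm
Submitted batch job 123456
[username@l31 ~]$ squeue -u $USER      # restrict the listing to your own jobs
[username@l31 ~]$ scancel 123456       # remove the job prematurely, if necessary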
Another simple job submission script
This example uses a set of 4 nodes to compute a series of jobs in two stages, each of which is split into two separate subjobs.
vi check.slrm
#!/bin/bash
#
#SBATCH -J chk
#SBATCH -N 4
#SBATCH --ntasks-per-node=16
#SBATCH --ntasks-per-core=1
export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib64/libpmi.so
scontrol show hostnames $SLURM_NODELIST > ./nodelist
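# stage 1: two subjobs run concurrently, each on 2 of the 4 allocated nodes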
srun -l -N2 -r0 -n32 job1.scrpt &
srun -l -N2 -r2 -n32 job2.scrpt &
wait
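# stage 2: two further subjobs, started only after both stage-1 subjobs have finished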
srun -l -N2 -r2 -n32 job3.scrpt &
srun -l -N2 -r0 -n32 job4.scrpt &
wait
Note:
the file 'nodelist' has been written for information only;
it is important to send the jobs into the background (&) and insert the 'wait' at each synchronization point;
with -r one can define an offset in the node list; in particular, -r2 means taking nodes number 2 and 3 from the set of four (the list starts with node number 0). A combination of -N, -r and -n thus allows full control over all involved cores and the tasks they are going to be used for.