This version (2022/06/20 09:01) was approved by msiegel.

Starting fewer MPI processes per node than slots available

If a job does not need all CPUs of a node, the machines file can be modified so that mpirun starts fewer processes per node. The $TMPDIR/machines file on VSC-1 contains a list of machine/node names, one per line. Each entry stands for one CPU (slot) on the given machine/node. For an exclusive job on 2 nodes with 8 slots each, the machines file looks like:

r10n01
r10n01
r10n01
r10n01
r10n01
r10n01
r10n01
r10n01
r12n10
r12n10
r12n10
r12n10
r12n10
r12n10
r12n10
r12n10
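
To check how many slots each node contributes, the entries can be counted per host. The following is only a minimal sketch, assuming a machines file in the format shown above; the function name count_slots is hypothetical and not part of the VSC-1 setup.

```shell
# count_slots: hypothetical helper, not provided by the cluster.
# Prints "<hostname> <number of slots>" for every node listed in the
# machines file given as first argument (one line per slot).
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job script it could be called as:
#   count_slots "$TMPDIR/machines"
```

Sorting first makes the count independent of the order in which the node names appear in the file.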

To run fewer than eight processes per node, the job script has to build a reduced machines file from $TMPDIR/machines and pass it to mpirun:

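#$ -N test
#$ -pe mpich 16

# Slots the scheduler allocates on each node, and slots actually used per node
NSLOTS_PER_NODE_AVAILABLE=8
NSLOTS_PER_NODE_USED=4
# Total number of MPI processes: (number of nodes) * (slots used per node)
NSLOTS_REDUCED=$(( NSLOTS / NSLOTS_PER_NODE_AVAILABLE * NSLOTS_PER_NODE_USED ))

echo "starting run with $NSLOTS_REDUCED processes; $NSLOTS_PER_NODE_USED per node"
# Write each node name NSLOTS_PER_NODE_USED times, then group the entries by node
for i in $(seq 1 "$NSLOTS_PER_NODE_USED")
do
	uniq "$TMPDIR/machines" >> "$TMPDIR/tmp"
done
sort "$TMPDIR/tmp" > "$TMPDIR/myhosts"
cat "$TMPDIR/myhosts"

mpirun -machinefile "$TMPDIR/myhosts" -np "$NSLOTS_REDUCED" sleep 2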

The reduced machines file ($TMPDIR/myhosts) then looks like:

r10n01
r10n01
r10n01
r10n01
r12n10
r12n10
r12n10
r12n10
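
The same reduced file could also be produced in a single pass over the original machines file. This is only a sketch of an alternative, not the method used on VSC-1; the function name reduce_machines is hypothetical. Because uniq only merges adjacent identical lines, this variant may be preferable if the node names are not grouped in the machines file.

```shell
# reduce_machines: hypothetical helper, not provided by the cluster.
# $1 = machines file (one line per slot), $2 = slots to keep per node.
# Prints at most $2 entries per host, in the order they appear in the file.
reduce_machines() {
    awk -v n="$2" '++seen[$1] <= n' "$1"
}

# Possible use inside the job script:
#   reduce_machines "$TMPDIR/machines" "$NSLOTS_PER_NODE_USED" > "$TMPDIR/myhosts"
#   mpirun -machinefile "$TMPDIR/myhosts" -np "$(wc -l < "$TMPDIR/myhosts")" sleep 2
```

Taking the process count from the number of lines in the reduced file keeps -np consistent with the machine file even if the nodes do not all provide the same number of slots.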
  • doku/machine_file.txt
  • Last modified: 2014/08/06 10:17
  • by ir