Controlling how threads are assigned to the available CPU cores in multi-threaded OpenMP applications has become increasingly difficult. Particularly troublesome are hybrid MPI/OpenMP codes, where the developer usually has a clear idea of which regions to run in parallel but relies on the OS for an optimal assignment of the individual computing threads to physical cores. A variety of methods exist to explicitly state which CPU core should be linked to which thread; in practice, however, many of these configurations turn out to be non-functional, dependent on the MPI version, or simply ineffective because they are overruled by the queuing system (e.g. SLURM). In the following we describe the auxiliary tool likwid-pin, which has proven successful in managing arbitrary thread assignment to individual CPU cores in a general way.

Suppose we have a little test program, test_mpit_var.c, and want to run it with 8 threads on a single compute node, using the following set of physical cores: 3, 4, 2, 1, 6, 5, 7, 9. After compiling it, e.g. with mpigcc -fopenmp ./test_mpit_var.c, we could use the following submit script for SLURM:

#!/bin/bash
#
#SBATCH -J tmv
#SBATCH -N 1
#SBATCH --time=00:01:00

module purge

likwid-pin -c 3,3,4,2,1,6,5,7,9 ./a.out
• Note that the initial core #3 appears twice: the first entry pins the main task (the parent process), which subsequently branches out into the 8 parallel threads.
• Thread #0 must run on the same core as the parent process (main task), e.g. core #3 in the above example.
• There are plenty of additional ways to define appropriate masks for thread domains (see the link below). For example, to employ all available physical cores on both sockets in an explicit order, we could have set export OMP_NUM_THREADS=16 and then called likwid-pin -c 2,2,0,1,3,7,4,6,5,10,8,9,11,15,12,14,13 ./a.out
• Note that likwid-pin works in exactly the same way with Intel compilers; for example, we could have compiled our little test program with mpiicc -openmp ./test_mpit_var.c and the above submit script would have produced exactly the same results.

likwid-pin may also be used for hybrid MPI/OpenMP applications. For example, to run our little test program on 4 nodes with 16 threads per node, we simply need to modify our submit script in the following way:

#!/bin/bash
#
#SBATCH -J tmvmxd
#SBATCH -N 4
#SBATCH --time=00:01:00

module purge
srun -n4 likwid-pin -c 0,0-15 ./a.out