This version (2022/06/20 09:01) was approved by msiegel.



Controlling which CPU core each thread runs on is becoming increasingly difficult in multi-threaded OpenMP applications, and hybrid MPI/OpenMP codes are particularly troublesome. The developer usually knows exactly which regions run in parallel, but relies on the OS to assign physical cores to the individual computing threads. Various methods exist to bind a particular thread to a CPU core explicitly. In practice, however, many of these configurations turn out to be non-functional, dependent on the MPI version, or ineffective, and they may additionally be overruled by the queuing system (e.g. SLURM). The auxiliary tool likwid-pin, described in the following, has shown promise in managing arbitrary thread assignment to individual CPU cores in a more general way.

Suppose we have a small test program, test_mpit_var.c, and want to run it with 8 threads on a single compute node, using the physical cores 3, 4, 2, 1, 6, 5, 7, 9 in that order. After compilation, e.g. via mpigcc -fopenmp ./test_mpit_var.c, the following SLURM submit script could be used:

 #!/bin/bash
 #
 #SBATCH -J tmv     
 #SBATCH -N 1
 #SBATCH --time=00:01:00

 module purge
 module load intel-mpi/5 likwid/4.0
 
 export OMP_NUM_THREADS=8
 
 likwid-pin -c 3,3,4,2,1,6,5,7,9 ./a.out
  • Note the repeated declaration of the initial core #3. This is required because one main task is started first, which then branches out into 8 parallel threads.
  • Thread #0 must run on the same core as the parent process (main task), i.e. core #3 in the above example.
  • There are many other ways to define masks for thread domains (see the likwid-pin documentation). For example, to employ all available physical cores on both sockets in an explicit order, set export OMP_NUM_THREADS=16 and then call likwid-pin -c 2,2,0,1,3,7,4,6,5,10,8,9,11,15,12,14,13 ./a.out.
  • likwid-pin works in exactly the same way with the Intel compilers. For example, the above submit script leads to the same type of results when the program is compiled with mpiicc -openmp ./test_mpit_var.c (newer Intel compiler versions expect -qopenmp instead of -openmp).
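
Writing the core list by hand makes it easy to forget the repeated first core. The following is a minimal sketch of how the list could be generated from the desired per-thread core order. The function name make_pin_list is hypothetical and not part of likwid, and the script only prints the resulting likwid-pin command rather than running it.

```shell
#!/bin/sh
# make_pin_list is a hypothetical helper (not part of likwid).
# Arguments: the cores for threads #0, #1, ... in the desired order.
# Output: a comma-separated list whose first entry is repeated, so that
# the main task and thread #0 share the same core.
make_pin_list() {
    list="$1"                  # leading entry for the main task
    for core in "$@"; do       # one entry per OpenMP thread
        list="$list,$core"
    done
    printf '%s\n' "$list"
}

# Core order taken from the example above.
pin_list=$(make_pin_list 3 4 2 1 6 5 7 9)
export OMP_NUM_THREADS=8

# Print the command instead of executing it.
echo "likwid-pin -c $pin_list ./a.out"
```

For the core order used above this prints likwid-pin -c 3,3,4,2,1,6,5,7,9 ./a.out, i.e. the same call as in the submit script.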

likwid-pin may also be used for hybrid MPI/OpenMP applications. For example, to run the test program on 4 nodes with 16 threads per node, the submit script only needs to be modified as follows:

 #!/bin/bash
 #
 #SBATCH -J tmvmxd     
 #SBATCH -N 4
 #SBATCH --time=00:01:00

 module purge
 module load intel-mpi/5 likwid/4.0
 
 export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib/libpmi.so
 export OMP_NUM_THREADS=16
 
 srun -n4 likwid-pin -c 0,0-15 ./a.out
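
Following the convention described above (one entry for the main task plus one per thread), the core list 0,0-15 should contain OMP_NUM_THREADS + 1 entries. The sketch below shows one way such a check could be scripted before submitting a job. The helper count_entries is hypothetical, and it only understands plain core numbers and ascending ranges of the form a-b; other forms of likwid core expressions are not handled.

```shell
#!/bin/sh
# count_entries is a hypothetical helper (not part of likwid).
# It counts the entries of a comma-separated core list in which
# "a-b" stands for the ascending range a, a+1, ..., b.
count_entries() {
    total=0
    old_ifs=$IFS
    IFS=,                      # split the argument at commas
    for item in $1; do
        case "$item" in
            *-*) lo=${item%-*}; hi=${item#*-}
                 total=$((total + hi - lo + 1)) ;;
            *)   total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$total"
}

threads=16                     # value used for OMP_NUM_THREADS above
cores="0,0-15"                 # core list passed to likwid-pin -c

# One entry for the main task plus one per thread is expected.
if [ "$(count_entries "$cores")" -eq $((threads + 1)) ]; then
    echo "core list matches OMP_NUM_THREADS"
else
    echo "core list does not match OMP_NUM_THREADS" >&2
fi
```

With the values from the submit script the list expands to 17 entries, so the first message is printed.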