It is increasingly difficult to control how threads are assigned to the available CPU cores in multi-threaded OpenMP applications. Particularly troublesome are hybrid MPI/OpenMP codes, where the developer usually has a clear idea of which regions to run in parallel but relies on the OS to assign physical cores to the individual threads. A variety of methods exist to state explicitly which CPU core should be bound to which thread; in practice, however, many of these recommended configurations turn out to be non-functional, dependent on the MPI version, or simply ineffective because they are overruled by the queuing system (e.g. SLURM). In the following we describe the auxiliary tool likwid-pin, which has proven useful for managing arbitrary thread-to-core assignments in a more general way.

Assume we have the following little test program, test_mpit_var.c, and want to run it with 8 threads on a single compute node, pinned to the physical cores 3, 4, 2, 1, 6, 5, 7, 9. After compiling it with mpigcc -fopenmp ./test_mpit_var.c, we could use the following SLURM submit script:

 #!/bin/bash
 #
 #SBATCH -J tmv     
 #SBATCH -N 1
 #SBATCH --time=00:01:00

 module purge
 module load intel-mpi/5
 export OMP_NUM_THREADS=8
 module load allinea/5.10
  
 perf-report mpirun -np 32 a.out
 

This will create two summary files, in *.txt and *.html format, giving an overview of the relative time spent in MPI, I/O, OpenMP, etc.

Explanations, examples, and typical cases can be found at http://content.allinea.com/downloads/getting-started.pdf
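The introduction above promises a likwid-pin example for the core list 3, 4, 2, 1, 6, 5, 7, 9, which this draft does not yet show. A sketch of such a submit script is given below; the module name "likwid" and its availability on the cluster are assumptions, not taken from this page:

```shell
#!/bin/bash
#
#SBATCH -J tmv
#SBATCH -N 1
#SBATCH --time=00:01:00

module purge
# "likwid" module name is an assumption; adapt to the local module tree
module load likwid

export OMP_NUM_THREADS=8

# likwid-pin binds the OpenMP threads to the listed physical cores in
# order: thread 0 -> core 3, thread 1 -> core 4, ..., thread 7 -> core 9
likwid-pin -c 3,4,2,1,6,5,7,9 ./a.out
```

Because likwid-pin intercepts thread creation via a preloaded library, it works largely independently of the compiler and MPI version, which is what makes it attractive compared with the runtime-specific affinity settings mentioned in the introduction.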

  • doku/likwid.1442847118.txt.gz
  • Last modified: 2015/09/21 14:51
  • by sh