Differences

This shows you the differences between two versions of the page.

doku:likwid [2015/09/22 12:02] sh
doku:likwid [Unknown date] (current) – external edit (Unknown date) 127.0.0.1
Line 3:
  
 ==== Background: ====
-It is proving increasingly difficult to exert control over how different threads are being assigned to the available CPU cores in multi-threaded OpenMP applications. Particularly troublesome are hybrid MPI/OpenMP codes where usually the developer has a clear idea of which regions to run in parallel, but relies on the OS for optimal assignment of different physical cores to the individual computing threads. A variety of methods do exist to explicitly state which CPU core should be linked to what particular thread, however, in practice many of these configurations turn out to be either non-functional, dependent on MPI versions, or frequently ineffective and overruled by the queuing system (e.g. [[doku:slurm|SLURM]]). In the following we describe the auxiliary tool ''likwid-pin'' that has shown promise in successfully managing arbitrary thread assignment to individual CPU cores in a more general way.
+It is proving increasingly difficult to control how threads are assigned to the available CPU cores in multi-threaded OpenMP applications. Particularly troublesome are hybrid MPI/OpenMP codes. Here, the developer usually knows exactly which regions run in parallel, but relies on the OS for optimal assignment of physical cores to the individual computing threads. A variety of methods exist to explicitly bind a particular thread to a CPU core. However, in practice many of these configurations turn out to be non-functional, dependent on the MPI version, or frequently ineffective; moreover, they are overruled by the queuing system (e.g. [[doku:slurm|SLURM]]). In the following, the auxiliary tool ''likwid-pin'' is described. It has shown promise in managing arbitrary thread assignment to individual CPU cores in a more general way.
  
  
 ==== Example: ====
-Suppose we have the following little test program, {{:doku:test_mpit_var.c|test_mpit_var.c}}, and want to run it using 8 threads on a single compute node based on the following set of physical cores: 3, 4, 2, 1, 6, 5, 7, 9. So after compilation like ''mpigcc -fopenmp ./test_mpit_var.c'' we could use the following submit script to [[doku:slurm|SLURM]]
+Suppose we have the following little test program, {{:doku:test_mpit_var.c|test_mpit_var.c}}, and want to run it with 8 threads on a single compute node, pinned to the following physical cores in this order: 3, 4, 2, 1, 6, 5, 7, 9. After compilation, e.g. via ''mpigcc -fopenmp ./test_mpit_var.c'', the following [[doku:slurm|SLURM]] submit script could be used:
  
  
Line 23:
    likwid-pin -c 3,3,4,2,1,6,5,7,9 ./a.out
    
-  * Note the repeated declaration of the initial core #3 which is due to the fact that we are still calling one main taskwhich subsequently will branch out into 8 parallel threads.  
+  * Note the repeated declaration of the initial core #3. It is required because a single main task is started first, which subsequently branches out into 8 parallel threads.
-  * Thread #0 must run on the same core the parent process (main task) will run (core #3 in the above example). 
+  * Thread #0 must run on the same core as the parent process (main task), i.e. core #3 in the above example.
-  * There is plenty of additional ways to define appropriate masks for thread domains (see below link), for example, to employ all available physical cores in an explicit order on both sockets, we could have set ''export OMP_NUM_THREADS=16'' and then called for ''likwid-pin -c 2,2,0,1,3,7,4,6,5,10,8,9,11,15,12,14,13 ./a.out''
+  * There are plenty of additional ways to define appropriate masks for thread domains (see the link below). For example, to employ all available physical cores in an explicit order on both sockets, ''export OMP_NUM_THREADS=16'' could be set and ''likwid-pin -c 2,2,0,1,3,7,4,6,5,10,8,9,11,15,12,14,13 ./a.out'' could then be called.
 +  * ''likwid-pin'' works in exactly the same way with the Intel compilers. For example, the above submit script would lead to the same type of results when the program is compiled with ''mpiicc  -openmp  ./test_mpit_var.c''.
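The rule from the notes above (the first core is listed twice, once for the main task and once for thread #0) can be sketched as a small shell helper. ''build_pin_list'' is a hypothetical name introduced only for illustration; it is not part of likwid and merely assembles the string passed to ''-c''.

```shell
#!/bin/bash
# Hypothetical helper (not part of likwid): build the -c argument for
# likwid-pin from an ordered list of cores. The first core is emitted
# twice: once for the main task and once for thread #0.
build_pin_list() {
    list="$1"                  # entry for the main task
    for core in "$@"; do
        list="$list,$core"     # one entry per thread, in the given order
    done
    printf '%s\n' "$list"
}

# Core order taken from the example above
pin_list=$(build_pin_list 3 4 2 1 6 5 7 9)
echo "likwid-pin -c $pin_list ./a.out"
```

With the core order from the example this reproduces the call used in the submit script. The helper only prints the command line, so it can be tried without likwid installed.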
 +==== MPI/OpenMP: ==== 
 +''likwid-pin'' may also be used for hybrid MPI/OpenMP applications. For example, to run the little test program on 4 nodes using 16 threads per node, the submit script simply has to be modified as follows:
 + 
 +   #!/bin/bash 
 +   # 
 +   #SBATCH -J tmvmxd      
 +   #SBATCH -N 4 
 +   #SBATCH --time=00:01:00 
 +   
 +   module purge 
 +   module load intel-mpi/5 likwid/4.0 
 +    
 +   export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib/libpmi.so 
 +   export OMP_NUM_THREADS=16 
 +    
 +   srun -n4 likwid-pin -c 0,0-15 ./a.out 
 + 
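To see what the compact expression ''0,0-15'' in the script above stands for, the following sketch expands it into an explicit list. It assumes that a range ''a-b'' denotes the cores a through b in ascending order; ''expand_cores'' is a hypothetical helper for inspection only, and it handles just plain comma lists and ranges, not the thread-domain expressions that ''likwid-pin'' itself accepts.

```shell
#!/bin/bash
# Hypothetical helper (not part of likwid): expand a core expression such
# as "0,0-15" into an explicit comma-separated list.
# Assumption: a range "a-b" means cores a, a+1, ..., b.
expand_cores() {
    out=""
    # split the expression on commas
    for part in $(printf '%s' "$1" | tr ',' ' '); do
        case "$part" in
            *-*)
                i=${part%-*}       # start of the range
                end=${part#*-}     # end of the range
                while [ "$i" -le "$end" ]; do
                    out="${out:+$out,}$i"
                    i=$((i + 1))
                done
                ;;
            *)
                out="${out:+$out,}$part"
                ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_cores "0,0-15"
```

The expanded list has 17 entries: one for the main task on core 0, followed by one per thread for ''OMP_NUM_THREADS=16''. This is the same pattern as the repeated core #3 in the single-node example.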
  
 ==== Further Reading: ====
 [[https://github.com/rrze-likwid/likwid/wiki/Likwid-Pin]]
  • doku/likwid.1442923323.txt.gz
  • Last modified: 2015/09/22 12:02
  • by sh