  
==== Background: ====
It is proving increasingly difficult to control how individual threads are assigned to the available CPU cores in multi-threaded OpenMP applications. Particularly troublesome are hybrid MPI/OpenMP codes: the developer usually knows exactly which regions run in parallel, but has to rely on the OS for an optimal assignment of physical cores to the individual computing threads. A variety of methods exist to explicitly state the link between a CPU core and a particular thread. In practice, however, many of these configurations turn out to be non-functional, dependent on the MPI version, or simply ineffective because they are overruled by the queuing system (e.g. [[doku:slurm|SLURM]]). In the following, the auxiliary tool ''likwid-pin'' is described, which has shown promise in managing arbitrary thread assignment to individual CPU cores in a more general way.
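One such explicit method works at the source-code level via the Linux affinity API. The sketch below is not part of this page's example and is shown only for illustration; it binds the calling thread to a fixed core and exemplifies the kind of hard-coded, per-application solution that ''likwid-pin'' aims to replace with a more general mechanism:

   /* Sketch only: explicit binding of the calling thread to one core.
      Error handling is omitted; _GNU_SOURCE is required for the CPU_* macros. */
   #define _GNU_SOURCE
   #include <sched.h>
   
   static void pin_calling_thread_to_core(int core)
   {
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(core, &set);
      /* pid 0 refers to the calling thread */
      sched_setaffinity(0, sizeof(set), &set);
   }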
  
  
==== Example: ====
Suppose we have the following little test program, {{:doku:test_mpit_var.c|test_mpit_var.c}}, and want to run it with 8 threads on a single compute node using the following set of physical cores: 3, 4, 2, 1, 6, 5, 7, 9. After compilation, e.g. via ''mpigcc -fopenmp ./test_mpit_var.c'', the following [[doku:slurm|SLURM]] submit script could be used:
  
  
   #!/bin/bash
   #
   #SBATCH -J tmv
   #SBATCH -N 1
   #SBATCH --time=00:01:00
   
   module purge
   module load intel-mpi/5 likwid/4.0
        
   export OMP_NUM_THREADS=8
   
   likwid-pin -c 3,3,4,2,1,6,5,7,9 ./a.out

  * Note that the initial core #3 appears twice in the core list. This is required because a single main task is started first, which subsequently branches out into the 8 parallel threads.
  * Thread #0 must run on the same core as the parent process (main task), i.e. core #3 in the above example.
  * There are plenty of additional ways to define appropriate masks for thread domains (see the link below). For example, to employ all available physical cores of both sockets in an explicit order, one could set ''export OMP_NUM_THREADS=16'' and call ''likwid-pin -c 2,2,0,1,3,7,4,6,5,10,8,9,11,15,12,14,13 ./a.out''.
The good news is that ''likwid-pin'' works exactly the same way with Intel compilers. For example, the above submit script would have produced exactly the same type of results if the test program had been compiled with ''mpiicc -openmp ./test_mpit_var.c''.
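The file {{:doku:test_mpit_var.c|test_mpit_var.c}} is only linked above, not reproduced on this page. A minimal hybrid MPI/OpenMP test program of this kind might look roughly like the following sketch (illustrative only, not the actual file contents): each thread reports the core it is running on, which makes the effect of ''likwid-pin'' directly visible in the job output.

   /* Sketch of a small MPI/OpenMP test program (compile e.g. with mpigcc -fopenmp). */
   #define _GNU_SOURCE
   #include <stdio.h>
   #include <sched.h>
   #include <mpi.h>
   #include <omp.h>
   
   int main(int argc, char **argv)
   {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   
      #pragma omp parallel
      {
         /* report the physical core each OpenMP thread is running on */
         printf("rank %d, thread %d on core %d\n",
                rank, omp_get_thread_num(), sched_getcpu());
      }
   
      MPI_Finalize();
      return 0;
   }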
==== MPI/OpenMP: ====
''likwid-pin'' may also be used for hybrid MPI/OpenMP applications. For example, in order to run the little test program on 4 nodes with 16 threads per node, the submit script simply has to be modified as follows:

   #!/bin/bash
   #
   #SBATCH -J tmvmxd
   #SBATCH -N 4
   #SBATCH --time=00:01:00
   
   module purge
   module load intel-mpi/5 likwid/4.0
   
   export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/current/lib/libpmi.so
   export OMP_NUM_THREADS=16
   
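   # 4 MPI tasks, one per node; on each node likwid-pin binds the main task
   # (= thread #0) to core 0 and the 16 OpenMP threads to cores 0-15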
   srun -n4 likwid-pin -c 0,0-15 ./a.out

  
==== Further Reading: ====
[[https://github.com/rrze-likwid/likwid/wiki/Likwid-Pin]]