Differences

This shows you the differences between two versions of the page.

doku:ompmpi [2014/08/12 10:08] irdoku:ompmpi [2015/06/09 13:32] – [Fewer MPI processes than slots available] ir
Line 1: Line 1:
-====== Hybrid OpenMP/MPI jobs ======+====== MPI environments ====== 
 +===== Fewer MPI processes than slots available =====
    
-===== Modifying the machine file ===== 
  
-Starting fewer MPI processes per node than slots available+Starting fewer MPI processes per node than slots available can be achieved in two ways: 
 +  - either by manually modifying the machine file  
 +  - or by automatic modification through a parallel environment.
  
 +This can be useful for memory intensive or hybrid OpenMP/MPI jobs.
  
 +As an example, running times of the [[doku:vasp-benchmarks|VASP code]] have been recorded as a function of **mpich, number of processes, and number of threads**.  
 ==== 1. Manually modifying the machine file ====
  
Line 53: Line 57:
 </code> </code>
  
-The reduced form would look like:+This job script yields the following machines file:
  
 <code> <code>
Line 66: Line 70:
 </code> </code>
  
-==== 2. Automatic modification through parallel environment variable ====+==== 2. Automatic modification through parallel environment ====
  
-Jobs using more than 2 GB per process can be executed in one of the parallel environments +Jobs using more than 2 GB (VSC 2) or 3 GB (VSC 1) per process can be executed in one of the parallel environments 
-  * mpich : 2 GB per process+  - VSC 1: 
 +  * mpich/mpich8:  3 GB per process (mpich: number of processes equals number of cores in a node) 
 +  * mpich4: 6 GB per process 
 +  * mpich2: 12 GB per process 
 +  * mpich1: 24 GB per process 
 +  - VSC 2: 
 +  * mpich : 2 GB per process (behaves as mpich16 would, but mpich16 is not defined)
   * mpich8: 4 GB per process
   * mpich4: 8 GB per process
Line 85: Line 95:
 The machine file and the variable NSLOTS_REDUCED are automatically modified such that they reflect the number of slots requested (4), whereas the variable NSLOTS is set to the number of physical cores allocated in the queueing system. This latter number NSLOTS corresponds to the cost calculation in the previous paragraph: NSLOTS = 4 processes * 8 cores = 32 cores = 2 nodes.
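
The arithmetic above can be reproduced with a short shell sketch. The function names ''allocated_cores'' and ''nodes_needed'' are hypothetical helpers, not part of the queueing system, and the value of 16 cores per node is an assumption derived from the statement that 32 cores correspond to 2 nodes.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers that mirror the cost calculation
# NSLOTS = processes * cores reserved per process.

# $1 = number of MPI processes requested (the value of NSLOTS_REDUCED)
# $2 = cores reserved per process (8 in the example above)
allocated_cores() {
    echo $(( $1 * $2 ))
}

# $1 = allocated cores, $2 = cores per node (assumed to be 16 here)
# Rounds up to whole nodes.
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

cores=$(allocated_cores 4 8)
echo "NSLOTS=$cores nodes=$(nodes_needed "$cores" 16)"
```

With the numbers from the example (4 processes, 8 cores each) this prints ''NSLOTS=32 nodes=2''.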
  
-=== Hybrid MPI/OpenMP jobs ===+===== Hybrid MPI/OpenMP jobs =====
 <code>#$ -pe mpich2 4
 export OMP_NUM_THREADS=8
 mpirun -machinefile $TMPDIR/machines -np $NSLOTS_REDUCED -ppn 2 ./executable</code>
 In this example, two nodes are reserved in the grid engine. On each node, 2 processes are started, each spawning 8 threads.
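
Before submitting a hybrid job it can help to check that processes per node times threads per process does not exceed the cores of a node. The following is a minimal sketch under stated assumptions: ''layout_status'' is a hypothetical helper, and 16 cores per node is assumed from the example (2 processes with 8 threads each filling one node).

```shell
#!/bin/bash
# Sketch only: sanity check for a hybrid MPI/OpenMP layout.
# layout_status is a hypothetical helper, not provided by the grid engine.

# $1 = MPI processes per node (the -ppn value)
# $2 = OpenMP threads per process (OMP_NUM_THREADS)
# $3 = physical cores per node (assumed to be 16 here)
layout_status() {
    if [ $(( $1 * $2 )) -le "$3" ]; then
        echo "ok"
    else
        echo "oversubscribed"
    fi
}

PPN=2
export OMP_NUM_THREADS=8
CORES_PER_NODE=16   # assumption, adjust to the actual node type

echo "layout: $(layout_status "$PPN" "$OMP_NUM_THREADS" "$CORES_PER_NODE")"
# The launch line from the example above would follow here, e.g.:
# mpirun -machinefile $TMPDIR/machines -np $NSLOTS_REDUCED -ppn $PPN ./executable
```

For the values of the example the check reports ''layout: ok''; raising ''PPN'' to 4 with 8 threads each would report ''oversubscribed''.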