This version (2022/06/20 09:01) was approved by msiegel.

Memory performance on the NUMA architecture of VSC-2 depends strongly on how processes are placed on the four NUMA nodes of each compute node. When using Intel MPI, the parameter

export I_MPI_PIN_PROCESSOR_LIST=1,14,9,6,5,10,13,2,3,12,11,4,7,8,15,0

should always be set, as mentioned above, to pin (up to) 16 processes to the 16 cores of a node. For sequential jobs, we recommend using 'taskset' or 'numactl', e.g.

taskset -c 0 our_example_code param1 param2 >out1 &
taskset -c 8 our_example_code param1 param2 >out2 &
wait
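The two taskset lines above generalize to any number of sequential runs. The following is a minimal bash sketch, assuming bash and the placeholder program our_example_code from the example. The variable names CORES, PIN_TOOL and DRY_RUN and the function run_pinned are hypothetical and chosen only for illustration. The core list "0 8" is taken from the example above. Which cores actually share a NUMA node should be checked on the machine, e.g. with 'numactl --hardware'. With PIN_TOOL=numactl, the sketch uses numactl --physcpubind to fix the core and --localalloc to request memory on the local NUMA node.

```shell
#!/bin/bash
# Sketch: start one instance of a sequential program per listed core,
# pin each instance to its core, then wait for all of them to finish.
# our_example_code, param1 and param2 are the placeholders from the text.

# Cores to use (0 and 8 as in the taskset example). Which cores share a
# NUMA node should be checked on the target machine, e.g. with
# 'numactl --hardware'.
CORES=${CORES:-"0 8"}

# Pinning tool: taskset (CPU affinity only) or numactl (CPU affinity
# plus allocation of memory on the local NUMA node).
PIN_TOOL=${PIN_TOOL:-taskset}

# DRY_RUN=1 (the default in this sketch) only prints the commands;
# set DRY_RUN=0 to actually launch them.
DRY_RUN=${DRY_RUN:-1}

run_pinned() {
    local i=1 core
    local -a pin
    for core in $CORES; do
        # Build the pinning prefix for this core as an array of words
        if [ "$PIN_TOOL" = numactl ]; then
            pin=(numactl --physcpubind="$core" --localalloc)
        else
            pin=(taskset -c "$core")
        fi
        if [ "$DRY_RUN" = 1 ]; then
            echo "${pin[*]} ./our_example_code param1 param2 >out$i &"
        else
            # Each instance runs in the background with its own output file
            "${pin[@]}" ./our_example_code param1 param2 >"out$i" &
        fi
        i=$((i + 1))
    done
    wait   # block until all background instances have finished
}

run_pinned
```

Defaulting to a dry run lets the core list be checked before anything is started; running it with DRY_RUN=0 (and, if desired, PIN_TOOL=numactl or a longer CORES list) then launches the pinned instances.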

With correct pinning, performance gains of up to 200% were observed for synthetic benchmarks. See also the examples for sequential jobs.
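For the Intel MPI setting at the top of this section, it can help to see which core each process will end up on. Intel MPI pins the process with local rank i to the i-th processor in I_MPI_PIN_PROCESSOR_LIST. The bash sketch below only prints that mapping for the list given above and checks that all 16 cores appear exactly once; it does not require MPI. The function name show_rank_map and the program name our_mpi_code are hypothetical. The commented-out launch lines assume that, as described in the Intel MPI documentation, I_MPI_DEBUG=4 or higher makes the library report the pinning it applied at startup.

```shell
#!/bin/bash
# Pinning list from the text: the i-th MPI rank on a node is placed on
# the i-th core in this list.
export I_MPI_PIN_PROCESSOR_LIST=1,14,9,6,5,10,13,2,3,12,11,4,7,8,15,0

# Print the core each local rank will be pinned to (no MPI needed).
show_rank_map() {
    local rank=0 core
    local IFS=,          # split the list on commas
    for core in $I_MPI_PIN_PROCESSOR_LIST; do
        echo "rank $rank -> core $core"
        rank=$((rank + 1))
    done
}

show_rank_map

# Sanity check: every core should appear exactly once in the list.
n_unique=$(printf '%s\n' "$I_MPI_PIN_PROCESSOR_LIST" | tr ',' '\n' | sort -u | wc -l)
echo "distinct cores: $n_unique"

# Actual launch in a job script (hypothetical binary name):
# export I_MPI_DEBUG=4   # report the applied pinning at startup
# mpirun -np 16 ./our_mpi_code
```

Running it shows, for example, that rank 0 goes to core 1 and rank 15 to core 0.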

  • doku/pinning.txt
  • Last modified: 2014/06/18 08:25
  • by ir