This version (2024/10/24 10:28) is a draft.
Approvals: 0/1. The previously approved version (2014/06/18 08:25) is available.
Process pinning
On VSC-2, memory performance depends strongly on how processes are placed on the four NUMA nodes
of each compute node.
With Intel MPI, the parameter
export I_MPI_PIN_PROCESSOR_LIST=1,14,9,6,5,10,13,2,3,12,11,4,7,8,15,0
as mentioned above, should always be set to pin (up to) 16 processes to the 16 cores. For sequential jobs, we recommend using 'taskset' or 'numactl', e.g.
taskset -c 0 our_example_code param1 param2 >out1 &
taskset -c 8 our_example_code param1 param2 >out2 &
wait
Performance gains of up to 200% were observed for synthetic benchmarks. Note also the examples for sequential jobs.
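The taskset example has a 'numactl' counterpart, which might look roughly like the sketch below. It is an illustration only, not a tested VSC-2 recipe: run_pinned and the DRY_RUN switch are hypothetical names introduced here, and our_example_code with its parameters is the same placeholder as in the taskset example. The sketch assumes that binding a job to one core with --physcpubind and allocating its memory on that core's NUMA node with --localalloc is the desired placement.

```shell
# run_pinned CORE COMMAND [ARGS...]
# Hypothetical helper (not part of any VSC-2 tooling): runs COMMAND bound to
# the given core, with memory allocated on the NUMA node local to that core.
# With DRY_RUN=1 it only prints the numactl command line instead of running it.
run_pinned() {
    _core=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "numactl --physcpubind=$_core --localalloc $*"
    else
        numactl --physcpubind="$_core" --localalloc "$@"
    fi
}

# Possible usage, mirroring the taskset example (cores 0 and 8):
#   run_pinned 0 our_example_code param1 param2 >out1 &
#   run_pinned 8 our_example_code param1 param2 >out2 &
#   wait
```

Setting DRY_RUN=1 prints the command lines without starting any job, which may help to check the core assignment before submitting.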