This version (2022/06/20 09:01) was approved by msiegel.

Large Jobs

Jobs that use more than 1024 cores with Intel MPI 4.0.3 run into a problem in the startup phase: not all nodes start the user job properly. To work around this startup problem, we recommend setting the following environment variable:

export I_MPI_HYDRA_BRANCH_COUNT=130
# up to 1024 cores: the default is OK
# 2048 cores: export I_MPI_HYDRA_BRANCH_COUNT=130
# more cores: scale the value linearly with the core count
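
The scaling rule above could be applied in a job script roughly as follows. This is only a sketch under stated assumptions: "scaling linearly" is read here as proportional to the core count, anchored at the documented value of 130 for 2048 cores. The helper branch_count and the variable NCORES are hypothetical names chosen for illustration; they are not part of Intel MPI or of any batch system.

```shell
#!/bin/sh
# Hypothetical helper, not part of Intel MPI.
# Assumption: the value scales proportionally with the core count,
# anchored at the documented point 2048 cores -> 130.
branch_count() {
    cores=$1
    if [ "$cores" -le 1024 ]; then
        # Up to 1024 cores the default is OK: print nothing.
        return 0
    fi
    # Integer arithmetic, rounded up to the next whole number.
    echo $(( (cores * 130 + 2047) / 2048 ))
}

# NCORES is a placeholder for the number of cores the job requests.
NCORES=${NCORES:-2048}
count=$(branch_count "$NCORES")
if [ -n "$count" ]; then
    export I_MPI_HYDRA_BRANCH_COUNT="$count"
fi
```

Under this assumption, a job requesting 4096 cores would get a value of 260, while a job with 1024 cores or fewer leaves the variable unset and keeps the default.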

Other MPI versions are not affected by this problem. However, Intel MPI 4.0.1 starts up slowly when many cores are requested, whereas the overall performance of Open MPI is slightly lower, depending on the application.

  • Last modified: 2012/01/09 13:25