map and ddt are ARM's (formerly Allinea's) advanced tools for performance analysis and debugging, respectively. Licenses for up to 512 parallel tasks are available. In addition, perf-report, a related lightweight profiling tool, has been integrated into forge in more recent releases.
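
Because the licenses cover at most 512 parallel tasks, it can be worth checking the planned job size before submitting. The following is only a minimal sketch: it assumes that every MPI task counts as one licensed task, and the function names total_tasks and within_license are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: check a planned job size against the license limit.
MAX_LICENSED_TASKS=512   # limit as stated in the text above

# total_tasks NODES TASKS_PER_NODE: print the resulting number of MPI tasks
total_tasks() {
  echo $(( $1 * $2 ))
}

# within_license NODES TASKS_PER_NODE: succeed if the job fits the limit
# (assumption: one licensed task per MPI task)
within_license() {
  [ "$(total_tasks "$1" "$2")" -le "$MAX_LICENSED_TASKS" ]
}

# Example: 4 nodes with 16 tasks each
if within_license 4 16; then
  echo "ok: $(total_tasks 4 16) tasks"
else
  echo "too many tasks for the available licenses"
fi
```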

Profiling may be split into two steps: first, a *.map file is created from within a regular job script submitted to SLURM; afterwards, this *.map file is analyzed in an interactive session on the login node. Suppose an application has already been prepared for profiling, for instance via mpicc -g -O3 ./my_prog.c; a corresponding profile can then be requested with the following submit script,

 #!/bin/bash
 #SBATCH -J map
 # 4 nodes x 16 tasks per node = 64 MPI tasks (matches -n 64 below)
 #SBATCH -N 4
 #SBATCH -L allinea@vsc
 #SBATCH --ntasks-per-node 16
 #SBATCH --ntasks-per-core  1
 module purge
 module load  intel/18  intel-mpi/2018 allinea/20.1_FORGE
 map --profile srun --jobid $SLURM_JOB_ID --mpi=pmi2 -n 64 ./a.out

which generates a *.map file (note that the number of tasks and nodes, together with a date/time stamp, is encoded in the filename) that may then be analyzed via the GUI, i.e.

 ssh -l my_uid -X
 cd wherever/the/map/file/may/be
 module purge
 module load allinea/20.1_FORGE
 map ./
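
Since the name of the *.map file contains a date/time stamp and is therefore not known in advance, a small helper can pick the most recent one. This is only a sketch; the function name latest_map is hypothetical and not part of forge.

```shell
#!/bin/bash
# Hypothetical helper: print the newest *.map file in a directory.
# Usage: latest_map [DIR]   (DIR defaults to the current directory)
latest_map() {
  local dir="${1:-.}" newest="" f
  for f in "$dir"/*.map; do
    [ -e "$f" ] || continue            # no match: the glob stays literal
    # -nt is true if $f was modified more recently than $newest
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -n "$newest" ]; then
    printf '%s\n' "$newest"
  else
    return 1                           # no *.map file found
  fi
}

# On the login node, with the allinea module loaded, one could then try:
#   map "$(latest_map .)"
```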

Debugging with ddt is currently limited to the Remote Launch option. It is best to launch ddt sessions on separate compute nodes.

ddt (fully interactive via salloc):

The following steps need to be carried out:

 ssh -l my_uid -X
 my_uid@l33$  cd wherever/my/app/may/be
 my_uid@l33$  salloc -N 4 -L allinea@vsc
 my_uid@l33$  echo $SLURM_JOB_ID    ( just to figure out the current job ID, say it's 8909346 )
 my_uid@l33$  srun --jobid 8909346 -n 4 hostname | tee ./machines.txt ( this is important! The command looks redundant, but it takes care of many prerequisites that are usually handled in the SLURM prologue of regular submit scripts, one of them being the provisioning of the required licenses )
              ... let's assume we got n305-[044,057,073,074] which should now be listed inside file 'machines.txt' 
 my_uid@l33$  rm -rf ~/.allinea/   ( to get rid of obsolete configurations from previous sessions )
 my_uid@l33$  module purge
 my_uid@l33$  module load  intel/18  intel-mpi/2018  allinea/20.1_FORGE   ( or whichever other MPI suite is preferred )
 my_uid@l33$  mpiicc -g -O0 my_app.c
 my_uid@l33$  ddt &     ( gui should open )
              ... select 'Remote Launch - Configure'
              ... click  'Add'   
              ... set my_uid@n305-044 as 'Host Name' or any other node from the above list
              ... set 'Remote Installation Directory' to /opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE
              ... keep auto-selected defaults for the rest, then check it with 'Test Remote Launch'     ( should be ok )
              ... click OK twice to close the dialogues
              ... click Close to exit from the Configure menu
              ... next actually select 'Remote Launch' by clicking the name tag that was auto-assigned above   ( the license label should be ok in the lower left corner and the hostname of the connecting client should appear in the lower right corner )
 ssh -l my_uid   ( a second terminal will be needed to actually start the debug session )   
 my_uid@l34$  ssh n305-044       ( log into the compute node that was selected/prepared above for remote launch )
 my_uid@n305-044$  module purge
 my_uid@n305-044$  module load  intel/18  intel-mpi/2018  allinea/20.1_FORGE
 my_uid@n305-044$  cd wherever/my/app/may/be
 my_uid@n305-044$  srun --jobid 8909346 -n 16 hostname    ( just a dummy check to see whether all is set up and working correctly )
 my_uid@n305-044$  ddt --connect srun --jobid 8909346 --mpi=pmi2 -n 64 ./a.out -arg1 -arg2   ( in the initial ddt-window a dialogue will pop up prompting for a Reverse Connection request; accept it and click Run and the usual debug session will start )
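
The node used for Remote Launch and for the second terminal can also be read from machines.txt instead of being copied by hand. A minimal sketch, assuming machines.txt holds one hostname per line as written by the srun ... hostname | tee step above; the function names unique_nodes and first_node are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for the machines.txt file written by
#   srun --jobid ... -n 4 hostname | tee ./machines.txt

# unique_nodes FILE: print each hostname once, in sorted order
unique_nodes() {
  sort -u -- "$1"
}

# first_node FILE: print one node that could serve as 'Host Name'
# in the Remote Launch dialogue
first_node() {
  unique_nodes "$1" | head -n 1
}

# Possible use in the second terminal:
#   ssh "$(first_node ./machines.txt)"
```

Any other node from the list would work equally well; taking the first sorted entry is just a convenient convention.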
  • doku/forge.txt
  • Last modified: 2020/09/24 13:09
  • by sh