forge = map + ddt
Synopsis:
map and ddt are ARM's (formerly Allinea's) advanced tools for performance analysis and debugging, see https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge. Licenses for up to 512 parallel tasks are available. Note also that perf-report, a related lightweight profiling tool, has been integrated into forge in more recent releases.
Usage of map:
Profiling may be split into two steps, where the initial task is to create a *.map file from within a regular job script submitted to SLURM. In a subsequent step this *.map file can then be analyzed in an interactive session on the login node. Suppose we had previously prepared an application for profiling, for instance via

mpicc -g -O3 ./my_prog.c

then we could generate a corresponding profile with the following submit script,
#!/bin/bash
#
#SBATCH -J map
#SBATCH -N 4
#SBATCH -L allinea@vsc
#SBATCH --ntasks-per-node 16
#SBATCH --ntasks-per-core 1

module purge
module load intel/18 intel-mpi/2018 allinea/20.1_FORGE

map --profile srun --jobid $SLURM_JOB_ID --mpi=pmi2 -n 64 ./a.out
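Assuming the script above was saved as map.slrm (the file name is chosen here purely for illustration), the profiling job would be submitted in the usual way:

sbatch ./map.slrm
squeue -u my_uid
( the second command is optional, just to check that the job is running )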
The profiling run generates a *.map file (the number of tasks and nodes, together with a date/time stamp, is encoded in the filename), which may then be analyzed via the GUI, i.e.
ssh vsc3.vsc.ac.at -l my_uid -X
cd wherever/the/map/file/may/be
module purge
module load allinea/20.1_FORGE
map ./a_64p_4n_2020-09-24_11-42.map
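Alternatively, since perf-report has been integrated into forge in more recent releases (see the synopsis above), a summary may be generated from an existing *.map file without opening the GUI. This is a minimal sketch, assuming the integrated perf-report is available in the loaded allinea release:

module purge
module load allinea/20.1_FORGE
perf-report ./a_64p_4n_2020-09-24_11-42.map
( this should write plain-text and HTML summaries next to the *.map file )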
Usage of ddt:
Debugging with ddt is currently limited to the Remote Launch option. It is best to launch ddt sessions on separate compute nodes.
ddt (fully interactive via salloc):
The following steps need to be carried out:
ssh vsc3.vsc.ac.at -l my_uid -X
my_uid@l33$ cd wherever/my/app/may/be
my_uid@l33$ salloc -N 4 -L allinea@vsc
my_uid@l33$ echo $SLURM_JOB_ID
            ( just to figure out the current job ID, say it's 8909346 )
my_uid@l33$ srun --jobid 8909346 -n 4 hostname | tee ./machines.txt
            ( this is important! it looks like a redundant command but actually
              fixes a lot of the prerequisites usually taken care of in the SLURM
              prologue of regular submit scripts, one of them being the
              provisioning of required licenses )
            ... let's assume we got n305-[044,057,073,074], which should now be
                listed inside the file 'machines.txt'
my_uid@l33$ rm -rf ~/.allinea/
            ( to get rid of obsolete configurations from previous sessions )
my_uid@l33$ module purge
my_uid@l33$ module load intel/18 intel-mpi/2018 allinea/20.1_FORGE
            ( or whichever other MPI suite )
my_uid@l33$ mpiicc -g -O0 my_app.c
my_uid@l33$ ddt &
            ( the GUI should open )
            ... select 'Remote Launch - Configure'
            ... click 'Add'
            ... set my_uid@n305-044 as 'Host Name', or any other node from the
                above list
            ... set 'Remote Installation Directory' to
                /opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE
            ... keep the auto-selected defaults for the rest, then check with
                'Test Remote Launch' ( should be ok )
            ... click OK twice to close the dialogues
            ... click Close to exit the Configure menu
            ... then actually select 'Remote Launch' by clicking the name tag
                that was auto-assigned above
                ( the licence label should be ok in the lower left corner and
                  the hostname of the connecting client should appear in the
                  lower right corner )

ssh vsc3.vsc.ac.at -l my_uid
            ( a second terminal is needed to actually start the debug session )
my_uid@l34$ ssh n305-044
            ( log into the compute node that was selected/prepared above for
              remote launch )
my_uid@n305-044$ module purge
my_uid@n305-044$ module load intel/18 intel-mpi/2018 allinea/20.1_FORGE
my_uid@n305-044$ cd wherever/my/app/may/be
my_uid@n305-044$ srun --jobid 8909346 -n 16 hostname
            ( just a dummy check to see whether everything is set up and
              working correctly )
my_uid@n305-044$ ddt --connect srun --jobid 8909346 --mpi=pmi2 -n 64 ./a.out -arg1 -arg2
            ( in the initial ddt window a dialogue will pop up prompting for a
              Reverse Connection request; accept it, click Run, and the usual
              debug session will start )
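For repeated debug runs within the same allocation, the second-terminal steps may be collected in a small helper script. The following is only a sketch of such a hypothetical wrapper (not part of forge itself), assuming JOBID and NODE are taken from the salloc session above and that environment modules can be initialized in non-interactive shells; otherwise run the commands manually as shown before:

#!/bin/bash
# hypothetical convenience wrapper: relaunch the reverse-connect debug
# session on the compute node prepared for Remote Launch
JOBID=8909346    # replace with the value reported by 'echo $SLURM_JOB_ID'
NODE=n305-044    # replace with the first entry of ./machines.txt

ssh -t ${NODE} "module purge && \
                module load intel/18 intel-mpi/2018 allinea/20.1_FORGE && \
                cd wherever/my/app/may/be && \
                ddt --connect srun --jobid ${JOBID} --mpi=pmi2 -n 64 ./a.out -arg1 -arg2"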
Further Reading:
/opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE/doc/userguide-forge.pdf
Tutorial (by Patrick Wohlschlegel, Allinea):
- Overflow: 5_makefile 5_mmult2.c