===== forge = map + ddt =====
==== Synopsis: ====
''map'' and ''ddt'' are Arm's (formerly Allinea's) advanced tools for performance analysis and debugging, see [[https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge]].
Licenses for up to 512 parallel tasks are available. Note also that [[doku:perf-report|perf-report]], a related lightweight profiling tool, has been integrated into forge in more recent releases.
==== Usage of map: ====
Profiling is split into two steps: the initial task is to create a ''*.map'' file from within a regular job script submitted to [[doku:slurm|SLURM]]; in a subsequent step this ''*.map'' file can then be analyzed in an interactive session on the login node. Suppose an application has previously been prepared for profiling, for instance via ''mpicc -g -O3 ./my_prog.c''; a corresponding profile can then be requested with the following submit script,
#!/bin/bash
#
#SBATCH -J map
#SBATCH -N 4
#SBATCH --ntasks-per-node 16
#SBATCH --ntasks-per-core 1
module purge
module load intel/18 intel-mpi/2018 arm/20.1_FORGE
map --profile srun --jobid $SLURM_JOB_ID --mpi=pmi2 -n 64 ./a.out
which generates a ''*.map'' file (note that the number of tasks and nodes together with a date/time stamp are encoded in the filename), which may then be analyzed via the GUI, i.e.
ssh vsc4.vsc.ac.at -l my_uid -X
cd wherever/the/map/file/may/be
module purge
module load allinea/20.1_FORGE
map ./a_64p_4n_2020-09-24_11-42.map
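As an aside, the run parameters encoded in the ''*.map'' filename can be recovered without opening the GUI; a minimal shell sketch (using the example filename from above, pattern ''<exe>_<tasks>p_<nodes>n_<date>.map''):

```shell
# Pick the task and node counts back out of a MAP filename such as
# a_64p_4n_2020-09-24_11-42.map
f=a_64p_4n_2020-09-24_11-42.map
tasks=$(echo "$f" | sed -n 's/.*_\([0-9]*\)p_.*/\1/p')
nodes=$(echo "$f" | sed -n 's/.*_\([0-9]*\)n_.*/\1/p')
echo "$tasks tasks on $nodes nodes"   # -> 64 tasks on 4 nodes
```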
==== Usage of ddt: ====
Debugging with ''ddt'' is currently limited to the __Remote Launch__ option.
It is best to launch ''ddt'' sessions on separate compute nodes.
=== ddt (fully interactive via salloc): ===
The following steps need to be carried out:
ssh vsc3.vsc.ac.at -l my_uid -X
my_uid@l33$ cd wherever/my/app/may/be
my_uid@l33$ salloc -N 4 -L allinea@vsc
my_uid@l33$ echo $SLURM_JOB_ID ( just to figure out the current job ID, say it's 8909346 )
my_uid@l33$ srun --jobid 8909346 -n 4 hostname | tee ./machines.txt ( this is important ! it may look like a redundant command, but it actually takes care of several prerequisites usually handled in the SLURM prologue of regular submit scripts, one of them being the provisioning of required licenses )
... let's assume we got n305-[044,057,073,074] which should now be listed inside file 'machines.txt'
my_uid@l33$ rm -rf ~/.allinea/ ( to get rid of obsolete configurations from previous sessions )
my_uid@l33$ module purge
my_uid@l33$ module load intel/18 intel-mpi/2018 allinea/20.1_FORGE ( or whichever MPI suite is required )
my_uid@l33$ mpiicc -g -O0 my_app.c
my_uid@l33$ ddt & ( gui should open )
... select 'Remote Launch - Configure'
... click 'Add'
... set my_uid@n305-044 as 'Host Name' or any other node from the above list
... set 'Remote Installation Directory' to /opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE
... keep auto-selected defaults for the rest, then check it with 'Test Remote Launch' ( should be ok )
... click OK twice to close the dialogues
... click Close to exit from the Configure menu
... finally select 'Remote Launch' by clicking the name tag that was auto-assigned above ( the licence label should be ok in the lower left corner and the hostname of the connecting client should appear in the lower right corner )
ssh vsc3.vsc.ac.at -l my_uid ( a second terminal will be needed to actually start the debug session )
my_uid@l34$ ssh n305-044 ( log into the compute node that was selected/prepared above for remote launch )
my_uid@n305-044$ module purge
my_uid@n305-044$ module load intel/18 intel-mpi/2018 allinea/20.1_FORGE
my_uid@n305-044$ cd wherever/my/app/may/be
my_uid@n305-044$ srun --jobid 8909346 -n 16 hostname ( just a dummy check to see whether all is set up and working correctly )
my_uid@n305-044$ ddt --connect srun --jobid 8909346 --mpi=pmi2 -n 64 ./a.out -arg1 -arg2 ( in the initial ddt-window a dialogue will pop up prompting for a Reverse Connection request; accept it and click Run and the usual debug session will start )
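The bracketed hostlist seen above ( ''n305-[044,057,073,074]'' ) can be expanded to one hostname per line with ''scontrol show hostnames'' on the cluster; as a portable sketch for simple single-bracket lists, a plain awk one-liner does the same:

```shell
# Expand a SLURM-style bracketed hostlist into individual hostnames;
# illustrative helper that handles the simple single-bracket form only.
echo 'n305-[044,057,073,074]' \
  | awk -F'[][]' '{n=split($2,a,","); for(i=1;i<=n;i++) print $1 a[i]}'
# -> n305-044  n305-057  n305-073  n305-074  (one per line)
```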
==== Further Reading: ====
''/opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE/doc/userguide-forge.pdf''
{{ :doku:forge:training:2016may25.vsc.technical_training.pdf | Tutorial (by Patrick Wohlschlegel, Allinea):}}
- **Debugging:** {{:doku:forge:training:0_debugging_makefile | 1_makefile}} {{:doku:forge:training:0_debugging_mmult1.c | 1_mmult1.c}} {{:doku:forge:training:0_debugging_mmult1.f90 | 1_mmult1.f90}} {{:doku:forge:training:0_debugging_report.html | 1_report.html}} {{:doku:forge:training:0_debugging_script.sub | 1_script.sub}}
- **Profiling:** {{:doku:forge:training:1_profiling_1_mmult1_unopt_o1.html | 2.1_mmult1_unopt_o1.html}} {{:doku:forge:training:1_profiling_1_mmult1_unopt_o1.map | 2.1_mmult1_unopt_o1.map}} {{:doku:forge:training:1_profiling_2_mmult1_unopt_o3.html | 2.2_mmult1_unopt_o3.html}} {{:doku:forge:training:1_profiling_2_mmult1_unopt_o3.map | 2.2_mmult1_unopt_o3.map}} {{:doku:forge:training:1_profiling_3_mmult1_unopt_o3xhost.html | 2.3_mmult1_unopt_o3xhost.html}} {{:doku:forge:training:1_profiling_3_mmult1_unopt_o3xhost.map | 2.3_mmult1_unopt_o3xhost.map}} {{:doku:forge:training:1_profiling_makefile | 2_makefile}} {{:doku:forge:training:1_profiling_mmult1.c | 2_mmult1.c}} {{:doku:forge:training:1_profiling_mmult1.f90 | 2_mmult1.f90}}
- **Vectorization:** {{:doku:forge:training:2_vecto_1_mmult1_opt_o1.html | 3.1_mmult1_opt_o1.html}} {{:doku:forge:training:2_vecto_1_mmult1_opt_o1.map | 3.1_mmult1_opt_o1.map}} {{:doku:forge:training:2_vecto_2_mmult1_opt_o3.html | 3.2_mmult1_opt_o3.html}} {{:doku:forge:training:2_vecto_2_mmult1_opt_o3.map | 3.2_mmult1_opt_o3.map}} {{:doku:forge:training:2_vecto_3_mmult1_opt-ivdep_o3.html | 3.3_mmult1_opt-ivdep_o3.html}} {{:doku:forge:training:2_vecto_3_mmult1_opt-ivdep_o3.map | 3.3_mmult1_opt-ivdep_o3.map}} {{:doku:forge:training:2_vecto_makefile | 3_makefile}} {{:doku:forge:training:2_vecto_mmult1_sol.c | 3_mmult1_sol.c}} {{:doku:forge:training:2_vecto_mmult1_sol.f90 | 3_mmult1_sol.f90}}
- **Leaks:** {{:doku:forge:training:3_leaks_makefile | 4_makefile}} {{:doku:forge:training:3_leaks_mmult2.c | 4_mmult2.c}} {{:doku:forge:training:3_leaks_mmult2.f90 | 4_mmult2.f90}} {{:doku:forge:training:3_leaks_ref_c.mat | 4_ref_c.mat}} {{:doku:forge:training:3_leaks_ref_f90.mat | 4_ref_f90.mat}} {{:doku:forge:training:3_leaks_solution_makefile | 4_solution_makefile}} {{:doku:forge:training:3_leaks_solution_mmult2_sol.c | 4_solution_mmult2_sol.c}} {{:doku:forge:training:3_leaks_solution_mmult2_sol_c.exe | 4_solution_mmult2_sol_c.exe}} {{:doku:forge:training:3_leaks_solution_mmult2_sol.f90 | 4_solution_mmult2_sol.f90}} {{:doku:forge:training:3_leaks_solution_mmult2_sol_f90.exe | 4_solution_mmult2_sol_f90.exe}}
- **Overflow:** {{:doku:forge:training:4_overflow_makefile | 5_makefile}} {{:doku:forge:training:4_overflow_mmult2.c | 5_mmult2.c}}
- **Scaling:** {{:doku:forge:training:5_scale_makefile | 6_makefile}} {{:doku:forge:training:5_scale_mmult4.c | 6_mmult4.c}} {{:doku:forge:training:5_scale_mmult4.f90 | 6_mmult4.f90}} {{:doku:forge:training:5_scale_solution_2_mmult4_opt_o3.map | 6_solution_2_mmult4_opt_o3.map}} {{:doku:forge:training:5_scale_solution_makefile | 6_solution_makefile}} {{:doku:forge:training:5_scale_solution_mmult4_sol.c | 6_solution_mmult4_sol.c}} {{:doku:forge:training:5_scale_solution_mmult4_sol.f90 | 6_solution_mmult4_sol.f90}}
- **Reporting:** {{:doku:forge:training:6_reporting_1_6p_1n_home.html | 7.1_6p_1n_home.html}} {{:doku:forge:training:6_reporting_2_6p_1n_scratch.html | 7.2_6p_1n_scratch.html}} {{:doku:forge:training:6_reporting_3_24p_2n_scratch.html | 7.3_24p_2n_scratch.html}} {{:doku:forge:training:6_reporting_4_24p_1n_scratch.html | 7.4_24p_1n_scratch.html}} {{:doku:forge:training:6_reporting_5_12p_1n_scratch.html | 7.5_12p_1n_scratch.html}}