===== forge = map + ddt =====

==== Synopsis: ====

map and ddt are ARM's (formerly Allinea's) advanced tools for performance analysis and parallel debugging, see [[https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge]]. Licenses for up to 512 parallel tasks are available. Note also that [[doku:perf-report|perf-report]] --- a related lightweight profiling tool --- has been integrated into forge in more recent releases.

==== Usage of map: ====

Profiling is best split into two steps: first, create a *.map file from within a regular job script submitted to [[doku:slurm|SLURM]]; in a subsequent step this *.map file can then be analyzed within an interactive session on the login node.

Suppose we had previously prepared an application for profiling, for instance via

<code>
mpicc -g -O3 ./my_prog.c
</code>

Then we can request a corresponding profile with the following submit script:

<code>
#!/bin/bash
#
#SBATCH -J map
#SBATCH -N 4
#SBATCH --ntasks-per-node 16
#SBATCH --ntasks-per-core 1

module purge
module load intel/18 intel-mpi/2018 arm/20.1_FORGE

map --profile srun --jobid $SLURM_JOB_ID --mpi=pmi2 -n 64 ./a.out
</code>

This generates a *.map file (note the number of tasks and nodes together with the date/time stamp in the filename) that may then be analyzed via the GUI, i.e.

<code>
ssh vsc4.vsc.ac.at -l my_uid -X
cd wherever/the/map/file/may/be
module purge
module load allinea/20.1_FORGE
map ./a_64p_4n_2020-09-24_11-42.map
</code>

==== Usage of ddt: ====

Debugging with ''ddt'' is currently limited to the __Remote Launch__ option. It is best to launch ''ddt'' sessions on separate compute nodes.

=== ddt (fully interactive via salloc): ===

The following steps need to be carried out:

<code>
ssh vsc3.vsc.ac.at -l my_uid -X
my_uid@l33$ cd wherever/my/app/may/be
my_uid@l33$ salloc -N 4 -L allinea@vsc
my_uid@l33$ echo $SLURM_JOB_ID
( just to figure out the current job ID, say it is 8909346 )
my_uid@l33$ srun --jobid 8909346 -n 4 hostname | tee ./machines.txt
( this is important ! it looks like a redundant command, but it actually
  takes care of many of the prerequisites usually handled in the SLURM
  prologue of regular submit scripts, one of them being the provisioning
  of required licenses )
</code>

Let's assume we got n305-[044,057,073,074], which should now be listed inside the file ''machines.txt''.

<code>
my_uid@l33$ rm -rf ~/.allinea/
( to get rid of obsolete configurations from previous sessions )
my_uid@l33$ module purge
my_uid@l33$ module load intel/18 intel-mpi/2018 allinea/20.1_FORGE
( or whatever other suite of MPI )
my_uid@l33$ mpiicc -g -O0 my_app.c
my_uid@l33$ ddt &
( the gui should open )
</code>

In the gui:
  * select 'Remote Launch - Configure'
  * click 'Add'
  * set my_uid@n305-044 as 'Host Name' (or any other node from the above list)
  * set 'Remote Installation Directory' to /opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE
  * keep the auto-selected defaults for the rest, then verify with 'Test Remote Launch' (should be ok)
  * click OK twice to close the dialogues
  * click Close to exit from the Configure menu
  * now actually select 'Remote Launch' by clicking the name tag that was auto-assigned above (the licence label should be ok in the lower left corner, and the hostname of the connecting client should appear in the lower right corner)

A second terminal is needed to actually start the debug session:

<code>
ssh vsc3.vsc.ac.at -l my_uid
my_uid@l34$ ssh n305-044
( log into the compute node that was selected/prepared above for remote launch )
my_uid@n305-044$ module purge
my_uid@n305-044$ module load intel/18 intel-mpi/2018 allinea/20.1_FORGE
my_uid@n305-044$ cd wherever/my/app/may/be
my_uid@n305-044$ srun --jobid 8909346 -n 16 hostname
( just a quick check to see whether everything is set up and working correctly )
my_uid@n305-044$ ddt --connect srun --jobid 8909346 --mpi=pmi2 -n 64 ./a.out -arg1 -arg2
</code>

In the initial ddt window a dialogue will pop up prompting for a Reverse Connection request; accept it, click Run, and the usual debug session will start.

==== Further Reading: ====

''/opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE/doc/userguide-forge.pdf''

{{ :doku:forge:training:2016may25.vsc.technical_training.pdf | Tutorial (by Patrick Wohlschlegel, Allinea):}}

  - **Debugging:** {{:doku:forge:training:0_debugging_makefile | 1_makefile}} {{:doku:forge:training:0_debugging_mmult1.c | 1_mmult1.c}} {{:doku:forge:training:0_debugging_mmult1.f90 | 1_mmult1.f90}} {{:doku:forge:training:0_debugging_report.html | 1_report.html}} {{:doku:forge:training:0_debugging_script.sub | 1_script.sub}}
  - **Profiling:** {{:doku:forge:training:1_profiling_1_mmult1_unopt_o1.html | 2.1_mmult1_unopt_o1.html}} {{:doku:forge:training:1_profiling_1_mmult1_unopt_o1.map | 2.1_mmult1_unopt_o1.map}} {{:doku:forge:training:1_profiling_2_mmult1_unopt_o3.html | 2.2_mmult1_unopt_o3.html}} {{:doku:forge:training:1_profiling_2_mmult1_unopt_o3.map | 2.2_mmult1_unopt_o3.map}} {{:doku:forge:training:1_profiling_3_mmult1_unopt_o3xhost.html | 2.3_mmult1_unopt_o3xhost.html}} {{:doku:forge:training:1_profiling_3_mmult1_unopt_o3xhost.map | 2.3_mmult1_unopt_o3xhost.map}} {{:doku:forge:training:1_profiling_makefile | 2_makefile}} {{:doku:forge:training:1_profiling_mmult1.c | 2_mmult1.c}} {{:doku:forge:training:1_profiling_mmult1.f90 | 2_mmult1.f90}}
  - **Vectorization:** {{:doku:forge:training:2_vecto_1_mmult1_opt_o1.html | 3.1_mmult1_opt_o1.html}} {{:doku:forge:training:2_vecto_1_mmult1_opt_o1.map | 3.1_mmult1_opt_o1.map}} {{:doku:forge:training:2_vecto_2_mmult1_opt_o3.html | 3.2_mmult1_opt_o3.html}} {{:doku:forge:training:2_vecto_2_mmult1_opt_o3.map | 3.2_mmult1_opt_o3.map}} {{:doku:forge:training:2_vecto_3_mmult1_opt-ivdep_o3.html | 3.3_mmult1_opt-ivdep_o3.html}} {{:doku:forge:training:2_vecto_3_mmult1_opt-ivdep_o3.map | 3.3_mmult1_opt-ivdep_o3.map}} {{:doku:forge:training:2_vecto_makefile | 3_makefile}} {{:doku:forge:training:2_vecto_mmult1_sol.c | 3_mmult1_sol.c}} {{:doku:forge:training:2_vecto_mmult1_sol.f90 | 3_mmult1_sol.f90}}
  - **Leaks:** {{:doku:forge:training:3_leaks_makefile | 4_makefile}} {{:doku:forge:training:3_leaks_mmult2.c | 4_mmult2.c}} {{:doku:forge:training:3_leaks_mmult2.f90 | 4_mmult2.f90}} {{:doku:forge:training:3_leaks_ref_c.mat | 4_ref_c.mat}} {{:doku:forge:training:3_leaks_ref_f90.mat | 4_ref_f90.mat}} {{:doku:forge:training:3_leaks_solution_makefile | 4_solution_makefile}} {{:doku:forge:training:3_leaks_solution_mmult2_sol.c | 4_solution_mmult2_sol.c}} {{:doku:forge:training:3_leaks_solution_mmult2_sol_c.exe | 4_solution_mmult2_sol_c.exe}} {{:doku:forge:training:3_leaks_solution_mmult2_sol.f90 | 4_solution_mmult2_sol.f90}} {{:doku:forge:training:3_leaks_solution_mmult2_sol_f90.exe | 4_solution_mmult2_sol_f90.exe}}
  - **Overflow:** {{:doku:forge:training:4_overflow_makefile | 5_makefile}} {{:doku:forge:training:4_overflow_mmult2.c | 5_mmult2.c}}
  - **Scaling:** {{:doku:forge:training:5_scale_makefile | 6_makefile}} {{:doku:forge:training:5_scale_mmult4.c | 6_mmult4.c}} {{:doku:forge:training:5_scale_mmult4.f90 | 6_mmult4.f90}} {{:doku:forge:training:5_scale_solution_2_mmult4_opt_o3.map | 6_solution_2_mmult4_opt_o3.map}} {{:doku:forge:training:5_scale_solution_makefile | 6_solution_makefile}} {{:doku:forge:training:5_scale_solution_mmult4_sol.c | 6_solution_mmult4_sol.c}} {{:doku:forge:training:5_scale_solution_mmult4_sol.f90 | 6_solution_mmult4_sol.f90}}
  - **Reporting:** {{:doku:forge:training:6_reporting_1_6p_1n_home.html | 7.1_6p_1n_home.html}} {{:doku:forge:training:6_reporting_2_6p_1n_scratch.html | 7.2_6p_1n_scratch.html}} {{:doku:forge:training:6_reporting_3_24p_2n_scratch.html | 7.3_24p_2n_scratch.html}} {{:doku:forge:training:6_reporting_4_24p_1n_scratch.html | 7.4_24p_1n_scratch.html}} {{:doku:forge:training:6_reporting_5_12p_1n_scratch.html | 7.5_12p_1n_scratch.html}}
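==== Appendix: small shell sketches ====

As noted in the map section above, the profiler encodes the task count, the node count, and a date/time stamp in the name of the generated *.map file (e.g. ''a_64p_4n_2020-09-24_11-42.map''). A minimal sketch, assuming only that naming pattern (the parsing helper itself is hypothetical and not part of forge), that recovers the task and node counts from such a filename:

<code>
```shell
# Parse a map output filename of the form
#   <prog>_<tasks>p_<nodes>n_<date>_<time>.map
# (hypothetical helper; the pattern is taken from the example
#  a_64p_4n_2020-09-24_11-42.map in the map section above)
f="a_64p_4n_2020-09-24_11-42.map"

tasks=${f#*_}; tasks=${tasks%%p_*}   # strip up to first '_', cut at 'p_' -> 64
nodes=${f#*p_}; nodes=${nodes%%n_*}  # strip up to 'p_', cut at 'n_'      -> 4

echo "$tasks tasks on $nodes nodes"  # -> 64 tasks on 4 nodes
```
</code>

This can help when sorting profiles from scaling runs by task count without opening each one in the GUI.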
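In the ddt walkthrough above, any node from ''machines.txt'' can serve as 'Host Name' for the Remote Launch configuration. Since ''srun ... hostname'' prints one line per task, the file may list the same node several times; a small sketch (with a stand-in ''machines.txt'' matching the example allocation above) that reduces it to the unique node names:

<code>
```shell
# stand-in for the machines.txt written by 'srun ... hostname | tee ./machines.txt'
printf 'n305-044\nn305-044\nn305-057\nn305-073\nn305-074\n' > machines.txt

# one line per task -> reduce to unique node names; the first entry
# can then be used as 'Host Name' in the Remote Launch dialogue
sort -u machines.txt | head -n 1   # -> n305-044
```
</code>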