perf-report is a lightweight profiling tool that shows where computing time is actually spent in a given application. It is developed by ARM (formerly Allinea) and is very easy to use: simply prefix the usual launch command with a call to perf-report. It is now an integral part of ARM Forge.

For example, a simple MPI job can be analyzed with the following SLURM submit script:

 #!/bin/bash
 #SBATCH -J prflng
 #SBATCH -N 2
 #SBATCH -L allinea@vsc
 #SBATCH --ntasks-per-node=16
 #SBATCH --ntasks-per-core=1

 module purge
 module load intel/18 intel-mpi/2018 allinea/20.1_FORGE
 perf-report srun --jobid $SLURM_JOB_ID --mpi=pmi2 -n 32 ./a.out
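
The hard-coded -n 32 in the script above has to match nodes times tasks per node (2 x 16). As a minimal sketch, the count could instead be derived from variables that SLURM sets inside the job. This assumes that SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are both defined, which should be the case when -N and --ntasks-per-node are given as above. The function name total_tasks is hypothetical and not part of SLURM or perf-report.

```shell
# Hypothetical helper: total MPI tasks = nodes * tasks per node.
# Assumption: SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE are set by SLURM.
# If they are not set (e.g. outside a job), fall back to the values
# used in the submit script above (2 nodes, 16 tasks per node).
total_tasks() {
    nodes="${SLURM_JOB_NUM_NODES:-2}"
    per_node="${SLURM_NTASKS_PER_NODE:-16}"
    echo $(( nodes * per_node ))
}
```

Inside the submit script, the result could then replace the literal value, e.g. -n "$(total_tasks)", so that changing -N or --ntasks-per-node does not require editing the srun line as well.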

This creates two summary files, one in *.txt and one in *.html format, that give an overview of the relative time spent in MPI, I/O, OpenMP, etc. Recent releases also include an 'Energy' section.
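
Once the job has finished, the summary files can be located with a small helper like the one sketched below. It only relies on the *.txt and *.html extensions mentioned above and makes no assumption about how perf-report names the files, so unrelated files with the same extensions are listed too. The function name list_reports is hypothetical.

```shell
# Hypothetical helper: print the *.txt and *.html files in a directory,
# newest first. It does not check that a file was written by perf-report.
list_reports() {
    dir="${1:-.}"
    # Run in a subshell so the caller's working directory is unchanged.
    # Errors for a pattern without matches are silenced.
    ( cd "$dir" && ls -t -- *.txt *.html 2>/dev/null )
}
```

Called without an argument from the directory the job was submitted in, list_reports prints the candidate files for the most recent run at the top.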

For further details, see the user guide at /opt/sw/x86_64/glibc-2.17/ivybridge-ep/allinea/20.1_FORGE/doc/userguide-forge.pdf (part 4: explanations, examples, typical cases).

