doku:papi [2016/06/15 11:43] – sh | doku:papi [2016/06/16 13:46] – sh |
---|
| |
==== Usage of papi: ==== | ==== Usage of papi: ==== |
The user will have to modify the source code and insert ''papi'' calls (see below). Invocation and usage is then as simple as | The user will have to modify the source code and insert ''papi'' calls (see below). Invocation and usage is then as simple as, |
| |
| |
module purge | module purge |
module load papi/5.4.3 | module load papi/5.4.3 |
gcc my_program.c -lpapi ( gfortran my_program.f -lpapi ) | gcc my_program.c -lpapi |
./a.out | ./a.out |
| |
| or for Fortran users, |
| |
| module purge |
| module load papi/5.4.3 |
| gfortran my_program.f -I/opt/sw/x86_64/glibc-2.12/ivybridge-ep/papi/5.4.3/gnu-4.4.7/include -lpapi |
| ./a.out |
| |
| |
==== Interfacing with papi : ==== | ==== Interfacing with papi : ==== |
exit(995); | exit(995); |
} | } |
// PAPI Time Estimators InitializationPAPI_L2_DCM | // PAPI Time Estimators Initialization |
time0 = PAPI_get_real_usec(); | time0 = PAPI_get_real_usec(); |
cyc0 = PAPI_get_real_cyc(); | cyc0 = PAPI_get_real_cyc(); |
exit(998); | exit(998); |
} | } |
if (PAPI_destroy_eventset(&eventset) != PAPI_OK) {PAPI_L2_DCM | if (PAPI_destroy_eventset(&eventset) != PAPI_OK) { |
printf("PAPI event set destruction error !\n"); | printf("PAPI event set destruction error !\n"); |
exit(999); | exit(999); |
* Measuring the specific event ''PAPI_TOT_CYC'' can differ significantly from the result obtained by calling ''PAPI_get_real_cyc()''. This is particularly true for ''papi'' analysis of very small code sections that are executed frequently (e.g. hotspot functions/routines that were ranked high during time based profiling). Although off in absolute terms, ''PAPI_TOT_CYC'' remains a useful reference time for relative comparisons. | * Measuring the specific event ''PAPI_TOT_CYC'' can differ significantly from the result obtained by calling ''PAPI_get_real_cyc()''. This is particularly true for ''papi'' analysis of very small code sections that are executed frequently (e.g. hotspot functions/routines that were ranked high during time based profiling). Although off in absolute terms, ''PAPI_TOT_CYC'' remains a useful reference time for relative comparisons. |
* Evaluating floating point performance on Intel ivy bridge: [[https://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops]] | * Evaluating floating point performance on Intel ivy bridge: [[https://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops]] |
* Occasionally it is useful to ''papi''-analyze an application within two steps: at first the outermost code region by a selection of characteristic events, then in similar fashion a set of subroutines/functions that consume the major fraction of exe time. Say for example we obtain overall counters, ''PAPI_TOT_CYC'', ''PAPI_FP_OPS'', ''PAPI_L1_DCM'' and ''PAPI_L2_DCM'' for the ''main()'' part of some application, then it is quite useful to determine the analogous set of event counters for suspicious subroutines/functions and look into relative fractions to identify overall-determining | * Useful notes on Intel's CPI metric: [[https://software.intel.com/en-us/node/544403]] |
| * Occasionally it is useful to ''papi''-analyze an application within two steps: at first the outermost code region by a selection of characteristic events, then in similar fashion a set of subroutines/functions that consume major fractions of the execution time. Say for example we obtain overall counts for the ''main()'' part of some application, e.g. ''PAPI_TOT_CYC'', ''PAPI_FP_OPS'', ''PAPI_L1_DCM'' and ''PAPI_L2_DCM'', then it is quite useful to determine the analogous set of event counters for suspicious subroutines/functions and look into relative fractions of these event counts and identify those which best match the initial (i.e. overall) evaluation. In so doing specific subroutines/functions can be detected that determine overall performance with respect to cache misses, flops etc. |
| |
Operation modes may be distinguished between: | |
=== A.) papi enclosing long lasting code sections: === | |
=== B.) papi around hotspot functions/routines: === | |
| |
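
The two-step analysis described in the last note (overall counts for ''main()'' first, then the analogous counts for suspicious subroutines/functions) comes down to comparing relative fractions of event counts. A minimal sketch in C of that bookkeeping is given below. The struct ''event_counts'' and the helpers ''fraction()'' and ''print_fractions()'' are hypothetical names introduced only for illustration; they are not part of ''papi''. The counts themselves are assumed to have been read beforehand with the ''papi'' calls shown in the interfacing section.

```c
#include <stdio.h>

/* Hypothetical container (not a papi type) for the four events
 * named in the note: one instance per measured code region. */
typedef struct {
    long long tot_cyc;  /* count read for PAPI_TOT_CYC */
    long long fp_ops;   /* count read for PAPI_FP_OPS  */
    long long l1_dcm;   /* count read for PAPI_L1_DCM  */
    long long l2_dcm;   /* count read for PAPI_L2_DCM  */
} event_counts;

/* Share of an overall count that falls into one region.
 * Returns 0.0 when the overall count is not positive. */
double fraction(long long part, long long whole)
{
    return (whole > 0) ? (double)part / (double)whole : 0.0;
}

/* Print, for one subroutine/function, which percentage of each
 * overall (main) event count it is responsible for. */
void print_fractions(const char *name,
                     const event_counts *part,
                     const event_counts *whole)
{
    printf("%s: TOT_CYC %.1f%%  FP_OPS %.1f%%  L1_DCM %.1f%%  L2_DCM %.1f%%\n",
           name,
           100.0 * fraction(part->tot_cyc, whole->tot_cyc),
           100.0 * fraction(part->fp_ops,  whole->fp_ops),
           100.0 * fraction(part->l1_dcm,  whole->l1_dcm),
           100.0 * fraction(part->l2_dcm,  whole->l2_dcm));
}
```

A call such as ''print_fractions("my_routine", &routine_counts, &main_counts)'' (both variable names are placeholders) would then list the routine's share of each overall count. A routine whose share of cache misses is much larger than its share of cycles is a natural candidate for closer inspection.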