PAPI is an event-based profiling library that reads the hardware performance counters of the CPU and can thus provide useful information about critical events, e.g. cache misses, number of FLOPs, number of cycles etc.

The user has to modify the source code and insert PAPI calls (see below). Building the instrumented program is then as simple as

 module purge
 module load papi/5.4.3
 gcc my_program.c -lpapi    ( gfortran my_program.f -lpapi )

In general, the code section to be analyzed with PAPI needs to be wrapped in a sequence of standard PAPI calls, e.g.

 #include <stdio.h>
 #include <stdlib.h>
 #include "papi.h"
 // PAPI variables
 // best is to analyze one particular event at a time
 int eventset;
 long long value, time0, time1, cyc0, cyc1;
 // PAPI Initialization
 eventset = PAPI_NULL;
 if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
    printf("PAPI init error !\n");
    exit(1);
 }
 // PAPI Event Set Creation
 if (PAPI_create_eventset(&eventset) != PAPI_OK) {
    printf("PAPI event set creation error !\n");
    exit(1);
 }
 // PAPI Specify a Particular Target Event to Analyze
 //   PAPI_TOT_CYC         Total cycles executed
 //   PAPI_FP_OPS          Floating point operations executed
 //   PAPI_L1_DCM          Level 1 data cache misses
 //   PAPI_L2_DCM          Level 2 data cache misses
 //   for other events see /opt/sw/x86_64/glibc-2.12/ivybridge-ep/papi/5.4.3/gnu-4.4.7/include/papiStdEventDefs.h
 if (PAPI_add_event(eventset, PAPI_FP_OPS) != PAPI_OK) {
    printf("PAPI event set adding error !\n");
    exit(1);
 }
 // PAPI Time Estimators Initialization
 time0 = PAPI_get_real_usec();
 cyc0 = PAPI_get_real_cyc();
 // PAPI Counting Start
 if (PAPI_start(eventset) != PAPI_OK) {
    printf("PAPI start error !\n");
    exit(1);
 }
 //*** Here follows the original code section to be analyzed ***
 // PAPI Counting Stop
 if (PAPI_stop(eventset, &value) != PAPI_OK) {
    printf("PAPI stop error !\n");
    exit(1);
 }
 // PAPI Time Estimators Stop
 time1 = PAPI_get_real_usec();
 cyc1 = PAPI_get_real_cyc();
 // PAPI Results
 printf("PAPI event count %lld\n", value);
 printf("PAPI time passed in usec %lld\n", time1 - time0);
 printf("PAPI cycles passed %lld\n", cyc1 - cyc0);
 // PAPI Free Event Set
 if (PAPI_cleanup_eventset(eventset) != PAPI_OK) {
    printf("PAPI event set cleanup error !\n");
 }
 if (PAPI_destroy_eventset(&eventset) != PAPI_OK) {
    printf("PAPI event set destruction error !\n");
 }
 // PAPI Finalize
 PAPI_shutdown();
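
The event count and the elapsed wall-clock time reported by the calls above can be combined into a rate. The following is a minimal sketch, not part of PAPI: the helper names events_per_usec and print_rate are hypothetical. Since PAPI_get_real_usec() returns microseconds, events per microsecond are numerically equal to millions of events per second, so with PAPI_FP_OPS as the target event the result can be read as MFLOP/s.

```c
#include <stdio.h>

/* Hypothetical helper (not a PAPI function): events per microsecond,
   which is numerically equal to millions of events per second. */
double events_per_usec(long long count, long long usec0, long long usec1)
{
    long long elapsed = usec1 - usec0;
    if (elapsed <= 0) {
        /* Interval too short to resolve; avoid dividing by zero. */
        return 0.0;
    }
    return (double)count / (double)elapsed;
}

/* Hypothetical helper: print the raw values together with the derived rate. */
void print_rate(const char *label, long long count,
                long long usec0, long long usec1)
{
    printf("%s: %lld events in %lld usec, %.2f million events/s\n",
           label, count, usec1 - usec0,
           events_per_usec(count, usec0, usec1));
}
```

With the variables of the example above, a call could look like print_rate("PAPI_FP_OPS", value, time0, time1).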
  • A quick overview of the supported events and the corresponding PAPI event names for a particular type of CPU can be obtained by running the command papi_avail.
  • The count measured for the event PAPI_TOT_CYC can differ significantly from the result obtained by calling PAPI_get_real_cyc(). This is particularly true when analyzing very small code sections that are executed frequently (e.g. hotspot functions/routines that ranked high in time-based profiling). Although off in absolute terms, PAPI_TOT_CYC remains a useful reference time for relative comparisons.
  • Evaluating floating point performance on Intel Ivy Bridge:
  • Occasionally it is useful to analyze an application with PAPI in two steps: first the outermost code region with a selection of characteristic events, then, in the same fashion, the subroutines/functions that consume the major fraction of the execution time. Say, for example, we obtain the overall counters PAPI_TOT_CYC, PAPI_FP_OPS, PAPI_L1_DCM and PAPI_L2_DCM for the main() part of some application. It is then quite useful to determine the same set of event counters for suspicious subroutines/functions and to compare the relative fractions in order to identify which of them dominate the overall counts.
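
The two-step analysis in the last point boils down to dividing the counters of a subroutine by the counters of the whole program. A minimal sketch of that bookkeeping follows. The struct papi_counts and the helpers share and print_shares are hypothetical names chosen for illustration, and the counter values are assumed to have been collected beforehand in separate PAPI runs, one event at a time.

```c
#include <stdio.h>

/* Hypothetical container for the four counters named in the text. */
struct papi_counts {
    long long tot_cyc;   /* PAPI_TOT_CYC */
    long long fp_ops;    /* PAPI_FP_OPS  */
    long long l1_dcm;    /* PAPI_L1_DCM  */
    long long l2_dcm;    /* PAPI_L2_DCM  */
};

/* Fraction of the overall count that is attributed to one routine. */
double share(long long part, long long whole)
{
    return whole > 0 ? (double)part / (double)whole : 0.0;
}

/* Print the relative fractions of a routine with respect to main(). */
void print_shares(const char *name,
                  const struct papi_counts *routine,
                  const struct papi_counts *overall)
{
    printf("%s: cycles %.1f%%, FLOPs %.1f%%, L1 misses %.1f%%, L2 misses %.1f%%\n",
           name,
           100.0 * share(routine->tot_cyc, overall->tot_cyc),
           100.0 * share(routine->fp_ops,  overall->fp_ops),
           100.0 * share(routine->l1_dcm,  overall->l1_dcm),
           100.0 * share(routine->l2_dcm,  overall->l2_dcm));
}
```

A routine that accounts for a small share of the cycles but a large share of the L2 misses would, for instance, be a natural candidate for a closer look at its memory access pattern.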

Two operation modes may be distinguished:

A.) PAPI enclosing long-running code sections:

B.) PAPI around hotspot functions/routines:
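
For mode B, one possible approach is to accumulate the counter difference of every call instead of printing a single result. The sketch below is an assumption about how this could be organized, not a description of a PAPI facility: struct hotspot_stats, hotspot_begin, hotspot_end and hotspot_mean are hypothetical names, and fake_counter is only a deterministic stand-in so that the example runs without PAPI. In a real measurement, PAPI_get_real_cyc, which takes no arguments and returns long long, could be passed in its place.

```c
/* Any function that returns a monotonically increasing counter value,
   e.g. PAPI_get_real_cyc from papi.h. */
typedef long long (*counter_fn)(void);

/* Hypothetical accumulator for one hotspot routine. */
struct hotspot_stats {
    long long total;   /* summed counter differences over all calls */
    long long calls;   /* number of measured calls */
    long long start;   /* counter value at the last hotspot_begin() */
};

/* Record the counter value on entry to the hotspot. */
void hotspot_begin(struct hotspot_stats *s, counter_fn read_counter)
{
    s->start = read_counter();
}

/* Add the counter difference of this call to the running total. */
void hotspot_end(struct hotspot_stats *s, counter_fn read_counter)
{
    s->total += read_counter() - s->start;
    s->calls += 1;
}

/* Mean counter difference per call, 0.0 if nothing was measured. */
double hotspot_mean(const struct hotspot_stats *s)
{
    return s->calls > 0 ? (double)s->total / (double)s->calls : 0.0;
}

/* Stand-in counter for demonstration only: advances by 10 per read. */
static long long fake_ticks = 0;
long long fake_counter(void)
{
    fake_ticks += 10;
    return fake_ticks;
}
```

The accumulator is zero-initialized once, hotspot_begin and hotspot_end are placed at the entry and exit of the routine, and the totals are printed after the last call. Note that every read adds overhead of its own, which matters most for very short routines.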

  • Last modified: 2016/06/15 11:43
  • by sh