The most recent version of this page is a draft. This version (2016/06/15 11:43) is a draft.
VSC-3: papi version 5.4.3
Synopsis:
papi is an event-based profiling library that reads the hardware performance counters of the CPU and can thus provide useful information about critical events, e.g. cache misses, the number of floating point operations (FLOPs), the number of cycles, etc.
Usage of papi:
The user has to modify the source code and insert papi
calls (see below). Building and running the program is then as simple as
    module purge
    module load papi/5.4.3
    gcc my_program.c -lpapi      ( gfortran my_program.f -lpapi )
    ./a.out
Interfacing with papi:
In general, some code section to be analyzed with papi
needs to be wrapped into a sequence of standard papi
calls, e.g.
#include <stdio.h>
#include <stdlib.h>
#include "papi.h"

// PAPI variables
// best is to analyze one particular event at a time
int eventset;
long long value, time0, time1, cyc0, cyc1;

// PAPI Initialization
eventset = PAPI_NULL;
if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
    printf("PAPI init error !\n");
    exit(993);
}

// PAPI Event Set Creation
if (PAPI_create_eventset(&eventset) != PAPI_OK) {
    printf("PAPI event set creation error !\n");
    exit(994);
}

// PAPI Specify a Particular Target Event to Analyze
//   PAPI_TOT_CYC   Total cycles executed
//   PAPI_FP_OPS    Floating point operations executed
//   PAPI_L1_DCM    Level 1 data cache misses
//   PAPI_L2_DCM    Level 2 data cache misses
// for other events see
// /opt/sw/x86_64/glibc-2.12/ivybridge-ep/papi/5.4.3/gnu-4.4.7/include/papiStdEventDefs.h
if (PAPI_add_event(eventset, PAPI_FP_OPS) != PAPI_OK) {
    printf("PAPI event set adding error !\n");
    exit(995);
}

// PAPI Time Estimators Initialization
time0 = PAPI_get_real_usec();
cyc0 = PAPI_get_real_cyc();

// PAPI Counting Start
if (PAPI_start(eventset) != PAPI_OK) {
    printf("PAPI start error !\n");
    exit(996);
}

//*** Here follows the original code section to be analyzed ***

// PAPI Counting Stop
if (PAPI_stop(eventset, &value) != PAPI_OK) {
    printf("PAPI stop error !\n");
    exit(997);
}

// PAPI Time Estimators Stop
time1 = PAPI_get_real_usec();
cyc1 = PAPI_get_real_cyc();

// PAPI Results
printf("PAPI event count %lld\n", value);
printf("PAPI time passed in usec %lld\n", time1 - time0);
printf("PAPI cycles passed %lld\n", cyc1 - cyc0);

// PAPI Free Event Set
if (PAPI_cleanup_eventset(eventset) != PAPI_OK) {
    printf("PAPI event set cleanup error !\n");
    exit(998);
}
if (PAPI_destroy_eventset(&eventset) != PAPI_OK) {
    printf("PAPI event set destruction error !\n");
    exit(999);
}

// PAPI Finalize
PAPI_shutdown();
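The three numbers printed by the sequence above (event count, elapsed microseconds, elapsed cycles) are often easier to interpret as rates. The following is a minimal sketch of such a post-processing step in plain C. It does not call papi itself, and the helper names rate_per_usec() and report_rates() are hypothetical, chosen only for illustration. It assumes the counted event was PAPI_FP_OPS, so that events per microsecond can be read as MFLOP/s.

```c
#include <stdio.h>

/* Hypothetical helper (not part of papi): events per microsecond.
 * One event per microsecond equals one million events per second,
 * so for PAPI_FP_OPS the result can be read as MFLOP/s and for
 * cycle counts as an effective clock rate in MHz. */
double rate_per_usec(long long count, long long usec)
{
    if (usec <= 0) {
        return 0.0;               /* guard against an empty time interval */
    }
    return (double)count / (double)usec;
}

/* Hypothetical helper: print derived rates from the three measured values,
 * i.e. value, time1 - time0 and cyc1 - cyc0 from the papi call sequence. */
void report_rates(long long value, long long usec, long long cycles)
{
    printf("PAPI events per usec (MFLOP/s for PAPI_FP_OPS) %.2f\n",
           rate_per_usec(value, usec));
    printf("PAPI cycles per usec (effective MHz) %.2f\n",
           rate_per_usec(cycles, usec));
    if (cycles > 0) {
        printf("PAPI events per cycle %.4f\n",
               (double)value / (double)cycles);
    }
}
```

A call such as report_rates(value, time1 - time0, cyc1 - cyc0) could be placed right after the result printouts. Keep in mind that for very short code sections the measurement overhead can dominate, so such rates are best used for relative comparisons.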
Practical tips:
- A quick overview of the supported events and the corresponding papi variables for a particular type of CPU is obtained by executing the command papi_avail.
- The count measured for the specific event PAPI_TOT_CYC can differ significantly from the result obtained by calling PAPI_get_real_cyc(). This is particularly true for papi analysis of very small code sections that are executed frequently (e.g. hotspot functions/routines that were ranked high during time-based profiling). Although off in absolute terms, PAPI_TOT_CYC remains a useful reference time for relative comparisons.
- Evaluating floating point performance on Intel Ivy Bridge: https://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops
- Occasionally it is useful to papi-analyze an application in two steps: first the outermost code region with a selection of characteristic events, then, in similar fashion, the set of subroutines/functions that consume the major fraction of the execution time. Say, for example, we obtain the overall counters PAPI_TOT_CYC, PAPI_FP_OPS, PAPI_L1_DCM and PAPI_L2_DCM for the main() part of some application; it is then quite useful to determine the analogous set of event counters for suspicious subroutines/functions and to look into the relative fractions to identify the overall-determining parts.
Operation modes may be distinguished between: