Differences

This shows you the differences between two versions of the page.

doku:papi [2016/06/15 11:43] sh
doku:papi [2016/06/16 13:46] sh
Line 8: Line 8:
  
 ==== Usage of papi: ====
-The user will have to modify the source code and insert ''papi'' calls (see below). Invocation and usage is then as simple as
 +The user will have to modify the source code and insert ''papi'' calls (see below). Invocation and usage is then as simple as,
  
-    
    module purge
    module load papi/5.4.3
-   gcc my_program.c -lpapi    ( gfortran my_program.f -lpapi )
 +   gcc my_program.c -lpapi
    ./a.out
        
 +or for Fortran users,
 +
 +   module purge
 +   module load papi/5.4.3
 +   gfortran  my_program.f -I/opt/sw/x86_64/glibc-2.12/ivybridge-ep/papi/5.4.3/gnu-4.4.7/include -lpapi
 +   ./a.out
 +
        
 ==== Interfacing with papi : ====
Line 48: Line 54:
       exit(995);
    }
-   // PAPI Time Estimators InitializationPAPI_L2_DCM
 +   // PAPI Time Estimators Initialization
    time0 = PAPI_get_real_usec();
    cyc0 = PAPI_get_real_cyc();
Line 76: Line 82:
       exit(998);
    }
-   if (PAPI_destroy_eventset(&eventset) != PAPI_OK) {PAPI_L2_DCM
 +   if (PAPI_destroy_eventset(&eventset) != PAPI_OK) {
       printf("PAPI event set destruction error !\n");
       exit(999);
Line 89: Line 95:
   * Measuring the specific event ''PAPI_TOT_CYC'' can differ significantly from the result obtained by calling ''PAPI_get_real_cyc()''. This is particularly true for ''papi'' analysis of very small code sections that are executed frequently (e.g. hotspot functions/routines that were ranked high during time based profiling). Although off in absolute terms, ''PAPI_TOT_CYC'' remains a useful reference time for relative comparisons.
   * Evaluating floating point performance on Intel ivy bridge: [[https://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:SandyFlops]]
-  * Occasionally it is useful to ''papi''-analyze an application within two steps: at first the outermost code region by a selection of characteristic events, then in similar fashion a set of subroutines/functions that consume the major fraction of exe time. Say for example we obtain overall counters, ''PAPI_TOT_CYC'', ''PAPI_FP_OPS'', ''PAPI_L1_DCM'' and ''PAPI_L2_DCM'' for the ''main()'' part of some application, then it is quite useful to determine the analogous set of event counters for suspicious subroutines/functions and look into relative fractions to identify overall-determining
 +  * Useful notes on Intel's CPI metric: [[https://software.intel.com/en-us/node/544403]]
 +  * Occasionally it is useful to ''papi''-analyze an application within two steps: at first the outermost code region by a selection of characteristic events, then in similar fashion a set of subroutines/functions that consume major fractions of the execution time. Say for example we obtain overall counts for the ''main()'' part of some application, e.g. ''PAPI_TOT_CYC'', ''PAPI_FP_OPS'', ''PAPI_L1_DCM'' and ''PAPI_L2_DCM'', then it is quite useful to determine the analogous set of event counters for suspicious subroutines/functions and look into relative fractions of these event counts and identify those which best match the initial (i.e. overall) evaluation. In so doing specific subroutines/functions can be detected that determine overall performance with respect to cache misses, flops etc.
            
-Operation modes may be distinguished between: 
-=== A.) papi enclosing long lasting code sections: === 
-=== B.) papi around hotspot functions/routines: === 
  
  • doku/papi.txt
  • Last modified: 2016/07/06 12:28
  • by ir