Differences

This shows you the differences between two versions of the page.

doku:papi_ir [2016/07/06 11:24] – [Interfacing with PAPI : Low level interface] ir
doku:papi_ir [2016/07/06 12:10] – [Practical tips:] ir
Line 32: Line 32:
 As stated in the comment in the C code above, it is best to analyze one particular event at a time. The reason is that the CPU cannot combine arbitrary counters in a single measurement.
 ==== Interfacing with PAPI : High level interface ====
-The high-level API combines the counters for a specified list of PAPI preset events, only. The set of implemented high level functions is quite limited and can be found in the section ''The High Level API'' next to the last paragraph of ''papi.h''. It can also be used in conjunction with the low-level API. An example for the usage of the high level API can be found for
-  * [[doku:papi_hl_c|C]] users and
-  * [[doku:papi_hl_fortran|Fortran]] users
+The high level API combines the counters for a specified list of PAPI preset events, only. The set of implemented high level functions is quite limited and can be found in the section ''The High Level API'' next to the last paragraph of ''papi.h''.
+The high level API can also be used in conjunction with the low level API.
+
  
 +Example code: 
 +  * [[doku:papi_hl_c|C]] 
 +  * [[doku:papi_hl_fortran|Fortran]]
  
 ==== Practical tips: ====
Line 43: Line 45:
   * Useful notes on Intel's CPI metric: [[https://software.intel.com/en-us/node/544403]]
   * Occasionally, it is useful to analyze an application with PAPI in two steps. In the first step, selected characteristic events are collected for the outermost code region, i.e. overall counts are obtained for ''main()'' of the application, e.g. ''PAPI_TOT_CYC'', ''PAPI_FP_OPS'', ''PAPI_L1_DCM'' and ''PAPI_L2_DCM''. In the second step, the same set of events is measured for suspicious subroutines or functions, typically those consuming major fractions of the execution time. Relative fractions of these event counters can also be useful. Based on these measurements, the performance characteristics of each subroutine or function can be compared to those of the complete code. In this way, the parts that best match the overall performance (w.r.t. cache misses, flops etc.) can be identified.
 +  * 8-)
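
The relative fractions mentioned in the two-step tip above can be sketched in plain C. This is only an illustration and not part of PAPI: the struct ''event_counts'' and the functions ''event_fraction'' and ''print_region_report'' are hypothetical names, and the counts are assumed to have been collected beforehand, once for ''main()'' and once for the region of interest.

```c
#include <stdio.h>

/* Hypothetical container for the four events named in the tip above.
 * long long is used because PAPI reports counter values as long long. */
typedef struct {
    long long tot_cyc;  /* PAPI_TOT_CYC: total cycles */
    long long fp_ops;   /* PAPI_FP_OPS:  floating point operations */
    long long l1_dcm;   /* PAPI_L1_DCM:  level 1 data cache misses */
    long long l2_dcm;   /* PAPI_L2_DCM:  level 2 data cache misses */
} event_counts;

/* Share of a whole-program count that falls into one region.
 * Returns 0.0 if the whole-program count is not positive. */
double event_fraction(long long region, long long whole)
{
    return whole > 0 ? (double)region / (double)whole : 0.0;
}

/* Print, per event, which fraction of the counts of main() is
 * spent in the named subroutine or function. */
void print_region_report(const char *name,
                         const event_counts *region,
                         const event_counts *whole)
{
    printf("%s: cycles %.3f, flops %.3f, L1 misses %.3f, L2 misses %.3f\n",
           name,
           event_fraction(region->tot_cyc, whole->tot_cyc),
           event_fraction(region->fp_ops,  whole->fp_ops),
           event_fraction(region->l1_dcm,  whole->l1_dcm),
           event_fraction(region->l2_dcm,  whole->l2_dcm));
}
```

A region whose cache-miss fractions are clearly larger than its cycle fraction would be a candidate for closer inspection; how to weigh the individual fractions is left to the analyst.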
            
  
  
  
  • doku/papi_ir.txt
  • Last modified: 2016/06/23 13:23
  • by ir