pandoc:introduction-to-vsc:10_performance:10_performance, revisions 2018/01/31 13:17 to 2020/10/20 08:09 (current), Pandoc Auto-commit
  * HPC (High Performance Computing): execution time spent on just **a few lines of code**
  
  * Linear Algebra
    * (Dense) matrix multiplication: O(n^3)
    * (Dense) matrix-vector multiplication: O(n^2)
    * Dot product: O(n)
    * (Dense) system of linear equations / least squares: O(n^3)
    * (Dense) Eigensolver: O(n^3)
    * FFT: O(n * log(n))
    * Sparse algorithms: O(n) .. O(n^3)
    * ..
  
  
===== Know algorithmic complexity (2) =====
  
  * Other codes
    * Identify **main parameters**
    * Identify **scaling** (usually polynomial)
  
  * Large problem size => subalgorithm with highest exponent of complexity becomes dominant
===== Very important libraries =====
  
  * **Level 3 BLAS** (=“Matrix Multiplication”)
    * O(n^3) operations with O(n^2) memory accesses
    * portable performance (often faster by a factor of 10)
    * including triangular and symmetric / hermitian matrices
    * any three-loop operation with $C=\alpha AB + \beta C$ can be written using Level 3 BLAS
  
  * MKL: free **Math Kernel Library** from Intel
    * Linear Algebra
    * FFT
    * Neural networks
    * Statistics
    * Sparse algorithms
    * Parallel
    * ..
  * **HDF5**: Hierarchical Data Format: efficient and portable library
  
  * I/O libraries
    * Hint: use I/O of only one language, or
    * link with correct runtime libraries, or
    * link by calling compiler (e.g. gfortran) instead of linker (e.g. ld)
  * Copy-in and/or copy-out: can affect performance
  * Class/module information in objects ("*.o") only useful in one language
  
===== Interoperability (3) =====
  
https://wiki.vsc.ac.at/doku.php?id=doku:perf-report – https://wiki.vsc.ac.at/doku.php?id=doku:forge
  
  • pandoc/introduction-to-vsc/10_performance/10_performance.1517404648.txt.gz
  • Last modified: 2018/01/31 13:17
  • by pandoc