Differences

This shows you the differences between two versions of the page.

--- pandoc:introduction-to-vsc:10_performance:10_performance [2018/01/16 07:23] – Pandoc Auto-commit pandoc
+++ pandoc:introduction-to-vsc:10_performance:10_performance [2018/03/21 11:33] – Pandoc Auto-commit pandoc
@@ Line 1: / Line 1: @@
 ====== Going for efficiency - how to get out performance ======
@@ Line 20: / Line 22: @@
 |**Network** throughput (GB/s)             |2*4      |2*3.4     |2*2                     |0.2 / 0.0002         |
 |**Network** latency (µs)                  |1.4-1.8  |1.4-1.8   |1.4-1.8                 |2                    |
-|**Storage** throughput ("/global", GB/s)  |20       |15        |10                      |1                    |
+|**Storage** throughput (“/global”, GB/s)  |20       |15        |10                      |1                    |
 |**Storage** latency (ms)                  |0.03     |1         |1                       |1000                 |
-"Operations": Floating point / 16 cores @3GHz; Values per node\\
+“Operations”: Floating point / 16 cores @3GHz; Values per node\\
-"Memory": Mixture of three cache levels and main memory; Values per node\\
+“Memory”: Mixture of three cache levels and main memory; Values per node\\
-"Network": Medium shared by all users (large jobs only); Values per node\\
+“Network”: Medium shared by all users (large jobs only); Values per node\\
-"Storage": Medium shared by all users; Values per application
+“Storage”: Medium shared by all users; Values per application
 Which one is the **limiting factor** for a given code?
@@ Line 55: / Line 57: @@
   * Large problem size => subalgorithm with highest exponent of complexity becomes dominant
     * **Measure** compute time, preferably with profiler
-    * Beware: **cache effects** can 'change' algorithmic complexity
+    * Beware: **cache effects** can ‘change’ algorithmic complexity
   * Often: algorithmic complexity is well known and documented
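
A minimal sketch of the "measure" advice in this hunk, assuming a routine can be timed at two problem sizes: if the run time scales as n^p, the exponent p can be estimated from the two measurements and compared with the documented complexity. The helper names `time_call` and `estimate_exponent` are illustrative and do not come from the page. A profiler remains the better tool for real codes.

```python
import math
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time (seconds) of several calls to func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def estimate_exponent(n1, t1, n2, t2):
    """Estimate p in t ~ n**p from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)
```

If the estimated exponent drifts as n grows past a cache size, that is the cache effect the bullet above warns about rather than a change in the algorithm itself.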
@@ Line 138: / Line 140: @@
 ===== Very important libraries =====
-  * **Level 3 BLAS** (="Matrix Multiplication")
+  * **Level 3 BLAS** (=“Matrix Multiplication”)
     * O(n^3) operations with O(n^2) memory accesses
     * portable performance (often faster by a factor of 10)
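
The ratio of O(n^3) operations to O(n^2) memory accesses can be made concrete with a small sketch. The code below is an illustration only, not the BLAS implementation: it multiplies two square matrices naively and counts floating point operations against the matrix entries that must be touched at least once. The function names are assumptions chosen for this example.

```python
def matmul_naive(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def flops(n):
    """n^3 multiplications plus n^3 additions."""
    return 2 * n ** 3


def elements(n):
    """Entries of A, B and C, each touched at least once."""
    return 3 * n ** 2


def flops_per_element(n):
    """Operations available per matrix entry; grows linearly with n."""
    return flops(n) / elements(n)
```

Because the ratio grows with n, a library that keeps data in cache can perform many operations per memory access, which is a plausible reason for the large speedups quoted above.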
@@ Line 179: / Line 181: @@
   * Naming schemes
-    * UPPER vs. lower case
+    * UPPER vs. lower case
     * added underscores
@@ Line 200: / Line 202: @@
   * Call by value / call by reference
   * Order of parameters: size of string parameters sometimes at the very end of parameter list
-  * I/O libraries
+  * I/O libraries * Hint: use I/O of only one language, or * link with correct runtime libraries, or * link by calling compiler (e.g. gfortran) instead of linker (e.g. ld)
-    * Hint: use I/O of only one language, or
-    * link with correct runtime libraries, or
-    * link by calling compiler (e.g. gfortran) instead of linker (e.g. ld)
   * Copy-in and/or copy-out: can affect performance
-  * Class/module information in objects ("*.o") only useful in one language
+  * Class/module information in objects (“*.o“) only useful in one language
 ===== Interoperability (3) =====
@@ Line 237: / Line 236: @@
 ===== Memory optimization =====
-To keep CPUs busy, data has to be kept near the execution units, e.g. in **Level 1 Cache**
+To keep CPUs busy, data has to be kept near the execution units, e.g. in **Level 1 Cache**
 Methods
@@ Line 299: / Line 298: @@
 //Large N => reusing data in fast cache levels
 </code>
-Adapt parameter l to size of cache -- separately for each cache level
+Adapt parameter l to size of cache – separately for each cache level
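
The code listing this line refers to is not visible in the diff, so the following is a hedged sketch of the general idea, assuming `l` denotes the edge length of a block (tile). The loops are split so that each l-by-l block is reused while it is still in a fast cache level. Python only shows the loop structure here; the cache benefit appears in compiled code.

```python
def matmul_blocked(a, b, l):
    """Blocked product of square matrices; l is the assumed block size."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # Outer loops walk over blocks, inner loops work inside one block.
    for ii in range(0, n, l):
        for kk in range(0, n, l):
            for jj in range(0, n, l):
                for i in range(ii, min(ii + l, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + l, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + l, n)):
                            row_c[j] += aik * row_b[j]
    return c
```

The result does not depend on `l`; only the memory access order changes, so `l` can be tuned per cache level without affecting correctness.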
@@ Line 341: / Line 340: @@
   * VSC School Trainings upon request
-https://wiki.vsc.ac.at/doku.php?id=doku:perf-report -- https://wiki.vsc.ac.at/doku.php?id=doku:forge
+https://wiki.vsc.ac.at/doku.php?id=doku:perf-report – https://wiki.vsc.ac.at/doku.php?id=doku:forge