Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revisionLast revisionBoth sides next revision | ||
pandoc:introduction-to-vsc:10_performance:10_performance [2018/01/16 07:23] – Pandoc Auto-commit pandoc | pandoc:introduction-to-vsc:10_performance:10_performance [2018/03/21 11:33] – Pandoc Auto-commit pandoc | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | |||
+ | |||
====== Going for efficiency - how to get out performance ====== | ====== Going for efficiency - how to get out performance ====== | ||
Line 20: | Line 22: | ||
|**Network** throughput (GB/ | |**Network** throughput (GB/ | ||
|**Network** latency (µs) |1.4-1.8 | |**Network** latency (µs) |1.4-1.8 | ||
- | |**Storage** throughput ("/global", GB/s) |20 | + | |**Storage** throughput (“/global”, GB/s) |20 |
|**Storage** latency (ms) |0.03 | |**Storage** latency (ms) |0.03 | ||
- | "Operations": Floating point / 16 cores @3GHz; Values per node\\ | + | “Operations”: Floating point / 16 cores @3GHz; Values per node\\ |
- | "Memory": Mixture of three cache levels and main memory; Values per node\\ | + | “Memory”: Mixture of three cache levels and main memory; Values per node\\ |
- | "Network": Medium shared by all users (large jobs only); Values per node\\ | + | “Network”: Medium shared by all users (large jobs only); Values per node\\ |
- | "Storage": Medium shared by all users; Values per application | + | “Storage”: Medium shared by all users; Values per application |
Which one is the **limiting factor** for a given code? | Which one is the **limiting factor** for a given code? | ||
Line 55: | Line 57: | ||
* Large problem size => subalgorithm with highest exponent of complexity becomes dominant | * Large problem size => subalgorithm with highest exponent of complexity becomes dominant | ||
* **Measure** compute time, preferably with profiler | * **Measure** compute time, preferably with profiler | ||
- | * Beware: **cache effects** can 'change' | + | * Beware: **cache effects** can ‘change’ algorithmic complexity |
* Often: algorithmic complexity is well known and documented | * Often: algorithmic complexity is well known and documented | ||
Line 138: | Line 140: | ||
===== Very important libraries ===== | ===== Very important libraries ===== | ||
- | * **Level 3 BLAS** (="Matrix Multiplication") | + | * **Level 3 BLAS** (=“Matrix Multiplication”) |
* O(n^3) operations with O(n^2) memory accesses | * O(n^3) operations with O(n^2) memory accesses | ||
* portable performance (often faster by a factor of 10) | * portable performance (often faster by a factor of 10) | ||
Line 179: | Line 181: | ||
* Naming schemes | * Naming schemes | ||
- | * UPPER vs. lower case | + | * UPPER vs. lower case |
* added underscores | * added underscores | ||
Line 200: | Line 202: | ||
* Call by value / call by reference | * Call by value / call by reference | ||
* Order of parameters: size of string parameters sometimes at the very end of parameter list | * Order of parameters: size of string parameters sometimes at the very end of parameter list | ||
- | * I/O libraries | + | * I/O libraries * Hint: use I/O of only one language, or * link with correct runtime libraries, or * link by calling compiler (e.g. gfortran) instead of linker (e.g. ld) |
- | | + | |
- | | + | |
- | | + | |
* Copy-in and/or copy-out: can affect performance | * Copy-in and/or copy-out: can affect performance | ||
- | * Class/ | + | * Class/ |
===== Interoperability (3) ===== | ===== Interoperability (3) ===== | ||
Line 237: | Line 236: | ||
===== Memory optimization ===== | ===== Memory optimization ===== | ||
- | To keep CPUs busy, data has to be kept near the execution units, e.g. in **Level 1 Cache** | + | To keep CPUs busy, data has to be kept near the execution units, e.g. in **Level 1 Cache** |
Methods | Methods | ||
Line 299: | Line 298: | ||
//Large N => reusing data in fast cache levels | //Large N => reusing data in fast cache levels | ||
</ | </ | ||
- | Adapt parameter l to size of cache -- separately for each cache level | + | Adapt parameter l to size of cache – separately for each cache level |
Line 341: | Line 340: | ||
* VSC School Trainings upon request | * VSC School Trainings upon request | ||
- | https:// | + | https:// |