====== VSC-3 – supercomputer ======
* Article written by Claudia Blaas-Schenner (VSC Team)
(last update 2019-10-08 by cb).
**OUTLINE:**
* **VSC – Vienna Scientific Cluster**
$~$
* **Supercomputers for beginners –**
**– introducing VSC-3 to our (new) users**
* Supercomputers for beginners – what is a supercomputer ?
* VSC-3 – what does it look like ?
* VSC-3 – components of a supercomputer
* Parallel hardware architectures –
– which parallel programming models can be used ?
* VSC-3 compute nodes
* VSC-3 node-interconnect
* VSC-3 ping-pong – intra-node vs. inter-node
----
====== VSC – Vienna Scientific Cluster ======
* **The VSC is** a joint high performance computing (HPC) facility of Austrian universities.
* **Our mission:** Within the limits of available resources we satisfy the HPC needs of our users.
* **VSC is primarily devoted to research.**
* **Who can use VSC?** Scientific personnel of the partner universities, see: http://vsc.ac.at/access
➠$~~$**http://vsc.ac.at/training**
**VSC course slides:**
➠$~~$➠$~~$➠$~~$**[[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:01_supercomputers_for_beginners:00_linux|VSC-Linux]]**
➠$~~$➠$~~$➠$~~$**[[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:01_supercomputers_for_beginners:00_intro|VSC-Intro]]**
----
====== Supercomputers for beginners ======
* **What is a supercomputer ?**
* A supercomputer is a computer with a high level of computing performance compared to a general-purpose computer. Performance of a supercomputer is measured in floating-point operations per second (FLOPS)… [from Wikipedia]
* **A supercomputer is listed in the [[https://www.top500.org|TOP500]]**
^ System ^ Performance ^ TOP500 rank ^ GREEN500 rank ^ #1 of TOP500 (performance) ^
|VSC-1 (2009) | 35 TFlop/s| 156 (11/2009)| 94 (06/2009)| 1.8 PFlop/s #1 (11/2009)|
|VSC-2 (2011) | 135 TFlop/s| 56 (06/2011)| 71 (06/2011)| 8 PFlop/s #1 (06/2011)|
|[[https://www.top500.org/system/178471|VSC-3 (2014)]]| 596 TFlop/s| 85 (11/2014)| 86 (11/2014)| 33 PFlop/s #1 (11/2014)|
|VSC-3 (………) | 596 TFlop/s| 460 (11/2017)| 175 (11/2017)| 93 PFlop/s #1 (11/2017)|
|[[https://www.top500.org/system/179697|VSC-4 (2019)]]| 2.7 PFlop/s| 82 (06/2019)| –| 148 PFlop/s #1 (06/2019)|
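To put the table into perspective, the following minimal Python sketch computes each VSC system's performance as a share of the #1 system on the same list. The numbers are copied from the table above; the names ''TOP500_ENTRIES'' and ''share_of_number_one'' are illustrative only and not part of any VSC tool.

```python
# Performance values copied from the table above, in TFlop/s.
# Each entry: (system and list date, VSC performance, performance of #1 on that list).
TOP500_ENTRIES = [
    ("VSC-1 (11/2009)", 35.0, 1_800.0),
    ("VSC-2 (06/2011)", 135.0, 8_000.0),
    ("VSC-3 (11/2014)", 596.0, 33_000.0),
    ("VSC-3 (11/2017)", 596.0, 93_000.0),
    ("VSC-4 (06/2019)", 2_700.0, 148_000.0),
]


def share_of_number_one(vsc_tflops, top_tflops):
    """Return the VSC performance as a percentage of the #1 system."""
    return 100.0 * vsc_tflops / top_tflops


if __name__ == "__main__":
    for name, vsc, top in TOP500_ENTRIES:
        print(f"{name}: {share_of_number_one(vsc, top):.2f} % of #1")
```

The two VSC-3 rows show the same machine three years apart: its performance is unchanged, but its share of the #1 system shrinks as the list moves on.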
----
====== VSC-3 – what does it look like ? ======
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:vsc3.png}}
----
====== VSC-3 – what does it look like ? – inside ======
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:vsc3-inside.png}}
----
====== VSC-3 – components of a supercomputer ======
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:vsc3-schematic.png}}
$~$
* **login nodes** vs. **compute nodes**
* **shared** (login, storage) vs. **user exclusive** (compute nodes)
----
====== Parallel hardware architectures ======
* **how to connect cores (processing units) ?**
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:hw-cores_margin.png?150}}
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:hw-architectures.png}}
----
====== VSC-3 compute nodes ======
* most nodes are **Intel Xeon Ivy Bridge** (E5-2650 v2 @ 2.60GHz) with **64 GB** of memory, some with 128 / 256 GB,
plus special types of hardware
* **1 node** $~$ = $~$ **2 sockets** (CPUs), **8 cores** per socket (P), **2 threads** per core (T1/T2) $~$ + $~$ **2 HCAs**
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:vsc3-node.png}}
* **intra-socket**: 59.7 GB/s (max), **inter-socket** via QPI (QuickPath interconnect): 32 GB/s (max)
* **inter-node** via dual rail Intel QDR-80: 4 GB/s (max) / 3.4 GB/s (eff) per HCA (host channel adapter)
* Avoiding slow data paths is the key to most performance optimizations! $~~~$ ➠ $~$**Affinity matters!**$~$
**processing units** (PU#) $~~~$ ➠ pinning
see: [[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:05_submitting_batch_jobs:slurm#mpi_ntasks_per_node_pinning|article on SLURM]] and [[https://wiki.vsc.ac.at/doku.php?id=doku:vsc3_pinning|pinning@Wiki]]
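To illustrate why slow data paths matter, here is a rough Python sketch that converts the peak bandwidths quoted above into an idealized time for moving 1 GB. It assumes 1 GB = 10^9 bytes and ignores latency and contention, so real transfers will be slower; the function and dictionary names are made up for this example.

```python
# Peak bandwidths quoted above, in GB/s (assumption: 1 GB = 1e9 bytes).
DATA_PATHS_GB_PER_S = {
    "intra-socket": 59.7,
    "inter-socket (QPI)": 32.0,
    "inter-node, 1 HCA (max)": 4.0,
    "inter-node, 1 HCA (eff)": 3.4,
}


def transfer_time_ms(size_gb, bandwidth_gb_per_s):
    """Idealized time in milliseconds to move size_gb at the given bandwidth."""
    return 1000.0 * size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    for path, bandwidth in DATA_PATHS_GB_PER_S.items():
        print(f"{path:26s} {transfer_time_ms(1.0, bandwidth):7.1f} ms per GB")
```

Under these assumptions, the same 1 GB that stays within a socket for about 17 ms needs 250 ms or more over a single HCA, which is why keeping communicating processes close together pays off.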
**memory hierarchy (mem_0064 nodes):**
L1 instruction cache: **32 kB**, private to core
L1 data cache: **32 kB**, private to core
L2 cache: **256 kB**, private to core (unified)
L3 cache: **20 MB**, shared by 8 cores of 1 socket
**memory: 32 GB per socket**
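As a small worked example of the memory hierarchy above, the following Python sketch reports the smallest level that can hold a given working set. It is a simplification under stated assumptions: kB, MB and GB are read as binary units, the L3 cache is treated as if one core had it to itself (it is shared by 8 cores), and the helper name ''smallest_fitting_level'' is hypothetical.

```python
# Capacities from the memory hierarchy listed above (mem_0064 nodes).
# Assumption: kB, MB and GB are read as binary units (1 kB = 1024 bytes).
KB, MB, GB = 1024, 1024**2, 1024**3
LEVELS = [
    ("L1 data cache", 32 * KB),
    ("L2 cache", 256 * KB),
    ("L3 cache", 20 * MB),
    ("memory of one socket", 32 * GB),
]


def smallest_fitting_level(working_set_bytes):
    """Return the first level whose capacity holds the working set, or None."""
    for name, capacity in LEVELS:
        if working_set_bytes <= capacity:
            return name
    return None


if __name__ == "__main__":
    # Arrays of double-precision numbers, 8 bytes per element.
    for n_doubles in (1_000, 20_000, 1_000_000, 100_000_000):
        level = smallest_fitting_level(8 * n_doubles)
        print(f"{n_doubles:>11,d} doubles -> {level}")
```

A working set that returns ''None'' does not fit into the memory of one socket, so part of the data would have to live on the other socket or on another node.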
----
====== VSC-3 node-interconnect ======
**IB fabric = dual rail Intel QDR-80 = 3-level fat-tree** (BF: 2:1 / 4:1) – the figure is schematic, numbers only
**BF = blocking factor** = ratio of down-links to up-links – blocking might introduce additional latency
{{:pandoc:introduction-to-vsc:01_supercomputers_for_beginners:vsc3_supercomputer:vsc3-fabric-3.png}}
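One way to read a blocking factor is as an oversubscription ratio: if every down-link is busy at the same time, the traffic has to share the smaller number of up-links. The Python sketch below turns that reading into a worst-case estimate. It is a back-of-the-envelope model, not a measurement; the 3.4 GB/s figure is the effective per-HCA bandwidth quoted for the compute nodes, and the function name is hypothetical.

```python
def worst_case_bandwidth(link_bandwidth_gb_per_s, down_links, up_links):
    """Per-link bandwidth if all down-links use the up-links at the same time."""
    return link_bandwidth_gb_per_s * up_links / down_links


if __name__ == "__main__":
    effective_per_hca = 3.4  # GB/s, effective bandwidth per HCA (from this article)
    for down, up in ((1, 1), (2, 1), (4, 1)):
        estimate = worst_case_bandwidth(effective_per_hca, down, up)
        print(f"BF {down}:{up} -> about {estimate:.2f} GB/s per HCA in the worst case")
```

Traffic that stays below the blocked level of the tree is not affected by this estimate, which is one more reason to keep the nodes of a job close together in the fabric where possible.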
----
====== VSC-3 ping-pong – intra-node vs. inter-node ======
* **1 node** $~$ = $~$ 2 sockets with 8 cores per socket $~$ + $~$ **2 HCAs**
* **inter-node** $~$ = $~$ IB fabric = dual rail Intel QDR-80 = 3-level fat-tree (BF: 2:1 / 4:1)
* **ping-pong benchmark** $~$ = $~$ module load intel/16.0.3 with either intel-mpi/5.1.3 or openmpi/1.10.2, using 1 HCA
**MPI latency & bandwidth (plus typical values for comparison):**
^ VSC-3: ^ latency ^ ^ typical values for: ^ latency ^ bandwidth ^
| intra-socket | 0.3 μs | | L1 cache | 1–2 ns | 100 GB/s |
| inter-socket | 0.7 μs | | L2/L3 cache | 3–10 ns | 50 GB/s |
| IB -1- edge | 1.4 μs | | memory | 100 ns | 10 GB/s |
| IB -2- leaf | 1.8 μs | | HPC networks | | |
| IB -3- spine | 2.3 μs | | (per node / 2 HCAs) | 1–10 μs | 1–8 GB/s |
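Latency and bandwidth are often combined in a simple linear model: time = latency + message size / bandwidth. The Python sketch below applies that model to the edge-level IB latency from the table (1.4 μs) and the effective per-HCA bandwidth quoted for the compute nodes (3.4 GB/s). The linear model and the function names are assumptions made for illustration; measured ping-pong curves deviate from it, for instance when the MPI library switches protocols.

```python
def message_time_us(size_bytes, latency_us, bandwidth_gb_per_s):
    """Estimated one-way time in microseconds: latency + size / bandwidth."""
    # 1 GB/s = 1e9 bytes/s = 1e3 bytes per microsecond
    return latency_us + size_bytes / (bandwidth_gb_per_s * 1e3)


def break_even_size_bytes(latency_us, bandwidth_gb_per_s):
    """Message size at which latency and pure transfer time are equal."""
    return latency_us * bandwidth_gb_per_s * 1e3


if __name__ == "__main__":
    latency_us, bandwidth = 1.4, 3.4  # IB edge level, 1 HCA (values from this article)
    for size in (8, 1_024, 65_536, 1_048_576):
        t = message_time_us(size, latency_us, bandwidth)
        print(f"{size:>9,d} bytes -> {t:8.2f} us")
    print(f"break-even size: {break_even_size_bytes(latency_us, bandwidth):.0f} bytes")
```

In this model, messages well below the break-even size (a few kB here) are dominated by latency, while much larger messages are dominated by bandwidth.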