====== VSC – supercomputers ======

  * Article written by Claudia Blaas-Schenner (VSC Team) (last update 2020-10-10 by cb).

**OUTLINE:**

  * **VSC – Vienna Scientific Cluster**
  * **Supercomputers for beginners – introducing VSC to our (new) users**
  * Supercomputers for beginners – what is a supercomputer?
  * VSC systems – what do they look like?
  * VSC-4 – components of a supercomputer
  * Parallel hardware architectures – which parallel programming models can be used?
  * VSC compute nodes
  * VSC node-interconnect
  * VSC-3 ping-pong – intra-node vs. inter-node

----

====== VSC – Vienna Scientific Cluster ======

  * **The VSC is** a joint high performance computing (HPC) facility of Austrian universities.
  * **Our mission:** Within the limits of available resources, we satisfy the HPC needs of our users.
  * **VSC is primarily devoted to research.**
  * **Who can use VSC?**
    * Scientific personnel of the partner universities, see: https://vsc.ac.at/access
    * VSC is open to users from other Austrian academic and research institutions.
  * **Projects** (test, funded, …): Access to VSC is granted on the basis of **peer-reviewed projects**.
  * **Project manager** (= usually your supervisor): handles the project application and extensions, creates user accounts, …
  * **Publications**: Please [[https://vsc.ac.at/access/acknowledgments/|acknowledge VSC]] and [[https://vsc.ac.at/access/publications-database/|add publications]] ➠ visible on the [[https://vsc.ac.at/publications|VSC homepage]]!

^ VSC links: ^ Information provided: ^
| ➠ **https://vsc.ac.at** | VSC homepage (general info) |
| ➠ **https://service.vsc.ac.at** | VSC service website (application) |
| ➠ **https://wiki.vsc.ac.at** | VSC user documentation |
| ➠ {{.:contact_vsc-red_margin.png?150}} | VSC user support & contact |

  * **VSC Training Courses:** ➠ **https://vsc.ac.at/training**
  * **VSC course slides:**
    * ➠ **[[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:01_supercomputers_for_beginners:00_linux|VSC-Linux]]**
    * ➠ **[[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:01_supercomputers_for_beginners:00_intro|VSC-Intro]]**

----

====== Supercomputers for beginners ======

  * **What is a supercomputer?**
    * A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS)… [from Wikipedia]

  * **A supercomputer is listed in the [[https://www.top500.org|TOP500]]**

^ System ^ Performance ^ TOP500 rank (list) ^ GREEN500 rank (list) ^ #1 of TOP500 (list) ^
| VSC-1 (2009) | 35 TFlop/s | 156 (11/2009) | 94 (06/2009) | 1.8 PFlop/s #1 (11/2009) |
| VSC-2 (2011) | 135 TFlop/s | 56 (06/2011) | 71 (06/2011) | 8 PFlop/s #1 (06/2011) |
| [[https://www.top500.org/system/178471|VSC-3 (2014)]] | 596 TFlop/s | 85 (11/2014) | 86 (11/2014) | 33 PFlop/s #1 (11/2014) |
| VSC-3 (………) | 596 TFlop/s | 461 (11/2017) | 175 (11/2017) | 93 PFlop/s #1 (11/2017) |
| [[https://www.top500.org/system/179697|VSC-4 (2019)]] | 2.7 PFlop/s | 82 (06/2019) | n/a | 148 PFlop/s #1 (06/2019) |
| VSC-4 (………) | 2.7 PFlop/s | 105 (06/2020) | n/a | 415 PFlop/s #1 (06/2020) |

----

====== VSC systems – what do they look like? ======

{{.:vsc.png}}

----

====== VSC-4 – components of a supercomputer ======

{{.:vsc4-schematic.png}}

  * **login nodes** vs. **compute nodes**
  * **shared** (login, storage) vs. **user exclusive** (compute nodes -N | **on VSC-4** optional shared nodes -n)

----

====== Parallel hardware architectures ======

**how to connect cores (processing units)?**
{{.:hw-cores_margin.png?150}} {{.:hw-architectures.png}}

----

====== VSC compute nodes ======

  * **VSC-3**, **VSC-3+**, and **VSC-4** ➠ Intel CPUs ➠ different: **types**, **memory**, **# cores**, **# HCAs**, plus special types of hardware (GPUs on VSC-3) ➠ see: [[../09_special_hardware/accelerators.html#(4)|talk on special hardware]] and [[../05_submitting_batch_jobs/slurm.html#(11)|talk on SLURM]]
  * **VSC-3**: **1 node** = **2 sockets** (CPUs), **8 cores** per socket (P), **2 threads** per core (T1/T2) + **2 HCAs**

{{.:vsc3-node.png}}

  * **intra-socket**: 59.7 GB/s (max), **inter-socket** via QPI (QuickPath Interconnect): 32 GB/s (max)
  * **inter-node** via dual rail Intel QDR-80: 4 GB/s (max) / 3.4 GB/s (eff) per HCA (host channel adapter)
  * Avoiding slow data paths is the key to most performance optimizations! ➠ **Affinity matters!** **processing units** (PU#) ➠ pinning
see: [[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:05_submitting_batch_jobs:slurm#mpi_ntasks_per_node_pinning|article on SLURM]] and [[https://wiki.vsc.ac.at/doku.php?id=doku:vsc3_pinning|pinning@Wiki]]

**memory hierarchy (mem_0064 nodes):**

  * L1 data cache: **32 kB**, private to core
  * L2 cache: **256 kB**, private to core (unified)
  * L3 cache: **20 MB**, shared by all cores of 1 socket
  * **memory: 32 GB per socket**

----

====== VSC node-interconnect schematic ======

**VSC-3** ➠ **dual rail Intel QDR-80 ➠ 3-level fat-tree** (blocking factor BF = 2:1 / 4:1)

**VSC-4** ➠ **single rail Intel Omni-Path ➠ 2-level fat-tree** (BF = 2:1)

{{.:vsc-fabric-3.png}}

----

====== VSC-3 ping-pong – intra-node vs. inter-node ======

  * **1 node** = 2 sockets with 8 cores per socket + **2 HCAs**
  * **inter-node** = IB (InfiniBand) fabric = dual rail Intel QDR-80 = 3-level fat-tree (BF: 2:1 / 4:1)
  * **ping-pong benchmark** = module load intel/16.0.3 intel-mpi/5.1.3 | openmpi/1.10.2 (1 HCA)
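The per-node figures quoted in this article for VSC-3 (2 sockets, 8 cores per socket, 2 threads per core, 32 GB of memory per socket, 2 HCAs at 4 GB/s max / 3.4 GB/s effective each) can be combined with a little arithmetic. The sketch below is only an illustration: the class name ''NodeSpec'' and its fields are hypothetical, and it assumes that the bandwidth of the two HCAs simply adds up, which a real application may not achieve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures as quoted in this article for a VSC-3 mem_0064 node.

    The class itself is a hypothetical helper, not part of any VSC tool.
    """
    sockets: int = 2
    cores_per_socket: int = 8
    threads_per_core: int = 2
    memory_per_socket_gb: int = 32
    hcas: int = 2
    hca_bandwidth_max_gbs: float = 4.0   # GB/s per HCA (max)
    hca_bandwidth_eff_gbs: float = 3.4   # GB/s per HCA (effective)

    @property
    def cores(self) -> int:
        """Physical cores per node."""
        return self.sockets * self.cores_per_socket

    @property
    def processing_units(self) -> int:
        """Hardware threads (PUs) per node."""
        return self.cores * self.threads_per_core

    @property
    def memory_gb(self) -> int:
        """Total memory per node in GB."""
        return self.sockets * self.memory_per_socket_gb

    def network_bandwidth_gbs(self, effective: bool = True) -> float:
        """Aggregate inter-node bandwidth, ASSUMING both HCAs add up linearly."""
        per_hca = self.hca_bandwidth_eff_gbs if effective else self.hca_bandwidth_max_gbs
        return self.hcas * per_hca


vsc3 = NodeSpec()
print(f"cores per node:            {vsc3.cores}")
print(f"processing units per node: {vsc3.processing_units}")
print(f"memory per node:           {vsc3.memory_gb} GB")
print(f"network (max, 2 HCAs):     {vsc3.network_bandwidth_gbs(effective=False):.1f} GB/s")
print(f"network (eff, 2 HCAs):     {vsc3.network_bandwidth_gbs():.1f} GB/s")
```

Under these assumptions a node has 16 cores, 32 processing units, and 64 GB of memory, and the aggregate inter-node bandwidth stays far below the intra-node data paths, which is why process placement matters.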

**MPI latency & bandwidth (plus typical values for comparison):**
^ VSC-3: ^ latency [μs] ^  ^ typical values for: ^ latency ^ bandwidth ^
| intra-socket | 0.3 μs |  | L1 cache | 1–2 ns | 100 GB/s |
| inter-socket | 0.7 μs |  | L2/L3 cache | 3–10 ns | 50 GB/s |
| IB -1- edge | 1.4 μs |  | memory | 100 ns | 10 GB/s |
| IB -2- leaf | 1.8 μs |  | HPC networks |  |  |
| IB -3- spine | 2.3 μs |  | (per node / 2 HCAs) | 1–10 μs | 1–8 GB/s |
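To get a feeling for what these latencies mean for message passing, a common first-order estimate is the latency-bandwidth model: time = latency + message size / bandwidth. The sketch below applies it to the VSC-3 latencies from the table. The bandwidth values are assumptions made for illustration: the intra-node entries reuse the hardware maxima quoted for the data paths (59.7 GB/s and 32 GB/s), and the IB entries use the effective 3.4 GB/s of a single HCA. Measured MPI bandwidths differ, and all function names are hypothetical.

```python
# MPI latencies from the VSC-3 table, converted to seconds.
LATENCY_S = {
    "intra-socket": 0.3e-6,
    "inter-socket": 0.7e-6,
    "IB -1- edge": 1.4e-6,
    "IB -2- leaf": 1.8e-6,
    "IB -3- spine": 2.3e-6,
}

# Bandwidths in bytes per second.
# ASSUMPTION: hardware maxima for the intra-node paths and the effective
# rate of one HCA for all IB paths; these are not measured MPI bandwidths.
BANDWIDTH_BPS = {
    "intra-socket": 59.7e9,
    "inter-socket": 32.0e9,
    "IB -1- edge": 3.4e9,
    "IB -2- leaf": 3.4e9,
    "IB -3- spine": 3.4e9,
}


def transfer_time(path: str, message_bytes: int) -> float:
    """Estimated one-way transfer time in seconds (latency-bandwidth model)."""
    return LATENCY_S[path] + message_bytes / BANDWIDTH_BPS[path]


def half_performance_size(path: str) -> float:
    """Message size in bytes at which the latency and bandwidth terms are equal."""
    return LATENCY_S[path] * BANDWIDTH_BPS[path]


for path in LATENCY_S:
    small = transfer_time(path, 8)          # one double
    large = transfer_time(path, 1_000_000)  # 1 MB
    print(f"{path:13s}  8 B: {small * 1e6:6.2f} us   "
          f"1 MB: {large * 1e6:8.2f} us   "
          f"n_1/2: {half_performance_size(path) / 1e3:7.1f} kB")
```

In this model, small messages are dominated by latency and large messages by bandwidth. For the IB edge path the crossover lies at roughly 4.8 kB, so sending a few large messages is usually cheaper than sending many small ones.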

{{.:ping-pong-bandwidth.png}} {{.:ping-pong-bandwidth-log.png}}

----