====== VSC – supercomputers ======

  * Article written by Claudia Blaas-Schenner (VSC Team) <html><br></html>(last update 2020-10-10 by cb).

**OUTLINE:**

  * **VSC – Vienna Scientific Cluster**<html><br></html>$~$
  * **Supercomputers for beginners –**<html><br></html>**– introducing VSC to our (new) users**
  * Supercomputers for beginners – what is a supercomputer ?
  * VSC systems – what do they look like ?
  * VSC-4 – components of a supercomputer
  * Parallel hardware architectures –<html><br></html>– which parallel programming models can be used ?
  * VSC compute nodes
  * VSC node-interconnect
  * VSC-3 ping-pong – intra-node vs. inter-node

----

====== VSC – Vienna Scientific Cluster ======

  * **The VSC is** a joint high performance computing (HPC) facility of Austrian universities.
  * **Our mission:** Within the limits of available resources we satisfy the HPC needs of our users.
  * **VSC is primarily devoted to research.**
  * **Who can use VSC?** Scientific personnel of the partner universities, see: https://vsc.ac.at/access <html><nobr></html>VSC is open to users<html></nobr></html> from other Austrian academic and research institutions.
  * **Projects** (test, funded, …): Access to VSC is granted on the basis of **peer-reviewed projects**.
  * **Project manager** (= usually your supervisor): Project application, extensions, creates user accounts, …
  * **Publications**: Please [[https://vsc.ac.at/access/acknowledgments/|acknowledge VSC]] and [[https://vsc.ac.at/access/publications-database/|add publications]] <html><font color=#cc3300></html>$~~$➠$~~$<html></font></html> visible on [[https://vsc.ac.at/publications|VSC homepage]] !
^VSC links: ^Information provided: ^
|<html><font color=#cc3300></html>➠$~~$<html></font></html>**https://vsc.ac.at** |VSC homepage (general info) |
|<html><font color=#cc3300></html>➠$~~$<html></font></html>**https://service.vsc.ac.at** |VSC service website (application) |
|<html><font color=#cc3300></html>➠$~~$<html></font></html>**https://wiki.vsc.ac.at** |VSC user documentation |
|<html><font color=#cc3300></html>➠$~~$<html></font></html>{{.:contact_vsc-red_margin.png?150}} |VSC user support $~$&$~$ contact |

  * **VSC Training Courses:** <html><br></html><html><font color=#cc3300></html>➠$~~$<html></font></html>**https://vsc.ac.at/training** <html><br></html>**VSC course slides:** <html><br></html><html><font color=#cc3300></html>➠$~~$➠$~~$➠$~~$<html></font></html>**[[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:01_supercomputers_for_beginners:00_linux|VSC-Linux]]** <html><br></html><html><font color=#cc3300></html>➠$~~$➠$~~$➠$~~$<html></font></html>**[[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:01_supercomputers_for_beginners:00_intro|VSC-Intro]]**

----

====== Supercomputers for beginners ======

  * **What is a supercomputer ?**
  * A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS)… [from Wikipedia] <html><br></html><html><br></html>
  * **A supercomputer is listed in the [[https://www.top500.org|TOP500]]**

^ ^ ^ TOP500^ GREEN500^ (#1 TOP500)^
|VSC-1 (2009) | 35 TFlop/s| 156 (11/2009)| 94 (06/2009)| 1.8 PFlop/s #1 (11/2009)|
|VSC-2 (2011) | 135 TFlop/s| 56 (06/2011)| 71 (06/2011)| 8 PFlop/s #1 (06/2011)|
|[[https://www.top500.org/system/178471|VSC-3 (2014)]]| 596 TFlop/s| 85 (11/2014)| 86 (11/2014)| 33 PFlop/s #1 (11/2014)|
|VSC-3 (………) | 596 TFlop/s| 461 (11/2017)| 175 (11/2017)| 93 PFlop/s #1 (11/2017)|
|[[https://www.top500.org/system/179697|VSC-4 (2019)]]| 2.7 PFlop/s| 82 (06/2019)| ———| 148 PFlop/s #1 (06/2019)|
|VSC-4 (………) | 2.7 PFlop/s| 105 (06/2020)| ———| 415 PFlop/s #1 (06/2020)|

----

====== VSC systems – what do they look like ? ======

{{.:vsc.png}}

----

====== VSC-4 – components of a supercomputer ======

{{.:vsc4-schematic.png}} <html><br></html>$~$

  * **login nodes** vs. **compute nodes**
  * **shared** (login, storage) vs. **user exclusive** (compute nodes -N $~$ | $~$ **on VSC-4** optional shared nodes -n)

----

====== Parallel hardware architectures ======

**how to connect cores (processing units) ?** <html><br></html>{{.:hw-cores_margin.png?150}} {{.:hw-architectures.png}}

----

====== VSC compute nodes ======

  * <html><font color=#cc3300></html>**VSC-3**<html></font></html>, <html><font color=#cc3300></html>**VSC-3+**<html></font></html>, and <html><font color=#cc3300></html>**VSC-4**<html></font></html> $~$ ➠ $~$ Intel CPUs $~$ ➠ $~$ different: $~$ **types**, $~$ **memory**, $~$ **# cores**, $~$ **# HCAs** <html><br></html>plus special types of hardware (GPUs on VSC-3) <html><font color=#cc3300></html> ➠ <html></font></html> see: [[../09_special_hardware/accelerators.html#(4)|talk on special hardware]] and [[../05_submitting_batch_jobs/slurm.html#(11)|talk on SLURM]]<html><br></html>$~$
  * **VSC-3**: $~$ **1 node** $~$ = $~$ **2 sockets** (CPUs), **8 cores** per socket (P), **2 threads** per core (T1/T2) $~$ + $~$ **2 HCAs** {{.:vsc3-node.png}}
  * **intra-socket**: 59.7 GB/s (max), **inter-socket** via QPI (QuickPath interconnect): 32 GB/s (max)
  * **inter-node** via dual rail Intel QDR-80: 4 GB/s (max) / 3.4 GB/s (eff) per HCA (host channel adapter)
  * <html><font color=#cc3300></html>Avoiding slow data paths is the key to most performance optimizations!
$~~~$ ➠ $~$**Affinity matters!**$~$<html></font></html> **processing units** (PU#) $~~~$ <html><font color=#cc3300></html> ➠ pinning<html></font></html> <html><br></html>see: [[https://wiki.vsc.ac.at/doku.php?id=pandoc:introduction-to-vsc:05_submitting_batch_jobs:slurm#mpi_ntasks_per_node_pinning|article on SLURM]] and [[https://wiki.vsc.ac.at/doku.php?id=doku:vsc3_pinning|pinning@Wiki]]

**memory hierarchy (mem_0064 nodes):** <html><br></html>L1 data cache: **32 kB**, private to core <html><br></html>L2 cache: **256 kB**, private to core (unified) <html><br></html>L3 cache: **20 MB**, shared by all cores of 1 socket <html><br></html>**memory: 32 GB per socket**

----

====== VSC node-interconnect schematic ======

<html><font color=#ffffff></html>INTENT VSC-X<html></font></html><html><font color=#ffa500></html>**VSC-3** $~$ ➠ $~$ <html></font></html> **dual rail Intel QDR-80 <html><font color=#ffa500></html> $~$ ➠ $~$ <html></font></html> 3-level fat-tree** (BF = 2:1 / 4:1) <html><br></html><html><br></html> <html><font color=#ffffff></html>INTENT VSC-X<html></font></html><html><font color=#ff00ff></html>**VSC-4** $~$ ➠ $~$ <html></font></html> **single rail Intel Omni-Path <html><font color=#ff00ff></html> $~$ ➠ $~$ <html></font></html> 2-level fat-tree** (BF = 2:1)

{{.:vsc-fabric-3.png}}

----

====== VSC-3 ping-pong – intra-node vs. inter-node ======

  * **1 node** $~$ = $~$ 2 sockets with 8 cores per socket $~$ + $~$ **2 HCAs**
  * **inter-node** $~$ = $~$ IB fabric = dual rail Intel QDR-80 = 3-level fat-tree (BF: 2:1 / 4:1)
  * **ping-pong benchmark** $~$ = $~$ module load $~$ intel/16.0.3 $~$ intel-mpi/5.1.3 $~$ | $~$ openmpi/1.10.2 $~$ (1 HCA)

<HTML><ul></HTML>
<HTML><li></HTML><HTML><p></HTML>**MPI latency & bandwidth (plus typical values for comparison):**<HTML></p></HTML>
^VSC-3: ^ latency [μs] ^ ^ typical values for: ^ latency^ bandwidth^
|<html><font color=#0000ff></html>intra-socket<html></font></html> | <html><font color=#0000ff></html>0.3 μs<html></font></html> | | <html><font color=#696969></html>L1 cache<html></font></html> | <html><font color=#696969></html>1–2 ns<html></font></html>| <html><font color=#696969></html>100 GB/s<html></font></html>|
|<html><font color=#6b8e23></html>inter-socket<html></font></html> | <html><font color=#6b8e23></html>0.7 μs<html></font></html> | | <html><font color=#696969></html>L2/L3 c.<html></font></html> | <html><font color=#696969></html>3–10 ns<html></font></html>| <html><font color=#696969></html>50 GB/s<html></font></html>|
|<html><font color=#cc3300></html>IB -1- edge<html></font></html> | <html><font color=#cc3300></html>1.4 μs<html></font></html> | | <html><font color=#696969></html>memory<html></font></html> | <html><font color=#696969></html>100 ns<html></font></html>| <html><font color=#696969></html>10 GB/s<html></font></html>|
|<html><font color=#ff00ff></html>IB -2- leaf<html></font></html> | <html><font color=#ff00ff></html>1.8 μs<html></font></html> | | <html><font color=#696969></html>HPC networks<html></font></html> | | |
|<html><font color=#ffa500></html>IB -3- spine<html></font></html> | <html><font color=#ffa500></html>2.3 μs<html></font></html> | | <html><font color=#696969></html>(per node / 2 HCAs)<html></font></html> | <html><font color=#696969></html>1–10 μs<html></font></html>| <html><font color=#696969></html>1–8 GB/s<html></font></html>|
<HTML></li></HTML><HTML></ul></HTML>

{{.:ping-pong-bandwidth.png}} {{.:ping-pong-bandwidth-log.png}}

----

Last modified: 2020/10/20 09:13 by pandoc
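The latency values from the VSC-3 ping-pong section can be read with a simple first-order model, in which the time to send a message is the latency plus the message size divided by the asymptotic bandwidth. The sketch below is only an illustration of that model, not the benchmark used for the plots. The function names are hypothetical. The latencies are the inter-node values from the table. The bandwidth of 3.4 GB/s is the effective per-HCA figure quoted for QDR-80, and applying it to every inter-node path, with 1 GB taken as 10^9 bytes, is an assumption.

```python
"""First-order ping-pong model: time = latency + message_size / bandwidth."""

# Inter-node latencies from the VSC-3 ping-pong table, converted to seconds.
LATENCY_S = {
    "IB edge": 1.4e-6,
    "IB leaf": 1.8e-6,
    "IB spine": 2.3e-6,
}

# Assumption: the effective per-HCA figure (3.4 GB/s, 1 GB = 1e9 bytes)
# is used as the asymptotic bandwidth for every inter-node path (1 HCA).
IB_BANDWIDTH_BPS = 3.4e9


def transfer_time(n_bytes, latency_s, bandwidth_bps):
    """Estimated one-way time in seconds for a message of n_bytes."""
    return latency_s + n_bytes / bandwidth_bps


def effective_bandwidth(n_bytes, latency_s, bandwidth_bps):
    """Bandwidth in bytes/s that a message of n_bytes achieves in this model."""
    return n_bytes / transfer_time(n_bytes, latency_s, bandwidth_bps)


def half_bandwidth_size(latency_s, bandwidth_bps):
    """Message size in bytes at which half the asymptotic bandwidth is reached."""
    return latency_s * bandwidth_bps


if __name__ == "__main__":
    for path, latency in LATENCY_S.items():
        n_half = half_bandwidth_size(latency, IB_BANDWIDTH_BPS)
        print(f"{path:9s} n_1/2 = {n_half:7.0f} B")
        for n_bytes in (8, 1024, 1024**2):
            bw = effective_bandwidth(n_bytes, latency, IB_BANDWIDTH_BPS)
            print(f"  {n_bytes:8d} B -> {bw / 1e9:6.3f} GB/s")
```

In this model, messages much smaller than the half-bandwidth size are dominated by latency, so crossing more switch levels (edge, leaf, spine) costs the most for small messages. Large messages approach the assumed asymptotic bandwidth regardless of the path.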