
VSC – supercomputers

  • Article written by Claudia Blaas-Schenner (VSC Team), last update 2020-10-10 by cb.

OUTLINE:

  • VSC – Vienna Scientific Cluster
  • Supercomputers for beginners – introducing VSC to our (new) users
    • Supercomputers for beginners – what is a supercomputer?
    • VSC systems – what do they look like?
    • VSC-4 – components of a supercomputer
    • Parallel hardware architectures – which parallel programming models can be used?
    • VSC compute nodes
    • VSC node-interconnect
    • VSC-3 ping-pong – intra-node vs. inter-node

VSC – Vienna Scientific Cluster

  • The VSC is a joint high performance computing (HPC) facility of Austrian universities.
  • Our mission: Within the limits of available resources we satisfy the HPC needs of our users.
  • VSC is primarily devoted to research.
  • Who can use VSC? Scientific personnel of the partner universities (see: https://vsc.ac.at/access). VSC is also open to users from other Austrian academic and research institutions.
  • Projects (test, funded, …): Access to VSC is granted on the basis of peer-reviewed projects.
  • Project manager (usually your supervisor): submits the project application and extensions, creates user accounts, …
  • Publications: Please acknowledge VSC and add your publications ➠ visible on the VSC homepage!
VSC links:                      Information provided:
➠ https://vsc.ac.at             VSC homepage (general info)
➠ https://service.vsc.ac.at     VSC service website (application)
➠ https://wiki.vsc.ac.at        VSC user documentation
➠                               VSC user support & contact
  • VSC Training Courses: ➠ https://vsc.ac.at/training
    • VSC course slides: ➠ VSC-Linux, ➠ VSC-Intro

Supercomputers for beginners

  • What is a supercomputer?
  • A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS)… [from Wikipedia]
  • A supercomputer is listed in the TOP500:

System          Performance     TOP500 rank (list)    GREEN500 rank (list)    #1 of TOP500 (list)
VSC-1 (2009)    35 TFlop/s      156 (11/2009)         94 (06/2009)            1.8 PFlop/s (11/2009)
VSC-2 (2011)    135 TFlop/s     56 (06/2011)          71 (06/2011)            8 PFlop/s (06/2011)
VSC-3 (2014)    596 TFlop/s     85 (11/2014)          86 (11/2014)            33 PFlop/s (11/2014)
VSC-3 (………)     596 TFlop/s     461 (11/2017)         175 (11/2017)           93 PFlop/s (11/2017)
VSC-4 (2019)    2.7 PFlop/s     82 (06/2019)          -                       148 PFlop/s (06/2019)
VSC-4 (………)     2.7 PFlop/s     105 (06/2020)         -                       415 PFlop/s (06/2020)
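
To put the table into perspective, the following minimal sketch computes VSC's performance as a percentage of the #1 system on the same list. The numbers are copied from the table above; the names `LISTINGS`, `to_tflops`, and `share_of_number_one` are illustrative and not part of any VSC tool.

```python
# Values copied from the TOP500 table above:
# (system, list date, VSC performance, performance of the #1 system).
LISTINGS = [
    ("VSC-1", "11/2009", (35, "TFlop/s"), (1.8, "PFlop/s")),
    ("VSC-2", "06/2011", (135, "TFlop/s"), (8, "PFlop/s")),
    ("VSC-3", "11/2014", (596, "TFlop/s"), (33, "PFlop/s")),
    ("VSC-3", "11/2017", (596, "TFlop/s"), (93, "PFlop/s")),
    ("VSC-4", "06/2019", (2.7, "PFlop/s"), (148, "PFlop/s")),
    ("VSC-4", "06/2020", (2.7, "PFlop/s"), (415, "PFlop/s")),
]

# 1 PFlop/s = 1000 TFlop/s
UNIT_IN_TFLOPS = {"TFlop/s": 1.0, "PFlop/s": 1000.0}


def to_tflops(value, unit):
    """Convert a (value, unit) pair to TFlop/s."""
    return value * UNIT_IN_TFLOPS[unit]


def share_of_number_one(vsc, top):
    """Return the VSC performance as a percentage of the #1 system."""
    return 100.0 * to_tflops(*vsc) / to_tflops(*top)


if __name__ == "__main__":
    for system, date, vsc, top in LISTINGS:
        print(f"{system} ({date}): {share_of_number_one(vsc, top):.2f} % of #1")
```

The two VSC-3 rows illustrate why ranks drop over time: the system stays at 596 TFlop/s while the top of the list keeps growing.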

VSC systems – what do they look like?


VSC-4 – components of a supercomputer


  • login nodes vs. compute nodes
  • shared (login, storage) vs. user exclusive (compute nodes -N | on VSC-4 optional shared nodes -n)

Parallel hardware architectures

How to connect cores (processing units)?


VSC compute nodes

  • VSC-3, VSC-3+, and VSC-4 ➠ Intel CPUs ➠ different: types, memory, # cores, # HCAs; plus special types of hardware (GPUs on VSC-3) ➠ see: talk on special hardware and talk on SLURM
  • VSC-3: 1 node = 2 sockets (CPUs), 8 cores per socket (P), 2 threads per core (T1/T2) + 2 HCAs

  • intra-socket: 59.7 GB/s (max), inter-socket via QPI (QuickPath Interconnect): 32 GB/s (max)
  • inter-node via dual rail Intel QDR-80: 4 GB/s (max) / 3.4 GB/s (eff) per HCA (host channel adapter)
  • Avoiding slow data paths is the key to most performance optimizations! ➠ Affinity matters!
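
The bandwidth figures above can be turned into a rough feeling for how much a slow data path costs. The sketch below is an idealized estimate that assumes the quoted bandwidths are fully reached and ignores latency; the function names are illustrative.

```python
# Bandwidths quoted above for a VSC-3 node, in GB/s.
BANDWIDTH_GB_PER_S = {
    "intra-socket": 59.7,
    "inter-socket (QPI)": 32.0,
    "inter-node (1 HCA, effective)": 3.4,
}


def transfer_time_ms(size_gb, bandwidth_gb_per_s):
    """Idealized transfer time in milliseconds (bandwidth only, no latency)."""
    return 1000.0 * size_gb / bandwidth_gb_per_s


def slowdown(path, reference="intra-socket"):
    """How many times slower `path` is than `reference`, by bandwidth alone."""
    return BANDWIDTH_GB_PER_S[reference] / BANDWIDTH_GB_PER_S[path]


if __name__ == "__main__":
    for path, bandwidth in BANDWIDTH_GB_PER_S.items():
        print(f"{path:32s} 1 GB in {transfer_time_ms(1.0, bandwidth):7.1f} ms, "
              f"{slowdown(path):5.1f}x slower than intra-socket")
```

Under these assumptions, moving the same data between nodes over one HCA takes more than an order of magnitude longer than moving it within a socket.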

processing units (PU#) ➠ pinning; see: article on SLURM and pinning@Wiki
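
As a generic illustration of what pinning means, the sketch below queries and restricts the set of processing units a process may run on, using the Linux-only affinity calls in Python's `os` module. This is not how jobs are pinned on VSC (that is handled through SLURM and the MPI library, as described in the article referenced above); the helper names `allowed_cpus` and `pin_to` are hypothetical.

```python
import os


def allowed_cpus():
    """Return the sorted PU numbers this process may run on,
    or None if the operating system does not expose CPU affinity."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))  # 0 = the calling process
    return None


def pin_to(cpu):
    """Pin the calling process to a single PU.
    Returns the resulting affinity list, or None if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})
    return sorted(os.sched_getaffinity(0))


if __name__ == "__main__":
    original = allowed_cpus()
    print("allowed PUs:", original)
    if original:
        print("pinned to:", pin_to(original[0]))
        os.sched_setaffinity(0, set(original))  # restore the original mask
```

Once pinned, the operating system no longer migrates the process to other cores, so its data stays close to the core that uses it.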

memory hierarchy (mem_0064 nodes):
  • L1 data cache: 32 kB, private to core
  • L2 cache: 256 kB, private to core (unified)
  • L3 cache: 20 MB, shared by all cores of 1 socket
  • memory: 32 GB per socket
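
A quick way to use these numbers is to check which level of the hierarchy can hold a given working set. The sketch below assumes that kB, MB, and GB denote powers of 1024 and ignores that the L3 cache is shared among all cores of a socket, so it gives an optimistic upper bound; `smallest_level_that_fits` is an illustrative helper.

```python
KIB = 1024

# Capacities from the list above (mem_0064 nodes), smallest level first.
LEVELS = [
    ("L1 data cache", 32 * KIB),
    ("L2 cache", 256 * KIB),
    ("L3 cache", 20 * KIB**2),
    ("memory (one socket)", 32 * KIB**3),
]


def smallest_level_that_fits(working_set_bytes):
    """Return the name of the smallest level that can hold the working set,
    or None if it exceeds the memory of one socket."""
    for name, capacity in LEVELS:
        if working_set_bytes <= capacity:
            return name
    return None


if __name__ == "__main__":
    for n_doubles in (1_000, 100_000, 10_000_000):
        size = 8 * n_doubles  # 8 bytes per double-precision number
        print(f"{n_doubles:>10} doubles ({size} bytes): "
              f"{smallest_level_that_fits(size)}")
```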


VSC node-interconnect schematic

  • VSC-3 ➠ dual rail Intel QDR-80 ➠ 3-level fat-tree (BF = 2:1 / 4:1)
  • VSC-4 ➠ single rail Intel Omni-Path ➠ 2-level fat-tree (BF = 2:1)
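
The following sketch illustrates what such a ratio implies, assuming that BF stands for the blocking factor (oversubscription ratio) of a fat-tree level, i.e. the ratio of downlink to uplink capacity at a switch. In the worst case, when all nodes below a switch send across that level at once, each node gets its link bandwidth divided by the blocking factor. The 3.4 GB/s value is the effective per-HCA bandwidth quoted for VSC-3; how the two VSC-3 ratios map to tree levels is not stated here, so both are simply shown side by side.

```python
def worst_case_bandwidth(link_bandwidth, blocking_factor):
    """Per-node bandwidth across an oversubscribed switch level when all
    nodes below it communicate across it simultaneously (idealized)."""
    if blocking_factor < 1:
        raise ValueError("blocking factor must be >= 1")
    return link_bandwidth / blocking_factor


if __name__ == "__main__":
    link = 3.4  # GB/s, effective bandwidth per HCA on VSC-3 (from this page)
    for bf in (1, 2, 4):
        print(f"BF = {bf}:1 -> {worst_case_bandwidth(link, bf):.2f} GB/s per node")
```

Jobs that fit below a single switch are not affected by the blocking factor of the levels above it.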


VSC-3 ping-pong – intra-node vs. inter-node

  • 1 node = 2 sockets with 8 cores per socket + 2 HCAs
  • inter-node = IB fabric = dual rail Intel QDR-80 = 3-level fat-tree (BF: 2:1 / 4:1)
  • ping-pong benchmark = module load intel/16.0.3 intel-mpi/5.1.3 | openmpi/1.10.2 (1 HCA)

  • MPI latency & bandwidth (plus typical values for comparison):

VSC-3:          latency [μs]
intra-socket    0.3 μs
inter-socket    0.7 μs
IB -1- edge     1.4 μs
IB -2- leaf     1.8 μs
IB -3- spine    2.3 μs

typical values for:                 latency    bandwidth
L1 cache                            1–2 ns     100 GB/s
L2/L3 cache                         3–10 ns    50 GB/s
memory                              100 ns     10 GB/s
HPC networks (per node / 2 HCAs)    1–10 μs    1–8 GB/s
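
Latency and bandwidth are often combined into a simple linear cost model for one message, time = latency + size / bandwidth. The sketch below applies this model to the VSC-3 latencies listed above. The bandwidths are an assumption: they are the hardware figures from the compute node section of this page (59.7, 32, and 3.4 GB/s), not measured MPI ping-pong bandwidths. All names are illustrative.

```python
# path: (latency in seconds from the table above,
#        ASSUMED bandwidth in bytes/s, taken from the hardware figures on this page)
PATHS = {
    "intra-socket": (0.3e-6, 59.7e9),
    "inter-socket": (0.7e-6, 32.0e9),
    "IB -1- edge": (1.4e-6, 3.4e9),
    "IB -2- leaf": (1.8e-6, 3.4e9),
    "IB -3- spine": (2.3e-6, 3.4e9),
}


def message_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: startup latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def break_even_size_bytes(latency_s, bandwidth_bytes_per_s):
    """Message size at which the latency and bandwidth terms are equal."""
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    for path, (latency, bandwidth) in PATHS.items():
        small = message_time_s(8, latency, bandwidth)
        large = message_time_s(1_000_000, latency, bandwidth)
        print(f"{path:13s} 8 B: {small * 1e6:6.2f} us   "
              f"1 MB: {large * 1e6:8.2f} us   "
              f"break-even: {break_even_size_bytes(latency, bandwidth):8.0f} B")
```

Messages well below the break-even size are dominated by latency, so sending many tiny messages across the fabric is far more expensive than sending one aggregated message.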

