VSC – supercomputers
- Article written by Claudia Blaas-Schenner (VSC Team), last update 2020-10-10 by cb.
OUTLINE:
- VSC – Vienna Scientific Cluster
- Supercomputers for beginners – introducing VSC to our (new) users
- Supercomputers for beginners – what is a supercomputer?
- VSC systems – what do they look like?
- VSC-4 – components of a supercomputer
- Parallel hardware architectures – which parallel programming models can be used?
- VSC compute nodes
- VSC node-interconnect
- VSC-3 ping-pong – intra-node vs. inter-node
VSC – Vienna Scientific Cluster
- The VSC is a joint high performance computing (HPC) facility of Austrian universities.
- Our mission: Within the limits of available resources we satisfy the HPC needs of our users.
- VSC is primarily devoted to research.
- Who can use VSC? Scientific personnel of the partner universities (see: https://vsc.ac.at/access); VSC is also open to users from other Austrian academic and research institutions.
- Projects (test, funded, …): Access to VSC is granted on the basis of peer-reviewed projects.
- Project manager (= usually your supervisor): Project application, extensions, creates user accounts, …
- Publications: Please acknowledge VSC and add your publications ➠ they become visible on the VSC homepage!
VSC links | Information provided
---|---
https://vsc.ac.at | VSC homepage (general info)
https://service.vsc.ac.at | VSC service website (application)
https://wiki.vsc.ac.at | VSC user documentation
 | VSC user support & contact

- VSC Training Courses: https://vsc.ac.at/training
- VSC course slides: VSC-Linux, VSC-Intro
Supercomputers for beginners
- What is a supercomputer?
- A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS)… [from Wikipedia]
- A supercomputer is listed in the TOP500
VSC system | Performance | TOP500 rank | GREEN500 rank | #1 of TOP500
---|---|---|---|---
VSC-1 (2009) | 35 TFlop/s | 156 (11/2009) | 94 (06/2009) | 1.8 PFlop/s #1 (11/2009) |
VSC-2 (2011) | 135 TFlop/s | 56 (06/2011) | 71 (06/2011) | 8 PFlop/s #1 (06/2011) |
VSC-3 (2014) | 596 TFlop/s | 85 (11/2014) | 86 (11/2014) | 33 PFlop/s #1 (11/2014) |
VSC-3 (………) | 596 TFlop/s | 461 (11/2017) | 175 (11/2017) | 93 PFlop/s #1 (11/2017) |
VSC-4 (2019) | 2.7 PFlop/s | 82 (06/2019) | ——— | 148 PFlop/s #1 (06/2019) |
VSC-4 (………) | 2.7 PFlop/s | 105 (06/2020) | ——— | 415 PFlop/s #1 (06/2020) |
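- For orientation: the theoretical peak performance of a cluster is the product #nodes × #cores per node × clock frequency × FLOP/cycle per core. With purely illustrative numbers (not the actual VSC-4 configuration): 800 nodes × 48 cores × 2.2 GHz × 32 FLOP/cycle (AVX-512, 2 FMA units) ≈ 2.7 PFlop/s.
- The TOP500 ranking itself uses the measured LINPACK performance (Rmax), which is always below this theoretical peak (Rpeak).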
VSC systems – what do they look like?
VSC-4 – components of a supercomputer
- login nodes vs. compute nodes
- shared (login and storage nodes) vs. user-exclusive (compute nodes, -N); on VSC-4, shared compute nodes can optionally be requested (-n)
Parallel hardware architectures
VSC compute nodes
- VSC-3, VSC-3+, and VSC-4 ➠ Intel CPUs ➠ different: types, memory, # cores, # HCAs; plus special types of hardware (GPUs on VSC-3) ➠ see: talk on special hardware and talk on SLURM
- VSC-3: 1 node = 2 sockets (CPUs), 8 cores per socket (P), 2 threads per core (T1/T2) + 2 HCAs
- intra-socket: 59.7 GB/s (max), inter-socket via QPI (QuickPath Interconnect): 32 GB/s (max)
- inter-node via dual rail Intel QDR-80: 4 GB/s (max) / 3.4 GB/s (eff) per HCA (host channel adapter)
- Avoiding slow data paths is the key to most performance optimizations! ➠ Affinity matters!
processing units (PU#) ➠ pinning; see: article on SLURM and pinning in the VSC Wiki (a minimal pinning check is sketched below)
memory hierarchy (mem_0064 nodes):
  - L1 data cache: 32 kB, private to each core
  - L2 cache: 256 kB, private to each core (unified)
  - L3 cache: 20 MB, shared by all cores of one socket
  - memory: 32 GB per socket
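Since pinning decides which of these data paths a rank actually uses, it pays to check the placement before benchmarking. The following is a minimal sketch (not an official VSC tool; the mpicc/srun invocation and the GNU sched_getcpu() extension are assumptions) that lets every MPI rank report its host and the core it is running on:

```c
// pin_check.c -- print, for every MPI rank, the host name and the core it runs on.
// Minimal sketch; assumes an MPI installation and the GNU extension sched_getcpu().
// Illustrative build/run:  mpicc pin_check.c -o pin_check && srun ./pin_check
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>      // sched_getcpu()
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);

    // Core the calling rank is currently executing on
    int cpu = sched_getcpu();

    printf("rank %3d of %3d on %s, core %d\n", rank, size, host, cpu);

    MPI_Finalize();
    return 0;
}
```

Launched with one rank per core under the intended pinning settings, the output shows immediately whether ranks are spread across both sockets or packed onto one.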
VSC node-interconnect schematic
- VSC-3 ➠ dual rail Intel QDR-80 ➠ 3-level fat-tree (BF = 2:1 / 4:1)
- VSC-4 ➠ single rail Intel Omni-Path ➠ 2-level fat-tree (BF = 2:1)
- A blocking factor (BF) of 2:1 means the fat-tree is oversubscribed: each switch level offers only half as much uplink bandwidth as bandwidth towards the nodes below.
VSC-3 ping-pong – intra-node vs. inter-node
- 1 node = 2 sockets with 8 cores per socket + 2 HCAs
- inter-node = IB fabric = dual rail Intel QDR-80 = 3-level fat-tree (BF: 2:1 / 4:1)
- ping-pong benchmark: module load intel/16.0.3 intel-mpi/5.1.3 | openmpi/1.10.2 (1 HCA)
- MPI latency & bandwidth (plus typical values for comparison; a minimal ping-pong sketch follows the table):
VSC-3 | latency | typical values for: | latency | bandwidth
---|---|---|---|---
intra-socket | 0.3 μs | L1 cache | 1–2 ns | 100 GB/s
inter-socket | 0.7 μs | L2/L3 cache | 3–10 ns | 50 GB/s
IB -1- edge | 1.4 μs | memory | 100 ns | 10 GB/s
IB -2- leaf | 1.8 μs | HPC networks (per node / 2 HCAs) | 1–10 μs | 1–8 GB/s
IB -3- spine | 2.3 μs | | |
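The numbers above come from a ping-pong measurement between two ranks. A minimal sketch of such a benchmark (not the exact code used for the table; the 2-rank setup, message sizes, and repetition count are illustrative choices, and no warm-up phase is included) could look like this:

```c
// pingpong.c -- minimal MPI ping-pong between ranks 0 and 1.
// Latency  ~ one-way time for small messages,
// bandwidth ~ message size / one-way time for large messages.
// Sketch only; run with exactly 2 ranks, e.g. one per node for inter-node values.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int reps = 1000;
    for (int nbytes = 1; nbytes <= (1 << 22); nbytes *= 2) {
        char *buf = malloc((size_t)nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        // one-way time per message = round-trip time / 2
        double t = (MPI_Wtime() - t0) / reps / 2.0;

        if (rank == 0)
            printf("%8d bytes   %8.2f us   %8.2f MB/s\n",
                   nbytes, t * 1e6, nbytes / t / 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}
```

Placing the two ranks on the same socket, on different sockets, or on different nodes (via the pinning options discussed above) reproduces the intra-socket, inter-socket, and inter-node rows of the table.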