VSC-3 – supercomputer

OUTLINE:


VSC – Vienna Scientific Cluster

VSC links and the information provided:
➠ http://vsc.ac.at - VSC homepage (general info)
➠ https://service.vsc.ac.at - VSC service website (application)
➠ https://wiki.vsc.ac.at - VSC user documentation
➠ VSC user support & contact

Supercomputers for beginners

| System (year) | Performance | TOP500 rank (list) | GREEN500 rank (list) | #1 of TOP500 (list) |
|---|---|---|---|---|
| VSC-1 (2009) | 35 TFlop/s | 156 (11/2009) | 94 (06/2009) | 1.8 PFlop/s (11/2009) |
| VSC-2 (2011) | 135 TFlop/s | 56 (06/2011) | 71 (06/2011) | 8 PFlop/s (06/2011) |
| VSC-3 (2014) | 596 TFlop/s | 85 (11/2014) | 86 (11/2014) | 33 PFlop/s (11/2014) |
| VSC-3 (………) | 596 TFlop/s | 460 (11/2017) | 175 (11/2017) | 93 PFlop/s (11/2017) |
| VSC-4 (2019) | 2.7 PFlop/s | 82 (06/2019) | - | 148 PFlop/s (06/2019) |

VSC-3 – what does it look like?


VSC-3 – what does it look like? – inside


VSC-3 – components of a supercomputer



Parallel hardware architectures


VSC-3 compute nodes

processing units (PU#) ➠ pinning
see: article on SLURM and pinning in the VSC wiki
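Pinning binds a process to fixed processing units so that the operating system cannot migrate it between cores or sockets. The following is a minimal sketch only, assuming a Linux node (the affinity calls in Python's `os` module are Linux-only); the helper name `pin_to_pu` is hypothetical and not part of any VSC tooling. The wiki article referenced above describes pinning through SLURM, which this sketch does not replace.

```python
import os

def pin_to_pu(pu_id):
    """Pin the calling process to one processing unit (hypothetical helper).

    Returns the resulting affinity set, or None where the Linux-only
    affinity calls are unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {pu_id})  # pid 0 means the calling process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
        print("allowed PUs before pinning:", allowed)
        print("affinity after pinning:", pin_to_pu(allowed[0]))
```

The PU number passed in must be one the process is already allowed to use, which is why the example takes it from the current affinity set instead of hard-coding it.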

memory hierarchy (mem_0064 nodes):
L1 instruction cache: 32 kB, private to core
L1 data cache: 32 kB, private to core
L2 cache: 256 kB, private to core (unified)
L3 cache: 20 MB, shared by 8 cores of 1 socket
memory: 32 GB per socket
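Per-node totals follow directly from the per-core and per-socket figures above. A minimal sketch, assuming two sockets per node (inferred from the `mem_0064` name together with 32 GB per socket, not stated explicitly) and 8 cores per socket as given for the L3 cache; the function name is illustrative:

```python
SOCKETS_PER_NODE = 2     # assumption: 2 x 32 GB matches the mem_0064 name
CORES_PER_SOCKET = 8     # from the L3 cache line
L2_KB_PER_CORE = 256
L3_MB_PER_SOCKET = 20
MEM_GB_PER_SOCKET = 32

def node_summary():
    """Derive per-node and per-core figures from the listed hierarchy."""
    cores = SOCKETS_PER_NODE * CORES_PER_SOCKET
    memory_gb = SOCKETS_PER_NODE * MEM_GB_PER_SOCKET
    return {
        "cores": cores,
        "memory_gb": memory_gb,
        "memory_gb_per_core": memory_gb / cores,
        "l3_mb_per_core": L3_MB_PER_SOCKET / CORES_PER_SOCKET,
        "l2_mb_total": cores * L2_KB_PER_CORE / 1024,
    }

if __name__ == "__main__":
    for key, value in node_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions a job that uses every core has about 4 GB of memory and 2.5 MB of shared L3 cache per core.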


VSC-3 node-interconnect

IB fabric = dual rail Intel QDR-80 = 3-level fat-tree (BF: 2:1 / 4:1) (schematic figure, numbers only)
BF = blocking factor, the ratio of down-links to up-links; blocking might introduce additional latency
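The blocking factor can be turned into a rough worst-case bandwidth estimate. This is a sketch under stated assumptions: the function name is hypothetical, the 4 GB/s per rail is the nominal QDR 4x data rate (40 Gbit/s signalling with 8b/10b encoding), and the model simply shares the up-link capacity evenly among all down-links, ignoring routing and actual traffic patterns.

```python
QDR_RAIL_GB_PER_S = 4.0   # assumption: nominal QDR 4x data rate per rail
RAILS = 2                 # dual rail

def worst_case_bandwidth(link_gb_per_s, down_links, up_links):
    """Bandwidth per down-link when all down-links send upwards at once."""
    blocking_factor = down_links / up_links
    return link_gb_per_s / blocking_factor

if __name__ == "__main__":
    node_bw = QDR_RAIL_GB_PER_S * RAILS
    print("nominal per-node bandwidth [GB/s]:", node_bw)
    for down, up in [(2, 1), (4, 1)]:
        share = worst_case_bandwidth(node_bw, down, up)
        print(f"BF {down}:{up} -> worst case {share} GB/s per node")
```

A non-blocking level (1:1) leaves the bandwidth unchanged; the higher the ratio, the more a fully loaded level throttles traffic crossing it.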


VSC-3 ping-pong – intra-node vs. inter-node

MPI latency & bandwidth (plus typical values for comparison):

| VSC-3 path | latency |
|---|---|
| intra-socket | 0.3 μs |
| inter-socket | 0.7 μs |
| IB -1- edge | 1.4 μs |
| IB -2- leaf | 1.8 μs |
| IB -3- spine | 2.3 μs |

| typical values for | latency | bandwidth |
|---|---|---|
| L1 cache | 1–2 ns | 100 GB/s |
| L2/L3 cache | 3–10 ns | 50 GB/s |
| memory | 100 ns | 10 GB/s |
| HPC networks (per node / 2 HCAs) | 1–10 μs | 1–8 GB/s |
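Latency and bandwidth are commonly combined in a first-order model of a message transfer: time = latency + size / bandwidth. The sketch below assumes this linear model (it is not stated in the tables) and, because no VSC-3 bandwidth is listed, uses an illustrative 4 GB/s from within the typical 1–8 GB/s range; the function names are hypothetical.

```python
# Measured VSC-3 latencies from the table, in seconds
VSC3_LATENCY_S = {
    "intra-socket": 0.3e-6,
    "inter-socket": 0.7e-6,
    "IB edge": 1.4e-6,
    "IB leaf": 1.8e-6,
    "IB spine": 2.3e-6,
}
ASSUMED_BANDWIDTH = 4e9   # bytes/s, illustrative assumption only

def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear model: fixed start-up latency plus transmission time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

def half_performance_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which latency equals the transmission time."""
    return latency_s * bandwidth_bytes_per_s

if __name__ == "__main__":
    for path, latency in VSC3_LATENCY_S.items():
        n_half = half_performance_size(latency, ASSUMED_BANDWIDTH)
        t_1mb = transfer_time(1_000_000, latency, ASSUMED_BANDWIDTH)
        print(f"{path}: n_1/2 = {n_half:.0f} B, 1 MB takes {t_1mb * 1e6:.1f} us")
```

Messages much smaller than the half-performance size are dominated by latency, so the path (intra-socket versus across the spine) matters most for them; large messages are dominated by bandwidth.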