<HTML> <!-- --- title: "Parallel I/O" date: December 5-6, 2017 keywords: - high performance computing - I/O header-includes: - <meta name="duration" content="10" /> --- --> </HTML>

Reason for I/O

Input and output of calculations: always via files

* Size of I/O varies: from a single flag up to large data sets
* Intermediate files
* Checkpoints
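
A checkpoint is only useful if it is never left half-written. The sketch below is one possible way to do this, not a prescribed method: it writes the state to a temporary file and renames it into place. The function names `save_checkpoint` and `load_checkpoint` and the JSON format are assumptions chosen for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write `state` as JSON to `path` via a temporary file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so the rename
    # stays on the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to storage
        os.replace(tmp_path, path)  # replaces an existing checkpoint in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

A calculation would call `load_checkpoint` once at start-up and `save_checkpoint` every few iterations, so a restart continues from the last completed step.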

VSC infrastructure

* No monitor
* No printer

Organizational

Questions: immediately

Coffee: immediately

Comments/Feedback: yes, please!

Limiting factors to high performance

High Performance Computing

CPU: 10^10 operations per second

Memory: 10^8 operations per second

Network: 10^6 operations per second

SSD: 10^5 operations per second

HDD: 10^2 operations per second
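
To make these orders of magnitude more tangible, the following sketch converts each rate into a time per operation and into the number of CPU operations that fit into one operation of the slower component. The figures are the rough values listed above; the helper names are assumptions made for this illustration.

```python
# Rough operation rates taken from the list above (operations per second).
RATES = {"CPU": 1e10, "Memory": 1e8, "Network": 1e6, "SSD": 1e5, "HDD": 1e2}


def seconds_per_operation(component):
    """Time one operation of `component` takes, in seconds."""
    return 1.0 / RATES[component]


def cpu_operations_per_operation(component):
    """CPU operations that fit into a single operation of `component`."""
    return RATES["CPU"] / RATES[component]


for name in RATES:
    print(f"{name:8s} {seconds_per_operation(name):.0e} s per operation, "
          f"{cpu_operations_per_operation(name):.0e} CPU operations in that time")
```

With these numbers, a CPU that waits for a single HDD operation could have executed about 10^8 operations in the meantime.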

High Performance Storage

High performance: throughput and IOPS

Latency should not dominate

Combine I/O operations
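
One way to combine I/O operations is to collect many small writes in a buffer and hand them to the storage layer in a few large requests. The sketch below illustrates the idea with an in-memory stand-in for a file: `CountingRaw` is a hypothetical helper class that only counts how many write requests reach it, and the record and buffer sizes are arbitrary assumptions.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts the write requests it receives."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b
        return len(b)


def write_records(stream, n, record=b"x" * 16):
    """Write `n` small records to `stream`, one call per record."""
    for _ in range(n):
        stream.write(record)
    stream.flush()


# Every small record becomes its own request to the raw stream.
raw_direct = CountingRaw()
write_records(raw_direct, 1000)

# The same records, combined by a buffer before reaching the raw stream.
raw_buffered = CountingRaw()
write_records(io.BufferedWriter(raw_buffered, buffer_size=8192), 1000)

print("unbuffered requests:", raw_direct.calls)
print("buffered requests:  ", raw_buffered.calls)
```

Both variants store identical data, but the buffered one issues far fewer requests, so the per-request latency is paid far less often.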

Methods

Today's topics

Tomorrow: Parallel I/O and Portable Data Formats

Introduction to I/O

User view of I/O

Performance

<html><img src="pictures/Performance_wc.png" alt="user view" style="position: static; vertical-align: center"/></html>

Usage

<html><img src="pictures/Usage_wc.png" alt="user view" style="position: static; vertical-align: center"/></html>

Security/Safety

<html><img src="pictures/Secure_wc.png" alt="user view" style="position: static; vertical-align: center"/></html>

Technology

<html><img src="pictures/Technology_wc.png" alt="user view" style="position: static; vertical-align: center"/></html>

Technology used by VSC

User - Performance

Storage size - parallel file system - number of spindles - throughput

Temporary - locality - IOPS

Highly available - throughput

File size - number of files - storage size

User - Usage

Backup - locking

Storage size - read never

Performance - Usage

Throughput - random access

IOPS - random access

Throughput - sequential access
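
The difference between the two access patterns can be sketched as follows: the same blocks of a file are read once in order and once in shuffled order. The block size, the block count, and the helper `read_blocks` are assumptions for illustration. The sketch only shows the patterns. On a small, freshly written file the operating system cache hides most of the difference, so timing it says little about real storage.

```python
import os
import random
import tempfile

BLOCK = 4096      # bytes per read request (assumed)
N_BLOCKS = 256    # number of blocks in the test file (assumed)


def read_blocks(path, order):
    """Read the blocks of `path` in the given order; return total bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        for i in order:
            f.seek(i * BLOCK)
            total += len(f.read(BLOCK))
    return total


# Create a scratch file filled with random bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(BLOCK * N_BLOCKS))
    path = tmp.name

sequential = list(range(N_BLOCKS))   # blocks 0, 1, 2, ... in order
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)   # same blocks, random order

seq_bytes = read_blocks(path, sequential)
rnd_bytes = read_blocks(path, shuffled)
os.remove(path)
```

Both runs transfer the same amount of data. On a spinning disk, the shuffled order forces a head movement for almost every request, which is why random access is limited by IOPS and sequential access by throughput.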

User - Security/Safety

Storage size - redundancy

Highly available - RAID - erasure coding

Backup - redundancy

Highly available - UPS (uninterruptible power supply)

Highly available - backup battery
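
The redundancy behind RAID with parity can be illustrated with the simplest erasure code, a bytewise XOR over the data blocks. This is a minimal sketch of the principle only, not a description of any particular RAID implementation or of the setup used at VSC. The block contents and the helper `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as they might be striped across three disks.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block would be stored on a fourth disk.
parity = xor_blocks(data_blocks)

# Simulate the loss of the second disk and rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

One extra block is enough to survive the loss of any single block. Tolerating more simultaneous failures requires more general erasure codes with additional redundancy blocks.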

User - Technology

Storage size - DAS - NAS - SAN

Highly available - HDFS

Visibility - object storage

Storage size - tiered storage

Big Data

Technology available

The result is called ‘Big Data’

Data growth

New data is generated digitally

Data creation increases exponentially

Internet / Social networks / Mobile Devices

Internet of Things

Medicine / Genome

Science

Types of data

V3

Characterization of Big Data by the three Vs: volume, velocity, variety

Tools

Hadoop software tools

Applications