====== Storage Technologies - Welcome ======

“Computers are like Old Testament gods; lots of rules and no mercy.” (Joseph Campbell)

<!-- “Simple things should be simple. Complicated things should be possible.” (Alan Kay) -->

<!-- “Computer science is not about machines, in the same way that astronomy is not about telescopes. […] Science is not about tools, it is about how we use them and what we find out when we do.” (Michael R. Fellows) -->
====== Storage Technologies - Contents ======

====== Storage Technologies - Hardware Basics ======

{{.:node.jpg}}

====== Storage Technologies - Hardware Basics ======

  * System Bus
    * Provides Communication among processor, memory and I/O modules
    * ISA, PCI, AGP, PCI-Express, …
  * External Devices
  * I/O Controllers (HBA / RAID)
====== Memory Hierarchy (I) ======

{{.:…}}

====== Memory Hierarchy (II) ======

  * Why?
    * Trade-off among speed, cost, size, and power consumption
  * Strategies
    * Caching\\
====== Memory Hierarchy (IV) ======

  * As one goes down the hierarchy…
    * Decreasing cost per bit
    * Increasing capacity
====== Memory Access Times ======

{{.:…}}

====== Memory Access Times (II) ======

{{.:…}}

====== Memory Hierarchy (VI) ======

  * Storage Devices Typical Access Times
    * NVMe Flash Memory ~ 6 µs (@ 150’000 IOPS)
    * SAS Magnetic Disk ~ 2-3 ms (@ 350 IOPS)
    * Magnetic Tape ~ milliseconds up to many seconds
====== Caching ======

{{.:…}}

====== Storage Technologies - Cache Memory ======

====== Caching Strategies ======

  * Cache Hit vs. Cache Miss
  * Cache Hit Ratio
  * Replacement Strategies / Associativity
  * Larger Caches have a higher hit rate but also higher latency
  * Multi Level Caches
  * Data may become incoherent, particularly in multiprocessor systems
    * Write Propagation
    * Transaction Serialization (Reads/Writes)
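The hit ratio mentioned above directly determines the average access time. A minimal sketch in Python, assuming a flat two-level model where every miss pays the full memory latency (the 1 ns / 100 ns figures are illustrative values, not from the slides):

```python
def effective_access_time(hit_ratio, t_cache, t_mem):
    """Average access time of a single cache level: hits are served at
    cache speed, misses pay the (much larger) memory latency."""
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_mem

# Illustrative latencies: 1 ns cache, 100 ns main memory
print(round(effective_access_time(0.95, 1.0, 100.0), 2))  # 5.95
print(round(effective_access_time(0.99, 1.0, 100.0), 2))  # 1.99
```

Note how a hit ratio drop from 99% to 95% roughly triples the average access time, which is why locality matters so much.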
====== Storage Technologies - Why Caching Works --> Locality of reference ======

  * Memory References tend to cluster.

====== Memory Hierarchy - Recap ======

  * Problem: A CPU waiting for data can’t do any work
    * Low Memory Access Times are crucial
  * Solution: Caching/…
    * Works well with sequential data / local data access patterns
    * Won’t work with totally random access patterns (Locality)
  * As one goes down the hierarchy
    * Increasing access time
    * Decreasing throughput
  * Also known as “Von Neumann Bottleneck”
====== Storage Technologies (I) ======

  * Read/Write
  * Read Only
  * Slow Write, Fast Read (e.g. SMR Discs)
  * Accessibility
    * Random Access
    * Content addressable
====== Sequential I/O vs. Random I/O ======

  * Sequential I/O

    * Writing / Reading small chunks of data to / from random locations (Chunk Size <= 10^4 Byte)
    * Slowest way to read data from storage devices
    * Magnitude of the slow-down depends on the underlying hard- and software (e.g. Tape Drives vs. Flash Drives)
====== Hard-Drives Overview (I) ======

  * Invented in the mid 50s by IBM
  * Had the size of two refrigerators (1.9 m³)
  * Became cheap / mainstream in the late 80s
  * Today one 3.5" drive can hold up to 14TB of data
  * With future technologies even higher data densities may become possible
    * Filling disks with helium
    * Shingled Magnetic Recording
    * Heat Assisted Magnetic Recording
  * Interface
    * Serial Attached SCSI (SAS)
====== Hard-Drives Overview (II) ======

{{.:…}}

  * Video of operating harddisk on wikipedia: https://…
    * The time it takes to move the head to the correct track
  * Rotational Delay
    * The time it takes until the platter rotates to the correct position (~2 ms @ 15’000 rpm)
    * r … rotation speed in revolutions per second
    * $T_{rdelay} = 1/(2r)$
  * Transfer time calculation
    * T … Transfer Time
    * b … Number of bytes to be transferred
    * N … Number of bytes per track
    * r … rotation speed in revolutions per second
    * $T = b/(rN)$
  * Average Access Time
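The two formulas above can be checked numerically. A short Python sketch (the function names are mine, not from the slides):

```python
def rotational_delay_ms(rpm):
    """Average rotational delay T_rdelay = 1/(2r): on average the platter
    must turn half a revolution before the wanted sector arrives."""
    r = rpm / 60.0                    # revolutions per second
    return 1000.0 / (2.0 * r)

def transfer_time_ms(b, n_bytes_per_track, rpm):
    """Transfer time T = b/(rN) for b bytes on a track holding N bytes."""
    r = rpm / 60.0
    return 1000.0 * b / (r * n_bytes_per_track)

print(rotational_delay_ms(15000))              # 2.0 ms, as stated above
print(transfer_time_ms(256000, 256000, 7500))  # one full track: 8.0 ms
```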
  * Do not rely on average values!
  * Given a Harddisk with
    * Rotational speed of 7’500 rpm
    * 512 byte sectors
    * 500 sectors per track
  * We read a file that is stored on 5 adjacent tracks in 2500 sectors (1.28 MBytes)
    * Average seek … 4 ms
    * Rotational delay … 4 ms
    * Read 500 sectors … 8 ms
  * We need to do this 5 times (1 for each track), but because of sequential organization we can skip the seek time for the consecutive tracks
  * $T_{total} = 16 + (4 * 12) = 64 ms$
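The 64 ms result can be reproduced step by step; a quick sanity check in Python:

```python
# Sequential read: 2500 sectors on 5 adjacent tracks (7'500 rpm, 500 sectors/track)
rpm = 7500
r = rpm / 60.0                       # 125 revolutions per second
seek_ms = 4.0                        # average seek time (given on the slide)
rot_delay_ms = 1000.0 / (2.0 * r)    # half a revolution: 4.0 ms
track_read_ms = 1000.0 / r           # all 500 sectors of one track: 8.0 ms

first_track = seek_ms + rot_delay_ms + track_read_ms   # 16 ms
next_tracks = 4 * (rot_delay_ms + track_read_ms)       # 4 * 12 ms, no seek needed
print(first_track + next_tracks)     # 64.0
```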
  * Given a Harddisk with
    * Rotational speed of 7’500 rpm
    * 512 byte sectors
    * 500 sectors per track
  * We read a file that is distributed randomly over 2500 sectors of the disk
    * Average seek … 4 ms
    * Rotational delay … 4 ms
    * Read 1 sector … 0.016 ms
  * We need to do this 2500 times with a seek after each sector
  * $T_{total} = 2'500 * 8.016 = 20'040 ms = 20.04 s$
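The same disk, but with every sector paying the full seek and rotational delay; a quick check of the 20.04 s result:

```python
# Random read: 2500 single sectors, seek + rotational delay before each one
seek_ms = 4.0
rot_delay_ms = 4.0
sector_read_ms = 8.0 / 500           # 1 of 500 sectors per 8 ms revolution: 0.016 ms

per_sector_ms = seek_ms + rot_delay_ms + sector_read_ms   # 8.016 ms
total_s = 2500 * per_sector_ms / 1000.0
print(round(total_s, 2))             # 20.04
```

That is a factor of about 300 versus the 64 ms sequential layout of the identical amount of data.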
  * Is composed of one or more chips
  * Chips are segmented into planes
    * Planes are segmented into thousands (e.g. 2048) of blocks
    * And 1 or 2 registers of the page size for buffering
  * A Block usually contains 64 to 128 pages
    * Each page has a data part (few KBytes)
    * and a small metadata area (e.g. 128 bytes) for Error Correcting Code
  * Exact specification varies across different memory packages
    * Can take up to 5 ms
  * Limited number of erase cycles
    * SLC: 100’000
    * MLC: 10’000
  * Some flash memory is reserved to replace bad blocks
  * Controller takes care of wear leveling
  * Format the memory
  * Mark bad blocks
  * Moves data and picks blocks so that single cells don’t wear out
  * Parallelizes read/write calls
  * Addresses in Blocks/Pages
  * Needs more CPU utilization to get full throughput
  * Needs some pressure (multiple calls) to fully parallelize read/write calls
    * This is also called …

====== (NVMe) Solid State Drives - IOPS vs. Throughput ======
  * Traditional discs have been measured in terms of throughput

  * $Throughput = IOPS * Blocksize$
    * Where blocksize can be chosen freely
  * So if we know the blocksize that was used in benchmarking…
    * We can calculate IOPS from throughput and vice versa
    * (If we assume that the disk was empty at the beginning of the benchmark and no additional controller overhead was involved)
  * Given an Intel DC P3700 SSD with a capacity of 2 TB, specification says:
    * Sequential Read 450’000 IOPS with 2800 MB/s of max throughput
    * $2'800'000'000 = 450'000 * x$
    * $x = 6222 Bytes ~ 6 KByte$

  * How would a traditional HDD perform under these conditions?
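The benchmark block size falls straight out of the $Throughput = IOPS * Blocksize$ relation; a quick check in Python (reading 2800 MB/s as 2.8 * 10^9 Bytes/s):

```python
# Back out the block size hidden in the P3700 spec-sheet numbers
throughput = 2800 * 10**6     # 2800 MB/s in bytes per second
iops = 450_000                # sequential-read IOPS from the data sheet
blocksize = throughput / iops
print(int(blocksize))         # 6222  -> roughly 6 KByte per I/O
```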
====== (NVMe) Comparison SSD vs. HDD ======

  * On the previous slide we saw that a block size of 8 KByte on our SSD leads to a throughput of 2800 MB/s
  * Let’s try this with a HDD
    * Rotational speed of 7’500 rpm
    * 512 byte sectors
    * 500 sectors per track

====== (NVMe) Comparison SSD vs. HDD ======

  * We read a file that is distributed randomly over 2500 sectors of the disk in blocks of 8 KByte (blocks of 16 sectors)
    * Average seek … 4 ms
    * Rotational delay … 4 ms
    * Read 16 sectors … 0.256 ms
  * We need to do this 156.25 times with a seek after every block of 16 sectors
  * $T_{total} = 156.25 * 8.256 = 1290 ms = 1.290 s$
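The 1.29 s figure follows the same per-access accounting as the earlier HDD examples, just with 16-sector blocks:

```python
# Random read in 8 KByte blocks (16 sectors of 512 B) on the 7'500 rpm disk
seek_ms = 4.0
rot_delay_ms = 4.0
block_read_ms = 16 * (8.0 / 500)     # 16 sectors of 0.016 ms each: 0.256 ms

per_block_ms = seek_ms + rot_delay_ms + block_read_ms   # 8.256 ms
blocks = 2500 / 16                   # 156.25 block-sized accesses
total_s = blocks * per_block_ms / 1000.0
print(round(total_s, 2))             # 1.29 -> seconds for 1.28 MByte
```

At 2800 MB/s the SSD moves the same 1.28 MByte in well under a millisecond, so the HDD is slower by several orders of magnitude on this workload.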
  * In Q4 2016, 45 million SSDs with a total capacity of 16 Exabyte were delivered to customers
  * Market for HDDs is significantly bigger than for SSDs
  * New memory technologies (e.g. Intel/…)
    * Intel Optane Memories
  * Things to consider
====== Magnetic Tapes (I) ======

  * Have been in use for data storage since the 50’s
    * Main storage medium in some early computers
  * Capacity:
    * 1950’s: ~ 1 MByte
    * 1980’s: ~ 100 MByte - 1 GByte
    * 1990’s: ~ 10 - 100 GByte
    * 2000’s: ~ 100 GByte - 1 TByte
    * Now: >10 TByte
    * Future: Going up to 200 TByte per Tape seems possible
  * Linear Tape Open
    * Open standard to store data on magnetic tapes
    * Developed in the 90’s
    * Eighth Generation LTO was released in 2017
  * LTO-8 State of the art

  * Expected durability of a tape is about 10^4 end-to-end passes
====== Memory Hierarchy - cont’d ======

  * With the given storage technologies (Flash, HDD and Tape) we can refine our Memory hierarchy

  * Recently sequentially accessed files will be moved to the HDDs (they are fast enough when used in a sequential way)
  * Rarely accessed files go to the tape machine
    * We could even link these files transparently into the file system, so the user doesn’t have to know anything about where his data is located
====== Multi Tiered Storage Systems ======

  * Tier 2: SATA HDDs
  * Tier 3: Tape Storage
  * But there’s no free lunch
    * We need to store data about where our data is stored (more on metadata later on) --> Memory Overhead
    * If a user accesses a file, we need to check where the file is --> Processing Overhead
    * In the worst case we have to copy it from tape to the disk
      * which means loading the right tape and winding to the correct position
  * Add parity calculation or mirroring for redundancy
  * Different Techniques / RAID Levels available
    * RAID 0, 1, 5, 6, 10, 01, …
  * Several Discs in a RAID are called a RAID array
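The parity idea behind levels such as RAID 5 fits in a few lines: the parity block is the bytewise XOR of the data blocks, so any one lost block can be rebuilt from the survivors. A minimal sketch (not how a real controller lays out stripes; rotation of the parity block across discs is omitted):

```python
def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks -- the RAID parity operation."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]    # data blocks on three discs
parity = xor_blocks(data)             # stored on a fourth disc

# Disc 2 fails: XOR the surviving data blocks with the parity to rebuild it
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])           # True
```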
====== RAID - Other Levels ======

  * “Hybrid”-RAID
    * Mirroring between SSD and HDD (special controller needed)
  * Nested-RAID
* Abstraction of the secondary storage | * Abstraction of the secondary storage | ||
- | * After all we don't want to cope with the internals of storage devices | + | * After all we don’t want to cope with the internals of storage devices |
- | * Abstraction to Files / Directories | + | * Abstraction to Files / Directories |
* Files | * Files | ||
* Long-term existence | * Long-term existence | ||
* Sharable between processes | * Sharable between processes | ||
* Structure | * Structure | ||
- | * Files can have internal structure (e.g. databases) | + | * Files can have internal structure (e.g. databases) |
* Operations | * Operations | ||
* Create | * Create | ||
Line 693: | Line 693: | ||
  * Appending to files and writing new files is not possible when inodes run out
    * Depending on the amount of files/…
  * Show Inode Number with 'ls -i'
  * Show content with stat
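Besides 'ls -i' and stat, the inode metadata is also reachable programmatically; a small Python example using os.stat (the path "." is just a convenient stand-in for any existing file or directory):

```python
import os

# os.stat() returns the fields the kernel keeps in the inode
st = os.stat(".")
print(st.st_ino)                # the inode number, same as `ls -di .`
print(st.st_nlink, st.st_size)  # link count and size, also stored in the inode
```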
====== File System - Organization (I) ======

{{.:fs1.png}} (Operating Systems 7th Edition, W. Stallings, Chapter 12)

====== File System - Organization (II) ======
====== File System - Organization (IV) ======

{{.:fs2.png}} (Operating Systems 7th Edition, W. Stallings, Chapter 12)

====== Unix File Management ======
  * Contains list of file names plus pointers to their index nodes
  * Special
    * Map physical devices to files (e.g. /dev/sda for the first SAS/SATA disc)
  * Named pipes
    * Buffers received data
* Different File Systems under Linux | * Different File Systems under Linux | ||
- | * ext2, ext3, ext4, reiserfs, xfs, nfs, ... | + | * ext2, ext3, ext4, reiserfs, xfs, nfs, … |
* VFS provides a virtual file system | * VFS provides a virtual file system | ||
* Presents a single, uniform interface to user processes | * Presents a single, uniform interface to user processes | ||
* Mapping between VFS and underlying file system | * Mapping between VFS and underlying file system | ||
====== Storage Networks - NAS vs. SAN ======

  * NAS - Network Attached Storage
    * Centralized File Storage
    * Exporting File-Systems for clients to mount
    * Connected via Network (Ethernet, Infiniband, …)
    * File System Layers of the server are used
    * User/Access Management at the server
  * Server replies to requests before they have been committed to stable storage
  * Usually less error prone than client side caching, because servers are usually in a more controlled environment than clients
    * e.g. Uninterruptible Power Supply, BBU Units on RAID controllers, …
  * Problems arise
    * Server crashes before data is transferred from client to server

    * Network connection breaks
    * Power outages
    * etc.
====== Parallel File Systems - BeeGFS ======

  * Holds the Metadata information of the file system
    * Should be stored on fast storage devices (e.g. Flash Memories)
    * Metadata accesses are mostly tiny random accesses
  * Ext4 seems to be the fastest filesystem to store BeeGFS metadata

  * ACLs
  * Extended attributes
  * …
====== BeeGFS - Metadata Server ======

  * Entry point of the file system
    * Defined entry point in the directory tree
    * Additional directories are assigned randomly to other MDS
  * Laserdisc / CD-ROM / DVD / Blu-ray
  * Magnetic
  * Magneto-Optical
  * Different Parallel File Systems
    * GPFS
  * Sequential I/O can even be done in parallel from multiple nodes to further improve the throughput
  * Highly parallelized Random calls will result in degraded storage performance for ALL users and can even lead to an unresponsive storage system.
  * Don’t do random I/O on storage devices

Thank you for your attention