====== Storage Technologies - Welcome ======

“Computers are like Old Testament gods; lots of rules and no mercy.” (Joseph Campbell)

<!-- “Simple things should be simple. Complicated things should be possible.” (Alan Kay) -->

<!-- “Computer science is not about machines, in the same way that astronomy is not about telescopes. […] Science is not about tools, it is about how we use them and what we find out when we do.” (Michael R. Fellows) -->
====== Storage Technologies - Contents ======

====== Storage Technologies - Hardware Basics ======

{{.:node.jpg}}

====== Storage Technologies - Hardware Basics ======

  * System Bus
    * Provides Communication among processor, memory and I/O modules
    * ISA, PCI, AGP, PCI-Express, …
  * External Devices
  * I/O Controllers (HBA / RAID)
====== Memory Hierarchy (I) ======

{{.:…}}

====== Memory Hierarchy (II) ======

  * Why?
    * Trade-off among speed, cost, size, and power consumption
  * Strategies
    * Caching\\
====== Memory Hierarchy (IV) ======

  * As one goes down the hierarchy…
    * Decreasing cost per bit
    * Increasing capacity
====== Memory Access Times ======

{{.:…}}

====== Memory Access Times (II) ======

{{.:…}}

====== Memory Hierarchy (VI) ======

  * Storage Devices Typical Access Times
    * NVMe Flash Memory ~ 6 µs (@ 150’000 IOPS)
    * SAS Magnetic Disk ~ 2-3 ms (@ 350 IOPS)
    * Magnetic Tape ~ milliseconds up to many seconds
====== Caching ======

{{.:…}}

====== Storage Technologies - Cache Memory ======

====== Caching Strategies ======

  * Cache Hit vs. Cache Miss
  * Cache Hit Ratio
  * Replacement Strategies / Associativity
  * Larger Caches have a higher hit rate but also higher latency
  * Multi Level Caches
  * Data may become incoherent, particularly in multiprocessor systems
    * Write Propagation
    * Transaction Serialization (Reads/Writes)
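The hit ratio mentioned above directly determines the average access time. A minimal sketch in Python, assuming a flat two-level model where every miss pays the full memory latency (the 1 ns / 100 ns figures are illustrative values, not from the slides):

```python
def effective_access_time(hit_ratio, t_cache, t_mem):
    """Average access time of a single cache level: hits are served at
    cache speed, misses pay the (much larger) memory latency."""
    return hit_ratio * t_cache + (1.0 - hit_ratio) * t_mem

# Illustrative latencies: 1 ns cache, 100 ns main memory
print(round(effective_access_time(0.95, 1.0, 100.0), 2))  # 5.95
print(round(effective_access_time(0.99, 1.0, 100.0), 2))  # 1.99
```

Note how a hit ratio drop from 99% to 95% roughly triples the average access time, which is why locality matters so much.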
====== Storage Technologies - Why Caching Works --> Locality of reference ======

  * Memory References tend to cluster.

====== Memory Hierarchy - Recap ======

  * Problem: A CPU waiting for data can’t do any work
    * Low Memory Access Times are crucial
  * Solution: Caching/…
    * Works well with sequential data / local data access patterns
    * Won’t work with totally random access patterns (Locality)
  * As one goes down the hierarchy
    * Increasing access time
    * Decreasing throughput
  * Also known as “Von Neumann Bottleneck”
====== Storage Technologies (I) ======

  * Read/Write
  * Read Only
  * Slow Write, Fast Read (e.g. SMR Discs)
  * Accessibility
    * Random Access
    * Content addressable
====== Sequential I/O vs. Random I/O ======

  * Sequential I/O

    * Writing / Reading small chunks of data to / from random locations (Chunk Size <= 10^4 Byte)
    * Slowest way to read data from storage devices
    * Magnitude of the slow-down depends on the underlying hard- and software (e.g. Tape Drives vs. Flash Drives)
====== Hard-Drives Overview (I) ======

  * Invented in the mid 50s by IBM
  * Had the size of two refrigerators (1.9 m³)
  * Became cheap / mainstream in the late 80s
  * Today one 3.5" drive can hold up to 14TB of data
  * With future technologies even higher data densities may become possible
    * Filling disks with helium
    * Shingled Magnetic Recording
    * Heat Assisted Magnetic Recording
  * Interface
    * Serial Attached SCSI (SAS)
====== Hard-Drives Overview (II) ======

{{.:…}}

  * Video of operating harddisk on wikipedia: https://…
    * The time it takes to move the head to the correct track
  * Rotational Delay
    * The time it takes until the platter rotates to the correct position (~2 ms @ 15’000 rpm)
    * r … rotation speed in revolutions per second
    * $T_{rdelay} = 1/(2r)$
  * Transfer time calculation
    * T … Transfer Time
    * b … Number of bytes to be transferred
    * N … Number of bytes per track
    * r … rotation speed in revolutions per second
    * $T = b/(rN)$
  * Average Access Time
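The two formulas above can be checked numerically. A short Python sketch (the function names are mine, not from the slides):

```python
def rotational_delay_ms(rpm):
    """Average rotational delay T_rdelay = 1/(2r): on average the platter
    must turn half a revolution before the wanted sector arrives."""
    r = rpm / 60.0                    # revolutions per second
    return 1000.0 / (2.0 * r)

def transfer_time_ms(b, n_bytes_per_track, rpm):
    """Transfer time T = b/(rN) for b bytes on a track holding N bytes."""
    r = rpm / 60.0
    return 1000.0 * b / (r * n_bytes_per_track)

print(rotational_delay_ms(15000))              # 2.0 ms, as stated above
print(transfer_time_ms(256000, 256000, 7500))  # one full track: 8.0 ms
```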
  * Do not rely on average values!
  * Given a Harddisk with
    * Rotational speed of 7’500 rpm
    * 512 byte sectors
    * 500 sectors per track
  * We read a file that is stored on 5 adjacent tracks in 2500 sectors (1.28 MBytes)
    * Average seek … 4 ms
    * Rotational delay … 4 ms
    * Read 500 sectors … 8 ms
  * We need to do this 5 times (1 for each track), but because of sequential organization we can skip the seek time for the consecutive tracks
  * $T_{total} = 16 + (4 * 12) = 64 ms$
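The 64 ms result can be reproduced step by step; a quick sanity check in Python:

```python
# Sequential read: 2500 sectors on 5 adjacent tracks (7'500 rpm, 500 sectors/track)
rpm = 7500
r = rpm / 60.0                       # 125 revolutions per second
seek_ms = 4.0                        # average seek time (given on the slide)
rot_delay_ms = 1000.0 / (2.0 * r)    # half a revolution: 4.0 ms
track_read_ms = 1000.0 / r           # all 500 sectors of one track: 8.0 ms

first_track = seek_ms + rot_delay_ms + track_read_ms   # 16 ms
next_tracks = 4 * (rot_delay_ms + track_read_ms)       # 4 * 12 ms, no seek needed
print(first_track + next_tracks)     # 64.0
```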
  * Given a Harddisk with
    * Rotational speed of 7’500 rpm
    * 512 byte sectors
    * 500 sectors per track
  * We read a file that is distributed randomly over 2500 sectors of the disk
    * Average seek … 4 ms
    * Rotational delay … 4 ms
    * Read 1 sector … 0.016 ms
  * We need to do this 2500 times with a seek after each sector
  * $T_{total} = 2'500 * 8.016 = 20'040 ms = 20.04 s$
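The same disk, but with every sector paying the full seek and rotational delay; a quick check of the 20.04 s result:

```python
# Random read: 2500 single sectors, seek + rotational delay before each one
seek_ms = 4.0
rot_delay_ms = 4.0
sector_read_ms = 8.0 / 500           # 1 of 500 sectors per 8 ms revolution: 0.016 ms

per_sector_ms = seek_ms + rot_delay_ms + sector_read_ms   # 8.016 ms
total_s = 2500 * per_sector_ms / 1000.0
print(round(total_s, 2))             # 20.04
```

That is a factor of about 300 versus the 64 ms sequential layout of the identical amount of data.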
  * Is composed of one or more chips
  * Chips are segmented into planes
    * Planes are segmented into thousands (e.g. 2048) of blocks
    * And 1 or 2 registers of the page size for buffering
  * A Block usually contains 64 to 128 pages
    * Each page has a data part (few KBytes)
    * and a small metadata area (e.g. 128 bytes) for Error Correcting Code
  * Exact specification varies across different memory packages
    * Can take up to 5 ms
  * Limited number of erase cycles
    * SLC: 100’000
    * MLC: 10’000
  * Some flash memory is reserved to replace bad blocks
  * Controller takes care of wear leveling
  * Format the memory
  * Mark bad blocks
  * Moves data and picks blocks so that single cells don’t wear out
  * Parallelizes read/write calls
  * Addresses in Blocks/Pages
  * Needs more CPU utilization to get full throughput
  * Needs some pressure (multiple calls) to fully parallelize read/write calls
    * This is also called …

====== (NVMe) Solid State Drives - IOPS vs. Throughput ======
  * Traditional discs have been measured in terms of throughput

  * $Throughput = IOPS * Blocksize$
    * Where blocksize can be chosen freely
  * So if we know the blocksize that was used in benchmarking…
    * We can calculate IOPS from throughput and vice versa
    * (If we assume that the disk was empty at the beginning of the benchmark and no additional controller overhead was involved)
  * Given an Intel DC P3700 SSD with a capacity of 2 TB, specification says:
    * Sequential Read 450’000 IOPS with 2800 MB/s of max throughput
    * $2'800'000'000 = 450'000 * x$
    * $x = 6222 Bytes ~ 6 KByte$

  * How would a traditional HDD perform under these conditions?
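The benchmark block size falls straight out of the $Throughput = IOPS * Blocksize$ relation; a quick check in Python (reading 2800 MB/s as 2.8 * 10^9 Bytes/s):

```python
# Back out the block size hidden in the P3700 spec-sheet numbers
throughput = 2800 * 10**6     # 2800 MB/s in bytes per second
iops = 450_000                # sequential-read IOPS from the data sheet
blocksize = throughput / iops
print(int(blocksize))         # 6222  -> roughly 6 KByte per I/O
```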
====== (NVMe) Comparison SSD vs. HDD ======

  * On the previous slide we saw that a block size of 8 KByte on our SSD leads to a throughput of 2800 MB/s
  * Let’s try this with a HDD
    * Rotational speed of 7’500 rpm
    * 512 byte sectors
    * 500 sectors per track

====== (NVMe) Comparison SSD vs. HDD ======

  * We read a file that is distributed randomly over 2500 sectors of the disk in blocks of 8 KByte (blocks of 16 sectors)
    * Average seek … 4 ms
    * Rotational delay … 4 ms
    * Read 16 sectors … 0.256 ms
  * We need to do this 156.25 times with a seek after every block of 16 sectors
  * $T_{total} = 156.25 * 8.256 = 1290 ms = 1.290 s$
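The 1.29 s figure follows the same per-access accounting as the earlier HDD examples, just with 16-sector blocks:

```python
# Random read in 8 KByte blocks (16 sectors of 512 B) on the 7'500 rpm disk
seek_ms = 4.0
rot_delay_ms = 4.0
block_read_ms = 16 * (8.0 / 500)     # 16 sectors of 0.016 ms each: 0.256 ms

per_block_ms = seek_ms + rot_delay_ms + block_read_ms   # 8.256 ms
blocks = 2500 / 16                   # 156.25 block-sized accesses
total_s = blocks * per_block_ms / 1000.0
print(round(total_s, 2))             # 1.29 -> seconds for 1.28 MByte
```

At 2800 MB/s the SSD moves the same 1.28 MByte in well under a millisecond, so the HDD is slower by several orders of magnitude on this workload.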
  * In Q4 2016, 45 million SSDs with a total capacity of 16 Exabyte were delivered to customers
  * Market for HDDs is significantly bigger than for SSDs
  * New memory technologies (e.g. Intel/…)
    * Intel Optane Memories
  * Things to consider
====== Magnetic Tapes (I) ======

  * Have been in use for data storage since the 50’s
    * Main storage medium in some early computers
  * Capacity:
    * 1950’s: ~ 1 MByte
    * 1980’s: ~ 100 MByte - 1 GByte
    * 1990’s: ~ 10 - 100 GByte
    * 2000’s: ~ 100 GByte - 1 TByte
    * Now: >10 TByte
    * Future: Going up to 200 TByte per Tape seems possible
  * Linear Tape Open
    * Open standard to store data on magnetic tapes
    * Developed in the 90’s
    * Eighth Generation LTO was released in 2017
  * LTO-8 State of the art

  * Expected durability of a tape is about 10^4 end-to-end passes
====== Memory Hierarchy - cont’d ======

  * With the given storage technologies (Flash, HDD and Tape) we can refine our Memory hierarchy

  * Recently sequentially accessed files will be moved to the HDDs (they are fast enough when used in a sequential way)
  * Rarely accessed files go to the tape machine
    * We could even link these files transparently into the file system, so the user doesn’t have to know anything about where his data is located
====== Multi Tiered Storage Systems ======

  * Tier 2: SATA HDDs
  * Tier 3: Tape Storage
  * But there’s no free lunch
    * We need to store data about where our data is stored (more on metadata later on) --> Memory Overhead
    * If a user accesses a file, we need to check where the file is --> Processing Overhead
    * In the worst case we have to copy it from tape to the disk
      * which means loading the right tape and winding to the correct position
  * Add parity calculation or mirroring for redundancy
  * Different Techniques / RAID Levels available
    * RAID 0, 1, 5, 6, 10, 01, …
  * Several Discs in a RAID are called a RAID array
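The parity idea behind levels such as RAID 5 fits in a few lines: the parity block is the bytewise XOR of the data blocks, so any one lost block can be rebuilt from the survivors. A minimal sketch (not how a real controller lays out stripes; rotation of the parity block across discs is omitted):

```python
def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks -- the RAID parity operation."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]    # data blocks on three discs
parity = xor_blocks(data)             # stored on a fourth disc

# Disc 2 fails: XOR the surviving data blocks with the parity to rebuild it
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])           # True
```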
====== RAID - Other Levels ======

  * “Hybrid”-RAID
    * Mirroring between SSD and HDD (special controller needed)
  * Nested-RAID
* Abstraction of the secondary storage | * Abstraction of the secondary storage | ||
- | * After all we don't want to cope with the internals of storage devices | + | * After all we don’t want to cope with the internals of storage devices |
- | * Abstraction to Files / Directories | + | * Abstraction to Files / Directories |
* Files | * Files | ||
* Long-term existence | * Long-term existence | ||
* Sharable between processes | * Sharable between processes | ||
* Structure | * Structure | ||
- | * Files can have internal structure (e.g. databases) | + | * Files can have internal structure (e.g. databases) |
* Operations | * Operations | ||
* Create | * Create | ||
Line 693: | Line 693: | ||
  * Appending to files and writing new files is not possible when inodes run out
    * Depending on the amount of files/…
  * Show Inode Number with 'ls -i'
  * Show content with stat
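Besides 'ls -i' and stat, the inode metadata is also reachable programmatically; a small Python example using os.stat (the path "." is just a convenient stand-in for any existing file or directory):

```python
import os

# os.stat() returns the fields the kernel keeps in the inode
st = os.stat(".")
print(st.st_ino)                # the inode number, same as `ls -di .`
print(st.st_nlink, st.st_size)  # link count and size, also stored in the inode
```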
====== File System - Organization (I) ======

{{.:fs1.png}} (Operating Systems 7th Edition, W. Stallings, Chapter 12)

====== File System - Organization (II) ======
====== File System - Organization (IV) ======

{{.:fs2.png}} (Operating Systems 7th Edition, W. Stallings, Chapter 12)

====== Unix File Management ======
  * Contains list of file names plus pointers to their index nodes
  * Special
    * Map physical devices to files (e.g. /dev/sda for the first SAS/SATA disc)
  * Named pipes
    * Buffers received data
* Different File Systems under Linux | * Different File Systems under Linux | ||
- | * ext2, ext3, ext4, reiserfs, xfs, nfs, ... | + | * ext2, ext3, ext4, reiserfs, xfs, nfs, … |
* VFS provides a virtual file system | * VFS provides a virtual file system | ||
* Presents a single, uniform interface to user processes | * Presents a single, uniform interface to user processes | ||
* Mapping between VFS and underlying file system | * Mapping between VFS and underlying file system | ||
====== Storage Networks - NAS vs. SAN ======

  * NAS - Network Attached Storage
    * Centralized File Storage
    * Exporting File-Systems for clients to mount
    * Connected via Network (Ethernet, Infiniband, …)
    * File System Layers of the server are used
    * User/Access Management at the server
  * Server replies to requests before they have been committed to stable storage
  * Usually less error prone than client side caching, because servers are usually in a more controlled environment than clients
    * e.g. Uninterruptible Power Supply, BBU Units on RAID controllers, …
  * Problems arise
    * Server crashes before data is transferred from client to server

    * Network connection breaks
    * Power outages
    * etc.
====== Parallel File Systems - BeeGFS ======

  * Holds the Metadata information of the file system
    * Should be stored on fast storage devices (e.g. Flash Memories)
    * Metadata accesses are mostly tiny random accesses
  * Ext4 seems to be the fastest filesystem to store BeeGFS metadata

  * ACLs
  * Extended attributes
  * …
====== BeeGFS - Metadata Server ======

  * Entry point of the file system
    * Defined entry point in the directory tree
    * Additional directories are assigned randomly to other MDS
  * Laserdisc / CD-ROM / DVD / Blu-ray
  * Magnetic
  * Magneto-Optical
  * Different Parallel File Systems
    * GPFS
  * Sequential I/O can even be done in parallel from multiple nodes to further improve the throughput
  * Highly parallelized Random calls will result in degraded storage performance for ALL users and can even lead to an unresponsive storage system.
  * Don’t do random I/O on storage devices

Thank you for your attention