====== Storage infrastructure ======
  
  * Article written by Siegfried Reinwald (VSC Team) <html><br></html>(last update 2019-01-15 by sh).
  
  
====== Storage hardware VSC-3 ======
  
  * Storage on VSC-3
    * 10 Servers for ''%%$HOME%%''
    * 8 Servers for ''%%$GLOBAL%%''
    * 16 Servers for ''%%$BINFS%%'' / ''%%$BINFL%%''
    * ~ 800 spinning disks
    * ~ 100 SSDs
    * ''%%$BINFL%%''
  * For different purposes
    * Random I/O
    * Small Files
    * Huge Files / Streaming Data
  
====== Storage performance ======
  
{{.:vsc3_storage_performance.png}}
  
====== The HOME Filesystem (VSC-3) ======
  
  * Use for non-I/O-intensive jobs
  * Basically NFS exports over InfiniBand (no RDMA)
  * Logical volumes of projects are distributed among the servers
    * Each logical volume belongs to 1 NFS server (see the quick check below)
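Since each project volume is exported by a single NFS server, a quick way to see which server and volume currently back your ''%%$HOME%%'' is a plain ''%%df%%'' (standard Linux tooling, nothing VSC-specific):

<code>
# Show the NFS export (server:/volume), size and usage backing $HOME
df -h $HOME
</code>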

====== The GLOBAL filesystem ======

  * Quota can be increased on request (subject to availability)
  * BeeGFS Filesystem
  * Accessible via the ''%%$GLOBAL%%'' and ''%%$SCRATCH%%'' environment variables
    * ''%%$GLOBAL%%'' … ///global/lv70XXX/username//
    * ''%%$SCRATCH%%'' … ///scratch//
  * Check quota
  
<code>
beegfs-ctl --getquota --cfgFile=/etc/beegfs/global3.d/beegfs-client.conf --gid 70XXX
</code>
 +<code>
 +VSC-3 > beegfs-ctl --getquota --cfgFile=/etc/beegfs/global3.d/beegfs-client.conf --gid 70824
 +      user/group     ||           size          ||    chunk files    
 +     name      id  ||    used    |    hard    ||  used    hard   
 +--------------|------||------------|------------||---------|---------
 +        p70824| 70824||      0 Byte|  500.00 GiB||        0|   100000
 +
 +</code>
====== The BINFL filesystem ======
  
  * Quota can be increased on request (subject to availability)
  * BeeGFS Filesystem
  * Accessible via the ''%%$BINFL%%'' environment variable
    * ''%%$BINFL%%'' … ///binfl/lv70XXX/username//
  * Also available on VSC-4
  * Check quota
  
<code>
beegfs-ctl --getquota --cfgFile=/etc/beegfs/hdd_storage.d/beegfs-client.conf --gid 70XXX
</code>
 +<code>
 +VSC-3 > beegfs-ctl --getquota --cfgFile=/etc/beegfs/hdd_storage.d/beegfs-client.conf --gid 70824
 +      user/group     ||           size          ||    chunk files    
 +     name      id  ||    used    |    hard    ||  used    hard   
 +--------------|------||------------|------------||---------|---------
 +        p70824| 70824||    5.93 MiB|   10.00 GiB||      574|  1000000
 +
 +</code>
====== The BINFS filesystem ======
  
  * Quota can be increased on request (subject to availability)
  * BeeGFS Filesystem
  * Accessible via the ''%%$BINFS%%'' environment variable
    * ''%%$BINFS%%'' … ///binfs/lv70XXX/username//
  * Also available on VSC-4
  * Check quota
  
<code>
beegfs-ctl --getquota --cfgFile=/etc/beegfs/nvme_storage.d/beegfs-client.conf --gid 70XXX
</code>
 +<code>
 +VSC-3 > beegfs-ctl --getquota --cfgFile=/etc/beegfs/nvme_storage.d/beegfs-client.conf --gid 70824
 +      user/group     ||           size          ||    chunk files    
 +     name      id  ||    used    |    hard    ||  used    hard   
 +--------------|------||------------|------------||---------|---------
 +        p70824| 70824||      0 Byte|    2.00 GiB||        0|     2000
 +
 +</code>
====== The TMP filesystem ======
  
  * Disadvantages
    * Space is consumed from main memory <html><!--* Alternatively the mmap() system call can be used
  * Keep in mind, that mmap() uses lazy loading
  * Very small files waste main memory (memory mapped files are aligned to page-size)--></html>
  * Accessible with the ''%%$TMPDIR%%'' environment variable (see the sketch below)
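A minimal sketch of how ''%%$TMPDIR%%'' might be used inside a job script (file and program names are placeholders; the copy-back step matters because data in the in-memory filesystem does not persist after the job):

<code>
# Stage input into the in-memory filesystem, work there, then save the results
cp $HOME/input.dat $TMPDIR/
cd $TMPDIR
./my_io_heavy_program input.dat output.dat
cp output.dat $GLOBAL/
</code>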

====== Storage hardware VSC-4 ======

  * Storage on VSC-4
    * 1 Server for ''%%$HOME%%''
    * 6 Servers for ''%%$DATA%%''
    * 720 spinning disks
    * 16 NVMe flash drives

====== The HOME Filesystem (VSC-4) ======

  * Use for software and job scripts
  * Default quota: 100 GB
  * Accessible with the ''%%$HOME%%'' environment variable (VSC-4)
    * /home/fs70XXX/username
  * Also available on VSC-3
    * /gpfs/home/fs70XXX/username
  * Check quota
 +
<code>
mmlsquota --block-size auto -j home_fs70XXX home
</code>
<code>
VSC-4 > mmlsquota --block-size auto -j home_fs70824 home
                         Block Limits                                    |     File Limits
Filesystem type         blocks      quota      limit   in_doubt    grace |    files   quota    limit in_doubt
home       FILESET       63.7M       100G       100G          0     none |     3822 1000000  1000000        0 

</code>
====== The DATA Filesystem ======

  * Use for all kinds of I/O
  * Default quota: 10 TB
    * Extension can be requested
  * Accessible with the ''%%$DATA%%'' environment variable (VSC-4)
    * /data/fs70XXX/username
  * Also available on VSC-3
    * /gpfs/data/fs70XXX/username
  * Check quota
 +
<code>
mmlsquota --block-size auto -j data_fs70XXX data
</code>
<code>
VSC-4 > mmlsquota --block-size auto -j data_fs70824 data
                         Block Limits                                    |     File Limits
Filesystem type         blocks      quota      limit   in_doubt    grace |    files   quota    limit in_doubt 
data       FILESET               9.766T     9.766T          0     none |       14 1000000  1000000        0 

</code>
====== Backup policy ======

  * Backup of user files is **solely the responsibility of each user** (see the sketch after this list)
    * [[https://service.vsc.ac.at/slides/introduction-to-vsc/02_connecting_to_VSC/connecting_to_VSC.html#(21)|How to back up my files]]
  * Backed-up filesystems:
    * ''%%$HOME%%'' (VSC-3)
    * ''%%$HOME%%'' (VSC-4)
    * ''%%$DATA%%'' (VSC-4)
  * Backups are performed on a best-effort basis
    * Full backup run: ~3 days
  * Backups are used for **disaster recovery only**
  * The project manager can exclude the ''%%$DATA%%'' filesystem from backup
    * [[https://service.vsc.ac.at/|service.vsc.ac.at]]
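Since backups are for disaster recovery only, it is advisable to keep your own copies of important data. One possible approach is a plain ''%%rsync%%'' pull from the cluster to a machine of your own (a sketch only; the login host and local path are placeholders, and the linked slides describe the recommended procedure):

<code>
# Run on your own machine: copy a project directory from VSC-4 to a local backup folder
rsync -avz username@vsc4.vsc.ac.at:/data/fs70XXX/username/ /local/backup/vsc_data/
</code>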
  
====== Storage exercises ======
  
In these exercises we measure the performance of the different storage targets on VSC-3. For that we use the “IOR” application (https:%%//%%github.com/LLNL/ior), a standard benchmark for distributed storage systems.

For these exercises, “IOR” has been built with gcc-4.9 and openmpi-1.10.2, so load these two modules first:
  
<code>
module purge
module load gcc/4.9 openmpi/1.10.2
</code>
Now copy the storage exercises to a folder of your own.
<code>
mkdir my_directory_name
cd my_directory_name
cp -r ~training/examples/08_storage_infrastructure/*Benchmark ./
</code>
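After the copy, your directory should contain (at least) the two benchmark folders used below:

<code>
ls
# 01_SequentialStorageBenchmark  02_RandomioStorageBenchmark
</code>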
Keep in mind that the results will vary because other users are also working on the storage targets.
  
====== Exercise 1 - Sequential I/O ======
  
We will now measure the sequential I/O performance of the different storage targets on VSC-3.
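Under the hood, the provided job scripts run IOR in parallel against each storage target. A minimal sketch of what such a sequential write test could look like (the flags, sizes and output path are illustrative assumptions, not the exact settings of the course scripts):

<code>
# Sequential write test: 8 MPI processes, 1 MiB transfers, 1 GiB per process, one file per process
mpirun -np 8 ./ior -w -F -t 1m -b 1g -o $GLOBAL/ior_testfile
</code>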
  
  - With one process
  
<code>
cd 01_SequentialStorageBenchmark
# Submit the job
sbatch 01a_one_process_per_target.slrm
# Inspect corresponding slurm-*.out files
</code>
  - With 8 processes
  
<code>
# Submit the job
sbatch 01b_eight_processes_per_target.slrm
# Inspect corresponding slurm-*.out files
</code>
Take your time and compare the outputs of the two runs. What conclusions can be drawn about the storage targets on VSC-3?
  
====== Exercise 1 - Sequential I/O performance discussion ======
  * For which storage targets does performance improve with the number of processes? Why?
  * What could you do to further improve the sequential write throughput? What could be a problem with that?
  * Bonus Question: ''%%$TMPDIR%%'' seems to scale pretty well with the number of processes although it is an in-memory filesystem. Why is that happening?
  
  
----
  
====== Exercise 2 - Random I/O ======
  
We will now measure the storage performance for tiny 4-kilobyte random writes.
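As in Exercise 1, the job scripts wrap an IOR run, this time with small transfers at random offsets. A sketch of what such an invocation could look like (illustrative flags and sizes, not the exact course settings):

<code>
# Random write test: 4 KiB transfers at random offsets (-z), one file per process
mpirun -np 8 ./ior -w -z -F -t 4k -b 64m -o $BINFS/ior_random_testfile
</code>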

  - With one process
  
<code>
cd 02_RandomioStorageBenchmark
# Submit the job
sbatch 02a_one_process_per_target.slrm
# Inspect corresponding slurm-*.out files
</code>
  - With 8 processes

<code>
# Submit the job
sbatch 02b_eight_processes_per_target.slrm
# Inspect corresponding slurm-*.out files
</code>
Take your time and compare the outputs of the two runs. Do additional processes speed up the I/O?
  
Now compare your results to the sequential runs in Exercise 1. What can be concluded about random I/O versus sequential I/O on the VSC-3 storage targets?
  
====== Exercise 2 - Random I/O performance discussion ======
  
----