The $GLOBAL (and $SCRATCH) system has been decommissioned. Use $BINFL or $DATA (if the project also exists on VSC4) instead.

This article is about the $GLOBAL and $HOME filesystems of VSC-3. If you are looking for information about the bioinformatics storage, see the separate article on that topic.

VSC-3 provides three facilities for storing data: the high-performance BeeGFS parallel filesystem (formerly the Fraunhofer Parallel Filesystem, FhGFS), the Network File System (NFS), and a node-local RAM disk. They are accessible under:

  • NFS: $HOME, which expands to /home/lv<project>/<username>
  • BeeGFS (formerly FhGFS): decommissioned
    • $GLOBAL, which expanded to /fhgfs/global/lv<project>/<username>
    • $SCRATCH, which expanded to /fhgfs/<node> (node-local)
  • Scratch RAM disk: $TMPDIR, which expands to a directory under /tmp

$HOME is the location of the user's UNIX home directory. It can be accessed from login and compute nodes. $HOME can hold results, settings, source code, etc.: data for which high concurrent job throughput and support for large file sizes are not required. Conversely, the parallel BeeGFS filesystem (see below) should be used to persist temporary data in compute runs.

Backup of $HOME is the user's responsibility.

$HOME is provided by file servers with disk arrays and exported over the Network File System (NFS). Even on highly scaled storage such as that of VSC-3, the number of concurrent file operations is bounded by the physics of spinning disks: small-file (write) operations can easily saturate capacity. Please keep in mind that $HOME is a resource shared across all projects on a given NFS server. If your project needs to store a large number of small files, please contact VSC administration in advance.

$TMPDIR provides a small, ephemeral RAM disk of up to 50% of node RAM, e.g. 32 GB on a 64 GB node. It suits very fast local access restricted to a single node, especially for many small files. The RAM disk does not have to be requested explicitly in jobs; it grows with its file contents, and its usage is subtracted from available memory. Within a job, $TMPDIR expands to a job-specific directory under /tmp, so please do not hardcode /tmp. Directories in $TMPDIR are purged after job execution.

$ echo $TMP -- $TMPDIR
/tmp/123456.789.queue.q -- /tmp/123456.789.queue.q
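A job can keep its many small intermediate files on the RAM disk and copy only the final result to persistent storage. The following is a minimal sketch of that pattern, assuming a POSIX shell and a mktemp that honours $TMPDIR; the function name run_in_tmpdir, the file names, and the placeholder workload are hypothetical and only serve as illustration.

```shell
#!/bin/sh
set -eu

# run_in_tmpdir OUTDIR   (hypothetical helper, not part of VSC-3)
# Creates scratch files on the RAM disk, reduces them to one result file,
# and copies only that file to OUTDIR (persistent storage).
run_in_tmpdir() {
    outdir=$1
    # mktemp -d creates the directory under $TMPDIR when it is set,
    # so /tmp never has to be hardcoded.
    workdir=$(mktemp -d)
    mkdir -p "$outdir"

    # Placeholder workload: many small intermediate files.
    i=1
    while [ "$i" -le 100 ]; do
        echo "sample $i" > "$workdir/part_$i.txt"
        i=$((i + 1))
    done

    # Keep only the combined result.
    cat "$workdir"/part_*.txt > "$workdir/combined.txt"
    cp "$workdir/combined.txt" "$outdir/"

    # Release the memory right away; RAM disk usage counts against node RAM.
    rm -rf "$workdir"
}

# Demo call with a throwaway destination; a real job would pass a
# directory under $HOME instead.
demo_out=$(mktemp -d)
run_in_tmpdir "$demo_out"
```

Removing the working directory explicitly is optional, since directories in $TMPDIR are purged after the job, but it frees memory for later steps of the same job.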

Disk quotas are set per project. Users within a project share the quota.
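Because the quota is shared, it can help to check how much space and how many files a directory tree holds before starting large runs. The sketch below uses only the standard du and find tools; it does not query the quota itself, and the function name report_usage and the demo directory are hypothetical.

```shell
# report_usage DIR   (hypothetical helper)
# Prints the disk usage in KiB and the number of regular files below DIR.
report_usage() {
    dir=$1
    # du -sk prints "<KiB><TAB><path>"; keep the first field only.
    kib=$(du -sk "$dir" | cut -f1)
    # Count regular files; tr strips padding that some wc versions add.
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo "$dir: $kib KiB in $files files"
}

# Demo on a throwaway directory; on the cluster one would pass "$HOME".
demo_dir=$(mktemp -d)
echo "a" > "$demo_dir/one.txt"
echo "b" > "$demo_dir/two.txt"
echo "c" > "$demo_dir/three.txt"
report_usage "$demo_dir"
```

Note that walking a very large tree with find is itself a burst of metadata operations, so it is best run sparingly on shared storage.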

Storage extensions can be requested through the Vergabeassistent under Extensions - Storage.

The storage resources underlying NFS and BeeGFS (formerly FhGFS) are shared. Please use BeeGFS primarily for large, I/O-intensive runs. There is no hard limit on the number of files per run or per project. However, creating or operating on O(10E5) or more files is strongly discouraged. If a code requires millions of (small) files, please contact system operation in advance, as this can degrade performance for other users.
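One common way to keep file counts low is to pack small files into a single archive before they reach the shared filesystem. The following is a minimal sketch, assuming standard tar with gzip support and a POSIX shell; the function name pack_small_files, the file names, and the demo directories are hypothetical.

```shell
# pack_small_files SRC_DIR ARCHIVE   (hypothetical helper)
# Packs everything under SRC_DIR into one compressed tar archive, so the
# shared filesystem sees a single large file instead of many small ones.
pack_small_files() {
    src_dir=$1
    archive=$2
    tar -czf "$archive" -C "$src_dir" .
}

# Demo with throwaway directories; in a job, the source would typically
# live in $TMPDIR and the archive would go to persistent storage.
src=$(mktemp -d)
dest=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    echo "record $i" > "$src/record_$i.dat"
    i=$((i + 1))
done
pack_small_files "$src" "$dest/records.tar.gz"

# List the archive contents and count the packed data files.
packed=$(tar -tzf "$dest/records.tar.gz" | grep -c '\.dat$')
echo "packed $packed files"
```

Individual files can later be extracted from the archive on a node-local disk, so the small-file operations never hit the shared servers.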

Parallel filesystems used in large-scale computing are unlike desktop file systems. Contact VSC staff when planning I/O-intensive computations. VSC can also help with designing one-time and recurrent large data ingress and egress pipelines, recurrent large data transfer workflows, and optimizing codes for parallel I/O.

Backup of user files, regardless of location, is solely the responsibility of each user.

VSC-3 NFS and BeeGFS (formerly FhGFS) servers use RAID-6, which can sustain up to two disks failing concurrently. The data path is otherwise not redundant. Data loss may also occur due to failure modes including, but not limited to, natural disaster, cooling failure, disk controller failure, and filesystem software faults.

User data on VSC-3 is not backed up.

  • doku/vsc3_storage.txt
  • Last modified: 2021/08/23 08:52
  • by goldenberg