====== Documentation for VSC-2 ======

==== Quick Start ====
  - Log in to your university's dedicated login node:<code>
# Uni Wien
ssh <username>@vsc2.univie.ac.at

# TU Wien
ssh <username>@vsc2.tuwien.ac.at

# Boku Wien
ssh <username>@vsc2.boku.ac.at
</code>
  - Transfer your programs and data/input files to your home directory.
  - (Re-)Compile your application. Please choose your MPI environment as described in the //MPI Version// section below.
  - Write a job script for your application:<code>
#$ -pe mpich <number_of_slots>
#$ -V
mpirun -machinefile $TMPDIR/machines -np $NSLOTS <executable>
</code> The option ''#$ -V'' passes your complete login environment to the job. If this leads to error messages like<code>
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'</code> export only the variables you actually need, e.g.<code>
#$ -v LD_LIBRARY_PATH
</code>
The option ''#$ -m beas'' requests mail for all job events (begin, end, abort, suspend). A complete example job script:<code>
#$ -N hitchhiker
#$ -pe mpich 32
#$ -V
#$ -M my.name@example.com
#$ -m be
#$ -l h_rt=03:00:00

mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./my_mpi_program
</code>
  - Submit your job:\\ <code>qsub <job_script></code>
  - Check if and where your job has been scheduled:\\ <code>qstat</code>
  - Inspect the job output. Assuming your job was assigned the id "42" and the name "hitchhiker", the output files are:<code>
hitchhiker.o42
hitchhiker.e42
hitchhiker.po42
hitchhiker.pe42</code>
  - Delete jobs:\\ <code>qdel <job_id></code>
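For orientation, a complete submit-check-delete session might look like this; job id, user name, date, and queue instance are illustrative, and the output format shown is that of standard Grid Engine:
<code>
$ qsub hitchhiker.sh
Your job 42 ("hitchhiker") has been submitted

$ qstat
job-ID  prior    name        user    state  submit/start at      queue        slots
    42  0.50000  hitchhiker  hhgttg  r      02/01/2022 23:10:00  all.q@n0042  32

$ qdel 42
hhgttg has deleted job 42
</code>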
+ | |||
+ | ==== Queues ==== | ||
+ | |||
+ | === Standard Queue (all.q) === | ||
+ | |||
+ | The majority of jobs use the standard queue ''' | ||
+ | |||
+ | == Node types == | ||
+ | |||
+ | All nodes are configured equivalently, | ||
+ | VSC-2 has | ||
+ | * 1334 nodes with 32GB main memory | ||
+ | * 8 nodes with 64 GB main memory | ||
+ | * 8 nodes with 128 GB main memory | ||
+ | * 2 nodes with 256 GB main memory and 64 cores (not in ''' | ||
Jobs are scheduled by default to request nodes with at least 27 GB free memory. To override this default on the command line or in the job script you may specify (a job script example follows below):
  * to allow scheduling on the 32 GB, 64 GB and 128 GB nodes
    * this is the default
  * to allow scheduling on the 64 GB and 128 GB nodes
    * ''qsub -l mem_free=60G <job_script>''
    * ''#$ -l mem_free=60G''
  * to allow scheduling on 128 GB nodes only:
    * ''qsub -l mem_free=120G <job_script>''
    * ''#$ -l mem_free=120G''
  * to allow scheduling on 256 GB nodes only:
    * see below: High Memory Queue

In order to avoid jobs with low memory requirements occupying the nodes with 64 or 128 GB, priority adjustments are made in the queue.
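A minimal sketch of a job script that restricts scheduling to the larger-memory nodes; job name and program name are illustrative:
<code>
#$ -N bigmem_job
#$ -pe mpich 16
#$ -V
#$ -l mem_free=60G    # allow only the 64 GB and 128 GB nodes

mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./my_mpi_program
</code>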
+ | |||
+ | === Long Queue === | ||
+ | |||
+ | A queue where jobs will be allowed to run a maximum of 7 days | ||
+ | is available on VSC-2. The limit on the number of slots per job | ||
+ | is 128 and the maximum number of allocatable slots per user | ||
+ | at one time is 768. A total of 4096 slots are available for long jobs. | ||
+ | All nodes of this queue have 32GB main memory. | ||
+ | Use this queue by specifying it explicitly in your job script: | ||
+ | < | ||
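A sketch of a long-queue job script staying within the limits stated above; job name, run time, and program name are illustrative:
<code>
#$ -N long_job
#$ -q long.q
#$ -pe mpich 128        # at most 128 slots per job in the long queue
#$ -l h_rt=168:00:00    # up to 7 days
#$ -V

mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./my_mpi_program
</code>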
+ | |||
+ | === High Memory Queue === | ||
+ | |||
+ | Due to higher memory requests from some users, two nodes with 256 GB memory and 64 cores | ||
+ | are available in the queue ''' | ||
+ | The four processors utilized are AMD Opteron 6274 with 2.2GHz and 16 cores each. | ||
+ | These nodes show a sustained performance of about 400 GFlop/s, which compares to about four standard nodes of the VSC-2. | ||
+ | |||
+ | Due to the special memory requirements of jobs in this queue, jobs are granted exclusive access. | ||
+ | 64 slots are accounted for, even if the job does not make efficient use of all 64 cores. | ||
+ | Make sure to adapt your job script to pin processes to cores | ||
+ | < | ||
+ | if applicable. | ||
+ | |||
+ | The run time limit is 3 days (72 hours). | ||
+ | |||
+ | Programs which work in the ''' | ||
+ | Intel compilers and Intel MPI show good behaviour on the ''' | ||
+ | |||
+ | Please use this node only for jobs with memory requirements of more than 64 GB! | ||
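A sketch of a ''highmem.q'' job script with pinning; job name and program name are illustrative, and the explicit core list 0-63 is an assumption for these 64-core nodes:
<code>
#$ -N highmem_job
#$ -q highmem.q
#$ -pe mpich 64        # nodes are granted exclusively; all 64 slots are accounted for
#$ -l h_rt=72:00:00    # run time limit of this queue
#$ -V

export I_MPI_PIN_PROCESSOR_LIST=0-63   # assumed pin list: one rank per core

mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./my_mpi_program
</code>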
==== MPI Version ====

On VSC-2 several versions of MPI are available.
Choose one using ''mpi-selector'':
<code>
# list available MPI versions:
$ mpi-selector --list
impi_intel-4.1.0.024
impi_intel-4.1.1.036
intel_mpi_intel64-4.0.3.008
mvapich2_1.8_intel_limic
mvapich2_gcc-1.9a2
mvapich2_intel
openmpi-1.5.4_gcc
openmpi-1.5.4_intel
openmpi_gcc-1.6.4

# see the currently used MPI version:
$ mpi-selector --query
default:
level:user

# set the MPI version:
$ mpi-selector --set impi_intel-4.1.0.024
</code>
Modifications become active after logging in again.
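After the next login, the compiler wrappers of the selected MPI version are first in the PATH; a quick check and recompile might look like this (source file name is illustrative):
<code>
$ which mpicc mpirun    # verify that the selected MPI version is picked up
$ mpicc -O2 -o my_mpi_program my_mpi_program.c
</code>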
+ | |||
+ | ==== Scratch Directories ==== | ||
+ | |||
+ | In addition to $HOME, which is fine to use for standard jobs with rather few small files (<1000 files, overall size <1G), there are a number of specialized scratch directories. | ||
+ | |||
+ | The [[http:// | ||
+ | |||
+ | === Global Personal Scratch Directories $GLOBAL === | ||
+ | |||
+ | Please use the environment variable '' | ||
+ | < | ||
+ | $ echo $GLOBAL | ||
+ | / | ||
+ | </ | ||
+ | The directory is writeable as user and readable by the group members. It is advisable to make use of these directories in particular for jobs with heavy I/O operations. In addition it will reduce the load on the fileserver holding the $HOME directories. | ||
+ | |||
+ | The Fraunhofer parallel file system is shared by all users and by all nodes. | ||
+ | Single jobs producing heavy load (>> | ||
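A sketch of a job that stages its work in ''$GLOBAL'' instead of ''$HOME''; directory and file names are illustrative, and $JOB_ID is set by the queueing system:
<code>
WORKDIR=$GLOBAL/run_$JOB_ID
mkdir -p $WORKDIR
cp $HOME/input.dat $WORKDIR/
cd $WORKDIR

mpirun -machinefile $TMPDIR/machines -np $NSLOTS $HOME/my_mpi_program

cp results.dat $HOME/    # copy back only what is needed
</code>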
+ | |||
+ | === Per-node Scratch Directories $SCRATCH === | ||
+ | |||
+ | Local scratch directories on each node are provided as a link to the Fraunhofer parallel file system and can thus be viewed also via the login nodes as '''/ | ||
+ | The parallel file system (and thus the performance) is identical between $SCRATCH and $GLOBAL. | ||
+ | The variable '' | ||
+ | < | ||
+ | $ echo $SCRATCH | ||
+ | /scratch | ||
+ | </ | ||
+ | These directories are purged after job execution. | ||
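Because of the purging, results must be copied away before the job ends; a minimal sketch with illustrative program and file names:
<code>
cd $SCRATCH
cp $HOME/input.dat .
$HOME/my_program input.dat > output.dat
cp output.dat $GLOBAL/    # $SCRATCH is purged after the job, $GLOBAL is not
</code>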
+ | |||
+ | === Local temporary ram disk $TMPDIR === | ||
+ | |||
+ | For smaller files and very fast access, restricted to single nodes, the variables '' | ||
+ | < | ||
+ | $ echo $TMP -- $TMPDIR | ||
+ | / | ||
+ | </ | ||
+ | These directories are purged after job execution. | ||
+ | |||
+ | Please refrain from writing directly to '''/ | ||
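A sketch of using the ram disk for many small intermediate files; the program name and the assumption that it writes its temporary files to the current directory are illustrative:
<code>
cd $TMPDIR                          # node-local ram disk, fast for small files
$HOME/my_program $HOME/input.dat
tar czf $GLOBAL/tmp_results.tgz .   # pack results before the job ends
</code>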
+ | |||
+ | === Joblocal scratch directory $JOBLOCAL === | ||
+ | |||
+ | The newest, still experimental, | ||
+ | The ''' | ||
+ | < | ||
+ | -v JOBLOCAL_FILESYSTEM=TRUE | ||
+ | </ | ||
+ | All nodes within a job access the same files under '''/ | ||
+ | |||
+ | This method scales very well up to several hundred similar jobs. | ||
+ | Although the file system has 32GB, it is recommended to use only a few GB. | ||
+ | |||
+ | To save files at the job end, use, e.g., | ||
+ | < | ||
+ | cd /joblocal; tar czf ${HOME}/ | ||
+ | </ | ||
+ | in your [[prolog|user epilog]] script. | ||
+ | |||
+ | If there are many files (>> | ||
+ | |||
+ | Implementation details: '' | ||
+ | Very high performance for small files is achieved by extensive caching on the jobs master node, which acts as (job internal) NFS server. | ||
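Putting the pieces together, a job using this experimental file system might look like this (job name and program name are illustrative):
<code>
#$ -N joblocal_job
#$ -pe mpich 32
#$ -v JOBLOCAL_FILESYSTEM=TRUE    # request /joblocal on all nodes of the job
#$ -V

cd /joblocal
cp $HOME/input.dat .
mpirun -machinefile $TMPDIR/machines -np $NSLOTS $HOME/my_mpi_program
# results are saved by the user epilog script, see above
</code>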
+ | |||
+ | === Comparison of scratch directories === | ||
+ | |||
+ | | || $GLOBAL | ||
+ | | Recommended file size || large || large || small || small || | ||
+ | | Lifetime | ||
+ | | Size || x00 TB (for all users) | ||
+ | | Scaling | ||
+ | | Visibility | ||
+ | | Recommended usage || large files, available after job life || large files || small files, or many seek-operations within a file || many small files (>1000), or many seek-operations within a file || | ||
==== General recommendations ====
To make sure that the MPI communication happens via the InfiniBand fabric, please use the following settings in your job script and/or in your ''.bashrc'':
<code>
export I_MPI_DAT_LIBRARY=/...            # path to the DAT library on the system
export OMP_NUM_THREADS=1
export I_MPI_FABRICS=shm:dapl
export I_MPI_FALLBACK=0
export I_MPI_CPUINFO=proc
export I_MPI_PIN_PROCESSOR_LIST=1,...    # explicit list of cores to pin to
export I_MPI_JOB_FAST_STARTUP=0
</code>
+ | |||
+ | ==== Recommendations for various codes ==== | ||
+ | |||
+ | * [[vasp-vsc2|VASP]] | ||
+ | * [[antares|ANTARES]] | ||
+ | * [[wien2k|WIEN2k]] | ||
+ | * [[mpi-helium|MPI-Helium]] | ||
+ | * [[wrf|WRFV3]] | ||
+ | * [[gaussian09|Gaussian09]] | ||
+ | * [[sequential-codes|Sequential codes]] | ||
+ | |||
+ | ==== Recommendations for advanced users ==== | ||
+ | |||
+ | * [[fft]] libraries | ||
+ | * [[large]] jobs with more than 1024 cores | ||
+ | * [[memory]] intensive jobs requiring more than 2 GB per core | ||
+ | * [[ScaLAPACK]] compile options | ||
+ | * [[nwchem-vsc2|NWChem]] | ||
+ | * [[blas|Linking to BLAS Libraries]] | ||
+ | * user defined [[prolog|prolog and epilog]] scripts | ||
+ | |||
+ | ==== Process pinning ==== | ||
+ | |||
+ | The NUMA memory of VSC-2 is highly depending on the positioning of processes to the four '' | ||
+ | Using Intel MPI the Parameter | ||
+ | < | ||
+ | </ | ||
+ | as mentioned above should always be used to pin (up to) 16 processes to the 16 cores. | ||
+ | In the case of sequential jobs, we recommend to use ' | ||
+ | < | ||
+ | taskset -c 0 our_example_code param1 param2 >out1 & | ||
+ | taskset -c 8 our_example_code param1 param2 >out2 & | ||
+ | wait | ||
+ | </ | ||
+ | Performance gains of up to 200% were observed for synthetic benchmarks. | ||
+ | Note also the examples for [[sequential-codes|sequential jobs]]. | ||
+ | |||
+ | ==== Backup ==== | ||
+ | |||
+ | No backup on VSC. | ||
+ | |||
+ | Backup is at the responsibility of each user. | ||
+ | |||
+ | Data loss by hardware failure is prevented by using state-of-the-art technology like RAID-6. |
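A minimal sketch of a user-side backup using ''rsync''; destination host and paths are illustrative:
<code>
rsync -av $HOME/important_project/ user@backup.example.org:vsc2-backup/important_project/
</code>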