Differences
This shows you the differences between two versions of the page.
doku:runtime [2014/05/28 11:19] – ir | doku:runtime [2021/05/13 17:44] (current) – removed goldenberg | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Specifying the maximum runtime ====== | ||
- | Specifying a maximum runtime can let a job start sooner, because it reduces the time the job waits in the queue. | ||
- | === Slot reservation for large jobs === | ||
- | To ensure that large jobs are able to run on the cluster and are not blocked by many smaller jobs, the queueing system used on the VSC allows the **reservation of slots**. | ||
- | This feature is activated **automatically** **for any job requiring 16 or more slots**. | ||
- | |||
- | The downside of this feature is that some resources remain unused until all slots requested by a queued large job are available. | ||
- | === Faster scheduling by backfilling === | ||
- | While a large job is waiting to be scheduled, idle resources can be used by smaller jobs. To preserve the priorities of the jobs at the top of the list, a job may only pass them if its maximum runtime is shorter than the expected time needed to free all slots requested by the large job. | ||
- | |||
- | Users can profit from this process, called **backfilling**, by adding the following line at the top of the job submission script: | ||
- | < | ||
- | #$ -l h_rt=01: | ||
- | Here, h_rt means "hard runtime" | ||
- | |||
- | Since this feature greatly improves the use of our precious computing time by reducing the amount of idle resources, we encourage you to **specify a maximum runtime**: | ||
- | - a lot of electricity is saved, and | ||
- | - the time your jobs spend in the queue is reduced. | ||
- | |||
- | |||
- | ====== Specifying runtime limits ====== | ||
- | |||
- | In SGE jobs, two runtime limits are available: a soft limit (s_rt) and a hard limit (h_rt). | ||
- | |||
- | |||
- | h_rt specifies the time by which all parts of the job script must be finished. Any processes still running are then killed by Grid Engine, which sends a SIGUSR2 signal. | ||
- | |||
- | s_rt specifies the soft runtime limit, after which a SIGUSR1 signal is sent to the process. If s_rt is n times smaller than h_rt, SIGUSR1 is sent n times: | ||
- | |||
- | |||
- | < | ||
- | #!/bin/bash | ||
- | |||
- | #$ -N notify_test | ||
- | #$ -pe mpich 2 | ||
- | #$ -notify | ||
- | #$ -V | ||
- | #$ -l h_rt=0: | ||
- | #$ -l s_rt=0: | ||
- | |||
- | echo $TMPDIR | ||
- | |||
- | function sigusr1handler() | ||
- | { | ||
- | date | ||
- | echo " | ||
- | } | ||
- | function sigusr2handler() | ||
- | { | ||
- | date | ||
- | echo " | ||
- | } | ||
- | trap sigusr1handler SIGUSR1 | ||
- | trap sigusr2handler SIGUSR2 | ||
- | |||
- | echo " | ||
- | date | ||
- | # Start | ||
- | # -q 0: disable "MPI progress Quiescence" | ||
- | |||
- | #mpirun -q 0 -m $TMPDIR/ | ||
- | for i in {1..900} | ||
- | do | ||
- | echo " | ||
- | sleep 10 | ||
- | done | ||
- | |||
- | echo " | ||
- | date | ||
- | </ | ||
- | |||
- | |||
- | The output in the error file (*.e*) of this example job is: | ||
- | |||
- | < | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 1 | ||
- | SIGUSR1 caught by shell script | ||
- | User defined signal 2 | ||
- | SIGUSR2 caught by shell script | ||
- | </ | ||
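The handlers in the example job above only log the signals. As a minimal sketch of how the same traps might be used to save state before the hard limit, the script below writes a checkpoint on SIGUSR1 and stops cleanly on SIGUSR2. The names `CHECKPOINT_FILE`, `save_checkpoint`, `on_soft_limit` and `on_hard_limit` are hypothetical, the loop is a stand-in for a real workload, and the script sends the signals to itself so that it can be tried outside Grid Engine; in a real job the signals would come from the scheduler.

```shell
#!/bin/bash
# Sketch only: all names below are illustrative, not part of Grid Engine.

CHECKPOINT_FILE="${TMPDIR:-/tmp}/checkpoint.$BASHPID"
step=0            # last completed unit of work
usr1_count=0      # number of soft-limit warnings received
stop_requested=0  # set to 1 once the hard-limit warning arrives

save_checkpoint() {
    # Record the last completed step so that a later job could resume from it.
    echo "$step" > "$CHECKPOINT_FILE"
}

on_soft_limit() {
    # SIGUSR1: soft limit (s_rt) reached; save state and keep running.
    usr1_count=$(( usr1_count + 1 ))
    save_checkpoint
}

on_hard_limit() {
    # SIGUSR2: hard limit (h_rt) reached; save state and leave the work loop.
    save_checkpoint
    stop_requested=1
}

trap on_soft_limit SIGUSR1
trap on_hard_limit SIGUSR2

# Stand-in workload. The two kill commands simulate the scheduler's signals.
while (( stop_requested == 0 && step < 10 )); do
    step=$(( step + 1 ))
    (( step == 3 )) && kill -USR1 "$BASHPID"
    (( step == 5 )) && kill -USR2 "$BASHPID"
done

last_saved=$(cat "$CHECKPOINT_FILE")
rm -f "$CHECKPOINT_FILE"
echo "stopped after step $step, last checkpoint at step $last_saved"
```

Bash runs a trap handler only after the command that is currently executing has finished, so a long-running foreground command delays the reaction to the signal. That is one reason the example job above works in short `sleep 10` steps.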