Differences

This shows you the differences between two versions of the page.

doku:runtime [2014/05/28 11:19] ir
doku:runtime [2021/05/13 17:44] (current) – removed goldenberg
Line 1: Line 1:
-====== Specifying the maximum runtime ====== 
-Specifying a maximum runtime can get a job started sooner by reducing the time it waits in the queue.
-=== Slot reservation for large jobs === 
-To ensure that large jobs can run on the cluster and are not blocked by many smaller jobs, the queueing system used on the VSC supports the **reservation of slots**.
-This feature is **automatically** activated **for any job requiring 16 or more slots**.
- 
-The downside of this feature is that some resources remain unused until the total number of slots requested by a queued large job is available.
-=== Faster scheduling by backfilling === 
-While a large job is waiting to be scheduled, idle resources can be used by smaller jobs. To preserve the priority of the jobs at the top of the list, a smaller job may only pass them if its maximum runtime is shorter than the expected time needed to free the total number of slots requested by the large job.
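
The condition described above can be sketched as a single comparison. This is only a simplified illustration, not the scheduler's actual logic; the function name `can_backfill` and both arguments (given in seconds) are hypothetical.

```shell
#!/bin/bash
# Simplified illustration only; the real scheduler takes more into account.
# can_backfill and its two arguments (in seconds) are hypothetical names.
can_backfill() {
    local job_max_runtime=$1 time_until_slots_free=$2
    # A small job may pass only if it is guaranteed to finish
    # before the reserved slots are needed by the large job.
    if (( job_max_runtime < time_until_slots_free )); then
        echo yes
    else
        echo no
    fi
}

can_backfill 3600 7200    # one-hour job, slots needed in two hours
can_backfill 10800 7200   # three-hour job, slots needed in two hours
```

The first call prints `yes` and the second `no`. A job without a maximum runtime cannot be shown to finish in time, which is why specifying one matters.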
- 
-Users can profit from this process, called **backfilling**, by adding the following at the top of the job submission script:
-<code>#### backfilling: 
-#$ -l h_rt=01:00:00    # in this example, ONE HOUR (01:00:00) is the job's maximum run time</code> 
-Here, h_rt means "hard runtime". The job will be permitted to run __at most__ this long. Please add a little **buffer of about 10-50%** to the runtime you've estimated/measured, so the job doesn't get killed before its result is available! For example, if your last job came in at 03:57:03 and you think that the next one will be pretty similar in nature, try to specify "-l h_rt=06:00:00". 
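
The buffer can also be computed rather than estimated by eye. The following is a minimal sketch, assuming bash; the helper name `pad_runtime` and its default of 25% are hypothetical choices, not part of Grid Engine.

```shell
#!/bin/bash
# Hypothetical helper: add a percentage buffer to a measured runtime
# (HH:MM:SS) and print the result in the HH:MM:SS form used for h_rt.
pad_runtime() {
    local measured=$1 percent=${2:-25}
    local h m s total padded
    IFS=: read -r h m s <<< "$measured"
    # 10# forces base 10 so that fields such as "08" are not read as octal
    total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
    # integer arithmetic, rounded up to the next full second
    padded=$(( (total * (100 + percent) + 99) / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( padded / 3600 )) $(( padded % 3600 / 60 )) $(( padded % 60 ))
}

pad_runtime 03:57:03 50   # prints 05:55:35
```

Rounding the result up to a convenient value, such as 06:00:00 in the example above, is then a matter of taste.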
- 
-Because this feature greatly improves the use of our precious computing time by reducing the amount of idle resources, we encourage you to **specify a maximum runtime** for your jobs:
-  - a lot of electricity will be saved and
-  - the time your jobs spend in the queue is reduced.
- 
- 
-====== Specifying runtime limits ====== 
- 
-For SGE jobs, two runtime limits are available: a soft limit (s_rt) and a hard limit (h_rt).
- 
- 
-h_rt specifies the time by which all parts of the job script must have finished. Processes still running at that point are killed by Grid Engine, which first sends a SIGUSR2 signal.
- 
-s_rt specifies the soft runtime limit, after which a SIGUSR1 signal is sent to the process. If s_rt is n times smaller than h_rt, SIGUSR1 is sent n times:
- 
- 
-<code> 
-#!/bin/bash 
- 
-#$ -N notify_test 
-#$ -pe mpich 2 
-#$ -notify 
-#$ -V 
-#$ -l h_rt=0:20:00 
-#$ -l s_rt=0:02:00 
- 
-echo $TMPDIR 
- 
-function sigusr1handler()
-{
-        date
-        echo "SIGUSR1 caught by shell script" 1>&2
-}
-function sigusr2handler()
-{
-        date
-        echo "SIGUSR2 caught by shell script" 1>&2
-}
-trap sigusr1handler SIGUSR1 
-trap sigusr2handler SIGUSR2 
- 
-echo "starting:" 
-date 
-# Start 
-# -q 0: disable "MPI progress Quiescence" error message 
- 
-#mpirun -q 0 -m $TMPDIR/machines -np $NSLOTS sleep 200 
-for i in {1..900} 
-do 
- echo "waiting $i" 
- sleep 10 
-done 
- 
-echo "finished:" 
-date 
-</code> 
- 
- 
-The error file (*.e*) of this example job contains:
- 
-<code> 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 1 
-SIGUSR1 caught by shell script 
-User defined signal 2 
-SIGUSR2 caught by shell script 
-</code> 
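
The handlers in the example above only log each signal. A job could instead treat SIGUSR1 as a request to stop cleanly before the hard limit is reached. The following is a minimal sketch of that idea, assuming bash 4 or later and work that can be split into steps; `run_steps` and `stop_requested` are hypothetical names, and the signal is sent by the script to itself so the sketch can be tried outside the cluster.

```shell
#!/bin/bash
# Sketch: finish the current step, then stop once SIGUSR1 has arrived.
# run_steps and stop_requested are hypothetical names, not Grid Engine API.
run_steps() {
    local stop_requested=0 completed=0 i
    # The trap only sets a flag; the loop decides when it is safe to stop.
    trap 'stop_requested=1' SIGUSR1
    for i in "$@"; do
        if (( stop_requested )); then
            echo "soft limit signalled, stopping after step $completed" 1>&2
            break
        fi
        echo "step $i"
        # Stand-in for real work. For the demonstration, step 3 signals
        # this very shell to imitate the soft limit expiring.
        if [ "$i" = 3 ]; then kill -USR1 "$BASHPID"; fi
        completed=$i
    done
}

run_steps 1 2 3 4 5
```

Steps 4 and 5 are skipped because the flag is checked before each step begins. In a real job, the place where the loop breaks is where intermediate results would be saved.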
  
  • doku/runtime.1401275957.txt.gz
  • Last modified: 2014/05/28 11:19
  • by ir