doku:runtime [2014/12/12 09:43] – [Soft runtime limit] ir
doku:runtime [2021/05/13 17:44] (current) – removed goldenberg
====== Specifying the maximum runtime ======
Specifying the maximum runtime of a job can make it start earlier by reducing the time it spends waiting in the queue.
=== Slot reservation for large jobs ===
To ensure that large jobs are able to run on the cluster and are not blocked by many smaller jobs, the queueing system used on the VSC supports the **reservation of slots**.
This feature is **automatically** activated **for any job requesting 16 or more slots**.

The downside of this feature is that some resources remain unused until all the slots requested by a queued large job are available.
=== Faster scheduling by backfilling ===
While a large job is waiting to be scheduled, the idle resources can be used by smaller jobs. To preserve the priority of the jobs at the top of the queue, a job may only pass them if its maximum runtime is shorter than the expected time needed to free all the slots requested by the large job.
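The scheduling rule above can be illustrated with a small bash sketch. It is a simplification, not the scheduler's actual implementation: it only compares a job's h_rt with an assumed time window until the large job's reserved slots become free. The function names `to_seconds` and `can_backfill` and the 2.5-hour window are hypothetical.

```shell
#!/bin/bash
# Simplified sketch of the backfilling rule: the real scheduler also takes
# slots, priorities and other resources into account.

# Convert an SGE-style time string H:MM:SS to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# can_backfill H_RT WINDOW
# Succeeds if a job with maximum runtime H_RT ends before WINDOW, the expected
# time until all slots reserved for the large job are free.
can_backfill() {
    local job window
    job=$(to_seconds "$1")
    window=$(to_seconds "$2")
    [ "$job" -lt "$window" ]
}

# Hypothetical situation: the large job's slots are expected to be free in 2.5 h.
for h_rt in 01:00:00 02:00:00 04:00:00; do
    if can_backfill "$h_rt" 02:30:00; then
        echo "$h_rt: may be backfilled"
    else
        echo "$h_rt: has to wait"
    fi
done
```

With a window of 02:30:00, the one-hour and two-hour jobs may be backfilled, while the four-hour job has to wait.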

Users can benefit from this process, called **backfilling**, by adding the following lines at the top of the job submission script:
<code>#### backfilling:
# in this example, ONE HOUR (01:00:00) is the job's maximum run time
#$ -l h_rt=01:00:00</code>
Here, h_rt means "hard runtime": the job is permitted to run __at most__ this long. Please add a **buffer of about 10-50%** to the runtime you have estimated or measured, so that the job is not killed before its result is available! For example, if your last job took 03:57:03 and you expect the next one to be similar, specify "-l h_rt=06:00:00".
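The buffer calculation can be scripted. The following bash sketch is a hypothetical helper (`suggest_h_rt` is not part of SGE): it adds a buffer given in percent to a measured runtime and then rounds up to a full hour. Rounding to full hours is an assumption of this sketch; it is one way to arrive at the 06:00:00 suggested for a 03:57:03 run.

```shell
#!/bin/bash
# Hypothetical helper, not part of SGE: suggest an h_rt value from a
# measured runtime (H:MM:SS) plus a safety buffer given in percent.

# Convert H:MM:SS to seconds (awk copes with leading zeros such as "08").
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

suggest_h_rt() {
    local measured buffer_pct secs hours
    measured=$(to_seconds "$1")
    buffer_pct=$2
    # add the buffer, rounding up to whole seconds
    secs=$(( (measured * (100 + buffer_pct) + 99) / 100 ))
    # round up to the next full hour
    hours=$(( (secs + 3599) / 3600 ))
    printf '%02d:00:00\n' "$hours"
}

suggest_h_rt 03:57:03 50   # prints 06:00:00
```

The result can go into the `#$ -l h_rt=...` line of the script or be passed on the command line, e.g. `qsub -l h_rt=06:00:00 job.sh`, where `job.sh` stands for your submission script.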

Since this feature greatly improves the use of our precious computing time by reducing idle resources, we encourage you to **specify the maximum runtime** of your jobs:
  - a lot of electricity is saved, and
  - the time your jobs spend in the queue is reduced.


====== Soft runtime limit ======

Two runtime limits are available for SGE jobs:
  * **soft runtime (s_rt):** \\ after this time has elapsed, GridEngine sends a SIGUSR1 signal to the process.
  * **hard runtime (h_rt):** \\ after this time has elapsed, GridEngine sends a SIGUSR2 signal and the running processes are killed.

The SIGUSR1 signal (s_rt) informs the process that the soft runtime has elapsed. If the process is likely to exceed h_rt, it can react to this signal and change its behaviour, for example by saving intermediate results and stopping cleanly before it is killed.
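The following minimal sketch shows one way to react to SIGUSR1. It assumes a job that works in steps and can resume from a saved step number. The handler only sets a flag, and the main loop saves its progress and stops at the next step boundary. The checkpoint file and the work loop are hypothetical placeholders. So that the sketch runs without GridEngine, the script sends SIGUSR1 to itself after step 3, standing in for the signal sent when s_rt is reached.

```shell
#!/bin/bash
# Sketch: react to the soft-limit warning (SIGUSR1) by saving progress and
# stopping cleanly before h_rt is reached. The checkpoint file and the work
# loop are hypothetical placeholders.

checkpoint_file="${TMPDIR:-/tmp}/checkpoint.$$"
stop_requested=0

# Keep the handler minimal: it only records that the signal arrived.
trap 'stop_requested=1' USR1

step=0
while [ "$step" -lt 100 ]; do
    step=$((step + 1))
    # ... one unit of real work would go here ...

    # Demo only: pretend GridEngine reached s_rt after step 3.
    if [ "$step" -eq 3 ]; then
        kill -USR1 $$
    fi

    if [ "$stop_requested" -eq 1 ]; then
        echo "$step" > "$checkpoint_file"
        echo "soft limit reached: saved step $step to $checkpoint_file"
        break
    fi
done
```

Setting only a flag in the handler keeps the signal handling simple. The work is never interrupted halfway through a step, so the saved step number is always consistent.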
==== Example ====
Here h_rt is a multiple of s_rt: s_rt=0:02:00 and h_rt=0:20:00, i.e. a factor of n=10:

<code>
#!/bin/bash

#$ -N notify_test
#$ -pe mpich 2
#$ -notify
#$ -V
#$ -l h_rt=0:20:00
#$ -l s_rt=0:02:00

echo "$TMPDIR"

# Called when the soft runtime (s_rt) has elapsed
sigusr1handler() {
    date
    echo "SIGUSR1 caught by shell script" 1>&2
}

# Called when the hard runtime (h_rt) has elapsed, before the processes are killed
sigusr2handler() {
    date
    echo "SIGUSR2 caught by shell script" 1>&2
}

trap sigusr1handler SIGUSR1
trap sigusr2handler SIGUSR2

echo "starting:"
date

# Start
# -q 0: disable "MPI progress Quiescence" error message
#mpirun -q 0 -m $TMPDIR/machines -np $NSLOTS sleep 200

# Placeholder workload: 900 x 10 s = 2.5 h, deliberately longer than h_rt
for i in {1..900}
do
    echo "waiting $i"
    sleep 10
done

echo "finished:"
date
</code>
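In the example, the handlers are run by bash. Bash executes a trap only after the foreground command it is currently waiting for (here `sleep 10`) has completed, so with a long-running foreground program the handler could be delayed. A common idiom is to start the computation in the background with `&` and `wait` for it. A trapped signal makes `wait` return immediately with a status above 128, and the handler runs right away. The sketch below assumes the real computation can run as such a background process. The 30-second `sleep` and the self-sent SIGUSR1 are placeholders so that it runs without GridEngine.

```shell
#!/bin/bash
# Sketch: run the long-running step in the background and wait for it, so
# that a trapped signal interrupts `wait` and the handler runs immediately
# instead of after a foreground command (such as `sleep 10`) has finished.

got_usr1=0
trap 'got_usr1=1; echo "SIGUSR1 caught by shell script" 1>&2' USR1

sleep 30 &                 # placeholder for the real computation
worker=$!

# Demo only: send SIGUSR1 to this shell after one second, standing in for
# GridEngine reaching s_rt.
( sleep 1; kill -USR1 $$ ) &

wait "$worker"
status=$?                  # greater than 128 when a trapped signal interrupted wait
echo "wait returned $status, handler ran: $got_usr1"

# A real job would now save its results and/or call wait "$worker" again;
# the demo simply stops the placeholder computation.
kill "$worker" 2>/dev/null
```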


The following output was written to the error file (*.e*) of this example job:

<code>
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 1
SIGUSR1 caught by shell script
User defined signal 2
SIGUSR2 caught by shell script
</code>
  
Last modified: 2014/12/12 09:43 by ir