Submitting batch jobs
Sun Grid Engine (SGE)
The job script
It is recommended to write the job script with a text editor on the VSC Linux cluster. Editors on Windows may add invisible characters (such as a carriage return at the end of each line) that make the job file unreadable for the grid engine, so the job is not executed.
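A quick way to catch this problem before submitting is sketched below, assuming bash and GNU sed on the cluster. The helper names has_crlf and strip_crlf are hypothetical; the dos2unix tool performs the same conversion where it is installed.

```shell
#!/bin/bash
# Sketch: detect and remove Windows (CRLF) line endings in a job file.
# has_crlf and strip_crlf are hypothetical helpers, not part of SGE.

# Return 0 if the file contains at least one carriage return character.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Remove the trailing carriage return from every line, editing the file
# in place (GNU sed syntax, as found on typical Linux clusters).
strip_crlf() {
    sed -i 's/\r$//' "$1"
}

# Example: check a job file before submitting it.
# job_file=hitchhiker.sh
# if has_crlf "$job_file"; then
#     echo "$job_file has Windows line endings, converting" >&2
#     strip_crlf "$job_file"
# fi
```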
A - the header of the job script
#$ -N <job_name>
#$ -pe mpich <slots>
#$ -V
#$ -l h_rt=hh:mm:ss
#$ -M <email address to notify of job events>
#$ -m beas # all job events sent via email
- “<job_name>” is a freely chosen descriptive name,
- “<slots>” is the number of processor cores that you want to use for the calculation. Compute nodes are always reserved exclusively for your job: if “<slots>” is not an integral multiple of 16, it is rounded up to the next multiple of 16.
- “-V” declares that all environment variables in the qsub command's environment are to be exported to the batch job.
- “-l h_rt=hh:mm:ss” specifies the job's maximum runtime. Specifying it explicitly is particularly advisable for jobs with short runtimes (a few hours or even minutes), as it can reduce the time the job waits in the queue; see also the section on maximum runtime specification.
- “-M <email address>” and “-m beas” request email notifications about job events (b: beginning, e: end, a: abort or reschedule, s: suspension).
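To see in advance what the header will look like, the following sketch prints it from a few parameters. It assumes bash; round_slots, format_h_rt and make_sge_header are hypothetical helpers. The rounding to multiples of 16 only mirrors the correction described above, so writing the rounded value into the header requests exactly the cores that would be reserved anyway.

```shell
#!/bin/bash
# Sketch: print an SGE job-script header from a few parameters.
# round_slots, format_h_rt and make_sge_header are hypothetical helpers.

CORES_PER_NODE=16   # node size as stated in the text above

# Round a requested number of slots up to the next multiple of
# CORES_PER_NODE, mirroring the correction the scheduler applies.
round_slots() {
    local slots=$1
    echo $(( (slots + CORES_PER_NODE - 1) / CORES_PER_NODE * CORES_PER_NODE ))
}

# Convert a runtime in minutes to the hh:mm:ss format expected by h_rt.
format_h_rt() {
    local minutes=$1
    printf '%02d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

# Usage: make_sge_header <job_name> <slots> <runtime_minutes> <email>
make_sge_header() {
    local name=$1 slots=$2 minutes=$3 email=$4
    printf '#$ -N %s\n' "$name"
    printf '#$ -pe mpich %s\n' "$(round_slots "$slots")"
    printf '#$ -V\n'
    printf '#$ -l h_rt=%s\n' "$(format_h_rt "$minutes")"
    printf '#$ -M %s\n' "$email"
    printf '#$ -m beas\n'
}
```

For example, `make_sge_header hitchhiker 20 180 my.name@example.at` prints a header that requests 32 slots and `h_rt=03:00:00`.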
B - run executable
The job can be started in several ways:
- as a single-core job (no MPI) on one core:
./<executable>
- as several single-core tasks (no MPI) running in parallel on multiple cores (see also Sequential code)
- as an MPI-enabled application:
mpirun -m $TMPDIR/machines -np $NSLOTS <executable>
Replace “<executable>” with the path of the MPI-enabled application.
Please note that the particular options to mpirun depend on the MPI version that you use. Current IntelMPI versions, for example, require the option -machinefile instead of -m:
mpirun -machinefile $TMPDIR/machines -np $NSLOTS <executable>
Please always check for the correct options with
mpirun -help
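If a job script has to work with more than one MPI installation, the option can be selected from the help text at run time. This is a minimal sketch under the assumption that an mpirun which accepts -machinefile mentions it in its -help output, falling back to -m otherwise; pick_machinefile_option is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: choose between "-machinefile" and "-m" by inspecting the help
# text of the mpirun in use. pick_machinefile_option is hypothetical.
# Assumption: an mpirun that accepts -machinefile lists it in its help.

# Read mpirun help text on stdin and print the option for the host list.
pick_machinefile_option() {
    if grep -q -e '-machinefile'; then
        printf '%s\n' "-machinefile"
    else
        printf '%s\n' "-m"
    fi
}

# Inside a job script one might then write:
# opt=$(mpirun -help 2>&1 | pick_machinefile_option)
# mpirun "$opt" "$TMPDIR/machines" -np "$NSLOTS" ./myjob
```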
Example
Here is an example job script that requests 32 processor cores, runs for at most 3 hours, and sends emails at the beginning and at the end of the job:
#$ -N hitchhiker
#$ -pe mpich 32
#$ -V
#$ -M my.name@example.at
#$ -m be
#$ -l h_rt=03:00:00
mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./myjob
Submit your job:
- The job is submitted via the following command (“<job_file>” is the name of the file you just created):
qsub <job_file>
- Check whether and where your job has been scheduled:
qstat
- Inspect the job output. Assuming your job was assigned the id “42” and your job's name was “hitchhiker”, you should be able to find the following files in the directory you started it from:
$ ls -l hitchhiker.o42 hitchhiker.e42 hitchhiker.po42 hitchhiker.pe42
In this example, hitchhiker.o42 contains the output of your job and hitchhiker.e42 any error messages. hitchhiker.po42 and hitchhiker.pe42 may contain additional information related to the parallel computing environment.
- Delete a job:
$ qdel <job_id>
- View all jobs in the queue:
$ qstat -u \*
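The submit, wait and inspect steps above can be combined in a small script. This sketch assumes bash, a qsub that supports -terse (printing only the job ID, as in SGE 6.2 and later), and a qstat -j that exits with a non-zero status once the job is no longer known; submit_job, wait_for_job and list_job_output are hypothetical helpers, and the 60-second polling interval is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: submit a job, wait until it has left the queue, then list its
# output files. All function names here are hypothetical helpers.

# Submit a job file and print only the job ID (requires qsub -terse).
submit_job() {
    qsub -terse "$1"
}

# Poll until "qstat -j <id>" no longer finds the job.
# Assumption: qstat -j returns non-zero for unknown job IDs.
wait_for_job() {
    local id=$1 interval=${2:-60}
    while qstat -j "$id" > /dev/null 2>&1; do
        sleep "$interval"
    done
}

# Print those output files <job_name>.{o,e,po,pe}<job_id> that exist
# in the current directory.
list_job_output() {
    local name=$1 id=$2 suffix
    for suffix in o e po pe; do
        [ -e "$name.$suffix$id" ] && echo "$name.$suffix$id"
    done
    return 0
}

# Example:
# id=$(submit_job hitchhiker.sh)
# wait_for_job "$id"
# list_job_output hitchhiker "$id"
```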
Advanced topics
NOTE
When the option #$ -V is specified, the following error message is generated in one of the output files of the grid engine:
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'
bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'
This message is caused by a known bug in the grid engine, which cannot handle shell functions (such as module) defined in the user environment. The message can safely be ignored. It can be avoided by exporting only particular environment variables in your job script instead of using -V:
#$ -v PATH
#$ -v LD_LIBRARY_PATH
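When several variables are needed, the directives can be generated from the submitting shell. A minimal sketch, assuming bash and a printenv that exits non-zero for variables that are not set (as GNU coreutils printenv does); sge_export_directives is a hypothetical helper whose output would replace the #$ -V line in the header.

```shell
#!/bin/bash
# Sketch: print one "#$ -v NAME" directive per exported variable given as
# an argument. sge_export_directives is a hypothetical helper.

sge_export_directives() {
    local name
    for name in "$@"; do
        # printenv succeeds only for variables exported to child processes
        if printenv "$name" > /dev/null; then
            printf '#$ -v %s\n' "$name"
        else
            printf 'warning: %s is not exported, skipping\n' "$name" >&2
        fi
    done
}

# Example:
# sge_export_directives PATH LD_LIBRARY_PATH
```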