Accounting info of your project
Accounting script
The script vsc4CoreHours.py on VSC4, or vsc3CoreHours.py on VSC3+, calculates the core-hours consumed per user in your project and the total core-hours of your project. The basic formula in this script takes into account the number of nodes per job and the time difference between the start and the end of the job.
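The calculation described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `core_hours` and the `cores_per_node` parameter are assumptions, since the text only names the node count and the start-to-end time difference as inputs.

```python
from datetime import datetime


def core_hours(nodes: int, start: datetime, end: datetime, cores_per_node: int) -> float:
    """Core-hours of one job: nodes x cores per node x elapsed hours.

    cores_per_node is an assumption of this sketch; set it to match the cluster.
    """
    elapsed_hours = (end - start).total_seconds() / 3600.0
    return nodes * cores_per_node * elapsed_hours


# Hypothetical job: 2 nodes for 90 minutes, 48 cores per node (illustrative value).
example = core_hours(
    nodes=2,
    start=datetime(2019, 4, 23, 10, 0),
    end=datetime(2019, 4, 23, 11, 30),
    cores_per_node=48,
)
print(example)
```

Summing this value over all jobs of a user, and then over all users, would give the per-user and per-project totals the script reports.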
Usage of the script
You may give a start time with -S … and an end time with -E …. The default start time is the start of VSC-3 (even on VSC4), 2015-04-01T00:00:00; the default end time is today. Alternatively, you may give a duration with -D d, which reports the core-hours used within the past d days.
Examples:
vsc4CoreHours.py                                          # total project time span
vsc4CoreHours.py -D 7                                     # last week
vsc4CoreHours.py -S 2019-04-23 -E 2019-05-26T00:00:01
vsc4CoreHours.py -E 2020-05-26                            # project start until 2020-05-26
vsc4CoreHours.py -S 2019-04-23                            # 2019-04-23 until today
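How the three options combine into a time window can be sketched in Python. This is an illustration under stated assumptions, not the script's source: the function `resolve_window` is hypothetical, "today" is taken to mean the current moment, and -D is assumed to replace -S and -E rather than combine with them.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Default start quoted in the text: the start of VSC-3.
DEFAULT_START = datetime(2015, 4, 1, 0, 0, 0)


def resolve_window(start: Optional[str] = None,
                   end: Optional[str] = None,
                   days: Optional[int] = None,
                   now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) interval implied by -S, -E and -D."""
    now = now or datetime.now()
    if days is not None:
        # -D d: the past d days, ending now (assumption: overrides -S and -E).
        return now - timedelta(days=days), now
    begin = datetime.fromisoformat(start) if start else DEFAULT_START
    finish = datetime.fromisoformat(end) if end else now
    return begin, finish


# Corresponds to: vsc4CoreHours.py -S 2019-04-23 -E 2019-05-26T00:00:01
print(resolve_window(start="2019-04-23", end="2019-05-26T00:00:01"))
```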
sacct
To customize your accounting request, use the command sacct, which retrieves information from the SLURM job accounting log or the SLURM database.
By default, the output lists jobs, job steps, status, and exit codes. The output of sacct can be customized by specifying a format.
This section lists only a minimal subset of the options.
For the full list, see the SLURM documentation on the web (sacct) or the manual page ([username@l34 ~]$ man sacct).
Example 1: format specification
[...@... ~]$ sacct -o Account,User,UID,AveCPUFreq,Elapsed,Start,End,TotalCPU
[...@... ~]$ sacct --format=Account,User,UID,AveCPUFreq,Elapsed,Start,End,TotalCPU
[...@... ~]$ sacct --format=JobID,UID,State,ExitCode
[...@... ~]$ sacct -o UID,User,Account,Group,JobID,JobName,Elapsed,Start,End
The options
-o <comma-separated list of fields>          # followed by a space, as in the example above
--format=<comma-separated list of fields>    # no space after the equals sign
specify the format. The available fields are displayed with the options
-e, --helpformat
[...@... ~]$ sacct -e   # or
[...@... ~]$ sacct --helpformat
A shortcut for showing an extended set of fields is the option
-l, --long
which is equivalent to specifying
-o jobid,jobname,partition,maxvmsize,maxvmsizenode,maxvmsizetask,avevmsize,maxrss,maxrssnode,maxrsstask,averss,maxpages,maxpagesnode,maxpagestask,avepages,mincpu,mincpunode,mincputask,avecpu,ntasks,alloccpus,elapsed,state,exitcode,maxdiskread,maxdiskreadnode,maxdiskreadtask,avediskread,maxdiskwrite,maxdiskwritenode,maxdiskwritetask,avediskwrite,allocgres,reqgres
whereas minimal information (-o jobid,state,exitcode) is returned via the option
-b, --brief
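When sacct is called from a script, the format list is easiest to build from a Python list of field names. The helper below is a hypothetical sketch (`build_sacct_cmd` is not part of SLURM or of the VSC scripts); it only assembles the argument list from options named in this section and does not run anything.

```python
from typing import List, Optional, Sequence


def build_sacct_cmd(fields: Sequence[str],
                    start: Optional[str] = None,
                    end: Optional[str] = None,
                    allocations: bool = False) -> List[str]:
    """Assemble a sacct argument list from a field list and optional filters."""
    cmd = ["sacct", "--format=" + ",".join(fields)]  # no space after '='
    if start:
        cmd += ["-S", start]
    if end:
        cmd += ["-E", end]
    if allocations:
        cmd.append("-X")
    return cmd


print(build_sacct_cmd(["JobID", "UID", "State", "ExitCode"]))
```

On a login node where sacct is available, the returned list could be passed to `subprocess.run(cmd, capture_output=True, text=True)`.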
Example 2: start time|end time
sacct -S YYYY-MM-DD[THH:MM[:SS]]
sacct -E YYYY-MM-DD[THH:MM[:SS]]
sacct -S 2015-05-18T09:00:01 -E 2015-05-18T12:02:01 -X -T
# Valid time formats are...
#   HH:MM[:SS] [AM|PM]
#   MMDD[YY] or MM/DD[/YY] or MM.DD[.YY]
#   MM/DD[/YY]-HH:MM[:SS]
#   YYYY-MM-DD[THH:MM[:SS]]
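A timestamp can be checked against the YYYY-MM-DD[THH:MM[:SS]] form before it is passed to sacct. The following sketch covers only that one form, not the other formats listed above; the function name `parse_sacct_time` is hypothetical.

```python
from datetime import datetime

# The three concrete shapes of YYYY-MM-DD[THH:MM[:SS]], most specific first.
_FORMATS = ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%d")


def parse_sacct_time(text: str) -> datetime:
    """Parse a timestamp of the form YYYY-MM-DD[THH:MM[:SS]]."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"not in YYYY-MM-DD[THH:MM[:SS]] form: {text!r}")


print(parse_sacct_time("2015-05-18T09:00:01"))
```

A value with the `T` separator missing, such as `2015-05-1917:00`, is rejected with a `ValueError` rather than silently misread.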
Example 3: job state
sacct -s R -S 2015-05-19T17:00 -E 2015-05-19T18:00 -X -T -o JobID,Start,End,State
In this example we ask for jobs that were in the state running (-s R) during the given time interval (-S … for the start time and -E for the end time). For -X and -T see below; for -o … see above. The output may look like this:
JobID        Start               End                 State
------------ ------------------- ------------------- ----------
616785       2015-05-19T17:01:33 2015-05-19T17:15:13 CANCELLED+
616835       2015-05-19T17:35:52 2015-05-19T18:00:00 RUNNING
...          ...                 ...                 ...
616175_238   2015-05-19T17:52:00 2015-05-19T17:53:33 COMPLETED
616175_239   2015-05-19T17:52:02 2015-05-19T17:53:38 COMPLETED
616772_1     2015-05-19T17:52:16 2015-05-19T17:52:22 FAILED
616772_2     2015-05-19T17:52:22 2015-05-19T17:52:28 FAILED
The jobs in this list were running at some point during the selected time interval. The State column, however, reports the state at the moment the sacct command was executed.
Further possible values for the option -s are: BF (BOOT_FAIL), CA (CANCELLED), CD (COMPLETED), CF (CONFIGURING), CG (COMPLETING), F (FAILED), NF (NODE_FAIL), PD (PENDING), PR (PREEMPTED), R (RUNNING), RS (RESIZING), S (SUSPENDED), TO (TIMEOUT).
-X
The option
[...@... ~]$ sacct -X   # or
[...@... ~]$ sacct --allocations
is useful because it shows only cumulative statistics for each job, not the intermediate steps.
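The difference between job-level entries and step entries is visible in the job ID: steps carry a suffix after a dot, as in 618093.batch or 615402.54. The sketch below only mimics the row selection on a list of job IDs obtained without -X; it is a hypothetical helper and does not reproduce the cumulative statistics that sacct itself computes.

```python
from typing import Iterable, List


def allocations_only(job_ids: Iterable[str]) -> List[str]:
    """Keep job-level entries and drop step entries such as '618093.batch'."""
    return [job_id for job_id in job_ids if "." not in job_id]


ids = ["618093", "618093.batch", "615402.54", "616175_238"]
print(allocations_only(ids))
```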
-T
The option
[...@... ~]$ sacct -T   # or
[...@... ~]$ sacct --truncate
is supposed to truncate times to the requested interval. If a job started before the start time given with -S YYYY-MM-DD[THH:MM[:SS]], its reported start time is truncated to that value. The same applies to the end time and -E YYYY-MM-DD[THH:MM[:SS]].
We observed unexpected behavior of this option: it returned start times later than end times.
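The intended truncation amounts to clamping a job's start and end to the requested window. The following is a sketch of that idea only, not of sacct's implementation; the function name `truncate` and the job times in the example are hypothetical.

```python
from datetime import datetime
from typing import Tuple


def truncate(job_start: datetime, job_end: datetime,
             window_start: datetime, window_end: datetime) -> Tuple[datetime, datetime]:
    """Clamp a job's start and end times to the window given by -S and -E."""
    return max(job_start, window_start), min(job_end, window_end)


# Hypothetical job that starts inside the window and ends after it.
window = (datetime(2015, 5, 19, 17, 0), datetime(2015, 5, 19, 18, 0))
print(truncate(datetime(2015, 5, 19, 17, 35, 52), datetime(2015, 5, 19, 19, 10, 0), *window))
```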
Further options
-g gid_list, --gid=gid_list, --group=group_list   # e.g., p70815
-j job(.step), --jobs=job(.step)                  # e.g., 618093.batch,615402.54
--name=jobname_list                               # display jobs that have any of these name(s)
-q, --qos                                         # quality of service (QOS), e.g., normal_0064
-r, --partition=                                  # e.g., mem_0064,mem_0256
-u uid_list, --uid=uid_list, --user=user_list     # e.g., 74911
Job Memory Usage
sacct -j <job_ID> --format=JobID,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask
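The MaxVMSize column is typically printed with a unit suffix, for example 207932K. For comparing or summing values, a small converter helps. The sketch below rests on assumptions: suffixes K, M, G and T are read as binary multiples (1024-based), an empty field is treated as 0, and the name `size_to_bytes` is hypothetical.

```python
# Binary multiples are an assumption of this sketch; adjust if your site differs.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def size_to_bytes(value: str) -> int:
    """Convert a size string such as '207932K' or '1.5G' to bytes."""
    value = value.strip()
    if not value:
        return 0  # assumption: an empty field means nothing was recorded
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))  # no suffix: taken as plain bytes


print(size_to_bytes("207932K"))
```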