This version (2024/08/01 12:00) was approved by goldenberg. The previously approved version (2023/01/30 14:14) is available.

Accounting info of your project

The script vsc4CoreHours.py on VSC-4 calculates the core-hours consumed per user in your project and the total core-hours of the project. The basic formula in this script takes into account the number of nodes per job and the elapsed time between job start and job end.
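The formula described above can be sketched as follows. This is only an illustration, not the code of vsc4CoreHours.py: the function name core_hours is hypothetical, and the number of cores per node (48 here) is an assumption, since the text only states that node count and elapsed time enter the calculation.

```python
from datetime import datetime

# Assumption: the cores-per-node value is not given in the text; 48 is a placeholder.
CORES_PER_NODE = 48


def core_hours(nodes, start, end, cores_per_node=CORES_PER_NODE):
    """Core-hours of one job: nodes * cores per node * elapsed hours."""
    elapsed_hours = (end - start).total_seconds() / 3600.0
    return nodes * cores_per_node * elapsed_hours


# A job on 2 nodes that ran for 2.5 hours
job_start = datetime.fromisoformat("2019-04-23T10:00:00")
job_end = datetime.fromisoformat("2019-04-23T12:30:00")
print(core_hours(2, job_start, job_end))  # 240.0
```

Summing this value over all jobs of a user, and then over all users, would give the per-user and per-project totals.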

Usage of the script

You may specify a start time with -S … and an end time with -E …. The default start time on the clusters is the start of VSC-3, 2015-04-01T00:00:00; the default end time is today. Alternatively, you may give a duration -D d, which reports the core-hours within the past d days.

Examples:
vsc4CoreHours.py                     # total project time span
vsc4CoreHours.py -D 7                # last week
vsc4CoreHours.py -S 2019-04-23 -E 2019-05-26T00:00:01
vsc4CoreHours.py -E 2020-05-26       # project start until 2020-05-26
vsc4CoreHours.py -S 2019-04-23       # 2019-04-23 until today
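How the three options could translate into a time window is sketched below. The function resolve_window is hypothetical and not taken from the script; in particular, letting -D take precedence over -S and -E is an assumption, because the text does not say how the options combine.

```python
from datetime import datetime, timedelta

# Default start time named in the text (start of VSC-3)
DEFAULT_START = datetime(2015, 4, 1, 0, 0, 0)


def resolve_window(start=None, end=None, days=None, now=None):
    """Return (begin, finish) of the accounting window.

    start/end are ISO strings as in the -S/-E examples, days mirrors -D.
    Assumption: if days is given, it overrides start and end.
    """
    now = now or datetime.now()
    if days is not None:
        return now - timedelta(days=days), now
    begin = datetime.fromisoformat(start) if start else DEFAULT_START
    finish = datetime.fromisoformat(end) if end else now
    return begin, finish


print(resolve_window(days=7))                       # last week
print(resolve_window(start="2019-04-23",
                     end="2019-05-26T00:00:01"))    # explicit interval
print(resolve_window())                             # default start until now
```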

To customize your accounting request, use the command sacct, which gives access to the information in the SLURM job accounting log or SLURM database. By default, the output lists jobs, job steps, status, and exit codes. Specifying a format customizes the output of sacct. This section lists only a minimal subset of the options.

For the full list see the SLURM documentation on the web (sacct) or the manual page ([username@l34 ~]$ man sacct).

[...@... ~]$ sacct -o Account,User,UID,AveCPUFreq,Elapsed,Start,End,TotalCPU
[...@... ~]$ sacct --format=Account,User,UID,AveCPUFreq,Elapsed,Start,End,TotalCPU
[...@... ~]$ sacct --format=JobID,UID,State,ExitCode
[...@... ~]$ sacct -o UID,User,Account,Group,JobID,JobName,Elapsed,Start,End

The options

-o <comma-separated list of fields, as in the examples above>     # -o followed by a space, or
--format=<comma-separated list of fields>                         # no space after '='

specify the output format. The available fields are displayed with the options

-e, --helpformat

[...@... ~]$ sacct -e      #   or  
[...@... ~]$ sacct --helpformat

A shortcut for showing all parameters is the option

-l, --long 

which is equivalent to specifying -o jobid,jobname,partition,maxvmsize,maxvmsizenode,maxvmsizetask,avevmsize,maxrss,maxrssnode,maxrsstask,averss,maxpages,maxpagesnode,maxpagestask,avepages,mincpu,mincpunode,mincputask,avecpu,ntasks,alloccpus,elapsed,state,exitcode,maxdiskread,maxdiskreadnode,maxdiskreadtask,avediskread,maxdiskwrite,maxdiskwritenode,maxdiskwritetask,avediskwrite,allocgres,reqgres, whereas minimal information (-o jobid,state,exitcode) is returned via the option

-b, --brief

The time interval of the request is selected with the options -S (start time) and -E (end time):

sacct -S YYYY-MM-DD[THH:MM[:SS]]
sacct -E YYYY-MM-DD[THH:MM[:SS]]
sacct -S 2015-05-18T09:00:01 -E 2015-05-18T12:02:01 -X -T
# Valid time formats are...
# HH:MM[:SS] [AM|PM] 
# MMDD[YY] or MM/DD[/YY] or MM.DD[.YY] 
# MM/DD[/YY]-HH:MM[:SS] 
# YYYY-MM-DD[THH:MM[:SS]]
sacct -s R -S 2015-05-19T17:00 -E 2015-05-19T18:00 -X -T -o JobID,Start,End,State

In this example we ask for jobs that were in the state running (-s R) during the given time interval (-S start time, -E end time). The options -X and -T are explained below, the option -o above. The output may look like this:

       JobID               Start                 End      State 
------------ ------------------- ------------------- ---------- 
616785       2015-05-19T17:01:33 2015-05-19T17:15:13 CANCELLED+ 
616835       2015-05-19T17:35:52 2015-05-19T18:00:00    RUNNING 
 ...                       ...              ...            ...
616175_238   2015-05-19T17:52:00 2015-05-19T17:53:33  COMPLETED 
616175_239   2015-05-19T17:52:02 2015-05-19T17:53:38  COMPLETED 
616772_1     2015-05-19T17:52:16 2015-05-19T17:52:22     FAILED 
616772_2     2015-05-19T17:52:22 2015-05-19T17:52:28     FAILED 

The jobs in this list were all running at some point in the selected time interval. The column State, however, reports the state at the moment the sacct command is executed.
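Output of this kind can be post-processed with a few lines of Python. The sketch below parses a shortened copy of the table above and computes the elapsed time of each entry. The helper names are hypothetical, and splitting on whitespace assumes that no column value contains spaces, which holds for the four columns requested here but not necessarily for others such as JobName.

```python
from datetime import datetime

# Shortened copy of the sample output shown above
SAMPLE = """\
       JobID               Start                 End      State
------------ ------------------- ------------------- ----------
616785       2015-05-19T17:01:33 2015-05-19T17:15:13 CANCELLED+
616835       2015-05-19T17:35:52 2015-05-19T18:00:00    RUNNING
616175_238   2015-05-19T17:52:00 2015-05-19T17:53:33  COMPLETED
616772_1     2015-05-19T17:52:16 2015-05-19T17:52:22     FAILED
"""


def parse_sacct_table(text):
    """Turn whitespace-separated sacct output into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[2:]:  # skip the header and the dashed separator
        values = line.split()
        if len(values) != len(header):
            continue  # ignore lines that do not match the header
        rows.append(dict(zip(header, values)))
    return rows


def elapsed_seconds(row):
    """Seconds between the Start and End columns of one row."""
    start = datetime.fromisoformat(row["Start"])
    end = datetime.fromisoformat(row["End"])
    return (end - start).total_seconds()


for row in parse_sacct_table(SAMPLE):
    print(row["JobID"], row["State"], elapsed_seconds(row))
```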

Further possible parameters for the option -s are: BF (BOOT_FAIL), CA (CANCELLED), CD (COMPLETED), CF (CONFIGURING), CG (COMPLETING), F (FAILED), NF (NODE_FAIL), PD (PENDING), PR (PREEMPTED), R (RUNNING), RS (RESIZING), S (SUSPENDED), TO (TIMEOUT).
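When scripting around sacct, the abbreviations listed above can be kept in a small lookup table. The mapping below is copied from that list; the helper expand_states is a hypothetical convenience function, not part of SLURM.

```python
# Abbreviation -> full state name, as listed above
STATE_CODES = {
    "BF": "BOOT_FAIL",
    "CA": "CANCELLED",
    "CD": "COMPLETED",
    "CF": "CONFIGURING",
    "CG": "COMPLETING",
    "F": "FAILED",
    "NF": "NODE_FAIL",
    "PD": "PENDING",
    "PR": "PREEMPTED",
    "R": "RUNNING",
    "RS": "RESIZING",
    "S": "SUSPENDED",
    "TO": "TIMEOUT",
}


def expand_states(codes):
    """Expand a comma-separated list of abbreviations, e.g. 'R,CD'."""
    return [STATE_CODES[code.strip().upper()] for code in codes.split(",")]


print(expand_states("R,CD"))
```

An unknown abbreviation raises a KeyError, which makes typos visible before a request is sent.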

The option

[...@... ~]$ sacct -X              # or
[...@... ~]$ sacct --allocations

is useful because it shows only cumulative statistics for each job, not the intermediate steps.

The option

[...@... ~]$ sacct -T              # or
[...@... ~]$ sacct --truncate

is meant to truncate times to the selected interval. If a job started before the start time given with -S YYYY-MM-DD[THH:MM[:SS]], its reported start time is truncated to that value. The same holds for the end time and -E YYYY-MM-DD[THH:MM[:SS]].

We have observed unexpected behavior of this option: it may return start times later than end times.
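The intended truncation amounts to clamping each job's times to the requested window. The sketch below shows this with hypothetical helper names; it illustrates the idea and is not SLURM's implementation. It also includes a simple consistency check that can be applied to truncated times before they are used in further calculations. In this sketch a job lying entirely outside the window ends up with a start later than its end; whether this is related to the behavior observed above is not known.

```python
from datetime import datetime


def truncate_interval(job_start, job_end, window_start, window_end):
    """Clamp a job's start and end times to the requested window."""
    start = max(job_start, window_start)
    end = min(job_end, window_end)
    return start, end


def is_consistent(start, end):
    """True if the (possibly truncated) start is not later than the end."""
    return start <= end


window_start = datetime.fromisoformat("2015-05-19T17:00:00")
window_end = datetime.fromisoformat("2015-05-19T18:00:00")

# A job that started before the window and ended after it
inside = truncate_interval(datetime.fromisoformat("2015-05-19T16:30:00"),
                           datetime.fromisoformat("2015-05-19T19:00:00"),
                           window_start, window_end)
print(inside, is_consistent(*inside))

# A job entirely after the window: clamping yields start > end
outside = truncate_interval(datetime.fromisoformat("2015-05-19T18:30:00"),
                            datetime.fromisoformat("2015-05-19T19:00:00"),
                            window_start, window_end)
print(outside, is_consistent(*outside))
```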

Further options restrict the selection of jobs:

-g gid_list, --gid=gid_list, --group=group_list # e.g., p70815
-j job(.step) , --jobs=job(.step)              # 618093.batch,615402.54
--name=jobname_list        # display jobs that have any of these name(s)
-q, --qos                  # quality of service (QOS), e.g., normal_0064
-r, --partition=           # e.g., mem_0064,mem_0256
-u uid_list, --uid=uid_list, --user=user_list  # e.g., 74911
sacct -j <job_ID> --format=JobID,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask
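Requests like the one above can also be assembled programmatically. The following sketch builds the argument list for sacct from Python, using only options described on this page (--format, -j, -S, -E, -X). The function build_sacct_command is hypothetical; running the resulting command, for example with subprocess.run, requires a system on which sacct is installed.

```python
import shlex


def build_sacct_command(fields, jobs=None, start=None, end=None,
                        allocations=False):
    """Assemble a sacct argument list from the options described above."""
    # Field names are joined with commas and without spaces
    cmd = ["sacct", "--format=" + ",".join(fields)]
    if jobs:
        cmd += ["-j", ",".join(jobs)]
    if start:
        cmd += ["-S", start]
    if end:
        cmd += ["-E", end]
    if allocations:
        cmd.append("-X")
    return cmd


cmd = build_sacct_command(
    ["JobID", "MaxVMSize", "MaxVMSizeNode", "MaxVMSizeTask"],
    jobs=["618093.batch"],
)
print(shlex.join(cmd))
# sacct --format=JobID,MaxVMSize,MaxVMSizeNode,MaxVMSizeTask -j 618093.batch
```

Passing the arguments as a list rather than a single string avoids shell quoting issues when the command is executed.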
  • doku/slurm_sacct.txt
  • Last modified: 2024/07/08 07:51
  • by grokyta