Utilization and accounting

When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage. Several tools are available: sacct, sreport, and OpenXDMoD.

Job accounting

If you are interested in your HPC usage, group usage, job wait time, etc., we have the right tools for you.

OpenXDMoD

We track the job usage of our clusters here: https://openxdmod.hpc.unige.ch/

A tutorial explaining some of the features is also available.

OpenXDMoD is integrated into our information system. When you connect to it, you get the “user” profile and the data are filtered by your user by default. If you are a PI, you can ask us to change your profile to PI.

sacct

You can see your job history using sacct:

[sagon@master ~] $ sacct -u $USER -S 2021-04-01
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
45517641        jobname  debug-cpu   rossigno          1     FAILED      2:0
45517641.ba+      batch              rossigno          1     FAILED      2:0
45517641.ex+     extern              rossigno          1  COMPLETED      0:0
45517641.0            R              rossigno          1     FAILED      2:0
45518119        jobname  debug-cpu   rossigno          1  COMPLETED      0:0
45518119.ba+      batch              rossigno          1  COMPLETED      0:0
45518119.ex+     extern              rossigno          1  COMPLETED      0:0
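
The listing above shows one line per job step (batch, extern, numbered steps). As a minimal sketch, assuming you only want one line per job and a count per final state, sacct's -X (allocations only), -n (no header) and -P (pipe-separated output) options can be combined with awk. The helper name count_job_states is ours and is not part of Slurm:

```shell
# count_job_states: hypothetical helper, not a Slurm command.
# Reads pipe-separated "JobID|State" lines on stdin and prints
# one "STATE COUNT" line per state, sorted by state name.
count_job_states() {
    awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Real usage on the cluster (requires Slurm):
#   sacct -u "$USER" -S 2021-04-01 -X -n -P --format=JobID,State | count_job_states
```

Fed with the two jobs from the listing above, this would report one COMPLETED and one FAILED job.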

Report and statistics with sreport

To get reporting about your past jobs, you can use sreport (https://slurm.schedmd.com/sreport.html).

Here are some examples to give you a starting point:

To get the number of jobs you ran (as $USER) in 2018 (dates in yyyy-mm-dd format):

[brero@login2 ~]$ sreport job sizesbyaccount user=$USER PrintJobCount start=2018-01-01 end=2019-01-01
 
--------------------------------------------------------------------------------
Job Sizes 2018-01-01T00:00:00 - 2018-12-31T23:59:59 (31536000 secs)
Units are in number of jobs ran
--------------------------------------------------------------------------------
  Cluster   Account     0-49 CPUs   50-249 CPUs  250-499 CPUs  500-999 CPUs  >= 1000 CPUs % of cluster 
--------- --------- ------------- ------------- ------------- ------------- ------------- ------------ 
   baobab      root           180            40             4            15             0      100.00%

The report shows how many jobs were run, grouped by number of allocated CPUs. Note that we specified end=2019-01-01, one day past the end of the year, in order to cover the whole year. The report header confirms the range:

Job Sizes 2018-01-01T00:00:00 - 2018-12-31T23:59:59
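
Since the end date has to be the first day of the following year, the two arguments can be generated rather than typed by hand. This is only a sketch; the helper name year_range is ours and is not part of Slurm:

```shell
# year_range: hypothetical helper that prints the start= and end=
# arguments covering one full calendar year. The end date is the
# first day of the following year, as in the example above.
year_range() {
    year=$1
    printf 'start=%s-01-01 end=%s-01-01\n' "$year" "$((year + 1))"
}

# Real usage on the cluster (requires Slurm); the unquoted $(...) is
# intentional so that start= and end= become two separate arguments:
#   sreport job sizesbyaccount user="$USER" PrintJobCount $(year_range 2018)
```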

You can also check how much CPU time (in seconds) you have used on the cluster since 2019-09-01:

[brero@login2 ~]$ sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 -t Seconds
--------------------------------------------------------------------------------
Cluster/Account/User Utilization 2019-09-01T00:00:00 - 2019-09-09T23:59:59 (64800 secs)
Usage reported in CPU Seconds
--------------------------------------------------------------------------------
  Cluster         Account     Login     Proper Name     Used   Energy 
--------- --------------- --------- --------------- -------- -------- 
   baobab        rossigno     brero   BRERO Massimo     1159        0 

In this example, we added the -t Seconds parameter to get the output in seconds. Minutes or Hours are also possible.

Please note:

  • By default, the CPU time is reported in minutes.
  • It takes up to an hour for Slurm to update this information in its database, so be patient.
  • If you specify neither a start nor an end date, yesterday's date is used.
  • The CPU time is the time that was allocated to you, whether or not the CPU was actually used. For example, if you request a 15-minute allocation, do nothing for 3 minutes, run 1 CPU at 100% for 4 minutes, and then exit the allocation, 7 minutes are added to your CPU time.
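
The last point can be checked with a little arithmetic. As a sketch, assuming allocated CPU time is the number of allocated CPUs multiplied by the time the allocation was held (which is how sacct documents its CPUTime field), the helper below reproduces the 7 minutes from the example. The helper name is ours and is not part of Slurm:

```shell
# allocated_cpu_minutes: hypothetical helper.
# $1 = number of allocated CPUs, $2 = minutes the allocation was held.
allocated_cpu_minutes() {
    echo "$(( $1 * $2 ))"
}

# 1 CPU held for 3 idle minutes + 4 busy minutes:
allocated_cpu_minutes 1 $((3 + 4))   # prints 7
```

Under the same assumption, holding 4 CPUs for the same 7 minutes would count as 28 CPU minutes, even if only one of them did any work.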

Tip: If you absolutely need a report that includes jobs that ran today, you can override the default end date by forcing tomorrow's date:

sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 end=$(date --date="tomorrow" +%Y-%m-%d) -t seconds

Differences between sreport, sacct and sshare

  • sacct: Displays only account jobs, excluding time requested via reservation. If duplicate jobs exist, only one is returned.
  • sreport: By default, a job's usage is truncated to the report's time span when the job's wall time extends beyond it. For jobs using a reservation, the idle requested time is distributed among all users with access to the reservation.
  • sshare: Avoid using sshare as an accounting reference; the displayed values are adjusted due to fairshare calculations.
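
To compare the two tools yourself, one possible approach is to total the CPUTimeRAW field (CPU seconds) that sacct reports per job and set it next to the sreport figure for the same period. This is only a sketch: the helper name sum_cpu_seconds is ours, and for the reasons listed above the two totals are not expected to match exactly.

```shell
# sum_cpu_seconds: hypothetical helper, not a Slurm command.
# Reads one integer (CPU seconds) per line on stdin and prints the total.
sum_cpu_seconds() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Real usage on the cluster (requires Slurm):
#   sacct -u "$USER" -S 2019-09-01 -X -n -P --format=CPUTimeRAW | sum_cpu_seconds
```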
hpc/accounting.txt · Last modified: 2024/12/05 15:09 by Yann Sagon