Differences

This shows you the differences between two versions of the page.

--- hpc:accounting [2025/12/04 10:43] – [Resources available for research group] Yann Sagon
+++ hpc:accounting [2025/12/10 07:41] (current) – [sreport examples] Yann Sagon
@@ Line 1: / Line 1: @@
-{{METATOC 1-5}}
+{{METATOC 1-8}}
 ====== Utilization and accounting ======
 When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage. Several tools are available for this: sacct, sreport, and openxdmod.
@@ Line 18: / Line 18: @@
 We use this model because our cluster is heterogeneous, and both the computational power and the cost of GPUs vary significantly depending on the model. To ensure fairness and transparency, each GPU type is assigned a weight that reflects its relative performance compared to a CPU core. Similarly, memory usage is converted into CPU-hour equivalents based on predefined weights.
-We also bill memory usage because some jobs consume very little CPU but require large amounts of memory, which means an entire compute node is occupied. This ensures that jobs using significant memory resources are accounted for fairly.
+We also **account for memory usage** because some jobs consume very little CPU but require large amounts of memory, which means an entire compute node is occupied. This ensures that jobs using significant memory resources are accounted for fairly.
-Example: A job using a GPU with a weight of 10 for 2 hours and memory equivalent to 5 CPU hours would be billed as 25 CPU hours. This approach guarantees consistent, transparent, and fair resource accounting across all heterogeneous components of the cluster.
-You can check the up to date conversion details by inspecting the parameters of any partition on the clusters. The same conversion table is applied everywhere.
+==== Conversion Rules extract (see below for details) ====
+  * **1 CPU core = 1 CPUh per hour**
+  * **1 GB RAM = 0.25 CPUh per hour**
+  * **1 GPU A100 (40 GB) = 5 CPUh per hour**
+==== Example Calculation ====
+Suppose you request:
+  * **2 CPUs**
+  * **20 GB RAM**
+  * **1 GPU A100**
+The cost per hour is calculated as:
+  * CPU: 2 × 1 CPUh = **2 CPUh**
+  * RAM: 20 GB × 0.25 CPUh = **5 CPUh**
+  * GPU: 1 × 5 CPUh = **5 CPUh**
+**Total per hour = 2 + 5 + 5 = 12 CPUh**
+This approach guarantees consistent, transparent, and fair resource accounting across all heterogeneous components of the cluster.
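The example calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the three weights quoted in the extract and that the contributions are simply summed, as in the example. The function name `billed_cpuh_per_hour` and the GPU label `A100-40GB` are hypothetical and are not part of any cluster tool.

```python
# Weights copied from the "Conversion Rules extract" (assumed, may change).
CPUH_PER_CPU_CORE = 1.0
CPUH_PER_GB_RAM = 0.25
# Only the GPU weight quoted on this page; the label is a hypothetical key.
CPUH_PER_GPU = {"A100-40GB": 5.0}


def billed_cpuh_per_hour(cpus, mem_gb, gpus=None):
    """Return the CPUh accounted per hour for one resource request.

    gpus maps a GPU label to the number of GPUs requested.
    """
    total = cpus * CPUH_PER_CPU_CORE + mem_gb * CPUH_PER_GB_RAM
    for model, count in (gpus or {}).items():
        total += count * CPUH_PER_GPU[model]
    return total


# Same request as in the example: 2 CPUs, 20 GB RAM, 1 GPU A100.
per_hour = billed_cpuh_per_hour(cpus=2, mem_gb=20, gpus={"A100-40GB": 1})
print(per_hour)      # 12.0
print(per_hour * 3)  # 36.0, the same request held for 3 hours
```

Multiplying the hourly figure by the job duration gives the total for the job. The authoritative weights are the ones configured on the partitions.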
+You can check the up-to-date conversion details by inspecting the parameters of any partition. The same conversion table is applied on all our clusters and partitions.
 <code>
@@ Line 136: / Line 155: @@
 <code>
-(baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py -h
+(baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --help
-usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab,yggdrasil,bamboo}] [--all_users] [--report_type {user,account}] [--time_format {Hours,Minutes,Seconds}]
+usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab,yggdrasil,bamboo}] [--all-users] [--aggregate] [--report-type {user,account}]
-                                  [--verbose]
+                                  [--time-format {Hours,Minutes,Seconds}] [--verbose]
 Retrieve HPC utilization statistics for a user or group of users.
@@ Line 151: / Line 170: @@
   --cluster {baobab,yggdrasil,bamboo}
                         Cluster name (default: all clusters).
-  --all_users           Include all users under the PI account.
+  --all-users           Include all users under the PI account.
-  --report_type {user,account}
+  --aggregate           Aggregate the usage per user.
+  --report-type {user,account}
                         Type of report: user (default) or account.
-  --time_format {Hours,Minutes,Seconds}
+  --time-format {Hours,Minutes,Seconds}
                         Time format: Hours (default), Minutes, or Seconds.
   --verbose             Verbose output.
@@ Line 160: / Line 180: @@
 By default, when you run this script, it prints your usage for the current month, for all the accounts you are a member of.
+=== Usage details of a given PI ===
+<code>
+(baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --pi **** --report-type account --start 2025-01-01
+--------------------------------------------------------------------------------
+Cluster/Account/User Utilization 2025-01-01T00:00:00 - 2025-12-08T13:59:59 (29512800 secs)
+Usage reported in TRES Hours
+--------------------------------------------------------------------------------
+Cluster    Login    Proper Name    Account    TRES Name      Used
+---------  -------  -------------  ---------  -----------  ------
+bamboo                             krusek     billing      176681
+baobab                             krusek     billing      961209
+yggdrasil                          krusek     billing           0
+Total usage: 1.14M
+</code>
+=== Usage details of all PIs associated with a private group ===
 Usage example showing the resource usage since the beginning of 2025 for all the PIs and associated users of the group private_xxx. The group private_xxx owns several compute nodes:
 <code>
-(baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py --group private_xxx --start=2025-01-01 --report_type=account
+(baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py --group private_xxx --start=2025-01-01 --report-type=account
 --------------------------------------------------------------------------------
@@ Line 186: / Line 225: @@
 Total usage: 7.36M
 </code>
+=== Aggregate usage by all users of a given PI ===
+<code>
+$ ug_slurm_usage_per_user.py --pi ***** --report-type account --start 2025-01-01 --all-users --aggregate
+--------------------------------------------------------------------------------
+Cluster/Account/User Utilization 2025-01-01T00:00:00 - 2025-12-08T13:59:59 (29512800 secs)
+Usage reported in TRES Hours
+--------------------------------------------------------------------------------
+Login       Used
+--------  ------
+a***u    547746
+d***i    272634
+d***on    91178
+d***l     86860
+e***j     60649
+v***d0    37962
+w***r     29886
+s***o      9120
+k***k      1853
+m***l         1
+Total usage: 1.14M
+</code>
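The per-user figures can be cross-checked against the reported total by summing the last column of the report. The sketch below assumes the text layout shown in the output above (one data row per line, ending in an integer). `sum_used` is a hypothetical helper, not part of `ug_slurm_usage_per_user.py`.

```python
def sum_used(report_text):
    """Sum the trailing integer of every data row in a usage report."""
    total = 0
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows have at least two fields and end with a plain integer;
        # headers, separator lines and the "Total usage" line do not.
        if len(fields) >= 2 and fields[-1].isdigit():
            total += int(fields[-1])
    return total


# Rows copied from the aggregated report above (logins are masked there too).
report = """\
Login       Used
--------  ------
a***u    547746
d***i    272634
d***on    91178
d***l     86860
e***j     60649
v***d0    37962
w***r     29886
s***o      9120
k***k      1853
m***l         1
Total usage: 1.14M
"""

total = sum_used(report)
print(total)                    # 1137889
print(f"{total / 1e6:.2f}M")    # 1.14M
```

The same helper works on the per-account report, where the last column is also the used amount.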
 === sreport examples ===
+<note important>By default, the TRES (trackable resource) shown by sreport is CPUh. To see what will be accounted and billed, use the TRES "billing".</note>
 Here are some examples that can give you a starting point: