hpc:accounting

Differences

This shows you the differences between two versions of the page.

hpc:accounting [2025/12/08 12:46] – [Usage details of all PIs associated with a private group] Yann Sagon
hpc:accounting [2026/03/19 13:51] (current) Yann Sagon
Line 1: Line 1:
-{{METATOC 1-5}}
+{{METATOC 1-8}}
 ====== Utilization and accounting ======
 When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage of these resources. Several tools are available for this: sacct, sreport, and openxdmod.
Line 18: Line 18:
 We use this model because our cluster is heterogeneous, and both the computational power and the cost of GPUs vary significantly depending on the model. To ensure fairness and transparency, each GPU type is assigned a weight that reflects its relative performance compared to a CPU core. Similarly, memory usage is converted into CPU-hour equivalents based on predefined weights.
  
-We also bill memory usage because some jobs consume very little CPU but require large amounts of memory, which means an entire compute node is occupied. This ensures that jobs using significant memory resources are accounted for fairly.
+We also **account for memory usage** because some jobs consume very little CPU but require large amounts of memory, which means an entire compute node is occupied. This ensures that jobs using significant memory resources are accounted for fairly.
  
-Example: A job using a GPU with a weight of 10 for 2 hours and memory equivalent to 5 CPU hours would be billed as 25 CPU hours. This approach guarantees consistent, transparent, and fair resource accounting across all heterogeneous components of the cluster. 
  
-You can check the up to date conversion details by inspecting the parameters of any partition on the clusters. The same conversion table is applied everywhere.
+==== Conversion Rules extract (see below for details) ====
 +  * **1 CPU core = 1 CPUh per hour** 
 +  * **1 GB RAM = 0.25 CPUh per hour** 
 +  * **1 GPU A100 (40 GB) = 5 CPUh per hour** 
 + 
 +==== Example Calculation ==== 
 +Suppose you request: 
 +  * **2 CPUs** 
 +  * **20 GB RAM** 
 +  * **1 GPU A100** 
 + 
 +The cost per hour is calculated as: 
 +  * CPU: 2 × 1 CPUh = **2 CPUh** 
 +  * RAM: 20 GB × 0.25 CPUh = **5 CPUh** 
 +  * GPU: 1 × 5 CPUh = **5 CPUh** 
 + 
 +**Total per hour = 2 + 5 + 5 = 12 CPUh** 
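
The calculation above can be reproduced with a minimal Python sketch. It assumes that the three contributions are simply summed, as in the example, and it uses only the weights from the conversion rules extract. The function name ''billed_cpuh'' and the key ''a100_40g'' are hypothetical names chosen for illustration; the authoritative weights are the partition parameters on the clusters.

```python
# Illustrative weights copied from the conversion rules extract;
# the partition parameters on the cluster remain the reference.
CPU_WEIGHT = 1.0                  # CPUh per CPU core per hour
MEM_WEIGHT_PER_GB = 0.25          # CPUh per GB of RAM per hour
GPU_WEIGHTS = {"a100_40g": 5.0}   # CPUh per GPU per hour (hypothetical key name)


def billed_cpuh(cpus, mem_gb, gpus=None, hours=1.0):
    """Return the CPUh accounted for a request, assuming a plain sum of contributions."""
    gpus = gpus or {}
    per_hour = cpus * CPU_WEIGHT + mem_gb * MEM_WEIGHT_PER_GB
    per_hour += sum(GPU_WEIGHTS[model] * count for model, count in gpus.items())
    return per_hour * hours


# 2 CPUs, 20 GB RAM, 1 GPU A100 (40 GB) for one hour
print(billed_cpuh(cpus=2, mem_gb=20, gpus={"a100_40g": 1}))  # 12.0
```

Passing ''hours=3'' to the same call gives the cost of a three-hour job under the same assumptions.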
 + 
 +This approach guarantees consistent, transparent, and fair resource accounting across all heterogeneous components of the cluster. 
 + 
 +You can check the up-to-date conversion details by inspecting the parameters of any partition on the clusters. The same conversion table is applied on all our clusters and partitions.
  
 <code>
Line 52: Line 71:
  
 To view details of owned resources, users can run the script:
-''ug_getNodeCharacteristicsSummary.py''
+''ug_getNodeCharacteristicsSummary''
 This script provides a summary of the node characteristics within the cluster.
  
Line 59: Line 78:
 Output example of the script:
 <code>
-ug_getNodeCharacteristicsSummary.py --partitions private-<group>-gpu private-<group>-cpu --cluster <cluster> --summary
+ug_getNodeCharacteristicsSummary --partitions private-<group>-gpu private-<group>-cpu --cluster <cluster> --summary
 host    sn             cpu    mem    gpunumber    gpudeleted  gpumodel                      gpumemory  purchasedate      months remaining in prod. (Jan 2025)    billing
 ------  -----------  -----  -----  -----------  ------------  --------------------------  -----------  --------------  --------------------------------------  ---------
Line 136: Line 155:
  
 <code>
-(baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --help
-usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab,yggdrasil,bamboo}] [--all-users] [--aggregate] [--report-type {user,account}]
+(baobab)-[sagon@login1] $ ug_slurm_usage_per_user --help
+usage: ug_slurm_usage_per_user [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab,yggdrasil,bamboo}] [--all-users] [--aggregate] [--report-type {user,account}]
                                   [--time-format {Hours,Minutes,Seconds}] [--verbose]
  
Line 163: Line 182:
 === Usage details of a given PI ===
 <code>
-(baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --pi **** --report-type account --start 2025-01-01
+(baobab)-[sagon@login1] $ ug_slurm_usage_per_user --pi **** --report-type account --start 2025-01-01
 --------------------------------------------------------------------------------
  
Line 184: Line 203:
 Example showing the resource usage since the beginning of 2025 for all the PIs and associated users of the group private_xxx. The group private_xxx owns several compute nodes:
 <code>
-(baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py --group private_xxx --start=2025-01-01 --report-type=account
+(baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user --group private_xxx --start=2025-01-01 --report-type=account
 --------------------------------------------------------------------------------
  
Line 209: Line 228:
 === Aggregate usage by all users of a given PI ===
 <code>
-$ ug_slurm_usage_per_user.py --pi ***** --report-type account --start 2025-01-01 --all-users --aggregate
+$ ug_slurm_usage_per_user --pi ***** --report-type account --start 2025-01-01 --all-users --aggregate
 --------------------------------------------------------------------------------
  
Line 237: Line 256:
  
 === sreport examples ===
 +
 +<note important>By default, the TRES (trackable resource) shown by sreport is CPUh. To see what will be accounted and billed, you need to use the TRES "billing".</note>
  
 Here are some examples that can give you a starting point:
hpc/accounting.1765197993.txt.gz · Last modified: by Yann Sagon