hpc:accounting
Differences
This shows you the differences between two versions of the page.
| hpc:accounting [2025/12/04 10:43] – [Resources available for research group] Yann Sagon | hpc:accounting [2025/12/10 07:41] (current) – [sreport examples] Yann Sagon | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | {{METATOC 1-5}} | + | {{METATOC 1-8}} |
| ====== Utilization and accounting ====== | ====== Utilization and accounting ====== | ||
| When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage. We have several tools that you can use to consult your utilization: | When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage. We have several tools that you can use to consult your utilization: | ||
| Line 18: | Line 18: | ||
| We use this model because our cluster is heterogeneous, | We use this model because our cluster is heterogeneous, | ||
| - | We also bill memory usage because some jobs consume very little CPU but require large amounts of memory, which means an entire compute node is occupied. This ensures that jobs using significant memory resources are accounted for fairly. | + | We also **account for memory usage** because some jobs consume very little CPU but require large amounts of memory, which means an entire compute node is occupied. This ensures that jobs using significant memory resources are accounted for fairly. |
| - | Example: A job using a GPU with a weight of 10 for 2 hours and memory equivalent to 5 CPU hours would be billed as 25 CPU hours. This approach guarantees consistent, transparent, | ||
| - | You can check the up to date conversion details by inspecting the parameters of any partition on the clusters. The same conversion table is applied | + | ==== Conversion Rules extract (see below for details) ==== |
| + | * **1 CPU core = 1 CPUh per hour** | ||
| + | * **1 GB RAM = 0.25 CPUh per hour** | ||
| + | * **1 GPU A100 (40 GB) = 5 CPUh per hour** | ||
| + | |||
| + | ==== Example Calculation ==== | ||
| + | Suppose you request: | ||
| + | * **2 CPUs** | ||
| + | * **20 GB RAM** | ||
| + | * **1 GPU A100** | ||
| + | |||
| + | The cost per hour is calculated as: | ||
| + | * CPU: 2 × 1 CPUh = **2 CPUh** | ||
| + | * RAM: 20 GB × 0.25 CPUh = **5 CPUh** | ||
| + | * GPU: 1 × 5 CPUh = **5 CPUh** | ||
| + | |||
| + | **Total per hour = 2 + 5 + 5 = 12 CPUh** | ||
| + | |||
| + | This approach guarantees consistent, transparent, | ||
| + | |||
| + | You can check the up-to-date conversion details by inspecting the parameters of any partition on the clusters. The same conversion table is applied | ||
| < | < | ||
| Line 136: | Line 155: | ||
| < | < | ||
| - | (baobab)-[sagon@login1 | + | (baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --help |
| - | usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab, | + | usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab, |
| - | | + | |
| Retrieve HPC utilization statistics for a user or group of users. | Retrieve HPC utilization statistics for a user or group of users. | ||
| Line 151: | Line 170: | ||
| --cluster {baobab, | --cluster {baobab, | ||
| Cluster name (default: all clusters). | Cluster name (default: all clusters). | ||
| - | --all_users | + | --all-users |
| - | --report_type | + | --aggregate |
| + | --report-type | ||
| Type of report: user (default) or account. | Type of report: user (default) or account. | ||
| - | --time_format | + | --time-format |
| Time format: Hours (default), Minutes, or Seconds. | Time format: Hours (default), Minutes, or Seconds. | ||
| --verbose | --verbose | ||
| Line 160: | Line 180: | ||
| By default, this script prints your usage for the current month, for all the accounts you are a member of. | By default, this script prints your usage for the current month, for all the accounts you are a member of. | ||
| + | === Usage details of a given PI === | ||
| + | < | ||
| + | (baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --pi **** --report-type account --start 2025-01-01 | ||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster/ | ||
| + | |||
| + | Usage reported in TRES Hours | ||
| + | |||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster | ||
| + | --------- | ||
| + | bamboo | ||
| + | baobab | ||
| + | yggdrasil | ||
| + | Total usage: 1.14M | ||
| + | </ | ||
| + | === Usage details of all PIs associated with a private group === | ||
| Example: show the resource usage since the beginning of 2025 for all the PIs and associated users of the group private_xxx, which owns several compute nodes: | Example: show the resource usage since the beginning of 2025 for all the PIs and associated users of the group private_xxx, which owns several compute nodes: | ||
| < | < | ||
| - | (baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py --group private_xxx --start=2025-01-01 --report_type=account | + | (baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py --group private_xxx --start=2025-01-01 --report-type=account |
| -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | ||
| Line 186: | Line 225: | ||
| Total usage: 7.36M | Total usage: 7.36M | ||
| </ | </ | ||
| + | |||
| + | === Aggregate usage by all users of a given PI === | ||
| + | < | ||
| + | $ ug_slurm_usage_per_user.py --pi ***** --report-type account --start 2025-01-01 --all-users --aggregate | ||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster/ | ||
| + | |||
| + | Usage reported in TRES Hours | ||
| + | |||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Login Used | ||
| + | -------- | ||
| + | a***u 547746 | ||
| + | d***i 272634 | ||
| + | d***on | ||
| + | d***l 86860 | ||
| + | e***j 60649 | ||
| + | v***d0 | ||
| + | w***r 29886 | ||
| + | s***o 9120 | ||
| + | k***k 1853 | ||
| + | m***l 1 | ||
| + | Total usage: 1.14M | ||
| + | |||
| + | </ | ||
| + | |||
| + | |||
| === sreport examples === | === sreport examples === | ||
| + | |||
| + | <note important> | ||
| Here are some examples that can give you a starting point: | Here are some examples that can give you a starting point: | ||
hpc/accounting.1764844983.txt.gz · Last modified: by Yann Sagon
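The conversion rules and the example calculation quoted in the new revision (1 CPU core = 1 CPUh per hour, 1 GB RAM = 0.25 CPUh per hour, 1 GPU A100 40 GB = 5 CPUh per hour) can be checked with a few lines of Python. This is only a minimal sketch: the function name `billed_cpuh` and the constant names are hypothetical, only the three weights listed in the extract are covered, and it assumes the contributions are simply summed, as in the page's example. The authoritative weights remain the ones shown in the partition parameters.

```python
# Hypothetical helper, not part of any cluster tool.
# Weights copied from the "Conversion Rules extract" in the page;
# the values configured on the partitions are the authoritative ones.
CPUH_PER_CPU_CORE = 1.0        # 1 CPU core = 1 CPUh per hour
CPUH_PER_GB_RAM = 0.25         # 1 GB RAM = 0.25 CPUh per hour
CPUH_PER_GPU_A100_40GB = 5.0   # 1 GPU A100 (40 GB) = 5 CPUh per hour


def billed_cpuh(cpus, ram_gb, gpus_a100=0, hours=1.0):
    """Return the accounted CPU hours for a job, summing the three contributions."""
    per_hour = (
        cpus * CPUH_PER_CPU_CORE
        + ram_gb * CPUH_PER_GB_RAM
        + gpus_a100 * CPUH_PER_GPU_A100_40GB
    )
    return per_hour * hours


# Example from the page: 2 CPUs, 20 GB RAM, 1 GPU A100, for one hour.
print(billed_cpuh(cpus=2, ram_gb=20, gpus_a100=1))  # 12.0
```

Passing `hours=3` to the same call gives 36.0, that is, three times the hourly cost of 12 CPUh.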