hpc:accounting
Differences
This shows you the differences between two versions of the page.
| hpc:accounting [2025/02/10 16:06] – [Resources available for research group] Yann Sagon | hpc:accounting [2025/12/08 12:49] (current) – Yann Sagon | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | {{METATOC 1-5}} | + | {{METATOC 1-8}} |
| ====== Utilization and accounting ====== | ====== Utilization and accounting ====== | ||
| When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage. Several tools are available for this: | When you submit jobs, they use physical resources such as CPUs, memory, network, GPUs, and energy. We keep track of the usage of some of those resources. This page explains how to consult your usage. Several tools are available for this: | ||
| Line 13: | Line 13: | ||
| ===== Resource accounting uniformization ===== | ===== Resource accounting uniformization ===== | ||
| - | We charge usage uniformly | + | We apply uniform resource accounting |
| + | A CPU hour represents one hour of processing time on a single CPU core. | ||
| - | A CPU hour represents one hour of processing time by a single | + | We use this model because our cluster is heterogeneous, |
| - | For GPUs, SLURM assigns a conversion factor to each GPU model through TRESBillingWeights (see below the conversion table), reflecting its computational performance relative to a CPU. Similarly, | + | We also bill memory usage because some jobs consume very little |
| - | For example, a job using a GPU with a weight of 10 for 2 hours and memory equivalent to 5 CPU hours would be billed as 25 CPU hours. This approach | + | Example: A job using a GPU with a weight of 10 for 2 hours and memory equivalent to 5 CPU hours would be billed as 25 CPU hours. This approach |
| - | You can see the detail of the conversion by looking at the parameter | + | You can check the up-to-date conversion | ||
| < | < | ||
| Line 45: | Line 46: | ||
| ===== Resources available for research group ===== | ===== Resources available for research group ===== | ||
| + | Research groups that have invested in the HPC cluster by purchasing private CPU or GPU nodes benefit from **high-priority access** to these resources. | ||
| + | Although these nodes remain available to all users, owners receive **priority scheduling** and a predefined annual allocation of compute hours, referred to as [[accounting# | ||
| + | The advantage of this approach is flexibility: | ||
| - | Research groups that have invested in the HPC cluster | + | To view details of owned resources, users can run the script: |
| + | '' | ||
| + | This script provides a summary of the node characteristics within | ||
| - | While these nodes remain available to all users, | + | **Note:** This model ensures **fairness** across |
| - | To check the details | + | Output example |
| - | + | ||
| - | Example: | + | |
| < | < | ||
| - | ug_getNodeCharacteristicsSummary.sh --partitions private-< | + | ug_getNodeCharacteristicsSummary.py --partitions private-< |
| host sn | host sn | ||
| ------ | ------ | ||
| cpu084 | cpu084 | ||
| - | cpu085 | + | [...] |
| - | cpu086 | + | |
| - | cpu087 | + | |
| cpu088 | cpu088 | ||
| - | cpu089 | + | [...] |
| - | cpu090 | + | |
| - | cpu209 | + | |
| - | cpu210 | + | |
| - | cpu211 | + | |
| - | cpu212 | + | |
| - | cpu213 | + | |
| cpu226 | cpu226 | ||
| - | cpu227 | + | [...] |
| - | cpu228 | + | |
| cpu229 | cpu229 | ||
| cpu277 | cpu277 | ||
| Line 99: | Line 94: | ||
| * **purchasedate**: | * **purchasedate**: | ||
| * **months remaining in prod. (Jan 2025)**: the number of months the node remains the property of the research group. The reference date is indicated in parentheses; in this example it is January 2025. | * **months remaining in prod. (Jan 2025)**: the number of months the node remains the property of the research group. The reference date is indicated in parentheses; in this example it is January 2025. | ||
| - | * **billing**: | + | * **billing**: |
| + | You can modify the reference year if you want to " | ||
| ===== Job accounting ===== | ===== Job accounting ===== | ||
| Line 113: | Line 108: | ||
| Open XDMoD is integrated into our information system (SI). When you connect to it, you'll get the profile " | Open XDMoD is integrated into our information system (SI). When you connect to it, you'll get the profile " | ||
| + | <note important> | ||
| ==== sacct ==== | ==== sacct ==== | ||
| You can see your job history using '' | You can see your job history using '' | ||
| Line 135: | Line 131: | ||
| - | We wrote a helper that you can use to get your past resource usage on the cluster. | + | We wrote a helper that you can use to get your past resource usage on the cluster. |
| + | * for each user of a given account (PI) | ||
| + | * total usage of a given account (PI) | ||
| < | < | ||
| - | (baobab)-[sagon@login1 | + | (baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --help |
| - | usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--verbose] | + | usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--group GROUP] [--cluster {baobab, |
| + | [--time-format {Hours, | ||
| - | Retrieve HPC utilization statistics for a user within a specified time range. | + | Retrieve HPC utilization statistics for a user or group of users. |
| options: | options: | ||
| - | -h, --help | + | -h, --help |
| - | --user USER The username | + | --user USER Username |
| - | --start START Start date (default: first day of current | + | --start START |
| - | --end END End date (default: | + | --end END |
| - | --pi PI Specify | + | --pi PI |
| - | --verbose | + | --group GROUP |
| + | --cluster {baobab, | ||
| + | Cluster name (default: all clusters). | ||
| + | --all-users | ||
| + | --aggregate | ||
| + | --report-type {user,account} | ||
| + | Type of report: user (default) or account. | ||
| + | | ||
| + | Time format: Hours (default), Minutes, or Seconds. | ||
| + | --verbose | ||
| </ | </ | ||
| By default, this script prints your usage for the current month, for all the accounts you are a member of. | By default, this script prints your usage for the current month, for all the accounts you are a member of. | ||
| + | === Usage details of a given PI === | ||
| + | < | ||
| + | (baobab)-[sagon@login1] $ ug_slurm_usage_per_user.py --pi **** --report-type account --start 2025-01-01 | ||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster/ | ||
| + | |||
| + | Usage reported in TRES Hours | ||
| + | |||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster | ||
| + | --------- | ||
| + | bamboo | ||
| + | baobab | ||
| + | yggdrasil | ||
| + | Total usage: 1.14M | ||
| + | </ | ||
| + | |||
| + | === Usage details of all PIs associated with a private group === | ||
| + | |||
| + | Example: show the resource usage since the beginning of 2025 for all the PIs and associated users of the group private_xxx, which owns several compute nodes: | ||
| + | < | ||
| + | (baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py --group private_xxx --start=2025-01-01 --report-type=account | ||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster/ | ||
| + | |||
| + | Usage reported in TRES Hours | ||
| + | |||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster | ||
| + | --------- | ||
| + | baobab | ||
| + | yggdrasil | ||
| + | bamboo | ||
| + | baobab | ||
| + | yggdrasil | ||
| + | bamboo | ||
| + | baobab | ||
| + | yggdrasil | ||
| + | [...] | ||
| + | Total usage: 7.36M | ||
| + | </ | ||
| + | |||
| + | === Aggregate usage by all users of a given PI === | ||
| + | < | ||
| + | $ ug_slurm_usage_per_user.py --pi ***** --report-type account --start 2025-01-01 --all-users --aggregate | ||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Cluster/ | ||
| + | |||
| + | Usage reported in TRES Hours | ||
| + | |||
| + | -------------------------------------------------------------------------------- | ||
| + | |||
| + | Login Used | ||
| + | -------- | ||
| + | a***u 547746 | ||
| + | d***i 272634 | ||
| + | d***on | ||
| + | d***l 86860 | ||
| + | e***j 60649 | ||
| + | v***d0 | ||
| + | w***r 29886 | ||
| + | s***o 9120 | ||
| + | k***k 1853 | ||
| + | m***l 1 | ||
| + | Total usage: 1.14M | ||
| + | |||
| + | </ | ||
| + | |||
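
The billing example in the "Resource accounting uniformization" section (a GPU with a weight of 10 used for 2 hours, plus memory equivalent to 5 CPU hours, billed as 25 CPU hours) can be reproduced with a few lines of Python. This is only a minimal sketch of the additive arithmetic used in that example: the function name `billed_cpu_hours` and its parameters are hypothetical, and the real weights are whatever the cluster defines in TRESBillingWeights, so actual billing may be computed differently.

```python
def billed_cpu_hours(gpu_weight, gpu_hours, memory_cpu_hours, cpu_hours=0):
    """Return the billed CPU hours for a job (illustrative additive model).

    gpu_weight        -- conversion factor of the GPU model (assumed value)
    gpu_hours         -- hours during which the GPU was allocated
    memory_cpu_hours  -- memory usage already converted to CPU hours
    cpu_hours         -- plain CPU core hours (0 in the page's example)
    """
    return cpu_hours + gpu_weight * gpu_hours + memory_cpu_hours


# Example from the page: weight 10, 2 hours of GPU, memory worth 5 CPU hours.
print(billed_cpu_hours(gpu_weight=10, gpu_hours=2, memory_cpu_hours=5))  # 25
```

The GPU contributes 10 x 2 = 20 CPU hours and memory contributes 5, which gives the 25 CPU hours quoted in the example.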
hpc/accounting.1739203584.txt.gz · Last modified: (external edit)