hpc:accounting
===== Comparison of sreport, sacct, and sshare =====

We use **sreport** as our primary accounting reference. However, you may find the other tools useful for specific purposes. Here's a comparison:

  * **sacct**: Displays only account jobs, excluding time allocated via reservations. If duplicate jobs exist, only one is shown.
  * **sreport**: Our reference tool for accounting (see the example invocation below the list).
  * **sshare**: Not recommended for accounting purposes; displayed values are adjusted based on fairshare calculations.
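
For reference, here is a minimal ''sreport'' sketch that retrieves the monthly usage of an account broken down per user; the account name and dates are placeholders and you may need to adapt the options to your needs:

<code>
# Minimal sketch: per-user utilization of an account for January 2025, reported in hours.
# Replace <your_account> with the account (PI group) you belong to.
sreport -t Hours cluster AccountUtilizationByUser Accounts=<your_account> Start=2025-01-01 End=2025-02-01
</code>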
===== Resource accounting uniformization =====
For example, a job using a GPU with a weight of 10 for 2 hours, plus memory equivalent to 5 CPU hours, would be billed as 25 CPU hours. This approach ensures consistent, transparent, and fair billing across all resources.

You can see the details of the conversion by looking at the parameters of any partition on any of the clusters; the same conversion table is used everywhere.
<code>
(bamboo)-[root@slurm1 ~]$ scontrol show partition debug-cpu | grep TRESBillingWeights | tr "," "\n"
Mem=0.25G
GRES/gpu=1
GRES/gpu:nvidia_a100-pcie-40gb=5
GRES/
GRES/
GRES/
GRES/
GRES/
GRES/
GRES/
GRES/
GRES/
GRES/
</code>

Here you can see, for example, that using an nvidia_a100-pcie-40gb GPU for 1 hour is equivalent in cost to using 5 CPU hours.
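
To make the conversion concrete, here is a rough sketch of the arithmetic for a made-up job, using the weights visible above (Mem=0.25 per GB, nvidia_a100-pcie-40gb=5). The CPU weight of 1 is an assumption, since that line is truncated in the output above:

<code>
# Hypothetical job: 4 CPUs, 64 GB of RAM and one nvidia_a100-pcie-40gb GPU, running for 2 hours.
cpus=4; mem_gb=64; a100_gpus=1; hours=2
# Mem weight 0.25 per GB means 1 CPU equivalent per 4 GB; a CPU weight of 1 is assumed.
billing_per_hour=$(( cpus + mem_gb / 4 + a100_gpus * 5 ))   # 4 + 16 + 5 = 25
echo "$(( billing_per_hour * hours )) CPU hours billed"     # prints: 50 CPU hours billed
</code>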
+ | |||
===== Resources available for research groups =====

Research groups that have invested in the HPC cluster by purchasing private CPU or GPU nodes benefit from high-priority access to these resources.

While these nodes remain available to all users, owners receive priority scheduling and a designated number of included compute hours per year.

To check the details of their owned resources, users can run the script ''ug_getNodeCharacteristicsSummary.sh''.

Example:
<code>
ug_getNodeCharacteristicsSummary.sh --partitions private-<
host sn
------
cpu084
cpu085
cpu086
cpu087
cpu088
cpu089
cpu090
cpu209
cpu210
cpu211
cpu212
cpu213
cpu226
cpu227
cpu228
cpu229
cpu277
gpu002
gpu012
gpu017
gpu023
gpu024
gpu044
gpu047
gpu049

============================================================ Summary ============================================================
Total CPUs: 1364 Total CPUs memory[GB]: 6059 Total GPUs: 61 Total GPUs memory[MB]: 142300 Billing: 1959 CPUhours per year: 10.30M
</code>
+ | |||
+ | How to read the output: | ||
+ | * **host**: the hostname of the compute node | ||
+ | * **sn**: the serial number of the node | ||
+ | * **cpu**: the number of CPUs available in the node | ||
+ | * **mem**: the quantity of memory on the node in GB | ||
+ | * **gpunumber**: | ||
+ | * **gpudeleted**: | ||
+ | * **gpumodel**: | ||
+ | * **gpumemory**: | ||
+ | * **purchasedate**: | ||
+ | * **months remaining in prod. (Jan 2025)**: the number of months the node remains the property of the research group, the reference date is indicated in parenthesis. In this example it is January 2025. | ||
+ | * **billing**: | ||
+ | |||
+ | You can modify the reference year if you want to " | ||
===== Job accounting =====
==== OpenXDMoD ====
We wrote a helper that you can use to get your past resource usage on the cluster:
  * for each user of a given account (PI)
  * total usage of a given account (PI)
<code>
(baobab)-[sagon@login1 ~]$ ug_slurm_usage_per_user.py -h
usage: ug_slurm_usage_per_user.py [-h] [--user USER] [--start START] [--end END] [--pi PI] [--cluster CLUSTER] [--all_users] [--report_type {user,

Retrieve HPC utilization statistics for a user within a specified time range.

options:
  -h, --help            show this help message and exit
  --user USER           The username to retrieve utilization for.
  --start START         Start date (default: first day of current month).
  --end END             End date (default: current time).
  --pi PI               Specify the PI manually (optional). If not provided, it will be auto-detected.
  --cluster CLUSTER
  --all_users
  --report_type {user,
                        Report type: UserUtilizationByAccount or AccountUtilizationByUser
  --time_format TIME_FORMAT
                        Specify the time format for the reporting. Default is by hours. You can use Minutes or Seconds
  --verbose
</code>
By default, when you run this script, it prints your past usage for the current month, for all the accounts you are a member of.
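
For example, to report the January 2025 usage of every member of a PI's account on the Baobab cluster, an invocation could look like the following; the PI username is a placeholder and the YYYY-MM-DD date format is an assumption (check ''-h'' for the exact format expected):

<code>
ug_slurm_usage_per_user.py --pi <pi_username> --all_users --cluster baobab --start 2025-01-01 --end 2025-02-01
</code>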
+ | |||
+ | |||
hpc/accounting.1737627811.txt.gz · Last modified: 2025/01/23 10:23 by Yann Sagon