hpc:hpc_clusters

Differences

This shows you the differences between two versions of the page.

hpc:hpc_clusters [2025/09/01 07:46] – [Cost of Renting a Compute Node] Yann Sagon
hpc:hpc_clusters [2025/12/16 17:04] (current) – [Cost model] Yann Sagon
Line 82: Line 82:
   * **Purchase or rent** compute nodes for more intensive workloads.
  
-You can also find a summary of how this model is implemented here: https://hpc-community.unige.ch/t/hpc-accounting-summary/4056
 +
 +**Summary:**
 +
 +  * Starting this year, you receive a **CPU hours credit** based on the hardware you own (if any) in the cluster (private partition).
 +  * You can find instructions on how to check your annual credit here: [[accounting#resources_available_for_research_group|Resources Available for Research Groups]]. If you know your research group has bought some compute nodes but your PI doesn't appear in the report, please contact us.
 +  * The credit calculation in the provided script assumes a **5-year hardware ownership period**. However, **if** this policy was introduced after your compute nodes were purchased, we have extended the production duration by two years.
 +  * To ensure **flexibility and simplicity**, we have standardized resource usage by converting CPU, memory, and GPU hours into CPU hours, using different conversion ratios depending on the GPU type. More details can be found here: [[accounting#resource_accounting_uniformization|Resource Accounting Uniformization]].
 +  * You can use your credit across all three clusters (**Baobab, Yggdrasil, and Bamboo**), not just on your private compute nodes. However, when using your own compute nodes, you will receive a **higher priority**.
 +  * To check your group's current resource usage, visit: [[accounting#report_and_statistics_with_sreport|Report and Statistics with sreport]]; a minimal example is shown below.
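To see how much of this credit your group has already consumed, you can query the Slurm accounting database with ''sreport''. The snippet below is only a minimal sketch: the account name ''my_group'' is a placeholder, and the cluster name and date range must be adapted to your case.

<code bash>
# Minimal sketch: CPU hours consumed per user of the (hypothetical) account "my_group"
# on Baobab for the current year. Adapt the account, cluster and dates to your group.
sreport cluster AccountUtilizationByUser cluster=baobab account=my_group \
        start=2025-01-01 end=2025-12-31 -t Hours
</code>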
 ==== Price per hour ====
 <WRAP center round important 60%>
Line 98: Line 106:
  
  
 +=== Progressive Pricing for HPC Compute Hours ===
 +A tiered pricing model applies to compute hour billing. Discounts increase as usage grows: once you reach 200K, 500K, and 1,000K compute hours, an additional 10% reduction is applied at each threshold. This ensures cost efficiency for large-scale workloads.
  
 +^ Usage (Compute Hours) ^ Discount Applied ^
 +| 0 – 199,999           | Base Rate       |
 +| 200,000 – 499,999     | Base Rate -10%  |
 +| 500,000 – 999,999     | Base Rate -20%  |
 +| 1,000,000+            | Base Rate -30%  |
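As a worked illustration of the table above, the sketch below estimates the billed amount for a given yearly usage. It assumes that the reduction applies per bracket (the hours falling inside each band are billed at that band's rate) and uses a purely illustrative base rate; the actual base rate is the one given in the //Price per hour// section.

<code bash>
# Illustrative sketch only: tiered billing for a given number of compute hours,
# assuming per-bracket discounts and a placeholder base rate of 0.01 CHF/hour.
hours=650000      # example yearly usage in compute hours
base_rate=0.01    # placeholder; use the actual price per hour

awk -v h="$hours" -v r="$base_rate" 'BEGIN {
    b1 = (h < 200000)  ? h : 200000                                # billed at base rate
    b2 = (h < 500000)  ? ((h > 200000) ? h - 200000 : 0) : 300000  # billed at -10%
    b3 = (h < 1000000) ? ((h > 500000) ? h - 500000 : 0) : 500000  # billed at -20%
    b4 = (h > 1000000) ? h - 1000000 : 0                           # billed at -30%
    printf "estimated cost: %.2f CHF\n", r * (b1 + 0.9*b2 + 0.8*b3 + 0.7*b4)
}'
</code>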
 ===== Purchasing or Renting Private Compute Nodes =====
  
Line 106: Line 121:
  
   * **Shared Integration**: The compute node is added to the corresponding shared partition. Other users may utilize it when the owning group is not using it. For details, refer to the [[hpc/slurm#partitions|partitions]] section.
-  * **Maximum Usage**: Research groups can utilize up to **60% of the node's maximum theoretical computational capacity**. This ensures fair access to shared resources. See [[hpc:hpc_clusters#usage_limit|Usage limit]]
 +  * **Usage Limit**: Each research group may consume up to **60% of the theoretical usage credit associated with the compute node**. This policy ensures fair access to shared cluster resources. See the [[hpc:hpc_clusters#usage_limit|Usage limit]] policy for more details.
-  * **Cost**: In addition to the base cost of the compute node, a **15% surcharge** is applied to cover operational expenses such as cables, racks, switches, and storage.
 +  * **Cost**: In addition to the base cost of the compute node, a **15% surcharge** is applied to cover operational expenses such as cables, racks, switches, and storage (not yet in effect).
   * **Ownership Period**: The compute node remains the property of the research group for **5 years**. After this period, the node may remain in production but will only be accessible via public and shared partitions.
   * **Warranty and Repairs**: Nodes come with a **3-year warranty**. If the node fails after this period, the research group is responsible for **100% of repair costs**. Repairing the node involves sending it to the vendor for diagnostics and a quote, with a maximum diagnostic fee of **420 CHF**, even if the node is irreparable.
Line 193: Line 208:
 We usually order and install new nodes twice per year.
  
-If you want to ask a financial contribution from UNIGE you must complete a COINF application: https://www.unige.ch/rectorat/commissions/coinf/appel-a-projets
 +If you want to request a financial contribution from UNIGE, you must submit a request to the [[https://www.unige.ch/rectorat/commissions/coinf/appel-a-projets|COINF]].
 ====== Use Baobab for teaching ======
  
Line 233: Line 248:
  
 Both clusters contain a mix of "public" nodes provided by the University of Geneva, and "private" nodes in
-general paid 50% by the University and 50% by a research group for instance. Any user of the clusters can
 +general funded 50% by the University through the [[https://www.unige.ch/rectorat/commissions/coinf/appel-a-projets|COINF]] and 50% by a research group, for instance. Any user of the clusters can
 request compute resources on any node (public and private), but a research group that owns "private" nodes has
 a higher priority on its "private" nodes and can request a longer execution time.
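In practice, a group that owns private nodes targets them by submitting jobs to its private partition. The job script below is a minimal sketch: the partition name is a placeholder (your group's actual partitions are listed by ''sinfo''), and the requested resources are only examples.

<code bash>
#!/bin/bash
#SBATCH --partition=private-mygroup-cpu   # placeholder name; list real partitions with: sinfo -s
#SBATCH --time=2-00:00:00                 # owners may request longer run times on their own nodes
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G

srun ./my_program                         # hypothetical application
</code>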
Line 287: Line 303:
 === CPUs on Bamboo ===
  
-^ Generation ^ Model     ^ Freq    ^ Nb cores ^ Architecture               ^ Nodes                             ^ Memory             ^Extra flag    ^ Status            ^
-| V8         | EPYC-7742 | 2.25GHz | 128 cores| "Rome" (7 nm)              | cpu[001-043,049-052],gpu[001-002] | 512GB              |              | on prod           |
-| V8         | EPYC-7742 | 2.25GHz | 128 cores| "Rome" (7 nm)              | cpu[049-052]                      | 256GB              |              | on prod           |
-| V8         | EPYC-7302P| 3.0GHz  | 16 cores | "Rome" (7 nm)              | gpu003                            | 512GB              |              | on prod           |
-| V10        | EPYC-72F3 | 3.7GHz  | 16 cores | "Milan" (7 nm)             | cpu[044-045]                      | 1TB                |BIG_MEM       | on prod           |
-| V10        | EPYC-7763 | 2.45GHz | 128 cores| "Milan" (7 nm)             | cpu[046-048]                      | 512GB              |              | on prod           |
-| V11        | EPYC-9554 | 3.10GHz | 64 cores | "Genoa" (5 nm)             | gpu[004-005]                      | 768GB              |              | on prod           |
 +^ Generation ^ Model     ^ Freq    ^ Nb cores  ^ Architecture               ^ Nodes                             ^ Memory             ^ Extra flag    ^ Status            ^
 +| V8         | EPYC-7742 | 2.25GHz | 128 cores | "Rome" (7 nm)              | cpu[001-043,049-052],gpu[001-002] | 512GB              |               | on prod           |
 +| V8         | EPYC-7742 | 2.25GHz | 128 cores | "Rome" (7 nm)              | cpu[049-052]                      | 256GB              |               | on prod           |
 +| V8         | EPYC-7302P| 3.0GHz  | 16 cores  | "Rome" (7 nm)              | gpu003                            | 512GB              |               | on prod           |
 +| V10        | EPYC-72F3 | 3.7GHz  | 16 cores  | "Milan" (7 nm)             | cpu[044-045]                      | 1TB                | BIG_MEM       | on prod           |
 +| V10        | EPYC-7763 | 2.45GHz | 128 cores | "Milan" (7 nm)             | cpu[046-048]                      | 512GB              |               | on prod           |
 +| V11        | EPYC-9554 | 3.10GHz | 64 cores  | "Genoa" (5 nm)             | gpu[008]                          | 768GB              |               | on prod           |
 +| V11        | EPYC-9554 | 3.10GHz | 128 cores | "Genoa" (5 nm)             | gpu[004-005]                      | 768GB              |               | on prod           |
 +| V12        | EPYC-9654 | 3.70GHz | 96 cores  | "Genoa" (5 nm)             | gpu[006]                          | 768GB              |               | on prod           |
 +| V13        | EPYC-9754 | 3.70GHz | 128 cores | "Genoa" (5 nm)             | gpu[007]                          | 768GB              |               | on prod           |
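The //Extra flag// column lists node features such as ''BIG_MEM''. Assuming these flags are exposed as Slurm node features, they can be requested with the ''--constraint'' option; the job script below is a minimal sketch with a placeholder partition and example resources.

<code bash>
#!/bin/bash
#SBATCH --partition=shared-cpu       # placeholder partition; adapt to your access
#SBATCH --constraint=BIG_MEM         # assumes the extra flag is exposed as a node feature
#SBATCH --mem=600G                   # example: only the 1TB nodes can satisfy this request
#SBATCH --cpus-per-task=8

srun ./my_program                    # hypothetical application
</code>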
 === GPUs on Bamboo ===
  
-^ GPU model   ^ Architecture ^ Mem   ^ Compute Capability ^ Slurm resource ^ Nb per node ^ Nodes            ^ Peer access between GPUs ^
-| RTX 3090    | Ampere       | 25GB  | 8.6                | ampere         | 8           | gpu[001,002]     | NO                       |
-| A100        | Ampere       | 80GB  | 8.0                | ampere         | 4           | gpu[003]         | YES                      |
-| H100        | Hopper       | 94GB  | 9.0                | hopper         | 1           | gpu[004]         | NO                       |
-| H200        | Hopper       | 144GB | 9.0                | hopper         |             | gpu[005]         | NO                       |
 +^ GPU model              ^ Architecture ^ Mem   ^ Compute Capability ^ Slurm resource                ^ Nb per node ^ Nodes            ^ Peer access between GPUs ^
 +| RTX 3090               | Ampere       | 25GB  | 8.6                | nvidia_geforce_rtx_3090       | 8           | gpu[001,002]     | NO                       |
 +| A100                   | Ampere       | 80GB  | 8.0                | nvidia_a100_80gb_pcie         | 4           | gpu[003]         | YES                      |
 +| H100                   | Hopper       | 94GB  | 9.0                | nvidia_h100_nvl               | 1           | gpu[004]         | NO                       |
 +| H200                   | Hopper       | 144GB | 9.0                | nvidia_h200_nvl               |             | gpu[005]         | NO                       |
 +| H200                   | Hopper       | 144GB | 9.0                | nvidia_h200_nvl               | 4           | gpu[006]         | YES                      |
 +| RTX Pro 6000 Blackwell | Blackwell    | 97GB  | 9.0                | nvidia_rtx_pro_6000_blackwell | 4           | gpu[008]         | NO                       |
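The //Slurm resource// column gives the GPU type string to use when requesting a specific model. The job script below is a minimal sketch: the partition name is a placeholder and ''my_gpu_program'' is a hypothetical application.

<code bash>
#!/bin/bash
#SBATCH --partition=shared-gpu            # placeholder partition; adapt to your access
#SBATCH --gres=gpu:nvidia_h100_nvl:1      # one H100, using the "Slurm resource" name from the table
#SBATCH --time=0-04:00:00

srun ./my_gpu_program
</code>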
  
 ==== Baobab ==== ==== Baobab ====
Line 308: Line 329:
 Since our clusters are regularly expanded, the nodes are not all from the same generation. You can see the details in the following table.
  
-^ Generation ^ Model     ^ Freq    ^ Nb cores ^ Architecture               ^ Nodes                                             ^Extra flag      ^ Status                       ^
-| V2         | X5650     | 2.67GHz | 12 cores | "Westmere-EP" (32 nm)      | cpu[093-101,103-111,140-153]                      |                | decommissioned               |
-| V3         | E5-2660V0 | 2.20GHz | 16 cores | "Sandy Bridge-EP" (32 nm)  | cpu[009-010,012-018,020-025,029-044]              |                | decommissioned in 2023       |
-| V3         | E5-2660V0 | 2.20GHz | 16 cores | "Sandy Bridge-EP" (32 nm)  | cpu[011,019,026-028,042]                          |                | decommissioned in 2024       |
-| V3         | E5-2660V0 | 2.20GHz | 16 cores | "Sandy Bridge-EP" (32 nm)  | cpu[001-005,007-008,045-056,058]                  |                | decommissioned in 2024       |
-| V3         | E5-2670V0 | 2.60GHz | 16 cores | "Sandy Bridge-EP" (32 nm)  | cpu[059,061-062]                                  |                | decommissioned in 2024       |
-| V3         | E5-4640V0 | 2.40GHz | 32 cores | "Sandy Bridge-EP" (32 nm)  | cpu[186]                                          |                | decommissioned in 2024       |
-| V4         | E5-2650V2 | 2.60GHz | 16 cores | "Ivy Bridge-EP" (22 nm)    | cpu[063-066,154-172]                              |                | decommissioned in 2025       |
-| V5         | E5-2643V3 | 3.40GHz | 12 cores | "Haswell-EP" (22 nm)       | gpu[002]                                          |                | on prod                      |
-| V6         | E5-2630V4 | 2.20GHz | 20 cores | "Broadwell-EP" (14 nm)     | cpu[173-185,187-201,205-213,220-229,237-264],gpu[004-010] |       | on prod                      |
-| V6         | E5-2637V4 | 3.50GHz | 8 cores  | "Broadwell-EP" (14 nm)     | cpu[218-219]                                      | HIGH_FREQUENCY | on prod                      |
-| V6         | E5-2643V4 | 3.40GHz | 12 cores | "Broadwell-EP" (14 nm)     | cpu[202,216-217]                                  | HIGH_FREQUENCY | on prod                      |
-| V6         | E5-2680V4 | 2.40GHz | 28 cores | "Broadwell-EP" (14 nm)     | gpu[012]                                          |                | on prod                      |
-| V7         | EPYC-7601 | 2.20GHz | 64 cores | "Naples" (14 nm)           | gpu[011]                                          |                | on prod                      |
-| V8         | EPYC-7742 | 2.25GHz | 128 cores| "Rome" (7 nm)              | cpu[273-277,285-307,312-335],gpu[013-046]         |                | on prod                      |
-| V9         | GOLD-6240 | 2.60GHz | 36 cores | "Cascade Lake" (14 nm)     | cpu[084-090,265-272,278-284,308-311,336-349]      |                | on prod                      |
-| V9         | GOLD-6244 | 3.60GHz | 16 cores | “Intel Xeon Gold 6244 CPU” | cpu[351]                                          |                |                              |
-| V10        | EPYC-7763 | 2.45GHz | 128 cores| "Milan" (7 nm)             | cpu[001],gpu[047,048]                             |                | on prod                      |
-| V11        | EPYC-9554 | 3.10GHz | 128 cores| "Genoa" (5 nm)             | gpu[049]                                          |                | on prod                      |
-| V11        | EPYC-9654 | 3.70GHz | 96 cores | "Genoa" (5 nm)             | cpu[350],gpu[050]                                 |                | on prod                      |
 +^ Generation ^ Model        ^ Freq    ^ Nb cores ^ Architecture               ^ Nodes                                             ^ Extra flag     ^ Status                       ^
 +| V5         | E5-2643V3    | 3.40GHz | 12 cores | "Haswell-EP" (22 nm)       | gpu[002]                                          |                | on prod                      |
 +| V6         | E5-2630V4    | 2.20GHz | 20 cores | "Broadwell-EP" (14 nm)     | cpu[173-185,187-201,205-213,220-229,237-264],gpu[004-009] |       | on prod                      |
 +| V6         | E5-2637V4    | 3.50GHz | 8 cores  | "Broadwell-EP" (14 nm)     | cpu[218-219]                                      | HIGH_FREQUENCY | on prod                      |
 +| V6         | E5-2643V4    | 3.40GHz | 12 cores | "Broadwell-EP" (14 nm)     | cpu[202,216-217]                                  | HIGH_FREQUENCY | on prod                      |
 +| V6         | E5-2680V4    | 2.40GHz | 28 cores | "Broadwell-EP" (14 nm)     | gpu[012]                                          |                | on prod                      |
 +| V7         | EPYC-7601    | 2.20GHz | 64 cores | "Naples" (14 nm)           | gpu[011]                                          |                | on prod                      |
 +| V8         | EPYC-7742    | 2.25GHz | 128 cores| "Rome" (7 nm)              | cpu[273-277,285-307,312-335],gpu[013-046]         |                | on prod                      |
 +| V9         | SILVER-4210R | 2.60GHz | 36 cores | "Cascade Lake" (14 nm)     | gpu010                                            |                | on prod                      |
 +| V9         | GOLD-6240    | 2.60GHz | 36 cores | "Cascade Lake" (14 nm)     | cpu[084-090,265-272,278-284,308-311,336-349]      |                | on prod                      |
 +| V9         | GOLD-6244    | 3.60GHz | 16 cores | "Cascade Lake" (14 nm)     | cpu[351]                                          |                |                              |
 +| V10        | EPYC-7763    | 2.45GHz | 128 cores| "Milan" (7 nm)             | cpu[001],gpu[047,048]                             |                | on prod                      |
 +| V11        | EPYC-9554    | 3.10GHz | 128 cores| "Genoa" (5 nm)             | gpu[049]                                          |                | on prod                      |
 +| V12        | EPYC-9654    | 3.70GHz | 192 cores| "Genoa" (5 nm)             | cpu[350]                                          |                | on prod                      |
 +| V12        | EPYC-9654    | 3.70GHz | 96 cores | "Genoa" (5 nm)             | gpu[050]                                          |                | on prod                      |
  
 The "generation" column is just a way to classify the nodes on our clusters. In the following table you can see the features of each architecture. The "generation" column is just a way to classify the nodes on our clusters. In the following table you can see the features of each architecture.
Line 356: Line 372:
 | Titan X     | Pascal       | 12GB  | 6.1               | nvidia_titan_x             | titan                | 8         | gpu[009-010]     |
 | RTX 2080 Ti | Turing       | 11GB  | 7.5               | nvidia_geforce_rtx_2080_ti | turing               | 2         | gpu[011]         |
-| RTX 2080 Ti | Turing       | 11GB  | 7.5               | nvidia_geforce_rtx_2080_ti | turing               | 8         | gpu[012,015]     |
-| RTX 2080 Ti | Turing       | 11GB  | 7.5               |                            | turing               | 8         | gpu[013,016]     |
 +| RTX 2080 Ti | Turing       | 11GB  | 7.5               | nvidia_geforce_rtx_2080_ti | turing               | 8         | gpu[015]         |
 +| RTX 2080 Ti | Turing       | 11GB  | 7.5               | nvidia_geforce_rtx_2080_ti | turing               | 8         | gpu[013,016]     |
 | RTX 2080 Ti | Turing       | 11GB  | 7.5               | nvidia_geforce_rtx_2080_ti | turing               | 4         | gpu[018-019]     |
 | RTX 3090    | Ampere       | 25GB  | 8.6               | nvidia_geforce_rtx_3090    | ampere               | 8         | gpu[025]         |
Line 386: Line 402:
 Since our clusters are regularly expanded, the nodes are not all from the same generation. You can see the details in the following table.
  
-^ Generation ^ Model     ^ Freq    ^ Nb cores ^ Architecture               ^ Nodes                        ^Extra flag    ^
-| V9 | [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html|GOLD-6240]]   | 2.60GHz | 36 cores  | “Cascade Lake” (14 nm)    | cpu[001-083,091-097,120-122] |              |
-| V9 | [[https://ark.intel.com/content/www/us/en/ark/products/192442/intel-xeon-gold-6244-processor-24-75m-cache-3-60-ghz.html|GOLD-6244]]   | 3.60GHz | 16 cores  | “Cascade Lake” (14 nm)    | cpu[112-115]                 |              |
-| V8 | EPYC-7742    | 2.25GHz | 128 cores | "Rome (7 nm) "      | cpu[123-150]             |              |
-| V9 | [[https://ark.intel.com/content/www/fr/fr/ark/products/193390/intel-xeon-silver-4208-processor-11m-cache-2-10-ghz.html|SILVER-4208]] | 2.10GHz | 16 cores  | “Cascade Lake” (14 nm)    | gpu[001-006,008]             |              |
-| V9 | [[https://ark.intel.com/content/www/us/en/ark/products/193954/intel-xeon-gold-6234-processor-24-75m-cache-3-30-ghz.html|GOLD-6234]]   | 3.30GHz | 16 cores  | “Cascade Lake” (14 nm)    | gpu[007]                     |              |
 +^ Generation ^ Model                                                                                                                                  ^ Freq    ^ Nb cores  ^ Architecture              ^ Nodes                        ^ Extra flag   ^
 +| V9         | [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html|GOLD-6240]]    | 2.60GHz | 36 cores  | “Cascade Lake” (14 nm)    | cpu[001-083,091-097,120-122] |              |
 +| V9         | [[https://ark.intel.com/content/www/us/en/ark/products/192442/intel-xeon-gold-6244-processor-24-75m-cache-3-60-ghz.html|GOLD-6244]]    | 3.60GHz | 16 cores  | “Cascade Lake” (14 nm)    | cpu[112-115]                 |              |
 +| V8         | EPYC-7742                                                                                                                              | 2.25GHz | 128 cores | "Rome" (7 nm)             | cpu[123-150]                 |              |
 +| V9         | [[https://ark.intel.com/content/www/fr/fr/ark/products/193390/intel-xeon-silver-4208-processor-11m-cache-2-10-ghz.html|SILVER-4208]]   | 2.10GHz | 16 cores  | “Cascade Lake” (14 nm)    | gpu[001-006,008]             |              |
 +| V9         | [[https://ark.intel.com/content/www/us/en/ark/products/193954/intel-xeon-gold-6234-processor-24-75m-cache-3-30-ghz.html|GOLD-6234]]    | 3.30GHz | 16 cores  | “Cascade Lake” (14 nm)    | gpu[007]                     |              |
 +| V12        | EPYC-9654                                                                                                                              | 3.70GHz | 192 cores | “Genoa” (5 nm)            | cpu[159-164]                 |              |
  
 The "generation" column is just a way to classify the nodes on our clusters. In the following table you can see the features of each architecture. The "generation" column is just a way to classify the nodes on our clusters. In the following table you can see the features of each architecture.