{{METATOC 1-5}}

====== How our clusters work ======
^ cluster name ^ datacentre ^ Interconnect ^ public CPU ^ public GPU ^ Total CPU size ^ Total GPU size ^
| Baobab |
| Yggdrasil |
All these servers (login, compute, management and storage nodes):
  * run the GNU/Linux distribution [[https://rockylinux.org/|Rocky Linux]].
  * are inter-connected on a high-speed InfiniBand network:
    * 40Gbit/s (QDR) for Baobab.
that will use only CPU or GPU nodes.
===== Cost model =====
<note important>
We are currently changing the investment approach for the HPC service Baobab: research groups will no longer purchase physical nodes as their property. Instead, they will have the option to pay for a share of the cluster for a given duration of usage. This new approach offers several advantages for both the research groups and for us as the service provider.
Research groups that have already purchased compute nodes can convert their ownership into credits for shares. We estimate that a compute node typically lasts at least 6 years under normal conditions, so this conversion ensures that the value of the existing investment is not lost.
</note>
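
As a rough illustration of how such a conversion could work, here is a minimal sketch assuming straight-line depreciation over the 6-year lifetime mentioned above. The purchase price, the node age and the depreciation rule itself are made-up assumptions for the example, not the official conversion formula.

<code python>
# Hypothetical conversion of an owned compute node into share credits.
# Assumes straight-line depreciation over the ~6 year lifetime cited above;
# the purchase price and the node age are made up for illustration.
purchase_price_chf = 12_000   # hypothetical node price
lifetime_years = 6            # typical node lifetime stated in the note above
age_years = 2                 # hypothetical current age of the node

remaining_value_chf = purchase_price_chf * (lifetime_years - age_years) / lifetime_years
print(f"Credit for shares: CHF {remaining_value_chf:,.2f}")
# -> Credit for shares: CHF 8,000.00
</code>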

==== Price per hour ====
Overview:
{{:

You can find the full table, which you can send to the FNS, here: {{:
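
To give an idea of how per-hour prices turn into a yearly budget (for instance when preparing an FNS request), here is a minimal sketch. The rates and usage figures are placeholders only; the actual prices are in the overview image and the table above.

<code python>
# Minimal budget estimate with PLACEHOLDER rates: the actual per-hour
# prices are in the price table linked above, not in this example.
cpu_hour_price_chf = 0.01   # hypothetical price per CPU-core-hour
gpu_hour_price_chf = 0.50   # hypothetical price per GPU-hour

cpu_core_hours = 200_000    # hypothetical yearly CPU usage of a group
gpu_hours = 5_000           # hypothetical yearly GPU usage of a group

total_chf = cpu_core_hours * cpu_hour_price_chf + gpu_hours * gpu_hour_price_chf
print(f"Estimated yearly cost: CHF {total_chf:,.2f}")
# -> Estimated yearly cost: CHF 4,500.00
</code>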

==== Private nodes ====
Research groups can buy "
Rules:
  * The compute node remains the research group's property
  * There is a three-year warranty
  * The research group does not have admin rights on it
  * The compute node is installed and maintained by the HPC team in the same way as the other compute nodes
  * The HPC team can decide to decommission the node when it is too old, but the hardware will be in production for at least four years
See the [[hpc/
See [[hpc::

==== Bamboo (coming soon) ====

^ Generation ^ Model ^ Freq ^ Nb cores ^ Architecture
| V8 | EPYC-7742 | 2.25GHz | 128 cores | "Rome"
| V8 | EPYC-72F3 | 3.7GHz |

^ GPU model ^ Architecture ^ Mem ^ Compute Capability ^ Slurm resource ^ Nb per node ^ Nodes ^ Peer access between GPUs ^
| RTX 3090 | Ampere
| A100 | Ampere

==== Baobab ====
Since our clusters are regularly expanded, the nodes are not all from the same generation. You can see the details in the following table.
^ Generation ^ Model ^ Freq ^ Nb cores ^ Architecture
| V2 | X5650 | 2.67GHz | 12 cores | "Westmere-EP"
| V3 | E5-2660V0 | 2.20GHz | 16 cores | "Sandy Bridge-EP"
| V3 | E5-2660V0 | 2.20GHz | 16 cores | "Sandy Bridge-EP"
| V3 | E5-2670V0 | 2.60GHz | 16 cores | "Sandy Bridge-EP"
| V3 | E5-4640V0 | 2.40GHz | 32 cores | "Sandy Bridge-EP"
| V4 | E5-2650V2 | 2.60GHz | 16 cores | "Ivy Bridge-EP"
| V5 | E5-2643V3 | 3.40GHz | 12 cores | "Haswell-EP"
| V6 | E5-2630V4 | 2.20GHz | 20 cores | "Broadwell-EP"
| ::: |
| V6 | E5-2637V4 | 3.50GHz | 8 cores | "Broadwell-EP"
| V6 | E5-2643V4 | 3.40GHz | 12 cores | "Broadwell-EP"
| V6 | E5-2680V4 | 2.40GHz | 28 cores | "Broadwell-EP"
| V7 | EPYC-7601 | 2.20GHz | 64 cores | "Naples"
| V8 | EPYC-7742 | 2.25GHz | 128 cores | "Rome"
| V9 | GOLD-6240 | 2.60GHz | 36 cores | "Cascade Lake-SP"
| RTX 2080 Ti | Turing
| RTX 2080 Ti | Turing
| RTX 3090 | Ampere
| RTX A5000 | Ampere
| RTX 3080 | Ampere
| A100 | Ampere
| A100 | Ampere
| A100 | Ampere
| A100 | Ampere
| A100 | Ampere
| A100 | Ampere
| 