All those servers (login, compute, management and storage nodes):
  * run with the GNU/Linux distribution [[https://rockylinux.org/|Rocky Linux]].
  * are inter-connected on a high-speed InfiniBand network
    * 40Gbit/s (QDR) for Baobab.
that will use only CPU or GPU nodes.
  
===== Cost model =====
  
<note important>**Important update, draft preview.**
In cases where research groups have already purchased compute nodes, we offer them the opportunity to convert their ownership into credits for shares. We estimate that a compute node typically lasts for at least 6 years under normal conditions, and this conversion option ensures that the value of their existing investment is not lost.
</note>

==== Price per hour ====
Overview:
{{:hpc:pasted:20240404-092421.png}}

You can find the full price table, which you can send to the FNS, {{:hpc:hpc:acrobat_2024-04-09_15-58-28.png?linkonly|here}}.
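
To estimate what your past usage would cost at these rates, you can read your consumed hours from the Slurm accounting database and multiply them by the per-hour price from the table above. A minimal sketch (the cluster name, user and date range are placeholders):

<code bash>
# Report the CPU hours consumed by your user over a given period.
# Replace the cluster name and dates with your own values.
sreport cluster AccountUtilizationByUser cluster=baobab \
        user=$USER start=2024-01-01 end=2024-03-31 -t Hours

# Estimated cost = reported hours x per-hour rate taken from the price table above.
</code>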

==== Private nodes ====
  
Research groups can buy "private" nodes to add to our clusters, which means their research group has a
Rules:
  * The compute node remains the property of the research group
  * The compute node has a three-year warranty. If it fails after the warranty expires, 100% of the repair costs are the responsibility of the research group. If the node is out of order, you can still have it repaired: to obtain a quote, the node must be sent to the vendor, who charges up to 420 CHF for the diagnostic and the quote, even if the node cannot be repaired (worst case).
  * The research group does not have administrative rights on the node
  * The compute node is installed and maintained by the HPC team in the same way as the other compute nodes
  * The HPC team can decide to decommission the node when it is too old, but the hardware will remain in production for at least four years

See the [[hpc/slurm#partitions|partitions]] section for more details about the integration of your private node in the cluster.
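
As a minimal sketch, a batch job targeting a private node could look like the following (the partition name ''private-mygroup-cpu'' is a placeholder; the actual name of your group's partition is listed in the partitions section):

<code bash>
#!/bin/bash
#SBATCH --job-name=private-node-test
#SBATCH --partition=private-mygroup-cpu   # placeholder: use your group's private partition name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Print the node the job landed on, to verify it ran on your private hardware.
srun hostname
</code>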
  
See [[hpc::slurm#gpgpu_jobs|here]] how to request a GPU for your jobs.
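
As a hedged example, a job requesting a single GPU by its Slurm resource name (the ''shared-gpu'' partition name is an assumption; check the partitions section for the correct one) could look like this:

<code bash>
#!/bin/bash
#SBATCH --partition=shared-gpu        # assumed GPU partition name, adapt to your case
#SBATCH --gpus=ampere:1               # request one GPU of the "ampere" Slurm resource type
#SBATCH --time=00:30:00

# Show the GPU that was allocated to the job.
srun nvidia-smi
</code>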

==== Bamboo (coming soon) ====

^ Generation ^ Model     ^ Freq    ^ Nb cores  ^ Architecture  ^ Nodes         ^ Memory ^ Extra flag ^ Status          ^
| V8         | EPYC-7742 | 2.25GHz | 128 cores | "Rome" (7 nm) | node[001-043] | 512GB  |            | to be installed |
| V8         | EPYC-72F3 | 3.7GHz  | 16 cores  | "Rome" (7 nm) | node[044-045] | 1TB    | BIG_MEM    | to be installed |

^ GPU model ^ Architecture ^ Mem  ^ Compute Capability ^ Slurm resource ^ Nb per node ^ Nodes        ^ Peer access between GPUs ^
| RTX 3090  | Ampere       | 25GB | 8.6                | ampere         | 8           | gpu[001,002] | NO                       |
| A100      | Ampere       | 80GB | 8.0                | ampere         | 4           | gpu[003]     | YES                      |
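
Once Bamboo is in production, you can check which partitions, GPU resources and node features are actually exposed by Slurm directly from its login node. A small sketch:

<code bash>
# List partition, generic resources (GPUs), node list and node features.
sinfo -o "%P %G %N %f"
</code>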
  
==== Baobab ====
Since our clusters are regularly expanded, the nodes are not all from the same generation. You can see the details in the following table.
  
^ Generation ^ Model     ^ Freq    ^ Nb cores  ^ Architecture              ^ Nodes                                              ^ Extra flag     ^ Status                       |
| V2         | X5650     | 2.67GHz | 12 cores  | "Westmere-EP" (32 nm)     | node[093-101,103-111,140-153]                      |                | decommissioned               |
| V3         | E5-2660V0 | 2.20GHz | 16 cores  | "Sandy Bridge-EP" (32 nm) | node[009-010,012-018,020-025,029-044]              |                | decommissioned in 2023       |
| V3         | E5-2660V0 | 2.20GHz | 16 cores  | "Sandy Bridge-EP" (32 nm) | node[001-005,007-008,011,019,026-028,045-056,058]  |                | to be decommissioned in 2022 |
| V3         | E5-2670V0 | 2.60GHz | 16 cores  | "Sandy Bridge-EP" (32 nm) | node[059,061-062]                                  |                | to be decommissioned in 2022 |
| V3         | E5-4640V0 | 2.40GHz | 32 cores  | "Sandy Bridge-EP" (32 nm) | node[186]                                          |                | to be decommissioned in 2022 |
| V4         | E5-2650V2 | 2.60GHz | 16 cores  | "Ivy Bridge-EP" (22 nm)   | node[063-066,154-172]                              |                | to be decommissioned in 2022 |
| V5         | E5-2643V3 | 3.40GHz | 12 cores  | "Haswell-EP" (22 nm)      | gpu[002,012]                                       |                | on prod                      |
| V6         | E5-2630V4 | 2.20GHz | 20 cores  | "Broadwell-EP" (14 nm)    | node[173-185,187-201,205-213]                      |                | on prod                      |
| :::        | :::       | :::     | :::       | :::                       | gpu[004-010]                                       | :::            | on prod                      |
| V6         | E5-2637V4 | 3.50GHz | 8 cores   | "Broadwell-EP" (14 nm)    | node[218-219]                                      | HIGH_FREQUENCY | on prod                      |
| V6         | E5-2643V4 | 3.40GHz | 12 cores  | "Broadwell-EP" (14 nm)    | node[202,204,216-217]                              | HIGH_FREQUENCY | on prod                      |
| V6         | E5-2680V4 | 2.40GHz | 28 cores  | "Broadwell-EP" (14 nm)    | node[203]                                          |                | on prod                      |
| V7         | EPYC-7601 | 2.20GHz | 64 cores  | "Naples" (14 nm)          | gpu[011]                                           |                | on prod                      |
| V8         | EPYC-7742 | 2.25GHz | 128 cores | "Rome" (7 nm)             | node[273-277,285-288,312-335] gpu[013-031]         |                | on prod                      |
| V9         | GOLD-6240 | 2.60GHz | 36 cores  | "Cascade Lake" (14 nm)    | node[265-272]                                      |                | on prod                      |
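
Assuming the "Extra flag" column is exposed as a Slurm node feature (an assumption; see the Slurm documentation page for the authoritative list), the high-frequency nodes could be selected with a constraint, for example:

<code bash>
# Sketch: request a node advertising the HIGH_FREQUENCY feature.
sbatch --constraint=HIGH_FREQUENCY --ntasks=1 --time=00:10:00 --wrap "srun hostname"
</code>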
  
  
| RTX 2080 Ti | Turing | 11GB | 7.5 | turing | 8 | gpu[012-016]                 |
| RTX 2080 Ti | Turing | 11GB | 7.5 | turing | 4 | gpu[018-019]                 |
| RTX 3090    | Ampere | 25GB | 8.6 | ampere | 8 | gpu[017,021,025-026,034-035] |
| RTX A5000   | Ampere | 25GB | 8.6 | ampere | 8 | gpu[044]                     |
| RTX 3080    | Ampere | 10GB | 8.6 | ampere | 8 | gpu[023-024,036-043]         |
| A100 | Ampere | 40GB | 8.0 | ampere | 6 | gpu[022]         |
| A100 | Ampere | 40GB | 8.0 | ampere | 1 | gpu[028]         |
| A100 | Ampere | 40GB | 8.0 | ampere | 4 | gpu[020,030-031] |
| A100 | Ampere | 80GB | 8.0 | ampere | 4 | gpu[029]         |
| A100 | Ampere | 80GB | 8.0 | ampere | 3 | gpu[032-033]     |
| A100 | Ampere | 80GB | 8.0 | ampere | 2 | gpu[045]         |
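
To see which of the models listed above was actually allocated to a job, you can query the driver from inside the allocation. A small sketch (partition name assumed as above):

<code bash>
# Ask for one GPU interactively and print its model and memory size.
srun --partition=shared-gpu --gpus=1 --time=00:05:00 \
     nvidia-smi --query-gpu=name,memory.total --format=csv
</code>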
      
  