  
====== The clusters: Baobab, Yggdrasil and Bamboo ======
{{anchor:hpc_clusters_the_clusters}}
  
The University of Geneva owns three HPC clusters or supercomputers: **Baobab**, **Yggdrasil** and **Bamboo**.
  
Each cluster is composed of:
  * a **login node** (aka **headnode**) allowing users to connect and submit //jobs// to the cluster. Each user is limited to 2 CPU cores and 8 GB of RAM on the login node.
  * many **compute nodes** which provide the computing power. The compute nodes are not all identical; they all provide CPU cores (from 8 to 128 cores depending on the model), and some nodes also have GPUs or more RAM (see [[hpc/hpc_clusters#compute_nodes|below]]).
  * **management servers** that you don't need to worry about; that's the HPC engineers' job. The management servers provide the necessary services such as all the applications (with EasyBuild / module), the Slurm job management and queuing system, ways for the HPC engineers to (re-)deploy compute nodes automatically, etc.
===== Cost model =====
  
In September 2024, we {{ :hpc:communication_facturation_hpc_2025_vfinale.pdf |announced}} via our mailing list that our service would become a paid offering (some of the information in that announcement is now obsolete; please see below for the updated information). Users can find detailed information about this change on the page below and in our [[hpc:faq#cost|FAQ]]. We are committed to providing transparent communication and ensuring you have all the necessary details about our new pricing model.
  
Our service remains free in specific cases:
  * **Free usage** as part of an educational course
  * **Free usage** through the annual allocation of CPU hours
  
For additional needs, paid options are available:
  * **[[hpc:hpc_clusters#price_per_hour|Pay-per-hour]]** based on the FNS rate table
  * **Purchase or rent** compute nodes for more intensive workloads.
  
  
**Summary:**

  * Starting this year, you receive a **CPU hours credit** based on the hardware you own (if any) in the cluster (private partition).
  * You can find instructions on how to check your annual credit here: [[accounting#resources_available_for_research_group|Resources Available for Research Groups]]. If you know your research group has bought some compute nodes but your PI doesn't appear in the report, please contact us.
  * The credit calculation in the provided script assumes a **5-year hardware ownership period**. However, if this policy was introduced after your compute nodes were purchased, we have extended the production duration by two years.
  * To ensure **flexibility and simplicity**, we have standardized resource usage by converting CPU, memory, and GPU hours into CPU hours, using different conversion ratios depending on the GPU type. More details can be found here: [[accounting#resource_accounting_uniformization|Resource Accounting Uniformization]].
  * You can use your credit across all three clusters (**Baobab, Yggdrasil, and Bamboo**), not just on your private compute nodes. However, when using your own compute nodes, you will receive **higher priority**.
  * To check your group's current resource usage, visit: [[accounting#report_and_statistics_with_sreport|Report and Statistics with sreport]].
==== Price per hour ====
<WRAP center round important 60%>
Since we have [[https://doc.eresearch.unige.ch/hpc/accounting#resource_accounting_uniformization|unified our resources]], we refer to **CPU hours** in a generic way, which may include CPU hours, GPU hours, or memory usage. Please refer to the conversion table for details. Any earlier references that specifically mention GPUs are obsolete; we will update the table accordingly.
</WRAP>

Overview:
{{:hpc:pasted:20240404-092421.png}}
You can find the whole table that you can send to the FNS {{:hpc:hpc:acrobat_2024-04-09_15-58-28.png?linkonly|here}}.
  
University of Geneva members are charged the cost indicated by line "U1". Lines U2 and U3 are for external users, such as companies, that use the cluster.
=== Free CPU Hour Allocation ===
Each PI (Principal Investigator) is entitled to 100,000 CPU hours per year free of charge. This allocation applies per PI, not per individual user. See [[hpc:accounting|how to check PI and user past usage]].
=== Progressive Pricing for HPC Compute Hours ===
A tiered pricing model applies to compute hour billing. Discounts increase as usage grows: once you reach 200,000, 500,000, and 1,000,000 compute hours, an additional 10% reduction is applied at each threshold. This ensures cost efficiency for large-scale workloads (a minimal calculation sketch follows the table).
^ Usage (Compute Hours) ^ Discount Applied ^
| 0 – 199,999           | Base Rate       |
| 200,000 – 499,999     | Base Rate -10%  |
| 500,000 – 999,999     | Base Rate -20%  |
| 1,000,000+            | Base Rate -30%  |
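As an illustration, here is a minimal Python sketch of this tiered billing. It assumes the reduction applies progressively to the hours that fall inside each bracket (tax-bracket style); ''base_rate'' is a placeholder value, the official rates are in the FNS table above.

<code python>
# Sketch of the tiered pricing, assuming the discount applies progressively
# to the hours inside each bracket. base_rate is a placeholder value,
# not an official tariff.
BRACKETS = [
    (200_000, 1.00),       # 0 - 199,999 hours: base rate
    (500_000, 0.90),       # 200,000 - 499,999: base rate -10%
    (1_000_000, 0.80),     # 500,000 - 999,999: base rate -20%
    (float("inf"), 0.70),  # 1,000,000+: base rate -30%
]

def tiered_cost(hours: float, base_rate: float) -> float:
    """Total cost in CHF for `hours` compute hours at `base_rate` CHF/hour."""
    cost, lower = 0.0, 0.0
    for upper, factor in BRACKETS:
        if hours <= lower:
            break
        cost += (min(hours, upper) - lower) * base_rate * factor
        lower = upper
    return cost

# Example: 600,000 compute hours at a hypothetical 0.01 CHF/hour gives
# 200k at 100% + 300k at 90% + 100k at 80% of the base rate.
print(f"{tiered_cost(600_000, 0.01):.2f} CHF")  # 5500.00 CHF
</code>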
===== Purchasing or Renting Private Compute Nodes =====

Research groups have the option to purchase or rent "private" compute nodes to expand the resources available in our clusters. This arrangement provides the group with a **private partition**, granting higher priority access to the specified nodes (resulting in reduced wait times) and extended job runtimes of up to **7 days** (compared to 4 days for public compute nodes).

==== Key Rules and Details ====
  * **Shared Integration**: The compute node is added to the corresponding shared partition. Other users may utilize it when the owning group is not using it. For details, refer to the [[hpc/slurm#partitions|partitions]] section.
  * **Usage Limit**: Each research group may consume up to **60% of the theoretical usage credit associated with the compute node**. This policy ensures fair access to shared cluster resources. See the [[hpc:hpc_clusters#usage_limits|Usage limits]] policy for more details.
  * **Cost**: In addition to the base cost of the compute node, a **15% surcharge** is applied to cover operational expenses such as cables, racks, switches, and storage (not yet in effect).
  * **Ownership Period**: The compute node remains the property of the research group for **5 years**. After this period, the node may remain in production but will only be accessible via public and shared partitions.
  * **Warranty and Repairs**: Nodes come with a **3-year warranty**. If the node fails after this period, the research group is responsible for **100% of repair costs**. Repairing the node involves sending it to the vendor for diagnostics and a quote, with a maximum diagnostic fee of **420 CHF**, even if the node is irreparable.
  * **Administrative Access**: The research group does not have administrative rights over the node.
  * **Maintenance**: The HPC team handles the installation and maintenance of the compute node, ensuring it operates consistently with other nodes in the cluster.
  * **Decommissioning**: The HPC team may decommission the node if it becomes obsolete, but it will remain in production for at least **5 years**.
  
==== Cost of Renting a Compute Node ====
  
For example, consider a CPU compute node with a vendor price of **14,361 CHF**. Adding **15% for extra costs** brings the total to **16,515.15 CHF**. Dividing this by 60 months (5 years) results in a monthly rental cost of approximately **275.25 CHF**.

Currently, we only rent standard AMD compute nodes, which have two 64-core CPUs (128 cores in total) and 512 GB of RAM. You will receive 1.34 million billing credits per year with this model.

The minimum rental period for a compute node is six months. Any unused allocated resources will be lost at the end of the year.
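As a quick check of the figures above, the rental arithmetic in Python (the vendor price, the 15% surcharge and the 60-month amortization are the values quoted in this section):

<code python>
# Monthly rent: vendor price + 15% surcharge, amortized over 60 months.
vendor_price = 14_361.00           # CHF, example CPU node from above
total = vendor_price * 1.15        # 16,515.15 CHF including the surcharge
monthly_rent = total / 60          # 60 months = 5 years
print(f"{monthly_rent:.2f} CHF/month")  # 275.25 CHF/month
</code>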
  
 For more details or to request a specific quote, please contact the HPC support team. For more details or to request a specific quote, please contact the HPC support team.
==== Usage Limits ====
  
Users are entitled to utilize up to 60% of the computational resources they own or rent within the cluster. For example, if you rent a compute node with 128 CPU cores and 512 GB of RAM for one year:
  
  * **CPU contribution:** 128 cores × 1.0 (factor)
  * **Memory contribution:** 512 GB × 0.25 (factor)
  * **Time period:** 24 hours × 365 days
  * **Max usage rate:** 0.6
  
Total:
(128 × 1.0 + 512 × 0.25) × 24 × 365 × 0.6 = **1,345,536 core-hours**
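The same calculation as a small Python helper, using the conversion factors shown above (the authoritative ratios are documented under [[accounting#resource_accounting_uniformization|Resource Accounting Uniformization]]):

<code python>
# Annual core-hour credit for an owned or rented node, using the factors
# shown above: CPU cores x 1.0, memory GB x 0.25, and the 60% max usage rate.
def annual_credit(cpu_cores: int, memory_gb: int,
                  cpu_factor: float = 1.0, mem_factor: float = 0.25,
                  max_usage: float = 0.6) -> float:
    weighted = cpu_cores * cpu_factor + memory_gb * mem_factor
    return weighted * 24 * 365 * max_usage

# Example node from above: 128 cores and 512 GB of RAM.
print(f"{annual_credit(128, 512):,.0f} core-hours")  # 1,345,536 core-hours
</code>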
  
This credit can be used across any of our three clusters -- Bamboo, Baobab, and Yggdrasil -- regardless of where the compute node was rented or purchased.
  
-==== Key Rules and Details ====+The main advantage is that you are not restricted to using your private nodes, but can access the three clusters and even the GPUs. 
 + 
 +We are developing scripts to allow to check the usage and the amount of hours you have the right to use regarding the hardware your group owns. 
 + 
 +The key distinction when using your own resources is that you benefit from a higher scheduling priority, ensuring quicker access to computational resources. As well, you are not restricted to using your private nodes, but can access the three clusters and even the GPUs.  
 + 
 +For more details, please contact the HPC support team.
  
  
==== CPU and GPU server example pricing ====
  * ~ 14'442.55 CHF TTC
  
  * 2 x 96 Core AMD EPYC 9754 2.4GHz Processor
  * 768GB DDR5 4800MHz Memory (24x32GB)
  * 100G IB EDR card
  * 960GB SSD
  * ~ 16'464 CHF TTC

Key differences:
  * + 9754 has higher memory bandwidth: up to 460.8 GB/s, vs 190.73 GB/s for the 7763
  * + 9754 has a bigger cache
  * - 9754 is more expensive
  * - power consumption is 400W for the 9754 vs 240W for the 7763
  * - 9754 is more difficult to cool, as the inlet temperature for air cooling must be at most 22 °C
=== GPU H100 with AMD ===
  
We usually order and install the nodes twice per year.
  
If you want to request a financial contribution from UNIGE, you must submit a request to [[https://www.unige.ch/rectorat/commissions/coinf/appel-a-projets|COINF]].
====== Use Baobab for teaching ======

Baobab, our HPC infrastructure, supports educators in providing students with hands-on HPC experience.

Teachers can request access via dw.unige.ch (final link to be added later; use hpc@unige.ch in the meantime). Once the request is fulfilled, a special account named <PI_NAME>_teach will be created for the instructor. Students must specify this account when submitting jobs for course-related work (e.g., <nowiki>--account=<PI_NAME>_teach</nowiki>).

A shared storage space can also optionally be created, accessible at ''/home/share/<PI_NAME>_teach'' and/or ''/srv/beegfs/scratch/shares/<PI_NAME>_teach''.

**All student usage is free of charge if they submit their jobs to the correct account**.

We strongly recommend that teachers use and promote our user-friendly web portal, [[hpc:how_to_use_openondemand|OpenOnDemand]], which supports tools like Matlab, JupyterLab, and more. Baobab helps integrate real-world computational tools into curricula, fostering deeper learning in HPC technologies.
  
====== How do I use your clusters? ======
  
The clusters contain a mix of "public" nodes provided by the University of Geneva and "private" nodes, generally funded 50% by the University through [[https://www.unige.ch/rectorat/commissions/coinf/appel-a-projets|COINF]] and 50% by a research group. Any user of the clusters can request compute resources on any node (public and private), but a research group that owns "private" nodes has a higher priority on its "private" nodes and can request a longer execution time.
  
  
==== CPU models available ====
Several CPU models are available across the three clusters. The table below summarizes the available resources.
  
^ Model ^ Generation ^ Architecture ^ Cores per Socket ^ Freq ^
| [[https://www.intel.fr/content/www/fr/fr/products/sku/81706/intel-xeon-processor-e52660-v3-25m-cache-2-60-ghz/specifications.html | E5-2660V0]] | V3 | Sandy Bridge-EP | 8 | 2.2GHz |
| [[https://www.intel.com/content/www/us/en/products/sku/81900/intel-xeon-processor-e52643-v3-20m-cache-3-40-ghz/specifications.html | E5-2643V3]] | V5 | Haswell-EP | 6 | 3.4GHz |
| [[https://www.intel.fr/content/www/fr/fr/products/sku/92981/intel-xeon-processor-e52630-v4-25m-cache-2-20-ghz/specifications.html | E5-2630V4]] | V6 | Broadwell-EP | 10 | 2.2GHz |
| [[https://www.intel.com/content/www/us/en/products/sku/92983/intel-xeon-processor-e52637-v4-15m-cache-3-50-ghz/specifications.html | E5-2637V4]] | V6 | Broadwell-EP | 4 | 3.5GHz |
| [[https://www.intel.com/content/www/us/en/products/sku/92989/intel-xeon-processor-e52643-v4-20m-cache-3-40-ghz/specifications.html | E5-2643V4]] | V6 | Broadwell-EP | 6 | 3.4GHz |
| [[https://www.intel.com/content/www/us/en/products/sku/91754/intel-xeon-processor-e52680-v4-35m-cache-2-40-ghz/specifications.html | E5-2680V4]] | V6 | Broadwell-EP | 14 | 2.4GHz |
| [[https://www.amd.com/en/support/downloads/drivers.html/processors/epyc/epyc-7001-series/amd-epyc-7601.html#amd_support_product_spec | EPYC-7601]] | V7 | Naples | 32 | 2.2GHz |
| [[https://www.amd.com/en/products/processors/server/epyc/7002-series.html | EPYC-7302P]] | V8 | Rome | 16 | 3.0GHz |
| EPYC-7742 | V8 | Rome | 64 | 2.25GHz |
| [[https://ark.intel.com/content/www/us/en/ark/products/193954/intel-xeon-gold-6234-processor-24-75m-cache-3-30-ghz.html | GOLD-6234]] | V9 | Cascade Lake | 8 | 3.30GHz |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html | GOLD-6240]] | V9 | Cascade Lake | 18 | 2.60GHz |
| [[https://ark.intel.com/content/www/us/en/ark/products/192442/intel-xeon-gold-6244-processor-24-75m-cache-3-60-ghz.html | GOLD-6244]] | V9 | Cascade Lake | 8 | 3.60GHz |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/193390/intel-xeon-silver-4208-processor-11m-cache-2-10-ghz.html | SILVER-4208]] | V9 | Cascade Lake | 8 | 2.10GHz |
| [[https://www.intel.com/content/www/us/en/products/sku/197098/intel-xeon-silver-4210r-processor-13-75m-cache-2-40-ghz/specifications.html | SILVER-4210R]] | V9 | Cascade Lake | 10 | 2.4GHz |
| [[https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-72f3.html | EPYC-72F3]] | V10 | Milan | 8 | 3.7GHz |
| [[https://www.amd.com/fr/products/processors/server/epyc/7003-series/amd-epyc-7763.html | EPYC-7763]] | V10 | Milan | 64 | 2.45GHz |
| [[https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9554.html | EPYC-9554]] | V11 | Genoa | 64 | 3.10GHz |
| [[https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9654.html | EPYC-9654]] | V12 | Genoa | 96 | 3.70GHz |
| [[https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9754.html | EPYC-9754]] | V13 | Bergamo | 128 | 3.10GHz |
  
  
==== GPU models available ====
Several GPU models are available across the three clusters. The table below summarizes the available resources.
  
^ Model ^ Memory ^ GRES ^ Constraint gpu arch ^ Compute Capability ^ CUDA min → max ^ Feature ^ Billing Weight ^
| [[https://www.nvidia.com/fr-be/titan/titan-rtx/ | Titan RTX]] | 24GB | nvidia_titan_rtx | COMPUTE_TYPE_TURING | COMPUTE_CAPABILITY_7_5 | 10.0 → 13.0 | COMPUTE_MODEL_NVIDIA_TITAN_RTX | 1 |
| Titan X | 12GB | nvidia_titan_x | COMPUTE_TYPE_PASCAL | COMPUTE_CAPABILITY_6_1 | 8.0 → 12.9 | COMPUTE_MODEL_NVIDIA_TITAN_X | 1 |
| [[https://www.nvidia.com/en-in/data-center/tesla-p100/ | P100]] | 12GB | tesla_p100-pcie-12gb | COMPUTE_TYPE_PASCAL | COMPUTE_CAPABILITY_6_0 | 8.0 → 12.9 | COMPUTE_MODEL_TESLA_P100_PCIE_12GB | 1 |
| [[https://www.nvidia.com/en-us/geforce/20-series/ | RTX 2080 Ti]] | 11GB | nvidia_geforce_rtx_2080_ti | COMPUTE_TYPE_TURING | COMPUTE_CAPABILITY_7_5 | 10.0 → 13.0 | COMPUTE_MODEL_NVIDIA_GEFORCE_RTX_2080_TI | 2 |
| [[https://www.nvidia.com/fr-fr/geforce/graphics-cards/30-series/rtx-3080-3080ti/ | RTX 3080]] | 10GB | nvidia_geforce_rtx_3080 | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_6 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_GEFORCE_RTX_3080 | 3 |
| [[https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf | V100]] | 32GB | tesla_v100-pcie-32gb | COMPUTE_TYPE_VOLTA | COMPUTE_CAPABILITY_7_0 | 9.0 → 12.9 | COMPUTE_MODEL_TESLA_V100_PCIE_32GB | 3 |
| [[https://www.nvidia.com/en-us/data-center/a100/ | A100 40GB]] | 40GB | nvidia_a100-pcie-40gb | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_0 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_A100_PCIE_40GB | 5 |
| [[https://www.nvidia.com/fr-fr/geforce/graphics-cards/30-series/rtx-3090-3090ti/ | RTX 3090]] | 24GB | nvidia_geforce_rtx_3090 | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_6 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_GEFORCE_RTX_3090 | 5 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-a5000/ | RTX A5000]] | 25GB | nvidia_rtx_a5000 | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_6 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_RTX_A5000 | 5 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-a5500/ | RTX A5500]] | 24GB | nvidia_rtx_a5500 | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_6 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_RTX_A5500 | 5 |
| [[https://www.nvidia.com/en-us/data-center/a100/ | A100 80GB]] | 80GB | nvidia_a100_80gb_pcie | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_0 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_A100_80GB_PCIE | 8 |
| [[https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/ | RTX 4090]] | 24GB | nvidia_geforce_rtx_4090 | COMPUTE_TYPE_ADA | COMPUTE_CAPABILITY_8_9 | 11.8 → 13.0 | COMPUTE_MODEL_NVIDIA_GEFORCE_RTX_4090 | 8 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-a6000/ | RTX A6000]] | 48GB | nvidia_rtx_a6000 | COMPUTE_TYPE_AMPERE | COMPUTE_CAPABILITY_8_6 | 11.0 → 13.0 | COMPUTE_MODEL_NVIDIA_RTX_A6000 | 8 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-5000/ | RTX 5000]] | 32GB | nvidia_rtx_5000 | COMPUTE_TYPE_ADA | COMPUTE_CAPABILITY_8_9 | 11.8 → 13.0 | COMPUTE_MODEL_NVIDIA_RTX_5000 | 9 |
| [[https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/ | RTX 5090]] | 32GB | nvidia_geforce_rtx_5090 | COMPUTE_TYPE_BLACKWELL | COMPUTE_CAPABILITY_12_0 | 12.8 → 13.0 | COMPUTE_MODEL_NVIDIA_GEFORCE_RTX_5090 | 10 |
| [[https://www.nvidia.com/en-us/data-center/h100/ | H100]] | 94GB | nvidia_h100_nvl | COMPUTE_TYPE_HOPPER | COMPUTE_CAPABILITY_9_0 | 11.8 → 13.0 | COMPUTE_MODEL_NVIDIA_H100_NVL | 14 |
| [[https://www.nvidia.com/en-us/data-center/rtx-pro-6000-blackwell-server-edition/ | RTX Pro 6000]] | 96GB | nvidia_rtx_pro_6000_blackwell | COMPUTE_TYPE_BLACKWELL | COMPUTE_CAPABILITY_12_0 | 12.8 → 13.0 | COMPUTE_MODEL_NVIDIA_RTX_PRO_6000_BLACKWELL | 16 |
| [[https://www.nvidia.com/en-us/data-center/h200/ | H200]] | 141GB | nvidia_h200_nvl | COMPUTE_TYPE_HOPPER | COMPUTE_CAPABILITY_9_0 | 11.8 → 13.0 | COMPUTE_MODEL_NVIDIA_H200_NVL | 17 |
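As an illustration, assuming the "Billing Weight" column is the ratio used to convert GPU hours into generic CPU hours (our reading of the [[accounting#resource_accounting_uniformization|resource accounting uniformization]]; check the [[hpc:accounting|accounting page]] for the authoritative rule), a job's GPU usage would be converted like this:

<code python>
# Sketch: converting GPU usage into generic CPU hours, assuming the
# billing weight is a simple per-GPU-hour multiplier. Weights are copied
# from the table above.
BILLING_WEIGHT = {"RTX 3090": 5, "A100 80GB": 8, "H100": 14, "H200": 17}

def gpu_to_cpu_hours(model: str, gpu_count: int, hours: float) -> float:
    return BILLING_WEIGHT[model] * gpu_count * hours

# A 10-hour job on 2 x H100 is billed as 14 x 2 x 10 = 280 CPU hours.
print(gpu_to_cpu_hours("H100", 2, 10))  # 280
</code>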
  
  
  
  
<note tip>We don't have mixed GPU models on the same node. Every GPU node has only one GPU model.</note>
==== Bamboo ====
  
=== CPU MODELS — bamboo ===
^ Model ^ Generation ^ Architecture ^ Freq ^ Nb core ^ Memory ^ Nodeset ^
| EPYC-7742 | V8 | Rome | 2.25GHz | 128 | 251GB | cpu[049-052] |
| EPYC-7742 | V8 | Rome | 2.25GHz | 128 | 512GB | cpu[001-043] |
| [[https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-72f3.html | EPYC-72F3]] | V10 | Milan | 3.7GHz | 16 | 1024GB | cpu[044-045] |
| [[https://www.amd.com/fr/products/processors/server/epyc/7003-series/amd-epyc-7763.html | EPYC-7763]] | V10 | Milan | 2.45GHz | 128 | 512GB | cpu[046-048] |
  
=== GPUs on Bamboo ===
  
^ Model ^ Memory per GPU ^ Nodeset ^
| [[https://www.nvidia.com/en-us/data-center/a100/ | A100 80GB]] | 80GB | gpu003 |
| [[https://www.nvidia.com/fr-fr/geforce/graphics-cards/30-series/rtx-3090-3090ti/ | RTX 3090]] | 24GB | gpu[001-002] |
| [[https://www.nvidia.com/en-us/geforce/graphics-cards/50-series/rtx-5090/ | RTX 5090]] | 32GB | gpu[009-010] |
| [[https://www.nvidia.com/en-us/data-center/h100/ | H100]] | 94GB | gpu004 |
| [[https://www.nvidia.com/en-us/data-center/h200/ | H200]] | 141GB | gpu[005-006] |
| [[https://www.nvidia.com/en-us/data-center/rtx-pro-6000-blackwell-server-edition/ | RTX Pro 6000]] | 96GB | gpu[007-008,011] |
  
  
==== Baobab ====
  
  
Since our clusters are regularly expanded, the nodes are not all from the same generation. You can see the details in the following table.
  
=== CPU MODELS — baobab ===
^ Model ^ Generation ^ Architecture ^ Freq ^ Nb core ^ Memory ^ Nodeset ^
| [[https://www.intel.fr/content/www/fr/fr/products/sku/92981/intel-xeon-processor-e52630-v4-25m-cache-2-20-ghz/specifications.html | E5-2630V4]] | V6 | Broadwell-EP | 2.2GHz | 20 | 86GB | cpu199 |
| [[https://www.intel.fr/content/www/fr/fr/products/sku/92981/intel-xeon-processor-e52630-v4-25m-cache-2-20-ghz/specifications.html | E5-2630V4]] | V6 | Broadwell-EP | 2.2GHz | 20 | 94GB | cpu[193-198,200-201,205-213,220-229,237-244,247-264] |
| [[https://www.intel.fr/content/www/fr/fr/products/sku/92981/intel-xeon-processor-e52630-v4-25m-cache-2-20-ghz/specifications.html | E5-2630V4]] | V6 | Broadwell-EP | 2.2GHz | 20 | 224GB | cpu246 |
| [[https://www.intel.fr/content/www/fr/fr/products/sku/92981/intel-xeon-processor-e52630-v4-25m-cache-2-20-ghz/specifications.html | E5-2630V4]] | V6 | Broadwell-EP | 2.2GHz | 20 | 251GB | cpu245 |
| [[https://www.intel.com/content/www/us/en/products/sku/92983/intel-xeon-processor-e52637-v4-15m-cache-3-50-ghz/specifications.html | E5-2637V4]] | V6 | Broadwell-EP | 3.5GHz | 8 | 503GB | cpu[218-219] |
| [[https://www.intel.com/content/www/us/en/products/sku/92989/intel-xeon-processor-e52643-v4-20m-cache-3-40-ghz/specifications.html | E5-2643V4]] | V6 | Broadwell-EP | 3.4GHz | 12 | 62GB | cpu[202,216-217] |
| [[https://www.intel.fr/content/www/fr/fr/products/sku/81706/intel-xeon-processor-e52660-v3-25m-cache-2-60-ghz/specifications.html | E5-2660V0]] | V3 | Sandy Bridge-EP | 2.2GHz | 16 | 62GB | cpu001 |
| [[https://www.intel.com/content/www/us/en/products/sku/91754/intel-xeon-processor-e52680-v4-35m-cache-2-40-ghz/specifications.html | E5-2680V4]] | V6 | Broadwell-EP | 2.4GHz | 28 | 503GB | cpu203 |
| [[https://www.amd.com/en/support/downloads/drivers.html/processors/epyc/epyc-7002-series/amd-epyc-7742.html | EPYC-7742]] | V8 | Rome | 2.25GHz | 128 | 503GB | cpu[273-277,285-307,314-335] |
| [[https://www.amd.com/en/support/downloads/drivers.html/processors/epyc/epyc-7002-series/amd-epyc-7742.html | EPYC-7742]] | V8 | Rome | 2.25GHz | 128 | 1007GB | cpu[312-313] |
| [[https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9654.html | EPYC-9654]] | V12 | Genoa | 3.70GHz | 192 | 768GB | cpu[350,352] |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html | GOLD-6240]] | V9 | Cascade Lake | 2.60GHz | 36 | 187GB | cpu[084-090,265-272,278-284,308-311,336-349] |
| [[https://ark.intel.com/content/www/us/en/ark/products/192442/intel-xeon-gold-6244-processor-24-75m-cache-3-60-ghz.html | GOLD-6244]] | V9 | Cascade Lake | 3.60GHz | 16 | 754GB | cpu351 |
  
  
In the following table you can see which type of GPU is available on Baobab.
  
^ Model ^ Memory per GPU ^ Nodeset ^
| [[https://www.nvidia.com/en-us/data-center/a100/ | A100 40GB]] | 40GB | gpu[020,022,027-028,030-031] |
| [[https://www.nvidia.com/en-us/data-center/a100/ | A100 80GB]] | 80GB | gpu[027,029,032-033,045] |
| [[https://www.nvidia.com/en-us/geforce/20-series/ | RTX 2080 Ti]] | 11GB | gpu[011,013-016,018-019] |
| [[https://www.nvidia.com/fr-fr/geforce/graphics-cards/30-series/rtx-3080-3080ti/ | RTX 3080]] | 10GB | gpu[023-024,036-043] |
| [[https://www.nvidia.com/fr-fr/geforce/graphics-cards/30-series/rtx-3090-3090ti/ | RTX 3090]] | 24GB | gpu[017,021,025-026,034-035] |
| [[https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/ | RTX 4090]] | 24GB | gpu049 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-5000/ | RTX 5000]] | 32GB | gpu050 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-a5000/ | RTX A5000]] | 25GB | gpu[044,047] |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-a5500/ | RTX A5500]] | 24GB | gpu046 |
| [[https://www.nvidia.com/en-us/products/workstations/rtx-a6000/ | RTX A6000]] | 48GB | gpu048 |
| Titan X | 12GB | gpu[002,008-010] |
| [[https://www.nvidia.com/en-in/data-center/tesla-p100/ | P100]] | 12GB | gpu[004-007] |
  
    
==== Yggdrasil ====
  
=== CPU MODELS — yggdrasil ===
  
Since our clusters are regularly expanded, the nodes are not all from the same generation. You can see the details in the following table.
  
^ Model ^ Generation ^ Architecture ^ Freq ^ Nb core ^ Memory ^ Nodeset ^
| EPYC-7742 | V8 | Rome | 2.25GHz | 128 | 503GB | cpu[123-124,135-150] |
| EPYC-7742 | V8 | Rome | 2.25GHz | 128 | 1007GB | cpu[125-134] |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html | GOLD-6240]] | V9 | Cascade Lake | 2.60GHz | 36 | 184GB | cpu001 |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html | GOLD-6240]] | V9 | Cascade Lake | 2.60GHz | 36 | 187GB | cpu[002-057,059-082,091-097] |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html | GOLD-6240]] | V9 | Cascade Lake | 2.60GHz | 36 | 204GB | cpu058 |
| [[https://ark.intel.com/content/www/fr/fr/ark/products/192443/intel-xeon-gold-6240-processor-24-75m-cache-2-60-ghz.html | GOLD-6240]] | V9 | Cascade Lake | 2.60GHz | 36 | 1510GB | cpu[120-122] |
| [[https://ark.intel.com/content/www/us/en/ark/products/192442/intel-xeon-gold-6244-processor-24-75m-cache-3-60-ghz.html | GOLD-6244]] | V9 | Cascade Lake | 3.60GHz | 16 | 754GB | cpu[113-115] |
| [[https://www.amd.com/fr/products/processors/server/epyc/7003-series/amd-epyc-7763.html | EPYC-7763]] | V10 | Milan | 2.45GHz | 128 | 503GB | cpu[151-158] |
| [[https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9654.html | EPYC-9654]] | V12 | Genoa | 3.70GHz | 192 | 773GB | cpu[159-164] |
  
 The "generation" column is just a way to classify the nodes on our clusters. In the following table you can see the features of each architecture. The "generation" column is just a way to classify the nodes on our clusters. In the following table you can see the features of each architecture.
In the following table you can see which type of GPU is available on Yggdrasil.
  
^ Model ^ Memory per GPU ^ Nodeset ^
| [[https://www.nvidia.com/fr-be/titan/titan-rtx/ | Titan RTX]] | 24GB | gpu[001,003-007],gpustack |
| [[https://images.nvidia.com/content/technologies/volta/pdf/volta-v100-datasheet-update-us-1165301-r5.pdf | V100]] | 32GB | gpu008 |
  
Link to see the GPU details: https://developer.nvidia.com/cuda-gpus#compute