hpc:faq

Differences

This shows you the differences between two versions of the page.

hpc:faq [2025/06/04 08:33] Yann Sagon
hpc:faq [2025/12/08 13:25] (current) Yann Sagon
Line 27: Line 27:
 !!!There could be several reasons for the cluster to slow down. It’s important to figure out where the slowness is happening:
  
-  * **Login Node**: If the login node feels slow, it might be because someone is running heavy processes on it, which isn’t recommended. The login node is meant for tasks like file editing, job submission, and monitoring, not running jobs. If another user is hogging the CPU resources, it could affect your experience, but this won’t impact the performance of jobs on the compute nodes.
+  * **Login Node**: The login node is designed for light tasks such as file editing, job submission, and monitoring, not for running heavy computations. To ensure fair usage and maintain responsiveness, each user is limited to 2 CPU cores and 8 GB of RAM on the login node.
  
   * **Compute Nodes**: Slowness on the compute nodes might be due to high CPU usage, storage issues, or other factors, which could cause your jobs to run more slowly.
Line 38: Line 38:
  
 =?=== Cost ====
 +??? I'm using Open XDMoD to check how many hours my group used this year, but the amount is quite different from what appears on the invoice or in ''ug_slurm_usage_per_user.py''.
 +!!! For invoicing, we use a Slurm metric called billing. This metric aggregates CPU hours, memory usage, and GPU usage. Unfortunately, Open XDMoD does not currently support this metric and therefore does not take into account GPU type or memory usage. We have added a warning about this in the [[accounting#openxdmod|Open XDMoD documentation]].
 +
 +
 +??? My group only used our private partition. I heard from a colleague that it is free of charge, yet I still received an invoice.
 +!!! Because GPU models and RAM usage vary significantly, we now apply a uniform resource accounting system by converting GPU hours and memory usage into CPU-hour equivalents.
 +This metric is called billing. If you own a private partition, you receive a fixed annual allocation of billing units that can be used on any cluster. More details are available here: [[accounting#resources_available_for_research_group|Resources available for research groups]]
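As a rough illustration of how billing accumulates, the sketch below sums "billing hours" from Slurm accounting records. It relies on two standard ''sacct'' fields: ''ElapsedRaw'' (job duration in seconds) and ''AllocTRES'' (allocated resources, which includes a ''billing=N'' entry). The function name ''billing_hours'' and the dates in the usage comment are hypothetical, and the assumption that billing hours equal the billing value multiplied by elapsed hours is ours; the figures on the invoice and in ''ug_slurm_usage_per_user.py'' remain the reference.

```shell
#!/usr/bin/env bash
# Sketch only: sums billing * elapsed hours over sacct records.
# Input format (one job per line): ElapsedRaw|AllocTRES
# e.g. 3600|billing=4,cpu=4,mem=16G,node=1

billing_hours() {
    awk -F'|' '
        {
            billing = 0
            # AllocTRES is a comma-separated list such as billing=4,cpu=4,mem=16G
            n = split($2, tres, ",")
            for (i = 1; i <= n; i++) {
                if (tres[i] ~ /^billing=/) {
                    sub(/^billing=/, "", tres[i])
                    billing = tres[i]
                }
            }
            # ElapsedRaw is in seconds; convert to hours
            total += billing * $1 / 3600
        }
        END { printf "%.2f\n", total }
    '
}

# Intended use on a cluster (requires Slurm; the dates are placeholders):
#   sacct -X --noheader --parsable2 \
#         --starttime 2025-01-01 --endtime 2025-12-31 \
#         --format=ElapsedRaw,AllocTRES | billing_hours
```

The billing value of a job can be larger than its CPU count when memory or GPUs weigh more heavily, which is presumably why totals based on CPU hours alone differ from the invoice.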
 +
 +??? Your documentation states the cost per hour is 0.0157 CHF, but my invoice shows 0.02 CHF.
 +!!! The price per hour shown on the invoice is rounded for display purposes only. The actual calculation uses the full precision of the rate.
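The effect of display rounding can be checked with a small calculation. This is a minimal sketch: the helper names and the figure of 10000 billed hours are assumptions chosen for illustration, and only the 0.0157 CHF rate comes from the question above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the 0.0157 CHF rate is taken from the FAQ.

# Rate as printed on the invoice: rounded to two decimals for display only.
display_rate() {
    awk -v r="$1" 'BEGIN { printf "%.2f\n", r }'
}

# Amount charged: billed hours multiplied by the full-precision rate.
invoice_total() {
    awk -v h="$1" -v r="$2" 'BEGIN { printf "%.2f\n", h * r }'
}

hours=10000   # assumed example value, not taken from a real invoice
echo "displayed rate: $(display_rate 0.0157) CHF/hour"
echo "amount charged: $(invoice_total "$hours" 0.0157) CHF"
```

With these numbers the displayed rate is 0.02 CHF, but the amount charged is 157.00 CHF, not the 200.00 CHF that the rounded rate would suggest.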
 +
 +??? I'm a PI and want to see usage details per user.
 +!!! You can use the script ''ug_slurm_usage_per_user.py'' for this purpose. An example is provided in our documentation: [[accounting#aggregate_usage_by_all_users_of_a_given_pi|Aggregate usage by all users of a given PI]].
 +
  
 ??? I have no idea why I received your email about 'HPC billing'.
Line 89: Line 103:
 </code>
  
 +??? If multiple PIs jointly purchase computing resources on the Baobab cluster, who receives the invoice?
 +
 +!!! When several Principal Investigators (PIs) collaborate to acquire computing resources on the Baobab cluster, the invoice will be sent to the designated contact person of the group associated with the partition named private_xxx, where xxx is the name of the partition. This person acts as the reference for the group and is responsible for managing the billing and communication related to the shared resources.
 +
 +It is up to the PIs involved to agree among themselves on how the computing hours are distributed. The Baobab team does not manage internal allocations within shared purchases.
  
 ??? I'm a PI, I tried to use Open XDMoD to see the past usage of my group without success
Line 94: Line 113:
  
  
-??? How can I check usage on more than one partition?
+??? With Open XDMoD, how can I check usage on more than one partition?
 !!! Unfortunately, it seems that you need to do this operation for each partition separately.
  
Line 106: Line 125:
  
  
-??? I'm organising a course and we need some HPC resources for the students. Do we have to pay for it?
+??? I'm organizing a course and we need some HPC resources for the students. Do we have to pay for it?
 !!! The Baobab service is free for courses as long as the usage is low and for a defined period of time. Check [[hpc:hpc_clusters#use_baobab_for_teaching|How our clusters work]].
  
Line 125: Line 144:
 !!!  * If you have a non-student account (PhD, postdoc, researcher), your account will expire at the same time your contract expires at UNIGE. Right now, there is a grace period of around 6 months after the end of your contract.
   * If you have an outsider account, you need to check the expiration date you received when you filled in the invitation.
-  * If you have an unige student account, you can check the expiration date with the ''chage'' command:
+  * If you have an UNIGE student account, you can check the expiration date with the ''chage'' command:
 <code>
 (baobab)-[yourusername@login2 ~]$ chage -l yourusername
Line 138: Line 157:
  
 ??? I'm leaving UNIGE, can I continue to use Baobab HPC service?
-!!! Yes it is possible as long as you collaborate tightly with your former research group. Your PI must [[https://gestion-externe.unige.ch/main/outsider-requests|invite]] you as [[hpc:access_the_hpc_clusters#outsider_account|outsider]]. For technical reason, your account needs to be expired  prior doing the request for the invitation.
-We'll then reactivate your account. You'll keep your data.
+!!! For UNIGE or external members, please first refer to the guidelines about account expiration and the grace period: https://plone.unige.ch/distic/pub/isis/comment-fonctionne-isis#prolongations.
+Please note that expired accounts are eligible for deletion at any time. We strongly recommend that you carefully prepare for your departure or contract extension.
+
+However, it is possible to extend your access as long as you maintain close collaboration with your former research group. Your PI must [[https://gestion-externe.unige.ch/main/outsider-requests|invite]] you as an [[hpc:access_the_hpc_clusters#outsider_account|outsider]].
+For technical reasons, your account must be expired before making the invitation request.
+Once the invitation is processed, we will reactivate your account, and you will retain access to your data.
  
  
Line 245: Line 268:
     - **~/.local/session**
     - **~/.config/xfce**
-=?==== Storage =====
+=?=== Storage ====
  
 ??? I have a question about the storage !?
Line 258: Line 281:
  
 If you need to store a large amount of data, consider using the "Academic NAS" service, which you can find here: Academic NAS.
 +
 +??? How can I request a shared directory?
 +!!! A shared directory is a directory with access permissions granted to a specific group for sharing data.
 +Since June 2025, all new shared directories must use groups declared in the Active Directory of UNIGE.
 +
 +To request a shared directory, please fill in the [[https://dw.unige.ch/openentry.html?tid=hpc | HPC form]] on DW.
 +(Outsiders: please refer to your PI or respondent.)
  
 ??? How can I access a shared directory?
 !!! To access a **shared directory**, you need to be added to the appropriate group.
 
-Please send an email to [[hpc@unige.ch]] including relevant information (Uusername, Group, private_partion etc...) with the responsible person for the share or partition in CC. The responsible person **must** approve the modification.
+For shared directories using a group from Active Directory (groups beginning with GG or GL, for example GL_S_SCIENCES_POSY_LET), please contact your CI (Correspondant Informatique) or use the [[https://dw.unige.ch/openentry.html?tid=adaccess|ADaccess form]] on DW.
+
+For old groups (share_XXX, private_XXX), please send an email to [[hpc@unige.ch]] including relevant information (username, group, private partition, etc.) with the responsible person for the share or partition in CC. The responsible person **must** approve the modification.
  
 ??? How can I copy data from one cluster to another one?
Line 328: Line 360:
  
 In this case you need to check if there is another version available compatible with the toolchain (''GCC'', ''foss'' etc...) you want to use. If not, please refer to [[hpc:faq#the_software_i_need_is_not_ava|The software I need is not available on Clusters: what should I do ?]].
-=?==== Slurm: job scheduler =====
+=?=== Slurm: job scheduler ====
 ??? What is Slurm ?
 !!! Slurm is a job scheduling system used to manage and allocate resources in a computing cluster. It helps you submit, monitor, and control jobs (tasks) on the cluster.
Line 373: Line 405:
 ??? I want to run the same job several times with different parameters
 !!! In that case you can use the **job arrays** feature of Slurm. Please have a look at the documentation: [[hpc:slurm#job_array|Job array]]
 +
 +??? Are the nodes from my partition in use?
 +!!! You can't just check with the partition name, as a node may be in use through another partition such as shared-cpu. Here is an example to check the usage of your compute nodes:
 +<code>
 +squeue -w $(sinfo --noheader --partition <to be replaced by partition-name> --format="%n" | nodeset -f)
 +</code>
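The angle-bracket placeholder in the command above cannot be pasted verbatim, because the shell treats ''<'' and ''>'' as redirections. One possible way around this is to wrap the same pipeline in a small function that takes the partition name as an argument. The function name ''jobs_on_partition_nodes'' and the partition name in the usage comment are hypothetical; the commands and options are the ones from the example above.

```shell
#!/usr/bin/env bash
# Sketch only: lists the jobs running on the nodes of a given partition,
# whichever partition those jobs were submitted to.
# Requires Slurm (sinfo, squeue) and ClusterShell (nodeset).

jobs_on_partition_nodes() {
    local partition="$1"
    if [ -z "$partition" ]; then
        echo "usage: jobs_on_partition_nodes <partition-name>" >&2
        return 1
    fi

    # One hostname per line, folded into a compact node set
    local nodes
    nodes=$(sinfo --noheader --partition "$partition" --format="%n" | nodeset -f)

    # Jobs currently allocated on any of these nodes
    squeue -w "$nodes"
}

# Example call (the partition name is a placeholder):
#   jobs_on_partition_nodes private-mygroup-cpu
```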
  
  
Line 410: Line 448:
  
 Please send an email to [[hpc@unige.ch]] including relevant information (username, group, private partition, etc.) with the responsible person for the share or partition in CC. The responsible person **must** approve the modification.
-=?==== Mac Issues  =====+ 
 +=?=== Issues  ====
  
 ??? I have a keyboard issue using a Mac.
hpc/faq.1749026002.txt.gz · Last modified: (external edit)