hpc:storage_on_hpc [2022/11/28 12:42] Adrien Albert
hpc:storage_on_hpc [2024/05/02 13:51] (current) Gaël Rossignol
{{METATOC 1-5}}

====== Storage on HPC ======
There are different types of storage on the clusters. It is important to understand where to store each type of data.
This is the storage space we offer on our clusters:
^ Cluster ^
| Baobab |
| ::: | ''/ |
| ::: | ''/ |
| Yggdrasil | ''/ |
| ::: | ''/ |
We realize you all have different needs in terms of storage. To guarantee storage space for all users, we have **set a quota on the home and scratch directories**; see the table above for details.
Also, the scratch directory is not a permanent storage solution; we strongly advise you to regularly move or clean up unneeded data.
+ | |||
+ | ==== Quota ==== | ||
+ | |||
+ | |||
+ | As the storage is shared by everyone, this ensure a fair scratch usage and prevent users from filling it. We setup a quota based on the number of files you own, not the file size. | ||
+ | |||
+ | **The maximum file count is currently set to 10M.** | ||
+ | |||
+ | What does it mean for you: if the number of files in your scratch space is higher than 10M, you won’t be able to write to it anymore. | ||
+ | |||
+ | Error message: | ||
+ | |||
+ | Disk quota exceeded | ||
+ | |||
+ | To resume the situation, you should clean up some data in your scratch directory. | ||
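To see how close you are to the file-count quota, you can count the files under your scratch space. This is a sketch: ''SCRATCH_DIR'' below is a placeholder, substitute your actual scratch path from the table above.

```shell
# Count files under your scratch directory. SCRATCH_DIR is an assumption;
# replace it with your real scratch path from the table above.
SCRATCH_DIR="${SCRATCH_DIR:-$HOME/scratch}"
file_count=$(find "$SCRATCH_DIR" -type f 2>/dev/null | wc -l)
echo "files in $SCRATCH_DIR: $file_count (quota: 10M files)"
```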
+ | |||
+ | ===== Fast directory ===== | ||
+ | |||
+ | A new fast storage is available dedicated for jobs using multiples nodes and scratchlocal need to be shared between nodes. | ||
+ | |||
+ | ^ Cluster | ||
+ | | Baobab | ||
+ | |||
+ | <note important> | ||
+ | |||
==== Quota ====

As the storage is shared by everyone, a quota ensures fair usage and prevents users from filling it. Here the quota is based on the total size of your data, not on the file count.

You should clean up the data in your fast directory as soon as your jobs are finished.
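Since this quota is size-based, you can check your footprint with ''du''. ''FAST_DIR'' below is a placeholder; substitute the actual fast-directory path for your cluster.

```shell
# Report the total size of your data in the fast storage. FAST_DIR is an
# assumption -- replace it with the real fast-directory path shown above.
FAST_DIR="${FAST_DIR:-$PWD}"
du -sh "$FAST_DIR"
```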
====== Local storage ======
</
===== Temporary private space =====
On **each** compute node, you can use the following private ephemeral spaces:
These spaces are private and only accessible by your job.
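A typical pattern is to stage data into a node-local ephemeral directory, compute there, and clean up before the job ends. This is only a sketch: ''/tmp'' is used as a typical private location, so check the list above for the exact paths on your cluster.

```shell
# Sketch: work inside a node-local ephemeral space during a job.
# /tmp is an assumption for the private location; confirm the exact
# paths listed above for your cluster.
WORKDIR=$(mktemp -d /tmp/job.XXXXXX)   # private work area for this job
echo "working in $WORKDIR"
# ... copy input data here and run your computation ...
rm -rf "$WORKDIR"                      # clean up before the job ends
```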
===== Temporary shared space =====

If you need to access the data from more than one node, you can use a space reachable from all your jobs running on the same compute node. When you have no more jobs running on the node, the content of the storage is erased.

The path is the following: ''/

See here for a usage example: https://
====== Sharing files with other users ======
For easy sharing you need to set your umask to ''0002''.
This is a side-effect of the default permissions on Red Hat-based systems without **User Private Groups**.
</note>
<note info>
Since we use ACLs to set the user rights, you can't rely on the setgid bit to force new files to belong to a group which is not your primary group. You have the following options:
  * You can request to change your primary group: every file that you create on the cluster will then belong to this group
  * You can set your umask to 0002 as explained previously
  * You can launch, on a regular basis, a script that "fixes" the group ownership of your files
</note>
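The umask and group-fixing options above can be sketched as follows. ''SHARED_DIR'' and ''GROUP'' are placeholders (here defaulting to a temporary directory and your primary group); use your real shared directory and project group instead.

```shell
# Sketch: make new and existing files group-accessible for sharing.
# SHARED_DIR and GROUP are assumptions -- substitute your shared
# directory and your project group.
SHARED_DIR="${SHARED_DIR:-$(mktemp -d)}"
GROUP="${GROUP:-$(id -gn)}"        # defaults to your primary group here

umask 0002                         # new files get rw for user AND group
touch "$SHARED_DIR/example.txt"    # created with 664 permissions

chgrp -R "$GROUP" "$SHARED_DIR"    # hand existing files to the group
chmod -R g+rwX "$SHARED_DIR"       # group read/write; execute only on dirs
```

Adding ''umask 0002'' to your ''~/.bashrc'' makes the setting permanent for new shells.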
====== Best practices ======
===== Check disk usage on the clusters =====
==== Check disk usage on home and scratch ====

Since ''/

The script ''beegfs-get-quota-home-scratch.sh'' shows your current usage and quota:
<code console>
(baobab)-[sagon@login2 ~]$ beegfs-get-quota-home-scratch.sh
home dir: /home/sagon
scratch dir: /

storage                     | name |  id  ||  size used | size limit ||  files  |  limit
----------------------------|------||------------|------------||---------|---------
home                        | sagon|240477||
scratch                     |
</code>
<WRAP center round tip 60%>
This includes all your data in ''
</WRAP>
+ | |||
+ | < | ||
+ | |||

==== Check disk usage on NASAC ====
If you have space as well in ''/
reference: (([[https://
===== CVMFS =====

All the compute nodes of our clusters have the CernVM-FS client installed. CernVM-FS, the CernVM File System (also known as CVMFS), is a file distribution service that is particularly well suited to distributing software installations across a large number of systems world-wide in an efficient way.

A couple of repositories are mounted on the compute and login nodes, such as:
  * atlas.cern.ch
  * grid.cern.ch
The content is mounted using autofs. This means that the root directory of a repository may appear empty if you didn't explicitly access one of its child directories. Doing so will mount the repository for a couple of minutes and then unmount it automatically.
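You can see the on-demand mount in action by listing a repository path directly. This sketch falls back to a message on machines without a configured CVMFS client; ''atlas.cern.ch'' is used as the example repository.

```shell
# Listing a child path of /cvmfs triggers autofs to mount the repository
# on demand. Works only where the CVMFS client is configured (compute and
# login nodes); elsewhere the fallback message is printed.
out=$(ls /cvmfs/atlas.cern.ch 2>/dev/null || echo "cvmfs not available on this machine")
echo "$out"
```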
+ | |||
+ | Other flaghship repository available without further configuration: | ||
+ | |||
+ | * unpacked.cern.ch | ||
+ | * singularity.opensciencegrid.org (container registry) | ||
+ | * software.eessi.io ( | ||
<code>
cvmfs-config.cern.ch
</code>
+ | |||
+ | The EESSI did a nice tutorial about CVMFS readable on [[https:// | ||
+ | |||
====== Robinhood ======