hpc:applications_and_libraries
Differences
This shows you the differences between two versions of the page.
hpc:applications_and_libraries [2025/01/15 09:39] – [FOSS toolchain] Yann Sagon → hpc:applications_and_libraries [2025/06/11 12:27] (current) – external edit 127.0.0.1

Line 313:

===== Conda =====
| + | ==== How to Create a Conda Environment in a Container ===== | ||
| - | Use it | ||
| - | < | + | Using **Conda** directly on HPC systems or shared servers can cause performance issues and storage overload because Conda environments create thousands of small files. This often results in: |
| - | module load Anaconda3 | + | |
| + | * Slow job startup times | ||
| + | * Filesystem limitations being hit | ||
| + | * High I/O load on the cluster | ||
| + | * Complex environment management | ||
| + | |||
| + | A better solution is to **encapsulate Conda environments inside a container**. This way, the entire environment is packaged into a single file (such as a `.sif` image used by Apptainer/ | ||
| + | |||
| + | |||
| + | === Benefits === | ||
| + | Using this method offers multiple advantages: | ||
| + | - ✅ **Fewer files**: Your environment is stored in a single `.sif` file | ||
| + | - ✅ **Portability**: | ||
| + | - ✅ **Reproducibility**: | ||
| + | - ✅ **Isolation**: | ||
| + | - ✅ **Stability**: | ||
| + | |||
| + | === Limitations === | ||
| + | - ⚠️ The container is static; to update packages, you need to rebuild the image | ||
| + | |||
| + | |||
| + | This guide explains how to build such a container using [[https:// | ||
| + | |||
| + | |||
| + | === Step 1 – Define the Conda Environment === | ||
| + | Create a file '' | ||
| + | (As exemple we will use '' | ||
| + | |||
| + | < | ||
| + | name: bioenv | ||
| + | channels: | ||
| + | - bioconda | ||
| + | - conda-forge | ||
| + | - defaults | ||
| + | dependencies: | ||
| + | - blast=2.16.0 | ||
| + | - diamond=2.1.11 | ||
| + | - exonerate=2.4.0 | ||
| + | - spades=4.1.0 | ||
| + | - mafft=7.525 | ||
| + | - trimal=1.5.0 | ||
| + | - numpy | ||
| + | - joblib | ||
| + | - scipy | ||
| + | [...] | ||
| + | |||
| + | prefix:/ | ||
| </ | </ | ||
| + | |||
| + | You can generate this file using the following commands: | ||
| + | |||
| + | <code bash> | ||
| + | # 1. (optional) create your environment (or not if you already have one) | ||
| + | $ conda create -n bioenv -c bioconda -c conda-forge spades exonerate diamond blast mafft trimal numpy joblib scipy -y | ||
| + | |||
| + | # 2. Activate your environment | ||
| + | $ conda activate bioenv | ||
| + | |||
| + | # 3. Export the settings of your environment | ||
| + | # It’s recommended to manually remove the `prefix:` line at the bottom of the file before using it with cotainr. | ||
| + | $ conda env export > bioenv.yml | ||
| + | </ | ||
| + | |||
| + | |||
| + | |||
| + | === Step 2 – Build the Container === | ||
| + | Now use '' | ||
| + | |||
| + | <code bash> | ||
| + | $ module load GCCcore/ | ||
| + | # Ex: cotainr build < | ||
| + | $ cotainr build bioenv.sif --base-image=docker:// | ||
| + | </ | ||
| + | |||
| + | You can replace '' | ||
| + | |||
| + | === Step 3 – Use the Container === | ||
| + | You can now run commands inside the container as follows: | ||
| + | |||
| + | <code bash> | ||
| + | |||
| + | $ apptainer exec bioenv.sif python3 -c " | ||
| + | </ | ||
| + | |||
| + | Or launch any program inside the container just like you would in a normal environment. | ||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| ==== Conda environment management ==== | ==== Conda environment management ==== | ||
| + | |||
| + | Use it | ||
| + | |||
| + | < | ||
| + | module load Anaconda3 | ||
| + | </ | ||
| Create | Create | ||
| Line 1056: | Line 1150: | ||
| With the Baobab upgrade to CentOS 7 (cf. https:// | With the Baobab upgrade to CentOS 7 (cf. https:// | ||
- Instead,
+ Instead,

-   - install it in your ''${HOME}'':<code console>
- capello@login2:
- capello@login2:
- capello@login2:
- [...]
- capello@login2:
- [...]
- capello@login2:
- </code>
-   - launch an interactive graphical job:
-     - connect to the cluster using [[hpc:access_the_hpc_clusters#
-     - start an interactive session on a node (see [[hpc/
-       <code console>
- capello@login2:
- salloc: Pending job allocation 39085914
- salloc: job 39085914 queued and waiting for resources
- salloc: job 39085914 has been allocated resources
- salloc: Granted job allocation 39085914
- capello@node001:
- </code>
-     - load one of the R versions supported by RStudio, for example:<code console>
- capello@node001:

- ----------------------------------------------------------------------------------
-   R: R/3.6.0
- ----------------------------------------------------------------------------------
-     Description:
-       R is a free software environment for statistical computing and
-       graphics.

-     You will need to load all module(s) on any one of the lines below
-     before the "R/3.6.0" module is available to load.

-       GCC/
- [...]
- capello@node001:
- capello@node001:
- capello@node001:
- capello@node001:
- </code>
-     - run RStudio: <code console>
- capello@node001:
- </code>

- <note important>
- <code>
- module load PostgreSQL/
- </code>
- </note>

==== R packages ====