This shows you the differences between two versions of the page.
hpc:best_practices [2020/11/26 11:41] Yann Sagon [Single thread vs multi thread vs distributed jobs] |
hpc:best_practices [2023/05/26 15:07] (current) Adrien Albert [First steps] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | < | + | {{METATOC 1-5}} |
- | This page gives best practices and tips on how to use the clusters **Baobab** and **Yggdrasil**. | ||
====== Introduction ====== | ====== Introduction ====== | ||
+ | This page gives best practices and tips on how to use the clusters **Baobab** and **Yggdrasil**. | ||
+ | |||
An HPC cluster is an advanced, complex and always-evolving piece of technology. It's easy to forget details and make mistakes when using one, so don't hesitate to check this section every now and then, yes, even if you are the local HPC guru in your team! There' | An HPC cluster is an advanced, complex and always-evolving piece of technology. It's easy to forget details and make mistakes when using one, so don't hesitate to check this section every now and then, yes, even if you are the local HPC guru in your team! There' | ||
Line 14: | Line 15: | ||
For your first steps we recommend the following : | For your first steps we recommend the following : | ||
* Check the [[hpc: | * Check the [[hpc: | ||
- | * Connect to the login node of the cluster you are planning to use : | + | * Connect to the login node of the cluster you are planning to use : [[hpc: |
- | * [[hpc: | + | |
* Check the rest of this page for best practices and smart use of the HPC resources. | * Check the rest of this page for best practices and smart use of the HPC resources. | ||
* [[hpc: | * [[hpc: | ||
- | * Understand how to load your libraries/ | + | * Understand how to load your libraries/ |
- | * [[applications_and_libraries|Applications and libraries]] | + | * Learn how to write a Slurm '' |
- | * Learn how to write a Slurm '' | + | |
- | * [[slurm|Slurm and job management]] | + | |
====== Rules and etiquette ====== | ====== Rules and etiquette ====== | ||
Line 87: | Line 85: | ||
===== Single thread vs multi thread vs distributed jobs ===== | ===== Single thread vs multi thread vs distributed jobs ===== | ||
- | There are three job categories each with different needs: | + | See [[hpc:slurm# |
- | ^Job type ^ Number of cpu used ^ Examples | ||
- | | **single threaded** | **one CPU** | Python, plain R | - | | ||
- | | **multi threaded** | **all the CPUs** of a compute node (best case scenario) | ||
- | | **distributed** | can spread tasks on various compute nodes | Palabos OpenFOAM | OpenMPI, workers | | ||
- | |||
- | |||
- | |||
- | |||
- | There are also **hybrid** jobs, where each task of such a job behaves like a multi-threaded job. This is not very common and we won't cover this case. | ||
- | |||
- | FIXME On the cluster, we have two types of partitions with a fundamental difference: | ||
- | |||
- | * with resources allocated per compute node: shared-EL7, parallel-EL7 | ||
- | * with resources allocated per cpu: all the other partitions | ||
===== Bad CPU usage ===== | ===== Bad CPU usage ===== | ||
Let's take an example of a **single threaded job**. You should clearly use a partition which allows you to request a single CPU, such as '' | Let's take an example of a **single threaded job**. You should clearly use a partition which allows you to request a single CPU, such as '' | ||
- | |||
{{ : | {{ : | ||
- | image | ||
Line 148: | Line 130: | ||
* This will help you choose the parameters ''< | * This will help you choose the parameters ''< | ||
* [[hpc/ | * [[hpc/ | ||
- | * This will help you choose the parameters '' | + | * This will help you choose the parameters '' |
* How much memory does my job need ? | * How much memory does my job need ? | ||
* This will help you choose the parameters ''< | * This will help you choose the parameters ''< | ||
Line 156: | Line 138: | ||
* Do I want to receive email notifications ? | * Do I want to receive email notifications ? | ||
* This is optional, but you can specify the level of detail you want with the ''< | * This is optional, but you can specify the level of detail you want with the ''< | ||
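The questions above map onto the header of a Slurm batch script. The following is a minimal sketch, not a recommended configuration: the partition name and every resource value are assumptions chosen for illustration and must be adapted to your job and cluster.

```shell
#!/bin/sh
# ASSUMPTION: "shared-cpu" is a hypothetical partition name; use one that exists on your cluster.
#SBATCH --partition=shared-cpu
# One CPU, as for a single threaded job.
#SBATCH --cpus-per-task=1
# Memory per node; the default unit is megabytes (illustrative value).
#SBATCH --mem=3000
# Maximum run time as hours:minutes:seconds (illustrative value).
#SBATCH --time=00:15:00
# Optional: send an email when the job ends or fails.
#SBATCH --mail-type=END,FAIL

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set.
# Fall back to 1 so the script also runs outside of Slurm.
NCPUS="${SLURM_CPUS_PER_TASK:-1}"
echo "Using ${NCPUS} CPU(s)"
```

Such a script would typically be submitted with ''sbatch'' followed by the script name. The ''#SBATCH'' lines are plain comments for the shell, so the script can also be run directly for a quick syntax check.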
+ | |||
+ | ====== Transfer data from one cluster to another with ====== | ||
+ | ===== Rsync ===== | ||
+ | This help assumes you want to transfer the directory ''<
+ | |||
+ | |||
+ | __**Rsync options: | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | * ''< | ||
+ | |||
+ | 1) Go to your directory containing ''< | ||
+ | < | ||
+ | (baobab)-[toto@login2 ~]$ cd $HOME/
+ | </ | ||
+ | |||
+ | 2) Set the variables (optional)
+ | < | ||
+ | (baobab)-[toto@login2 my_projects]$ DST=$HOME/ | ||
+ | (baobab)-[toto@login2 my_projects]$ DIR=the_best_project_ever | ||
+ | (baobab)-[toto@login2 my_projects]$ YGGDRASIL=login1.yggdrasil | ||
+ | </ | ||
+ | 3) Run the rsync | ||
+ | < | ||
+ | (baobab)-[toto@login2 my_projects]$ rsync -aviuzPrg ${DIR} ${YGGDRASIL}: | ||
+ | </ |
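The three steps above can be sketched as one small script that prints the rsync command instead of running it, so it can be reviewed first. This is only an illustration: the ''DST'' value is a hypothetical destination path (the real one depends on your directory layout), and the helper ''rsync_cmd'' is not part of rsync. Passing ''dry'' adds rsync's ''-n'' (''--dry-run'') option, which lists what would be transferred without copying anything.

```shell
#!/usr/bin/env bash
set -eu

DIR="the_best_project_ever"     # directory to copy, as in step 2
YGGDRASIL="login1.yggdrasil"    # destination login node, as in step 2
DST="my_projects/"              # ASSUMPTION: hypothetical path, relative to the remote home

# Hypothetical helper: print the rsync command line.
# "dry" appends -n (--dry-run) to the option string used in step 3.
rsync_cmd() {
    local mode="${1:-real}"
    local opts="-aviuzPrg"
    if [ "$mode" = "dry" ]; then
        opts="${opts}n"
    fi
    echo "rsync ${opts} ${DIR} ${YGGDRASIL}:${DST}"
}

rsync_cmd dry     # preview command: nothing is copied
rsync_cmd real    # actual transfer command
```

Because ''${DIR}'' has no trailing slash, rsync creates the directory itself inside the destination; with a trailing slash it would copy only the directory's contents.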