hpc:slurm
  * Special public partitions:
    * ''debug-cpu''
    * ''public-interactive-gpu''
    * ''public-interactive-cpu''
    * ''public-longrun-cpu''
^ Partition               ^ Time limit ^ Memory ^
| debug-cpu               |            |        |
| public-interactive-gpu  | 4 hours    |        |
| public-interactive-cpu  | 8 hours    | 10GB   |
| public-longrun-cpu      |            |        |
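As a minimal illustration of how one of these partitions is used (the job name, resource values and time below are arbitrary examples, not site recommendations):

<code bash>
#!/bin/sh
# Hypothetical minimal job: one core on the public-interactive-cpu partition,
# staying within its 8 hour time limit.
#SBATCH --job-name=demo
#SBATCH --partition=public-interactive-cpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

srun hostname
</code>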
Minimum resource is one core.

N.B.: no ''
Example to request three titan cards: ''
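The option itself is cut off in this revision; a plausible form of such a request, assuming ''titan'' is the GPU type name defined on the cluster, is:

<code console>
# Assumption: "titan" is the GPU type name known to Slurm on this cluster.
srun --partition=shared-gpu --gpus=titan:3 nvidia-smi
</code>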
You can find a detailed list of GPUs available on our clusters here:
  * [[http://
===== CPU =====

<WRAP center round important 60%>
You can request all the CPUs of a compute node minus two that are reserved for the OS. See [[https://
</WRAP>
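For example, on a hypothetical 64-core node this leaves at most 62 cores for a job (the core count and the application name are placeholders for illustration; check the actual node specifications):

<code bash>
#!/bin/sh
# Hypothetical 64-core node: 64 - 2 cores reserved for the OS = 62 usable cores.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=62

srun ./my_threaded_app
</code>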
===== CPU types =====
===== GPGPU jobs =====
When we talk about [[https://

You can see on this table [[hpc:
#SBATCH --partition=shared-gpu
#SBATCH --gpus=1
#SBATCH --constraint="
</code>
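For context, a complete script built around those directives could look like the following sketch (the constraint value, module name, and application are placeholders, not site-specific values):

<code bash>
#!/bin/sh
#SBATCH --job-name=gpu-demo
#SBATCH --partition=shared-gpu
#SBATCH --gpus=1
#SBATCH --time=00:30:00
# Placeholder feature name: replace with a constraint actually defined for the GPU nodes.
#SBATCH --constraint="DOUBLE_PRECISION_GPU"

# Placeholder module and application.
module load CUDA
srun ./my_gpu_app
</code>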
Use the reservation via srun:
<code console>
(baobab)-[alberta@login2 ~]# srun --reservation
</code>

Use the reservation via an sbatch script:
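The reservation name is cut off above; as a sketch, assuming a reservation named ''my_resa'', the batch form would be:

<code bash>
#!/bin/sh
# Hypothetical reservation name "my_resa"; use the name the HPC team gave you.
#SBATCH --reservation=my_resa
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

srun hostname
</code>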
If you want other information, please see the sacct manpage.
<note tip>By default the command displays a lot of fields. You can use this trick to display them correctly; you can then move with the left and right arrows to see the remaining fields:
<code console>
(yggdrasil)-[root@admin1 ~]$ sstat -j 39919765 --all | less -#2 -N -S
      1 JobID
      2 ------------ ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ------->
      3 39919765.ex+
      4 39919765.ba+
</code>
</note>
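If you only need a few specific fields, the same narrowing works for ''sacct'' with ''--format'' (the job id below is a placeholder and the field list is just an example):

<code console>
# Placeholder job id; list only the fields you care about.
sacct -j 12345678 --format=JobID,JobName,Partition,Elapsed,MaxRSS,State
</code>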
===== Energy usage =====

==== CPUs ====
You can see the energy consumption of your jobs on Yggdrasil (Baobab soon). The energy is shown in Joules using sacct.
<code console>
2023-10-12T09:
</code>
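The full command is cut off in this revision; a hedged example of querying the energy fields with sacct (the job id is a placeholder, ''ConsumedEnergy'' and ''ConsumedEnergyRaw'' are standard sacct fields):

<code console>
# Placeholder job id; the energy is reported in Joules.
sacct -j 12345678 --format=JobID,Start,Elapsed,ConsumedEnergy,ConsumedEnergyRaw
</code>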
<note important>

==== GPUs ====
If you are interested in the power usage of a GPU card your job is using, you can issue the following command while your job is running on a GPU node:

<code console>
(baobab)-[root@gpu002 ~]$ nvidia-smi dmon
# gpu    pwr  gtemp  mtemp
# Idx      W      C      C
    0     62
</code>

===== Job history =====
You can see your job history using ''sacct''.

<code console>
[sagon@master ~] $ sacct -u $USER -S 2021-04-01
------------ ---------- ---------- ---------- ---------- ---------- --------
45517641
45517641.ba+
45517641.ex+
45517641.0
45518119
45518119.ba+
45518119.ex+
</code>


===== Report and statistics with sreport =====

To get reporting about your past jobs, you can use ''sreport''.

Here are some examples that can give you a starting point:

To get the number of jobs you ran (you <=> ''$USER''):

<code console>
[brero@login2
--------------------------------------------------------------------------------
Job Sizes 2018-01-01T00:
Units are in number of jobs ran
--------------------------------------------------------------------------------

--------- --------- ------------- ------------- ------------- ------------- ------------- ------------

</code>
You can see how many jobs were run (grouped by allocated CPU). You can also see we specified an extra day for the //end date// ''
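The command itself is cut off; a hedged reconstruction of the kind of invocation that produces such a report (the dates and the ''PrintJobCount'' option are assumptions based on the output shown above):

<code console>
# Assumed reconstruction: count jobs (instead of CPU time) per job size for 2018,
# with the end date set one day past the period of interest.
sreport job sizesbyaccount user=$USER start=2018-01-01 end=2019-01-01 PrintJobCount
</code>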
You can also check how much CPU time (in seconds) you have used on the cluster since 2019-09-01:

<code console>
[brero@login2 ~]$ sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 -t Seconds
--------------------------------------------------------------------------------
Cluster/
Usage reported in CPU Seconds
--------------------------------------------------------------------------------
Cluster
--------- --------------- --------- --------------- -------- --------

</code>
In this example, we added the time unit option ''-t Seconds''.

Please note:
  * By default, the CPU time is in Minutes.
  * It takes up to an hour for Slurm to update this information in its database, so be patient.
  * If you don't specify a start nor an end date, yesterday's usage is reported.
  * The CPU time is the time that was allocated to you. It doesn't necessarily reflect the CPU time your jobs effectively used.
Tip: If you absolutely need a report including your jobs that ran on the same day, you can override the default end date by forcing tomorrow's date:

<code console>
sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 end=$(date --date="
</code>
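The end of that command is cut off; a plausible completion, assuming GNU ''date'' and an ISO date format, is:

<code console>
# Assumed completion: "tomorrow" in ISO format so today's jobs fall inside the reporting window.
sreport cluster AccountUtilizationByUser user=$USER start=2019-09-01 end=$(date --date="tomorrow" +%Y-%m-%d)
</code>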