hpc:getting_started
Differences
This shows you the differences between two versions of the page.
hpc:getting_started [2022/08/04 13:26] – [An example] Pierre Kuenzli | hpc:getting_started [2023/06/09 09:07] (current) – Adrien Albert | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | < | + | ====== |
This document presents, in general terms, what computing clusters are and what High Performance Computing (HPC) is. It can be read out of curiosity, if you want to know what those infrastructures are for, or if you are a potential future user with limited technical background. For more practical information on using unige HPC infrastructure, | This document presents, in general terms, what computing clusters are and what High Performance Computing (HPC) is. It can be read out of curiosity, if you want to know what those infrastructures are for, or if you are a potential future user with limited technical background. For more practical information on using unige HPC infrastructure, | ||
Line 9: | Line 9: | ||
When running heavy computations, | When running heavy computations, | ||
- | But what if now you have to run hundreds or thousands of those tasks ? Then the idea is simple : use a lot of computers at the same time. But having lots of computers available is not enough, you need a way to manage them in a centralized way. Otherwise, you would have to connect individually to each computer, run some tasks, wait for completion, and manually gather the results. And still, what happens if you want to share the resources with other people ? You would have to establish some kind of usage schedule. If you want to use hundreds of computers or more, manual management of tasks is simply not an option. | + | But what if you have to run hundreds or thousands of those tasks ? Then the idea is simple : use a lot of computers at the same time. But having lots of computers available is not enough, you need a way to manage them in a centralized way. Otherwise, you would have to connect individually to each computer, run some tasks, wait for completion, and manually gather the results. And still, what happens if you want to share the resources with other people ? You would have to establish some kind of usage schedule. If you want to use hundreds of computers or more, manual management of tasks is simply not an option. |
That's where computing clusters come into play. They are quite literally clusters of computers, interconnected by a network, with centralized storage and central task (or job) management software. | That's where computing clusters come into play. They are quite literally clusters of computers, interconnected by a network, with centralized storage and central task (or job) management software. | ||
Line 15: | Line 15: | ||
< | < | ||
- | The first clusters were made with commodity hardware and named " | + | The first clusters were made with commodity hardware and named " |
**Documentation :** see [[slurm|documentation on how to use slurm]] on unige' | **Documentation :** see [[slurm|documentation on how to use slurm]] on unige' | ||
Line 27: | Line 27: | ||
**Documentation :** the [[hpc_glossary|glossary]] gives the meaning of some terms. | **Documentation :** the [[hpc_glossary|glossary]] gives the meaning of some terms. | ||
- | Each node in a cluster is a computer, embedding one or more multicore CPUs, a certain amount of RAM, one or more network interfaces and possibly one or more coprocessors (more often GPUs). Thus, a cluster can be characterized by its number of nodes, quantity of RAM, type and number | + | Each node in a cluster is a computer, embedding one or more multicore CPUs, a certain amount of RAM, one or more network interfaces and possibly one or more coprocessors (usually |
Another very important part of a cluster is its storage. Indeed, software running on a cluster needs to access data to process. In the case of clusters, data are stored as close as possible to the compute nodes, in storage servers or local storage, rather than in some distant server or service such as cloud storage. | Another very important part of a cluster is its storage. Indeed, software running on a cluster needs to access data to process. In the case of clusters, data are stored as close as possible to the compute nodes, in storage servers or local storage, rather than in some distant server or service such as cloud storage. | ||
Line 89: | Line 89: | ||
* Clusters are running Linux, which heavily relies on a command line interface. You will be able to perform some tasks with a graphical interface, but at some point you will have to use a command line interface. | * Clusters are running Linux, which heavily relies on a command line interface. You will be able to perform some tasks with a graphical interface, but at some point you will have to use a command line interface. | ||
* There are many users using the cluster at the same time. So you have no guarantee your computations will start immediately. | * There are many users using the cluster at the same time. So you have no guarantee your computations will start immediately. | ||
- | * You will not directly run your program on the cluster as on you do on your computer. You will ask the queueing system to run a program, and once resources are available, the queuing system will start the program. | + | * You will not directly run your program on the cluster as you do on your personal |
* HPC clusters were not designed for interactive tasks. While doable, they are much better candidates for asynchronous computing (without user interaction). | * HPC clusters were not designed for interactive tasks. While doable, they are much better candidates for asynchronous computing (without user interaction). | ||
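To make the queueing behaviour in the list above more concrete, here is a minimal sketch of a first-come, first-served scheduler written in Python. It is only an illustration under simplifying assumptions: the function name `simulate_queue`, the job tuples and the CPU counts are all hypothetical, and a real queueing system such as Slurm uses far more elaborate policies (priorities, fair share, backfilling).

```python
import heapq
from collections import deque


def simulate_queue(jobs, total_cpus):
    """Toy first-come, first-served scheduler (illustration only).

    jobs: list of (name, cpus, duration) tuples, in submission order.
    Returns a dict mapping each job name to the time at which it starts.
    """
    pending = deque(jobs)
    running = []          # min-heap of (end_time, cpus) for running jobs
    free = total_cpus     # CPUs currently available
    now = 0
    starts = {}

    while pending:
        name, cpus, duration = pending[0]
        if cpus > total_cpus:
            raise ValueError(f"job {name!r} can never fit on this cluster")
        if cpus <= free:
            # Enough resources: the job at the head of the queue starts now.
            pending.popleft()
            free -= cpus
            heapq.heappush(running, (now + duration, cpus))
            starts[name] = now
        else:
            # Not enough resources: wait until the next running job ends.
            end_time, released = heapq.heappop(running)
            now = end_time
            free += released
    return starts


# Three jobs of 2 CPUs each on a 4-CPU cluster: the third one has to wait.
starts = simulate_queue([("a", 2, 10), ("b", 2, 5), ("c", 2, 3)], total_cpus=4)
print(starts)
```

In this toy run, jobs `a` and `b` start immediately, while job `c` only starts once `b` has finished and released its CPUs. This is the same reason why a job submitted to a busy cluster may not start right away.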
| | ||
- | As a user, you will interact directly with the login node only. From this computer, you will manage your file, set up your execution configuration and ask the queuing system for computation on the compute nodes. But you will never run your programs directly on the login node. | + | As a user, you will interact directly with the login node only. From this computer, you will manage your files, set up your execution configuration and ask the queuing system for computation on the compute nodes. But you will never run your programs directly on the login node. |
{{ : | {{ : | ||
Line 130: | Line 130: | ||
</ | </ | ||
- | With this file, we are telling the queuing system "I want you to run my multiply program on one processor of one of the computing nodes, for a maximum of one minute" | + | With this file, we are telling the queuing system "I want you to run my multiply program on one processor of one of the computing nodes, for a maximum |
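As a rough illustration of what such a request looks like, the Python sketch below builds the text of a Slurm batch script asking for one task and a one-minute time limit. The helper name `make_job_script`, the job name and the `./multiply` command are assumptions made for this example, not the actual file used on this page; `--job-name`, `--ntasks` and `--time` are standard `sbatch` options.

```python
def make_job_script(command, ntasks=1, minutes=1, job_name="multiply"):
    """Return the text of a minimal Slurm batch script (illustrative sketch).

    command:  the program to run on the compute node (hypothetical here)
    ntasks:   number of tasks (processes) requested
    minutes:  maximum run time, converted to Slurm's HH:MM:SS format
    """
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/sh",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# "./multiply" stands in for the multiply program mentioned in the text.
script = make_job_script("./multiply")
print(script)
```

Saved to a file (for instance `job.sh`, a name chosen here for illustration), such a script would typically be submitted with `sbatch job.sh`. The partition and other site-specific options are deliberately left out; refer to the slurm documentation for the values that apply to the unige clusters.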
So let's do the job. Send those files to the cluster, submit the job to the queuing system and gather results once done. | So let's do the job. Send those files to the cluster, submit the job to the queuing system and gather results once done. | ||
- | {{ : | + | {{ : |
In this image, you see exactly the steps described above, performed from a Linux terminal. | In this image, you see exactly the steps described above, performed from a Linux terminal. | ||
Line 146: | Line 146: | ||
==== Going further ==== | ==== Going further ==== | ||
- | Now that you have an idea of what HPC clusters are, if you are willing to actually use them, your next steps are : | + | Now that you have an idea of what HPC clusters are, here are your next steps if you want to actually use them : |
* Get familiar with Linux and the command line environment. | * Get familiar with Linux and the command line environment. | ||
* Go through the rest of the HPC unige documentation to get used to the local infrastructure. | * Go through the rest of the HPC unige documentation to get used to the local infrastructure. | ||
- | **Documentation :** some important parts of the unige HPC documentation : | + | More particularly, |
+ | |||
+ | |||
+ | **Documentation :** [[best_practices|best practices guide]]. | ||
+ | |||
+ | You can also find help and advice through our forum, our FAQ and direct contact with the HPC admin and support team. | ||
+ | |||
+ | |||
+ | **Documentation :** [[start# | ||
+ | |||
+ | |||
+ | **Documentation :** some other important parts of the unige HPC documentation : | ||
* [[start|The main page of unige HPC documentation]] | * [[start|The main page of unige HPC documentation]] |
hpc/getting_started.1659619615.txt.gz · Last modified: 2022/08/04 13:26 by Pierre Kuenzli