D: Clusters and High Performance Computing
About |
Questions |
Video Intro |
Reading |
Aalto |
|
---|---|---|---|---|---|
D01 What is HPC? |
Before you can use larger resources, you need to understand how they differ from your own computer. |
>What are the scales of computing? |
|||
D20 Modules and software |
Using and installing software on a cluster is different from your own computer, because hundreds of people are sharing it. Modules are the solution. |
>How do you use |
>Triton tutorials for intro: modules, applications, >Lmod user guide |
||
D21 Batch systems |
On a cluster, you have to share resources with others. Slurm is one batch queuing system that makes it possible. |
>What role does the batch system fill? >How does one submit to the batch system? |
Triton tutorials: >interactive, >serial, >array |
Triton tutorials: interactive, serial, array |
|
D22 HPC Storage |
Storage turns out to be just as important as computing power. Several storage locations are available, each with different advantages. |
>Why is storage so important? >How can you monitor input/output (I/O) performance? >How do you best handle your data? |
Triton tutorials: storage basics. More advanced: Lustre, local storage, small files |
||
D23 Parallel computing |
The point of a cluster is to run things in parallel. Shared memory (OpenMP) and message passing (MPI) are the most common models. Learn how to run them, not write them. |
>What are the main models of parallel code? >How are they run on clusters? >How do you figure out what your code uses? |
Triton tutorials: parallel. |
||
D24 Advanced shell scripting and automation |
Hands-on shell scripting, putting everything together to automate large computations on the cluster. |
Various courses exist; finishing the Linux shell tutorial is a good start. The Advanced Bash-Scripting Guide is a classic.
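
To make the D21 and D24 rows more concrete, here is a minimal sketch of a Slurm array job written as a shell script. It is only an illustration, not taken from the Triton tutorials: the resource values, the output file name, the parameter values and the helper `param_for_task` are all assumptions chosen for the example.

```shell
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem=500M
#SBATCH --array=0-3
#SBATCH --output=array_%a.out

# Map an array task index to an input parameter.
# The values are hypothetical placeholders for your own inputs.
param_for_task() {
    case "$1" in
        0) echo 0.1 ;;
        1) echo 0.5 ;;
        2) echo 1.0 ;;
        3) echo 2.0 ;;
        *) echo "unknown task index: $1" >&2; return 1 ;;
    esac
}

# Slurm sets SLURM_ARRAY_TASK_ID in each array task. Default to 0 so the
# script can also be tried directly with bash, outside the batch system.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
param="$(param_for_task "$task_id")"

# Replace this line with the real computation for one parameter value.
echo "task ${task_id}: parameter ${param}"
```

Saved under a name of your choice (say `array_sketch.sh`, a hypothetical file name), this would be submitted with `sbatch array_sketch.sh`. Slurm then starts four independent tasks, each seeing a different `SLURM_ARRAY_TASK_ID`. Because the `#SBATCH` lines are ordinary comments to the shell, the same file can be checked first with `bash array_sketch.sh` before it is sent to the queue.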