D: Clusters and High Performance Computing

About

Questions

Video Intro

Reading

Aalto

D01 What is HPC?

Before you can use larger resources, you need to understand how they differ from your own computer.

>What are the scales of computing?

>HPC Intro

Triton cluster intro

D20 Modules and software

Using and installing software on a cluster is different from doing so on your own computer, because hundreds of people share the same system. Modules are the solution.

>How do you use modules? >How do you find software?

>Lmod introduction

>Triton tutorials for an introduction: modules, applications, >Lmod user guide

>Software and applications, >modules
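
To make the idea concrete: loading a module mostly amounts to editing environment variables such as PATH for the current session. The sketch below imitates that in plain Python. The function name `load_module` and the install prefix `/appl/gcc/12.3` are hypothetical, chosen only for illustration, and real Lmod modules can change more than PATH.

```python
import os


def load_module(env, prefix):
    """Return a copy of env with prefix/bin placed first on PATH.

    Hypothetical stand-in for the effect of `module load` on an environment.
    """
    new_env = dict(env)
    old_path = new_env.get("PATH", "")
    bin_dir = os.path.join(prefix, "bin")
    # Prepending means the module's programs are found before system ones.
    new_env["PATH"] = bin_dir + (os.pathsep + old_path if old_path else "")
    return new_env


if __name__ == "__main__":
    # "/appl/gcc/12.3" is an invented install location.
    env = load_module({"PATH": "/usr/bin"}, "/appl/gcc/12.3")
    print(env["PATH"])
```

Because the change lives only in the environment, many versions of the same software can be installed side by side, and each user picks one per session.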

D21 Batch systems

On a cluster, you have to share resources with others. A batch queuing system, such as Slurm, makes this possible.

>What role does the batch system fill? >How does one submit to the batch system?

>Slurm basics >interactive jobs >batch jobs

Triton tutorials: interactive, serial, array
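
As a rough illustration of what gets submitted to the batch system, the sketch below builds the text of a minimal Slurm batch script. The helper `make_batch_script` and its default values are assumptions made for this example. `--time`, `--mem` and `--output` are standard `sbatch` options, and `%j` in the output name expands to the job ID.

```python
def make_batch_script(command, time="00:10:00", mem="500M", output="job-%j.out"):
    """Build the text of a minimal Slurm batch script.

    The default values are arbitrary placeholders, not recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time={time}",      # wall-clock limit for the job
        f"#SBATCH --mem={mem}",        # memory request
        f"#SBATCH --output={output}",  # file that collects the job's output
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Save the result as, say, hello.sh and submit it with: sbatch hello.sh
    print(make_batch_script('echo "Hello from $(hostname)"'), end="")
```

The resource requests are ordinary shell comments, so the same file still runs as a plain bash script when tested outside the queue.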

D22 HPC Storage

Storage turns out to be just as important as computing power. There are different places available, each with different advantages.

>Why is storage so important? >How can you monitor input/output (I/O) performance? >How do you best handle your data?

>HPC I/O principles

>Storage basics.

Triton tutorials: storage basics. More advanced: Lustre, local storage, small files
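
One way to get a feel for why small files matter is to time the same amount of data written as many small files and as one large file. The sketch below does that in a temporary directory. All function names are invented for this example, and the timings depend entirely on the filesystem it runs on, so it is a measuring aid rather than a benchmark.

```python
import os
import tempfile
import time


def write_small_files(directory, count, size):
    """Write `count` files of `size` bytes each; return total bytes written."""
    payload = b"x" * size
    for i in range(count):
        with open(os.path.join(directory, f"part-{i:05d}.bin"), "wb") as f:
            f.write(payload)
    return count * size


def write_one_file(directory, count, size):
    """Write the same amount of data into a single file."""
    payload = b"x" * size
    with open(os.path.join(directory, "all.bin"), "wb") as f:
        for _ in range(count):
            f.write(payload)
    return count * size


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        n, size = 2000, 1024
        _, t_small = timed(write_small_files, tmp, n, size)
        _, t_big = timed(write_one_file, tmp, n, size)
        print(f"{n} small files: {t_small:.3f} s, one file: {t_big:.3f} s")
```

To compare storage locations, point `tempfile.TemporaryDirectory(dir=...)` at a directory on each of them in turn.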

D23 Parallel computing

The point of a cluster is to run things in parallel. Shared memory (OpenMP) and message passing (MPI) are the most common models. Here you learn how to run such programs, not how to write them.

>What are the main models of parallel code? >How are they run on clusters? >How do you figure out what your code uses?

>Parallel jobs.

Triton tutorials: parallel.
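
A recurring detail when running shared-memory code is that the program should use as many workers as the job was actually given, not as many cores as the node has. The sketch below reads `SLURM_CPUS_PER_TASK`, which Slurm sets when `--cpus-per-task` is requested, and splits a list of work items accordingly. The helper names `allocated_cpus` and `split_work` are assumptions made for this example.

```python
import os


def allocated_cpus(env=None):
    """Return the CPUs per task granted by Slurm, or 1 outside a job."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK", "")
    # Fall back to a single worker if the variable is missing or malformed.
    return int(value) if value.isdigit() and int(value) > 0 else 1


def split_work(items, workers):
    """Deal items round-robin into `workers` lists, one per worker."""
    return [items[i::workers] for i in range(workers)]


if __name__ == "__main__":
    workers = allocated_cpus()
    chunks = split_work(list(range(10)), workers)
    print(f"{workers} worker(s), chunk sizes: {[len(c) for c in chunks]}")
```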

D24 Advanced shell scripting and automation

Hands-on shell scripting, putting everything together to automate large computations on the cluster.

Various courses cover this; finishing the Linux shell tutorial is a good start. The Advanced Bash-Scripting Guide is a classic.
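
A common automation pattern is to let each task of an array job pick its own parameters from its index. The sketch below maps `SLURM_ARRAY_TASK_ID`, which Slurm sets for array jobs, to one combination from a small grid. The parameter names (`temperatures`, `seeds`) and the helper `parameters_for_task` are hypothetical.

```python
import itertools
import os


def parameters_for_task(task_id, temperatures, seeds):
    """Map an array task index to one (temperature, seed) combination."""
    grid = list(itertools.product(temperatures, seeds))
    return grid[task_id]


if __name__ == "__main__":
    # Outside an array job the variable is unset, so fall back to task 0.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    temperature, seed = parameters_for_task(task_id, [0.5, 1.0], [1, 2, 3])
    print(f"task {task_id}: temperature={temperature}, seed={seed}")
```

With this grid of six combinations, the matching batch script would request `--array=0-5`, so that every index corresponds to exactly one run.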