D: Clusters and High Performance Computing



Video Intro



D01 What is HPC?

Before you can use larger resources, you need to understand how they differ from your own computer.

>What are the scales of computing?

>HPC Intro

Triton cluster intro

D20 Modules and software

Using and installing software on a cluster is different from doing so on your own computer, because hundreds of people share the same system. Modules are the solution.

>How do you use modules? >How do you find software?

>Lmod introduction

>Triton tutorials for intro: modules, applications, >Lmod user guide

> Software and applications, > modules

D21 Batch systems

On a cluster, you have to share resources with others. Slurm is one batch queuing system that makes this possible.

>What role does the batch system fill? >How does one submit to the batch system?

>Slurm basics >interactive jobs >batch jobs

Triton tutorials: >interactive, >serial, >array
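
As a rough illustration of how array jobs are commonly used, the Python sketch below picks one input per array task. It assumes that Slurm sets the `SLURM_ARRAY_TASK_ID` environment variable for each task of an array job; the function name `select_input` and the file names are hypothetical and only serve the example.

```python
import os


def select_input(inputs, env=None):
    """Return the input assigned to this array task.

    Assumes Slurm exports SLURM_ARRAY_TASK_ID for array jobs; falls back
    to task 0 when the variable is missing (e.g. when testing locally).
    """
    env = os.environ if env is None else env
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    if not 0 <= task_id < len(inputs):
        raise IndexError(f"task id {task_id} has no matching input")
    return inputs[task_id]


if __name__ == "__main__":
    # Hypothetical input files, one per array task (e.g. --array=0-2).
    files = ["sample_a.csv", "sample_b.csv", "sample_c.csv"]
    print("This task processes:", select_input(files))
```

Run outside the batch system, the script simply behaves like task 0, which makes it easy to test before submitting.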

D22 HPC Storage

Storage turns out to be just as important as computing power. A cluster offers several storage locations, each with its own advantages.

>Why is storage so important? >How can you monitor input/output (I/O) performance? >How do you best handle your data?

>HPC I/O principles

>Storage basics.

Triton tutorials: storage basics. More advanced: Lustre, local storage, small files
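
To get a feel for why many small files are treated as a separate topic, the sketch below times writing the same amount of data as many small files and as one large file. It is a minimal experiment, not a benchmark: the function names are hypothetical, and the result depends heavily on the filesystem. On a local disk the difference may be small, while on a shared network filesystem the per-file overhead usually matters more.

```python
import os
import tempfile
import time


def write_many_small(directory, count, size):
    """Write `count` files of `size` bytes each; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"part_{i}.bin"), "wb") as f:
            f.write(payload)
    return time.perf_counter() - start


def write_one_large(directory, count, size):
    """Write the same total amount of data to a single file; return elapsed seconds."""
    payload = b"x" * size
    start = time.perf_counter()
    with open(os.path.join(directory, "all.bin"), "wb") as f:
        for _ in range(count):
            f.write(payload)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Uses a temporary directory; point this at the storage you want to compare.
    with tempfile.TemporaryDirectory() as tmp:
        small = write_many_small(tmp, count=1000, size=1024)
        large = write_one_large(tmp, count=1000, size=1024)
        print(f"1000 small files: {small:.3f} s, one large file: {large:.3f} s")
```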

D23 Parallel computing

The point of a cluster is to run things in parallel. Shared memory (OpenMP) and message passing (MPI) are the most common models. The goal here is to learn how to run such programs, not how to write them.

>What are the main models of parallel code? >How are they run on clusters? >How do you figure out what your code uses?

>Parallel jobs.

Triton tutorials: parallel.
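
One practical part of running parallel code is making the program use exactly the CPUs the batch system has allocated. The sketch below shows this idea with Python's `multiprocessing` on a single node, as a stand-in rather than an OpenMP or MPI example. It assumes Slurm sets `SLURM_CPUS_PER_TASK` when the job requests `--cpus-per-task`; the helper name `allocated_cpus` is hypothetical.

```python
import os
from multiprocessing import Pool


def allocated_cpus(env=None):
    """Return the CPU count to use: Slurm's allocation if set, else all local CPUs."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")
    if value:
        return int(value)
    # Not inside a Slurm job (or --cpus-per-task was not given).
    return os.cpu_count() or 1


def square(x):
    return x * x


if __name__ == "__main__":
    # Start one worker process per allocated CPU.
    with Pool(processes=allocated_cpus()) as pool:
        print(pool.map(square, range(8)))
```

Sizing the worker pool from the allocation, instead of hard-coding a number, avoids both idle reserved CPUs and oversubscribing a shared node.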

D24 Advanced shell scripting and automation

Hands-on shell scripting, putting everything together to automate large computations on the cluster.

Various courses cover this; finishing the Linux shell tutorial is a good start. The Advanced Bash-Scripting Guide is a classic.