CodeRefinery

CodeRefinery is a course on the tools needed for efficient research software development. In-person and online courses are offered occasionally, but all written material and videos are available online. This page collects this material so that you can study on your own.

This page indexes all the material in one place, in the order it is presented, and is updated with the current “best” version as we produce new videos and written material.

How to use this material

You may go through this at your own pace: the written and video material cover roughly the same content and complement each other. Use one or the other, or both, in whatever order suits your style.

Written lesson material can be used without the videos.
Videos are self-sufficient for an overview, but to do the examples you will also want to open the written material. They are in portrait mode so that the video can take up half of your screen, leaving the other half for you.
Q&A are the live questions and notes from workshop attendees, answered during the workshop. They are optional (and could be used for advanced study).

In the Hands-on Scientific Computing scheme, most of this material is E-level, with git-intro being C-level. This page is outside the main Hands-on SciComp flow, and no credits are offered directly for it.

Git introduction

The git version control system, from the very basics. How to use it well for your own projects. Topics include: why version control, git, terminology, branches, merging, conflict resolution, inspecting history, undoing things, staging area, practical advice.

Git collaborative

How to use Git with multiple people. Topics include: collaboration workflows (centralized and distributed), remotes, pushing/pulling, pull requests (merge requests), GitHub, more on branching and merging, conventions when contributing to other projects.

Reproducible research and FAIR data

It is easy to do things once, but it’s important to be able to do them many times, or for others to be able to do them. Topics include: motivation, organization of files in projects, environments (virtualenv, conda) and recording dependencies, automating computational steps, sharing code and data.

Jupyter

Jupyter is a system for interactive computing. Topics include: why notebooks, best practices, tips and tricks, the Jupyter ecosystem, basics of Jupyter, notebooks and version control, sharing notebooks.

Documentation

Documentation is often the difference between code that is reusable (or usable by yourself in six months) and code that is not. We go over various ways to make documentation much more enjoyable. Topics include: types of documentation, popular tools, in-code documentation, README files, the Sphinx documentation generator, hosting docs on ReadTheDocs or GitHub Pages.

Software testing

Automatic testing is one of the cornerstones of modern software development. Without it, you often end up spending more and more time fixing old bugs rather than doing new things. Here, we cover the concepts and simple strategies for getting started. Topics include: motivation, relevance to scientific accuracy, pytest, local testing, automated testing (GitHub Actions), test design.
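As a minimal sketch of what a pytest-style test can look like (the function celsius_to_kelvin is a hypothetical example chosen for illustration, not taken from the lesson): pytest collects functions whose names start with test_ and reports any failing assert.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15


def test_celsius_to_kelvin():
    # A plain assert is enough; pytest shows the compared values on failure.
    assert celsius_to_kelvin(0.0) == 273.15
    # Compare with a tolerance where floating-point rounding could matter.
    assert abs(celsius_to_kelvin(-273.15)) < 1e-12
```

Saved in a file whose name starts with test_, this could be run locally with the pytest command; the same command can later be run by an automated service.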

Modular code development

When you can mix-and-match and reuse code, your productivity goes way up, and that is enabled by modularity. Here, we give a basic intro to the concept and how to achieve it. Topics include: what is modularity, why, functions, modules, state and pure functions, unit tests, command line interfaces.
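To illustrate the difference between pure functions and functions that change state, here is a small sketch (the names scale and scale_in_place are hypothetical, made up for this example). The pure version depends only on its arguments and returns a new value, which tends to make it easier to test and reuse.

```python
def scale(values, factor):
    """Pure: return a new list and leave the input untouched."""
    return [v * factor for v in values]


def scale_in_place(values, factor):
    """Stateful: modify the caller's list and return nothing."""
    for i in range(len(values)):
        values[i] *= factor


data = [1, 2, 3]
scaled = scale(data, 2)   # data is unchanged, scaled holds the result
scale_in_place(data, 2)   # data itself is now modified
```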

Concluding remarks and where to go from here

Other

Expanded video Q&A from the May 2021 workshop

Source material

Source material from past workshops (in general newer is probably better):

All CodeRefinery lessons (includes a few minor ones not in the main workshop flow).
May 2021
May 2020
- Workshop page
- YouTube playlist