
Julia Bookmarks

  • My Julia repos
  • Package development
  • Publishing code examples
  • Optimization
  • Curve fitting and parameter estimation
  • Modeling and simulation
  • Universal differential equations (UDEs): neural networks x differential equations
  • Partial differential equations (PDEs)
  • Model analysis
  • Probability and statistics
  • Handy tools
  • Arrays
  • Visualization
  • Concurrency (Julia docs: Parallel Computing)

Julia design patterns

Some notes on Tom Kwong's book Hands-On Design Patterns and Best Practices with Julia.

Separate project environments

It is recommended to maintain a minimal root environment (with a few necessary packages like Revise.jl) and customize the local Julia project environment with the following steps:

  1. Go to your project folder and run julia --project=. (this runs pkg> activate . at startup).
  2. Add your packages with pkg> add Pkg1 Pkg2...

Packages and modules

Julia also encourages making your own packages, even temporary ones, to take advantage of unit testing, precompilation, and namespace separation.

  • Creating a Julia package is lightweight: pkg> generate PkgName creates only two files (one .jl file and one TOML file). For a more complete setup, consider PkgTemplates.jl or PkgSkeleton.jl, which add functionality like CI testing and code coverage.
  • Revise.jl watches file system changes and automatically updates the code in loaded packages / modules.

Functional interfaces and multiple dispatch

In the Julia world, generic functions are called functions, while their implementations with type annotations / parameterizations are called methods. My impression so far is that Julia is a functional-interface-first programming language: the multiple dispatch paradigm makes it much more flexible and composable between packages (e.g. DiffEq + Flux + a GPU kernel), as well as mathematically natural. However, it requires a vastly different mindset for users coming from object-oriented worlds like Python / Java.

  • Abstract types cannot have fields. They are only meant to be inherited along with their functional interfaces. Concrete types (structs with fields), on the other hand, cannot be inherited.
  • Use parametric types (structs) and methods rather than directly type-annotating the fields / arguments.
  • Traits are functions that return true / false or an error based on the input type. See Holy traits below for more details.
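
A minimal sketch of these ideas (Shape, Circle, and area are illustrative names, not from the book):

abstract type Shape end  # no fields; only an interface for subtypes to implement

# Parametric concrete type: the field type is a parameter instead of an abstract annotation
struct Circle{T<:Real} <: Shape
    radius::T
end

# A generic function `area` with a method dispatched on Circle
area(c::Circle) = π * c.radius^2

area(Circle(1.0))  # 3.141592653589793
area(Circle(1))    # also works for Int, thanks to the type parameter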

Delegation pattern

This is a form of polymorphism via composition. Use a new wrapper type around types from established packages to reuse their code, at the cost of an additional layer of indirection.
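
For example (a sketch; TrackedVector and the forwarded methods are illustrative):

# A new wrapper type around the established Vector type
struct TrackedVector{T}
    data::Vector{T}
end

# Delegate (forward) the methods we need to the wrapped field
Base.length(v::TrackedVector) = length(v.data)
Base.getindex(v::TrackedVector, i) = getindex(v.data, i)

v = TrackedVector([1, 2, 3])
length(v)  # 3, answered by the wrapped Vector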

Holy traits

Holy traits are named after Tim Holy.

  • Traits are empty structs.
  • Data types are assigned categorically to traits' interfaces, implementing different behavior for different kinds of data types.
  • The trait hierarchy can be separate from the type hierarchy it models.
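
A minimal sketch (the trait and data types are illustrative):

# Traits are empty structs
abstract type Flyability end
struct CanFly <: Flyability end
struct CannotFly <: Flyability end

struct Bird end
struct Dog end

# Assign data types to traits categorically
flyability(::Type{Bird}) = CanFly()
flyability(::Type{Dog}) = CannotFly()

# Dispatch on the trait instead of the type hierarchy
move(x::T) where {T} = move(flyability(T), x)
move(::CanFly, x) = "flies"
move(::CannotFly, x) = "walks"

move(Bird())  # "flies"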

SimpleTraits.jl automates some of the boilerplate for you.

Global constant

Global (module-level) variables are discouraged for performance reasons, but global constants are welcome in Julia since the compiler can optimize them.
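
For example:

const GRAVITY = 9.81  # `const` fixes the type, so the compiler can specialize on it

gravity_force(m) = m * GRAVITY  # type-stable, unlike reading a non-const global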

Struct of arrays (SoA)

Structs of arrays (SoA) are superior to arrays of structs (AoS) for SIMD and GPU performance.

StructArrays.jl handles the mapping of AoS (on the interface) to SoA (in memory).
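
For example (a sketch; Point is an illustrative type):

using StructArrays

struct Point
    x::Float64
    y::Float64
end

pts = StructArray(Point(rand(), rand()) for _ in 1:100)
pts[1]  # behaves like an array of Points on the interface (AoS)
pts.x   # while each field is stored as a contiguous Vector{Float64} (SoA)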

Memoization

Memoization saves duplicated work in repetitive or recursive calls.

You can implement memoization yourself with a function wrapper, a local cache, and a closure, but Memoize.jl will do the hard work for you.
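
A hand-rolled sketch with a closure-captured cache (make_fib is an illustrative name):

function make_fib()
    cache = Dict{Int,Int}()  # local cache captured by the closure
    function fib(n::Int)
        get!(cache, n) do
            n <= 2 ? 1 : fib(n - 1) + fib(n - 2)
        end
    end
    return fib  # the wrapped, memoized function
end

fib = make_fib()
fib(40)  # fast: each subproblem is computed only once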

Barrier functions for type stability

Julia runs slower on type-unstable code. Put @code_warntype in front of an expression to spot type instability (or use @inferred in unit tests to err on type instabilities). Use generic functions (a.k.a. barrier functions) to ensure type stability. For example:

  • Use zero(x) instead of a literal 0.
  • Separate kernel functions: small functions are more easily optimized.
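
A sketch of the function-barrier idea (sum_kernel and process are illustrative names):

# The kernel is type-stable: T is pinned down by dispatch
function sum_kernel(x::AbstractVector{T}) where {T}
    s = zero(T)  # zero(T) instead of a literal 0
    for v in x
        s += v
    end
    return s
end

function process(data)
    x = data["values"]    # type-unstable: a Dict{String,Any} lookup
    return sum_kernel(x)  # the barrier: dispatch fixes the type once
end

process(Dict{String,Any}("values" => rand(10)))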

Keyword definition

Keyword definitions reduce boilerplate code for struct initialization.
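
For example, with the built-in Base.@kwdef (SimConfig and its fields are illustrative):

Base.@kwdef struct SimConfig
    steps::Int = 100
    dt::Float64 = 0.01
end

SimConfig(steps = 500)  # unspecified fields fall back to their defaults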

Accessor: getters and setters

Customize Base.getproperty(x, :a) for getters (x.a) and Base.setproperty!(x, :a, val) for setters (x.a = val).
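
A sketch of a custom getter (Person and the virtual greeting property are illustrative; a setter would override Base.setproperty! the same way):

struct Person
    name::String
end

function Base.getproperty(p::Person, s::Symbol)
    # use getfield for real fields to avoid infinite recursion
    s === :greeting ? "Hello, $(getfield(p, :name))!" : getfield(p, s)
end

Person("Ada").greeting  # "Hello, Ada!"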

Let blocks

A let block defines its own local namespace. The variables defined inside a let block cannot be accessed outside.
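
For example:

y = let x = 2
    x^2  # x is local to the let block
end
# x is undefined out here; only y (== 4) escapes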

Functional pipes

Useful in data pipelines. Check out Chain.jl for enhanced pipelines.
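
For example (the second pipeline assumes Chain.jl is installed):

# Plain |> threads a value through unary functions
[1, 2, 3, 4] |> (x -> filter(iseven, x)) |> sum  # 6

# Chain.jl lets the same pipeline read top-down, with _ as the placeholder
using Chain
@chain [1, 2, 3, 4] begin
    filter(iseven, _)
    sum
end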

Anti-patterns in Julia

  1. Type instability, especially in tight loops, yields poor performance.
  2. Global variables are type-unstable, but global constants are not.
  3. Type piracy, i.e. redefining an existing function or twisting its behavior. Do not define methods on functions you do not own for types you do not own.
  4. Narrow argument types mean overspecialization. Write generic code first.
  5. Non-concrete field types: struct A x::Real end provides no benefit over struct A x end. Use a parametric type instead, like the example below.
struct A{T<:Real}
  x::T
end


Julia package loading

When your Julia codebase grows larger, you might want to organize it into modules and packages.

Include other files as submodules

You can include .jl files as submodules like this:

main.jl
include("foo.jl")
using .Foo

include("bar.jl")
using .Bar

foo.jl
module Foo
# content of Foo module
end

bar.jl
module Bar
# content of Bar module
end
  • Best when the submodules are used exclusively for this project and will not be shared with others.
  • Usually you want to include all dependent submodules in the top-most file, like a table of contents.
  • The include and using lines need to be re-executed when the code in a submodule changes (unless Revise.includet("foo.jl") is used).
  • Use a relative module path when Bar depends on Foo, as in the sketch below.
  • There may be recursive include() calls and module replacement warnings. FromFile.jl can deal with these file inclusion duplications.
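
A sketch of the relative-path case:

bar.jl
module Bar
using ..Foo  # reach the already-included Foo through the parent module,
# instead of calling include("foo.jl") again and creating a duplicate copy
end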

Automatic package loading in a project

In this example, the project folder is JuliaHello. Note that main.jl has access to the JuliaHello module automatically.


src/JuliaHello.jl
module JuliaHello
greet() = print("Hello World!")
end # module

main.jl
using JuliaHello
JuliaHello.greet()

"Developing" a temporary package

Use the Julia Pkg command dev --local pkg...

Julia docs | Pkg | dev

Assuming we have the file structure for the packages

. present working directory (pwd)
| - main.jl
| - Manifest.toml
| - Project.toml
|
+---Mod1.jl
|   | - Manifest.toml (optional)
|   | - Project.toml
|   |
|   \---src
|         - Mod1.jl
|
\---Mod2.jl
    | - Manifest.toml (optional)
    | - Project.toml
    |
    \---src
          - Mod2.jl

Add local packages and track the file changes in the Julia REPL

julia> ]
pkg> activate .
pkg> dev --local Mod1 Mod2

Or run the commands in a Julia script:

import Pkg

# To generate Project.toml if not present
Pkg.activate(".")

Pkg.develop(PackageSpec(path="Mod1.jl"))
Pkg.develop(PackageSpec(path="Mod2.jl"))

  • Best when Mod1 and Mod2 are modified frequently and shared across projects.
  • The loaded code is determined by local files instead of registered package versions.
  • Updates are loaded when using is invoked, along with precompilation. Revise.jl tracks and updates modified files, so you don't have to restart the Julia process when module code changes.

Make a hosted package

Make a Git repo for your custom package and publish it to a Git service provider, e.g. GitHub / GitLab. Then you can ]add https://github.com/username/Mod1.jl.git.

PkgTemplates.jl or PkgSkeleton.jl is recommended to generate a package with unit tests and CI/CD settings.

Nonetheless, it's just one more step to proper registration in the General Julia registry so the package can be used by more people.

Unit testing

You can keep test-specific dependencies in test/Project.toml without needing the extra and targets sections in the main project's Project.toml.
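
A minimal sketch of test/Project.toml (the UUID is the Test standard library's):

test/Project.toml
[deps]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"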

Though the built-in unit-test framework is good, Jive.jl provides more flexibility. See TestJive.jl for code examples.

  • Discovers unit-testing .jl files automatically.
  • Skips or selects which test(s) to run.
  • Uses multiprocessing for faster runs.

Documentation

Use Documenter.jl to generate the documentation for Julia packages.

You need an SSH deploy key to deploy docs to GitHub pages.

using DocumenterTools
DocumenterTools.genkeys(user="you", repo="YourPackage.jl")
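
A minimal docs/make.jl sketch (YourPackage and the repo URL are placeholders):

docs/make.jl
using Documenter, YourPackage

makedocs(sitename = "YourPackage.jl")
deploydocs(repo = "github.com/you/YourPackage.jl.git")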

Continuous integration / delivery (CI/CD)

PkgTemplates.jl should set up the appropriate code structure for you. I would recommend GitHub for hosting Julia packages because:

  • GitHub Actions runs are unlimited for public repositories, with multiple operating systems running concurrently (matrix builds).
  • Julia GitHub actions are convenient to use.
  • Automation bots integrate well with GitHub, e.g. TagBot, Registrator, and CompatHelper.


Publish Julian Jupyter notebooks

This post demonstrates my template repository, covering:

  • How to use Docker to build a runtime environment for Julia Jupyter notebooks.
  • How to use GitHub actions to execute notebooks in the docker container in parallel.
  • How to use jupyter-book to publish notebooks automatically when changes are pushed to GitHub.

Docker image as the runtime environment

Create a Dockerfile for the runtime environment. The whole content is:

.github/Dockerfile
FROM python:3.11.5-slim as base

# Julia config
ENV JULIA_CI true
ENV JULIA_NUM_THREADS "auto"
# Let PythonCall use built-in python
ENV JULIA_CONDAPKG_BACKEND "Null"
# Avoid recompilation
ENV JULIA_CPU_TARGET "generic"
ENV JULIA_PATH /usr/local/julia/
ENV JULIA_DEPOT_PATH /srv/juliapkg/
ENV PATH ${JULIA_PATH}/bin:${PATH}
COPY --from=julia:1.9.3 ${JULIA_PATH} ${JULIA_PATH}

FROM base

WORKDIR /app

# Python dependencies. e.g. matplotlib
COPY requirements.txt ./
RUN pip install --no-cache-dir nbconvert -r requirements.txt

# Julia environment
COPY Project.toml Manifest.toml ./
COPY src/ src
RUN julia --project="" --color=yes -e 'import Pkg; Pkg.add("IJulia"); import IJulia; IJulia.installkernel("Julia", "--project=@.")' && \
    julia --project=@. --color=yes -e 'import Pkg; Pkg.instantiate(); Pkg.resolve(); Pkg.precompile()'

Choosing the base image

Usually, Julia projects use julia as the base image; however, we need Python's nbconvert to render Jupyter notebooks. Therefore, we use the python image, which includes pip, to install nbconvert and other required Python packages.

.github/Dockerfile
FROM python:3.11.5-slim as base

We then copy the Julia executable from the julia base image.

.github/Dockerfile
ENV JULIA_CI true
ENV JULIA_NUM_THREADS "auto"
ENV JULIA_CONDAPKG_BACKEND "Null"
ENV JULIA_CPU_TARGET "generic"
ENV JULIA_PATH /usr/local/julia/
ENV JULIA_DEPOT_PATH /srv/juliapkg/
ENV PATH ${JULIA_PATH}/bin:${PATH}
COPY --from=julia:1.9.3 ${JULIA_PATH} ${JULIA_PATH}

apt packages (optional)

You can install required system packages using apt-get. For example,

  • gnuplot is required by Gnuplot.jl or Gaston.jl.
  • parallel can run multiple notebooks concurrently on multi-core machines; e.g., GitHub-hosted action runners have 2 cores by default, and self-hosted runners might have more CPU cores.
  • clang or gcc is required by PackageCompiler.jl to compile a sysimage.

For example,

.github/Dockerfile
RUN apt-get update && apt-get install -y parallel --no-install-recommends && rm -rf /var/lib/apt/lists/*

Python dependencies

You can install Python dependencies from requirements.txt. For instance, matplotlib is required by PyPlot.jl. You can leave requirements.txt blank if you have no Python dependencies.

.github/Dockerfile
COPY requirements.txt .
RUN pip install --no-cache-dir nbconvert -r requirements.txt

Julia packages

IJulia.jl is installed globally for the Jupyter kernel. Julia dependencies are defined in Project.toml and Manifest.toml, and the Pkg.instantiate() command installs them.

.github/Dockerfile
# Julia environment
COPY Project.toml Manifest.toml ./
COPY src/ src
RUN julia --project="" --color=yes -e 'import Pkg; Pkg.add("IJulia")' && \
    julia --project=@. --color=yes -e 'import Pkg; Pkg.instantiate(); Pkg.resolve(); Pkg.precompile()'

Manifest and gitignore

Be sure to remove Manifest.toml from .gitignore so that Manifest.toml is tracked by git, giving a reproducible runtime.

(Optional) Build sysimage to decrease package load time

You can also build a sysimage to reduce package load time. See Satoshi Terasaki's sysimage creator for details.

Install the GCC compiler

.github/Dockerfile
RUN apt-get update && apt-get install -y gcc && rm -rf /var/lib/apt/lists/*

Julia commands are put into a script file build-kernel.jl.

.github/Dockerfile
COPY Project.toml Manifest.toml build-kernel.jl ./
COPY src/ src
RUN julia --color=yes build-kernel.jl

The build-kernel.jl script installs PackageCompiler.jl and the IJulia kernel; the kernel loads the resulting system image.

build-kernel.jl
# Adapted from https://github.com/terasakisatoshi/sysimage_creator/
import Pkg

Pkg.add(["PackageCompiler", "IJulia"])

using PackageCompiler

sysimage_path = joinpath(@__DIR__, "sysimage.so")

@info "SysImage path: " sysimage_path

# Packages that you want them to load faster
pkglist = ["Plots"]

PackageCompiler.create_sysimage(
    pkglist;
    project=".",
    sysimage_path=sysimage_path,
    cpu_target=PackageCompiler.default_app_cpu_target()
)

using IJulia
IJulia.installkernel("Julia-sys", "--project=@. --sysimage=$(sysimage_path)")

Pkg.rm("PackageCompiler")
Pkg.gc()

Why Docker?

  • Julia and Python dependencies in one image.
  • Skipping precompilation for the same package dependencies.
  • Friendly to continuous integration (CI).

Docker images capture and "freeze" installed dependencies; the image can be shared across CI jobs without precompiling the packages again, which takes quite some time in throwaway environments like CI virtual machines. Even though I tried to cache and reuse the Julia environment folder ~/.julia, for some reason (probably CPU target issues) some packages still needed precompilation (for the very same set of dependencies). Thus, I use Docker to build a self-sufficient runtime environment.

GitHub actions workflow

.github/workflows/ci.yml
name: CI with dynamic parallel matrix

on:
  workflow_dispatch:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  packages: write

env:
  TIMEOUT: '-1'    # nbconvert timeout
  EXTRA_ARGS: ''   # Extra arguments for nbconvert

jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
      hash: ${{ steps.hash.outputs.id }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to Docker Hub
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ github.token }}
      - name: Get docker image hash
        id: hash
        run: echo "id=${{ hashFiles('requirements.txt', 'Project.toml', 'Manifest.toml', 'src/**', '.github/Dockerfile') }}" >> "$GITHUB_OUTPUT"
      - name: Build and cache Docker container
        uses: docker/build-push-action@v4
        env:
          IMG: ghcr.io/${{ github.repository }}:${{ steps.hash.outputs.id }}
        with:
          context: .
          file: '.github/Dockerfile'
          tags: ${{ env.IMG }}
          push: true
          cache-from: type=registry,ref=${{ env.IMG }}-cache
          cache-to: type=registry,ref=${{ env.IMG }}-cache,mode=max
      - name: List notebooks as a JSON array
        id: set-matrix
        working-directory: docs
        run: echo "matrix=$(python -c 'import glob, json; print(json.dumps(glob.glob("**/*.ipynb", recursive=True)))')" >> "$GITHUB_OUTPUT"

  execute:
    needs: setup
    strategy:
      max-parallel: 20
      fail-fast: false
      matrix:
        # Notebooks need to be executed
        notebook: ${{ fromJSON(needs.setup.outputs.matrix) }}
    runs-on: ubuntu-latest
    env:
      IMAGE: ghcr.io/${{ github.repository }}:${{ needs.setup.outputs.hash }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Get notebook path
        id: file
        run: echo "name=docs/${{ matrix.notebook }}" >> "$GITHUB_OUTPUT"
      - name: Get notebook hash
        id: hash
        run: echo "id=${{ needs.setup.outputs.hash }}-${{ hashFiles(steps.file.outputs.name) }}" >> "$GITHUB_OUTPUT"
      - name: Restore notebook if present
        uses: actions/cache/restore@v3
        id: cache
        with:
          path: ${{ steps.file.outputs.name }}
          key: ${{ runner.os }}-${{ steps.hash.outputs.id }}
      - name: Get Julia version
        if: ${{ steps.cache.outputs.cache-hit != 'true' }}
        id: julia
        run: echo "ver=$(docker run ${{ env.IMAGE }} julia -e 'print(VERSION.minor)')" >> "$GITHUB_OUTPUT"
      - name: Julia precompile
        if: ${{ steps.cache.outputs.cache-hit != 'true' }}
        run: docker run -w /tmp -v ${{ github.workspace }}:/tmp ${{ env.IMAGE }} julia --project=@. -e 'import Pkg; Pkg.instantiate(); Pkg.precompile()'
      - name: Install IJulia kernel
        if: ${{ steps.cache.outputs.cache-hit != 'true' }}
        run: docker run -w /tmp -v ${{ github.workspace }}:/tmp ${{ env.IMAGE }} julia --project="" --color=yes -e 'import IJulia; IJulia.installkernel("Julia", "--project=@.")'
      - name: Execute Notebook
        if: ${{ steps.cache.outputs.cache-hit != 'true' }}
        run: >
          docker run -w /tmp -v ${{ github.workspace }}:/tmp ${{ env.IMAGE }}
          jupyter nbconvert --to notebook --execute --inplace ${{ env.EXTRA_ARGS }}
          --ExecutePreprocessor.timeout=${{ env.TIMEOUT }}
          --ExecutePreprocessor.kernel_name=julia-1.${{ steps.julia.outputs.ver }}
          docs/${{ matrix.notebook }}
      - name: Cache notebook
        uses: actions/cache/save@v3
        if: ${{ steps.cache.outputs.cache-hit != 'true' }}
        with:
          path: ${{ steps.file.outputs.name }}
          key: ${{ steps.cache.outputs.cache-primary-key }}
      - name: Upload Notebook
        uses: actions/upload-artifact@v3
        with:
          name: notebooks
          path: docs*/${{ matrix.notebook }}  # keep folder structure
          retention-days: 1

  jupyter-book:
    needs: execute
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/sosiristseng/docker-jupyterbook:latest
    # store success output flag for the ci job
    outputs:
      success: ${{ steps.setoutput.outputs.success }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Download notebooks
        uses: actions/download-artifact@v3
        with:
          name: notebooks
          path: out/
      - name: Display structure of downloaded files
        run: ls -R
        working-directory: out
      - name: Copy back built notebooks
        run: cp --verbose -rf out/docs/* docs/
      - name: Build website
        run: jupyter-book build docs/
      - name: Upload pages artifact
        if: ${{ github.ref == 'refs/heads/main' }}
        uses: actions/upload-pages-artifact@v2
        with:
          path: docs/_build/html
      - name: Set output flag
        id: setoutput
        run: echo "success=true" >> $GITHUB_OUTPUT

  # CI conclusion for GitHub status check
  # https://brunoscheufler.com/blog/2022-04-09-the-required-github-status-check-that-wasnt
  CI:
    needs: jupyter-book
    if: always()
    runs-on: ubuntu-latest
    steps:
      # pass step only when output of previous jupyter-book job is set
      # in case at least one of the execution fails, jupyter-book is skipped
      # and the output will not be set, which will then cause the ci job to fail
      - if: ${{ needs.jupyter-book.outputs.success == 'true' }}
        run: echo "Tests passed" && exit 0
      - if: ${{ needs.jupyter-book.outputs.success != 'true' }}
        run: echo "Tests failed" && exit 1

  # Deployment job
  deploy:
    name: Deploy to GitHub pages
    needs: jupyter-book
    if: ${{ github.ref == 'refs/heads/main' }}
    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write # to deploy to Pages
      id-token: write # to verify the deployment originates from an appropriate source
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v2

The workflow includes 4 stages

  • setup: builds and caches the runtime docker container
  • execute: executes notebooks in parallel
  • jupyter-book: renders executed notebooks
  • CI and page deployment: concludes the workflow and pushes rendered webpages to GitHub pages.

The setup stage

The setup stage builds the runtime Docker image.

  • The setup-buildx-action uses buildx to cache docker image layers.
  • The docker image hash is calculated from .github/Dockerfile and the package dependency files (requirements.txt, Project.toml, Manifest.toml, src/**).
  • The build-push-action builds the docker image from .github/Dockerfile and pushes it to the GitHub container registry (GHCR).

We also list all the Jupyter notebooks (*.ipynb) in the docs folder as a JSON array for the next stage.

The execute stage

This stage uses a job matrix to execute notebooks in parallel and reduce overall running time. The concurrency limit is 20 jobs for the GitHub free tier (both personal and organization accounts); that is, you can run up to 20 notebooks simultaneously.

This stage uses the docker image from the previous stage. Finished notebooks are uploaded as artifacts for the next stage. Both the runtime environment docker image and the notebook content are cached, so the workflow skips execution if an exact copy was built before.

The jupyter-book stage

This stage renders the notebooks using jupyter-book, a static site generator (SSG) that builds publication-quality books and websites from Markdown documents (*.md) and Jupyter notebooks (*.ipynb).

Here, we collect executed notebooks from the previous stage using actions/download-artifact, use jupyter-book to render them into a website, and upload them as a website artifact.

Why Jupyter Notebooks?

There are Pluto-notebook-based publishing options like PlutoStaticHTML.jl and PlutoSliderServer.jl, but some might prefer a Jupyter-notebook-based workflow, so I would like to share a way to publish Jupyter notebooks. Since notebook execution is tied to continuous integration (CI), we can make sure the code works with the specified Julia dependencies.

Are there alternatives to jupyter-book?

You can also use Quarto for this stage. Quarto is an open-source scientific and technical publishing system built on pandoc, which also renders Markdown files and Jupyter Notebooks into a beautiful website.

The deploy stage

Finally, we deploy the rendered files to GitHub Pages if the content was pushed from the main branch. Be sure to enable GitHub Pages via repository settings -> Pages -> Build and deployment -> GitHub Actions as the source.

The CI stage for status check

GitHub status checks treat skipped workflows as passed. Thus, even if any of the notebooks fails, the jupyter-book step will be skipped and the overall status check will still be green, which is not ideal for continuous integration. This blog post by Bruno Scheufler provides a workaround: add an additional stage to determine the execution status.

Other workflows

jupyter-book can check web links in Jupyter notebooks.

.github/workflows/linkcheck.yml
name: Check markdown links

on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 1 * *' # Every month
  push:
    branches:
      - main
    paths:
      - 'docs/**'
      - '.github/workflows/linkcheck.yml'

jobs:
  linkcheck:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/sosiristseng/docker-jupyterbook:latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Disable code cell execution
        uses: mikefarah/yq@master
        with:
          cmd: yq -i '.execute.execute_notebooks = "off"' 'docs/_config.yml'
      - name: Check links
        run: jupyter-book build docs/ --builder linkcheck

MyBinder container

Building executable environments on mybinder.org can be time-consuming and often hits a timeout error when using its own service to build the docker image. Thus, we run repo2docker in GitHub actions to build the executable environment for mybinder. The resulting docker container is stored in the GitHub container registry (GHCR), and mybinder.org directly copies this container upon request, greatly reducing environment build time.

.github/workflows/binder.yml
name: Build Binder Container

on:
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  binder:
    permissions:
      packages: write
      contents: write
      pull-requests: write
    env:
      IMAGE_NAME: ghcr.io/${{ github.repository }}:binder
    runs-on: ubuntu-latest
    steps:
    - name: Checkout Code
      uses: actions/checkout@v3
    - name: Remove binder folder if present
      run: rm -rf .binder/ || true
    - name: Setup Python
      uses: actions/setup-python@v4
      id: python
      with:
        python-version: '3.x'
    - name: Install repo2docker
      run: pip install jupyter-repo2docker
    - name: Login to GitHub Container Registry
      uses: docker/login-action@v2
      with:
        registry: ghcr.io
        username: ${{ github.repository_owner }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - name: Pull docker image
      run: docker pull ${{ env.IMAGE_NAME }} || true
    - name: Build binder image with repo2docker
      run: >
        jupyter-repo2docker
        --image-name ${{ env.IMAGE_NAME }}
        --cache-from ${{ env.IMAGE_NAME }}
        --push --no-run --user-id 1000 --user-name jovyan
        .
    - name: Add back binder folder and Dockerfile
      run: |
        mkdir -p .binder
        echo "FROM ${{ env.IMAGE_NAME }}" > .binder/Dockerfile
    - name: Create Pull Request if binder Dockerfile has changed
      id: cpr
      uses: peter-evans/create-pull-request@v4
      with:
        title: Binder Dockerfile
        add-paths: .binder/Dockerfile
        branch: binder-dockerfile

Julia dependencies update

Julia dependencies (Manifest.toml) can be updated regularly and checked to verify that the updated dependencies still work.

.github/workflows/update.yml
name: Auto update Julia dependencies

on:
  workflow_dispatch:
  schedule:
    - cron: '0 0 * * 1' # Every week
  push:
    branches:
      - main
    paths:
      - .github/Dockerfile
      - .github/workflows/update.yml

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  DFILE: '.github/Dockerfile'
  IMAGE_NAME: 'app:test'

jobs:
  update-manifest:
    permissions:
      contents: write
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build and cache Docker container
        uses: docker/build-push-action@v4
        with:
          target: base
          context: .
          file: ${{ env.DFILE }}
          tags: ${{ env.IMAGE_NAME }}
          load: true
      - name: Update Julia dependencies
        run: >
          docker run
          --workdir=/tmp -v ${{ github.workspace }}:/tmp
          -e JULIA_PKG_PRECOMPILE_AUTO=0
          ${{ env.IMAGE_NAME }}
          julia --color=yes --project=@. -e "import Pkg; Pkg.update()"
      # Authenticate with a custom GitHub APP
      # https://github.com/peter-evans/create-pull-request/blob/main/docs/concepts-guidelines.md#authenticating-with-github-app-generated-tokens
      - name: Generate token
        uses: tibdex/github-app-token@v1
        id: generate-token
        with:
          app_id: ${{ secrets.APP_ID }}
          private_key: ${{ secrets.APP_PRIVATE_KEY }}
      - name: Create Pull Request
        id: cpr
        uses: peter-evans/create-pull-request@v4
        with:
          title: Julia Dependency Update
          token: ${{ steps.generate-token.outputs.token }}
          labels: |
            automerge


Python Bookmarks

Awesome Python: a curated list of Python stuff.

Python IDEs

JupyterLab themes

Run Python Notebooks online

  • Google Colab: an online jupyter notebook platform for machine learning.
  • Binder: make online (e.g., GitHub repo) notebooks executable.

Package Manager for Python

  • Anaconda Python: a full set of scientific Python packages with the conda package manager.
      - conda-forge/miniforge : minimal installation with conda and mamba package managers and the conda-forge community packages.
      - micromamba: micromamba is a tiny version of the mamba package manager.
      - prefix-dev/pixi: pixi is a cross-platform, multi-language package manager similar to conda, but is blazing fast and written in Rust.
  • pipenv: a dependency management tool for Python packages that combines pip and virtualenv.
  • Poetry: Python packaging and dependency management.
  • pdm: A modern Python package manager with PEP 582 support.
  • uv: an extremely fast Python package and project manager written in Rust; a drop-in replacement for pip.

Machine learning

Gradient boosters

Wikipedia: Gradient boosting

Hyperparameter optimization

Wikipedia: Hyperparameter optimization

Scientific machine learning

  • lululxvi/deepxde : deep learning and differential equations.
  • google/jax : Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more.

Python tutorials

Visualization Tutorials

Scientific Python Tutorials

Data Processing and Machine learning

Static site generators

Docsify

Docsify renders Markdown files to HTML on the fly. Technically, docsify is a single-page application (SPA) rather than a static site generator (SSG).

Docsify themes

Rust-based

Python-based

  • Jupyter book : Building beautiful, publication-quality books and documents from jupyter notebooks.
  • MkDocs : MkDocs is a fast, simple and downright gorgeous static site generator that's geared towards building project documentation. The most famous theme is the MkDocs Material theme. Template made by me.
  • Nikola : Static Site Generator written in Python.

Julia-based

Misc

  • Bookdown : Write HTML, PDF, ePub, and Kindle books with R Markdown. Written in R.
  • Jekyll : The default SSG for GitHub pages. Written in Ruby.
  • Publii : A content management system (CMS) for creating static websites fast and hassle-free, even for beginners.
  • Quarto: an open-source scientific and technical publishing system built on Pandoc.

Artifacts in GitHub actions

Artifacts are data generated by workflows that can be passed to subsequent jobs.

Workflows

Official workflows

The actions/upload-artifact and actions/download-artifact actions by GitHub store and retrieve artifact(s) within the same workflow.

Artifacts across two different workflows

The dawidd6/action-download-artifact action downloads and extracts uploaded artifact(s) associated with a given (different) workflow.

The tonyhallett/artifacts-url-comments action creates comment(s) in pull request and/or associated issues containing the URL to artifacts from the workflow run being watched.

Merge multiple artifacts

Set merge-multiple: true in actions/download-artifact@v4 to merge artifacts matching a pattern into one directory:

- name: Download notebooks
  uses: actions/download-artifact@v4
  with:
    path: path/of/artifacts
    pattern: notebook-*
    merge-multiple: true
- name: Display structure of downloaded files
  run: ls -R path/of/artifacts

Automatic Dependency Update in GitHub

Updating package dependencies automatically as a part of continuous integration (CI)

Dependabot

Dependabot creates a pull request when there is an update for a dependency. The pull requests are usually tested by continuous integration (CI).

However, dependabot does not support automerging on its own due to security concerns. The good news is that we can use Kodiak to do the job. See its quickstart if you are interested.

For example, the dependabot file .github/dependabot.yml

.github/dependabot.yml
version: 2

updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "daily"
    labels:
    - "automerge"

Kodiak bot file: .github/.kodiak.toml

.github/.kodiak.toml
version = 1

[merge]
method = "squash"

And you need additional steps in the GitHub settings to make Kodiak bot work:

  • Add the automerge label in the GitHub issues tab.
  • In Options -> Branches, protect the to-be-merged branch (usually the main branch).
  • Also tick "Require status checks to pass before merging" and "Require branches to be up to date before merging".
  • And select which GitHub action job(s) must pass for automerge using the search bar below.

Renovate

Renovate bot can manage both dependency update checking and automated pull request merging.

Renovate supports a variety of platforms

  • GitHub (.com and Enterprise)
  • GitLab (.com and CE/EE)
  • Bitbucket Cloud / Server
  • Azure DevOps
  • Gitea

And a variety of package ecosystems:

  • Git submodules
  • GitHub actions
  • Dockerfile
  • JavaScript / Node.js packages
  • Java
  • And more

Setup for GitHub

Enable the Renovate GitHub App for your GitHub repositories. The Renovate bot will open a pull request in reachable repos to begin an interactive setup.

Setup for GitLab

According to the renovate GitLab runner documentation,

  1. Create a repository for the Renovate runner.
  2. Add a GitLab personal access token (PAT) with read_user, api, and write_repository scopes as the RENOVATE_TOKEN CI/CD variable.
  3. Add a GitHub PAT as GITHUB_COM_TOKEN. This token allows the renovate bot to read information about updated dependencies unhindered.
  4. Create .gitlab-ci.yml to run the pipelines:
    .gitlab-ci.yml
    include:
     - project: 'renovate-bot/renovate-runner'
       file: '/templates/renovate-dind.gitlab-ci.yml'
    
  5. Select which repositories the renovate bot may touch by setting the CI/CD variable RENOVATE_EXTRA_FLAGS: --autodiscover=true --autodiscover-filter=group1/* or by configuring them in the config.js file:
    config.js
    module.exports = {
        repositories: [
            "group1/repo1",
            "group2/repo2",
        ],
    };
    

    As a plus, it's easier to set up more renovate runner options in the config.js file.
  6. Set up a schedule for the pipeline.

Renovate settings file

The settings file renovate.json example

renovate.json
{
  "extends": [
    "config:recommended"
  ],
  "git-submodules": {
    "enabled": true
  }
}

Caching in GitHub actions

Caching dependencies

The actions/cache action caches dependencies for the execution environment.

- name: Cache multiple paths
  uses: actions/cache@v4
  with:
    path: |
      ~/cache
      !~/cache/exclude
    key: ${{ runner.os }}-${{ hashFiles('**/Lockfile') }}
    restore-keys: |
      ${{ runner.os }}-

  • The key is the identifier for writing to the cache. If the key stays the same before and after the workflow, the cache will not be updated.
  • The restore-keys are identifiers for reading the cache besides the key. If no key matches exactly but a prefix (one of the restore-keys) does, the GitHub action will still read the cache and, since the key differs, update it after the job.

Restore and save actions

The cache action can be split into restore and save steps for fine-grained control.

- name: Restore cached Primes
  id: cache-primes-restore
  uses: actions/cache/restore@v4
  with:
    path: |
      path/to/dependencies
      some/other/dependencies
    key: ${{ runner.os }}-primes

# ... intermediate workflow steps ...

- name: Save Primes
  id: cache-primes-save
  uses: actions/cache/save@v4
  with:
    path: |
      path/to/dependencies
      some/other/dependencies
    key: ${{ steps.cache-primes-restore.outputs.cache-primary-key }}

Caching for a specific programming language

Some GitHub actions that set up a language runtime can also cache package dependencies.

Cleanup PR caches

Clean up a PR's caches after it closes to save space.

name: Cleanup PR caches
on:
  pull_request:
    types:
      - closed

jobs:
  cleanup:
    permissions:
      actions: write
    runs-on: ubuntu-latest
    steps:
      - name: Cleanup
        run: |
          gh extension install actions/gh-actions-cache

          echo "Fetching list of cache key"
          cacheKeysForPR=$(gh actions-cache list -R $REPO -B $BRANCH -L 100 | cut -f 1 )

          ## Setting this to not fail the workflow while deleting cache keys.
          set +e
          echo "Deleting caches..."
          for cacheKey in $cacheKeysForPR
          do
              gh actions-cache delete $cacheKey -R $REPO -B $BRANCH --confirm
          done
          echo "Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REPO: ${{ github.repository }}
          BRANCH: refs/pull/${{ github.event.pull_request.number }}/merge

Git Operations in GitHub actions

Git operations, such as checkout, add, creating a branch, and making a pull request, in GitHub actions.

Checkout (Clone a repository)

The official actions/checkout action clones the repository to $GITHUB_WORKSPACE. By default it uses the built-in GITHUB_TOKEN for authentication.

In most cases, this is what you need:

- uses: actions/checkout@v4

The checkout action also supports pushing a commit to the same repo.

Warning

This may not work on protected branches that need status checks.

on: push
jobs:
  git-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          date > generated.txt
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "generated"
          git push

However, no further workflows will be triggered when pushing with the built-in GITHUB_TOKEN. You will need one of the following setups to trigger workflows.

How to trigger further CI runs

You will need either a personal access token (PAT) with repo scope access stored as an action secret:

- uses: actions/checkout@v4
  with:
    token: ${{ secrets.PAT }}

Or a pair of SSH keys: the public key is added as a deploy key with write access, while the private key is stored as the action secret SSH_PRIVATE_KEY.

- uses: actions/checkout@v4
  with:
    ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}

Push changes back to GitHub

Several third-party actions are more convenient for commit-and-push than the official checkout action.

Create a pull request

The peter-evans/create-pull-request action will commit all files into a new branch and make a pull request to the target (default main) branch.

- name: Create Pull Request
  uses: peter-evans/create-pull-request@v6
  with:
    # token: ${{ secrets.PAT }} # A PAT is required for triggering pull request workflows
    token: ${{ secrets.GITHUB_TOKEN }}  # This will not trigger further workflows

Merge pull requests