DevOps

GitLab

GitLab CI/CD is the tool built into GitLab for Continuous Integration (CI) and Continuous Delivery/Deployment (CD) in software development.

Parallel Matrix build

Test and build in parallel with a matrix build in GitLab CI/CD.

For example,

.gitlab-ci.yml
test:
  image: $IMAGE
  script:
    - echo $MSG
    - python -V
  parallel:
    matrix:
      # One cartesian set of parameters: every IMAGE is combined with every MSG
      - IMAGE: ['python:3.6-alpine', 'python:3.7-alpine']
        MSG: ['Test1', 'Test2']

This creates 4 jobs, one for each combination of a Python image and a custom message.
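To see where the 4 jobs come from, the expansion can be reproduced outside GitLab. The following is only a sketch of the cartesian product, not GitLab's implementation; the variable names `images`, `messages`, and `jobs` are chosen for illustration.

```python
from itertools import product

# Values copied from the matrix entry in the example above.
images = ["python:3.6-alpine", "python:3.7-alpine"]
messages = ["Test1", "Test2"]

# Each (IMAGE, MSG) pair corresponds to one generated job.
jobs = [{"IMAGE": image, "MSG": msg} for image, msg in product(images, messages)]

for job in jobs:
    print(job)
```

Adding a third value to either list would grow the job count multiplicatively, which is worth keeping in mind for runner capacity.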

See also the blog post by Michael Friedrich for more on parallel matrix builds with GitLab CI/CD.

Replace old only/except with new rules to include or exclude jobs in pipelines

GitLab CI/CD rules reference

Note

rules cannot be used together with only/except in the same job. Otherwise, GitLab returns a "key may not be used with rules" error.

Run only in scheduled pipelines

.gitlab-ci.yml
scheduled-update:
  # only run if this is a scheduled pipeline
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

Run upon push

.gitlab-ci.yml
push-job:
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"

Run upon merge request

.gitlab-ci.yml
merge-request:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

Run only for the commits in the default branch

.gitlab-ci.yml
# GitLab pages job
pages:
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

Run only for tags

.gitlab-ci.yml
pages:
  rules:
    - if: $CI_COMMIT_TAG

Choose a specific runner

Use tags to run jobs on a specific runner, e.g., your self-hosted GitLab runner on a workstation.

.gitlab-ci.yml
run-custom:
  tags:
    - myWS
  script:
    - echo "Running in my workstation."

Create a release

Create a release from a GitLab CI/CD pipeline using the release-cli Docker image:

.gitlab-ci.yml
release_job:
  stage: release
  image: registry.gitlab.com/gitlab-org/release-cli:latest
  rules:
    - if: $CI_COMMIT_TAG                  # Run this job when a tag is created manually
  script:
    - echo "Running the release job."
  release:
    tag_name: "$CI_COMMIT_TAG"            # Required: the Git tag the release is attached to
    name: "Release $CI_COMMIT_TAG"
    description: "Release created using the release-cli."

Cache Conda Packages

We can cache conda packages by pointing the CONDA_PKGS_DIRS environment variable at a directory inside the project folder (CI_PROJECT_DIR) so that the GitLab runner can cache these dependencies.

.gitlab-ci.yml
image: condaforge/miniforge3:latest

variables:
  CONDA_PKGS_DIRS: "${CI_PROJECT_DIR}/.cache/conda/pkgs"

cache:
  - key:
      files:
        - environment.yml
    paths:
      - .env/
      - .cache/conda/pkgs

before_script:
  - conda env update --prefix ./.env --file environment.yml --prune
  - source activate ./.env

Because GitLab only caches files inside the project folder (CI_PROJECT_DIR):

  • CONDA_PKGS_DIRS is set to ${CI_PROJECT_DIR}/.cache/conda/pkgs to hold the downloaded compressed packages.
  • The extracted environment folder is set to ${CI_PROJECT_DIR}/.env using the --prefix option.

Conda creates the runtime environment according to environment.yml. The environment folder is created if it is not present, or restored from the cache. The --prune option makes conda remove packages that are no longer listed in environment.yml, which keeps the cached environment lean.

Git Operations in GitLab CI/CD

Using SSH keys

Warning

Currently the private key cannot be masked as a CI/CD variable in its raw form. Masking it requires base64-encoding the key and decoding it in the pipeline.
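As a rough illustration of the encode/decode round trip mentioned in the warning, the sketch below turns a multi-line key into single-line Base64 text and back. The helper names `encode_key` and `decode_key` are hypothetical, and `fake_key` is placeholder text, not a valid key.

```python
import base64


def encode_key(private_key: str) -> str:
    """Return the key as single-line Base64 text, suitable for pasting into a variable."""
    return base64.b64encode(private_key.encode("utf-8")).decode("ascii")


def decode_key(encoded: str) -> str:
    """Recover the original multi-line key from its Base64 form."""
    return base64.b64decode(encoded).decode("utf-8")


# Placeholder text standing in for a real private key (not a valid key).
fake_key = (
    "-----BEGIN OPENSSH PRIVATE KEY-----\n"
    "placeholder\n"
    "-----END OPENSSH PRIVATE KEY-----\n"
)

encoded = encode_key(fake_key)
print(encoded)
```

If the variable is stored this way, the pipeline has to decode it before handing it to ssh-add.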

You can use a pair of SSH keys to access a git repository:

  • The private key is stored as a CI/CD project variable.
  • The public key is added as a deploy key.

You also need additional steps to set up an SSH client in the pipeline.

before_script:
  # apt-get applies to Debian-based images. Change the package manager if needed.
  - 'which ssh-agent || ( apt-get update -qy && apt-get install openssh-client -qqy )'
  - 'which git || ( apt-get update -qy && apt-get install git -qqy )'
  - eval $(ssh-agent -s)
  - echo "${SSH_PRIVATE_KEY}" | tr -d '\r' | ssh-add - > /dev/null # add ssh key
  # Make sure ~/.ssh exists before writing the config file.
  - mkdir -p ~/.ssh && chmod 700 ~/.ssh
  # Disabling host key checking is only acceptable inside throwaway CI containers.
  - '[[ -f /.dockerenv ]] && echo -e "Host *\n\tStrictHostKeyChecking no\n\n" > ~/.ssh/config'

Then replace the default HTTPS-based git origin with the SSH one.

script:
  - git remote rm origin && git remote add origin git@gitlab.com:$CI_PROJECT_PATH.git

Using a personal access token (PAT)

Compared to SSH, using a personal access token (PAT) with the write_repository scope might be simpler. In the following example, the PAT is stored as a masked CI/CD variable named GIT_PUSH_TOKEN.

script:
  - bash update.sh
  - |
    if [ -n "$(git status --porcelain)" ]; then
        echo "Committing updates"
        git config --global user.name "${GITLAB_USER_NAME}"
        git config --global user.email "${GITLAB_USER_EMAIL}"
        git add .
        git commit -m "Automated update: $(date '+%Y-%m-%d-%H-%M-%S')"
        git push "https://${GITLAB_USER_NAME}:${GIT_PUSH_TOKEN}@${CI_REPOSITORY_URL#*@}"
        exit;
    else
        echo "no change, nothing to commit"
    fi
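The `${CI_REPOSITORY_URL#*@}` expansion in the push URL above removes the shortest prefix ending in "@", i.e., the job's built-in credentials, so that the PAT can be put in their place. A small sketch of the same string manipulation follows; the function names and all example values are hypothetical.

```python
def strip_credentials(repository_url: str) -> str:
    """Mimic ${VAR#*@}: drop the shortest prefix that ends with '@'."""
    _, sep, rest = repository_url.partition("@")
    # If there is no "@", the shell expansion leaves the value unchanged.
    return rest if sep else repository_url


def push_url(user: str, token: str, repository_url: str) -> str:
    """Build an HTTPS push URL that carries the given user and token."""
    return f"https://{user}:{token}@{strip_credentials(repository_url)}"


# Hypothetical values, shaped like what a job might see.
example = "https://gitlab-ci-token:xxxx@gitlab.com/group/project.git"
print(push_url("alice", "TOKEN", example))
```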

To open a merge request from the pipeline, GitLab provides git push options for merge request settings.

script:
  - bash update.sh
  - |
    if [ -n "$(git status --porcelain)" ]; then
        echo "Committing updates"
        NEW_BR=auto-update-$(date '+%Y-%m-%d-%H-%M-%S')
        git config --global user.name "${GITLAB_USER_NAME}"
        git config --global user.email "${GITLAB_USER_EMAIL}"
        git checkout -b ${NEW_BR}
        git add .
        git commit -m "${NEW_BR}"
        git push "https://${GITLAB_USER_NAME}:${GIT_PUSH_TOKEN}@${CI_REPOSITORY_URL#*@}" \
            -o merge_request.create \
            -o merge_request.target="${CI_DEFAULT_BRANCH}" \
            -o merge_request.merge_when_pipeline_succeeds \
            -o merge_request.remove_source_branch \
            -o merge_request.title="${NEW_BR}" \
            -o merge_request.label="automated update" \
            -o merge_request.assign="${GITLAB_USER_NAME}"
        exit;
    else
        echo "no change, nothing to commit"
    fi

Synchronize GitLab repo to GitHub

Assuming you have an identical repository on both GitLab and GitHub (you can set this up by importing the repository from one service into the other), the following steps show how to mirror a GitLab repository to GitHub with SSH deploy keys.

On the GitLab side
  1. In the GitLab repo, go to Settings/Repository/Mirroring repositories and set Git repository URL to ssh://git@github.com/<namespace>/<repo>.git, e.g., ssh://git@github.com/sosiristseng/docker-python-julia.git

Warning

The GitHub clone button gives git@github.com:<namespace>/<repo>.git as the repo URL. Change it to ssh://git@github.com/<namespace>/<repo>.git so that GitLab can access the repository.

  2. Set Mirror direction to Push.

  3. Set Authentication method to SSH public key. Optionally, click Detect host keys.

  4. (Optional) Check "Keep divergent refs" to prevent force pushes and/or "Mirror only protected branches" for a cleaner GitHub mirror.

  5. Click Mirror repository.

  6. Copy the SSH public key (the middle button) and go to the GitHub mirror repo.

On the GitHub side

In the GitHub mirror repository, go to Settings/Deploy keys and click Add deploy key.

Paste the SSH public key copied from the GitLab source. Give it a title, allow write access, and click Add key to finish this step, and voilà.
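The URL change described in the warning on the GitLab side (scp-like syntax to an ssh:// URL) is mechanical, so it can be scripted when mirroring several repositories. The helper below is a sketch only; `to_ssh_url` is a hypothetical name and the regular expression covers just the simple user@host:path form.

```python
import re

# scp-like syntax: user@host:path, where the path does not start with "/".
_SCP_LIKE = re.compile(r"^(?P<user>[^@/:]+)@(?P<host>[^:/]+):(?P<path>[^/].*)$")


def to_ssh_url(url: str) -> str:
    """Convert git@host:namespace/repo.git into ssh://git@host/namespace/repo.git."""
    if url.startswith("ssh://"):
        return url  # Already in the form GitLab mirroring expects.
    match = _SCP_LIKE.match(url)
    if match is None:
        raise ValueError(f"not an scp-like Git URL: {url!r}")
    return f"ssh://{match['user']}@{match['host']}/{match['path']}"


print(to_ssh_url("git@github.com:sosiristseng/docker-python-julia.git"))
```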

Dynamic parallel matrix

A job matrix creates multiple job runs based on the combinations of its variables. Sometimes we want a dynamic number of matrix jobs, which requires a JSON array as an output. Here we use the json and glob modules in Python to generate that JSON list.
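The original script is not reproduced here, so the following is a minimal sketch of the idea under stated assumptions: `build_matrix` is a hypothetical helper, and the pattern `docs/*.ipynb` is only an example of files one might fan out over.

```python
import glob
import json
import os


def build_matrix(pattern: str, root: str = ".") -> str:
    """Return a JSON array of the files under root that match the glob pattern."""
    paths = glob.glob(os.path.join(root, pattern), recursive=True)
    # Sort for a stable job order and report paths relative to root with "/" separators.
    relative = sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)
    return json.dumps(relative)


# Hypothetical pattern; prints an empty JSON array if nothing matches.
print(build_matrix("docs/*.ipynb"))
```

The printed string is what a later pipeline step would consume as the matrix definition; how it is passed along depends on the CI system.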

Git Commands

Ordinary workflows

HEAD: a reference to the currently checked-out commit, i.e., the current state of the repo.

Download a repository

Clone a git repo from a remote repository:

git clone <url>

Clone and check out a specific branch:

git clone <url> -b <branchname>

If there are submodule(s) in the Git repository, you might want to clone them as well using the --recursive option.

git clone <url> --recursive

See also: SSH login to Git services like GitHub and GitLab.

Make changes and commit

git status      # Show the current state of the working tree and staging area
git add <file>  # Add a new or edited file to the staging area, i.e., tell git to include it in the next commit
git add -A      # Stage all changes at once
git commit -m "Commit message"  # Commit staged (added) files
git commit -am "Commit message" # Stage and commit modified tracked files without running git add beforehand
git revert <SHA>                # Create a new commit that undoes the changes introduced by the <SHA> commit

Synchronize with remote: Push and pull

git fetch # Download objects and refs from the remote without merging them into your branch
git merge # After git fetch, merge the fetched remote changes into the local branch
git push <remote> <branch-name> # Push commits to the remote
git push --set-upstream <remote> <name-of-your-branch>  # Push and set the upstream (tracking) branch
git pull <remote>  # Fetch and merge changes from the remote

Stash

Temporarily shelve uncommitted changes, including untracked files when the -u option is given.

git stash -u   # Store current work with untracked files
git stash pop  # Bring stashed work back to the working directory

Work with branches

git branch <branch_name>    # Create a new branch
git branch -a               # List all branches
git branch -d <branch_name> # Delete a branch

git checkout <branch_name>    # Check out an existing branch
git checkout -b <branch_name> # Create a new branch and check it out

git switch <branch_name>    # Switch to an existing branch
git switch -c <branch_name> # Create a new branch and switch to it
git merge  <branch_name>    # Merge the branch into the current branch

Orphan branches

Orphan branches share no history with other branches. For example, the gh-pages branch dedicated to GitHub Pages.

git checkout --orphan <branchname>  # Create an orphan branch and switch to it

Git submodule

Frequently used commands for Git submodules.

Add a submodule

To add a reference to another git project as a submodule:

git submodule add $url $path
git submodule update --init --recursive

Alternatively, you can use GUI tools like GitHub Desktop, which download and initialize submodules automatically.

Then you will see the file .gitmodules with information about the submodule(s). For instance,

.gitmodules
[submodule "themes/DoIt"]
    path = themes/DoIt
    url = https://github.com/HEIGE-PCloud/DoIt.git

Track a specific branch in the submodule

Use the -b $branch option:

git submodule add -b $branch $url $path

Or use set-branch -b $branch if you have already added the submodule:

git submodule set-branch -b $branch $path

Update all Git submodules to the latest commit

From a Stack Overflow post and the Git docs

git submodule update --remote --merge

For automated updates by bots, see automatic dependency update.

Remove a submodule

From Git docs

# Remove submodule from config
git submodule deinit $path
# Delete submodule tracking data
git rm $path && git commit
# Complete removal ($name is the submodule name recorded in .gitmodules)
rm -rf "$(git rev-parse --git-dir)/modules/$name"

SSH login to GitHub and GitLab

Generate a pair of SSH keys

ssh-keygen -t ed25519 -C "your_email@example.com"

ssh-keygen will ask you where to save the keys, e.g., /home/user/.ssh/id_ed25519. The passphrase is optional.
Then there will be two SSH key files:

  • ~/.ssh/id_ed25519 is the private key. Protect it at all costs.
  • ~/.ssh/id_ed25519.pub is the public key.

Using different keys for GitHub and GitLab access is more secure. However, the same pair of keys is used here for demonstration purposes.

Add remote to the SSH settings

Edit ~/.ssh/config

mkdir -p ~/.ssh
touch ~/.ssh/config
chmod 600 ~/.ssh/config
nano ~/.ssh/config

Add the following content to set the private key as the IdentityFile.

~/.ssh/config
# Host must match the host name used in the Git remote URL for the entry to apply.
Host github.com
  HostName github.com
  IdentityFile ~/.ssh/id_ed25519

Host gitlab.com
  HostName gitlab.com
  IdentityFile ~/.ssh/id_ed25519

Add the SSH key to your GitHub account

According to 📖 GitHub docs, add the SSH key here

  • Paste the content of the public key, ~/.ssh/id_ed25519.pub, into the "Key" field.
  • Add a descriptive label for the new key in the "Title" field.
  • Finally, click the Add SSH key green button. If prompted, confirm your GitHub password.

Add the SSH key to your GitLab account

Add the SSH key here

  • Paste the content of the public key, ~/.ssh/id_ed25519.pub, into the "Key" field.
  • Add a descriptive label for the new key in the "Title" field.
  • Select an expiration date.
  • Finally, click the Add key button.

Test your setup

GitHub:

ssh -vT git@github.com

Accept its fingerprint if prompted. If you see "Hi user! You've successfully authenticated, but GitHub does not provide shell access", the login is successful.

GitLab:

ssh -vT git@gitlab.com

Accept its fingerprint if prompted.

Strip Jupyter Notebook Output

Jupyter notebooks without multimedia outputs are friendlier to source control because git is not good at comparing binary data (e.g., plots, pictures, videos) embedded in notebooks, and such outputs tend to bloat the size of git repositories.
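One way to strip the outputs is to edit the notebook's JSON directly. The sketch below assumes the nbformat 4 layout, where each code cell carries an "outputs" list and an "execution_count"; the function names and the in-memory `demo` notebook are made up for illustration.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts of all code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: str) -> None:
    """Rewrite the notebook at path without its outputs."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")


# A tiny made-up notebook used only to exercise strip_outputs.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"], "metadata": {}},
        {
            "cell_type": "code",
            "source": ["1 + 1"],
            "metadata": {},
            "execution_count": 3,
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]}}],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}
stripped = strip_outputs(demo)
print(json.dumps(stripped["cells"][1]["outputs"]))
```

Running `strip_file` on each notebook before committing keeps diffs limited to the source cells.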

Removing large binary blobs in the git tree

git filter-repo is a replacement for git filter-branch for rewriting history, written as a single-file Python script.

To wipe large binary files entirely:

git filter-repo --strip-blobs-bigger-than 100M

Bonus: Remove sensitive content

git filter-repo --use-base-name --path id_dsa --path id_rsa --invert-paths
git filter-repo --replace-text passwords.txt