Search and Replace across all Repositories in a GitLab Instance

We recently introduced container registry mirrors in our Kubernetes cluster at the containerd level. Until then, every team specified the pull-through cache directly in the image name, like image: docker-cache.example.com/library/alpine. To remove docker-cache.example.com as a single point of failure, all teams need to change the image name back to image: docker.io/library/alpine or image: alpine.

One possible solution would be to write a mutating Kubernetes webhook that rewrites the image name for every pod. That would work, but it would not change the image in the source code. It could be rolled out quickly, but it would leave the Helm charts inconsistent with what actually runs in the cluster.

Before enforcing the new image names via OPA, we thought about helping the teams to change the image name instead of blocking their deployments. Manually digging through 200+ services from 40+ teams was not an option. Let the automation begin.

How to clone a Company

GitLab has a neat CLI tool called glab. With glab you can create issues, merge requests, releases, and much more from the command line.

In order to modify every repository in our GitLab instance we first need to clone them locally.

glab repo clone -g <group> -a=false -p --paginate

Parameters:

  • -g specifies the group
  • -a specifies whether archived repositories are cloned
    • since you cannot modify them anyway, we do not want to include them
  • -p preserves the namespace and clones the repositories into matching subdirectories
  • --paginate makes additional requests in order to fetch all repositories

Unfortunately, glab does not let you specify the git depth for the repositories. In general we would prefer a shallow clone, since the history is not important to us and it would save a lot of network bandwidth and disk space.
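One possible workaround, sketched here under a few assumptions, is to ask the GitLab API for the project list and run git clone --depth 1 yourself. The helper name shallow_clone_all and the DRY_RUN switch are made up for illustration, the usage comment assumes jq is installed, and the API query parameters should be checked against your GitLab version.

```shell
#!/bin/bash
# shallow-clone.sh (hypothetical helper, not part of glab)

# Reads lines of "<path_with_namespace> <clone_url>" from stdin and
# shallow-clones each repository into <target>/<path_with_namespace>.
# With DRY_RUN=1 the git commands are only printed, not executed.
shallow_clone_all() {
    local target="$1" path url
    while read -r path url; do
        # skip blank lines and repositories that were already cloned
        [ -z "$path" ] && continue
        [ -d "$target/$path/.git" ] && continue

        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "git clone --depth 1 $url $target/$path"
        else
            mkdir -p "$(dirname "$target/$path")"
            # </dev/null keeps git (and ssh) from consuming the project list
            git clone --depth 1 "$url" "$target/$path" < /dev/null
        fi
    done
}

# Usage (assumption: <group> is the group ID or URL-encoded group path):
# glab api "groups/<group>/projects?include_subgroups=true&archived=false" --paginate \
#     | jq -r '.[] | "\(.path_with_namespace) \(.ssh_url_to_repo)"' \
#     | shallow_clone_all ./repos
```

Running it once with DRY_RUN=1 shows which repositories would be cloned and where, before any network traffic happens.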

Substitution

As already explained, we want to replace all occurrences of docker-cache.example.com with docker.io. Since the mirror only applies to containers deployed into Kubernetes, our script should only touch Helm chart files.
The replacements array lets you list several patterns in case you have multiple pull-through proxies defined.

#!/bin/bash
# replace.sh

replacements=(
    # caches (dots are escaped so they only match a literal '.')
    's/docker-cache\.example\.com/docker.io/g'
    's/ghcr-cache\.example\.com/ghcr.io/g'
)

# finds all .yaml and .yml files
# filters out files that include 'gitlab-ci' or 'docker-compose' in their path
# reading line by line keeps paths that contain spaces intact
find "$1" -type f \( -name "*.yaml" -o -name "*.yml" \) \
    | grep -v -e "docker-compose" -e "gitlab-ci" \
    | while IFS= read -r file; do
    org=$(cat "$file")
    mod="$org"

    # loop over replacements
    for pattern in "${replacements[@]}"; do
        mod=$(printf '%s\n' "$mod" | sed "$pattern")
    done

    # only modify the actual file if the content changed
    if [[ "$mod" != "$org" ]]; then
        echo "$file"
        printf '%s\n' "$mod" > "$file"
    fi
done

Run the script:

bash replace.sh <folder>
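Before letting the script rewrite files, it can help to preview the effect of the sed expressions on a single file. The following is a minimal sketch; the function name preview_replacements and the file path in the example are hypothetical and not part of replace.sh.

```shell
#!/bin/bash
# preview.sh (hypothetical companion to replace.sh)

# Prints a unified diff of what the given sed expressions would change
# in a file, without modifying the file itself.
# Usage: preview_replacements <file> <sed-expression>...
preview_replacements() {
    if [ "$#" -lt 2 ]; then
        echo "usage: preview_replacements <file> <sed-expression>..." >&2
        return 2
    fi
    local file="$1"
    shift

    # turn every expression into a separate -e argument for sed
    local args=() pattern
    for pattern in "$@"; do
        args+=(-e "$pattern")
    done

    # diff exits with status 1 when the inputs differ, which is the
    # interesting case here, so the exit status is deliberately ignored
    sed "${args[@]}" "$file" | diff -u "$file" - || true
}

# Example:
# preview_replacements chart/values.yaml 's/docker-cache\.example\.com/docker.io/g'
```

Lines starting with - show the current image names and lines starting with + show what the replacement would produce.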

Tons of Merge Requests

Some repositories now contain changes on our local disk. We do not want to go through every repository manually, checking the diff and pushing it to GitLab. Even worse would be clicking through the UI hour after hour to create hundreds of merge requests.

Let's write a script:

#!/bin/bash
# traverse.sh

traverse() {
    # iterate over all items inside the folder given as first arg
    for dir in "$1"/*; do
        # if it's not a folder, continue
        if [ ! -d "$dir" ]; then
          continue
        fi

        # if it is not a git repository
        # then recursively call the function again
        if [ ! -d "$dir/.git" ]; then
            echo "Entering $dir"
            traverse "$dir"
            continue
        fi

        # check for git changes
        (cd "$dir" && git diff --quiet)
        git_status=$?

        # just continue if there are no changes
        if [ $git_status -eq 0 ]; then
            continue
        fi

        # enter the folder (skip the repository if that fails)
        pushd "$dir" || continue
        # push changes to remote
        git checkout -b fix/replace-image-registry
        git add .
        git commit -m "fix: replace image registries" -m "Registry mirrors are configured transparently in the Kubernetes containerd configuration."
        # the new branch has no upstream yet, so set it explicitly
        git push --set-upstream origin fix/replace-image-registry

        # create a merge request on gitlab
        glab mr create --remove-source-branch --assignee="<YOUR-USERNAME>" --yes --title="fix: replace image registries"

        # leave the folder
        popd
    done
}

traverse "$1"

Run the script:

bash traverse.sh <folder>

Do not execute the script blindly; try it step by step instead. It is easy to comment out some parts and run the script multiple times.
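A harmless first step is to only list the repositories that would receive a branch and a merge request. The sketch below uses the same git diff --quiet check as traverse.sh but pushes nothing; the function name list_changed_repos is made up for this example.

```shell
#!/bin/bash
# list-changed.sh (hypothetical dry run for traverse.sh)

# Prints every git repository below the given folder that has uncommitted
# changes to tracked files, followed by a short summary of those changes.
list_changed_repos() {
    local root="$1" gitdir repo
    # -prune stops find from descending into the .git directories it reports
    find "$root" -type d -name .git -prune | sort | while IFS= read -r gitdir; do
        repo=$(dirname "$gitdir")
        # git diff --quiet exits non-zero when the working tree differs
        if ! git -C "$repo" diff --quiet; then
            echo "$repo"
            git -C "$repo" diff --stat
        fi
    done
}

# Example:
# list_changed_repos ./repos
```

If the list looks plausible, the real script can be run on a single subgroup folder first.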

Review your changes

Every merge request creates one GitLab to-do item in the UI if you assigned yourself with --assignee in the glab command. That lets you go through all merge requests one by one and review them if needed.
I personally did this, even though it took about an hour for 100 merge requests. It was still faster than doing every step manually, because you only have to make manual changes where needed.
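If you prefer the terminal over the to-do list, the open merge requests can also be listed through the GitLab API. This is only a sketch: the function name format_merge_requests is hypothetical, jq is assumed to be installed, and the endpoint and fields used (scope=assigned_to_me, references.full, web_url) should be verified against your GitLab version.

```shell
#!/bin/bash
# list-mrs.sh (hypothetical helper)

# Reads the JSON returned by the GitLab merge requests API on stdin and
# prints one tab-separated line per merge request:
# <project reference> <title> <url>
format_merge_requests() {
    jq -r '.[] | [.references.full, .title, .web_url] | @tsv'
}

# Usage (requires an authenticated glab):
# glab api "merge_requests?scope=assigned_to_me&state=opened" --paginate \
#     | format_merge_requests
```

The output can be piped through grep to narrow the list down to a single team or project.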