Search and Replace across all Repositories in a GitLab Instance
We recently introduced container registry mirrors in our Kubernetes cluster at the containerd level. Before that, every team specified the pull-through cache directly in the image name, for example image: docker-cache.example.com/library/alpine. To remove docker-cache.example.com as a single point of failure, all teams need to change the image name back to image: docker.io/library/alpine or image: alpine.
A possible solution would be to write a mutating Kubernetes webhook that alters the image name for every pod. That would work, but it would not change the image in the source code. It would take effect immediately, but the Helm charts would no longer match what is actually deployed.
Before enforcing the new image names via OPA, we thought about helping the teams to change the image name instead of blocking their deployments. Manually digging through 200+ services from 40+ teams was not an option. Let the automation begin.
How to Clone a Company
GitLab has a neat CLI tool called glab. With glab you can create issues, merge requests, releases and much more from the command line.
In order to modify every repository in our GitLab instance, we first need to clone them locally.
glab repo clone -g <group> -a=false -p --paginate
Parameters:
-g allows you to specify the group
-a specifies whether archived repositories are cloned; since you cannot modify them anyway, we do not want to include them
-p preserves the namespace and clones the repositories into subdirectories
--paginate makes additional requests in order to fetch all repositories
Unfortunately, glab does not let you specify the git depth for the repositories. A shallow clone would be preferable here: the history is not important for this task, and skipping it would save a considerable amount of network bandwidth and disk space.
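One possible workaround, sketched here under assumptions rather than taken from our actual setup, is to fetch the project list yourself and run git clone --depth 1 for every entry. The helper name shallow_clone_all and its input format (one "<clone-url> <target-dir>" pair per line on stdin) are hypothetical.

```shell
#!/bin/bash
# shallow-clone.sh (hypothetical helper, not part of glab)
# Reads "<clone-url> <target-dir>" pairs from stdin, one pair per line,
# and creates a shallow clone (history depth 1) for each of them.
shallow_clone_all() {
  local url dir
  while read -r url dir; do
    # skip empty lines
    [ -z "$url" ] && continue
    # skip repositories that were already cloned
    [ -d "$dir/.git" ] && continue
    # create the namespace folders, similar to what `glab repo clone -p` does
    mkdir -p "$(dirname "$dir")"
    # stdin is redirected so git (or ssh) cannot swallow the remaining list
    git clone --quiet --depth 1 "$url" "$dir" </dev/null
  done
}
```

The list could come, for example, from glab api "groups/<group>/projects?include_subgroups=true&archived=false" --paginate, with jq extracting ssh_url_to_repo and path_with_namespace for each project. That pipeline is an untested assumption and requires jq to be installed.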
Substitution
As already explained, we want to replace all occurrences of docker-cache.example.com with docker.io. Since the mirror only applies to containers deployed into Kubernetes, our script should only touch Helm chart files.
The replacements array lets you list several patterns in case you have multiple pull-through proxies defined.
#!/bin/bash
# replace.sh
replacements=(
  # caches (dots are escaped so they only match a literal ".")
  's/docker-cache\.example\.com/docker.io/g'
  's/ghcr-cache\.example\.com/ghcr.io/g'
)
# finds all .yaml and .yml files
# skips files whose path contains 'docker-compose' or 'gitlab-ci'
# -print0 / read -d '' keeps paths with spaces intact
find "$1" -type f \( -name "*.yaml" -o -name "*.yml" \) \
  ! -path "*docker-compose*" ! -path "*gitlab-ci*" -print0 |
while IFS= read -r -d '' file; do
  org=$(cat "$file")
  mod="$org"
  # loop over replacements
  for pattern in "${replacements[@]}"; do
    # abort on a sed error instead of writing an empty file
    mod=$(printf '%s\n' "$mod" | sed "$pattern") || exit 1
  done
  # only modify the actual file if the content changed
  if [[ "$mod" != "$org" ]]; then
    echo "$file"
    printf '%s\n' "$mod" > "$file"
  fi
done
Run the script:
bash replace.sh <folder>
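Before modifying anything, it can help to see which files would be affected. The following is a minimal sketch of such a dry run, assuming the two cache hostnames from above and a similar file filter; the function name preview_replacements is hypothetical.

```shell
#!/bin/bash
# preview.sh (hypothetical helper): list the files a replacement run would touch
preview_replacements() {
  # same idea as replace.sh: only .yaml/.yml files,
  # skipping paths that contain 'docker-compose' or 'gitlab-ci'
  find "$1" -type f \( -name "*.yaml" -o -name "*.yml" \) \
    ! -path "*docker-compose*" ! -path "*gitlab-ci*" \
    -exec grep -lE 'docker-cache\.example\.com|ghcr-cache\.example\.com' {} +
}
```

Calling preview_replacements <folder> prints one matching file path per line and leaves every file untouched.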
Tons of Merge Requests
Some repositories now contain changes on our local disk. We do not want to go through every repository manually, check the diff, and push it to GitLab. Even worse would be clicking through the UI for hours to create hundreds of merge requests.
Let’s write a script:
#!/bin/bash
# traverse.sh
branch="fix/replace-image-registry"
traverse() {
  # iterate over all items inside the folder given as first arg
  for dir in "$1"/*; do
    # if it's not a folder, continue
    if [ ! -d "$dir" ]; then
      continue
    fi
    # if it is not a git repository
    # then recursively call the function again
    if [ ! -d "$dir/.git" ]; then
      echo "Entering $dir"
      traverse "$dir"
      continue
    fi
    # check for git changes (modified tracked files only)
    (cd "$dir" && git diff --quiet)
    git_status=$?
    # just continue if there are no changes
    if [ $git_status -eq 0 ]; then
      continue
    fi
    # enter the folder, skip the repository if that fails
    pushd "$dir" || continue
    # push changes to remote
    git checkout -b "$branch"
    git add .
    git commit -m "fix: replace image registries" -m "Registry mirrors are now configured transparently in the Kubernetes containerd configuration."
    # a new branch has no upstream yet, so set it explicitly
    git push -u origin "$branch"
    # create a merge request on GitLab
    glab mr create --remove-source-branch --assignee="<YOUR-USERNAME>" --yes --title="fix: replace image registries"
    # leave the folder
    popd
  done
}
traverse "$1"
Run the script:
bash traverse.sh <folder>
You do not have to execute the script blindly; try it step by step instead. It is easy to comment out some parts and run the script multiple times.
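A harmless first step could be to only list the repositories that contain local changes, without committing or pushing anything. This is a minimal sketch; the function name list_dirty_repos is hypothetical, and unlike git diff --quiet, git status --porcelain also reports untracked files.

```shell
#!/bin/bash
# list-dirty.sh (hypothetical helper): print every repository below a folder
# that has uncommitted changes, without modifying anything
list_dirty_repos() {
  local gitdir repo
  # find every .git directory; -prune stops find from descending into it
  find "$1" -type d -name .git -prune | while IFS= read -r gitdir; do
    repo=$(dirname "$gitdir")
    # --porcelain prints one line per changed or untracked file
    if [ -n "$(git -C "$repo" status --porcelain)" ]; then
      echo "$repo"
    fi
  done
}
```

Running list_dirty_repos <folder> after the replacement step should print exactly the repositories that will later receive a merge request.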
Review Your Changes
Every merge request will create one GitLab to-do item in the UI if you assign yourself with --assignee in the glab command. That lets you go through all merge requests one by one and review them if needed.
I personally did this even though it took about an hour for 100 merge requests. It was still faster than doing every step manually because you only have to make manual changes if needed.