Data Import by CI Job
Wikicommons - CC0
A data import is part of most of our projects, and in some cases we need to re-import all the data from time to time. This post shows how you can delegate this task to a GitLab CI job and trigger it by adding a certain keyword to your commit message.
What is the problem?
There are several circumstances in which a data import (or a re-import) is required. One reason is that the code of the import process has changed and we want these changes to be applied to our instances. Since such a change arrives with a git commit, triggering the import from that commit lets us track code changes and data imports together. However, not every push to GitLab should trigger this job, as it can be time-consuming and may cause a service interruption.
In general, we should automate this task as much as possible.
GitLab CI
GitLab CI can add jobs to the pipeline when a specific condition evaluates to true. So we can use a keyword added to the commit message. GitLab already offers predefined keywords that change the CI behavior, like [skip ci] to skip the pipeline completely.
General Setup
First of all, I recommend introducing a new stage for the data import, called data_import for example.
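GitLab only accepts a custom stage if it is listed under the top-level stages key, and the order of that list decides when the import runs. A minimal sketch, assuming your pipeline already uses build, test and deploy stages (adjust the names to your own setup):

```yaml
# Declaring stages replaces GitLab's default list (build, test, deploy),
# so every stage used by any job must appear here.
stages:
  - build
  - test
  - deploy
  # Listed last, so the import starts only after the jobs of the
  # earlier stages have finished.
  - data_import
```

Putting data_import at the end keeps the import from running against an instance that has not yet received the new code.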
We set up a job whose rules key contains our condition:
import:
  image: bash:latest
  stage: data_import
  rules:
    - if: '$CI_COMMIT_MESSAGE =~ /ci-reimport-data/'
  script:
    - printf "your data import here\n"
This job will be added to the pipeline whenever your commit message contains ci-reimport-data. Otherwise, it will not be present in the pipeline at all. I recommend putting the keyword in a separate paragraph of your message, separated from your semantic commit message by a blank line, like this:
feat: the most awesome feature ever

incredible performance boost for blockchain records

ci-reimport-data
or, on the command line, pass multiple -m arguments (git joins them as separate paragraphs), like
git commit -a -m "commit title" -m "extensive description" -m "ci-reimport-data"
Branch-aware
To make the job aware of the branch you are working on, the condition can be extended. This is useful when the data import should only target a specific instance. We want to test our changes in a topic branch first, so this job should not run for pipelines triggered from the develop and main branches.
$CI_COMMIT_MESSAGE =~ /ci-reimport-data/ && $CI_COMMIT_REF_NAME != "develop" && $CI_COMMIT_REF_NAME != "main"
To omit this job for pipelines on a merge request or a tag, add && $CI_COMMIT_TAG == null && $CI_MERGE_REQUEST_ID == null to the condition.
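Combining the keyword check with the branch, tag and merge request exclusions, the job from the general setup could look like the following sketch. The image and script are still placeholders.

```yaml
import:
  image: bash:latest
  stage: data_import
  rules:
    # Keyword present, not on develop/main, and neither a tag pipeline
    # nor a merge request pipeline.
    - if: '$CI_COMMIT_MESSAGE =~ /ci-reimport-data/ && $CI_COMMIT_REF_NAME != "develop" && $CI_COMMIT_REF_NAME != "main" && $CI_COMMIT_TAG == null && $CI_MERGE_REQUEST_ID == null'
  script:
    - printf "your data import here\n"
```

A shorter alternative is to compare against $CI_COMMIT_BRANCH instead of $CI_COMMIT_REF_NAME: GitLab sets it only in branch pipelines, so tag and merge request pipelines are excluded without the two null checks.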
API-triggered Pipelines
When a pipeline is triggered via the API or the web interface, we cannot alter the latest commit message. But we can inject a variable into the pipeline and simply check for it:
($CI_COMMIT_MESSAGE =~ /ci-reimport-data/ || $REIMPORT == "true")
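For pipelines started from the "Run pipeline" form in the web interface, GitLab prefills a variable that is defined in the top-level variables section with a description. A sketch, assuming a default of "false" so that regular pipelines never import; a value passed when starting the pipeline takes precedence over this default:

```yaml
variables:
  REIMPORT:
    value: "false"
    description: "Set to true to re-import all data in this pipeline."

import:
  image: bash:latest
  stage: data_import
  rules:
    # Either the keyword in the commit message or REIMPORT=true
    # passed when the pipeline was started.
    - if: '$CI_COMMIT_MESSAGE =~ /ci-reimport-data/ || $REIMPORT == "true"'
  script:
    - printf "your data import here\n"
```

When using the pipeline triggers API instead, the same variable can be passed as variables[REIMPORT]=true.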
Git Flow
When you add your code following a branching model like git flow, changes to the develop and main branches are usually made via merge requests. In this case, you have to add the keyword to the merge commit message, which can be edited in the web interface before merging.
Ahiqar project
In the Ahiqar project (project description, GitLab), the import job runs if either the commit message contains ci-reimport-data or the variable $REIMPORT is set to true.
The job is also aware of the branch and runs the data import only on the corresponding instance (topic branches → test instance; develop → development instance; main → production instance).
The job setup for topic branches looks like this:
import-test:
  image: curlimages/curl
  stage: data_import
  rules:
    - if: '($CI_COMMIT_MESSAGE =~ /ci-reimport-data/ || $REIMPORT == "true") && $CI_COMMIT_REF_NAME != "develop" && $CI_COMMIT_REF_NAME != "main" && $CI_COMMIT_TAG == null && $CI_MERGE_REQUEST_ID == null'
  script:
    - curl https://ahikar-test.sub.uni-goettingen.de/api/import-data?token=${APP_DEPLOY_TOKEN}
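One way to cover all three instances with a single job is rules:variables, which sets a variable depending on the first rule that matches. This is a sketch, not the Ahiqar configuration: IMPORT_HOST is a hypothetical variable name, and the development and production host names are placeholders because their URLs are not given here; only the test host is taken from the job above.

```yaml
import:
  image: curlimages/curl
  stage: data_import
  rules:
    # Rules are checked in order; the first match decides IMPORT_HOST.
    # $CI_COMMIT_BRANCH is only set in branch pipelines, so tag and
    # merge request pipelines match none of these rules.
    - if: '($CI_COMMIT_MESSAGE =~ /ci-reimport-data/ || $REIMPORT == "true") && $CI_COMMIT_BRANCH == "develop"'
      variables:
        IMPORT_HOST: "dev.example.org"   # placeholder: development instance
    - if: '($CI_COMMIT_MESSAGE =~ /ci-reimport-data/ || $REIMPORT == "true") && $CI_COMMIT_BRANCH == "main"'
      variables:
        IMPORT_HOST: "www.example.org"   # placeholder: production instance
    - if: '($CI_COMMIT_MESSAGE =~ /ci-reimport-data/ || $REIMPORT == "true") && $CI_COMMIT_BRANCH'
      variables:
        IMPORT_HOST: "ahikar-test.sub.uni-goettingen.de"   # any other branch: test instance
  script:
    # --fail makes curl exit non-zero on HTTP errors, so a failed import fails the job.
    - curl --fail "https://${IMPORT_HOST}/api/import-data?token=${APP_DEPLOY_TOKEN}"
```

Keeping one job avoids repeating the keyword condition three times in separate job definitions, at the cost of one slightly longer rules list.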
Next Steps
One idea to improve usability is to make the condition sensitive to checkboxes ticked in the merge request description. This would let us enable or disable the job by clicking a checkbox before a merge is performed.
Do you have your own idea to improve this? Leave a comment below 👇 and tell us about it!
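A possible starting point for this idea, sketched under assumptions: recent GitLab versions expose the merge request description to merge request pipelines as CI_MERGE_REQUEST_DESCRIPTION (check that your instance provides it), and a ticked task-list item appears as "[x]" in the description text. The job name import-from-mr is hypothetical.

```yaml
import-from-mr:
  image: bash:latest
  stage: data_import
  rules:
    # Only merge request pipelines whose description contains a ticked
    # checkbox such as "- [x] ci-reimport-data".
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event" && $CI_MERGE_REQUEST_DESCRIPTION =~ /\[[xX]\] ci-reimport-data/'
  script:
    - printf "your data import here\n"
```

With this approach the import runs in the merge request pipeline, before the merge. Ticking the checkbox does not start a pipeline by itself; a new push or a manually started pipeline picks up the changed description.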