Efficient pipelines in GitLab CI/CD: Parallel Matrix Builds + !reference
Today I was diving deeper into GitLab CI/CD Pipeline Efficiency tricks, after I discovered resuable job attributes with !reference last week.
Resource optimization is a big topic, and next to ideas on failing fast, I was looking into more parallelization. Luckily GitLab introduced this feature last year.
Parallel Matrix Builds
Instead of creating multiple jobs for
- RHEL8 and Ubuntu 20
- x64 and x86
which then call the same build scripts and containers, and result in blocking pipelines, it would be much nicer to run them in parallel. We can build it with using the parallel
matrix
keywords. DISTRIBUTION
and ARCH
define the arrays.
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
The great thing about the variables is that they are populated with their values and are mapped into the CI/CD jobs as environment variables. The combined configuration below prints them to show their values - you can do much more with them, e.g. in build scripts for conditional decisions, package names, upload directories, etc.
stages:
- build
- test
build:
stage: build
script:
- echo "Building $DISTRIBUTION on $ARCH"
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
test:
stage: test
script:
- echo "Testing $DISTRIBUTION on $ARCH"
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
Works :)
As a developer, you spot repeating blocks
2 times the definition of the variable values. Let's imagine we copy paste everything into more jobs for staging and deployments later - someone removes a deprecated distribution, and misses a location. Bingo, happy pipeline debugging.
Variable assignment from global doesn't work unfortunately. Since I learned that !reference
allows to reuse existing job attributes such as script
and rules
(the latter was added in 14.3), I decided to try it with parallel
too.
!reference + parallel:matrix?
Define a template job called .parallel
and add the matrix attribute in there.
.parallel:
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
Then use !reference [<job template name>,<attribute name>]
to merge it into the other jobs with the parallel
attribute.
stages:
- build
- test
.parallel:
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
build:
stage: build
script:
- echo "Building $DISTRIBUTION on $ARCH"
parallel: !reference [.parallel,parallel]
test:
stage: test
script:
- echo "Testing $DISTRIBUTION on $ARCH"
parallel: !reference [.parallel,parallel]
UN-BE-LIEV-ABLE. IT WORKS.
More pipeline efficiency tricks soon. Watch this space and follow me on social.
Update 2021-09-24: Simon shared a great thought on Twitter.
Extends instead of !reference?
Remember the mistake I made with extends to override script
- I was thinking about that. Using extends
is an easier solution, that's correct. I'll share a thought on merge strategies further below, and why I think !reference
has a place on the table.
Let's build the above solution with using extends
:
stages:
- build
- test
.parallel:
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
build:
extends: .parallel
stage: build
script:
- echo "Building $DISTRIBUTION on $ARCH"
test:
extends: .parallel
stage: test
script:
- echo "Testing $DISTRIBUTION on $ARCH"
Select specific attributes with !reference vs. extends merge strategies
One difference with !reference
is that you control to only inherit one specific job attribute into the current scope. extends
merges all attributes, and there is a certain merge strategy involved:
You can use extends
to merge hashes but not arrays. The algorithm used for merge is “closest scope wins,” so keys from the last member always override anything defined on other levels.
Meaning to say, script
gets overridden but variables
gets merged. If someone adds variables
into the job template, all jobs extending it will inherit and merge. This could lead into unexpected behaviour - especially when this happens by accident.
The following example adds variables
and script
into the job template, and variables
next to script
into the jobs. Would you expect that the ENVIRONMENT
variable available in jobs?
stages:
- build
- test
.parallel:
variables:
ENVIRONMENT: prod
script:
- echo "Installing dependencies"
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
build:
extends: .parallel
stage: build
variables:
OPTIMIZE: 1
script:
- echo "Building $DISTRIBUTION on $ARCH"
test:
extends: .parallel
stage: test
variables:
OPTIMIZE: 0
script:
- echo "Testing $DISTRIBUTION on $ARCH"
The merged YAML view
in the pipeline editor helps with debugging. This feature is relatively new, so not everyone may know about it. 💡
The variables
hash gets merged, script
was overridden. In that case, you might want to use !reference
with script
to solve it.
stages:
- build
- test
.parallel:
variables:
ENVIRONMENT: prod
script:
- echo "Installing dependencies"
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
build:
extends: .parallel
stage: build
variables:
OPTIMIZE: 1
script:
- !reference [.parallel,script]
- echo "Building $DISTRIBUTION on $ARCH"
test:
extends: .parallel
stage: test
variables:
OPTIMIZE: 0
script:
- !reference [.parallel,script]
- echo "Testing $DISTRIBUTION on $ARCH"
Or avoid mixing extends
and only use !reference
to visually aid the flow what is merged. That way variables
are not inherited if not explicitly specified.
stages:
- build
- test
.parallel:
variables:
ENVIRONMENT: prod
script:
- echo "Installing dependencies"
parallel:
matrix:
- DISTRIBUTION: [rhel8, ubuntu20]
ARCH: [x64,x86]
build:
#extends: .parallel # May merge additional variables
stage: build
variables:
OPTIMIZE: 1
script:
- !reference [.parallel,script]
- echo "Building $DISTRIBUTION on $ARCH"
parallel: !reference [.parallel,parallel]
test:
#extends: .parallel # May merge additional variables
stage: test
variables:
OPTIMIZE: 0
script:
- !reference [.parallel,script]
- echo "Testing $DISTRIBUTION on $ARCH"
parallel: !reference [.parallel,parallel]
Conclusion v2
Both extends
and !reference
have their advantages. I recommend that you evaluate and practice both strategies for your CI/CD workflows, and document a code style for CI/CD configuration for your team.
I've added the thoughts and feedback above into practical workshop exercises, which will be released later this year. Thanks Simon! :)