Commit 2755e414 authored by Ottomata

Merge branch 'gitlab-ci-snippets' into 'main'

Gitlab CI templates for using, building, and publishing conda envs

See merge request repos/data-engineering/workflow_utils!11
parents fabfbaf9 1692be39
# A pipeline is composed of independent jobs that run scripts, grouped into stages.
# Stages run in sequential order, but jobs within stages run in parallel.
# For more information, see:
image: ""
# Include conda.yml to install and activate a conda env for CI jobs below.
- local: gitlab_ci_templates/conda.yml
- test
- release
- publish
# Python version to use in the miniconda environment.
# This is just the miniconda version; conda can still
# make new conda envs with a different python version.
# Version of miniconda installer to download.
# ./bin/install-miniconda-env will install a conda env here.
MINICONDA_ENV_PREFIX: /srv/miniconda
# A set of commands that will be executed before each job.
# They resolve a number of deps missing from the docker image,
# that are required to test, build and publish the wmf_workflowutils package.
# TODO: we should consider providing our own
# internal docker images with miniconda already installed.
- apt update
# the miniconda installer used by this package requires curl and perl (shasum)
- apt install -y curl perl ca-certificates openssh-client git
# Install a miniconda env into $MINICONDA_ENV_PREFIX
- ./bin/install-miniconda-env
# To test against multiple python versions, we define
# CI jobs that run on docker containers shipping the target
......@@ -46,13 +21,9 @@ test:
# is configured to use these versions.
- python_version: ['3.7', '3.9']
- source $MINICONDA_ENV_PREFIX/bin/activate
- pip install nox==2022.1.7
# Run the CI nox session
- nox --session test --python ${python_version}
# We might still want to allow the publish stage
# when tests pass but lint/mypy fails.
allow_failure: false
......@@ -70,7 +41,6 @@ bump-version-patch:
- git config --global "${CI_GIT_USER_USERNAME}"
- git remote set-url origin${CI_PROJECT_PATH}.git
- source $MINICONDA_ENV_PREFIX/bin/activate
- pip install nox==2022.1.7
- SEMVER_RELEASE_TYPE=patch nox -s bump_version
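The `bump_version` nox session itself is not shown in this diff. As a rough illustration of what a `SEMVER_RELEASE_TYPE` of `patch`, `minor`, or `major` means for a version string such as `0.1.0`, here is a minimal Python sketch. The function name and its behaviour are assumptions for illustration only, not the project's actual implementation.

```python
def bump_version(version: str, release_type: str) -> str:
    """Return `version` bumped according to a semver release type.

    Hypothetical helper: the real bump_version nox session may work differently.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if release_type == "major":
        return f"{major + 1}.0.0"
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"
    if release_type == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown release type: {release_type!r}")
```

Under these assumptions, `bump_version("0.1.0", "patch")` gives `0.1.1`, while a `minor` bump gives `0.2.0`.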
......@@ -84,7 +54,6 @@ publish-wheel:
stage: publish
when: manual
- source $MINICONDA_ENV_PREFIX/bin/activate
- pip install twine==3.8 build==0.7.0
- python -m build --sdist --wheel .
- TWINE_PASSWORD=${CI_JOB_TOKEN} TWINE_USERNAME=gitlab-ci-token python3 -m twine upload --repository-url ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/pypi dist/* --verbose
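To make the shape of the twine invocation above easier to see, the following Python sketch assembles the same command line and environment from the GitLab CI variables. The helper name `build_twine_upload` is hypothetical, and the sketch only builds the arguments; it does not run twine.

```python
from typing import Dict, List, Tuple


def build_twine_upload(
    api_v4_url: str, project_id: str, job_token: str, dist_files: List[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and env for uploading dist files to a project's PyPI registry.

    Hypothetical helper mirroring the CI script line; values normally come from
    CI_API_V4_URL, CI_PROJECT_ID and CI_JOB_TOKEN.
    """
    repository_url = f"{api_v4_url}/projects/{project_id}/packages/pypi"
    argv = ["python3", "-m", "twine", "upload",
            "--repository-url", repository_url, "--verbose", *dist_files]
    # The job token authenticates as the special gitlab-ci-token user.
    env = {"TWINE_USERNAME": "gitlab-ci-token", "TWINE_PASSWORD": job_token}
    return argv, env
```

The returned `argv` and `env` could be handed to `subprocess.run(argv, env={**os.environ, **env}, check=True)` if one wanted to drive the upload from Python rather than from the shell.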
......@@ -33,22 +33,21 @@ use docker images in Hadoop YARN, we use conda environments instead.
### tl;dr
# Download Dockerfile.conda-dist and use it to build a Docker image
# that can build your conda dist env.
curl > /tmp/Dockerfile.conda-dist
docker build -t workflow_utils:conda_dist -f /tmp/Dockerfile.conda-dist .
# Make sure we are in our local project directory.
cd ./my-project
# Use the workflow_utils:conda_dist image to build a packed conda dist environment
# for your project.
docker run --mount type=bind,source=$(pwd),target=/srv/project workflow_utils:conda_dist
Include the conda-dist.yml gitlab-ci template in your project's `.gitlab-ci.yml`:
- project: 'repos/data-engineering/workflow_utils'
ref: main
file: '/gitlab_ci_templates/conda-dist.yml'
# make sure you define a 'publish' stage:
# ... test, whatever else your project uses.
- publish
If all goes well, after this finishes you will have a packed conda dist environment
at e.g. ./dist/conda_dist_env.2021-12-28T15.00.00.tgz.
In your project's CI/CD pipelines, you can now manually trigger a run of the `publish_conda_env` job to publish
a .tgz file of your project and its dependencies.
### `conda-dist`
......@@ -124,37 +123,12 @@ used when creating the dist env:
To use `conda-dist`, you'll either need to have workflow_utils installed and in your path,
or you can build and use a Docker image with workflow_utils installed.
#### `conda-dist` docker image
#### `conda-dist` gitlab CI template
The recommended way of creating your conda dist env is to
use the provided Dockerfile to create a Docker image
with conda and workflow_utils installed, and then
use that Docker image to create a conda dist env
for your project.
We don't publish a workflow_utils Docker image anywhere,
so the easiest way to get this Docker image is to
download the Dockerfile.conda-dist file from
the git repository and use it to build the Docker image.
curl > /tmp/Dockerfile.conda-dist
docker build -t workflow_utils:conda_dist -f /tmp/Dockerfile.conda-dist .
Once your docker image is built locally, you can use it to generate your
project's conda dist env.
# Make sure we are in our local project directory.
cd ./transformer
docker run --mount type=bind,source=$(pwd),target=/srv/project workflow_utils:conda_dist
If all goes well, after this finishes you will have a packed conda dist environment
at e.g. ./dist/conda_dist_env.2021-12-28T15.00.00.tgz.
use the provided gitlab-ci template to build and publish your conda env to gitlab.
See tl;dr above.
#### `conda-dist` local CLI
# Includes conda.yml, and adds a manual publish_conda_env job
# to automate publishing a conda env of your job repository.
- local: gitlab_ci_templates/conda.yml
# Override this if you want to publish to a different package registry.
# As is, this will publish to the current project's generic package registry.
# TODO: Consider defaulting to a common gitlab project for artifacts, e.g. data-engineering/artifacts ?
PACKAGE_REGISTRY_URI: ${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic
# Override this if your python project name is different than your gitlab repo name.
# Stored as a variable to more easily put the content into a file and
# execute it with a python interpreter.
# When getting the project version, we need to do so using a python environment where the
# project is actually installed. We only need the project version when we are publishing
# the conda env to Gitlab Package Registry, so we wait until we build the conda dist env
# with this project installed into it, and then run this script from that conda dist env
# to get the project version.
import os
try:
    from importlib.metadata import version
    project_version = version(project_name)
except ModuleNotFoundError:
    # importlib.metadata is not built into python until python 3.8
    import pkg_resources
    project_version = pkg_resources.get_distribution(project_name).version
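The version lookup described in the comments above can be sketched as a standalone function. This is a minimal illustration, not the template's exact script: the function name is an assumption, and the import is deliberately separated from the lookup because `importlib.metadata.PackageNotFoundError` is a subclass of `ModuleNotFoundError`, so a missing package would otherwise be mistaken for a missing module.

```python
def get_project_version(project_name: str) -> str:
    """Return the installed version of `project_name`.

    Hypothetical helper; must run in the python environment where the
    project is installed (here, the conda dist env).
    """
    try:
        # Python 3.8+: importlib.metadata is part of the standard library.
        from importlib.metadata import version
    except ModuleNotFoundError:
        # Older pythons: fall back to setuptools' pkg_resources.
        import pkg_resources
        return pkg_resources.get_distribution(project_name).version
    return version(project_name)
```

In the CI job this would be run with the conda dist env's interpreter, for example by printing `get_project_version(os.environ["PROJECT_NAME"])`, assuming the job exports `PROJECT_NAME`.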
# Runs conda-dist to build a conda_dist_env.tgz file for the project.
stage: publish
when: manual
# Manually publish conda envs to this project's package registry.
# TODO: use a common data-engineering/artifacts registry(?)
# Install data-engineering/workflow-utils to get conda-dist
# TODO install this from a gitlab pypi repo.
- pip install git+
# Use conda-dist to build a packed conda environment with this project and its dependencies.
- conda-dist
# Run PROJECT_VERSION_SCRIPT from conda dist env's python after conda-dist has installed
# the project into it. This allows the script to use importlib.metadata to get the version
# number of the package installed in the conda dist env.
- echo -e "${PROJECT_VERSION_SCRIPT}" > /tmp/
- PROJECT_VERSION=$(./dist/conda_dist_env/bin/python /tmp/
# Store the PROJECT_VERSION in a .env file.
# This will be handed to the publish job as a reports dotenv artifact,
# so that it will be usable in the script there.
- echo "PROJECT_VERSION=${PROJECT_VERSION}" >> build.dotenv
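The dotenv hand-off between jobs is just a file of `KEY=VALUE` lines. The Python sketch below shows the same idea of appending a variable and reading it back. Both helper names are hypothetical, and GitLab itself does the reading when the file is declared as a dotenv report artifact.

```python
from typing import Dict


def append_dotenv(path: str, key: str, value: str) -> None:
    """Append one KEY=VALUE line, like `echo "KEY=VALUE" >> build.dotenv`."""
    with open(path, "a", encoding="utf-8") as dotenv_file:
        dotenv_file.write(f"{key}={value}\n")


def read_dotenv(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines."""
    variables: Dict[str, str] = {}
    with open(path, encoding="utf-8") as dotenv_file:
        for line in dotenv_file:
            line = line.strip()
            if not line:
                continue
            key, _, value = line.partition("=")
            variables[key] = value
    return variables
```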
name: ${PROJECT_NAME}.artifacts
# Add the conda dist env tgz file as an artifact that can be downloaded
# in the gitlab CI/CD Pipelines UI.
- dist/conda_dist_env.tgz
# Expose the conda dist env tgz file in Gitlab Merge Requests UI too.
expose_as: conda distribution environment tgz
expire_in: 7 days
# This is used by publish_conda_dist_env to use $PROJECT_VERSION when publishing
# the conda dist env tgz artifact.
# Publishes the conda_dist_env.tgz file built in the build_conda_env job to a Gitlab Package Registry.
stage: publish
# TODO either keep this manual, or use rules to automatically do this if the job is running on a tag commit.
when: manual
- build_conda_env
- apt-get update
- apt-get -y install curl ca-certificates
- 'echo "Publishing conda dist env to ${PACKAGE_REGISTRY_URI}/${PROJECT_NAME}/${PROJECT_VERSION}/$PROJECT_NAME-$PROJECT_VERSION.conda.tgz"'
- 'curl -v --header "JOB-TOKEN: ${CI_JOB_TOKEN}" --upload-file "./dist/conda_dist_env.tgz" "${PACKAGE_REGISTRY_URI}/${PROJECT_NAME}/${PROJECT_VERSION}/${PROJECT_NAME}-${PROJECT_VERSION}.conda.tgz"'
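The upload target used by the `curl` command above follows a fixed pattern: registry URI, then package name, then version, then file name. A small Python sketch of that URL construction, with a hypothetical function name and an example host that is not taken from the source:

```python
def conda_env_upload_url(registry_uri: str, project_name: str, project_version: str) -> str:
    """Build the generic package registry URL for the packed conda env.

    Hypothetical helper mirroring
    ${PACKAGE_REGISTRY_URI}/${PROJECT_NAME}/${PROJECT_VERSION}/${PROJECT_NAME}-${PROJECT_VERSION}.conda.tgz
    """
    file_name = f"{project_name}-{project_version}.conda.tgz"
    return f"{registry_uri.rstrip('/')}/{project_name}/{project_version}/{file_name}"


# Example with made-up values for the registry URI.
print(conda_env_upload_url(
    "https://gitlab.example/api/v4/projects/1/packages/generic",
    "workflow-utils",
    "0.2.0",
))
```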
# Installs conda and some other useful debian packages.
# Conda will be installed into /opt/conda.
# By including this gitlab-ci template, your jobs will install and activate a miniconda environment.
image: ""
- echo "deb buster-wikimedia thirdparty/conda" >> /etc/apt/sources.list.d/wikimedia.list
- apt update
- apt install -y curl gpg git make ca-certificates conda
# TODO install openjdk-8-jdk, but there is a bug in debian buster docker?
# Activate the base miniconda environment for use in pipeline jobs that include this CI template.
- source /opt/conda/etc/profile.d/
- conda activate
name = workflow-utils
version = 0.1.0
version = 0.2.0
author = Andrew Otto
author_email =
description = Common libraries for data engineering workflows at WMF.
# Installs miniconda and workflow_utils into a conda env in the container.
# This container can then be used to run workflow_utils commands like
# conda-dist.
FROM AS workflow_utils_install
RUN apt-get update && apt-get install -y wget curl ca-certificates git
# Download our install-miniconda-env bash script to bootstrap a working miniconda environment.
# You can override the version of miniconda and python installed here by setting
# These must refer to an installer version available at
# These versions ARE NOT the versions of conda or python that will be in your conda dist env.
# Those should be controlled by your project in a conda environment.yml file.
# TODO: if we can put this built image somewhere, we don't need to download this at build time,
# but can COPY it in from local FS.
RUN wget -O /srv/install-miniconda-env && chmod 755 /srv/install-miniconda-env
# Installs by default to /srv/miniconda
RUN /srv/install-miniconda-env
# TODO: perhaps parameterize this requirement URL?
# If we could just publish this docker image somewhere, then this should just be the working copy of the repo.
RUN /srv/miniconda/bin/pip install git+
# A new layer that now only contains the workflow_utils conda env.
# This conda env can be used to run workflow_utils related commands,
# especially conda-dist for automating packing of other projects
# into conda envs.
FROM AS workflow_utils_env
# Copy /srv/miniconda from the previous stage.
# In this way we keep only the final conda env
# without any of the build step leftovers.
COPY --from=workflow_utils_install /srv/miniconda /srv/miniconda
# workflow_utils conda-dist ENTRYPOINT.
# Usage:
# docker run --mount type=bind,source=$(pwd)/test_project,target=/srv/project workflow_utils:conda_dist
# mount source should be the path to your project dir, which should be mounted at /srv/project
# in the container.
FROM workflow_utils_env as workflow_utils_conda_dist
# This is needed for conda-dist to find conda CLI.
ENV CONDA_EXE=/srv/miniconda/condabin/conda
SHELL ["/bin/bash", "-c"]
ENTRYPOINT /srv/miniconda/bin/conda-dist --dist-env-prefix=/srv/conda_dist_env --dist-env-dest=/srv/project/dist/conda_dist_env.$(date +%Y-%m-%dT%H.%M.%S).tgz /srv/project
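The ENTRYPOINT above names the output file using `date +%Y-%m-%dT%H.%M.%S`. For readers who want to predict or match that file name from Python, here is a minimal sketch of the same timestamp format. The function name is an assumption for illustration.

```python
from datetime import datetime


def dist_env_file_name(now: datetime) -> str:
    """Return a file name like conda_dist_env.2021-12-28T15.00.00.tgz.

    Hypothetical helper that mirrors the shell `date` format used in the ENTRYPOINT.
    """
    timestamp = now.strftime("%Y-%m-%dT%H.%M.%S")
    return f"conda_dist_env.{timestamp}.tgz"
```

Calling it with `datetime(2021, 12, 28, 15, 0, 0)` yields the example name used in the README, `conda_dist_env.2021-12-28T15.00.00.tgz`.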