
Create a conda env distribution.

Gmodena requested to merge 5-add-ci-integration-package-conda-env-wip into main

@bmansurov @aikochou
This MR introduces the capability to generate and publish relocatable conda environments. It is required by #5 (closed) and is a prerequisite for #3.

Changes

CI job

A new job has been added to the CI pipeline to generate and publish a conda environment. This job uses the conda-dist package from https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils.

Docker and Makefile

I added a Makefile and a Dockerfile to help with local development. They are not strictly necessary for our pipeline, but they allow for local experimentation/dev in an environment that resembles Wikimedia analytics hosts and GitLab's CI container.

I used them to test/troubleshoot workflow_utils, and they might come in handy for point 2 of #5 (closed). @bmansurov @aikochou would this type of build tooling be at all useful for you?

Versioning

I added bump2version to requirements_dev.txt to automate version bumps, and propagate changes to multiple affected files.
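
For readers unfamiliar with the tool, the snippet below is a minimal sketch of the kind of work bump2version automates: incrementing one part of a `major.minor.patch` version and replacing the old version string in each affected file. It is an illustration only, not bump2version's implementation; `bump_version`, `propagate`, and the sample file contents are hypothetical names made up for this example.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with `part` incremented and lower parts reset to 0."""
    major, minor, patch = (int(p) for p in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def propagate(contents: dict[str, str], old: str, new: str) -> dict[str, str]:
    """Replace the old version string in every file body (hypothetical helper)."""
    return {name: text.replace(old, new) for name, text in contents.items()}


# Hypothetical file contents, used only to show the propagation step.
files = {
    "setup.cfg": "version = 0.1.0\n",
    "pkg/__init__.py": '__version__ = "0.1.0"\n',
}
new_version = bump_version("0.1.0", "minor")
updated = propagate(files, "0.1.0", new_version)
```

In practice the tool reads the current version and the list of files from its own configuration, so none of this needs to be written by hand.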

This is mostly prep work for implementing a release cycle and for this point from #5 (closed): triggering conda env builds for main (and possibly development branches). @bmansurov this is something we'll need to think about together, and prepare a proposal for Fabian. I have some ideas, but nothing prescriptive. We'll also need to factor in requirements from DE/Airflow operators.

Known issues

conda-dist occasionally fails

I’ve experienced a few of these:

CondaPackError:
Files managed by conda were found to have been deleted/overwritten in the
following packages:
- ncurses 6.3:
  share/terminfo/2/2621A
  share/terminfo/E/Eterm
  share/terminfo/E/Eterm-color
  + 1054 others

I narrowed down the issue to conda-dist pip-installing with the --prefix=./dist/myenv argument. If I instead invoke pip from inside the environment (e.g. ./dist/myenv/bin/pip install .) the dependencies do not get clobbered and conda-pack generates a valid tarball.
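
The difference between the two invocations can be sketched as follows. This is a hedged illustration of the workaround described above, not the patch prepared for upstream and not conda-dist's actual code; `pip_install_cmd` and its parameters are hypothetical, and the environment is assumed to follow the usual `bin/pip` layout.

```python
from pathlib import Path


def pip_install_cmd(env_dir: str, project: str = ".", use_env_pip: bool = True) -> list[str]:
    """Build the pip command used to install `project` into the env at `env_dir`."""
    env = Path(env_dir)
    if use_env_pip:
        # Workaround: call the pip that lives inside the target environment.
        return [str(env / "bin" / "pip"), "install", project]
    # Invocation that triggered the CondaPackError: an outer pip with --prefix.
    return ["pip", "install", f"--prefix={env}", project]


# Commands for the example environment path mentioned above.
workaround_cmd = pip_install_cmd("./dist/myenv")
problematic_cmd = pip_install_cmd("./dist/myenv", use_env_pip=False)
```

Either list could be passed to `subprocess.run(cmd, check=True)` from the project root. Only the first form avoided the clobbered files in my tests.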

I had a patch ready for upstream... and then suddenly the issue went away by itself. I wanted to document it here in case we encounter a regression.

CI failures

There are a couple of new failures in mypy/flake8. Not a blocker, just flagging them as an FYI.

Edited by Gmodena
