Create a conda env distribution.
@bmansurov @aikochou
This MR introduces the capability to generate and publish relocatable conda environments.
It is required for #5 (closed) and is a prerequisite for #3.
Changes
CI job
A new job has been added to the CI pipeline to generate and publish a conda environment.
This job uses the conda-dist package from https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils.
Docker and Makefile
I added a Makefile and a Dockerfile to help with local development. They are not strictly necessary for our pipeline, but they allow local experimentation/dev in an environment that resembles Wikimedia analytics hosts and GitLab's CI container.
I used this setup to test/troubleshoot workflow_utils, and it might come in handy for point 2 of #5 (closed).
@bmansurov @aikochou would this type of build tooling be at all useful for you?
Versioning
I added bump2version to requirements_dev.txt to automate version bumps and propagate changes to multiple affected files.
This is mostly prep work for implementing a release cycle and for this point from #5 (closed): triggering conda env builds for main (and possibly development branches). @bmansurov this is something we'll need to think about together so we can prepare a proposal for Fabian. I have some ideas, but nothing prescriptive. We'll also need to factor in requirements from DE/Airflow operators.
Known issues
conda-dist occasionally fails
I’ve experienced a few of these:
CondaPackError:
Files managed by conda were found to have been deleted/overwritten in the
following packages:
- ncurses 6.3:
share/terminfo/2/2621A
share/terminfo/E/Eterm
share/terminfo/E/Eterm-color
+ 1054 others
I narrowed down the issue to conda-dist pip-installing with the --prefix=./dist/myenv argument.
If I instead invoke pip from inside the environment (e.g. ./dist/myenv/bin/pip install .) the dependencies do not get clobbered and conda-pack generates a valid tarball.
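To make the difference between the two invocations concrete, here is a minimal sketch in Python. The function names are hypothetical and this is not how conda-dist is implemented; the exact pip command line that conda-dist runs is an assumption based on the --prefix argument mentioned above. It only contrasts the two command shapes, assuming a POSIX layout where the environment's pip lives under bin/.

```python
import os
import subprocess
from typing import List


def prefix_install_cmd(env_prefix: str, project_dir: str = ".") -> List[str]:
    """Command shape that was observed to clobber conda-managed files.

    Assumption: conda-dist runs something equivalent to this; the exact
    invocation is not confirmed here.
    """
    return ["pip", "install", f"--prefix={env_prefix}", project_dir]


def in_env_install_cmd(env_prefix: str, project_dir: str = ".") -> List[str]:
    """Command shape that uses the pip binary shipped inside the environment."""
    return [os.path.join(env_prefix, "bin", "pip"), "install", project_dir]


def install_into_env(env_prefix: str, project_dir: str = ".") -> None:
    """Run the in-environment install; raises CalledProcessError on failure."""
    subprocess.run(in_env_install_cmd(env_prefix, project_dir), check=True)


if __name__ == "__main__":
    # Print both variants side by side without executing anything.
    print(" ".join(prefix_install_cmd("./dist/myenv")))
    print(" ".join(in_env_install_cmd("./dist/myenv")))
```

Calling install_into_env("./dist/myenv") would run the workaround described above, provided the environment already exists at that path.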
I had a patch ready for upstream... and then suddenly the issue went away by itself. I am documenting it here in case we encounter a regression.
CI failures
There are a couple of new failures in mypy/flake8. Not a blocker, just flagging as an FYI.
- mypy: https://gitlab.wikimedia.org/repos/research/knowledge-gaps/-/jobs/11676#L58 (we just need to decorate wmfdata imports with # type: ignore).
- flake8: undefined name at https://gitlab.wikimedia.org/repos/research/knowledge-gaps/-/jobs/11675#L57
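For the mypy item, the fix might look roughly like the sketch below. The HAVE_WMFDATA guard is a hypothetical addition so the snippet runs even where wmfdata is not installed; in the repository only the trailing comment on the existing import lines should be needed.

```python
import importlib.util

# Hypothetical guard: lets this snippet run on machines without wmfdata.
HAVE_WMFDATA = importlib.util.find_spec("wmfdata") is not None

if HAVE_WMFDATA:
    # The trailing comment tells mypy to skip type checking for this import.
    import wmfdata  # type: ignore
```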