A new job has been added to the CI pipeline to generate and publish a conda environment.
This job uses the conda-dist package from https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils.
Docker and Makefile
I added a Makefile and a Dockerfile to help with local development. They are not strictly necessary for our pipeline, but they allow local experimentation and development in an environment that resembles the Wikimedia analytics hosts and GitLab's CI container.
I added bump2version to requirements_dev.txt to automate version bumps and propagate the change to every affected file.
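As a rough illustration of what propagating a version bump to multiple files involves, the sketch below increments a patch version and rewrites the old string wherever it appears. It is not how bump2version is implemented: the helper names `bump_patch` and `propagate_version` are hypothetical, and the real tool is driven by its own configuration file rather than by code like this.

```python
from pathlib import Path


def bump_patch(version: str) -> str:
    """Return `version` (MAJOR.MINOR.PATCH) with the patch component incremented."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def propagate_version(files, old: str, new: str):
    """Replace `old` with `new` in each file and return the paths that changed."""
    changed = []
    for path in map(Path, files):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))
            changed.append(path)
    return changed
```

A call such as `propagate_version(["setup.cfg"], "0.1.0", bump_patch("0.1.0"))` would touch only the files that actually contain the old version string; the file name here is just an example.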
This is mostly prep work for implementing a release cycle and for this point from #5 (closed): triggering a conda env build for main (and possibly for development branches). @bmansurov this is something we'll need to think about together before preparing a proposal for Fabian. I have some ideas, but nothing prescriptive. We'll also need to factor in requirements from DE/Airflow operators.
conda-dist occasionally fails
I’ve experienced a few of these:
CondaPackError: Files managed by conda were found to have been deleted/overwritten in the following packages:
- ncurses 6.3:
    share/terminfo/2/2621A
    share/terminfo/E/Eterm
    share/terminfo/E/Eterm-color
    + 1054 others
I narrowed down the issue to
conda-dist pip-installing with the
If I instead invoke pip from inside the environment (e.g. ./dist/myenv/bin/pip install .), the dependencies do not get clobbered and conda-pack generates a valid tarball.
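For reference, the workaround could be scripted roughly as follows. This is only a sketch of the manual step described above, not the patch that was prepared for upstream; the helper names `env_pip_command` and `install_project` are hypothetical, and the code assumes a POSIX environment layout with pip at `bin/pip`.

```python
import subprocess
from pathlib import Path


def env_pip_command(env_dir, *pip_args):
    """Build a command line that runs the pip executable inside `env_dir`."""
    return [str(Path(env_dir) / "bin" / "pip"), *pip_args]


def install_project(env_dir, project_dir="."):
    """Install `project_dir` with the environment's own pip; raise on failure."""
    cmd = env_pip_command(env_dir, "install", str(project_dir))
    subprocess.run(cmd, check=True)
```

Calling `install_project("./dist/myenv")` from the project root would mirror the command shown above, and conda-pack can then be run on the environment as usual.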
I had a patch ready for upstream... and then suddenly the issue went away by itself. I wanted to document it here in case we encounter a regression.
There are a couple of new failures in mypy/flake8. Not a blocker, just flagging as an FYI.
- mypy: https://gitlab.wikimedia.org/repos/research/knowledge-gaps/-/jobs/11676#L58 (we just need to decorate the flagged line with # type: ignore).
- flake8: undefined name at https://gitlab.wikimedia.org/repos/research/knowledge-gaps/-/jobs/11675#L57