Refactor code to python package and add CI

Gmodena requested to merge add-ci-config into main

This MR adds a Python package (differential_privacy) of PySpark DP jobs.

The MR also adds a GitLab CI pipeline for the repo (.gitlab-ci.yml). The pipeline makes it possible to:

  • Automatically run unit tests on push
  • Automatically run linting (flake8) on push
  • Manually run a build job that produces a conda-dist archive of dependencies, compatible with WMF airflow deployments.
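
For readers unfamiliar with GitLab CI, a pipeline of this shape could look roughly like the sketch below. This is not the .gitlab-ci.yml from this MR: the base image, job names, install steps, output directory, and the build script name (build_conda_dist.sh) are all assumptions made for illustration. Only the overall structure follows the description above: two jobs that run on every push, plus one manually triggered build job.

```yaml
# Hypothetical sketch only; see .gitlab-ci.yml in this MR for the real config.
stages:
  - test
  - build

default:
  image: python:3.10          # assumed base image

unit-tests:
  stage: test
  script:
    - pip install -e .        # assumes the package is pip-installable
    - pip install pytest      # assumes pytest is the test runner
    - pytest

lint:
  stage: test
  script:
    - pip install flake8
    - flake8 differential_privacy

build-conda-dist:
  stage: build
  when: manual                # runs only when triggered from the GitLab UI
  script:
    - ./build_conda_dist.sh   # hypothetical placeholder for the actual build command
  artifacts:
    paths:
      - dist/                 # assumed output directory for the archive
```

Marking the build job with `when: manual` keeps the large conda-dist archive from being rebuilt on every push, while tests and linting still run automatically.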


This MR has been tested by running the existing tmlt pipelines notebook using the conda environment published at


The following will be tackled in follow-up MRs:

  • [] try to build python-flint from source
  • [] try to reduce the conda-dist size by removing pyspark deps (assuming they are available on stat/airflow nodes)
  • [] simplify package management by using pyproject and/or poetry. This might break compatibility with our internal tooling and needs testing.
Edited by Gmodena

