Refactor code to python package and add CI

Gmodena requested to merge add-ci-config into main

This MR adds a Python package (differential_privacy) containing the PySpark DP jobs.

The MR also adds a GitLab CI pipeline for the repo (.gitlab-ci.yml). The pipeline makes it possible to:

  • Automatically run unit tests on push
  • Automatically run linting (flake8) on push
  • Manually run a build job that produces a conda-dist archive of dependencies, compatible with WMF airflow deployments.
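
For readers unfamiliar with GitLab CI, the three jobs listed above could be laid out roughly as follows. This is only an illustrative sketch, not the .gitlab-ci.yml from this MR: the job names, the base image, the use of pytest, the `dist/` output directory, and the `build_conda_dist.sh` script are all assumptions made for the example.

```yaml
# Hypothetical sketch; job names, image, and commands are assumptions,
# not the contents of this MR's actual .gitlab-ci.yml.
stages:
  - test
  - build

default:
  image: python:3.10            # assumed base image

unit-tests:
  stage: test
  script:
    - pip install -e . pytest   # assumes a pip-installable package tested with pytest
    - pytest

lint:
  stage: test
  script:
    - pip install flake8
    - flake8 differential_privacy

build-conda-dist:
  stage: build
  when: manual                  # only runs when triggered by hand from the GitLab UI
  script:
    - ./build_conda_dist.sh     # hypothetical placeholder for the real conda-dist build step
  artifacts:
    paths:
      - dist/                   # assumed output directory for the archive
```

In a layout like this, the two `test` jobs run on every push, while `when: manual` keeps the heavier build job out of the automatic path.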

Testing

This MR has been tested by running the existing tmlt pipelines notebook with the conda environment published at https://gitlab.wikimedia.org/repos/security/differential-privacy/-/packages/158

TODO

The following will be tackled in follow-up MRs.

  • [ ] try to build python-flint from source
  • [ ] try to reduce the conda-dist size by removing pyspark deps (assuming they are available on stat/airflow nodes)
  • [ ] simplify package management by using pyproject and/or poetry. This might break compatibility with our internal tooling and needs testing.
Edited by Gmodena
