- 23 Dec, 2021 8 commits
-
-
Gmodena authored
Add cookiecutter_replay to gitignore. See merge request gmodena/platform-airflow-dags!27
-
Gmodena authored
-
Luke Bowmaker authored
Onboard sample-project data pipeline See merge request gmodena/platform-airflow-dags!25
-
-
Gmodena authored
Update unit test. Updated expected output after default path changes. See merge request gmodena/platform-airflow-dags!24
-
Gmodena authored
-
Gmodena authored
T295360 more path normalisation Align conventions with documentation. See merge request gmodena/platform-airflow-dags!23
-
Gmodena authored
-
- 22 Dec, 2021 6 commits
-
-
Gmodena authored
Set dag owner. The property is required for a dag to be picked up by the scheduler, and displayed in DAG UI. See merge request gmodena/platform-airflow-dags!22
-
Gmodena authored
The property is required for a dag to be picked up by the scheduler, and displayed in DAG UI.
-
Gmodena authored
Normalise path layout in factories and cookiecutter

This MR fixes some inconsistencies between dags boilerplate and the cookiecutter template:

* the expected venv location has moved one level up in the deployed project home
* project dir (cookiecutter config) is not part of pipelines home; we let boilerplate chain dirs together.

See merge request gmodena/platform-airflow-dags!21
-
Gmodena authored
-
Gmodena authored
T295360 datapipeline scaffolding

This merge request adds a cookiecutter template to scaffold new data pipelines as described in https://phabricator.wikimedia.org/T295360. This template provides

* Integration with our tox config (mypy/flake8/pytest)
* A PySpark job template
* A pytest template for pyspark code
* An Airflow dag template to help users get started.

# Structure changes

The project directory largely follows `image-matching`'s structure. Notable changes are:

* Python code has been moved under `pyspark`
* Python code is pip installable. This allows us to package deps at build time, and eases spark deployment (e.g. we don't need to pass each module like `--files schema.py`; imports will be resolved from the `venv`).

# How to test

Check out the `T295360-datapipeline-scaffolding` branch. A new datapipeline can be created with:

```
make datapipeline
```

This will generate a new directory for pipeline code under:

```bash
your_data_pipeline
```

And install an Airflow dag template under

```
dags/your_data_pipeline_dag.py
```

From the top level directory, you can now run `make test-dags`. The command will check that `dags/your_data_pipeline_dag.py` is a valid airflow dag. The output should look like this:

```
make test-dags
---------- coverage: platform linux, python 3.7.11-final-0 -----------
Name                                    Stmts   Miss  Cover
-----------------------------------------------------------
dags/factory/sequence.py                   70      3    96%
dags/ima.py                                49      5    90%
dags/similarusers-train-and-ingest.py      20      0   100%
dags/your_data_pipeline_dag.py             19      0   100%
-----------------------------------------------------------
TOTAL                                     158      8    95%
=========================== 8 passed, 8 warnings in 12.75s ===========================
______________________________________ summary ____________
```

See merge request gmodena/platform-airflow-dags!16
-
Gmodena authored
-
- 17 Dec, 2021 1 commit
-
-
Gmodena authored
Install openjdk in GitHub action. Conda-vendored openjdk shows flaky behaviour with the rest of the build pipeline. This change installs AdoptOpenJDK directly on the host system. See merge request gmodena/platform-airflow-dags!18
-
- 16 Dec, 2021 1 commit
-
-
Gmodena authored
Conda-vendored openjdk shows flaky behaviour with the rest of the build pipeline. This change installs AdoptOpenJDK directly on the host system.
-
- 15 Dec, 2021 2 commits
-
-
Gmodena authored
Install openjdk in base image See merge request gmodena/platform-airflow-dags!17
-
Gmodena authored
-
- 13 Dec, 2021 1 commit
-
-
Clarakosi authored
Modify pipeline to be compatible with algo_v2 changes See merge request gmodena/platform-airflow-dags!13
-
- 09 Dec, 2021 1 commit
-
-
Clarakosi authored
-
- 08 Dec, 2021 2 commits
- 24 Nov, 2021 3 commits
-
-
Gmodena authored
Fix tar archive generation. See merge request gmodena/platform-airflow-dags!15
-
Gmodena authored
-
Gmodena authored
Fixes relative import issues when running code checks in docker containers.
-
- 23 Nov, 2021 2 commits
-
-
Gmodena authored
- 11 Nov, 2021 9 commits
-
-
Gmodena authored
Fix typo in README.md See merge request gmodena/platform-airflow-dags!12
-
Gmodena authored
-
Gmodena authored
Test job failures See merge request gmodena/platform-airflow-dags!11
-
Gmodena authored
-
Gmodena authored
Ci demo cleanup See merge request gmodena/platform-airflow-dags!10
-
Gmodena authored
-
Gmodena authored
-
Gmodena authored
-
Gmodena authored
-
- 10 Nov, 2021 4 commits
-
-
Gmodena authored
-
Gmodena authored
T292741 implement ci checks w tox See merge request gmodena/platform-airflow-dags!9
-
Gmodena authored
-