Airflow DAG
- Implement an Airflow DAG to run the content gap pipelines (a rough sketch of such a DAG is included after this list)
- The DAG should go in https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/tree/main/research/dags, and use the conda environment described in the next two points
- The conda environment can be packaged using https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils and put on HDFS (see the packaging sketch after this list)
- The conda env on HDFS is configured as an artifact in https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/tree/main/research/config
- Scheduled to run when a new snapshot is detected, using a sensor trigger (to be implemented by Data Engineering?)
- Failure alerting goes to email (address TBD, research team email?)
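
A minimal sketch of what the DAG could look like, using only upstream Airflow operators rather than the wmf_airflow_common helpers in the airflow-dags repo (the real implementation should follow that repo's conventions). The DAG id, HDFS paths, Hive table/partition spec, alert address, and job entry point below are all assumptions, not actual names:

```python
"""Sketch only: all ids, paths, and addresses are placeholders."""
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.hive.sensors.named_hive_partition import (
    NamedHivePartitionSensor,
)

# Assumption: the packaged conda env registered as an artifact in
# research/config ends up at an HDFS path along these lines.
CONDA_ENV = "hdfs:///user/analytics-research/artifacts/content_gaps_conda_env.tgz"

default_args = {
    "owner": "research",
    "email": ["research-alerts@example.org"],  # TBD, see the bullet above
    "email_on_failure": True,
    "retries": 1,
    "retry_delay": timedelta(hours=1),
}

with DAG(
    dag_id="content_gap_metrics",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@monthly",      # snapshots land monthly
    catchup=False,
    default_args=default_args,
    tags=["research"],
) as dag:
    # Sensor trigger: wait until the new snapshot partition exists.
    # Table name and partition layout are assumptions.
    wait_for_snapshot = NamedHivePartitionSensor(
        task_id="wait_for_snapshot",
        partition_names=[
            "wmf.mediawiki_history/snapshot="
            "{{ data_interval_start.strftime('%Y-%m') }}"
        ],
        poke_interval=60 * 60,          # check once an hour
        timeout=7 * 24 * 60 * 60,       # give up after a week
        mode="reschedule",
    )

    # Run the pipeline with Spark, shipping the packaged conda env as an
    # archive so the job runs inside it. Script path and flags are placeholders.
    run_pipeline = BashOperator(
        task_id="run_content_gap_pipeline",
        bash_command=(
            "spark3-submit "            # or plain spark-submit, depending on the launcher
            f"--archives {CONDA_ENV}#conda_env "
            "--conf spark.executorEnv.PYSPARK_PYTHON=conda_env/bin/python "
            "hdfs:///user/analytics-research/jobs/content_gap_job.py "
            "--snapshot {{ data_interval_start.strftime('%Y-%m') }}"
        ),
    )

    wait_for_snapshot >> run_pipeline
```

The email alerting is handled through `email_on_failure` in `default_args`, and the sensor's `mode="reschedule"` frees the worker slot between pokes while waiting for the snapshot.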
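
Packaging sketch: in practice the environment should be packaged with workflow_utils as noted above; the snippet below is only an illustration of the general shape (conda-pack plus an HDFS upload), not the workflow_utils API, and all names and paths are hypothetical:

```python
"""Illustration only: workflow_utils should do the real packaging."""
import subprocess

import conda_pack  # third-party package: conda-pack

LOCAL_TARBALL = "content_gaps_conda_env.tgz"                 # hypothetical name
HDFS_DEST = "hdfs:///user/analytics-research/artifacts/"     # hypothetical path

# Pack the (already created) conda environment into a relocatable tarball.
conda_pack.pack(name="content_gaps", output=LOCAL_TARBALL)

# Put the tarball on HDFS so the DAG can pull it as an archive/artifact.
subprocess.run(
    ["hdfs", "dfs", "-put", "-f", LOCAL_TARBALL, HDFS_DEST],
    check=True,
)
```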