Migrate and refactor Refine into Airflow dynamic DAG

Aqu requested to merge T356362_dynamic_dag_poc into main

With this PR, we are taking the first step toward migrating the scheduling of the Refine jobs to Airflow.

The DAG starts with a task that fetches the dataset configurations from the config store. We then map over its result to create one task group per dataset. Within each task group, we wait for the dataset's HDFS _IMPORTED files and then trigger the Spark job.

Airflow does not currently support creating a mapped task inside a mapped task group. To work around this, the URL sensor now accepts a list of URLs, so a single sensor task can wait for all of a dataset's files.
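To illustrate the list-of-URLs check, here is a minimal plain-Python sketch of the logic such a sensor could run on each poke. It is not the sensor code from this MR: the function name `all_urls_exist`, the injected `exists` callback, and the example paths are all hypothetical, and a real sensor would check HDFS instead of an in-memory set.

```python
from typing import Callable, Iterable


def all_urls_exist(urls: Iterable[str], exists: Callable[[str], bool]) -> bool:
    """Return True only when every URL in `urls` is present.

    This mirrors what a sensor's poke would return: False means
    "not ready yet, check again later". An empty list returns True.
    """
    return all(exists(url) for url in urls)


# Hypothetical example: two hourly partitions of one dataset.
urls = [
    "hdfs:///example/dataset_a/hour=00/_IMPORTED",
    "hdfs:///example/dataset_a/hour=01/_IMPORTED",
]

# Stand-in for the file system: only the first flag file exists so far.
present = {urls[0]}
first_poke = all_urls_exist(urls, present.__contains__)

# The second flag file arrives, so the next poke succeeds.
present.add(urls[1])
second_poke = all_urls_exist(urls, present.__contains__)
```

Passing the whole list to one check replaces what would otherwise be one mapped sensor per URL, which is the nesting that a mapped task group does not allow.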

Bug: T356192

Edited by Aqu
