
Draft: Adding support for "produced_by" configuration of datasets

Aleksandar Mastilovic requested to merge feature/Datasets-Config-Upgrade into main

Implementation of the proposal outlined in https://phabricator.wikimedia.org/T372647

Airflow producer dataset annotation

NOTE: Automatic configuration of execution_delta based on the target DAG's schedule is not yet implemented.

Example:

  produced_by:
    airflow:
      instance: search
      dag: dummy_dag
      task_group: dummy_grouped_tasks
      task: dummy_standalone_task
  • If a produced_by configuration is present for any Dataset implementation, get_sensor_for returns a configured external task sensor.
  • The sensor type depends on whether produced_by refers to the Airflow instance the DAG code is running on: if it does, get_sensor_for returns the basic ExternalTaskSensor; otherwise it returns a RestExternalTaskSensor.
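
The selection rule described above can be sketched as follows. This is a minimal illustration, not the code in this merge request: the function name choose_sensor_kind, the local_instance argument, the "analytics" instance name, and the None result when produced_by is absent are all assumptions made for the example. It returns only the name of the sensor class rather than constructing a real sensor, so it runs without Airflow installed.

```python
from typing import Optional


def choose_sensor_kind(dataset_config: dict, local_instance: str) -> Optional[str]:
    """Return the name of the sensor class implied by a dataset's config.

    Hypothetical helper: mirrors the rule in the description, not the real
    get_sensor_for implementation.
    """
    airflow_cfg = dataset_config.get("produced_by", {}).get("airflow")
    if airflow_cfg is None:
        # Assumption: without produced_by there is no external task sensor.
        return None
    if airflow_cfg["instance"] == local_instance:
        # The producer DAG lives on the same Airflow instance.
        return "ExternalTaskSensor"
    # The producer DAG lives on a different Airflow instance.
    return "RestExternalTaskSensor"


# Same keys as the example configuration in this description.
config = {
    "produced_by": {
        "airflow": {
            "instance": "search",
            "dag": "dummy_dag",
            "task": "dummy_standalone_task",
        }
    }
}

same_instance = choose_sensor_kind(config, local_instance="search")
other_instance = choose_sensor_kind(config, local_instance="analytics")
```

Here same_instance names the basic sensor because the DAG code runs on the "search" instance, while other_instance names the REST-based sensor because the producer is on a different instance.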
