    Adding support for "produced_by" configuration of datasets · 1f183ef3
    Aleksandar Mastilovic authored
    * Linter fix
    
    * Addressing code review nitpick, fix CI/CD pipeline
    
    * Fixing ensure_multiple_reviewers.py - mr.discussions.list() needs an extra param to retrieve all of the comments
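A minimal sketch of the pagination pitfall behind that fix, assuming the script uses python-gitlab, where `list()` returns only the first page of results unless `get_all=True` (or `iterator=True`) is passed. The commit does not name the parameter it added; `get_all=True`, the helper `collect_discussion_authors` and the fake manager are assumptions for illustration, with stubs standing in for a real merge request.

```python
from types import SimpleNamespace
from typing import Any, List, Set


def collect_discussion_authors(mr: Any) -> Set[str]:
    """Return the usernames of everyone who commented on a merge request.

    Hypothetical helper; assumes python-gitlab, where list() returns only
    the first page of results unless get_all=True is passed.
    """
    authors: Set[str] = set()
    for discussion in mr.discussions.list(get_all=True):
        # Each discussion holds one or more notes (individual comments).
        for note in discussion.attributes["notes"]:
            authors.add(note["author"]["username"])
    return authors


class FakeDiscussionManager:
    """Test double mimicking python-gitlab's paginated list() behaviour."""

    def __init__(self, pages: List[List[Any]]) -> None:
        self.pages = pages

    def list(self, get_all: bool = False, **kwargs: Any) -> List[Any]:
        if get_all:
            return [d for page in self.pages for d in page]
        return self.pages[0][:]  # first page only, like the real default


def make_discussion(username: str) -> Any:
    return SimpleNamespace(attributes={"notes": [{"author": {"username": username}}]})


# Two pages of discussions: without get_all=True, "bob" would be missed.
mr = SimpleNamespace(
    discussions=FakeDiscussionManager([[make_discussion("alice")], [make_discussion("bob")]])
)
print(sorted(collect_discussion_authors(mr)))  # ['alice', 'bob']
```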
    
    * Making the linter happy...
    
    * Making the linter happy
    
    * Addressing review comments
    
    * Moved AIRFLOW_INSTANCE_MAP into wmf_airflow_common.config
    * Added tests for produced_by sensors
    * produced_by sensor task_id abbreviated
    * Depending on which Airflow instance is configured in produced_by,
    return either ExternalTaskSensor or RestExternalTaskSensor
    * Better class and method docs
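The commit does not spell out how the sensor task_id is abbreviated. As a purely hypothetical illustration of the constraint such a scheme has to respect, this sketch builds a short id and falls back to truncation plus a hash when it would exceed Airflow's 250-character task_id limit; `produced_by_sensor_task_id` and its `wait__...` naming scheme are invented for this example.

```python
import hashlib
import re
from typing import Optional

# Airflow rejects task_ids longer than 250 characters (airflow.models.base.ID_LEN).
MAX_TASK_ID_LEN = 250


def produced_by_sensor_task_id(
    instance: str, dag: str, task: Optional[str] = None, max_len: int = MAX_TASK_ID_LEN
) -> str:
    """Build a short, valid task_id for a produced_by sensor (hypothetical scheme)."""
    parts = ["wait", instance, dag] + ([task] if task else [])
    # Keep only characters Airflow accepts in keys: letters, digits, "_", ".", "-".
    task_id = re.sub(r"[^A-Za-z0-9_.-]", "_", "__".join(parts))
    if len(task_id) <= max_len:
        return task_id
    # Too long: truncate and append a short hash so distinct ids stay distinct.
    digest = hashlib.sha1(task_id.encode("utf-8")).hexdigest()[:8]
    return f"{task_id[: max_len - len(digest) - 1]}_{digest}"


print(produced_by_sensor_task_id("search", "dummy_dag", "dummy_standalone_task"))
# wait__search__dummy_dag__dummy_standalone_task
```

The hash suffix keeps the id deterministic across DAG parses, which matters because Airflow identifies tasks by task_id between runs.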
    
    * Typing and linting fixes
    
    * Adding support for "produced_by" configuration of datasets
    
    Implementation of the proposal outlined in
    https://phabricator.wikimedia.org/T372647
    ("Airflow producer dataset annotation")
    
    NOTE: Automatic configuration of `execution_delta` based on the
    target DAG's schedule is not implemented yet
    
    Example:
    
    ```
      produced_by:
        airflow:
          instance: search
          dag: dummy_dag
          task_group: dummy_grouped_tasks
          task: dummy_standalone_task
    ```
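Once the YAML example is loaded (for instance with PyYAML's `yaml.safe_load`), it becomes a nested dict. A minimal sketch of validating it into a typed object, assuming `instance` and `dag` are required while `task_group` and `task` are optional; `AirflowProducer` and `parse_produced_by` are hypothetical names, not the ones used in this commit.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class AirflowProducer:
    """Hypothetical typed view of a `produced_by.airflow` block."""

    instance: str
    dag: str
    task_group: Optional[str] = None
    task: Optional[str] = None


def parse_produced_by(config: Mapping[str, Any]) -> Optional[AirflowProducer]:
    produced_by = config.get("produced_by")
    if not produced_by:
        return None  # dataset carries no producer annotation
    airflow = produced_by["airflow"]
    # Assumption: instance and dag are mandatory, the rest is optional.
    missing = [key for key in ("instance", "dag") if not airflow.get(key)]
    if missing:
        raise ValueError(f"produced_by.airflow is missing: {', '.join(missing)}")
    return AirflowProducer(
        instance=airflow["instance"],
        dag=airflow["dag"],
        task_group=airflow.get("task_group"),
        task=airflow.get("task"),
    )


# The example configuration, as it would look after yaml.safe_load().
dataset_config = {
    "produced_by": {
        "airflow": {
            "instance": "search",
            "dag": "dummy_dag",
            "task_group": "dummy_grouped_tasks",
            "task": "dummy_standalone_task",
        }
    }
}
producer = parse_produced_by(dataset_config)
print(producer)
```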
    
    * If `produced_by` configuration is present for any `Dataset`
    implementation, `get_sensor_for` returns a configured external task
    sensor
    * If the `produced_by` configuration refers to the Airflow instance the
    DAG code is running on, the producer's `get_sensor_for` returns the basic
    ExternalTaskSensor; otherwise it returns RestExternalTaskSensor
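A minimal sketch of that selection rule, assuming the producer's instance name is compared by plain string equality with the local one. `ProducerRef`, `choose_sensor`, the instance map contents, the URLs and the `base_url` key are hypothetical; only `external_dag_id` and `external_task_id` are real `ExternalTaskSensor` parameters, and `RestExternalTaskSensor`'s actual signature is not shown in this commit. The stock sensor is limited to the same instance because it reads task state from the local metadata database.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for wmf_airflow_common.config.AIRFLOW_INSTANCE_MAP;
# its real structure is not shown in this commit message.
AIRFLOW_INSTANCE_MAP: Dict[str, str] = {
    "search": "https://airflow-search.example.org",
    "analytics": "https://airflow-analytics.example.org",
}


@dataclass(frozen=True)
class ProducerRef:
    """Hypothetical, trimmed producer reference (task_group handling omitted)."""

    instance: str
    dag: str
    task: Optional[str] = None


def choose_sensor(producer: ProducerRef, local_instance: str) -> Tuple[str, Dict[str, object]]:
    """Return (sensor class name, kwargs) following the rule in the commit message."""
    # external_task_id=None makes ExternalTaskSensor wait for the whole DAG run.
    kwargs: Dict[str, object] = {
        "external_dag_id": producer.dag,
        "external_task_id": producer.task,
    }
    if producer.instance == local_instance:
        # Same instance: the stock sensor can query the local metadata database.
        return "ExternalTaskSensor", kwargs
    if producer.instance not in AIRFLOW_INSTANCE_MAP:
        raise KeyError(f"Unknown Airflow instance: {producer.instance}")
    # Different instance: task state has to be fetched over HTTP instead.
    kwargs["base_url"] = AIRFLOW_INSTANCE_MAP[producer.instance]  # hypothetical key
    return "RestExternalTaskSensor", kwargs


producer = ProducerRef("search", "dummy_dag", "dummy_standalone_task")
print(choose_sensor(producer, local_instance="search")[0])     # ExternalTaskSensor
print(choose_sensor(producer, local_instance="analytics")[0])  # RestExternalTaskSensor
```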
    
    * Implementation of get_sensor_for for producers
    
    * Support automatic retrieval of RestExternalTaskSensor, based on
    comparing the local Airflow instance with the producer's Airflow instance
    * Unit tests for producer get_sensor_for
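The actual tests are not shown in this commit. A minimal sketch of how the same-instance versus other-instance cases could be covered with the standard `unittest` module, using `subTest` for the parametrized case; `sensor_kind` is a hypothetical stand-in for the real `get_sensor_for`, which returns sensor objects rather than names.

```python
import unittest


def sensor_kind(producer_instance: str, local_instance: str) -> str:
    """Hypothetical selector under test: same instance -> local sensor."""
    if producer_instance == local_instance:
        return "ExternalTaskSensor"
    return "RestExternalTaskSensor"


class SensorKindTest(unittest.TestCase):
    def test_same_instance_uses_external_task_sensor(self) -> None:
        self.assertEqual(sensor_kind("search", "search"), "ExternalTaskSensor")

    def test_other_instance_uses_rest_sensor(self) -> None:
        cases = [("search", "analytics"), ("analytics", "search")]
        for producer, local in cases:
            # subTest reports each failing pair separately instead of stopping.
            with self.subTest(producer=producer, local=local):
                self.assertEqual(sensor_kind(producer, local), "RestExternalTaskSensor")


# Run with: python -m unittest <module_name>
```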
    
    * Add support for manual override of execution_delta param
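`ExternalTaskSensor` waits for the external DAG run whose logical date is its own logical date minus `execution_delta` (the same date when no delta is given), so a manual override is what aligns producer and consumer schedules while automatic derivation is still missing. A minimal sketch of that arithmetic; `external_logical_date` and `resolve_execution_delta` are hypothetical helpers, and the two-hour offset is an invented example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def external_logical_date(
    logical_date: datetime, execution_delta: Optional[timedelta] = None
) -> datetime:
    """Logical date of the producer run an ExternalTaskSensor would wait for."""
    # No delta means "same logical date as the consumer".
    return logical_date - (execution_delta or timedelta(0))


def resolve_execution_delta(
    override: Optional[timedelta], default: Optional[timedelta] = None
) -> Optional[timedelta]:
    """Hypothetical helper: an explicit override (even zero) wins over a default."""
    # Compare against None, not truthiness: timedelta(0) is a valid override.
    return override if override is not None else default


# Consumer runs at 02:00, producer at 00:00 on the same day: a 2h delta.
consumer_date = datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc)
delta = resolve_execution_delta(timedelta(hours=2))
print(external_logical_date(consumer_date, delta).isoformat())  # 2024-01-02T00:00:00+00:00
```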