Draft: Add HiveToDruidOperator, DruidSegmentSensor and Dataset, plus migrate druid_load_navigationtiming_dag. (!206) · Merge requests · repos / data-engineering / Airflow DAGs

Mforns requested to merge druid-operator into main Jan 14, 2023

This change adds a few things (sorry for putting them all together):

HiveToDruidOperator: A simple operator that wraps the existing HiveToDruid.scala Spark utility. You pass it a Hive table and a list of fields (among other params), and it loads a particular interval of that table into a Druid datasource.
DruidSegmentSensor: A sensor that checks that a list of segment intervals exists within a given datasource.
Dataset: A class tree that lets you define datasets consistently and provides a method, dataset.get_sensor(), that automagically returns the best sensor for that dataset.
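
To make the Dataset idea concrete, here is a minimal sketch of how a class tree could hand back the right sensor per dataset type. It does not reproduce the code in this MR: the class names HiveDataset, DruidDataset, PartitionSensor and SegmentSensor, the get_sensor() argument, the list_segments lookup and the datasource name are all assumptions made for illustration, and plain Python stand-ins are used instead of real Airflow sensors.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class PartitionSensor:
    """Hypothetical stand-in for a sensor that waits for Hive partitions."""
    table: str
    partitions: List[str]


@dataclass
class SegmentSensor:
    """Hypothetical stand-in for DruidSegmentSensor."""
    datasource: str
    intervals: List[str]
    # Assumed lookup: returns the intervals already present for a datasource.
    list_segments: Callable[[str], Set[str]]

    def poke(self) -> bool:
        # Succeed only when every requested interval already exists.
        existing = self.list_segments(self.datasource)
        return all(interval in existing for interval in self.intervals)


class Dataset:
    """Base of the class tree; each subclass picks its own sensor."""

    def get_sensor(self, units: List[str]):
        raise NotImplementedError


class HiveDataset(Dataset):
    def __init__(self, table: str):
        self.table = table

    def get_sensor(self, units: List[str]) -> PartitionSensor:
        # For Hive, the units to wait for are partitions.
        return PartitionSensor(table=self.table, partitions=units)


class DruidDataset(Dataset):
    def __init__(self, datasource: str, list_segments: Callable[[str], Set[str]]):
        self.datasource = datasource
        self.list_segments = list_segments

    def get_sensor(self, units: List[str]) -> SegmentSensor:
        # For Druid, the units to wait for are segment intervals.
        return SegmentSensor(self.datasource, units, self.list_segments)


if __name__ == "__main__":
    # Fake segment listing, used only to exercise the sketch.
    existing = {"2023-01-14T00:00:00/2023-01-14T01:00:00"}
    druid = DruidDataset("navigationtiming", lambda _datasource: existing)
    sensor = druid.get_sensor(["2023-01-14T00:00:00/2023-01-14T01:00:00"])
    print(type(sensor).__name__, sensor.poke())
```

The point of the design is that calling code only asks the dataset for a sensor and never needs to know which sensor class applies.
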
To test it all, this change also includes a new DAG file that loads event.navigationtiming data to Druid in 2 steps: (1) hourly, and (2) daily for recompaction.
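
As a rough illustration of the two load steps, the sketch below computes the interval each step might cover for a given timestamp, written in the ISO 8601 start/end form that Druid uses for intervals. The helper names and the assumption that the hourly step covers exactly one hour and the daily step exactly one calendar day are mine, not taken from the DAG in this MR.

```python
from datetime import datetime, timedelta


def hourly_interval(ts: datetime) -> str:
    """Interval for the hour containing ts (hypothetical helper)."""
    start = ts.replace(minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=1)
    return f"{start.isoformat()}/{end.isoformat()}"


def daily_interval(ts: datetime) -> str:
    """Interval for the day containing ts, e.g. for a daily recompaction load."""
    start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return f"{start.isoformat()}/{end.isoformat()}"


if __name__ == "__main__":
    ts = datetime(2023, 1, 14, 5, 30)
    print(hourly_interval(ts))  # 2023-01-14T05:00:00/2023-01-14T06:00:00
    print(daily_interval(ts))   # 2023-01-14T00:00:00/2023-01-15T00:00:00
```
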
