
Draft: Add HiveToDruidOperator, DruidSegmentSensor and Dataset, plus migrate druid_load_navigationtiming_dag.

Mforns requested to merge druid-operator into main

This change adds a few things (sorry for putting them all together):

  • HiveToDruidOperator: A simple Operator that wraps the existing HiveToDruid.scala Spark utility. You pass it a Hive table and a list of fields (among other params) and it loads a particular interval of that table into a Druid datasource.
  • DruidSegmentSensor: A sensor that checks for the existence of a list of segment intervals within a given datasource.
  • Dataset: A class tree that lets you define datasets consistently and provides a method, dataset.get_sensor(), that automatically returns the best sensor for that dataset.
  • To test it all, this change also includes a new DAG file that loads event.navigationtiming data to Druid (in 2 steps: 1. hourly, and 2. daily for recompaction).
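
To make the Dataset idea concrete, here is a minimal sketch of how a class tree with get_sensor() might be laid out. It is not the code in this merge request: SensorSpec, HiveDataset, DruidDataset, missing_intervals and the datasource name are all hypothetical, and the sensor is replaced by a plain record so the example runs without Airflow or a Druid cluster. The interval check assumes exact string matches, which a real DruidSegmentSensor may well not do.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorSpec:
    """Hypothetical stand-in for a sensor: records what would be checked."""
    kind: str
    target: str


class Dataset(ABC):
    """Base of the (assumed) dataset class tree."""

    @abstractmethod
    def get_sensor(self) -> SensorSpec:
        """Return the sensor best suited to this kind of dataset."""


@dataclass(frozen=True)
class HiveDataset(Dataset):
    table: str

    def get_sensor(self) -> SensorSpec:
        # A Hive-backed dataset would wait on table partitions.
        return SensorSpec(kind="hive_partition", target=self.table)


@dataclass(frozen=True)
class DruidDataset(Dataset):
    datasource: str

    def get_sensor(self) -> SensorSpec:
        # A Druid-backed dataset would wait on segment intervals.
        return SensorSpec(kind="druid_segment", target=self.datasource)


def missing_intervals(required, available):
    """Return the required intervals that are not present in `available`.

    Assumption: intervals are compared as exact strings; overlapping or
    partially covering segments are not considered here.
    """
    available_set = set(available)
    return [interval for interval in required if interval not in available_set]


if __name__ == "__main__":
    source = HiveDataset(table="event.navigationtiming")
    target = DruidDataset(datasource="event_navigationtiming")  # hypothetical name
    print(source.get_sensor())
    print(target.get_sensor())
    print(missing_intervals(
        required=["2021-01-01T00/2021-01-01T01", "2021-01-01T01/2021-01-01T02"],
        available=["2021-01-01T00/2021-01-01T01"],
    ))
```

With this shape, a DAG only needs to know which dataset it depends on; the choice of sensor stays inside the dataset class, so adding a new storage type means adding one subclass.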
