Draft: Add HiveToDruidOperator, DruidSegmentSensor and Dataset, plus migrate druid_load_navigationtiming_dag.
This change adds several things (sorry for putting them all together):
- HiveToDruidOperator: A simple Operator that wraps the existing HiveToDruid.scala Spark utility. You pass it a Hive table and a list of fields (among other params), and it loads a particular interval of that table into a Druid datasource.
- DruidSegmentSensor: A sensor that checks for the existence of a list of segment intervals within a given datasource.
- Dataset: A class tree that allows datasets to be defined consistently, and that provides a method, dataset.get_sensor(), that automagically returns the best sensor for that dataset.
- To test it all, this change also includes a new DAG file that loads event.navigationtiming data into Druid in 2 steps: 1) hourly, and 2) daily, for recompaction.
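A minimal sketch of the Dataset idea, for illustration only: the names HiveDataset, DruidDataset and SensorSpec, the (start, end) signature of get_sensor(), and the hourly year=/month=/day=/hour= partition layout are all assumptions, not the code in this change. SensorSpec is a hypothetical stand-in for a real Airflow sensor so that the sketch runs without Airflow.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class SensorSpec:
    """Hypothetical stand-in for an Airflow sensor: records what it would wait for."""
    kind: str
    targets: List[str]


def hours(start: datetime, end: datetime) -> List[datetime]:
    """Hourly timestamps in the half-open range [start, end)."""
    out = []
    t = start
    while t < end:
        out.append(t)
        t += timedelta(hours=1)
    return out


class Dataset(ABC):
    """Root of the (hypothetical) class tree; each subclass knows how to wait for itself."""

    @abstractmethod
    def get_sensor(self, start: datetime, end: datetime) -> SensorSpec:
        """Return the sensor best suited to wait for [start, end) of this dataset."""


@dataclass
class HiveDataset(Dataset):
    table: str  # e.g. "event.navigationtiming"

    def get_sensor(self, start, end):
        # Assumed layout: one Hive partition per hour, year=/month=/day=/hour=.
        partitions = [
            f"{self.table}/year={t.year}/month={t.month}/day={t.day}/hour={t.hour}"
            for t in hours(start, end)
        ]
        return SensorSpec("hive_partition", partitions)


@dataclass
class DruidDataset(Dataset):
    datasource: str

    def get_sensor(self, start, end):
        # Wait for Druid segments covering the whole ISO-8601 interval.
        interval = f"{start.isoformat()}/{end.isoformat()}"
        return SensorSpec("druid_segment", [f"{self.datasource}:{interval}"])
```

A DAG can then call dataset.get_sensor(start, end) without knowing whether the upstream data lives in Hive or Druid; the dispatch happens through ordinary method overriding.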
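The interval check behind DruidSegmentSensor can be sketched as a pure function, assuming the sensor has already fetched the datasource's segment intervals as ISO-8601 "start/end" strings (for example from the Druid coordinator's datasource intervals endpoint). The function names are hypothetical. As a design choice, this sketch checks coverage rather than exact string equality, so a recompacted daily segment also satisfies a request for one of its hours; the actual sensor may compare intervals exactly.

```python
from datetime import datetime, timezone
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def _parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp; 'Z' and missing offsets are both treated as UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def parse_interval(s: str) -> Interval:
    """Parse a Druid-style 'start/end' interval string."""
    start, end = s.split("/")
    return _parse_ts(start), _parse_ts(end)


def is_covered(requested: Interval, available: List[Interval]) -> bool:
    """True if the union of `available` intervals covers `requested` with no gaps."""
    start, end = requested
    cursor = start  # everything before `cursor` is known to be covered
    for a_start, a_end in sorted(available):
        if a_end <= cursor:
            continue  # segment lies entirely before the uncovered part
        if a_start > cursor:
            return False  # gap between cursor and the next segment
        cursor = a_end  # extend the covered prefix
        if cursor >= end:
            return True
    return cursor >= end


def segments_ready(requested: List[str], available: List[str]) -> bool:
    """Hypothetical poke() body: every requested interval must be covered."""
    avail = [parse_interval(s) for s in available]
    return all(is_covered(parse_interval(r), avail) for r in requested)
```

In an Airflow sensor, a function like segments_ready() would be called from poke(), which returns False until the segments appear.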