Load sanitized netflow data from event_sanitized instead of event database

Mforns requested to merge pull-netflow-sanitization-from-sanitized-data into main

When I created the netflow Druid loading DAG, I set the wrong number of shards per segment (32) for the sanitized version of the data. It should have been 2, so we now have lots of very small segments for older dates. It would be cool to fix that by re-ingesting the sanitized data, but the source table event.netflow no longer has data for those old dates (it was purged).

We can, however, use the event_sanitized.netflow table instead, since it contains all the fields that the sanitized version of netflow needs. This change switches the source table for the sanitization DAG only.
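
As a rough illustration of the idea (not the actual DAG code), the change amounts to giving the sanitized load its own source table while the regular load keeps reading from event.netflow. The class, field, and job names below are hypothetical; only the two table names come from this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DruidLoadJob:
    """Hypothetical per-job settings; not the real DAG's configuration."""
    name: str          # illustrative job label
    source_table: str  # Hive table the job reads from


# Regular netflow load: source table unchanged.
RAW_JOB = DruidLoadJob(
    name="netflow",
    source_table="event.netflow",
)

# Sanitized netflow load: now reads from the sanitized table, which still
# holds the older dates that were purged from event.netflow.
SANITIZED_JOB = DruidLoadJob(
    name="netflow_sanitized",
    source_table="event_sanitized.netflow",
)

if __name__ == "__main__":
    for job in (RAW_JOB, SANITIZED_JOB):
        print(f"{job.name}: {job.source_table}")
```

Keeping the source table as a per-job setting makes it explicit that only the sanitized load is affected.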