
Bump up Spark config for anomaly detection DAGs

Mforns requested to merge fix-anomaly-detection-ooms into main

Some jobs have failed with OOMs over the last few days. Two of the three anomaly detection DAGs running today query a full day of pageview_hourly, which probably needs a bit more than the default Spark config. This change bumps up the resources for all anomaly detection DAGs.

This is not a definitive fix, since each anomaly detection DAG should be able to specify its own Spark config. However, I'd argue for tackling that when we refactor the anomaly detection DAGs to use datasets.yaml, get_easy_dag and DagProperties. When we do that, I'd recommend getting rid of the DAG template and having each DAG specify all of its operators.
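
To illustrate the difference between the shared bump and per-DAG config, here is a minimal sketch in plain Python. It is not the code in this change: `SHARED_SPARK_CONF`, `spark_conf_for` and all the values are hypothetical, and only the Spark property names are standard ones.

```python
# Hypothetical shared settings applied to every anomaly detection DAG.
# The values are placeholders, not the numbers used in this change.
SHARED_SPARK_CONF = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "8g",
    "spark.executor.cores": "2",
    "spark.executor.memoryOverhead": "2g",
    "spark.dynamicAllocation.maxExecutors": "32",
}


def spark_conf_for(dag_overrides=None):
    """Return a copy of the shared config with one DAG's overrides on top."""
    conf = dict(SHARED_SPARK_CONF)  # copy, so the shared dict is never mutated
    conf.update(dag_overrides or {})
    return conf


# A DAG with no special needs gets the shared (bumped) settings as they are.
light_conf = spark_conf_for()

# A DAG that scans a full day of pageview_hourly could ask for more memory
# (hypothetical value) without affecting the other DAGs.
heavy_conf = spark_conf_for({"spark.executor.memory": "16g"})
```

With something along these lines, the shared dict plays the role of today's blanket bump, while the overrides are what each DAG could declare for itself after the refactor.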
