test_k8s/dumps/sql_xml: code generate each DAG to free scheduler resources

The way the scheduler dynamically generates the SQL/XML DAGs pushes the import time of the test_k8s/dumps/dags/mediawiki_sql_xml_dumps.py file to nearly 2.5 minutes!

That causes all kinds of issues:

  • high scheduler load
  • high DAG parsing/processing time
  • import errors causing the DAGs to disappear from the UI while they're being imported

To solve these issues, we can codegen the DAG files instead of relying on highly dynamic code. We introduce a Python script that loops over wiki sizes, levels and buckets, and generates the associated DAG Python files from a code template.
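As a rough illustration of the approach, here is a minimal sketch of such a generator. It is not the script from this merge request: the size, level and bucket values, the `DAG_TEMPLATE` contents, the `generate_dag_files` function and the file naming scheme are all hypothetical, and a real template would contain the full Airflow DAG definition.

```python
from itertools import product
from pathlib import Path
from string import Template

# Hypothetical values: the real sizes, levels and buckets live in the repository.
WIKI_SIZES = ["small", "large"]
LEVELS = ["full", "partial"]
BUCKETS = [0, 1]

# Hypothetical template: a real one would hold the complete DAG definition.
DAG_TEMPLATE = Template('''\
# This file is generated. Do not edit by hand.
DAG_ID = "$dag_id"
WIKI_SIZE = "$size"
LEVEL = "$level"
BUCKET = $bucket
''')


def generate_dag_files(output_dir: Path) -> list[Path]:
    """Write one static DAG file per (size, level, bucket) combination."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for size, level, bucket in product(WIKI_SIZES, LEVELS, BUCKETS):
        dag_id = f"sql_xml_dump_{size}_{level}_bucket_{bucket}"
        path = output_dir / f"{dag_id}.py"
        # Render the template with the values for this combination.
        path.write_text(
            DAG_TEMPLATE.substitute(
                dag_id=dag_id, size=size, level=level, bucket=bucket
            )
        )
        written.append(path)
    return written
```

The point of the design is that the loop runs once, when the generator is invoked, rather than every time the scheduler parses the DAG folder. Each generated file is plain, static Python that imports quickly.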

While ugly and probably punished in several countries, including Canada, this will free a lot of scheduler runtime resources, in exchange for some extra maintenance cost.

The effect in a devenv was quite striking, as shown in the following screenshots.

Before: Screenshot_2025-08-22_at_11.33.46
After: Screenshot_2025-08-22_at_11.33.35

Note: this allows us to remove the hack introduced in !1518 (merged) altogether, as we now define one DAG per file.

Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>
Bug: T402529

Closes T402529

Edited by Brouberol
