test_k8s/dumps/sql_xml: code generate each DAG to free scheduler resources
The SQL/XML DAGs are generated dynamically by the scheduler, which pushes the import time of the test_k8s/dumps/dags/mediawiki_sql_xml_dumps.py file close to 2.5 minutes!
That causes all kinds of issues:
- high scheduler load
- high DAG parsing/processing time
- import errors causing the DAGs to disappear from the UI while they're being imported
To solve these issues, we now code-generate the DAG files instead of relying on highly dynamic code. We introduce a Python script that loops over wiki sizes, levels, and buckets, and generates the associated DAG Python files from a code template.
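The generation approach can be sketched as follows. This is a minimal illustration, not the actual script: the dimension values, the `dag_id` naming scheme, and the template contents are all hypothetical placeholders (the real script derives them from the dumps configuration).

```python
"""Sketch of a DAG code-generation script.

All names below (WIKI_SIZES, LEVELS, BUCKETS, the dag_id format) are
illustrative assumptions, not the real configuration.
"""
from pathlib import Path
from string import Template

# Assumed dimensions to loop over; the real script reads these from config.
WIKI_SIZES = ["small", "medium", "large"]
LEVELS = ["public", "private"]
BUCKETS = [0, 1]

# A trivial stand-in for the real DAG code template.
DAG_TEMPLATE = Template('''\
"""Generated file: do not edit by hand."""
from airflow.models.dag import DAG

with DAG(dag_id="${dag_id}", schedule=None) as dag:
    ...  # tasks for the ${size}/${level}/bucket-${bucket} dump
''')


def generate(out_dir: Path) -> list[Path]:
    """Write one DAG file per (size, level, bucket) combination."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for size in WIKI_SIZES:
        for level in LEVELS:
            for bucket in BUCKETS:
                dag_id = f"sql_xml_dumps_{size}_{level}_{bucket}"
                path = out_dir / f"{dag_id}.py"
                path.write_text(DAG_TEMPLATE.substitute(
                    dag_id=dag_id, size=size, level=level, bucket=bucket))
                written.append(path)
    return written
```

Because each generated file is plain, static Python defining a single DAG, the scheduler only pays the cost of a fast, flat import instead of re-running the dynamic generation loop on every parse.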
While ugly, and probably punishable in several countries, including Canada, this frees up a lot of scheduler runtime resources, trading them for some extra maintenance cost.
The effect in a devenv was quite striking, as shown in the following screenshots.
| Before | After |
|---|---|
| ![]() | ![]() |
Note: this allows us to remove the hack introduced in !1518 (merged) altogether, as we now define one DAG per file.
Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>
Bug: T402529
Closes T402529

