test_k8s: define prototype DAGs in charge of running the xml-sql v1 dumps
Note: these DAGs are currently in the prototyping phase and will be refined as we go.
We define 2 DAGs for XML/SQL dumps v1 jobs:
- a DAG in charge of dumping the large wikis
- a DAG in charge of dumping the regular wikis
Each DAG has its own pool, to guarantee that the large-dump tasks start as soon as possible, while still letting us adjust parallelism on the fly if we ever find ourselves putting too much load on the databases, external storage servers, Ceph, etc.
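
As an illustration, the per-DAG pool can be wired once through default_args; this is only a sketch, and the DAG and pool names here are assumptions, not necessarily the ones used in the repo:

    from airflow.decorators import dag

    @dag(
        dag_id="xml_sql_dumps_large_wikis",  # the regular-wikis DAG is symmetric
        schedule=None,
        # Every task inherits the DAG-specific pool; resizing that pool via
        # the Airflow UI or CLI adjusts how many dump tasks run concurrently.
        default_args={"pool": "dumps_large_wikis"},
    )
    def xml_sql_dumps_large_wikis(): ...

    dag_instance = xml_sql_dumps_large_wikis()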
The way each DAG works is the following:
- we start by fetching the wiki list of the associated type (regular or large)
- we then query the Kubernetes API for the mediawiki-production-dumps-job-template Job spec, and extract the underlying Pod spec from it
- we create a mapped task per wiki in the wiki list, each of them in charge of creating and managing the lifetime of a dumps Pod with the KubernetesPodOperator, using the previously fetched spec (adjusted on the fly for the current wiki); see the sketch after this list
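
Put together, a minimal sketch of that flow for the regular-wikis DAG could look like the following. Task ids, the namespace, and the hard-coded wiki list are illustrative assumptions, and XCom serialization of the Kubernetes model objects is glossed over:

    from airflow.decorators import dag, task
    from airflow.providers.cncf.kubernetes.operators.pod import (
        KubernetesPodOperator,
    )
    from kubernetes import client, config

    @task
    def fetch_wiki_list() -> list[str]:
        # Placeholder: the real wiki list comes from the dumps configuration
        return ["aawiki", "abwiki"]

    @task
    def build_pod_specs(wikis: list[str]) -> list[client.V1Pod]:
        # Fetch the Job template from the Kubernetes API and extract the
        # underlying Pod spec, adjusted on the fly for each wiki
        config.load_incluster_config()
        job = client.BatchV1Api().read_namespaced_job(
            name="mediawiki-production-dumps-job-template",
            namespace="mediawiki-dumps",  # assumed namespace
        )
        template = job.spec.template
        return [
            client.V1Pod(
                metadata=client.V1ObjectMeta(name=f"dumps-{wiki}"),
                spec=template.spec,  # per-wiki tweaks (env vars, args) go here
            )
            for wiki in wikis
        ]

    @dag(dag_id="xml_sql_dumps_regular_wikis", schedule=None)
    def xml_sql_dumps_regular_wikis():
        # One mapped KubernetesPodOperator task per wiki, each creating and
        # managing the lifetime of its dumps Pod
        KubernetesPodOperator.partial(
            task_id="dump_wiki",
            namespace="mediawiki-dumps",
        ).expand(full_pod_spec=build_pod_specs(fetch_wiki_list()))

    dag_instance = xml_sql_dumps_regular_wikis()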
Note: we have to mount the ConfigMaps defined in the mediawiki-dumps-legacy chart into the pod, to make the required dumps configuration files available to the dump worker.
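
For reference, such a mount could be declared with the Kubernetes client models along these lines (the ConfigMap name and mount path are assumptions, not the actual values from the chart):

    from kubernetes.client import models as k8s

    # ConfigMap published by the mediawiki-dumps-legacy chart (name assumed)
    dumps_config_volume = k8s.V1Volume(
        name="dumps-config",
        config_map=k8s.V1ConfigMapVolumeSource(name="mediawiki-dumps-legacy-config"),
    )
    dumps_config_mount = k8s.V1VolumeMount(
        name="dumps-config",
        mount_path="/etc/dumps",  # where the dump worker expects its config files
        read_only=True,
    )
    # These get attached to the Pod spec extracted from the Job template,
    # e.g. via KubernetesPodOperator(volumes=[...], volume_mounts=[...]).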
Bug: T388378
Bug: T389931
Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>