test_k8s: define prototype DAGs in charge of running the xml-sql v1 dumps

Note: These DAGs are currently in the prototyping phase, and will be refined as we go.

We define two DAGs for the XML/SQL dumps v1 jobs:

  • a DAG in charge of dumping the large wikis
  • a DAG in charge of dumping the regular wikis

Each DAG has its own pool, to guarantee that the large-dump tasks start as soon as possible, while letting us adjust parallelism on the fly if we ever find ourselves putting too much load on databases, external storage servers, Ceph, etc.
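As a sketch of the pool mechanism: each DAG routes its tasks through a dedicated Airflow pool, so the two wiki types never compete for slots and each pool's parallelism can be tuned independently. The pool names and slot counts below are hypothetical, not the values used in the actual deployment.

```python
# Hypothetical pool names and slot counts; the real values live in the
# Airflow deployment, not here.
DUMP_POOLS = {
    "xmldumps_large_wikis": 4,
    "xmldumps_regular_wikis": 16,
}


def pool_for(wiki_type: str) -> str:
    """Map a wiki type ("large" or "regular") to its dedicated pool name."""
    return f"xmldumps_{wiki_type}_wikis"


# Each mapped dump task would then be created with pool=pool_for(wiki_type),
# so Airflow caps concurrency per wiki type at the pool's slot count.
print(pool_for("large"))
```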

The way each DAG works is the following:

  • we start by fetching the wiki list of the associated type (regular or large)
  • we then query the Kubernetes API for the mediawiki-production-dumps-job-template Job spec, and extract the underlying Pod spec from it
  • we create one mapped task per wiki in the wiki list, each in charge of creating and managing the lifetime of a dumps Pod via the KubernetesPodOperator, using the previously fetched spec (adjusted on the fly for the current wiki).
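The spec-extraction and per-wiki adjustment steps above can be sketched as plain dict manipulation: a Kubernetes Job manifest embeds a Pod template at .spec.template, so the Pod spec sits at .spec.template.spec. The container layout and the --wiki argument below are illustrative assumptions, not the actual mediawiki-production-dumps-job-template contents.

```python
import copy


def pod_spec_from_job(job: dict) -> dict:
    """Extract the Pod spec embedded in a Kubernetes Job manifest.

    A Job's .spec.template is a Pod template, so the Pod spec lives
    at .spec.template.spec.
    """
    return job["spec"]["template"]["spec"]


def pod_spec_for_wiki(template_pod_spec: dict, wiki: str) -> dict:
    """Return a copy of the template Pod spec adjusted for one wiki."""
    spec = copy.deepcopy(template_pod_spec)
    # Hypothetical adjustment: pass the wiki name to the dump container.
    spec["containers"][0]["args"] = ["--wiki", wiki]
    return spec


# Minimal stand-in for the Job spec fetched from the Kubernetes API.
job = {
    "spec": {
        "template": {
            "spec": {"containers": [{"name": "dump", "image": "mediawiki-dumps"}]}
        }
    }
}

spec = pod_spec_for_wiki(pod_spec_from_job(job), "enwiki")
```

In the real DAG, each mapped task would hand a spec like this to the KubernetesPodOperator, which then creates the Pod and manages its lifetime.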

Note: we have to mount the ConfigMaps defined in the mediawiki-dumps-legacy chart in the pod to make the required dumps configuration files available to the dump worker.
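A minimal sketch of what mounting those ConfigMaps amounts to, expressed as the volume/volumeMount dicts a Pod spec expects. The ConfigMap name and mount path are assumptions for illustration; the real names come from the mediawiki-dumps-legacy chart.

```python
def configmap_volumes(configmap_names, mount_root="/etc/dumps"):
    """Build the Pod-spec volumes and volumeMounts for a list of ConfigMaps.

    Each ConfigMap becomes one volume, mounted under mount_root/<name>
    in the dump worker container.
    """
    volumes, mounts = [], []
    for name in configmap_names:
        volumes.append({"name": name, "configMap": {"name": name}})
        mounts.append({"name": name, "mountPath": f"{mount_root}/{name}"})
    return volumes, mounts


# Hypothetical ConfigMap name, standing in for the chart-defined ones.
volumes, mounts = configmap_volumes(["mediawiki-dumps-legacy-config"])
```

These would be merged into the Pod spec extracted from the Job template before the KubernetesPodOperator creates the Pod.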

Signed-off-by: Balthazar Rouberol brouberol@wikimedia.org
Bug: T388378
Bug: T389931
