Job to do consistency check

Xcollazo requested to merge job-to-emit-reconciliation-events into main

In this MR we:

  • Include CREATE DDL for a new table that will hold inconsistent (wiki_db, revision_id) pairs, plus other metadata, so that they can be reconciled, alerted on, and analyzed.
  • Implement a PySpark job that can detect such pairs.
  • Delete the older data quality approach in favor of this new mechanism, which serves both purposes: detecting inconsistencies and producing data quality metrics.
  • Incorporate mismatch categorization improvements from https://phabricator.wikimedia.org/T368176#9974410.
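
To illustrate the kind of check the job performs, here is a minimal sketch in plain Python rather than PySpark, so it runs without a cluster. It is not the code in this MR: the function name, the per-revision checksum inputs, and the three reason labels are all hypothetical, and the actual mismatch categories are the ones discussed in the Phabricator comment linked above.

```python
from typing import Dict, List, NamedTuple, Tuple

# Key identifying a revision: (wiki_db, revision_id).
Key = Tuple[str, int]


class Mismatch(NamedTuple):
    wiki_db: str
    revision_id: int
    reason: str  # hypothetical category label, for illustration only


def find_inconsistent_pairs(
    source: Dict[Key, str], target: Dict[Key, str]
) -> List[Mismatch]:
    """Compare per-revision checksums from two datasets (assumed inputs).

    Returns one Mismatch per (wiki_db, revision_id) pair that is missing
    on either side or whose checksums differ.
    """
    mismatches: List[Mismatch] = []
    # Walk the union of keys, which is what a full outer join would produce.
    for key in sorted(source.keys() | target.keys()):
        src = source.get(key)
        tgt = target.get(key)
        if tgt is None:
            reason = "missing_from_target"
        elif src is None:
            reason = "missing_from_source"
        elif src != tgt:
            reason = "checksum_mismatch"
        else:
            continue  # consistent pair, nothing to record
        mismatches.append(Mismatch(key[0], key[1], reason))
    return mismatches
```

In a Spark job the same comparison would typically be a full outer join on (wiki_db, revision_id), with the resulting rows written to the new table.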

Bug: T368754

Edited by Xcollazo