Job to do consistency check
In this MR we:
- Include CREATE DDL for a new table that will hold inconsistent `(wiki_db, revision_id)` pairs, plus other metadata, so that they can be reconciled, alerted on, and analyzed.
- Implement a PySpark job that can detect such pairs.
- Delete the older approach for data quality in favor of this new mechanism that serves both purposes: detection and data quality metrics.
- Incorporate mismatch categorization improvements from https://phabricator.wikimedia.org/T368176#9974410.
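
For reviewers who want the gist of the detection step before reading the job, here is a minimal plain-Python sketch of the idea. It is not the PySpark code in this MR. It assumes two keyed snapshots (a hypothetical `source` and `target`) that map `(wiki_db, revision_id)` to a hypothetical per-revision checksum. The function name and the three reason labels are illustrative only and are not the categories from T368176.

```python
from typing import Dict, List, NamedTuple, Tuple

Key = Tuple[str, int]  # (wiki_db, revision_id)


class Inconsistency(NamedTuple):
    wiki_db: str
    revision_id: int
    reason: str  # hypothetical category label, not the real categorization


def find_inconsistent_pairs(
    source: Dict[Key, str], target: Dict[Key, str]
) -> List[Inconsistency]:
    """Return every key that is missing on one side or whose values differ.

    The dict values stand in for a per-revision checksum (an assumption).
    """
    found = []
    # Walk the union of keys, which is what a full outer join would produce.
    for key in sorted(source.keys() | target.keys()):
        if key not in target:
            reason = "missing_in_target"
        elif key not in source:
            reason = "missing_in_source"
        elif source[key] != target[key]:
            reason = "content_mismatch"
        else:
            continue  # consistent pair, nothing to record
        found.append(Inconsistency(key[0], key[1], reason))
    return found


source = {("enwiki", 1): "a", ("enwiki", 2): "b", ("dewiki", 7): "c"}
target = {("enwiki", 1): "a", ("enwiki", 2): "x", ("frwiki", 3): "d"}
result = find_inconsistent_pairs(source, target)
```

In a Spark setting the same comparison would typically be a full outer join on `(wiki_db, revision_id)` followed by a filter on the three conditions above, with the surviving rows written to the new table.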
Bug: T368754
Edited by Xcollazo