
Optimize code to handle historic runs.

Xcollazo requested to merge do-historic-reconcile into main

In this MR we optimize the code to handle historic runs by:

  • Refactoring the wikitext_inconsistent_rows table to be easier to use by dropping the computation_window_min_dt and computation_window_max_dt columns in favor of a computation_class column, which currently supports only last-24h or all-of-wiki-time (see the first sketch after this list).
  • consistency_check.py
    • Refactor the adaptive algorithm of df_from_mariadb_replica_adaptive() so that it grows to handle even our biggest wikis (see the second sketch after this list).
  • emit_reconcile_events_to_kafka.py
    • Refactor the algorithm into an iterative approach that processes at most max_events_per_iteration events at a time (see the third sketch after this list). This lets us finish successfully on all-of-wiki-time runs, as in the case of commonswiki, where we found ~375M inconsistencies (a big number, discussed in https://phabricator.wikimedia.org/T377852#10385240).
    • Remove the --forced mechanism, as it will no longer work with the iterative approach. If we ever need to force a reconcile re-run, we will first need to UPDATE wikitext_inconsistent_rows manually.
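
For illustration, here is a minimal PySpark sketch of how a consumer might select rows by the new computation_class column instead of filtering on the dropped datetime-window columns. The wiki_db column and the unqualified table name are assumptions for the sketch, not part of this MR:

```python
# Hedged sketch: select inconsistencies by computation_class rather than by
# the dropped computation_window_min_dt / computation_window_max_dt columns.
# The wiki_db column and the unqualified table name are assumptions.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def inconsistent_rows(computation_class: str, wiki_db: str) -> DataFrame:
    """computation_class is currently 'last-24h' or 'all-of-wiki-time'."""
    return (
        spark.table("wikitext_inconsistent_rows")
        .where(F.col("computation_class") == computation_class)
        .where(F.col("wiki_db") == wiki_db)
    )

# e.g. all historic inconsistencies found for commonswiki:
historic = inconsistent_rows("all-of-wiki-time", "commonswiki")
```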
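
Next, a minimal sketch of the "grow" idea behind the adaptive replica read: keep doubling the number of JDBC partitions until each partition is estimated to hold a manageable number of rows. The parameter names, the rev_id partition column, and the budgets below are assumptions; the real logic lives in df_from_mariadb_replica_adaptive() in consistency_check.py:

```python
# Hedged sketch of an adaptive MariaDB-replica read that scales to the
# biggest wikis by doubling the partition count. Parameter names, the
# rev_id partition column, and the budgets are assumptions.
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

def df_from_mariadb_replica_adaptive_sketch(
    jdbc_url: str,
    table: str,
    min_id: int,
    max_id: int,
    rows_per_partition_budget: int = 5_000_000,
    max_partitions: int = 1024,
) -> DataFrame:
    estimated_rows = max_id - min_id + 1  # rough estimate; assumes dense ids
    num_partitions = 1
    # Grow until each partition fits the budget, capped so we do not
    # overwhelm the replica with parallel connections.
    while (
        estimated_rows / num_partitions > rows_per_partition_budget
        and num_partitions < max_partitions
    ):
        num_partitions *= 2
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("partitionColumn", "rev_id")
        .option("lowerBound", min_id)
        .option("upperBound", max_id)
        .option("numPartitions", num_partitions)
        .load()
    )
```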
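
Finally, a minimal sketch of the iterative emission loop: each pass takes at most max_events_per_iteration not-yet-reconciled rows, emits them, and marks them done, so even commonswiki's ~375M inconsistencies finish in bounded passes. Only max_events_per_iteration is named in this MR; the reconcile_emitted_dt marker column and the two helpers are hypothetical stand-ins:

```python
# Hedged sketch of the iterative emission approach. Only
# max_events_per_iteration comes from this MR; reconcile_emitted_dt and
# the two helper functions are hypothetical stand-ins.
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

def emit_batch_to_kafka(batch: DataFrame) -> None:
    """Hypothetical: produce one reconcile event per row to Kafka."""
    ...

def mark_batch_as_emitted(batch: DataFrame) -> None:
    """Hypothetical: set reconcile_emitted_dt on the emitted rows."""
    ...

def emit_all_inconsistencies(max_events_per_iteration: int) -> None:
    while True:
        batch = (
            spark.table("wikitext_inconsistent_rows")
            .where("reconcile_emitted_dt IS NULL")  # not yet emitted
            .limit(max_events_per_iteration)
        )
        if batch.isEmpty():  # requires PySpark >= 3.3
            break
        emit_batch_to_kafka(batch)
        mark_batch_as_emitted(batch)
```

Under this scheme, forcing a re-run amounts to manually UPDATE-ing wikitext_inconsistent_rows to clear whatever marker the loop keys on, which is why the old --forced flag no longer fits.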

Bug: T377852
