Optimize code to be able to handle historic runs.
In this MR we optimize the code to be able to handle historic runs by:
- Refactoring the `wikitext_inconsistent_rows` table to be easier to use by dropping the `computation_window_min_dt` and `computation_window_max_dt` columns in favor of a `computation_class` column, which currently only supports `last-24h` or `all-of-wiki-time`.
- `consistency_check.py`
  - Refactor the adaptive algo of `df_from_mariadb_replica_adaptive()` to grow and support up to our biggest wikis.
- `emit_reconcile_events_to_kafka.py`
  - Refactor the algo to an iterative approach that handles up to `max_events_per_iteration` events at a time. This helps us finish successfully when doing `all-of-wiki-time` runs, as in the case of `commonswiki`, where we found ~375M inconsistencies (big number discussed in https://phabricator.wikimedia.org/T377852#10385240).
  - Remove the `--forced` mechanism, as it will no longer work with the iterative approach. If we are in a position where we need to force a re-run of a reconcile, we will first need to UPDATE `wikitext_inconsistent_rows` manually.
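
For reviewers, the iterative approach described for `emit_reconcile_events_to_kafka.py` can be pictured roughly as the loop below. This is only a minimal sketch of bounded-batch draining, not the code in this MR: `emit_in_iterations`, `fetch_pending`, `emit`, and `mark_emitted` are hypothetical names, and an in-memory list stands in for both `wikitext_inconsistent_rows` and Kafka. Only `max_events_per_iteration` comes from the description above.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def emit_in_iterations(
    fetch_pending: Callable[[int], List[Row]],  # hypothetical: returns up to N not-yet-handled rows
    emit: Callable[[List[Row]], None],          # hypothetical: sends one batch of events
    mark_emitted: Callable[[List[Row]], None],  # hypothetical: records that the batch was handled
    max_events_per_iteration: int,
) -> int:
    """Drain pending rows in bounded batches and return how many were emitted."""
    if max_events_per_iteration <= 0:
        raise ValueError("max_events_per_iteration must be positive")

    total = 0
    while True:
        batch = fetch_pending(max_events_per_iteration)
        if not batch:
            break  # nothing left to reconcile
        emit(batch)
        # Marking the batch is what lets the next fetch return new rows.
        mark_emitted(batch)
        total += len(batch)
    return total


# In-memory stand-ins (assumptions, for illustration only).
pending: List[Row] = [{"rev_id": i, "emitted": False} for i in range(7)]
emitted_batches: List[List[object]] = []


def fetch_pending(limit: int) -> List[Row]:
    return [row for row in pending if not row["emitted"]][:limit]


def emit(batch: List[Row]) -> None:
    emitted_batches.append([row["rev_id"] for row in batch])


def mark_emitted(batch: List[Row]) -> None:
    for row in batch:
        row["emitted"] = True


total = emit_in_iterations(fetch_pending, emit, mark_emitted, max_events_per_iteration=3)
print(total, emitted_batches)
```

In a loop of this shape, each iteration holds at most `max_events_per_iteration` rows in memory, and progress depends on rows being marked as handled before the next fetch.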
Bug: T377852
Edited by Xcollazo