Optimize code to be able to handle historic runs.
In this MR we optimize the code to be able to handle historic runs by:
- Refactoring the `wikitext_inconsistent_rows` table to be easier to use by dropping the `computation_window_min_dt` and `computation_window_max_dt` columns in favor of a `computation_class` column, which currently only supports `last-24h` or `all-of-wiki-time` (rough schema sketch after this list).
- `consistency_check.py`
  - Refactor the adaptive algo of `df_from_mariadb_replica_adaptive()` to grow and support up to our biggest wikis (sketch after this list).
- `emit_reconcile_events_to_kafka.py`
  - Refactor the algo to an iterative approach that will take care of `max_events_per_iteration` at a time. This helps us finish successfully when doing `all-of-wiki-time` runs, as in the case of `commonswiki`, where we found ~375M inconsistencies (big number discussed in https://phabricator.wikimedia.org/T377852#10385240). A sketch of the loop follows this list.
  - Remove the `--forced` mechanism, as that will no longer work with the iterative approach. If we are in a position where we need to force re-run a reconcile, we will need to first UPDATE `wikitext_inconsistent_rows` manually (example after this list).
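For reference, a rough sketch of the refactored `wikitext_inconsistent_rows` shape. Only `computation_class` (replacing `computation_window_min_dt`/`computation_window_max_dt`) and its two supported values come from this MR; every other column name and type below is an illustrative assumption:

```python
# Illustrative only: columns other than computation_class are assumptions.
COMPUTATION_CLASSES = ("last-24h", "all-of-wiki-time")

WIKITEXT_INCONSISTENT_ROWS_SKETCH = """
CREATE TABLE wikitext_inconsistent_rows (
    wiki_db            STRING,
    revision_id        BIGINT,
    inconsistency_type STRING,
    -- replaces computation_window_min_dt / computation_window_max_dt:
    computation_class  STRING    -- 'last-24h' or 'all-of-wiki-time'
)
"""
```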
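A minimal sketch of how a growing adaptive scan like `df_from_mariadb_replica_adaptive()` can behave. The `run_query` helper, `max_id`, and the batch-size parameters are hypothetical placeholders, not the actual implementation:

```python
import pandas as pd

def df_from_mariadb_replica_adaptive(
    run_query,                  # hypothetical: callable(start_id, end_id) -> pd.DataFrame
    max_id,                     # hypothetical: highest id to scan on the replica
    initial_batch_size=10_000,
    max_batch_size=5_000_000,
    growth_factor=2,
):
    """Sketch: scan the replica in id ranges that grow while queries succeed,
    so the same loop copes with both small wikis and the biggest ones."""
    frames = []
    batch_size = initial_batch_size
    start = 0
    while start <= max_id:
        end = min(start + batch_size, max_id)
        try:
            frames.append(run_query(start, end))
        except TimeoutError:
            # Range too heavy for the replica: shrink and retry it.
            # (Real code would also cap retries / back off.)
            batch_size = max(batch_size // growth_factor, initial_batch_size)
            continue
        start = end + 1
        # Success: grow the window toward max_batch_size.
        batch_size = min(batch_size * growth_factor, max_batch_size)
    return pd.concat(frames, ignore_index=True)
```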
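And a minimal sketch of the iterative emission loop in `emit_reconcile_events_to_kafka.py`, assuming hypothetical helpers for fetching, emitting, and marking rows; only `max_events_per_iteration` is a real parameter from this MR:

```python
def emit_reconcile_events_iteratively(
    fetch_unreconciled_batch,   # hypothetical: callable(limit) -> list of rows
    emit_to_kafka,              # hypothetical: callable(rows) -> None
    mark_reconciled,            # hypothetical: callable(rows) -> None
    max_events_per_iteration=100_000,
):
    """Sketch: instead of materializing every inconsistency up front (which
    blows up on all-of-wiki-time runs such as commonswiki's ~375M rows),
    handle at most max_events_per_iteration rows per loop until none remain."""
    total = 0
    while True:
        rows = fetch_unreconciled_batch(max_events_per_iteration)
        if not rows:
            break
        emit_to_kafka(rows)
        mark_reconciled(rows)
        total += len(rows)
    return total
```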
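Without `--forced`, forcing a re-run means resetting the relevant rows by hand first. A hypothetical example, assuming the table is writable via Spark SQL (e.g. Iceberg) and that a flag column like `reconcile_emitted` gates re-emission; neither assumption is confirmed by this MR:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical: 'reconcile_emitted' and the exact predicate are assumptions;
# only wikitext_inconsistent_rows and computation_class come from this MR.
spark.sql("""
    UPDATE wikitext_inconsistent_rows
    SET    reconcile_emitted = false
    WHERE  wiki_db = 'commonswiki'
      AND  computation_class = 'all-of-wiki-time'
""")
```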
Bug: T377852