
Calculate previous snapshots for deltas automatically.

Xcollazo requested to merge T330688-robust-deltas into main

This MR calculates the previous snapshot for deltas automatically with a simple query:

```sql
SELECT wikiid, page_namespace, page_id, tag, values FROM a.b
WHERE snapshot = (SELECT MAX(DATE(snapshot)) FROM a.b)
ORDER BY page_id
```
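A minimal, runnable sketch of the query above against a toy in-memory table. It assumes Python's built-in `sqlite3` as a stand-in engine (the production engine is not named in this MR), attaches a schema named `a` so the placeholder table name `a.b` can be used verbatim, and uses invented sample rows for two snapshots. `values` is quoted because VALUES is an SQL keyword.

```python
import sqlite3

# Stand-in engine (assumption): in-memory SQLite with a schema attached as "a"
# so the placeholder table name a.b from the MR works unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS a")
conn.execute(
    "CREATE TABLE a.b (wikiid TEXT, page_namespace INTEGER, page_id INTEGER,"
    ' tag TEXT, "values" TEXT, snapshot TEXT)'
)

# Hypothetical sample data: the same two pages in two snapshots.
rows = [
    ("enwiki", 0, 2, "tag_x", "v1", "2023-03-05"),
    ("enwiki", 0, 1, "tag_x", "v1", "2023-03-05"),
    ("enwiki", 0, 2, "tag_x", "v2", "2023-03-12"),
    ("enwiki", 0, 1, "tag_x", "v2", "2023-03-12"),
]
conn.executemany("INSERT INTO a.b VALUES (?, ?, ?, ?, ?, ?)", rows)

# The MR's query: return only the rows of the most recent snapshot.
PREVIOUS_SNAPSHOT_QUERY = """
SELECT wikiid, page_namespace, page_id, tag, "values" FROM a.b
WHERE snapshot = (SELECT MAX(DATE(snapshot)) FROM a.b)
ORDER BY page_id
"""

previous = conn.execute(PREVIOUS_SNAPSHOT_QUERY).fetchall()
for row in previous:
    print(row)
```

Only the 2023-03-12 rows are returned, ordered by `page_id`; no snapshot date had to be passed in.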

The `SELECT MAX(DATE(snapshot)) FROM a.b` subquery completes in about 20 seconds with our current 45-day data retention.

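This removes the need for the `previous_snapshot` parameter and makes the pipeline resilient to sporadic upstream data failures: if an upstream snapshot is missing, the delta is computed against the most recent snapshot that actually landed in the table.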
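The resilience argument can be illustrated with a small comparison, again assuming SQLite as a stand-in engine. `rows_for_fixed_snapshot` is a hypothetical stand-in for passing an explicit `previous_snapshot` date; `rows_for_latest_snapshot` derives the date from the data as this MR does. The table layout and dates are invented for the example.

```python
import sqlite3


def make_table(snapshots):
    """Build a toy a.b table with one row per (hypothetical) snapshot date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS a")
    conn.execute("CREATE TABLE a.b (page_id INTEGER, snapshot TEXT)")
    conn.executemany("INSERT INTO a.b VALUES (?, ?)", [(1, s) for s in snapshots])
    return conn


def rows_for_fixed_snapshot(conn, previous_snapshot):
    # Parameter style (hypothetical): the caller names the previous snapshot.
    return conn.execute(
        "SELECT page_id, snapshot FROM a.b WHERE snapshot = ?",
        (previous_snapshot,),
    ).fetchall()


def rows_for_latest_snapshot(conn):
    # Automatic style: use whichever snapshot is newest in the table.
    return conn.execute(
        "SELECT page_id, snapshot FROM a.b "
        "WHERE snapshot = (SELECT MAX(DATE(snapshot)) FROM a.b)"
    ).fetchall()


# Upstream never produced 2023-03-19; the newest available is 2023-03-12.
conn = make_table(["2023-03-05", "2023-03-12"])
print(rows_for_fixed_snapshot(conn, "2023-03-19"))  # []
print(rows_for_latest_snapshot(conn))  # [(1, '2023-03-12')]
```

A fixed date pointing at a snapshot that was never written matches nothing, while the `MAX` subquery falls back to the newest snapshot that exists.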

Bug: T330688

