Add DAG and table maintenance for wmf_content.mediawiki_content_current_v1

Xcollazo requested to merge mw-content-current into main

(Depends on repos/data-engineering/dumps/mediawiki-content-dump!62 (merged)).

In this MR we:

  • Implement a daily DAG to calculate and merge changes from wmf_content.mediawiki_content_history_v1 into wmf_content.mediawiki_content_current_v1. This is done via a MERGE INTO statement, which is under review separately.
  • Implement Iceberg table maintenance for wmf_content.mediawiki_content_current_v1, including a call to rewrite_position_delete_files(), which allows us to run the MERGE INTO pipeline with merge-on-read (faster, and more stable!).
  • Modify the folder hierarchy a bit to accommodate the set of history content jobs vs. the current content jobs.
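
For reviewers less familiar with Iceberg, the merge-on-read setup and the maintenance call from the second bullet can be sketched roughly as below. This is only an illustration and not the code in this MR: the helper names and the `spark_catalog` catalog name are assumptions, and only the `write.merge.mode` table property and the `rewrite_position_delete_files` Spark procedure come from Iceberg itself. With merge-on-read, each MERGE INTO writes position delete files instead of rewriting data files, so those delete files accumulate and need periodic compaction.

```python
# Hypothetical sketch: builds the Spark SQL strings a maintenance job might run.
# The catalog name "spark_catalog" and the helper names are assumptions.

TABLE = "wmf_content.mediawiki_content_current_v1"


def merge_on_read_ddl(table: str) -> str:
    """Return DDL that makes MERGE INTO write delete files (merge-on-read)."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        "('write.merge.mode' = 'merge-on-read')"
    )


def rewrite_position_deletes_call(table: str, catalog: str = "spark_catalog") -> str:
    """Return the Iceberg procedure call that compacts position delete files."""
    return (
        f"CALL {catalog}.system.rewrite_position_delete_files"
        f"(table => '{table}')"
    )


def sketch_statements(table: str, catalog: str = "spark_catalog") -> list[str]:
    """Return the statements in the order a job would plausibly issue them."""
    return [
        merge_on_read_ddl(table),
        rewrite_position_deletes_call(table, catalog),
    ]


if __name__ == "__main__":
    # In a real job each statement would be passed to spark.sql(...).
    for stmt in sketch_statements(TABLE):
        print(stmt)
```

In practice the table property would be set once, while the procedure call would run on the maintenance schedule.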

Bug: T391283

Edited by Xcollazo
