
Modify schema to wikitext_raw_rc2.

Xcollazo requested to merge wikitext_raw_rc2 into main

In this MR we:

  • Introduce a row_visibility_last_update column to track visibility changes separately from other page updates.
  • Introduce an errors column that should let us identify event stream errors, if any.
  • Change partitioning to PARTITIONED BY (wiki_db, months(revision_timestamp)), which will alleviate inode pressure on the HDFS NameNode.
  • Explicitly declare the table as an Iceberg v2 table, so that we can use Iceberg merge-on-read if we ever need it.
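
For reviewers who want to picture the resulting DDL, the changes above might look roughly like the sketch below. Only the table name, the two new column names, the PARTITIONED BY clause and the Iceberg v2 requirement come from this MR. The column types, the revision_id column and the build_create_table_ddl helper are assumptions made for illustration, not the actual schema.

```python
# Illustrative sketch only: column types and the revision_id column are
# assumptions, not the real wikitext_raw_rc2 schema.
def build_create_table_ddl(table="wikitext_raw_rc2"):
    """Return a Spark SQL CREATE TABLE statement reflecting the MR's changes."""
    columns = [
        ("wiki_db", "STRING"),                        # partition column (type assumed)
        ("revision_id", "BIGINT"),                    # hypothetical existing column
        ("revision_timestamp", "TIMESTAMP"),          # partition source (type assumed)
        ("row_visibility_last_update", "TIMESTAMP"),  # new column (type assumed)
        ("errors", "ARRAY<STRING>"),                  # new column (type assumed)
    ]
    column_sql = ",\n".join(f"    {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {table} (\n"
        f"{column_sql}\n"
        ")\n"
        "USING iceberg\n"
        # One partition per wiki per month of revision_timestamp.
        "PARTITIONED BY (wiki_db, months(revision_timestamp))\n"
        # Iceberg format version 2 is what makes merge-on-read possible later.
        "TBLPROPERTIES ('format-version' = '2')"
    )


ddl = build_create_table_ddl()
print(ddl)
```

The string could then be passed to a Spark session configured with an Iceberg catalog, for example via spark.sql(ddl).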

Bug: T340863

Edited by Xcollazo
