Skip to content

Merge multiple streams and produce enriched page change content.

Gmodena requested to merge T307959-enrich-event-payload into main

Bug: T307959 This MR is a draft/wip and not ready to be merged yet.

Pass the page payload to AsyncDataStream, and enrich it in the AsyncFunction. AsyncDataStream will produce a message with a revision schema plus content and action fields.

Changes

The main changes are:

  1. Support for multiple types of page changes (create, delete, edit).
  2. Improved error handling and logging.
  3. Decouples enrichment logic from async behaviour.
  4. Adds integration tests for local e2e execution.

A good entry point for this MR would be the test at src/test/scala/EnrichmentSuite.scala. Enrichment.makePipeline contains most boilerplate for setting up the DAG topology. src/main/scala/org/wikimedia/dataplatform/AsyncEnrichWithContent.scala contains the Flink AsynFunction that calls out Action API to perform enrichment.

TODOs

Follow up work

  1. Add retry on error for network calls.
  2. Implement side output for managing error reporting.
  3. Revisit naming conventions (especially across modules).
  4. Add testing boilerplate that accounts for schema validation and Json schema resources.

Reviewers / informed

@tchin @otto @dcausse @joal @lbowmaker

Edited by Gmodena

Merge request reports