Merge multiple streams and produce enriched page change content.
Bug: T307959
This MR is a draft/WIP and is not ready to be merged yet.
Pass the page payload to AsyncDataStream, and enrich it in the AsyncFunction. AsyncDataStream will produce a message with a revision schema plus content and action fields.
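A minimal sketch of that flow, assuming simplified types. `PageChange`, `EnrichedPageChange`, `EnrichmentSketch` and `fetchContent` are hypothetical names used for illustration only and are not taken from this MR; the real revision schema has more fields. Plain `Future`s stand in for the Flink `AsyncFunction` so that the example runs without a Flink dependency:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical model of the supported page change types.
sealed trait Action
object Action {
  case object Create extends Action
  case object Delete extends Action
  case object Edit extends Action
}

// Hypothetical, heavily reduced stand-ins for the input and output messages.
final case class PageChange(pageId: Long, revId: Long, action: Action)
final case class EnrichedPageChange(
    pageId: Long,
    revId: Long,
    action: Action,
    content: Option[String]
)

object EnrichmentSketch {
  // Pure enrichment logic: no I/O, so it can be tested without a cluster.
  def enrich(event: PageChange, content: Option[String]): EnrichedPageChange =
    EnrichedPageChange(event.pageId, event.revId, event.action, content)

  // Async wrapper: only this layer knows about the network call.
  // `fetchContent` stands in for the Action API request.
  def enrichAsync(fetchContent: Long => Future[String])(event: PageChange)(
      implicit ec: ExecutionContext): Future[EnrichedPageChange] =
    event.action match {
      // Assumption made for this sketch: a delete has no content to fetch.
      case Action.Delete => Future.successful(enrich(event, None))
      case _             => fetchContent(event.revId).map(c => enrich(event, Some(c)))
    }
}
```

Keeping `enrich` free of I/O is one way to read the "decoupled" design: the async layer can be swapped or mocked (for example with a `fetchContent` that returns a canned `Future`) without touching the enrichment logic.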
Changes
The main changes are:
- Support multiple types of page changes (create, delete, edit).
- Improve error handling and logging.
- Decouple enrichment logic from async behaviour.
- Add integration tests for local e2e execution.
A good entry point for this MR is the test at src/test/scala/EnrichmentSuite.scala. Enrichment.makePipeline contains most of the boilerplate for setting up the DAG topology. src/main/scala/org/wikimedia/dataplatform/AsyncEnrichWithContent.scala contains the Flink AsyncFunction that calls out to the Action API to perform enrichment.
TODOs
Follow-up work
- Add retry on error for network calls.
- Implement side output for managing error reporting.
- Revisit naming conventions (especially across modules).
- Add testing boilerplate that accounts for schema validation and JSON Schema resources.
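
As a rough illustration of the first two follow-up items, retry and error reporting could look something like the sketch below. This is not part of the MR: `RetrySketch`, `retry` and `attempt` are hypothetical names, plain `Future`s stand in for the network call, and there is no backoff between attempts. Wrapping the outcome in an `Either` is only one possible way to keep failures as data so that a later step can report them separately instead of failing the job:

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.control.NonFatal

object RetrySketch {
  // Re-run `call` until it succeeds or `maxAttempts` attempts have been made.
  // The last failure is propagated to the caller unchanged.
  def retry[A](maxAttempts: Int)(call: () => Future[A])(
      implicit ec: ExecutionContext): Future[A] =
    // Future.unit.flatMap also captures exceptions thrown synchronously by `call`.
    Future.unit.flatMap(_ => call()).recoverWith {
      case NonFatal(_) if maxAttempts > 1 => retry(maxAttempts - 1)(call)
    }

  // Capture the final outcome as a value: Right on success, Left on failure.
  def attempt[A](maxAttempts: Int)(call: () => Future[A])(
      implicit ec: ExecutionContext): Future[Either[Throwable, A]] =
    retry(maxAttempts)(call)
      .map[Either[Throwable, A]](Right(_))
      .recover { case NonFatal(e) => Left(e) }
}
```

A production version would likely add a delay or backoff between attempts and restrict retries to errors that are known to be transient.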