Improve support for local development.

Gmodena requested to merge T324951-test-utils into main

Bug: T324951

There's a companion MR that demos these capabilities at https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-stream-enrichment-python/-/merge_requests/2/diffs. It is a good entry point for getting a feel for the UX/API changes.

This MR introduces some changes to ease local development of streaming services:

  • change the context manager so that data can be consumed from (and produced to) a collection.

  • add a Sink function that collects datastreams into a local sink.

  • collect side output into a dedicated stream, so that stdout and the local sink are not polluted with error reporting.

  • add support for resolving the JSON schema for a source/destination stream name. This can be used to perform schema validation.
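To make the first three points concrete, here is a minimal pure-Python sketch of the idea. It is not the API introduced by this MR: `LocalStream`, `local_stream`, and `enrich` are hypothetical names, and plain lists stand in for the datastream, the local sink, and the side output.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Iterator, List


@dataclass
class LocalStream:
    """Hypothetical stand-in for a stream backed by in-memory collections."""

    source: Iterable[Any]
    sink: List[Any] = field(default_factory=list)    # main output (local sink)
    errors: List[dict] = field(default_factory=list)  # side output (errors only)

    def process(self, func: Callable[[Any], Any]) -> None:
        """Apply func to every event; route failures to the side output."""
        for event in self.source:
            try:
                self.sink.append(func(event))
            except Exception as exc:
                # Failed events never reach the main sink.
                self.errors.append({"event": event, "error": repr(exc)})


@contextmanager
def local_stream(collection: Iterable[Any]) -> Iterator[LocalStream]:
    """Hypothetical context manager that reads from a collection."""
    yield LocalStream(source=list(collection))


def enrich(event: dict) -> dict:
    """Toy enrichment step; raises KeyError when 'title' is missing."""
    return {**event, "title_upper": event["title"].upper()}


with local_stream([{"title": "foo"}, {"no_title": 1}]) as stream:
    stream.process(enrich)

print(stream.sink)    # successfully enriched events
print(stream.errors)  # events that failed, kept apart from the sink
```

Keeping the error records in their own collection lets a local test assert on the enriched output and on the failures independently.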

Producing data into Kafka with eventutilities

The eventutilities sink expects a Row and a rowTypeInfo to be passed to the transformation. This change was out of scope for this MR and has been implemented in !4 (merged).

TODO

These will be addressed in separate MRs:

  • produce data with eventutilities.
  • schema versions should be configurable.
  • make source/sink properties configurable.

cc / @lbowmaker @otto @tchin @milimetric

Edited by Gmodena