Improve support for local development.
Bug: T324951
There's a companion PR that demos these capabilities at https://gitlab.wikimedia.org/repos/data-engineering/mediawiki-stream-enrichment-python/-/merge_requests/2/diffs. It is a good entry point for getting a feel for the UX/API changes.
This MR introduces some changes to ease local development of streaming services:
- Change the context manager so that data can be consumed from (and produced to) a collection.
- Add a sink function for collecting datastreams into a local sink.
- Collect side output into a dedicated stream, so that stdout and the local sink are not polluted with error reporting.
- Add support for JSON schema resolution for source/destination stream names. This can be used to perform schema validation.
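To illustrate the idea behind the collection source, local sink, and dedicated error stream, here is a minimal plain-Python sketch. It does not use this project's API or Flink: `run_local`, `enrich`, and the `wiki_id`/`page_id` fields are hypothetical names chosen for illustration only.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Event = Dict[str, Any]


def run_local(
    source: Iterable[Event],
    transform: Callable[[Event], Event],
) -> Tuple[List[Event], List[Dict[str, Any]]]:
    """Apply `transform` to each event, keeping results and errors apart."""
    sink: List[Event] = []  # stands in for the local sink
    side_output: List[Dict[str, Any]] = []  # stands in for the error stream
    for event in source:
        try:
            sink.append(transform(event))
        except Exception as exc:
            # Failures are recorded separately instead of polluting the sink.
            side_output.append({"event": event, "error": repr(exc)})
    return sink, side_output


def enrich(event: Event) -> Event:
    # Hypothetical enrichment step; raises KeyError if a field is missing.
    return {**event, "page_key": f"{event['wiki_id']}:{event['page_id']}"}


events = [
    {"wiki_id": "enwiki", "page_id": 1},
    {"wiki_id": "enwiki"},  # missing page_id, ends up in the side output
]
results, errors = run_local(events, enrich)
```

With this shape, a test can assert on `results` and `errors` independently, which is the benefit of keeping error reporting out of the main output.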
Producing data into Kafka with eventutilities.
The eventutilities sink expects a Row and rowTypeInfo to be passed to the transformation.
This change was out of scope for this MR and has been implemented in !4 (merged).
TODO
These will be addressed in separate MRs:
- produce data with eventutilities.
- schema versions should be configurable.
- make source/sink properties configurable.
cc / @lbowmaker @otto @tchin @milimetric