Skip to content

Parameterization of stream_manager

Ottomata requested to merge refactor3 into main
  • stream_manager config is now loaded by jsonargparse instantiate_classes.

  • stream_manager API is now one of: -- stream_manager(**kwargs) -- stream_manager.from_config(load_config()) -- custom_parser = ArgumentParser custom_parser.add_argument("--custom_arg1", type=str) config = load_config(custom_parser) stream_manager.from_config(config)

  • FlinkStreamPipeline.Builder is gone

  • ConnectorDescriptor is a simple data class that encompasses stream descriptors, connector names (e.g. kafka), and other options needed for instantiating a connector.

  • SourceDescriptor and SinkDescriptor extend from ConnectorDescriptor, and can use Flink env and EventDataStreamFactory to instantiate sources, datatstreams, and sinks and sink to.

  • stream_manager is now parameterized with config and descriptors

  • This allow us to use jsonargparse add_class_arguments to automatically generate an argparser for CLI and config files for stream_manager.

  • Pipeline developers can add additional args to the returned argparser. E.g. mediawiki_api_endpoint, etc.

  • Small refactor that adds a new stream_manager auto_error_destination_enabled param, instead of relying on boolean union type of error_destination.

Bug: T328478

Edited by Ottomata

Merge request reports