Skip to content

Refactor on the way to better parameterization

Ottomata requested to merge refactor1 into main
  • Methods added that return Source/Sink Builders and/or Sources and Sinks, rather than DataStreams. This will allow us to more easily work with (and carry around) the inputs and outputs of pipelines without instantiating actual DataStreams until we need to. It also allows for more flexible configuration of the Sources and Sinks, and more faithfully proxies to Java EventDataStreamFactory. The returned Sources and Sinks are the PyFlink versions.

  • Refactor the EventDataStreamFactory from_data function to a collection_source method that can be used like all other Sources with from_source.

  • EventDataStreamFactory methods have been refactored to reduce the number of methods that return DataStreams. This is now only 1:

    from_source: like StreamExecutionEnvironment.from_source, but uses the stream_descriptor's type info.

  • Reading of all env vars (except EVENTUTILITIES_LIB_DIR) have been moved out of flink.py and into stream_manager context manager. We still need to parameterize these, but it should be easier now that they are all in one place.

  • Rename StreamManager -> StreamPipeline

  • FlinkStreamPipeline.Builder - don't instantiate DataStream until build() by carrying sources and sinks instead of DataStreams.

Bug: T328478

Merge request reports