Imporve test coverage for Flink wrappers (3/3)
Bug: T326565
This MR depends !15 (merged) and !16 (merged)
Refactor Flink logic (init and serilisation) into a dedicated submodule, and make it more testable. This should allow further codebase evolution and refactoring while keeping a base to track regressions.
Remove schema validation from local sink function, because that is now guaranteed by typed Row support.
When streaming from files (or collection of dicts) stream_manager
now collects output into a local sink.
This is exposed as stream.sink
. See tests/test_data_loading.py
for more details. This works on jdk8, but introduces a regression on jdk11.
While working on this MR I stumbled upon two issues with the from_data
method, which we use to feed collection of dictionaries to Java Eventutilities fileData
helper:
- Temporary file content should be in JSON Lines format, since
fileData
expects LineRecords. - There is a potential race condition when removing the temporary file. In some cases I witnessed the Python process unlinking the file before the Java thread could read.
This MR proposes fixes for both issues.