Skip to content

Imporve test coverage for Flink wrappers (3/3)

Gmodena requested to merge T326565-add-serde-testing into main

Bug: T326565

This MR depends !15 (merged) and !16 (merged)

Refactor Flink logic (init and serilisation) into a dedicated submodule, and make it more testable. This should allow further codebase evolution and refactoring while keeping a base to track regressions.

Remove schema validation from local sink function, because that is now guaranteed by typed Row support.

When streaming from files (or collection of dicts) stream_manager now collects output into a local sink. This is exposed as stream.sink. See tests/test_data_loading.py for more details. This works on jdk8, but introduces a regression on jdk11.

While working on this MR I stumbled upon two issues with the from_data method, which we use to feed collection of dictionaries to Java Eventutilities fileData helper:

  1. Temporary file content should be in JSON Lines format, since fileData expects LineRecords.
  2. There is a potential race condition when removing the temporary file. In some cases I witnessed the Python process unlinking the file before the Java thread could read.

This MR proposes fixes for both issues.

cc / @otto @tchin @dcausse 

Edited by Gmodena

Merge request reports