
Init catalog tables with a Kafka watermark.

Bug: https://phabricator.wikimedia.org/T324144

This MR initialises TableDescriptors with Kafka watermarks.

The change adds a kafka_timestamp virtual column to schemas accessed through the catalog, and declares a watermark on it with a delay of 10 seconds. Both the column name and the delay are eventutilities defaults. We might want to tune them in the future (or expose a property that lets users set the column name and delay), but for now they seem a sensible starting point.
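
For reviewers who think in DDL, the resulting schema should be roughly equivalent to the Flink SQL below. This is only a sketch: the table name, the page_title column, and all connector options are hypothetical placeholders, and it assumes kafka_timestamp is backed by the Kafka connector's timestamp metadata key, which this description does not state explicitly.

```sql
-- Hypothetical table; only kafka_timestamp and the 10 second delay come from this MR.
CREATE TABLE example_events (
  page_title STRING,  -- placeholder payload column
  -- Assumed mapping: a virtual metadata column read from the Kafka record timestamp.
  kafka_timestamp TIMESTAMP_LTZ(3) METADATA FROM 'timestamp' VIRTUAL,
  -- Event-time watermark that trails the latest seen timestamp by 10 seconds.
  WATERMARK FOR kafka_timestamp AS kafka_timestamp - INTERVAL '10' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'example.topic',                            -- placeholder
  'properties.bootstrap.servers' = 'localhost:9092',    -- placeholder
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);
```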

kafka_timestamp carries "event time" Flink semantics (rowtime) as described in https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/concepts/time_attributes/.

An example of a tumble window query on a watermarked table created with this patch can be found at: https://gitlab.wikimedia.org/-/snippets/47/edit.
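
For readers without access to the snippet, a tumble window query over such a table might look like the following. It is a minimal sketch and not a copy of the linked snippet: example_events is a hypothetical table name and the one minute window size is an arbitrary choice.

```sql
-- Count events per one-minute tumbling window, using kafka_timestamp as event time.
SELECT
  window_start,
  window_end,
  COUNT(*) AS event_count
FROM TABLE(
  TUMBLE(TABLE example_events, DESCRIPTOR(kafka_timestamp), INTERVAL '1' MINUTE)
)
GROUP BY window_start, window_end;
```

A window's result is emitted once the watermark passes window_end, so with a 10 second delay results appear about 10 seconds of event time after the window closes.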

Note that while the virtual column can be projected from the schema, it will not be persisted with INSERT statements, thus not affecting existing sink functionality. See https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/create/ for details.
