
Improve resilience of flink_status_via_k8s

Ebernhardson requested to merge work/ebernhardson/flink-status-curl into main

When running a reindex operation that required two backfills, the first backfill worked but the second failed to get the appropriate status. Improve the chances of success by increasing the timeout between checks, which gives us 10x more time before the status check gives up. While looking into this I also noticed our pod now has curl, so switch to that rather than the roundabout approach of issuing the HTTP request via python.
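To make the "10x more time" arithmetic concrete, here is a minimal sketch of a polling loop of this kind. It is not the actual `flink_status_via_k8s` code: the helper name `wait_for_status`, its parameters, and the status strings are hypothetical. The point is that the total wait before giving up is roughly `interval_s * max_attempts`, so multiplying the interval between checks by 10 gives the job about 10x longer to reach the expected state while performing the same number of checks.

```python
import time
from typing import Callable, Optional


def wait_for_status(
    check: Callable[[], Optional[str]],
    expected: str,
    interval_s: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: poll `check` until it returns `expected`.

    `check` is assumed to return the current job status, or None if the
    status could not be read. The total wait before giving up is about
    interval_s * (max_attempts - 1), so a 10x larger interval buys
    roughly 10x more time with the same number of checks.
    """
    for attempt in range(max_attempts):
        if check() == expected:
            return True
        # Don't sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```

For example, with a `check` that successively reports `CREATED`, `RUNNING`, and `FINISHED`, `wait_for_status(check, "FINISHED", 1.0, 5)` returns `True` after sleeping twice. The `sleep` parameter exists only so the loop can be exercised without actually waiting.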

Also add a FAQ section to README.md that describes how to clean up the extra backfill release the failed run left behind.

Overall this was still a success: re-running the script with the same arguments ran a second backfill over the same time period, which finished successfully. It may have worked fine without this step, but I cleaned up the leftover backfill release as described in the FAQ before re-running the reindex orchestration.
