Improve resilience of flink_status_via_k8s
When running a reindex operation that required two backfills, the
first worked, but the second failed to get the appropriate status.
Improve the chances of things working by increasing the timeout
between checks (so we get 10x more time before it fails). While
looking into this I also noticed our pod now has curl, so switch
to that rather than the funny thing we did running an HTTP
request via python.
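As a rough illustration of the idea (not the actual script), checking status by running curl inside the pod and retrying with a longer wait could be sketched as below. The function names, the attempt count, and the delay are all hypothetical placeholders; the real values and the real URL live in flink_status_via_k8s.

```python
import subprocess
import time
from typing import Callable, Optional


def kubectl_curl(pod: str, url: str) -> Optional[str]:
    """Run curl inside the pod; return the response body, or None on failure.

    Hypothetical helper: assumes kubectl is on PATH and the pod image has curl.
    """
    result = subprocess.run(
        ["kubectl", "exec", pod, "--", "curl", "--silent", "--fail", url],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else None


def poll_status(
    fetch: Callable[[], Optional[str]],
    attempts: int = 10,          # placeholder value, not from the real script
    delay_seconds: float = 30.0,  # placeholder value, not from the real script
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch until it returns a value, sleeping between failed attempts."""
    for attempt in range(attempts):
        status = fetch()
        if status is not None:
            return status
        # Do not sleep after the final attempt; just give up.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"no status after {attempts} attempts")
```

Raising delay_seconds by 10x while keeping attempts fixed gives roughly 10x more time before the check gives up, which is the effect this change is after.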
Add a FAQ section to README.md which describes how to clean up the extra backfill release this left behind.
Overall this was still a success: re-running the script with the same arguments ran a second backfill over the same time period and finished successfully. I think it may have worked fine without the cleanup, but I cleaned up the backfill release as described in the FAQ prior to re-running the reindex orchestration.