test_k8s/dumpsv1: tolerate completed pods with the OOMKilled status

We've observed pods in the following status:

  mediawiki-afwiki-sql-xml-dump-metahistorybz2dump-v7fv07c 2/3 OOMKilled 0 4h36m

after a memory-hungry PHP process was OOM-killed by the kernel (and
restarted by the dump worker). While the overall dump job completed,
this status prevented Airflow from considering the pod as completed.

We add a bit of custom logic to tolerate this state. We also increase
the limits/request ratio and re-wire LARGE_WIKI_POD_RESOURCES and
DEFAULT_POD_RESOURCES to the actual container resource requests, a
wiring that had somehow disappeared.
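
For illustration only, a minimal sketch of what tolerating this state
could look like. It is not the code in this change: the function names,
the TOLERATED_REASONS set, and the use of plain dicts shaped like the
Kubernetes pod status are all assumptions, and the "base" container name
is assumed from the KubernetesPodOperator default.

```python
from typing import Any, Mapping

# Hypothetical set of termination reasons treated as "done".
# The actual change may apply stricter criteria (e.g. exit codes).
TOLERATED_REASONS = frozenset({"Completed", "OOMKilled"})


def container_is_done(container_status: Mapping[str, Any]) -> bool:
    """Return True if the container terminated with a tolerated reason."""
    terminated = (container_status.get("state") or {}).get("terminated")
    if not terminated:
        # Container is still running or waiting.
        return False
    return terminated.get("reason") in TOLERATED_REASONS


def base_container_is_done(
    pod_status: Mapping[str, Any], base_container: str = "base"
) -> bool:
    """Check only the main container; sidecars may still be running."""
    for status in pod_status.get("containerStatuses") or []:
        if status.get("name") == base_container:
            return container_is_done(status)
    # No status reported for the main container.
    return False
```

With such a check, a pod whose main container reports the OOMKilled
reason would be considered completed instead of being left pending.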

Signed-off-by: Balthazar Rouberol <brouberol@wikimedia.org>
Bug: T391510
