
deployment: use Recreate pod replacement strategy

Arturo Borrero Gonzalez requested to merge arturo-193-deployment-use-recr into main

I have observed that in certain scenarios, when a maintain-kubeusers deployment is being upgraded, a new Pod is scheduled and starts doing work, while the old Pod is still running, and undoing that very same work.

This is because, by default, Kubernetes replaces Deployment Pods with the RollingUpdate strategy, which waits for the new Pod to become ready before stopping the old one.
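
For reference, this is roughly what the default looks like when written out explicitly. It is only a sketch of the Kubernetes defaults for `apps/v1` Deployments, not a copy of the maintain-kubeusers manifest:

```yaml
# Fragment of a Deployment spec showing the defaults that apply
# when no strategy is specified.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%        # extra Pods allowed above the desired count (rounded up)
      maxUnavailable: 25%  # Pods allowed to be unavailable (rounded down)
```

Assuming the deployment runs a single replica, these percentages resolve to one surge Pod and zero unavailable Pods, so the old Pod keeps running until the new one is ready.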

A particular example:

  • version X is reconciling resource Y to be created for every account
  • version X+1 is reconciling resource Y to be deleted for every account

If X and X+1 are running at the same time, the results can be either:

  • case #1: inefficient -- work has to be done again by X+1, no big deal anyway
  • case #2: erroneous -- X creates some state that X+1 will not be able to handle

I would like to prevent case #2 with this patch.
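
A minimal sketch of what selecting this strategy looks like in a Deployment manifest. The labels, replica count and image below are placeholders for illustration and are not taken from the actual maintain-kubeusers manifests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: maintain-kubeusers
spec:
  replicas: 1  # assumption: a single replica
  strategy:
    # Terminate all old Pods before creating any new one,
    # so versions X and X+1 never run at the same time.
    type: Recreate
  selector:
    matchLabels:
      name: maintain-kubeusers  # placeholder label
  template:
    metadata:
      labels:
        name: maintain-kubeusers  # placeholder label
    spec:
      containers:
        - name: maintain-kubeusers
          image: example.org/maintain-kubeusers:latest  # placeholder image
```

With `type: Recreate` there must be no `rollingUpdate` block under `strategy`; the API rejects that combination.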

Given that maintain-kubeusers is not in the hot path for user requests, users will not notice a brief period with no Pod running while the old one is replaced.

But then, what if the deployment of X+1 fails because of some bug? We will need to roll back to the previous deployment version X anyway. Again, we don't care if there is a brief downtime while we roll back.

Signed-off-by: Arturo Borrero Gonzalez <aborrero@wikimedia.org>
