introduce environment watcher

Previously, once an environment had started running, the only way to update its status was to explicitly request that the API contact the cluster. This was done via the latestStatus query parameter when calling one of the GET endpoints for environments.

This pull mechanism put undue pressure on the cluster, especially when calling the GET /environments collection endpoint for a large number of envs.

That behavior has now been replaced by a push mechanism implemented using a client-go informer. The informer keeps an open WATCH stream to the cluster, which it uses to track the deployments associated with environments. When a deployment's status changes, the cluster pushes an event through the stream and the informer reacts to it, in our case by updating the status of the associated env.

A few points worth noting:

  • latestStatus has been removed from GET /environments but kept for GET /environments/:id. The latter endpoint can still be used to get updates for an env if there is a problem with the watcher
  • The 3-day TTL for failed envs now applies only to envs that fail during creation. This means:
    • It won't be applied to envs that fail while being rebuilt. This is so that users don't lose any data they may have in an env that was originally created successfully and may still be recoverable
    • It won't be applied to envs that were running when a cluster-side failure occurred. This prevents a cluster-wide failure that happens while we are not around (e.g. on a Friday evening) from wiping out most or all user envs in the cluster

Bug: T396871

Edited by Jaime Nuche
