Handle logstash timeouts separately from spikes in errors reported by logstash

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1005610 changed
logstash_checker.py so that it returns 0 on success (same as before),
10 if the error threshold has been exceeded, and 1 for any other
error.

This commit makes AbstractSync.canary_checks() change its behavior
depending on the exit status of logstash_checker.py. The main change
is that when the status is 1, the user is prompted to choose how to
proceed: exit scap, retry the canary checks, or continue with the
deployment anyway.
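The status handling described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this commit: `classify_status`, `run_canary_check`, and the `run_check` and `ask` callables are hypothetical names. It also assumes that status 10 fails the check without prompting and that any status other than 0 or 10 is treated like 1; the description above does not state either of these.

```python
# Exit statuses of logstash_checker.py as described in this commit message.
CHECK_OK = 0
CHECK_THRESHOLD_EXCEEDED = 10


def classify_status(status):
    """Map a checker exit status to a coarse outcome (hypothetical helper)."""
    if status == CHECK_OK:
        return "ok"
    if status == CHECK_THRESHOLD_EXCEEDED:
        return "threshold_exceeded"
    # Assumption: anything else (normally 1) means the check itself
    # could not be completed, e.g. logstash was unreachable.
    return "check_error"


def run_canary_check(run_check, ask):
    """Return True if the deployment should proceed, False otherwise.

    run_check: callable returning the checker's exit status (hypothetical).
    ask: callable returning "exit", "retry", or "continue" (hypothetical).
    """
    while True:
        outcome = classify_status(run_check())
        if outcome == "ok":
            return True
        if outcome == "threshold_exceeded":
            # Assumption: a real error spike fails the check without a prompt.
            return False
        # The check could not be completed: let the operator decide.
        choice = ask()
        if choice == "retry":
            continue
        return choice == "continue"
```

With this shape, a retry followed by "continue" matches the sample session below: the check fails twice with a connection error, and the deployment proceeds only because the operator chose to continue.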

Sample output:

18:35:12 Started sync-canaries-k8s  
18:35:12 Finished sync-canaries-k8s (duration: 00m 00s)  
18:35:12 Started sync-check-canaries  
18:35:12 Waiting 5 seconds for canary traffic...  
18:35:17 Executing check 'Logstash canary error rate'  
18:35:17 Check 'Logstash canary error rate' failed: WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f59e9723be0>: Failed to establish a new connection: [Errno 111] Connection refused')': /logstash-*/_search  
ERROR: Generic connection error: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /logstash-*/_search (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f59e97235c0>: Failed to establish a new connection: [Errno 111] Connection refused'))  
  
18:35:17 Failed to complete canary checks for some reason.  
What do you want to do?  
[1] Exit scap  
[2] Retry canary checks  
[3] Continue with deployment  
-> 2  
18:35:22 Executing check 'Logstash canary error rate'  
18:35:22 Check 'Logstash canary error rate' failed: WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6b04345710>: Failed to establish a new connection: [Errno 111] Connection refused')': /logstash-*/_search  
ERROR: Generic connection error: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /logstash-*/_search (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6b043457b8>: Failed to establish a new connection: [Errno 111] Connection refused'))  
  
18:35:22 Failed to complete canary checks for some reason.  
What do you want to do?  
[1] Exit scap  
[2] Retry canary checks  
[3] Continue with deployment  
-> 3  
18:35:23 Proceeding with deployment  
18:35:23 Finished sync-check-canaries (duration: 00m 11s)  

Also, adjust a comment in scap.utils.confirm() to make it match the
code.

Bug: T144033

Edited by Ahmon Dancy
