
Restructure / correct existing Prometheus metrics

Rather than accessing metrics indirectly through helpers like IncrementCounter that look them up by name in a map, access them directly.

Rationale: since we don't have tests that cover metric reporting, there should be little or no code between a metric's definition and its use.

Removing the helper logic avoids the question of what to do when a metric name is unknown (the helpers silently ignored it), and it no longer hides the use of a method that can panic (With, which panics if the supplied labels don't match the metric's declared label names).

While we're here, also:

  1. Remove the no longer used mercurius_version_errors metric.
  2. Correct the units of the values reported to the mercurius_job_duration_seconds metric, which should be float64 seconds (not milliseconds). Note that for this metric to be useful, it will still need better buckets.
  3. Correct initialization of the mercurius_job_failures metric, which was not using promauto (and thus not registered).

Bug: T383641
