Set health check offset based on last updated time #43

gvso · 2020-07-31T14:01:40Z

In #38, we introduced rollbacks and integration with the health package. When obtaining the metrics, we pass an offset specifying how far into the past we want to look (e.g. 10 minutes).

The current implementation always looks back over a constant amount of time (e.g. 10 minutes, 1 hour, etc.). This has the (potential) advantage of letting us measure the latency/error rate at a given average load (e.g. 6000 requests in 10 minutes = 10 requests/second). However, it also has the disadvantage of potentially taking longer to completely roll out a new revision (or not rolling out at all if an unreachable request count is set). For instance, if the min request count is 6000 and we check with a constant offset of 10 minutes, it can happen that the new revision always gets fewer than 6000 requests in the last 10 minutes. That is, the candidate never gets more traffic and stays a candidate for a really long time. This is especially likely when the candidate gets a small share of the traffic (at the beginning of the rollout).

The alternative solution would be to add an annotation recording the last time the candidate's traffic was increased, so we can calculate offset = time.Now() - lastTime. This would basically determine the health based on the requests accumulated since the last roll forward.

gvso · 2020-07-31T14:02:44Z

@ahmetb @grayside Any thoughts on this?

ahmetb · 2020-08-03T17:09:28Z

I think the "we look at the last N minutes" approach is easier to wrap your mind around (for users) and easier for us to implement (no need to keep track of the last checked date).

For crowded services, setting a decent <min request, last N minutes> pair should be fairly easy for the operator of the service.

After all, the point of the "min requests" criterion is to ensure there were at least some requests and the monitoring signals aren't anecdotal.

Maybe ship with a default of "min requests=0" (i.e. no such requirement).

gvso changed the title ~~Make health check offset based on last updated time~~ Set health check offset based on last updated time Jul 31, 2020

gvso mentioned this issue Aug 3, 2020

config: Add configuration for min request count #42

Merged

gvso mentioned this issue Aug 6, 2020

main: Add flag for min time before roll forward #56

Merged
