You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 11, 2023. It is now read-only.
In #38, we introduced rollbacks and integration with the health package. When obtaining the metrics, we passed a offset of how much time in the past we want to look up (e.g. 10 minutes).
The current implementation always look back at a constant number of time (e.g. 10 minutes, 1 hour, etc.). This has the (potential) advantage that it allows us to measure the latency/error rate given an average load time (e.g. 6000 requests in 10 minutes = 10 request/second). However, it also has the disadvantage of potentially taking longer to completely rollout a new revision (or not rolling out at all if an unreachable request count is set). For instance, if the min request count is 6000 and we check with a constant offset of 10 minutes, it can happen that the new revision always get a number of requests that is below 6000 in the last 10 minutes. That is, the candidate never gets more traffic and stays a candidate for a really long time. This is especially more likely when the candidate gets small shares of the traffic (at the beginning of the rollout).
The alternative solution would be to add an annotation about the last time the candidate's traffic was increased, so we can calculate an offset = time.Now() - lastTime. This would basically determine the health based on the accumulated requests since the last roll forward.
The text was updated successfully, but these errors were encountered:
I think the "we look at last N minutes" approach is easier to wrap your mind around (for users) and easier for us to implement (no need to keep track of last checked date).
For crowded services, setting a decent <min request, last N minutes> pair should be fairly easy for the operator of the service.
After all the point of "min requests" criterion is to ensure there were at least some requests and the monitoring signals aren't anecdotal.
Maybe ship with "default min requests=0" (i.e. no such requirement).
In #38, we introduced rollbacks and integration with the
health
package. When obtaining the metrics, we passed a offset of how much time in the past we want to look up (e.g. 10 minutes).The current implementation always look back at a constant number of time (e.g. 10 minutes, 1 hour, etc.). This has the (potential) advantage that it allows us to measure the latency/error rate given an average load time (e.g.
6000 requests in 10 minutes = 10 request/second
). However, it also has the disadvantage of potentially taking longer to completely rollout a new revision (or not rolling out at all if an unreachable request count is set). For instance, if the min request count is 6000 and we check with a constant offset of 10 minutes, it can happen that the new revision always get a number of requests that is below 6000 in the last 10 minutes. That is, the candidate never gets more traffic and stays a candidate for a really long time. This is especially more likely when the candidate gets small shares of the traffic (at the beginning of the rollout).The alternative solution would be to add an annotation about the last time the candidate's traffic was increased, so we can calculate an
offset = time.Now() - lastTime
. This would basically determine the health based on the accumulated requests since the last roll forward.The text was updated successfully, but these errors were encountered: