Use linear buckets in some places #3384
Conversation
I do not use the buckets; instead I am using the sum and the count, which gets me more raw data.
With that being said, while linear scaling might be better for many cases, I think exponential is better for latency. So, I do not like the global change from exponential to linear.
Why is exponential better for latency? It gives us less accurate latency values, as I explained in the PR description.
I thought latency followed more of an exponential curve, but if you see it differently, fine. If it were up to me, I would remove the buckets altogether; I see them mostly as something useful for presentation.
So the issue here is that the wider our buckets, the less accurate our quantiles (p50, p90, p99, etc.) will be, which is not good, because those are what we'll be looking at in our dashboards to monitor the system. So we need to make our buckets as narrow as possible without generating too many buckets, which can be expensive when using Grafana Cloud, for example. These buckets were starting at 0.0001 ms with base-3 exponential growth. Now they start at 1 ms and grow linearly for both the proxy and server latencies.
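To make the trade-off concrete, here is a minimal Rust sketch (assuming the `prometheus` crate; the widths and counts are illustrative placeholders, not the values used in this PR) that prints the bucket boundaries produced by the two layouts:

```rust
fn main() {
    // Exponential layout similar to the old one described above: starts very
    // small and each bucket is 3x wider than the previous one, so the upper
    // buckets become extremely coarse.
    let exponential = prometheus::exponential_buckets(0.0001, 3.0, 10)
        .expect("valid bucket parameters");
    // [0.0001, 0.0003, 0.0009, ..., 0.6561, 1.9683]
    println!("exponential: {:?}", exponential);

    // Linear layout: every bucket has the same width, so the interpolation
    // error when estimating a quantile is bounded by that fixed width
    // everywhere. The width (25.0) and count (10) are placeholders.
    let linear = prometheus::linear_buckets(1.0, 25.0, 10)
        .expect("valid bucket parameters");
    // [1.0, 26.0, 51.0, ..., 226.0]
    println!("linear: {:?}", linear);
}
```

Prometheus's `histogram_quantile` interpolates linearly inside the bucket an observation falls into, so a reported quantile can be off by up to the width of that bucket; with base-3 exponential buckets, the top buckets are so wide that p99 estimates become very imprecise.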
I approve because there are fewer buckets with this system than with the preceding one.
Motivation
We currently use exponential buckets everywhere. As much as they're good in the sense that they generate fewer buckets and can be cheaper on Grafana Cloud, etc., they make our buckets super wide, which makes our Prometheus data less accurate.
Proposal
I need to spend some time later looking at the testnet data for these different metrics to adjust the buckets, but for now I'm just changing the proxy/server latencies to use linear buckets instead. I also changed the default starting value to 0.001, as 1 microsecond should be enough for most of what we're measuring, and this will take at least one bucket away from the metrics using it.
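As a sketch of what this amounts to in code (hypothetical metric name and label, plain `prometheus` crate API, and a placeholder width and count; the real registration may go through the project's own metric helpers):

```rust
use prometheus::{register_histogram_vec, HistogramVec};

/// Hypothetical latency histogram using linear buckets.
/// The start value 0.001 matches the new default starting value mentioned
/// above; the width and count are placeholders to be tuned against testnet data.
fn register_proxy_latency() -> HistogramVec {
    register_histogram_vec!(
        "proxy_request_latency",
        "Request latency measured by the proxy",
        &["method"],
        prometheus::linear_buckets(0.001, 0.5, 20).expect("valid bucket parameters")
    )
    .expect("metric can be registered")
}
```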
Test Plan
Run a validator locally and check the metrics exported with the new buckets.
Release Plan