feat: tune latency attribute buckets to reduce cardinality #2432
Fixes #2286
Log Buckets
This PR introduces the ability to constrain `logBucket` with a max and min bucket. Log buckets just bucket numbers (in this case, ms latency) into ranges bounded by successive powers of a base. Example for base 4: `<1`, `1-4`, `4-16`, `16-64`, and so on.
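As a rough illustration of the idea, here is a minimal sketch of a clamped log bucket. It is not the implementation in this PR: the signature, the `minExp`/`maxExp` parameter names, the lower-inclusive bucket edges, and the label format are all assumptions chosen to match the bucket names used in this description.

```typescript
// Hypothetical sketch, not the PR's actual logBucket.
// minExp/maxExp are assumed to be exponents of the base (e.g. min 4^3, max 4^7).
function logBucket(
  value: number,
  base: number,
  minExp: number = 0,
  maxExp: number = Infinity,
): string {
  if (!Number.isFinite(value)) {
    throw new RangeError("value must be a finite number");
  }

  // Everything below base^minExp collapses into one "<N" bucket.
  const smallest = base ** minExp;
  if (value < smallest) return `<${smallest}`;

  // Walk up one power at a time; this avoids floating-point error from
  // dividing logarithms when the value sits exactly on a power of the base.
  let lower = smallest;
  let exp = minExp;
  while (exp < maxExp && value >= lower * base) {
    lower *= base;
    exp += 1;
  }

  // Everything at or above base^maxExp collapses into one ">=N" bucket.
  if (exp >= maxExp) return `>=${lower}`;
  return `${lower}-${lower * base}`;
}

console.log(logBucket(3, 2));            // "2-4"
console.log(logBucket(10, 4, 3, 7));     // "<64"
console.log(logBucket(100, 4, 3, 7));    // "64-256"
console.log(logBucket(50000, 4, 3, 7));  // ">=16384"
```

With both bounds set, the function can only ever emit `maxExp - minExp + 2` distinct labels, which is what keeps the attribute's cardinality fixed.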
In DataDog, looking across the last week, several latency buckets show counts low enough to be noise, especially the buckets closest to 1. This makes sense given the nature of log buckets: they are densest close to 1 and sparser the further up you go. To reduce cardinality (and thus cost), we can chop away that noise at the low end.
It's similarly worth capping the buckets at the high end: although the buckets get exponentially larger, the number of possible buckets is still unbounded. We haven't been hit with any major issues yet, but you can imagine how this could impact cardinality during a latency spike.
Call Latency
DataDog links: bucketed counts, avg latency rollup
Currently set to base 2 without min/max.
- `<1`, `1-2` .. `8-16`: noise
- `16-32` .. `1024-2048`: healthy number of data points
- `2048-4096`, `4096-8192`: some spikes, but low data
- `8192-16384`: healthy number of data points (worth investigating separately from this change)

Proposed change: base 4, min `4^3` (i.e. `<64` is the smallest bucket), max `4^7` (i.e. `>=16384` is the highest bucket, accounting for outliers).

Async Call Latency
DataDog links: bucketed counts, avg latency rollup
Currently set to base 8 without min/max.
- `8-64`: noise. Note that `<1` and `1-8` buckets have never occurred, though they are possible in the current setup.
- `64-512`, `512-4096`: healthy number of data points
- `4096-32768`: noise

Proposed change: base 4, min `4^4` (i.e. `<256` is the smallest bucket), max `4^6` (i.e. `>=4096` is the highest bucket).

Ingress Latency
DataDog links: bucketed counts, avg latency rollup
Currently set to base 2 without min/max.
- `<1`, `1-2` .. `16-32`: noise
- `32-64` .. `1024-2048`: healthy number of data points
- `2048-4096`, `4096-8192`: noise
- `8192-16384`: healthy number of data points (worth investigating separately from this change)

Proposed change: base 4, min `4^3` (i.e. `<64` is the smallest bucket), max `4^7` (i.e. `>=16384` is the highest bucket).
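To make the three proposals easier to compare, the sketch below enumerates the full label set each one would produce. The `bucketLabels` helper and the `proposals` keys are hypothetical names for illustration only, and the label format is assumed to follow the bucket names used in this description.

```typescript
// Hypothetical helper: lists every label a clamped log bucket can emit,
// assuming min and max are given as exponents of the base.
function bucketLabels(base: number, minExp: number, maxExp: number): string[] {
  const labels: string[] = [`<${base ** minExp}`];
  for (let exp = minExp; exp < maxExp; exp++) {
    labels.push(`${base ** exp}-${base ** (exp + 1)}`);
  }
  labels.push(`>=${base ** maxExp}`);
  return labels;
}

// The three proposed settings from this description.
const proposals: Record<string, string[]> = {
  callLatency: bucketLabels(4, 3, 7),
  asyncCallLatency: bucketLabels(4, 4, 6),
  ingressLatency: bucketLabels(4, 3, 7),
};

for (const [name, labels] of Object.entries(proposals)) {
  console.log(`${name} (${labels.length} buckets): ${labels.join(", ")}`);
}
// callLatency (6 buckets): <64, 64-256, 256-1024, 1024-4096, 4096-16384, >=16384
// asyncCallLatency (4 buckets): <256, 256-1024, 1024-4096, >=4096
// ingressLatency (6 buckets): <64, 64-256, 256-1024, 1024-4096, 4096-16384, >=16384
```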