-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bloom-Gateway cache #11380
Bloom-Gateway cache #11380
Conversation
Trivy scan found the following vulnerabilities:
|
035e329
to
957ce11
Compare
|
||
// GetQuery returns the query of the request. | ||
// The query is the hash for the input chunks refs and the filter expressions. | ||
func (m *FilterChunkRefRequest) GetQuery() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is really expensive. Is it used as part of the cache key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's used to compute the key. I agree it's probably too expensive.
pkg/bloomgateway/cache.go
Outdated
sort.Slice(chunkRefs, func(i, j int) bool { | ||
return chunkRefs[i].Fingerprint < chunkRefs[j].Fingerprint | ||
}) | ||
return slices.CompactFunc(chunkRefs, func(next, prev *logproto.GroupedChunkRefs) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: There is a slices.CompactFunc
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm ;) great function ;) previously I would do it manually ;))))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like it will join the chunks only for the first neighbor elements with the same fingerprint. if we have more than 2 neighbors with the same fingerprint, we will just lose chunk ids...
I am not sure if it's expected right now to have more than 2 elements with the same fingerprint but I would not leave this function because in the future somebody will spend days trying to find this bug.
look at this code snippet, https://go.dev/play/p/c_bt6c1HNUj?v= ,
I created a snippet with users, the first 6 users have ID a
, and the next 4 users have ID b
. if we use CompactFunc
to aggregate all the scores
of the users, we won't have the expected results...
compacted 2
&{id:a scores:1}
&{id:b scores:13}
but it has to be:
compacted 2
&{id:a scores:15}
&{id:b scores:30}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! It's quite useful to remove / aggregate duplicated items in a sorted list As Vlad mentioned, this is buggy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right @vlad-diachenko good catch. I pushed a new commit with a test for this and a fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed partially and left some comments, I believe Chirstian has more context here, so completely rely on his review
for _, ref := range chunkRef.Refs { | ||
if model.Time(end) < ref.From || ref.Through <= model.Time(start) { | ||
continue | ||
} | ||
refs = append(refs, ref) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but sorting also requires the same count of iterations to ensure each item is in order. and after this, we will do a binary search. so, I believe it's better to leave it as is. wdyt?
pkg/bloomgateway/cache.go
Outdated
sort.Slice(chunkRefs, func(i, j int) bool { | ||
return chunkRefs[i].Fingerprint < chunkRefs[j].Fingerprint | ||
}) | ||
return slices.CompactFunc(chunkRefs, func(next, prev *logproto.GroupedChunkRefs) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm ;) great function ;) previously I would do it manually ;))))
pkg/bloomgateway/cache.go
Outdated
sort.Slice(chunkRefs, func(i, j int) bool { | ||
return chunkRefs[i].Fingerprint < chunkRefs[j].Fingerprint | ||
}) | ||
return slices.CompactFunc(chunkRefs, func(next, prev *logproto.GroupedChunkRefs) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like it will join the chunks only for the first neighbor elements with the same fingerprint. if we have more than 2 neighbors with the same fingerprint, we will just lose chunk ids...
I am not sure if it's expected right now to have more than 2 elements with the same fingerprint but I would not leave this function because in the future somebody will spend days trying to find this bug.
look at this code snippet, https://go.dev/play/p/c_bt6c1HNUj?v= ,
I created a snippet with users, the first 6 users have ID a
, and the next 4 users have ID b
. if we use CompactFunc
to aggregate all the scores
of the users, we won't have the expected results...
compacted 2
&{id:a scores:1}
&{id:b scores:13}
but it has to be:
compacted 2
&{id:a scores:15}
&{id:b scores:30}
pkg/bloomgateway/client.go
Outdated
@@ -109,9 +116,25 @@ func (i *ClientConfig) RegisterFlags(f *flag.FlagSet) { | |||
// RegisterFlagsWithPrefix registers flags for the Bloom Gateway client configuration with a common prefix. | |||
func (i *ClientConfig) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) { | |||
i.GRPCClientConfig.RegisterFlagsWithPrefix(prefix+"grpc", f) | |||
i.Cache.RegisterFlagsWithPrefix(f, prefix+"cache.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit of a trap:
You call RegisterFlagsWithPrefix
on the inlined object resultscache.Config
. Therefore not all flags are registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by implementing RegisterFlagsWithPrefix
in CacheConfig
pkg/logproto/compat.go
Outdated
h := xxhash.New() | ||
for _, ref := range m.Refs { | ||
encodeBuf = encodeBuf[:0] | ||
_, _ = h.Write(fmt.Appendf(encodeBuf, "%d", ref.Fingerprint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative that does not require string->[]byte conversions:
_, _ = h.Write(fmt.Appendf(encodeBuf, "%d", ref.Fingerprint)) | |
_, _ = h.Write(binary.LittleEndian.AppendUint64(encodeBuf[:0], ref.Fingerprint)) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!, leaving fmt.Appendf for the operator since we need to be able to print the operator id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
**What this PR does / why we need it**: This PR adds caching to the bloom-gateway client. It uses the result cache from grafana#11343. Here's how we: - Merge responses: group all chunks by FP and remove duplicated chunk checksums. - Extract responses based on time span: For all chunks in each FP, add to the extracted response only the chunks that overlaps with the desired start and end time.
What this PR does / why we need it:
This PR adds caching to the bloom-gateway client. It uses the result cache from #11343.
Here's how we: