Staleness batch #6355
Conversation
service/labelstore/service_test.go
Outdated
Can we write some benchmarks and share the results?
Sure, added a benchmark. TrackStaleness with concurrent calls averages around 2ms per 100k entries handled.
Backporting the benchmark to main gives 14ms per 100k entries, so roughly 7x better.
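For readers without the diff handy, here is a minimal sketch of what a concurrent benchmark along these lines could look like. The package, type names, and batching method below are stand-ins, not the actual code added in this PR:

```go
package sketch

import (
	"sync"
	"testing"
)

// Stand-in types: the real StalenessTracker and labelstore service live in
// service/labelstore and may differ in detail.
type stalenessTracker struct {
	globalRefID uint64
	value       float64
}

type fakeStore struct {
	mut          sync.Mutex
	staleGlobals map[uint64]struct{}
}

// trackStaleness records a whole batch of entries under a single lock,
// mirroring the batching behaviour this PR introduces.
func (s *fakeStore) trackStaleness(ids []stalenessTracker) {
	s.mut.Lock()
	defer s.mut.Unlock()
	for _, id := range ids {
		s.staleGlobals[id.globalRefID] = struct{}{}
	}
}

// BenchmarkTrackStaleness runs concurrent callers over 100k entries, similar
// in spirit to the numbers quoted above.
func BenchmarkTrackStaleness(b *testing.B) {
	s := &fakeStore{staleGlobals: make(map[uint64]struct{}, 100_000)}
	trackers := make([]stalenessTracker, 100_000)
	for i := range trackers {
		trackers[i] = stalenessTracker{globalRefID: uint64(i), value: float64(i)}
	}
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			s.trackStaleness(trackers)
		}
	})
}
```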
Neat idea to batch it! Would really like to see some benchmarks so we can know for sure how much better this is :)
	// Tested this to ensure it had no cpu impact, since it is called so often.
	a.ls.RemoveStaleMarker(uint64(ref))
}
a.stalenessTrackers = append(a.stalenessTrackers, labelstore.StalenessTracker{
I may be missing some context here, but don't we need similar staleness tracking for AppendExemplar and AppendHistogram?
We do, but it never had it, so I'm keeping this PR small. I will have 2-3 more PRs incoming.
	// Tested this to ensure it had no cpu impact, since it is called so often.
	a.ls.RemoveStaleMarker(uint64(ref))
}
a.stalenessTrackers = append(a.stalenessTrackers, labelstore.StalenessTracker{
We create a labelstore.StalenessTracker here, but later in func (s *service) TrackStaleness(ids []StalenessTracker) we convert these to &staleMarker{} and calculate the labels hash. Could we instead create the &staleMarker{} right away here and calculate the hash here? That way there would be less work and fewer structs to create, and I think the code would get simpler too.
I feel those are owned by two different things. Mainly, the last-marked-stale state should really only be set by the labelstore itself, and exposing that field feels off. This gets cleaned up slightly in the next PR.
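For context, here is a rough sketch of the two structs and the conversion being discussed. The field names, the hash calculation, and the delete-on-non-stale branch are assumptions pieced together from the snippets in this thread, not the exact Alloy code:

```go
package sketch

import (
	"sync"
	"time"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/model/value"
)

// StalenessTracker is the batch entry handed to the labelstore (assumed shape).
type StalenessTracker struct {
	GlobalRefID uint64
	Value       float64
	Labels      labels.Labels
}

// staleMarker is internal to the labelstore; callers never set lastMarkedStale.
type staleMarker struct {
	globalID        uint64
	lastMarkedStale time.Time
	labelHash       uint64
}

type service struct {
	mut          sync.Mutex
	staleGlobals map[uint64]*staleMarker
}

// TrackStaleness converts each tracker into a staleMarker and computes the
// label hash, keeping ownership of lastMarkedStale inside the labelstore.
func (s *service) TrackStaleness(ids []StalenessTracker) {
	s.mut.Lock()
	defer s.mut.Unlock()
	for _, id := range ids {
		if value.IsStaleNaN(id.Value) {
			s.staleGlobals[id.GlobalRefID] = &staleMarker{
				globalID:        id.GlobalRefID,
				lastMarkedStale: time.Now(),
				labelHash:       id.Labels.Hash(),
			}
		} else {
			delete(s.staleGlobals, id.GlobalRefID)
		}
	}
}
```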
service/labelstore/service.go
Outdated
globalID: globalRefID,
for _, id := range ids {
	if value.IsStaleNaN(id.Value) {
		s.staleGlobals[id.GlobalRefID] = &staleMarker{
Here we have a map of pointers, but in the fanout we use a slice of structs, stalenessTrackers []labelstore.StalenessTracker. We also use a slice of structs for labels. So it's a bit inconsistent and I'm not sure what the thinking behind it is. Do we have benchmarks for what performs better? Did this come up in allocation profiles somewhere?
You don't have to traverse the map to check whether you need to add or update the value, like you would with an array. Now, it's unlikely you would need to add staleness markers multiple times, but I would rather not assume that.
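As a small generic illustration of that point (not the PR's code): with a map, add-or-update is a single assignment, while a slice needs a linear scan first.

```go
package sketch

type staleMarker struct{ globalID uint64 }

// With a map, add-or-update is one indexed assignment.
func upsertMap(markers map[uint64]*staleMarker, id uint64) {
	markers[id] = &staleMarker{globalID: id}
}

// With a slice, the same upsert needs a linear scan to avoid duplicates.
func upsertSlice(markers []staleMarker, id uint64) []staleMarker {
	for i := range markers {
		if markers[i].globalID == id {
			markers[i] = staleMarker{globalID: id}
			return markers
		}
	}
	return append(markers, staleMarker{globalID: id})
}
```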
service/labelstore/service.go
Outdated
s.mut.Lock()
defer s.mut.Unlock()
Just a thought: could we use a different mutex for s.staleGlobals and thus avoid contention with other methods like GetLocalRefID, GetGlobalRefID, GetOrAddGlobalRefID, etc.? Seems possible at a glance.
I fiddled with that a bit, but it felt more error-prone while developing. I would prefer to split that out into another PR if we wanted to go that route.
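A rough sketch of the variant being suggested, with the caveat above that it is easier to get wrong; all names here are assumptions rather than the real service fields:

```go
package sketch

import "sync"

type staleMarker struct{ globalID uint64 }

type service struct {
	mut      sync.Mutex // guards the ref-ID mappings used by GetGlobalRefID and friends
	mappings map[string]uint64

	staleMut     sync.Mutex              // guards only staleGlobals, so TrackStaleness
	staleGlobals map[uint64]*staleMarker // does not contend with ID lookups
}

// TrackStaleness only touches staleGlobals, so it takes the narrower lock.
func (s *service) TrackStaleness(ids []uint64) {
	s.staleMut.Lock()
	defer s.staleMut.Unlock()
	for _, id := range ids {
		s.staleGlobals[id] = &staleMarker{globalID: id}
	}
}

// ID lookups keep using the original mutex and never block on staleness updates.
func (s *service) getGlobalRefID(name string) uint64 {
	s.mut.Lock()
	defer s.mut.Unlock()
	return s.mappings[name]
}
```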
* Move the staleness tracking to commit and rollback in a batch.
* add more specific comment
* fix linting
* PR feedback
PR Description
This moves the staleness tracker to use a batch operation. Hooking it into both Commit and Rollback on the fanout should ensure that tracking is always called.
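At a high level the batching pattern looks roughly like the sketch below; the appender and labelstore shapes are simplified assumptions for illustration, not the exact code in this PR:

```go
package sketch

// StalenessTracker is the assumed batch entry handed to the labelstore.
type StalenessTracker struct {
	GlobalRefID uint64
	Value       float64
}

// labelStore is a narrowed-down view of the real labelstore service.
type labelStore interface {
	TrackStaleness(trackers []StalenessTracker)
}

// appender accumulates trackers instead of calling the labelstore per sample.
type appender struct {
	ls                labelStore
	stalenessTrackers []StalenessTracker
}

func (a *appender) Append(ref uint64, val float64) {
	a.stalenessTrackers = append(a.stalenessTrackers,
		StalenessTracker{GlobalRefID: ref, Value: val})
}

// Both Commit and Rollback flush the batch, so staleness tracking runs exactly
// once no matter how the write finishes.
func (a *appender) Commit() error {
	a.flushStaleness()
	return nil
}

func (a *appender) Rollback() error {
	a.flushStaleness()
	return nil
}

func (a *appender) flushStaleness() {
	a.ls.TrackStaleness(a.stalenessTrackers)
	a.stalenessTrackers = a.stalenessTrackers[:0]
}
```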
Notes to the Reviewer
PR Checklist