[v2][adjuster] Implement adjuster for deduplicating spans #6391
Signed-off-by: Mahad Zaryab <[email protected]>
if err != nil {
	// TODO: what should we do here?
	continue
}
@yurishkuro how should we handle the case where the hash code cannot be computed? This would happen if there is an error in protobuf serialization or if the hashing function returned an error. It's probably very unlikely that this ever happens. Is skipping over the span sufficient? Do we want to add a warning?
yeah, I think skipping the span is fine in this case. We could also add a warning with the error to that span.
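For illustration, here is a minimal sketch of that behaviour. It assumes "skipping" means the span is left in the output without being deduplicated, and that the warning is attached to the span itself. `Span`, `hashFunc`, `dedupe` and `lengthHash` are hypothetical stand-ins, not the `ptrace` types or the helpers used in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Span is a simplified, hypothetical stand-in for ptrace.Span.
type Span struct {
	Name     string
	Warnings []string
}

// hashFunc is a hypothetical hook so that a hashing failure can be simulated.
type hashFunc func(Span) (uint64, error)

// dedupe drops spans whose hash was already seen. A span whose hash cannot
// be computed is kept as-is and annotated with a warning.
func dedupe(spans []Span, hash hashFunc) []Span {
	seen := make(map[uint64]struct{})
	out := make([]Span, 0, len(spans))
	for _, s := range spans {
		h, err := hash(s)
		if err != nil {
			// Assumption: keep the span, but record why it was not deduplicated.
			s.Warnings = append(s.Warnings, fmt.Sprintf("failed to compute hash code: %v", err))
			out = append(out, s)
			continue
		}
		if _, dup := seen[h]; dup {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, s)
	}
	return out
}

// lengthHash is a toy hash that fails on empty names, used only to
// exercise the error path.
func lengthHash(s Span) (uint64, error) {
	if s.Name == "" {
		return 0, errors.New("empty span name")
	}
	return uint64(len(s.Name)), nil
}

func main() {
	out := dedupe([]Span{{Name: "a"}, {Name: "a"}, {Name: ""}}, lengthHash)
	for _, s := range out {
		fmt.Printf("%q warnings=%v\n", s.Name, s.Warnings)
	}
}
```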
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6391   +/-   ##
=======================================
  Coverage   96.20%   96.21%
=======================================
  Files         362      363     +1
  Lines       20705    20748    +43
=======================================
+ Hits        19919    19962    +43
  Misses        601      601
  Partials      185      185
scopeSpans := rs.ScopeSpans()
for j := 0; j < scopeSpans.Len(); j++ {
	ss := scopeSpans.At(j)
	spansByHash := make(map[uint64]ptrace.Span)
this needs to be defined at the top level in the function, so that deduping is global. And the hashing must account for resource and scope attributes.
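A rough sketch of what global deduping could look like, using simplified stand-in types rather than `ptrace`. The `Resource`, `Scope` and `Span` structs and the `hashSpan` helper are assumptions made for illustration. The point is only that the `seen` map is created once per call and that resource and scope attributes feed into each span's hash.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
	"sort"
)

// Simplified, hypothetical stand-ins for the pdata types.
type Span struct{ TraceID, SpanID, Name string }

type Scope struct {
	Attrs map[string]string
	Spans []Span
}

type Resource struct {
	Attrs  map[string]string
	Scopes []Scope
}

// writeField hashes one value followed by a separator byte.
func writeField(h hash.Hash64, s string) {
	// hash.Hash documents that Write never returns an error.
	_, _ = h.Write([]byte(s))
	_, _ = h.Write([]byte{0})
}

// writeAttrs hashes attributes in sorted key order so that map iteration
// order cannot affect the result.
func writeAttrs(h hash.Hash64, attrs map[string]string) {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		writeField(h, k)
		writeField(h, attrs[k])
	}
}

// hashSpan covers the span together with its resource and scope attributes.
func hashSpan(resAttrs, scopeAttrs map[string]string, s Span) uint64 {
	h := fnv.New64a()
	writeAttrs(h, resAttrs)
	writeAttrs(h, scopeAttrs)
	writeField(h, s.TraceID)
	writeField(h, s.SpanID)
	writeField(h, s.Name)
	return h.Sum64()
}

// dedupe removes duplicate spans across all resources and scopes.
func dedupe(resources []Resource) {
	seen := make(map[uint64]struct{}) // declared once, shared by every scope
	for i := range resources {
		res := &resources[i]
		for j := range res.Scopes {
			scope := &res.Scopes[j]
			kept := scope.Spans[:0] // filter in place
			for _, s := range scope.Spans {
				h := hashSpan(res.Attrs, scope.Attrs, s)
				if _, dup := seen[h]; dup {
					continue
				}
				seen[h] = struct{}{}
				kept = append(kept, s)
			}
			scope.Spans = kept
		}
	}
}

func main() {
	span := Span{TraceID: "t1", SpanID: "s1", Name: "op"}
	rs := []Resource{
		{Attrs: map[string]string{"service.name": "a"}, Scopes: []Scope{{Spans: []Span{span}}}},
		{Attrs: map[string]string{"service.name": "a"}, Scopes: []Scope{{Spans: []Span{span}}}},
		{Attrs: map[string]string{"service.name": "b"}, Scopes: []Scope{{Spans: []Span{span}}}},
	}
	dedupe(rs)
	for _, r := range rs {
		fmt.Println(r.Attrs["service.name"], len(r.Scopes[0].Spans))
	}
}
```

With a per-scope map, the second resource in `main` would keep its copy of the span. With the shared map it is dropped, while the third resource keeps its span because its resource attributes differ.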
// the FNV hashing algorithm to the serialized data.
//
// To ensure consistent hash codes, this adjuster should be executed after
// SortAttributesAndEvents, which normalizes the order of collections within the span.
A couple of thoughts on this:
- Some storage backends (Cassandra, in particular) perform similar deduping by computing a hash before the span is saved and using it as part of the partition key (it creates tombstones if an identical span is saved two or more times, but no dups on read). So we could make this hashing process a part of the ingestion pipeline (e.g. in sanitizers) and simply store it as an attribute on the span. Then this adjuster would be "lazy": it would only recompute the hash if it doesn't already exist in the storage.
- If we do this on the write path, we would want this to be as efficient as possible, so we would need to implement manual hashing by iterating through the attributes (and pre-sorting them to avoid dependencies) and manually going through all fields of the Span / SpanEvent / SpanLink. The reason I was reluctant to do that in the past was to avoid unintended bugs if the data model was changed, like a new field added that we'd forget to add to the hash function. To protect against that we probably could use some fuzzing tests, by setting / unsetting each field individually and making sure the hash code changes as a result.

We don't have to do it now, but let's open a ticket for future improvement (I think it could be a good-first-issue).
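To make the second point concrete, here is a hedged sketch of a manual hash plus a field-sensitivity check. The `Span` struct is a hypothetical, much-reduced stand-in for the real data model (attributes, events and links are left out for brevity), and `unhashedFields` is an illustrative helper, not an existing Jaeger function. It uses reflection to set each field individually and reports any field whose change leaves the hash untouched.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"reflect"
)

// Span is a hypothetical, reduced stand-in for the real span model.
type Span struct {
	TraceID    string
	SpanID     string
	Name       string
	StartNanos uint64
	EndNanos   uint64
}

func writeString(h hash.Hash64, s string) {
	// hash.Hash documents that Write never returns an error.
	_, _ = h.Write([]byte(s))
	_, _ = h.Write([]byte{0}) // separator between fields
}

func writeUint64(h hash.Hash64, v uint64) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	_, _ = h.Write(buf[:])
}

// hashSpan walks every field by hand, with no serialization step.
func hashSpan(s Span) uint64 {
	h := fnv.New64a()
	writeString(h, s.TraceID)
	writeString(h, s.SpanID)
	writeString(h, s.Name)
	writeUint64(h, s.StartNanos)
	writeUint64(h, s.EndNanos)
	return h.Sum64()
}

// hashSpanMissingEnd simulates the bug the check is meant to catch:
// a field (EndNanos) that was never added to the hash function.
func hashSpanMissingEnd(s Span) uint64 {
	h := fnv.New64a()
	writeString(h, s.TraceID)
	writeString(h, s.SpanID)
	writeString(h, s.Name)
	writeUint64(h, s.StartNanos)
	return h.Sum64()
}

// unhashedFields sets one field at a time on an otherwise empty Span and
// returns the names of fields whose change does not alter the hash.
func unhashedFields(hashFn func(Span) uint64) []string {
	var missed []string
	base := hashFn(Span{})
	t := reflect.TypeOf(Span{})
	for i := 0; i < t.NumField(); i++ {
		var s Span
		f := reflect.ValueOf(&s).Elem().Field(i)
		switch f.Kind() {
		case reflect.String:
			f.SetString("x")
		case reflect.Uint64:
			f.SetUint(1)
		default:
			// A kind this check does not know how to set is reported too,
			// so that new field types are not silently skipped.
			missed = append(missed, t.Field(i).Name)
			continue
		}
		if hashFn(s) == base {
			missed = append(missed, t.Field(i).Name)
		}
	}
	return missed
}

func main() {
	fmt.Println("complete hash misses:", unhashedFields(hashSpan))
	fmt.Println("buggy hash misses:   ", unhashedFields(hashSpanMissingEnd))
}
```

Because the check iterates over the struct's fields rather than a hand-written list, a field added to `Span` later would be flagged until the hash function is updated.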
traces := ptrace.NewTraces()
rs := traces.ResourceSpans().AppendEmpty()
resourceAttributes.CopyTo(rs.Resource().Attributes())
ss := rs.ScopeSpans().AppendEmpty()
scopeAttributes.CopyTo(ss.Scope().Attributes())
I would rather do this outside of the loop for spans and only replace the span before hashing
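One way to picture that suggestion is the sketch below. It is an analogy only: `hashInput` is a hypothetical stand-in for the scaffold trace (one resource, one scope, one span), and `encoding/json` stands in for the protobuf marshaler. The scaffold is built once per scope, and only the span slot is overwritten inside the loop.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Span is a simplified, hypothetical stand-in for ptrace.Span.
type Span struct {
	SpanID string
	Name   string
}

// hashInput plays the role of the scaffold trace that gets serialized.
type hashInput struct {
	ResourceAttrs map[string]string
	ScopeAttrs    map[string]string
	Span          Span
}

// hashOf serializes the scaffold and applies FNV-64a to the bytes.
func hashOf(in *hashInput) (uint64, error) {
	// json.Marshal is a stand-in for protobuf serialization; it emits map
	// keys in sorted order, so the output is deterministic.
	b, err := json.Marshal(in)
	if err != nil {
		return 0, err
	}
	h := fnv.New64a()
	_, _ = h.Write(b) // hash.Hash documents that Write never returns an error
	return h.Sum64(), nil
}

// hashScope returns one hash per span in a scope.
func hashScope(resAttrs, scopeAttrs map[string]string, spans []Span) ([]uint64, error) {
	// Built once, outside the loop over spans.
	in := &hashInput{ResourceAttrs: resAttrs, ScopeAttrs: scopeAttrs}
	hashes := make([]uint64, 0, len(spans))
	for _, s := range spans {
		in.Span = s // only the span is replaced before hashing
		h, err := hashOf(in)
		if err != nil {
			return nil, err
		}
		hashes = append(hashes, h)
	}
	return hashes, nil
}

func main() {
	res := map[string]string{"service.name": "a"}
	scope := map[string]string{"lib": "x"}
	hashes, err := hashScope(res, scope, []Span{{SpanID: "1", Name: "op"}, {SpanID: "1", Name: "op"}, {SpanID: "2", Name: "op"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hashes[0] == hashes[1], hashes[0] == hashes[2])
}
```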
	return 0, err
}
hasher := fnv.New64a()
hasher.Write(b) // never returns an error
Ignoring the error here because the hash.Hash interface (which Hash64 embeds) documents that Write never returns an error:
type Hash interface {
// Write (via the embedded io.Writer interface) adds more data to the running hash.
// It never returns an error.
io.Writer
This comment should go in the code as explanation
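A small sketch of how the explanation might sit next to the call. `hashBytes` is a hypothetical helper name. The only library facts relied on are that `fnv.New64a` returns a `hash.Hash64` and that the `hash.Hash` documentation states `Write` never returns an error.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashBytes returns the FNV-1a 64-bit hash of b.
func hashBytes(b []byte) uint64 {
	hasher := fnv.New64a()
	// hash.Hash embeds io.Writer, and its documentation states that Write
	// "never returns an error", so the returned error is deliberately ignored.
	_, _ = hasher.Write(b)
	return hasher.Sum64()
}

func main() {
	fmt.Printf("%016x\n", hashBytes([]byte("serialized span bytes")))
}
```

Assigning to `_, _` rather than dropping the result silently keeps linters such as errcheck quiet and makes the intent visible at the call site.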
hashTrace := ptrace.NewTraces()
rs := resourceSpans.At(i)
hashResourceSpan := hashTrace.ResourceSpans().AppendEmpty()
rs.Resource().Attributes().CopyTo(hashResourceSpan.Resource().Attributes())
scopeSpans := rs.ScopeSpans()
hashScopeSpan := hashResourceSpan.ScopeSpans().AppendEmpty()
hard to grok due to ordering and naming
Suggested change:
rs := resourceSpans.At(i)
scopeSpans := rs.ScopeSpans()
hashTrace := ptrace.NewTraces()
hashResourceSpans := hashTrace.ResourceSpans().AppendEmpty()
hashScopeSpans := hashResourceSpans.ScopeSpans().AppendEmpty()
hashSpan := hashScopeSpans.Spans().AppendEmpty()
rs.Resource().Attributes().CopyTo(hashResourceSpans.Resource().Attributes())
ss := scopeSpans.At(j)
ss.Scope().Attributes().CopyTo(hashScopeSpan.Scope().Attributes())
spans := ss.Spans()
newSpans := ptrace.NewSpanSlice()
hashSpan := hashScopeSpan.Spans().AppendEmpty()
Suggested change:
ss := scopeSpans.At(j)
spans := ss.Spans()
ss.Scope().Attributes().CopyTo(hashScopeSpans.Scope().Attributes())
dedupedSpans := ptrace.NewSpanSlice()
func (s *SpanHashDeduper) Adjust(traces ptrace.Traces) {
	spansByHash := make(map[uint64]ptrace.Span)
	resourceSpans := traces.ResourceSpans()
Going forward, I'd recommend using the terms resources and scopes. It makes the code more readable.
sounds good - I can open a cleanup PR
…ing#6391)

## Which problem is this PR solving?
- Towards jaegertracing#6344

## Description of the changes
- Implemented an adjuster to deduplicate spans.
- The span deduplication is done by marshalling each span into protobuf bytes and applying the FNV hash algorithm to it.

## How was this change tested?
- Added unit tests

## Checklist
- [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>