
[v2][adjuster] Implement adjuster for deduplicating spans #6391

Merged

18 commits merged into jaegertracing:main from mahadzaryab1:hash on Dec 22, 2024

Conversation

mahadzaryab1
Collaborator

Which problem is this PR solving?

  • Towards #6344
Description of the changes

  • Implemented an adjuster to deduplicate spans.
  • The span deduplication is done by marshalling each span into protobuf bytes and applying the FNV hash algorithm to it.
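The approach described above — protobuf-marshal each span, then apply FNV to the bytes — can be sketched with the hashing half in plain Go. The protobuf serialization step is stubbed out with raw bytes here, so `hashBytes` and its inputs are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashBytes mirrors the second half of the approach: run FNV-1a over
// the span's serialized bytes. In the real adjuster the input comes
// from protobuf-marshalling a ptrace.Span; here it is stubbed with
// raw bytes, so this is only a sketch.
func hashBytes(serialized []byte) uint64 {
	h := fnv.New64a()
	h.Write(serialized) // hash.Hash.Write never returns an error
	return h.Sum64()
}

func main() {
	a := hashBytes([]byte("span-A"))
	b := hashBytes([]byte("span-A"))
	c := hashBytes([]byte("span-B"))
	fmt.Println(a == b) // identical bytes always hash identically
	fmt.Println(a == c) // the differing byte changes the hash
}
```

Spans whose serialized bytes are identical therefore map to the same `uint64` key, which is what makes the hash usable as a dedup map key.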

How was this change tested?

  • Added unit tests

Checklist

Signed-off-by: Mahad Zaryab <[email protected]>
Comment on lines 48 to 51
if err != nil {
// TODO: what should we do here?
continue
}
Collaborator Author

@yurishkuro how should we handle the case where the hash code cannot be computed? This would happen if there was an error in protobuf serialization or if the hashing function returned an error. It's probably very unlikely this ever happens. Is skipping over the span sufficient? Do we want to add a warning?

Member

yeah, I think skipping the span is fine in this case. We could also add a warning with the error to that span.
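One possible shape of the handling discussed here — keep the unhashable span with a warning attached (dropping it is the alternative) — sketched with toy types; `span`, `computeHash`, and `dedupe` stand in for ptrace.Span and the adjuster's real functions, and all names are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// span and computeHash are toy stand-ins for ptrace.Span and the
// adjuster's hash function.
type span struct {
	name     string
	warnings []string
}

func computeHash(s *span) (uint64, error) {
	if s.name == "" {
		return 0, errors.New("cannot compute hash code")
	}
	return uint64(len(s.name)), nil // placeholder, not FNV
}

// dedupe keeps the first span per hash; spans whose hash cannot be
// computed skip dedup and are kept with a warning, per the discussion.
func dedupe(spans []*span) []*span {
	seen := make(map[uint64]bool)
	var kept []*span
	for _, s := range spans {
		h, err := computeHash(s)
		if err != nil {
			s.warnings = append(s.warnings, err.Error())
			kept = append(kept, s)
			continue
		}
		if !seen[h] {
			seen[h] = true
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	out := dedupe([]*span{{name: "op"}, {name: "op"}, {name: ""}})
	fmt.Println(len(out)) // duplicate "op" dropped, unhashable span kept
}
```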

Signed-off-by: Mahad Zaryab <[email protected]>

codecov bot commented Dec 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.21%. Comparing base (9fc9d75) to head (48c4021).
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6391   +/-   ##
=======================================
  Coverage   96.20%   96.21%           
=======================================
  Files         362      363    +1     
  Lines       20705    20748   +43     
=======================================
+ Hits        19919    19962   +43     
  Misses        601      601           
  Partials      185      185           
Flag Coverage Δ
badger_v1 9.04% <ø> (ø)
badger_v2 1.64% <ø> (ø)
cassandra-4.x-v1-manual 15.04% <ø> (ø)
cassandra-4.x-v2-auto 1.58% <ø> (ø)
cassandra-4.x-v2-manual 1.58% <ø> (ø)
cassandra-5.x-v1-manual 15.04% <ø> (ø)
cassandra-5.x-v2-auto 1.58% <ø> (ø)
cassandra-5.x-v2-manual 1.58% <ø> (ø)
elasticsearch-6.x-v1 18.75% <ø> (ø)
elasticsearch-7.x-v1 18.84% <ø> (ø)
elasticsearch-8.x-v1 19.00% <ø> (ø)
elasticsearch-8.x-v2 1.64% <ø> (ø)
grpc_v1 10.71% <ø> (-0.01%) ⬇️
grpc_v2 7.98% <ø> (ø)
kafka-v1 9.40% <ø> (ø)
kafka-v2 1.64% <ø> (ø)
memory_v2 1.63% <ø> (ø)
opensearch-1.x-v1 18.88% <ø> (-0.01%) ⬇️
opensearch-2.x-v1 18.88% <ø> (-0.01%) ⬇️
opensearch-2.x-v2 1.63% <ø> (ø)
tailsampling-processor 0.47% <ø> (ø)
unittests 95.05% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


Signed-off-by: Mahad Zaryab <[email protected]>
scopeSpans := rs.ScopeSpans()
for j := 0; j < scopeSpans.Len(); j++ {
ss := scopeSpans.At(j)
spansByHash := make(map[uint64]ptrace.Span)
Member

this needs to be defined at the top level in the function, so that deduping is global. And the hashing must account for resource and scope attributes.
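The reviewer's two requirements — one dedup map shared across the whole trace, and resource/scope attributes folded into the hash input — can be sketched with plain maps standing in for pdata attribute collections (all names here are illustrative, not Jaeger's actual code):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// spanKey folds resource and scope attributes into the hash input,
// so identical spans under different resources never collide in a
// trace-wide dedup map. Plain string maps stand in for pcommon.Map.
func spanKey(resourceAttrs, scopeAttrs map[string]string, spanName string) uint64 {
	h := fnv.New64a()
	writeSorted := func(attrs map[string]string) {
		keys := make([]string, 0, len(attrs))
		for k := range attrs {
			keys = append(keys, k)
		}
		sort.Strings(keys) // normalize order for a stable hash
		for _, k := range keys {
			fmt.Fprintf(h, "%s=%s;", k, attrs[k])
		}
	}
	writeSorted(resourceAttrs)
	writeSorted(scopeAttrs)
	fmt.Fprintf(h, "span=%s", spanName)
	return h.Sum64()
}

func main() {
	resA := map[string]string{"service.name": "svc-a"}
	resB := map[string]string{"service.name": "svc-b"}
	scope := map[string]string{"scope": "lib"}
	// same span name under different resources must get distinct keys
	fmt.Println(spanKey(resA, scope, "op") != spanKey(resB, scope, "op"))
}
```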

Signed-off-by: Mahad Zaryab <[email protected]>
// the FNV hashing algorithm to the serialized data.
//
// To ensure consistent hash codes, this adjuster should be executed after
// SortAttributesAndEvents, which normalizes the order of collections within the span.
Member

A couple of thoughts on this:

  1. Some storage backends (Cassandra, in particular) perform similar deduping by computing a hash before the span is saved and using it as part of the partition key (this creates tombstones if an identical span is saved two or more times, but no dups on read). So we could make this hashing a part of the ingestion pipeline (e.g. in sanitizers) and simply store the hash as an attribute on the span. Then this adjuster would be "lazy": it would only recompute the hash if it doesn't already exist in storage.

  2. If we do this on the write path, we would want it to be as efficient as possible, so we would need to implement manual hashing by iterating through the attributes (pre-sorting them to avoid order dependencies) and by manually going through all fields of the Span / SpanEvent / SpanLink. The reason I was reluctant to do that in the past was to avoid unintended bugs if the data model changed, like a new field being added that we'd forget to include in the hash function. To protect against that we could probably use some fuzzing tests, setting/unsetting each field individually and making sure the hash code changes as a result.

We don't have to do it now, but let's open a ticket for future improvement (I think it could be a good-first-issue)
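The safeguard suggested in point 2 — mutate each field individually and check that the hash changes — can be sketched with a toy span model (`toySpan` and `hashToy` are illustrative stand-ins, not Jaeger's actual types; the real check would walk every field of Span / SpanEvent / SpanLink):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toySpan models the idea only; a real test would cover the full model.
type toySpan struct {
	Name string
	Kind int
}

// hashToy hashes each field manually, the style discussed in point 2.
func hashToy(s toySpan) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s|%d", s.Name, s.Kind)
	return h.Sum64()
}

func main() {
	base := toySpan{Name: "op", Kind: 1}
	// mutate one field at a time; every mutation must change the hash,
	// which is exactly what a per-field fuzz test would assert
	mutants := []toySpan{
		{Name: "oq", Kind: 1},
		{Name: "op", Kind: 2},
	}
	for _, m := range mutants {
		fmt.Println(hashToy(m) != hashToy(base))
	}
}
```

A test like this would catch a newly added field that the manual hash function forgot to cover, because the corresponding mutation would no longer change the hash.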

Signed-off-by: Mahad Zaryab <[email protected]>
cmd/query/app/querysvc/adjuster/hash.go (outdated; resolved)
Comment on lines 71 to 75
traces := ptrace.NewTraces()
rs := traces.ResourceSpans().AppendEmpty()
resourceAttributes.CopyTo(rs.Resource().Attributes())
ss := rs.ScopeSpans().AppendEmpty()
scopeAttributes.CopyTo(ss.Scope().Attributes())
Member

I would rather do this outside of the loop for spans and only replace the span before hashing

@mahadzaryab1 mahadzaryab1 changed the title [WIP][v2][adjuster] Implement adjuster for deduplicating spans [v2][adjuster] Implement adjuster for deduplicating spans Dec 22, 2024
@mahadzaryab1 mahadzaryab1 marked this pull request as ready for review December 22, 2024 00:55
@mahadzaryab1 mahadzaryab1 requested a review from a team as a code owner December 22, 2024 00:55
@dosubot dosubot bot added the v2 label Dec 22, 2024
return 0, err
}
hasher := fnv.New64a()
hasher.Write(b) // never returns an error
Collaborator Author

Ignoring the error here because the Hash64 interface says that the writer never returns an error.

type Hash interface {
	// Write (via the embedded io.Writer interface) adds more data to the running hash.
	// It never returns an error.
	io.Writer
	// ... Sum, Reset, Size, and BlockSize elided
}
Member

This comment should go in the code as explanation

return 0, err
}
hasher := fnv.New64a()
hasher.Write(b) // never returns an error
Member

This comment should go in the code as explanation

Comment on lines 40 to 45
hashTrace := ptrace.NewTraces()
rs := resourceSpans.At(i)
hashResourceSpan := hashTrace.ResourceSpans().AppendEmpty()
rs.Resource().Attributes().CopyTo(hashResourceSpan.Resource().Attributes())
scopeSpans := rs.ScopeSpans()
hashScopeSpan := hashResourceSpan.ScopeSpans().AppendEmpty()
Member

hard to grok due to ordering and naming

Suggested change
hashTrace := ptrace.NewTraces()
rs := resourceSpans.At(i)
hashResourceSpan := hashTrace.ResourceSpans().AppendEmpty()
rs.Resource().Attributes().CopyTo(hashResourceSpan.Resource().Attributes())
scopeSpans := rs.ScopeSpans()
hashScopeSpan := hashResourceSpan.ScopeSpans().AppendEmpty()
rs := resourceSpans.At(i)
scopeSpans := rs.ScopeSpans()
hashTrace := ptrace.NewTraces()
hashResourceSpans := hashTrace.ResourceSpans().AppendEmpty()
hashScopeSpans := hashResourceSpans.ScopeSpans().AppendEmpty()
hashSpan := hashScopeSpans.Spans().AppendEmpty()
rs.Resource().Attributes().CopyTo(hashResourceSpans.Resource().Attributes())

Comment on lines 47 to 51
ss := scopeSpans.At(j)
ss.Scope().Attributes().CopyTo(hashScopeSpan.Scope().Attributes())
spans := ss.Spans()
newSpans := ptrace.NewSpanSlice()
hashSpan := hashScopeSpan.Spans().AppendEmpty()
Member

Suggested change
ss := scopeSpans.At(j)
ss.Scope().Attributes().CopyTo(hashScopeSpan.Scope().Attributes())
spans := ss.Spans()
newSpans := ptrace.NewSpanSlice()
hashSpan := hashScopeSpan.Spans().AppendEmpty()
ss := scopeSpans.At(j)
spans := ss.Spans()
ss.Scope().Attributes().CopyTo(hashScopeSpan.Scope().Attributes())
dedupedSpans := ptrace.NewSpanSlice()


func (s *SpanHashDeduper) Adjust(traces ptrace.Traces) {
spansByHash := make(map[uint64]ptrace.Span)
resourceSpans := traces.ResourceSpans()
Member

I'd recommend going forward to use terms resources and scopes. Makes the code more readable

Collaborator Author

sounds good - I can open a cleanup PR

@mahadzaryab1 mahadzaryab1 merged commit 54ceda2 into jaegertracing:main Dec 22, 2024
54 checks passed
@mahadzaryab1 mahadzaryab1 deleted the hash branch December 22, 2024 14:53
ekefan pushed a commit to ekefan/jaeger that referenced this pull request Dec 23, 2024
[v2][adjuster] Implement adjuster for deduplicating spans (jaegertracing#6391)

## Which problem is this PR solving?
- Towards jaegertracing#6344

## Description of the changes
- Implemented an adjuster to deduplicate spans. 
- The span deduplication is done by marshalling each span into protobuf
bytes and applying the FNV hash algorithm to it.

## How was this change tested?
- Added unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>