Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sampled TraceId produces recorded spans #13

Closed
Steffen911 opened this issue Oct 14, 2024 · 5 comments
Closed

Sampled TraceId produces recorded spans #13

Steffen911 opened this issue Oct 14, 2024 · 5 comments

Comments

@Steffen911
Copy link
Contributor

What happened

We added sampling to our application to reduce the total amount of spans ingested. We use head based sampling and make the decision on the producer side. We use the new useProducerSpanAsConsumerParent flag which means that we'd expect that the sample decision propagates down to the consumer spans. In our case, we still see consumer spans in our observability tool, even though the TraceFlag is set to 00. We validate the latter using the traceparent on the job opts.

What we expect to happen

When the TraceFlags on the producerContext are set to 00 we expect the same thing on the consumer, i.e. the trace should be excluded from recording, if the producerContext is to be used.

Solution approach

@unflxw I think the root cause might be that we still use the consumerContext as active here: https://github.com/appsignal/opentelemetry-instrumentation-bullmq/blob/main/src/instrumentation.ts#L511. Perhaps we should use the parentContext variable in this row as well?

@unflxw
Copy link
Collaborator

unflxw commented Oct 15, 2024

Hi @Steffen911, thanks for reporting this issue!

Yes, I think you're right that we should use parentContext in that line. (Could you send a PR about it? 🙏)

However, I'm not sure if that will solve the problem you're encountering. In that line, we create a context for the newly created span. But said span was created using trace.startSpan(..., parentContext). If the trace flags were being honored, shouldn't that span have been created as a NonRecordingSpan? 🤔

Do give it a try! If it solves the problem, I'm happy to merge it.

@Steffen911
Copy link
Contributor Author

Hey @unflxw,

I'll open the PR for the parentContext, because I think it would be the correct setup anyway. It doesn't seem to fix the issue though based on my local experiments. It seems that the TraceFlags aren't interpreted when creating the span in https://github.com/appsignal/opentelemetry-instrumentation-bullmq/blob/main/src/instrumentation.ts#L474 as the resulting span has a _spanContext.traceFlags = 1 and isRecording() === true.

Do you have any other ideas what it could be?

@Steffen911
Copy link
Contributor Author

I added #14, but it doesn't seem to address this issue.

@unflxw
Copy link
Collaborator

unflxw commented Oct 15, 2024

@Steffen911 Do you have any other ideas what it could be?

I think the default sampling behavior is to sample all traces, which is what you're seeing, regardless of the trace flags. I think you may need to configure OpenTelemetry to use the ParentBasedSampler, so that it takes into account whether the parent span was sampled. Likely in the way that is described in their documentation for the TraceIDRatioBasedSampler, but for the ParentBasedSampler instead.

@Steffen911
Copy link
Contributor Author

@unflxw Thank you so much! That was actually the perfect hint. We used a TraceIDRatioBasedSampler in all of our services which overwrote the ParentBasedSampler (which seems to be the default). So not using the TraceIDRatioBasedSampler fixed it for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants