Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding new inferred sampler #12927
Adding new inferred sampler #12927
Changes from all commits
e02fc95
78dcc6b
432d294
ad8d7f3
5c4128b
47bef14
4f0586a
178e48b
6bd52e9
7504c6a
a1b3df1
1e1d624
8bce540
b944c0b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check warning on line 117 in libs/telemetry/src/main/java/org/opensearch/telemetry/tracing/DefaultTracer.java
Codecov / codecov/patch
libs/telemetry/src/main/java/org/opensearch/telemetry/tracing/DefaultTracer.java#L116-L117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How that is suppose to work in case of async calls when parent is finished before the child? (example below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the parent span object is still maintained in the scope of child even if the parent has ended. What we plan to do here is mark the current child's parent and the chain above as one of the parent within the chain would still be alive. As suggested here as well if the parent is still in the recording phase we will go ahead and add the attribute to the parent and mark the parent to be sampled as well. If the parent is not recording then the information about this parent will be stored in event of its parent so that we don't loose on the info of not-recording spans.
Adding the closed parent event to its parent span is still pending and will add in the next revision itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the terminology here (what does it mean for span chain to be alive), sorry about that , let me try to illustrate that better:
tracer.startScopedSpan
) , onSpanStart is calledthreadPool
captures context (current span as parent) and schedules the workAfter
onSpanEnd
, the parent span is immutable, no matter if we still have reference to in from Java code or not - any modification to it have no effect but produce warning.My question is: since we not be exporting the parent span (due to absence of the expected attributes), what would happen in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here is how this would work:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I have suspected - we are messing up the trace in unpredictable ways (another red flag), there should be only one possibility: the trace is recorded as it was sampled or not recorded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are correct we will miss the span. However to maintain the lineage we will still maintain the information of closed span in its parent itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here is the thing folks, I my opinion, which I shared on RFC as well:
OTEL cannot solve tail sampling on library level, this is documented and acknowledged:
There are other tracing instrumentations that try to implement sampling differently, including adaptive sampling that could tune the sampling rates based on the different factors:
Please consider those as a robust and reliable alternatives, that do not exhibit flaws of this solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you're coming from regarding the potential gaps in the hierarchy, which users may need to address based on their choice of collector and visualization solutions. It's been recognized as a requirement to address such gaps in the Otel specification, and the community appears to be quite receptive to it. If you take a look at the Otel issues provided, you'll notice that even with these gaps, there are still ways to infer missing parent information through parent span events, aiding in understanding the code path and other relevant details.
open-telemetry/opentelemetry-specification#3205
open-telemetry/opentelemetry-specification#3867
In general, there are numerous benefits to this approach, particularly for customers dealing with larger clusters processing tens of thousands of requests per second and managing multiple clusters. It offers insights into problematic areas of code, and the availability of missing span information as part of the event in the parent span can be very helpful in connecting the dots.
Moreover, this won't be the default behavior and can be enabled as needed (though it's crucial to clearly document the shortcomings and benefits of such samplers). There have been requests from Otel users for deferred sampling support and reverse propagation of context, and it's hoped that these features will be available soon, at which point a switch can be made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gaganjuneja I shared my opinion on the subject many times, please let's get back to this when these features are available, if other maintainers see this is a way to go , I am fine with that, but strong -1 from me moving this further as it stands today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing, I think we can solve this issue by adding a link between the span and it's grand parent to persist the relationship. Since this is a
follows_from
span.