-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor context prop support to allow things like opentelemtry to work properly with the cadence sdk #618
base: master
Are you sure you want to change the base?
Refactor context prop support to allow things like opentelemtry to work properly with the cadence sdk #618
Conversation
GlobalOpenTelemetry.getTracer("cadence-client") | ||
.spanBuilder("cadence.workflow") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familar with opentelemtry -- How are these two properties used? I wonder if we should avoid harhcoding here because users may need to customize them.
|
||
/** | ||
* This is a lifecycle method, called after the workflow/activity has completed. If the method | ||
* finished without exception, {@code successful} will be true. Otherwise, it will be false and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where/what is {@code successful}
?
* This is a lifecycle method, called when the workflow/activity finishes by throwing an unhandled | ||
* exception. {@link #finish()} is called after this method. | ||
* | ||
* @param t The unhandled exception that caused the workflow/activity to terminate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: terminate
is a special in Cadence only for specific cases of closed. You can use either close
or stop
to describe different cases finish with exceptions
I triggered a build for this PR but as with my other PR (#619), this one is failing build too. Something is broken with Buildkite and we are trying to understand what the issue is but |
Some update here: The issue is related to Java11 and Alpine. We are looking into how we could fix it. All the PRs will be potentially blocked until the fix unfortunately. |
Fix is here: #625 |
contextScope.close(); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the onError
is not implemented? is it not needed for OpenTelemetry?
// Get the serialized context from the propagator | ||
Map<String, byte[]> serializedContext = | ||
propagator.serializeContext(propagator.getCurrentContext()); | ||
// Namespace each entry in case of overlaps, so foo -> bar becomes MyPropagator:foo -> bar | ||
Map<String, byte[]> namespacedSerializedContext = | ||
serializedContext | ||
.entrySet() | ||
.stream() | ||
.collect( | ||
Collectors.toMap( | ||
k -> propagator.getName() + ":" + k.getKey(), Map.Entry::getValue)); | ||
result.putAll(namespacedSerializedContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move to a util to reuse the code. It's repeated 4 times for ChildWF, WF, activity and localActivity.
Map<String, byte[]> filteredData = | ||
headerData | ||
.entrySet() | ||
.stream() | ||
.filter(e -> e.getKey().startsWith(propagator.getName())) | ||
.collect( | ||
Collectors.toMap( | ||
e -> e.getKey().substring(propagator.getName().length() + 1), | ||
Map.Entry::getValue)); | ||
contextData.put(propagator.getName(), propagator.deserializeContext(filteredData)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice. Looks like the existing one is having issues with more than one ContextPropagators, right? @meiliang86 @mkolodezny
However, I believe there is a backward-compatibility issue by doing this.
e.g. the existing workflow using ContextPropagator already made headers without this format...
To be backward-compatible, maybe we should add an option to use this and default to false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea is that we can safely assume the existing propagators don't start with some special string as key(e.g. __cadence_v1_ctx_key
) and then we can check if the headers have any keys with it.
If yes then we can use this new way. If not, then fallback to use the old way.
new files undo defaultContextProp changes remove bad calls to method no longer in this pr remove health check from this PR remove default adding of otel Add Health check API to worker and service interface (cadence-workflow#617)
9002821
to
2156317
Compare
rebased this PR on top of 3.5.0. Have to go through all the comments still |
Refactor the context propagators to support OpenTelemetry and add an example OTEL propagator.