Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor context prop support to allow things like opentelemtry to work properly with the cadence sdk #618

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AngerM-DD
Copy link
Contributor

@AngerM-DD AngerM-DD commented Jun 15, 2021

Refactor the context propagators to support OpenTelemetry and add an example OTEL propagator.

@AngerM-DD AngerM-DD marked this pull request as ready for review June 15, 2021 19:15
Comment on lines +118 to +119
GlobalOpenTelemetry.getTracer("cadence-client")
.spanBuilder("cadence.workflow")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familar with opentelemtry -- How are these two properties used? I wonder if we should avoid harhcoding here because users may need to customize them.


/**
* This is a lifecycle method, called after the workflow/activity has completed. If the method
* finished without exception, {@code successful} will be true. Otherwise, it will be false and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where/what is {@code successful}?

* This is a lifecycle method, called when the workflow/activity finishes by throwing an unhandled
* exception. {@link #finish()} is called after this method.
*
* @param t The unhandled exception that caused the workflow/activity to terminate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: terminate is a special in Cadence only for specific cases of closed. You can use either close or stop to describe different cases finish with exceptions

@demirkayaender
Copy link
Contributor

I triggered a build for this PR but as with my other PR (#619), this one is failing build too. Something is broken with Buildkite and we are trying to understand what the issue is but compileThrift step is failing for some reason. We don't know the exact reason yet but hoping to resolve it soon. Let us know if you can find out something to fix the issue.

@demirkayaender
Copy link
Contributor

Some update here: The issue is related to Java11 and Alpine. We are looking into how we could fix it. All the PRs will be potentially blocked until the fix unfortunately.

@demirkayaender
Copy link
Contributor

Fix is here: #625

contextScope.close();
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the onError is not implemented? is it not needed for OpenTelemetry?

Comment on lines +206 to +219
// Get the serialized context from the propagator
Map<String, byte[]> serializedContext =
propagator.serializeContext(propagator.getCurrentContext());
// Namespace each entry in case of overlaps, so foo -> bar becomes MyPropagator:foo -> bar
Map<String, byte[]> namespacedSerializedContext =
serializedContext
.entrySet()
.stream()
.collect(
Collectors.toMap(
k -> propagator.getName() + ":" + k.getKey(), Map.Entry::getValue));
result.putAll(namespacedSerializedContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move to a util to reuse the code. It's repeated 4 times for ChildWF, WF, activity and localActivity.

Comment on lines 172 to 181
Map<String, byte[]> filteredData =
headerData
.entrySet()
.stream()
.filter(e -> e.getKey().startsWith(propagator.getName()))
.collect(
Collectors.toMap(
e -> e.getKey().substring(propagator.getName().length() + 1),
Map.Entry::getValue));
contextData.put(propagator.getName(), propagator.deserializeContext(filteredData));
Copy link
Contributor

@longquanzheng longquanzheng Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice. Looks like the existing one is having issues with more than one ContextPropagators, right? @meiliang86 @mkolodezny

However, I believe there is a backward-compatibility issue by doing this.
e.g. the existing workflow using ContextPropagator already made headers without this format...
To be backward-compatible, maybe we should add an option to use this and default to false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is that we can safely assume the existing propagators don't start with some special string as key(e.g. __cadence_v1_ctx_key ) and then we can check if the headers have any keys with it.

If yes then we can use this new way. If not, then fallback to use the old way.

new files

undo defaultContextProp changes

remove bad calls to method no longer in this pr

remove health check from this PR

remove default adding of otel

Add Health check API to worker and service interface (cadence-workflow#617)
@AngerM-DD AngerM-DD force-pushed the manger-otel-context-prop branch from 9002821 to 2156317 Compare October 26, 2021 21:52
@AngerM-DD
Copy link
Contributor Author

rebased this PR on top of 3.5.0. Have to go through all the comments still

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants