Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add otel integration #19

Closed
wants to merge 3 commits into from
Closed

Add otel integration #19

wants to merge 3 commits into from

Conversation

BenoitRanque
Copy link
Collaborator

@BenoitRanque BenoitRanque commented Feb 16, 2024

This PR adds tracing and metrics, but there are open questions:

  1. OTEL wants to be loaded before anything else.
    The otel examples stress that OTEL libraries must be loaded before anything else, and that the require node flag should be used for this.
    We may consider doing this in a standard dockerfile? For now, it's not been a problem, but users could have issues if they load some of our instrumented libraries before they load our SDK.

  2. Our packaging spec suggests traces should be sent to the endpoint specified with OTEL_EXPORTER_OTLP_ENDPOINT, however the OTEL spec specifies that this should be the base url for telemetry, and traces should be sent to <base_url>/v1/traces (or use OTEL_EXPORTER_OTLP_TRACES_ENDPOINT for traces). We implement the OTEL spec with the expectation that the connector packaging spec will change. comment on RFC
    edit: this has now been clarified, and this implementation is correct.

  3. I tried to test this with our infra, but I think the latest changes are not deployed yet, or I did something else wrong. Either way, this is not tested end-to-end, so caution is advised.

JIRA: NDC-236

Copy link
Contributor

@sordina sordina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @BenoitRanque

A couple of questions in the review, and having OTEL_EXPORTER_OTLP_ENDPOINT determine if the feature is on or not will be useful for transition to Beta.

I also have a strong desire to see all the SDKs output trace information to STDOUT if OTEL is disabled, or if the OTEL service can't be found. But this is a broader change that will apply to Rust SDK, etc. too so there's no reason to debate that here.

Comment on lines +17 to +25
"@opentelemetry/api": "^1.7.0",
"@opentelemetry/exporter-metrics-otlp-proto": "^0.48.0",
"@opentelemetry/instrumentation-fastify": "^0.33.0",
"@opentelemetry/instrumentation-http": "^0.48.0",
"@opentelemetry/instrumentation-pino": "^0.35.0",
"@opentelemetry/resources": "^1.21.0",
"@opentelemetry/sdk-metrics": "^1.21.0",
"@opentelemetry/sdk-node": "^0.48.0",
"@opentelemetry/semantic-conventions": "^1.21.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a few deps. Do we reference these all directly?

Comment on lines -146 to +156
return connector.getCapabilities(configuration);
const capabilitiesResponse = tracer.startActiveSpan(
"getCapabilities",
(span) => {
const capabilitiesResponse = connector.getCapabilities(configuration);
span.end();
return capabilitiesResponse;
}
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I remember seeing some fastify middleware that wrapped the endpoints for OTEL automatically. Would this be viable? No worries if you want to keep it manual.

Copy link
Collaborator

@daniel-chambers daniel-chambers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to know more about what the tracing output looks like on the terminal and how we can control the verbosity, if necessary.

Connectors like the NodeJS Lambda connector output information such as compile errors and function type errors on the terminal and if that just gets lost in reams of debug trace info, that would be a bad outcome.

Comment on lines +78 to +82
if (options.otelExporterOtlpEndpoint) {
initTelemetry(
options.otelServiceName,
options.otelExporterOtlpEndpoint
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this wasn't added inside startServer?

For users of the SDK that provide their own server startup action (that eventually calls startServer), it would be nice if this was just done for them inside startServer so they don't need to do it themselves.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this should go inside startServer

Comment on lines +213 to +225
const queryResponse = await tracer.startActiveSpan(
"handleQuery",
async (span) => {
const queryResponse = await connector.query(
configuration,
state,
request.body
);

span.end();

return queryResponse;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this tracing pattern, what happens if connector.query throws and span.end does not get called?

This was referenced Feb 21, 2024
@daniel-chambers
Copy link
Collaborator

I'm closing this PR in favour of #21, which forwardports the backport (lol 😵). That was easier than cherry picking the changes from the backport up here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants