Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry trace error #2058

Open
deweyjose opened this issue Nov 7, 2022 · 7 comments
Open

OpenTelemetry trace error #2058

deweyjose opened this issue Nov 7, 2022 · 7 comments

Comments

@deweyjose
Copy link
Contributor

Describe the bug
We're seeing the following error in our Router logs:

{"timestamp":"2022-11-07T14:34:48.904971Z","level":"ERROR","fields":{"message":"OpenTelemetry trace error occurred: error sending request for url (*********/traces): connection closed before message completed"},"target":"apollo_router::plugins::telemetry"}
@o0Ignition0o
Copy link
Contributor

o0Ignition0o commented Nov 7, 2022

Thanks for opening this issue! This could be tied to #2046

@kmcrawford
Copy link

kmcrawford commented Nov 8, 2022

I'm seeing a similar issue, but may not be related:

{"timestamp":"2022-11-08T19:24:19.723074Z","level":"ERROR","fields":{"message":"OpenTelemetry trace error occurred: oneshot canceled"},"target":"apollo_router::plugins::telemetry"}

Running v1.2.1

@BrynCooke
Copy link
Contributor

Not sure about the oneshot, but the other issue is likely to be caused by the BatchSpanProcessor defaults not being good for production.

In the short term the following env variables are available to tweak:
https://opentelemetry.io/docs/reference/specification/sdk-environment-variables/#batch-span-processor

Longer term we are looking to add this configuration to the router.yaml

@BrynCooke
Copy link
Contributor

Going to close this for now in favor of #1789 as I think this is resolved by tweaking the batch span processor settings. If it still persists then feel free to reopen.

@abernix abernix removed the triage label Feb 22, 2023
@jtribble
Copy link
Contributor

jtribble commented Jan 5, 2024

This issue is till occurring in v1.37.0 (#1789 appears to be a different issue; notice the error message is different).

Here are the relevant error messages:

OpenTelemetry trace error occurred: error sending request for url (http://localhost:8126/v0.5/traces): connection closed before message completed
OpenTelemetry trace error occurred: error sending request for url (http://localhost:8126/v0.5/traces): connection error: Connection reset by peer (os error 104)

Here's my telemetry config (notice the errors are related to the datadog tracer):

telemetry:
  exporters:
    metrics:
      common:
        service_name: ${env.DD_SERVICE:-graphql-router}
      prometheus:
        enabled: true
        listen: 0.0.0.0:9090
        path: /metrics
    tracing:
      common:
        service_name: ${env.DD_SERVICE:-graphql-router}
      datadog:
        enabled: true
        endpoint: ${env.DD_ENDPOINT}
        enable_span_mapping: true

Could be related to open-telemetry/opentelemetry-rust-contrib#7

@Geal Geal reopened this Mar 12, 2024
@Geal
Copy link
Contributor

Geal commented Mar 12, 2024

reopening as it is apparently still not resolved and we are getting more reports of it. Apparently the Datadog agent is correctly receiving the trace, but abruptly closing the connection. Other libraries have workarounds for it: https://github.com/will-bank/datadog-tracing/blob/30cdfba8d00caa04f6ac8e304f76403a5eb97129/src/tracer.rs#L29

@BrynCooke
Copy link
Contributor

I had a go at this and although better it still produced occasional errors in the logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants