Datadog exporter does not work with async http client #7

Open
ArtBlnd opened this issue Nov 25, 2021 · 20 comments
Labels
bug Something isn't working

Comments

@ArtBlnd

ArtBlnd commented Nov 25, 2021

OpenTelemetry trace error occurred. cannot send span to the batch span processor because the channel is closed

Neither the default client nor a custom client works; the same error message occurs.

// Imports assumed from the http, bytes, and opentelemetry-http crates.
use bytes::Bytes;
use http::{Request, Response};
use opentelemetry_http::{HttpClient, HttpError};

#[derive(Debug)]
struct ReqwestClient {
    client: reqwest::Client,
}

#[async_trait::async_trait]
impl HttpClient for ReqwestClient {
    async fn send(&self, request: Request<Vec<u8>>) -> Result<Response<Bytes>, HttpError> {
        // Convert the http::Request into a reqwest::Request, execute it, and
        // rebuild an http::Response from the returned status and body bytes.
        let response = self.client.execute(request.try_into()?).await?;
        Ok(Response::builder()
            .status(response.status())
            .body(response.bytes().await?)?)
    }
}
@TommyCpp
Contributor

It may not be a problem with the datadog exporter. Could you share a more detailed example of how you configure the tracer provider? Or maybe try some other exporters and see whether they work; that would help us narrow down the cause. Thanks in advance.

TommyCpp added the bug label Nov 28, 2021
@ArtBlnd
Author

ArtBlnd commented Nov 29, 2021

let tracer = new_pipeline()
    .with_trace_config(trace::config().with_sampler(Sampler::AlwaysOn))
    .with_service_name("test")
    .with_version(ApiVersion::Version05)
    .with_agent_endpoint("localhost")
    .with_http_client::<ReqwestClient>(Box::new(ReqwestClient {
        client: reqwest::Client::new(),
    }))
    .install_batch(Tokio)
    .expect("failed to initialize tracing pipeline");

tracing_subscriber::fmt()
    .with_ansi(false)
    .with_writer(std::io::stderr)
    .with_target(false)
    .finish()
    .with(OpenTelemetryLayer::new(tracer))
    .init();

@ArtBlnd
Author

ArtBlnd commented Nov 29, 2021

It does work fine with the blocking reqwest http client, but sometimes emits this error:

OpenTelemetry trace error occurred. error sending request for url (http://localhost/v0.5/traces): connection closed before message completed

@ArtBlnd
Author

ArtBlnd commented Nov 29, 2021

If I unwrap the error, I get this from reqwest:

IoError(Custom { kind: Other, error: "Connection broken" })

@ArtBlnd
Author

ArtBlnd commented Nov 30, 2021

opentelemetry-jaeger works fine.

@TommyCpp
Contributor

TommyCpp commented Dec 3, 2021

It feels like a communication problem with the datadog collector. I noticed you are sending spans to http://localhost. Is that expected?

@ArtBlnd
Author

ArtBlnd commented Dec 3, 2021

That's not a communication problem; that part is expected.
I changed it to localhost in the text so as not to expose the external IP address (the real code uses the actual address with a working port).

It surely is a datadog exporter bug.

The exporter does not work with the ASYNC http client but works fine with the BLOCKING REQWEST http client, as I already noted.

@TommyCpp
Contributor

TommyCpp commented Dec 4, 2021

Thanks for the bug report. I tried to reproduce this bug locally. Here is my current setup and approach:

You mentioned the default reqwest async client is not working, so I enabled the reqwest-client feature, which should provide a default async reqwest client. Then I started a datadog agent container locally, ran the example, and verified that the span shows up in the datadog APM console.
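
Roughly, the reproduction pipeline might look like the following (a minimal sketch mirroring the reporter's snippet above, just without the custom with_http_client, so the default async reqwest client from the reqwest-client feature is used):

// Minimal repro sketch: same builder calls as the reporter's snippet, but
// relying on the default async reqwest client from the `reqwest-client` feature.
let tracer = new_pipeline()
    .with_trace_config(trace::config().with_sampler(Sampler::AlwaysOn))
    .with_service_name("test")
    .with_version(ApiVersion::Version05)
    .with_agent_endpoint("localhost")
    .install_batch(Tokio)
    .expect("failed to initialize tracing pipeline");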

I think I am missing something here. Could you take a look and see if there is anything that needs to change in my example?

@rex-remind101
Contributor

fwiw, I ran into this exact same problem last week.

@rex-remind101
Contributor

surf-client seems to work until you try install_batch #26

@rex-remind101
Contributor

I came to the realization that when using reqwest-client with install_batch I do receive traces in datadog. However, I had set the primary operation name to one of the child spans' operation names rather than the root span's, and with install_batch the child spans do not seem to get individual traces. When I set the DataDog UI configuration to use the root as the primary operation name, I do see traces. This issue does not happen when using surf-client + install_simple; I see all traces in that case. (I set the primary operation name to a child span for reasons I won't dive into, but this is blocking me because I need it configured that way.)

I'm not sure if this will help your case, but I can at least confirm that batch + reqwest-client "works", just not doing everything I'd expect.

@rex-remind101
Contributor

And surf-client still does not work with install_batch. The surf-client + install_simple combination that did work for me is sketched below.
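
This is roughly what the working combination looks like (a sketch only; it assumes the exporter's surf-client feature is enabled and that the builder exposes install_simple alongside install_batch):

// Sketch of the surf-client + install_simple combination that worked for me,
// assuming the exporter's `surf-client` feature is enabled.
let tracer = new_pipeline()
    .with_service_name("test")
    .with_version(ApiVersion::Version05)
    .with_agent_endpoint("localhost")
    .install_simple()
    .expect("failed to initialize tracing pipeline");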

haixuanTao referenced this issue in dora-rs/dora Jul 5, 2022
Previously the tracing service name defaulted to rust.dora, which prevented some functionality of Jaeger tracing.

There were also issues with pushing traces as a batch, as seen here: https://github.com/open-telemetry/opentelemetry-rust/issues/674
@alexpusch

I'm getting "OpenTelemetry trace error occurred. error sending request for url (http://<ip>:8126/v0.5/traces): connection closed before message completed" as well, using reqwest-client and install_batch.
Traces do reach Datadog, so it seems this error is not critical, just annoying. I do not see any error on the datadog trace agent side.

@Meemaw

Meemaw commented Dec 28, 2022

Seeing these errors as well. Any known workarounds?

@TommyCpp
Contributor

Seeing these errors as well. Any known workarounds?

Emm, does this impact your span collection? Since it doesn't seem to affect span collection, it has been a low-priority fix.

@xlc

xlc commented May 15, 2023

I see the same issue, and I think it is impacting span collection. I see data being fed correctly on our staging env but not on prod. I think this could be because we have a lot more data in prod, so it is trying to send a lot more data in a single batch, hitting some limit, and failing.

I tried OTEL_BSP_MAX_EXPORT_BATCH_SIZE and it seems to help.
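
For anyone trying the same workaround, this is roughly what it looks like (a sketch; 128 is an arbitrary example value, and the variable has to be set before install_batch creates the batch span processor):

// Workaround sketch: shrink the batch span processor's max export batch size
// via the standard OTEL_BSP_MAX_EXPORT_BATCH_SIZE env var. 128 is only an
// example value; set it before install_batch() builds the processor.
std::env::set_var("OTEL_BSP_MAX_EXPORT_BATCH_SIZE", "128");
// ...then build the pipeline as in the earlier snippets.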

@mckinnsb

mckinnsb commented May 18, 2023

Is it possible this is because the Datadog Exporter doesn't attempt to group spans in batches? It could be sending thousands of spans at once. I saw issues like this when I was using Jaeger on OSX until I set with_auto_split_batch and with_max_packet_size (but no such options exist for the DD exporter, nor are they OTEL standards).

Would it make sense to start investigating around this?
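
For context, the Jaeger-side options mentioned above look roughly like this (a sketch only; the exact pipeline builder and whether these setters are available depend on the opentelemetry-jaeger version, and the packet size is an example value):

// Sketch of the opentelemetry-jaeger options referenced above; builder and
// method availability depend on the crate version in use.
let tracer = opentelemetry_jaeger::new_pipeline()
    .with_service_name("test")
    .with_auto_split_batch(true) // split batches that would exceed the packet size
    .with_max_packet_size(9_216) // example UDP packet size limit
    .install_batch(Tokio)
    .expect("failed to initialize jaeger pipeline");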

@jtribble

Here's a snapshot of what's happening:

https://www.loom.com/share/3dfe4c652cc14633953d82cffa146dae

It looks to me like the datadog agent is accepting the trace batch and closing the connection, but for some reason the client considers the connection prematurely closed.

@user9747

I can confirm that it works with reqwest-client and install_batch; it just throws an error in the logs even though we are getting data in datadog. I also noticed that in my case it only worked against port :8126 of the datadog agent.

@ryo33

ryo33 commented Oct 31, 2023

This seems related. Is it so? https://github.com/will-bank/datadog-tracing/blob/30cdfba8d00caa04f6ac8e304f76403a5eb97129/src/tracer.rs#L29
According to this, the problem may be a keep-alive issue caused by a long-lived socket, but that does not explain why the blocking client seems to work properly even though it has a connection-pooling mechanism similar to the async one's.
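
If keep-alive on pooled connections really is the culprit, one thing to try would be passing a custom reqwest client that never reuses idle connections (a sketch; pool_max_idle_per_host is a standard reqwest ClientBuilder option, and whether this actually avoids the error here is an untested assumption):

// Workaround sketch: a reqwest client that keeps no idle connections alive,
// so every export opens a fresh connection to the agent. Whether this avoids
// the "connection closed before message completed" error is untested.
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()
    .expect("failed to build reqwest client");
// ...then pass it via .with_http_client::<ReqwestClient>(Box::new(ReqwestClient { client }))
// as in the original snippet.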

@hdost hdost transferred this issue from open-telemetry/opentelemetry-rust Nov 12, 2023