DogStatsd.Configure does not catch SocketException "No such host is known" when the agent host cannot be found in DNS #138
Comments
Hello @tylerohlsen, Thank you for reporting this feature request. I have created a card in our backlog.
Hi @ogaca-dd, Thanks for adding the card to your backlog! I've run into two causes of DNS failures that have caused this.

In the first case, I misconfigured the address of the Datadog agent. This caused an immediate and unrecoverable error: the services in our Kubernetes cluster could not start and were stuck in a continuous restart loop. I would have preferred that the services start and log an error so I could fix the issue on my own time. In this case, the issue could not have recovered on its own, so it would have persisted for a long time.

In the second case, we had a version of the Kubernetes CNI with a bug that intermittently caused pods to start up without any network access. This prevented cron jobs and sidecar containers that otherwise did not need network access from starting up. In this case, the issue lasted only a few minutes at a time.

Tyler
Is there any update regarding this issue? I'm using version 6.0.0 and still observe it. Datadog agent availability, or any other issue with metrics, should not crash the whole service.
Any update on this one? We are still experiencing the same issue. I had to create a wrapper around the service to prevent the whole system from crashing because of a misconfiguration or because Datadog is down.
Thanks for your reply, @ogaca-dd. I have one more question for you related to this topic. If for some reason the agent is down, the calls to
@yoliva, this is a good question. In case of a DNS failure, any errors during
I call DogStatsd.Configure once on application start. I ran into an issue where I had misconfigured the agent, so the agent pod in my cluster was not starting and the DNS entry for the agent had not yet been added. This cascaded: my application failed to start because the DNS lookup failed and the resulting exception bubbled up the stack unhandled.

Now, I could write my own logic to catch this, retry on a background thread, and queue the pending metrics. But I think the internals of this library already do that for transient network issues once it has connected, so it feels most appropriate for this library to also handle the DNS lookup failure.
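For anyone needing a stopgap, here is a minimal sketch of the "catch it yourself" approach mentioned above. It assumes the DNS failure surfaces as a System.Net.Sockets.SocketException, as the issue title says. SafeMetricsSetup and TryConfigure are hypothetical names invented for this example and are not part of the DogStatsD client; the sketch only keeps startup from crashing and does not retry or queue metrics.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper for illustration only; not part of the DogStatsD client.
public static class SafeMetricsSetup
{
    // Runs the supplied configuration step. If it throws a SocketException
    // (for example "No such host is known"), the error is passed to logError
    // and false is returned instead of letting the exception crash startup.
    public static bool TryConfigure(Action configure, Action<string> logError)
    {
        try
        {
            configure();
            return true;
        }
        catch (SocketException ex)
        {
            logError($"Metrics setup failed ({ex.SocketErrorCode}): {ex.Message}");
            return false;
        }
    }
}
```

At startup this could be called as SafeMetricsSetup.TryConfigure(() => DogStatsd.Configure(config), Console.Error.WriteLine), where config is whatever configuration object the application already builds. Other exception types are deliberately left uncaught so that unrelated bugs still surface.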
Here's the stack trace: