DogStatsd.Configure does not catch SocketException "No such host is known" when the agent host cannot be found in DNS #138

Open
tylerohlsen opened this issue Oct 1, 2020 · 7 comments


@tylerohlsen

I call DogStatsd.Configure once on application start. I ran into an issue where I had configured the agent wrong, so the agent pod in my cluster was not starting and the DNS entry for the agent had not yet been added. This cascaded into my application failing to start: the DNS lookup failed, and the resulting exception bubbled up the stack unhandled and crashed the process.

Now, I could write my own logic to catch this, retry on a background thread, and queue the pending metrics. But I believe the internals of this library already do that for transient network issues once it has connected, so it feels most appropriate for the library to also handle the DNS lookup failure.
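
For illustration, here is a minimal sketch of the retry part of that workaround (without the metric queuing). The ConfigureWithRetry helper name and the 30-second retry interval are hypothetical choices, not anything provided by the library:

```csharp
// Hypothetical helper (not part of StatsdClient): retry DogStatsd.Configure on a
// background task so a failed DNS lookup cannot crash application startup.
using System;
using System.Threading.Tasks;
using StatsdClient;

public static class DogStatsdStartup
{
    public static void ConfigureWithRetry(StatsdConfig config)
    {
        _ = Task.Run(async () =>
        {
            while (true)
            {
                try
                {
                    DogStatsd.Configure(config);
                    return; // configured successfully
                }
                catch (Exception ex)
                {
                    // The SocketException from the failed DNS lookup ends up here
                    // (wrapped by Task<T>.Result, as in the stack trace below).
                    Console.Error.WriteLine($"DogStatsd configuration failed: {ex.Message}");
                    await Task.Delay(TimeSpan.FromSeconds(30));
                }
            }
        });
    }
}
```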

Here's the stack trace:

 ---> System.Net.Sockets.SocketException (11001): No such host is known.
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw(Exception source)
   at System.Net.Dns.HostResolutionEndHelper(IAsyncResult asyncResult)
   at System.Net.Dns.EndGetHostEntry(IAsyncResult asyncResult)
   at System.Net.Dns.<>c.<GetHostEntryAsync>b__27_1(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at System.Threading.Tasks.Task`1.get_Result()
   at StatsdClient.StatsdUDP.GetIpv4Address(String name)
   at StatsdClient.StatsdBuilder.CreateUDPStatsSender(StatsdConfig config, String statsdServerName)
   at StatsdClient.StatsdBuilder.CreateStatsSender(StatsdConfig config, String statsdServerName)
   at StatsdClient.StatsdBuilder.BuildStatsData(StatsdConfig config)
   at StatsdClient.DogStatsdService.Configure(StatsdConfig config)
   at StatsdClient.DogStatsd.Configure(StatsdConfig config)
@ogaca-dd (Contributor) commented Nov 4, 2020

Hello @tylerohlsen,

Thank you for reporting this feature request. I have created a card in our backlog.
What kind of DNS failure do you have? Is it brief, random DNS failures, or can DNS be unavailable for a few minutes or more?

@tylerohlsen (Author)

Hi @ogaca-dd,

Thanks for adding the card to your backlog! I've run into two causes of DNS failures that have caused this.

In the first case, I misconfigured the address of the Datadog agent. This caused an immediate and irrecoverable error where the services in our Kubernetes cluster could not start and went into a continuous restart loop. I would have liked the services to start and an error to be logged so I could fix the issue on my own time. In this case, the issue could not self-recover, so it would have been present for a long period of time.

In the second case, we had a version of the Kubernetes CNI with a bug that intermittently caused pods to start up without any network access. This prevented cron jobs and sidecar containers from starting up even though they otherwise did not need network access. In this case, the issue lasted only a few minutes at a time.

Tyler

@pblachut commented Jan 8, 2021

Is there any update regarding this issue? I'm using version 6.0.0 and still observe it.

DD agent availability or any other issue with metrics should not cause the whole service to crash.

@yoliva commented Oct 18, 2022

Any update on this one? We are still experiencing the same issue. I had to create a wrapper around the service to prevent the whole system from crashing because of a misconfiguration or because Datadog is down.
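
Roughly, that wrapper looks like the sketch below; the SafeDogStatsd name, the exposed methods, and the choice to silently skip metrics when configuration failed are all illustrative:

```csharp
// Sketch of a defensive wrapper (hypothetical class name): configuration
// failures are logged instead of crashing the host service, and metric
// calls become no-ops when the client was never configured.
using System;
using StatsdClient;

public sealed class SafeDogStatsd
{
    private readonly DogStatsdService _service = new DogStatsdService();
    private bool _configured;

    public void TryConfigure(StatsdConfig config)
    {
        try
        {
            _service.Configure(config);
            _configured = true;
        }
        catch (Exception ex)
        {
            // Log and carry on instead of letting the host service crash.
            Console.Error.WriteLine($"DogStatsd configuration failed: {ex.Message}");
        }
    }

    public void Increment(string statName)
    {
        if (_configured)
        {
            _service.Increment(statName);
        }
    }

    public void Gauge(string statName, double value)
    {
        if (_configured)
        {
            _service.Gauge(statName, value);
        }
    }
}
```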

@ogaca-dd (Contributor)

Hello @yoliva, @pblachut,

I have opened this PR to keep Configure from throwing an exception. It will be part of the next release.

@yoliva commented Oct 18, 2022

Thanks for your reply, @ogaca-dd.

I have one more question for you related to this topic: if for some reason the agent is down, will the calls to .Gauge(), .Increment(), .Histogram(), etc. fail, or does the library have any sort of retry or ignore policy for scenarios like this one?
Thanks!

@ogaca-dd (Contributor)

@yoliva,

This is a good question. Any error during Configure, such as a DNS failure, is fatal and all metrics will be ignored.
If the DNS resolution succeeds but the Agent is down and recovers 5 minutes later, then the new metrics will be sent.
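
To illustrate (metric names are made up): assuming the behaviour described above, application code can keep calling the metric methods unconditionally.

```csharp
// Assuming the behaviour described above: if Configure hit a fatal error
// (e.g. the DNS lookup failed), these calls are ignored; if the Agent is
// only temporarily down, metrics resume once it is reachable again.
DogStatsd.Increment("orders.processed");
DogStatsd.Gauge("queue.depth", 42);
```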
