Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/fetch retry and error handling #21

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

TheMagoo73
Copy link
Contributor

A slightly naive implementation of retry logic to mitigate transient errors communicating with the intake. Can be improved down the line by filtering HTTP response codes such as 401 that should not cause a retry as they're not transient.

@TheMagoo73
Copy link
Contributor Author

Found some time today to improve the retry logic slightly. It now only retries HTTP 500 and greater errors. Anything in the 400 range is considered a 'persistent' error and isn't retried - the POST bails immediately. There are a couple of test cases to cover each scenario.

Copy link
Owner

@itsfadnis itsfadnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TheMagoo73 My apologies for the delay in getting back at this PR

While I think retrying requests is a good idea, a better idea would be to expose a callback that the consumer of datadog-winston could call if a request fails.

This way the consumer can decide what they want to do with a log when it fails, rather than us trying to resend it to datadog.

We could probably expose something like an onFailure hook, so it could be used like:

const transport = new DatadogTransport({
  // options
  onFailure: (data) => {
	// Log transport failed,
    // Do whatever you want with the failed log
  }
})

What are your thoughts on this?

@TheMagoo73
Copy link
Contributor Author

Apologies for the delay in getting back - holiday season!

I like the idea of a callback, it definitely adds more flexibility, I'm playing with some ideas around it at the moment. The key for me, I think, is to allow the caller to have enough information to handle retries, back offs etc. easily. My thinking is that an enriched version of the original log request that includes the number of failures would probably no the job.

I think supplying an 'out-of-the-box' retry implementation that can be used for the hook handler is also something we should consider, so users have a really simple, zero(ish) config to just get something reliable working.

Will try and get something up this weekend!

@jonathanmorley jonathanmorley mentioned this pull request Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants