AJ-1160: reusable retry functions for REST clients #385
Conversation
this is a rewrite of HttpSamClientSupport
this is a rewrite/expansion of HttpSamClientSupportTest
} catch (Exception e) {
  int exceptionHttpCode = extractResponseCode(e);
  if (exceptionHttpCode == Integer.MIN_VALUE) {
    LOGGER.error(loggerHint + " REST request resulted in " + e.getMessage(), e);
I'd recommend not logging if we're just going to rethrow (per Mistake 3 in an article I randomly Googled to back my hunch).
Let whoever catches and handles (or crashes on) this take care of logging stuff out.
"Error from " + loggerHint + " REST target: " + e.getMessage()); | ||
} | ||
|
||
LOGGER.error( |
Defer logging to when this gets handled or causes a crash.
RestCall<String> wrappedFunction =
    () -> {
      voidRestCall.run();
      return "void";
Huh, there's no Void class available to use here?
(Asking because I got used to referring to such a class in similar situations in Google's quirky codebase, but perhaps it was a Google-specific thing?)
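For reference, java.lang.Void is available in the JDK (no import needed). A minimal sketch of how the wrapper could use it, assuming the RestCall and VoidRestCall interfaces from this PR:

// Sketch only: Void cannot be instantiated, so the lambda returns null to signal
// "no result" instead of the placeholder "void" string.
RestCall<Void> wrappedFunction =
    () -> {
      voidRestCall.run();
      return null; // the only value assignable to Void
    };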
I will take a crack at this - this is all logging logic that I just ported over from the previous HttpSamClientSupport and did not change. I do want to log when we hit an exception and then retry, because those won't get bubbled up to the caller; this means we will have at least one instance of potential double-logging.
Updated (and refactored to use switch/case). We now log the retryable exceptions, and do not log the non-retryable ones.
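A minimal sketch of that shape, assuming the extractResponseCode helper quoted above and hypothetical RestRetryableException / RestException wrapper types (the real class and constructor names may differ):

// Sketch under assumptions: retryable failures (connection failure = 0, plus
// 500/502/503/504 per the discussion elsewhere in this thread) are logged here
// because the retry loop swallows them; anything else is rethrown unlogged for the
// caller (or a global handler) to log.
int exceptionHttpCode = extractResponseCode(e);
switch (exceptionHttpCode) {
  case 0, 500, 502, 503, 504 -> {
    LOGGER.warn(loggerHint + " REST request failed with code " + exceptionHttpCode + "; retrying", e);
    throw new RestRetryableException(e); // hypothetical retryable wrapper
  }
  default ->
      throw new RestException( // hypothetical non-retryable wrapper
          exceptionHttpCode, "Error from " + loggerHint + " REST target: " + e.getMessage());
}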
oh heh I realized I am responding to the wrong thread, but I hope you get the gist :)
/** interface representing a callable REST client function that returns void. */
@FunctionalInterface
public interface VoidRestCall {
Slick :)
@@ -115,13 +115,15 @@ sentry:
  mrg: ${LZ_MRG:}
  releasename: ${RELEASE_NAME:}

sam:
# retry configuration for REST clients (Sam, WSM, TDR, Leo, etc)
rest:
  retry:
    maxAttempts: 5
Will this set the policy for all rest retries?
At risk of deepening your yakshave, I wanted to suggest thinking about how we might slice the retry policy based on error code.
By my (now potentially very outdated) understanding of http conventions, in general 4xx errors are client errors that should not be retried -- if the client got it wrong once, they are likely to get it wrong again if something doesn't change (this link is just a joke to describe the situation -> ref).
Even in the case of a rate limit exception (429), the client should stop because they've overflowed the server and they're being asked to cut it out.
And the only 5xx error that should be retried is 503, which indicates the server can't help right now.
Perhaps terra services have different implementations of how these http codes get surfaced, but I'm wondering if there's an opportunity here to really pare back how often retries are actually used and only very selectively retry in the rare case of 503s.
This does set the retry policy for all REST clients - and I agree in the future we may want different policies for Sam vs. TDR, etc. etc.
As far as which exceptions are retried, I updated some code comments. The logic here is based on which exception is thrown, and specifically whether that exception is a RestRetryableException or not; this is because our retries are based on Spring's @Retryable, and that is based on exceptions. We retry 0 (which represents a connection failure) and 500/502/503/504, mostly because our target services aren't solid at returning 503 vs. 50x.
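For context, a minimal sketch of that wiring, reusing the property names and listener quoted elsewhere in this PR. RestRetryableException is the marker type mentioned above; the run() method name, the classifyException helper, and the exact annotation attribute name (retryFor vs. include/value, depending on the Spring Retry version) are assumptions, and imports from org.springframework.retry.annotation are omitted.

// Sketch under assumptions: Spring Retry re-invokes the method only when it throws
// the exception type named here; all other exceptions propagate immediately.
@Retryable(
    retryFor = RestRetryableException.class, // 'include'/'value' in older Spring Retry
    maxAttemptsExpression = "${rest.retry.maxAttempts}",
    backoff =
        @Backoff(
            delayExpression = "${rest.retry.backoff.delay}",
            multiplierExpression = "${rest.retry.backoff.multiplier}"),
    listeners = {"retryLoggingListener"})
public <T> T withRetryAndErrorHandling(RestCall<T> restCall, String loggerHint) {
  try {
    return restCall.run();
  } catch (Exception e) {
    // wrap 0/500/502/503/504 in RestRetryableException (see the switch sketch above),
    // everything else in a non-retryable exception
    throw classifyException(e, loggerHint); // hypothetical helper returning a RuntimeException
  }
}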
@Autowired RestClientRetry restClientRetry;

/**
 * reusable annotation for @CartesianTest that supplies all the exceptions we care about.
Whoa...@CartesianTest
@CartesianTest(name = "withRetryAndErrorHandling will retry on status code {0} for exception {1}")
void retryableExceptions(
    @CartesianTest.Values(ints = {0, 500, 502, 503, 504}) int code,
Per my earlier comment about retry policy...I wonder if we should make 500 & 502 non-retryable?
Another randomly googled source to back my hunch: https://stackoverflow.com/questions/47680711/which-http-errors-should-never-trigger-an-automatic-retry
This is awesome! The only suggestion I left which I think is important to consider (but you can bust past it anyway) is the suggestion about not logging AND rethrowing.
Everything else is gravy/optional/deferrable. Great work!
Kudos, SonarCloud Quality Gate passed!
        delayExpression = "${rest.retry.backoff.delay}",
        multiplierExpression = "${rest.retry.backoff.multiplier}"),
    listeners = {"retryLoggingListener"})
public <T> T withRetryAndErrorHandling(RestCall<T> restCall, String loggerHint)
@davidangb, after seeing this get used more in #396, I wonder if it'd make sense to move the loggerHint to become the first param to facilitate more inline function passing?
EG:
withRetryAndErrorHandling("WSM.restcall", () -> wsmDao.doStuff());
// easier to read IMO
VS:
withRetryAndErrorHandling(() -> wsmDao.doStuff(), "WSM.restcall");
// harder to read IMO, especially if the function call breaks onto multiple lines
Or, maybe (somewhat confusingly, but perhaps more usable?) even provide both signatures?
I don't feel strongly about this one way or the other - if it's something you think is important, go for it
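If both signatures were offered, a minimal sketch of the hint-first overload (illustrative only; exception handling and the existing method's signature are assumed):

// Illustrative only: a hint-first overload delegating to the existing method.
// Caveat: within the same Spring bean, this self-invocation bypasses the @Retryable
// proxy, so in practice the overload would need its own annotation (or both would
// delegate to a separately proxied worker) for retries to still apply.
public <T> T withRetryAndErrorHandling(String loggerHint, RestCall<T> restCall) {
  return withRetryAndErrorHandling(restCall, loggerHint);
}

// call site then reads hint-first:
// withRetryAndErrorHandling("WSM.restcall", () -> wsmDao.doStuff());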
import org.springframework.stereotype.Component;

@Component
public class RestClientRetry {
More post-hoc feedback after seeing this get used in the downstream PR...
How do you intend for this to get used by client classes? I noticed in #396 that it's being injected by Spring into the constructor, but by my look at these methods it seems like this class is completely stateless.
I'm thinking it could either become a bunch of static methods (which I honestly prefer to avoid if possible)...OR could actually hold some state, like a loggerHint prefix. So if I'm the WSM DAO, I could initialize my own version of RestClientRetry with a loggerHintPrefix of WSM, and then only need to provide an extra qualifier of the method name when calling it.
Anyway, this is all API usability feedback that could become its own refactoring/followup ticket, but thought I'd drop it here where it was born for the time being.
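For illustration, a minimal sketch of the "hold a loggerHint prefix" idea. PrefixedRestClientRetry and its method names are hypothetical; delegating to the injected RestClientRetry bean keeps the @Retryable proxying intact.

// Hypothetical sketch: each DAO wraps the shared (Spring-proxied) RestClientRetry
// with its own prefix, so call sites only supply the per-method qualifier.
public class PrefixedRestClientRetry {
  private final RestClientRetry delegate;
  private final String loggerHintPrefix;

  public PrefixedRestClientRetry(RestClientRetry delegate, String loggerHintPrefix) {
    this.delegate = delegate;
    this.loggerHintPrefix = loggerHintPrefix;
  }

  public <T> T withRetryAndErrorHandling(RestCall<T> restCall, String methodHint) {
    return delegate.withRetryAndErrorHandling(restCall, loggerHintPrefix + "." + methodHint);
  }
}

// e.g. in the WSM DAO:
// PrefixedRestClientRetry wsmRetry = new PrefixedRestClientRetry(restClientRetry, "WSM");
// wsmRetry.withRetryAndErrorHandling(() -> wsmDao.doStuff(), "doStuff");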
This PR takes the Sam-specific retry logic that was in HttpSamClientSupport, makes it generic for use in all of the Sam/WSM/TDR/Leo/WDS clients, and rebrands it as RestClientRetry.

We had imho good test coverage for HttpSamClientSupport; that coverage was also expanded to test RestClientRetry.

SamDao now uses RestClientRetry instead of HttpSamClientSupport - in fact, HttpSamClientSupport is deleted.

To protect scope, this PR does NOT attempt to use the generic retry logic in the WSM, TDR, etc clients; that can come in later PRs.

This was spawned as a prefactor for AJ-1371, in which I'll need to add a new API call to WSM, and for which I wanted to use retries.