Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AJ-1160: reusable retry functions for REST clients #385

Merged
merged 9 commits into from
Oct 27, 2023

Conversation

davidangb
Copy link
Collaborator

This PR takes the Sam-specific retry logic that was in HttpSamClientSupport, makes it generic for use in all of the Sam/WSM/TDR/Leo/WDS clients, and rebrands it as RestClientRetry.

We had imho good test coverage for HttpSamClientSupport; that coverage was also expanded to test RestClientRetry.

SamDao now uses RestClientRetry instead of HttpSamClientSupport - in fact, HttpSamClientSupport is deleted.

To protect scope, this PR does NOT attempt to use the generic retry logic in the WSM, TDR, etc clients; that can come in later PRs.

This was spawned as a prefactor for AJ-1371, in which I'll need to add a new API call to WSM, and for which I wanted to use retries.

Copy link
Collaborator Author

@davidangb davidangb Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a rewrite of HttpSamClientSupport

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a rewrite/expansion of HttpSamClientSupportTest

} catch (Exception e) {
int exceptionHttpCode = extractResponseCode(e);
if (exceptionHttpCode == Integer.MIN_VALUE) {
LOGGER.error(loggerHint + " REST request resulted in " + e.getMessage(), e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend not logging if we're just going to rethrow (per Mistake 3 in randomly Googled to back my hunch).

Let whomever catches and handles (or crashes on) this take care of logging stuff out.

"Error from " + loggerHint + " REST target: " + e.getMessage());
}

LOGGER.error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer logging to when this gets handled or causes a crash.

RestCall<String> wrappedFunction =
() -> {
voidRestCall.run();
return "void";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, there's no Void class available to use here?

(Asking because I got used to referring to such a class in similar situations in Google's quirky codebase, but perhaps it was a Google-specific thing?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take a crack at this - this is all logging logic that I just ported over from the previous HttpSamClientSupport and did not change. I do want to log when we hit an exception and then retry, because those won't get bubbled up to the caller; this means we will have at least one instance of potential double-logging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated (and refactored to use switch/case). We now log the retryable exceptions, and do not log the non-retryable ones.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh heh I realized I am responding to the wrong thread, but I hope you get the gist :)


/** interface representing a callable REST client function that returns void. */
@FunctionalInterface
public interface VoidRestCall {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slick :)

@@ -115,13 +115,15 @@ sentry:
mrg: ${LZ_MRG:}
releasename: ${RELEASE_NAME:}

sam:
# retry configuration for REST clients (Sam, WSM, TDR, Leo, etc)
rest:
retry:
maxAttempts: 5
Copy link
Contributor

@jladieu jladieu Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this set the policy for all rest retries?

At risk of deepening your yakshave, I wanted to suggest thinking about how we might slice the retry policy based on error code.

By my (now potentially very outdated) understanding of http conventions, in general 4xx errors are client errors that should not be retried -- if the client got it wrong once, they are likely to get it wrong again if something doesn't change (this link is just a joke to describe the situation->) ref

Even in the case of a rate limit exception (429), the client should stop because they've overflowed the server and they're being asked to cut it out.

And the only 5xx error that should be retried is 503, which indicates the server can't help right now

Perhaps terra services have different implementations of how these http codes get surfaced, but I'm wondering if there's an opportunity here to really pare back how much retries are actually used and only very selectively retry in the rare case of 503s.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does set the retry policy for all REST clients - and I agree in the future we may want different policies for Sam vs. TDR, etc. etc.

As far as which exceptions are retried, I updated some code comments. The logic here is based on which exception is thrown, and specifically if that exception is a RestRetryableException or not; this is because our retries are based on Spring's @Retryable and that is based on exceptions. We retry 0 (which represents a connection failure) and 500/502/503/504, mostly because our target services aren't solid at returning 503 vs. 50x

@Autowired RestClientRetry restClientRetry;

/**
* reusable annotation for @CartesianTest that supplies all the exceptions we care about.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa...@CartesianTest


@CartesianTest(name = "withRetryAndErrorHandling will retry on status code {0} for exception {1}")
void retryableExceptions(
@CartesianTest.Values(ints = {0, 500, 502, 503, 504}) int code,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my earlier comment about retry policy...I wonder if we should make 500 & 502 non-retryable?

Another randomly googled source to back my hunch: https://stackoverflow.com/questions/47680711/which-http-errors-should-never-trigger-an-automatic-retry

Copy link
Contributor

@jladieu jladieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! The only suggestion I left which I think is important to consider (but you can bust past it anyway) is the suggestion about not logging AND rethrowing.

Everything else is gravy/optional/deferrable. Great work!

@sonarcloud
Copy link

sonarcloud bot commented Oct 27, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

90.2% 90.2% Coverage
0.0% 0.0% Duplication

@davidangb davidangb merged commit 8c3d88c into main Oct 27, 2023
12 checks passed
@davidangb davidangb deleted the da_AJ-1160_reusableRestRetries branch October 27, 2023 14:41
delayExpression = "${rest.retry.backoff.delay}",
multiplierExpression = "${rest.retry.backoff.multiplier}"),
listeners = {"retryLoggingListener"})
public <T> T withRetryAndErrorHandling(RestCall<T> restCall, String loggerHint)
Copy link
Contributor

@jladieu jladieu Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidangb, after seeing this get used more in #396, I wonder if it'd make sense to move the loggerHint to become the first param to facilitate more inline function passing?

EG:

withRetryAndErrorHandling("WSM.restcall", () -> wsmDao.doStuff()); // easier to read IMO

VS:

withRetryAndErrorHandling(() -> wsmDao.doStuff(), "WSM.restcall")); // harder to read IMO, especially if the function call breaks onto multiple lines

Or, maybe (somewhat confusingly, but perhaps more usable?) even provide both signatures?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about this one way or the other - if it's something you think is important, go for it

import org.springframework.stereotype.Component;

@Component
public class RestClientRetry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More post-hoc feedback after seeing this get used in the downstream PR...

How do you intend for this to get used by client classes? I noticed in #396 that it's being injected by Spring into the constructor, but by my look at these methods it seems like this class is completely stateless.

I'm thinking it could either become a bunch of static methods (which I honestly prefer to avoid if possible)...OR could actually hold some state, like a loggerHint prefix.

So if I'm the WSM DAO, I could initialize my own version of RestClientRetry with loggerHintPrefix of WSM, and then only need to provide an extra qualifier of the method name when calling it.

Anyway, this is all API usability feedback that could become its own refactoring/followup ticket, but thought I'd drop it here where it was born for the time being.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants