-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SendBatch: Retry retryable errors #235
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #235 +/- ##
==========================================
+ Coverage 70.10% 70.24% +0.14%
==========================================
Files 27 27
Lines 3733 3771 +38
==========================================
+ Hits 2617 2649 +32
- Misses 1001 1003 +2
- Partials 115 119 +4
☔ View full report in Codecov by Sentry. |
TestConcurrentRetryableError leaks an establishRegion goroutine trying to discover the meta region. It repeats endlessly in a loop because zookeeper is mocked to always return an error. This leaked goroutine breaks a test I add in the next change.
func (c *client) waitForCompletion(ctx context.Context, rc hrpc.RegionClient, | ||
rpcs []hrpc.Call, results []hrpc.RPCResult, rpcToRes map[hrpc.Call]int) bool { | ||
rpcs []hrpc.Call, results []hrpc.RPCResult, rpcToRes map[hrpc.Call]int) ( | ||
retryables []hrpc.Call, shouldBackoff, unretryableError, ok bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a comment about about the unretryableError
in the doc comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
rpc.go
Outdated
}() | ||
|
||
// Exit retry loop if no RPCs are retryable because they all | ||
// succeeded or hit unretryable errors, or the context is done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to clarify - if we have seen an unretryable error, should this be checked here and also a condition to break out of the loop? Does checking len(retries)
encompass this, or do we also need to check unretryableSeen
? This comment makes me think we do want to break here if there are any unretryable errors seen, but previous comments make it unclear if the intention is to still retry any retryable errors (even if another rpc has seen an unretryable error). Although I don't think we should retry a batch that has any unretryable errors, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided not to break here if unretryableErrorSeen
because we can still retry the retryable errors. That way as many RPCs as possible in the batch are able to succeed.
In other words if your batch has 4 RPCs, the first 2 get retryable errors and the last 2 get non-retryable errors, we will retry the first 2 until they succeed and only return errors for the last 2.
Checking for len(retries) == 0
makes sure we only exit the loop when there are no more RPCs with retryable errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
To match the behavior of SendRPC, SendBatch should retry RPCs that hit retryable errors: region.RetryableError, region.ServerError, and region.NotServingRegionError. SendBatch will now retry each RPC that hits a retryable error. What used to be a single step through of assigning regions to RPCs, grouping them by region server and then dispatching the RPCs to their respective servers, is now done in a loop. The first iteration of the loop operates on the entire batch. Later iterations operate on the set of RPCs that failed with retryable errors in the previous batch.
To match the behavior of SendRPC, SendBatch should retry RPCs that hit retryable errors: region.RetryableError, region.ServerError, and region.NotServingRegionError.
SendBatch will now retry each RPC that hits a retryable error. What used to be a single step through of assigning regions to RPCs, grouping them by region server and then dispatching the RPCs to their respective servers, is now done in a loop. The first iteration of the loop operates on the entire batch. Later iterations operate on the set of RPCs that failed with retryable errors in the previous batch.