SendBatch: Retry retryable errors #235

aaronbee · 2023-10-06T20:49:49Z

To match the behavior of SendRPC, SendBatch should retry RPCs that hit retryable errors: region.RetryableError, region.ServerError, and region.NotServingRegionError.

SendBatch now retries each RPC that hits a retryable error. What used to be a single pass (assign regions to RPCs, group them by region server, then dispatch each group to its server) is now done in a loop. The first iteration operates on the entire batch; later iterations operate only on the RPCs that failed with retryable errors in the previous iteration.
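The loop described above can be sketched roughly as follows. This is a minimal illustration and not the code in rpc.go: the `call` type, `locate`, `send`, and `errRetryable` are hypothetical stand-ins for hrpc.Call, region lookup, the per-server dispatch, and the retryable error classes, and the sketch leaves out context handling and backoff.

```go
package main

import (
	"errors"
	"fmt"
)

// errRetryable is a hypothetical stand-in for the retryable error classes
// named in the description (region.RetryableError and friends).
var errRetryable = errors.New("retryable")

// call is a hypothetical stand-in for hrpc.Call.
type call struct {
	key      string
	attempts int
}

// locate is a hypothetical region lookup mapping a key to a server name.
func locate(key string) string {
	if key < "m" {
		return "rs1"
	}
	return "rs2"
}

// send simulates one server handling its share of the batch: a call whose
// key is in flaky fails with errRetryable on its first attempt only.
func send(calls []*call, flaky map[string]bool) []error {
	errs := make([]error, len(calls))
	for i, c := range calls {
		c.attempts++
		if flaky[c.key] && c.attempts == 1 {
			errs[i] = errRetryable
		}
	}
	return errs
}

// sendBatch runs the assign/group/dispatch pass in a loop. The first
// iteration covers the whole batch; later ones cover only the calls that
// hit a retryable error in the previous iteration.
func sendBatch(batch []*call, flaky map[string]bool) []error {
	results := make([]error, len(batch))
	idx := make(map[*call]int, len(batch)) // call -> slot in results
	for i, c := range batch {
		idx[c] = i
	}
	pending := batch
	for len(pending) > 0 {
		// Assign each pending call to a server and group by server.
		byServer := make(map[string][]*call)
		for _, c := range pending {
			s := locate(c.key)
			byServer[s] = append(byServer[s], c)
		}
		// Dispatch each group and collect the calls to retry.
		var retry []*call
		for _, calls := range byServer {
			for i, err := range send(calls, flaky) {
				results[idx[calls[i]]] = err
				if errors.Is(err, errRetryable) {
					retry = append(retry, calls[i])
				}
			}
		}
		pending = retry
	}
	return results
}

func main() {
	batch := []*call{{key: "a"}, {key: "b"}, {key: "x"}}
	results := sendBatch(batch, map[string]bool{"b": true})
	for i, c := range batch {
		fmt.Printf("%s: attempts=%d err=%v\n", c.key, c.attempts, results[i])
	}
}
```

In this sketch only the call for key "b" is sent a second time; the other two calls complete in the first iteration and are not re-dispatched.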

codecov · 2023-10-06T20:59:49Z

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (f750f62) 70.10% compared to head (2711130) 70.24%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #235      +/-   ##
==========================================
+ Coverage   70.10%   70.24%   +0.14%     
==========================================
  Files          27       27              
  Lines        3733     3771      +38     
==========================================
+ Hits         2617     2649      +32     
- Misses       1001     1003       +2     
- Partials      115      119       +4

Files	Coverage Δ
rpc.go	`84.86% <92.85%> (+0.12%)`	⬆️

... and 1 file with indirect coverage changes


TestConcurrentRetryableError leaks an establishRegion goroutine trying to discover the meta region. It repeats endlessly in a loop because zookeeper is mocked to always return an error. This leaked goroutine breaks a test I add in the next change.

ciacono · 2023-10-20T19:27:49Z

rpc.go

 func (c *client) waitForCompletion(ctx context.Context, rc hrpc.RegionClient,
-	rpcs []hrpc.Call, results []hrpc.RPCResult, rpcToRes map[hrpc.Call]int) bool {
+	rpcs []hrpc.Call, results []hrpc.RPCResult, rpcToRes map[hrpc.Call]int) (
+	retryables []hrpc.Call, shouldBackoff, unretryableError, ok bool) {


Should there be a comment about the unretryableError in the doc comment?

ciacono · 2023-10-20T19:32:31Z

rpc.go

+		}()
+
+		// Exit retry loop if no RPCs are retryable because they all
+		// succeeded or hit unretryable errors, or the context is done


Want to clarify - if we have seen an unretryable error, should this be checked here and also a condition to break out of the loop? Does checking len(retries) encompass this, or do we also need to check unretryableSeen? This comment makes me think we do want to break here if there are any unretryable errors seen, but previous comments make it unclear if the intention is to still retry any retryable errors (even if another rpc has seen an unretryable error). Although I don't think we should retry a batch that has any unretryable errors, right?

I decided not to break here if unretryableErrorSeen is set, because we can still retry the RPCs that hit retryable errors. That way as many RPCs in the batch as possible are able to succeed.
In other words, if your batch has 4 RPCs, the first 2 get retryable errors, and the last 2 get non-retryable errors, we will retry the first 2 until they succeed and only return errors for the last 2.

Checking for len(retries) == 0 makes sure we only exit the loop when there are no more RPCs with retryable errors.
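The 4-RPC scenario can be played through with a small simulation. This is only a sketch under stated assumptions: `attempt`, `run`, `errRetryable`, and `errFatal` are hypothetical names, and the choice that the first two RPCs succeed on their third try is made up for illustration. The point is the exit condition: the loop records that an unretryable error was seen but only stops once nothing is left to retry.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errRetryable = errors.New("retryable")     // hypothetical stand-in
	errFatal     = errors.New("non-retryable") // hypothetical stand-in
)

// attempt simulates sending RPC i in a given round. RPCs 2 and 3 always
// fail with a non-retryable error; RPCs 0 and 1 fail with a retryable
// error in rounds 0 and 1 and succeed from round 2 on.
func attempt(i, round int) error {
	switch {
	case i >= 2:
		return errFatal
	case round < 2:
		return errRetryable
	default:
		return nil
	}
}

// run retries until no RPC is left with a retryable error. Seeing an
// unretryable error is recorded but does not end the loop.
func run(n int) (results []error, rounds int, unretryableSeen bool) {
	results = make([]error, n)
	pending := make([]int, n)
	for i := range pending {
		pending[i] = i
	}
	for ; len(pending) > 0; rounds++ {
		var retries []int
		for _, i := range pending {
			err := attempt(i, rounds)
			results[i] = err
			switch {
			case errors.Is(err, errRetryable):
				retries = append(retries, i)
			case err != nil:
				unretryableSeen = true
			}
		}
		// Exit only when len(retries) == 0.
		pending = retries
	}
	return results, rounds, unretryableSeen
}

func main() {
	results, rounds, seen := run(4)
	fmt.Println("rounds:", rounds, "unretryable seen:", seen)
	for i, err := range results {
		fmt.Printf("rpc %d: %v\n", i, err)
	}
}
```

Under these assumptions the first two RPCs end with a nil error after three rounds, while the last two keep the non-retryable error they received in the first round.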

aaronbee force-pushed the batchretry branch from 32cc917 to f29a089 Compare October 6, 2023 20:56

aaronbee force-pushed the batchretry branch from f29a089 to ecff86c Compare October 10, 2023 00:13

aaronbee force-pushed the batchretry branch from ecff86c to e6f33a5 Compare October 11, 2023 02:37

aaronbee force-pushed the batchretry branch from e6f33a5 to a4a5cb5 Compare October 20, 2023 18:06

ciacono reviewed Oct 20, 2023

aaronbee force-pushed the batchretry branch from a4a5cb5 to 2711130 Compare October 20, 2023 20:48

ciacono approved these changes Oct 23, 2023

aaronbee merged commit 75354d5 into tsuna:master Oct 23, 2023
4 checks passed

aaronbee deleted the batchretry branch October 23, 2023 20:36
