OkHttp: Retry on SocketException: Broken pipe #962
Comments
Thanks for the issue.
I'd be interested in understanding what types of exceptions you are seeing. Are they all the same as the one you included in this issue?
The issue is that knowing whether a network exception is generically retryable for any operation depends on whether the operation is idempotent. Take the example given: you hit something like a socket-closed issue (perhaps the server closed it, or there was some other network problem). What if this failure happened after the server modified state? How do we know it's safe to retry at that point? The answer is we don't (unless the operation is marked as @idempotent). We don't special-case idempotent operations yet; that would be a good addition/feature request. It's possible we may be able to special-case S3's retry policy to be more relaxed, but again we'd need a good way to classify which networking errors are retryable. In the meantime you can always customize the retry policy for the SDK to retry whatever kind of error you want, and this doesn't require changing the SDK itself, e.g.:

```kotlin
class CustomRetryPolicy : RetryPolicy<Any?> {
    override fun evaluate(result: Result<Any?>): RetryDirective {
        return when (result.exceptionOrNull()) {
            is IOException -> RetryDirective.RetryError(RetryErrorType.Transient)
            else -> AwsDefaultRetryPolicy.evaluate(result)
        }
    }
}

fun main(): Unit = runBlocking {
    val s3 = S3Client.fromEnvironment {
        retryPolicy = CustomRetryPolicy()
    }
    ...
}
```

Note: you'll need a dependency on
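For readers who want to experiment with the decision logic outside the SDK, the pattern above can be sketched in plain, self-contained Kotlin. Everything here (`Directive`, `Policy`, `withRetries`) is hypothetical scaffolding that mirrors, but is not, the smithy-kotlin retry API:

```kotlin
// Hypothetical sketch of the retry-policy pattern: a policy maps a failed
// Result to a retry decision, treating IOException as transient.
import java.io.IOException

sealed class Directive {
    object TerminateSuccess : Directive()   // result is fine, stop retrying
    object TerminateFail : Directive()      // non-retryable failure, propagate
    object RetryTransient : Directive()     // transient failure, try again
}

fun interface Policy {
    fun evaluate(result: Result<Any?>): Directive
}

val defaultPolicy = Policy { result ->
    if (result.isSuccess) Directive.TerminateSuccess else Directive.TerminateFail
}

// Like the CustomRetryPolicy above: retry any IOException, defer otherwise.
val customPolicy = Policy { result ->
    when (result.exceptionOrNull()) {
        is IOException -> Directive.RetryTransient  // e.g. SocketException: Broken pipe
        else -> defaultPolicy.evaluate(result)
    }
}

fun <T> withRetries(policy: Policy, maxAttempts: Int, block: () -> T): T {
    var last: Throwable? = null
    repeat(maxAttempts) {
        val result = runCatching { block() }
        when (policy.evaluate(result)) {
            Directive.TerminateSuccess -> return result.getOrThrow()
            Directive.RetryTransient -> last = result.exceptionOrNull()
            Directive.TerminateFail -> throw result.exceptionOrNull()!!
        }
    }
    throw last!!
}

fun main() {
    var calls = 0
    val value = withRetries(customPolicy, maxAttempts = 3) {
        calls++
        if (calls < 3) throw IOException("Broken pipe") // fails twice, then succeeds
        "ok"
    }
    println("$value after $calls attempts")  // ok after 3 attempts
}
```

The design point matches the comment above: the policy alone decides whether a failure is transient, while the retry loop stays generic and knows nothing about idempotency.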
Thank you for your response. I see the problem with non-idempotent requests, and that not all errors (or all kinds of exceptions) are safe to retry. And thanks for the hint about the custom retry policy.
I don't have good stats right now, because we also use some older SDK versions in some services, but I believe that with recent versions the exceptions have become less frequent. One thing we also see regularly (but I suppose it's unrelated and just a bug in OkHttp) is an "Unbalanced enter/exit" error.
Hi @berlix, are you still encountering socket exceptions when using OkHttp with more recent versions of the SDK?
Googling aws-kotlin-sdk "Unbalanced enter/exit" brought me here :) @ianbotsf I'm getting the same exception. This might be a test coroutines issue though?
Thanks for the test code @madisp, but I cannot reproduce the failure on my local Windows machine or my cloud EC2 Linux machine. It's likely that parallelism and timings are making this difficult to reproduce in different executing environments. I see that OkHttp recently fixed an unbalanced enter/exit exception caused by rapid release/re-acquisition of connections (square/okhttp#7381). If you have stable repro code, it would be interesting to see if it still fails on OkHttp 5.0.0-alpha.12 (smithy-kotlin currently uses 5.0.0-alpha.11). If not, then we can prioritize upgrading the OkHttp dependency to the latest alpha version.
Yup, still getting it with 5.0.0-alpha.12.
Apologies for the late response. We haven't observed that specific exception ("Broken pipe") with SDK versions 1.0.13 or 1.1.13. We do still regularly get the following exception, at least when downloading objects from S3; no clue if it's related:
The above exception is perhaps related to awslabs/aws-sdk-kotlin#1214.
As cloudshiftchris said above, we believe this issue is related to S3 closing connections earlier than expected, which OkHttp does not handle by default. We've added a new "connection monitoring" feature to detect when a connection is closed prematurely and prevent its reuse. You can enable this on your S3 client by configuring a non-null connectionIdlePollingInterval:

```kotlin
S3Client.fromEnvironment {
    httpEngine(OkHttpEngine) {
        connectionIdlePollingInterval = 200.milliseconds
    }
}
```

Please enable this and reach out if you have any more issues!
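For intuition, here is a self-contained sketch (plain Kotlin, not the SDK's actual implementation) of what connection monitoring does: during idle periods, poll the socket; a read that returns -1 means the peer has already closed the connection, so it must not be reused:

```kotlin
// Illustrative only: detect a peer-closed idle connection by polling it.
import java.net.ServerSocket
import java.net.Socket
import java.net.SocketTimeoutException

fun isPeerClosed(socket: Socket, pollTimeoutMs: Int): Boolean {
    socket.soTimeout = pollTimeoutMs
    return try {
        socket.inputStream.read() == -1   // -1 => remote end closed the connection
    } catch (e: SocketTimeoutException) {
        false                             // no data and no FIN within the poll window: still usable
    }
}

fun main() {
    val server = ServerSocket(0)
    val client = Socket("localhost", server.localPort)
    val accepted = server.accept()

    println(isPeerClosed(client, 100))    // false: server end still open

    accepted.close()                      // simulate S3 closing the idle connection
    Thread.sleep(50)
    println(isPeerClosed(client, 100))    // true: reusing it would fail, e.g. with Broken pipe

    client.close()
    server.close()
}
```

The polling interval is a tradeoff: a shorter interval catches premature closes sooner at the cost of more wakeups per idle connection.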
Describe the bug
I am aware of the fix in #896. However, even using SDK versions that include the fix (AWS Kotlin SDK 0.32.0-beta), we run into frequent, unhandled networking exceptions when talking to just about any AWS service, and especially S3.
Expected Behavior
To retry the request, so that application code doesn't have to.
Current Behavior
The request is not retried, and the exception is propagated up to the application:
As a side note: we tried switching to the CRT client, but there we've also observed at least one instance of an exception that looks retryable:
Steps to Reproduce
We observe this mostly (but not only) in long-running tasks that make many concurrent requests to AWS services.
Possible Solution
Consider treating java.net.SocketException as retryable.
Your Environment