Retry logic stop early #1318

Closed
NikolayMetchev opened this issue May 23, 2024 · 4 comments
@NikolayMetchev

NikolayMetchev commented May 23, 2024

Describe the bug

We have our own Lambda deployer tool that runs as part of our Continuous Delivery setup and can deploy up to 16 Lambdas in parallel.
This filled up our allotted storage for Lambda versions, so we implemented cleanup logic that deletes old versions of the Lambdas.

The first time this cleanup logic ran, it made many parallel calls to delete Lambda versions, and the Lambda service responded with HTTP 429: Too Many Requests.

So I changed the code to raise the maximum number of attempts to 20.

    retryStrategy {
        maxAttempts = 20
    }

However, the process did not make the full 20 attempts before failing: it failed on the 11th attempt. The log snippet below shows that the SDK still considers the maximum to be 20:
amz-sdk-request: attempt=11; max=20

2024-05-22T10:25:08,665+00:00 [kotlinx.coroutines.DefaultExecutor] DEBUG aws.smithy.kotlin.runtime.http.middleware.RetryMiddleware - retrying request, attempt 11
2024-05-22T10:25:08,666+00:00 [kotlinx.coroutines.DefaultExecutor] DEBUG aws.smithy.kotlin.runtime.http.operation.AuthHandler - resolved endpoint: Endpoint(uri=https://lambda.us-east-1.amazonaws.com, headers=null, attributes=aws.smithy.kotlin.runtime.collections.EmptyAttributes@62f11f14)
2024-05-22T10:25:08,667+00:00 [kotlinx.coroutines.DefaultExecutor] DEBUG aws.smithy.kotlin.runtime.auth.awssigning.DefaultAwsSignerImpl - Calculated signature: b21d0a438ad57077231bc405ecf43fd3b1a08560ab0bc42448f3a538d9a5f7e9
2024-05-22T10:25:08,667+00:00 [kotlinx.coroutines.DefaultExecutor] DEBUG httpTraceMiddleware - HttpRequest:
DELETE /2015-03-31/functions/arn%3Aaws%3Alambda%3Aus-east-1%3A637423545847%3Afunction%3Aauth-avp-dev?Qualifier=34
Host: lambda.us-east-1.amazonaws.com
User-Agent: aws-sdk-kotlin/1.2.14 ua/2.0 api/lambda#1.2.14 os/linux#4.4.0 lang/kotlin#1.9.24 md/javaVersion#17.0.11 md/jvmName#OpenJDK_64-Bit_Server_VM md/jvmVersion#17.0.11+9-LTS
x-amz-user-agent: aws-sdk-kotlin/1.2.14
amz-sdk-invocation-id: f7c676d3-3515-4d03-b3aa-940da9f670d8
amz-sdk-request: attempt=11; max=20
X-Amz-Date: 20240522T102508Z
X-Amz-Security-Token: IQoJb3JpZ2luX2VjEOP//////////wEaCXVzLWVhc3QtMSJGMEQCIEQjKsjhepoAzreX/XI0FvFFSw1+cj36go1pjsnAqKJuAiBiTI4hhzHDecwWis3P5tx+cD9m4YP4+8j81E8xL3wLoCqcAghcEAAaDDYzNzQyMzU0NTg0NyIMeigmBsfLfcS/JM0TKvkB8SxRNcA3pNZHYht6jr9tk10ZXR2DOJxAYCsAMQjV0ixUYntachAT0DWCIvQPsA06mfmrp9cKfBWNFaxL7tkLtDvzH4e3Tx/1t6hB+gD09rWK767V5lldzq+2lrpkludLeBkPbK8PZ41Z/tu83Nkvc6zYoauSo70DUhEEsdyttQPHpm2DyQBbjh8z5Pjhqcja5zt9Bl8IhP/0mJmznjde9M48PzRddk8AvUmqdIuNxj7R1D2dgStVtgz1HC8fc1Mhme2uTqHs1qP0ZUSmYQsjVsfzB4zqCb3NE+voo3NlQoTx/P5oms4zbQbtG+V6D8rXE5F+qMpoQsInMPiPt7IGOp4BcB5A0431TvbQxzyoeUc/mXl+vtnnVFClfnr0QiKF76xhTefpxl3J2nsfs7gTHue75I8J8CvD+MOpfBMbObE6kd8WlbBaKXm/A1muuO5gdhi6uAr5RRLyVd5yj92HFEy20MOATgz3GxBRQjKQO+DoTylDNj++YUP60LaOqfeNrVmym/I3H2z1+dDB+3NVzQWRZRUuGcwrxxAEtne87os=
Authorization: AWS4-HMAC-SHA256 Credential=ASIAZI2LHNX3QKN5TLG3/20240522/us-east-1/lambda/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;host;x-amz-date;x-amz-security-token;x-amz-user-agent, Signature=b21d0a438ad57077231bc405ecf43fd3b1a08560ab0bc42448f3a538d9a5f7e9

2024-05-22T10:25:08,680+00:00 [DefaultDispatcher-worker-1] DEBUG httpTraceMiddleware - HttpResponse:
HTTP 429: Too Many Requests
content-length: 76
content-type: application/json
date: Wed, 22 May 2024 10:25:08 GMT
x-amzn-errortype: TooManyRequestsException
x-amzn-requestid: 47bab4d9-f58a-46e9-a17f-5939eda5bbe9

{"Reason":"CallerRateLimitExceeded","Type":"User","message":"Rate exceeded"}
2024-05-22T10:25:09,249+00:00 [kotlinx.coroutines.DefaultExecutor] ERROR com.paxos.messaging.cli.Main - Failed while running Lambda Deployer
aws.sdk.kotlin.services.lambda.model.TooManyRequestsException: Rate exceeded
at aws.sdk.kotlin.services.lambda.model.TooManyRequestsException$Builder.build(TooManyRequestsException.kt:82) ~[lambda-jvm-1.2.14.jar:?]
at aws.sdk.kotlin.services.lambda.serde.TooManyRequestsExceptionDeserializer.deserialize(TooManyRequestsExceptionDeserializer.kt:38) ~[lambda-jvm-1.2.14.jar:?]
at aws.sdk.kotlin.services.lambda.serde.DeleteFunctionOperationDeserializerKt.throwDeleteFunctionError(DeleteFunctionOperationDeserializer.kt:44) ~[lambda-jvm-1.2.14.jar:?]
at aws.sdk.kotlin.services.lambda.serde.DeleteFunctionOperationDeserializerKt.access$throwDeleteFunctionError(DeleteFunctionOperationDeserializer.kt:1) ~[lambda-jvm-1.2.14.jar:?]
at aws.sdk.kotlin.services.lambda.serde.DeleteFunctionOperationDeserializer.deserialize(DeleteFunctionOperationDeserializer.kt:21) ~[lambda-jvm-1.2.14.jar:?]
at aws.sdk.kotlin.services.lambda.serde.DeleteFunctionOperationDeserializer.deserialize(DeleteFunctionOperationDeserializer.kt:16) ~[lambda-jvm-1.2.14.jar:?]
at aws.smithy.kotlin.runtime.http.operation.DeserializeHandler.call(SdkOperationExecution.kt:338) ~[http-client-jvm-1.2.4.jar:?]
at aws.smithy.kotlin.runtime.http.operation.DeserializeHandler$call$1.invokeSuspend(SdkOperationExecution.kt) ~[http-client-jvm-1.2.4.jar:?]
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33) ~[kotlin-stdlib-1.9.24.jar:1.9.24-release-822]
at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith(DispatchedContinuation.kt:279) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.internal.DispatchedContinuationKt.resumeCancellableWith$default(DispatchedContinuation.kt:274) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.DispatchedCoroutine.afterResume(Builders.common.kt:257) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.AbstractCoroutine.resumeWith(AbstractCoroutine.kt:99) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:46) ~[kotlin-stdlib-1.9.24.jar:1.9.24-release-822]
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:811) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:715) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:702) ~[kotlinx-coroutines-core-jvm-1.8.1.jar:?]

Expected behavior

Retry up to 20 times to make the API call succeed.

Current behavior

Randomly fails after fewer than 20 tries.

Steps to Reproduce

Run many Lambda API calls in parallel.

Possible Solution

No response

Context

No response

AWS SDK for Kotlin version

1.2.17

Platform (JVM/JS/Native)

JVM

Operating system and version

Linux docker

@NikolayMetchev NikolayMetchev added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels May 23, 2024
@lauzadis lauzadis self-assigned this May 31, 2024
@lauzadis lauzadis removed the needs-triage This issue or PR still needs to be triaged. label May 31, 2024
@lauzadis
Member

lauzadis commented May 31, 2024

Thanks for the report! I'm able to replicate the issue and looking into a fix.

Simplified reproduction:

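import aws.sdk.kotlin.services.lambda.LambdaClient
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

fun main(): Unit = runBlocking {
    val client = LambdaClient.fromEnvironment {
        retryStrategy {
            maxAttempts = 20
        }
    }

    // Launch 200 concurrent calls so that the service starts throttling
    for (i in 0 until 200) {
        launch { client.listFunctions() }
    }
}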

Exception seen:

Exception in thread "main" TooManyRequestsException(message=Rate exceeded,reason=CallerRateLimitExceeded,retryAfterSeconds=null,type=User)
	at aws.sdk.kotlin.services.lambda.model.TooManyRequestsException$Builder.build(TooManyRequestsException.kt:82)
	at aws.sdk.kotlin.services.lambda.serde.TooManyRequestsExceptionDeserializer.deserialize(TooManyRequestsExceptionDeserializer.kt:38)
	at aws.sdk.kotlin.services.lambda.serde.ListFunctionsOperationDeserializerKt.throwListFunctionsError(ListFunctionsOperationDeserializer.kt:61)
	at aws.sdk.kotlin.services.lambda.serde.ListFunctionsOperationDeserializerKt.access$throwListFunctionsError(ListFunctionsOperationDeserializer.kt:1)
	at aws.sdk.kotlin.services.lambda.serde.ListFunctionsOperationDeserializer.deserialize(ListFunctionsOperationDeserializer.kt:36)
	at aws.sdk.kotlin.services.lambda.serde.ListFunctionsOperationDeserializer.deserialize(ListFunctionsOperationDeserializer.kt:31)
	at aws.smithy.kotlin.runtime.http.operation.DeserializeHandler.call(SdkOperationExecution.kt:338)
	at aws.smithy.kotlin.runtime.http.operation.DeserializeHandler$call$1.invokeSuspend(SdkOperationExecution.kt)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:277)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:95)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at sdktest.LambdaThrottlesKt.main(LambdaThrottles.kt:15)
	at sdktest.LambdaThrottlesKt.main(LambdaThrottles.kt)

@lauzadis
Member

lauzadis commented Jun 4, 2024

This seems to be related to a change we made in v0.28.0-beta, which changed the default StandardRetryTokenBucket configuration. We're still looking into whether the change is correct and whether it will be reverted. In the meantime, you can work around this error by manually reverting the configuration yourself.

Specifically, you need to set useCircuitBreakerMode = false and refillUnitsPerSecond = 10.

    val client = LambdaClient.fromEnvironment {
        retryStrategy {
            maxAttempts = 20
            tokenBucket {
                useCircuitBreakerMode = false
                refillUnitsPerSecond = 10
            }
        }
    }
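
To illustrate why a retry token bucket shared by all requests on a client can end retries before maxAttempts is reached, here is a minimal toy model. It is a sketch under stated assumptions, not the SDK's StandardRetryTokenBucket: the names ToyRetryBucket and simulate are hypothetical, and the numbers (500 tokens, 5 tokens per retry, no refill) are illustrative values chosen for the example, not defaults confirmed in this thread. The model assumes that in circuit-breaker mode a retry fails immediately when the bucket cannot pay for it, and that every call is throttled.

```kotlin
// Toy model only: NOT the SDK's StandardRetryTokenBucket implementation.
// All names and numbers here are hypothetical and chosen for illustration.
class ToyRetryBucket(private var tokens: Int, private val retryCost: Int) {
    // Circuit-breaker behaviour: if the bucket cannot pay for a retry,
    // refuse immediately instead of waiting for tokens to be refilled.
    fun tryAcquireRetry(): Boolean {
        if (tokens < retryCost) return false
        tokens -= retryCost
        return true
    }
}

// Simulates `operations` calls that are always throttled and share one bucket.
// Operations take turns in round-robin order. Returns the number of attempts
// each operation made before it gave up.
fun simulate(operations: Int, maxAttempts: Int, bucket: ToyRetryBucket): List<Int> {
    val attempts = MutableList(operations) { 1 } // the initial try costs nothing in this model
    val active = BooleanArray(operations) { true }
    while (active.any { it }) {
        for (i in 0 until operations) {
            if (!active[i]) continue
            if (attempts[i] < maxAttempts && bucket.tryAcquireRetry()) {
                attempts[i]++ // retry granted; the call is throttled again
            } else {
                active[i] = false // out of attempts or out of tokens: give up
            }
        }
    }
    return attempts
}

fun main() {
    val attempts = simulate(
        operations = 16,
        maxAttempts = 20,
        bucket = ToyRetryBucket(tokens = 500, retryCost = 5),
    )
    println("attempts per operation: $attempts")
    println("highest attempt reached: ${attempts.maxOrNull()} of 20")
}
```

In this model the bucket pays for 100 retries in total, so 16 throttled operations exhaust it long before any of them reaches attempt 20. Under the same assumptions, a bucket that refills over time and waits rather than failing fast would let each operation continue toward maxAttempts, which is the direction the two workaround settings above point in.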

@lauzadis
Member

lauzadis commented Jun 4, 2024

Hey, we confirmed this default retry behavior is correct. It was implemented as part of a campaign to make retries more consistent across all AWS SDKs. Please use the workaround provided if you need to launch this many coroutines in the future.

Let me know if you have any more questions!

@lauzadis lauzadis closed this as completed Jun 4, 2024

github-actions bot commented Jun 4, 2024

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
