Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Waiters do not have a configurable retry / timeout strategy #1431

Open
1 task
cloudshiftchris opened this issue Oct 4, 2024 · 2 comments · Fixed by smithy-lang/smithy-kotlin#1183
Open
1 task
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue

Comments

@cloudshiftchris
Copy link

cloudshiftchris commented Oct 4, 2024

Describe the bug

When using waiters the underlying operation may take considerable time that exceeds the hard-coded attempts/timeouts. This results in the waiter throwing a TooManyAttemptsException.

For example, RdsClient.waitUntilDBSnapshotAvailable has a hard-coded retry strategy:

val strategy = StandardRetryStrategy {
        maxAttempts = 20
        tokenBucket = InfiniteTokenBucket
        delayProvider {
            initialDelay = 30_000.milliseconds
            scaleFactor = 1.5
            jitter = 1.0
            maxBackoff = 120_000.milliseconds
        }
    }

In a situation where there is a large snapshot that takes greater than the ~17m the retry strategy allows for it will fail; this is unexpected as a) the operation is still in progress (in our case, it completed successfully after ~30m) and b) waitUntilDBSnapshotAvailable isn't documented as having a timeout / doesn't offer any configurability on timeouts.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected behavior

Expecting that waiters either wait indefinitely (which is the implied behaviour, though not ideal) or take a default-value parameter with a retry strategy (or any alternate approach that makes it clear there are timeouts and how to configure them).

The AWS Java SDK allow waiters to be configurable: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/waiters.html

Current behavior

As described above RdsClient.waitUntilDBSnapshotAvailable (and likely other Waiters) times out unexpectedly (well, unexpected if you haven't read the code).

Steps to Reproduce

Execute a Waiter whose operation takes more than the hard-coded retry/timeout limit.

Possible Solution

Ideally the API would be more up-front and configurable for waiters to allow configurability on the timeouts and clearly document that it isn't solely waiting for the primary condition (there's also timeouts in play).

As a workaround have cobbled together the below to allow coarse-grained configurability:

    private suspend fun waitForSnapshot(snapshotId: String) {
        var retries = 0
        while (true) {
            try {
                rdsClient.waitUntilDBSnapshotAvailable(
                    DescribeDbSnapshotsRequest { this.dbSnapshotIdentifier = snapshotId }
                )
                return
            } catch (e: TooManyAttemptsException) {
                if (retries >= 3) {
                    throw e
                }
                retries++
            }
        }
    }

Context

Using the SDK to initiate an RDS snapshot, and wait for it to complete before continuing on with upgrading the database.

AWS SDK for Kotlin version

1.3.42

Platform (JVM/JS/Native)

JVM

Operating system and version

MacOS / Linux

@cloudshiftchris cloudshiftchris added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 4, 2024
@lauzadis
Copy link
Member

lauzadis commented Oct 7, 2024

Thanks for the feature request! We've added an internal backlog item (SDK-KT-369) and will keep you updated on the status here.

@lauzadis lauzadis added feature-request A feature should be added or improved. p2 This is a standard priority issue and removed bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 7, 2024
@lauzadis
Copy link
Member

lauzadis commented Dec 9, 2024

This has been implemented and will be released in the next minor version of the SDK (v1.4), coming soon...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants