Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

service/s3: HeadObject waiter always retries on error, regardless of type #2937

Open
2 of 3 tasks
goro9 opened this issue Dec 22, 2024 · 3 comments
Open
2 of 3 tasks
Assignees
Labels
bug This issue is a bug. p1 This is a high priority issue queued This issues is on the AWS team's backlog

Comments

@goro9
Copy link

goro9 commented Dec 22, 2024

Acknowledgements

Describe the bug

It seems that the ObjectExistsStateRetryable used by default in NewObjectExistsWaiter is being retried for errors other than NotFound.

func objectExistsStateRetryable(ctx context.Context, input *HeadObjectInput, output *HeadObjectOutput, err error) (bool, error) {
if err == nil {
return false, nil
}
if err != nil {
var errorType *types.NotFound
if errors.As(err, &errorType) {
return true, nil
}
}
return true, nil
}

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

For NotFound errors: retry HeadObject
For errors other than NotFound: return err as is

Current Behavior

Always retry when an error occurs, regardless of the error type.

Reproduction Steps

options.Retryable = func(_ context.Context, _ *s3.HeadObjectInput, _ *s3.HeadObjectOutput, err error) (bool, error) {
	if err == nil {
		return false, nil
	}

	var errorType *types.NotFound
	if errors.As(err, &errorType) {
		return true, nil
	}

	return false, err
}

Possible Solution

No response

Additional Information/Context

No response

AWS Go SDK V2 Module Versions Used

	github.com/aws/aws-sdk-go-v2/service/s3 v1.71.1

Compiler and Version used

go version go1.23.4 darwin/arm64

Operating System and version

macOS Sonoma 14.1

@goro9 goro9 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Dec 22, 2024
@adev-code adev-code self-assigned this Dec 23, 2024
@adev-code adev-code added investigating This issue is being investigated and/or work is in progress to resolve the issue. p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Dec 23, 2024
@adev-code
Copy link

Hello @goro9, thanks for reaching out and reporting the issue. I have replicated from my side and yes Go SDK is not retrying as it should. Meanwhile trying with Python SDK, it does make a retry. In this regard, I have brought this issue to the team for further review. I will let you know as soon as I get any updates. If you have any question let me know. Thanks.

@adev-code adev-code added needs-review This issue or pull request needs review from a core team member. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Dec 23, 2024
@Madrigal Madrigal added p1 This is a high priority issue queued This issues is on the AWS team's backlog and removed needs-review This issue or pull request needs review from a core team member. p3 This is a minor priority issue labels Jan 6, 2025
@Madrigal
Copy link
Contributor

Madrigal commented Jan 6, 2025

Based on waiter workflow it seems like we're not doing the right thing. We need to address this and figure out if it's safe to roll out.

Need to also do some investigation around waiters on other services

@lucix-aws lucix-aws changed the title service/s3: Always retry when an error occurs, regardless of the error type service/s3: HeadObject waiter always retries on error, regardless of type Jan 8, 2025
@lucix-aws lucix-aws assigned lucix-aws and unassigned adev-code Jan 8, 2025
@lucix-aws
Copy link
Contributor

Waiter workflow step 4:

If none of the acceptors are matched and an error was encountered while calling the operation, then transition to the failure state and stop waiting.

Our implementation, for reasons probably lost to time, is just not doing this.

We can change it, trivially, but that comes with risk. Basically, any scenario where a waiter is called, some transient non-matching errors come back, and then a match does eventually come back would be broken. They would now fail on the unmatched error, as the spec says they should, instead of ignoring it. Whether or not anyone in the wild depends on that behavior, it's impossible to say.

The other option would be to let users enable the "on-spec" behavior through a new flag. Not at all a fan of having what would essentially be doThisCorrectly bool but it's the only way to avoid the aforementioned risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p1 This is a high priority issue queued This issues is on the AWS team's backlog
Projects
None yet
Development

No branches or pull requests

4 participants