Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ILambdaMessaging: Reset message visibility timeout when handler returns failed status #150

Open
1 of 2 tasks
brendonparker opened this issue Jul 24, 2024 · 7 comments
Open
1 of 2 tasks
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue

Comments

@brendonparker
Copy link
Contributor

Describe the feature

Not sure if this a feature or a bug.

It appears there is some handling for resetting the visibility timeout when using the SQS Poller. However, the same feature doesn't exist when using the ILambdaMessaging.ProcessLambdaEventWithBatchResponseAsync.

Use Case

When the handler returns a Failed status, the message visibility should be reset, so that it can be immediately re-attempted and doesn't block that message group from processing the next message.

Proposed Solution

No response

Other Information

Using FIFO queue

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

AWS.Messaging (or related) package versions

AWS.Messaging.Lambda 0.9.1

Targeted .NET Platform

.NET 8

Operating System and version

AmazonLinux/Lambda

@brendonparker brendonparker added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jul 24, 2024
@brendonparker
Copy link
Contributor Author

Actually, it looks like this behavior is already called out as a potential TODO here:

awslabs/aws-dotnet-messaging/docs/docs/design/message-visibility-timeout-handling.md

Having some configurability around this would be nice.
I can probably work around it in the short term, by manually changing the visibility back to zero.

@brendonparker
Copy link
Contributor Author

brendonparker commented Jul 24, 2024

This is my current work-around. Rolling my own handling of this, Would be nice to be handled by "the framework".

public async Task<SQSBatchResponse> HandleAsync(SQSEvent evnt, ILambdaContext context)
{
    using var scope = Host.Services.CreateScope();
    var lambdaMessaging = scope.ServiceProvider.GetRequiredService<ILambdaMessaging>();
    var batchResponse = await lambdaMessaging.ProcessLambdaEventWithBatchResponseAsync(evnt, context);
    await ChangeMessageVisibilityForFailures(evnt, scope, batchResponse);
    return batchResponse;
}

private static async Task ChangeMessageVisibilityForFailures(SQSEvent evnt, IServiceScope scope, SQSBatchResponse batchResponse)
{
    var failureCount = batchResponse.BatchItemFailures.Count;
    if (failureCount == 0) return;

    using var sqs = scope.Resolve<IAmazonSQS>();
    var lookup = evnt.Records.ToDictionary(x => x.MessageId);

    if (failureCount == 1)
    {
        await sqs.ChangeMessageVisibilityAsync(new()
        {
            QueueUrl = AwsMessagingExtensions.MainQueueUrl,
            ReceiptHandle = lookup[batchResponse.BatchItemFailures[0].ItemIdentifier].ReceiptHandle,
            VisibilityTimeout = 0
        });
        return;
    }

    await sqs.ChangeMessageVisibilityBatchAsync(new()
    {
        QueueUrl = AwsMessagingExtensions.MainQueueUrl,
        Entries = batchResponse.BatchItemFailures
            .Select(x => new ChangeMessageVisibilityBatchRequestEntry
            {
                Id = x.ItemIdentifier,
                ReceiptHandle = lookup[x.ItemIdentifier].ReceiptHandle,
                VisibilityTimeout = 0
            }).
            ToList()
    });
}

@ashishdhingra ashishdhingra added needs-review p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jul 24, 2024
@ashishdhingra
Copy link
Contributor

Needs review with the team.

@normj
Copy link
Contributor

normj commented Jul 24, 2024

I wouldn't by default reset visibility to 0 if the handler failed. The failure that would be successful in a retry is often due to a transit dependency issue that might need sometime to recover. But I can see use cases where you do want to retry right away especially in a FIFO environment so you aren't blocking other messages in the group like you said.

@brendonparker
Copy link
Contributor Author

Fair point that a default of 0 may not be best. But some OOTB configurability around what that number is would fit the bill.

May not be best practice, but I've typically setup my default visibility timeout to mirror my lambda timeout (15 minutes), with a MaxReceiveCount to 1 or 2 and reset visibility to zero to retry effectively immediately. Otherwise with the 15 minute timeout you get some pretty big backups in the queue.

@brendonparker
Copy link
Contributor Author

Happy to take a stab at a PR if you're open to it.
But don't want to spend time on it if you'd like to leave it out.

@normj
Copy link
Contributor

normj commented Jul 25, 2024

I'm happy to take a PR for an opt in feature to turn this on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

3 participants