ILambdaMessaging: Reset message visibility timeout when handler returns failed status #150

brendonparker · 2024-07-24T01:25:29Z

Describe the feature

Not sure if this a feature or a bug.

It appears there is some handling for resetting the visibility timeout when using the SQS Poller. However, the same feature doesn't exist when using the ILambdaMessaging.ProcessLambdaEventWithBatchResponseAsync.

Use Case

When the handler returns a Failed status, the message visibility should be reset, so that it can be immediately re-attempted and doesn't block that message group from processing the next message.

Proposed Solution

No response

Other Information

Using FIFO queue

Acknowledgements

I may be able to implement this feature request
This feature might incur a breaking change

AWS.Messaging (or related) package versions

AWS.Messaging.Lambda 0.9.1

Targeted .NET Platform

.NET 8

Operating System and version

AmazonLinux/Lambda

The text was updated successfully, but these errors were encountered:

brendonparker · 2024-07-24T01:34:59Z

Actually, it looks like this behavior is already called out as a potential TODO here:

awslabs/aws-dotnet-messaging/docs/docs/design/message-visibility-timeout-handling.md

Having some configurability around this would be nice.
I can probably work around it in the short term, by manually changing the visibility back to zero.

brendonparker · 2024-07-24T15:48:58Z

This is my current work-around. Rolling my own handling of this, Would be nice to be handled by "the framework".

public async Task<SQSBatchResponse> HandleAsync(SQSEvent evnt, ILambdaContext context)
{
    using var scope = Host.Services.CreateScope();
    var lambdaMessaging = scope.ServiceProvider.GetRequiredService<ILambdaMessaging>();
    var batchResponse = await lambdaMessaging.ProcessLambdaEventWithBatchResponseAsync(evnt, context);
    await ChangeMessageVisibilityForFailures(evnt, scope, batchResponse);
    return batchResponse;
}

private static async Task ChangeMessageVisibilityForFailures(SQSEvent evnt, IServiceScope scope, SQSBatchResponse batchResponse)
{
    var failureCount = batchResponse.BatchItemFailures.Count;
    if (failureCount == 0) return;

    using var sqs = scope.Resolve<IAmazonSQS>();
    var lookup = evnt.Records.ToDictionary(x => x.MessageId);

    if (failureCount == 1)
    {
        await sqs.ChangeMessageVisibilityAsync(new()
        {
            QueueUrl = AwsMessagingExtensions.MainQueueUrl,
            ReceiptHandle = lookup[batchResponse.BatchItemFailures[0].ItemIdentifier].ReceiptHandle,
            VisibilityTimeout = 0
        });
        return;
    }

    await sqs.ChangeMessageVisibilityBatchAsync(new()
    {
        QueueUrl = AwsMessagingExtensions.MainQueueUrl,
        Entries = batchResponse.BatchItemFailures
            .Select(x => new ChangeMessageVisibilityBatchRequestEntry
            {
                Id = x.ItemIdentifier,
                ReceiptHandle = lookup[x.ItemIdentifier].ReceiptHandle,
                VisibilityTimeout = 0
            }).
            ToList()
    });
}

ashishdhingra · 2024-07-24T16:42:40Z

Needs review with the team.

normj · 2024-07-24T23:02:44Z

I wouldn't by default reset visibility to 0 if the handler failed. The failure that would be successful in a retry is often due to a transit dependency issue that might need sometime to recover. But I can see use cases where you do want to retry right away especially in a FIFO environment so you aren't blocking other messages in the group like you said.

brendonparker · 2024-07-24T23:25:48Z

Fair point that a default of 0 may not be best. But some OOTB configurability around what that number is would fit the bill.

May not be best practice, but I've typically setup my default visibility timeout to mirror my lambda timeout (15 minutes), with a MaxReceiveCount to 1 or 2 and reset visibility to zero to retry effectively immediately. Otherwise with the 15 minute timeout you get some pretty big backups in the queue.

brendonparker · 2024-07-24T23:26:54Z

Happy to take a stab at a PR if you're open to it.
But don't want to spend time on it if you'd like to leave it out.

normj · 2024-07-25T02:03:16Z

I'm happy to take a PR for an opt in feature to turn this on.

brendonparker added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jul 24, 2024

ashishdhingra added needs-review p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jul 24, 2024

brendonparker mentioned this issue Jul 25, 2024

Add configurable visibility timeout default when using partial batch failures #151

Merged

ashishdhingra removed the needs-review label Jul 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ILambdaMessaging: Reset message visibility timeout when handler returns failed status #150

ILambdaMessaging: Reset message visibility timeout when handler returns failed status #150

brendonparker commented Jul 24, 2024

brendonparker commented Jul 24, 2024

brendonparker commented Jul 24, 2024 •

edited

Loading

ashishdhingra commented Jul 24, 2024

normj commented Jul 24, 2024

brendonparker commented Jul 24, 2024

brendonparker commented Jul 24, 2024

normj commented Jul 25, 2024

ILambdaMessaging: Reset message visibility timeout when handler returns failed status #150

ILambdaMessaging: Reset message visibility timeout when handler returns failed status #150

Comments

brendonparker commented Jul 24, 2024

Describe the feature

Use Case

Proposed Solution

Other Information

Acknowledgements

AWS.Messaging (or related) package versions

Targeted .NET Platform

Operating System and version

brendonparker commented Jul 24, 2024

brendonparker commented Jul 24, 2024 • edited Loading

ashishdhingra commented Jul 24, 2024

normj commented Jul 24, 2024

brendonparker commented Jul 24, 2024

brendonparker commented Jul 24, 2024

normj commented Jul 25, 2024

brendonparker commented Jul 24, 2024 •

edited

Loading