Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETL-656] Limit input-to-intermediate Lambda batch size to 1 #118

Merged
merged 1 commit into from
Jun 6, 2024

Conversation

philerooski
Copy link
Contributor

See ticket for rationale.

@philerooski philerooski requested a review from a team as a code owner June 4, 2024 23:28
Copy link

sonarqubecloud bot commented Jun 4, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Copy link
Contributor

@rxu17 rxu17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned the bottleneck in our integration testing pipeline: If too many S3 events (new exports) are consumed by the S3 to JSON job, it will timeout. Are you saying that 10 events in a bundle will be too much for the S3 to JSON job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, depending on which exports it's trying to process. The timeout is 2 hours, so more than 6 adult exports would be likely to cause a timeout.

I could have simply increased the timeout, but limiting the batch size to 1 will allow us to process exports in parallel and complete our tests faster. Most of the cost-efficiency gains from a larger batch size come from the reduced setup time of worker instances -- which is a pretty negligible component of the total processing time if it takes us 20 minutes to process one export.

Description: >-
The maximum number of S3 messages in a SQS event that Lambda will process in a batch

LambdaMaximumBatchingWindowInSeconds:
Type: Number
Default: 300
Default: 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it seems like we really won't need this anymore since our S3 to JSON job has a maximum concurrency of ~150 to begin with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it doesn't make much sense to set it at all. But it doesn't hurt to keep it explicitly defined in the template, either.

@philerooski philerooski merged commit 8d027ed into main Jun 6, 2024
15 checks passed
@thomasyu888 thomasyu888 deleted the etl-656 branch October 30, 2024 01:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants