Implement S3's diverse-IP performance recommendation internally #2331

Open
josh-newman opened this issue Jan 13, 2021 · 7 comments
Labels
feature/s3/manager — Pertains to S3 transfer manager HLL (feature/s3/manager). feature-request — A feature should be added or improved. p2 — This is a standard priority issue. queued — This issue is on the AWS team's backlog.

Comments

@josh-newman

Is your feature request related to a problem? Please describe.

My team runs batch data processing jobs using dozens of machines in EC2. The machines tend to boot up at the same time, and then each reads 10,000s of files from S3. Sometimes, this loading process is significantly slowed by S3 throttling (503 SlowDown errors, connection resets, etc.), likely depending on S3's internal scaling for the (many) prefixes involved (we observed this both before and after 2018-07-17), and maybe even on the number of concurrent jobs.

S3 performance recommendations say:

Finally, it’s worth paying attention to DNS and double-checking that requests are being spread over a wide pool of Amazon S3 IP addresses. DNS queries for Amazon S3 cycle through a large list of IP endpoints. But caching resolvers or application code that reuses a single IP address do not benefit from address diversity and the load balancing that follows from it.

I observed that the AWS-provided DNS resolver in our VPC seemed to internally cache results for S3 hostnames (bucket.us-west-2.s3.amazonaws.com) for around 4 secs each. Since our machines initiate 10,000s of S3 object reads shortly after booting (and also periodically throughout the job—it works in phases), this apparently led to them connecting to relatively few S3 peers (demonstration program). I think this led to throttling even when our request rates were below S3's theoretical limits.
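To illustrate the effect described above, here's a minimal Go sketch (not the demonstration program linked in the issue; the names `lookupFunc` and `sampleDistinctIPs` are hypothetical): it performs many lookups of a hostname and counts the distinct IPs seen. Behind a caching resolver, repeated lookups within one cache interval return the same answer, so a burst of connections lands on very few peers.

```go
package main

import "fmt"

// lookupFunc abstracts DNS resolution so the sampling logic can be
// exercised without network access. In real use this could wrap
// net.DefaultResolver.LookupHost.
type lookupFunc func(host string) ([]string, error)

// sampleDistinctIPs performs n lookups of host and returns the set of
// distinct IPs observed across all of them.
func sampleDistinctIPs(lookup lookupFunc, host string, n int) (map[string]bool, error) {
	seen := make(map[string]bool)
	for i := 0; i < n; i++ {
		ips, err := lookup(host)
		if err != nil {
			return nil, err
		}
		for _, ip := range ips {
			seen[ip] = true
		}
	}
	return seen, nil
}

func main() {
	// Simulate a caching resolver: every lookup within the cache
	// interval returns the same single IP.
	cached := func(string) ([]string, error) { return []string{"52.0.0.1"}, nil }
	seen, _ := sampleDistinctIPs(cached, "bucket.s3.us-west-2.amazonaws.com", 100)
	fmt.Println(len(seen)) // prints 1: one peer despite 100 lookups
}
```

A machine that opens hundreds of connections inside a single ~4-second cache window would see exactly this: one (or very few) S3 peers handling all of its traffic.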

Describe the solution you'd like

It'd be great if the SDK handled this internally, transparently (for example, diversifying connection pools).

Describe alternatives you've considered

We're trying out a workaround: a custom "net/http".RoundTripper implementation that rewrites requests to spread load over all known S3 peers. Over time (over many VPC DNS cache intervals) we resolve more S3 IPs, spreading load over many peers and avoiding throttling (in our experience so far). However, this implementation is relatively inelegant and inconvenient, and there are probably better ways to handle this.
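The shape of that workaround might look something like the sketch below (illustrative only; `diverseTransport` and `pickIP` are made-up names, not our actual implementation): a `RoundTripper` that rotates requests across a pool of previously resolved S3 peer IPs while preserving the original bucket hostname for the Host header.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// diverseTransport spreads requests across known S3 peer IPs in
// round-robin order, so connections (and load) aren't pinned to
// whatever single IP the resolver cached.
type diverseTransport struct {
	base http.RoundTripper
	mu   sync.Mutex
	ips  []string // accumulated peer IPs, refreshed by periodic DNS lookups
	next int
}

// pickIP returns the next peer IP in round-robin order.
func (t *diverseTransport) pickIP() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	ip := t.ips[t.next%len(t.ips)]
	t.next++
	return ip
}

func (t *diverseTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Clone so the caller's request is untouched, then dial a specific
	// peer while keeping the original hostname in the Host header. For
	// HTTPS, TLS ServerName would also need to stay the bucket hostname
	// so certificate verification still succeeds.
	clone := req.Clone(req.Context())
	if clone.Host == "" {
		clone.Host = req.URL.Host // keep virtual-hosted-style bucket name
	}
	clone.URL.Host = t.pickIP()
	return t.base.RoundTrip(clone)
}

func main() {
	t := &diverseTransport{ips: []string{"52.0.0.1", "52.0.0.2"}}
	fmt.Println(t.pickIP(), t.pickIP(), t.pickIP())
	// prints: 52.0.0.1 52.0.0.2 52.0.0.1
}
```

A real version also has to handle TLS verification against the rewritten address, refresh and expire the IP pool, and evict peers that stop responding, which is a large part of why this is inconvenient to own in application code.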

In other issues I've seen recommendations to use s3manager to retry throttling errors. Unfortunately, I don't think we can use that in our application: we're interested in streaming (read, compute, discard), and buffering in memory or on local disk might increase costs. Also, s3manager seems to use the same HTTP client as the regular interface, so I'd expect it to succeed slowly, whereas connecting to more peers could succeed quickly.

Additional context

I noticed that issues aws/aws-sdk-go#1763, aws/aws-sdk-go#3707, aws/aws-sdk-go#1242 mention throttling so there's a chance those users could benefit from this, too.

CC @jcharum @yasushi-saito

@skotambkar
Contributor

We do not have plans to implement this in the V1 SDK, but we may implement it in the aws/aws-sdk-go-v2 SDK. Moving this feature request to the V2 SDK for tracking.

@skotambkar skotambkar transferred this issue from aws/aws-sdk-go May 10, 2021
@vdm

vdm commented Sep 30, 2022

CPP SDK 1.9 does this. https://github.com/aws/aws-sdk-cpp/wiki/Improving-S3-Throughput-with-AWS-SDK-for-CPP-v1.9#working-with-the-new-s3-crt-client

@zephyap

zephyap commented Jun 29, 2023

Hey, does anyone know if this made it into the SDK in the end?

@aajtodd
Contributor

aajtodd commented Jun 29, 2023

It has not; we currently have no plans to implement this. Feel free to upvote the issue, though (even better, comment with your own use case/workload and why this feature would help).

@lucix-aws lucix-aws transferred this issue from aws/smithy-go Oct 23, 2023
@lucix-aws lucix-aws added the feature-request and p3 labels Oct 30, 2023
@RanVaknin
Contributor

RanVaknin commented Feb 22, 2024

Hi everyone on the thread,

We have decided not to move forward with implementing this. We are not in a position to own this behavior.

I'm going to close this.

Thanks,
Ran

@RanVaknin RanVaknin closed this as not planned Feb 22, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@lucix-aws lucix-aws reopened this Jul 8, 2024
@lucix-aws
Contributor

Reopening and attaching this to the feature/s3/manager backlog, which is where we'd like it to live. The CRT-based S3 transfer manager client already does something similar, if not identical.

@lucix-aws lucix-aws added the p2, queued, and feature/s3/manager labels and removed the p3 label Jul 8, 2024
7 participants