s3.client.upload needs restart option #3907

Closed
jschwar313 opened this issue Oct 20, 2023 · 8 comments
Labels: feature-request, s3

Comments

@jschwar313

Describe the feature

s3.client.upload needs a restart option after an EndpointConnectionError. Or is this already implemented somewhere I don't know about?

Use Case

I'm always frustrated when my internet goes out and the upload has to start over, which can take a while. I have Comcast.

Proposed Solution

I suggest adding a restart parameter to s3.client.upload that would resume the upload where it left off.

I'm using aws-cli/2.9.3 Python/3.9.11 Windows/10 exe/AMD64 prompt/off

I'm coding in Python 3.12.0 and using:

boto3 1.26.160
botocore 1.29.160

Other Information

No response

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

SDK version used

2.9.3

Environment details (OS name and version, etc.)

Windows 10 version 22H2 build 19045.3570

@jschwar313 jschwar313 added feature-request This issue requests a feature. needs-triage This issue or PR still needs to be triaged. labels Oct 20, 2023
@tim-finnigan tim-finnigan self-assigned this Oct 26, 2023
@tim-finnigan
Contributor

Hi @jschwar313, thanks for reaching out. Have you tried using multipart uploads for your use case (for example, create_multipart_upload and complete_multipart_upload)? You can find more information in the S3 User Guide. That documentation notes that one of the features of multipart uploads is:

Pause and resume object uploads – You can upload object parts over time. After you initiate a multipart upload, there is no expiry; you must explicitly complete or stop the multipart upload.

Here's an example:

import boto3
from botocore.exceptions import EndpointConnectionError

s3 = boto3.client('s3')

bucket_name = 'your-bucket-name'
key = 'your-key'
file_path = 'your-file'
# 5 MiB is the minimum size S3 accepts for every part except the last
part_size = 5 * 1024 * 1024

# Initiate multipart upload
response = s3.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = response['UploadId']

# Keep track of parts
parts = []
part_number = 1
interrupted = False

try:
    with open(file_path, 'rb') as f:
        while True:
            # Read the next part from the file
            data = f.read(part_size)
            if not data:
                break
            # Try to upload part
            try:
                part = s3.upload_part(Body=data, Bucket=bucket_name, Key=key, PartNumber=part_number, UploadId=upload_id)
                parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
                part_number += 1
            except EndpointConnectionError as e:
                # Connection lost: stop here but leave the upload open so it can be resumed
                print("EndpointConnectionError:", str(e))
                interrupted = True
                break
except Exception as e:
    # Any other failure: abort so the uploaded parts do not keep accruing storage
    print("An error occurred:", str(e))
    s3.abort_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id)
else:
    if interrupted:
        # Do not complete with missing parts; keep the UploadId to resume later
        print(f"Upload paused after {len(parts)} part(s). UploadId: {upload_id}")
    else:
        # Complete multipart upload
        s3.complete_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id, MultipartUpload={'Parts': parts})
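
If a multipart upload is left open after a connection error, one way to pick it back up is to ask S3 which parts it already holds (list_parts) and upload only the missing ones. The following is a minimal sketch, not a definitive implementation. The helper names list_uploaded_parts and resume_upload are hypothetical, and it assumes the local file is unchanged, the same part size is used as in the first attempt, and the UploadId from the original run was saved somewhere.

```python
def list_uploaded_parts(s3, bucket, key, upload_id):
    """Return {part_number: etag} for parts S3 already holds (hypothetical helper)."""
    done = {}
    kwargs = {"Bucket": bucket, "Key": key, "UploadId": upload_id}
    while True:
        resp = s3.list_parts(**kwargs)
        # "Parts" is absent from the response when nothing has been uploaded yet
        for part in resp.get("Parts", []):
            done[part["PartNumber"]] = part["ETag"]
        if not resp.get("IsTruncated"):
            return done
        kwargs["PartNumberMarker"] = resp["NextPartNumberMarker"]


def resume_upload(s3, bucket, key, upload_id, file_path, part_size=5 * 1024 * 1024):
    """Upload only the parts that are missing, then complete (hypothetical helper).

    Assumes part_size matches the size used when the upload was started.
    """
    done = list_uploaded_parts(s3, bucket, key, upload_id)
    part_number = 1
    with open(file_path, "rb") as f:
        while True:
            data = f.read(part_size)
            if not data:
                break
            if part_number not in done:
                resp = s3.upload_part(
                    Body=data, Bucket=bucket, Key=key,
                    PartNumber=part_number, UploadId=upload_id,
                )
                done[part_number] = resp["ETag"]
            part_number += 1
    # complete_multipart_upload expects the parts in ascending PartNumber order
    parts = [{"PartNumber": n, "ETag": done[n]} for n in sorted(done)]
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
    return parts
```

With a real client this would be called as resume_upload(boto3.client('s3'), bucket_name, key, upload_id, file_path). If the connection drops again, the same call can simply be repeated.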

I hope that helps. Please let us know if you have any follow-up questions.

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. s3 and removed needs-triage This issue or PR still needs to be triaged. labels Oct 26, 2023
@jschwar313
Author

jschwar313 commented Oct 26, 2023 via email

@tim-finnigan tim-finnigan added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed response-requested Waiting on additional information or feedback. labels Oct 26, 2023
@jschwar313
Author

Is it possible to make this multi-threaded by using TransferConfig?

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 29, 2023
@jschwar313
Author

jschwar313 commented Oct 29, 2023

That's a good start. I see a few things I want to add to it. Thanks so much. I assume there is no way to multithread this? I'm making a lot of changes to that code, but it is still a good start.

@jschwar313
Author

jschwar313 commented Nov 2, 2023

I do have a question. I'm working on my solution and have figured out quite a few things I need to add to your starting code. If I kill an upload_part session midstream while I'm testing (I will code for abends to do this), will a lifecycle policy like this clean up the incomplete multipart uploads, including parts that were uploaded but will never be assembled? I assume this applies no matter which method I use to do the uploads, right? Here's a description of the lifecycle policy I have implemented on these buckets: https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/

@tim-finnigan
Contributor

Hi @jschwar313, thanks for following up. Yes, as described in the blog post you linked, you need to explicitly configure a lifecycle policy to clean up incomplete multipart uploads. (Alternatively, you could manually list and abort incomplete multipart uploads using Boto3.)
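
Both cleanup routes could look roughly like this in Boto3. This is a sketch under assumptions: the helper names abort_rule and abort_stale_uploads, the rule ID, and the 7-day cutoff are all illustrative, not taken from the blog post.

```python
from datetime import datetime, timedelta, timezone


def abort_rule(days=7):
    """Build a lifecycle rule that aborts uploads left incomplete for `days` days."""
    return {
        "ID": "abort-incomplete-multipart-uploads",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: the rule covers the whole bucket
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
    }


def abort_stale_uploads(s3, bucket, older_than_days=7):
    """Manually abort multipart uploads initiated more than `older_than_days` ago.

    Returns the list of aborted UploadIds (hypothetical helper).
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
    aborted = []
    kwargs = {"Bucket": bucket}
    while True:
        resp = s3.list_multipart_uploads(**kwargs)
        # "Uploads" is absent from the response when there is nothing in progress
        for upload in resp.get("Uploads", []):
            if upload["Initiated"] < cutoff:
                s3.abort_multipart_upload(
                    Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
                )
                aborted.append(upload["UploadId"])
        if not resp.get("IsTruncated"):
            return aborted
        kwargs["KeyMarker"] = resp["NextKeyMarker"]
        kwargs["UploadIdMarker"] = resp["NextUploadIdMarker"]
```

The rule would be applied with s3.put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration={'Rules': [abort_rule(7)]}). Note that this call replaces the bucket's existing lifecycle configuration, so any rules already in place need to be included in the list. Aborting an upload also removes its already-uploaded parts, so keep the cutoff longer than any upload you still intend to resume.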

Regarding your earlier question - per the Boto3 documentation on file transfer configuration, you can use concurrency/threading when doing multipart uploads. You can also enable S3 Transfer Acceleration.

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Nov 2, 2023
@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Nov 3, 2023
@jschwar313
Author

jschwar313 commented Nov 5, 2023

I have it working the way I want it to work. I've applied quite a few things I've seen in other posts about these routines, and that helped a lot. Thanks for the information. You can close this. I appreciate the help.


github-actions bot commented Nov 6, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
