s3.client.upload needs restart option #3907
Hi @jschwar313 thanks for reaching out. Have you tried using multipart uploads for your use case? (For example: [create_multipart_upload](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/create_multipart_upload.html) and [complete_multipart_upload](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/complete_multipart_upload.html).) You can find more information in the [S3 User Guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html). That documentation notes that one of the features of multipart uploads is:

> Pause and resume object uploads – You can upload object parts over time. After you initiate a multipart upload, there is no expiry; you must explicitly complete or stop the multipart upload.

Here's an example:

```python
import boto3
from botocore.exceptions import EndpointConnectionError

s3 = boto3.client('s3')

bucket_name = 'your-bucket-name'
key = 'your-key'
file_path = 'your-file'
part_size = 5 * 1024 * 1024  # 5 MB, the minimum size for every part except the last

# Initiate the multipart upload
response = s3.create_multipart_upload(Bucket=bucket_name, Key=key)
upload_id = response['UploadId']

# Keep track of the parts uploaded so far
parts = []
part_number = 1
interrupted = False

try:
    with open(file_path, 'rb') as f:
        while True:
            # Read one part from the file
            data = f.read(part_size)
            if not data:
                break
            # Try to upload the part
            try:
                part = s3.upload_part(
                    Body=data,
                    Bucket=bucket_name,
                    Key=key,
                    PartNumber=part_number,
                    UploadId=upload_id,
                )
                parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
                part_number += 1
            except EndpointConnectionError as e:
                # The connection dropped. The upload can be resumed later with
                # the same upload_id, so neither complete nor abort it here.
                print("EndpointConnectionError:", str(e))
                interrupted = True
                break
except Exception as e:
    print("An error occurred:", str(e))
    s3.abort_multipart_upload(Bucket=bucket_name, Key=key, UploadId=upload_id)
else:
    if not interrupted:
        # All parts uploaded: assemble them into the final object
        s3.complete_multipart_upload(
            Bucket=bucket_name,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts},
        )
```

I hope that helps, please let us know if you have any follow-up questions. |
Cool. Thanks. That helps a lot. |
Is it possible to make this multi-threaded by using TransferConfig? |
That's a good start. I see a few things I want to add to it. Thanks so much. I assume there is no way to multithread this? There are a lot of changes I'm making to that code, but it still is a good start. |
I do have a question. I'm working on my solution and I figured out quite a few things I need to add to your starting code. One question I have is, if I kill an upload_part session midstream while I'm testing (I will code for abends to do this), will a lifecycle policy like this clean up these incomplete multipart uploads or complete parts that have been uploaded but will never be assembled? I assume this will happen no matter which method I use to do the uploads, right? Here's a description of the lifecycle policy I have implemented on these buckets: https://aws.amazon.com/blogs/aws-cloud-financial-management/discovering-and-deleting-incomplete-multipart-uploads-to-lower-amazon-s3-costs/ |
Hi @jschwar313 thanks for following up. Yes, as described in the blog post you linked, you do need to explicitly configure a lifecycle policy to clean up incomplete multipart uploads. (Or you could manually list and abort incomplete multipart uploads using Boto3.) Regarding your earlier question: per the Boto3 documentation on file transfer configuration, you can use concurrency/threading when doing multipart uploads. You can also enable S3 Transfer Acceleration. |
I have it working the way I want it to work. I've done quite a few things that I've seen in other posts about these routines and that helped a lot. Thanks for the information. you can close this. I appreciate the help. |
Describe the feature
The s3.client.upload needs a restart option after an EndpointConnectionError. Or is this already implemented somewhere I don't know about?
Use Case
I'm always frustrated when my internet goes out and the upload has to start over. This can take a while. I have Comcast.
Proposed Solution
Suggest a restart parameter for s3.client.upload that would resume uploading where the upload left off.
I'm using aws-cli/2.9.3 Python/3.9.11 Windows/10 exe/AMD64 prompt/off
I'm coding in Python 3.12.0 with
boto3 1.26.160
botocore 1.29.160
Other Information
No response
Acknowledgements
SDK version used
2.9.3
Environment details (OS name and version, etc.)
Windows 10 version 22H2 build 19045.3570