Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upload_fileobj hangs indefinitely without returning error #4148

Closed
fitnycdigitalinitiatives opened this issue May 29, 2024 · 13 comments
Closed
Assignees
Labels
bug This issue is a confirmed bug. closing-soon This issue will automatically close in 4 days unless further comments are made. p2 This is a standard priority issue s3 third-party

Comments

@fitnycdigitalinitiatives

Describe the bug

Hello,

We've recently run into a problem that seems a bit similar to #3657. Occasionally our application will stall out uploading objects to our bucket seemingly randomly using upload_fileobj without returning an error. We've run this application for years without experience this issue, but recent changes have included upgrading boto3 to 1.33.2 and RHEL to 9 (not sure if relevant to the issue). As suggested in the related issue, we turned on the debug logs and here's an example when the upload just completely hangs for hours on end:

DEBUG     2024-05-28 19:33:11  s3transfer.utils:utils:acquire:632:  Acquiring 0
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:205:  UploadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f94df942b20>}) about to wait for the following futures []
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:216:  UploadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f94df942b20>}) done waiting for dependent futures
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_execute_main:160:  Executing task UploadSubmissionTask(transfer_id=0, {'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f94df942b20>}) with kwargs {'client': <botocore.client.S3 object at 0x7f94e0189490>, 'config': <boto3.s3.transfer.TransferConfig object at 0x7f94df860a60>, 'osutil': <s3transfer.utils.OSUtils object at 0x7f94e02bcd90>, 'request_executor': <s3transfer.futures.BoundedExecutor object at 0x7f94e02bc3d0>, 'transfer_future': <s3transfer.futures.TransferFuture object at 0x7f94df942b20>}
DEBUG     2024-05-28 19:33:11  s3transfer.futures:futures:submit:318:  Submitting task CreateMultipartUploadTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'extra_args': {}}) to executor <s3transfer.futures.BoundedExecutor object at 0x7f94e02bc3d0> for transfer request: 0.
DEBUG     2024-05-28 19:33:11  s3transfer.utils:utils:acquire:632:  Acquiring 0
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:205:  CreateMultipartUploadTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'extra_args': {}}) about to wait for the following futures []
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:216:  CreateMultipartUploadTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'extra_args': {}}) done waiting for dependent futures
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_execute_main:160:  Executing task CreateMultipartUploadTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'extra_args': {}}) with kwargs {'client': <botocore.client.S3 object at 0x7f94e0189490>, 'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'extra_args': {}}
DEBUG     2024-05-28 19:33:11  s3transfer.futures:futures:submit:318:  Submitting task UploadPartTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'part_number': 1, 'extra_args': {}}) to executor <s3transfer.futures.BoundedExecutor object at 0x7f94e02bc3d0> for transfer request: 0.
DEBUG     2024-05-28 19:33:11  s3transfer.utils:utils:acquire:632:  Acquiring 0
DEBUG     2024-05-28 19:33:11  urllib3.connectionpool:connectionpool:_make_request:474:  https://s3.us-east-1.amazonaws.com:443 "POST /BUCKET/packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z?uploads HTTP/1.1" 200 None
DEBUG     2024-05-28 19:33:11  s3transfer.utils:utils:release:645:  Releasing acquire 0/None
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:205:  UploadPartTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'part_number': 1, 'extra_args': {}}) about to wait for the following futures [<s3transfer.futures.ExecutorFuture object at 0x7f94e02d68b0>]
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:210:  UploadPartTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'part_number': 1, 'extra_args': {}}) about to wait for <s3transfer.futures.ExecutorFuture object at 0x7f94e02d68b0>
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_wait_until_all_complete:216:  UploadPartTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'part_number': 1, 'extra_args': {}}) done waiting for dependent futures
DEBUG     2024-05-28 19:33:11  s3transfer.tasks:tasks:_execute_main:160:  Executing task UploadPartTask(transfer_id=0, {'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'part_number': 1, 'extra_args': {}}) with kwargs {'client': <botocore.client.S3 object at 0x7f94e0189490>, 'fileobj': <s3transfer.utils.ReadFileChunk object at 0x7f94df85bb50>, 'bucket': 'BUCKET', 'key': 'packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z', 'part_number': 1, 'extra_args': {}, 'upload_id': '2SeyzpFlJu4q6sWC9ma6r_.qparAJoyiPsNNcFW_Y_R3zbVs_G9bdPIVD_fLSy8gBhGYrawLwYUHZLunxLkDABbJL4DFTfhBY_Fj1RzR7prL6IHtDPO1UvpvBefVagKq'}
DEBUG     2024-05-28 19:33:11  urllib3.connectionpool:connectionpool:_make_request:474:  https://s3.us-east-1.amazonaws.com:443 "PUT /BUCKET/packages/d388/83aa/aa70/4879/944a/20b3/a601/3537/lb_ds_chf_000227-d38883aa-aa70-4879-944a-20b3a6013537.7z?uploadId=2SeyzpFlJu4q6sWC9ma6r_.qparAJoyiPsNNcFW_Y_R3zbVs_G9bdPIVD_fLSy8gBhGYrawLwYUHZLunxLkDABbJL4DFTfhBY_Fj1RzR7prL6IHtDPO1UvpvBefVagKq&partNumber=1 HTTP/1.1" 200 0

As you can see it just hangs after the initial PUT. In successful uploads after the first PUT, there's DEBUG 2024-05-28 19:33:05 s3transfer.utils:utils:release:645: Releasing acquire 0/None

There's nothing different about the files that it's failing on vs. the files it uploads without issue. The vast majority of uploads are successful. Just these occasional uploads stall without error.

Expected Behavior

Uploads should fail with an error.

Current Behavior

See above.

Reproduction Steps

Not able to reproduce, happens randomly when using upload_fileobj. The application uses this method:

boto_args = {
    "service_name": "s3",
    "region_name": "REGION",
    "config": config,
}
resource = boto3.resource(**boto_args)
bucket = resource.Bucket("BUCKET")
with open('filename', 'rb') as data:
    bucket.upload_fileobj(data, 'mykey')

Possible Solution

No response

Additional Information/Context

No response

SDK version used

boto3 1.33.2

Environment details (OS name and version, etc.)

rhel 9

@fitnycdigitalinitiatives fitnycdigitalinitiatives added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels May 29, 2024
@tim-finnigan tim-finnigan self-assigned this Jun 4, 2024
@tim-finnigan tim-finnigan added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jun 4, 2024
@tim-finnigan
Copy link
Contributor

Thanks for reaching out. In the issue you linked someone mentioned that changing the base Docker image fixed this. Have you tried reproducing the issue in environments other than RHEL? Also the latest Boto3 is 1.34.118 per the CHANGELOG, have you tried updating to that version? And are you using the CRT and if so have you tried uninstalling/reinstalling?

You could also try testing with the various S3 transfer configuration options documented here, for example:

import boto3
from boto3.s3.transfer import TransferConfig

# Create an S3 client
s3 = boto3.client('s3')

# Define the transfer config
transfer_config = TransferConfig(
    multipart_threshold=1024 * 25,  # 25 MB
    max_concurrency=10,
    multipart_chunksize=1024 * 1024,  # 1 MB
    num_download_attempts=5,
    max_io_queue=100,
    io_chunksize=1024 * 1024,  # 1 MB
    use_threads=True
)

# Open the local file
with open('file.txt', 'rb') as f:
    # Use the transfer config in an S3 upload operation
    s3.upload_fileobj(
        Fileobj=f,
        Bucket='test-bucket',
        Key='remote_file.txt',
        Config=transfer_config
    )

Otherwise I recommend following the S3 performance guideline optimizations documented here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-guidelines.html

If you're able to narrow down the conditions in which this issue occurs, please let us know.

@tim-finnigan tim-finnigan added response-requested Waiting on additional information or feedback. s3 p2 This is a standard priority issue and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-triage This issue or PR still needs to be triaged. labels Jun 4, 2024
@fitnycdigitalinitiatives
Copy link
Author

Hello,

Not sure if this is related, but we recently got this error:

botocore.exceptions.HTTPClientError: An HTTP Client raised an unhandled exception: AWSHTTPSConnectionPool(host='s3.us-east-1.amazonaws.com', port=443): Pool is closed.

@fitnycdigitalinitiatives
Copy link
Author

More detailed log:

Traceback (most recent call last):
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python3.9/site-packages/urllib3/connectionpool.py", line 276, in _get_conn
    conn = self.pool.get(block=self.block, timeout=timeout)
  File "/usr/lib64/python3.9/queue.py", line 165, in get
    with self.not_empty:
  File "/usr/lib64/python3.9/threading.py", line 257, in __enter__
    return self._lock.__enter__()
  File "src/gevent/_semaphore.py", line 282, in gevent._gevent_c_semaphore.Semaphore.__enter__
  File "src/gevent/_semaphore.py", line 283, in gevent._gevent_c_semaphore.Semaphore.__enter__
  File "src/gevent/_semaphore.py", line 184, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib64/python3.9/site-packages/gevent/thread.py", line 112, in acquire
    acquired = BoundedSemaphore.acquire(self, blocking, timeout)
  File "src/gevent/_semaphore.py", line 184, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 253, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
AttributeError: 'NoneType' object has no attribute 'switch'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python3.9/site-packages/botocore/httpsession.py", line 464, in send
    urllib_response = conn.urlopen(
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python3.9/site-packages/urllib3/connectionpool.py", line 704, in urlopen
    conn = self._get_conn(timeout=pool_timeout)
  File "/usr/share/archivematica/virtualenvs/archivematica-storage-service/lib/python3.9/site-packages/urllib3/connectionpool.py", line 279, in _get_conn
    raise ClosedPoolError(self, "Pool is closed.")
urllib3.exceptions.ClosedPoolError: AWSHTTPSConnectionPool(host='s3.us-east-1.amazonaws.com', port=443): Pool is closed.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Jun 6, 2024
@fitnycdigitalinitiatives
Copy link
Author

@tim-finnigan Hello, I did some more digging into the system logs and saw this message popping up around these issues:
systemstats_linux: unexpected header length in /proc/net/snmp. 237 != 224
Which is related to this net-snmp bug

Do you think that could be causing the problem?

@tim-finnigan
Copy link
Contributor

Thanks for following up, I'm not sure that could be causing the problem. Could you provide the full snippet you're using and complete debug logs (with sensitive info redacted) for us to investigate further? Adding boto3.set_stream_logger('') to your script will generate the full logs.

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Jun 19, 2024
@fitnycdigitalinitiatives
Copy link
Author

Hi Tim,

I don't want to speak too soon, but one thing that I recently tried was turning off threads in the TransferConfig and I haven't run into the error in the last several days. Perhaps there's something interrupting the threading process either with our application or in RHEL 9.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Jun 27, 2024
@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Jul 2, 2024
Copy link

github-actions bot commented Jul 7, 2024

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jul 7, 2024
@fitnycdigitalinitiatives
Copy link
Author

Verifying that turning off threading fixed the issue. I did see online that RHEL 9 changed something with it's threading approach, but it's way beyond my understanding to know if it could be related to what I'm experiencing.

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional information or feedback. labels Jul 9, 2024
@Susuvnkl
Copy link

I experienced the same issue,
no error log, the app just freezes.
s3_client.upload_fileobj(
file,
bucket_name,
filename,
ExtraArgs={"ContentType": file.content_type}
)

ATM using s3_client.put_object method instead.

@tim-finnigan
Copy link
Contributor

Thanks for following up, I think we're still waiting on the full debug logs for further investigation of this issue.

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Aug 2, 2024
@tim-finnigan
Copy link
Contributor

Checking in — I was made aware of this issue involving gevent: gevent/gevent#1826, which is present in the logs shared earlier here. This issue appears to be specific to RHEL-based systems. Please note that we do not provide or officially support gevent with our networking setup. Any issues related to gevent will need to be addressed by the gevent team.

@tim-finnigan tim-finnigan added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed response-requested Waiting on additional information or feedback. labels Aug 2, 2024
@fitnycdigitalinitiatives
Copy link
Author

Cool, good catch. thanks for your help!

Copy link

github-actions bot commented Aug 5, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a confirmed bug. closing-soon This issue will automatically close in 4 days unless further comments are made. p2 This is a standard priority issue s3 third-party
Projects
None yet
Development

No branches or pull requests

3 participants