Botocore 1.35.45 breaks S3:SelectObjectContent #3284

Closed
1 task done
bpandola opened this issue Oct 22, 2024 · 10 comments
Labels: bug, p0, potential-regression, s3

Comments

@bpandola

Describe the bug

The S3 action SelectObjectContent fails with the latest version of botocore. An exception is raised in the recently-added _handle_200_error handler (see PR #3276).

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

Confirmed working in previous version(s) of botocore.

Current Behavior

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../env/python3.11/lib/python3.11/site-packages/botocore/client.py:569: in _api_call
    return self._make_api_call(operation_name, kwargs)
../env/python3.11/lib/python3.11/site-packages/botocore/client.py:1005: in _make_api_call
    http, parsed_response = self._make_request(
../env/python3.11/lib/python3.11/site-packages/botocore/client.py:1029: in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
../env/python3.11/lib/python3.11/site-packages/botocore/endpoint.py:119: in make_request
    return self._send_request(request_dict, operation_model)
../env/python3.11/lib/python3.11/site-packages/botocore/endpoint.py:197: in _send_request
    success_response, exception = self._get_response(
../env/python3.11/lib/python3.11/site-packages/botocore/endpoint.py:239: in _get_response
    success_response, exception = self._do_get_response(
../env/python3.11/lib/python3.11/site-packages/botocore/endpoint.py:306: in _do_get_response
    self._event_emitter.emit(
../env/python3.11/lib/python3.11/site-packages/botocore/hooks.py:412: in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
../env/python3.11/lib/python3.11/site-packages/botocore/hooks.py:256: in emit
    return self._emit(event_name, kwargs)
../env/python3.11/lib/python3.11/site-packages/botocore/hooks.py:239: in _emit
    response = handler(**kwargs)
../env/python3.11/lib/python3.11/site-packages/botocore/handlers.py:1252: in _handle_200_error
    if _looks_like_special_case_error(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

status_code = 200, body = <urllib3.response.HTTPResponse object at 0x10f687a60>

    def _looks_like_special_case_error(status_code, body):
        if status_code == 200 and body:
            try:
                parser = ETree.XMLParser(
                    target=ETree.TreeBuilder(), encoding='utf-8'
                )
>               parser.feed(body)
E               TypeError: a bytes-like object is required, not 'HTTPResponse'

../env/python3.11/lib/python3.11/site-packages/botocore/handlers.py:174: TypeError
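
The TypeError above can be reproduced without S3. A minimal sketch, assuming only the standard library: `FakeStream` is a hypothetical stand-in for the unread `HTTPResponse` body, and `try_feed` is an illustrative helper, not part of botocore.

```python
import xml.etree.ElementTree as ETree


class FakeStream:
    """Hypothetical stand-in for an unread streaming body (not bytes)."""

    def read(self):
        return b"<Payload/>"


def try_feed(body):
    """Feed body to an XML parser; return None or the exception type name."""
    parser = ETree.XMLParser(target=ETree.TreeBuilder(), encoding="utf-8")
    try:
        parser.feed(body)
        parser.close()
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc).__name__
    return None


# Bytes parse cleanly; a stream object is rejected before any parsing happens.
print(try_feed(b"<Payload><Records/></Payload>"))
print(try_feed(FakeStream()))
```

The parser accepts the bytes but rejects the stream object itself, which is why a body that has not been read into memory cannot be passed to `feed` directly.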

Reproduction Steps

import uuid
import gzip
import json
import boto3

NESTED_JSON = {"a1": {"b1": "b2"}, "a2": [True, False], "a3": True, "a4": [1, 5]}

client = boto3.client("s3")
bucket_name = str(uuid.uuid4())
client.create_bucket(Bucket=bucket_name)
client.put_object(
    Bucket=bucket_name,
    Key="json.gzip",
    Body=gzip.compress(json.dumps(NESTED_JSON).encode("utf-8")),
)
client.select_object_content(
    Bucket=bucket_name,
    Key="json.gzip",
    Expression="SELECT count(*) FROM S3Object",
    ExpressionType="SQL",
    InputSerialization={"JSON": {"Type": "DOCUMENT"}, "CompressionType": "GZIP"},
    OutputSerialization={"JSON": {"RecordDelimiter": ","}},
)

Possible Solution

The new handler has a guard clause checking if operation_model.has_streaming_output but it may also need to guard against has_event_stream_output.
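
A rough sketch of the guard suggested above, not the fix that was actually released. `should_skip_body_inspection` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for botocore operation models; their flag values are assumptions consistent with this report (SelectObjectContent slipping past the `has_streaming_output` check).

```python
from types import SimpleNamespace


def should_skip_body_inspection(operation_model):
    """Hypothetical guard: True when the response body is a stream
    and therefore must not be fed to an XML parser."""
    return bool(
        operation_model.has_streaming_output
        or operation_model.has_event_stream_output
    )


# Stand-in operation models (assumed flag values, for illustration only).
get_object = SimpleNamespace(
    has_streaming_output=True, has_event_stream_output=False
)
select_object_content = SimpleNamespace(
    has_streaming_output=False, has_event_stream_output=True
)
copy_object = SimpleNamespace(
    has_streaming_output=False, has_event_stream_output=False
)

for name, model in [
    ("GetObject", get_object),
    ("SelectObjectContent", select_object_content),
    ("CopyObject", copy_object),
]:
    print(name, should_skip_body_inspection(model))
```

With only the first check, the SelectObjectContent stand-in would not be skipped, which matches the failure in the traceback.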

Additional Information/Context

No response

SDK version used

1.35.45

Environment details (OS name and version, etc.)

macOS, Python 3.11

bpandola added the bug and needs-triage labels on Oct 22, 2024
github-actions bot added the potential-regression label on Oct 22, 2024
@fourtyplustwo

+1 seeing the same issue, breaking a lot of stuff

tim-finnigan added the p0 label on Oct 22, 2024
@tim-finnigan
Contributor

Thanks for reporting this issue, we were able to reproduce this in 1.35.45. A fix is pending release here: #3285.

tim-finnigan added the s3 label and removed the needs-triage label on Oct 22, 2024
@tim-finnigan
Contributor

This should now be fixed in version 1.35.46. Please try updating your version of boto3 and let us know if you're still running into any issues.

@tkelman

tkelman commented Oct 22, 2024

should that reproduction case be added as a regression test?

@tim-finnigan
Contributor

> should that reproduction case be added as a regression test?

The team does plan to add a regression test for this in the near future but wanted to first release the fix in 1.35.46.

@bpandola
Author

@tim-finnigan This is still broken in version 1.35.46: no longer for SelectObjectContent, but for other S3 API calls.

Here's another simple example:

import uuid
import json
import boto3

client = boto3.client("s3")
bucket_name = str(uuid.uuid4())
policy = json.dumps(
        {
            "Version": "2012-10-17",
            "Id": "PutObjPolicy",
            "Statement": [
                {
                    "Sid": "DenyUnEncryptedObjectUploads",
                    "Effect": "Deny",
                    "Principal": "*",
                    "Action": "s3:PutObject",
                    "Resource": f"arn:aws:s3:::{bucket_name}/*",
                    "Condition": {
                        "StringNotEquals": {
                            "s3:x-amz-server-side-encryption": "aws:kms"
                        }
                    },
                }
            ],
        }
    )
client.create_bucket(Bucket=bucket_name)
client.put_bucket_policy(Bucket=bucket_name, Policy=policy)
client.get_bucket_policy(Bucket=bucket_name)

I think the new handler that was added isn't guarding against all of the ways in which a valid/successful rest-xml response may not contain actual XML (e.g. a GetBucketPolicy response, which is just the policy in JSON format).
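
To illustrate the failure mode: a simplified sketch, assuming a handler that treats any unparseable 200 body as an error. `naive_looks_like_error` is a hypothetical function written for this example, not botocore's actual code.

```python
import json
import xml.etree.ElementTree as ETree


def naive_looks_like_error(status_code, body):
    """Hypothetical check: flag a 200 response whose body is not
    well-formed XML, or whose root element is <Error>."""
    if status_code != 200 or not body:
        return False
    parser = ETree.XMLParser(target=ETree.TreeBuilder(), encoding="utf-8")
    try:
        parser.feed(body)
        root = parser.close()
    except ETree.ParseError:
        # Intended for truncated XML, but valid JSON lands here too.
        return True
    return root.tag == "Error"


# A GetBucketPolicy-style body: valid JSON, not XML.
policy_body = json.dumps({"Version": "2012-10-17", "Statement": []}).encode()

print(naive_looks_like_error(200, policy_body))
print(naive_looks_like_error(200, b"<CopyObjectResult/>"))
print(naive_looks_like_error(200, b"<Error><Code>SlowDown</Code></Error>"))
```

Under this assumption the JSON policy is indistinguishable from a truncated XML response, so a successful call would be reclassified as a server error and retried.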

I'm a little surprised that this change didn't break something in your comprehensive test suite... is there really no test covering an httpPayload trait or putting/getting S3 bucket policies?

@tim-finnigan
Contributor

@bpandola What specific error are you getting, and did that same code work prior to 1.35.45?

@bpandola
Author

@tim-finnigan Yes, that code works fine prior to 1.35.45. Running it with 1.35.46 results in a 500 error after max retries, because the new S3 handler changes the status code to 500 after trying (and failing) to parse the valid JSON response as XML:
botocore.exceptions.ClientError: An error occurred (500) when calling the GetBucketPolicy operation (reached max retries: 4): Internal Server Error

@tim-finnigan
Contributor

Thanks for following up. I'm going to close this since the SelectObjectContent issue was fixed in 1.35.46, and we can track the GetBucketPolicy and related issues in #3286.

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
