Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request/Response Checksum Behavior Updates #3277

Conversation

jonathan343
Copy link
Contributor

@jonathan343 jonathan343 commented Oct 8, 2024

This PR makes the following updates to the botocore flexible checksum behavior:

Request Checksum Calculation

  • Remove MD5 checksums and make CRC32 the default checksum algorithm.
    • CRC32 is a suitable default algorithm because it is available with and without the optional awscrt dependency.
  • When the request_checksum_calculation config is set to when_supported (default value), a checksum will be generated whenever one of the following conditions is true:
    • An operation applies the httpchecksum trait and defines a requestAlgorithmMember.
    • An operation applies the httpchecksum trait and defines requestChecksumRequired: true.
  • When the request_checksum_calculation config is set to when_required, a checksum will be generated only when:
    • An operation applies the httpchecksum trait and defines requestChecksumRequired: true.

S3 Customizations:

  • Deprecates existing customizations for MD5 checksums.

Response Checksum Validation

  • A value set for requestValidationModeMember by the user will be used by the SDK.
  • If the requestValidationModeMember value is not set by the user, the following behavior is experienced:
    • When the response_checksum_validation config is set to when_supported (default value), the requestValidationModeMember will be set to ENABLED.
    • When the response_checksum_validation config is set to when_required, the requestValidationModeMember will not be set and checksum validation is skipped.

@codecov-commenter
Copy link

codecov-commenter commented Oct 8, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 97.95918% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (flexible-checksums-v2@fac0754). Learn more about missing BASE report.

Files with missing lines Patch % Lines
botocore/utils.py 66.66% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@                   Coverage Diff                    @@
##             flexible-checksums-v2    #3277   +/-   ##
========================================================
  Coverage                         ?   93.14%           
========================================================
  Files                            ?       66           
  Lines                            ?    14424           
  Branches                         ?        0           
========================================================
  Hits                             ?    13435           
  Misses                           ?      989           
  Partials                         ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

botocore/httpchecksum.py Outdated Show resolved Hide resolved
botocore/handlers.py Outdated Show resolved Hide resolved
botocore/httpchecksum.py Outdated Show resolved Hide resolved
botocore/httpchecksum.py Outdated Show resolved Hide resolved
botocore/utils.py Show resolved Hide resolved
tests/functional/test_s3.py Outdated Show resolved Hide resolved
tests/functional/test_s3.py Show resolved Hide resolved
# `payload_signing_enabled` config overrides this logic and forces the
# header.
def test_content_sha256_not_set_if_config_value_is_true(self):
# By default, put_object() provides a trailing checksum and includes the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? I wouldn't expect us to be using trailing checksums by default on put_object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because CRC32 is the new default, put_object will use trailing checksums by default. This is because put_object has streaming input.

You can see this behavior in the current version of the SDK by running:

import boto3
boto3.set_stream_logger('')

s3 = boto3.client("s3")

response = s3.put_object(
    Body=b"Example data.",
    Bucket="<bucket_name>",
    Key="example_key.txt",
    ChecksumAlgorithm="CRC32",
)

tests/functional/test_s3.py Outdated Show resolved Hide resolved
tests/functional/test_s3.py Show resolved Hide resolved
@jonathan343 jonathan343 force-pushed the flexible-checksums-v2 branch from 2d66b37 to d586cba Compare October 22, 2024 03:11
@jonathan343 jonathan343 force-pushed the request-checksum-calculation branch from ccbc9c0 to 17ef6f7 Compare October 22, 2024 03:45
@jonathan343 jonathan343 changed the title Default to CRC32 for Request Checksum Calculation Request/Response Checksum Updates Oct 28, 2024
@jonathan343 jonathan343 changed the title Request/Response Checksum Updates Request/Response Behavior Checksum Updates Oct 28, 2024
@jonathan343 jonathan343 changed the title Request/Response Behavior Checksum Updates Request/Response Checksum Behavior Updates Oct 28, 2024
botocore/handlers.py Outdated Show resolved Hide resolved
tests/functional/test_s3.py Outdated Show resolved Hide resolved
tests/unit/test_httpchecksum.py Show resolved Hide resolved
botocore/utils.py Outdated Show resolved Hide resolved
botocore/httpchecksum.py Show resolved Hide resolved
botocore/httpchecksum.py Show resolved Hide resolved
tests/functional/test_s3.py Outdated Show resolved Hide resolved
tests/functional/test_s3.py Outdated Show resolved Hide resolved
@jonathan343 jonathan343 force-pushed the flexible-checksums-v2 branch from d586cba to fac0754 Compare November 13, 2024 16:27
@jonathan343 jonathan343 force-pushed the request-checksum-calculation branch from 009ea3d to 626c08a Compare November 13, 2024 17:23
botocore/httpchecksum.py Outdated Show resolved Hide resolved
Comment on lines 365 to 366
if "extra_headers" in algorithm:
request["headers"].update(algorithm["extra_headers"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work if the user already supplied these headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header we're worried about would only get added when the default checksum is used. This means that the user didn't supply the "requestAlgorithmMember" member. The user's input would always take precedence.
I'll update this logic to be less generic as suggested by your other comment.

if has_checksum_header(request):
return

extra_headers = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a rough idea of what expansion we're expecting here? Generally grab-bag style containers like this tend to grow in unexpected ways. It may be worth constraining to the exact header we're expecting since it's tightly coupled to the checksum we're creating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to instead store the following in our request checksum context:

"request_algorithm_header": {
    "name": "foo",
    "value": "bar",
},

We can later check if "request_algorithm_header" in request["context"]["checksum"] and apply the header if it exists. Let me know if that's better.

tests/functional/test_s3.py Show resolved Hide resolved
# We only support unsigned trailer checksums currently. As this
# disables payload signing we'll only use trailers over TLS.
location_type = "trailer"
if request["context"]["client_config"].signature_version != 's3':
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: would we prefer this if condition to be added to the parent if viaand?

@jonathan343
Copy link
Contributor Author

This work is being tracked internally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants