-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18073. Upgrade AWS SDK to v2 in S3A [work in progress] #5163
Merged
asfgit
merged 229 commits into
apache:feature-HADOOP-18073-s3a-sdk-upgrade
from
ahmarsuhail:feature-HADOOP-18073-s3a-sdk-upgrade
Jan 19, 2023
Merged
HADOOP-18073. Upgrade AWS SDK to v2 in S3A [work in progress] #5163
asfgit
merged 229 commits into
apache:feature-HADOOP-18073-s3a-sdk-upgrade
from
ahmarsuhail:feature-HADOOP-18073-s3a-sdk-upgrade
Jan 19, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ationsHomeSubCluster. (apache#4846)
…during vectored read. (apache#4921) part of HADOOP-18103. Contributed by: Mukund Thakur
…er (apache#4917) Contributed-by: navinko <[email protected]>
…Cluster Use WebAppUtils. (apache#4939)
…n in Federation.md. (apache#4927)
Contributed by Steve Loughran
Make S3APrefetchingInputStream.seek() completely lazy. Calls to seek() will not affect the current buffer nor interfere with prefetching, until read() is called. This change allows various usage patterns to benefit from prefetching, e.g. when calling readFully(position, buffer) in a loop for contiguous positions the intermediate internal calls to seek() will be noops and prefetching will have the same performance as in a sequential read. Contributed by Alessandro Passaro.
…he#4925) Contributed by Daniel Carl Jones
Signed-off-by: Akira Ajisaka <[email protected]>
Add to XMLUtils a set of methods to create secure XML Parsers/transformers, locking down DTD, schema, XXE exposure. Use these wherever XML parsers are created. Contributed by PJ Fanning
… Contributed by Ashutosh Gupta. Reviewed-by: Akira Ajisaka <[email protected]> Signed-off-by: Chris Nauroth <[email protected]>
…#4937) Contributed by PJ Fanning
…er if the item exist. (apache#4987). Contributed by ZanderXu. Signed-off-by: He Xiaoqiao <[email protected]>
…ervice (apache#4775) Co-authored-by: Ashutosh Gupta <[email protected]> Signed-off-by: Akira Ajisaka <[email protected]>
Contributed by Steve Loughran
…pache#4986) Contributed by Mukund Thakur
…izedBlocks (apache#4942). Contributed by ZanderXu. Reviewed-by: Mingxiang Li <[email protected]> Signed-off-by: He Xiaoqiao <[email protected]>
apache#4948). Contributed by ZanderXu. Signed-off-by: He Xiaoqiao <[email protected]>
…nfo. Contributed by Riya Khandelwal
Contributed by P J Fanning
HDFS-16774. Improve async delete replica on datanode to reduce the probability of ReplicationNotFoundException Co-authored-by: Haiyang Hu <[email protected]> Reviewed-by: He Xiaoqiao <[email protected]>
…atic members (apache#5246) Co-authored-by: Chengbing Liu <[email protected]> Signed-off-by: Erik Krogen <[email protected]>
…ailed (apache#5280) Reviewed-by: Takanobu Asanuma <[email protected]> Signed-off-by: Tao Li <[email protected]>
… for usernames with dot (apache#4471)
* YARN-11413. Fix Junit Test ERROR Introduced By YARN-6412. * YARN-11413. Fix CheckStyle. * YARN-11413. Fix CheckStyle. Co-authored-by: slfan1989 <louj1988@@>
Signed-off-by: Tao Li <[email protected]> Signed-off-by: Chris Nauroth <[email protected]>
…dirs (apache#4237) Signed-off-by: Chris Nauroth <[email protected]>
…e#5292) Signed-off-by: Chris Nauroth <[email protected]>
…refreshUserToGroupsMappings API's for Federation. (apache#5193)
…der (apache#5019) Co-authored-by: Ashutosh Gupta <[email protected]> Reviewed-by: Shilun Fan <[email protected]> Signed-off-by: Akira Ajisaka <[email protected]>
…ugins (apache#5023) Co-authored-by: Ashutosh Gupta <[email protected]> Signed-off-by: Akira Ajisaka <[email protected]>
…en transformer factories do not support attributes (apache#5253) Part of HADOOP-18469 and the hardening of XML/XSL parsers. Followup to the main HADOOP-18575 patch, to improve performance when working with xml/xsl engines which don't support the relevant attributes. Include this change when backporting. Contributed by PJ Fanning.
Signed-off-by: Nikita Eshkeev <[email protected]>
See aws_sdk_v2_changelog.md for details. Co-authored-by: Ahmar Suhail <[email protected]> Co-authored-by: Alessandro Passaro <[email protected]>
addresses review comments + yetus errors Co-authored-by: Ahmar Suhail <[email protected]>
ahmarsuhail
force-pushed
the
feature-HADOOP-18073-s3a-sdk-upgrade
branch
from
January 18, 2023 14:26
b2ef9e2
to
369fcfa
Compare
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
merged! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description of PR
This is an initial draft PR containing all the changes implemented so far to upgrade S3A to the AWS SDK v2. Note that this is still a work in progress and we plan to further contribute to it to fill existing gaps and update the SDK when missing features are released (e.g. support for Client-side Encryption and public release of the new Transfer Manager, currently in preview).
In the meantime, this PR should provide a view of the whole set of changes and start a conversation on the remaining open questions and on how to handle breaking changes that affect S3A.
The new document at
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/aws_sdk_v2_changelog.md
discusses the key changes contained in this PR and is the suggested starting point for the review.
Further open questions to be discussed:
The region logic. Previously, if an endpoint was configured and no region, parse the region from the endpoint. If configured endpoint is the standard us-east-1 endpoint, set region as null, let SDK figure out the region. If no endpoint is configured, set region as us-east-1, and set
.withForceGlobalBucketAccessEnabled
. In SDK v2, there’s no cross region access, so the correct region of the bucket needs to be set. So we now get the region of the bucket using head bucket, and set it. In general, the guidance for the new SDK is to only set the region, and let the SDK determine the endpoint.Bucket probes. Currently done with doesBucketExist and doesBucketExistV2. Why do we need these two separate levels? There is no doesBucketExist operation in SDK V2, it will need to be replaced with a HeadBucket/GetBucketACL. Also consider that, with the new region logic, we will need to do a HeadBucket while configuring the client if the region isn’t specified.
Progress Listeners. SDK V2 currently does not support attaching progress listeners on requests outside the Transfer Manager. We use them in Put and UploadPart in S3ABlockOutputStream. Are they required for the upgrade?
ACLs. LogDeliveryWrite, which is a bucket level ACL, is no longer supported in the SDK V2. S3A seems to use ACLs at the object level only. Can this ACL be removed?
Transfer Manager. You can no longer set a threshold for when to use the Transfer Manager. The default is 8MB.
How was this patch tested?
Run
mvn -Dparallel-tests -DtestsThreadCount=8 clean verify
ineu-west-2
.The following tests are currently failing:
CopyObjectResult
headObject()
with empty key failsheadObject()
with empty key commented outFor code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?