Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

download objects operation #45

Merged
merged 16 commits into from
Aug 26, 2024
Merged

download objects operation #45

merged 16 commits into from
Aug 26, 2024

Conversation

aajtodd
Copy link
Contributor

@aajtodd aajtodd commented Aug 16, 2024

Issue #, if available:
closes: #8

Description of changes:
Adds support for the download_objects operation which recursively downloads a bucket (or bucket prefix) to a local directory.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@aajtodd aajtodd requested a review from a team as a code owner August 16, 2024 16:25
@aajtodd aajtodd requested review from graebm, Velfi and waahm7 August 16, 2024 16:26
aws-s3-transfer-manager/examples/cp.rs Show resolved Hide resolved
Comment on lines +32 to +36
let total_bytes_transferred = self
.ctx
.state
.total_bytes_transferred
.load(Ordering::SeqCst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider making an accessor for this.

Comment on lines +129 to +134
let root_dir = ctx.state.input.destination().expect("destination set");
let key = input.key.as_ref().expect("key set");
let prefix = ctx.state.input.key_prefix();
let delim = ctx.state.input.delimiter();

let key_path = local_key_path(root_dir, key.as_str(), prefix, delim)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're grabbing all these fields just to call local_key_path, could you make this a method on cty.state or something? It would shorten this method.

Comment on lines +173 to +199
fn strip_key_prefix<'a>(key: &'a str, prefix: Option<&str>, delimiter: Option<&str>) -> &'a str {
let prefix = prefix.unwrap_or("");
let delim = delimiter.unwrap_or(DEFAULT_DELIMITER);

if key.is_empty() || prefix.is_empty() || !key.starts_with(prefix) || !key.contains(delim) {
return key;
}

let stripped = &key[prefix.len()..];

if prefix.ends_with(delim) || !stripped.starts_with(delim) {
return stripped;
}

&stripped[1..]
}

/// Replace `delimiter` in `key` if it does not match the `path_separator`
fn replace_delim<'a>(key: &'a str, delimiter: Option<&str>, path_separator: &str) -> Cow<'a, str> {
match delimiter {
Some(delim) if delim != path_separator => {
let replaced = key.replace(delim, path_separator);
Cow::Owned(replaced)
}
_ => Cow::Borrowed(key),
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be nice to fuzz these to make sure we handle weird characters as expected.

aws-s3-transfer-manager/src/types.rs Show resolved Hide resolved
let prefix = prefix.unwrap_or("");
let delim = delimiter.unwrap_or(DEFAULT_DELIMITER);

if key.is_empty() || prefix.is_empty() || !key.starts_with(prefix) || !key.contains(delim) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 things:

  1. !key.starts_with(prefix): wouldn't it be a bug if this were true? should we panic/err if that happens, vs trying to handle it?
  2. !key.contains(delim): I tested with the java transfer manager and if key="photo-1" and prefix="photo-" then actual="1". I don't necessarily agree that it should work this way, but that's how it works for java.
    • It feels weird to me that the transfer manager SEP forces a delimiter on the end of an upload prefix, but doesn't force a delimiter at the end of a download prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Probably yes but that is what Java does
  2. Yes and I believe this is covered by unit tests. I can beef these up further if you think we need additional tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the following behavior, you should probably add tests

key="many///delims-in-a-row" prefix="" delim="/" -> "many/delims-in-a-row"
key="//delims-at-start" prefix="" delim="/" -> "delims-at-start"
key="prefix///then-delims" prefix="prefix" delim="/" -> "then-delims"
key="prefix///then-delims" prefix="prefix/" delim="/" -> "then-delims"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those last 3 are not valid in the Java S3 TM:

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Cannot download key //delims-at-start, its relative path resolves outside the parent directory.
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Cannot download key prefix1///then-delims, its relative path resolves outside the parent directory.
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Cannot download key prefix1///then-delims, its relative path resolves outside the parent directory.

The logic implies that if the relative path after stripping the prefix is an absolute path then it's treated as outside of the destination directory.

Copy link
Contributor

@graebm graebm Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not saying the Java S3 TM is perfect. I'm saying the SEP doesn't say what to do about multiple delimiters in a row, so we get inconsistent behavior like:

"prefix/then/later//double-delims" -> "then/laser/double-delims"
OK, after removing prefix and doing path_clean

"prefix//double-delims" -> "/double-delims"
BAD, after removing prefix it looks like an absolute path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with a TODO about the SEP needing more guidance on edge cases

Copy link
Contributor

@graebm graebm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix (prefix/delim validation) and ship

.set_bucket(input.bucket.to_owned())
.set_prefix(prefix.clone())
.set_continuation_token(next_token.clone())
.set_delimiter(input.delimiter.to_owned()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in upload_objects I added a recursive bool, defaulting to false, per the SEP. Java TM doesn't have this option, and neither does download_objects in this PR. But if download_objects DID support it, we'd probably want to pass delimiter here and then NOT recurse ?

anyway, I'm adding a TODO about this inconsistency in the upload_objects scaffolding PR. We'll figure it out eventually

@aajtodd aajtodd merged commit 3684958 into main Aug 26, 2024
14 checks passed
@aajtodd aajtodd deleted the atodd/dl-objects branch August 26, 2024 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transfer manager download support
5 participants