Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scaffold download objects operation #39

Merged
merged 4 commits into from
Jul 30, 2024
Merged

scaffold download objects operation #39

merged 4 commits into from
Jul 30, 2024

Conversation

aajtodd
Copy link
Contributor

@aajtodd aajtodd commented Jul 26, 2024

Issue #, if available:
#8

Description of changes:
This PR adds just the scaffolding for a new download_objects operation which will download one or more objects from a bucket and write the results to a directory. This is the first operation with a bit more liberty in the operation input/output type representations so pay attention to those and what the SEP asks for and whether you agree with what's proposed.

Questions/Discuss

  • Type for DownloadFilter?

  • SEP calls out having getObjectRequestCallback which is described as

    A callback mechanism to allow customers to update individual getObjectRequest that the S3 Transfer Manager generates.

    Have to think through how we want to do this

  • Java v2 TM also has a listObjectsV2Transformer for transforming the list objects request. This isn't called out in the SEP, unsure if we want to or should consider equivalent functionality.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@aajtodd aajtodd requested a review from a team as a code owner July 26, 2024 15:45
@@ -51,6 +51,10 @@ pub struct Args {
/// Enable tokio console (requires RUSTFLAGS="--cfg tokio_unstable")
#[arg(long, default_value_t = false, action = clap::ArgAction::SetTrue)]
tokio_console: bool,

/// Command is performed on all files or objects under the specified directory or prefix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is a prefix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an S3 concept. Not sure if you're just rhetorically asking for more docs here

aws-s3-transfer-manager/src/operation/download/input.rs Outdated Show resolved Hide resolved
aws-s3-transfer-manager/src/client.rs Show resolved Hide resolved
aws-s3-transfer-manager/src/types.rs Show resolved Hide resolved
aws-s3-transfer-manager/src/types.rs Outdated Show resolved Hide resolved
Comment on lines 15 to 19
/// The number of objects that failed to be downloaded
pub objects_failed: u64,

/// A list of failed object transfers
pub failed_transfers: Option<Vec<FailedDownloadTransfer>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial / educate-me: why not just have a Vec? why have both a Vec AND another field which is the length of that Vec? Is there a possibility that objects_failed != failed_transfers.unwrap().len()?

also, why is the Vec wrapped in an Option? Is that common? Why not just have a Vec whose length may be 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, SEP said to have the fields in the response:

Response

Field Description Type Required? Default Value
objectsUploaded The number of objects that have been uploaded long Required N/A
objectsFailed The number of objects that have failed long Required N/A

But then later it says to enumerate failures. I added the Vec after originally defining it and tend to agree it's redundant. I'll remove it for now unless anyone feels otherwise.

#[derive(Debug)]
pub struct DownloadObjectsOutput {
/// The number of objects that were successfully downloaded
pub objects_downloaded: u64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need this now, but would we also want to offer a list of successful transfers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be very large, I would think we'd want that to be opt-in or something.

crate::operation::download_objects::DownloadObjects::orchestrate(self.handle, input).await
}

/// Set the bucket name containing the object(s) to download.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing a macro can help reduce the boilerplate for XXX and set_XXX, eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I thought about that, we only have one more operation to go but we'll see.

/// A filter for downloading objects from S3
#[derive(Clone)]
pub struct DownloadFilter {
pub(crate) predicate: Arc<dyn Fn(&aws_sdk_s3::types::Object) -> bool>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May also need Send + Sync bound at the end, i.e.

Arc<dyn Fn(&aws_sdk_s3::types::Object) -> bool + Send + Sync>,

but no need to change it. The compiler can tell us if we do need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed. Also I don't know if i like this representation. I almost wonder if the closure shouldn't just be stored directly on the input type but I opted for defining a new type DownloadFilter for it should we find that it makes sense to change or expand it in some way.

aws-s3-transfer-manager/src/types.rs Show resolved Hide resolved
@aajtodd aajtodd merged commit d085949 into main Jul 30, 2024
13 of 14 checks passed
@aajtodd aajtodd deleted the atodd/dl-multi branch July 30, 2024 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants