-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
naive mpu #2
naive mpu #2
Conversation
aws-s3-transfer-manager/src/types.rs
Outdated
pub enum TargetPartSize { | ||
/// Automatically configure an optimal target part size based on the execution environment. | ||
#[default] | ||
Auto, | ||
|
||
/// Explicitly configured part size. | ||
Explicit(u64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debatable: Seems weird to have an "Explicit" option that might actually get adjusted?
Maybe name it like:
PartSize::Auto
PartSize::Target(u64)
: gets adjusted if necessary
and if some future person ever really wants to get explicit we could add:
PartSize::Explicit(u64)
: never adjusted, causes some configuration error if it's too large or too small
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see your point but we are only adjusting for min part size IIRC. Is this distinction going to be worth it? Do we think we'd adjust it anywhere else?
edit: Right we adjust if it's more than 10k parts as well 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the name of the type is TargetPartSize already which conveys this IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am super annoying. We can address the naming crap another day when we get a grip on all our settings
/// | ||
/// The minimum part size is 5 MiB, any part size less than that will be rounded up. | ||
/// Default is [PartSize::Auto] | ||
pub fn multipart_threshold_part_size(self, multipart_threshold_part_size: PartSize) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more annoying name suggestion: multipart_threshold_bytes
or just multipart_threshold
- having "part" in the name twice is weird, other than we're re-using the
PartSize
enum. It seems annoying to make another enum just for this setting, but ... is that how it's done? - seems nice to use the term "bytes", as the SEP does? I see us clarifying it in some places like
MIN_PART_SIZE_BYTES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need another enum here but we can live with it for a bit and see what we think. Changed to multipart_threshold(...)
as suggested.
The distinction where we have _bytes
suffix is usually wherever we have raw u64
values. We could probably make some simple abstractions like StorageUnit
or ByteSize
or whatever that has conversions between bytes/MB/MiB/etc, may be useful for throughput calculations as well. Leaving for now though.
/// | ||
/// [`multipart_threshold_part_size`]: method@Self::multipart_threshold_part_size | ||
/// [`PutObject`]: https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html | ||
pub fn target_part_size(self, part_size: PartSize) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh god I'm annoying: just call this part_size()
now? part_size_bytes
?
Ok(mut completed_parts) => { | ||
all_parts.append(&mut completed_parts); | ||
} | ||
// TODO(aws-sdk-rust#1159, design) - do we want to return first error or collect all errors? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good design question. I suspect both have legit use cases, so it's conceivable we end up giving customers an option for which error report mode they like to use.
/// * `handle` - The upload handle | ||
/// * `stream` - The content to upload | ||
/// * `content_length` - The upper bound on the content length | ||
async fn try_start_mpu_upload( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Late reviewing this, but should the mpu
in this be mp
? Otherwise I would read it as try_start_multipart_upload_upload
Description of changes:
This PR completes a naive / strawman implementation of multipart upload.
Main components:
UploadHandle
to caller after the MPU is started.CompleteMultipartUpload
is invoked by "joining" the handle. Joining involves waiting for all of the parts to be uploaded and then completing the call.Future/Questions
UploadError
struct with different error "kinds" that capture various states/composite failures.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.