Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort Download Object #80

Merged
merged 38 commits into from
Dec 9, 2024
Merged

Abort Download Object #80

merged 38 commits into from
Dec 9, 2024

Conversation

waahm7
Copy link
Contributor

@waahm7 waahm7 commented Dec 4, 2024

Issue #, if available:
#65

Description of changes:
This PR implements the drop functionality for downloads. Here are the changes:

  1. Remove the join function. It was working more like abort function since it closes the body. That behavior is not right because join function should wait for the download to complete instead of aborting it and then only waiting for the in-progress parts to complete. I have removed join function as it didn't made much sense, customers should consume the output if they want to wait for the download to complete. If we don't close the body in the join function, it may hang waiting for the body channel to clear.
  2. Add abort function which cancels any in-progress work and any future work.
  3. If any part fails with an error, cancel any in-progress work and future work.
  4. Rename body->output, this makes it more generic because we are not just delivering body from it. We are delivering metadata and any error that occurs will be delivered to this channel. If there is an error, the channel will be closed and we will not get any more data from this channel. This consolidates our error reporting to a single place instead of join + body channel.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@waahm7 waahm7 changed the title WIP | Abort Download Object Abort Download Object Dec 4, 2024
@waahm7 waahm7 marked this pull request as ready for review December 4, 2024 19:46
@waahm7 waahm7 requested a review from a team as a code owner December 4, 2024 19:46
@waahm7 waahm7 mentioned this pull request Dec 4, 2024
/// Operation builders
pub mod builders;
/// Abstractions for responses and consuming data streams.
pub mod output;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: Be consistent across operations. e.g. upload output module is not public but it re-exports the output type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated.

///
/// Wraps potentially multiple streams of binary data into a single coherent stream.
/// The data on this stream is sequenced into the correct order.
#[derive(Debug)]
pub struct Body {
inner: UnorderedBody,
pub struct Output {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion/discuss: Should this be consistent with the other operations output types (e.g. DownloadOutput)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yeah, DownloadOutput makes more sense since there are different types of outputs. I have updated it.

@@ -42,35 +42,35 @@ pub struct ChunkOutput {
// TODO: Do we want to expose something to yield multiple chunks in a single call, like
// recv_many/collect, etc.? We can benchmark to see if we get a significant performance boost once
// we have a better scheduler in place.
impl Body {
/// Create a new empty body
impl Output {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

future: We may want to have a conversion that allows you to just access the bytes off the stream (i.e. user only cares about the contents and not any of the intermediate metadata), we can wait though as it can be added in a backwards compatible way

fn into_body(self) -> Body

Copy link
Contributor Author

@waahm7 waahm7 Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, currently, body vs. output is not very different; it's only .next() vs. .next().data. We can add this util in the future if we add more things to the output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, reverted the name change.

aws-s3-transfer-manager/src/operation/download.rs Outdated Show resolved Hide resolved
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn test_abort_download() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to necessarily use it here but one thing to make you aware of is you can leverage capture_test_logs (example) which you can use to assert the logs contain certain things (e.g. to verify particular branches were hit).

Copy link
Contributor

@ysaito1001 ysaito1001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, leaving minor comments.

@@ -42,35 +42,35 @@ pub struct ChunkOutput {
// TODO: Do we want to expose something to yield multiple chunks in a single call, like
// recv_many/collect, etc.? We can benchmark to see if we get a significant performance boost once
// we have a better scheduler in place.
impl Body {
/// Create a new empty body
impl DownloadOutput {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent naming: We currently have UploadOutput UploadObjectsOutput and DownloadObjectsOutput, which are single structs representing the end result of an operation.

But this represents a sequence of structs.

Also, we'll eventually need something representing the end result of a download operation (where we'll put stuff like whether checksum validation happened). I would have expected that to be namedDownloadOutput.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this continues to be a sequence of ChunkOutput, I'm fine just continuing to call this Body

If we were going to change it to be a sequence of enums (representing, Discovery, Chunk, Final, etc), then maybe I'd name it Events instead, but I don't think that's the plan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, discussed offline, I have reverted the naming to body.

);
}
}

if let Err(err) = comp_tx.send(resp).await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK that ErrorKind::OperationCancelledgets sent oncomp_tx`? That could result in MANY chunks sending their cancellation error on the chunk transmitter

Do all of these end up reported to the user via the Body/Output iterator? Or does it filter errors so the user only sees the 1st? Can we guarantee that the ACTUAL error gets reported to the user before any cancellation errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as we encounter an error, we close the channel so user will only see the first error.

Can we guarantee that the ACTUAL error gets reported to the user before any cancellation errors?

Tokio does guarantee that "All data sent on Sender will become available on Receiver in the same order as it was sent." However, if the channel is full, all of them will be waiting at an await point. In that case (not 100% sure), the order won't be guaranteed. We will need a fair bit of complexity to work around that, i.e., maybe collect all the errors and then only send those that are not OperationCancelled errors. Not sure if there is a simpler way to do this. @aajtodd any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to send operation cancelled across the channel at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking there can be a case where only the operation cancelled error is there, but yeah, we can just not send that error at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user wouldn't act on that error right? I wouldn't think so anyway, it doesn't tell them anything they don't already know if they cancelled the operation, thus it probably doesn't make sense to yield it to them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have updated it to ignore OperationCancelled errors.

@waahm7 waahm7 merged commit 540285a into main Dec 9, 2024
13 checks passed
@waahm7 waahm7 deleted the cancel-single-download branch December 9, 2024 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants