
Add support for uploading files from a directory to S3 #67

Merged 31 commits into main on Nov 6, 2024

Conversation

@ysaito1001 (Contributor) commented Oct 29, 2024

Description of changes
This PR adds support for uploading files from a specified directory to S3. Example usage can be found in the examples as well as in the integration tests.

This implementation mirrors the pattern used for downloading multiple objects from S3, but in the opposite direction.

Given an UploadObjectsInput, UploadObjects::orchestrate spawns a producer task, list_directory_contents, and consumer tasks, upload_objects. The producer retrieves files from the specified target directory (non-recursively by default) and yields them to the consumers, which upload them to S3 using the existing single-object upload functionality. An UploadObjectsHandle manages these tasks; joining the handle drives them to completion.

A couple of details. First, for yielding files to upload, we use the walkdir crate, which natively supports following symbolic links and recursive traversal. Since this crate offers a synchronous API, we integrate the blocking crate to handle blocking I/O in a dedicated thread pool. If you think this approach is unnecessary, I'm open to removing the blocking crate. Using something like spawn_blocking would require the entire directory-traversal task to be a synchronous function passed to spawn_blocking, which complicates use of the async_channel within that function (that said, we left a TODO to reevaluate the need for the blocking crate).

Second, during directory traversal, I/O errors may occur, such as being unable to read directories/files or failing to construct an InputStream. These client-side errors are sent to the consumer task, which records them in FailedUploadTransfer (where the input field will be None for client-side errors), and we proceed with the rest of the "good" files when FailedTransferPolicy::Continue is in effect. Deriving an object key from a relative filename may also fail; however, since that is a user input error, we fail the upload operation even under FailedTransferPolicy::Continue.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ysaito1001 ysaito1001 marked this pull request as ready for review October 29, 2024 22:32
@ysaito1001 ysaito1001 requested a review from a team as a code owner October 29, 2024 22:32
@aajtodd (Contributor) left a comment


Looks good overall, great start!

aws-s3-transfer-manager/examples/cp.rs (review thread, outdated, resolved)
aws-s3-transfer-manager/src/operation.rs (review thread, resolved)
@ysaito1001 ysaito1001 merged commit c88e07f into main Nov 6, 2024
14 checks passed
@ysaito1001 ysaito1001 deleted the ysaito/upload_objects branch November 6, 2024 23:17
4 participants