
Add support for uploading files from a directory to S3 #67

Merged 31 commits into main on Nov 6, 2024

Conversation

@ysaito1001 (Contributor) commented Oct 29, 2024

Description of changes
This PR adds support for uploading files from a specified directory to S3. Example usage can be found in the examples as well as in the integration tests.

This implementation mirrors the pattern used for downloading multiple objects from S3, but in the opposite direction.

Given an UploadObjectsInput, UploadObjects::orchestrate spawns a producer task, list_directory_contents, and consumer tasks, upload_objects. The producer retrieves files from the specified target directory (non-recursively by default) and yields them to the consumers, which upload them to S3 using the existing single-object upload functionality. An UploadObjectsHandle manages these tasks; joining the handle drives them to completion.

A couple of details. First, for yielding files to upload, we use the walkdir crate, which natively supports following symbolic links and recursive traversal. Since this crate offers a synchronous API, we integrate the blocking crate to handle blocking I/O in a dedicated thread pool. If you think this approach is unnecessary, I'm open to removing the blocking crate. Using something like spawn_blocking would require the entire directory-traversal task to be a synchronous function passed to spawn_blocking, which complicates use of the async_channel within that function (that said, we left a TODO to reevaluate the need for the blocking crate).

Second, during directory traversal, I/O errors may occur, such as being unable to read directories/files or failing to construct an InputStream. These client-side errors are sent to the consumer task, which records them in FailedUploadTransfer (where the input field will be None for client-side errors), and we proceed with the rest of the "good" files when FailedTransferPolicy::Continue is in effect. Deriving an object key from a relative filename may also fail; however, since that is a user input error, we fail the upload operation even under FailedTransferPolicy::Continue.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@ysaito1001 ysaito1001 marked this pull request as ready for review October 29, 2024 22:32
@ysaito1001 ysaito1001 requested a review from a team as a code owner October 29, 2024 22:32
@aajtodd (Contributor) left a comment


Looks good overall, great start!

aws-s3-transfer-manager/examples/cp.rs (review thread, outdated, resolved)
aws-s3-transfer-manager/src/operation.rs (review thread, resolved)
@ysaito1001 ysaito1001 merged commit c88e07f into main Nov 6, 2024
14 checks passed
@ysaito1001 ysaito1001 deleted the ysaito/upload_objects branch November 6, 2024 23:17
4 participants