Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored S3 access #80

Merged
merged 1 commit into from
Sep 8, 2023
Merged

Refactored S3 access #80

merged 1 commit into from
Sep 8, 2023

Conversation

Enkidu93
Copy link
Collaborator

@Enkidu93 Enkidu93 commented Sep 7, 2023

Reverted to low-level implementation (removing need for large buffers); fixes Refine S3 bucket usage #61

Also removed synchronous write; fixes S3 remove sync write (async only) #64

Note: Dispose has async calls/logic because "await using" on the StreamWriters in ClearMLNMTBuildJob calls DisposeAsync on the StreamWriters but Dispose on the BaseStream. This is the only way I could find to circumvent this - other solutions are welcome given there's a workaround I'm missing.

--ECL


This change is Reviewable

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs line 82 at r1 (raw file):

        return new BufferedStream(
            new S3WriteStream(_client, objectId, _bucketName, response.UploadId),
            1024 * 1024 * 5

I've seen this twice. This should probably be a constant, maybe on S3WriteStream.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 18 at r1 (raw file):

        _bucketName = bucketName;
        _uploadId = uploadId;
        _logger = new LoggerFactory().CreateLogger<S3WriteStream>();

You should inject ILoggerFactory into S3FileStorage, use it to create the logger instance, and pass it to this class in the constructor.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 79 at r1 (raw file):

        catch (Exception e)
        {
            await Abort(e);

You should rethrow the exception after aborting.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 109 at r1 (raw file):

            catch (Exception e)
            {
                Abort(e).WaitAndUnwrapException();

You should rethrow the exception after aborting.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 141 at r1 (raw file):

    }

    private async Task Abort(Exception e)

This should be suffixed with Async.

@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Sep 7, 2023

src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 18 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should inject ILoggerFactory into S3FileStorage, use it to create the logger instance, and pass it to this class in the constructor.

Can I do that or do I have to inject it at the ShareFileService level since the S3FileStorage is manually constructed with its constructor? Is there a way to get around that - I hate to just pass LoggerFactories if there's a better way.

@johnml1135
Copy link
Collaborator

My review is honestly very brief - I am relying on Damien's review, being that he knows the interface and the memory buffer mechanisms better than I do. Nothing stands out as an issue other than the question - I don't see any changes in the testing. Are we testing the multipart upload/download? Are we sure we are breaking and reconstituting these portions correctly?

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ddaspit and @Enkidu93)

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which tests would you expect to see changes? The multipart upload strategy was being used from the get-go. It's just that we're using the SDK instead of hand-crafting http requests. I've followed the SDK pretty closely. The only difference is the strategy we use (which is ultimately borrowed from the stowage code) in which we write to a 5MB buffered stream (5MB being the max part size) which is flushed as needed, and then we complete the multipart upload on disposal. I can verify that it works properly inasmuch as the E2E tests cover file uploads. (e.g., NmtBatch passes).

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ddaspit)


src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs line 82 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I've seen this twice. This should probably be a constant, maybe on S3WriteStream.

Done.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 79 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should rethrow the exception after aborting.

Done.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 109 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

You should rethrow the exception after aborting.

Done.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 141 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

This should be suffixed with Async.

Done.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)


src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs line 82 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Done.

MaxPartSize would probably be a better name.


src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs line 18 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Can I do that or do I have to inject it at the ShareFileService level since the S3FileStorage is manually constructed with its constructor? Is there a way to get around that - I hate to just pass LoggerFactories if there's a better way.

Yes, you are correct. SharedFileService is where you need to inject the logger factory. Unfortunately, there isn't a much better way to handle this.


src/SIL.Machine.AspNetCore/Services/SharedFileService.cs line 1 at r2 (raw file):

using SIL.Reporting;

Is this using needed?

Copy link
Collaborator Author

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 2 unresolved discussions (waiting on @ddaspit and @johnml1135)


src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs line 82 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

MaxPartSize would probably be a better name.

Done.


src/SIL.Machine.AspNetCore/Services/SharedFileService.cs line 1 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Is this using needed?

Done.

@johnml1135
Copy link
Collaborator

At least in one "missing test" - it would be good to test that the S3 multi-part loading works. Even if it is a test that uploads a larger file (of random data even) to S3 bucket and then downloads it again.

Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like an integration test. We currently haven't implemented any integration tests for Machine. Could this be covered in a Serval E2E test? Although I don't want to get into the business of writing E2E tests to cover small test cases deep in the system. I guess I am asking, do any of our existing E2E tests cover this code branch?

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

…); fixes Refine S3 bucket usage #61

Also removed synchronous write; fixes S3 remove sync write (async only) #64

Note: Dispose has async calls/logic because "await using" on the StreamWriters in ClearMLNMTBuildJob calls DisposeAsync on the StreamWriters but Dispose on the BaseStream. This is the only way I could find to circumvent this - other solutions are welcome given there's a workaround I'm missing.
Copy link
Collaborator

@johnml1135 johnml1135 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)

@johnml1135 johnml1135 merged commit da821a7 into master Sep 8, 2023
2 checks passed
@Enkidu93
Copy link
Collaborator Author

Enkidu93 commented Sep 8, 2023

This sounds like an integration test. We currently haven't implemented any integration tests for Machine. Could this be covered in a Serval E2E test? Although I don't want to get into the business of writing E2E tests to cover small test cases deep in the system. I guess I am asking, do any of our existing E2E tests cover this code branch?

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @Enkidu93)
is
@ddaspit @johnml1135 The easiest way I see to cover the multi-part element (if that's what you're getting at) is to change (or add a near clone of) one of the E2E tests in serval that uploads a file of more than 5MB.

This was referenced Sep 8, 2023
@johnml1135
Copy link
Collaborator

E2E test for multi-part upload can be handled here: sillsdev/serval#125

@Enkidu93 Enkidu93 deleted the s3_refinement_#61 branch October 27, 2023 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants