-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update machine to 3.5.1 and small bug #546
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #546 +/- ##
==========================================
- Coverage 57.03% 56.97% -0.06%
==========================================
Files 302 302
Lines 15605 15620 +15
Branches 2151 2153 +2
==========================================
Hits 8900 8900
- Misses 6063 6077 +14
- Partials 642 643 +1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 97 at r1 (raw file):
buildOptionsObject = JsonSerializer.Deserialize<JsonObject>(buildOptions); using MemoryStream sourceStream = new();
What exactly is prompting this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only needed for train
and not for pretranslate
?
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)
Previously, Enkidu93 (Eli C. Lowry) wrote…
Much crashing - it's trying to call Write on the S3StreamWriter (NotImplementedException). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This brings it back to what it was before. For some reason, the Json writer works async.
Reviewable status: 7 of 8 files reviewed, 1 unresolved discussion (waiting on @ddaspit and @Enkidu93)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r1, 4 of 5 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 116 at r3 (raw file):
if (row.SourceSegment.Length > 0 || row.TargetSegment.Length > 0) { await sourceTrainWriter.WriteLineAsync(row.SourceSegment);
WriteLineAsync
uses the OS-specific newline characters. I would prefer that we revert to the previous behavior where it always uses \n
, so that the behavior is consistent across platforms.
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
if (row.SourceSegment.Length > 0 && row.TargetSegment.Length == 0) { pretranslateWriter.WriteStartObject();
I spent a bit of time trying to understand how to properly use Utf8JsonWriter
in an async context. This class is intended for low-level, optimized use cases. It does not automatically flush the buffer to the underlying stream. It only writes when the class is disposed or FlushAsync
is explicitly called. In order to avoid writing all pretranslations to memory before flushing, we should periodically call FlushAsync
. We can use the BytesPending
property to flush after "x" amount of data is in the buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
I spent a bit of time trying to understand how to properly use
Utf8JsonWriter
in an async context. This class is intended for low-level, optimized use cases. It does not automatically flush the buffer to the underlying stream. It only writes when the class is disposed orFlushAsync
is explicitly called. In order to avoid writing all pretranslations to memory before flushing, we should periodically callFlushAsync
. We can use theBytesPending
property to flush after "x" amount of data is in the buffer.
We could flush every time 4096 (or something like that) bytes are pending.
Previously, ddaspit (Damien Daspit) wrote…
Done. |
Previously, ddaspit (Damien Daspit) wrote…
No we can't - the S3Writer itself breaks up the parts and has to upload everything in one go - otherwise we get a: Amazon.S3.AmazonS3Exception: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
No we can't - the S3Writer itself breaks up the parts and has to upload everything in one go - otherwise we get a:
Amazon.S3.AmazonS3Exception: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.
I'm confused. The S3WriteStream
is specifically designed to upload in chunks. If we are receiving an exception from writing to the stream multiple times, then it sounds like a bug. @Enkidu93 Do you have any idea what is going on here?
Previously, ddaspit (Damien Daspit) wrote…
I remember working on this about 6 months ago. The S3 bucket interface was finicky. They really want you to have the whole file ready before you start writing anything. You can break it into pieces, but all of the pieces need to be the predefined size. We could implement double buffering, but that would add more complexity. |
Previously, johnml1135 (John Lambert) wrote…
I think I know how I would do it. I will implement the double buffering in the S3 writer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ddaspit)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I think I know how I would do it. I will implement the double buffering in the S3 writer.
You mentioned getting an AmazonS3Exception
: What line is throwing that? Or could you point me to a failed E2E?
OK, seems like you have a plan - what precisely do you think is going wrong that having two buffers would fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 9 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, Enkidu93 (Eli C. Lowry) wrote…
You mentioned getting an
AmazonS3Exception
: What line is throwing that? Or could you point me to a failed E2E?OK, seems like you have a plan - what precisely do you think is going wrong that having two buffers would fix?
Can you give more details about what is happening? I don't understand what "have the whole file ready" means. It doesn't sound like double buffering is the right solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 9 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, ddaspit (Damien Daspit) wrote…
Can you give more details about what is happening? I don't understand what "have the whole file ready" means. It doesn't sound like double buffering is the right solution.
It is more likely that we are simply calling the S3 API incorrectly.
Make is so that write async can be called multiple times on S3Writer. Never have the S3 buffer grow above max size Update machine to 3.5.1
41188aa
to
244d66f
Compare
Previously, ddaspit (Damien Daspit) wrote…
I updated S3Writer - now it will break any stream written to it into parts of MaxUploadSize, while keeping the internal stream size also capped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 9 files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
I updated S3Writer - now it will break any stream written to it into parts of MaxUploadSize, while keeping the internal stream size also capped.
I think I understand what is going on. S3 has a minimum part size of 5MB, unless it is the last part of the file, so we need to buffer the stream until we have 5MB of data to send a part. Is that correct?
Previously, ddaspit (Damien Daspit) wrote…
Yes - https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 126 at r3 (raw file):
Previously, johnml1135 (John Lambert) wrote…
Yes - https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html
Good work figuring that out. This change will make the S3 code more robust.
src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs
line 138 at r6 (raw file):
} if (pretranslateWriter.BytesPending > 1024 * 1024) pretranslateWriter.FlushAsync();
You should await the FlushAsync
call.
Previously, ddaspit (Damien Daspit) wrote…
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
This change is