Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix corpus e2e tests #509

Merged
merged 3 commits into from
Oct 15, 2024
Merged

Fix corpus e2e tests #509

merged 3 commits into from
Oct 15, 2024

Conversation

johnml1135
Copy link
Collaborator

@johnml1135 johnml1135 commented Oct 11, 2024

A few updates - this should resolve all issues.


This change is Reviewable

@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 15 lines in your changes missing coverage. Please review.

Project coverage is 56.67%. Comparing base (261dde1) to head (a913230).

Files with missing lines Patch % Lines
...hared/Services/ServalTranslationEngineServiceV1.cs 0.00% 9 Missing ⚠️
...l/src/Serval.Translation/Services/EngineService.cs 86.11% 1 Missing and 4 partials ⚠️
...rval.Machine.Shared/Services/PreprocessBuildJob.cs 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #509      +/-   ##
==========================================
+ Coverage   56.62%   56.67%   +0.04%     
==========================================
  Files         299      299              
  Lines       15592    15625      +33     
  Branches     2149     2154       +5     
==========================================
+ Hits         8829     8855      +26     
- Misses       6109     6114       +5     
- Partials      654      656       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Enkidu93 Enkidu93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing this, John! Seems like you got to familiarize yourself with the new code just like you wanted 😆

Reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ddaspit and @johnml1135)


src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 469 at r1 (raw file):

    }

    private static IEnumerable<Row> GetPretranslateCorpusNoTarget(ITextCorpus[] srcCorpora)

Seems like this method and AlignPretranslateRows could be merged into a single method. That might allow the logic in WriteDataFiles to be simpler. Just a thought - not necessary.

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 469 at r1 (raw file):

Previously, Enkidu93 (Eli C. Lowry) wrote…

Seems like this method and AlignPretranslateRows could be merged into a single method. That might allow the logic in WriteDataFiles to be simpler. Just a thought - not necessary.

I wanted to, but the TextRow and ParallelTextRow are incompatible. Slicing it up looked like a bit too much for me. I could possibly make a another class, etc., but I think it is cleaner just to have two functions. Check it out yourself - row.SourceText.Length > 0 vs row.Segment.Count > 0 and !row.IsTargetRangeStart && row.IsTargetInRange vs !row.IsRangeStart && row.IsInRange.

@johnml1135
Copy link
Collaborator Author

Also, Machine will need to be re-released to handle the infinite loop. The E2E tests are failing because of it.

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @Enkidu93 and @johnml1135)


src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 469 at r1 (raw file):

Previously, johnml1135 (John Lambert) wrote…

I wanted to, but the TextRow and ParallelTextRow are incompatible. Slicing it up looked like a bit too much for me. I could possibly make a another class, etc., but I think it is cleaner just to have two functions. Check it out yourself - row.SourceText.Length > 0 vs row.Segment.Count > 0 and !row.IsTargetRangeStart && row.IsTargetInRange vs !row.IsRangeStart && row.IsInRange.

I think that if you pass in an empty target text corpus to the AlignPretranslateCorpus, then it will work for this case.


src/Machine/src/Serval.Machine.Shared/Services/ServalTranslationEngineServiceV1.cs line 308 at r2 (raw file):

            PretranslateTextIds = pretranslateFilter == FilterChoice.TextIds ? pretranslateTextIds : null
        };
        if (source.PretranslateAll)

GetFilterChoice would be the appropriate place to perform this check. At least, that was where we use to perform this check.


src/Serval/src/Serval.Translation/Services/EngineService.cs line 604 at r2 (raw file):

            new() { Language = source.TargetLanguage, Files = { source.TargetFiles.Select(Map) } };

        if (trainingCorpus == null || (trainingCorpus.TextIds is null && trainingCorpus.ScriptureRange is null))

Might as well use trainingCorpus is null here to make it consistent.


src/Serval/src/Serval.Translation/Services/EngineService.cs line 645 at r2 (raw file):

        }
        if (
            pretranslateCorpus == null

See might comment above.

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Services/ServalTranslationEngineServiceV1.cs line 308 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

GetFilterChoice would be the appropriate place to perform this check. At least, that was where we use to perform this check.

I see now - done.

@johnml1135
Copy link
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 604 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Might as well use trainingCorpus is null here to make it consistent.

done - trying to add IDE0041 to editorconfig, but it is not catching the errors. Skipping for now.

@johnml1135
Copy link
Collaborator Author

src/Serval/src/Serval.Translation/Services/EngineService.cs line 645 at r2 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

See might comment above.

done

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 469 at r1 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I think that if you pass in an empty target text corpus to the AlignPretranslateCorpus, then it will work for this case.

Done.

Put "TrainOnAll" and "PretranslateAll" back in
Don't do https redirection
Fixes for E2E testing
Add E2E test
Fix echo
@johnml1135 johnml1135 force-pushed the Fix_corpus_e2e_tests branch from 7ec02cf to 6c95566 Compare October 15, 2024 18:27
Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)


src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 258 at r3 (raw file):

                targetCorpora.Length > 0
                    ? targetCorpora[0].TextCorpus
                    : new DictionaryTextCorpus(new List<MemoryText>());

Passing in an empty list shouldn't be necessary.

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 258 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

Passing in an empty list shouldn't be necessary.

The fields are nullable in Machine - different version of dotnet...

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 11 of 12 files reviewed, 1 unresolved discussion (waiting on @johnml1135)


src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 258 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

The fields are nullable in Machine - different version of dotnet...

I meant that you don't need to pass an empty list into the DictionaryTextCorpus constructor. I think this change will break things.

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 258 at r3 (raw file):

Previously, ddaspit (Damien Daspit) wrote…

I meant that you don't need to pass an empty list into the DictionaryTextCorpus constructor. I think this change will break things.

Yes, it does: System.NullReferenceException: Object reference not set to an instance of an object.
machine-job-cntr | at SIL.Machine.Corpora.ParallelTextCorpus.GetRows(IEnumerable1 textIds)+MoveNext() in /home/johnml1135/repos/machine/src/SIL.Machine/Corpora/ParallelTextCorpus.cs:line 39 machine-job-cntr | at System.Linq.Enumerable.SelectManySingleSelectorIterator2.MoveNext()
machine-job-cntr | at Serval.Machine.Shared.Services.PreprocessBuildJob.AlignPretranslateCorpus(ITextCorpus[] srcCorpora, ITextCorpus trgCorpus)+MoveNext() in /home/johnml1135/repos/serval/src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs:line 424

@johnml1135
Copy link
Collaborator Author

src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs line 258 at r3 (raw file):

Previously, johnml1135 (John Lambert) wrote…

Yes, it does: System.NullReferenceException: Object reference not set to an instance of an object.
machine-job-cntr | at SIL.Machine.Corpora.ParallelTextCorpus.GetRows(IEnumerable1 textIds)+MoveNext() in /home/johnml1135/repos/machine/src/SIL.Machine/Corpora/ParallelTextCorpus.cs:line 39 machine-job-cntr | at System.Linq.Enumerable.SelectManySingleSelectorIterator2.MoveNext()
machine-job-cntr | at Serval.Machine.Shared.Services.PreprocessBuildJob.AlignPretranslateCorpus(ITextCorpus[] srcCorpora, ITextCorpus trgCorpus)+MoveNext() in /home/johnml1135/repos/serval/src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs:line 424

Done

Copy link
Contributor

@ddaspit ddaspit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)

@johnml1135 johnml1135 merged commit 30697ae into main Oct 15, 2024
2 checks passed
@johnml1135 johnml1135 deleted the Fix_corpus_e2e_tests branch October 24, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants