Fix USFM parsing bugs #229
* Fix off-by-one error
* Handle triplicate, quadruplicate, n-plicate verses
* Add test to cover triplicate verse
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
- Coverage   69.67%   69.66%   -0.02%
==========================================
  Files         374      377       +3
  Lines       31317    31374      +57
  Branches     4387     4391       +4
==========================================
+ Hits        21821    21856      +35
- Misses       8478     8498      +20
- Partials     1018     1020       +2
How does this interact with the other sources of errors?
Reviewed 4 of 5 files at r1, all commit messages.
Reviewable status: 4 of 5 files reviewed, 3 unresolved discussions (waiting on @ddaspit and @Enkidu93)
Reviewable status: 2 of 5 files reviewed, 3 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/SIL.Machine/Corpora/UsfmParser.cs
line 582 at r1:
Previously, johnml1135 (John Lambert) wrote…
Why put the try/catch here rather than in the ProcessTokens function? The ProcessToken function is already 300 lines of code ...
That's a good point. Somehow I wanted to not break the lovely simplicity of ProcessTokens 😆. Done.
src/SIL.Machine/Corpora/UsfmParsingException.cs
line 10 at r1:
Previously, johnml1135 (John Lambert) wrote…
Ahh - no more "stack empty" errors. Please look into the other error handling for these functions and make sure that there is no redundant information.
Exactly! This way, a readable error will propagate up. I couldn't see any other error handling with specific info.
tests/SIL.Machine.Tests/Corpora/UsfmMemoryTextTests.cs
line 86 at r1:
Previously, johnml1135 (John Lambert) wrote…
I would check a few more things:
- That verse one text truly concatenates
- That a verse 2 is added
- That non-verse text in all three parts of verse 1 are added properly
Done.
Branch updated from f5be462 to d9d77db.
Reviewable status: 2 of 5 files reviewed, 3 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/SIL.Machine/Corpora/UsfmParsingException.cs
line 10 at r1:
Previously, johnml1135 (John Lambert) wrote…
In UsfmTextBase, it also calls ProcessTokens and uses the ColumnNumber, among other things, to propagate the error upward. It would likely be good to remove the duplicate information from the error, even if you catch and rethrow as you have here.
Done.
Reviewed 2 of 5 files at r1, 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/SIL.Machine/Corpora/UsfmParser.cs
line 57 at r4:
); sb.Append($"column: {parser.State.ColumnNumber}, error: '{ex.Message}'"); throw new InvalidOperationException(sb.ToString(), ex);
I would rather not throw a new exception here. I want to avoid a deeply nested hierarchy of inner exceptions that can hide the initial exception. I intentionally shifted the responsibility for these types of exceptions to the classes that use the parser, so that they can add information that is relevant to their context. Also, this function is one of many ways that the UsfmParser class can be used. I don't want to have inconsistent exception handling. It would be better to throw this type of exception from the calling code.
src/SIL.Machine/Corpora/UsfmTextUpdater.cs
line 396 at r4:
private void PopNewTokens() { // if (_replace.Any())
Was this left in by accident?
…achine into fix_usfm_parsing_bugs
Reviewable status: 4 of 5 files reviewed, 5 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/SIL.Machine/Corpora/UsfmTextUpdater.cs
line 396 at r4:
Previously, ddaspit (Damien Daspit) wrote…
Was this left in by accident?
Yes, it was - thank you! Done.
Reviewable status: 4 of 5 files reviewed, 5 unresolved discussions (waiting on @ddaspit and @johnml1135)
src/SIL.Machine/Corpora/UsfmParser.cs
line 57 at r4:
Previously, ddaspit (Damien Daspit) wrote…
I would rather not throw a new exception here. I want to avoid a deeply nested hierarchy of inner exceptions that can hide the initial exception. I intentionally shifted the responsibility for these types of exceptions to the classes that use the parser, so that they can add information that is relevant to their context. Also, this function is one of many ways that the UsfmParser class can be used. I don't want to have inconsistent exception handling. It would be better to throw this type of exception from the calling code.
OK. I initially had ProcessToken throwing a custom UsfmParsingException that gave line, column, verse ref, and token context information (that way, it's the same regardless of how it is called), but John said you weren't a fan of that either, judging from an earlier PR, and committed this instead. What do you suggest, @ddaspit? The way it was was pretty unhelpful. When running the CreateUsfm test, all you get is "Stack empty" or "Index out of bounds" - it would be helpful to have a more meaningful exception that can help you diagnose the issue. I mean, of course you can just use the debugger 😆, but it doesn't seem particularly clean.
Reviewable status: 4 of 5 files reviewed, 5 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/SIL.Machine/Corpora/UsfmParser.cs
line 57 at r4:
Previously, Enkidu93 (Eli C. Lowry) wrote…
OK. I initially had ProcessToken throwing a custom UsfmParsingException that gave line, column, verse ref, and token context information (that way, it's the same regardless of how it is called), but John said you weren't a fan of that either, judging from an earlier PR, and committed this instead. What do you suggest, @ddaspit? The way it was was pretty unhelpful. When running the CreateUsfm test, all you get is "Stack empty" or "Index out of bounds" - it would be helpful to have a more meaningful exception that can help you diagnose the issue. I mean, of course you can just use the debugger 😆, but it doesn't seem particularly clean.
Here is my concern. If we add another layer of exception handling in between the original exception and the exception at the Serval/SIL.Machine.Corpora layer, then you end up with multiple levels of nesting (InvalidOperationException -> UsfmParsingException -> original exception). Each layer obscures the original exception even more. I was hoping to keep it to two levels: the original exception and the calling-code exception that adds relevant contextual information.
Reviewable status: 4 of 5 files reviewed, 4 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
src/SIL.Machine/Corpora/UsfmParser.cs
line 57 at r4:
Previously, Enkidu93 (Eli C. Lowry) wrote…
I understand. Particularly with regard to having a helpful error message bubble up to the other end, you're forced not only to pass the exception into the constructor but also to append its message to the new exception's message so that it stays interpretable all the way up, which is awkward. And it needs to be an InvalidOperationException, right? If so, and if your main concern is the nesting, could we throw the InvalidOperationException from ProcessTokens? Or perhaps, instead of calling ProcessTokens directly, overload Parse again to take tokens as a first argument, and then that can be called from GetVersesInDocOrder? That way, we wouldn't have multiple places where we throw this special, parsing-info-laden InvalidOperationException. Or is the concern that you'd like to have project information in the exception, and that isn't possible from within the parser?
The calling code might have contextual information that isn't available to the parser. The corpus class adds project information to the exception. Pushing the responsibility to the calling code for how exceptions are handled allows the calling code to customize the exceptions for their specific requirements.
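A minimal C# sketch of the two-level approach described above, under stated assumptions: `HypotheticalParser` and `HypotheticalCorpus` are illustrative stand-ins (not SIL.Machine's actual UsfmParser or corpus classes), the token format is a toy, and the message wording only echoes the snippet quoted at line 57. The parser lets the original exception escape untouched; only the calling code, which knows the project, wraps it once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the parser: tracks its position but never catches or wraps exceptions.
public class HypotheticalParser
{
    public int ColumnNumber { get; private set; }

    // Toy input: a token ending in '*' closes the most recently opened marker.
    public void ProcessTokens(IReadOnlyList<string> tokens)
    {
        var openMarkers = new Stack<string>();
        for (int i = 0; i < tokens.Count; i++)
        {
            ColumnNumber = i + 1;
            if (tokens[i].EndsWith("*"))
                openMarkers.Pop(); // an unmatched end marker throws InvalidOperationException ("Stack empty.")
            else
                openMarkers.Push(tokens[i]);
        }
    }
}

// Hypothetical calling code: it knows the project, so it adds that context in a single wrapper.
public static class HypotheticalCorpus
{
    public static void Parse(string projectName, IReadOnlyList<string> tokens)
    {
        var parser = new HypotheticalParser();
        try
        {
            parser.ProcessTokens(tokens);
        }
        catch (Exception ex)
        {
            // Exactly two levels: this exception plus the original one as InnerException.
            throw new InvalidOperationException(
                $"An error occurred while parsing project '{projectName}'. "
                    + $"Column: {parser.ColumnNumber}, error: '{ex.Message}'",
                ex
            );
        }
    }
}
```

Because the parser itself never wraps, the pattern stays the same however the parser is driven, and each caller chooses which context (project name, book, position) to attach.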
I submitted a commit that adds a new ParatextProjectTextUpdaterBase class. It mirrors the ParatextProjectSettingsParserBase class, and it can be used to update the USFM from a Paratext project. @Enkidu93, you will need to fill out the missing zip classes. You can follow the ZipParatextProjectSettingsParserBase and ZipParatextProjectSettingsParser classes. This will give us a logical place to throw an exception with project information, similar to what we do in UsfmTextBase. We will need to update Serval to use the new classes.
Reviewable status: 4 of 9 files reviewed, 4 unresolved discussions (waiting on @Enkidu93 and @johnml1135)
That sounds good. I'm on it 👍.
Reviewable status: 4 of 9 files reviewed, 4 unresolved discussions (waiting on @ddaspit and @johnml1135)
What is the rationale behind having a
I'll just push what I have and you can review haha.
Still a draft - I'll ping when it's time for review (i.e., once I've prepped the parallel PR in Serval).
Reviewable status: 4 of 11 files reviewed, 4 unresolved discussions (waiting on @ddaspit and @johnml1135)
@johnml1135 @ddaspit I think it's ready for review
Reviewable status: 3 of 11 files reviewed, 4 unresolved discussions (waiting on @ddaspit and @johnml1135)
Parallel PR: sillsdev/serval#447
Reviewed 2 of 5 files at r6, 1 of 4 files at r7, 5 of 5 files at r9, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @johnml1135)
Reviewed 1 of 5 files at r1, 1 of 2 files at r2, 2 of 3 files at r3, 2 of 5 files at r6, 1 of 4 files at r7, 5 of 5 files at r9, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
tests/SIL.Machine.Tests/Corpora/UsfmMemoryTextTests.cs
line 86 at r1:
Previously, Enkidu93 (Eli C. Lowry) wrote…
Done.
Ok, so the duplicates are thrown on the ground, but each non-verse
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @Enkidu93)
Fixes #228