Fix m-070 and add tests #732
It's unclear to me whether the inflected forms that aren't present in the text should be present in the |
I think there's still an error here somewhere, because in your output for Morte d'Arthur it says
Correct on both counts. |
Good find. The issue is that the call to |
You could always clone the dom and strip out all endnotes before testing |
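A minimal sketch of the clone-and-strip idea, using only the Python standard library rather than whatever DOM wrapper the lint actually uses. The helper name `text_without_endnotes` is hypothetical, and it assumes endnotes are marked with `epub:type="endnote"`; adjust the selector if the markup differs.

```python
import copy
import xml.etree.ElementTree as ET

# Assumption: endnotes carry epub:type="endnote" in this namespace.
EPUB_NS = "http://www.idpf.org/2007/ops"


def text_without_endnotes(root):
    """Return the text of a deep copy of `root` with endnote elements removed.

    The original tree is left untouched, so other checks can still see the notes.
    """
    clone = copy.deepcopy(root)

    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in clone.iter() for child in parent}

    for el in list(clone.iter()):
        types = (el.get(f"{{{EPUB_NS}}}type") or "").split()
        if "endnote" in types and el in parent_of:
            # Note: remove() also drops the element's tail text,
            # which is normally just whitespace between list items.
            parent_of[el].remove(el)

    return "".join(clone.itertext())
```

The glossary check could then run against the returned string, so a term that appears only inside an endnote would not count as appearing in the text.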
I've fixed the |
Where's the new gist? |
Sorry, updated at the same URL: https://gist.github.com/apasel422/0fcaebed784fbc7fa45282cb1bbbe037. |
Bulfinch still matches |
https://github.com/search?q=repo%3Astandardebooks%2Fthomas-bulfinch_bulfinchs-mythology%20pariah&type=code shows that it only appears in the plural. |
I suppose the confusion here is that while Additionally, the glossary spec has this to say about matches with multiple
So I understand the spec to mean that if a I haven't looked at the glossary spec in some time. I think it would be worth it to review it to make sure we're not removing useful information here, even if a bounded string doesn't strictly occur in the text. |
This sounds reasonable, but since we don't have a way to determine whether a particular glossary value is the stem value, perhaps the right thing to do would be to ensure that at least one of the values for a term does appear, rather than require that all of its values do. As is, we shouldn't be assuming that a stem is a prefix of all of its inflected terms, and the current behavior that causes |
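To make the "at least one value appears" rule concrete, here is a rough sketch. It is not the lint's actual implementation: the function names are hypothetical, the search key map is parsed without its XML namespace for brevity, and "bounded" is approximated with word-character lookarounds.

```python
import re
import xml.etree.ElementTree as ET


def values_for_group(group):
    """Collect the value attribute of every <match> and nested <value> in a group."""
    return [
        el.get("value")
        for el in group.iter()
        if el.tag in ("match", "value") and el.get("value")
    ]


def occurs_bounded(value, text):
    """True if `value` occurs in `text` without a word character on either side."""
    pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def unused_groups(search_key_map_xml, text):
    """Return the href of each group for which none of its values occurs in the text."""
    root = ET.fromstring(search_key_map_xml)
    unused = []
    for group in root.iter("search-key-group"):
        values = values_for_group(group)
        # A group is reported only when *every* value is absent.
        if not any(occurs_bounded(v, text) for v in values):
            unused.append(group.get("href"))
    return unused
```

Under this rule a group is kept as long as any one of its forms is present, so no assumption is made about which value is the stem or whether the stem is a prefix of the inflected forms.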
Maybe it's enough to simply check child |
With this, a few more entries are removed (see diff). Some notes:
More notes:
This update fixes So how can we fix this test so this valid case doesn't get caught? |
Maybe I'm misunderstanding the glossary spec, but wouldn't such an entry be more precisely represented as:
<search-key-group href="text/glossary.xhtml#plutonic-rocks">
	<match value="plutonic"/>
	<match value="plutonic rock">
		<value value="plutonic rocks"/>
	</match>
</search-key-group>
With that formulation, I think the current PR would avoid emitting any errors for this case. For example, this seems similar to https://idpf.org/epub/dict/epub-dict-20140204.html#id.7fhcrrky2qjt. However, as I suggested above, if you want to avoid having to change that |
I suppose another option would be to continue requiring all |
OK, yes, I agree with your proposed change to the search key map. What we should do next is go through the remaining errors this update highlights, and confirm that we can fix them in a similar way that makes sense. As we do that, we can submit PRs to fix them. If we encounter a surprise edge case then we can revisit the test. If we're able to fix the entire corpus with no surprises then we can merge this in. Do you have time to do that? |
Will do. I'll post an updated list of speculative changes later, perhaps with example PRs against the affected repos. |
I ran the updated lint against the corpus entries that contain a glossary, and it revealed a large number of unused entries. Some of these are typos (e.g. Ygdrasil as the glossary term vs. Ygdrasill appearing in the text of Bulfinch's Mythology), while others are a singular/plural form that doesn't appear in that inflected form in the text (e.g. trilobite vs. trilobites in The Origin of Species).
Fixes #731