
Add missing inflected forms to glossary and remove unused terms #1

Merged: 1 commit, Jul 25, 2024


apasel422
Contributor

This is not ready to be merged yet, as there are still some terms flagged by m-070 that I'm investigating.

Identified in standardebooks/tools#732.

@apasel422
Contributor Author

There are a few things left that I'm not sure about:

  1. `savo` is listed in the `<search-key-map>`, but the corresponding definition has an acute accent on the `o`. In any case, neither the accented nor unaccented form appears in the text, so I'm inclined to remove it from the `<search-key-map>`.
  2. `loovu` only appears in the text as part of `dloovu`, but each word has its own definition in the glossary. Unfortunately for `loovu` that definition is just `See.`, which is ambiguous as to whether `dloovu` means the verb "see" or whether it is an instruction to the reader to look at the immediately following entry for `lovó`. Based on context, the latter is the intent, but in general, there are a number of similar problems in this glossary and I wonder if it makes sense editorially to clean it up a bit in that regard by using multiple `<dt>`s for the same `<dd>`.
  3. `dukker` does not appear in the text as is, but rather as other seemingly inflected forms. As such, I'm inclined to remove it from the `<search-key-map>`.
  4. `boró` appears with the acute accent on the second `o` only in the glossary. Elsewhere it seems to be written without one, e.g. [here](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/text/chapter-1-54.xhtml#L61). I'm not sure if the text is in error, or the glossary, or if the difference is intentional.
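The recurring question in items 1 to 3 is whether a search key actually occurs in the text. The sketch below approximates that kind of check, assuming m-070 flags search-key values with no whole-word occurrence in the book; it is not the `se lint` implementation, and the sample map, hrefs, and function names are hypothetical. It also assumes the `<match value>`/`<value value>` layout described in the EPUB Dictionaries and Glossaries specification.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of EPUB search key map documents.
OPS_NS = "http://www.idpf.org/2007/ops"

# Hypothetical, heavily trimmed map; the hrefs are illustrative only.
SAMPLE_MAP = f"""<search-key-map xmlns="{OPS_NS}">
    <search-key-group href="text/glossary.xhtml#savo">
        <match value="savo"/>
    </search-key-group>
    <search-key-group href="text/glossary.xhtml#dloovu">
        <match value="dloovu"/>
    </search-key-group>
</search-key-map>"""


def search_key_values(map_xml: str) -> list[str]:
    """Collect every match/@value plus any nested value/@value alternates."""
    root = ET.fromstring(map_xml)
    values = []
    for match in root.iter(f"{{{OPS_NS}}}match"):
        values.append(match.get("value"))
        for alt in match.findall(f"{{{OPS_NS}}}value"):
            values.append(alt.get("value"))
    return values


def unused_search_keys(map_xml: str, text: str) -> list[str]:
    """Return search key values that never occur as a whole word in the text."""
    unused = []
    for value in search_key_values(map_xml):
        # \b requires a word boundary on both sides, so "loovu" inside
        # "dloovu" does not count as an occurrence.
        pattern = rf"\b{re.escape(value)}\b"
        if not re.search(pattern, text, flags=re.IGNORECASE):
            unused.append(value)
    return unused


print(unused_search_keys(SAMPLE_MAP, "A line of sample text mentioning dloovu."))  # ['savo']
```

The whole-word rule is what makes a term like `loovu` look unused even though it appears inside `dloovu`.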

@apasel422 apasel422 marked this pull request as ready for review July 17, 2024 21:29
@apasel422 apasel422 changed the title from "Add missing inflected forms to glossary" to "Add missing inflected forms to glossary and remove unused terms" Jul 17, 2024
@acabal
Member

acabal commented Jul 19, 2024

> 1. [`savo`](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/glossary-search-key-map.xml#L739) is listed in the `<search-key-map>`, but [the corresponding definition](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/text/glossary.xhtml#L1446) has an acute accent on the `o`. In any case, neither the accented nor unaccented form appears in the text, so I'm inclined to remove it from the `<search-key-map>`.

Yes, remove

> 2. [`loovu`](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/glossary-search-key-map.xml#L476) only appears in the text as part of `dloovu`, but each word has its own definition in the glossary. Unfortunately for `loovu` [that definition](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/text/glossary.xhtml#L927) is just `See.`, which is ambiguous as to whether `dloovu` _means_ the verb `see` or whether it is an instruction to the reader to look at the immediately following entry for `lovó`. Based on context, the latter is the intent, but in general, there are a number of similar problems in this glossary and I wonder if it makes sense editorially to clean it up a bit in that regard by using multiple `<dt>`s for the same `<dd>`.

Did David forget to include a link? What does the definition of loovu look like in the page scans? I'm not opposed to a cleanup but we should confirm with the scans to make sure it's not our transcription that's the problem.

> 3. [`dukker`](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/glossary-search-key-map.xml#L254) does not appear in the text as is, but rather as other seemingly inflected forms. As such, I'm inclined to remove it from the `<search-key-map>`.

This sounds like another case of where we should have the root somewhere in the search key map, say if the user selects just the dukker portion of the longer inflected word.
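To illustrate the point about a reader selecting only the `dukker` portion of a longer word, here is a toy model. It assumes a reading system looks up the selected string verbatim (case-insensitively) among the search key strings, which is a simplification; `look_up`, the anchor `#dukker`, and the inflected form `dukkerin` are placeholders, not taken from the book or from any real reading system.

```python
from typing import Optional


# Toy model of a reading system's glossary lookup (an assumption, not how any
# specific reader works): the selection is matched verbatim, ignoring case.
def look_up(selection: str, search_keys: dict[str, str]) -> Optional[str]:
    """Return the glossary anchor for the selected text, or None if not found."""
    return search_keys.get(selection.strip().lower())


# "dukkerin" is a placeholder inflected form, not necessarily one from the book.
inflected_only = {"dukkerin": "text/glossary.xhtml#dukker"}
with_root = {**inflected_only, "dukker": "text/glossary.xhtml#dukker"}

# A reader who selects only the root portion of the longer word:
print(look_up("dukker", inflected_only))  # None
print(look_up("dukker", with_root))       # text/glossary.xhtml#dukker
```

Under this model, keeping the root in the map costs one extra entry and only helps readers who select part of a word.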

> 4. [`boró`](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/glossary-search-key-map.xml#L68) appears with the acute accent on the second `o` only [in the glossary](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/text/glossary.xhtml#L140). Elsewhere it seems to be written without one, e.g. [here](https://github.com/standardebooks/george-borrow_lavengro/blob/0277ad8df73109e92a6adec5974f59dd1c03729e/src/epub/text/chapter-1-54.xhtml#L61). I'm not sure if the text is in error, or the glossary, or if the difference is intentional.

Check the page scans, and if the page scans disagree then go with the one that occurs more often.

@apasel422
Contributor Author

I've removed savo, but I think items 2-4 need to be investigated by a subject-matter expert.

@acabal
Member

acabal commented Jul 23, 2024

We don't have any Romani subject-matter experts; just make your best judgement. It can always be fixed later if someone more knowledgeable comes around.

@apasel422
Contributor Author

> What does the definition of loovu look like in the page scans?

It seems to match what we have. I've decided to just remove the `loovu` search term, since there's already one for `dloovu` and I'm not sure why a reader would highlight only the last five characters of a word they're unfamiliar with and whose morpheme structure they are presumably also unfamiliar with. That said, we should probably still make the editorial decision to replace `See.` with `See <a href="#lovo">lovó</a>`.
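The proposed `See.` cleanup could be scripted roughly as follows. This sketch works on a simplified list of `(term, id, definition)` tuples rather than the real glossary XHTML, and it assumes a bare `See.` always points at the immediately following entry, which holds for `loovu` per the discussion above but would need checking against the scans for any other entry. Only `#lovo` comes from the proposal; the other id and the placeholder definition are hypothetical.

```python
from html import escape

# Each entry is (term, anchor id, definition) in glossary order. This is a
# simplified stand-in for the real <dt>/<dd> markup, not the book's structure.
Entry = tuple[str, str, str]


def link_bare_see(entries: list[Entry]) -> list[Entry]:
    """Rewrite a definition that is only 'See.' into a link to the next entry."""
    fixed = []
    for i, (term, anchor, definition) in enumerate(entries):
        # Only rewrite when there is a following entry to point at.
        if definition.strip() == "See." and i + 1 < len(entries):
            next_term, next_anchor, _ = entries[i + 1]
            definition = f'See <a href="#{escape(next_anchor)}">{escape(next_term)}</a>'
        fixed.append((term, anchor, definition))
    return fixed


glossary = [
    ("loovu", "loovu", "See."),
    ("lovó", "lovo", "(definition of lovó)"),
]
for term, _, definition in link_bare_see(glossary):
    print(term, "->", definition)
# loovu -> See <a href="#lovo">lovó</a>
# lovó -> (definition of lovó)
```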

> Check the page scans, and if the page scans disagree then go with the one that occurs more often.

Unfortunately the scan is cut off on Google Books, but Project Gutenberg seems to match what we have, so I've chosen to add the unaccented version as an additional `<value/>` and retain the accented form, since otherwise it's a bit weird that the glossary spells it that way but searching for it won't find it.
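Keeping the accented form and adding the unaccented spelling as an extra `<value/>` could be automated along these lines. This is a sketch, assuming the `<match value>`/`<value value>` structure from the EPUB Dictionaries and Glossaries specification; the sample map and its href are hypothetical, and stripping every combining mark is a blunt rule that should only be applied where, as with `boró`, the text actually uses the unaccented spelling.

```python
import unicodedata
import xml.etree.ElementTree as ET

OPS_NS = "http://www.idpf.org/2007/ops"
# Serialize the OPS namespace as the default (unprefixed) namespace.
# Note: this changes a global, module-level ElementTree setting.
ET.register_namespace("", OPS_NS)

# Hypothetical one-entry map; the href is illustrative only.
SAMPLE_MAP = f"""<search-key-map xmlns="{OPS_NS}">
    <search-key-group href="text/glossary.xhtml#boro">
        <match value="boró"/>
    </search-key-group>
</search-key-map>"""


def strip_accents(word: str) -> str:
    """Remove combining marks after NFD decomposition, e.g. 'boró' -> 'boro'."""
    decomposed = unicodedata.normalize("NFD", word)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def add_unaccented_values(map_xml: str) -> str:
    """Keep each accented match/@value and add its unaccented form as a <value/>."""
    root = ET.fromstring(map_xml)
    for match in root.iter(f"{{{OPS_NS}}}match"):
        original = match.get("value")
        plain = strip_accents(original)
        existing = {v.get("value") for v in match.findall(f"{{{OPS_NS}}}value")}
        # Skip terms without accents and forms that are already listed.
        if plain != original and plain not in existing:
            ET.SubElement(match, f"{{{OPS_NS}}}value", {"value": plain})
    return ET.tostring(root, encoding="unicode")


print(add_unaccented_values(SAMPLE_MAP))
```

Because existing `<value/>` entries are checked first, running the function twice does not add a duplicate.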

> This sounds like another case of where we should have the root somewhere in the search key map, say if the user selects just the dukker portion of the longer inflected word.

Done.

@acabal acabal merged commit 2101942 into standardebooks:master Jul 25, 2024
@acabal
Member

acabal commented Jul 25, 2024

OK, thanks! Does this wrap it up for m-070 or is there still more to do?

@apasel422 apasel422 deleted the glossary branch July 25, 2024 18:24
@apasel422
Contributor Author

@acabal That should be it! Thanks for your guidance.

@acabal
Member

acabal commented Jul 25, 2024

OK, so is the m-070 PR ready to go as-is or did it need further tweaks after all?

@apasel422
Contributor Author

apasel422 commented Jul 25, 2024

It's ready.
