We have two algorithms at play in divvunspell that don't exist in hfst-ospell:
Case handling
Penalty weighting: a penalty when the first letter differs, a penalty when the last letter differs, and Damerau–Levenshtein distance for the middle letters
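To make the second point concrete, here is a minimal Python sketch of that kind of penalty scheme. It is not divvunspell's actual code: the function names, the penalty values (10.0, 10.0, 5.0), the way the penalties are summed, and the choice of the restricted "optimal string alignment" variant of Damerau–Levenshtein are all assumptions made for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent code points.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggestion_penalty(
    word: str,
    suggestion: str,
    start_penalty: float = 10.0,  # hypothetical value
    end_penalty: float = 10.0,    # hypothetical value
    mid_penalty: float = 5.0,     # hypothetical value, per middle edit
) -> float:
    """Hypothetical penalty added on top of the unmodified FST weight."""
    if not word or not suggestion:
        # Degenerate case: no first/last letter to compare.
        return mid_penalty * osa_distance(word, suggestion)
    total = 0.0
    if word[0] != suggestion[0]:
        total += start_penalty
    if word[-1] != suggestion[-1]:
        total += end_penalty
    # Everything between the first and last code point counts as "middle".
    total += mid_penalty * osa_distance(word[1:-1], suggestion[1:-1])
    return total


if __name__ == "__main__":
    for candidate in ["кера\u0304", "кека"]:
        print(candidate, suggestion_penalty("кера", candidate))
```

The sketch compares Unicode code points, so a suggestion that only appends a combining mark (as in `"кера\u0304"`) counts as having a different last letter. Whether divvunspell compares code points or grapheme clusters is exactly the kind of detail the documentation asked for below should spell out.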
Things to do to make this good:
Document somewhere sane how the algorithms behave
Add some information to --help either with a link or with the information itself
In the suggestion output for divvunspell, show the penalties, and the unmodified weights, as well as the modified weights
Document how to add the weight information to BHFST files so it can be controlled by the linguist
If possible, add a flag for disabling the penalty weighting algorithm (similar to what --no-case-handling already partly does, but with the two split into separate flags)
Just a ping: this is important for me; I have orthographic corrections that
specifically apply to the beginning and ends of words; these are given low (or
even zero!) weight.
hfst-ospell makes the correct suggestions, but divvunspell overrides some of
these with much less appropriate corrections. It would be great if I could add
some information to the .bhfst to modify this.
Here's an example.
Input: кера (final glyph is "cyrillic a")
Correct spelling: кера̄ (final glyph is "cyrillic a" + "combining macron")
Suggested spellings (hfst-ospell):
$ echo 'кера' | hfst-ospell tsez.zhfst -S | head
"кера" is NOT in the lexicon:
Corrections for "кера":
кера̄ 1.000000
кека 10.000000
кекра 10.000000
кеза 10.000000
кура 10.000000
кераз 10.000000
кеца 10.000000
кецра 10.000000