Syllable boundaries #24

Open
wollmers opened this issue Dec 6, 2024 · 2 comments


wollmers commented Dec 6, 2024

In the default German text, the word 'Mithilfe' (hyphenation: mit·hil·fe, IPA: [mɪtˈhɪlfə]) is transliterated to Ancient Greek as 'Μιθιλφε', which IMHO should be 'Μιτἱλφε': the 't' and the 'h' belong to different syllables, so they should not be merged into 'θ'.

@davidpomerenke
Owner

Agree about the example.

The current implementation uses simple rules that do not take syllable boundaries into account. I suspect that adding syllable-boundary handling would take considerable effort for very limited benefit. (For German, the transliteration works relatively well and syllable boundaries are not a big concern; for English, there are many other problems besides syllable boundaries.)
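
To make the trade-off concrete, here is a minimal Python sketch of how the same kind of simple rules behave with and without syllable boundaries. The rule table, the function names, and the handling of 'h' as a rough breathing are assumptions made up for illustration; they are not the project's actual rules, and letter case is ignored.

```python
import unicodedata

# Hypothetical toy rules for illustration only; not the project's real rule set.
DIGRAPHS = {"th": "θ", "ph": "φ", "ch": "χ"}
LETTERS = {"m": "μ", "i": "ι", "t": "τ", "l": "λ", "f": "φ", "e": "ε"}
VOWELS = {"i", "e"}
ROUGH_BREATHING = "\u0314"  # combining mark, written after the vowel it modifies


def transliterate_chunk(chunk: str) -> str:
    """Apply the toy rules to one chunk (a whole word or a single syllable)."""
    out = []
    i = 0
    while i < len(chunk):
        pair = chunk[i:i + 2]
        if pair in DIGRAPHS:
            # Digraph rule: two Latin letters become one Greek letter.
            out.append(DIGRAPHS[pair])
            i += 2
        elif chunk[i] == "h" and i + 1 < len(chunk) and chunk[i + 1] in VOWELS:
            # 'h' before a vowel becomes a rough breathing on that vowel.
            out.append(LETTERS[chunk[i + 1]] + ROUGH_BREATHING)
            i += 2
        else:
            # Unknown letters are passed through unchanged.
            out.append(LETTERS.get(chunk[i], chunk[i]))
            i += 1
    # NFC composes vowel + combining breathing into a single code point.
    return unicodedata.normalize("NFC", "".join(out))


def transliterate(word: str, boundary: str = "\u00b7") -> str:
    """Transliterate each syllable separately so rules never span a boundary."""
    return "".join(transliterate_chunk(s) for s in word.lower().split(boundary))
```

With these toy rules, `transliterate("mithilfe")` merges 't' and 'h' into 'θ', while `transliterate("mit·hil·fe")` keeps the 'τ' and puts a rough breathing on the following iota. The hard part is not this step but obtaining the boundaries in the first place.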

One approach would be to convert to IPA before transliterating (see #8). When I wrote that issue, I had not found an IPA converter for German.

Another, more general approach could be to combine gpt-4o-mini or a similar model, which is very cheap now, with some rule-based scaffolding.

@wollmers do you know of any German (or, ideally, multilingual) IPA converters, or do you have ideas about alternative approaches? (I see you've been working on syllable counting.)

@wollmers
Author

wollmers commented Dec 8, 2024

@davidpomerenke

There is a German IPA dictionary: https://github.com/devio-at/german-ipa-dict. With some modification of the Python scripts, it should be possible to extract IPA for English, which includes syllable boundaries, e.g. 'lexicon', IPA: /ˈlɛk.sɪ.kən/.
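
As a rough sketch of how such transcriptions could be used, the following Python function splits an IPA string into syllables. It assumes that '.' and the primary and secondary stress marks are the only boundary markers; the function name is hypothetical and not part of german-ipa-dict.

```python
import re

# Assumption: syllable boundaries are marked by '.', by the primary stress
# mark (U+02C8) or by the secondary stress mark (U+02CC), nothing else.
BOUNDARY = re.compile("[.\u02c8\u02cc]")


def ipa_syllables(ipa: str) -> list[str]:
    """Split an IPA transcription into syllables at the marked boundaries."""
    core = ipa.strip("/[]")  # drop the enclosing slashes or square brackets
    return [syllable for syllable in BOUNDARY.split(core) if syllable]
```

For /ˈlɛk.sɪ.kən/ this gives three syllables. For [mɪtˈhɪlfə] it gives only two chunks, because that transcription marks nothing but the stress, which is still enough to separate the 't' from the 'h'.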

For counting syllables I use https://metacpan.org/pod/Text::Hyphen::DE, which uses the Knuth-Liang algorithm (made for TeX). You may be able to find a JavaScript implementation, or you could port it.
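
For orientation, the core of the Knuth-Liang algorithm is small enough to sketch in Python. This is not the code of Text::Hyphen::DE; the function names are hypothetical, and the two patterns in the usage note are made up for this one word, whereas real pattern files contain thousands of entries.

```python
import re


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a Liang pattern such as 't1h' into its letters and gap scores."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)  # one score per gap around the letters
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)
        else:
            pos += 1
    return letters, scores


def hyphenate(word: str, patterns: list[str],
              left_min: int = 2, right_min: int = 2) -> list[str]:
    """Return the pieces of `word` (lowercased) between allowed break points."""
    word = word.lower()
    table = dict(parse_pattern(p) for p in patterns)
    padded = f".{word}."  # '.' marks the word edges, as in TeX patterns
    gaps = [0] * (len(padded) + 1)  # gaps[k] is the gap before padded[k]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            scores = table.get(padded[start:end])
            if scores:
                # Where patterns overlap, the highest score wins.
                for offset, score in enumerate(scores):
                    gaps[start + offset] = max(gaps[start + offset], score)
    pieces, last = [], 0
    for j in range(left_min, len(word) - right_min + 1):
        # An odd score allows a break before word[j]; an even one forbids it.
        if gaps[j + 1] % 2 == 1:
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces
```

With the made-up patterns `["t1h", "l1f"]`, `hyphenate("Mithilfe", ["t1h", "l1f"])` returns `["mit", "hil", "fe"]`. A port to JavaScript would mostly be a matter of loading a real German pattern file into the same lookup table.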
