UA+RU dicts should have accents #26

Open
hypnaceae opened this issue Dec 13, 2023 · 0 comments

hypnaceae commented Dec 13, 2023

Ukrainian and Russian have many homographs that are distinguished in speech by syllable stress and in text by context or stress diacritics.

Example:

до́ма: [ˈdomə]
дома́: [dɐˈma]

This is represented in the MFA dict as:

дома	0.99	0.55	0.56	1.1	d̪ o m ə
дома	0.1	0.44	1.18	0.93	d̪ ɐ m a
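To make the ambiguity concrete, here is a minimal Python sketch that parses the two lines above and groups them by spelling. The field names in `Entry` are my reading of the column order (pronunciation probability, then three silence-related values) and should be treated as assumptions; `parse_line` and `by_word` are hypothetical helpers, not part of MFA.

```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    word: str
    prob: float                # assumed: pronunciation probability
    sil_after: float           # assumed: probability of silence after the word
    sil_before_corr: float     # assumed: silence-before correction
    nonsil_before_corr: float  # assumed: non-silence-before correction
    phones: Tuple[str, ...]


def parse_line(line: str) -> Entry:
    """Split one tab-separated dict line into its six fields."""
    word, p, s1, s2, s3, phones = line.rstrip("\n").split("\t")
    return Entry(word, float(p), float(s1), float(s2), float(s3),
                 tuple(phones.split()))


LINES = [
    "дома\t0.99\t0.55\t0.56\t1.1\td̪ o m ə",
    "дома\t0.1\t0.44\t1.18\t0.93\td̪ ɐ m a",
]

# Group entries by spelling: both pronunciations land under the same key,
# so the spelling alone cannot tell them apart.
by_word = defaultdict(list)
for line in LINES:
    entry = parse_line(line)
    by_word[entry.word].append(entry)
```

With unaccented keys, `by_word["дома"]` holds both pronunciations, and only the probabilities are left to choose between them.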

It would make sense to include accent markers in dict entries, for compatibility with TTS systems that use auto-accenting for disambiguation at runtime (which is all of them, as far as I'm aware). Supplying accents would remove the ambiguity between homographs in the dict, so MFA would no longer need to rely on pronunciation probabilities to pick between them at runtime.

Like so:

до́ма	0.99	0.55	0.56	1.1	d̪ o m ə
дома́	0.1	0.44	1.18	0.93	d̪ ɐ m a

Or, with a plus sign after the stressed vowel:

до+ма	0.99	0.55	0.56	1.1	d̪ o m ə
дома+	0.1	0.44	1.18	0.93	d̪ ɐ m a
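The two notations are easy to convert between, since in both the marker directly follows the stressed vowel. A rough Python sketch, assuming the diacritic is the combining acute accent (U+0301) and that `+` never occurs in a word for any other reason; the function names are made up for illustration:

```python
ACUTE = "\u0301"  # combining acute accent, written after the stressed vowel


def acute_to_plus(word: str) -> str:
    """Rewrite an acute-accented word in plus notation."""
    return word.replace(ACUTE, "+")


def plus_to_acute(word: str) -> str:
    """Rewrite a plus-notation word with the combining acute accent."""
    return word.replace("+", ACUTE)


def strip_stress(word: str) -> str:
    """Remove either kind of stress marker, giving the plain spelling."""
    return word.replace(ACUTE, "").replace("+", "")
```

Only U+0301 is removed, rather than all combining marks, so that letters such as й, ё and ї are left intact even if the text happens to be in decomposed (NFD) form.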

Caveat: this would require transcriptions to have accents, so the aligner code would need an extra check: if the transcription is not accented, ignore the accents in the dict and fall back to the probabilities (i.e. the current behaviour). It is also not entirely trivial for a third party to add accents back into the dict properly; ideally this would be done during dict generation, hence this issue.
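The fallback check could look roughly like the Python sketch below. This is not MFA code: `build_index` and `lookup` are hypothetical names, the check is done per token rather than per transcription, and only the combining-acute notation is handled.

```python
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent


def strip_stress(word):
    return word.replace(ACUTE, "")


def build_index(entries):
    """Index (word, prob, phones) entries by accented and by plain spelling."""
    exact = defaultdict(list)  # accented spelling -> pronunciations
    plain = defaultdict(list)  # unaccented spelling -> all pronunciations
    for word, prob, phones in entries:
        exact[word].append((prob, phones))
        plain[strip_stress(word)].append((prob, phones))
    return exact, plain


def lookup(token, exact, plain):
    """Return candidate pronunciations for one transcription token."""
    if ACUTE in token and token in exact:
        # Accented token with a matching entry: no ambiguity left.
        return exact[token]
    # Unaccented (or unknown accented) token: ignore accents and return every
    # candidate, leaving the choice to the probabilities as happens today.
    return plain.get(strip_stress(token), [])


ENTRIES = [
    ("до\u0301ма", 0.99, "d̪ o m ə"),
    ("дома\u0301", 0.1, "d̪ ɐ m a"),
]
exact, plain = build_index(ENTRIES)
```

Here `lookup("до\u0301ма", exact, plain)` returns only the first pronunciation, while the unaccented `lookup("дома", exact, plain)` returns both, so existing unaccented corpora would keep working unchanged.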
