UA+RU dicts should have accents #26

hypnaceae · 2023-12-13T13:01:22Z

Ukrainian and Russian have many homographs: words that are spelled identically but are distinguished in speech by syllable stress, and in text by context or diacritics (stress marks).

Example:

до́ма: [ˈdomə]
дома́: [dɐˈma]

In the MFA dictionary, both are listed under the same unaccented spelling:

дома	0.99	0.55	0.56	1.1	d̪ o m ə
дома	0.1	0.44	1.18	0.93	d̪ ɐ m a
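To make the ambiguity concrete, here is a minimal sketch that groups these two lines by spelling. It assumes, as in MFA's probabilistic dictionary format, that each line is tab-separated as word, four numeric columns, then space-separated phones, and that the first numeric column is the pronunciation probability; `parse_dict` is a hypothetical helper, not part of MFA.

```python
from collections import defaultdict

# The two lines above, tab-separated: word, four numeric columns, phones.
DICT_LINES = [
    "дома\t0.99\t0.55\t0.56\t1.1\td̪ o m ə",
    "дома\t0.1\t0.44\t1.18\t0.93\td̪ ɐ m a",
]


def parse_dict(lines):
    """Group pronunciations by orthographic word (hypothetical helper)."""
    entries = defaultdict(list)
    for line in lines:
        word, *rest = line.split("\t")
        # Assumption: the first numeric column is the pronunciation probability.
        pron_prob = float(rest[0])
        # The last column holds the space-separated phone sequence.
        phones = rest[-1].split()
        entries[word].append((pron_prob, phones))
    return dict(entries)


lexicon = parse_dict(DICT_LINES)
print(len(lexicon["дома"]))  # one spelling, two candidate pronunciations
```

Because the key is the bare spelling, any lookup of this word returns both pronunciations, and the aligner has to pick between them using the probabilities.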

It would make sense to include accent (stress) marks in dictionary entries, for compatibility with TTS systems that auto-accent text at runtime to disambiguate homographs; as far as I'm aware, they all do. For homographs, supplying accents would remove the inherent ambiguity from the dictionary and eliminate the unnecessary reliance on MFA choosing between pronunciations probabilistically at runtime.

Like so (combining acute accent on the stressed vowel):

до́ма	0.99	0.55	0.56	1.1	d̪ o m ə
дома́	0.1	0.44	1.18	0.93	d̪ ɐ m a

Or so (a "+" after the stressed vowel):

до+ма	0.99	0.55	0.56	1.1	d̪ o m ə
дома+	0.1	0.44	1.18	0.93	d̪ ɐ m a
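The two notations carry the same information, so converting between them is mechanical. A minimal sketch, assuming "+" always directly follows the stressed vowel (as in the examples above) and the acute is the combining character U+0301 placed after its vowel; the function names and the vowel set (Russian plus Ukrainian vowel letters) are illustrative assumptions.

```python
COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT

# Assumption: Russian and Ukrainian vowel letters, lower and upper case.
VOWELS = set("аеёєиіїоуыэюяАЕЁЄИІЇОУЫЭЮЯ")


def plus_to_acute(word: str) -> str:
    """Turn a '+' that follows a vowel into U+0301 on that vowel."""
    out = []
    for ch in word:
        if ch == "+" and out and out[-1] in VOWELS:
            out.append(COMBINING_ACUTE)
        else:
            # Any other character (including a stray '+') is kept as-is.
            out.append(ch)
    return "".join(out)


def acute_to_plus(word: str) -> str:
    """U+0301 already follows its vowel, so it maps directly to '+'."""
    return word.replace(COMBINING_ACUTE, "+")


print(plus_to_acute("до+ма"), acute_to_plus("дома\u0301"))
```

No Unicode normalization is applied on purpose: NFD would also decompose letters such as "ё", "й" and "ї", which should stay intact.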

Caveat: this would require transcriptions to be accented, so the aligner code would need an extra check: if a transcription is not accented, ignore the accents in the dictionary and fall back to the pronunciation probabilities (i.e. the current behaviour). It is also not entirely trivial for a third party to add accents back into the dictionary correctly; ideally this would be done during dictionary generation, hence this issue.
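The fallback described in the caveat could look roughly like this. This is a hypothetical sketch, not MFA's code: `ACCENTED_LEXICON`, `build_fallback_index` and `lookup` are illustrative names, and accents are assumed to be stored as combining U+0301 characters. An accented word goes straight to its stressed entry; an unaccented word gets every candidate, leaving the choice to the probabilities as today.

```python
COMBINING_ACUTE = "\u0301"

# Hypothetical accented lexicon: accented spelling -> [(probability, phones)].
ACCENTED_LEXICON = {
    "до\u0301ма": [(0.99, ["d̪", "o", "m", "ə"])],
    "дома\u0301": [(0.1, ["d̪", "ɐ", "m", "a"])],
}


def strip_accents(word):
    return word.replace(COMBINING_ACUTE, "")


def build_fallback_index(lexicon):
    """Unaccented spelling -> pronunciations of every accented variant."""
    index = {}
    for accented, prons in lexicon.items():
        index.setdefault(strip_accents(accented), []).extend(prons)
    return index


def lookup(word, lexicon, fallback):
    if COMBINING_ACUTE in word and word in lexicon:
        # Accented transcription with a matching entry: no ambiguity.
        return lexicon[word]
    # Unaccented (or unknown accented) word: return all candidates and let
    # the pronunciation probabilities decide, i.e. the current behaviour.
    return fallback.get(strip_accents(word), [])


fallback = build_fallback_index(ACCENTED_LEXICON)
print(len(lookup("дома\u0301", ACCENTED_LEXICON, fallback)))  # 1 candidate
print(len(lookup("дома", ACCENTED_LEXICON, fallback)))        # 2 candidates
```

Building the unaccented index once up front keeps the per-word check cheap, and unstressed or monosyllabic words stored without accents still resolve through the same path.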
