Ukrainian and Russian have many homographs that are disambiguated in speech by syllable stress, and in text by context or by stress diacritics.
Example:
до́ма: [ˈdomə]
дома́: [dɐˈma]
This is represented in the MFA dict as:
дома 0.99 0.55 0.56 1.1 d̪ o m ə
дома 0.1 0.44 1.18 0.93 d̪ ɐ m a
It would make sense to include accent markers in dict entries for compatibility with TTS systems that use auto-accenting for disambiguation at runtime, which is all of them, as far as I'm aware. For homographs, supplying accents would remove the ambiguity in the dict and the unnecessary reliance on probabilistic identification at MFA runtime.
Like so:
до́ма 0.99 0.55 0.56 1.1 d̪ o m ə
дома́ 0.1 0.44 1.18 0.93 d̪ ɐ m a
Or so:
до+ма 0.99 0.55 0.56 1.1 d̪ o m ə
дома+ 0.1 0.44 1.18 0.93 d̪ ɐ m a
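For illustration, the two notations above can be converted into each other mechanically. This is a minimal sketch, assuming stress is written as the combining acute accent (U+0301) right after the stressed vowel, or as a `+` in the same position. The helper names `acute_to_plus` and `plus_to_acute` are hypothetical and not part of MFA.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, assumed to be the stress diacritic


def acute_to_plus(word: str) -> str:
    """Rewrite a stress diacritic as a '+' after the stressed vowel."""
    # NFD makes sure the accent is a separate code point before replacing it;
    # NFC afterwards recomposes unrelated letters such as 'й' or 'ї'.
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, "+"))


def plus_to_acute(word: str) -> str:
    """Rewrite a '+' stress marker as a combining acute accent."""
    return unicodedata.normalize("NFC", word.replace("+", ACUTE))


if __name__ == "__main__":
    print(acute_to_plus("до\u0301ма"))
    print(plus_to_acute("дома+"))
```

The `+` form is easier to type and to grep for, while the diacritic form matches what auto-accenting tools typically emit. Either could be stored as long as the transcription side uses the same convention.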
Caveat: this would require transcriptions to have accents, so an extra check would need to be added in the aligner code to ignore the accents in the dict and fall back to probabilities (i.e. the current behaviour) if the transcription is not accented. It is also not entirely trivial to add accents back into the dict properly as a third party. Ideally this would be done during dict generation, hence this issue.
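The fallback described in the caveat could look roughly like the sketch below. It is not MFA code: `strip_stress`, `build_index`, `candidates`, and the in-memory `ENTRIES` list are hypothetical names for illustration, and stress is assumed to be marked with the combining acute accent (U+0301). An accented token selects only its matching pronunciation. An unaccented token gets every pronunciation of the bare spelling, leaving the choice to the existing probabilities.

```python
import unicodedata
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent, assumed to be the stress diacritic

# Hypothetical accented dict entries: (word, phone string)
ENTRIES = [
    ("до\u0301ма", "d̪ o m ə"),
    ("дома\u0301", "d̪ ɐ m a"),
]


def strip_stress(word):
    """Remove stress diacritics while keeping letters such as 'й' intact."""
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def build_index(entries):
    """Index pronunciations by accented spelling and by bare spelling."""
    exact = defaultdict(list)
    plain = defaultdict(list)
    for word, phones in entries:
        key = unicodedata.normalize("NFC", word)
        exact[key].append(phones)
        plain[strip_stress(key)].append(phones)
    return exact, plain


def candidates(token, exact, plain):
    """Return the pronunciations the aligner should consider for a token."""
    key = unicodedata.normalize("NFC", token)
    accented = ACUTE in unicodedata.normalize("NFD", key)
    if accented and key in exact:
        # Accented transcription: the dict accent disambiguates directly.
        return exact[key]
    # Unaccented (or unknown accented) token: ignore accents and return all
    # variants, i.e. the current probabilistic behaviour.
    return plain.get(strip_stress(key), [])


if __name__ == "__main__":
    exact, plain = build_index(ENTRIES)
    print(candidates("до\u0301ма", exact, plain))  # one variant
    print(candidates("дома", exact, plain))        # both variants
```

Keeping both indexes means an unaccented corpus behaves exactly as it does today, so accented dict entries would not break existing transcriptions.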