Phone set in Basque CV dictionary v2.0.0 #23
-
Hi Jose, here are a couple of ways I can think of to answer your last questions.
-
Emily, correct me if I'm wrong, but there are also no Spanish dictionaries/models as part of Vox Communis, right? The MFA phone set aims to be a bit more narrow and phonetic (and it's based on community-generated transcriptions via WikiPron), which is why it's larger than the strictly phonemic phone set that the more rule-based XPF and/or Epitran produce. You can see the general MFA phone set details and Spanish-specific notes here: https://mfa-models.readthedocs.io/en/latest/mfa_phone_set.html#spanish. If you want to train a new Spanish model with a broader transcription (i.e., using the Spanish XPF rules to generate a dictionary), you can see the data sources used in training here: https://mfa-models.readthedocs.io/en/latest/acoustic/Spanish/Spanish%20MFA%20acoustic%20model%20v2_0_0a.html.
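As a rough illustration of what a broader transcription could look like in practice, here is a minimal sketch that collapses narrow phones in dictionary entries onto broader symbols. It assumes a tab-separated dictionary line whose last field is the space-separated pronunciation. The `NARROW_TO_BROAD` mapping and the `remap_entry` helper are hypothetical names made up for this example, not part of MFA, and the actual mapping would need to be chosen from the phone set documentation linked above.

```python
from typing import Dict

# Hypothetical mapping from narrow (phonetic) phones to broader symbols.
# These pairs are illustrative only; pick the real ones from the phone set docs.
NARROW_TO_BROAD: Dict[str, str] = {"w": "u", "j": "i", "β": "b", "ð": "d"}


def remap_entry(line: str, mapping: Dict[str, str]) -> str:
    """Rewrite the pronunciation of one dictionary line using `mapping`.

    Assumes the fields are tab-separated and the pronunciation is the last
    field; any fields between the word and the pronunciation are kept as-is.
    """
    fields = line.rstrip("\n").split("\t")
    phones = fields[-1].split()
    # Phones without an entry in the mapping are left unchanged.
    fields[-1] = " ".join(mapping.get(phone, phone) for phone in phones)
    return "\t".join(fields)


# Toy entries for illustration, not copied from the released dictionaries.
print(remap_entry("agua\ta ɣ w a", NARROW_TO_BROAD))
print(remap_entry("cada\t0.99\tk a ð a", NARROW_TO_BROAD))
```

A dictionary remapped this way would then be the one to use when training a new acoustic model (for example with `mfa train`; check the MFA documentation for the exact arguments).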
-
Thank you, Emily and Michael, for your replies. I now have a clearer idea of how I should proceed. I might come back here if I get stuck at some point. Thanks!
-
Hi there!
I'm working on a Basque-Spanish bilingual processing project, using MFA as the aligner. First of all, thank you very much for this wonderful tool; it's really impressive how useful it is for our research.
I have a question about how the phone sets were defined for the Basque CV dictionary v2.0.0 and the Spanish (Spain) MFA dictionary v2.0.0. In theory, the phone sets should be fairly similar (except for a few sounds like [ts̺] or [ts̻] that are only present in Basque). However, the phone set of the Spanish pronunciation dictionary is considerably larger than that of the Basque one. For instance, it contains the [w] phone as an adaptation of the /u/ phoneme, whereas the Basque dictionary does not include it, even though this adaptation also takes place in Basque.
Given that we are investigating bilingual sound processing, we would like the phone sets to be as comparable as possible across both languages. Is there any way of retraining or adapting the acoustic models so that they allow for fairly similar phone sets? Any information on how the phone sets are derived would also be appreciated.
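To make the mismatch concrete, the two phone inventories can be extracted from the dictionary files and compared directly. The sketch below is only an illustration: it assumes each dictionary line is tab-separated with the space-separated pronunciation in the last field (falling back to "word followed by phones" when there are no tabs), and the helper names `phone_set` and `compare_phone_sets` are made up for this example.

```python
from typing import Dict, Iterable, Set


def phone_set(lines: Iterable[str]) -> Set[str]:
    """Collect every phone used in the pronunciations of a dictionary."""
    phones: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if "\t" in line:
            # Assumed layout: word, optional extra fields, pronunciation last.
            pronunciation = line.split("\t")[-1]
        else:
            # Fallback layout: the word followed by its phones.
            parts = line.split(maxsplit=1)
            pronunciation = parts[1] if len(parts) > 1 else ""
        phones.update(pronunciation.split())
    return phones


def compare_phone_sets(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Return the phones unique to each set and the phones they share."""
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}


# Toy entries for illustration, not copied from the released dictionaries.
spanish_lines = ["agua\ta ɣ w a", "uno\tu n o"]
basque_lines = ["ur\tu r", "atso\ta ts̺ o"]

diff = compare_phone_sets(phone_set(spanish_lines), phone_set(basque_lines))
for label, phones in diff.items():
    print(label, sorted(phones))
```

With the real files, the same functions could be fed the lines of each downloaded dictionary, e.g. `phone_set(open(path, encoding="utf-8"))`, where `path` is wherever the dictionary was saved locally.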
Thank you very much!