Hi there,
I am working on building a new dataset in Spanish (a polysyllabic language). I have gone through MakeDiffSinger, but I still have some gaps. I would be grateful if you could sanity-check my understanding and share any thoughts you might have.
Questions for clarification:
ph_seq: Is this a sequence of phonemes or of syllables?
Currently I am using phonemes and their timestamps as provided by MFA, with a pre-trained Spanish model available from MFA. Would you recommend training a new model on my specific data?
note_dur: Should the MIDI notes be estimated over phonemes, syllables, or words?
For now I have estimated one note for each phoneme and assumed ph_dur == note_dur.
ph_num: Is this the number of phonemes in each word or in each syllable?
For now I have assumed it is the number of phonemes in each word.
note_seq: Do you think SOME would suffice for a first pass at this? My guess is yes.
is_slur: How would you define a slur in this context? I have not found many resources on this topic.
For now I have assumed there are no slurs at all.
SPs and APs: Would you recommend annotating these manually, or would the enhance script be OK for a first pass?
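To make the assumptions above concrete, here is a minimal Python sketch of one row built that way: one note per phoneme, ph_dur == note_dur, ph_num counted per word, and no slurs. Only the field names come from the questions above. The `Phone` class, the `build_row` helper, the phoneme labels, the timings, and the MIDI values are all hypothetical and for illustration only, and the sketch does not claim to match the format the training code actually expects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Phone:
    """One aligned phoneme (hypothetical container, not an MFA type)."""
    label: str    # phoneme symbol
    start: float  # start time in seconds
    end: float    # end time in seconds
    midi: int     # assumed pitch estimate for this phoneme (MIDI number)


def build_row(words: List[List[Phone]]) -> Dict[str, list]:
    """Build one annotation row from phonemes grouped by word."""
    phones = [p for word in words for p in word]

    ph_seq = [p.label for p in phones]
    ph_dur = [round(p.end - p.start, 3) for p in phones]
    ph_num = [len(word) for word in words]   # phonemes per word
    note_seq = [p.midi for p in phones]      # one note per phoneme
    note_dur = list(ph_dur)                  # assumption: ph_dur == note_dur
    is_slur = [0] * len(phones)              # assumption: no slurs at all

    # Consistency check: the per-word counts must cover every phoneme.
    assert sum(ph_num) == len(ph_seq)

    return {
        "ph_seq": ph_seq,
        "ph_dur": ph_dur,
        "ph_num": ph_num,
        "note_seq": note_seq,
        "note_dur": note_dur,
        "is_slur": is_slur,
    }


# Made-up alignment for two words ("hola", "sol"); values are illustrative.
words = [
    [Phone("o", 0.00, 0.12, 60), Phone("l", 0.12, 0.20, 60), Phone("a", 0.20, 0.45, 62)],
    [Phone("s", 0.45, 0.55, 64), Phone("o", 0.55, 0.80, 64), Phone("l", 0.80, 0.90, 64)],
]
row = build_row(words)
for key, value in row.items():
    print(key, value)
```

If notes should instead span syllables or words, `note_seq` and `note_dur` would become shorter than `ph_seq`, and `is_slur` would presumably no longer be all zeros, which is the part I am unsure about.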
Thanks!
MikeMpapa changed the title from "Clarification on annotation" to "Clarifications on annotation" on Oct 14, 2024