[BUG] (maybe) : mfa acoustic models crash on alignmentError (Russian, Spanish) #798

Open
maxime-fily opened this issue Apr 23, 2024 · 3 comments

Hi,
I'm not 100% sure that this is a bug, but I am running into issues on very small and very simple corpora for Russian:

Debugging checklist

[ ] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there? Yes
[ ] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version? 3.0.6
[ ] Have you tried rerunning the command with the --clean flag? yes

Describe the issue
A clear and concise description of what the bug is.
When I use the russian_mfa dictionary and acoustic model, I systematically get an "alignmentError" after the validate step has succeeded.
The alignment works functionally with russian_cv, but it is my understanding that russian_mfa would give much better alignment performance if I could get it to work, which is why I'd like to push a bit on getting russian_mfa running.

For Reproducing your issue
Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Russian
    • How many files/speakers? 2/2
    • Are you using lab files or TextGrid files for input? lab
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? russian_mfa
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one download through MFA? If so, which one? russian_mfa
    • If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

2024-04-23 16:06:59,330 - mfa - DEBUG - Beginning run for test_RU
2024-04-23 16:06:59,330 - mfa - DEBUG - Using "global" profile
2024-04-23 16:06:59,330 - mfa - DEBUG - Using multiprocessing with 3
2024-04-23 16:06:59,330 - mfa - DEBUG - Set up logger for MFA version: 3.0.6
2024-04-23 16:06:59,522 - mfa - INFO - Setting up corpus information...
2024-04-23 16:06:59,524 - mfa - DEBUG - Successfully loaded from temporary files
2024-04-23 16:06:59,529 - mfa - INFO - Found 2 speakers across 2 files, average number of utterances per speaker: 1.0
2024-04-23 16:06:59,530 - mfa - DEBUG - Loaded corpus in 0.008 seconds
2024-04-23 16:06:59,531 - mfa - INFO - Jobs already initialized.
2024-04-23 16:06:59,532 - mfa - DEBUG - Initialized jobs in 0.002 seconds
2024-04-23 16:06:59,532 - mfa - INFO - Text already normalized.
2024-04-23 16:07:01,164 - mfa - INFO - Features already generated.
2024-04-23 16:07:01,164 - mfa - DEBUG - Generated features in 0.001 seconds
2024-04-23 16:07:01,164 - mfa - DEBUG - Setting up corpus took 1.643 seconds
2024-04-23 16:07:01,164 - mfa - DEBUG -
2024-04-23 16:07:01,164 - mfa - DEBUG - ====ACOUSTIC MODEL INFO====
2024-04-23 16:07:01,164 - mfa - DEBUG - Acoustic model root directory: /home/mfily/Documents/MFA/extracted_models/acoustic
2024-04-23 16:07:01,164 - mfa - DEBUG - Acoustic model dirname: /home/mfily/Documents/MFA/extracted_models/acoustic/russian_mfa_acoustic
2024-04-23 16:07:01,164 - mfa - DEBUG - Acoustic model meta path: /home/mfily/Documents/MFA/extracted_models/acoustic/russian_mfa_acoustic/meta.json
2024-04-23 16:07:01,164 - mfa - DEBUG - Acoustic model meta information:
2024-04-23 16:07:01,167 - mfa - DEBUG - architecture: gmm-hmm
dictionaries:
  bracketed_word: '[bracketed]'
  clitic_marker: ''''
  default: russian_mfa
  laughter_word: '[laughter]'
  names:
  - russian_mfa
  oov_word:
  position_dependent_phones: true
  silence_word:
  use_g2p: false
features:
  allow_downsample: true
  allow_upsample: true
  delta_pitch: 0.005
  feature_type: mfcc
  frame_length: 25
  frame_shift: 10
  high_frequency: 7800
  low_frequency: 20
  max_f0: 500
  min_f0: 50
  penalty_factor: 0.1
  sample_frequency: 16000
  snip_edges: true
  use_delta_pitch: true
  use_energy: false
  use_pitch: true
  use_voicing: true
  uses_cmvn: true
  uses_deltas: false
  uses_speaker_adaptation: true
  uses_splices: true
  uses_voiced: false
final_non_silence_correction: 0.1
final_silence_correction: 2.06
initial_silence_probability: 0.21
language: unknown
oov_phone: spn
optional_silence_phone: sil
phone_set_type: IPA
phone_type: triphone
phones: !!set
  a: null
  b: null
  "b\u02B2": null
  "b\u02B2\u02D0": null
  "b\u02D0": null
  c: null
  "c\u02D0": null
  "dz\u02B2": null
  "dz\u02B2\u02D0": null
  "d\u0290": null
  "d\u0290\u02D0": null
  "d\u02B2": null
  "d\u02B2\u02D0": null
  "d\u032A": null
  "d\u032Az\u032A": null
  "d\u032Az\u032A\u02D0": null
  "d\u032A\u02D0": null
  e: null
  f: null
  "f\u02B2": null
  "f\u02B2\u02D0": null
  "f\u02D0": null
  i: null
  j: null
  "j\u02D0": null
  k: null
  "k\u02D0": null
  m: null
  "m\u02B2": null
  "m\u02B2\u02D0": null
  "m\u02D0": null
  "n\u032A": null
  "n\u032A\u02D0": null
  o: null
  p: null
  "p\u02B2": null
  "p\u02B2\u02D0": null
  "p\u02D0": null
  r: null
  "r\u02B2": null
  "r\u02B2\u02D0": null
  "r\u02D0": null
  "s\u02B2": null
  "s\u02B2\u02D0": null
  "s\u032A": null
  "s\u032A\u02D0": null
  "ts\u02B2": null
  "t\u0255": null
  "t\u0255\u02D0": null
  "t\u0282": null
  "t\u0282\u02D0": null
  "t\u02B2": null
  "t\u02B2\u02D0": null
  "t\u032A": null
  "t\u032As\u032A": null
  "t\u032As\u032A\u02D0": null
  "t\u032A\u02D0": null
  u: null
  v: null
  "v\u02B2": null
  "v\u02B2\u02D0": null
  "v\u02D0": null
  x: null
  "x\u02D0": null
  "z\u02B2": null
  "z\u02B2\u02D0": null
  "z\u032A": null
  "z\u032A\u02D0": null
  "\xE6": null
  "\xE7": null
  "\u0250": null
  "\u0255": null
  "\u0255\u02D0": null
  "\u0259": null
  "\u025B": null
  "\u025F": null
  "\u025F\u02D0": null
  "\u0261": null
  "\u0261\u02D0": null
  "\u0263": null
  "\u0268": null
  "\u026A": null
  "\u026B": null
  "\u026B\u02D0": null
  "\u0272": null
  "\u0272\u02D0": null
  "\u0275": null
  "\u0282": null
  "\u0282\u02D0": null
  "\u0289": null
  "\u028A": null
  "\u028E": null
  "\u028E\u02D0": null
  "\u0290": null
  "\u0290\u02D0": null
  "\u0291": null
  "\u0291\u02D0": null
silence_probability: 0.21470002836195884
train_date: '2022-05-20 00:19:00.209288'
training:
  audio_duration: 1385812.0378750225
  average_log_likelihood: -0.01083423597600454
  num_oovs: 0
  num_speakers: 2355
  num_utterances: 232271
version: 2.0.0rc4.dev19+ged818cb.d20220404

2024-04-23 16:07:01,167 - mfa - DEBUG -
2024-04-23 16:07:01,167 - mfa - DEBUG - Setup for alignment in 1.785 seconds
2024-04-23 16:07:01,317 - mfa - INFO - Compiling training graphs...
2024-04-23 16:07:04,363 - mfa - DEBUG - Compiling training graphs took 3.046 seconds
2024-04-23 16:07:04,364 - mfa - INFO - Performing first-pass alignment...
2024-04-23 16:07:04,367 - mfa - INFO - Generating alignments...
2024-04-23 16:07:05,644 - mfa - ERROR - There was an error in the run, please see the log.
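
The log excerpt stops at a generic error message, so the underlying exception is not visible here. A minimal sketch for pulling every ERROR line out of an MFA log together with the lines that follow it (where a traceback would appear, if one was logged). The function name `show_mfa_errors`, the default of 20 context lines, and the example path are assumptions for illustration, not part of MFA:

```shell
#!/usr/bin/env bash
# Print each ERROR line of an MFA log with its line number, plus the lines
# that follow it, so that any traceback logged after the message stays visible.
# show_mfa_errors is a hypothetical helper; 20 context lines is an arbitrary default.
show_mfa_errors() {
    local log_file="$1"
    local context="${2:-20}"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file" >&2
        return 1
    fi
    # Match the " - ERROR - " level field used in the log lines above.
    grep -n -A "$context" -e ' - ERROR - ' "$log_file"
}

# Example call; the path is a placeholder, not taken from the issue:
# show_mfa_errors "$HOME/Documents/MFA/<corpus_name>/<log_file>.log"
```

The function returns a non-zero status both when the file is missing and when no ERROR line is found, which makes it easy to use in a script condition.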

Desktop (please complete the following information):

  • OS: [e.g. Windows, OSX, Linux] : Linux
  • Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc] Ubuntu 22.04
  • Any other details about the setup (Cloud, Docker, etc)
    Same issue whether I run directly in a terminal or via Docker.

Additional context
Add any other context about the problem here.
The issue is similar for Spanish, which is all the more surprising. But maybe these acoustic models are less robust than the French and English ones, which both work OK.

maxime-fily commented May 2, 2024

Hi. I ended up retraining the acoustic model on the CV dataset for Russian. It works much better now. Should I post the retrained acoustic model somewhere?

@mmcauliffe
Member

I've updated the russian_mfa dictionary and model, which should work better. You can download it via:

mfa model download dictionary russian_mfa --ignore_cache
mfa model download acoustic russian_mfa --ignore_cache
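
As a rough sketch of the full retry, the two downloads above can be followed by a fresh alignment run with `--clean`, the flag mentioned in the debugging checklist. The helper names `run` and `refresh_and_align`, the `DRY_RUN` switch, and the corpus and output paths are assumptions for illustration; only the `mfa` commands themselves come from MFA:

```shell
#!/usr/bin/env bash
# Refresh the dictionary and acoustic model, then re-run alignment from a clean state.
# run and refresh_and_align are hypothetical helpers; with DRY_RUN=1 the commands
# are only printed, which allows checking them before touching any data.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

refresh_and_align() {
    local corpus_dir="$1"
    local output_dir="$2"
    local model="${3:-russian_mfa}"
    # Each step runs only if the previous one succeeded.
    run mfa model download dictionary "$model" --ignore_cache &&
    run mfa model download acoustic "$model" --ignore_cache &&
    run mfa align "$corpus_dir" "$model" "$model" "$output_dir" --clean
}

# Example (placeholder paths):
# DRY_RUN=1 refresh_and_align ./test_RU ./test_RU_aligned
```

Passing a third argument such as `spanish_mfa` would apply the same steps to another language, assuming an updated model has been published for it.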

@maxime-fily
Author

Hi, thanks a lot! I will make sure to check it out!
