
Global Style Token Module #605

Open · wants to merge 5 commits into base: main
Conversation

roedoejet
Member

@roedoejet roedoejet commented Nov 29, 2024

PR Goal?

Add a Global Style Token Module à la https://arxiv.org/abs/1803.09017
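For readers unfamiliar with the cited paper (Wang et al., 2018), the core idea can be sketched in a few lines: a learned bank of "style tokens" is combined by attention over an embedding of reference audio to produce a single style embedding. The sketch below uses NumPy with illustrative names and shapes only; it is not the PR's actual code.

```python
import numpy as np

# Hedged sketch of the Global Style Token idea: attention over a learned
# token bank, conditioned on a reference embedding. All names/shapes are
# illustrative assumptions, not EveryVoice identifiers.
rng = np.random.default_rng(0)
num_tokens, token_dim, ref_dim = 10, 64, 128

style_tokens = rng.normal(size=(num_tokens, token_dim))  # learned token bank
ref_embedding = rng.normal(size=(ref_dim,))              # from reference audio
W = rng.normal(size=(ref_dim, num_tokens))               # attention projection

scores = ref_embedding @ W                 # one score per token
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # softmax attention weights
style_embedding = weights @ style_tokens   # weighted sum over the bank
```

At inference time this is why a style reference is needed: without `ref_embedding` there is nothing to attend with.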

Fixes?

#293

Feedback sought?

Sanity check: training and synthesis with GST turned on.

Priority?

medium

Tests added?

The changes are all model code, so no tests were added, but I still need to update existing tests since the commands and schemas have changed.

How to test?

  1. Train a new model and try synthesizing with a style reference.

Confidence?

Modelling - medium
I'm medium-confident on the modelling side: it has successfully trained models and seems to produce better models when training with noisy data. That said, I'm not enabling it by default because of the extra complications at inference time (you have to provide reference audio).

Versioning - low
Here's an example of a new item in a config. Should the versioning logic automatically add use_global_style_token_module=False when Version==1.0 and that key is not found? Should I also bump the config version here?

Version change?

minor version bump for FastSpeech2 and change to schemas

Related PRs?

EveryVoiceTTS/FastSpeech2_lightning#100


@joanise
Member

joanise commented Dec 9, 2024

For config file versioning:

  • yes, you need a minor bump; touching the schemas automatically requires a minor bump
  • yes, you should write logic so that when we load a model written with an older version, the new parameter is instantiated with the value that keeps the old semantics, i.e. with False in this case. Samuel left hooks in place just for that when we load models.

@joanise
Member

joanise commented Dec 9, 2024

  • yes, you should write logic so that when we load a model written with an older version, the new parameter is instantiated with the value that keeps the old semantics, i.e. with False in this case. Samuel left hooks in place just for that when we load models

Oh, since you made the field optional and defaulting to False, this may already all happen automatically for you. The test would be to load a model written with 0.2 and write it back with 0.3; if it works as expected, it's all good.
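The migration being discussed can be sketched in plain Python; this is a hypothetical illustration (the function name and dict-based config are assumptions, not EveryVoice's actual hooks): on load, an older config gets the new key with the value that preserves the old semantics.

```python
# Hypothetical migration sketch, not the EveryVoice API: when loading a
# config written before the GST feature existed, fill in the new key with
# the value that keeps the old behaviour (False), then stamp the version.
def upgrade_config(cfg: dict) -> dict:
    upgraded = dict(cfg)
    if upgraded.get("version", "0.2.0") < "0.3.0":
        # Only add the key if the old config didn't have it.
        upgraded.setdefault("use_global_style_token_module", False)
        upgraded["version"] = "0.3.0"
    return upgraded

old = {"version": "0.2.0"}
new = upgrade_config(old)  # gains use_global_style_token_module=False
```

A real implementation would compare parsed version objects rather than strings, but the shape of the check is the same.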

@joanise
Member

joanise commented Dec 9, 2024

  • yes, you need a minor bump; touching the schemas automatically requires a minor bump

And let's bump to 0.3.0, without the a element, as discussed at the meeting today.

@marctessier
Collaborator

When I run all tests, I get the errors below (see the attached log file):

======================================================================
ERROR: test_filelist_language (model.feature_prediction.FastSpeech2_lightning.fs2.tests.test_cli.PrepareSynthesizeDataTest)
Use a different language than the one provided in the filelist.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_pr605_gst/everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py", line 117, in test_filelist_language
    data = prepare_synthesize_data(
TypeError: prepare_data() missing 1 required positional argument: 'style_reference'

======================================================================
ERROR: test_filelist_speaker (model.feature_prediction.FastSpeech2_lightning.fs2.tests.test_cli.PrepareSynthesizeDataTest)
Use a different speaker than the one provided in the filelist.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_pr605_gst/everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py", line 140, in test_filelist_speaker
    data = prepare_synthesize_data(
TypeError: prepare_data() missing 1 required positional argument: 'style_reference'

======================================================================
ERROR: test_plain_filelist (model.feature_prediction.FastSpeech2_lightning.fs2.tests.test_cli.PrepareSynthesizeDataTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/TxT2SPEECH/EveryVoice_pr605_gst/everyvoice/model/feature_prediction/FastSpeech2_lightning/fs2/tests/test_cli.py", line 166, in test_plain_filelist
    data = prepare_synthesize_data(
TypeError: prepare_data() missing 1 required positional argument: 'style_reference'

----------------------------------------------------------------------
Ran 213 tests in 81.990s

FAILED (errors=3)
============ Finished job 3365462 on Wed 11 Dec 2024 02:59:24 PM EST with rc=1

TEST-all.e3365462.txt
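All three tracebacks point at the same API change: `prepare_data()` gained a required positional argument, `style_reference`. One hedged way to keep the pre-GST call sites (and tests) working is to give the new parameter a default; the sketch below mirrors the names in the traceback but is illustrative, not the PR's actual signature.

```python
# Hypothetical illustration, not the PR's actual code: making the new
# argument optional means old call sites without a style reference keep
# working instead of raising TypeError.
from typing import Optional

def prepare_data(filelist: list, style_reference: Optional[str] = None) -> dict:
    # With no reference audio, behave as before the GST change.
    return {"filelist": filelist, "style_reference": style_reference}

# Old call site (no style reference) no longer raises TypeError:
data = prepare_data(["utt1.wav", "utt2.wav"])
```

The alternative is to update every existing call site and test to pass the new argument explicitly, which the PR author already flags as pending test updates.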

@marctessier
Collaborator

I tried to train a feature prediction (FP) model; it starts, but crashes before finishing the first epoch when run as-is with use_global_style_token_module: true

RuntimeError: It looks like your LightningModule has parameters that were not 
used in producing the loss returned by training_step. If this is intentional, 
you must enable the detection of unused parameters in DDP, either by setting the
string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag
in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.
Loading EveryVoice modules: 100%|██████████| 4/4 [00:10<00:00,  2.73s/it]   
srun: error: ib14gpu-002: task 0: Exited with exit code 1

gst.e3365525.txt
gst.o3365525.txt
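The error message itself names the workaround: some of the GST module's parameters presumably don't contribute to the loss in every `training_step`, which trips DDP's unused-parameter check. Assuming the trainer is a standard PyTorch Lightning `Trainer` (a sketch of the Lightning API, not the EveryVoice config surface):

```python
# Sketch only: enable unused-parameter detection in DDP, as the error
# message suggests. This adds some overhead, so it should probably be set
# only when the GST module is enabled.
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DDPStrategy

trainer = Trainer(strategy=DDPStrategy(find_unused_parameters=True))
# or equivalently: Trainer(strategy="ddp_find_unused_parameters_true")
```

A cleaner long-term fix would be to ensure every registered parameter participates in the loss, so the default DDP strategy keeps working.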

@marctessier
Collaborator

If I use use_global_style_token_module: false, training works.
