Hi, thanks for your effort and sharing the code!
The architecture blocks in the speech-prompted text encoder and the CFM decoder differ from the ones introduced in the paper. I would like to know what motivated these changes. Was the model not converging with the official architecture?
I wanted to reproduce the results from the paper, so I used this repo (master branch) to train the model on LibriTTS. I trained it for 800k steps and beyond, but the overall generation quality is still quite far from the official demo.
Have you tried reproducing the results?