consolidate spectrogram dimensions #572
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #572   +/-   ##
=======================================
  Coverage   76.74%   76.74%
=======================================
  Files          46       46
  Lines        3445     3445
  Branches      470      470
=======================================
  Hits         2644     2644
  Misses        700      700
  Partials      101      101
@roedoejet I think we might have an issue with this. For my first test, I trained a vocoder (no issues here). Then I tried to use that vocoder for training the FP model, but it keeps crashing with the message below (see attachment for the full error log). I will try the other things you listed to see how that behaves.
FYI, I also get the same issue when using [...]. I managed to get the FP training to work by removing the vocoder reference "vocoder_path:" in [...].
@marctessier - did you maybe not re-run preprocessing? The old mel spectrograms that were calculated will have to be re-processed. EDIT: never mind! I see what you mean. Training worked for me, but when I added a vocoder checkpoint it failed during the validation step. Nice catch! I fixed this and it should now be ready to go.
Since this is a breaking change, and it's possible some users will have preprocessed files saved, I'd like to see some heuristic tests that give a friendly error message if the input looks transposed, with instructions telling the user what to rerun.
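A minimal sketch of the kind of heuristic check requested above, assuming the configured number of Mel bands is known when a spectrogram is loaded. The function name `check_spec_orientation` and the `n_mels` parameter are hypothetical and not taken from the EveryVoice codebase. The check only looks at the shape, so it has no dependencies.

```python
def check_spec_orientation(shape, n_mels):
    """Raise a friendly error if a spectrogram of `shape` looks time-oriented
    [T, K] where frequency-oriented [K, T] is expected.

    `n_mels` (hypothetical parameter) is the configured number of Mel bands K.
    """
    if len(shape) != 2:
        raise ValueError(f"Expected a 2-D spectrogram, got shape {tuple(shape)}.")
    first, second = shape
    if first == n_mels:
        # Consistent with [K, T]. This also accepts the ambiguous square case
        # T == K, which a shape-only heuristic cannot tell apart.
        return
    if second == n_mels:
        raise ValueError(
            f"Spectrogram has shape {tuple(shape)}, which looks time-oriented "
            f"[T, K]; [K, T] with K={n_mels} is expected. The file may have been "
            "saved before the orientation change: please re-run preprocessing "
            "to regenerate it."
        )
    raise ValueError(
        f"Spectrogram has shape {tuple(shape)}, but neither dimension matches "
        f"the configured number of Mel bands ({n_mels})."
    )


# Usage: a [K, T] spectrogram passes silently, a [T, K] one raises.
check_spec_orientation((80, 200), n_mels=80)
try:
    check_spec_orientation((200, 80), n_mels=80)
except ValueError as err:
    print(err)
```

Because the check relies only on the shape, it cannot detect a transposed input when the number of frames happens to equal the number of Mel bands.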
This looks good to me, actually. It'll need rebasing in the submodules and a small conflict resolution.
Previously, there was an issue where:
#513 noticed this problem when synthesizing from certain frequency-oriented tensors. Our models work with time-oriented tensors, but it's more standard to have frequency/Mel-band-oriented tensors when saving spectrograms (this is the default in torchaudio, librosa, etc.). Since the output files should be as interoperable as possible, I've consolidated our read/write operations to use [K, T] orientation throughout (i.e. changing text-to-spec synthesis to output [K, T] tensors and expecting [K, T] tensors during spec-to-wav synthesis).
I also moved a log message that said "Loading Vocoder from None", which was annoying. And I switched from scipy to torchaudio for writing the wav files, since I started getting some bit depth errors with the spec-to-wav synthesis.
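To illustrate the two orientations, here is a small dependency-free sketch in which nested lists stand in for tensors. The helper `transpose_spec` is hypothetical; with PyTorch tensors the equivalent would be `spec.transpose(0, 1)`. Where exactly the conversion happens in the codebase is an assumption here: the sketch only shows the idea of saving [K, T] on disk while the model consumes [T, K].

```python
def transpose_spec(spec):
    """Swap the two axes of a 2-D spectrogram stored as a list of rows."""
    return [list(column) for column in zip(*spec)]


# A tiny time-oriented [T, K] spectrogram: T=3 frames, K=2 Mel bands.
time_oriented = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Convert to frequency-oriented [K, T] before writing to disk...
freq_oriented = transpose_spec(time_oriented)

# ...and back to [T, K] after reading, before the model consumes it.
restored = transpose_spec(freq_oriented)

print(len(freq_oriented), len(freq_oriented[0]))  # K, T
print(restored == time_oriented)
```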
PR Goal?
Ideally this should just work going forward. You should be able to:
--time-oriented flag
Fixes?
#513
Feedback sought?
Sanity. I've tested the above expectations 1-5, but please try one or more of them to corroborate, and leave a comment saying which ones you tested.
Priority?
medium-high (synthesizing from non-time-oriented spectrograms causes an error right now)
Tests added?
How to test?
Try doing some of the things described in the PR Goal
Confidence?
medium
Version change?
This is a breaking change but we'll just include it in alpha.
Related PRs?
EveryVoiceTTS/FastSpeech2_lightning#94
EveryVoiceTTS/HiFiGAN_iSTFT_lightning#39