refactor!: change model to output mel-band oriented tensors instead of time-oriented ones #94

roedoejet · 2024-10-30T01:18:17Z

semanticdiff-com · 2024-10-30T01:18:19Z

Changed Files

File	Status
fs2/prediction_writing_callback.py	17% smaller
fs2/cli/synthesize.py	16% smaller
fs2/model.py	0% smaller

joanise

This looks good, modulo some questions in the comments below.

joanise · 2024-12-09T22:16:58Z

fs2/prediction_writing_callback.py

-        assert "output" in outputs and outputs["output"] is not None
-        assert wavs.shape[0] == outputs["output"].size(
+            wavs.ndim == 3
+        ), f"The generated audio did not contain 3 dimensions. First dimension should be B(atch) and the second dimension should be C(hannels) and third dimension should be T(ime) in samples. Got {wavs.shape} instead."


Can this happen due to a user error (like providing the wrong kind of input file), or is this strictly due to a programmer error? If the latter, OK, if the former, I don't like using assert.

It could happen due to a user error. What error would you prefer?
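
One option for the question above, sketched under assumptions: swap the assert for an explicit exception, so the check still runs under `python -O` and the user gets a catchable error. The helper name `check_wav_shape` and the choice of `ValueError` are illustrative and do not come from this PR. The function takes a plain shape tuple, so something like `tuple(wavs.shape)` could be passed in.

```python
def check_wav_shape(shape: tuple[int, ...]) -> None:
    """Raise ValueError unless shape has 3 dimensions: [B(atch), C(hannels), T(ime)].

    Hypothetical helper, not part of fs2; it stands in for the assert in the diff.
    """
    if len(shape) != 3:
        raise ValueError(
            "The generated audio did not contain 3 dimensions. "
            "Expected [B(atch), C(hannels), T(ime) in samples], "
            f"got shape {shape} instead."
        )


# Usage: a [B, C, T] shape passes silently, a [B, T] shape is rejected.
check_wav_shape((2, 1, 22050))
try:
    check_wav_shape((2, 22050))
except ValueError as err:
    print(f"rejected: {err}")
```

Unlike an assert, the exception is not stripped when Python runs with optimizations enabled, and a CLI entry point could catch it and print a friendlier message.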

joanise · 2024-12-09T22:20:39Z

fs2/prediction_writing_callback.py

-                    basename=basename,
-                    speaker=speaker,
-                    language=language,
+            torchaudio.save(


Is the change of audio writer function related to this PR, or just an unrelated improvement? I assume you've tested and you can confirm this works well?

Yes, just a consolidation/refactor since we use torchaudio everywhere else. Tested and works for me.

joanise · 2024-12-09T22:26:38Z

fs2/prediction_writing_callback.py

+                data[:unmasked_len]
+                .cpu()
+                .transpose(0, 1),  # save tensors as [K (bands), T (frames)]
+                str(


We didn't use to need to cast this Path to a str, so I wonder why you do now. In my PR #102, get_filename is factored out to the base class; we should have it return str(path) in one place, inside get_filename, instead of casting everywhere we use it.
Warning: whichever PR is merged second will have to rebase and resolve conflicts over the use of get_filename.

rebased and fixed using my suggestion here
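
A minimal sketch of the suggestion above, where `get_filename` converts to `str` once so that callers never cast. The class name, constructor, argument list, and the `--`-joined filename pattern are assumptions made for illustration; the actual base class from PR #102 is not shown in this thread.

```python
from pathlib import Path


class PredictionWriterBase:
    """Hypothetical stand-in for the shared callback base class."""

    def __init__(self, save_dir: Path, file_extension: str) -> None:
        self.save_dir = Path(save_dir)
        self.file_extension = file_extension

    def get_filename(self, basename: str, speaker: str, language: str) -> str:
        # Build the path with pathlib, then cast to str in this one place.
        name = "--".join([basename, speaker, language]) + self.file_extension
        return str(self.save_dir / name)


writer = PredictionWriterBase(Path("synthesis_output"), ".pt")
filename = writer.get_filename("utt001", "default", "eng")
print(type(filename).__name__)  # prints "str"
```

With the cast centralized, call sites can pass the result straight to writer functions that expect a string path.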

codecov · 2024-12-10T17:56:55Z

Codecov Report

Attention: Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.

Project coverage is 46.13%. Comparing base (2afc610) to head (9549082).
Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
fs2/model.py	0.00%	2 Missing ⚠️
fs2/cli/synthesize.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   46.24%   46.13%   -0.12%     
==========================================
  Files          22       22              
  Lines        1464     1461       -3     
==========================================
- Hits          677      674       -3     
  Misses        787      787


roedoejet mentioned this pull request Oct 30, 2024

consolidate spectrogram dimensions EveryVoiceTTS/EveryVoice#572

Merged

joanise approved these changes Dec 9, 2024

roedoejet added 3 commits December 10, 2024 12:26

refactor!: change model to output mel-band oriented tensors instead of time-oriented ones

5b5abe0

fix: write strings not paths using torch

229645c

fix: the vocoder expects [B, K, T] tensors and this applies during training too

9549082

joanise force-pushed the dev.ap/513 branch from 47b94e9 to 9549082 Compare December 10, 2024 17:42

roedoejet merged commit 9549082 into main Dec 17, 2024
6 checks passed

roedoejet deleted the dev.ap/513 branch December 17, 2024 18:17
