-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: change model to output mel-band oriented tensors instead of time-oriented ones #94
Conversation
Changed Files
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, modulo some questions in the comments below.
assert "output" in outputs and outputs["output"] is not None | ||
assert wavs.shape[0] == outputs["output"].size( | ||
wavs.ndim == 3 | ||
), f"The generated audio did not contain 3 dimensions. First dimension should be B(atch) and the second dimension should be C(hannels) and third dimension should be T(ime) in samples. Got {wavs.shape} instead." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen due to a user error (like providing the wrong kind of input file), or is this strictly due to a programmer error? If the latter, OK, if the former, I don't like using assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could happen due to a user error. What error would you prefer?
basename=basename, | ||
speaker=speaker, | ||
language=language, | ||
torchaudio.save( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the change of audio writer function related to this PR, or just an unrelated improvement? I assume you've tested and you can confirm this works well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes just a consolidation/refactor since we use torchaudio everywhere else. tested and works for me.
fs2/prediction_writing_callback.py
Outdated
data[:unmasked_len] | ||
.cpu() | ||
.transpose(0, 1), # save tensors as [K (bands), T (frames)] | ||
str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't use to need to cast this Path to a str, I wonder why you do now. In my PR #102, get_filename
is factored out to the base class, we should have it do return str(path)
in one place, in get_filename
, instead of casting everywhere we use it.
Warning: Whichever PR is merged second will have to rebase and resolve conflicts over the use of get_filename
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased and fixed using my suggestion here
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #94 +/- ##
==========================================
- Coverage 46.24% 46.13% -0.12%
==========================================
Files 22 22
Lines 1464 1461 -3
==========================================
- Hits 677 674 -3
Misses 787 787 ☔ View full report in Codecov by Sentry. |
See EveryVoiceTTS/EveryVoice#572