Thanks for your comment.
This one-night project aimed to work around the facebookresearch/speech-resynthesis data loader issue ;).
In my testing (which wasn't thorough), I only observed a slight difference, but your results show that the difference is actually quite large. Proper testing on multiple files would be nice.
I also never observed the shape issue; I thought this implementation matched the numpy/scipy one in this regard (it is not a huge issue if the difference is 4 features, I believe).
For the samp_interp output, it is unnecessary to put it here; this can be performed elsewhere and in other ways.
But yeah, for the samp_values, it would be good to align with the numpy/scipy implementation.
You are more than welcome to try to correct this!
If I recall correctly, one scipy function had no equivalent in PyTorch, so I didn't use it, which explains some of the output differences. I don't remember which scipy function it was; you will have to investigate that yourself.
If you fix this, I'll add you as an author on this repo.
Thanks for considering this work.
Best
Hi, thanks for your work.
I realized that the output is not equivalent to that of the numpy implementation. It isn't merely the values that differ, but also the output shapes. I was wondering whether this is intended behavior, since the code comments say the results are close to the numpy/scipy ones but differ slightly.
Here's the code I used to test. The wav file is LJ001-0001.wav from the LJSpeech dataset v1.1, which I downsampled to 16 kHz.
Now, we see that the sizes are different:
and the values are different as well, although there is a resemblance.
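For anyone who wants to quantify this kind of mismatch, here is a minimal sketch of a comparison helper. It is not code from either implementation: `compare_pitch_tracks` and its `atol` parameter are hypothetical names, it uses only numpy, and it assumes both pitch tracks are 1-D arrays in which unvoiced frames are 0. The arrays in the usage part are synthetic, not real YAAPT output.

```python
import numpy as np


def compare_pitch_tracks(reference, candidate, atol=1e-3):
    """Compare two 1-D pitch tracks that may differ in length.

    Hypothetical helper: assumes unvoiced frames are encoded as 0.
    """
    reference = np.asarray(reference, dtype=np.float64).ravel()
    candidate = np.asarray(candidate, dtype=np.float64).ravel()

    # Only the overlapping frames can be compared value by value.
    n = min(reference.size, candidate.size)
    ref, cand = reference[:n], candidate[:n]
    abs_diff = np.abs(ref - cand)

    same_shape = reference.shape == candidate.shape
    return {
        "same_shape": same_shape,
        # Positive when the candidate has more frames than the reference.
        "frame_offset": candidate.size - reference.size,
        "max_abs_diff": float(abs_diff.max()) if n else 0.0,
        # Frames where one track is voiced and the other is not.
        "voicing_mismatches": int(np.count_nonzero((ref > 0) != (cand > 0))),
        "allclose": bool(same_shape and np.allclose(ref, cand, atol=atol)),
    }


if __name__ == "__main__":
    # Synthetic stand-ins for the numpy and PyTorch outputs.
    numpy_track = np.array([0.0, 100.0, 110.0, 0.0])
    torch_track = np.array([0.0, 100.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    print(compare_pitch_tracks(numpy_track, torch_track))
```

Running a check like this over several files would separate a fixed frame-count offset from real value drift.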
Considering the repos that use YAAPT as part of their data preprocessing pipeline (example), I think it would be really nice to align the two YAAPT outputs, or at least to have a compliant version.
Also, would it be OK if I tried to correct this?