Inequivalence to the numpy implementation #1

Open
stet-stet opened this issue Sep 26, 2024 · 1 comment

stet-stet commented Sep 26, 2024

Hi, thanks for your work.

I noticed that the output is not equivalent to that of the numpy implementation, and it isn't only the values that differ but also the output shapes. Is this intended behavior? The code comments say the results are close to numpy/scipy but differ slightly.

Here's the code I used to test. The WAV file is LJ001-0001.wav from the LJSpeech 1.1 dataset, which I downsampled to 16 kHz.

```python
import numpy as np
import soundfile as sf
import torch

import amfm_decompy.basic_tools as basic
import amfm_decompy.pYAAPT as pYAAPT
import yaapt_torch

example_file = "LJSpeech-1.1/wavs_16khz/LJ001-0001.wav"
y, sr = sf.read(example_file)  # expected sr == 16000 after downsampling

# Options for yaapt_torch (times in ms)
yaapt_opts = {
    "sr": sr,
    "frame_length": 20.0,
    "frame_space": 5.0,
    "nccf_thresh1": 0.25,
    "tda_frame_length": 25.0,
}

# Zero-pad half a frame (in samples) on each side before running either tracker
frame_length = 20.0
to_pad = int(frame_length / 1000 * sr) // 2
y_pad = np.pad(y.squeeze(), (to_pad, to_pad), "constant", constant_values=0)

# Reference: numpy/scipy implementation from amfm_decompy
signal = basic.SignalObj(y_pad, sr)
numpy_pitch = pYAAPT.yaapt(signal, **{
    "frame_length": 20.0,
    "frame_space": 5.0,
    "nccf_thresh1": 0.25,
    "tda_frame_length": 25.0,
})

# PyTorch implementation, fed a batch of one signal
torch_pitch = yaapt_torch.yaapt(
    torch.tensor(y_pad).unsqueeze(0),
    yaapt_opts,
)
```
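The issue doesn't say how LJ001-0001.wav was downsampled, and a different resampler changes the input to both trackers slightly. A minimal sketch for making that step reproducible, assuming the original LJSpeech rate of 22050 Hz and polyphase resampling (an assumption, not necessarily what was used here), computes the integer up/down factors; `resample_factors` is a hypothetical helper:

```python
from math import gcd


def resample_factors(orig_sr: int, target_sr: int) -> tuple[int, int]:
    """Return integer (up, down) factors with orig_sr * up / down == target_sr."""
    g = gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g


# Assumption: LJSpeech 1.1 audio is stored at 22050 Hz.
up, down = resample_factors(22050, 16000)
print(up, down)  # 320 441
```

The factors could then be passed as `scipy.signal.resample_poly(y, up, down)`, so anyone rerunning the comparison starts from the same 16 kHz samples.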

Now, we see that the sizes are different:

```
> torch_pitch.shape
torch.Size([1, 1940])

> numpy_pitch.samp_values.shape
(1936,)
```
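The 4-frame gap happens to equal frame_length / frame_space = 20 ms / 5 ms. One hypothesis worth checking (not confirmed from either code base) is that the two implementations count frames differently: one placing a frame every hop only where the whole window fits, the other centring a frame on every hop by padding half a window per side. The hypothetical helpers below show that these two conventions differ by exactly frame_len / hop frames whenever frame_len is a multiple of hop:

```python
def frames_no_pad(n_samples: int, frame_len: int, hop: int) -> int:
    """Count frames whose whole window lies inside the signal."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop


def frames_centered(n_samples: int, hop: int) -> int:
    """Count frames centred on every hop-th sample (half a window of padding per side)."""
    return 1 + n_samples // hop


sr = 16000
frame_len = round(20.0 * sr / 1000)  # 320 samples
hop = round(5.0 * sr / 1000)         # 80 samples
n = 155_000                          # hypothetical padded signal length

print(frames_centered(n, hop) - frames_no_pad(n, frame_len, hop))  # 4
```

If this turns out to be the cause, the extra frames in the longer output would sit at the edges, which matters when aligning the two tracks for a value-by-value comparison.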

The values differ as well, although there is a resemblance:

```
> torch_pitch[:, 100:120]
tensor([[231.8841, 238.8060, 246.1538, 250.0000, 253.9682, 262.2951, 271.1864,
         275.8621, 285.7143, 290.9091, 301.8868, 307.6923, 320.0000, 326.5306,
         333.3333, 340.4255, 347.8261, 347.8261, 347.8261, 355.5555]])

> numpy_pitch.samp_values[100:120]
array([242.42424242, 246.15384615, 253.96825397, 258.06451613,
       262.29508197, 271.18644068, 275.86206897, 285.71428571,
       285.71428571, 285.71428571, 285.71428571, 173.91304348,
       173.91304348, 173.91304348, 173.91304348, 175.82417582,
       175.82417582, 175.82417582, 175.82417582, 173.91304348])

> numpy_pitch.samp_interp[100:120]
array([242.42424242, 246.15384615, 253.96825397, 258.06451613,
       262.29508197, 271.18644068, 275.86206897, 285.71428571,
       285.71428571, 285.71428571, 285.71428571, 173.91304348,
       173.91304348, 173.91304348, 173.91304348, 175.82417582,
       175.82417582, 175.82417582, 175.82417582, 174.19267129])
```
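Every pitch value shown appears to be 16000 divided by an integer lag (for example 231.8841 = 16000 / 69 and 173.91304348 = 16000 / 92), so comparing lags can be easier than comparing Hz. From frame 111 onward the numpy track drops to about 174 Hz while the torch track keeps rising, and by frames 116 to 119 the two are roughly one octave apart. A small sketch with hypothetical helper names makes that visible:

```python
import math

SR = 16000  # sampling rate used in the test above


def pitch_to_lag(f0_hz: float, sr: int = SR) -> int:
    """Nearest integer lag in samples for a pitch value; 0 Hz (unvoiced) maps to 0."""
    return 0 if f0_hz == 0 else round(sr / f0_hz)


def is_octave_apart(a_hz: float, b_hz: float, tol: float = 0.1) -> bool:
    """True if the two pitches differ by about a factor of 2 (within tol octaves)."""
    return abs(abs(math.log2(a_hz / b_hz)) - 1.0) < tol


# Frames 116-119 copied from the outputs above.
torch_f0 = [347.8261, 347.8261, 347.8261, 355.5555]
numpy_f0 = [175.82417582, 175.82417582, 175.82417582, 173.91304348]

for t, n in zip(torch_f0, numpy_f0):
    print(pitch_to_lag(t), pitch_to_lag(n), round(t / n, 2), is_octave_apart(t, n))
# 46 91 1.98 True
# 46 91 1.98 True
# 46 91 1.98 True
# 45 92 2.04 True
```

An octave-level disagreement points at candidate selection or tracking (which lag wins) rather than at small numerical drift.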

Considering that several repositories use YAAPT as part of their data preprocessing pipeline (example), I think it would be really nice to align the outputs of the two YAAPT implementations, or at least to provide a compliant version.

Also, would it be OK if I tried to correct this?

pchampio (Owner) commented Sep 26, 2024

Thanks for your comment.
This was a one-night project meant to work around the data loader issue in Facebook research/speech-resynthesis ;).
My testing wasn't thorough and I only observed a slight difference, but your results show that the difference is quite large. (Proper testing on multiple files would be nice.)
I also never observed the shape issue; I thought this implementation matched the numpy/scipy one in that respect. That said, I believe a difference of 4 frames is not a huge issue.
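Following the suggestion to test on multiple files, a minimal per-file comparison could report voicing decision error (VDE, the share of frames where one track is voiced and the other is not) and gross pitch error (GPE, the share of frames voiced in both tracks whose relative error exceeds 20%). The function name and the truncation-based alignment are assumptions; given the 4-frame length gap, the tracks may actually be offset, so treat this as a sketch rather than a verified harness:

```python
import numpy as np


def compare_f0(ref, est, gross_tol=0.2):
    """Compare two F0 tracks (Hz, 0 = unvoiced) frame by frame.

    Tracks of different lengths are truncated to the shorter one, which
    assumes their first frames are aligned.
    """
    ref = np.asarray(ref, dtype=float).ravel()
    est = np.asarray(est, dtype=float).ravel()
    length_diff = len(est) - len(ref)
    n = min(len(ref), len(est))
    ref, est = ref[:n], est[:n]

    ref_v, est_v = ref > 0, est > 0
    vde = float(np.mean(ref_v != est_v))  # voicing disagreements

    both = ref_v & est_v
    if both.any():
        rel_err = np.abs(est[both] - ref[both]) / ref[both]
        gpe = float(np.mean(rel_err > gross_tol))  # gross pitch errors
    else:
        gpe = 0.0
    return {"frames_compared": n, "length_diff": length_diff, "vde": vde, "gpe": gpe}
```

Per file, this might be called as `compare_f0(numpy_pitch.samp_values, torch_pitch.squeeze(0).numpy())`, with `vde` and `gpe` averaged over a directory of WAV files.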

The samp_interp output doesn't need to be part of this repo; interpolation can be done elsewhere and in other ways.
But yes, samp_values should be aligned with the numpy/scipy implementation.
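As an illustration of doing interpolation elsewhere, a minimal numpy sketch can fill unvoiced (0 Hz) frames of samp_values by linear interpolation. It is not a reimplementation of amfm_decompy's samp_interp: the outputs above show samp_interp differing from samp_values even at voiced frame 119 (174.19267129 vs 173.91304348), so that attribute is more than plain gap filling. `interpolate_unvoiced` is a hypothetical name:

```python
import numpy as np


def interpolate_unvoiced(f0):
    """Linearly interpolate F0 (Hz) across unvoiced frames, marked as 0.

    Frames before the first or after the last voiced frame take the nearest
    voiced value, which is np.interp's default edge behaviour.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        return f0.copy()  # nothing to interpolate from
    idx = np.arange(len(f0))
    return np.interp(idx, idx[voiced], f0[voiced])


print(interpolate_unvoiced([0, 100, 0, 200, 0]))  # [100. 100. 150. 200. 200.]
```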

You are more than welcome to try to correct this!

If I recall correctly, one scipy function had no PyTorch equivalent, so I didn't use it, which explains some of the output differences. I don't remember which scipy function it was, so you will have to investigate.
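The comment does not name the missing scipy function. Purely as an example of how such a gap can be closed, suppose it were `scipy.signal.medfilt` (a hypothetical candidate; the amfm_decompy source would need to be checked to confirm): a 1-D median filter with the same zero padding takes only a pad, a sliding window, and a median. The numpy version below is meant as a reference to test a port against; in PyTorch the same steps map onto `torch.nn.functional.pad`, `Tensor.unfold(-1, kernel_size, 1)` and `Tensor.median(dim=-1).values`:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def medfilt1d(x, kernel_size=3):
    """1-D median filter mirroring scipy.signal.medfilt's zero padding."""
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    x = np.asarray(x, dtype=float)
    half = kernel_size // 2
    padded = np.pad(x, half, mode="constant")           # zeros at both ends
    windows = sliding_window_view(padded, kernel_size)  # shape (len(x), kernel_size)
    return np.median(windows, axis=-1)


print(medfilt1d([1, 5, 2, 8, 3]))  # [1. 2. 5. 3. 3.]
```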
If you fix this, I'll add you as an author on this repo.
Thanks for considering this work.
Best
