Inequivalence to the numpy implementation #1

stet-stet · 2024-09-26T19:26:10Z

Hi, thanks for your work.

I noticed that the output is not equivalent to that of the numpy implementation: not only the values but also the output shapes differ. Is this intended behavior? The code comments say the results are close to the numpy/scipy ones but differ slightly.

Here's the code I used to test. The wav file is LJ001-0001.wav from the LJSpeech dataset v1.1, which I downsampled to 16 kHz.

import numpy as np
import soundfile as sf
import torch

import amfm_decompy.basic_tools as basic
import amfm_decompy.pYAAPT as pYAAPT
import yaapt_torch

example_file = "LJSpeech-1.1/wavs_16khz/LJ001-0001.wav"
y, sr = sf.read(example_file)  # sr is 16000 after downsampling

# Options passed to both implementations.
common_opts = {
    "frame_length": 20.0,
    "frame_space": 5.0,
    "nccf_thresh1": 0.25,
    "tda_frame_length": 25.0,
}
# The torch version additionally takes the sample rate in its options.
yaapt_opts = {"sr": sr, **common_opts}

# Zero-pad half a frame on each side of the signal.
to_pad = int(common_opts["frame_length"] / 1000 * sr) // 2
y_pad = np.pad(y.squeeze(), (to_pad, to_pad), "constant", constant_values=0)

# Reference: numpy/scipy implementation from amfm_decompy.
signal = basic.SignalObj(y_pad, sr)
numpy_pitch = pYAAPT.yaapt(signal, **common_opts)

# Torch implementation on the same padded signal, with a batch dimension.
torch_pitch = yaapt_torch.yaapt(
    torch.tensor(y_pad).unsqueeze(0),
    yaapt_opts
)

Now, we see that the sizes are different:

> torch_pitch.shape
torch.Size([1, 1940])

> numpy_pitch.samp_values.shape
(1936,)

and the values are different as well, although there is a resemblance.

> torch_pitch[:, 100:120]
tensor([[231.8841, 238.8060, 246.1538, 250.0000, 253.9682, 262.2951, 271.1864,
         275.8621, 285.7143, 290.9091, 301.8868, 307.6923, 320.0000, 326.5306,
         333.3333, 340.4255, 347.8261, 347.8261, 347.8261, 355.5555]])

> numpy_pitch.samp_values[100:120]
array([242.42424242, 246.15384615, 253.96825397, 258.06451613,
       262.29508197, 271.18644068, 275.86206897, 285.71428571,
       285.71428571, 285.71428571, 285.71428571, 173.91304348,
       173.91304348, 173.91304348, 173.91304348, 175.82417582,
       175.82417582, 175.82417582, 175.82417582, 173.91304348])

> numpy_pitch.samp_interp[100:120]
array([242.42424242, 246.15384615, 253.96825397, 258.06451613,
       262.29508197, 271.18644068, 275.86206897, 285.71428571,
       285.71428571, 285.71428571, 285.71428571, 173.91304348,
       173.91304348, 173.91304348, 173.91304348, 175.82417582,
       175.82417582, 175.82417582, 175.82417582, 174.19267129])

Considering the repos that use YAAPT as part of their data preprocessing pipeline (example), I think it would be really nice to align the two YAAPT outputs, or at least to offer a compliant version.

Also, would it be OK if I tried to correct this?

pchampio · 2024-09-26T22:25:55Z

Thanks for your comment.
This one-night project aimed to work around the facebookresearch/speech-resynthesis data loader issue ;).
In my testing (which wasn't thorough), I only observed a slight difference, but your results indicate that what I took for a slight difference is actually quite large. (Proper testing on multiple files would be nice.)
I also never observed the shape issue; I thought this implementation matched the numpy/scipy one in this regard (though I don't think a difference of 4 frames is a huge issue).
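For testing on multiple files, something along these lines could summarize how far the two outputs are apart. This is only a rough sketch, not part of either library: compare_pitch_tracks and rel_tol are hypothetical names, it truncates both tracks to their common length, and it assumes unvoiced frames are reported as 0 in both outputs.

```python
import numpy as np


def compare_pitch_tracks(ref, test, rel_tol=0.05):
    """Summarize the mismatch between two F0 tracks (hypothetical helper).

    Assumption: a value of 0 marks an unvoiced frame in both tracks.
    """
    ref = np.asarray(ref, dtype=np.float64).ravel()
    test = np.asarray(test, dtype=np.float64).ravel()

    # Compare only the frames present in both tracks.
    n = min(ref.size, test.size)
    ref_c, test_c = ref[:n], test[:n]

    voiced_ref = ref_c > 0
    voiced_test = test_c > 0
    both_voiced = voiced_ref & voiced_test

    if both_voiced.any():
        # Relative F0 error on frames that both tracks mark as voiced.
        rel_err = np.abs(test_c[both_voiced] - ref_c[both_voiced]) / ref_c[both_voiced]
        close_fraction = float(np.mean(rel_err <= rel_tol))
        max_rel_err = float(rel_err.max())
    else:
        close_fraction = float("nan")
        max_rel_err = float("nan")

    return {
        # Positive when the test track has more frames than the reference.
        "length_diff": int(test.size - ref.size),
        # Share of common frames with the same voiced/unvoiced decision.
        "voicing_agreement": float(np.mean(voiced_ref == voiced_test)) if n else float("nan"),
        # Share of jointly voiced frames within rel_tol of the reference.
        "close_fraction": close_fraction,
        "max_rel_err": max_rel_err,
    }
```

With the variables from the snippet above, it could be called as compare_pitch_tracks(numpy_pitch.samp_values, torch_pitch.squeeze(0).numpy()) and the resulting dictionaries collected over a list of wav files.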

As for the samp_interp output, it doesn't need to be provided here; the interpolation can be done elsewhere and in other ways.
But yes, for samp_values, it would be good to align with the numpy/scipy implementation.

You are more than welcome to try to correct this!

If I recall correctly, one scipy function had no equivalent in PyTorch, so I didn't use it, and that explains some of the output differences. I don't remember which scipy function it was; you will have to investigate that yourself.
If you fix this, I'll add you as an author on this repo.
Thanks for considering this work.
Best
