I want to reproduce this work at 16 kHz. I used frame length and frame shift settings analogous to the 22.05 kHz configuration: the frame shift is 11.6 ms (256 samples) at 22.05 kHz and 10 ms (160 samples) at 16 kHz, and the frame length is four times the frame shift. The FFT size is still 1024.
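For reference, here is a small sketch of how I derive those settings from the sample rate (the helper name and arguments are my own, not this repo's API):

```python
def stft_params(sr, shift_ms=10.0, win_factor=4, n_fft=1024):
    """Derive STFT settings that keep the same time scale across sample rates.

    Assumptions: frame shift is given in milliseconds, the frame length is
    win_factor times the shift, and the FFT size stays fixed at 1024
    (frames shorter than n_fft are zero-padded).
    """
    hop_length = round(sr * shift_ms / 1000)   # frame shift in samples
    win_length = hop_length * win_factor       # frame length = 4x shift
    return {"n_fft": n_fft, "hop_length": hop_length, "win_length": win_length}

# 16 kHz: 10 ms shift -> 160-sample hop, 640-sample window
print(stft_params(16000))
# 22.05 kHz: 11.6 ms shift -> 256-sample hop, 1024-sample window
print(stft_params(22050, shift_ms=256 / 22050 * 1000))
```

This is just to confirm the two configurations are time-aligned; the actual feature extraction in the repo may differ.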
Currently, my model has been trained for 200k steps on a 256-dimensional input, but noticeable phase artifacts remain. I would like to know whether this model is sensitive to the sampling rate, or whether my current number of training steps is simply insufficient.
The paper does not seem to specify how many steps the model was trained for, and I'm curious how many steps are generally sufficient for acceptable results (do I really need to run the full 3100 epochs?).
I would appreciate it if you could share your pretrained model weights! Thanks a lot!
Also, I noticed that the trained model parameter files are very small: the encoder file is only 543 kB and the generator file only 545 kB. Is this normal? It's impressive that so few parameters can handle this task!
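As a rough sanity check on those file sizes, here is how I estimated the parameter counts (assuming float32 weights at 4 bytes each and ignoring any file-format overhead, which is an assumption on my part):

```python
def approx_param_count(file_size_kb):
    """Estimate the number of float32 parameters in a weights file.

    Assumption: 1 kB = 1024 bytes, 4 bytes per parameter, no metadata overhead.
    """
    return file_size_kb * 1024 // 4

print(approx_param_count(543))  # encoder: roughly 139k parameters
print(approx_param_count(545))  # generator: roughly 139k parameters
```

So each file corresponds to only about 140k parameters, which is indeed tiny for a vocoder-like model.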
Besides, my mel loss on the validation set at 200k steps is about 0.3, which is much higher than the curve you provided. I trained this project in 'both' mode; could this also be a sampling-rate issue?