48k_training #12
Comments
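Hello! Switching to 48 kHz does require changes to the model. For 48 kHz data, if the STFT keeps the 32 ms frame length, the frequency dimension becomes 769 bins instead of 257. The following changes need to be considered:
There may be other changes I haven't thought of. Good luck with your work!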
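The bin counts quoted above follow from the frame length and the sample rate. A minimal sketch of that arithmetic, assuming the FFT size equals the frame length in samples and a one-sided spectrum is used (the helper name `stft_bins` is hypothetical and not taken from the repository):

```python
def stft_bins(sample_rate_hz, frame_ms):
    """Return (n_fft, one-sided bin count) for a given frame length.

    Assumes n_fft equals the frame length in samples, with no zero padding.
    """
    n_fft = round(sample_rate_hz * frame_ms / 1000)
    n_bins = n_fft // 2 + 1  # one-sided spectrum of a real-valued signal
    return n_fft, n_bins


if __name__ == "__main__":
    for sr in (16000, 48000):
        n_fft, n_bins = stft_bins(sr, 32)
        print(f"{sr} Hz, 32 ms frame: n_fft={n_fft}, bins={n_bins}")
```

Any layer whose shape depends on the frequency dimension would have to be sized for the new bin count.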
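I tried a 512-point STFT, i.e. a frame length of about 10.7 ms, leaving the band below 3 kHz uncompressed and applying ERB compression above 3 kHz, but the speech itself gets removed. Are there any details I should pay attention to?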
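The 512-point STFT may be too short; a frame length of at least 20 ms is generally used. Moving the start of the ERB compression up to 4 kHz might also work better.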
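To check a candidate FFT size against the 20 ms guideline, the frame duration can be computed directly. This is only a sketch: the helper names are hypothetical, and restricting the FFT size to a power of two is an assumption made here for convenience, not something stated in the thread.

```python
import math


def frame_ms(n_fft, sample_rate_hz):
    """Frame duration in milliseconds for an n_fft-sample frame."""
    return 1000.0 * n_fft / sample_rate_hz


def min_pow2_fft(sample_rate_hz, min_frame_ms):
    """Smallest power-of-two FFT size covering at least min_frame_ms.

    The power-of-two restriction is an assumption of this sketch.
    """
    min_samples = math.ceil(sample_rate_hz * min_frame_ms / 1000)
    return 1 << (min_samples - 1).bit_length()


if __name__ == "__main__":
    print(f"512 points at 48 kHz: {frame_ms(512, 48000):.2f} ms")
    n_fft = min_pow2_fft(48000, 20)
    print(f"smallest power-of-two FFT for >= 20 ms: {n_fft} "
          f"({frame_ms(n_fft, 48000):.2f} ms)")
```

Under these assumptions, a 512-point frame at 48 kHz falls well short of 20 ms, while 1024 points is the first power of two that clears it.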
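I have also found that the way the training set is synthesized has a very large effect on how well the model generalizes. Could you share how you synthesized your training set?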
The dataset synthesis procedure is described fairly clearly in the paper; there is nothing special about it.
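Hello! Thank you for your work and for open-sourcing it! If I want to use a 48 kHz sampling rate, do I need to modify the model code, or is it enough to change the input data?
Best wishes for your work!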