How to deal with the integer values of RVQ #5

Open
phdshliang opened this issue Apr 1, 2024 · 1 comment

Comments

@phdshliang

Hi author,
I've been experimenting with encoding audio using your fantastic method, and I noticed that the RVQ (Residual Vector Quantization) values I obtain are integers, like the following:
[image: values]

I'm curious whether this is expected behavior. I would also like to use these encoded features for downstream tasks, but I'm unsure how to prepare the integer values for training. Would it be appropriate to apply a normalization technique such as min-max scaling or z-score normalization? I don't know the distribution of these encoded values, so I'm looking for guidance on how to handle them effectively.

Any advice or suggestions on how to deal with these encoded feature values would be greatly appreciated.

Thank you!

@GAN-pie

GAN-pie commented Oct 9, 2024

Hi, these integers are simply the indices of the codebook entries selected by each of the quantizers.
I don't think you will be able to do anything interesting with the indices alone. Working with the codewords associated with these indices should be more interesting, in my humble opinion.
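
As a rough illustration of going from indices to codewords, here is a minimal NumPy sketch. It is not this repository's API: the function name `rvq_indices_to_features`, the `(num_quantizers, num_frames)` layout of the indices, and the `(num_quantizers, codebook_size, dim)` layout of the codebooks are all assumptions, and the random codebooks only stand in for the ones stored in a trained model. Since the indices are categorical labels, they are used for a table lookup rather than scaled with min-max or z-score normalization, and the codewords chosen at each stage are summed, as in standard residual vector quantization.

```python
import numpy as np


def rvq_indices_to_features(indices, codebooks):
    """Map RVQ indices to continuous features (hypothetical helper).

    indices:   int array, shape (num_quantizers, num_frames)  [assumed layout]
    codebooks: float array, shape (num_quantizers, codebook_size, dim)  [assumed layout]
    returns:   float array, shape (num_frames, dim)
    """
    indices = np.asarray(indices)
    codebooks = np.asarray(codebooks)
    num_quantizers = indices.shape[0]

    # For each quantizer q and frame t, pick codebooks[q, indices[q, t]].
    q_idx = np.arange(num_quantizers)[:, None]
    selected = codebooks[q_idx, indices]  # (num_quantizers, num_frames, dim)

    # In RVQ each stage quantizes the residual of the previous one,
    # so the reconstruction is the sum of the selected codewords.
    return selected.sum(axis=0)


# Placeholder data: random codebooks stand in for the trained ones.
rng = np.random.default_rng(0)
codebooks = rng.standard_normal((4, 1024, 8))
indices = rng.integers(0, 1024, size=(4, 10))

features = rvq_indices_to_features(indices, codebooks)
print(features.shape)
```

The resulting float vectors can be fed to a downstream model like any other continuous feature. An alternative, if the original codebooks are not accessible, is to treat each index as a token and learn a separate embedding table per quantizer.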
