
Finetuning Speech Encoders further #28

Open
King-Rafat opened this issue Jun 29, 2024 · 0 comments


King-Rafat commented Jun 29, 2024

Hi,

I tried further finetuning the Swahili speech encoder, but BLEU only improved from 7.5 (the baseline with your already finetuned encoder) to 9.6. I finetuned the speech encoder for 5 epochs on about 30 hours of augmented data, and the MSE loss in the last epoch was 1.5 × 10^-6. I am reluctant to train for more epochs because the improvement is smaller than I had expected. Is there a different approach that might help achieve a better BLEU score?

Also, which finetuned decoder checkpoint is the one the paper reports as performing well for Swahili? When I try to use it, I get the following error, which I do not get with the normal decoder: `ValueError: The input sequence length must be less than or equal to the maximum sequence length (512), but is 513 instead`. All my audio clips are 30 seconds or shorter.
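
To narrow down the length error, one option is to check which inputs exceed the 512 limit before they reach the decoder. The snippet below is only a minimal sketch in plain Python: `find_overlong` and `truncate` are hypothetical helper names, not functions from this repository, and it assumes the per-example sequence lengths can be collected as a list of integers. Truncating is a workaround that discards the tail of the sequence, so it may affect output quality.

```python
from typing import List, Sequence, Tuple

MAX_LEN = 512  # limit reported in the ValueError above


def find_overlong(lengths: Sequence[int], max_len: int = MAX_LEN) -> List[Tuple[int, int]]:
    """Return (index, length) for every sequence longer than max_len."""
    return [(i, n) for i, n in enumerate(lengths) if n > max_len]


def truncate(seq: Sequence, max_len: int = MAX_LEN) -> list:
    """Keep only the first max_len items of a sequence (hypothetical workaround)."""
    return list(seq[:max_len])


# Illustrative lengths only, not measured from real data.
lengths = [480, 513, 512]
print(find_overlong(lengths))

# A 513-item sequence is cut down to the 512-item limit.
print(len(truncate(list(range(513)))))
```

If only a handful of examples are flagged, inspecting those clips directly may show whether the extra position comes from the audio itself or from something added during preprocessing.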

Thank you for your time!
