huggingFaceModel #2020
Jefresongil started this conversation in General
Hi @Jefresongil. We don't have any preprocessing step to enhance audio quality. A year and a half ago, wav2vec2 quality was similar to that of Google's or Microsoft's services, but the algorithms have evolved a lot since then, so yes, Google and Microsoft can provide better results today. We plan to offer Whisper in version 4.2, to be released next year. If you can't wait, there is draft code in #1335 that you can try.
Hi Nassif, I've already transcribed some audio using the model "huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-portuguese", and it does give better results. But I would like to know whether there is any preprocessing we can apply to these audio files to improve the transcription, since wiretaps often have low audio quality. Would Microsoft's or Google's models offer any gain in these cases?