huggingFaceModel #2020

Jefresongil · 2023-12-07T11:38:50Z

Jefresongil
Dec 7, 2023

Hi Nassif, I've already done some audio transcriptions using the model "huggingFaceModel = jonatasgrosman/wav2vec2-xls-r-1b-portuguese", and it does indeed give better results. I would like to know whether there is any preprocessing we can apply to these audio files to improve the transcription, since wiretaps often have low audio quality. Could Microsoft's or Google's models offer any gain in these cases?

lfcnassif · 2023-12-07T13:26:14Z

lfcnassif
Dec 7, 2023
Maintainer

Hi @Jefresongil. We don't have any preprocessing step to enhance audio quality. About 1.5 years ago, wav2vec2 quality was similar to that of Google's or Microsoft's services, but the algorithms have evolved a lot since then. So yes, Google and Microsoft can provide better results today. We plan to offer Whisper in version 4.2, to be released next year. If you can't wait, there is draft code in #1335 that you can try.
