Pyannote is 10 times slower than WhisperX with GPU utilization 10%: expected behavior or misconfiguration? #1652
Comments
Would you mind sharing a link to a Google Colab that one can just click and run to reproduce the issue?
Unfortunately, I have no access to Google Colab from my Google account (I can create a new account if needed). I noticed that the problem disappears when I load the audio file using
instead of loading
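The snippet this comment refers to is not shown here, so the following is only a sketch of one plausible reading: pre-loading the audio into memory and handing the waveform to the pipeline instead of a file path. It assumes `torchaudio` is installed and that `pipeline` is an already-instantiated pyannote.audio pipeline; the helper names (`load_waveform`, `as_pipeline_input`, `diarize_from_memory`) are hypothetical and do not come from the thread.

```python
from typing import Any, Dict


def load_waveform(path: str):
    """Read the whole file into memory as a (channel, time) tensor."""
    # Imported lazily so the pure-Python helper below works without torchaudio.
    import torchaudio

    waveform, sample_rate = torchaudio.load(path)
    return waveform, sample_rate


def as_pipeline_input(waveform: Any, sample_rate: int) -> Dict[str, Any]:
    """Build the in-memory mapping that a pyannote.audio pipeline can take
    in place of a file path."""
    return {"waveform": waveform, "sample_rate": sample_rate}


def diarize_from_memory(pipeline, path: str):
    """Run an already-loaded pipeline on audio decoded once, up front.

    `pipeline` is assumed to be a pyannote.audio Pipeline instance (hypothetical
    argument name); nothing here loads or configures the model itself.
    """
    waveform, sample_rate = load_waveform(path)
    return pipeline(as_pipeline_input(waveform, sample_rate))
```

If this is the change in question, the likely benefit is that the file is decoded once rather than being read repeatedly from disk during inference, which would be consistent with the single busy CPU core described in the issue. That explanation is an assumption, not something confirmed in this thread.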
The code might be "trivial" but the whole point of sharing a Google Colab is for pyannote maintainers to avoid wasting time on problems that are not reproducible. For instance, two files with two different extensions (.wav and .mp3) are mentioned here. Preparing a Google Colab will definitely increase your chances of having someone look at your issue. It might also happen that the mere preparation of the Google Colab makes you realize that the problem is on your side (I am not saying that this is the case here but it happened in the past).
+1 for this issue. Thanks for the note, @chubin. I used your solution with
and got much faster inference 👍
Wow, after updating from 2.x to 3.x I had performance issues. Now it's better than the old code. I really didn't get what caused that, but... thanks.
This has resolved the issue for me, thank you @chubin!!
Tested versions
System information
Ubuntu 22.04, NVIDIA RTX A6000
Issue description
I am not sure if it is a bug, so please feel free to close it if it is expected behavior.
I am trying to diarize a large recording (approximately 60 minutes), and the
diarization process takes 8.5 minutes:
Here is my code:
It uses the GPU during diarization, but with a low utilization level (~10%),
and it uses 1 core of the CPU (100%) all the time.
When doing the diarization with whisperx, though, it takes just a minute, and GPU utilization is at full capacity.
However, the quality of diarization is slightly worse in this case (approximately 5% of text
is attributed to wrong/non-existent speakers).
Pyannote diarization quality is just brilliant, but it takes an order of magnitude more time.
I suppose that I am doing something wrong, but I don't know what exactly.
Could you please point me in the right direction,
or just confirm that this is exactly as it should be and the behavior is expected?
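To put the two timings above on a common scale, here is a small sketch that computes the real-time factor (processing time divided by audio duration) for each run. The inputs are the approximate figures quoted above (about 60 minutes of audio, 8.5 minutes with pyannote, about 1 minute with whisperx), so the results are rough estimates; the function and variable names are illustrative only.

```python
def real_time_factor(processing_minutes: float, audio_minutes: float) -> float:
    """Processing time per unit of audio; lower is faster."""
    return processing_minutes / audio_minutes


# Approximate figures taken from the description above.
AUDIO_MINUTES = 60.0
PYANNOTE_MINUTES = 8.5
WHISPERX_MINUTES = 1.0

pyannote_rtf = real_time_factor(PYANNOTE_MINUTES, AUDIO_MINUTES)
whisperx_rtf = real_time_factor(WHISPERX_MINUTES, AUDIO_MINUTES)

# Ratio of the two factors: how many times longer the pyannote run took.
slowdown = pyannote_rtf / whisperx_rtf

print(f"pyannote RTF: {pyannote_rtf:.3f}")
print(f"whisperx RTF: {whisperx_rtf:.3f}")
print(f"slowdown:     {slowdown:.1f}x")
```

With these inputs the gap is roughly an order of magnitude, in line with the "10 times slower" in the title.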
GPU utilization while using pyannote pure
GPU utilization when using whisperX
Minimal reproduction example (MRE)
(not applicable)