Improve GPU auto detection in transcription code #2398

Open
lfcnassif opened this issue Jan 2, 2025 · 10 comments

Comments

@lfcnassif
Member

Currently we use the GPUtil Python library to detect GPUs; internally it calls the nvidia-smi command-line tool. Unfortunately, this can cause issues if a GPU exists but is badly configured (missing libraries, etc.). Maybe it would be better to use torch.cuda.device_count() to check for GPU availability. Or we could even make the choice between CPU and GPU an explicit parameter in AudioTranscriptConfig.txt and abort if the selected device doesn't work, instead of trying to fall back to CPU. Thoughts?

@lfcnassif
Member Author

lfcnassif commented Jan 2, 2025

Falling back to CPU may not be what the user wants; they may prefer an explicit abort.

@gfd2020
Collaborator

gfd2020 commented Jan 6, 2025

@lfcnassif, I was testing here on version 4.2.0 and I have a suggestion. GPUtil only tells you whether CUDA is installed on the PC. Torch tells you whether the Python environment being run is actually able to use CUDA, so its information is more reliable. See the code below. With it I can run Whisper without errors whether my IPED installation has PyTorch with CUDA (GPU) or CPU-only PyTorch. With the standard version, I get errors.

    #import GPUtil
    #cudaCount = len(GPUtil.getGPUs())

    try:
        import torch
        torch_found = torch.cuda.is_available()
        cudaCount = torch.cuda.device_count()
    except Exception:
        # torch is not installed or CUDA could not be initialized: use the CPU
        torch_found = False
        cudaCount = 0

    print(str(cudaCount), file=stdout, flush=True)

    if cudaCount > 0 and torch_found:
        deviceId = 'cuda'
    else:
        deviceId = 'cpu'
        deviceNum = 0

@lfcnassif
Member Author

lfcnassif commented Jan 6, 2025

Thanks @gfd2020. Unfortunately I just realized torch isn't needed by faster-whisper, it's not a required dependency, and I wouldn't like to add such a heavy dependency (+1.1GB size for cpu) just for a GPU check...

Is there a different GPUtil command to check for CUDA availability other than getGPUs()?

Anyway, I think the best approach here is to externalize the device option (CPU x GPU) and to abort processing if it fails.

@gfd2020
Collaborator

gfd2020 commented Jan 7, 2025

> Thanks @gfd2020. Unfortunately I just realized torch isn't needed by faster-whisper, it's not a required dependency, and I wouldn't like to add such a heavy dependency (+1.1GB size for cpu) just for a GPU check...

I agree, but if you only use the CPU (pip install numpy faster-whisper), the import torch check will throw an exception, which is fine: the code then falls back to the CPU.

If you are going to use the GPU (pip install torch torchvision torchaudio faster-whisper), the torch import will succeed and the two variables will be checked to see whether everything is available.

In short, you only need torch if you are going to use GPU. This check will only be done in the case of GPU installation.

> Is there a different GPUtil command to check for CUDA availability other than getGPUs()?

From what I've tested, GPUtil reports a GPU count whenever CUDA is installed on the PC. Torch, on the other hand, reflects the python.exe that is executing the script. For example, if IPED's Python has torch installed correctly, it will return 1; if it doesn't have torch, it will return 0.

> Anyway, I think the best approach here is to externalize the device option (CPU x GPU) and to abort processing if it fails.

I agree, there could be an option to force the cpu if the user wants.
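
The explicit device option discussed above could look roughly like the sketch below. It is only an illustration: the function name `resolve_device` and the accepted values `"cpu"` and `"gpu"` are assumptions, not an existing IPED setting. The point is that a GPU request with no usable CUDA device raises an error instead of silently switching to the CPU.

```python
def resolve_device(requested: str, cuda_count: int) -> str:
    """Map a user-chosen device option to 'cuda' or 'cpu'.

    Hypothetical helper: aborts (raises) instead of falling back to CPU
    when the GPU was explicitly requested but is not usable.
    """
    choice = requested.strip().lower()
    if choice == "cpu":
        return "cpu"
    if choice == "gpu":
        if cuda_count < 1:
            raise RuntimeError("GPU was requested but no usable CUDA device was found")
        return "cuda"
    raise ValueError(f"unknown device option: {requested!r}")
```

With this shape, forcing the CPU is just a matter of passing `"cpu"`, and the caller decides how to report the error when the GPU request cannot be honored.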

@lfcnassif
Member Author

> If you are going to use GPU ( pip install torch torchvision torchaudio faster-whisper) the torch import will run and the two variables will be tested to see if everything is available.

But where on the official faster-whisper site do they say PyTorch is needed? I can run faster-whisper on GPU without PyTorch by installing the needed CUDA libs (cuDNN and cuBLAS). PyTorch was required by the previous wav2vec2, but it seems it is not required by faster-whisper.

@gfd2020
Collaborator

gfd2020 commented Jan 7, 2025

> But where in the official faster-whisper site they say pytorch is needed? I can run faster-whisper on GPU without pytorch, installing the needed CUDA libs (cuDNN and cuBLAS). Pytorch was required by the previous wav2vec2, but seems it is not by faster-whisper.

I see. You're right then.
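
Since faster-whisper already depends on CTranslate2, one torch-free option might be to ask CTranslate2 itself how many CUDA devices it sees. The sketch below assumes `ctranslate2.get_cuda_device_count()` is available in the installed version; the wrapper name `ctranslate2_cuda_count` is made up for illustration. It has not been tested against the badly configured GPU setups described in this issue, and it cannot catch a hard crash of the Python process.

```python
def ctranslate2_cuda_count() -> int:
    """Return the number of CUDA devices visible to CTranslate2, or 0.

    Hypothetical helper: any import or runtime failure is treated as
    "no usable GPU", so no extra dependency is needed beyond what
    faster-whisper already installs.
    """
    try:
        import ctranslate2
        return int(ctranslate2.get_cuda_device_count())
    except Exception:
        # ctranslate2 is missing or the CUDA runtime could not be queried
        return 0


cudaCount = ctranslate2_cuda_count()
deviceId = 'cuda' if cudaCount > 0 else 'cpu'
```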

@gfd2020
Collaborator

gfd2020 commented Jan 7, 2025

@lfcnassif, I found another lib to detect the CUDA device count. It's called numba, and it's about 100 MB. It worked here; tell me if it works for you.

    #import GPUtil
    #cudaCount = len(GPUtil.getGPUs())
    try:
        from numba import cuda
        dev = cuda.list_devices()
        cudaCount = len(dev) if dev is not None else 0
    except Exception:
        # numba is not installed or the CUDA driver could not be loaded
        cudaCount = 0

    print(str(cudaCount), file=stdout, flush=True)

    if cudaCount > 0:
        deviceId = 'cuda'
    else:
        deviceId = 'cpu'
        deviceNum = 0

PS: I didn't know that faster-whisper doesn't need PyTorch, so I'll have to update the wiki.

Just to confirm:

  1. Vosk - no need for pytorch
  2. Wav2Vec2 - needs pytorch
  3. faster-whisper - no need for pytorch
  4. whisperX- ?

I had to do additional checks to see if cuDNN and cuBLAS are installed correctly, otherwise the transcription throws an exception.
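
One way such a check could be sketched is to try loading each library with `ctypes` before starting the transcription. The helper name `missing_libraries` and the file names in `CANDIDATE_DLLS` are assumptions for a Windows CUDA 12 / cuDNN 8 setup; the actual names depend on the installed versions and the platform.

```python
import ctypes


def missing_libraries(names):
    """Return the subset of library names that cannot be loaded."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # The file was not found or one of its own dependencies is missing
            missing.append(name)
    return missing


# Assumed file names, adjust to the CUDA/cuDNN versions actually installed
CANDIDATE_DLLS = [
    "cublas64_12.dll",
    "cublasLt64_12.dll",
    "cudnn_ops_infer64_8.dll",
    "cudnn_cnn_infer64_8.dll",
]
```

If `missing_libraries(CANDIDATE_DLLS)` returns a non-empty list, the script could report those names and select the CPU (or abort) before the transcription fails.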

@gfd2020
Collaborator

gfd2020 commented Jan 8, 2025

Information:

  1. I tried using faster-whisper 1.1.1 with CUDA 12 and cuDNN 9, but it did not work on Windows: Python crashed without raising an exception. I was only able to use it with CUDA 12 and cuDNN 8. Therefore, as the faster-whisper GitHub page recommends, I downgraded ctranslate2 to another version.

    pip install faster-whisper
    pip install --force-reinstall ctranslate2==4.4.0

On that same page there is a link to download the DLLs necessary to run on Windows.
https://github.com/Purfview/whisper-standalone-win/releases/tag/libs

I downloaded CUDA12_v1: - cuBLAS.and.cuDNN____v12.4.5.8___v8.9.7.29.

So I just placed the 4 DLLs in the Python folder, and the transcription ran successfully on the GPU. It wasn't even necessary to install NVIDIA's own cuDNN and cuBLAS packages (which, by the way, come with a bunch of other software). These 4 DLLs are only 440 MB compressed.

The Python folder with faster-whisper plus the GPU DLLs was approximately 515 MB compressed. I thought it was reasonable.

Remember that updating the CUDA SDK driver to version 12.x must still be done through the NVIDIA installer, as usual.

I'm going to update the wiki to help anyone who wants to install faster-whisper with GPU drivers.

  2. In my customized build I used numba to get the 'cudaCount'. I also used this lib to query the CUDA compute capability and the CUDA SDK version, so that the CUDA count better reflects usable devices and no error occurs, since cuDNN and cuBLAS depend on the CUDA SDK version and very old graphics cards (Kepler or older) only have old CUDA support.

    cudaCount = 0
    try:
        from numba import cuda
        from numba.cuda.cudadrv.driver import driver
        cuda_min_sdk_version = (12,0)
        cuda_min_version = (5,0) #(3,5) - CUDA SDK 11.8 MAX
        dev_list = cuda.list_devices()
        for dev in dev_list:
            if dev.compute_capability >= cuda_min_version and driver.get_version() >= cuda_min_sdk_version:
                cudaCount = len(dev_list)
                deviceNum = dev.id
                cuda.detect() # Print GPUs Info
                break
        del dev_list
    except Exception:
        # numba is missing or the CUDA driver failed to load: keep cudaCount = 0
        pass

    print(str(cudaCount), file=stdout, flush=True)
     
    force_cpu = False # for debug only

    if cudaCount > 0 and not force_cpu:
        deviceId = 'cuda'
    else:
        deviceId = 'cpu'
        deviceNum = 0
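
Note that the loop above sets `cudaCount` to the total number of devices as soon as one device passes the checks. If only qualifying devices should be counted, the selection logic could be isolated in a small pure function like the sketch below. The name `pick_cuda_device` and its input format (pairs of device id and compute capability) are illustrative assumptions, not part of numba or IPED.

```python
def pick_cuda_device(devices, driver_version,
                     min_capability=(5, 0), min_driver=(12, 0)):
    """Count usable devices and return the id of the first one.

    devices: iterable of (device_id, compute_capability) pairs,
             e.g. [(0, (8, 6))] built from numba's device list.
    driver_version: (major, minor) tuple of the CUDA driver.
    Returns (usable_count, first_usable_id or None).
    """
    # Tuples compare element by element, so (11, 8) < (12, 0) is True
    if tuple(driver_version) < tuple(min_driver):
        return 0, None
    usable = [dev_id for dev_id, cc in devices
              if tuple(cc) >= tuple(min_capability)]
    return len(usable), (usable[0] if usable else None)
```

With numba, the input could be built as `[(d.id, d.compute_capability) for d in cuda.list_devices()]`, and the result would feed `cudaCount` and `deviceNum`.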

@lfcnassif
Member Author

Thank you very much @gfd2020 for all your work here!

> whisperX- ?

WhisperX uses Faster-whisper internally, so it doesn't need pytorch.

> I had to do additional checks to see if cuDNN and cuBLAS are installed correctly, otherwise the transcription throws an exception.

Yes, I got the same errors with GPUtil and numba with your previous code.

> I'm going to update the wiki to help anyone who wants to install faster-whisper with GPU drivers.

Thank you very much!

@gfd2020
Collaborator

gfd2020 commented Jan 10, 2025

It seems that after installing numba and faster-whisper (pip install numba faster-whisper), the installed numpy is version 2. However, this may cause exceptions in other modules, so it is best to also downgrade numpy with the following command:

    pip install --force-reinstall "numpy<2" ctranslate2==4.4.0
