whisper local LLM #10
We will attempt to make the NPU work on Intel hardware. We start with an attempt to get whisper working, along with its variants: https://github.com/intel/AI-PC_Notebooks?tab=readme-ov-file

So far, seemingly the best LLMs to use locally:
We decided not to move forward with attempts at using the NPU, as it is too early. However, we can leverage OpenVINO CPU acceleration for whisper. Example materials: a whisper pipeline in Python, a full Python recorder file, a speech recognition Python file, and a README file that mentions converting a model for OpenVINO.

Note that the final goal is to have a whisper.cpp server running locally, which can then be accessed by gp.nvim. whisper.cpp is available on Fedora, but appears not to have OpenVINO support built in, so we have to package it ourselves. In that case we will also need the OpenVINO runtime, which can be installed from a yum repo. Finally, we will need to convert the model to be OpenVINO-compatible.
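A minimal sketch of those packaging steps, assuming Intel's yum repository layout and whisper.cpp's standard CMake flag (the repo baseurl and the openvino package name are assumptions; check Intel's install docs before use):

```sh
# Add Intel's OpenVINO yum repo (baseurl/gpgkey are assumptions; verify against Intel docs)
sudo tee /etc/yum.repos.d/openvino.repo <<'EOF'
[OpenVINO]
name=Intel(R) Distribution of OpenVINO
baseurl=https://yum.repos.intel.com/openvino
enabled=1
gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
sudo dnf install -y openvino    # package name is an assumption

# Build whisper.cpp ourselves with OpenVINO enabled
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build -DWHISPER_OPENVINO=ON
cmake --build build -j --config Release
```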
Clarification on models/convert-whisper-to-openvino.py: looking at the code and whisper.cpp's implementation, this conversion is NOT automatic. It needs to be done manually before you can use OpenVINO acceleration. Here's the workflow (see the sketch after this list):

1. First, build and install whisper.cpp with OpenVINO support (-DWHISPER_OPENVINO=ON).
2. Set up a Python environment with the required dependencies (whisper, torch, openvino) and run the conversion script.
3. Finally, when running the server, it will look for these OpenVINO IR model files alongside your GGML/GGUF models.

This is a one-time setup per model: once you've converted a model, you can reuse the OpenVINO version. The script converts the encoder part of the model to OpenVINO's format, which is what enables the hardware acceleration. Once whisper.cpp is built with OpenVINO support (-DWHISPER_OPENVINO=ON), you can run the server with OpenVINO acceleration by adding these flags:
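A sketch of steps 2 and 3 as shell commands, loosely following the whisper.cpp README's OpenVINO section (the exact dependency list, output file names, and the --ov-e-device flag should be verified against the version you build):

```sh
# One-time per model: convert the GGML model's encoder to OpenVINO IR format
cd whisper.cpp/models
python -m venv openvino_env && source openvino_env/bin/activate
pip install openai-whisper torch openvino     # deps named in the comment above
python convert-whisper-to-openvino.py --model base.en
# Expected output: ggml-base.en-encoder-openvino.xml/.bin next to the GGML model
cd ..

# The server looks for the OpenVINO IR files alongside the GGML model;
# --ov-e-device picks the OpenVINO encode device (flag name taken from the main example)
./build/bin/whisper-server -m models/ggml-base.en.bin --ov-e-device CPU
```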
This would be great to have: https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#confidence-color-coding
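For reference, that README section maps to whisper.cpp's --print-colors flag; a quick way to try it (the whisper-cli binary name varies across whisper.cpp versions):

```sh
# Color-code each transcribed word by confidence
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav --print-colors
```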
We take note of the whisper models supported by OpenVINO GenAI: https://github.com/openvinotoolkit/openvino.genai/blob/master/src/docs/SUPPORTED_MODELS.md#whisper-models
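Models on that list are exported to OpenVINO format with optimum-intel's CLI, along these lines (model id and output directory are placeholders; note this targets the openvino.genai pipeline rather than whisper.cpp):

```sh
pip install "optimum[openvino]"
optimum-cli export openvino --model openai/whisper-base whisper-base-ov
```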
Using this plugin: Robitx/gp.nvim#122

we make sure to always have a whisper server running locally, probably whisper.cpp. We note that the turbo models run very fast: https://github.com/openai/whisper/pull/2361/files
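A sketch of what gp.nvim would talk to, assuming whisper.cpp's server example and its /inference endpoint (host/port and form fields follow the server README; verify against your build):

```sh
# Keep the server running locally for gp.nvim to call
./build/bin/whisper-server -m models/ggml-base.en.bin --host 127.0.0.1 --port 8080 &

# Example transcription request against the local server
curl 127.0.0.1:8080/inference \
  -F file=@recording.wav \
  -F response_format=json
```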