whisper local LLM #10

Open · mecattaf opened this issue Oct 7, 2024 · 6 comments
mecattaf commented Oct 7, 2024

using this plugin:
Robitx/gp.nvim#122

we make sure to always have a whisper server running locally, probably whisper.cpp

we note that the turbo models run very fast:
https://github.com/openai/whisper/pull/2361/files
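for context, a rough sketch of what this local setup could look like, using whisper.cpp's bundled server example and its /inference endpoint (the model path, port, and sample.wav file here are placeholder assumptions):

```bash
# run the whisper.cpp HTTP server with a local GGML model (paths/port are examples)
./server -m models/ggml-base.en.bin --host 127.0.0.1 --port 8080

# from another shell: transcribe a 16 kHz WAV via the server's /inference endpoint
curl 127.0.0.1:8080/inference \
  -H "Content-Type: multipart/form-data" \
  -F file="@sample.wav" \
  -F response_format="json"
```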


mecattaf commented Dec 1, 2024

We decided not to move forward with attempts at using the NPU, as it is too early. However, we can leverage OpenVINO CPU acceleration for whisper.

find an example whisper pipeline in Python here:
https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#speech-to-text-processing-using-whisper-pipeline
and a Jupyter notebook here:
https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb
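a rough sketch of that Python path, paraphrased from the linked sample README (the model name, output directory, and WAV file are illustrative):

```bash
# export Whisper from its PyTorch checkpoint to OpenVINO IR (per the openvino.genai samples)
pip install openvino-genai "optimum[openvino]"
optimum-cli export openvino --trust-remote-code --model openai/whisper-base whisper-base

# run the sample script from the linked samples directory against a 16 kHz WAV
python whisper_speech_recognition.py whisper-base sample.wav
```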

and the full Python recorder file, speech recognition script, and README (which mentions converting a model for OpenVINO):
https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/whisper_speech_recognition

note that the final goal is to have a whisper.cpp server running locally, which can then be accessed by gp.nvim:
Robitx/gp.nvim#224

whisper.cpp is available on Fedora but appears not to have OpenVINO support built in, so we have to package it ourselves.
we find the latest releases here: https://github.com/ggerganov/whisper.cpp/releases
the spec file to draw inspiration from is here:
https://src.fedoraproject.org/rpms/whisper-cpp/blob/rawhide/f/whisper-cpp.spec
however, it does not have OpenVINO turned on by default:
https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#openvino-support

if this is the case, we will also need the OpenVINO runtime, which can be installed from a yum repo:
https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-yum.html
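a hedged sketch of what that could look like on Fedora - the repo definition below is paraphrased from the linked install guide, so verify the URLs and package name there:

```bash
# add Intel's OpenVINO yum/dnf repo (contents per the linked install guide)
sudo tee /etc/yum.repos.d/openvino-2024.repo > /dev/null <<'EOF'
[OpenVINO]
name=Intel(R) Distribution of OpenVINO 2024
baseurl=https://yum.repos.intel.com/openvino/2024
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/openvino/2024/setup/RPM-GPG-KEY-INTEL-OPENVINO-2024
EOF

# install the runtime (exact versioned package names are listed in the docs)
sudo dnf install openvino
```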

finally, we will need to convert the model to be OpenVINO-compatible:
https://medium.com/openvino-toolkit/how-to-run-whisper-automatic-speech-recognition-system-locally-on-cpu-or-gpu-with-openvino-a6dc0c000ada


mecattaf commented Dec 1, 2024

clarification:

The original post conflated two separate approaches to using OpenVINO with Whisper. The Python implementation using the OpenVINO GenAI library (with its model conversion via optimum-cli) is a completely separate solution from whisper.cpp's built-in OpenVINO support. While both achieve hardware acceleration through OpenVINO, they use different toolchains and starting points: the Python approach converts PyTorch models directly to OpenVINO IR format, while whisper.cpp converts from GGML to OpenVINO IR format using its own conversion tools.

For the goal of running an accelerated whisper.cpp server locally, we should focus solely on whisper.cpp's native OpenVINO support. This means building whisper.cpp with OpenVINO support enabled, installing the OpenVINO runtime from Intel's repository, and using whisper.cpp's own model conversion script (found in the models/ directory) to convert GGML models to OpenVINO IR format. The Python implementation and its conversion process described in the Medium article are not relevant to this use case. We do not have to convert models ourselves; it should be sufficient to build a whisper.cpp COPR package with OpenVINO support and then feed the GGUF models to it (though see the correction in the next comment).

we first experiment with the performance without the OpenVINO backend to see if it is acceptable.
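when we do enable it, the build step described above is roughly the following (flags per the whisper.cpp README's OpenVINO section; the setupvars.sh path is an assumption about where the runtime lands):

```bash
# make the OpenVINO runtime visible to the build (install path is an assumption)
source /opt/intel/openvino/setupvars.sh

# build whisper.cpp with the OpenVINO backend enabled
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release
```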


mecattaf commented Dec 1, 2024

models/convert-whisper-to-openvino.py:

Looking at the code and whisper.cpp's implementation, this conversion is NOT automatic. It needs to be done manually before you can use OpenVINO acceleration. Here's the workflow:

1. First, build and install whisper.cpp with OpenVINO support.
2. Then, before running the server with OpenVINO acceleration, you need to:
   - set up a Python environment with the required dependencies (whisper, torch, openvino)
   - run the conversion script for your model (e.g., `python convert-whisper-to-openvino.py --model base.en`)
   - the script will create `ggml-base.en-encoder-openvino.xml` and `.bin` files
3. Finally, when running the server, it will look for these OpenVINO IR model files alongside your GGML/GGUF models.

This is a one-time setup per model - once you've converted a model, you can reuse the OpenVINO version. The script converts the encoder part of the model to OpenVINO's format, which is what enables the hardware acceleration.
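for concreteness, the one-time conversion looks roughly like this (steps paraphrased from the whisper.cpp README; base.en is just an example):

```bash
cd whisper.cpp/models

# isolated Python environment with the conversion dependencies
python3 -m venv openvino_conv_env
source openvino_conv_env/bin/activate
pip install -r requirements-openvino.txt

# convert the encoder of the chosen model to OpenVINO IR
python convert-whisper-to-openvino.py --model base.en
# -> produces ggml-base.en-encoder-openvino.xml and .bin next to the GGML model
```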
So your previous plan needs to include this conversion step if you want to use OpenVINO acceleration. However, your approach of testing without OpenVINO first is still valid - you can decide whether the conversion effort is worth it based on your baseline performance tests.

> and then how can i run the server with acceleration?

Looking at the whisper.cpp server implementation, once you have:

1. Built whisper.cpp with OpenVINO support (`-DWHISPER_OPENVINO=ON`)
2. Converted your model using the Python script above
3. Have both the original GGML/GGUF model and the converted OpenVINO files in your models directory

You can run the server with OpenVINO acceleration. Note: whisper.cpp's server has no `--openvino` flag; when built with OpenVINO support, it initializes the OpenVINO encoder automatically if the converted files are present, and the encode device can be chosen with `--ov-e-device` (default CPU):

```bash
./server -m models/ggml-base.en.bin --ov-e-device CPU
```

The server will look for the corresponding OpenVINO files (ggml-base.en-encoder-openvino.xml and .bin) in the same directory as your GGML model.
You can verify OpenVINO is being used by checking the server startup logs - they should mention loading the OpenVINO encoder model.
So the main difference from your current server setup is the OpenVINO-enabled build plus the converted encoder files, not an extra runtime flag.

