Use Case
Provide powerful voice capabilities, i.e., text-to-speech and speech-to-text, via OpenAI Whisper and the open-source, Swift-based WhisperKit.
Problem
Apple's text-to-speech and speech-to-text capabilities in the AVFoundation and Speech frameworks are getting on in years and no longer match current industry standards. We therefore aim to incorporate new voice capabilities into SpeziSpeech that generate better voice output and perform better speech recognition.
Solution
OpenAI's Whisper provides powerful speech recognition (speech-to-text) that could replace the capabilities of Apple's Speech framework. SpeziSpeech could then use local Whisper inference to perform speech recognition directly on the edge device.
The open-source, Swift-based WhisperKit enables this local Whisper inference on the device.
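As a rough sketch of what local Whisper inference via WhisperKit could look like, the snippet below follows the usage shown in WhisperKit's README; the exact API surface (initializer parameters, return types) may differ between releases, so treat the calls as illustrative rather than authoritative:

```swift
import WhisperKit  // Swift package from https://github.com/argmaxinc/WhisperKit

/// Transcribes an audio file entirely on-device via WhisperKit.
/// Sketch only: the `WhisperKit()` initializer and `transcribe(audioPath:)`
/// call mirror the README example and are assumed, not verified here.
func transcribeOnDevice(audioPath: String) async throws -> String? {
    // Sets up the inference pipeline; on first use this downloads
    // the default Whisper model to the device.
    let pipe = try await WhisperKit()
    // Runs local Whisper inference on the given audio file.
    return try await pipe.transcribe(audioPath: audioPath)?.text
}
```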
Alternatives considered
Alternative voice capabilities could be provided via ElevenLabs; however, ElevenLabs currently offers only a cloud-based service, not on-device inference.
Additional context
One could use the classic Strategy design pattern to dynamically switch between Apple's speech capabilities (the AVFoundation / Speech frameworks) and WhisperKit.
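A minimal sketch of that Strategy setup is shown below. The protocol and type names (`SpeechRecognitionStrategy`, `AppleSpeechRecognizer`, `WhisperKitRecognizer`) are hypothetical, not part of SpeziSpeech; the Apple path uses the real `SFSpeechRecognizer` API, while the WhisperKit path is stubbed since it requires the external package:

```swift
import Foundation
import Speech

// Hypothetical strategy protocol: consumers depend only on this abstraction.
protocol SpeechRecognitionStrategy {
    func transcribe(audioFileAt url: URL) async throws -> String
}

// Strategy 1: Apple's Speech framework (SFSpeechRecognizer).
struct AppleSpeechRecognizer: SpeechRecognitionStrategy {
    func transcribe(audioFileAt url: URL) async throws -> String {
        let request = SFSpeechURLRecognitionRequest(url: url)
        return try await withCheckedThrowingContinuation { continuation in
            // Simplified: a production version must guard against the
            // callback firing again after the continuation has resumed.
            SFSpeechRecognizer()?.recognitionTask(with: request) { result, error in
                if let error {
                    continuation.resume(throwing: error)
                } else if let result, result.isFinal {
                    continuation.resume(returning: result.bestTranscription.formattedString)
                }
            }
        }
    }
}

// Strategy 2: WhisperKit (stubbed; the commented calls mirror its README).
struct WhisperKitRecognizer: SpeechRecognitionStrategy {
    func transcribe(audioFileAt url: URL) async throws -> String {
        // let pipe = try await WhisperKit()
        // return try await pipe.transcribe(audioPath: url.path)?.text ?? ""
        fatalError("Requires the WhisperKit package")
    }
}

// The caller can swap strategies at runtime without touching either implementation.
func transcribe(_ url: URL, preferOnDeviceWhisper: Bool) async throws -> String {
    let strategy: SpeechRecognitionStrategy =
        preferOnDeviceWhisper ? WhisperKitRecognizer() : AppleSpeechRecognizer()
    return try await strategy.transcribe(audioFileAt: url)
}
```

This keeps the choice of backend a runtime decision (e.g., a configuration flag or availability check) rather than a compile-time one.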
Keep in mind that the Whisper models first need to be downloaded to the device (WhisperKit does this automatically); this initial pull can take some time.
Code of Conduct
I agree to follow this project's Code of Conduct and Contributing Guidelines