Use WhisperKit for voice capabilities #7

Open
1 task done
philippzagar opened this issue Apr 3, 2024 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers

philippzagar commented Apr 3, 2024

Use Case

Provide powerful voice capabilities (text-to-speech and speech-to-text) via OpenAI Whisper and the open-source, Swift-based WhisperKit.

Problem

Apple's text-to-speech and speech-to-text capabilities, provided by the AVFoundation and Speech frameworks, are aging and no longer match current industry standards. We therefore aim to incorporate new voice capabilities into SpeziSpeech that generate better voice output and perform better speech recognition.

Solution

OpenAI Whisper provides powerful speech-to-text capabilities that could replace Apple's Speech framework for speech recognition. Whisper itself is a speech recognition model and does not synthesize speech, so the text-to-speech side would need a separate solution. SpeziSpeech could therefore use local Whisper inference to perform speech recognition directly on the edge device.
The open-source, Swift-based WhisperKit enables this local Whisper inference on the device.

Alternatives considered

Alternative voice capabilities could be provided by ElevenLabs; however, it currently offers only a cloud service.

Additional context

One could use the Strategy design pattern to switch dynamically between Apple's speech capabilities (via the AVFoundation and Speech frameworks) and WhisperKit.
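
A minimal sketch of how such a Strategy-based switch might look. Every name here (`SpeechRecognitionStrategy`, `AppleSpeechStrategy`, `WhisperKitStrategy`, `SpeechRecognizerContext`) is hypothetical and not part of SpeziSpeech or WhisperKit, and the strategy bodies are placeholders rather than real framework calls:

```swift
import Foundation

/// Hypothetical strategy interface (an assumption, not an existing SpeziSpeech type).
protocol SpeechRecognitionStrategy {
    var name: String { get }
    func transcribe(audioAt url: URL) async throws -> String
}

/// Placeholder for a backend built on Apple's Speech framework.
struct AppleSpeechStrategy: SpeechRecognitionStrategy {
    let name = "Apple Speech"

    func transcribe(audioAt url: URL) async throws -> String {
        // A real implementation would call into the Speech framework here.
        "[apple] \(url.lastPathComponent)"
    }
}

/// Placeholder for a backend built on WhisperKit.
struct WhisperKitStrategy: SpeechRecognitionStrategy {
    let name = "WhisperKit"

    func transcribe(audioAt url: URL) async throws -> String {
        // A real implementation would run local Whisper inference via WhisperKit here.
        "[whisper] \(url.lastPathComponent)"
    }
}

/// Context object that forwards to whichever strategy is currently selected.
final class SpeechRecognizerContext {
    private(set) var strategy: any SpeechRecognitionStrategy

    init(strategy: any SpeechRecognitionStrategy) {
        self.strategy = strategy
    }

    /// Swap the backend at runtime without changing any calling code.
    func use(_ strategy: any SpeechRecognitionStrategy) {
        self.strategy = strategy
    }

    func transcribe(audioAt url: URL) async throws -> String {
        try await strategy.transcribe(audioAt: url)
    }
}
```

Callers would hold only a `SpeechRecognizerContext`, so switching from `AppleSpeechStrategy()` to `WhisperKitStrategy()` via `use(_:)` would not require changes at the call sites.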

Keep in mind that the Whisper models first need to be downloaded to the local device (WhisperKit does this automatically), and this download could take some time.
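
One possible way to account for the download delay is to fall back to Apple's frameworks until the Whisper model is available. The sketch below is only an illustration of that idea; `ModelState`, `SpeechBackend`, and `selectBackend(for:)` are hypothetical names, and how the state would actually be derived from WhisperKit is left open:

```swift
/// Hypothetical download state of the Whisper model (an assumption, not a WhisperKit type).
enum ModelState: Equatable {
    case notDownloaded
    case downloading(progress: Double)
    case ready
}

/// Hypothetical identifiers for the two speech backends.
enum SpeechBackend {
    case appleSpeech
    case whisperKit
}

/// Picks WhisperKit only once the model is fully available; otherwise uses Apple's frameworks.
func selectBackend(for state: ModelState) -> SpeechBackend {
    switch state {
    case .ready:
        return .whisperKit
    case .notDownloaded, .downloading:
        return .appleSpeech
    }
}
```

With this approach, speech features would stay usable while the model is still being pulled, at the cost of lower recognition quality during that window.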

Code of Conduct

  • I agree to follow this project's Code of Conduct and Contributing Guidelines
@philippzagar philippzagar added the enhancement New feature or request label Apr 3, 2024
@philippzagar philippzagar added the good first issue Good for newcomers label Apr 3, 2024
Projects
Status: Backlog