Use Case
Provide powerful voice capabilities, i.e., text-to-speech and speech-to-text, via OpenAI Whisper and the open-source, Swift-based WhisperKit.
Problem
Apple's text-to-speech and speech-to-text capabilities in the AVFoundation and Speech frameworks are getting on in years and no longer match current industry standards. We therefore aim to incorporate new voice capabilities into SpeziSpeech that generate better voice output and perform better speech recognition.
Solution
OpenAI's Whisper provides powerful speech recognition (speech-to-text) that could replace the capabilities of Apple's Speech framework. SpeziSpeech could then use local Whisper inference to perform speech recognition directly on the edge device.
The open-source, Swift-based WhisperKit enables this local Whisper inference on the device.
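As a rough sketch of what local Whisper inference via WhisperKit could look like, the snippet below follows the usage shown in WhisperKit's README; the exact API surface (initializer parameters, return types) may differ between releases, so treat the calls as illustrative rather than authoritative:

```swift
import WhisperKit  // Swift package from https://github.com/argmaxinc/WhisperKit

/// Transcribes an audio file entirely on-device via WhisperKit.
/// Sketch only: the `WhisperKit()` initializer and `transcribe(audioPath:)`
/// call mirror the README example and are assumed, not verified here.
func transcribeOnDevice(audioPath: String) async throws -> String? {
    // Sets up the inference pipeline; on first use this downloads
    // the default Whisper model to the device.
    let pipe = try await WhisperKit()
    // Runs local Whisper inference on the given audio file.
    return try await pipe.transcribe(audioPath: audioPath)?.text
}
```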
Alternatives considered
Alternative voice capabilities could be provided via ElevenLabs; however, ElevenLabs currently offers only a cloud-based service, not on-device inference.
Additional context
One could use the classic Strategy design pattern to dynamically switch between Apple's speech capabilities (the AVFoundation / Speech frameworks) and WhisperKit.
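A minimal sketch of that Strategy setup is shown below. The protocol and type names (`SpeechRecognitionStrategy`, `AppleSpeechRecognizer`, `WhisperKitRecognizer`) are hypothetical, not part of SpeziSpeech; the Apple path uses the real `SFSpeechRecognizer` API, while the WhisperKit path is stubbed since it requires the external package:

```swift
import Foundation
import Speech

// Hypothetical strategy protocol: consumers depend only on this abstraction.
protocol SpeechRecognitionStrategy {
    func transcribe(audioFileAt url: URL) async throws -> String
}

// Strategy 1: Apple's Speech framework (SFSpeechRecognizer).
struct AppleSpeechRecognizer: SpeechRecognitionStrategy {
    func transcribe(audioFileAt url: URL) async throws -> String {
        let request = SFSpeechURLRecognitionRequest(url: url)
        return try await withCheckedThrowingContinuation { continuation in
            // Simplified: a production version must guard against the
            // callback firing again after the continuation has resumed.
            SFSpeechRecognizer()?.recognitionTask(with: request) { result, error in
                if let error {
                    continuation.resume(throwing: error)
                } else if let result, result.isFinal {
                    continuation.resume(returning: result.bestTranscription.formattedString)
                }
            }
        }
    }
}

// Strategy 2: WhisperKit (stubbed; the commented calls mirror its README).
struct WhisperKitRecognizer: SpeechRecognitionStrategy {
    func transcribe(audioFileAt url: URL) async throws -> String {
        // let pipe = try await WhisperKit()
        // return try await pipe.transcribe(audioPath: url.path)?.text ?? ""
        fatalError("Requires the WhisperKit package")
    }
}

// The caller can swap strategies at runtime without touching either implementation.
func transcribe(_ url: URL, preferOnDeviceWhisper: Bool) async throws -> String {
    let strategy: SpeechRecognitionStrategy =
        preferOnDeviceWhisper ? WhisperKitRecognizer() : AppleSpeechRecognizer()
    return try await strategy.transcribe(audioFileAt: url)
}
```

This keeps the choice of backend a runtime decision (e.g., a configuration flag or availability check) rather than a compile-time one.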
Keep in mind that the Whisper models first need to be downloaded to the device (WhisperKit does this automatically); this initial pull can take some time.
Code of Conduct
I agree to follow this project's Code of Conduct and Contributing Guidelines