Skip to content

Commit

Permalink
Merge pull request #808 from ftnext/test/google-cloud-speech
Browse files Browse the repository at this point in the history
Add tests (google-cloud-speech)
  • Loading branch information
ftnext authored Dec 21, 2024
2 parents 597d09e + 9aa5504 commit b04353b
Show file tree
Hide file tree
Showing 8 changed files with 319 additions and 113 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/unittests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -44,16 +44,16 @@ jobs:
- name: Install Python dependencies (Ubuntu, <=3.12)
if: matrix.os == 'ubuntu-latest' && matrix.python-version != '3.13'
run: |
python -m pip install .[dev,audio,pocketsphinx,whisper-local,openai,groq]
python -m pip install .[dev,audio,pocketsphinx,google-cloud,whisper-local,openai,groq]
- name: Install Python dependencies (Ubuntu, 3.13)
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.13'
run: |
python -m pip install standard-aifc setuptools
python -m pip install --no-build-isolation .[dev,audio,pocketsphinx,openai,groq]
python -m pip install --no-build-isolation .[dev,audio,pocketsphinx,google-cloud,openai,groq]
- name: Install Python dependencies (Windows)
if: matrix.os == 'windows-latest'
run: |
python -m pip install .[dev,whisper-local,openai,groq]
python -m pip install .[dev,whisper-local,google-cloud,openai,groq]
- name: Test with unittest
run: |
pytest --doctest-modules -v speech_recognition/recognizers/ tests/
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -3,4 +3,4 @@ lint:
@pipx run flake8 --ignore=E501,E701,W503 .

rstcheck:
@pipx run rstcheck --ignore-directives autofunction README.rst reference/*.rst
@pipx run rstcheck[sphinx] --ignore-directives autofunction README.rst reference/*.rst
11 changes: 6 additions & 5 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -151,14 +151,15 @@ You also have to install Vosk Models:

`Here <https://alphacephei.com/vosk/models>`__ are models avaiable for download. You have to place them in models folder of your project, like "your-project-folder/models/your-vosk-model"

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Google Cloud Speech Library for Python (for Google Cloud Speech-to-Text API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Google Cloud Speech library for Python <https://cloud.google.com/speech-to-text/docs/quickstart>`__ is required if and only if you want to use the Google Cloud Speech API (``recognizer_instance.recognize_google_cloud``).
The library `google-cloud-speech <https://pypi.org/project/google-cloud-speech/>`__ is **required if and only if you want to use Google Cloud Speech-to-Text API** (``recognizer_instance.recognize_google_cloud``).

If not installed, everything in the library will still work, except calling ``recognizer_instance.recognize_google_cloud`` will raise an ``RequestError``.
You can install it with :command:`python3 -m pip install SpeechRecognition[google-cloud]`.
(ref: `official installation instructions <https://cloud.google.com/speech-to-text/docs/transcribe-client-libraries#install_the_client_library>`__)

According to the `official installation instructions <https://cloud.google.com/speech-to-text/docs/quickstart>`__, the recommended way to install this is using `Pip <https://pip.readthedocs.org/>`__: execute ``pip install google-cloud-speech`` (replace ``pip`` with ``pip3`` if using Python 3).
Currently only `V1 <https://cloud.google.com/speech-to-text/docs/quickstart>`__ is supported. (`V2 <https://cloud.google.com/speech-to-text/v2/docs/quickstart>`__ is not supported)

FLAC (for some systems)
~~~~~~~~~~~~~~~~~~~~~~~
Expand Down
22 changes: 3 additions & 19 deletions reference/library-reference.rst
Original file line number Diff line number Diff line change
Expand Up @@ -227,26 +227,10 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.

``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False, **api_params) -> Union[str, Dict[str, Any]]``
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json_path: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False, **api_params) -> Union[str, Dict[str, Any]]``
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.

This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart <https://cloud.google.com/speech/docs/getting-started>`__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file <https://developers.google.com/identity/protocols/application-default-credentials>`__.

The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech/docs/languages>`__.

If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.

``api_params`` are Cloud Speech API-specific parameters as dict (optional). For more information see <https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig>

The ``use_enhanced`` is a boolean option. If use_enhanced is set to true and the model field is not set, then an appropriate enhanced model is chosen if an enhanced model exists for the audio. If use_enhanced is true and an enhanced version of the specified model does not exist, then the speech is recognized using the standard version of the specified model.

Furthermore, if the option ``use_enhanced`` has not been set the option ``model`` can be used, which can be used to select the model best suited to your domain to get best results. If a model is not explicitly specified, then we auto-select a model based on the other parameters of this method.

Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
.. autofunction:: speech_recognition.recognizers.google_cloud.recognize

``recognizer_instance.recognize_wit(audio_data: AudioData, key: str, show_all: bool = False) -> Union[str, Dict[str, Any]]``
----------------------------------------------------------------------------------------------------------------------------
Expand Down
2 changes: 2 additions & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@ audio =
PyAudio >= 0.2.11
pocketsphinx =
pocketsphinx < 5
google-cloud =
google-cloud-speech
whisper-local =
openai-whisper
soundfile
Expand Down
87 changes: 2 additions & 85 deletions speech_recognition/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -693,90 +693,6 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g
if hypothesis is not None: return hypothesis.hypstr
raise UnknownValueError() # no transcriptions available

def recognize_google_cloud(self, audio_data, credentials_json=None, language="en-US", preferred_phrases=None, show_all=False, **api_params):
"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.
This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart <https://cloud.google.com/speech/docs/getting-started>`__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file <https://developers.google.com/identity/protocols/application-default-credentials>`__.
The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech/docs/languages>`__.
If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.
``api_params`` are Cloud Speech API-specific parameters as dict (optional). For more information see <https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig>
The ``use_enhanced`` is a boolean option. If use_enhanced is set to true and the model field is not set,
then an appropriate enhanced model is chosen if an enhanced model exists for the audio.
If use_enhanced is true and an enhanced version of the specified model does not exist,
then the speech is recognized using the standard version of the specified model.
Furthermore, if the option ``use_enhanced`` has not been set the option ``model`` can be used, which can be used to select the model best
suited to your domain to get best results. If a model is not explicitly specified,
then we auto-select a model based on the other parameters of this method.
Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
"""
assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
if credentials_json is None:
assert os.environ.get('GOOGLE_APPLICATION_CREDENTIALS') is not None
assert isinstance(language, str), "``language`` must be a string"
assert preferred_phrases is None or all(isinstance(preferred_phrases, (type(""), type(u""))) for preferred_phrases in preferred_phrases), "``preferred_phrases`` must be a list of strings"

try:
import socket

from google.api_core.exceptions import GoogleAPICallError
from google.cloud import speech
except ImportError:
raise RequestError('missing google-cloud-speech module: ensure that google-cloud-speech is set up correctly.')

if credentials_json is not None:
client = speech.SpeechClient.from_service_account_json(credentials_json)
else:
client = speech.SpeechClient()

flac_data = audio_data.get_flac_data(
convert_rate=None if 8000 <= audio_data.sample_rate <= 48000 else max(8000, min(audio_data.sample_rate, 48000)), # audio sample rate must be between 8 kHz and 48 kHz inclusive - clamp sample rate into this range
convert_width=2 # audio samples must be 16-bit
)
audio = speech.RecognitionAudio(content=flac_data)

config = {
'encoding': speech.RecognitionConfig.AudioEncoding.FLAC,
'sample_rate_hertz': audio_data.sample_rate,
'language_code': language,
**api_params,
}
if preferred_phrases is not None:
config['speechContexts'] = [speech.SpeechContext(
phrases=preferred_phrases
)]
if show_all:
config['enableWordTimeOffsets'] = True # some useful extra options for when we want all the output

opts = {}
if self.operation_timeout and socket.getdefaulttimeout() is None:
opts['timeout'] = self.operation_timeout

config = speech.RecognitionConfig(**config)

try:
response = client.recognize(config=config, audio=audio)
except GoogleAPICallError as e:
raise RequestError(e)
except URLError as e:
raise RequestError("recognition connection failed: {0}".format(e.reason))

if show_all: return response
if len(response.results) == 0: raise UnknownValueError()

transcript = ''
for result in response.results:
transcript += result.alternatives[0].transcript.strip() + ' '
return transcript

def recognize_wit(self, audio_data, key, show_all=False):
"""
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Wit.ai API.
Expand Down Expand Up @@ -1518,11 +1434,12 @@ def flush(self, *args, **kwargs):
# At this time, the dependencies are not yet installed, resulting in a ModuleNotFoundError.
# This is a workaround to resolve this issue
try:
from .recognizers import google, openai, groq
from .recognizers import google, google_cloud, openai, groq
except (ModuleNotFoundError, ImportError):
pass
else:
Recognizer.recognize_google = google.recognize_legacy
Recognizer.recognize_google_cloud = google_cloud.recognize
Recognizer.recognize_openai = openai.recognize
Recognizer.recognize_whisper_api = openai.recognize # Deprecated
Recognizer.recognize_groq = groq.recognize
Expand Down
Loading

0 comments on commit b04353b

Please sign in to comment.