Merge pull request #808 from ftnext/test/google-cloud-speech

Add tests (google-cloud-speech)
Uberi · Dec 21, 2024 · b04353b
2 parents 597d09e + 9aa5504

Showing 8 changed files with 319 additions and 113 deletions.
diff --git a/.github/workflows/unittests.yml b/.github/workflows/unittests.yml
@@ -44,16 +44,16 @@ jobs:
       - name: Install Python dependencies (Ubuntu, <=3.12)
         if: matrix.os == 'ubuntu-latest' && matrix.python-version != '3.13'
         run: |
-          python -m pip install .[dev,audio,pocketsphinx,whisper-local,openai,groq]
+          python -m pip install .[dev,audio,pocketsphinx,google-cloud,whisper-local,openai,groq]
       - name: Install Python dependencies (Ubuntu, 3.13)
         if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.13'
         run: |
           python -m pip install standard-aifc setuptools
-          python -m pip install --no-build-isolation .[dev,audio,pocketsphinx,openai,groq]
+          python -m pip install --no-build-isolation .[dev,audio,pocketsphinx,google-cloud,openai,groq]
       - name: Install Python dependencies (Windows)
         if: matrix.os == 'windows-latest'
         run: |
-          python -m pip install .[dev,whisper-local,openai,groq]
+          python -m pip install .[dev,whisper-local,google-cloud,openai,groq]
       - name: Test with unittest
         run: |
           pytest --doctest-modules -v speech_recognition/recognizers/ tests/
diff --git a/Makefile b/Makefile
@@ -3,4 +3,4 @@ lint:
 	@pipx run flake8 --ignore=E501,E701,W503 .
 
 rstcheck:
-	@pipx run rstcheck --ignore-directives autofunction README.rst reference/*.rst
+	@pipx run rstcheck[sphinx] --ignore-directives autofunction README.rst reference/*.rst
diff --git a/README.rst b/README.rst
@@ -151,14 +151,15 @@ You also have to install Vosk Models:
 
 `Here <https://alphacephei.com/vosk/models>`__ are models avaiable for download. You have to place them in models folder of your project, like "your-project-folder/models/your-vosk-model"
 
-Google Cloud Speech Library for Python (for Google Cloud Speech API users)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Google Cloud Speech Library for Python (for Google Cloud Speech-to-Text API users)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-`Google Cloud Speech library for Python <https://cloud.google.com/speech-to-text/docs/quickstart>`__ is required if and only if you want to use the Google Cloud Speech API (``recognizer_instance.recognize_google_cloud``).
+The library `google-cloud-speech <https://pypi.org/project/google-cloud-speech/>`__ is **required if and only if you want to use Google Cloud Speech-to-Text API** (``recognizer_instance.recognize_google_cloud``).
 
-If not installed, everything in the library will still work, except calling ``recognizer_instance.recognize_google_cloud`` will raise an ``RequestError``.
+You can install it with :command:`python3 -m pip install SpeechRecognition[google-cloud]`.
+(ref: `official installation instructions <https://cloud.google.com/speech-to-text/docs/transcribe-client-libraries#install_the_client_library>`__)
 
-According to the `official installation instructions <https://cloud.google.com/speech-to-text/docs/quickstart>`__, the recommended way to install this is using `Pip <https://pip.readthedocs.org/>`__: execute ``pip install google-cloud-speech`` (replace ``pip`` with ``pip3`` if using Python 3).
+Currently only `V1 <https://cloud.google.com/speech-to-text/docs/quickstart>`__ is supported. (`V2 <https://cloud.google.com/speech-to-text/v2/docs/quickstart>`__ is not supported)
 
 FLAC (for some systems)
 ~~~~~~~~~~~~~~~~~~~~~~~
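The README now says that `google-cloud-speech` is required only for `recognize_google_cloud`. The sketch below checks, before you call that method, whether `google.cloud.speech` can be imported in the current environment. The helper name `google_cloud_speech_available` is hypothetical and is not part of SpeechRecognition.

```python
import importlib.util


def google_cloud_speech_available() -> bool:
    """Return True if ``google.cloud.speech`` looks importable (hypothetical helper)."""
    try:
        # find_spec imports the parent packages ("google", "google.cloud") first,
        # which raises ModuleNotFoundError when either of them is missing.
        return importlib.util.find_spec("google.cloud.speech") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    if google_cloud_speech_available():
        print("google-cloud-speech is installed")
    else:
        print("run: python3 -m pip install SpeechRecognition[google-cloud]")
```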

diff --git a/reference/library-reference.rst b/reference/library-reference.rst
@@ -227,26 +227,10 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot
 
 Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.
 
-``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False, **api_params) -> Union[str, Dict[str, Any]]``
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json_path: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False, **api_params) -> Union[str, Dict[str, Any]]``
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.
-
-This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart <https://cloud.google.com/speech/docs/getting-started>`__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file <https://developers.google.com/identity/protocols/application-default-credentials>`__.
-
-The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech/docs/languages>`__.
-
-If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.
-
-``api_params`` are Cloud Speech API-specific parameters as dict (optional). For more information see <https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig>
-
-The ``use_enhanced`` is a boolean option. If use_enhanced is set to true and the model field is not set, then an appropriate enhanced model is chosen if an enhanced model exists for the audio. If use_enhanced is true and an enhanced version of the specified model does not exist, 	then the speech is recognized using the standard version of the specified model.
-
-Furthermore, if the option ``use_enhanced`` has not been set the option ``model`` can be used, which can be used to select the model best suited to your domain to get best results. If a model is not explicitly specified, then we auto-select a model based on the other parameters of this method.
-
-Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.
-
-Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
+.. autofunction:: speech_recognition.recognizers.google_cloud.recognize
 
 ``recognizer_instance.recognize_wit(audio_data: AudioData, key: str, show_all: bool = False) -> Union[str, Dict[str, Any]]``
 ----------------------------------------------------------------------------------------------------------------------------
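This change renames the keyword argument `credentials_json` to `credentials_json_path`. The new name matches how the removed implementation actually used the value. It passed the value to `SpeechClient.from_service_account_json`, which loads a key file from a path, even though the old docs described it as the file's text content. Callers that pass the old keyword need updating.

When many call sites are involved, a hypothetical adapter like the one below could translate the old keyword and emit a `DeprecationWarning` during migration. `accept_old_credentials_kwarg` and `fake_recognize` are illustrative names, not library APIs.

```python
import functools
import warnings
from typing import Any, Callable


def accept_old_credentials_kwarg(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical adapter: map ``credentials_json=`` onto ``credentials_json_path=``."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if "credentials_json" in kwargs:
            if "credentials_json_path" in kwargs:
                raise TypeError("pass only credentials_json_path")
            warnings.warn(
                "credentials_json was renamed to credentials_json_path",
                DeprecationWarning,
                stacklevel=2,
            )
            kwargs["credentials_json_path"] = kwargs.pop("credentials_json")
        return func(*args, **kwargs)

    return wrapper


# Stand-in that uses the new keyword name; it only echoes what it receives.
@accept_old_credentials_kwarg
def fake_recognize(audio_data, credentials_json_path=None, language="en-US", **api_params):
    return {
        "credentials_json_path": credentials_json_path,
        "language": language,
        "api_params": api_params,
    }
```

Without the adapter, an old `credentials_json=` keyword would simply be collected by `**api_params` in `fake_recognize`. The adapter's main job is to stop that from happening silently.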

diff --git a/setup.cfg b/setup.cfg
@@ -7,6 +7,8 @@ audio =
     PyAudio >= 0.2.11
 pocketsphinx =
     pocketsphinx < 5
+google-cloud =
+    google-cloud-speech
 whisper-local =
     openai-whisper
     soundfile
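The new `google-cloud` extra is what makes `pip install .[google-cloud]` pull in `google-cloud-speech`. That command appears in both the workflow and the README install instructions. The sketch below reads an excerpt of `setup.cfg` with the standard-library `configparser` and maps each extra to its requirement strings. The `[options.extras_require]` section header is an assumption, because the hunk shows only the entries.

```python
import configparser
from typing import Dict, List

# Excerpt of setup.cfg after this change; the section header is assumed.
SETUP_CFG_EXCERPT = """\
[options.extras_require]
audio =
    PyAudio >= 0.2.11
pocketsphinx =
    pocketsphinx < 5
google-cloud =
    google-cloud-speech
whisper-local =
    openai-whisper
    soundfile
"""


def read_extras(text: str) -> Dict[str, List[str]]:
    """Map each extra name to its list of requirement strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["options.extras_require"]
    # Multi-line values arrive as one string; split it and drop blank lines.
    return {
        name: [line.strip() for line in value.splitlines() if line.strip()]
        for name, value in section.items()
    }


extras = read_extras(SETUP_CFG_EXCERPT)
print(extras["google-cloud"])  # ['google-cloud-speech']
```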

diff --git a/speech_recognition/__init__.py b/speech_recognition/__init__.py
@@ -693,90 +693,6 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g
         if hypothesis is not None: return hypothesis.hypstr
         raise UnknownValueError()  # no transcriptions available
 
-    def recognize_google_cloud(self, audio_data, credentials_json=None, language="en-US", preferred_phrases=None, show_all=False, **api_params):
-        """
-        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.
-
-        This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart <https://cloud.google.com/speech/docs/getting-started>`__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file <https://developers.google.com/identity/protocols/application-default-credentials>`__.
-
-        The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech/docs/languages>`__.
-
-        If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.
-
-        ``api_params`` are Cloud Speech API-specific parameters as dict (optional). For more information see <https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1.types.RecognitionConfig>
-
-            The ``use_enhanced`` is a boolean option. If use_enhanced is set to true and the model field is not set,
-            then an appropriate enhanced model is chosen if an enhanced model exists for the audio.
-            If use_enhanced is true and an enhanced version of the specified model does not exist,
-            then the speech is recognized using the standard version of the specified model.
-
-            Furthermore, if the option ``use_enhanced`` has not been set the option ``model`` can be used, which can be used to select the model best
-            suited to your domain to get best results. If a model is not explicitly specified,
-            then we auto-select a model based on the other parameters of this method.
-
-        Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.
-
-        Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
-        """
-        assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data"
-        if credentials_json is None:
-            assert os.environ.get('GOOGLE_APPLICATION_CREDENTIALS') is not None
-        assert isinstance(language, str), "``language`` must be a string"
-        assert preferred_phrases is None or all(isinstance(preferred_phrases, (type(""), type(u""))) for preferred_phrases in preferred_phrases), "``preferred_phrases`` must be a list of strings"
-
-        try:
-            import socket
-
-            from google.api_core.exceptions import GoogleAPICallError
-            from google.cloud import speech
-        except ImportError:
-            raise RequestError('missing google-cloud-speech module: ensure that google-cloud-speech is set up correctly.')
-
-        if credentials_json is not None:
-            client = speech.SpeechClient.from_service_account_json(credentials_json)
-        else:
-            client = speech.SpeechClient()
-
-        flac_data = audio_data.get_flac_data(
-            convert_rate=None if 8000 <= audio_data.sample_rate <= 48000 else max(8000, min(audio_data.sample_rate, 48000)),  # audio sample rate must be between 8 kHz and 48 kHz inclusive - clamp sample rate into this range
-            convert_width=2  # audio samples must be 16-bit
-        )
-        audio = speech.RecognitionAudio(content=flac_data)
-
-        config = {
-            'encoding': speech.RecognitionConfig.AudioEncoding.FLAC,
-            'sample_rate_hertz': audio_data.sample_rate,
-            'language_code': language,
-            **api_params,
-        }
-        if preferred_phrases is not None:
-            config['speechContexts'] = [speech.SpeechContext(
-                phrases=preferred_phrases
-            )]
-        if show_all:
-            config['enableWordTimeOffsets'] = True  # some useful extra options for when we want all the output
-
-        opts = {}
-        if self.operation_timeout and socket.getdefaulttimeout() is None:
-            opts['timeout'] = self.operation_timeout
-
-        config = speech.RecognitionConfig(**config)
-
-        try:
-            response = client.recognize(config=config, audio=audio)
-        except GoogleAPICallError as e:
-            raise RequestError(e)
-        except URLError as e:
-            raise RequestError("recognition connection failed: {0}".format(e.reason))
-
-        if show_all: return response
-        if len(response.results) == 0: raise UnknownValueError()
-
-        transcript = ''
-        for result in response.results:
-            transcript += result.alternatives[0].transcript.strip() + ' '
-        return transcript
-
     def recognize_wit(self, audio_data, key, show_all=False):
         """
         Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Wit.ai API.
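The removed method did four things. It clamped the sample rate into the 8 kHz to 48 kHz range, merged `api_params` into the request config, added phrase hints and word time offsets on request, and joined the first alternative of each result into one transcript. The sketch below reproduces those steps in pure Python with no Google Cloud dependency. It is not the code in the new `speech_recognition/recognizers/google_cloud.py`, which this page does not show, and the helper and class names are hypothetical.

It differs from the removed code in two places:

* It uses the snake_case field names that the Python `RecognitionConfig` message defines (`speech_contexts`, `enable_word_time_offsets`). The removed code passed camelCase keys (`speechContexts`, `enableWordTimeOffsets`).
* It reports the clamped rate in `sample_rate_hertz`. The removed code reported `audio_data.sample_rate` even when the FLAC data had been resampled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional

# Sample-rate range the removed code clamped into (inclusive).
MIN_RATE, MAX_RATE = 8000, 48000


def flac_convert_rate(sample_rate: int) -> Optional[int]:
    """Return the rate to resample to, or None if sample_rate is already in range."""
    if MIN_RATE <= sample_rate <= MAX_RATE:
        return None
    return max(MIN_RATE, min(sample_rate, MAX_RATE))


def build_config(
    sample_rate: int,
    language: str = "en-US",
    preferred_phrases: Optional[Iterable[str]] = None,
    show_all: bool = False,
    **api_params: Any,
) -> Dict[str, Any]:
    """Plain-dict stand-in for the RecognitionConfig fields the removed code set."""
    converted = flac_convert_rate(sample_rate)
    config: Dict[str, Any] = {
        "encoding": "FLAC",
        # Describe the FLAC data actually sent, i.e. the rate after clamping.
        "sample_rate_hertz": sample_rate if converted is None else converted,
        "language_code": language,
        **api_params,  # caller-supplied fields override the defaults above
    }
    if preferred_phrases is not None:
        config["speech_contexts"] = [{"phrases": list(preferred_phrases)}]
    if show_all:
        # Extra detail that is only useful when returning the raw response.
        config["enable_word_time_offsets"] = True
    return config


@dataclass
class Alternative:
    transcript: str


@dataclass
class Result:
    alternatives: List[Alternative] = field(default_factory=list)


def join_transcripts(results: List[Result]) -> str:
    """Join the top alternative of each result into one transcript."""
    if not results:
        # Stand-in for speech_recognition.UnknownValueError.
        raise LookupError("no transcription available")
    return " ".join(r.alternatives[0].transcript.strip() for r in results)
```

The removed loop appended `' '` after every result. Using `' '.join` instead leaves no trailing space on the returned transcript.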
@@ -1518,11 +1434,12 @@ def flush(self, *args, **kwargs):
 # At this time, the dependencies are not yet installed, resulting in a ModuleNotFoundError.
 # This is a workaround to resolve this issue
 try:
-    from .recognizers import google, openai, groq
+    from .recognizers import google, google_cloud, openai, groq
 except (ModuleNotFoundError, ImportError):
     pass
 else:
     Recognizer.recognize_google = google.recognize_legacy
+    Recognizer.recognize_google_cloud = google_cloud.recognize
     Recognizer.recognize_openai = openai.recognize
     Recognizer.recognize_whisper_api = openai.recognize  # Deprecated
     Recognizer.recognize_groq = groq.recognize
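The hunk above attaches module-level functions to `Recognizer` after a guarded import. As a result, `recognizer_instance.recognize_google_cloud(...)` dispatches to `speech_recognition.recognizers.google_cloud.recognize`, which receives the recognizer instance as its first argument. The sketch below shows that binding pattern on its own. `Recognizer`, `recognize_stub`, and the `google_cloud` namespace here are stand-ins, not the library's objects.

```python
import types


class Recognizer:
    """Stand-in for speech_recognition.Recognizer."""


def recognize_stub(recognizer, audio_data, language="en-US"):
    """Module-level function; receives the Recognizer instance as its first argument."""
    return f"{type(recognizer).__name__} got {audio_data!r} ({language})"


# Stand-in for the speech_recognition.recognizers.google_cloud module.
google_cloud = types.SimpleNamespace(recognize=recognize_stub)

# A plain function stored on the class is bound to instances on attribute access,
# so r.recognize_google_cloud(audio) calls recognize_stub(r, audio).
Recognizer.recognize_google_cloud = google_cloud.recognize

r = Recognizer()
print(r.recognize_google_cloud(b"...", language="ja-JP"))  # Recognizer got b'...' (ja-JP)
```

One consequence of the real hunk's design: `google`, `google_cloud`, `openai`, and `groq` are imported in a single `try`. If importing any one of them raises an `ImportError`, every assignment in the `else` branch is skipped, not just the one for the failing module.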