Add support for AT&T Speech to Text, bump version.

Uberi · Sep 2, 2015 · 300caa9
1 parent fc361e5
commit 300caa9

Showing 6 changed files with 126 additions and 26 deletions.
diff --git a/README.rst b/README.rst
@@ -21,7 +21,7 @@ Speech Recognition
     :target: https://pypi.python.org/pypi/SpeechRecognition/
     :alt: License
 
-Library for performing speech recognition with support for Google Speech Recognition, `Wit.ai <https://wit.ai/>`__, and `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__.
+Library for performing speech recognition with support for Google Speech Recognition, `Wit.ai <https://wit.ai/>`__, `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__, and `AT&T Speech to Text <http://developer.att.com/apis/speech>`__.
 
 Links:
 
@@ -34,11 +34,11 @@ To quickly try it out, run ``python -m speech_recognition`` after installing.
 
 How to cite this library (APA style):
 
-    Zhang, A. (2015). Speech Recognition (Version 3.0) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
+    Zhang, A. (2015). Speech Recognition (Version 3.1) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
 
 How to cite this library (Chicago style):
 
-    Zhang, Anthony. 2015. *Speech Recognition* (version 3.0).
+    Zhang, Anthony. 2015. *Speech Recognition* (version 3.1).
 
 Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaiduYuyin>`__, which is based on an older version of this project, and adds support for `Baidu Yuyin <http://yuyin.baidu.com/>`__.
 
@@ -125,7 +125,7 @@ The solution is to decrease this threshold, or call ``recognizer_instance.adjust
 The recognizer doesn't understand my particular language/dialect.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, and ``recognizer_instance.recognize_ibm``.
+Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_ibm``, and ``recognizer_instance.recognize_att``.
 
 For example, if your language/dialect is British English, it is better to use ``"en-GB"`` as the language rather than ``"en-US"``.
 
@@ -134,7 +134,7 @@ The code examples throw ``UnicodeEncodeError: 'ascii' codec can't encode charact
 
 When you're using Python 2, and your language uses non-ASCII characters, and the terminal or file-like object you're printing to only supports ASCII, an error is thrown when trying to write non-ASCII characters.
 
-This is because in Python 2, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, and ``recognizer_instance.recognize_ibm`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.
+This is because in Python 2, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_ibm``, and ``recognizer_instance.recognize_att`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.
 
 To make printing of unicode strings work in Python 2 as well, replace all print statements in your code of the following form:
 
@@ -374,6 +374,21 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot
 
 Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, or there is no internet connection.
 
+``recognizer_instance.recognize_att(audio_data, app_key, app_secret, language="en-US", show_all = False)``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the AT&T Speech to Text API.
+
+The AT&T Speech to Text app key and app secret are specified by ``app_key`` and ``app_secret``, respectively. Unfortunately, these are not available without `signing up for an account <http://developer.att.com/apis/speech>`__ and creating an app.
+
+To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
+
+The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-US"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-US"``.
+
+Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
+
+Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the app key or app secret isn't valid, or there is no internet connection.
+
 ``AudioSource``
 ~~~~~~~~~~~~~~~
 

diff --git a/examples/extended_results.py b/examples/extended_results.py
@@ -46,3 +46,13 @@
     print("IBM Speech to Text could not understand audio")
 except sr.RequestError:
     print("Could not request results from IBM Speech to Text service")
+
+# recognize speech using AT&T Speech to Text
+ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
+ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
+try:
+    print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
+except sr.UnknownValueError:
+    print("AT&T Speech to Text could not understand audio")
+except sr.RequestError:
+    print("Could not request results from AT&T Speech to Text service")
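The example above prints only the most likely transcription. When ``show_all`` is true, ``recognize_att`` returns the raw response dictionary instead, and the caller has to pick a result. The following is a minimal sketch of that selection, assuming the response has the shape this commit's ``recognize_att`` reads (a ``"Recognition"`` object holding an ``"NBest"`` list whose entries carry ``"Grade"`` and ``"ResultText"``). The helper name ``best_att_transcription`` and the sample dictionary are hypothetical and are not part of the library or of a real API response.

```python
def best_att_transcription(result):
    """Return the first accepted transcription in a raw response dict, or None."""
    # Same checks as the commit's recognize_att: look inside
    # result["Recognition"]["NBest"] for an entry graded "accept".
    nbest = result.get("Recognition", {}).get("NBest", [])
    for entry in nbest:
        if entry.get("Grade") == "accept" and "ResultText" in entry:
            return entry["ResultText"]
    # No usable entry; recognize_att raises UnknownValueError in this case.
    return None


# Hypothetical response, shaped only after the keys the commit reads.
sample = {
    "Recognition": {
        "NBest": [
            {"Grade": "reject", "ResultText": "hollow"},
            {"Grade": "accept", "ResultText": "hello"},
        ]
    }
}

print(best_att_transcription(sample))  # prints "hello"
print(best_att_transcription({}))      # prints "None"
```

Returning ``None`` rather than raising keeps the sketch free of library imports; code that already depends on ``speech_recognition`` could raise ``UnknownValueError`` at that point instead.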
diff --git a/examples/microphone_recognition.py b/examples/microphone_recognition.py
@@ -39,3 +39,13 @@
     print("IBM Speech to Text could not understand audio")
 except sr.RequestError:
     print("Could not request results from IBM Speech to Text service")
+
+# recognize speech using AT&T Speech to Text
+ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
+ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
+try:
+    print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
+except sr.UnknownValueError:
+    print("AT&T Speech to Text could not understand audio")
+except sr.RequestError:
+    print("Could not request results from AT&T Speech to Text service")
diff --git a/examples/wav_transcribe.py b/examples/wav_transcribe.py
@@ -40,3 +40,13 @@
     print("IBM Speech to Text could not understand audio")
 except sr.RequestError:
     print("Could not request results from IBM Speech to Text service")
+
+# recognize speech using AT&T Speech to Text
+ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
+ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
+try:
+    print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
+except sr.UnknownValueError:
+    print("AT&T Speech to Text could not understand audio")
+except sr.RequestError:
+    print("Could not request results from AT&T Speech to Text service")
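Before sending audio, the ``recognize_att`` method added in this commit requests an OAuth token by posting a form body with ``client_id``, ``client_secret``, ``grant_type`` and ``scope`` fields. The commit builds that body with ``str.format``, which is enough for the 32-character alphanumeric keys it documents. The sketch below shows the same body built with ``urllib.parse.urlencode``, which also escapes reserved characters. It assumes Python 3, and ``build_token_body`` is a hypothetical helper name rather than anything the library provides.

```python
from urllib.parse import urlencode


def build_token_body(app_key, app_secret):
    """Build the form-encoded request body for a client-credentials token request."""
    # Field names and values follow the body used in this commit's recognize_att.
    fields = [
        ("client_id", app_key),
        ("client_secret", app_secret),
        ("grant_type", "client_credentials"),
        ("scope", "SPEECH"),
    ]
    # urlencode percent-escapes characters such as "&" that would otherwise
    # split a value into a second field.
    return urlencode(fields).encode("utf-8")


print(build_token_body("abc123", "s&cret"))
# prints b'client_id=abc123&client_secret=s%26cret&grant_type=client_credentials&scope=SPEECH'
```

The resulting bytes can be passed as the ``data`` argument of ``urlopen``, as the commit does with its own body string.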
diff --git a/setup.py b/setup.py
@@ -1,29 +1,29 @@
 #!/usr/bin/env python3
 
-from setuptools import setup, find_packages
+from setuptools import setup
 
 import sys
 if sys.version_info < (2, 6):
-    print("THIS MODULE REQUIRES PYTHON 2.6 OR LATER. YOU ARE CURRENTLY USING PYTHON " + sys.version)
+    print("THIS MODULE REQUIRES PYTHON 2.6 OR LATER. YOU ARE CURRENTLY USING PYTHON {0}".format(sys.version))
     sys.exit(1)
 
 import speech_recognition
 
 setup(
-    name="SpeechRecognition",
-    version=speech_recognition.__version__,
-    packages=["speech_recognition"],
-    include_package_data=True,
+    name = "SpeechRecognition",
+    version = speech_recognition.__version__,
+    packages = ["speech_recognition"],
+    include_package_data = True,
 
     # PyPI metadata
-    author=speech_recognition.__author__,
-    author_email="[email protected]",
-    description=speech_recognition.__doc__,
-    long_description=open("README.rst").read(),
-    license=speech_recognition.__license__,
-    keywords="speech recognition google",
-    url="https://github.com/Uberi/speech_recognition#readme",
-    classifiers=[
+    author = speech_recognition.__author__,
+    author_email = "[email protected]",
+    description = speech_recognition.__doc__,
+    long_description = open("README.rst").read(),
+    license = speech_recognition.__license__,
+    keywords = "speech recognition google",
+    url = "https://github.com/Uberi/speech_recognition#readme",
+    classifiers = [
         "Development Status :: 5 - Production/Stable",
         "Intended Audience :: Developers",
         "Natural Language :: English",

diff --git a/speech_recognition/__init__.py b/speech_recognition/__init__.py
@@ -3,7 +3,7 @@
 """Library for performing speech recognition with the Google Speech Recognition API."""
 
 __author__ = "Anthony Zhang (Uberi)"
-__version__ = "3.0.0"
+__version__ = "3.1.0"
 __license__ = "BSD"
 
 import io, os, subprocess, wave, base64
@@ -380,7 +380,7 @@ def recognize_google(self, audio_data, key = None, language = "en-US", show_all
         # check for invalid key response from the server
         try:
             response = urlopen(request)
-        except HTTPError as e:
+        except HTTPError:
             raise RequestError("request failed, ensure that key is correct and quota is not maxed out")
         except URLError:
             raise RequestError("no internet connection available to transfer audio data")
@@ -416,7 +416,7 @@ def recognize_wit(self, audio_data, key, show_all = False):
 
         The recognition language is configured in the Wit.ai app settings.
 
-        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
+        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://wit.ai/docs/http/20141022#get-intent-via-text-link>`__ as a JSON dictionary.
 
         Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, the quota for the key is maxed out, or there is no internet connection.
         """
@@ -440,22 +440,22 @@ def recognize_wit(self, audio_data, key, show_all = False):
         if "_text" not in result or result["_text"] is None: raise UnknownValueError()
         return result["_text"]
 
-    def recognize_ibm(self, audio_data, username, password, language="en-US", show_all = False):
+    def recognize_ibm(self, audio_data, username, password, language = "en-US", show_all = False):
         """
         Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the IBM Speech to Text API.
 
         The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
 
         The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, and ``"ja-JP"``.
 
-        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
+        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
 
         Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, or there is no internet connection.
         """
         assert isinstance(audio_data, AudioData), "Data must be audio data"
         assert isinstance(username, str), "`username` must be a string"
         assert isinstance(password, str), "`password` must be a string"
-        assert language in {"en-US", "es-ES", "ja-JP"}, "`language` must be a valid language."
+        assert language in ["en-US", "es-ES", "ja-JP"], "`language` must be a valid language."
 
         flac_data = audio_data.get_flac_data()
         model = "{0}_BroadbandModel".format(language)
@@ -468,7 +468,7 @@ def recognize_ibm(self, audio_data, username, password, language="en-US", show_a
         request.add_header("Authorization", "Basic {0}".format(authorization_value))
         try:
             response = urlopen(request)
-        except HTTPError as e:
+        except HTTPError:
             raise RequestError("request failed, ensure that username and password are correct")
         except URLError:
             raise RequestError("no internet connection available to transfer audio data")
@@ -485,6 +485,61 @@ def recognize_ibm(self, audio_data, username, password, language="en-US", show_a
         # no transcriptions available
         raise UnknownValueError()
 
+    def recognize_att(self, audio_data, app_key, app_secret, language = "en-US", show_all = False):
+        """
+        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the AT&T Speech to Text API.
+
+        The AT&T Speech to Text app key and app secret are specified by ``app_key`` and ``app_secret``, respectively. Unfortunately, these are not available without `signing up for an account <http://developer.att.com/apis/speech>`__ and creating an app.
+
+        To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
+
+        The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-US"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-US"``.
+
+        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
+
+        Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the app key or app secret isn't valid, or there is no internet connection.
+        """
+        assert isinstance(audio_data, AudioData), "Data must be audio data"
+        assert isinstance(app_key, str), "`app_key` must be a string"
+        assert isinstance(app_secret, str), "`app_secret` must be a string"
+        assert language in ["en-US", "es-US"], "`language` must be a valid language."
+
+        # ensure we have an authentication token
+        authorization_url = "https://api.att.com/oauth/v4/token"
+        authorization_body = "client_id={0}&client_secret={1}&grant_type=client_credentials&scope=SPEECH".format(app_key, app_secret)
+        try:
+            authorization_response = urlopen(authorization_url, data = authorization_body.encode("utf-8"))
+        except HTTPError:
+            raise RequestError("credential request failed, ensure that app key and app secret are correct")
+        except URLError:
+            raise RequestError("no internet connection available to request credentials")
+        authorization_text = authorization_response.read().decode("utf-8")
+        authorization_bearer = json.loads(authorization_text).get("access_token")
+        if authorization_bearer is None: raise RequestError("missing OAuth access token in requested credentials")
+
+        wav_data = audio_data.get_wav_data()
+        url = "https://api.att.com/speech/v3/speechToText"
+        request = Request(url, data = wav_data, headers = {"Authorization": "Bearer {0}".format(authorization_bearer), "Content-Language": language, "Content-Type": "audio/wav"})
+        try:
+            response = urlopen(request)
+        except HTTPError:
+            raise RequestError("request failed, ensure that app key and app secret are correct")
+        except URLError:
+            raise RequestError("no internet connection available to transfer audio data")
+        response_text = response.read().decode("utf-8")
+        result = json.loads(response_text)
+
+        if show_all: return result
+
+        if "Recognition" not in result or "NBest" not in result["Recognition"]:
+            raise UnknownValueError()
+        for entry in result["Recognition"]["NBest"]:
+            if entry.get("Grade") == "accept" and "ResultText" in entry:
+                return entry["ResultText"]
+
+        # no transcriptions available
+        raise UnknownValueError()
+
 def shutil_which(pgm):
     """
     python2 backport of python3's shutil.which()