Add support for AT&T Speech to Text, bump version.
Uberi committed Sep 2, 2015
1 parent fc361e5 commit 300caa9
Showing 6 changed files with 126 additions and 26 deletions.
25 changes: 20 additions & 5 deletions README.rst
@@ -21,7 +21,7 @@ Speech Recognition
:target: https://pypi.python.org/pypi/SpeechRecognition/
:alt: License

Library for performing speech recognition with support for Google Speech Recognition, `Wit.ai <https://wit.ai/>`__, and `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__.
Library for performing speech recognition with support for Google Speech Recognition, `Wit.ai <https://wit.ai/>`__, `IBM Speech to Text <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html>`__, and `AT&T Speech to Text <http://developer.att.com/apis/speech>`__.

Links:

@@ -34,11 +34,11 @@ To quickly try it out, run ``python -m speech_recognition`` after installing.

How to cite this library (APA style):

Zhang, A. (2015). Speech Recognition (Version 3.0) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.
Zhang, A. (2015). Speech Recognition (Version 3.1) [Software]. Available from https://github.com/Uberi/speech_recognition#readme.

How to cite this library (Chicago style):

Zhang, Anthony. 2015. *Speech Recognition* (version 3.0).
Zhang, Anthony. 2015. *Speech Recognition* (version 3.1).

Also check out the `Python Baidu Yuyin API <https://github.com/DelightRun/PyBaiduYuyin>`__, which is based on an older version of this project and adds support for `Baidu Yuyin <http://yuyin.baidu.com/>`__.

@@ -125,7 +125,7 @@ The solution is to decrease this threshold, or call ``recognizer_instance.adjust
The recognizer doesn't understand my particular language/dialect.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, and ``recognizer_instance.recognize_ibm``.
Try setting the recognition language to your language/dialect. To do this, see the documentation for ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_ibm``, and ``recognizer_instance.recognize_att``.

For example, if your language/dialect is British English, it is better to use ``"en-GB"`` as the language rather than ``"en-US"``.

@@ -134,7 +134,7 @@ The code examples throw ``UnicodeEncodeError: 'ascii' codec can't encode charact

If you are using Python 2, your language uses non-ASCII characters, and the terminal or file-like object you are printing to only supports ASCII, writing those non-ASCII characters raises an error.

This is because in Python 2, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, and ``recognizer_instance.recognize_ibm`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.
This is because in Python 2, ``recognizer_instance.recognize_google``, ``recognizer_instance.recognize_wit``, ``recognizer_instance.recognize_ibm``, and ``recognizer_instance.recognize_att`` return unicode strings (``u"something"``) rather than byte strings (``"something"``). In Python 3, all strings are unicode strings.
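The failure depends on where the output goes, not on the recognizer. The following Python 3 sketch (not part of the library; the helper ``write_transcript`` is hypothetical) writes the same non-ASCII transcription to an ASCII-only text stream, standing in for an ASCII-only terminal, and to a UTF-8 text stream:

```python
import io

def write_transcript(stream, text):
    """Write ``text`` to ``stream``; return True on success, False if the
    stream's encoding cannot represent the characters."""
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False

transcript = u"caf\u00e9"  # a transcription containing a non-ASCII character

# A text stream that can only encode ASCII, like a misconfigured terminal
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
# A text stream that encodes UTF-8, which can represent any Unicode string
utf8_stream = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

print(write_transcript(ascii_stream, transcript))  # False
print(write_transcript(utf8_stream, transcript))   # True
```

Only the ASCII stream fails, which is why the same script can work in one terminal and raise ``UnicodeEncodeError`` in another.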

To make printing of unicode strings work in Python 2 as well, replace all print statements in your code of the following form:

@@ -374,6 +374,21 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the username or password isn't valid, or if there is no internet connection.

``recognizer_instance.recognize_att(audio_data, app_key, app_secret, language="en-US", show_all = False)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the AT&T Speech to Text API.

The AT&T Speech to Text app key and app secret are specified by ``app_key`` and ``app_secret``, respectively. Unfortunately, these are not available without `signing up for an account <http://developer.att.com/apis/speech>`__ and creating an app.

To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.
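Because the format of these credentials is documented, a script can catch an unedited placeholder before making any network request. A minimal sketch, assuming the 32-character lowercase alphanumeric format stated above is exact; the helper ``looks_like_att_credential`` is hypothetical and not part of the library:

```python
import re

# Documented format: exactly 32 characters, lowercase letters and digits only
ATT_CREDENTIAL_PATTERN = re.compile(r"[a-z0-9]{32}")

def looks_like_att_credential(value):
    """Return True if ``value`` has the documented shape of an AT&T app key or app secret.

    This only checks the format; it cannot tell whether AT&T will accept the credential.
    """
    return isinstance(value, str) and ATT_CREDENTIAL_PATTERN.fullmatch(value) is not None

print(looks_like_att_credential("0123456789abcdef0123456789abcdef"))         # True
print(looks_like_att_credential("INSERT AT&T SPEECH TO TEXT APP KEY HERE"))  # False
```

A passing format check does not guarantee a valid key; ``recognize_att`` still raises ``RequestError`` when AT&T rejects the credentials.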

The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-US"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-US"``.
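``recognize_att`` enforces the language with an ``assert`` that accepts ``"en-US"`` and ``"es-US"``. Assertions are skipped when Python runs with ``-O``, so an application that wants the check to always run could validate the tag itself. A sketch using a hypothetical helper ``check_att_language``:

```python
# Mirrors the tags accepted by the assertion in recognize_att
SUPPORTED_ATT_LANGUAGES = ("en-US", "es-US")

def check_att_language(language):
    """Return ``language`` unchanged, or raise ValueError if recognize_att would reject it.

    Unlike an ``assert`` statement, this check still runs under ``python -O``.
    """
    if language not in SUPPORTED_ATT_LANGUAGES:
        raise ValueError("unsupported language {0!r}; expected one of: {1}".format(
            language, ", ".join(SUPPORTED_ATT_LANGUAGES)))
    return language

print(check_att_language("en-US"))  # en-US
```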

Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.
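When ``show_all`` is true, the raw response can be reduced to a single transcription the same way ``recognize_att`` does it: take the first ``NBest`` entry whose ``Grade`` is ``"accept"``. The sketch below restates that logic as a standalone function (the helper name is hypothetical, and the sample response is made up; it only contains the keys the library reads and is not a real AT&T payload):

```python
def best_att_transcript(result):
    """Return the first accepted hypothesis from a raw AT&T response dictionary, or None.

    recognize_att raises UnknownValueError where this sketch returns None.
    """
    nbest = result.get("Recognition", {}).get("NBest", [])
    for entry in nbest:
        if entry.get("Grade") == "accept" and "ResultText" in entry:
            return entry["ResultText"]
    return None

# Hypothetical response, shaped only after the keys that recognize_att reads
sample_response = {
    "Recognition": {
        "NBest": [
            {"Grade": "reject", "ResultText": "hello word"},
            {"Grade": "accept", "ResultText": "hello world"},
        ]
    }
}
print(best_att_transcript(sample_response))  # hello world
```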

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the app key or app secret isn't valid, or if there is no internet connection.

``AudioSource``
~~~~~~~~~~~~~~~

10 changes: 10 additions & 0 deletions examples/extended_results.py
@@ -46,3 +46,13 @@
    print("IBM Speech to Text could not understand audio")
except sr.RequestError:
    print("Could not request results from IBM Speech to Text service")

# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
    print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
    print("AT&T Speech to Text could not understand audio")
except sr.RequestError:
    print("Could not request results from AT&T Speech to Text service")
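The example hardcodes placeholder credentials. One alternative, not part of this commit, is to read them from the environment so that secrets stay out of the script. A sketch; the environment variable names ``ATT_APP_KEY`` and ``ATT_APP_SECRET`` and the ``load_att_credentials`` helper are hypothetical:

```python
import os

def load_att_credentials(environ=None):
    """Return ``(app_key, app_secret)`` from the environment, or None if either is unset or empty."""
    if environ is None:
        environ = os.environ
    app_key = environ.get("ATT_APP_KEY")        # hypothetical variable name
    app_secret = environ.get("ATT_APP_SECRET")  # hypothetical variable name
    if not app_key or not app_secret:
        return None
    return app_key, app_secret

credentials = load_att_credentials()
if credentials is None:
    print("Set ATT_APP_KEY and ATT_APP_SECRET to try AT&T Speech to Text")
else:
    ATT_APP_KEY, ATT_APP_SECRET = credentials
```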
10 changes: 10 additions & 0 deletions examples/microphone_recognition.py
@@ -39,3 +39,13 @@
    print("IBM Speech to Text could not understand audio")
except sr.RequestError:
    print("Could not request results from IBM Speech to Text service")

# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
    print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
    print("AT&T Speech to Text could not understand audio")
except sr.RequestError:
    print("Could not request results from AT&T Speech to Text service")
10 changes: 10 additions & 0 deletions examples/wav_transcribe.py
@@ -40,3 +40,13 @@
    print("IBM Speech to Text could not understand audio")
except sr.RequestError:
    print("Could not request results from IBM Speech to Text service")

# recognize speech using AT&T Speech to Text
ATT_APP_KEY = "INSERT AT&T SPEECH TO TEXT APP KEY HERE" # AT&T Speech to Text app keys are 32-character lowercase alphanumeric strings
ATT_APP_SECRET = "INSERT AT&T SPEECH TO TEXT APP SECRET HERE" # AT&T Speech to Text app secrets are 32-character lowercase alphanumeric strings
try:
    print("AT&T Speech to Text thinks you said " + r.recognize_att(audio, app_key=ATT_APP_KEY, app_secret=ATT_APP_SECRET))
except sr.UnknownValueError:
    print("AT&T Speech to Text could not understand audio")
except sr.RequestError:
    print("Could not request results from AT&T Speech to Text service")
28 changes: 14 additions & 14 deletions setup.py
@@ -1,29 +1,29 @@
#!/usr/bin/env python3

from setuptools import setup, find_packages
from setuptools import setup

import sys
if sys.version_info < (2, 6):
    print("THIS MODULE REQUIRES PYTHON 2.6 OR LATER. YOU ARE CURRENTLY USING PYTHON " + sys.version)
    print("THIS MODULE REQUIRES PYTHON 2.6 OR LATER. YOU ARE CURRENTLY USING PYTHON {0}".format(sys.version))
    sys.exit(1)

import speech_recognition

setup(
    name="SpeechRecognition",
    version=speech_recognition.__version__,
    packages=["speech_recognition"],
    include_package_data=True,
    name = "SpeechRecognition",
    version = speech_recognition.__version__,
    packages = ["speech_recognition"],
    include_package_data = True,

    # PyPI metadata
    author=speech_recognition.__author__,
    author_email="[email protected]",
    description=speech_recognition.__doc__,
    long_description=open("README.rst").read(),
    license=speech_recognition.__license__,
    keywords="speech recognition google",
    url="https://github.com/Uberi/speech_recognition#readme",
    classifiers=[
    author = speech_recognition.__author__,
    author_email = "[email protected]",
    description = speech_recognition.__doc__,
    long_description = open("README.rst").read(),
    license = speech_recognition.__license__,
    keywords = "speech recognition google",
    url = "https://github.com/Uberi/speech_recognition#readme",
    classifiers = [
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "Natural Language :: English",
69 changes: 62 additions & 7 deletions speech_recognition/__init__.py
@@ -3,7 +3,7 @@
"""Library for performing speech recognition with the Google Speech Recognition API."""

__author__ = "Anthony Zhang (Uberi)"
__version__ = "3.0.0"
__version__ = "3.1.0"
__license__ = "BSD"

import io, os, subprocess, wave, base64
@@ -380,7 +380,7 @@ def recognize_google(self, audio_data, key = None, language = "en-US", show_all
        # check for invalid key response from the server
        try:
            response = urlopen(request)
        except HTTPError as e:
        except HTTPError:
            raise RequestError("request failed, ensure that key is correct and quota is not maxed out")
        except URLError:
            raise RequestError("no internet connection available to transfer audio data")
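The handlers above catch ``HTTPError`` before ``URLError``, and the order matters: ``HTTPError`` is a subclass of ``URLError``, so reversing the clauses would report a rejected key as a missing internet connection. The Python 3 sketch below isolates the pattern; ``RequestError`` here is a local stand-in for ``speech_recognition.RequestError``, and ``open_or_raise`` and the fake openers are hypothetical helpers that let it run without network access:

```python
from urllib.error import HTTPError, URLError

class RequestError(Exception):
    """Local stand-in for speech_recognition.RequestError."""

def open_or_raise(opener, request):
    """Call ``opener(request)`` and translate urllib errors into RequestError.

    HTTPError must be caught first because it subclasses URLError.
    """
    try:
        return opener(request)
    except HTTPError as e:
        raise RequestError("request failed, ensure that key is correct and quota is not maxed out") from e
    except URLError as e:
        raise RequestError("no internet connection available to transfer audio data") from e

def fake_rejecting_opener(request):
    # Simulates the server answering 403 Forbidden, without any network access
    raise HTTPError(request, 403, "Forbidden", {}, None)

def fake_offline_opener(request):
    # Simulates a connection failure before any HTTP response arrives
    raise URLError("network is unreachable")

for opener in (fake_rejecting_opener, fake_offline_opener):
    try:
        open_or_raise(opener, "https://example.com/")
    except RequestError as e:
        print(e)
```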
@@ -416,7 +416,7 @@ def recognize_wit(self, audio_data, key, show_all = False):
        The recognition language is configured in the Wit.ai app settings.
        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://wit.ai/docs/http/20141022#get-intent-via-text-link>`__ as a JSON dictionary.
        Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the key isn't valid, the quota for the key is maxed out, or there is no internet connection.
        """
@@ -440,22 +440,22 @@ def recognize_wit(self, audio_data, key, show_all = False):
        if "_text" not in result or result["_text"] is None: raise UnknownValueError()
        return result["_text"]

    def recognize_ibm(self, audio_data, username, password, language="en-US", show_all = False):
    def recognize_ibm(self, audio_data, username, password, language = "en-US", show_all = False):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the IBM Speech to Text API.
        The IBM Speech to Text username and password are specified by ``username`` and ``password``, respectively. Unfortunately, these are not available without an account. IBM has published instructions for obtaining these credentials in the `IBM Watson Developer Cloud documentation <https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/getting_started/gs-credentials.shtml>`__.
        The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-ES"``, defaulting to US English. At the moment, this supports the tags ``"en-US"``, ``"es-ES"``, and ``"ja-JP"``.
        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text/api/v1/#recognize>`__ as a JSON dictionary.
        Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the username or password isn't valid, or if there is no internet connection.
        """
        assert isinstance(audio_data, AudioData), "Data must be audio data"
        assert isinstance(username, str), "`username` must be a string"
        assert isinstance(password, str), "`password` must be a string"
        assert language in {"en-US", "es-ES", "ja-JP"}, "`language` must be a valid language."
        assert language in ["en-US", "es-ES", "ja-JP"], "`language` must be a valid language."

        flac_data = audio_data.get_flac_data()
        model = "{0}_BroadbandModel".format(language)
@@ -468,7 +468,7 @@ def recognize_ibm(self, audio_data, username, password, language="en-US", show_a
        request.add_header("Authorization", "Basic {0}".format(authorization_value))
        try:
            response = urlopen(request)
        except HTTPError as e:
        except HTTPError:
            raise RequestError("request failed, ensure that username and password are correct")
        except URLError:
            raise RequestError("no internet connection available to transfer audio data")
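The request above sends ``Basic {0}``, but the line that computes ``authorization_value`` lies outside this hunk. HTTP Basic authentication (RFC 7617) Base64-encodes the string ``username:password``; the sketch below shows that standard construction with a hypothetical helper ``basic_auth_header``, assuming ``recognize_ibm`` does the equivalent:

```python
import base64

def basic_auth_header(username, password):
    """Build an HTTP Basic ``Authorization`` header value from a username and password."""
    credentials = "{0}:{1}".format(username, password).encode("utf-8")
    return "Basic {0}".format(base64.b64encode(credentials).decode("ascii"))

print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```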
@@ -485,6 +485,61 @@ def recognize_ibm(self, audio_data, username, password, language="en-US", show_a
        # no transcriptions available
        raise UnknownValueError()

    def recognize_att(self, audio_data, app_key, app_secret, language = "en-US", show_all = False):
        """
        Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the AT&T Speech to Text API.

        The AT&T Speech to Text app key and app secret are specified by ``app_key`` and ``app_secret``, respectively. Unfortunately, these are not available without `signing up for an account <http://developer.att.com/apis/speech>`__ and creating an app.

        To get the app key and app secret for an AT&T app, go to the `My Apps page <https://matrix.bf.sl.attcompute.com/apps>`__ and look for "APP KEY" and "APP SECRET". AT&T app keys and app secrets are 32-character lowercase alphanumeric strings.

        The recognition language is determined by ``language``, an IETF language tag with a dialect like ``"en-US"`` or ``"es-US"``, defaulting to US English. At the moment, this supports the tags ``"en-US"`` and ``"es-US"``.

        Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the `raw API response <https://developer.att.com/apis/speech/docs#resources-speech-to-text>`__ as a JSON dictionary.

        Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the app key or app secret isn't valid, or if there is no internet connection.
        """
        assert isinstance(audio_data, AudioData), "Data must be audio data"
        assert isinstance(app_key, str), "`app_key` must be a string"
        assert isinstance(app_secret, str), "`app_secret` must be a string"
        assert language in ["en-US", "es-US"], "`language` must be a valid language."

        # ensure we have an authentication token
        authorization_url = "https://api.att.com/oauth/v4/token"
        authorization_body = "client_id={0}&client_secret={1}&grant_type=client_credentials&scope=SPEECH".format(app_key, app_secret)
        try:
            authorization_response = urlopen(authorization_url, data = authorization_body.encode("utf-8"))
        except HTTPError:
            raise RequestError("credential request failed, ensure that app key and app secret are correct")
        except URLError:
            raise RequestError("no internet connection available to request credentials")
        authorization_text = authorization_response.read().decode("utf-8")
        authorization_bearer = json.loads(authorization_text).get("access_token")
        if authorization_bearer is None: raise RequestError("missing OAuth access token in requested credentials")

        wav_data = audio_data.get_wav_data()
        url = "https://api.att.com/speech/v3/speechToText"
        request = Request(url, data = wav_data, headers = {"Authorization": "Bearer {0}".format(authorization_bearer), "Content-Language": language, "Content-Type": "audio/wav"})
        try:
            response = urlopen(request)
        except HTTPError:
            raise RequestError("request failed, ensure that app key and app secret are correct")
        except URLError:
            raise RequestError("no internet connection available to transfer audio data")
        response_text = response.read().decode("utf-8")
        result = json.loads(response_text)

        if show_all: return result

        if "Recognition" not in result or "NBest" not in result["Recognition"]:
            raise UnknownValueError()
        for entry in result["Recognition"]["NBest"]:
            if entry.get("Grade") == "accept" and "ResultText" in entry:
                return entry["ResultText"]

        # no transcriptions available
        raise UnknownValueError()
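``recognize_att`` performs an OAuth client-credentials exchange before uploading any audio. It builds the form body with ``str.format``, which does not percent-encode the values; that is safe for the documented 32-character lowercase alphanumeric credentials, but ``urllib.parse.urlencode`` removes the assumption. A Python 3 sketch that builds, but does not send, the token request (the endpoint and form fields are copied from ``recognize_att``; the helper names are hypothetical):

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ATT_TOKEN_URL = "https://api.att.com/oauth/v4/token"  # endpoint used by recognize_att

def build_att_token_request(app_key, app_secret):
    """Build (but do not send) the client-credentials token request.

    urlencode() percent-escapes each value, which str.format() does not. Because
    data is set, urllib sends this as a POST with a form-encoded Content-Type.
    """
    body = urlencode({
        "client_id": app_key,
        "client_secret": app_secret,
        "grant_type": "client_credentials",
        "scope": "SPEECH",
    })
    return Request(ATT_TOKEN_URL, data=body.encode("utf-8"))

def parse_access_token(response_text):
    """Return the ``access_token`` field of a token response body, or None if it is absent."""
    return json.loads(response_text).get("access_token")

request = build_att_token_request("0123456789abcdef0123456789abcdef", "fedcba9876543210fedcba9876543210")
print(request.get_method())                             # POST
print(parse_access_token('{"access_token": "abc123"}'))  # abc123
```

As in ``recognize_att``, a missing ``access_token`` should be treated as a failed credential request rather than passed on as ``"Bearer None"``.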

def shutil_which(pgm):
    """
    python2 backport of python3's shutil.which()