Speech Recognition Accuracy #33

devloop0 · 2016-06-04T07:56:06Z

Currently, the library we are using is CMU Sphinx: http://cmusphinx.sourceforge.net/
The problem is that unlimited-vocabulary recognition is extremely difficult to do locally. Even Google sends speech to its servers for more accurate recognition.
http://9to5google.com/2016/03/11/google-accurate-offline-voice-recognition/
The good news is that highly accurate local speech recognition is probably possible. The bad news is that it will take significant processing power and memory. If we choose to build our own speech recognition system, we may have to stop using CMU Sphinx or modify its code.

Xymanek · 2016-06-04T10:40:07Z

Well, some of us (like me) are running on pretty decent hardware and can support the amount of processing required for accurate speech recognition.

Also, I think Google provides an API for speech recognition, so that may be an option for others.

That said, how about changing the webcam select window into a "Machine boot properties" window and adding another drop-down, "Speech recognition mode", with three options: "Local (fast)", "Local (advanced)" and "Online (send speech to Google)"? Or something similar.
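
To make the drop-down idea a bit more concrete, here is a rough sketch in Python. The project may use a different language, and every name here (`SpeechRecognitionMode`, `dropdown_labels`, `mode_from_label`) is made up for illustration; only the three option labels come from the suggestion above.

```python
from enum import Enum


class SpeechRecognitionMode(Enum):
    """Hypothetical setting backing the proposed drop-down."""

    LOCAL_FAST = "Local (fast)"
    LOCAL_ADVANCED = "Local (advanced)"
    ONLINE = "Online (send speech to Google)"

    @property
    def sends_audio_off_machine(self) -> bool:
        # Only the online option would upload audio to a third party.
        return self is SpeechRecognitionMode.ONLINE


def dropdown_labels() -> list:
    """Labels to show in the drop-down, in declaration order."""
    return [mode.value for mode in SpeechRecognitionMode]


def mode_from_label(label: str) -> SpeechRecognitionMode:
    """Map the label the user picked back to a mode (raises ValueError if unknown)."""
    return SpeechRecognitionMode(label)
```

The window would fill the drop-down from `dropdown_labels()` and store the result of `mode_from_label(...)`, so the rest of the code branches on the enum instead of on raw strings.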

devloop0 · 2016-06-04T10:53:53Z

The problem with a service like Google's is that you can only make a limited number of requests. Also, their desktop Speech API is currently in preview and may cost money for Google Cloud instances. Additionally, according to this link: http://stackoverflow.com/questions/12721436/google-speech-api, it may not be wise to just latch onto a Google service. If this is the only other option, I am willing to look into it, but it is far from ideal. At least with CMU Sphinx we have unlimited requests and no sudden API changes.
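
If we did go down that road, one way to live with a request limit would be to fall back to the local recognizer once the budget is spent or the service fails. This is only a sketch under assumptions: `QuotaLimitedRecognizer` is a hypothetical name, `online` and `local` are stand-in callables rather than real Google or CMU Sphinx APIs, and the actual quota is not something I know.

```python
from typing import Callable


class QuotaLimitedRecognizer:
    """Try an online recognizer while a request budget lasts, else use a local one."""

    def __init__(
        self,
        online: Callable[[bytes], str],  # stand-in for a remote speech service
        local: Callable[[bytes], str],   # stand-in for a local engine
        max_requests: int,               # assumed budget, not a real quota value
    ) -> None:
        self.online = online
        self.local = local
        self.max_requests = max_requests
        self.used = 0

    def recognize(self, audio: bytes) -> str:
        if self.used < self.max_requests:
            # Count the attempt even if it fails, since it may still be billed.
            self.used += 1
            try:
                return self.online(audio)
            except Exception:
                # Network error or API change: fall through to the local engine.
                pass
        return self.local(audio)
```

The local engine stays the safety net here, so a sudden API change only costs accuracy and does not break the feature.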

Xymanek · 2016-06-04T11:10:59Z

In any case, I do suggest doing something about it, because in its current state it fails to recognize even simple phrases.

devloop0 · 2016-06-04T11:14:13Z

Yeah, I expect this issue to be open for a while. I don't see any near-term solution for this problem. Additionally, to train a system like this, you need thousands of hours of audio with various noise levels, modulations, and accents. Open training sets like these are rare.

Xalaxis added enhancement client server labels Jun 5, 2016

Xalaxis added this to the Day 1 milestone Jun 5, 2016

Xalaxis assigned devloop0 Jun 5, 2016
