This repository has been archived by the owner on Jul 5, 2018. It is now read-only.

Speech Recognition Accuracy #33

Open
devloop0 opened this issue Jun 4, 2016 · 4 comments

Comments

@devloop0
Collaborator

devloop0 commented Jun 4, 2016

Currently, the library we are using is CMU Sphinx: http://cmusphinx.sourceforge.net/
The problem is that unlimited-vocabulary recognition is extremely difficult to do locally. Even Google sends speech to its servers for more accurate speech recognition.
http://9to5google.com/2016/03/11/google-accurate-offline-voice-recognition/
The good news is that highly accurate local speech recognition is probably possible; the bad news is that it will take significant processing power and memory. If we choose to build our own speech recognition system, we may have to stop using CMU Sphinx or modify its code.

@Xymanek

Xymanek commented Jun 4, 2016

Well, some of us (like me) are running on pretty decent hardware that can handle the processing required for accurate speech recognition.

Also, I think Google provides an API for speech recognition, so that may be an option for others.

That said, how about changing the webcam select window into a "Machine boot properties" window and adding another drop-down, "Speech recognition mode", with three options: "Local (fast)", "Local (advanced)" and "Online (send speech to Google)"? Or something similar.
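A rough sketch of how such a three-way mode selection could be wired up, in Python purely for illustration. The three labels come from the suggestion above; everything else (`RecognitionMode`, `mode_from_label`, `recognize`, and the placeholder backend functions) is a hypothetical name, and the backends return dummy strings rather than calling CMU Sphinx or any Google service.

```python
from enum import Enum
from typing import Callable, Dict


class RecognitionMode(Enum):
    """The three drop-down options; values are the labels shown to the user."""
    LOCAL_FAST = "Local (fast)"
    LOCAL_ADVANCED = "Local (advanced)"
    ONLINE = "Online (send speech to Google)"


# Hypothetical placeholder backends. A real build would call the actual
# recognizers here; these only return dummy strings.
def recognize_local_fast(audio: bytes) -> str:
    return "<local-fast transcript>"


def recognize_local_advanced(audio: bytes) -> str:
    return "<local-advanced transcript>"


def recognize_online(audio: bytes) -> str:
    return "<online transcript>"


# One table maps each mode to its backend, so adding a mode is a one-line change.
BACKENDS: Dict[RecognitionMode, Callable[[bytes], str]] = {
    RecognitionMode.LOCAL_FAST: recognize_local_fast,
    RecognitionMode.LOCAL_ADVANCED: recognize_local_advanced,
    RecognitionMode.ONLINE: recognize_online,
}


def mode_from_label(label: str) -> RecognitionMode:
    """Convert the drop-down label back into a mode (raises ValueError if unknown)."""
    return RecognitionMode(label)


def recognize(audio: bytes, mode: RecognitionMode) -> str:
    """Dispatch the audio to whichever backend the user selected."""
    return BACKENDS[mode](audio)
```

The drop-down would be populated from `[m.value for m in RecognitionMode]`, and the selected label passed through `mode_from_label` once at boot.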

@devloop0
Collaborator Author

devloop0 commented Jun 4, 2016

The problem with a service like Google is that you can only make a limited number of requests. Also, their desktop Speech API is currently in preview and may cost money for Google Cloud instances. Additionally, according to this link: http://stackoverflow.com/questions/12721436/google-speech-api, it may not be wise to just latch onto a Google service. If this is the only other option, I am willing to look into it, but it is far from ideal. At least with CMU Sphinx, we have unlimited requests and no sudden API changes.
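If an online service were ever used, one way to live with the request limit would be to fall back to the local recognizer once the quota is spent or the online call fails. The following is only a sketch of that idea in Python: `QuotaLimitedRecognizer`, `max_requests`, and the two callables are hypothetical, and no real Google or CMU Sphinx call is made.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]


class QuotaLimitedRecognizer:
    """Prefer an online recognizer until a request budget is used up,
    then fall back to a local one. All names here are illustrative."""

    def __init__(self, online: Recognizer, local: Recognizer, max_requests: int):
        self.online = online
        self.local = local
        self.max_requests = max_requests  # assumed quota; the real limit is not stated here
        self.used = 0

    def recognize(self, audio: bytes) -> str:
        if self.used < self.max_requests:
            # Count the attempt even if it fails, since a failed request
            # may still be billed against the quota.
            self.used += 1
            try:
                return self.online(audio)
            except Exception:
                pass  # network error, API change, etc.: fall through to local
        return self.local(audio)
```

For example, with `max_requests=2` the first two calls go online and every later call is handled locally, so the application keeps working after the quota runs out, just with lower accuracy.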

@Xymanek

Xymanek commented Jun 4, 2016

In any case, I do suggest doing something about it, because in its current state it fails to recognize even simple phrases.

@devloop0
Collaborator Author

devloop0 commented Jun 4, 2016

Yeah, I expect this issue to be open for a while. I don't see any near-term solution for this problem. Additionally, to train a system like this, you need thousands of hours of audio with various noise levels, modulations, and accents. Open training sets like these are hard to come by.
