[ros_speech_recognition] Add WordInfo to SpeechRecognitionCandidates message. #320
b6edf13 to ae3f3eb
Thank you very much for the useful features.
I left some reviews.
@@ -30,6 +30,106 @@ This package uses Python package [SpeechRecognition](https://pypi.python.org/pyp
print result # => 'Hello, world!'

If you are using `ros_speech_recognition` with `~continuous` is `True`, you can subscribe `/Tablet/voice` (`speech_recognition_msgs/SpeechRecognitionCandidates`) message.
You changed the content of `speech_recognition_msgs/SpeechRecognitionCandidates` in jsk-ros-pkg/jsk_common_msgs#28.
In my opinion, we should create a new message type such as `speech_recognition_msgs/GoogleCloudSpeechRecognitionCandidates` in addition to the existing `speech_recognition_msgs/SpeechRecognitionCandidates`.
This is because the new fields (e.g. wordinfo) seem to be specific to Google Cloud Speech-to-Text.
This is my preference, but how about wrapping the common message in a message specific to each speech-to-text service?
The advantage of this method is that we do not need to change the current `speech_recognition_msgs/SpeechRecognitionCandidates`.
For example,
$ rosmsg show speech_recognition_msgs/GoogleCloudSpeechRecognitionCandidates
Header header
speech_recognition_msgs/SpeechRecognitionCandidates candidates
speech_recognition_msgs/SentenceInfo[] sentences
In addition, please consider that `julius_ros` also publishes `speech_recognition_msgs/SpeechRecognitionCandidates`.
https://github.com/jsk-ros-pkg/jsk_3rdparty/tree/master/julius_ros#gmm-version
I'd like to hear your thoughts. @iory
> This is because the new fields (e.g. wordinfo) seem to be specific to Google Cloud Speech-to-Text.

This is not specific to Google Cloud Speech-to-Text.
For example, Azure Cognitive Services has a similar function.
https://docs.microsoft.com/ja-jp/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechconfig?view=azure-python#request-word-level-timestamps--
I think that the start time and end time of each word are general values for a speech recognition framework.

> This is my preference, but how about wrapping the common message in a message specific to each speech-to-text service?
> The advantage of this method is that we do not need to change the current `speech_recognition_msgs/SpeechRecognitionCandidates`.

However, this is a good way to avoid affecting other users, so I'll take this direction.
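
To make the wrapping proposal concrete, here is a minimal Python 3 sketch that mirrors the `rosmsg show` layout above with plain dataclasses. It is not generated ROS message code: the `Header` field is omitted, the `words` field of `SentenceInfo` and the `transcript` field of the stand-in common message are assumptions for illustration, and `to_common` is a hypothetical helper. The actual definitions are the ones discussed in jsk-ros-pkg/jsk_common_msgs#28.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordInfo:
    # Field names follow the PR description; the types are assumed.
    word: str
    confidence: float
    start_time: float  # offset from the beginning of the audio, seconds
    end_time: float    # offset from the beginning of the audio, seconds
    speaker_tag: int = 0


@dataclass
class SentenceInfo:
    # Hypothetical layout: the words of one recognized sentence.
    words: List[WordInfo] = field(default_factory=list)


@dataclass
class SpeechRecognitionCandidates:
    # Stand-in for the existing common message, which stays unchanged.
    transcript: List[str] = field(default_factory=list)


@dataclass
class GoogleCloudSpeechRecognitionCandidates:
    # Service-specific wrapper: the common message plus extra detail.
    candidates: SpeechRecognitionCandidates
    sentences: List[SentenceInfo] = field(default_factory=list)


def to_common(msg: GoogleCloudSpeechRecognitionCandidates) -> SpeechRecognitionCandidates:
    """Return the part that consumers of the common message already understand."""
    return msg.candidates
```

With this shape, code that only knows the common message keeps working on `to_common(msg)`, while code that wants word-level detail reads `msg.sentences`.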
```bash
roslaunch ros_speech_recognition sample_ros_speech_recognition.launch
```
We may need to set the `google_cloud_credentials_json:=xxx` arg?
I got the following error.
[ERROR] [1641453200.346079]: Unexpected error: (<class 'oauth2client.client.ApplicationDefaultCredentialsError'>, ApplicationDefaultCredentialsError('The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.',), <traceback object at 0x7f980f9a6960>)
I think
roslaunch ros_speech_recognition sample_ros_speech_recognition.launch google_cloud_credentials_json:=xxx.json
is more helpful.
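
The error above is raised because no Application Default Credentials were found. As a hedged Python 3 sketch (not part of this package), a pre-flight check like the following could report a missing credential file before the node is launched. The helper name `resolve_credentials` is hypothetical, and preferring the explicit path over the environment variable is a choice made for this sketch only; `GOOGLE_APPLICATION_CREDENTIALS` is the variable named in the error message.

```python
import os
from typing import Mapping, Optional


def resolve_credentials(explicit_path: str = "",
                        env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a usable credential file path, or None if none is found.

    In this sketch the explicit path (for example the value passed as
    google_cloud_credentials_json) is tried first, then the
    GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    for candidate in (explicit_path, env.get("GOOGLE_APPLICATION_CREDENTIALS", "")):
        if candidate and os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    path = resolve_credentials()
    if path is None:
        print("No credentials found; pass google_cloud_credentials_json:=<file>")
    else:
        print("Using credentials: " + path)
```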
<launch>

<arg name="google_cloud_credentials_json" default="''" doc="Credential JSON is only used when the engine is GoogleCloud." />
<arg name="engine" default="GoogleCloud" doc="Speech to text engine. TTS engine, Google, GoogleCloud, Sphinx, Wit, Bing Houndify, IBM" />
In my understanding, `GoogleCloud` needs credentials. (It costs money.)
I think the default engine should be `Google` so that we can try this ROS package for free.
If you think `GoogleCloud` should be the default for `ros_speech_recognition`, I think it's OK to keep this change.
@@ -9,10 +13,12 @@
</rosparam>
<include file="$(find ros_speech_recognition)/launch/speech_recognition.launch">
How about integrating the content of this test into `sample_ros_speech_recognition.launch`?
Or how about moving the content of this test to `sample_ros_speech_recognition_ja.launch`?
1. Launch sample launch file. |
How about adding the Google Cloud Speech-to-Text URL to README.md too?
It would be helpful.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#wordinfo
1. Launch sample launch file. |
Thank you very much for the documentation.
In addition to the `engine:=GoogleCloud` example, it would be very helpful if you could add an `engine:=Google` example.
The `word` is recognized word and the `confidence` means a higher number indicates an estimated greater likelihood that the recognized words are correct.
`start_time` indicates time offset relative to the beginning of the audio (timestamp of header), and corresponding to the start of the spoken word.
`end_time` indicates time offset relative to the beginning of the audio, and corresponding to the end of the spoken word.
It would be useful to put these descriptions in the message definition.
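
To illustrate the field semantics described above, the following Python 3 sketch converts per-word offsets into absolute times by adding them to the header timestamp. The `Word` tuple and the `absolute_interval` helper are hypothetical names, the sample values are made up, and times are plain floats in seconds rather than ROS time types.

```python
from typing import NamedTuple, Tuple


class Word(NamedTuple):
    word: str
    confidence: float  # a higher value means the word is more likely correct
    start_time: float  # offset from the beginning of the audio, seconds
    end_time: float    # offset from the beginning of the audio, seconds


def absolute_interval(header_stamp: float, w: Word) -> Tuple[float, float]:
    """Map a word's offsets onto the clock used by the header stamp."""
    return (header_stamp + w.start_time, header_stamp + w.end_time)


# Illustrative values only; the stamp is assumed to mark the audio start.
words = [Word("hello", 0.98, 0.0, 0.5), Word("world", 0.91, 0.5, 1.25)]
stamp = 100.0
for w in words:
    print(w.word, absolute_interval(stamp, w))
```

This only gives meaningful absolute times if the header stamp really marks the beginning of the audio.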
@@ -322,11 +392,14 @@ def speech_recognition_srv_cb(self, req):
        rospy.loginfo("Waiting for result... (Sent %d bytes)" % len(audio.get_raw_data()))

        try:
            header = std_msgs.msg.Header(stamp=rospy.Time.now())
I think this timestamp may be misleading: it is not the start time of the original audio data, but the time of speech recognition.
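
One possible way to address this, sketched in Python 3 under assumptions, is to read the clock when audio capture starts and carry that value with the audio, instead of stamping the header when the recognition result arrives. `StampedAudio`, `capture`, and `header_stamp_for` are hypothetical names, and the injected `clock` callable stands in for `rospy.Time.now()`. The stamp is only an approximation of when speech begins, since the recorder may wait for a phrase to start.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StampedAudio:
    data: bytes
    start_stamp: float  # time at which capture began, seconds


def capture(read_audio: Callable[[], bytes],
            clock: Callable[[], float]) -> StampedAudio:
    """Read the clock before recording so the stamp marks the audio start."""
    start = clock()
    return StampedAudio(data=read_audio(), start_stamp=start)


def header_stamp_for(audio: StampedAudio) -> float:
    """Use the capture start, not the time at which recognition finished."""
    return audio.start_stamp
```

Word offsets such as `start_time` and `end_time` can then be interpreted relative to `header_stamp_for(audio)`.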
@iory Great work! I have also left some comments on jsk-ros-pkg/jsk_common_msgs#28. I think we need more discussion about
What is this?
This PR enables publishing `start_time`, `end_time`, `confidence` and `speaker_tag`.
This PR requires the following PR for the new message: jsk-ros-pkg/jsk_common_msgs#28
Example
If you are using `ros_speech_recognition` with `~continuous` set to `True`, you can subscribe to the `/Tablet/voice` (`speech_recognition_msgs/SpeechRecognitionCandidates`) message.
1. Launch the sample launch file.
2. Echo the message.
The `word` is the recognized word, and for `confidence` a higher number indicates a greater estimated likelihood that the recognized words are correct.
`start_time` is the time offset relative to the beginning of the audio (the timestamp of the header) that corresponds to the start of the spoken word.
`end_time` is the time offset relative to the beginning of the audio that corresponds to the end of the spoken word.