
[ros_speech_recognition] Add WordInfo to SpeechRecognitionCandidates message. #320

Open · wants to merge 14 commits into master
Conversation

@iory (Member) commented Jan 5, 2022

What is this?

This PR enables publishing start_time, end_time, confidence, and speaker_tag for each recognized word.
It requires the following PR for the new message: jsk-ros-pkg/jsk_common_msgs#28

Example

If you run ros_speech_recognition with ~continuous set to True, you can subscribe to the /Tablet/voice (speech_recognition_msgs/SpeechRecognitionCandidates) topic.

  1. Launch the sample launch file.

    roslaunch ros_speech_recognition sample_ros_speech_recognition.launch
  2. Echo the message.

    $ rostopic echo /Tablet/voice
    transcript:
      - may I help you
    confidence: [0.9286448955535889]
    sentences:
      -
        header:
          seq: 0
          stamp:
            secs: 1641425262
            nsecs: 268165588
          frame_id: ''
        words:
          -
            start_time: 0.0
            end_time: 0.2
            word: "may"
            confidence: 0.91376436
            speaker_tag: 0
          -
            start_time: 0.2
            end_time: 0.4
            word: "I"
            confidence: 0.9366196
            speaker_tag: 0
          -
            start_time: 0.4
            end_time: 0.5
            word: "help"
            confidence: 0.9531065
            speaker_tag: 0
          -
            start_time: 0.5
            end_time: 0.8
            word: "you"
            confidence: 0.9110889
            speaker_tag: 0
    ---
    transcript:
      - pick up the red kettle
    confidence: [0.9499567747116089]
    sentences:
      -
        header:
          seq: 0
          stamp:
            secs: 1641425268
            nsecs:  58182954
          frame_id: ''
        words:
          -
            start_time: 0.0
            end_time: 0.4
            word: "pick"
            confidence: 0.953269
            speaker_tag: 0
          -
            start_time: 0.4
            end_time: 0.6
            word: "up"
            confidence: 0.95326656
            speaker_tag: 0
          -
            start_time: 0.6
            end_time: 0.8
            word: "the"
            confidence: 0.96866167
            speaker_tag: 0
          -
            start_time: 0.8
            end_time: 1.1
            word: "red"
            confidence: 0.98762906
            speaker_tag: 0
          -
            start_time: 1.1
            end_time: 1.5
            word: "kettle"
            confidence: 0.8869578
            speaker_tag: 0

word is the recognized word; confidence indicates the estimated likelihood that the recognized word is correct, with higher values meaning more likely.
start_time is the time offset of the start of the spoken word, relative to the beginning of the audio (the header timestamp).
end_time is the corresponding offset of the end of the spoken word.
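For downstream consumers, the per-word offsets can be converted to absolute times by adding them to the sentence header stamp. A minimal pure-Python sketch (the dict field names simply mirror the `rostopic echo` output above, not an actual message class):

```python
# Convert per-word offsets to absolute times by adding them to the
# sentence header stamp (the beginning of the audio). Field names
# mirror the rostopic echo output above; this is a sketch, not ROS code.

def absolute_word_times(header_stamp_sec, words):
    """words: iterable of dicts with 'word', 'start_time', 'end_time'."""
    return [(w["word"],
             header_stamp_sec + w["start_time"],
             header_stamp_sec + w["end_time"])
            for w in words]

# Example using the first sentence from the echo output above:
stamp = 1641425262.268165588
words = [
    {"word": "may",  "start_time": 0.0, "end_time": 0.2},
    {"word": "I",    "start_time": 0.2, "end_time": 0.4},
    {"word": "help", "start_time": 0.4, "end_time": 0.5},
    {"word": "you",  "start_time": 0.5, "end_time": 0.8},
]
for word, t_start, t_end in absolute_word_times(stamp, words):
    print("%-5s %.3f -> %.3f" % (word, t_start, t_end))
```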

@iory iory force-pushed the add-time-information branch from b6edf13 to ae3f3eb on January 5, 2022 12:30
@iory iory requested review from mqcmd196 and 708yamaguchi January 5, 2022 23:39
@708yamaguchi (Member) left a comment

Thank you very much for the useful features.

I left some reviews.

@@ -30,6 +30,106 @@ This package uses Python package [SpeechRecognition](https://pypi.python.org/pyp
print result # => 'Hello, world!'
```

If you are using `ros_speech_recognition` with `~continuous` is `True`, you can subscribe `/Tablet/voice` (`speech_recognition_msgs/SpeechRecognitionCandidates`) message.
@iory

You changed the speech_recognition_msgs/SpeechRecognitionCandidates content in jsk-ros-pkg/jsk_common_msgs#28.

In my opinion, we should create a new message type such as speech_recognition_msgs/GoogleCloudSpeechRecognitionCandidates in addition to the existing speech_recognition_msgs/SpeechRecognitionCandidates.

This is because the new fields (e.g. WordInfo) seem to be specific to Google Cloud Speech-to-Text.

This is my preference, but how about wrapping the common message in a message per speech-to-text service?
The advantage of this method is that we do not need to change the current speech_recognition_msgs/SpeechRecognitionCandidates.

For example,

$ rosmsg show speech_recognition_msgs/GoogleCloudSpeechRecognitionCandidates

Header header
speech_recognition_msgs/SpeechRecognitionCandidates candidates
speech_recognition_msgs/SentenceInfo[] sentences
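
For illustration, the SentenceInfo / WordInfo layout this wrapping assumes might look like the following sketch; the field names are inferred from the echo output above, and the actual definitions are proposed in jsk-ros-pkg/jsk_common_msgs#28:

```
# WordInfo.msg (sketch; fields inferred from the example output)
float64 start_time   # offset from the audio start [s]
float64 end_time     # offset from the audio start [s]
string word
float32 confidence
int32 speaker_tag

# SentenceInfo.msg (sketch)
Header header        # stamp marks the beginning of the audio
WordInfo[] words
```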

In addition, please keep in mind that julius_ros also publishes speech_recognition_msgs/SpeechRecognitionCandidates.
https://github.com/jsk-ros-pkg/jsk_3rdparty/tree/master/julius_ros#gmm-version

I'd like to hear your thoughts. @iory

@iory (Member, Author) replied:

This is because the new fields (e.g. wordinfo) seems to be specific to the google cloud speech-to-text.

This is not specific to Google Cloud Speech-to-Text.
For example, Azure Cognitive Services has a similar feature:
https://docs.microsoft.com/ja-jp/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.speechconfig?view=azure-python#request-word-level-timestamps--

I think that the start time and end time for each word are general values as a framework for speech recognition.

This is my preference, but how about wrapping the common message by each speech-to-text service message?
The advantage of this method is we do not need to change the current

However, this is a good way to avoid affecting other users, so I'll take this direction.



```bash
roslaunch ros_speech_recognition sample_ros_speech_recognition.launch
```

We may need to set the google_cloud_credentials_json:=xxx arg.
I got the following error:

[ERROR] [1641453200.346079]: Unexpected error: (<class 'oauth2client.client.ApplicationDefaultCredentialsError'>, ApplicationDefaultCredentialsError('The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.',), <traceback object at 0x7f980f9a6960>)

I think

roslaunch ros_speech_recognition sample_ros_speech_recognition.launch google_cloud_credentials_json:=xxx.json

is more helpful.

<launch>

<arg name="google_cloud_credentials_json" default="''" doc="Credential JSON is only used when the engine is GoogleCloud." />
<arg name="engine" default="GoogleCloud" doc="Speech to text engine. TTS engine, Google, GoogleCloud, Sphinx, Wit, Bing Houndify, IBM" />

In my understanding, GoogleCloud needs credentials (and it costs money).
I think the default engine should be Google so that we can try this ROS package for free.

If you think GoogleCloud should be the default for ros_speech_recognition, I think it's ok to keep this change.

@@ -9,10 +13,12 @@
</rosparam>
<include file="$(find ros_speech_recognition)/launch/speech_recognition.launch">
How about integrating the content of this test into sample_ros_speech_recognition.launch?
or
How about moving the content of this test to sample_ros_speech_recognition_ja.launch?

@@ -30,6 +30,106 @@ This package uses Python package [SpeechRecognition](https://pypi.python.org/pyp
print result # => 'Hello, world!'
```

If you are using `ros_speech_recognition` with `~continuous` is `True`, you can subscribe `/Tablet/voice` (`speech_recognition_msgs/SpeechRecognitionCandidates`) message.

1. Launch sample launch file.

How about adding the Google Cloud Speech-to-Text WordInfo URL to README.md too?
It would be helpful.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#wordinfo


Thank you very much for the documentation.

In addition to the engine:=GoogleCloud example, it would be very helpful if you could add an engine:=Google example.


The `word` is the recognized word; `confidence` indicates the estimated likelihood that the recognized word is correct, with higher values meaning more likely.
`start_time` is the time offset of the start of the spoken word, relative to the beginning of the audio (the header timestamp).
`end_time` is the corresponding offset of the end of the spoken word.

It would be useful to put these descriptions into the message definition.

@@ -322,11 +392,14 @@ def speech_recognition_srv_cb(self, req):
rospy.loginfo("Waiting for result... (Sent %d bytes)" % len(audio.get_raw_data()))

try:
header = std_msgs.msg.Header(stamp=rospy.Time.now())

I think this timestamp may be misleading: it is not the start time of the original audio data, but the time at which speech recognition ran.
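
One conceivable fix, assuming the raw audio parameters (byte length, sample rate, sample width) are available at that point, would be to back-date the stamp by the audio duration so that header.stamp marks the audio start. A sketch, not the PR's actual code:

```python
# Sketch: back-date the stamp so it marks the start of the captured audio
# rather than the moment recognition finished. Assumes the raw audio byte
# length, sample rate, and sample width are known; hypothetical helper.

def audio_start_stamp(now_sec, raw_byte_len, sample_rate, sample_width):
    """Return now_sec minus the duration of the captured audio, in seconds."""
    duration = raw_byte_len / float(sample_rate * sample_width)
    return now_sec - duration

# 16 kHz, 16-bit mono audio lasting exactly 2 s, "now" at t = 100 s:
stamp = audio_start_stamp(100.0, 2 * 16000 * 2, 16000, 2)
print("%.1f" % stamp)  # 98.0
```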

@sktometometo (Contributor) commented:

@iory Great work! I have also left some comments on jsk-ros-pkg/jsk_common_msgs#28. I think we need more discussion about the start_time and end_time representation.
