Same continuous phonemes are aggregated when computing gop features via compute-gop #4919
`queue.pl -config ...` should be revised as `queue.pl --config ...`, line 248
Same continuous phonemes are aggregated in compute-gop
[Problem Statement]
In computer-assisted pronunciation training, we use time-aligned information to compute pronunciation features such as goodness of pronunciation (GOP), and we want each phoneme to be processed separately to obtain its own features or scores. However, the original implementation (compute-gop, L163) relies on a change of phoneme to decide whether the next phoneme has started and the phoneme duration should be recomputed. As a result, when a word contains consecutive duplicated phonemes, for example:
SUDDENNESS S AH1 D AH0 N N AH0 S
the two N phonemes are merged and only a single N is output.
[Solution]
Add phoneme boundary information so that consecutive identical phonemes are kept separate.
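To illustrate the problem statement, here is a minimal sketch of transition-based segmentation. It is not the code from compute-gop; the function name `SegmentsByTransition` and the integer phone ids are assumptions made for the example. It only shows why detecting phones by a change of phone id cannot separate two adjacent identical phones.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not Kaldi code: groups a frame-level phone alignment
// into (phone id, duration in frames) segments by opening a new segment
// whenever the phone id differs from the previous frame.
std::vector<std::pair<int32_t, int32_t>> SegmentsByTransition(
    const std::vector<int32_t> &frame_phones) {
  std::vector<std::pair<int32_t, int32_t>> segments;
  for (int32_t phone : frame_phones) {
    if (segments.empty() || segments.back().first != phone) {
      segments.emplace_back(phone, 1);  // phone id changed: open a new segment
    } else {
      segments.back().second++;  // same phone id: extend the current segment
    }
  }
  return segments;
}
```

With illustrative ids AH0 = 1 and N = 2, the frame sequence `{1, 1, 2, 2, 2, 2, 2, 1, 1}` (AH0, then two N phones of 3 and 2 frames, then AH0) yields three segments, with a single N of 5 frames, instead of the four phones that were actually aligned.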
@jimbozhang perhaps you'd like to comment?
@a2d8a4v Thanks for fixing the issue. I think it is reasonable. Could you please ensure the modified recipe has been thoroughly tested, as I won't have time to do so myself?
@@ -130,16 +133,17 @@ int main(int argc, char *argv[]) {
   po.Read(argc, argv);

-  if (po.NumArgs() != 4 && po.NumArgs() != 5) {
+  if (po.NumArgs() != 6) {
This line is problematic.
The help message at line 111 says the program takes 5 required positional arguments and 1 optional one, so 5 is also a valid value. But this line requires the number of arguments to be exactly 6.
I am wondering how you have tested the changes.
In `[<phone-feature-wspecifier>]`, the `[]` means it is optional.
Thank you for reviewing my code and for the explanation. I will fix this condition later.
I just realized it is a problem with the original code.
I think the `[]` in `[<phone-feature-wspecifier>]` can be removed. The code does actually need it as a required argument.
src/bin/compute-gop.cc
Outdated
std::vector<int32> phone_boundary;
for (int32 i = 0; i < split.size(); i++) {
  for (int32 j = 0; j < split[i].size(); j++) {
    phone_boundary.push_back(static_cast<int32>(i));
`i` is already of type `int32` and there is no need to cast it to `int32`.
OK, I got it. I will remove 'static_cast'
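The loop under review can be sketched in isolation as follows. This is a standalone approximation and not the patch itself: it assumes `split[i]` holds the frame-level entries of the i-th aligned phone, uses `int32_t` in place of Kaldi's `int32`, and the names `PhoneBoundary` and `DurationsFromBoundary` are hypothetical. Because `i` is already a 32-bit integer, it is pushed without a cast.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: labels every frame with the index of the phone it
// belongs to, so adjacent phones with the same id still get distinct labels.
std::vector<int32_t> PhoneBoundary(
    const std::vector<std::vector<int32_t>> &split) {
  std::vector<int32_t> phone_boundary;
  for (int32_t i = 0; i < static_cast<int32_t>(split.size()); i++) {
    for (std::size_t j = 0; j < split[i].size(); j++) {
      phone_boundary.push_back(i);  // i is already int32_t, no cast needed
    }
  }
  return phone_boundary;
}

// Hypothetical helper: recovers one duration per phone by detecting changes
// in the boundary index rather than changes in the phone id.
std::vector<int32_t> DurationsFromBoundary(
    const std::vector<int32_t> &phone_boundary) {
  std::vector<int32_t> durations;
  for (std::size_t t = 0; t < phone_boundary.size(); t++) {
    if (t == 0 || phone_boundary[t] != phone_boundary[t - 1]) {
      durations.push_back(1);  // boundary index changed: a new phone starts
    } else {
      durations.back()++;  // same phone: one more frame
    }
  }
  return durations;
}
```

For a split with segment lengths 2, 3, 2 and 2, where the second and third segments carry the same phone, the boundary labels are `{0, 0, 1, 1, 1, 2, 2, 3, 3}` and four durations are recovered. Note that an empty `split[i]` contributes no frames and would therefore not appear in the durations.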
Following the advice of code reviewer @csukuangfj, I updated the code to address all the identified issues.
Hi @jimbozhang, I have tested the updated code on two corpora: speechocean762 and L2-ARCTIC. The speechocean762 corpus does not contain consecutive duplicated phonemes, so the test on this corpus checks whether the change affects the original results in terms of the pure phone sequences and their lengths. The L2-ARCTIC corpus does contain them, for example in the utterance with ID 'arctic_a0086'.
Hi @a2d8a4v, Could you do me a favor? I'd like to remove the Google Docs link from the top of
Thanks a lot.
Remove the deprecated document link at the request of @jimbozhang.
Hi @jimbozhang, I've dealt with it.
Dear @jimbozhang and @csukuangfj, do you have any other suggestions about the code? Alternatively, do you think we should proceed by having @danpovey confirm the pull request?
LGTM
@danpovey