espeak-ng-mbrola sound is worse than espeak-ng-mbrola-generic (or espeak-mbrola-generic) #949

Closed
cwendling opened this issue Aug 2, 2024 · 13 comments · Fixed by #954 or #967

Comments

@cwendling
Contributor

Steps to reproduce

Compare the sound output of espeak-ng-mbrola (beware of #902) and espeak-ng-mbrola-generic: the espeak-ng-mbrola one is a lot less human-like.

I used the following command to capture a sample (using French mbrola voices):

parecord -d @DEFAULT_MONITOR@ /tmp/sample.flac & pid=$!; spd-say -w -o espeak-ng-mbrola "Voici quelques mots pour tester." -y french-mbrola-1; sleep 1; spd-say -w -o espeak-ng-mbrola-generic "Voici quelques mots pour tester." -y fr1; kill "$pid"

Obtained behavior

The espeak-ng-mbrola one is a lot less human-like; the espeak-ng-mbrola-generic one sounds "better".

Expected behavior

This is actually OK if it's not a bug (it might just be that the mbrola synthesizer is better at this, which is fine); but speech-dispatcher lists espeak-ng-mbrola as "better" than espeak-ng-mbrola-generic (in module_compare() from src/server/speechd.c). It might be true for the feature set, but it's not for (my) ears.

IMO the sorting should take into account the perceived voice quality as well as other factors, especially when two modules otherwise look so similar to the user.

@sthibaul
Collaborator

sthibaul commented Aug 4, 2024

espeak-ng-mbrola is not supposed to sound worse than espeak-ng-mbrola-generic. They're supposed to sound exactly the same, since the code is actually the same: libespeak-ng is the same, and in the non-generic case it calls the external mbrola tool, which is essentially the same pipeline as in the -generic case. If a difference makes the non-generic one worse, it should be tracked down and fixed; it's probably something dumb, such as default parameters that for whatever reason don't end up being the same.

In the end, espeak-ng-mbrola is supposed to be better, not in terms of audio quality (since they're expected to be exactly the same) but in terms of flexibility (audio pipelining, stopping, etc.)

@sthibaul
Collaborator

Apparently espeak-ng doesn't report the proper audio rate (22 kHz instead of 16 kHz).
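
For scale, here is a rough sketch of what such a mismatch does to playback. It assumes the voice really produces 16000 Hz audio (the rate that sounded right when forced later in this thread) while the output is driven at 22050 Hz. The function name and the one-second buffer are illustrative only, not part of speech-dispatcher or espeak-ng.

```python
def playback_distortion(true_rate_hz: int, assumed_rate_hz: int, n_samples: int) -> dict:
    """Describe what happens when audio is played at the wrong sample rate."""
    # Playing samples faster than they were generated speeds up the speech
    # (and raises its pitch) by the ratio of the two rates.
    speed_factor = assumed_rate_hz / true_rate_hz
    return {
        "speed_factor": speed_factor,
        "intended_seconds": n_samples / true_rate_hz,
        "actual_seconds": n_samples / assumed_rate_hz,
    }


# Assumption: one second of audio generated at 16000 Hz, played back at 22050 Hz.
result = playback_distortion(true_rate_hz=16000, assumed_rate_hz=22050, n_samples=16000)
print(f"speed factor: {result['speed_factor']:.3f}")
print(f"intended {result['intended_seconds']:.3f} s, actual {result['actual_seconds']:.3f} s")
```

Under these assumptions the speech plays roughly 1.38 times too fast, which would be consistent with output that sounds noticeably less natural.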

sthibaul added a commit to sthibaul/speechd that referenced this issue Sep 15, 2024
espeak_ng_GetSampleRate does not report the rate of mbrola voices.

Fixes brailcom#949
@sthibaul
Collaborator

I believe this is now fixed

@cwendling
Contributor Author

@sthibaul indeed, thanks! However, (trying a patched 0.11.4 ATM, I'll try testing true master at some point) now the sentence end is cut off. I don't know if it's a direct consequence of this or it just reveals a side effect, but it's affecting both spd-say and Orca.

@sthibaul
Collaborator

> now the sentence end is cut off

Was it not the case before patching?

@cwendling
Contributor Author

No, the sound was weird and fast but not cut off at the end, at least not that I can hear.

I didn't look into it, but maybe there's another discrepancy with the sample rate leading to an incorrect timing computation or something? Or maybe a bug that drops the last samples now has more impact because they span more time?

@sthibaul
Collaborator

That would completely depend on your configuration. Here with master and the pulse backend, I'm not noticing anything.

@sthibaul
Collaborator

Does it also cut off with french-mbrola-2?

@sthibaul
Collaborator

Does the cut-off show up in parecord too?

@cwendling
Contributor Author

> Does it also cut off with french-mbrola-2?

I don't have -2, but it doesn't happen with -4. However, this voice always sounds a bit weird (with the generic or not), and didn't change with the patching.

> Does the cut-off show up in parecord too?

Yes.

I tried debugging this a bit, and the issue seems to be that espeak sends a spurious sample rate change event, or that the event is not handled in the correct order relative to the sample collection. With french-mbrola-4, I get 22050 all the way:

[…]
 Thu Sep 19 11:03:10 2024 [390224]: Espeak-ng: Successfully set synthesis voice to french-mbrola-4.
 Thu Sep 19 11:03:10 2024 [391570]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [391624]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [391666]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [391688]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [392130]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [392489]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [392790]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [393152]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [393189]: Espeak-ng: pushing 5867 samples
 Thu Sep 19 11:03:10 2024 [492627]: Espeak-ng: pushing 418 samples
 Thu Sep 19 11:03:10 2024 [529520]: Espeak-ng: Leaving module_speak() normally.

With french-mbrola-1, on the other hand, I get 16000 temporarily, and it reverts to 22050 for the last couple of sample batches:

[…]
 Thu Sep 19 11:03:13 2024 [483213]: Espeak-ng: Successfully set synthesis voice to french-mbrola-1.
 Thu Sep 19 11:03:13 2024 [486501]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:13 2024 [486709]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:13 2024 [486811]: Espeak-ng: Got sample rate 16000
 Thu Sep 19 11:03:13 2024 [486917]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:13 2024 [487246]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:13 2024 [487549]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:13 2024 [487883]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:13 2024 [487992]: Espeak-ng: pushing 4787 samples
 Thu Sep 19 11:03:13 2024 [488158]: Espeak-ng: pushing 418 samples
 Thu Sep 19 11:03:13 2024 [488282]: Espeak-ng: Leaving module_speak() normally.

I believe this likely explains the issue if the last samples are not actually at rate 22050.
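
To put numbers on it, here is a small sketch that tallies how many samples fall under each logged rate. The lines are the french-mbrola-1 excerpt above with timestamps stripped. It assumes each "pushing" line is played at the most recently logged rate, which the log alone does not prove; the helper name is made up for illustration.

```python
import re

# french-mbrola-1 log excerpt from above, timestamps and thread ids removed.
LOG = """\
Espeak-ng: Got sample rate 22050
Espeak-ng: Got sample rate 22050
Espeak-ng: Got sample rate 16000
Espeak-ng: pushing 6616 samples
Espeak-ng: pushing 6616 samples
Espeak-ng: pushing 6616 samples
Espeak-ng: Got sample rate 22050
Espeak-ng: pushing 4787 samples
Espeak-ng: pushing 418 samples
"""

RATE_RE = re.compile(r"Got sample rate (\d+)")
PUSH_RE = re.compile(r"pushing (\d+) samples")


def samples_per_rate(log_text: str) -> dict:
    """Sum pushed samples under the rate that was logged most recently.

    Assumption: a push uses whatever rate the last "Got sample rate" line
    announced. This is a guess about the module's behaviour, not a fact.
    """
    totals = {}
    current_rate = None
    for line in log_text.splitlines():
        rate_match = RATE_RE.search(line)
        if rate_match:
            current_rate = int(rate_match.group(1))
            continue
        push_match = PUSH_RE.search(line)
        if push_match and current_rate is not None:
            totals[current_rate] = totals.get(current_rate, 0) + int(push_match.group(1))
    return totals


totals = samples_per_rate(LOG)
tail = totals[22050]
print(totals)
print(f"tail at 16000 Hz: {tail / 16000:.3f} s, at 22050 Hz: {tail / 22050:.3f} s")
```

Under that assumption, the final 5205 samples would be played at 22050 although the voice most likely generated them at 16000, so the end of the sentence would get less playback time than it needs.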

@cwendling
Contributor Author

If I hack to force sample rate to 16000 the sound is good with no cutoff using french-mbrola-1.

@cwendling
Contributor Author

I took a moment to look into this a bit. I don't know the solution, but possibly espeak-ng (1.51) is the issue. Its own code (in speech.c's dispatch_audio()) only looks for espeakEVENT_SAMPLERATE if it's the first event in the list, which looks like a bug to me. Doing the same in speech-dispatcher's module leads to the first sentence being properly spoken with french-mbrola-1 (right sample rate, not cut off), but all subsequent ones use the 22050 sample rate again.

The espeak-ng CLI tool seems to work at first, but only until you try to mix voices with different sample rates. Basically, if you use only 22050 voices it's all good, but mixing them seems to deadlock it. For example, this works:

$ espeak-ng -m '<ssml><p><voice name="french-mbrola-4">Voici quelques mots pour tester.</voice> <voice name="English (Great Britain)">This is</voice> <voice name="french-mbrola-4">un test, tout va bien ?</voice>  <voice name="English (American)">Dunno, whatcha thinkin?</voice></p></ssml>'

but only until you replace any french-mbrola-4 with french-mbrola-1, in which case it stops in the middle of the latter voice's part.

@sthibaul
Collaborator

Ok, leaving the espeak-ng bug for now, and compensating here, assuming that the event list starts with the proper sample rate change, and we ignore the others.
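
A minimal sketch of that compensation idea, not the actual speech-dispatcher change. It is written in Python for brevity (the real module is C), and the `Event` type, its string labels, and `rate_for_batch` are hypothetical stand-ins for the espeak-ng event list, with "SAMPLERATE" playing the role of espeakEVENT_SAMPLERATE.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Event:
    kind: str       # hypothetical label, e.g. "SAMPLERATE" or "WORD"
    value: int = 0  # for "SAMPLERATE" events: the rate in Hz


def rate_for_batch(events: Sequence[Event], current_rate: int) -> int:
    """Honour a sample rate event only when it leads the event list.

    Rate events found later in the list are treated as spurious and ignored.
    """
    if events and events[0].kind == "SAMPLERATE":
        return events[0].value
    return current_rate


rate = 22050  # assumed default before any event is seen
# The list starts with a rate change: it is applied.
rate = rate_for_batch([Event("SAMPLERATE", 16000), Event("WORD")], rate)
after_leading_event = rate
# A rate change that is not first in the list: it is ignored.
rate = rate_for_batch([Event("WORD"), Event("SAMPLERATE", 22050)], rate)
print(after_leading_event, rate)
```

Whether the real event lists are ordered this way for every voice is exactly the assumption stated above, so this only illustrates the rule, not its correctness.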
