espeak-ng-mbrola sound is worse than espeak-ng-mbrola-generic (or espeak-mbrola-generic) #949

cwendling · 2024-08-02T08:36:04Z

Steps to reproduce

Compare sound output between espeak-ng-mbrola (beware of #902) and espeak-ng-mbrola-generic: the espeak-ng-mbrola one is a lot less human-like.

I used the following command to capture a sample (using French mbrola voices):

parecord -d @DEFAULT_MONITOR@ /tmp/sample.flac & pid=$!; spd-say -w -o espeak-ng-mbrola "Voici quelques mots pour tester." -y french-mbrola-1; sleep 1; spd-say -w -o espeak-ng-mbrola-generic "Voici quelques mots pour tester." -y fr1; kill "$pid"

Obtained behavior

The espeak-ng-mbrola one is a lot less human-like; the espeak-ng-mbrola-generic one sounds "better".

Expected behavior

This is actually OK if it's not a bug (it might just be that the mbrola synthesizer is better at this, which is fine); but speech-dispatcher lists espeak-ng-mbrola as "better" than espeak-ng-mbrola-generic (in module_compare() from src/server/speechd.c). It might be true for the feature set, but it's not for (my) ears.

IMO the sorting should take into account the perceived voice quality as well as other factors, especially when two modules otherwise look so similar to the user.


sthibaul · 2024-08-04T22:17:28Z

espeak-ng-mbrola is not supposed to sound worse than espeak-ng-mbrola-generic. They're supposed to sound exactly the same, since the code is actually the same: libespeak-ng is the same, and in the non-generic case it calls the external mbrola tool, which is essentially the same as the pipeline in the -generic case. If a difference exists that makes the non-generic module worse, it should be tracked down and fixed; it's probably something dumb, such as some default parameters that for whatever reason don't end up being the same.

In the end, espeak-ng-mbrola is supposed to be better, not in terms of audio quality (since they're expected to be exactly the same) but in terms of flexibility (audio pipelining, stopping, etc.)

sthibaul · 2024-09-15T20:42:12Z

Apparently espeak-ng doesn't report the proper audio rate (22KHz instead of 16KHz)
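To make the audible effect of that mismatch concrete, here is a minimal arithmetic sketch (not code from speech-dispatcher; `playback_ms` is a hypothetical helper): audio synthesized at 16000 Hz but played back as if it were 22050 Hz finishes early and sounds faster and higher-pitched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: duration, in whole milliseconds, of `samples`
 * mono samples when the audio backend plays them at `rate_hz`. */
static uint32_t playback_ms(uint32_t samples, uint32_t rate_hz)
{
    assert(rate_hz > 0);
    /* Widen to 64 bits so samples * 1000 cannot overflow. */
    return (uint32_t)(((uint64_t)samples * 1000u) / rate_hz);
}
```

For example, `playback_ms(16000, 16000)` is one second of speech, while `playback_ms(16000, 22050)` comes out at roughly 0.73 s, which would be consistent with output that sounds sped up.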

espeak_ng_GetSampleRate does not report the rate of mbrola voices. Fixes brailcom#949

sthibaul · 2024-09-15T20:57:28Z

I believe this is now fixed

cwendling · 2024-09-17T12:48:10Z

@sthibaul indeed, thanks! However, (trying a patched 0.11.4 ATM, I'll try testing true master at some point) now the sentence end is cut off. I don't know if it's a direct consequence of this or it just reveals a side effect, but it's affecting both spd-say and Orca.

sthibaul · 2024-09-17T13:28:54Z

> now the sentence end is cut off

Was it not the case before patching?

cwendling · 2024-09-17T15:05:16Z

No, the sound was weird and fast but not cut off at the end, at least not that I can hear.

I didn't look into it, but maybe there's another discrepancy with the sample rate leading to an incorrect timing computation or something? Or maybe a bug that drops the last batch of samples now has more impact because that batch spans more time?

sthibaul · 2024-09-17T23:27:21Z

That would completely depend on your configuration. Here with master and the pulse backend, I'm not noticing anything.

sthibaul · 2024-09-18T07:55:12Z

Does it also cut off with french-mbrola-2?

sthibaul · 2024-09-18T07:55:44Z

Does the cut-off show up in parecord too?

cwendling · 2024-09-19T09:09:35Z

> Does it also cut off with french-mbrola-2?

I don't have -2, but it doesn't happen with -4. However, this voice always sounds a bit weird (with the generic or not), and didn't change with the patching.

> Does the cut-off show up in parecord too?

Yes.

I tried debugging this a bit, and the issue seems to be that espeak sends a spurious sample rate change event, or that it's not handled in the correct order relative to the sample collection. With french-mbrola-4, I get 22050 all the way:

[…]
 Thu Sep 19 11:03:10 2024 [390224]: Espeak-ng: Successfully set synthesis voice to french-mbrola-4.
 Thu Sep 19 11:03:10 2024 [391570]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [391624]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [391666]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [391688]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [392130]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [392489]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [392790]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:10 2024 [393152]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:10 2024 [393189]: Espeak-ng: pushing 5867 samples
 Thu Sep 19 11:03:10 2024 [492627]: Espeak-ng: pushing 418 samples
 Thu Sep 19 11:03:10 2024 [529520]: Espeak-ng: Leaving module_speak() normally.

While with french-mbrola-1, I get 16000 temporarily, and it reverts back to 22050 for the last couple of sample batches:

[…]
 Thu Sep 19 11:03:13 2024 [483213]: Espeak-ng: Successfully set synthesis voice to french-mbrola-1.
 Thu Sep 19 11:03:13 2024 [486501]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:13 2024 [486709]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:13 2024 [486811]: Espeak-ng: Got sample rate 16000
 Thu Sep 19 11:03:13 2024 [486917]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:13 2024 [487246]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:13 2024 [487549]: Espeak-ng: pushing 6616 samples
 Thu Sep 19 11:03:13 2024 [487883]: Espeak-ng: Got sample rate 22050
 Thu Sep 19 11:03:13 2024 [487992]: Espeak-ng: pushing 4787 samples
 Thu Sep 19 11:03:13 2024 [488158]: Espeak-ng: pushing 418 samples
 Thu Sep 19 11:03:13 2024 [488282]: Espeak-ng: Leaving module_speak() normally.

I believe this likely explains the issue if the last samples are not actually at rate 22050.
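As a rough illustration of that hypothesis, the sketch below replays the french-mbrola-1 log, assuming each pushed batch is tagged with the most recently announced rate. The record type and function names are hypothetical and only model the log lines; this is not the module's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one log line: a rate announcement or a sample push. */
enum rec_kind { REC_RATE, REC_PUSH };
struct rec { enum rec_kind kind; uint32_t value; };

/* Count the samples that end up tagged with `rate_hz`, assuming every
 * batch inherits the rate from the latest "Got sample rate" line. */
static uint32_t samples_tagged(const struct rec *recs, size_t n, uint32_t rate_hz)
{
    uint32_t current = 0, total = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].kind == REC_RATE)
            current = recs[i].value;      /* rate change takes effect */
        else if (current == rate_hz)
            total += recs[i].value;       /* batch tagged with current rate */
    }
    return total;
}

/* Records transcribed from the french-mbrola-1 log above. */
static const struct rec mbrola1_log[] = {
    { REC_RATE, 22050 }, { REC_RATE, 22050 }, { REC_RATE, 16000 },
    { REC_PUSH, 6616 },  { REC_PUSH, 6616 },  { REC_PUSH, 6616 },
    { REC_RATE, 22050 },
    { REC_PUSH, 4787 },  { REC_PUSH, 418 },
};
```

Under that assumption, the three 6616-sample batches are tagged 16000 and the last two batches (4787 and 418 samples) are tagged 22050, so the tail of the sentence would be played at the wrong rate.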

cwendling · 2024-09-19T13:41:56Z

If I hack the module to force the sample rate to 16000, the sound is good, with no cutoff, using french-mbrola-1.

cwendling · 2024-09-19T16:43:34Z

I took a moment to look into this a bit, and I don't know the solution, but possibly espeak-ng (1.51) is the issue. Its own code (in speech.c's dispatch_audio()) only looks for espeakEVENT_SAMPLERATE if it's the first event in the list, which looks like a bug to me. Doing the same in speech-dispatcher's module leads to the first sentence being properly spoken with french-mbrola-1 (right sample rate, not cut off), but all subsequent ones use the 22050 sample rate again.

The espeak-ng CLI tool seems to work at first, but only until you try to mix voices with different sample rates. Basically, if using only 22050 voices it's all good, but mixing them seems to deadlock it. For example, this works:

$ espeak-ng -m '<ssml><p><voice name="french-mbrola-4">Voici quelques mots pour tester.</voice> <voice name="English (Great Britain)">This is</voice> <voice name="french-mbrola-4">un test, tout va bien ?</voice>  <voice name="English (American)">Dunno, whatcha thinkin?</voice></p></ssml>'

but only until you replace any french-mbrola-4 with french-mbrola-1, in which case it stops in the middle of the latter voice's part.

sthibaul · 2024-10-28T23:48:42Z

Ok, leaving the espeak-ng bug for now, and compensating here, assuming that the event list starts with the proper sample rate change, and we ignore the others.
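A minimal sketch of that compensation, under stated assumptions: the types below are hypothetical stand-ins for espeak-ng's event structures (so the snippet compiles without the espeak-ng headers), and `rate_after_events` is an illustrative name, not the function in the merged commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for espeak-ng's event type and event struct. */
enum ev_type { EV_LIST_TERMINATED = 0, EV_WORD, EV_SAMPLERATE };
struct ev { enum ev_type type; int number; };

/* Return the sample rate to use after one synthesis callback.
 * Only a sample-rate event in first position is trusted; any
 * sample-rate event later in the list is ignored. */
static int rate_after_events(const struct ev *events, int current_rate)
{
    assert(events != NULL);
    if (events[0].type == EV_SAMPLERATE)
        return events[0].number;
    return current_rate;
}
```

With this rule, a list that starts with a 16000 rate event switches the rate, while a 22050 rate event that appears after other events leaves the current rate untouched.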

sthibaul added bug help wanted labels Aug 4, 2024

sthibaul added a commit to sthibaul/speechd that referenced this issue Sep 15, 2024

espeak-ng-mbrola: Fix mbrola voices with rate different from 22KHz

e51235d

espeak_ng_GetSampleRate does not report the rate of mbrola voices. Fixes brailcom#949

sthibaul mentioned this issue Sep 15, 2024

espeak-ng-mbrola: Fix mbrola voices with rate different from 22KHz #954

Merged

sthibaul closed this as completed in #954 Sep 15, 2024

sthibaul closed this as completed in 29d4f4a Sep 15, 2024

sthibaul mentioned this issue Oct 28, 2024

Bogus sample rate change events espeak-ng/espeak-ng#2028

Open

sthibaul added a commit to sthibaul/speechd that referenced this issue Oct 28, 2024

espeak-ng: Ignore samplerate change when it is not first in the event…

201979f

… list See espeak-ng/espeak-ng#2028 Fixes brailcom#949

sthibaul mentioned this issue Oct 28, 2024

espeak-ng: Ignore samplerate change when it is not first in the event list #967

Merged

sthibaul added a commit that referenced this issue Oct 28, 2024

espeak-ng: Ignore samplerate change when it is not first in the event…

c8207dc

… list See espeak-ng/espeak-ng#2028 Fixes #949
