Problem Scenario: Multiple speakers in one room may speak at the same time. This produces an overlapped speech signal in which one speaker leaks into the other speaker's conversation. The leakage degrades speech quality and poses a privacy risk when confidential information reaches a different conversation.
- 3.7 million network parameters offer real-time capability
- PESQ score of 3.7 after attenuation
- Listening tests confirm that the subjective speech quality is doubled (MUSHRA score 67)
- No prior information is needed to identify the targeted speaker
- Reduction of mutual information by 60%
A multi-device setup can isolate the dominant speaker by attenuating an undesired speaker. For this, we use an adapted convolutional time-domain audio separation network. It takes two microphone inputs: (1) the mixed channel of the speaker to be isolated and (2) the mixed channel of the speaker to be attenuated.
The network consists of two parts. The first is a masking network, which generates a mask for the undesired speaker; the inverse of that mask is then applied to the targeted speaker's channel to remove the undesired speech content. The second is an enhancement block, which further increases the speech quality.
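The inverse-mask step can be illustrated with a minimal NumPy sketch. It is not the network itself: the function name `apply_inverse_mask`, the feature layout (channels × frames), and the reading of "inverse" as the complement `1 - mask` of a mask with values in [0, 1] are all assumptions made for illustration, and the mask here is a hand-written toy array rather than the output of a trained masking network.

```python
import numpy as np


def apply_inverse_mask(target_features, undesired_mask):
    """Suppress the undesired speaker in the target channel's features.

    Hypothetical helper: assumes both arrays share the shape
    (feature_channels, frames) and that the mask lies in [0, 1].
    """
    if target_features.shape != undesired_mask.shape:
        raise ValueError("features and mask must have the same shape")
    # Clip so that the complement below also stays within [0, 1].
    mask = np.clip(undesired_mask, 0.0, 1.0)
    # Assumption: "inverse of the mask" is taken as the complement 1 - mask.
    inverse_mask = 1.0 - mask
    # Element-wise weighting: bins dominated by the undesired speaker
    # (mask near 1) are attenuated, the rest pass through.
    return target_features * inverse_mask


# Toy data: 2 feature channels x 3 frames of the target speaker's channel.
features = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
# Toy mask marking where the undesired speaker is active.
mask = np.array([[0.0, 0.5, 1.0],
                 [1.0, 0.0, 0.5]])

isolated = apply_inverse_mask(features, mask)
print(isolated)
```

In this toy case, entries where the mask is 1 are removed entirely, entries where it is 0 are unchanged, and entries where it is 0.5 are halved.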
Listening Example 1
Listening Example for original speech mixture
Listening Example for isolated speaker
Listening Example 2
Listening Example for original speech mixture
Listening Example for isolated speaker
Listening Example 3