Why use the reverberated speech signal as the training target? #16
Comments
Please check sms_wsj_plus.py, which is the latest dataset for joint speech separation, denoising, and dereverberation. The code you referred to is old and not used in SpatialNet.
thanks for your response!
thanks!
The babble noise is diffuse, while the target speech signals are directional; that is the key clue for the model to learn to distinguish them.
The babble noise is diffuse, not directional, so it doesn't need to be convolved with RIRs. We use the method implemented in https://github.com/Audio-WestlakeU/NBSS/blob/main/data_loaders/utils/diffuse_noise.py to make it diffuse.
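For readers unfamiliar with how noise is "made diffuse", the classic approach is spatial-coherence mixing: independent noise channels are mixed per frequency bin so that their pairwise coherence matches the sinc coherence of a spherically isotropic sound field. The following is a minimal sketch of that general technique, not the repo's diffuse_noise.py; the function name, array geometry, and framing scheme are all illustrative assumptions.

```python
import numpy as np

def diffuse_noise(n_mics=4, mic_spacing=0.05, n_samples=16000, fs=16000,
                  nfft=512, c=343.0, seed=0):
    """Sketch: mix independent white noise into spatially diffuse noise
    (coherence-matching method; NOT the repo's implementation)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_mics, n_samples))
    # pairwise mic distances for an assumed uniform linear array
    pos = np.arange(n_mics) * mic_spacing
    d = np.abs(pos[:, None] - pos[None, :])
    freqs = np.fft.rfftfreq(nfft, 1 / fs)
    # crude STFT: non-overlapping rectangular frames (a sketch, not a real STFT)
    n_frames = n_samples // nfft
    frames = noise[:, :n_frames * nfft].reshape(n_mics, n_frames, nfft)
    spec = np.fft.rfft(frames, axis=-1)       # (n_mics, n_frames, n_bins)
    out = np.empty_like(spec)
    for k, f in enumerate(freqs):
        # spherically isotropic (diffuse) coherence: sin(2*pi*f*d/c)/(2*pi*f*d/c)
        gamma = np.sinc(2 * f * d / c)
        # mixing matrix = matrix square root of the coherence matrix
        w, V = np.linalg.eigh(gamma)
        C = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T
        out[:, :, k] = C @ spec[:, :, k]
    # back to time domain, concatenating frames
    return np.fft.irfft(out, n=nfft, axis=-1).reshape(n_mics, -1)
```

After mixing, nearby microphones are strongly coherent at low frequencies and nearly incoherent at high frequencies, which is exactly the spatial signature that lets a model separate diffuse noise from directional (RIR-convolved) speech.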
thanks for your response!
Hi,
It's a great project, thanks for your effort!
When I looked at the code, I found that the training target signal is the reverberated speech (https://github.com/Audio-WestlakeU/NBSS/blob/af66db92bb9d6f72f7100d613d3df38c40b10b09/data_loaders/ss_semi_online_dataset.py#L294C27-L294C27).
I wonder why clean speech is not used as the training target, since that would not only separate the speakers but also remove reverberation and even noise.
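To make the two target choices being discussed concrete: convolving dry speech with a room impulse response (RIR) gives the reverberant image at the microphone, whereas truncating the RIR to its direct/early part gives a drier target that would additionally force the model to dereverberate. Below is a hypothetical sketch of both constructions; it is not the repo's code, and `make_targets` and its parameters are illustrative.

```python
import numpy as np

def make_targets(dry, rir, fs=16000, early_ms=50.0):
    """Sketch: build (reverberant_target, early_target) for one source/mic.

    reverberant_target: dry speech convolved with the full RIR
    early_target:       dry speech convolved with only the direct/early RIR
    """
    # full reverberant image at the mic, truncated to the dry length
    reverberant = np.convolve(dry, rir)[: len(dry)]
    # keep only the first `early_ms` of the RIR after its peak (direct path)
    peak = int(np.argmax(np.abs(rir)))
    cut = peak + int(early_ms * fs / 1000)
    early = np.convolve(dry, rir[:cut])[: len(dry)]
    return reverberant, early
```

With the reverberant image as target, the model only needs to separate (and denoise); with the early-part target, it must also undo the late reverberation, which is a harder learning problem.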