Problem with corrupting the clean corpus with noise types: car traffic, crowd, machine #4

BilalDendani · 2019-06-23T15:28:19Z

Hello,
I am trying to corrupt the clean TIMIT dataset using the maracas library, with the following code:

from maracas.dataset import Dataset
import numpy as np
np.random.seed(42)
d = Dataset()

d.add_speech_files('/home/bilal/krProjects/timit', recursive=True)
d.add_noise_files('/home/bilal/krProjects/noiseTypes/carTrafic.wav', name='carTrafic')
d.add_noise_files('/home/bilal/krProjects/noiseTypes/crowd.wav', name='crowd')
d.add_noise_files('/home/bilal/krProjects/noiseTypes/machine.wav', name='machine')
d.generate_dataset([-6, -3, 0, 3, 6], '/home/bilal/krProjects/noise_dataset', files_per_condition=5)

I got the following error when executing the code:
bilal@myhost$ python corruptCleanDC.py
/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/numba/decorators.py:29: NumbaDeprecationWarning: autojit is deprecated, use jit instead, which provides the same functionality. For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-numba-autojit
warnings.warn(NumbaDeprecationWarning(msg))
Traceback (most recent call last):
File "corruptCleanDC.py", line 25, in <module>
d.generate_dataset([-6, -3, 0, 3, 6], '/home/bilal/krProjects/noise_dataset', files_per_condition=5)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 130, in generate_dataset
files_per_condition=files_per_condition)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 100, in generate_condition
speech_files = np.random.choice(self.speech, files_per_condition, replace=False).tolist()
File "mtrand.pyx", line 1168, in mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'


jfsantos · 2019-06-23T15:57:20Z

It looks like your speech dataset is empty, which might mean there is a bug in recursive_glob. Can you share the output of `ls /home/bilal/krProjects/timit` and `ls /home/bilal/krProjects/timit/**/*.WAV` with me?

BilalDendani · 2019-06-23T16:03:02Z

@jfsantos thank you for your quick reply.
The contents of my clean TIMIT folder are:

$ ls timit/
sa1.wav sa2.wav
I took just two clean wav files from the TIMIT corpus for testing.

jfsantos · 2019-06-23T16:05:37Z

This is not a bug; it's the expected behaviour. You requested 5 files per condition, but your corpus only has two files. Try either adding three more files or setting `files_per_condition=2`; it should work then.
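The traceback above comes from `np.random.choice(..., replace=False)`, which refuses to draw more items than the population holds. As a minimal sketch (the helper name `sample_files` is hypothetical and not part of maracas), the same sampling can be guarded with an explicit check that reports both numbers:

```python
import numpy as np


def sample_files(files, files_per_condition, seed=None):
    """Pick `files_per_condition` distinct files, failing early with a clear message.

    Hypothetical helper for illustration; it only mirrors the
    np.random.choice(..., replace=False) call seen in the traceback.
    """
    if files_per_condition > len(files):
        raise ValueError(
            f"Requested {files_per_condition} files per condition, "
            f"but only {len(files)} speech files are available."
        )
    rng = np.random.RandomState(seed)
    # Sampling without replacement needs a population at least as large as the sample.
    return rng.choice(files, files_per_condition, replace=False).tolist()


speech = ["sa1.wav", "sa2.wav"]
picked = sample_files(speech, 2, seed=42)  # works: 2 requested, 2 available
```

Calling `sample_files(speech, 5)` with the two-file corpus raises a `ValueError` that names the mismatch directly.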


BilalDendani · 2019-06-23T16:28:41Z

I changed the parameter to files_per_condition=2, and now it shows the following error:
$ python corruptCleanDs.py
/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/numba/decorators.py:29: NumbaDeprecationWarning: autojit is deprecated, use jit instead, which provides the same functionality. For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-numba-autojit
warnings.warn(NumbaDeprecationWarning(msg))
Condition folder already exists!
-6dB: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "corruptCleanDC.py", line 25, in <module>
d.generate_dataset([-6, -3, 0, 3, 6], '/home/bilal/krProjects/noise_dataset', files_per_condition=2)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 130, in generate_dataset
files_per_condition=files_per_condition)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 105, in generate_condition
x, fs = wavread(f)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/utils.py", line 9, in wavread
fs, x = scipy.io.wavfile.read(filename)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/scipy/io/wavfile.py", line 236, in read
file_size, is_big_endian = _read_riff_chunk(fid)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/scipy/io/wavfile.py", line 168, in _read_riff_chunk
"understood.".format(repr(str1)))
ValueError: File format b'NIST'... not understood.

jfsantos · 2019-06-23T16:31:35Z

The .WAV files originally distributed with TIMIT are not actually WAV files; they are in the NIST SPHERE format. See this Stack Overflow question for more details: https://stackoverflow.com/questions/44748258/reading-a-wav-file-from-timit-database-in-python
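One way to spot such files before handing them to a WAV reader is to look at the first four bytes, which is what the error message `b'NIST'` reports. The sketch below is only an illustration; the function name `sniff_audio_container` is made up here, and the demo file is a fake header written to a temporary directory, not real TIMIT data:

```python
import os
import tempfile


def sniff_audio_container(path):
    """Return 'riff', 'nist_sphere', or 'unknown' based on the first four bytes."""
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic in (b"RIFF", b"RIFX"):
        # Genuine WAV files start with a RIFF (or big-endian RIFX) chunk.
        return "riff"
    if magic == b"NIST":
        # NIST SPHERE headers start with the text "NIST_1A".
        return "nist_sphere"
    return "unknown"


# Demo with a fake SPHERE-style header (not real audio).
with tempfile.TemporaryDirectory() as tmp:
    fake_sphere = os.path.join(tmp, "SA1.WAV")
    with open(fake_sphere, "wb") as f:
        f.write(b"NIST_1A\n   1024\n")
    kind = sniff_audio_container(fake_sphere)
```

Files reported as `nist_sphere` need to be converted to real WAV before they can be read with `scipy.io.wavfile.read`.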


BilalDendani · 2019-06-23T16:39:21Z

Thank you so much, @jfsantos.
I will try to convert the original TIMIT .WAV files to real .wav files and then run the corruption code.

BilalDendani · 2019-06-29T09:21:07Z

I converted the TIMIT '.WAV' files from the NIST format to proper '.wav' files, and that part now works. I have another issue: I want to generate noisy versions of speech files that share the same name (many speakers pronounce the same sentence, so the file name is the same).
When I generated the dataset, I got only one file (the last one).
The following is an example:
......................
.......................
d.add_speech_files('/run/media/bilal/Data/datasets/DataSet/TIMIT/TRAIN/DR6/MKES0/SA1.wav', recursive=True)
d.add_speech_files('/run/media/bilal/Data/datasets/DataSet/TIMIT/TRAIN/DR7/MTMN0/SA1.wav', recursive=True)
d.generate_dataset([-15, -10, -5, 0, 5,10,15], '/run/media/bilal/fb8b3d1d-9bbf-42d6-b741-ad7e4940ac3e/noise_dataset', files_per_condition=600)
I did not get all the files that share the same name.
I want the generated dataset to preserve the directory path of each speech file, mirrored under the output directory "/run/media/bilal/fb8b3d1d-9bbf-42d6-b741-ad7e4940ac3e/noise_dataset". For example:
"/run/media/bilal/fb8b3d1d-9bbf-42d6-b741-ad7e4940ac3e/noise_dataset/NoisyTIMIT/TRAIN/DR6/MKES0/SA1.wav"
"/run/media/bilal/fb8b3d1d-9bbf-42d6-b741-ad7e4940ac3e/noise_dataset/NoisyTIMIT/TRAIN/DR7/MTMN0/SA1.wav"
How can I change the method generate_dataset(self, snrs, output_dir, files_per_condition=None) so that files with the same name are all saved?
Thank you.
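The thread does not show how generate_dataset names its output files, so the following is only a sketch of the path computation being asked for, not a patch to maracas. It assumes that the collision comes from output names being derived from the file name alone, and it introduces a hypothetical helper, `mirrored_output_path`, that keeps the speaker and dialect directories by taking each speech path relative to the corpus root:

```python
import os


def mirrored_output_path(speech_path, corpus_root, output_root):
    """Map a speech file to an output path that keeps its sub-directories.

    Hypothetical helper: the directory structure below `corpus_root`
    is reproduced below `output_root`, so equal file names from
    different speakers no longer collide.
    """
    relative = os.path.relpath(speech_path, corpus_root)
    return os.path.join(output_root, relative)


corpus_root = "/run/media/bilal/Data/datasets/DataSet/TIMIT"
output_root = (
    "/run/media/bilal/fb8b3d1d-9bbf-42d6-b741-ad7e4940ac3e/"
    "noise_dataset/NoisyTIMIT"
)
speech_files = [
    corpus_root + "/TRAIN/DR6/MKES0/SA1.wav",
    corpus_root + "/TRAIN/DR7/MTMN0/SA1.wav",
]

outputs = [mirrored_output_path(p, corpus_root, output_root) for p in speech_files]
# Before writing each file, its directory would need to exist, for example:
# os.makedirs(os.path.dirname(out), exist_ok=True)
```

With the two example files, the resulting paths end in `TRAIN/DR6/MKES0/SA1.wav` and `TRAIN/DR7/MTMN0/SA1.wav`, so both are kept.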
