Problem with corrupting the clean corpus with noise types: car traffic, crowd, machine #4
Comments
It looks like your speech dataset is empty, which might mean there is a bug in |
@jfsantos thank you for your quick reply. $ ls timit/ |
This is not a bug, it's the expected behaviour. You requested to have 5
files per condition but your corpus only has two files. Try either adding
three more files or changing `files_per_condition=2`, it should work then.
--
João Felipe Santos
On Sun, 23 Jun 2019 at 12:03, Bilal Dendani ***@***.***> wrote:
@jfsantos <https://github.com/jfsantos> thank you for your quick reply.
The output of my timit clean folder is
gw1 krPro
$ ls timit/
sa1.wav sa2.wav
I just took two clean WAV files from the TIMIT corpus as a test.
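With only two files in the corpus, asking for five per condition cannot work when sampling without replacement. A minimal sketch of that constraint, using the standard library's `random.sample` as a stand-in for the `np.random.choice(..., replace=False)` call shown in the traceback (both refuse to draw more items than the population holds). The helper `pick_files` is hypothetical and not part of maracas; capping the request at the corpus size is one possible workaround, not the library's behaviour.

```python
import random


def pick_files(speech_files, files_per_condition):
    """Draw files without replacement, capped at the corpus size (hypothetical helper)."""
    if not speech_files:
        raise ValueError("speech corpus is empty")
    n = min(files_per_condition, len(speech_files))
    return random.sample(speech_files, n)


corpus = ["timit/sa1.wav", "timit/sa2.wav"]

try:
    # Same situation as files_per_condition=5 with a two-file corpus.
    random.sample(corpus, 5)
except ValueError as err:
    print("sampling failed:", err)

# Capped request: returns both files instead of raising.
print(sorted(pick_files(corpus, 5)))
```

Checking `len(speech_files)` before calling `generate_dataset` gives the same information without touching the library.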
I changed the parameter files_per_condition = 2 and it shows the following error. |
The .WAV files originally distributed with TIMIT are not actually WAV files; they are in the
NIST SPHERE format. See this Stack Overflow question for more details:
https://stackoverflow.com/questions/44748258/reading-a-wav-file-from-timit-database-in-python
--
João Felipe Santos
On Sun, 23 Jun 2019 at 12:28, Bilal Dendani ***@***.***> wrote:
I changed the parameter files_per_condition = 2 and it shows the following
error.
$ python corruptCleanDC.py
/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/numba/decorators.py:29: NumbaDeprecationWarning: autojit is deprecated, use jit instead, which provides the same functionality. For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-numba-autojit
warnings.warn(NumbaDeprecationWarning(msg))
Condition folder already exists!
-6dB: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
File "corruptCleanDC.py", line 25, in <module>
d.generate_dataset([-6, -3, 0, 3, 6], '/home/bilal/krProjects/noise_dataset', files_per_condition=2)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 130, in generate_dataset
files_per_condition=files_per_condition)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 105, in generate_condition
x, fs = wavread(f)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/utils.py", line 9, in wavread
fs, x = scipy.io.wavfile.read(filename)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/scipy/io/wavfile.py", line 236, in read
file_size, is_big_endian = _read_riff_chunk(fid)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/scipy/io/wavfile.py", line 168, in _read_riff_chunk
"understood.".format(repr(str1)))
ValueError: File format b'NIST'... not understood.
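The `b'NIST'` in the last line of the traceback is the start of a SPHERE header, where a RIFF WAV file would begin with `b'RIFF'`. As a rough sketch of what a conversion involves, the code below parses the text header and copies the samples into a real WAV file using only the standard library. It assumes uncompressed 16-bit little-endian PCM and rejects anything else; the function names are hypothetical, and the demo runs on a synthetic file rather than real TIMIT data.

```python
import os
import struct
import tempfile
import wave


def parse_sphere_header(raw):
    """Return (header_size, fields) for the bytes of a NIST SPHERE file."""
    if raw[:7] != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    # The second header line holds the header size in bytes.
    header_size = int(raw[8:16].decode("ascii").strip())
    fields = {}
    for line in raw[16:header_size].decode("ascii", errors="replace").split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type tag (e.g. -i, -s2), value
        if len(parts) == 3:
            fields[parts[0]] = parts[2]
    return header_size, fields


def sphere_to_wav(src, dst):
    """Copy uncompressed 16-bit little-endian SPHERE audio into a RIFF WAV file."""
    with open(src, "rb") as f:
        raw = f.read()
    header_size, fields = parse_sphere_header(raw)
    if (fields.get("sample_coding", "pcm") != "pcm"
            or fields.get("sample_n_bytes") != "2"
            or fields.get("sample_byte_format") != "01"):
        raise NotImplementedError("only 16-bit little-endian PCM is handled here")
    with wave.open(dst, "wb") as out:
        out.setnchannels(int(fields.get("channel_count", "1")))
        out.setsampwidth(2)
        out.setframerate(int(fields["sample_rate"]))
        out.writeframes(raw[header_size:])


# Demo on a synthetic SPHERE file, so no TIMIT data is needed to try it.
samples = struct.pack("<4h", 0, 1000, -1000, 32767)
header = (b"NIST_1A\n   1024\n"
          b"channel_count -i 1\nsample_rate -i 16000\n"
          b"sample_n_bytes -i 2\nsample_byte_format -s2 01\nend_head\n")
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "sa1.WAV")
dst = os.path.join(tmpdir, "sa1_riff.wav")
with open(src, "wb") as f:
    f.write(header.ljust(1024, b" ") + samples)

sphere_to_wav(src, dst)
with wave.open(dst, "rb") as w:
    rate, frames = w.getframerate(), w.readframes(w.getnframes())
print(rate, len(frames) // 2)
```

Files written this way start with `RIFF`, which is what `scipy.io.wavfile.read` expects. The Stack Overflow thread linked above covers other ways of doing the conversion.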
Thank you so much @jfsantos. |
I converted the TIMIT '.WAV' files from the NIST format to standard '.wav' files, and that part now works. I have another issue: I want to generate noise-corrupted files for speech files that have the same name (many speakers pronounce the same sentence, so the file name is the same).
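This question is not answered in the thread. One possible workaround, sketched under the assumption that output names are derived from the input file's base name (not confirmed here for maracas), is to give each utterance a unique name built from its path inside the corpus before corrupting it. The helper `flat_name` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def flat_name(wav_path, corpus_root):
    """Join the path components below corpus_root into one unique file name."""
    rel = PurePosixPath(wav_path).relative_to(corpus_root)
    return "_".join(rel.parts)


# Two speakers reading the same sentence no longer collide.
print(flat_name("timit/train/dr1/fcjf0/sa1.wav", "timit"))
print(flat_name("timit/train/dr1/fdaw0/sa1.wav", "timit"))
```

Copying or symlinking the files into a flat folder under these names keeps the speaker identity in the file name, so the clean and corrupted versions can still be matched afterwards.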
Hello,
I am trying to corrupt the clean TIMIT dataset using the maracas library with the following code:
I got the following error when executing the code:
bilal@myhost$ python corruptCleanDC.py
/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/numba/decorators.py:29: NumbaDeprecationWarning: autojit is deprecated, use jit instead, which provides the same functionality. For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-numba-autojit
warnings.warn(NumbaDeprecationWarning(msg))
Traceback (most recent call last):
File "corruptCleanDC.py", line 25, in <module>
d.generate_dataset([-6, -3, 0, 3, 6], '/home/bilal/krProjects/noise_dataset', files_per_condition=5)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 130, in generate_dataset
files_per_condition=files_per_condition)
File "/home/bilal/krProjects/DAE/DAE_venv/lib64/python3.6/site-packages/maracas/dataset.py", line 100, in generate_condition
speech_files = np.random.choice(self.speech, files_per_condition, replace=False).tolist()
File "mtrand.pyx", line 1168, in mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'