issue with short audios #2

zmahoor · 2018-03-04T13:06:46Z

The program crashes for short audio clips (less than 4 seconds). Any thoughts on what could be wrong? Are there any requirements on the input length?
W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at conv_ops.cc:384 : Invalid argument: computed output size would be negative [[Node: conv1d_8/convolution/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv1d_8/convolution/ExpandDims, conv1d_8/convolution/ExpandDims_1)]]

pseeth · 2018-03-04T20:02:49Z

Yeah, that's a limitation of the original paper because of the convolution receptive fields: the stacked convolutions need a minimum number of input samples. You could probably fix it by padding your input.
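To make the receptive-field limitation concrete, here is a minimal sketch of how the time axis shrinks through a stack of unpadded ("VALID") layers. The layer list below is made up for illustration and is an assumption, not the real SoundNet configuration; the helper names are hypothetical too. Once a length reaches zero or below, the clip is too short for that stack.

```python
def valid_output_length(input_length, kernel_size, stride):
    """Output length of a 1-D conv or pooling layer with VALID (no) padding."""
    return (input_length - kernel_size) // stride + 1


def trace_lengths(input_length, layers):
    """Return the time-axis length after each layer, stopping once it is <= 0."""
    lengths = [input_length]
    for kernel_size, stride in layers:
        next_length = valid_output_length(lengths[-1], kernel_size, stride)
        lengths.append(next_length)
        if next_length <= 0:
            # The input no longer covers one kernel window: the clip is too short.
            break
    return lengths


# Hypothetical (kernel_size, stride) pairs, NOT the actual SoundNet layers.
EXAMPLE_LAYERS = [(64, 2), (8, 8), (32, 2), (8, 8)]

if __name__ == "__main__":
    for n_samples in (1000, 542, 500):
        print(n_samples, trace_lengths(n_samples, EXAMPLE_LAYERS))
```

Plugging in the real kernel sizes and strides of the model you are running would give the actual minimum number of samples it accepts.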

zmahoor · 2018-03-04T21:56:16Z

I found a TensorFlow version of SoundNet that still works with short audio clips. I am sure that code does not pad the audio input, because I commented out the part that does the padding! The only difference I found is that your code uses conv1D and the other one uses conv2D. Not sure if it matters.

janaal1 · 2018-10-23T11:37:29Z

Hi! I found a TensorFlow version, but it does not seem to work with short audio clips (https://github.com/eborboihuc/SoundNet-tensorflow)... Can you post the link to the one that works with short audio clips?
Thanks in advance

kristosh · 2018-10-31T14:22:40Z

I am facing a similar issue with short videos (less than 2 seconds). Is it a good idea to zero-pad the audio clip in order to extract the features? Or is there another version that I can use for extracting the features?
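For reference, zero padding a short clip could look like the sketch below. It assumes the waveform is already loaded as a 1-D NumPy array; `pad_to_min_length` and `min_length` are hypothetical names, and the right minimum depends on the model and sample rate, which this thread does not state.

```python
import numpy as np


def pad_to_min_length(waveform, min_length):
    """Append zeros (silence) so a 1-D waveform has at least min_length samples."""
    waveform = np.asarray(waveform, dtype=np.float32)
    deficit = min_length - waveform.shape[0]
    if deficit <= 0:
        # Already long enough: return unchanged.
        return waveform
    # Pad only at the end of the clip; the original samples are untouched.
    return np.pad(waveform, (0, deficit), mode="constant")
```

Whether features extracted from mostly silent input are still useful is a separate question that would need checking on real data.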

janaal1 · 2018-10-31T14:27:01Z

I did not add zeros in my solution. I just changed the window parameters from the repo I linked before. I do not know the scores because I have not run the final application yet.

I cannot give you a final answer yet. What do you think about my approach?

kristosh · 2018-10-31T14:30:18Z

Then why not change the sample_rate when loading the audio clips? Could that be a solution too?

janaal1 · 2018-10-31T14:33:43Z

I also changed the sample_rate. Sorry, I forgot to mention it.
I changed both parameters, window_size and sample_rate, in order to extract features from my audio files.

kristosh · 2018-10-31T14:39:56Z

But why not from the Keras version? I am trying to read my wav files, and the output shape after prediction (I made the sample rate 3 times bigger) is (1, 0, 401). Any idea why that is happening? Have you come across a similar issue?

janaal1 · 2018-10-31T14:47:57Z

I found the Keras version to be less stable... Can you paste your code so I can see what you have modified, please?

kristosh · 2018-10-31T14:50:36Z

I modified line 29 and the sample rate (to be 3*sample_rate). I have a feeling, though, that my issue is in the raw file itself. When I use the sample file, it works properly. When I use my own file, which is short (around 2 sec), I get the error you mentioned. If I instead change the sample rate (or, for example, concatenate the input audio vector three times), the result is a vector with shape (1, 0, 401).

janaal1 · 2018-10-31T14:54:37Z

That is very weird. Let's keep in touch about this issue. I think next week I will be able to work on the Keras GitHub repo to check whether I can solve the problem in the same way as in TensorFlow.

kristosh · 2018-10-31T14:58:56Z

So you recommend checking the TensorFlow version using sound8.npy with my file, right? In that case, do I need to change the way I am reading the wav files (is that version dedicated to mp3)?

janaal1 · 2018-10-31T15:12:32Z

Yes, that is right. I could extract features from both formats: mp3 and wav.

kristosh · 2018-10-31T16:20:26Z

Using the TensorFlow code, I have another issue: when I add my files to the text file, load_from_txt complains and returns the following error: *** TypeError: 'float' object cannot be interpreted as an integer

sudonto · 2018-11-13T05:50:43Z

@kristosh you could change this line in util.py from
raw_audio = np.tile(raw_audio, length/raw_audio.shape[0] + 1)
to
raw_audio = np.tile(raw_audio, length//raw_audio.shape[0] + 1)
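In Python 3, `/` always returns a float, while `//` keeps the repeat count an integer, which is what `np.tile` expects. A small self-contained sketch of the same idea follows; `tile_to_length` is a hypothetical helper, and the final truncation to exactly `length` samples is an addition for illustration, not something taken from util.py.

```python
import numpy as np


def tile_to_length(raw_audio, length):
    """Repeat a short 1-D clip until it covers `length` samples, then truncate."""
    raw_audio = np.asarray(raw_audio)
    # Floor division keeps `repeats` an int; `/` would produce a float in Python 3.
    repeats = length // raw_audio.shape[0] + 1
    tiled = np.tile(raw_audio, repeats)
    # Truncation is an assumption added here so the result has exactly `length` samples.
    return tiled[:length]
```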
