issue with short audios #2

Open
zmahoor opened this issue Mar 4, 2018 · 15 comments

Comments

@zmahoor

zmahoor commented Mar 4, 2018

The program crashes on short audio clips (less than 4 seconds). Any thoughts on what could be wrong? Are there any requirements on the input length?
W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at conv_ops.cc:384 : Invalid argument: computed output size would be negative [[Node: conv1d_8/convolution/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv1d_8/convolution/ExpandDims, conv1d_8/convolution/ExpandDims_1)]]
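
For readers hitting the same error, here is a rough sketch of why a VALID-padded, strided convolution stack runs out of samples. The stride of 2 and the VALID padding come from the log above; the kernel sizes and the number of layers are hypothetical placeholders, not SoundNet's actual configuration, and pooling layers are ignored.

```python
def valid_conv_out_len(n, kernel, stride):
    """Output length of a 1-D convolution with padding='VALID'."""
    return (n - kernel) // stride + 1


def trace_lengths(n, layers):
    """Return the length after each layer, stopping at the first non-positive one."""
    lengths = []
    for kernel, stride in layers:
        n = valid_conv_out_len(n, kernel, stride)
        lengths.append(n)
        if n <= 0:
            break  # this is where TensorFlow would report a negative output size
    return lengths


# Hypothetical (kernel, stride) pairs, for illustration only.
LAYERS = [(64, 2), (32, 2), (16, 2), (8, 2)]

print(trace_lengths(1000, LAYERS))  # long enough input
print(trace_lengths(200, LAYERS))   # too short: last entry is not positive
```

Each layer shrinks the sequence, so an input below some minimum length reaches a layer whose kernel is wider than what is left.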

@pseeth
Owner

pseeth commented Mar 4, 2018

Yeah, that's a limitation of the original paper, caused by the convolution receptive fields. You could work around it by padding your input.
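
A minimal sketch of the padding idea, assuming the audio is already loaded as a 1-D NumPy array. The helper name `pad_to_min_length` and the `min_samples` value are hypothetical; the minimum length the model actually needs is not stated in this thread and would have to be found by experiment.

```python
import numpy as np


def pad_to_min_length(audio, min_samples):
    """Append zeros so that a 1-D signal has at least `min_samples` samples."""
    shortfall = min_samples - audio.shape[0]
    if shortfall <= 0:
        return audio  # already long enough, leave untouched
    # Pad only at the end of the signal with zeros (silence).
    return np.pad(audio, (0, shortfall), mode="constant")


clip = np.ones(10, dtype=np.float32)
padded = pad_to_min_length(clip, 16)  # 16 is an arbitrary illustrative value
print(padded.shape)
```

Zero padding keeps the original samples intact, but the features computed over the padded tail describe silence, so whether this is acceptable depends on the downstream task.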

@zmahoor
Author

zmahoor commented Mar 4, 2018

I found a TensorFlow version of SoundNet that still works with short audio clips. I am sure that code does not pad the audio input, because I commented out the part that did the padding! The only difference I found is that your code uses conv1D and the other one uses conv2D. Not sure if it matters.

@janaal1

janaal1 commented Oct 23, 2018

Hi! I found a TensorFlow version, but it does not seem to work with short audio clips (https://github.com/eborboihuc/SoundNet-tensorflow). Can you post the link to the one that does?
Thanks in advance

@kristosh

I am facing a similar issue with short videos (less than 2 seconds). Is it a good idea to zero-pad the audio clip in order to extract the features? Or is there another version that I can use for extracting the features?

@janaal1

janaal1 commented Oct 31, 2018

I did not add zeros in my solution. I just changed the window parameters from the repo I linked before. I do not know the scores because I have not run the final application yet.

I cannot give you a final answer yet. What do you think about my approach?

@kristosh

Then why not change the sample_rate when loading the audio clips? Could that be a solution too?

@janaal1

janaal1 commented Oct 31, 2018

I also changed the sample_rate. Sorry, I forgot to mention it.
I changed both parameters, window_size and sample_rate, in order to extract features from my audio files.

@kristosh

But why not use the Keras version? I am trying to read my wav files, and the size returned after prediction (with the sample rate made 3 times bigger) is (1, 0, 401). Any idea why that is happening? Have you come across a similar issue?

@janaal1

janaal1 commented Oct 31, 2018

I found the Keras version to be less stable. Can you paste your code so I can see what you have modified, please?

@kristosh

kristosh commented Oct 31, 2018

I modified line 29 and the sample rate (to be 3*sample_rate). I have a feeling, though, that my issue is in the raw file itself. When I use the sample file, it works properly. When I use my own file, which is short (around 2 sec), I get the error you mentioned. If I instead change the sample rate (or, for example, concatenate the input audio vector three times), the result is a vector of size (1, 0, 401).

@janaal1

janaal1 commented Oct 31, 2018

That is very weird. Let's keep in touch on this issue. I think next week I will be able to work on the Keras GitHub repo to check whether I can solve the problem in the same way as in TensorFlow.

@kristosh

So you recommend trying the TensorFlow version, using sound8.npy, with my file, right? In that case, do I need to change the way I am reading the wav files (is that version dedicated to mp3)?

@janaal1

janaal1 commented Oct 31, 2018

Yes, that is right. I could extract features from both formats: mp3 and wav.

@kristosh

With the TensorFlow code I have another issue: when I add my files to the text file, load_from_txt complains and returns the following error: *** TypeError: 'float' object cannot be interpreted as an integer

@sudonto

sudonto commented Nov 13, 2018

@kristosh you could change this line in util.py from
raw_audio = np.tile(raw_audio, length/raw_audio.shape[0] + 1)
to
raw_audio = np.tile(raw_audio, length//raw_audio.shape[0] + 1)
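
The reason this helps: under Python 3, `/` always returns a float, while `//` performs integer division, and `np.tile` needs an integer repetition count. A small self-contained sketch of the corrected pattern follows; the helper name `tile_to_length` and the final trimming step are illustrative additions, not part of util.py.

```python
import numpy as np


def tile_to_length(raw_audio, length):
    """Repeat a 1-D signal until it covers `length` samples, then trim."""
    # `//` gives an int in Python 3; `/` would give a float here, which is
    # what triggers the TypeError reported above.
    reps = length // raw_audio.shape[0] + 1
    return np.tile(raw_audio, reps)[:length]


clip = np.arange(5, dtype=np.float32)
tiled = tile_to_length(clip, 12)
print(tiled.shape)
```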
