issue with short audios #2
Comments
Yeah, that's a limitation of the original paper because of the convolution receptive fields. You could fix it by padding your input?
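As a rough illustration of the padding suggestion above, here is a minimal sketch that right-pads a short waveform with zeros before it is passed to the network. The helper name `pad_to_min_length`, the 22050 Hz sample rate, and the 5-second target are assumptions chosen for the example; they are not taken from the repository, and the minimum length the model really needs would have to be checked against its layers.

```python
import numpy as np


def pad_to_min_length(waveform, min_samples):
    """Right-pad a 1-D waveform with zeros to at least min_samples samples.

    Hypothetical helper, not part of the repository discussed here.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    missing = min_samples - waveform.shape[0]
    if missing <= 0:
        return waveform  # already long enough, return unchanged
    # Append `missing` zeros (silence) at the end of the clip.
    return np.pad(waveform, (0, missing), mode="constant")


# Assumed values for illustration only: a 2 s clip at 22050 Hz, padded to 5 s.
sample_rate = 22050
clip = np.random.uniform(-1.0, 1.0, 2 * sample_rate).astype(np.float32)
padded = pad_to_min_length(clip, 5 * sample_rate)
print(clip.shape, padded.shape)
```

Zero padding adds silence, so features from the padded region may differ from features of real audio; whether that matters for the downstream task is something to verify empirically.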
I found a TensorFlow version of SoundNet that still works with short audio clips. I am sure the code does not pad the audio input, because I commented out the part that did the padding! The only difference I found is that your code uses conv1D and the other one uses conv2D. Not sure if it matters.
Hi! I have found a TensorFlow version, but it seems it does not work with short audio clips (https://github.com/eborboihuc/SoundNet-tensorflow)... Can you post the link to the one that works with short audio clips?
I am facing a similar issue with short videos (less than 2 seconds). So is it a good idea to zero-pad the audio clip in order to extract the features? Or is there any other version that I can use for extracting the features?
I did not add zeros in my solution. I just changed the window parameters in the repo I linked before. I do not know the scores because I have not run the final application yet, so I cannot give you a final answer. What do you think about my approach?
Then why not change the sample_rate when loading the audio clips? Is there a chance that this could be a solution too?
I also changed the sample_rate. Sorry, I forgot to mention it.
But why not from the Keras version? I am trying to read my wav files, and the size returned after prediction (I made the sample rate 3 times bigger) is (1, 0, 401)?? Any idea why that is happening? Have you come across a similar issue?
I found the Keras version to be less stable... Can you paste your code so I can see what you have modified, please?
I modified line 29 and the sample rate (to be 3*sample_rate). I have a feeling, though, that my issue is in the raw file itself. When I use the sample file, it works properly. When I use my own file, which is short (around 2 sec), I get the error you mentioned; otherwise, if I change the sample rate (or, for example, concatenate the input audio vector three times), the result is a vector of size (1, 0, 401).
That is very weird. Let's keep in touch about this issue. I think next week I will be able to work on the Keras repository to check whether I can solve the problem in the same way as in TensorFlow.
So you recommend checking the TensorFlow version using sound8.npy with my file, right? In that case, do I need to change the way I am reading the wav files (is that version meant for mp3)?
Yes, that is right. I could extract features from both formats: mp3 and wav.
Using the TensorFlow code I have another issue: when I add my files to the text file, load_from_txt complains and returns the following error: *** TypeError: 'float' object cannot be interpreted as an integer
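For context on that error message: in Python 3 it commonly appears when the result of a `/` division is used where an integer is required (for example in `range()` or as an array shape), since `/` always returns a float. Whether this is what happens inside load_from_txt is an assumption; the sketch below only reproduces the general pattern with a hypothetical `frame_count` helper and made-up numbers.

```python
def frame_count(num_samples, frame_length):
    """Return the number of frames computed with `/` and with `//`.

    Hypothetical helper used only to illustrate the error pattern.
    """
    # In Python 3, `/` returns a float even when the division is exact.
    as_float = num_samples / frame_length
    # `//` (floor division) keeps the result an int for int operands.
    as_int = num_samples // frame_length
    return as_float, as_int


as_float, as_int = frame_count(44100, 22050)

try:
    range(as_float)  # a float is not accepted where an int is required
except TypeError as err:
    print("TypeError:", err)

print(list(range(as_int)))  # works: the count is an int
```

If the code in question was written for Python 2, replacing `/` with `//` or wrapping the value in `int()` at the point where an integer is needed is the usual remedy.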
@kristosh you could change this syntax in |
The program crashes for short audio clips (less than 4 seconds). Any thoughts on what could be wrong? Are there any requirements on the input length?
W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at conv_ops.cc:384 : Invalid argument: computed output size would be negative [[Node: conv1d_8/convolution/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](conv1d_8/convolution/ExpandDims, conv1d_8/convolution/ExpandDims_1)]]
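The log above comes from a convolution with padding="VALID", where each layer shortens the time axis. For an input of length n, kernel size k and stride s (with n >= k), the output length is floor((n - k) / s) + 1, so a clip that is too short eventually reaches a layer whose kernel is longer than its input. The sketch below traces lengths through a stack of layers to find where that happens. The layer list is hypothetical and is not SoundNet's real configuration; substitute the kernel sizes and strides of the model you are running.

```python
def valid_output_length(n, kernel, stride):
    """Output length of a 1-D conv or pooling layer with VALID (no) padding.

    Only meaningful when n >= kernel.
    """
    return (n - kernel) // stride + 1


def trace_lengths(n, layers):
    """Return (layer_name, output_length) pairs, stopping with None at the
    first layer whose kernel is longer than its input."""
    lengths = []
    for name, kernel, stride in layers:
        if n < kernel:
            lengths.append((name, None))  # input too short for this layer
            break
        n = valid_output_length(n, kernel, stride)
        lengths.append((name, n))
    return lengths


# Hypothetical (name, kernel, stride) stack, NOT the actual SoundNet layers.
LAYERS = [
    ("conv_a", 64, 2),
    ("pool_a", 8, 8),
    ("conv_b", 32, 2),
    ("pool_b", 8, 8),
    ("conv_c", 16, 2),
    ("conv_d", 8, 2),
]

for num_samples in (44100, 2000):
    print(num_samples, trace_lengths(num_samples, LAYERS))
```

Running the trace with the real layer parameters gives the smallest input length that survives every layer, which is the length a short clip would need to be padded or extended to.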