It seems very strange to me that the methods work super well for conventional STFT spectrograms but not so well for other representations. One reason representations might matter for bioacoustic applications is the high sampling rates required. For instance, many dolphin recordings are sampled at 96 kHz. While it might be possible to downsample somewhat and still avoid aliasing, this would still leave an audio input with ~300k elements. However, we could also try slicing audio inputs into more manageable frame lengths and then concatenating the results during inference. I suppose this is related to the "Variable Time Scales in Vocal Behavior" problem to some degree.
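A minimal sketch of that slicing idea, assuming a hypothetical `model` callable that maps a fixed-length waveform to a `(time, features)` array (the frame length, hop, and downsampling factor below are illustrative, not from the original discussion):

```python
import numpy as np

def infer_in_frames(audio, model, frame_len=48_000, hop=48_000):
    """Slice a long waveform into fixed-length frames, run `model` on
    each frame, and concatenate the per-frame outputs along time.

    `model` is a hypothetical callable: (frame_len,) -> (T, D) array.
    """
    outputs = []
    for start in range(0, len(audio), hop):
        frame = audio[start:start + frame_len]
        if len(frame) < frame_len:
            # zero-pad the final partial frame to the expected length
            frame = np.pad(frame, (0, frame_len - len(frame)))
        outputs.append(model(frame))
    return np.concatenate(outputs, axis=0)

# e.g. a 96 kHz dolphin recording, downsampled 2x first:
#   audio = scipy.signal.resample_poly(raw, up=1, down=2)  # now 48 kHz
#   preds = infer_in_frames(audio, model, frame_len=48_000)  # 1 s frames
```

Non-overlapping frames keep the concatenation trivial; an overlapping hop with cross-fading at the seams would reduce boundary artifacts at the cost of a bit more bookkeeping.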