Reading file may hang if offset is greater than file duration #88

Open
frankenjoe opened this issue Jul 11, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@frankenjoe
Collaborator

frankenjoe commented Jul 11, 2022

I have an MP3-encoded stereo file with the following duration:

>>> audiofile.duration(path)
3.996734693877551

Reading in the full file works:

>>> audiofile.read(path)
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]] 44100

Also reading with an offset of 3 seconds works:

>>> audiofile.read(path, offset=3.0)
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]] 44100

But when I try to read with an offset greater than the file duration, it hangs instead of returning an empty array.

With a WAV file the same call works, so the problem seems to be related to reading encoded files with sox:

frankenjoe changed the title "Reading file hangs if offset is greater file duraiton" to "Reading file may hang if offset is greater file duraiton" on Jul 11, 2022
frankenjoe added the bug (Something isn't working) label on Jul 11, 2022
hagenw changed the title "Reading file may hang if offset is greater file duraiton" to "Reading file may hang if offset is greater than file duration" on Dec 22, 2022
@hagenw
Member

hagenw commented Jan 27, 2023

Unfortunately, the bug seems to come from sox itself.

One easy solution would be to change the following lines inside audiofile.read()

            convert(file, tmpfile, offset, duration)
            signal, sampling_rate = soundfile.read(
                tmpfile,
                dtype=dtype,
                always_2d=always_2d,
                **kwargs,
            )

to

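            convert(file, tmpfile)
            # soundfile.read() has no offset/duration arguments,
            # so translate them into frame indices first
            samplerate = soundfile.info(tmpfile).samplerate
            start = int(round(offset * samplerate))
            stop = None
            if duration is not None:
                stop = start + int(round(duration * samplerate))
            # soundfile clamps start/stop to the file length, so an
            # out-of-bounds offset returns an empty signal
            signal, sampling_rate = soundfile.read(
                tmpfile,
                start=start,
                stop=stop,
                dtype=dtype,
                always_2d=always_2d,
                **kwargs,
            )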

But this has the disadvantage of having to convert the whole file even if only a very short segment of it is requested.
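To illustrate the convert-first idea in isolation: once the whole file exists as WAV, offset and duration can be turned into frame indices and clamped to the number of frames, so an out-of-bounds offset simply yields an empty segment. This is a minimal sketch that uses Python's standard wave module as a stand-in for soundfile; read_segment and make_wav are hypothetical helpers, not part of audiofile.

```python
import io
import wave


def read_segment(wav_bytes, offset=0.0, duration=None):
    """Return raw PCM frames between offset and offset + duration (seconds).

    Hypothetical helper: requests beyond the end of the file return b""
    instead of failing or hanging.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as f:
        rate = f.getframerate()
        n_frames = f.getnframes()
        # Translate seconds into frame indices and clamp them to the file length
        start = min(max(int(round(offset * rate)), 0), n_frames)
        if duration is None:
            stop = n_frames
        else:
            stop = min(start + int(round(duration * rate)), n_frames)
        if start >= stop:
            return b""  # offset is out of bounds: empty segment
        f.setpos(start)
        return f.readframes(stop - start)


def make_wav(n_frames, rate=8000):
    """Create an in-memory mono 16-bit WAV file containing n_frames of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


# Usage: a 1 second file at 8 kHz, read from 0.5 s and from beyond its end
wav = make_wav(8000)
print(len(read_segment(wav, offset=0.5)))  # 4000 frames * 2 bytes
print(read_segment(wav, offset=2.0))  # empty
```

The cost mentioned above is visible here: the clamping only works because the complete decoded file, and therefore its exact frame count, is available.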

Alternatively, we could first query the duration of the file and adjust duration accordingly, or directly return an empty array if offset is already out of bounds.
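The bounds-check alternative can be sketched as a small pure function, assuming the file duration is known before calling sox (for example from audiofile.duration(), which for encoded files may only be an estimate). clamp_request is a hypothetical helper: it returns None when nothing needs to be read, so the caller could return an empty array without invoking sox at all.

```python
def clamp_request(offset, duration, file_duration):
    """Clamp a requested (offset, duration) in seconds to the file length.

    Hypothetical helper. Returns None if offset lies at or beyond the end
    of the file; otherwise returns (offset, duration) with duration shortened
    so the segment does not extend past file_duration.
    """
    if offset >= file_duration:
        return None  # nothing to read: caller can return an empty signal
    if duration is None or offset + duration > file_duration:
        duration = file_duration - offset
    return offset, duration


# Usage with the duration reported for the MP3 file in this issue
file_duration = 3.996734693877551
print(clamp_request(5.0, 1.0, file_duration))  # out of bounds
print(clamp_request(1.0, 0.5, file_duration))  # unchanged
```

The weak point is file_duration itself: if it is not exact, a request close to the end can still reach sox with an offset just past the real end of the audio.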

@hagenw
Member

hagenw commented Jan 27, 2023

To determine the exact duration of the file, we would have to convert it completely to WAV anyway, so I guess we can just go with my proposed change and convert the whole file first.

@hagenw
Member

hagenw commented Jan 27, 2023

I added a test for reading an MP3 file with an out-of-bounds offset in #104.

On GitHub all tests pass, but it might still be safer to do the conversion first.
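Because the failure mode is a hang rather than an exception, a regression test for it needs a timeout, otherwise the test run itself blocks. A minimal standard-library sketch: run the read in a fresh interpreter via subprocess.run(..., timeout=...), which kills that child process when the timeout expires. finishes_within is a hypothetical helper, and the file name in the usage comment is a placeholder; this is not the test added in #104.

```python
import subprocess
import sys


def finishes_within(code, timeout):
    """Run a Python snippet in a separate interpreter.

    Returns False if it does not finish within timeout seconds. The child
    interpreter is killed on timeout, so a hanging call cannot block the
    caller. Exceptions in the snippet surface as CalledProcessError.
    """
    try:
        subprocess.run([sys.executable, "-c", code], timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        return False
    return True


# Hypothetical usage for this issue (requires audiofile, sox, and an MP3 file):
# assert finishes_within(
#     "import audiofile; audiofile.read('file.mp3', offset=5.0)", timeout=30
# )
```

Only the direct child interpreter is killed; a sox process it started may keep running in the background, but the test itself returns.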
