pulling segments for a particular time period #3

Open
valentina-s opened this issue Feb 16, 2022 · 3 comments

@valentina-s
Contributor

This issue may be due to my misunderstanding of how to pull data. I am trying to use DateRangeHLSStream to pull .ts files between two timestamps, specifically 2020-01-08 02:06:00 to 2020-01-08 02:07:00 UTC, when we know that humpbacks were heard.
My understanding is that if I set polling_interval=60 and provide the starting and ending timestamps, I should get the corresponding files when running .get_next_clip. The files I am getting are 194.ts to 199.ts, in folder 1578447032 on the orcalab node. The ones I am expecting are 212.ts to 218.ts, computed as (unixtime - 1578447032)/10. The latter do contain humpback calls, while the former do not. Can anyone check my calculations and my understanding of the function? I have some of the operations in this notebook.
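
The expected range above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming that the folder name 1578447032 is the Unix time at which segment 0 starts and that every segment is exactly 10 seconds long; `segment_index` is a hypothetical helper, not part of DateRangeHLSStream.

```python
from datetime import datetime, timezone

FOLDER_START = 1578447032  # folder name, assumed to be the Unix time of segment 0
SEGMENT_SECONDS = 10       # assumed nominal segment length in seconds


def segment_index(ts, folder_start=FOLDER_START, segment_seconds=SEGMENT_SECONDS):
    """Return the index of the segment that contains timestamp `ts` (hypothetical helper)."""
    offset = ts.timestamp() - folder_start  # seconds since the folder started
    return int(offset // segment_seconds)


start = datetime(2020, 1, 8, 2, 6, 0, tzinfo=timezone.utc)
end = datetime(2020, 1, 8, 2, 7, 0, tzinfo=timezone.utc)

first = segment_index(start)  # offset 2128 s -> 212
last = segment_index(end)     # offset 2188 s -> 218
print(f"expected segments: {first}.ts to {last}.ts")
```

Under these assumptions the offsets are 2128 s and 2188 s, which gives 212.ts to 218.ts.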

@Benjamintdk

Benjamintdk commented Feb 26, 2022

@valentina-s I explored the notebook you linked and have some findings on where the issue might be. Following your mention of the .get_next_clip function, I looked into it and found that this line might be the culprit: stream_obj.target_duration is 11 seconds, while we have been assuming all along that it is 10 seconds.

When I isolate that part of the code, I do indeed (erroneously) get 194 as the starting index:
[Screenshot: starting_index]

When I manually substitute 10.0 for stream_obj.target_duration in line 4, the result gets much closer to the desired one (math.ceil is used, which rounds the value up to 213):
[Screenshot: corrected_start]
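
The two starting indices reported here (194 and 213) can be reproduced in isolation. The sketch below assumes the index is computed as `math.ceil(offset / target_duration)`, as described in this comment, where the offset is the 2128 seconds between the folder start (1578447032) and 2020-01-08 02:06:00 UTC. `start_index` is a hypothetical stand-in, not the function from the repository.

```python
import math

# Seconds from folder start 1578447032 to the requested window.
OFFSET_START = 2128  # 2020-01-08 02:06:00 UTC
OFFSET_END = 2188    # 2020-01-08 02:07:00 UTC


def start_index(offset_seconds, target_duration):
    """Hypothetical stand-in for the index computation described in the comment."""
    return math.ceil(offset_seconds / target_duration)


with_playlist_value = start_index(OFFSET_START, 11)  # value read from the m3u8 file
with_nominal_value = start_index(OFFSET_START, 10.0)  # assumed real segment length

print(with_playlist_value)  # 194
print(with_nominal_value)   # 213
```

A one-second error per segment accumulates: after roughly 213 segments the computed index is about 19 segments (around 3 minutes of audio) too early.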

stream_obj.target_duration is populated by loading the m3u8 file with the m3u8 library, so I suspect that the issue stems either from the library or from the m3u8 file itself.
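
One way to tell the two apart is to read the playlist text directly and compare the declared target duration with the per-segment `#EXTINF` durations. In the HLS specification, `EXT-X-TARGETDURATION` is an upper bound on segment duration rather than the actual length, so a value of 11 alongside segments of about 10 seconds would be consistent with a valid playlist. The sketch below uses only the standard library; the playlist excerpt and its values are made up for illustration and are not taken from the orcalab stream.

```python
from statistics import mean

# Hypothetical playlist excerpt; durations and file names are illustrative only.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:11
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.005333,
0.ts
#EXTINF:9.984000,
1.ts
#EXTINF:10.026667,
2.ts
"""


def playlist_durations(text):
    """Return (declared target duration, list of EXTINF durations) from playlist text."""
    target = None
    durations = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            value = line.split(":", 1)[1].split(",", 1)[0]
            durations.append(float(value))
    return target, durations


target, durations = playlist_durations(SAMPLE_PLAYLIST)
mean_duration = mean(durations)
print(f"declared target: {target} s, mean segment: {mean_duration:.3f} s")
```

If the real playlist shows the same pattern, the library is reporting the file faithfully and the index calculation should use the segment durations instead of the target duration.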

@Benjamintdk

Benjamintdk commented Feb 26, 2022

Another thing I noted is that the m3u8 file seems to have fewer segments than what is actually inside the folder. The first image shows the number of segments obtained from loading the m3u8 file, with the last few segment file names shown:
[Screenshot: num_segs]

However, the folder actually has additional files up to 358.ts, as shown below (pipe.txt contains the output from listing the files in the 1578447032 folder):
[Screenshot: num_segs_correct]

Not sure if this might be linked to the issue as well?
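
A mismatch like this can be made explicit by comparing the segment names listed in the playlist with the names found in the folder. The sketch below is a rough check, assuming segments are named `<number>.ts` as in `358.ts`; `compare_segments` and the sample lists are hypothetical and not part of the repository.

```python
def segment_number(name):
    """Extract the integer index from a name such as '358.ts'."""
    return int(name.rsplit(".", 1)[0])


def compare_segments(playlist_names, folder_names):
    """Return (.ts files only in the folder, .ts files only in the playlist), sorted by index."""
    in_playlist = {n for n in playlist_names if n.endswith(".ts")}
    in_folder = {n for n in folder_names if n.endswith(".ts")}
    only_in_folder = sorted(in_folder - in_playlist, key=segment_number)
    only_in_playlist = sorted(in_playlist - in_folder, key=segment_number)
    return only_in_folder, only_in_playlist


# Illustrative inputs: the playlist stops early and the folder holds extra files.
playlist_names = ["0.ts", "1.ts", "2.ts"]
folder_names = ["0.ts", "1.ts", "2.ts", "3.ts", "4.ts", "live.m3u8"]

only_in_folder, only_in_playlist = compare_segments(playlist_names, folder_names)
print("in folder but not in playlist:", only_in_folder)
print("in playlist but not in folder:", only_in_playlist)
```

For the real data, `folder_names` could be read from the lines of pipe.txt and `playlist_names` from the segment entries of the loaded m3u8 file.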

@valentina-s
Contributor Author

Great find @Benjamintdk! I checked the m3u8 files: their target duration is 11, while the file lengths are close to 10. I am not sure why this is happening, but for now I changed that value to the average of the file lengths, and it seems to pull what we expect: #5. The issue with some missing and extra files is something @Molkree has observed too, and there should be a separate check to make sure those are handled properly. We need some tests.
