Move towards human-readable timestamps in audio filenames and/or directory names #7
It would be even better, as Paul pointed out on Slack recently, to get rid of the datetime-stamped S3 objects (akin to directories) and just store all data under a node name, with each data filename incorporating a NIST-synchronized timestamp. We could get HLS segments to match the filename format of the FLAC files, which in the archive-orcasound-net bucket currently look something like:

Or we could align with ONC or OOI filename formats:

OOI:
I have this working now, based on Paul's suggestion of using "-strftime 1" and modifying stream.sh (for research) to write segments named "/tmp/$NODE_NAME/hls/$timestamp/%Y-%m-%d_%H-%M-%S.ts".
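A minimal sketch of the ffmpeg side, assuming the HLS muxer's `-strftime 1` and `-hls_segment_filename` options. The input argument, output directory, AAC codec choice, and 10-second segment length are placeholders, not what stream.sh actually uses. ffmpeg expands the strftime pattern in local time, so the sketch sets `TZ=UTC` to keep names in UTC. The helper `segment_name` is hypothetical and only previews, with GNU `date`, what a segment name looks like for a given Unix time.

```shell
#!/usr/bin/env bash
# Sketch only: input, paths, codec, and hls_time are placeholders.

# Filename pattern from the comment above; ffmpeg expands it with strftime().
SEGMENT_PATTERN='%Y-%m-%d_%H-%M-%S.ts'

# Start an HLS segmenter whose segment names are UTC wall-clock times.
# ffmpeg's strftime expansion uses local time, hence TZ=UTC.
start_hls_segmenter() {  # usage: start_hls_segmenter INPUT OUTDIR
  local input=$1 outdir=$2
  mkdir -p -- "$outdir"
  TZ=UTC ffmpeg -i "$input" -c:a aac \
    -f hls -hls_time 10 \
    -strftime 1 \
    -hls_segment_filename "$outdir/$SEGMENT_PATTERN" \
    "$outdir/live.m3u8"
}

# Preview the segment name for a given Unix time (GNU date syntax).
segment_name() {  # usage: segment_name UNIX_TIME
  date -u -d "@$1" "+$SEGMENT_PATTERN"
}
```

For example, `segment_name 1700000100` prints `2023-11-14_22-15-00.ts`.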
I think the more standard format is %Y-%m-%dT%H:%M:%S.ts
@Molkree, do you want to add your comments on the format?
I tried %Y-%m-%dTH:%M:%S.ts instead of %Y-%m-%d_%H-%M-%S.ts and could not get the player to work. I'm not sure whether it's unhappy with the colons or the TH (probably the colons), but ffmpeg does write the files fine. I can look into milliseconds, but I think the Raspberry Pi's clock is probably only accurate to about 10 ms, since it uses NTP to sync time.
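If millisecond resolution is wanted, GNU `date` can print fractional seconds with `%3N` (a GNU extension), and its `-d @SECONDS` input accepts a fractional part. A minimal sketch keeping the colon-free layout; the helper names are hypothetical. If the Pi's NTP-synced clock is only good to about 10 ms, as guessed above, the last digit or two is mostly noise.

```shell
#!/usr/bin/env bash
# Sketch: colon-free UTC timestamps with milliseconds (GNU date only).

# Format a Unix time (optionally with a fractional part) to milliseconds.
ms_stamp() {  # usage: ms_stamp UNIX_TIME
  date -u -d "@$1" +%Y-%m-%d_%H-%M-%S.%3N
}

# Current UTC time in the same layout, e.g. when a new file is opened.
now_ms_stamp() {
  date -u +%Y-%m-%d_%H-%M-%S.%3N
}
```

For example, `ms_stamp 1700000100.123` prints `2023-11-14_22-15-00.123`.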
ISO 8601 is a good idea; the full form, with time zone, is:

The time zone could be easier to translate with:
Haha, actually you can't even upload such files with actions/upload-artifact in GitHub workflows (actions/upload-artifact#35). I used colons at first but then changed to this:

I hadn't thought about the time zone; I believe I just used UTC everywhere. If you do add it to the filename, I'd also prefer:
I had not thought about the colons. The OOI archive filenames use them, but I guess they cause issues for some users.
Right now we use Unix time, so I'd prefer to stay with UTC. In ISO 8601, omitting the time zone implies local time, so a fully compliant ISO 8601 UTC time without colons would look like:

Personally, I don't care that much about strict standard adherence in this case and would prefer something more readable, but still in UTC.
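For comparison, a sketch of the UTC variants discussed above, rendered from one Unix time with GNU `date`: ISO 8601 extended format (with colons, which upload-artifact rejects and which Windows filenames cannot contain), ISO 8601 basic format (no separators, still standard), and a more readable dash-separated layout with an explicit `Z` (not strict ISO 8601). The function name `utc_variants` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: three UTC renderings of the same Unix time (GNU date syntax).
utc_variants() {  # usage: utc_variants UNIX_TIME
  local t=$1
  # ISO 8601 extended format; the colons are what break some tools.
  date -u -d "@$t" +%Y-%m-%dT%H:%M:%SZ
  # ISO 8601 basic format: no separators, still standard-compliant.
  date -u -d "@$t" +%Y%m%dT%H%M%SZ
  # Readable, colon-free, explicitly UTC (not strict ISO 8601).
  date -u -d "@$t" +%Y-%m-%dT%H-%M-%SZ
}
```

For `utc_variants 1700000100` this prints `2023-11-14T22:15:00Z`, `20231114T221500Z`, and `2023-11-14T22-15-00Z`.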
@tsuize @veirs, this is the HLS timestamp issue I was looking for on today's call. I think we should tackle this formatting decision this winter, adjust the
After looking a bit at MBARI's Pacific Sound open data registry, it seems they use something like this:
John Ryan confirms via Slack that this relies on the convention that scientific timestamps are assumed to be in the UTC time zone. Personally, I find the ambiguity unnerving enough that I think it's worth resolving with the extra 3 characters:

So, I'd propose one of the following options:
Or we could just use Modified Julian Date (MJD) for the filenames and rely on existing packages to decode them into human-readable formats if and when necessary. Opinions?
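If MJD were used, converting from Unix time is a fixed offset: the Unix epoch (1970-01-01T00:00:00Z) is MJD 40587, so MJD = t / 86400 + 40587. A sketch using awk's floating point, ignoring leap seconds as Unix time does; the function names are hypothetical. Eight decimal places resolve about 0.9 ms (1e-8 day = 0.864 ms), which shows how long such names get at useful precision.

```shell
#!/usr/bin/env bash
# Sketch: Unix time <-> Modified Julian Date, ignoring leap seconds
# (as Unix time does). MJD 40587.0 is 1970-01-01T00:00:00Z.

unix_to_mjd() {  # usage: unix_to_mjd UNIX_TIME
  awk -v t="$1" 'BEGIN { printf "%.8f\n", t / 86400 + 40587 }'
}

mjd_to_unix() {  # usage: mjd_to_unix MJD
  awk -v m="$1" 'BEGIN { printf "%.3f\n", (m - 40587) * 86400 }'
}
```

For example, `unix_to_mjd 1700000100` prints `60262.92708333`, the same instant as `20231114T221500Z`; the readability trade-off is plain.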
Also, we should test whether we can ensure
@ben-hendricks shared on a call today that the BC Hydrophone Network uses a custom driver to generate timestamps from their icListen hydrophones in this format:
Where

The archived format for processed, calibrated noise-level files assumes the user knows the timestamp is in the UTC time zone, so it ends up as (or close to?):
Related to this, @ben-hendricks also made a good point: if possible, it's ideal to have different nodes start their recordings on the minute (or, as they do, on a 5-minute interval) so that filenames and time intervals are consistent across the network. This allows a direct request for the matching file from another location (e.g., for localization), rather than a search through ~20k files for the desired time period.
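A sketch of why aligned start times help: if every node starts a file on a 5-minute UTC boundary, the file covering any moment can be computed directly instead of searched for. The `<node>_<stamp>.flac` naming, node names, and function names are hypothetical; GNU `date` is assumed.

```shell
#!/usr/bin/env bash
# Sketch: align recordings to fixed UTC intervals so the file holding any
# timestamp can be named directly. Naming scheme is hypothetical.

INTERVAL=300  # seconds (5 minutes)

# Start of the interval containing Unix time t (floor to a multiple).
interval_start() {  # usage: interval_start UNIX_TIME
  local t=$1
  echo $(( t - t % INTERVAL ))
}

# Name of the file on NODE that contains Unix time t (GNU date syntax).
file_for_time() {  # usage: file_for_time NODE UNIX_TIME
  local start
  start=$(interval_start "$2")
  printf '%s_%s.flac\n' "$1" "$(date -u -d "@$start" +%Y%m%dT%H%M%SZ)"
}
```

For example, `file_for_time node_a 1700000123` and `file_for_time node_b 1700000399` both land in the interval starting 2023-11-14T22:15:00Z, so the two names differ only in the node prefix.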
A comment on @scottveirs' suggestion regarding filename conventions and time synchronization: from a coding perspective, a change in filename convention is usually a small step. Synchronizing recording periods gave our coding team some headaches, because we also wanted to be sure that all files have a predictable length (files with a different length were renamed so that a search algorithm could filter them). However, in our experience the benefits outweigh the costs:
a) it is virtually a requirement for cross-correlating and localizing transient signals;
b) any timestamp can be matched to its corresponding audio file instantaneously.
Great advice, @ben-hendricks. Thanks for sharing insights from the BC Hydrophone Network! I've created two
These details from Ben (above) may be of interest to @valentina-s @savageGrant @CaseCal @mitchhaldeman
@ben-hendricks, can you confirm or deny that the
Thanks @scottveirs and @ben-hendricks, this is helpful and timely, as we're just developing our file naming and access tool. I notice in that example that the .flac file contains a start and end time, while the .wav file has just a start time. Is there any standard or preference for including only the start time, the start and end time, or the start time and duration? Especially as we gear toward efficient storage in our own project, we may not have conveniently sized archive file durations. My thought is that having both start and end time makes it easiest to scan files for a specific timestamp or period, but it also starts to become somewhat verbose.
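To illustrate the start-and-end option, a sketch that picks, from a list of names of the hypothetical form `<node>_<start>_<end>.flac` (ISO 8601 basic, UTC), the files whose span contains a query time. Because fixed-width basic-format stamps sort in time order, plain string comparison is enough and no date parsing is needed. The function name and naming scheme are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: select files whose [start, end) span contains a query time.
# Hypothetical names: <node>_<YYYYmmddTHHMMSSZ>_<YYYYmmddTHHMMSSZ>.flac
# Fixed-width UTC stamps compare correctly as strings.
files_containing() {  # usage: files_containing QUERY_STAMP < list_of_names
  local query=$1 name base start end
  while IFS= read -r name; do
    base=${name%.flac}
    end=${base##*_}     # last underscore-separated field
    base=${base%_*}
    start=${base##*_}   # second-to-last field
    # start <= query < end
    if [[ ! $start > $query && $query < $end ]]; then
      printf '%s\n' "$name"
    fi
  done
}
```

Node names may themselves contain underscores, since the stamps are taken from the end of the name.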
Hi all,
I would not worry about verbosity, as in the end most files are handled by an algorithm. One caveat for including the end time is that the filename is created when the file/recording is created, at which point the end time is not yet known. So your logging algorithm could either:
- write the start time at the beginning, and add the end time after the file is completed, or
- force a certain file length (e.g., by zero-padding samples that do not contain data), hard-code the end time into the file, and flag the filename when it required zero-padding (this is how we do it).
In either case, as far as I can see, your algorithm would have to read/write the filename twice, not just once.
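A sketch of the two options, with hypothetical names throughout: option 1 opens the file under its start time and renames it once the recording is closed and the end time is known; option 2 derives the end time from a fixed nominal length and appends a flag (here `_padded`, an assumption) when zero-padding was needed. GNU `date` is assumed.

```shell
#!/usr/bin/env bash
# Sketch of both naming options; file layout and flag are hypothetical.

stamp() {  # Unix time -> colon-free UTC stamp (GNU date syntax)
  date -u -d "@$1" +%Y%m%dT%H%M%SZ
}

# Option 1: rename <node>_<start>.flac to <node>_<start>_<end>.flac
# after the recording has been closed and its end time is known.
finalize_recording() {  # usage: finalize_recording DIR NODE START END
  local dir=$1 node=$2 start end
  start=$(stamp "$3")
  end=$(stamp "$4")
  mv -- "$dir/${node}_${start}.flac" "$dir/${node}_${start}_${end}.flac"
  printf '%s\n' "${node}_${start}_${end}.flac"
}

# Option 2: fixed nominal length, so the end time is known up front;
# a suffix flags files that had to be zero-padded to reach that length.
fixed_length_name() {  # usage: fixed_length_name NODE START LENGTH PADDED(0|1)
  local node=$1 start=$2 length=$3 padded=$4 flag=''
  if [[ $padded == 1 ]]; then
    flag='_padded'
  fi
  printf '%s_%s_%s%s.flac\n' "$node" "$(stamp "$start")" \
    "$(stamp "$(( start + length ))")" "$flag"
}
```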
Cheers,
Ben
(Replying by email to Caleb Case's comment of Feb 3, 2023, 4:11 PM.)
Benjamin Hendricks, PhD
SoundSpace Analytics
Thanks to facilitation by @ben-hendricks, Tom Dakin confirms via email:
Note that MANTA (MATLAB-based noise analysis software) says this about datetime formats:
The date/time information can be located at any position within the filename. To aid users in renaming their acoustic data files to be compatible with MANTA software, a file renaming tool (Sox-o-matic) is available from The Cornell Lab of Ornithology Center for Conservation Bioacoustics:
Sox-o-matic Wiki: https://bitbucket.org/CLO-BRP/sox-o-matic/wiki/Home
Sox-o-matic Software download: https://www.birds.cornell.edu/ccb/sox-o-matic/
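Since MANTA only needs the date/time to appear somewhere in the filename, a consistent stamp is also easy to recover in our own tooling. A sketch for the ISO 8601 basic UTC form used in the examples here (an assumption; other layouts need a different pattern), printing Unix time via GNU `date`. The function name and example filenames are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find a YYYYmmddTHHMMSSZ stamp anywhere in a filename and print
# the corresponding Unix time. Pattern and example names are hypothetical.
parse_stamp() {  # usage: parse_stamp FILENAME
  local re='([0-9]{4})([0-9]{2})([0-9]{2})T([0-9]{2})([0-9]{2})([0-9]{2})Z'
  if [[ $1 =~ $re ]]; then
    # -u makes date interpret the reassembled string as UTC.
    date -u -d "${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]} ${BASH_REMATCH[4]}:${BASH_REMATCH[5]}:${BASH_REMATCH[6]}" +%s
  else
    return 1  # no stamp found
  fi
}
```

For example, `parse_stamp survey-20231114T221500Z-ch2.wav` prints `1700000100`.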
Comparing readability of these two options, for fun:
And note that OOI added a lot of precision beyond MBARI, but neither added a
In the long run, it would be valuable to stream and archive the Orcasound acoustic data with a NIST-synchronized timebase encoded in the FLAC files, and possibly also in the HLS/DASH stream manifest and/or segments. If adjacent hydrophones (within earshot of each other) are synchronized with millisecond to microsecond precision, then we will be able to localize sounds accurately enough to learn more about biology: e.g., the direction a soniferous animal is moving, the location of a sound source, or the identity of a signaler.
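To put these precision targets in perspective: a timing error $\Delta t$ between two hydrophones becomes a path-length error $\Delta d = c\,\Delta t$. Taking a nominal sound speed in seawater of about 1500 m/s (it varies with temperature, salinity, and depth), the arithmetic is roughly:

```math
\Delta d = c\,\Delta t, \qquad c \approx 1500\ \mathrm{m\,s^{-1}}
```

```math
\Delta t = 10\ \mathrm{ms} \Rightarrow \Delta d \approx 15\ \mathrm{m}, \qquad
\Delta t = 1\ \mathrm{ms} \Rightarrow \Delta d \approx 1.5\ \mathrm{m}, \qquad
\Delta t = 1\ \mu\mathrm{s} \Rightarrow \Delta d \approx 1.5\ \mathrm{mm}
```

So a clock good to only about 10 ms corresponds to roughly 15 m of uncertainty in the range difference between two hydrophones.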
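To this end, the shell script might be adapted (along with changes to how the player stays current) from its current syntax:

```shell
# Unix time: seconds since 1970-01-01T00:00:00Z
timestamp=$(date +%s)
```

to syntax such as:

```shell
# -u pins the date to UTC; without it, date fields follow the local time zone
timestamp=$(date -u +%Y-%m-%d)
```

Code source and snippet: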