Move towards human-readable timestamps in audio filenames and/or directory names #7

scottveirs · 2018-05-08T23:34:14Z

In the long run, it would be valuable to stream and archive the Orcasound acoustic data with a NIST-synchronized timebase encoded in both the FLAC files and possibly also the HLS/DASH stream manifest and/or segments. If adjacent hydrophones (within earshot of each other) are synchronized with millisecond to microsecond precision, then we will be able to localize sounds with an accuracy that will help us learn more about biology: e.g. direction a soniferous animal is moving, location of a sound source, or identity of a signaler.

To this end, the shell script might be adapted (along with changes to how the player stays current) from its current syntax --

timestamp=$(date +%s)

-- to syntax such as:

timestamp=$(date +%Y-%m-%d)

code source and snippet:

$ rsync -avz --delete --backup --backup-dir="backup_$(date +%Y-%m-%d)" /source/path/ /dest/path
By using $(date +%Y-%m-%d) I’m telling it to use today’s date in the folder name.
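
A minimal sketch of that change: an epoch value like the one `date +%s` produces can be rendered as a readable UTC stamp. The helper name `epoch_to_stamp` is hypothetical, and the `-d @SECONDS` form assumes GNU `date`.

```shell
# Hypothetical helper: render Unix epoch seconds as a readable UTC stamp.
# Assumes GNU date (the "-d @SECONDS" form is not portable to BSD/macOS date).
epoch_to_stamp() {
  date -u -d "@$1" +%Y-%m-%d_%H-%M-%S
}

timestamp=$(date +%s)                     # current behaviour: epoch seconds
readable=$(epoch_to_stamp "$timestamp")   # same instant, human-readable
echo "$timestamp -> $readable"
```

Keeping the epoch value alongside the readable form would let the player keep working while the naming is migrated, though that is only one possible approach.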


scottveirs · 2021-08-06T06:15:38Z

It would be even better, as Paul pointed out on Slack recently, to get rid of the datetime-stamped S3 objects (akin to directories) and just store all data under a nodename with each data filename incorporating a NIST-synchronized timestamp.

We could get HLS segments to match the filename format of the FLAC files, which in the archive-orcasound-net bucket currently look something like:
2020-12-09_23-22-16_rpi_orcasound_lab--2.flac

Or we could align with ONC or OOI filename formats:

OOI: OO-HYVM2--YDH-2017-08-21T00_02_42.437000.mseed
ONC: ICLISTENHF1293_20171226T145827.651Z.wav

mcshicks · 2021-08-11T17:22:18Z

I have this working now, based on Paul's suggestion of using "-strftime 1" and modifying stream.sh (for research) to use this filename: "/tmp/$NODE_NAME/hls/$timestamp/%Y-%m-%d_%H-%M-%S.ts".
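
One way to check such a pattern before handing it to ffmpeg is to expand it with `date`, which understands the same strftime fields. This is only a preview sketch: the `NODE_NAME` and `timestamp` values are placeholders, `preview_segment_name` is a hypothetical helper, `-d @SECONDS` assumes GNU `date`, and the preview is forced to UTC, so whether ffmpeg on the node expands the pattern in UTC or local time should be checked separately.

```shell
# Preview how a strftime-style segment pattern would expand at a given time.
# NODE_NAME and timestamp are placeholder values, not real node settings.
NODE_NAME=example_node
timestamp=1000000000
pattern="/tmp/$NODE_NAME/hls/$timestamp/%Y-%m-%d_%H-%M-%S.ts"

preview_segment_name() {
  # $1 = strftime pattern, $2 = epoch seconds of the segment start
  date -u -d "@$2" +"$1"
}

preview_segment_name "$pattern" "$timestamp"
```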

valentina-s · 2021-08-11T18:04:16Z

I think the more standard format is %Y-%m-%dTH:%M:%S.ts
i.e. colons for the hours, and T instead of the _ (space is also used but bad for filenames). Also, what about milliseconds?
I agree a timezone indication would be good, since I am never sure whether it is Greenwich time or local time.

ISO-8601	2021-08-11T18:01:50+00:00
UTC	2021-08-11T18:01:50Z

@Molkree you want to add your comments on the format?

mcshicks · 2021-08-12T17:34:39Z

I tried %Y-%m-%dTH:%M:%S.ts instead of %Y-%m-%d_%H-%M-%S.ts and I could not get the player to work. I'm not sure whether it's unhappy with the : or the TH (probably the :), but ffmpeg does write the files fine. I can look into milliseconds, but I think the rpi's time is probably only accurate to maybe 10 ms? It uses NTP to sync time.

paulcretu · 2021-08-12T21:35:55Z

ISO 8601 is a good idea, the full thing with timezone is %Y-%m-%dT%H:%M:%S%z. The problem is colons won't work on some filesystems (Windows), not sure if that has anything to do with it not working for you @mcshicks. I would propose something like %Y-%m-%d_%H-%M-%S_%Z (2021-08-12_20-52-09_UTC). It's readable, portable, and easy-ish to translate into ISO 8601.

The timezone could be easier to translate with %z (e.g. 2021-08-12_20-52-09+0000) since you wouldn't have to look up the abbreviation (like PDT in 2021-08-12_20-52-09_PDT). But there might be some cases where the + is a problem, and with negative offsets, it's a bit confusing to have the - (2021-08-12_20-52-09-0700). It would be nicest to get 2021-08-12_20-52-09Z for UTC and +0000 offset notation for other timezones but that doesn't seem to be an option with strftime.
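
If every node writes UTC, one workaround is to hard-code a literal Z in the format, since strftime passes ordinary characters through unchanged. The sketch below assumes that UTC-only policy; `utc_stamp` and `stamp_to_iso` are hypothetical helper names, and the conversion covers only this one filename layout.

```shell
# Filename-safe UTC stamp with a literal Z suffix (no colons, no + or - offset).
utc_stamp() {
  date -u +%Y-%m-%d_%H-%M-%SZ
}

# Convert 2021-08-12_20-52-09Z to ISO 8601 extended form 2021-08-12T20:52:09Z.
# Input that does not match the expected shape is printed unchanged.
stamp_to_iso() {
  printf '%s\n' "$1" | sed -E \
    's/^([0-9]{4}-[0-9]{2}-[0-9]{2})_([0-9]{2})-([0-9]{2})-([0-9]{2})Z$/\1T\2:\3:\4Z/'
}

stamp_to_iso "2021-08-12_20-52-09Z"
```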

Molkree · 2021-08-14T17:57:24Z

The problem is colons won't work on some filesystems (Windows)

Haha, actually you can't even upload such files using actions/upload-artifact#35 in GitHub workflows. I used colons at first but then changed it to this %Y-%m-%dT%H-%M-%S-%f

Haven't thought about timezone, I just used UTC everywhere I believe. If you do add it to the filename I'd also prefer +0000. If extra - at the end looks confusing, can always add delimiter like TZ or something (2021-08-12_20-52-09TZ-0100).

valentina-s · 2021-08-15T01:36:21Z

I did not think about the colons. The OOI Archive has them but I guess this causes issues for some users.
The format without dashes and colons %Y%m%dT%H%M%S%Z is also supported by ISO 8601. I wonder if that can be run by the player? It may be less human readable but is also machine readable. I am more biased toward using something standard. The fractions are expected to be delimited with dots (or commas) to distinguish 01.05 (1 h 3 min) vs 01:05 (1h 5min). If there are no dashes before, maybe then the -/+ timezone will be more obvious. Is the local timezone preferred? It is only one but it may not be obvious to a non-local person.

Molkree · 2021-08-17T21:32:04Z

Is the local timezone preferred? It is only one but it may not be obvious to a non-local person.

Right now we use Unix time so I'd prefer to stay with UTC. Not specifying time zone implies local time so fully compliant ISO 8601 UTC time without colons would look like 20210812T205209+0000, 20210812T205209+00 or 20210812T205209Z.

I personally don't care that much about strict standard adherence in this case and would prefer something more readable but still in UTC.
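
For tools that need to tell an explicit-UTC stamp from an ambiguous one, a shape check along these lines might help. It is a sketch only: `is_basic_utc_stamp` is a hypothetical name, and the pattern checks the layout of the three forms listed above, not whether the digits form a valid date.

```shell
# Succeeds if $1 is an ISO 8601 basic-format stamp with an explicit UTC marker:
# YYYYMMDDThhmmss followed by Z, +00, or +0000. Checks shape only, not ranges.
is_basic_utc_stamp() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{8}T[0-9]{6}(Z|\+00(00)?)$'
}

for s in 20210812T205209Z 20210812T205209+00 20210812T205209+0000 20210812T205209; do
  if is_basic_utc_stamp "$s"; then
    echo "$s: explicit UTC"
  else
    echo "$s: no UTC marker"
  fi
done
```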

scottveirs · 2022-10-05T20:12:37Z

@tsuize @veirs this is the HLS timestamp issue I was seeking on today's call. I think we should tackle this formatting decision this winter, adjust the orcanode code accordingly, and then fix everything that we're going to break, including at least:

The orcasite player code
The ingestion of live HLS data by aifororcas-livesystem (within Azure)
Scripts and packages that retrieve HLS data for particular time ranges
Likely the mseed transcoding tools built by @karan2704 and @mcshicks?

scottveirs · 2022-10-08T22:51:16Z

After looking at MBARI's Pacific Sound open data registry a bit, they seem to be using something like this:

2017-06-13T16:00:00

and John Ryan confirms via Slack that this is relying on the convention of scientific timestamps being assumed to be in the UTC time zone.

Personally, I find the ambiguity unnerving enough that I think it's worth resolving with the extra 3 characters +00...

So, I'd propose one of the following options:

20170613T160000+00
20170613-160000+00 which I find just barely human-readable enough
2017-06-13T16-00-00+00
2017-06-13_16-00-00+00 which I feel is the most human-readable while avoiding colons :

Or just use Modified Julian Date (MJD) for the filenames and utilize existing packages to decode into human-readable formats if/when necessary.

Opinions?

scottveirs · 2022-10-08T22:53:34Z

Also, we should test whether we can ensure ffmpeg can write a file with data starting at YYMMDD-HHMMSS precisely (to the nearest 10 or 100 microseconds). Otherwise we may need or want to add precision within the filename, i.e. precision high enough for any future localization efforts (e.g. 10 or 100 microseconds?).

scottveirs · 2023-01-11T21:32:10Z

@ben-hendricks shared on a call today that the BC Hydrophone Network uses a custom driver to generate timestamps from their icListen hydrophones in this format:

ICLISTENHF1281_20190704T085500.000Z_20190704T090000.000Z.flac

Where 1281 is the instrument ID (serial number?) and the .000 suffix is precision in seconds.

The archived format for processed calibrated noise level files assumes the user knows the timestamp is in UTC time zone, so ends up as (or close to?):

1281_20190704T085500.wav

scottveirs · 2023-01-11T21:34:33Z

Also, we should test whether we can ensure ffmpeg can write a file with data starting at YYMMDD-HHMMSS precisely (to the nearest 10 or 100 microseconds). Otherwise we may need or want to add precision within the filename, i.e. precision high enough for any future localization efforts (e.g. 10 or 100 microseconds?).

Related to this @ben-hendricks also made a good point that -- if possible -- it's ideal to have different nodes start their recordings on the minute (or they use a 5-minute interval) so that file names and time intervals end up being consistent across the network. This allows a direct request for a matching file, rather than a search through ~20k files for the desired matching time period from another location (e.g. for localization).
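
The arithmetic behind starting on a fixed interval is small. The sketch below rounds an epoch time down to a 5-minute boundary, so any node can compute the same expected start time without searching. The function names are hypothetical, the stamp layout is illustrative, and `-d @SECONDS` assumes GNU `date`.

```shell
# Round epoch seconds down to the start of the enclosing interval.
# $1 = epoch seconds, $2 = interval length in seconds (e.g. 300 for 5 minutes)
floor_to_interval() {
  echo $(( $1 - $1 % $2 ))
}

# Seconds to wait from $1 until the next interval boundary (0 if already on one).
seconds_until_boundary() {
  echo $(( ($2 - $1 % $2) % $2 ))
}

now=$(date +%s)
start=$(floor_to_interval "$now" 300)
# Assumes GNU date for "-d @SECONDS"; the stamp layout is only illustrative.
echo "a matching file on any node would start at $(date -u -d "@$start" +%Y%m%dT%H%M%SZ)"
```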

ben-hendricks · 2023-01-16T19:26:28Z

As a comment on @scottveirs's suggestion regarding filename convention and time synchronization: a change in filename convention is usually a small step from a coding perspective. Synchronizing recording periods gave our coding team some headaches because we also wanted to be sure that all files have a predictable length (those with a different length were renamed so that a search algorithm could filter them). However, in our experience the benefits outweigh the costs: a) it is a virtual requirement to x-correlate and localize transient signals; b) any match between a timestamp and a corresponding audio file can be made instantaneously.

scottveirs · 2023-02-03T21:51:00Z

Great advice @ben-hendricks . Thanks for sharing insights from the BC Hydrophone Network!

I've created two orcanode issues based on your input:

scottveirs · 2023-02-03T21:55:42Z

@ben-hendricks shared on a call today that the BC Hydrophone Network uses a custom driver to generate timestamps from their icListen hydrophones in this format:

ICLISTENHF1281_20190704T085500.000Z_20190704T090000.000Z.flac

Where 1281 is the instrument ID (serial number?) and the .000 suffix is precision in seconds.

The archived format for processed calibrated noise level files assumes the user knows the timestamp is in UTC time zone, so ends up as (or close to?):

1281_20190704T085500.wav

These details ^^^ from Ben may be of interest @valentina-s @savageGrant @CaseCal @mitchhaldeman

scottveirs · 2023-02-03T21:57:19Z

@ben-hendricks Can you confirm/deny that the .000 part of the ICLISTEN file name is precision in seconds (rather than an indication of zero hours offset from UTC (Z) time)?

CaseCal · 2023-02-04T00:10:49Z

Thanks @scottveirs and @ben-hendricks, this is helpful and timely as we're just developing our file naming and access tool.

I notice in that example that the .flac file contains a start and end time, while the wav file has just a start time. Is there any standard or preference for including only start time, start time and end time, or start time and duration? Especially as we gear towards efficient storage in our own project, we may not have conveniently sized archive file durations.

My thought is that having start time and end time makes it easiest to scan files for a specific timestamp or period, but it also starts to become somewhat verbose.

ben-hendricks · 2023-02-04T02:32:15Z

Hi all, I would not worry about verbosity … as in the end most files are typically handled by an algorithm. The caveat (or one of them) for including end-time is that the filename is created when the file/recording is created, at which point the end time is not known. So your logging algorithm could either
- write the start time at the beginning, and add an end-time after the file is completed, or
- force a certain file length (e.g. by zero-padding samples that do not contain data), hard-code the end-time into the file, and flag the filename when it required zero-padding (this is how we do it)

In any case your algorithm would have to read/write to the filename twice, not only once, as far as I see it. Cheers, Ben
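
The first option (write the start time, then add the end time on completion) could look roughly like this. It is a sketch under assumptions: `add_end_stamp` is a hypothetical helper, the file name is made up, and the rename happens in a scratch directory rather than in any real recording path.

```shell
# Build the final name by inserting an end stamp before the extension.
# $1 = name written at recording start, $2 = end stamp known at close
add_end_stamp() {
  base=${1%.*}      # everything before the last dot
  ext=${1##*.}      # extension after the last dot
  printf '%s_%s.%s\n' "$base" "$2" "$ext"
}

# Illustration in a scratch directory; the file name is made up.
dir=$(mktemp -d)
start_name="NODE_20190704T085500.000Z.flac"
: > "$dir/$start_name"                                    # recording begins
final_name=$(add_end_stamp "$start_name" "20190704T090000.000Z")
mv "$dir/$start_name" "$dir/$final_name"                  # recording closed
ls "$dir"
rm -r "$dir"
```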


scottveirs · 2023-02-07T22:26:18Z

@ben-hendricks Can you confirm/deny that the .000 part of the ICLISTEN file name is precision in seconds (rather than an indication of zero hours offset from UTC (Z) time)?

Thanks to facilitation by @ben-hendricks , Tom Dakin confirms via email:

Yes the .000 are milliseconds.
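
With that confirmed, a name in this layout can be split into instrument, start, end, and the millisecond field using plain parameter expansion. This is a sketch: `parse_name` is a hypothetical helper, and it assumes the instrument name contains no underscore.

```shell
# Split NAME_START_END.ext into its underscore-separated fields.
# Assumes the instrument name itself contains no underscore.
parse_name() {
  stem=${1%.*}            # drop the file extension
  inst=${stem%%_*}        # text before the first underscore
  rest=${stem#*_}         # START_END
  start=${rest%%_*}
  end=${rest#*_}
  millis=${start#*.}      # fractional part of the start stamp, e.g. 000Z
  millis=${millis%Z}      # strip the UTC marker, leaving the milliseconds
  printf '%s %s %s %s\n' "$inst" "$start" "$end" "$millis"
}

parse_name "ICLISTENHF1281_20190704T085500.000Z_20190704T090000.000Z.flac"
```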

scottveirs · 2023-04-25T17:14:07Z

Noting that MANTA (Matlab-based noise analysis software) says this about datetime formats:

The preferred time/date format in the filename is yyyymmdd_HHMMSS (HHMMSS.FFF is also acceptable).

The date/time information can be located at any position within the filename. To aid users in renaming their acoustic data files to be compatible with MANTA software, a file renaming tool (Sox-o-matic) is available from The Cornell Lab of Ornithology Center for Conservation Bioacoustics:

Sox-o-matic Wiki: https://bitbucket.org/CLO-BRP/sox-o-matic/wiki/Home

Sox-o-matic Software download: https://www.birds.cornell.edu/ccb/sox-o-matic/
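
For the FLAC names currently in the archive, the leading stamp could also be rewritten into the yyyymmdd_HHMMSS layout with a one-line substitution. This is a sketch only: `to_manta_name` is a hypothetical helper, and whether the resulting name is accepted by MANTA has not been tested here.

```shell
# Rewrite a leading YYYY-MM-DD_HH-MM-SS stamp as yyyymmdd_HHMMSS, keeping the
# rest of the name. Names without that leading stamp are printed unchanged.
to_manta_name() {
  printf '%s\n' "$1" | sed -E \
    's/^([0-9]{4})-([0-9]{2})-([0-9]{2})_([0-9]{2})-([0-9]{2})-([0-9]{2})/\1\2\3_\4\5\6/'
}

to_manta_name "2020-12-09_23-22-16_rpi_orcasound_lab--2.flac"
```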

scottveirs · 2023-04-25T17:15:54Z

Also, we should test whether we can ensure ffmpeg can write a file with data starting at YYMMDD-HHMMSS precisely (to the nearest 10 or 100 microseconds). Otherwise we may need or want to add precision within the filename, i.e. precision high enough for any future localization efforts (e.g. 10 or 100 microseconds?).

See Steve's thoughts in this other orcanode issue for more info about achieving high precision with ffmpeg...

scottveirs · 2023-04-25T17:43:35Z

Comparing readability of these two options, for fun:

20190704T085500.000Z (BCHN format)
20190704_092314.000Z (Proposed Orcasound format)

And noting that OOI added a lot of precision beyond MBARI, but neither added a Z or +00...

2017-06-13T16:00:00 (MBARI format, relying on convention of scientific timestamps defaulting to UTC time zone)
2021-08-04T00:20:00.000015 (OOI)

scottveirs changed the title ~~Consider human-readable datetime or MJD directory names~~ Move towards human-readable timestamps in audio filenames and/or directory names Aug 6, 2021

scottveirs closed this as completed Aug 6, 2021

scottveirs reopened this Aug 6, 2021

mcshicks self-assigned this Aug 11, 2021

scottveirs self-assigned this Nov 3, 2021

scottveirs mentioned this issue Feb 2, 2024

Create an Orcasound data catalogue and facilitate data access orcasound/orca-hls-utils#12
