[ENH] allow timestamps for physio table #1796
Comments
cc @sjeung @JuliusWelzel who have thought about this / worked on this. |
Hi, thanks for tagging us :) In Motion-BIDS we introduced the channel type On the topic of timestamp vs. latency, we decided that each data file assumes a recording start at 0 and each sample can be assigned a latency. Multiple recordings can be synchronized via the Does this help with your idea? |
For physio files I think that the |
In my opinion, this is more to gauge the possibility that the There's no specific limitation to having a column (channel?) called latency, time, or timestamp. The limitation is that, if Currently, you cannot have a physio file with two rows that were sampled simultaneously (i.e., two rows with the same value in the time/latency/timestamp column) AND you have to have a row for every sample at the given frequency. The latter can be worked around by filling the row with |
Yeah, there are definitely some limitations to relying on the sampling frequency value. We made a distinction between the nominal sampling rate (SamplingFrequency), which is used as a hardware spec description, and the effective sampling rate (SamplingFrequencyEffective, link), computed from the measured duration and the number of samples. The effective sampling rate allows for deviation from the nominal sampling rate, for example 50.5 Hz instead of 50 Hz, but timestamps are still needed to restore precise timing information when sample intervals are irregular (as @oesteban described: missing samples or two consecutive timestamps repeating).

I was not aware that, even when using a latency (or timestamp) channel, you are still not allowed to sample two data points in one channel simultaneously. The other case, having to fill the cells with n/a, is also something I have not encountered myself. My current understanding is that by having a latency channel that goes for example 1.00, 1.01, 1.01, 1.02, 1.05 (repeated 1.01 s and missing 1.03 and 1.04 when the nominal sampling rate is 100 Hz) |
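To make the nominal vs. effective distinction concrete, here is a minimal Python sketch using the latency values from the comment above. The function names are hypothetical, and the exact formula for the effective rate, (samples - 1) / duration, is an assumption; the thread only says it is computed from the measured duration and the number of samples.

```python
def effective_sampling_frequency(latencies):
    """Effective rate in Hz, ASSUMED here to be (n_samples - 1) / measured duration."""
    duration = latencies[-1] - latencies[0]
    if duration <= 0:
        raise ValueError("need at least two samples spanning a positive duration")
    return (len(latencies) - 1) / duration


def find_irregularities(latencies, nominal_hz, tol=0.25):
    """Flag repeated and missing samples relative to the nominal sampling step.

    Returns (repeated, gaps): `repeated` lists row indices whose latency repeats
    the previous row; `gaps` lists (row index, estimated number of missing
    samples) for intervals clearly longer than one nominal step.
    `tol` is a hypothetical tolerance, expressed as a fraction of the step.
    """
    step = 1.0 / nominal_hz
    repeated, gaps = [], []
    for i in range(1, len(latencies)):
        delta = latencies[i] - latencies[i - 1]
        if abs(delta) < tol * step:
            repeated.append(i)
        elif delta > (1 + tol) * step:
            gaps.append((i, round(delta / step) - 1))
    return repeated, gaps


latencies = [1.00, 1.01, 1.01, 1.02, 1.05]
repeated, gaps = find_irregularities(latencies, nominal_hz=100.0)
```

With these values the effective rate comes out near 80 Hz rather than the nominal 100 Hz, which illustrates why a single frequency value cannot restore per-sample timing once samples repeat or go missing.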
I am +1 for adding an optional timestamp column, and either make Just to throw another proposal into the mix (and hoping I'm not stating the obvious), what we're looking at are sparse vs dense 1D arrays with vector values, where the index is the time stamp: Dense
Sparse
BEP 17 has proposed adding It's actually not difficult in the schema to make metadata optional based on the contents of the file, but it might help other tools to know what they're getting into, if they've already hard-coded assumptions about what |
I agree with @sjeung and like @effigies' options. The best solution here, in my opinion, is to create an explicit Regarding the proposal of |
It's nice to see the two cases clearly written out like that. The "Synchronisation and latency channel" section in this Motion-BIDS preprint describes those cases, depending on the availability of metadata/latency. But we did not think of assigning labels to make the distinction clearer.

"For tracking systems that do not provide single-sample timestamp information, the latency of each sample can be reconstructed based on the effective sampling frequency (recommended field “SamplingFrequencyEffective”), if available, or the nominal sampling frequency (required field “SamplingFrequency”), both found in “motion.json” metadata file. Synchronizing the onset of motion data between different tracking systems and/or with data from other modalities is achieved using the “sub-XX_scans.tsv” file, which contains an optional column “acq_time” that documents the onset of acquisition in datetime format."

If I can make a suggestion there, it would be to stick with the name "latency", in numeric format, instead of "time stamps", in datetime format? |
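The reconstruction described in the quoted passage can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the recording starts at latency 0, as stated earlier in the thread for Motion-BIDS data files.

```python
def reconstruct_latencies(n_samples, sidecar):
    """Rebuild per-sample latencies (in seconds) from sidecar metadata.

    Prefers SamplingFrequencyEffective when present, otherwise falls back to
    the nominal SamplingFrequency. Assumes the first sample is at latency 0.
    """
    rate = sidecar.get("SamplingFrequencyEffective") or sidecar.get("SamplingFrequency")
    if not rate:
        raise KeyError("sidecar has no usable sampling frequency")
    return [i / rate for i in range(n_samples)]


# Example sidecar values taken from the 50 Hz vs. 50.5 Hz case mentioned above.
latencies = reconstruct_latencies(
    3, {"SamplingFrequency": 50.0, "SamplingFrequencyEffective": 50.5}
)
```

This only yields evenly spaced latencies, so it cannot represent the repeated or missing samples that an explicit latency channel captures.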
@sjeung I couldn't find where the timestamps are defined in datetime format. I prefer "timestamp" because it is generic: it can be an integer (in the Unix sense), a float in seconds (like latency), etc. Perhaps we need to think about the format itself rather than the device.

For "sparse" (explicit indexing):

{
  "NominalSamplingFrequency": 1000.0,
  "StartTime": 23.3,
  "TabularDataIndex": "<column_name>"
}

where "<column_name>" can be anything such as latency, timestamp, time or index (depending on what section of the specs is being discussed, and it could even be left open to the user). Start time is still necessary to synchronize with other experiments/acquisitions.

For dense (implicit indexing):

{
  "SamplingFrequency": 1000.0,
  "StartTime": 23.3
}

(i.e., current physio specs, just making "SamplingFrequency" optional) |
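As a rough illustration of how a reader could handle both layouts, here is a Python sketch. "TabularDataIndex" and "NominalSamplingFrequency" are the fields proposed in this comment, not part of the current specification, and the function name is hypothetical. The sketch also assumes that values in the index column are used as-is (StartTime is not added to them), which the proposal does not settle.

```python
def sample_times(rows, sidecar):
    """Return one time value per row of a headerless table.

    Sparse (explicit indexing): the column named by the PROPOSED
    "TabularDataIndex" field holds the time of each row; its position is
    looked up in "Columns".
    Dense (implicit indexing): times are StartTime + i / SamplingFrequency.
    """
    index_name = sidecar.get("TabularDataIndex")
    if index_name is not None:
        col = sidecar["Columns"].index(index_name)
        return [float(row[col]) for row in rows]
    rate = sidecar["SamplingFrequency"]
    start = sidecar.get("StartTime", 0.0)
    return [start + i / rate for i in range(len(rows))]


dense = sample_times(
    [["0.1"], ["0.2"], ["0.3"]],
    {"SamplingFrequency": 1000.0, "StartTime": 23.3, "Columns": ["cardiac"]},
)
sparse = sample_times(
    [["23.3", "0.1"], ["23.3012", "0.2"]],
    {
        "NominalSamplingFrequency": 1000.0,
        "StartTime": 23.3,
        "TabularDataIndex": "timestamp",
        "Columns": ["timestamp", "cardiac"],
    },
)
```

A tool that only knows the dense layout would fail on the sparse sidecar because "SamplingFrequency" is absent, which is one argument for using a distinct key for the nominal rate.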
I would have expected @oesteban Timestamps are defined in https://bids-specification.readthedocs.io/en/stable/common-principles.html#units.
It's not a glossary term, so I'm not sure it's a "definition" so much as describing what to do when storing times without dates. This item would presumably need to be clarified in any PR. FWIW I don't like |
The start time approach was explored with respect to cross-modality synchronization and the issue we had back then was that some modalities like EEG do not come with the field and we would have to look at scans.tsv file to unambiguously determine the temporal alignment between recordings. Of course, scans.tsv file itself and acq_time fields are both optional.... |
What about:

{
  "NominalSamplingFrequency": 1000.0,
  "StartTime": 23.3,
  "IndexColumn": 1
}

For dense (implicit indexing):

{
  "SamplingFrequency": 1000.0,
  "StartTime": 23.3,
  "IndexColumn": 0
}

(where "IndexColumn" could also just be omitted in the dense case).

I just realized that physio tsvs are headerless, so having a name there requires you to resolve the "Columns" metadata too. Which brings up the other side of your distaste for this: these tsv files are headerless, and therefore you have to look at the JSON anyway. |
Good point that physio requires reading the JSON anyway. I was forgetting that. In any event, I would want the logic as simple as possible. Index columns, if they exist, should come first. Ideally the name is set by the spec, not the user. It is less painful to me to consider the edge cases of "When I see motion, I look for As a minor aside, the schema uses |
I think "timestamp" is a more natural choice than "latency", but I'd be fine with bending on that for the sake of consistency with the motion modality. The plan for physio then would simply be to have "latency" be (optionally) the first item in the "Columns" list of the associated *_physio.json. The units would default to seconds but maybe the JSON could override that by specifying a different Unit for that column. Then the times of these samples would be the first column of the corresponding tsv file. It looks like this is done a bit differently in motion, where the columns of the motion.tsv are described by the corresponding channels.tsv, in which one of the channels can be marked as the LATENCY type. I like the idea of explicitly using NominalSamplingFrequency for irregularly sampled data. I would also be OK with keeping SamplingFrequency and having that be interpreted as the nominal frequency if the latency column is present, though I will point out that I think this might cause mistakes for downstream tools that are not aware of this new feature. |
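A minimal Python sketch of the plan described above for physio, under stated assumptions: the first "Columns" entry is literally named "latency", its values are in seconds and are used without adding StartTime, and missing values are written as n/a. The function names are hypothetical and this is not an existing BIDS tool.

```python
import csv
import io


def parse_cell(cell):
    """Convert a TSV cell to float, mapping the n/a marker to None."""
    return None if cell == "n/a" else float(cell)


def read_physio(tsv_text, sidecar):
    """Read a headerless physio table, returning (times, names, data).

    If the first entry of "Columns" is "latency" (the PROPOSED convention),
    the first TSV column gives the time of each row. Otherwise times are
    derived from StartTime and SamplingFrequency, as in the current layout.
    """
    columns = sidecar["Columns"]
    has_latency = len(columns) > 0 and columns[0] == "latency"
    names = columns[1:] if has_latency else list(columns)
    times, data = [], []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        if has_latency:
            times.append(float(row[0]))
            data.append([parse_cell(c) for c in row[1:]])
        else:
            rate = sidecar["SamplingFrequency"]
            times.append(sidecar.get("StartTime", 0.0) + len(times) / rate)
            data.append([parse_cell(c) for c in row])
    return times, names, data


times, names, data = read_physio(
    "0.000\t1.5\n0.012\t1.7\n0.020\tn/a\n",
    {"Columns": ["latency", "cardiac"], "NominalSamplingFrequency": 100.0},
)
```

Keeping the index column first means a reader only has to inspect `Columns[0]` to decide which branch applies.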
To be honest, I also like "timestamp" more than "latency". This would give the option to synchronize data streams across modalities. For now, recordings have to be synchronized based on the onset from the "scans.tsv" file and the latency channel. I believe that for BIDS 2.0, recordings.tsv will replace scans.tsv. Maybe the option to synchronize recordings via timestamps can be introduced in BIDS 2.0, where backward compatibility is broken? |
Your idea
Tables of time series data, e.g. physio and motion, currently require SamplingFrequency, which assumes regularly sampled data. For irregularly sampled data, it would be helpful to have an OPTIONAL "timestamp" column (which acts similarly to the "onset" column of the current events table). If this column is provided, SamplingFrequency should be optional. This feature would also be helpful when you want to capture temporal drift between two acquisition systems.
Motivated by #1792 (comment) by @oesteban
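To illustrate the drift use case mentioned in the idea, here is a hedged Python sketch. It assumes both systems timestamped the same events (for example shared trigger pulses) and models the relation as linear; the function name and the example values are hypothetical and nothing here is prescribed by the specification.

```python
def estimate_drift(t_a, t_b):
    """Least-squares fit of t_b ~ offset + (1 + drift) * t_a.

    `t_a` and `t_b` are timestamps (seconds) of the SAME events as recorded
    by two acquisition systems. Returns (offset, drift), where drift is the
    fractional clock-rate difference (0.001 means 1 ms gained per second).
    """
    n = len(t_a)
    if n != len(t_b) or n < 2:
        raise ValueError("need two equally long series with at least two events")
    mean_a = sum(t_a) / n
    mean_b = sum(t_b) / n
    var_a = sum((a - mean_a) ** 2 for a in t_a)
    if var_a == 0:
        raise ValueError("timestamps of system A must not all be identical")
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(t_a, t_b))
    slope = cov / var_a
    offset = mean_b - slope * mean_a
    return offset, slope - 1.0


# Synthetic example: system B starts 0.5 s later and runs 0.1% fast.
t_a = [0.0, 10.0, 20.0, 30.0]
t_b = [0.5 + 1.001 * a for a in t_a]
offset, drift = estimate_drift(t_a, t_b)
```

Without per-sample timestamps in at least one of the tables, this kind of estimate would have to rely on a single start time and a nominal rate, which hides the drift entirely.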