
Processing functions should not strip metadata from processed xdata. #16

Open · Brow71189 opened this issue Apr 30, 2020 · 8 comments

Labels: audience - scripting, effort - few weeks, impact - high, needs planning, reach - medium

@Brow71189 (Collaborator) commented Apr 30, 2020

Right now, at least some of them try to add sensible dimensional and intensity calibrations to the output data, but all functions in Core.py strip the metadata dict from their inputs.
Processing functions with one input and one output should simply copy the metadata to their output.
For processing functions with multiple inputs, I'm not sure what the best solution would be. Three ideas:

  1. Use the metadata from the first input: very simple, but probably the worst solution (still better than the current situation, though).
  2. Attach a list of metadata dicts to the output: also very simple, but it can overload the output's metadata a bit. We already have a lot of stuff in there, and multiplying that might make it even harder for the user to browse through it.
  3. Merge the metadata dicts: the most complex, but probably the most elegant solution. We would recursively walk the metadata and put common keys into the result only once. If the values for a key are the same across all inputs, we list the value once; otherwise they get turned into a list with one value per input (a sketch of this follows below).
    Alternatively, we could keep the common key/value pairs in one dict and the "unique" ones in a list of dicts, as in point 2.
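
A minimal sketch of the recursive merge in idea 3; merge_metadata is a hypothetical helper, not an existing Core.py function:

from typing import Any, Dict, List

def merge_metadata(dicts: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Recursively merge metadata dicts: identical values collapse to a
    single entry, differing values become a list with one entry per input."""
    keys: List[str] = []
    for d in dicts:
        for key in d:
            if key not in keys:
                keys.append(key)  # preserve first-seen key order
    merged: Dict[str, Any] = {}
    for key in keys:
        values = [d.get(key) for d in dicts]
        if all(isinstance(v, dict) for v in values):
            merged[key] = merge_metadata(values)  # recurse into sub-dicts
        elif all(v == values[0] for v in values):
            merged[key] = values[0]  # common key, identical value: keep once
        else:
            merged[key] = values  # one value per input, in input order
    return merged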

@cmeyer (Collaborator) commented Apr 30, 2020

  1. Do not copy metadata. Copying almost always results in incorrect metadata and makes it impossible to use metadata for anything useful. For instance, say we try to detect "data acquired from a camera" using metadata, which we currently do. If all data processed from the original carries a copy of its metadata, how are we to distinguish the processed data from the original?
  2. Put source metadata into a sub-tag. This is also not a good idea, as it can propagate a lot of metadata quickly and still makes it difficult to access the metadata.

My plan for this in the future (which may be now) is to provide more metadata functions that allow tracing the sources of the metadata: an operation adds metadata about its sources, and additional functions help access a processed item's sources so that metadata can be grabbed from the source. The obvious downside is when the source is missing.
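
A rough sketch of what such source tracing could look like; both helpers and the "processing" key are hypothetical, not an existing niondata API:

from typing import Any, Callable, Dict, List, Optional

def record_sources(operation: str, source_uuids: List[str]) -> Dict[str, Any]:
    """Build output metadata that records the operation and its sources
    instead of copying the sources' metadata wholesale."""
    return {"processing": {"operation": operation, "sources": list(source_uuids)}}

def lookup_source_metadata(metadata: Dict[str, Any],
                           lookup: Callable[[str], Optional[Dict[str, Any]]]) -> List[Optional[Dict[str, Any]]]:
    """Follow the recorded source UUIDs back to their metadata; 'lookup'
    returns None for a missing source, the downside mentioned above."""
    sources = metadata.get("processing", {}).get("sources", [])
    return [lookup(source_uuid) for source_uuid in sources]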

Another approach would be to get rid of the concept of unstructured metadata completely and define more specific subsets that may be copied. We already do this to some extent with calibrations and data descriptions, and in the near future we may add coordinate systems. The first example might be a "description of processing". Another new example might be "original data acquisition information." This needs some thinking.
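
To illustrate, a hypothetical sketch of such structured, copyable subsets; the class and field names are illustrative only:

import dataclasses
import typing

@dataclasses.dataclass
class AcquisitionInfo:
    """'Original data acquisition information' that is well defined
    enough to be copied safely through processing."""
    detector: str
    high_tension_v: typing.Optional[float] = None
    exposure_s: typing.Optional[float] = None

@dataclasses.dataclass
class ProcessingStep:
    """One entry in a 'description of processing' appended by each operation."""
    operation: str
    parameters: typing.Dict[str, typing.Any] = dataclasses.field(default_factory=dict)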

@cmeyer (Collaborator) commented Apr 30, 2020

Before I close this issue, I'd like to understand your use case. What are you trying to accomplish by copying metadata?

@Brow71189 (Collaborator, Author)

I think there are a lot of processing routines where copying the metadata is valid. Think of things like a Gaussian blur (or any other filter); even aligning a sequence does not invalidate metadata.
Right now, if you process your data in Swift, you always lose all metadata.
If you then export or even just snapshot the result, you have no idea how the data was acquired, because all the information about it is simply gone.
There might be operations that invalidate metadata, but defaulting to dropping everything is the wrong approach in my opinion.

@cmeyer (Collaborator) commented May 4, 2020

I've considered the exporting case before, and my conclusion was that export is a special operation that should consolidate processing information and include metadata from sources. See nion-software/nionswift#397 and nion-software/nionswift#398.

If the user has a way to view sources and metadata of those sources directly in an expanded metadata editor and export has an option for including processing info and metadata from sources, would that satisfy your concerns?

@Brow71189 (Collaborator, Author)

It is better than nothing, but I would still prefer that functions which do not invalidate metadata simply copy it to their results.

@Brow71189 (Collaborator, Author)

New example where this behavior leads to an awkward implementation:

Consider a camera with the new acquire_synchronized capabilities. It returns partial data whose xdata attribute is the SI data, which usually still contains the flyback pixels. So the obvious thing to do would be to crop the data with the following code:

from nion.data import xdata_1_0 as xd

# get the next chunk of synchronized acquisition data from the camera
partial_data = camera.acquire_synchronized_continue(...)
# crop away the flyback pixels along the second axis
xdata = xd.data_slice(partial_data.xdata, (slice(None), slice(-2), ...))

This is great because it keeps the calibrations etc., and it is an API function.
But now we have to call a "private" function to get our metadata dict back:

xdata._set_metadata(partial_data.xdata.metadata)

Why does data_slice strip the metadata dict from the data? This makes absolutely no sense to me...
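
In the meantime, a small wrapper can hide the workaround; this is a hypothetical helper built on the private _set_metadata call from above, so it may break when that API changes:

from nion.data import xdata_1_0 as xd

def data_slice_keep_metadata(source_xdata, slices):
    """Crop like xd.data_slice, but carry the source metadata over."""
    result = xd.data_slice(source_xdata, slices)  # keeps calibrations
    result._set_metadata(source_xdata.metadata)  # private-API workaround
    return result

# usage, cropping the flyback pixels as above:
# xdata = data_slice_keep_metadata(partial_data.xdata, (slice(None), slice(-2), ...))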

@cmeyer (Collaborator) commented Feb 25, 2021

An additional rationale for this issue is described in the similar-to-niondata xarray project, where metadata is only copied when it is explicitly requested or when doing so is unambiguous.

xarray: What is your approach to metadata?
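
For comparison, a minimal illustration of that policy, assuming a recent xarray version (the exact defaults have varied across releases):

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims=("x",), attrs={"detector": "camera"})
print(da.mean().attrs)                 # {} -- attrs are dropped by default
print(da.mean(keep_attrs=True).attrs)  # {'detector': 'camera'} -- explicit opt-in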

@cmeyer (Collaborator) commented Jan 17, 2023

Additional notes 2021-07-21:

1:

  1. In general, metadata describes what we are looking at, how it was collected, and its calibrations. Obviously none of these things change during an align. An integrate, however, will affect the effective exposure time.
  2. Chris' comment, that an item in the metadata specific to "data acquired from a camera" means we should discard all information, is highly facetious. The proper way to handle this is to have a metadata tag "Origin" that gets set to the origin of the data after copying the metadata. I would also add a tag "Trail" that appends the current Origin to the Trail of the source, so we would see "Trail"="Camera ELA,Align,Integrate,Crop" for a basic workflow (see the sketch after this list).
  3. Losing calibrations is nuts, and it is not helped by the fact that the inspector cannot be used to copy them, since the inspector only shows a heavily rounded version of the calibration.
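
A minimal sketch of the proposed "Origin"/"Trail" tags; the key names follow point 2 above, while the helper itself is hypothetical:

import copy
from typing import Any, Dict

def propagate_metadata(source: Dict[str, Any], new_origin: str) -> Dict[str, Any]:
    """Copy the source metadata, set Origin to the current operation,
    and append that operation to the accumulated Trail."""
    result = copy.deepcopy(source)
    trail = result.get("Trail", "")
    result["Trail"] = f"{trail},{new_origin}" if trail else new_origin
    result["Origin"] = new_origin
    return result

# chaining the example from point 2:
# md = propagate_metadata({}, "Camera ELA")
# for op in ("Align", "Integrate", "Crop"):
#     md = propagate_metadata(md, op)
# md["Trail"] == "Camera ELA,Align,Integrate,Crop"  -> True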

2:

We would be hugely better off if we simply copied the metadata from the parent to the child. There is indeed a chance that this could lead to some confusion, but it would still be an improvement on the current situation, where confusion is guaranteed.

Generally speaking, a more robust method for handling metadata is needed. We need options for handling the complex cases, with sensible defaults for the most common operations (a sketch follows the use cases below).

Common use cases include:

  1. multi-acquire NxM, align, integrate: please do not lose the calibration information (scale or intensity), microscope kV, acquisition detector, etc.
    (this could be 10,000 spectra with a 1 ms acquisition time, or 1,000 images at 1 µs/pixel, or other)
  2. acquire a spectrum, subtract a dark image, multiply by a gain image: please do not lose the calibration information (scale or intensity), microscope kV, acquisition detector, etc.
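
One way to express those options and defaults, as a hypothetical sketch; the enum values and the operation names in the table are illustrative only:

import enum

class MetadataPolicy(enum.Enum):
    COPY = "copy"      # e.g. align, crop, filter: metadata stays valid
    MERGE = "merge"    # multi-input operations: merge as in idea 3 above
    ADJUST = "adjust"  # e.g. integrate: copy, then fix the effective exposure
    DROP = "drop"      # operations whose output metadata would be meaningless

# sensible defaults for the common operations discussed in this thread
DEFAULT_POLICIES = {
    "align": MetadataPolicy.COPY,
    "crop": MetadataPolicy.COPY,
    "gaussian_blur": MetadataPolicy.COPY,
    "integrate": MetadataPolicy.ADJUST,
    "subtract": MetadataPolicy.MERGE,
}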
