Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ordering of OriginalFile objects registered in saveFileset() #148

Merged
merged 1 commit into from
May 7, 2024

Conversation

sbesson
Copy link
Member

@sbesson sbesson commented Mar 22, 2024

Short summary

Initially reported in https://forum.image.sc/t/in-place-import-path-from-image-id-with-omero-database-incoherence-with-originalfile-filesetentry/81509 by @Tom-TBT, the link between FilesetEntry and OriginalFile objects created at import time can be incorrect under some conditions for multi-file filesets imports.

Technical description

During the import process, the server receives a Fileset object and a matching list of paths. The saveFileset() API first registers these paths as OriginalFile then associate these files to each fileset entry in order.

The current implementation of _internalRegister groups the input checked paths by parent using a ListMultimap and iterates through each group to create or retrieve the OriginalFile objects. Under some conditions, the returned list will be in a different order than the input list. This change was originally implemented in ome/openmicroscopy#3399.

This commit does not modify the mapping and creation logic but adds a final step to return a list of OriginalFile sorted in the same order as the input checked paths. This contract is also added to the Javadoc.

Testing

The issue should only affect multi-file filesets with multiple directories. Single-file filesets or multi-file filesets contained within a single directory should be unaffected. In the HCS domain, several formats will span multiple folders and can be used for testing. In the Digital Pathology domain, the VSI file format generated by Evident (formerly Olympus) cellSens software is a good candidate for testing.

Without this PR, importing a fileset in any of these formats using either OMERO.insight or the command-line, independently of the selected options, can result in incorrect links between FilesetEntry and OriginalFile objects. This can be visualized by looking the order of the Imported from and Paths on server lists in OMERO.web when inspecting the Fileset info e.g. in https://idr.openmicroscopy.org/webclient/?show=image-4995801

Screenshot 2024-03-22 at 10 37 49

The Paths on server list should be grouped by sub-directory in an non-deterministic order. This means 1- multiple imports of the same fileset might result in possibly different orders of the server paths 2- it is possible that an import will result in the server paths matching the client paths.

With this PR including, the same import workflow should result consistently in correct FilesetEntry / OriginalFile links. At the OMERO.web level, the order of the client paths and server paths lists should be consistent across all formats.

Next steps

This change only aims to fix the issue with new imports and won't address incorrect FilesetEntry/OriginaFile links for existing filesets. For the latter, the only viable approach will be to used an upgrade script that ideally takes advantage of the Fileset.clientPath entry, if this has not been modified. I will write up a separate issue discussing this upgrade script as well as wider cleanups and DB schema upgrade associated with this problem

During the import process, the server receives a Fileset object and a
matching list of paths. The saveFileset() API first registers these
paths as OriginalFile then associate these files to each fileset entry
in order.

The current implementation of _internalRegister groups the input checked
paths by parent using a ListMultimap and iterates through each group to
create or retrieve the OriginalFile objects. Under some conditions, the
returned list will be in a different order than the input list.

This commit does not modify the mapping and creation logic but adds a
final step to return a list of OriginalFile sorted in the same order
as the input checked paths. This contract is also added to the Javadoc.
@dominikl
Copy link
Member

Looks good to me 👍 On demo server (without PR) I got the different ordering ('imported from' vs 'path on server'), on merge-ci (with PR) the order was the same.

@jburel jburel merged commit 7feed13 into ome:master May 7, 2024
8 checks passed
@sbesson sbesson deleted the saveFileset_originalfile_ordering_bug branch July 11, 2024 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants