Fix ordering of OriginalFile objects registered in saveFileset() #148
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Short summary
Initially reported in https://forum.image.sc/t/in-place-import-path-from-image-id-with-omero-database-incoherence-with-originalfile-filesetentry/81509 by @Tom-TBT, the link between FilesetEntry and OriginalFile objects created at import time can be incorrect under some conditions for multi-file filesets imports.
Technical description
During the import process, the server receives a Fileset object and a matching list of paths. The saveFileset() API first registers these paths as OriginalFile then associate these files to each fileset entry in order.
The current implementation of _internalRegister groups the input checked paths by parent using a ListMultimap and iterates through each group to create or retrieve the OriginalFile objects. Under some conditions, the returned list will be in a different order than the input list. This change was originally implemented in ome/openmicroscopy#3399.
This commit does not modify the mapping and creation logic but adds a final step to return a list of OriginalFile sorted in the same order as the input checked paths. This contract is also added to the Javadoc.
Testing
The issue should only affect multi-file filesets with multiple directories. Single-file filesets or multi-file filesets contained within a single directory should be unaffected. In the HCS domain, several formats will span multiple folders and can be used for testing. In the Digital Pathology domain, the VSI file format generated by Evident (formerly Olympus) cellSens software is a good candidate for testing.
Without this PR, importing a fileset in any of these formats using either OMERO.insight or the command-line, independently of the selected options, can result in incorrect links between
FilesetEntry
andOriginalFile
objects. This can be visualized by looking the order of theImported from
andPaths on server
lists in OMERO.web when inspecting the Fileset info e.g. in https://idr.openmicroscopy.org/webclient/?show=image-4995801The
Paths on server
list should be grouped by sub-directory in an non-deterministic order. This means 1- multiple imports of the same fileset might result in possibly different orders of the server paths 2- it is possible that an import will result in the server paths matching the client paths.With this PR including, the same import workflow should result consistently in correct
FilesetEntry
/OriginalFile
links. At the OMERO.web level, the order of the client paths and server paths lists should be consistent across all formats.Next steps
This change only aims to fix the issue with new imports and won't address incorrect FilesetEntry/OriginaFile links for existing filesets. For the latter, the only viable approach will be to used an upgrade script that ideally takes advantage of the
Fileset.clientPath
entry, if this has not been modified. I will write up a separate issue discussing this upgrade script as well as wider cleanups and DB schema upgrade associated with this problem