Import fails for large CIF files #118

Open
lahr-ul opened this issue Jun 18, 2021 · 3 comments

@lahr-ul
lahr-ul commented Jun 18, 2021

We want to import CIF files of varying size into OMERO with the importer. This works seamlessly for small files (e.g. 16MB) but not for large files (e.g. 192MB). Importing a small file takes about 20 seconds, whereas importing a large file times out after several hours (user session and/or Hibernate session). The issue can be reproduced on a production system and on a local docker compose setup.

Here is an excerpt of the log:

2021-06-17 13:09:54,223 INFO  [    ome.formats.OMEROMetadataStoreClient] (1-thread-2) Handling # of references: 40000
2021-06-17 13:09:54,225 INFO  [    ome.security.basic.BasicEventContext] (.Server-78)  cctx:    group=3
2021-06-17 13:09:54,227 INFO  [         ome.security.basic.EventHandler] (.Server-78)  Auth:    user=53,group=3,event=null(User),sess=5b5cab00-7a3f-4883-8d85-925f8c12abb0
2021-06-17 13:09:54,240 INFO  [                 org.perf4j.TimingLogger] (.Server-78) start[1623935394224] time[15] tag[omero.call.success.ome.services.blitz.impl.MetadataStoreI$5.doWork]
2021-06-17 13:09:54,240 INFO  [    ome.formats.OMEROMetadataStoreClient] (1-thread-2) Starting referenceBatch #2

The log contains multiple "Starting referenceBatch #" statements. The importer hangs at "importing metadata" and times out after several hours.

We also tried changing the following configuration values, without success:

# Database 
omero.db.poolsize=100
# Memory
omero.jvmcfg.strategy=percent
omero.jvmcfg.max_system_memory=64000
omero.jvmcfg.percent.blitz=40
omero.jvmcfg.percent.indexer=10
omero.jvmcfg.percent.pixeldata=30
omero.jvmcfg.percent.repository=10
omero.jvmcfg.heap_size=8000
omero.pixeldata.threads=10
omero.threads.min_threads=10
@sbesson
Member

sbesson commented Jun 18, 2021

Hi @lahr-ul, we have been facing a similar issue while trying to load CIF files in the context of an IDR submission.

My suspicion is that the problem is related to the number of objects in the file, which can easily reach several tens of thousands in this cytometry format. Do you know how many individual images (Bio-Formats series) the file contains?

@lahr-ul
Author

lahr-ul commented Jun 18, 2021

About 70K images. Smaller files of about 60MB sometimes fail as well.

@sbesson
Member

sbesson commented Jun 23, 2021

Sorry for dropping the ball. Understood. The large number of images (>10K) is most likely the reason the metadata import hangs: a huge number of objects has to be inserted into the database (typically ~10 per image, so we are talking about on the order of 1M row insertions).
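
As a back-of-the-envelope check of that estimate, here is a minimal sketch. The figure of ~10 rows per image is the rough value quoted above, not a measured one, and `estimate_rows` is a hypothetical helper written only for illustration.

```python
def estimate_rows(n_images: int, rows_per_image: int = 10) -> int:
    """Rough number of database rows inserted for one import.

    rows_per_image is the approximate "~10 / image" figure from the
    comment above; the real value depends on the metadata in the file.
    """
    return n_images * rows_per_image


# Image counts mentioned in this thread: >10K, about 70K, up to 100K.
for n in (10_000, 70_000, 100_000):
    print(f"{n:>7,} images -> ~{estimate_rows(n):,} rows")
```

For the roughly 70K images reported here, this gives several hundred thousand rows, which is consistent with the "about 1M" order of magnitude.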

We dealt with very similar scalability issues for high-content screening datasets, which have a similar number of images, in the 1-100K range. The database bottlenecks there were mitigated by a series of optimizations such as collapsing some of the elements, e.g. ome/openmicroscopy#3261.

The only immediate workaround I can think of is to export the CIF series as individual images, e.g. using bfconvert or bioformats2raw, and then import those images individually. To import these filesets natively, I suspect we first need to identify which elements are duplicated and, where possible, reduce them.
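
The bfconvert variant of this workaround could be scripted roughly as follows. This is a sketch under stated assumptions rather than a tested recipe: it assumes the Bio-Formats command line tools are installed and `bfconvert` is on the PATH, it relies on the `%s` (series index) placeholder in the bfconvert output name to write one file per series, and the function name, file names and OME-TIFF output format are hypothetical choices made for illustration.

```python
import shlex
from pathlib import Path


def bfconvert_split_command(cif_path, out_dir, ext="ome.tiff"):
    """Build a bfconvert command that writes one output file per series.

    Assumption: bfconvert expands "%s" in the output name to the series
    index, so each series of the CIF file ends up in its own file.
    """
    src = Path(cif_path)
    pattern = Path(out_dir) / f"{src.stem}_s%s.{ext}"
    return ["bfconvert", str(src), str(pattern)]


# Hypothetical input file and output directory.
cmd = bfconvert_split_command("sample.cif", "split")
print(shlex.join(cmd))

# To actually run the conversion (output directory must exist):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Note that a file with about 70K series would produce about 70K output files, so the subsequent import step would probably need to be batched.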
