Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of cleanup in ProcessContainer #141

Open
chris-allan opened this issue Jul 18, 2023 · 4 comments
Open

Lack of cleanup in ProcessContainer #141

chris-allan opened this issue Jul 18, 2023 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@chris-allan
Copy link
Member

chris-allan commented Jul 18, 2023

While debugging a series of out of memory exceptions on one particular installation during a very large (over 100,000) number of imports Glencoe has noticed that the number of ome.services.blitz.repo.ManagedImportProcessI instances on the heap never decreases. The longer a server is running and the more complex the imports, the more of the available heap these objects consume. If left long enough, this will make a server unusable. As currently implemented this is due to ProcessContainer.removeProcess() not being called.

A lot of this code is quite old. The ProcessContainer itself was added as part of the initial work on OMERO 5.0.0 in ome/openmicroscopy#577 back in January, 2013. It's remained largely unchanged since. The intent for most of this work is outlined in the OMERO.fs documentation:

A few of these things never really got implemented as originally intended. In particular, the Process.ping(), Process.shutdown(), ProcessContainer.pingAll() and ProcessContainer.shutdownAll() methods are either not called or throw exceptions signifying they are not yet implemented. It's also not really clear if currently the ProcessContainer serves any other purpose beyond satisfying ManagedRepository.listImports(). While this method is client API accessible I'm unable to find any use of it by the OMERO CLI tooling, OMERO.web or OMERO.insight.

There is definitely more investigation and discussion to be had on whether it is valuable to implement the original intent of ProcessContainer either in full or in part. The community now has over 10 years of experience with OMERO.fs to draw on and lots of things have been added to the OMERO server import infrastructure during that time.

While this happens, Glencoe has started a repository to work on an OMERO server plugin which can act as a steward for ProcessContainer. Hopefully, this will help people out who currently have problems and support the discussion on where to go from here:

/cc @jburel, @joshmoore, @kkoz, @sbesson

@chris-allan
Copy link
Member Author

You can check how many ManagedImportProcessI instances are in flight on your OMERO server system by executing:

jmap -histo:live <omero_blitz_pid> &> histo.out
grep 'ManagedImportProcess' histo.out

This will give output like this:

  85:          2362         188960  ome.services.blitz.repo.ManagedImportProcessI

Where the first column is the object ranking by bytes consumed, the second is the number of instances, and the third is the number of bytes consumed.

@sbesson sbesson added the bug Something isn't working label Jul 18, 2023
@joshmoore
Copy link
Member

Wow. "bug" feels like an understatement. At a glance, I would assume the change most inline with the original intent would be to pass the processes field into the ManagedImportProcessI instance in order to remove the instance during cleaning, somewhere here:

    public void closeCalled(int idx) {
        final UploadState state = uploaders.getIfPresent(idx);
        if (state == null) {
            log.warn(String.format("closeCalled(%s) - no such object", idx));
        } else {
            uploaders.invalidate(idx);
            log.debug(String.format("closeCalled(%s) successfully", idx));
        }
    }

    //
    // CLOSE LOGIC
    //

    @Override
    protected void preClose(Current current) throws Throwable {
        // no-op
    }

    @Override
    protected void postClose(Current current) {
        // no-op
    }

@chris-allan
Copy link
Member Author

The first pass of a cleanup plugin for OMERO servers has now been released (0.1.0):

@sbesson sbesson self-assigned this Aug 14, 2023
@imagesc-bot
Copy link

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/first-release-of-omero-process-container-steward/85067/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: No status
Development

No branches or pull requests

4 participants