Deadlock during ORB shutdown #126

Open
okummer opened this issue Dec 2, 2021 · 4 comments

@okummer
Contributor

okummer commented Dec 2, 2021

I found the following two threads in a server with stuck JMX calls:

   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.jmxRegistrationDebug(ManagedObjectManagerImpl.java:1225)
        - waiting to lock <0x00000000e3cf8858> (a org.glassfish.gmbal.impl.ManagedObjectManagerImpl)
        at org.glassfish.gmbal.impl.MBeanImpl.unregister(MBeanImpl.java:315)
        - locked <0x00000000e3df4ab8> (a org.glassfish.gmbal.impl.MBeanImpl)
        at org.glassfish.gmbal.impl.JMXRegistrationManager.unregister(JMXRegistrationManager.java:201)
        - locked <0x00000000e3f2e628> (a java.lang.Object)
        at org.glassfish.gmbal.impl.MBeanTree.unregister(MBeanTree.java:383)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.MBeanTree.unregister(MBeanTree.java:378)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.MBeanTree.clear(MBeanTree.java:419)
        - locked <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.init(ManagedObjectManagerImpl.java:322)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.close(ManagedObjectManagerImpl.java:344)
        at com.sun.corba.ee.impl.orb.ORBImpl.destroy(ORBImpl.java:1516)
        at ...

   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.glassfish.gmbal.impl.MBeanTree.getMBeanImpl(MBeanTree.java:413)
        - waiting to lock <0x00000000e3cf88a0> (a org.glassfish.gmbal.impl.MBeanTree)
        at org.glassfish.gmbal.impl.ManagedObjectManagerImpl.getFacetAccessor(ManagedObjectManagerImpl.java:746)
        - locked <0x00000000e3cf8858> (a org.glassfish.gmbal.impl.ManagedObjectManagerImpl)
        at org.glassfish.gmbal.impl.TypeConverterImpl$3.toManagedEntity(TypeConverterImpl.java:435)
        at org.glassfish.gmbal.impl.TypeConverterImpl$TypeConverterListBase.toManagedEntity(TypeConverterImpl.java:900)
        at org.glassfish.gmbal.impl.AttributeDescriptor.get(AttributeDescriptor.java:110)
        at org.glassfish.gmbal.impl.TypeConverterImpl$3.toManagedEntity(TypeConverterImpl.java:436)
        at org.glassfish.gmbal.impl.AttributeDescriptor.get(AttributeDescriptor.java:110)
        at org.glassfish.gmbal.impl.MBeanSkeleton.getAttribute(MBeanSkeleton.java:526)
        at org.glassfish.gmbal.impl.MBeanSkeleton.getAttributes(MBeanSkeleton.java:572)
        at org.glassfish.gmbal.impl.MBeanImpl.getAttributes(MBeanImpl.java:362)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes([email protected]/Unknown Source)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes([email protected]/Unknown Source)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation([email protected]/Unknown Source)
        at ...

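The two threads are trying to obtain locks on MBeanTree and ManagedObjectManagerImpl in an inconsistent order, leading to a deadlock: the shutdown thread holds MBeanTree and waits for ManagedObjectManagerImpl, while the JMX thread holds ManagedObjectManagerImpl and waits for MBeanTree. This prevents the ORB (and hence the server) from shutting down.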
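To illustrate the pattern outside of gmbal, here is a minimal sketch of the same lock-order inversion. It does not use any gmbal code: the two plain `Object` monitors and the class name `LockOrderDemo` are hypothetical stand-ins for ManagedObjectManagerImpl and MBeanTree, and the latch only exists to force the unlucky interleaving that is rare in production.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order and
     * returns true once the JVM reports both of them as deadlocked.
     */
    public static boolean opposingOrderDeadlocks() throws InterruptedException {
        // Hypothetical stand-ins for the two monitors seen in the thread dump.
        final Object manager = new Object(); // plays ManagedObjectManagerImpl
        final Object tree = new Object();    // plays MBeanTree
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the shutdown path: holds the tree, then wants the manager.
        Thread shutdown = new Thread(() -> {
            synchronized (tree) {
                awaitOther(bothHoldFirstLock);
                synchronized (manager) {
                    // never reached once the other thread holds 'manager'
                }
            }
        }, "shutdown");

        // Like the JMX getAttributes path: holds the manager, then wants the tree.
        Thread jmx = new Thread(() -> {
            synchronized (manager) {
                awaitOther(bothHoldFirstLock);
                synchronized (tree) {
                    // never reached once the other thread holds 'tree'
                }
            }
        }, "jmx-getAttributes");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        shutdown.setDaemon(true);
        jmx.setDaemon(true);
        shutdown.start();
        jmx.start();

        // Poll the JVM's own deadlock detector for up to about five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null && contains(ids, shutdown.getId()) && contains(ids, jmx.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    // Signals that this thread holds its first lock and waits for the other one.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + opposingOrderDeadlocks());
    }
}
```

In the real server the window is much narrower, which would be consistent with the problem only showing up occasionally.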

@arjantijms
Contributor

Thanks for the report! Which server did this concern, and what version of it?

@okummer
Contributor Author

okummer commented Dec 2, 2021

This report applies to Glassfish 4.2.2. The server is a custom Java application that uses CORBA for outgoing connections and that is monitored over JMX. The other end of the CORBA connection also runs on Glassfish 4.2.2.

@arjantijms
Contributor

I guess it's not easy to try to reproduce this on the current version of GlassFish?

The code for GlassFish 4.x wasn't transferred to Eclipse, and GlassFish 4.x is essentially unsupported.

@okummer
Contributor Author

okummer commented Dec 3, 2021

This is probably a rare bug, which we observed once during thousands or tens of thousands of shutdowns. I have little hope that I can reproduce it under controlled conditions.

But looking into the code, I see that the affected classes actually stem from https://github.com/eclipse-ee4j/orb-gmbal and not from this repo. Should I recreate my issue there?

Over there, the code on the main branch and the line numbers have not changed since 4.0.0 (the release used by 4.2.2 of the ORB). There is still the pattern that a thread synchronized on ManagedObjectManagerImpl may want to synchronize on MBeanTree and that a thread synchronized on MBeanTree may want to synchronize on ManagedObjectManagerImpl.

In my specific case, the access to org.glassfish.gmbal.impl.ManagedObjectManagerImpl#jmxRegistrationDebugFlag in jmxRegistrationDebug() would not have to be synchronized. It would be sufficient to make the field jmxRegistrationDebugFlag volatile to enforce correct concurrency semantics. This would break the cycle, because the thread that holds MBeanTree would no longer need the ManagedObjectManagerImpl monitor.
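A rough sketch of what I mean, written as a standalone class rather than a patch: `ManagerSketch` and the setter are hypothetical, and the method shapes are my assumption, not copied from gmbal. The point is only that a volatile read does not need the object's monitor, so it completes even while another thread is synchronized on the manager.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ManagerSketch {
    // Hypothetical stand-in for ManagedObjectManagerImpl#jmxRegistrationDebugFlag.
    // 'volatile' guarantees visibility of writes without taking any monitor.
    private volatile boolean jmxRegistrationDebugFlag = false;

    // Hypothetical setter, only here so the sketch can change the flag.
    public void setJmxRegistrationDebug(boolean value) {
        jmxRegistrationDebugFlag = value;
    }

    // Deliberately NOT synchronized: the read never waits for the manager's monitor.
    public boolean jmxRegistrationDebug() {
        return jmxRegistrationDebugFlag;
    }

    /**
     * Holds the manager's monitor on the calling thread and checks that a
     * second thread can still read the flag within two seconds.
     */
    public static boolean readCompletesWhileMonitorHeld() throws InterruptedException {
        ManagerSketch manager = new ManagerSketch();
        manager.setJmxRegistrationDebug(true);

        AtomicBoolean observed = new AtomicBoolean(false);
        Thread reader = new Thread(() -> observed.set(manager.jmxRegistrationDebug()), "reader");
        reader.setDaemon(true);

        synchronized (manager) {
            reader.start();
            // With a synchronized getter the reader would still be blocked here.
            reader.join(2000);
        }
        return !reader.isAlive() && observed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("read completed while monitor held: " + readCompletesWhileMonitorHeld());
    }
}
```

If jmxRegistrationDebug() were declared synchronized instead, the reader in this sketch would stay blocked until the monitor is released, which is the same wait the shutdown thread is stuck in above.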

There might be other cycles, but those that I could find immediately are harmless. org.glassfish.gmbal.impl.MBeanTree#setRoot calls org.glassfish.gmbal.impl.ManagedObjectManagerImpl#constructMBean, but only while it is already synchronized on ManagedObjectManagerImpl, so that's fine. MBeanImpl makes no other direct calls to ManagedObjectManagerImpl that I can find. Calls through the MBeanSkeleton might be problematic because it holds a reference back to the ManagedObjectManagerInternal, but this reference is probably only used in the analyze phase and not when responding to the MBeanImpl.

Long story short: It might well be that removing the synchronization for jmxRegistrationDebug() actually breaks the loop.
