Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock - Polling Transport #22

Closed
cbdwan opened this issue Jul 7, 2020 · 6 comments
Closed

Deadlock - Polling Transport #22

cbdwan opened this issue Jul 7, 2020 · 6 comments

Comments

@cbdwan
Copy link

cbdwan commented Jul 7, 2020

New to github projects, but I think I have found a deadlock in code, that goes back several versions.

The socketTimeout thread is not involved, but I have two threads one writing a message to the session and one servicing a http request.

The thread (PlatMgrDelegate-thread-3) sending the message has gotten the lock on the mWriteBuffer in the EngineIOSocket and is blocked on the send call on the mTransport object which is synchronized.

The thread (https-jsse-nio2-0.0.0.0-2921-exec-131) servicing the http request is inside the PollingTransport owning the lock on it from the synchronized onRequest method, but if fires of a "drain" event that results in the EngineIOSocket.flush being called that attempts to lock the mWriteBuffer which is owned by the PlatMgrDelegate-thread-3 thread.

I was thinking that rather than synchronize on the mWriteBuffer in EngineIOSocket it should probably synchronize on the mTransport in the sendPacket and flush methods, but wanted to get that reviewed by the guys on this project so see if that was a good idea?

Found one Java-level deadlock:

"engineIo-socketTimeout-pool-1":
waiting to lock monitor 0x00007f80e8034b38 (object 0x00000000c2642e68, a io.socket.engineio.server.transport.Polling),
which is held by "https-jsse-nio2-0.0.0.0-2921-exec-131"
"https-jsse-nio2-0.0.0.0-2921-exec-131":
waiting to lock monitor 0x00007f8118041948 (object 0x00000000c2642e48, a java.util.LinkedList),
which is held by "PlatMgrDelegate-thread-3"
"PlatMgrDelegate-thread-3":
waiting to lock monitor 0x00007f80e8034b38 (object 0x00000000c2642e68, a io.socket.engineio.server.transport.Polling),
which is held by "https-jsse-nio2-0.0.0.0-2921-exec-131"

Java stack information for the threads listed above:

"engineIo-socketTimeout-pool-1":
at io.socket.engineio.server.transport.Polling.doClose(Polling.java:138)
- waiting to lock <0x00000000c2642e68> (a io.socket.engineio.server.transport.Polling)
at io.socket.engineio.server.Transport.close(Transport.java:65)
at io.socket.engineio.server.EngineIoSocket.clearTransport(EngineIoSocket.java:228)
at io.socket.engineio.server.EngineIoSocket.onClose(EngineIoSocket.java:261)
at io.socket.engineio.server.EngineIoSocket.lambda$new$0(EngineIoSocket.java:35)
at io.socket.engineio.server.EngineIoSocket$$Lambda$608/303505070.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"https-jsse-nio2-0.0.0.0-2921-exec-131":
at io.socket.engineio.server.EngineIoSocket.flush(EngineIoSocket.java:307)
- waiting to lock <0x00000000c2642e48> (a java.util.LinkedList)
at io.socket.engineio.server.EngineIoSocket.lambda$setTransport$10(EngineIoSocket.java:209)
at io.socket.engineio.server.EngineIoSocket$$Lambda$612/987190445.call(Unknown Source)
at io.socket.emitter.Emitter.emit(Emitter.java:117)
at io.socket.engineio.server.transport.Polling.onPollRequest(Polling.java:215)
at io.socket.engineio.server.transport.Polling.onRequest(Polling.java:55)
- locked <0x00000000c2642e68> (a io.socket.engineio.server.transport.Polling)
at io.socket.engineio.server.EngineIoSocket.onRequest(EngineIoSocket.java:121)
at io.socket.engineio.server.EngineIoServer.handleRequest(EngineIoServer.java:126)
at com.bcdfx.engineio.nio.EngineIOServlet.service(EngineIOServlet.java:30)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.filters.HttpHeaderSecurityFilter.doFilter(HttpHeaderSecurityFilter.java:126)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.filters.CorsFilter.handleNonCORS(CorsFilter.java:364)
at org.apache.catalina.filters.CorsFilter.doFilter(CorsFilter.java:170)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:668)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:690)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:615)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:818)
at org.apache.tomcat.util.net.Nio2Endpoint$SocketProcessor.doRun(Nio2Endpoint.java:1621)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
- locked <0x00000000c5115058> (a org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper)
at org.apache.tomcat.util.net.AbstractEndpoint.processSocket(AbstractEndpoint.java:1082)
at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$2.completed(Nio2Endpoint.java:566)
at org.apache.tomcat.util.net.Nio2Endpoint$Nio2SocketWrapper$2.completed(Nio2Endpoint.java:544)
at org.apache.tomcat.util.net.SecureNio2Channel$1.completed(SecureNio2Channel.java:969)
at org.apache.tomcat.util.net.SecureNio2Channel$1.completed(SecureNio2Channel.java:898)
at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
at sun.nio.ch.Invoker$2.run(Invoker.java:218)
at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
"PlatMgrDelegate-thread-3":
at io.socket.engineio.server.transport.Polling.send(Polling.java:69)
- waiting to lock <0x00000000c2642e68> (a io.socket.engineio.server.transport.Polling)
at io.socket.engineio.server.EngineIoSocket.flush(EngineIoSocket.java:308)
- locked <0x00000000c2642e48> (a java.util.LinkedList)
at io.socket.engineio.server.EngineIoSocket.sendPacket(EngineIoSocket.java:298)
at io.socket.engineio.server.EngineIoSocket.send(EngineIoSocket.java:84)
at com.bcdfx.engineio.nio.DataBridge.onIncomingServerPacket(DataBridge.java:147)
- locked <0x00000000c8152c50> (a com.bcdfx.engineio.nio.DataBridge)
at com.bcdfx.engineio.nio.DataBridge.access$100(DataBridge.java:25)
at com.bcdfx.engineio.nio.DataBridge$2.call(DataBridge.java:52)
at io.socket.emitter.Emitter.emit(Emitter.java:117)
at com.bcdfx.engineio.nio.PlatformManager$MessageRunnable.run(PlatformManager.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

@trinopoty
Copy link
Collaborator

We cannot synchronize on mTransport because it changes when transport is upgraded from HTTP to WebSocket.
I'll look into this deadlock issue in more detail.

@trinopoty
Copy link
Collaborator

This case seems pretty difficult to reproduce. As far as the code goes, it should not deadlock like this.
Can you provide a sample case where this deadlock occurs.

@cbdwan
Copy link
Author

cbdwan commented Jul 7, 2020

Hi Trinopoty
I will try and extract a simple sample case but with a debugger if the put a breakpoint on the "emit("drain")" in the Polling.onPollRequest method and pause one of the polling requests, then have a thread send a packet, and then release the breakpoint you will get the problem. It needs to happen for a connection before it upgrades to a websocket or that was unable to upgrade to a websocket and so is running in polling mode. I was doing a soak test with many connections, while publishing several events to the endpoints.

@cbdwan
Copy link
Author

cbdwan commented Jul 8, 2020

Hi Trinopoty
Attached is a unit test that will simulate the problem, it starts two threads one sending a packet and one polling the socket. It should exit with the deadlock after a few seconds. Just drop it in the engine.io-server-java-master\engine.io-server\src\test\java\io\socket\engineio\server directory and it should run.

Output look like:

Sending packet
Deadlock found
"Thread-2" Id=13 BLOCKED on java.util.LinkedList@5f9edf14 owned by "Thread-1" Id=12
io.socket.engineio.server.EngineIoSocket.flush(EngineIoSocket.java:302)
io.socket.engineio.server.EngineIoSocket.lambda$10(EngineIoSocket.java:207)
io.socket.engineio.server.EngineIoSocket$$Lambda$5/1982791261.call(Unknown Source)
io.socket.emitter.Emitter.emit(Emitter.java:117)
io.socket.engineio.server.transport.Polling.onPollRequest(Polling.java:215)
io.socket.engineio.server.transport.Polling.onRequest(Polling.java:55)
io.socket.engineio.server.EngineIoSocket.onRequest(EngineIoSocket.java:119)
io.socket.engineio.server.DeadLockTest$3.run(DeadLockTest.java:54)
"Thread-1" Id=12 BLOCKED on io.socket.engineio.server.transport.Polling@68746f22 owned by "Thread-2" Id=13
io.socket.engineio.server.transport.Polling.send(Polling.java:69)
io.socket.engineio.server.EngineIoSocket.flush(EngineIoSocket.java:305)
io.socket.engineio.server.EngineIoSocket.sendPacket(EngineIoSocket.java:296)
io.socket.engineio.server.EngineIoSocket.send(EngineIoSocket.java:82)
io.socket.engineio.server.DeadLockTest$2.run(DeadLockTest.java:42)

DeadLockTest.zip

@trinopoty
Copy link
Collaborator

I haven't had the time to look into this issue yet. I'll probably assess this issue this week.

trinopoty added a commit to trinopoty/engine.io-server-java that referenced this issue Jul 21, 2020
@trinopoty
Copy link
Collaborator

This issue has been fixed and I've also integrated your test into the suite. Look out for a new version on maven in a few hours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants