Under moderate load, JVM eventually spins indefinitely when this shim is in use #7
Comments
Mm, looks like a bunch of the threads are waiting after …
After leaving the JVM like this for a few hours, the heap eventually grows slowly and starts hitting OOMs. Got a heap dump when it happened, so I'll see what I can find.
Reproduced behaviour with …
Second dump, same proc, a few minutes later:
Thread dump with async UI fetching code enabled:
New thread dump on astyanax 3.9.0, and pool connections dropped to 2 per host. |
Here's something curious I've found in MAT: a bunch of the threads are parked, waiting for a single lock. That lock is being held by a …
Confirmed that the …
Cassandra yaml in use: https://gist.github.com/alienth/34d448e7525b5c2ded3471f7769c6841 (Note that the …)
Keyspace stats: …
Deets of setup: 2-node cluster, RF of 1. Dell servers with 32 GB of RAM, 16-core E5640. Disks are Dell SAS drives.
Solved this by customizing the astyanax threadpool to have 64 workers and bumping the connection pool's maxconns per host to 20.
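For reference, here's roughly what that kind of Astyanax setup looks like. This is a minimal sketch, not the shim's actual wiring: the cluster/keyspace/pool names and seed list are placeholders, and using `setAsyncExecutor` for the 64-worker pool is my assumption about where that setting plugs in.

```java
import java.util.concurrent.Executors;

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class TsdbContextFactory {

    public static AstyanaxContext<Keyspace> build() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("tsdb")        // placeholder cluster name
            .forKeyspace("tsdb")       // placeholder keyspace name
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                // Larger worker pool so async callbacks don't starve under load
                // (assumes the shim routes async work through this executor).
                .setAsyncExecutor(Executors.newFixedThreadPool(64)))
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("tsdb-pool")
                .setPort(9160)
                .setSeeds("cass1:9160,cass2:9160")  // placeholder seed list
                // Was small enough to starve under load; bumped to 20 per host.
                .setMaxConnsPerHost(20))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context;
    }
}
```

More workers plus more connections per host is consistent with the pool-starvation theory above: the goal is that no request ever has to wait forever for a worker or a connection that will never be freed.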
Addressed in #9. |
After writing a few hundred thousand datapoints, the JVM inevitably eats up all CPU and sits there not doing much. Even if you stop writing datapoints to the service, the process will sit there indefinitely eating up CPU. The JVM becomes unresponsive to sleuthing via JMX or jvmtop. Tried both OpenJDK and the Oracle JRE, both 1.8, and got the same behaviour. Very reproducible: jam hundreds of thousands of datapoints at it for ~10 or so minutes.
When the proc enters this state, no communication is happening with Cassandra.
All of the metrics and tagk/tagvs I'm sending already have IDs, so I don't believe it is an issue with the shim's pseudo row lock method to acquire those.
I was able to get two thread dumps while a proc was in this state. These dumps are on the same PID, separated by a few minutes.
threads5.txt
threads6.txt
Some type of soft resource deadlock, perhaps?
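One quick way to rule a hard lock cycle in or out: ask the JVM's ThreadMXBean for deadlocked threads. A true monitor/synchronizer deadlock gets reported; a "soft" resource deadlock of the kind suspected here (every worker parked waiting on a pool that never frees up) comes back empty, which is itself a useful data point. A minimal sketch; since JMX was unresponsive it would have to run in-process, and the class name and output format below are mine:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Reports threads in a cycle waiting on object monitors or ownable synchronizers.
        long[] ids = mx.findDeadlockedThreads();
        if (ids == null) {
            // A pool-starvation ("soft") deadlock lands here: threads are merely
            // parked waiting for a connection, not cyclically blocked on locks.
            System.out.println("No monitor/synchronizer deadlock detected");
            return;
        }
        // Dump full info, including held monitors and synchronizers, for each thread.
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            System.out.println(info);
        }
    }
}
```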