Under moderate load, JVM eventually spins indefinitely when this shim is in use #7

Open
alienth opened this issue Mar 13, 2017 · 16 comments

Comments

alienth commented Mar 13, 2017

After writing a few hundred thousand datapoints, the JVM inevitably eats up all CPU and sits there not doing much. Even if you stop writing datapoints to the service, the process will sit there indefinitely eating up CPU. The JVM becomes unresponsive to sleuthing via JMX or jvmtop. I tried both OpenJDK and the Oracle JRE, both 1.8, and got the same behaviour. Very reproducible: jam hundreds of thousands of datapoints at it for roughly 10 minutes.

When the proc enters this state, no communication is happening with Cassandra.

All of the metrics and tagk/tagvs I'm sending already have IDs, so I don't believe it is an issue with the shim's pseudo row lock method to acquire those.

I was able to get two thread dumps while a proc was in this state. These dumps are on the same PID, separated by a few minutes.

threads5.txt
threads6.txt

Some type of soft resource deadlock, perhaps?
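
For reference, the kind of reproduction described above can be scripted against OpenTSDB's telnet put interface. A minimal sketch of such a load generator (the metric/tag names here are made up, and the default port 4242 is assumed):

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Minimal load generator: jams datapoints at OpenTSDB's telnet "put"
 * interface. Metric/tag names and the host/port are placeholders.
 */
public class PutFlood {
    public static void main(String[] args) throws Exception {
        try (Socket sock = new Socket("localhost", 4242);
             BufferedWriter out = new BufferedWriter(
                     new OutputStreamWriter(sock.getOutputStream(), StandardCharsets.UTF_8))) {
            long now = System.currentTimeMillis() / 1000;
            for (int i = 0; i < 500_000; i++) {
                // One datapoint per line: put <metric> <timestamp> <value> <tag=value>
                out.write("put test.metric " + (now + i) + " " + (i % 100)
                        + " host=web" + (i % 50) + "\n");
                if (i % 10_000 == 0) {
                    out.flush();
                }
            }
            out.flush();
        }
    }
}
```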

alienth changed the title from "Under moderate load, JVM eventually stops responding when this shim is in use" to "Under moderate load, JVM eventually spins indefinitely when this shim is in use" on Mar 13, 2017
alienth commented Mar 13, 2017

Mm, looks like a bunch of the threads are waiting after getOrCreateId. Since all of my data already has IDs, perhaps this indicates there is some issue with fetching IDs? :/ The table seems fine, though. And as I mentioned, while the proc is spinning there is no communication with Cassandra happening.

alienth commented Mar 14, 2017

After leaving the JVM like this for a few hours, the heap eventually grows until it starts hitting OOMs. I got a heap dump when it happened, so I'll see what I can find.

alienth commented Mar 14, 2017

Hmmmm
[image attachment]

alienth commented Mar 14, 2017

All of those nodes contain the following:

[image attachment]

alienth commented Mar 14, 2017

[image attachment]

alienth commented Mar 14, 2017

Reproduced the behaviour with auto_create_metrics set to false. Now all of the threads that are WAITING contain a getId frame, kinda as expected.

alienth commented Mar 14, 2017

threads.txt

Second dump, same proc, a few minutes later:

threads.txt

alienth commented Mar 14, 2017

Thread dump with async UI fetching code enabled:

threads.txt

alienth commented Mar 14, 2017

Did a tcpdump of the chatter with Cassandra up to the failure. This is a bit curious:

[image attachment]

alienth commented Mar 15, 2017

New thread dump on Astyanax 3.9.0, with pool connections dropped to 2 per host.
threads.txt

alienth commented Mar 15, 2017

Here's something curious I've found in MAT: a bunch of the threads are parked waiting on a single lock. That lock is held by a ThreadPoolExecutor owned by AstyanaxConfigurationImpl. The weird thing is that the executor has a completedTaskCount of 0.

alienth commented Mar 15, 2017

Confirmed that the ThreadPoolExecutor which is shared by both the AstyanaxConfigurationImpl and the ThriftKeyspaceImpl is at maximum capacity, and has never completed a single task (in the heap dump I have).
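
To illustrate the suspected failure mode in isolation (a toy sketch, not the actual Astyanax or shim code path): if every worker in a small shared executor blocks waiting on follow-up work submitted to the same executor, the pool saturates and nothing ever completes, which would match a completedTaskCount of 0:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Toy illustration of thread-pool starvation deadlock: a small shared
 * executor whose workers all block on follow-up tasks submitted to the
 * same pool. Not Astyanax code, just the general pattern.
 */
public class PoolStarvation {
    public static void main(String[] args) {
        // Small fixed pool standing in for the shared executor.
        ExecutorService pool = Executors.newFixedThreadPool(5);

        for (int i = 0; i < 5; i++) {
            pool.submit(() -> {
                // Every worker submits follow-up work to the same pool...
                Future<String> inner = pool.submit(() -> "result");
                // ...and blocks waiting for it. Once all workers are stuck
                // here, the queued inner tasks can never run and the pool's
                // completedTaskCount stays at 0.
                return inner.get();
            });
        }
    }
}
```

Whether this is exactly what is happening inside Astyanax here is unconfirmed; the sketch just shows how a saturated shared pool can wedge with zero completed tasks.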

alienth commented Mar 15, 2017

Cassandra yaml in use: https://gist.github.com/alienth/34d448e7525b5c2ded3471f7769c6841
Table definitions: https://gist.github.com/alienth/3e6493598e0c37824417b4ffd53843ef

(Note that the WITH COMPACT STORAGE line is what makes them visible to Thrift clients).

Keyspace stats:

Keyspace : tsdbuid
        Read Count: 3078259
        Read Latency: 0.0274613627378333 ms.
        Write Count: 118
        Write Latency: 0.02597457627118644 ms.

Keyspace : tsdb
        Read Count: 9
        Read Latency: 0.2872222222222222 ms.
        Write Count: 653899
        Write Latency: 0.01409285837721116 ms.

Details of the setup: 2-node cluster, RF of 1. Dell servers with 32 GB of RAM and 16-core E5640 CPUs. Disks are Dell SAS drives.

alienth commented Mar 15, 2017

threads.txt

alienth commented Mar 15, 2017

Solved this by customizing the Astyanax thread pool to have 64 workers, and bumping the connection pool's max connections per host to 20.
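
A rough sketch of what that tuning could look like. The 64 workers and the 20 max connections per host come from the comment above; the setAsyncExecutor(...) call and the "tsdb-pool" name are assumptions about the Astyanax version in use, not confirmed API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;

/**
 * Sketch of the workaround: a bigger async executor and a higher
 * per-host connection limit.
 */
public class PoolTuning {
    public static void main(String[] args) {
        // 64 workers instead of Astyanax's small default async executor.
        ExecutorService astyanaxWorkers = Executors.newFixedThreadPool(64);

        AstyanaxConfigurationImpl config = new AstyanaxConfigurationImpl();
        // NOTE: setAsyncExecutor(...) is assumed to exist on this Astyanax
        // version; if it doesn't, the executor has to be wired in wherever
        // the shim builds its AstyanaxContext.
        config.setAsyncExecutor(astyanaxWorkers);

        // Raise max connections per host to 20 ("tsdb-pool" is a placeholder name).
        ConnectionPoolConfigurationImpl poolConfig =
                new ConnectionPoolConfigurationImpl("tsdb-pool")
                        .setMaxConnsPerHost(20);

        // config and poolConfig would then be passed to the AstyanaxContext builder.
    }
}
```

The actual wiring lives wherever the shim builds its AstyanaxContext, so adjust accordingly.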

alienth commented Mar 16, 2017

Addressed in #9.
