Memory leak in Solr connections leading to Tomcat crash #14

Open
jordanpadams opened this issue Oct 19, 2023 · 4 comments · Fixed by #16
Assignees
Labels
B14.1 B15.1 bug Something isn't working i&t.skip Skip I&T of this task/ticket s.high High severity sprint-backlog

Comments

@jordanpadams
Member

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When we got bombarded by OpenAI crawling, Tomcat spun up too many threads without releasing them and crashed daily.

🕵️ Expected behavior

I expected we could take the traffic.

📜 To Reproduce

See SA logs and convo on Slack.

See these errors in catalina.out

"Connection evictor" #762 daemon prio=5 os_prio=0 tid=0x00007fc2700c9000 nid=0xd79b waiting on condition [0x00007fc248002000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
	at java.lang.Thread.run(Thread.java:748)

When ds-view is taken offline, Tomcat explicitly reports connection evictor issues for [ds-view], indicating a memory leak: new threads are spawned and never closed.
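To illustrate the kind of leak the thread dump points at, here is a minimal sketch using only the JDK. `EvictingClient` is a hypothetical stand-in (not SolrJ or Apache HttpClient code) for any client that starts a background "Connection evictor" thread when it is created and stops it only in `close()`. The thread name, sleep interval, and method names are assumptions made for the demo.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a client that owns a background evictor thread. */
class EvictingClient implements AutoCloseable {
    static final String THREAD_NAME = "Connection evictor (demo)";
    private final Thread evictor;

    EvictingClient() {
        evictor = new Thread(() -> {
            try {
                while (true) {
                    // A real evictor would sweep idle connections here.
                    TimeUnit.MILLISECONDS.sleep(50);
                }
            } catch (InterruptedException e) {
                // Interrupted by close(): let the thread end.
            }
        }, THREAD_NAME);
        evictor.setDaemon(true);
        evictor.start();
    }

    @Override
    public void close() {
        evictor.interrupt();
        try {
            evictor.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }
}

class EvictorLeakDemo {
    /** Counts live demo evictor threads in this JVM. */
    static long liveEvictors() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().equals(EvictingClient.THREAD_NAME))
                .count();
    }

    /** One client per simulated request, never closed: each leaves a thread behind. */
    static void leakyRequests(int n) {
        for (int i = 0; i < n; i++) {
            new EvictingClient();
        }
    }

    /** One client per simulated request, closed via try-with-resources. */
    static void closedRequests(int n) {
        for (int i = 0; i < n; i++) {
            try (EvictingClient client = new EvictingClient()) {
                // A query would run here.
            }
        }
    }

    public static void main(String[] args) {
        closedRequests(3);
        System.out.println("after closed requests: " + liveEvictors());
        leakyRequests(3);
        System.out.println("after leaky requests:  " + liveEvictors());
    }
}
```

Under heavy request volume, the unclosed variant grows the thread count by one per request, which would be consistent with the daily crashes described above.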

🖥 Environment Info

Chrome

📚 Version of Software Used

v2.14.3

🩺 Test Data / Additional context

No response

🦄 Related requirements

No response

⚙️ Engineering Details

No response

@jordanpadams jordanpadams added bug Something isn't working needs:triage labels Oct 19, 2023
@jordanpadams jordanpadams self-assigned this Oct 19, 2023
jordanpadams added a commit that referenced this issue Oct 19, 2023
Fix memory leaks for solr connection left open.

Not sure why this wasn't an issue before. Maybe Tomcat would kill these threads after so long, and since we are getting pummeled with queries, the threads didn't have a chance to die.

Either way, closing the solr client connections each time seems to fix it.

Fixes #14
jordanpadams added a commit that referenced this issue Oct 19, 2023
Fix memory leaks for solr connection left open.

Not sure why this wasn't an issue before. Maybe Tomcat would kill these threads after so long, and since we are getting pummeled with queries, the threads didn't have a chance to die.

Either way, closing the solr client connections each time seems to fix it.

Cherry-picked from `hotfix/2.14.4` branch and `v2.14.4` release

Fixes #14
Refs #15
@tloubrieu-jpl tloubrieu-jpl added the i&t.skip Skip I&T of this task/ticket label Jan 29, 2024
@jordanpadams jordanpadams reopened this Dec 4, 2024
@jordanpadams
Member Author

jordanpadams commented Dec 4, 2024

This issue may have resurfaced, possibly as a result of the latest SolrJ upgrades:

Additional leak possible in ds-view:

04-Dec-2024 11:53:27.292 SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ds-view] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@17fc1173]) and a value of type [org.eclipse.jetty.util.Pool.MonoEntry] (value [MonoEntry@6ae31bac{IDLE,pooled=RetainableByteBuffer@b7ddc10{DirectByteBuffer@3916fe27[p=0,l=0,c=16384,r=0]={<<<>>>\x00\x018\x01\x04\x00\x00\x00\x01...\x00\x00\x00\x00\x00\x00\x00},r=0}}]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Here is a discussion of a possible resolution, upgrading Jetty to fix this leak: https://lists.apache.org/thread/n2rgq5l5jngbnpz8my9flk52zk7zg2xb
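For context on what Tomcat's `checkThreadLocalMapForLeaks` warning is detecting, here is a minimal JDK-only sketch of the general pattern: a value stored in a `ThreadLocal` on a pooled thread stays attached to that thread unless it is explicitly removed. This is not Jetty's code; `BUFFER`, the task names, and the 16384-byte size (borrowed from the log line above) are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalCleanupDemo {
    // Hypothetical per-thread buffer, standing in for a library's pooled entry.
    static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    /** Leaves the value attached to the worker thread after the task ends. */
    static void leakyTask() {
        BUFFER.set(new byte[16384]);
    }

    /** Removes the value before the thread goes back to the pool. */
    static void cleanTask() {
        BUFFER.set(new byte[16384]);
        try {
            // Work that uses BUFFER.get() would go here.
        } finally {
            BUFFER.remove();
        }
    }

    /** Runs the task on a single pooled thread, then checks whether that thread still holds a value. */
    static boolean valueSurvives(Runnable task) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(task).get();
            // The same worker thread runs this second task and inspects its own ThreadLocal.
            return pool.submit(() -> BUFFER.get() != null).get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("leaky task leaves value: " + valueSurvives(ThreadLocalCleanupDemo::leakyTask));
        System.out.println("clean task leaves value: " + valueSurvives(ThreadLocalCleanupDemo::cleanTask));
    }
}
```

Running `main` prints `true` for the leaky task and `false` for the clean one. When the `ThreadLocal` lives inside a third-party library, as in the log line above, the application cannot call `remove()` itself, which is presumably why a library upgrade is the fix under discussion.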

@jordanpadams
Member Author

Most likely tightly coupled with NASA-PDS/registry-legacy-solr#172

@jordanpadams
Member Author

Nightly Tomcat restarts have been instituted by the SAs, and they seem to be avoiding the problem for now. Not an ideal solution, but it works. Closing this.

@jordanpadams
Member Author

Still not fixed. It is pretty clear this is a leak in Solr's Http2SolrClient. Trying to restart Tomcat produces:

Caused by: java.lang.InterruptedException
	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1343)
	at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
	at org.eclipse.jetty.io.ManagedSelector.doStop(ManagedSelector.java:137)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:132)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:182)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:205)
	at org.eclipse.jetty.io.SelectorManager.doStop(SelectorManager.java:280)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:132)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:182)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:205)
	at org.eclipse.jetty.io.ClientConnector.doStop(ClientConnector.java:383)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:132)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:182)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:205)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:132)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:182)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:205)
	at org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2.doStop(HttpClientTransportOverHTTP2.java:106)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:132)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.stop(ContainerLifeCycle.java:182)
	at org.eclipse.jetty.util.component.ContainerLifeCycle.doStop(ContainerLifeCycle.java:205)
	at org.eclipse.jetty.client.HttpClient.doStop(HttpClient.java:265)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:132)
	at org.apache.solr.client.solrj.impl.Http2SolrClient.close(Http2SolrClient.java:296)
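As a reading aid for the trace above: the bottom frame is `Http2SolrClient.close()`, and the top frames show Jetty's `ManagedSelector.doStop` waiting on a `CountDownLatch` when the `InterruptedException` is thrown. The JDK-only sketch below reproduces just that mechanism: a thread blocked in `CountDownLatch.await()` that gets interrupted. It does not establish who interrupts the thread in Tomcat, and the class and thread names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptedStopDemo {
    /** Returns true if waiting on the latch was cut short by an interrupt. */
    static boolean awaitWasInterrupted() throws InterruptedException {
        CountDownLatch stopped = new CountDownLatch(1); // never counted down in this demo
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        Thread closer = new Thread(() -> {
            started.countDown();
            try {
                stopped.await(); // stands in for the wait inside doStop()
            } catch (InterruptedException e) {
                interrupted.set(true);
            }
        }, "closer (demo)");

        closer.start();
        started.await();    // make sure the closer thread is running
        closer.interrupt(); // e.g. a container stopping the thread during shutdown
        closer.join(5000);
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("await interrupted: " + awaitWasInterrupted());
    }
}
```

If `close()` is interrupted part-way like this, whatever shutdown steps were still pending may not run, which would fit threads and buffers surviving a restart. That remains a hypothesis rather than something the trace alone proves.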
