
OpenSearch Data Nodes memory exhaustion after upgrade from 2.9 to 2.12 (JDK 21 upgrade) #12454

Closed
rlevytskyi opened this issue Feb 26, 2024 · 45 comments

@rlevytskyi

Describe the bug

Hello OpenSearch Team,
We’ve just updated our OpenSearch cluster from version 2.9.0 to 2.12.0.
Among other issues, we’ve noticed that OpenSearch is now consuming way more memory than the previous version, i.e. it became unusable with the same configuration, even after providing it with 15% more RAM. To make it responsive again, we had to close many indices.

Related component

Other

To Reproduce

  1. Have a 2.9 cluster of 4 data nodes with 112GB of Xmx RAM and 13.6 TB of storage
  2. Fill it with 5500 indices (mostly small with 1 shard, but several big ones with 4 shards) up to 75% of capacity
  3. Update 2.9 to 2.12 and add RAM to make it 128GB
  4. See many GC messages in the logs and an almost inoperable cluster
  5. Close 2000 indices to make it work again

Expected behavior

We didn't expect a significant memory usage increase from the version upgrade.

Additional Details

Plugins
Security plugin for SAML authn and authz

Screenshots
Please note almost horizontal Heap usage before upgrade, increase after upgrade, and horizontal again after closing some indices.
[screenshot: heap usage over time, before and after the upgrade]

Host/Environment (please complete the following information):

  • OS: Oracle Linux 8.9
  • Docker image: opensearchproject/opensearch:2.12.0
@rlevytskyi rlevytskyi added bug and untriaged labels Feb 26, 2024
@github-actions github-actions bot added the Other label Feb 26, 2024
@rlevytskyi rlevytskyi changed the title from "OpenSearch Data Nodes memory exhaustion after upgrade from 2.9 to 2.12[BUG] <title>" to "OpenSearch Data Nodes memory exhaustion after upgrade from 2.9 to 2.12" Feb 26, 2024
@shwetathareja
Member

shwetathareja commented Feb 27, 2024

Thanks @rlevytskyi for reporting the issue. Did you try taking a heap dump? It would help us debug further here. (You can try with a smaller heap; the issue might reproduce faster in that case.)

Couple of questions:

  1. Are you running a cluster without dedicated cluster manager nodes?
  2. What is the cluster state size? You can check via the _cluster/state API output.
  3. How many shards are there overall?
  4. When you observed the JVM heap spiking, was it only during the upgrade from 2.9 to 2.12, or was it consistently high post-upgrade as well?
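
For reference, here is a minimal sketch of how a smaller-heap repro node could be started and how the numbers for questions 2 and 3 could be pulled; the container settings and heap sizes are placeholders, not taken from this cluster:

    # single test node with a deliberately small heap so a heap problem shows up sooner
    docker run -d --name opensearch-repro -p 9200:9200 \
      -e discovery.type=single-node \
      -e DISABLE_SECURITY_PLUGIN=true \
      -e OPENSEARCH_JAVA_OPTS="-Xms4g -Xmx4g" \
      opensearchproject/opensearch:2.12.0

    # cluster state size in bytes and total shard count
    curl -s localhost:9200/_cluster/state | wc -c
    curl -s localhost:9200/_cat/shards | wc -l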

@rlevytskyi
Author

rlevytskyi commented Feb 27, 2024

Thank you @shwetathareja for your reply!
Here are clarifications:

  1. Yes, we are running a non-dedicated cluster manager setup: we have four nodes running both data and master-eligible roles, plus two coordinating nodes:
    % curl logs:9200/_cat/nodes\?s=name
    d data - v480-data.company.com
    m master - v480-master.company.com
    d data - v481-data.company.com
    m master * v481-master.company.com
    d data - v482-data.company.com
    m master - v482-master.company.com
    d data - v483-data.company.com
    m master - v483-master.company.com
    - - - v484-coordinator.company.com
    - - - v485-coordinator.company.com
  2. Quite a lot of output:
    % curl logs:9200/_cluster/state | wc
    0 4989 193053405
  3. 26696 reported by _cat/shards
  4. It was hitting the top during upgrade and also post upgrade.

@rlevytskyi
Author

Re heap dump, where should we collect it and when?
Right now, we see nothing unusual on the data nodes.
The coordinating nodes sometimes log something like
[INFO ][o.o.i.b.HierarchyCircuitBreakerService] [v484-coordinator.company.com] attempting to trigger G1GC due to high heap usage [8204216264]
[INFO ][o.o.i.b.HierarchyCircuitBreakerService] [v484-coordinator.company.com] GC did bring memory usage down, before [8204216264], after [3248648136], allocations [71], duration [62]
but would a heap dump from them be useful?

@reta
Collaborator

reta commented Feb 27, 2024

@rlevytskyi one of the major changes in 2.12 is that it is bundled with JDK 21 by default. Any chance you could downgrade the JDK to 17 for your deployment (it may need altering the Docker image) to eliminate the JDK version change as a suspect? Thank you.

@rlevytskyi
Author

Thank you Andriy for your reply.
I've searched https://github.com/opensearch-project/OpenSearch and was unable to find the appropriate Dockerfile.
Could you please point me to the right one?

@reta
Collaborator

reta commented Feb 27, 2024

I think you need these: https://github.com/opensearch-project/opensearch-build/tree/main/docker/release/dockerfiles, but maybe a simpler way is to "inherit" from the 2.12 image and install/replace the JDK version to run with.

@peternied
Member

[Triage - attendees 1 2 3 4 5]
@rlevytskyi Thanks for filing - we will keep this issue untriaged for 1 week and if it does not have a root cause we will close the issue.

The following were some recent investigations in the security plugin for your consideration.

@rlevytskyi
Author

rlevytskyi commented Feb 29, 2024

I am unable to build the OpenSearch image yet.
Moreover, the Dockerfile ( https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/dockerfiles/opensearch.al2.dockerfile ) says:

This dockerfile generates an AmazonLinux-based image containing an OpenSearch installation (1.x Only).
Dockerfile for building an OpenSearch image.
It assumes that the working directory contains these files: an OpenSearch tarball (opensearch.tgz), log4j2.properties, opensearch.yml, opensearch-docker-entrypoint.sh, opensearch-onetime-setup.sh.

First of all, it says "1.x Only".
Second, it says that I have to put some files there, but I see no way to make sure I use exactly the same files you use.

So my question is: is there a way to build the image exactly as you do, to make sure we have the same configuration?

@peternied
Member

peternied commented Mar 1, 2024

@rlevytskyi I believe the new file is right next to that dockerfile. Take a look at the readme.md; maybe that will help if you are looking to construct a docker image from a custom configuration.

Note: following the suggestion to "inherit" from the 2.12 image and install/replace the JDK version to run with seems easier IMO.

@peternied
Member

peternied commented Mar 1, 2024

@rlevytskyi I'm not sure if you've managed to capture and investigate a heap dump of the OpenSearch process; see this guide to capture that information in a docker environment [1]. This will steer the investigation towards what is causing memory to be consumed. The dumps can also be used to compare 2.9 vs 2.12 and spot the difference.
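
Not from the linked guide, but roughly, capturing a dump from a dockerized node might look like this; the container name and the bundled-JDK path are assumptions:

    CONTAINER=opensearch-node1                  # placeholder container name
    JDK=/usr/share/opensearch/jdk/bin           # bundled JDK inside the official image (assumption)
    PID=$(docker exec "$CONTAINER" "$JDK/jps" | awk '/OpenSearch/ {print $1}')
    docker exec "$CONTAINER" "$JDK/jmap" -dump:live,format=b,file=/tmp/heap.hprof "$PID"
    docker cp "$CONTAINER:/tmp/heap.hprof" ./opensearch-heap.hprof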

@rlevytskyi
Author

Thank you Peter,
However, I am neither a Java programmer nor a Docker enthusiast, and the suggestion to "inherit" from the 2.12 image and install/replace the JDK version to run with isn't entirely clear to me.
As far as I understand, it could be achieved by changing "ENTRYPOINT" to "/bin/bash", starting a container, installing the new Java inside, setting JAVA_HOME, and running OpenSearch.
However, you need to rebuild the image to change ENTRYPOINT, so we end up in recursion...

@rlevytskyi
Author

Re heap dump, I managed to capture and even sanitize one using PayPal's tool https://github.com/paypal/heap-dump-tool .
However, it's not much use to capture one right now, because the cluster is currently running smoothly.

@rlevytskyi
Author

Thank you again @peternied for pointing out https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/README.md
I managed to build the 2.12 image with the JDK 17 from 2.11.1.
Have a nice weekend!

@rlevytskyi
Author

rlevytskyi commented Mar 5, 2024

I managed to create an image based on 2.12 using the following Dockerfile:
FROM opensearchproject/opensearch:2.12.0
USER root
RUN dnf install -y java-17-amazon-corretto
USER opensearch
ENV JAVA_HOME=/usr
Running it on the test installation doesn't reveal any memory usage difference.
Looking forward to running the big (prod) installation with it.
Do you think it is safe?
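
For completeness, a rough way to build and sanity-check that image (the tag name is arbitrary, and this assumes the startup script honors JAVA_HOME and that the _cat/nodes jdk column is available):

    docker build -t opensearch-2.12.0-jdk17 .
    # the JDK the startup script should now pick up via JAVA_HOME=/usr
    docker run --rm --entrypoint java opensearch-2.12.0-jdk17 -version
    # once a node is up, confirm the JVM version each node is actually running
    curl -s "localhost:9200/_cat/nodes?h=name,version,jdk"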

@peternied
Member

[Triage - attendees 1 2 3 4 5]

Do you think it is safe?

@rlevytskyi Without a root cause and bugfix it is hard to say what next steps to take. I would recommend doing testing and having a mitigation plan in case something happens, but your mileage may vary.

Thanks for filing - we will keep this issue untriaged for 1 week and if it does not have a root cause we will close the issue.

Since it has been a week and there is no root cause, we are closing out this issue. Feel free to open a new issue if you find a proximal cause from a heap analysis or a way to reproduce the leak.

@tophercullen

tophercullen commented May 5, 2024

Want to chime in and say we were running into something similar after upgrading to 2.12. Suddenly all sorts of previously normal operations were causing the parent circuit breaker to trip, and there were significantly more GC logs emitted by OpenSearch overall. This problem was most exacerbated by the snapshot and reindex APIs.

I applied the image changes from @rlevytskyi to use JDK17 and it has completely solved the issues and symptoms we were seeing. Average heap dropped considerably and is much more stable.

@dblock
Member

dblock commented May 5, 2024

Sounds like upgrading to JDK 21 is the change that caused this. Seems like a real problem. I am going to reopen this and edit the title to say something to this effect. @tophercullen do you think you can help us debug what's going on? There are a few suggestions above to take some heap dumps and compare.

@dblock dblock reopened this May 5, 2024
@dblock dblock changed the title from "OpenSearch Data Nodes memory exhaustion after upgrade from 2.9 to 2.12" to "OpenSearch Data Nodes memory exhaustion after upgrade from 2.9 to 2.12 (JDK 21 upgrade)" May 5, 2024
@tophercullen

tophercullen commented May 6, 2024

Using the above paypal tool that sanitizes them, I've generated heap dumps from all nodes in a new standalone cluster (nothing else using it) while taking a full cluster snapshot at 1x JDK17 and 2x JDK21. This is 24 files and ~5GB compressed. I'm unsure what I'm supposed to be comparing between them.

From the stdout logging for the cluster, there were no GC logs with JDK17, and a bunch with JDK21. So it seems to be repeatable in an otherwise idle cluster, assuming that is not just a red herring.

Might also consider the reproducer in #12694. That seems fairly similar to our real use case and the operations where we were seeing circuit breakers tripped. Snapshots never directly tripped breakers and/or failed, and were seemingly just exacerbating the problem.

@dblock
Member

dblock commented May 6, 2024

Maybe @backslasht has some ideas about what to do with this next?

@reta
Collaborator

reta commented May 6, 2024

Using the above paypal tool that sanitizes them, I've generated heap dumps from all nodes in a new standalone cluster (nothing else using it) while taking a full cluster snapshot at 1x JDK17 and 2x JDK21. This is 24 files and ~5GB compressed. I'm unsure what I'm supposed to be comparing between them.

Maybe sharing the class histogram first could help (even as a screenshot), thanks @tophercullen
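
For a live node, a quick way to pull that histogram might be the following; the container name and JDK path are placeholders, and for the already-captured .hprof files a tool like Eclipse MAT gives the same view:

    CONTAINER=opensearch-node1
    JDK=/usr/share/opensearch/jdk/bin
    PID=$(docker exec "$CONTAINER" "$JDK/jps" | awk '/OpenSearch/ {print $1}')
    docker exec "$CONTAINER" "$JDK/jmap" -histo:live "$PID" | head -n 40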

@dblock
Member

dblock commented May 6, 2024

#12694 could be related

@ansjcy
Member

ansjcy commented May 6, 2024

This might be related to this issue in JDK: https://bugs.openjdk.org/browse/JDK-8297639
The G1UsePreventiveGC flag was introduced and set to true by default in JDK 17 (introduced in this commit, renamed in this commit). The related issue is https://bugs.openjdk.org/browse/JDK-8257774. It was introduced to solve

...bursts of short lived humongous object allocations. These bursts quickly consume all of the G1ReservePercent regions and then the rest of the free regions

In JDK 20, this flag was set to false by default and in JDK 21 it was completely removed in https://bugs.openjdk.org/browse/JDK-8293861.

Summarizing the observations and reproduction efforts by the community around this JDK issue: removing this flag might have caused the memory increase when sending and receiving documents with chunks > 2MB. In JDK 20 we can add the G1UsePreventiveGC flag back to bypass this issue, but in JDK 21 it is not an option anymore :( We either need to go back to JDK 20 with that flag enabled, or we need to explore other possible ways to fix this.
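
For anyone who wants to test that theory on an image that still ships a pre-21 JDK, the flag can be passed through the usual JVM options environment variable; a rough sketch, where the image tag and heap sizes are placeholders (JDK 21 will refuse to start with this flag, since it was removed):

    docker run -d --name opensearch-preventive-gc \
      -e OPENSEARCH_JAVA_OPTS="-Xms8g -Xmx8g -XX:+G1UsePreventiveGC" \
      opensearchproject/opensearch:2.11.1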

@reta
Collaborator

reta commented May 6, 2024

@ansjcy that was suggested before (I think on the forum) but we did not use -XX:+G1UsePreventiveGC (AFAIK)

@dblock
Member

dblock commented May 7, 2024

@rlevytskyi @tophercullen Do you still have your repro? Can you try with JDK 21 and -XX:+G1UsePreventiveGC, please?

@tophercullen

tophercullen commented May 7, 2024

@dblock I can do what I did before: create a new cluster and populate it with data, run snapshots.

However, based on what @ansjcy provided, that option is no longer available in JDK 21. The OpenJDK issue tracker links to a similar issue with Elasticsearch in this regard, which also has no solution on JDK 21.

@dblock
Member

dblock commented May 7, 2024

However based on what @ansjcy provided, that option is no longer available in JDK21.

Yes, my bad for not reading carefully enough.

@ansjcy
Member

ansjcy commented May 8, 2024

but we did not use -XX:+G1UsePreventiveGC

No, but if I'm understanding correctly, this flag was enabled by default in g1_globals.hpp for G1GC in JDK 17.

Also, today I did some more experiments using https://github.com/kroepke/opensearch-jdk21-memory (Thanks, @kroepke!). I ran bulk indexing (20MB workload per request, ~5MB per document) with a docker-based setup, each for 1 hour, in the following scenarios:

  • 2.11 with JDK 17, G1UsePreventiveGC flag enabled [1].
  • 2.11 with JDK 17, G1UsePreventiveGC flag disabled [2].
  • 2.11 with JDK 21 [3].

I captured the JVM usage results over the 1-hour runs:
[screenshot: JVM heap usage over time for the three scenarios]

  • for [1], the average jvm usage is 191707377 bytes
  • for [2], the average jvm usage is 196708634 bytes
  • for [3], the average jvm usage is 201973645 bytes

The results show a measurable but not significant impact from disabling the G1UsePreventiveGC flag in JDK 17, but there might be some unknown factors impacting the JVM usage in JDK 21 as well. We need to run even longer and heavier benchmark tests to understand this better.
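
For anyone repeating this, the per-node heap numbers can be sampled from the nodes stats API; a rough sketch, where the jq filter is an assumption about the response shape:

    # sample per-node heap usage (bytes) every 10 seconds during a benchmark run
    while true; do
      curl -s localhost:9200/_nodes/stats/jvm |
        jq -r '.nodes[] | "\(.name) \(.jvm.mem.heap_used_in_bytes)"'
      sleep 10
    done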

@backslasht
Contributor

@ansjcy - Do you think G1UsePreventiveGC is the root cause, or is it something else?

@tophercullen - Can you please share the heap dumps?

@dblock - Is there a common share location where these heap dumps can be uploaded?

@dblock
Member

dblock commented May 20, 2024

@dblock - Is there a common share location where these heap dumps can be uploaded?

AFAIK no, we don't have a place to host outputs from individual runs - I would just make an S3 bucket and give access to the folks in this thread offline if they don't have a place to put these

@zakisaad

zakisaad commented Jun 18, 2024

We're seeing background memory use climb over time (pointing to some kind of GC/memory leak as described) on our AWS managed OpenSearch clusters since the 2.13 upgrade. We went from 2.11 (where the issue was not manifesting) to 2.13. We've had to bump all our nodes from 8GB memory instances to 32GB memory instances just to keep the cluster from falling over every night.

Apart from the version upgrade, there have been no other changes.

The attached chart shows climbing min/avg/max JVM memory pressure over the last week (we've been on 2.13 for >1 week; some adverse cluster events can be seen on this chart too).

[screenshot: JVM memory pressure over the last week]

Anything we can pull from our managed clusters to help resolve this? We're sorely over-provisioned now, so we're willing to put in some legwork to solve this.

@tophercullen

tophercullen commented Jun 18, 2024

@zakisaad Since downgrading the JVM version over a month ago, we haven't had any more issues. I would check the JVM version AWS is using. If it's 21, you'll likely need to get in touch with AWS support to escalate this issue, because it has not been tested thoroughly enough for actual production use, from what we've found first-hand (see also opensearch-project/performance-analyzer-rca#545 (comment)).

If AWS is unable or unwilling to escalate this, I think your only option is to (somehow) revert to a previous version of the hosted service.

@zakisaad

We'll be reaching out to AWS support to get this resolved, as it's essentially unusable in its current state (we're rolling out a cluster reboot cron to mask the GC issues until this is resolved). Upgrades to managed OS are one-way only, so downgrading our cluster will require a restore from a snapshot -- we may attempt this if AWS can't provide a remediation timeline.

Thanks for confirming the JVM downgrade sorted this out for you; if we were self-hosting, I'd jump on it. One day, we'll have the bandwidth to internally manage our OS cluster 🙇‍♂️

@tophercullen

@zakisaad Yeah, there are pros and cons to the hosted service. My advice: don't hold your breath for AWS and/or this issue to be resolved. Create a new (older) cluster and determine if a snapshot restore is even possible, and plan an alternative data migration accordingly.

@reta
Collaborator

reta commented Jun 19, 2024

@tophercullen sadly JDK 21 provides no workaround for this issue (#12454 (comment)); downgrading is the best option, as suggested by @zakisaad

@hogesako
Contributor

The issue seemed to have been alleviated in Elasticsearch by stopping unnecessary copying of byte arrays.

elastic/elasticsearch#99592
elastic/elasticsearch#104692
elastic/elasticsearch#105712

@dblock
Member

dblock commented Jun 25, 2024

@hogesako Appreciate any fixes you can make to OpenSearch. Please make sure not to look at / copy non-APLv2 open-source code.

@zakisaad

Hi @dblock

This issue is adversely affecting our clusters in production -- as I understand it, AWS maintains OpenSearch (and provides a managed OpenSearch service to monetise the product). As it stands, the default configuration of a fully up-to-date managed OS cluster on AWS exhibits memory-leak-like behaviour. There are hacky fixes such as scheduled cluster reboots ~once a week (with over-provisioned nodes to accommodate the leaking memory...), but this is for sure a short-term fix with various shortcomings.

Our clusters aren't even that large, so I can bet other clients are seeing this issue for sure.

As Amazon has forked ES specifically to be able to continue monetising the product via their managed service, I assume AWS is expected to fix this issue, or at least treat it as important.

We haven't bothered considering self-managed clusters yet as we assumed AWS would fix an issue of this magnitude, but if AWS won't prioritise it we'll be moving off the managed service for sure. If we were self-managed, we'd be able to downgrade the JVM and avoid this issue entirely, for instance.

@42wim

42wim commented Jul 1, 2024

We're seeing the same issue here. I've rebuilt an image with JDK 17 as specified above, but this didn't solve it for us; also tested on 2.14.0.

Even with JDK 17 and after increasing memory by 400%, we need to restart our cluster every few days because of the memory issues.

So it's not just the JVM; maybe there's a memory leak in addition to a GC problem.

@kroepke

kroepke commented Jul 1, 2024

@zakisaad @42wim Are you aware of the other memory issue affecting many projects, as described in #13927?

If you use 2.14 with Java 17 and still see memory leaks, you might be looking at that instead. If so, give 2.15 a shot while keeping Java 17 if possible.

@zakisaad

zakisaad commented Jul 1, 2024

@kroepke unfortunately, as far as I know, Amazon managed OpenSearch doesn't allow us to specify/pin JDKs. We're at the mercy of whatever Amazon's development team has rolled out as part of the managed service.

@shwetathareja
Member

shwetathareja commented Jul 4, 2024

@zakisaad please reach out to AWS support to follow up on the fix for your clusters.

@42wim

42wim commented Jul 4, 2024

I've updated the 2.14.0 image with jackson-core 2.17.1, keeping the default Java (openjdk version "21.0.3" 2024-04-16 LTS) of that image.

I've downscaled the cluster back to the original 100% and it has now been running for 72h without issues.

Next week we'll upgrade to 2.15.0
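
For anyone wanting to try the same thing before moving to 2.15, a rough sketch of such a rebuild; the jar version being removed, the Maven URL, and the lib path inside the image are assumptions, not confirmed in this thread:

    FROM opensearchproject/opensearch:2.14.0
    USER root
    # fetch the patched jackson-core and drop the bundled one (versions/paths are assumptions)
    RUN curl -fsSL -o /usr/share/opensearch/lib/jackson-core-2.17.1.jar \
          https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.17.1/jackson-core-2.17.1.jar && \
        rm -f /usr/share/opensearch/lib/jackson-core-2.17.0.jar
    USER opensearch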

@reta
Collaborator

reta commented Jul 4, 2024

Next week we'll upgrade to 2.15.0

Thanks for the update @42wim, please share the outcomes; it would help us to pinpoint whether the issue is still there (JDK related) or gone (Jackson related).

@42wim

42wim commented Jul 12, 2024

Running 2.15.0 now for >72h; the issues are gone, so it seems Jackson related.

@dhwanilpatel
Contributor

dhwanilpatel commented Sep 16, 2024

[Indexing Triage 09/16]

Thanks @42wim for confirmation. Closing the issue now.
