[Optimization] AsyncShardFetch response contains DiscoveryNode which takes GBs of memory #7861

amkhar · 2023-06-01T11:57:34Z

Describe the bug
This issue is a subtask of project #5098, where we are revamping the AsyncShardFetch flow because it takes GBs of memory and causes JVM spikes on the cluster manager.

After analyzing the heap dump carefully, we found about 1.5 million DiscoveryNode objects held in memory, even though a cluster has only a limited number of nodes (around 100, for example). As part of this optimization we need to understand why this happens and change the current objects so that the number of DiscoveryNode instances in memory is bounded, rather than growing on the order of the number of shards.
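To illustrate the kind of bound described above, here is a minimal sketch of one possible approach: keep a single canonical instance per node ID and reuse it across per-shard responses. It is not the OpenSearch implementation and not a confirmed root cause. `Node`, `ShardResponse`, `NodeInterner`, and `simulate` are hypothetical stand-ins invented for this example, and the assumption that each per-shard response builds its own node object is only a guess at why the heap dump shows so many copies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NodeInterningSketch {

    // Hypothetical stand-in for DiscoveryNode: only an id and an address.
    static final class Node {
        final String id;
        final String address;

        Node(String id, String address) {
            this.id = id;
            this.address = address;
        }
    }

    // Hypothetical per-shard fetch response that references the node it came from.
    static final class ShardResponse {
        final int shardId;
        final Node node;

        ShardResponse(int shardId, Node node) {
            this.shardId = shardId;
            this.node = node;
        }
    }

    // Keeps one canonical Node per node id so that duplicates can be garbage collected.
    static final class NodeInterner {
        private final Map<String, Node> canonical = new ConcurrentHashMap<>();

        Node intern(Node node) {
            Node existing = canonical.putIfAbsent(node.id, node);
            return existing != null ? existing : node;
        }
    }

    // Counts distinct object instances (by identity, not equals) referenced by the responses.
    static int distinctInstances(List<ShardResponse> responses) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (ShardResponse response : responses) {
            seen.add(response.node);
        }
        return seen.size();
    }

    // Assumption: every response allocates a fresh Node, as deserialization typically would.
    // Passing a null interner keeps the fresh copies; otherwise they are replaced by the canonical one.
    static List<ShardResponse> simulate(int shards, int nodes, NodeInterner interner) {
        List<ShardResponse> out = new ArrayList<>(shards);
        for (int shard = 0; shard < shards; shard++) {
            int n = shard % nodes;
            Node fresh = new Node("node-" + n, "10.0.0." + n);
            out.add(new ShardResponse(shard, interner == null ? fresh : interner.intern(fresh)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Without interning: one Node instance per shard response.
        System.out.println(distinctInstances(simulate(10_000, 100, null)));               // prints 10000
        // With interning: one Node instance per cluster node.
        System.out.println(distinctInstances(simulate(10_000, 100, new NodeInterner()))); // prints 100
    }
}
```

In this sketch the number of retained node objects depends on the node count rather than the shard count. A real change would also need to decide when canonical entries are dropped (for example when a node leaves the cluster), which this example does not cover.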

To Reproduce
The steps to reproduce are the same as here: #5098 (comment)

Expected behavior
DiscoveryNode objects should be limited in number and should not take GBs of memory.

anasalkouz · 2023-06-06T21:33:23Z

@amkhar are you actively working on fixing this?

amkhar · 2023-06-13T04:57:47Z

@anasalkouz I haven't started working on it yet; I'm currently working on a revamp approach for #5098.

I'll come up with a proposal for this in 1-2 weeks. It's my next priority.

amkhar added the bug and untriaged labels Jun 1, 2023

andrross added the distributed framework label Jun 1, 2023

anasalkouz added the Performance label and removed the untriaged label Jun 6, 2023

shwetathareja assigned amkhar Jun 8, 2023

amkhar mentioned this issue Jun 16, 2023

[META] Cluster Manager Async Shard Fetch Revamp #8098

Open

13 tasks

macohen added this to OpenSearch Lucene & Core Performance Tracking Sep 8, 2023

github-project-automation bot moved this to Lucene (In Progress) in OpenSearch Lucene & Core Performance Tracking Sep 8, 2023

gauravruhela added the Cluster Manager label Oct 10, 2023

rwali-aws added this to Cluster Manager Project Board Apr 20, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Apr 20, 2024

rwali-aws removed this from OpenSearch Lucene & Core Performance Tracking Apr 22, 2024

rwali-aws moved this from 🆕 New to Now(This Quarter) in Cluster Manager Project Board Apr 22, 2024
