Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Restructure] Change Cache Structure in AsyncShardFetch class #12248

Closed
amkhar opened this issue Feb 8, 2024 · 0 comments
Closed

[Restructure] Change Cache Structure in AsyncShardFetch class #12248

amkhar opened this issue Feb 8, 2024 · 0 comments
Assignees
Labels
bug Something isn't working Cluster Manager

Comments

@amkhar
Copy link
Contributor

amkhar commented Feb 8, 2024

Describe the bug

AsyncShardFetch Revamp strategy (explained #5098 (comment) ) uses ShardId object directly in the key to store metadata of all the shards received from all the nodes. So overall memory usage goes in the factor of

ShardId object size * shard_count * node_count = this goes in GBs as ShardId object contains more data.
One optimization issue is already opened to use smaller size : #12010

After reducing the size from 208 Bytes to 72 bytes (using a string = indexUUID_shardNumber), overall impact on heap is reduced to 16 GBs for a 500N, 500K shards setup.

As we're storing the full response(T) from data nodes

private final Map<String, NodeEntry<T>> cache = new HashMap<>();

it'll still keep the cache structure like Map<NodeId, Map<ShardId, ShardMetaData>>

Note : ShardMetaData is not an actual class, just using here to show metadata or primary/replica.
As we can see ShardId(72 Bytes) will keep getting repeated for all the nodes.

Approach

We should use an array/ArrayList instead of Map<ShardId, ShardMetaData> so only data is stored for each shard not the shard Id itself. And ShardId should only be stored once in the cache, may be in another map which can store the reference to the array index. Map<ShardId, array_index>

Map<NodeId, ShardMetadata[]>

This will take actual ShardId size in heap i.e. = 34 MBs and we'll save all that 16GBs.

Related component

Cluster Manager

To Reproduce

  1. Use the latest code of this project
  2. spin up a new cluster with 500 nodes and 500K shards
  3. Restart all cluster manager nodes at once
  4. We'll see heap getting full, can go up to 50GB

Expected behavior

Ideally ShardId key should not get repeated even when its size is reduced.

Additional Details

This change depends on the already existing PRs of the overall project #8098 to revamp reroute flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Cluster Manager
Projects
Status: ✅ Done
Development

No branches or pull requests

3 participants