Describe the bug

The AsyncShardFetch revamp strategy (explained in #5098 (comment)) uses the ShardId object directly as the key when storing the metadata received from every node for every shard. Overall memory usage therefore grows as

ShardId object size × shard_count × node_count

which runs into GBs, because the ShardId object carries a lot of data. An optimization issue is already open to use a smaller key: #12010. Even after reducing the key from 208 bytes to 72 bytes (using a string of the form indexUUID_shardNumber), the overall heap impact is still about 16 GB for a 500-node, 500K-shard setup.
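For concreteness, the arithmetic behind those figures for the 500-node, 500K-shard setup:

```text
original key: 208 B × 500,000 shards × 500 nodes = 5.2 × 10^10 B ≈ 48 GiB
reduced key:   72 B × 500,000 shards × 500 nodes = 1.8 × 10^10 B ≈ 16.8 GiB
```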
Because we store the full response (T) from the data nodes (see OpenSearch/server/src/main/java/org/opensearch/gateway/AsyncShardFetch.java, line 89 at a0b5198), the cache keeps a structure like:

Map<NodeId, Map<ShardId, ShardMetaData>>

Note: ShardMetaData is not an actual class; it is used here only to denote the per-shard metadata (primary/replica). As the structure shows, the ShardId key (72 bytes) is repeated once for every node.
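A minimal sketch of that duplication, using plain String keys and a hypothetical ShardMetaData record rather than the real OpenSearch types:

```java
import java.util.HashMap;
import java.util.Map;

public class CurrentCacheShape {
    // Illustrative stand-in; ShardMetaData is not a real OpenSearch class.
    record ShardMetaData(boolean primary) {}

    public static void main(String[] args) {
        // Current shape: Map<NodeId, Map<ShardId, ShardMetaData>>.
        Map<String, Map<String, ShardMetaData>> cache = new HashMap<>();

        for (String nodeId : new String[] { "node1", "node2", "node3" }) {
            // Each node's response is deserialized independently, so every
            // inner map ends up with its own copy of the same logical key.
            String shardKey = new String("indexUUID_0");
            cache.computeIfAbsent(nodeId, n -> new HashMap<>())
                 .put(shardKey, new ShardMetaData(true));
        }
        // 3 nodes -> 3 heap copies of one key; at 500 nodes x 500K shards the
        // repeated 72 B keys alone account for the ~16 GB described above.
        System.out.println("nodes holding a copy of the key: " + cache.size());
    }
}
```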
Approach

We should use an array/ArrayList instead of Map<ShardId, ShardMetaData>, so that each node's entry stores only the per-shard data, not the shard key itself. The ShardId should then be stored only once in the cache, for example in a separate map that records each shard's array index:

Map<ShardId, array_index>
Map<NodeId, ShardMetadata[]>

With this layout the heap holds each ShardId exactly once, i.e. about 34 MB (72 B × 500K shards), and we save the ~16 GB of repeated keys. A sketch of this layout follows.
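Here is a minimal sketch of the proposed layout, again with illustrative names rather than the actual AsyncShardFetch types:

```java
import java.util.HashMap;
import java.util.Map;

public class DedupedCacheShape {
    // Illustrative stand-in; ShardMetaData is not a real OpenSearch class.
    record ShardMetaData(boolean primary) {}

    // Each shard key is stored exactly once, mapped to a stable slot index.
    private final Map<String, Integer> shardSlots = new HashMap<>();
    // Per node, a plain array of metadata indexed by slot; keys never repeat.
    private final Map<String, ShardMetaData[]> perNode = new HashMap<>();
    private final int maxShards;

    DedupedCacheShape(int maxShards) {
        this.maxShards = maxShards;
    }

    void put(String nodeId, String shardKey, ShardMetaData meta) {
        Integer slot = shardSlots.get(shardKey);
        if (slot == null) {
            slot = shardSlots.size();        // next free slot
            shardSlots.put(shardKey, slot);  // key stored once, for all nodes
        }
        perNode.computeIfAbsent(nodeId, n -> new ShardMetaData[maxShards])[slot] = meta;
    }

    ShardMetaData get(String nodeId, String shardKey) {
        Integer slot = shardSlots.get(shardKey);
        ShardMetaData[] row = perNode.get(nodeId);
        return (slot == null || row == null) ? null : row[slot];
    }

    public static void main(String[] args) {
        DedupedCacheShape cache = new DedupedCacheShape(500_000);
        cache.put("node1", "indexUUID_0", new ShardMetaData(true));
        cache.put("node2", "indexUUID_0", new ShardMetaData(false));
        // "indexUUID_0" lives once in shardSlots, not once per node.
        System.out.println(cache.get("node2", "indexUUID_0"));
    }
}
```

The slot map is written once per shard while each node contributes only a metadata array, so key storage no longer scales with node_count.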
Related component
Cluster Manager
To Reproduce
1. Use the latest code of this project.
2. Spin up a new cluster with 500 nodes and 500K shards.
3. Restart all cluster manager nodes at once.
4. Watch the heap fill up; usage can reach 50 GB.
Expected behavior
Ideally, the ShardId key should not be repeated per node, even after its size is reduced.
Additional Details
This change depends on the already-open PRs under the overall project #8098 to revamp the reroute flow.