[BUG] Segment Replication stats throwing NPE when shards are unassigned or are in delayed allocation phase #11945
Labels
bug, good first issue, low hanging fruit, Storage
Describe the bug
We are seeing NPEs from the NodesStats API when data nodes drop out of the cluster because of resource constraints. A NodesStats API call fired while shards are being unassigned from those data nodes (because the nodes are leaving the cluster) fails with this error:
This seems to come from the SegmentReplicationStats code, specifically from the code block that tries to detect whether a primary shard is being relocated by cross-checking the current allocationId against all the allocationIds from the shard routing table.
OpenSearch/server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java
Lines 1233 to 1239 in 774e7d4
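For illustration, here is a minimal, self-contained sketch of how such a cross-check could be made null-safe. It is not the actual ReplicationTracker code: the Routing class and the isPrimaryRelocation helper below are hypothetical stand-ins, and the sketch assumes the NPE arises because routing entries for unassigned shards carry no allocation ID.

```java
import java.util.Arrays;
import java.util.List;

public class RelocationCheckSketch {

    // Hypothetical stand-in for a shard routing entry.
    // Assumption: allocationId is null while the shard is unassigned.
    static final class Routing {
        final String allocationId;
        final boolean primary;

        Routing(String allocationId, boolean primary) {
            this.allocationId = allocationId;
            this.primary = primary;
        }
    }

    // Returns true if the given allocation ID belongs to a primary entry
    // in the routing table. Entries without an allocation ID are skipped
    // instead of being dereferenced, so unassigned shards cannot cause an NPE.
    static boolean isPrimaryRelocation(List<Routing> routingTable, String allocationId) {
        return routingTable.stream()
            .filter(r -> r.allocationId != null)
            .filter(r -> r.allocationId.equals(allocationId))
            .anyMatch(r -> r.primary);
    }

    // Small example table: one primary, one unassigned replica, one assigned replica.
    static List<Routing> exampleTable() {
        return Arrays.asList(
            new Routing("a1", true),
            new Routing(null, false),
            new Routing("a2", false)
        );
    }
}
```

With the example table, looking up "a1" reports a primary, while "a2" and an unknown ID do not, and the unassigned entry is ignored rather than throwing.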
Related component
Storage
To Reproduce
N/A
Expected behavior
NodesStats API should not fail even during transient data node drops