[BUG] [Segment Replication] NO-OP recovery not attempted #7161
Labels
bug
Something isn't working
distributed framework
Indexing:Replication
Issues and PRs related to core replication framework eg segrep
Describe the bug
While working on #6761, a few tests turned out to be flaky because no-op recovery is not performed.
Background
Today, the cluster manager relies on ReplicaShardAllocator to recover replica shards (other than those of newly created indices): it pings all nodes for their store metadata, then selects a node and determines whether a no-op recovery is feasible there. If it is possible, the cluster manager cancels the ongoing recovery. A recovery is identified as a no-op when the target satisfies either of the following two conditions:
sync_id (a marker that is updated for inactive shard copies, i.e. those with no indexing for 5 minutes, and persisted on disk)
To Reproduce
The integration tests below fail reliably.
Impact
During a node restart, replica recovery performs a full file-based recovery instead of a no-op recovery.
Expected behavior
A no-op recovery should be performed whenever it is applicable.
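For illustration, the sync_id check from the Background section could look roughly like the sketch below. It is not the actual ReplicaShardAllocator code: `NoOpRecoverySketch`, `StoreInfo`, and all method names are hypothetical, and only the sync_id condition is modelled. The rule that an ongoing recovery is cancelled only when its current target cannot itself recover as a no-op is an assumption made for this example.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Illustrative sketch only; none of these types exist in OpenSearch. */
class NoOpRecoverySketch {

    /** Hypothetical, simplified store metadata reported by one node for a shard copy. */
    static final class StoreInfo {
        final String nodeId;
        final String syncId; // null when the copy carries no sync_id marker

        StoreInfo(String nodeId, String syncId) {
            this.nodeId = Objects.requireNonNull(nodeId);
            this.syncId = syncId;
        }
    }

    /** sync_id condition: both copies carry the same non-null marker. */
    static boolean hasMatchingSyncId(StoreInfo primary, StoreInfo candidate) {
        return primary.syncId != null && primary.syncId.equals(candidate.syncId);
    }

    /** Returns the first node whose on-disk copy would allow a no-op recovery, if any. */
    static Optional<String> findNoOpNode(StoreInfo primary, List<StoreInfo> candidates) {
        return candidates.stream()
            .filter(c -> hasMatchingSyncId(primary, c))
            .map(c -> c.nodeId)
            .findFirst();
    }

    /**
     * Assumption: cancelling is only worthwhile when the current recovery
     * target cannot no-op recover but some other node can.
     */
    static boolean shouldCancelOngoingRecovery(
            StoreInfo primary, String currentTargetNodeId, List<StoreInfo> candidates) {
        boolean currentCanNoOp = candidates.stream()
            .anyMatch(c -> c.nodeId.equals(currentTargetNodeId) && hasMatchingSyncId(primary, c));
        return !currentCanNoOp && findNoOpNode(primary, candidates).isPresent();
    }

    public static void main(String[] args) {
        StoreInfo primary = new StoreInfo("node-1", "sync-A");
        List<StoreInfo> candidates = List.of(
            new StoreInfo("node-2", null),      // no marker: needs file-based recovery
            new StoreInfo("node-3", "sync-A")); // marker matches the primary

        System.out.println(findNoOpNode(primary, candidates));
        System.out.println(shouldCancelOngoingRecovery(primary, "node-2", candidates));
    }
}
```

In this example the recovery onto the hypothetical `node-2` would be cancelled in favour of `node-3`, which is the behaviour this issue expects after a node restart.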