[BUG] [Segment Replication] NO-OP recovery not attempted #7161

Closed
dreamer-89 opened this issue Apr 14, 2023 · 1 comment
Assignees
Labels
bug (Something isn't working), distributed framework, Indexing:Replication (Issues and PRs related to core replication framework, e.g. segrep)

Comments

@dreamer-89
Member

dreamer-89 commented Apr 14, 2023

Describe the bug
While working on #6761, a few tests turned out to be flaky because no-op recovery is not performed.

Background
Today the cluster manager relies on ReplicaShardAllocator for replica shard recoveries (other than for newly created indices), where it pings all nodes for their store metadata. The cluster manager then selects a node and determines whether a no-op recovery is feasible on it. If it is possible, the cluster manager cancels the ongoing recovery. A no-op recovery is identified when the target satisfies either of the two conditions below:

  1. The target's retaining sequence number is >= the primary's retaining sequence number.
  2. The primary and the target node have the same sync_id (a marker that is updated for inactive shard copies, i.e. no indexing for 5 minutes, and persisted on disk).
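
The two conditions above can be sketched as a small predicate. This is only an illustrative Python model, not the ReplicaShardAllocator code: `StoreMetadata`, `is_noop_recovery`, `find_noop_candidate`, and the field names are hypothetical, and treating a missing sync_id as "no match" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StoreMetadata:
    """Hypothetical, simplified view of one shard copy's store metadata."""
    node: str
    retaining_seq_no: int          # retaining sequence number of this copy
    sync_id: Optional[str] = None  # marker persisted for inactive shard copies


def is_noop_recovery(primary: StoreMetadata, target: StoreMetadata) -> bool:
    """True if the target satisfies either condition from the list above."""
    # Condition 1: target's retaining seq no >= primary's retaining seq no.
    seq_no_ok = target.retaining_seq_no >= primary.retaining_seq_no
    # Condition 2: both copies carry the same sync_id.
    # Assumption: a missing sync_id never counts as a match.
    sync_id_ok = primary.sync_id is not None and primary.sync_id == target.sync_id
    return seq_no_ok or sync_id_ok


def find_noop_candidate(
    primary: StoreMetadata, candidates: Iterable[StoreMetadata]
) -> Optional[str]:
    """Return the first node whose copy allows a no-op recovery, else None."""
    for candidate in candidates:
        if is_noop_recovery(primary, candidate):
            return candidate.node
    return None


if __name__ == "__main__":
    primary = StoreMetadata("node-0", retaining_seq_no=100, sync_id="abc")
    stale = StoreMetadata("node-1", retaining_seq_no=40)
    synced = StoreMetadata("node-2", retaining_seq_no=40, sync_id="abc")
    print(find_noop_candidate(primary, [stale, synced]))  # node-2
```

In this model, a copy that fails both checks (like `node-1`) would fall back to a full file-based recovery, which is the behavior described under Impact.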

To Reproduce
The integration tests below fail reliably:

  1. CloseIndexIT.testNoopPeerRecoveriesWhenIndexClosed
  2. ReplicaShardAllocatorIT.testFullClusterRestartPerformNoopRecovery

Impact
During a node restart, replica recovery performs a full file-based recovery.

Expected behavior
A no-op recovery should be performed whenever it is applicable.

@dreamer-89
Member Author

Similar to #7163 (comment), this test expects sequence-number-based recovery to go through. Closing this issue, as this is expected behavior with segment replication.

Projects
Status: Done

4 participants