[Segment Replication] Primary promotion on shard failing during node removal in RoutingNodes#failShard #4131
This work is broken down into the following sub-tasks:
The RoutingNodes#failShard method cannot be removed as
Removing the failShard method would need a lot of core-level changes in shard allocation, which is not intended as part of this issue.
There are three options for handling RoutingNodes#failShard so that it takes ReplicationCheckpoint into account:
I am planning to move forward with option 2 above.
I think option 2 is the best choice given we want this to be best effort. I also wouldn't be worried about it delaying shard promotion right now; we can set timeouts to reduce that impact and measure the total time.
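To make the timeout idea concrete, here is a rough sketch of a best-effort, time-bounded fetch. It is not OpenSearch code: `CheckpointFetchSketch`, `fetchCheckpoints`, and the use of a plain `long` as the checkpoint value are all hypothetical stand-ins, and each replica is modelled as a `Callable` rather than a transport call. Replicas that do not answer before the shared deadline are simply left out of the result.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names exist in OpenSearch.
final class CheckpointFetchSketch {

    /**
     * Asks every replica for its checkpoint in parallel and waits at most
     * timeoutMillis in total. Replicas that fail or miss the deadline are
     * omitted, so the caller gets a best-effort view.
     */
    static Map<String, Long> fetchCheckpoints(Map<String, Callable<Long>> replicas, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Start all requests first so they run concurrently.
            Map<String, Future<Long>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<Long>> e : replicas.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }

            // One shared deadline bounds the total delay added to promotion.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            Map<String, Long> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Long>> e : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    result.put(e.getKey(), e.getValue().get(remaining, TimeUnit.NANOSECONDS));
                } catch (TimeoutException | ExecutionException ex) {
                    e.getValue().cancel(true); // skip slow or failed replicas
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return result;
        } finally {
            pool.shutdownNow(); // do not leave stragglers running
        }
    }
}
```

Measuring the wall-clock time around a call like `fetchCheckpoints(replicas, 200)` would give the "total time" number mentioned above.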
We discussed this during team standup and identified that we need data on segment replication performance when the furthest-ahead replica is not chosen. This will also help us evaluate the trade-off of implementing this core change.
@dreamer-89 @mch2 this is tagged for 2.5. Can we make it?
Thank you @saratvemulapalli for bringing this up. This work will not make it into the 2.5.0 release, so I am removing the tag. From the previous discussion, this is an optimization task that tries to select the replica with the highest checkpoint (to minimize file copy operations from the newly selected primary and to prevent segment conflicts). We also don't have data on how bad this I/O can get if we do not select the replica with the highest replication checkpoint. Segment conflicts are avoided today by bumping the SegGen on the selected primary. Even with approach 2 above (a sync call to replicas to fetch the highest replication checkpoint), this solution will be best effort and can't guarantee selection of the furthest-ahead replica, which leaves room for segment conflicts. Based on this, we are prioritizing existing GA tasks over this one. CC @mch2 @anasalkouz
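For reference, the selection step itself could look roughly like the sketch below. It is only an illustration under assumptions: `PromotionSketch`, `Checkpoint`, and `pickFurthestAhead` are hypothetical names, the checkpoint is reduced to a primary term and a segment infos version, and the ordering (term first, then version) is an assumption rather than the actual ReplicationCheckpoint comparison. When no replica has reported a checkpoint, it falls back to the first active replica, standing in for whatever selection is used today.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: a simplified stand-in, not the OpenSearch ReplicationCheckpoint.
final class PromotionSketch {

    static final class Checkpoint {
        final long primaryTerm;
        final long segmentInfosVersion;

        Checkpoint(long primaryTerm, long segmentInfosVersion) {
            this.primaryTerm = primaryTerm;
            this.segmentInfosVersion = segmentInfosVersion;
        }
    }

    // Assumed ordering: higher primary term wins, then higher segment infos version.
    private static final Comparator<Checkpoint> ORDER =
        Comparator.comparingLong((Checkpoint c) -> c.primaryTerm)
                  .thenComparingLong(c -> c.segmentInfosVersion);

    /**
     * Returns the active replica with the highest reported checkpoint.
     * Replicas without a reported checkpoint are skipped; if none reported,
     * the first active replica is returned as a fallback. Ties keep the
     * replica that appears first in activeReplicas.
     */
    static Optional<String> pickFurthestAhead(Map<String, Checkpoint> reported, List<String> activeReplicas) {
        String best = null;
        for (String id : activeReplicas) {
            Checkpoint cp = reported.get(id);
            if (cp == null) {
                continue; // best effort: this replica did not answer in time
            }
            if (best == null || ORDER.compare(cp, reported.get(best)) > 0) {
                best = id;
            }
        }
        if (best != null) {
            return Optional.of(best);
        }
        return activeReplicas.stream().findFirst();
    }
}
```

Because replicas that did not report are skipped, the result is only the furthest-ahead replica among those that answered, which matches the best-effort caveat above.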
Coming from #3988, where RoutingNodes#failShard was identified as another workflow in which the master eagerly promotes a replica as part of node removal. The failShard method also handles cluster state updates (e.g. assigned shards).