-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce follower replication #249
base: master
Are you sure you want to change the base?
Conversation
I've finished the main part of this PR. But I'm not sure the best way to implement the |
@Fullstop000 Can you please include links to any design docs you worked on? |
@Hoverbear Can I make a WIP PR in tikv/rfc ? |
@Fullstop000 I think it's OK. :) I'd still really appreciate seeing an RFC (like https://docs.google.com/document/d/1Sp9Tnc_nk_i0feTOgLGPT3jZYVFjfP7C6pwA-wuVHko/edit?ts=5c8c5fe3 which you worked on) |
ae47fee
to
3502119
Compare
@hicqu PTAL |
3502119
to
0902e2f
Compare
src/group/mod.rs
Outdated
|
||
/// The max number of members of a group of which contains leader and the leader never pick a delegate. | ||
/// If the group size is larger than this, a delegate will be picked even the leader belongs to this group. | ||
pub max_leader_group_no_delegate: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I prefer to remove this, because without explicit information about group_id
, follower replication may be not helpful. For example 7 peers in one datacenter, follower replication can't save any network traffics.
src/raft.rs
Outdated
// Also, A peer with smaller 'match' is able to receive more un-compacted entries from leader | ||
// and then send them to others in same group. | ||
// | ||
fn pick_delegate(&self, group: &[u64], prs: &ProgressSet) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems in the new implementation, choose the peer with most raft logs is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems more reasonable. Actually maybe we can choose any unpaused peer?
src/raft.rs
Outdated
@@ -130,6 +130,9 @@ pub struct Raft<T: Storage> { | |||
/// The list of messages. | |||
pub msgs: Vec<Message>, | |||
|
|||
/// The especial map for messages to be sent to group delegates. | |||
pub delegated_msgs: HashMap<u64, Message>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to put it with groups
src/raft.rs
Outdated
@@ -545,41 +546,159 @@ impl<T: Storage> Raft<T> { | |||
is_batched | |||
} | |||
|
|||
// Whether the given peer could use Follower Replication | |||
fn use_delegate(&self, to: u64) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer
fn use_delegate(&self, to: u64) -> bool {
self.is_leader() &&
self.groups.get_members(to).map_or(0, |members| members.len()) > 1 &&
!self.groups.in_same_group(self.id, to)
}
And we need to make sure that peers with invalid group_id shouldn't exist in groups
.
src/raft.rs
Outdated
/// of produced message should be set to the leader id. | ||
pub fn send_append(&mut self, to: u64, prs: &mut ProgressSet) { | ||
if self.use_delegate(to) { | ||
if let Some(gid) = self.groups.get_group_id(to) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many indents. Maybe we can define use_delegate(to) -> Option<group_id>
.
src/raft.rs
Outdated
None => self.pick_delegate(&members, prs), | ||
}; | ||
if delegate != INVALID_ID { | ||
if let Some(pr) = prs.get_mut(delegate) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calling unwrap
is better, because the progress must exist.
src/raft.rs
Outdated
} | ||
None => { | ||
let members = self.groups.get_members(to).cloned().unwrap(); // this is safe because of the checking in `self.use_delegate` | ||
let delegate = match self.groups.get_delegate(gid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer let delegate = self.pick_delegate(prs)
, which already contains get_delegate
logic. And, its valid only if the pr
is not paused.
src/raft.rs
Outdated
} | ||
*maybe_commit = true; | ||
} | ||
|
||
fn handle_append_response_in_delegate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems only difference between this and handle_append_response
is the latter needs to handle leader transferring. That logic is compatible for delegates. So I think this function is unnecessary.
src/raft.rs
Outdated
self.set_prs(prs); | ||
} else { | ||
let members = self.groups.get_members(m.from_delegate).cloned(); | ||
self.bcast_append(members.as_ref()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bcast_append
calls send_append
, which checks members' delegate again. I believe it can be improved, maybe we can take a try?
src/raft.rs
Outdated
@@ -1811,14 +1993,63 @@ impl<T: Storage> Raft<T> { | |||
}; | |||
self.read_states.push(rs); | |||
} | |||
MessageType::MsgAppendResponse => { | |||
if self.prs().get(m.from).is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a common case for receiving responses from removed peers. So I think the log is unnecessary. Without it the code could be cleaner.
src/raft.rs
Outdated
// | ||
// The delegate must satisfy conditions below: | ||
// 1. Must be 'recent_active' | ||
// 2. The progress state should be 'Replicate' but not 'paused' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems Probe
needs also to be allowed? I think calling is_paused
here is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probe
might not be suitable for a delegate in the leader's view because the Probe
member can probably just recover from a crash or is suffering the network partition.
src/raft.rs
Outdated
} | ||
|
||
#[inline] | ||
fn is_follower_replication_enabled(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not accurate. Every peer can't know follower replication
is enabled or not. It just choose a strategy after receives followers' messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add some comments to make this clear.
harness/tests/integration_cases/test_raft_follower_replication.rs
Outdated
Show resolved
Hide resolved
4259f29
to
8ed4829
Compare
b7d2176
to
5189128
Compare
88f583e
to
f89086c
Compare
Signed-off-by: Fullstop000 <[email protected]>
88f583e
to
1da637c
Compare
Signed-off-by: Fullstop000 <[email protected]>
Any updates? |
This PR is discontinued for so long, any updates here? Do we have a plan to get it merged? |
#136
New features
Message
Group
andDelegate
in the cluster topologyA message now could be sent to a proxy and the proxy redirects the message to the destinationTODOs
Delegate
of aGroup
Delegate
canbcast_append
orsend_append
to the other members in theGroup
Problems
The ConfChange protocol for raft groups might need to be discussed.Solved asGroupConfig
is now volatile.The implementation has been ported to etcd: etcd-io/etcd#11455.