Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forget previous generation #108

Merged
merged 5 commits into from
Jan 25, 2024
Merged

Conversation

trinity-1686a
Copy link
Contributor

fix #94

I figured what was preventing gc.

  • Node1 and Node2 communicate.
  • Node2 dies.
  • Node3 appears.
  • Node1 isn't aware yet that Node2 is dead; it gives info about Node2 to Node3
  • Node3 receives info of Node2, it record a heartbeat, but Node2 isn't in the state, so the failure detector actually doesn't record anything. Then the state is updated.
  • Node1 and Node3 don't communicate about Node2 as nothing changed
  • Node1 discovers Node2 is dead
  • after some time, Node1 finally forget Node2
  • Node3 notice Node1 haven't heard of Node2, and sends it the info
  • Node1 receives info of Node2, it record a heartbeat, but Node2 isn't in the state, so the failure detector actually doesn't record anything. Then the state is updated.

Both Node1 and Node3 knows about Node2, but none track its state in their failure detector

what I implemented to fix that is that on first learning about a node, it's recorded in the failure detector as dead. if we get a 2nd heartbeat (with higher sequence number), we update to alive. That way a node in state should always be in the failure detector. We add it as dead because otherwise each time Node1 forget, the message from Node3 makes it believe that Node2 is alive, then Node3 forget, Node1 makes it believe Node2 is alive, then Node1 forget.... (this is only an issue if dead_node_grace_period can be smaller than the time it takes to detect a node is dead)


currently tests don't pass, and I have not been successful at fixing them. I left a comment with what I've figured so far at the place where it fails. Any help welcome!

@trinity-1686a trinity-1686a force-pushed the trinity--forget-previous-generation2 branch from c43c2d4 to cc12036 Compare January 17, 2024 09:39
@trinity-1686a trinity-1686a marked this pull request as ready for review January 17, 2024 09:45
@trinity-1686a
Copy link
Contributor Author

while trying to fix the tests, I found 2 issues in a test which worked before, as far as I can tell because of favorable timing.

  • after a partition, when recovery start, if a node used to have a key, but has deleted it and has none left, other nodes may not learn about its max_version, and accept stall writes from nodes that aren't yet reset. It itself won't reset further, as it now has up to date heartbeat => send a sentinel value (tombstone for "") if we reset a node, and gives it no key at all.
  • if a node A thinks an other node B is dead, it no longer includes it in its digest, which means other nodes will tell it about a new version of B, but won't tell it to reset its state. If B created a key before disconnection, deleted it during that phase, and its tombstone expired during that phase too, when A and B starts to communicate again, A will get an up to date heartbeat, but will keep the old key for A as neither tombstone nor state reset told it to drop the key => always include dead nodes in digest so we get resets when we should

chitchat/src/failure_detector.rs Outdated Show resolved Hide resolved
chitchat/src/state.rs Show resolved Hide resolved
// send a sentinel element to update the max_version. Otherwise the node's vision
// of max_version will be 0, and it may accept writes that are supposed to be
// stale, but it can tell they are.
if !delta_writer.add_kv(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch does not seem unit-tested.

@fulmicoton
Copy link
Contributor

what I implemented to fix that is that on first learning about a node, it's recorded in the failure detector as dead. if we get a 2nd heartbeat (with higher sequence number), we update to alive.

I think it is possible for a Node 4 to send Node 3 info about Node 2 with a higher sequence number. Node 3 would then update it as alive for a little bit. Can you confirm that this scenario will just result in Node 3 thinking for a few seconds that Node 2 is alive.

@fulmicoton
Copy link
Contributor

Can we merge this? If you have identified more faulty corner case, please open an issue to document them.

@trinity-1686a trinity-1686a merged commit 5d7ff34 into main Jan 25, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generation increase in chitchat does not really clean up the state
3 participants