forget previous generation #108

trinity-1686a · 2024-01-12T17:18:45Z

I figured what was preventing gc.

Node1 and Node2 communicate.
Node2 dies.
Node3 appears.
Node1 isn't aware yet that Node2 is dead; it gives info about Node2 to Node3
Node3 receives info of Node2, it record a heartbeat, but Node2 isn't in the state, so the failure detector actually doesn't record anything. Then the state is updated.
Node1 and Node3 don't communicate about Node2 as nothing changed
Node1 discovers Node2 is dead
after some time, Node1 finally forget Node2
Node3 notice Node1 haven't heard of Node2, and sends it the info
Node1 receives info of Node2, it record a heartbeat, but Node2 isn't in the state, so the failure detector actually doesn't record anything. Then the state is updated.

Both Node1 and Node3 knows about Node2, but none track its state in their failure detector

what I implemented to fix that is that on first learning about a node, it's recorded in the failure detector as dead. if we get a 2nd heartbeat (with higher sequence number), we update to alive. That way a node in state should always be in the failure detector. We add it as dead because otherwise each time Node1 forget, the message from Node3 makes it believe that Node2 is alive, then Node3 forget, Node1 makes it believe Node2 is alive, then Node1 forget.... (this is only an issue if dead_node_grace_period can be smaller than the time it takes to detect a node is dead)

currently tests don't pass, and I have not been successful at fixing them. I left a comment with what I've figured so far at the place where it fails. Any help welcome!

…vious-generation" This reverts commit 107a764, reversing changes made to a3e3f8b.

trinity-1686a · 2024-01-17T09:51:40Z

while trying to fix the tests, I found 2 issues in a test which worked before, as far as I can tell because of favorable timing.

after a partition, when recovery start, if a node used to have a key, but has deleted it and has none left, other nodes may not learn about its max_version, and accept stall writes from nodes that aren't yet reset. It itself won't reset further, as it now has up to date heartbeat => send a sentinel value (tombstone for "") if we reset a node, and gives it no key at all.
if a node A thinks an other node B is dead, it no longer includes it in its digest, which means other nodes will tell it about a new version of B, but won't tell it to reset its state. If B created a key before disconnection, deleted it during that phase, and its tombstone expired during that phase too, when A and B starts to communicate again, A will get an up to date heartbeat, but will keep the old key for A as neither tombstone nor state reset told it to drop the key => always include dead nodes in digest so we get resets when we should

chitchat/src/failure_detector.rs

chitchat/src/state.rs

guilload · 2024-01-22T15:55:46Z

chitchat/src/state.rs

+                // send a sentinel element to update the max_version. Otherwise the node's vision
+                // of max_version will be 0, and it may accept writes that are supposed to be
+                // stale, but it can tell they are.
+                if !delta_writer.add_kv(


This branch does not seem unit-tested.

fulmicoton · 2024-01-23T02:48:42Z

what I implemented to fix that is that on first learning about a node, it's recorded in the failure detector as dead. if we get a 2nd heartbeat (with higher sequence number), we update to alive.

I think it is possible for a Node 4 to send Node 3 info about Node 2 with a higher sequence number. Node 3 would then update it as alive for a little bit. Can you confirm that this scenario will just result in Node 3 thinking for a few seconds that Node 2 is alive.

fulmicoton · 2024-01-25T05:48:30Z

Can we merge this? If you have identified more faulty corner case, please open an issue to document them.

trinity-1686a added 2 commits January 11, 2024 17:17

Revert "Merge pull request #103 from quickwit-oss/trinity--forget-pre…

1e42423

…vious-generation" This reverts commit 107a764, reversing changes made to a3e3f8b.

make initial assumption that node is dead if not in state

812432a

trinity-1686a requested a review from guilload January 12, 2024 17:18

fix correctness issues arround node reset

cc12036

trinity-1686a force-pushed the trinity--forget-previous-generation2 branch from c43c2d4 to cc12036 Compare January 17, 2024 09:39

trinity-1686a marked this pull request as ready for review January 17, 2024 09:45

guilload reviewed Jan 22, 2024

View reviewed changes

rename key, change timeout in tests

5e11445

fulmicoton approved these changes Jan 23, 2024

View reviewed changes

Merge branch 'main' into trinity--forget-previous-generation2

b54089c

trinity-1686a merged commit 5d7ff34 into main Jan 25, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

forget previous generation #108

forget previous generation #108

trinity-1686a commented Jan 12, 2024

trinity-1686a commented Jan 17, 2024

guilload Jan 22, 2024

fulmicoton commented Jan 23, 2024

fulmicoton commented Jan 25, 2024

forget previous generation #108

forget previous generation #108

Conversation

trinity-1686a commented Jan 12, 2024

trinity-1686a commented Jan 17, 2024

guilload Jan 22, 2024

Choose a reason for hiding this comment

fulmicoton commented Jan 23, 2024

fulmicoton commented Jan 25, 2024