
consensus: after proposal misses (leader failures) a node should be suspended from proceeding rounds #1162

Closed
sdbondi opened this issue Sep 27, 2024 · 2 comments

@sdbondi
Member

sdbondi commented Sep 27, 2024

Background

If a node does not propose, or proposes invalid blocks, the chain slows down: to make progress, the other replicas must wait for a view change, which takes at least 12s.

We seek to temporarily suspend offline nodes until they come back online, to minimise their impact on consensus.

Proposal 1

Suspend:
Each replica tracks proposal misses for each validator. A proposal miss is detected when a dummy block reaches the commit phase (see note 5), at which point the missed-proposal counter for that view's leader is incremented. When proposing, a proposer checks for any missed-proposal counts that exceed a threshold (MISSED_PROPOSAL_SUSPEND_THRESHOLD) and includes Command::Suspend(vn_index) in the next block.
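A minimal Rust sketch of the suspend bookkeeping described above, assuming `vn_index` is a `u32` committee index and using 3 as a placeholder value for MISSED_PROPOSAL_SUSPEND_THRESHOLD (the issue does not give one). The `MissedProposals` type and its methods are hypothetical illustrations, not the Tari DAN API.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical value: the issue names this constant but does not fix its value.
const MISSED_PROPOSAL_SUSPEND_THRESHOLD: u64 = 3;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Command {
    // Hypothetical payload: `vn_index` modelled as a u32 committee index.
    Suspend(u32),
}

#[derive(Default)]
struct MissedProposals {
    // vn_index -> missed proposals, updated only from committed blocks.
    counts: BTreeMap<u32, u64>,
    // Validators whose Suspend command has already been committed.
    suspended: BTreeSet<u32>,
}

impl MissedProposals {
    // A dummy block reached the commit phase: its view's leader missed a proposal.
    fn on_dummy_block_committed(&mut self, leader: u32) {
        *self.counts.entry(leader).or_insert(0) += 1;
    }

    // A block carrying Command::Suspend(vn) was committed.
    fn on_suspend_committed(&mut self, vn: u32) {
        self.suspended.insert(vn);
    }

    // Proposer side: Suspend commands to include in the next block.
    fn suspend_commands(&self) -> Vec<Command> {
        let mut cmds = Vec::new();
        // BTreeMap iteration is ordered, so every proposer derives the same list.
        for (&vn, &count) in &self.counts {
            if count > MISSED_PROPOSAL_SUSPEND_THRESHOLD && !self.suspended.contains(&vn) {
                cmds.push(Command::Suspend(vn));
            }
        }
        cmds
    }
}

fn main() {
    let mut tracker = MissedProposals::default();
    for _ in 0..4 {
        tracker.on_dummy_block_committed(2);
    }
    tracker.on_dummy_block_committed(7);
    // Validator 2 has 4 misses (> 3); validator 7 has only 1.
    println!("{:?}", tracker.suspend_commands()); // [Suspend(2)]

    // Once the Suspend block commits, validator 2 is not proposed again.
    tracker.on_suspend_committed(2);
    assert!(tracker.suspend_commands().is_empty());
}
```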

Once that block is committed, whenever is_suspended_vn(leader_index(cur_view)) is true, the view immediately advances to the next view and the normal leader-failure procedure applies (see note 1).
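The skip rule can be sketched as a loop over views, assuming a hypothetical round-robin `leader_index(view) = view % committee_size` and modelling the committed suspensions as a set of indices. `next_active_view` is an illustrative helper, not part of the codebase; it returns the first view whose leader is not suspended plus the views skipped on the way, each of which would go through the leader-failure procedure.

```rust
use std::collections::HashSet;

// Hypothetical round-robin schedule; the real leader selection may differ.
fn leader_index(view: u64, committee_size: u64) -> u64 {
    view % committee_size
}

fn is_suspended_vn(vn_index: u64, suspended: &HashSet<u64>) -> bool {
    suspended.contains(&vn_index)
}

// Returns the first view >= cur_view whose leader is not suspended, together with
// the views skipped on the way, or None if every committee member is suspended.
fn next_active_view(
    cur_view: u64,
    committee_size: u64,
    suspended: &HashSet<u64>,
) -> Option<(u64, Vec<u64>)> {
    let mut skipped = Vec::new();
    let mut view = cur_view;
    // Under round-robin, committee_size consecutive views visit every leader once.
    for _ in 0..committee_size {
        if !is_suspended_vn(leader_index(view, committee_size), suspended) {
            return Some((view, skipped));
        }
        // Suspended leader: advance immediately instead of waiting for a timeout.
        skipped.push(view);
        view += 1;
    }
    None
}

fn main() {
    // Committee of 4 with validators 1 and 2 suspended.
    let suspended: HashSet<u64> = vec![1, 2].into_iter().collect();
    // View 5 -> leader 1 (suspended), view 6 -> leader 2 (suspended), view 7 -> leader 3.
    println!("{:?}", next_active_view(5, 4, &suspended)); // Some((7, [5, 6]))
}
```

Bounding the loop by the committee size avoids spinning forever if every member is suspended; under round-robin, that many consecutive views cover every leader.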

Notes:

  1. This could be implemented by having the nodes immediately send a NEWVIEW to the next leader, or
  2. the leader(cur_view + 1) could immediately propose a new block with a dummy parent.
  3. Validators need to accept the dummy parent block in (2) without NEWVIEW signatures if the previous leader is suspended (see consensus: prevent unjustified view-change from transmitted dummy blocks (using NEWVIEW signatures) #1160).
  4. For each of these "forced" dummy blocks, the missed_proposal count is increased (should there be an upper bound?).
  5. Updating counts at the commit phase is simplest because it avoids keeping fork-dependent count data.
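Notes 4 and 5 can be combined into a single commit-time update. The sketch below assumes a hypothetical `CommittedBlock` with a leader index and a dummy flag, and uses a made-up cap, MAX_MISSED_PROPOSALS, as one possible answer to the upper-bound question in note 4.

```rust
use std::collections::HashMap;

// Hypothetical cap, one possible answer to note 4's open question.
const MAX_MISSED_PROPOSALS: u64 = 10;

// Hypothetical minimal view of a committed block.
struct CommittedBlock {
    // Committee index of the leader for this block's view.
    leader: u32,
    // True for a dummy block, including "forced" dummies produced because the leader was suspended.
    is_dummy: bool,
}

// Called exactly once per block, at the commit phase. Because only committed blocks
// touch the counts, no per-fork copy of this map is needed (note 5).
fn apply_committed_block(counts: &mut HashMap<u32, u64>, block: &CommittedBlock) {
    if block.is_dummy {
        let count = counts.entry(block.leader).or_insert(0);
        // Saturate at the cap so a long outage does not require an equally long recovery.
        *count = (*count + 1).min(MAX_MISSED_PROPOSALS);
    }
}

fn main() {
    let mut counts = HashMap::new();
    for _ in 0..12 {
        apply_committed_block(&mut counts, &CommittedBlock { leader: 3, is_dummy: true });
    }
    apply_committed_block(&mut counts, &CommittedBlock { leader: 1, is_dummy: false });
    println!("{:?}", counts.get(&3)); // Some(10)
    println!("{:?}", counts.get(&1)); // None
}
```

Whether a cap is desirable depends on how quickly a recovered node should be resumed; without one, the count (and so the recovery time) grows with the length of the outage.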

Resume:
As votes from the suspended node are received, its missed_proposal count is decremented. Once the count reaches 0, a proposer MUST propose Command::Resume(vn_index). Since the counts are only updated from committed blocks, all non-faulty validators have the same counts (see note 1).

Once the resume block has been committed, the validator is reinstated into the round and continues as normal.
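A sketch of the resume side, under the assumption that a vote "counts" once it appears in a committed block, which keeps the decrement consistent with the commit-only updates described above. `ValidatorState`, `on_committed_vote` and `on_resume_committed` are hypothetical names, and `vn_index` is again modelled as a `u32`.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq)]
enum Command {
    // Hypothetical payload: `vn_index` modelled as a u32 committee index.
    Resume(u32),
}

#[derive(Default)]
struct ValidatorState {
    // vn_index -> missed proposals (only changed by committed blocks).
    missed: BTreeMap<u32, u64>,
    // Validators currently suspended from leading.
    suspended: BTreeSet<u32>,
}

impl ValidatorState {
    // Assumption: called when a committed block shows a vote from `vn`.
    fn on_committed_vote(&mut self, vn: u32) {
        if let Some(c) = self.missed.get_mut(&vn) {
            *c = (*c).saturating_sub(1);
        }
    }

    // Proposer side: every suspended validator whose count reached 0 MUST be resumed.
    fn resume_commands(&self) -> Vec<Command> {
        self.suspended
            .iter()
            .filter(|vn| self.missed.get(*vn).copied().unwrap_or(0) == 0)
            .map(|&vn| Command::Resume(vn))
            .collect()
    }

    // Once the Resume block is committed, the validator rejoins the rotation.
    fn on_resume_committed(&mut self, vn: u32) {
        self.suspended.remove(&vn);
        self.missed.remove(&vn);
    }
}

fn main() {
    let mut state = ValidatorState::default();
    state.missed.insert(4, 2);
    state.suspended.insert(4);
    assert!(state.resume_commands().is_empty());

    // Two committed votes bring validator 4's count back to 0.
    state.on_committed_vote(4);
    state.on_committed_vote(4);
    println!("{:?}", state.resume_commands()); // [Resume(4)]

    state.on_resume_committed(4);
    assert!(state.suspended.is_empty());
}
```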

Notes:

  1. We may want to include some proof of this in the block, for example, a hash of the validator PKs and their corresponding non-zero missed-proposal counts.
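One way to build the proof suggested in note 1 is to hash the non-zero counts in a deterministic order. This sketch keys a `BTreeMap` by a hypothetical 32-byte public key so every validator iterates in the same order. It uses the standard library's `DefaultHasher` only to stay self-contained: that hasher is not cryptographic and its algorithm may change between Rust releases, so a real block would need a cryptographic hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Hypothetical 32-byte public key type.
type PublicKey = [u8; 32];

// Commits to (PK, count) pairs with a non-zero count, in sorted PK order.
// NOTE: DefaultHasher is for illustration only; it is not a cryptographic commitment.
fn missed_proposal_commitment(counts: &BTreeMap<PublicKey, u64>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (pk, &count) in counts {
        if count > 0 {
            pk.hash(&mut hasher);
            count.hash(&mut hasher);
        }
    }
    hasher.finish()
}

fn main() {
    let mut counts: BTreeMap<PublicKey, u64> = BTreeMap::new();
    counts.insert([1u8; 32], 2);
    counts.insert([2u8; 32], 0); // zero counts are left out of the commitment
    println!("{:016x}", missed_proposal_commitment(&counts));
}
```

Skipping zero entries means validators that have never missed a proposal do not affect the hash, so the commitment only changes when a count that matters for Suspend/Resume changes.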
@sdbondi sdbondi converted this from a draft issue Sep 27, 2024
@sdbondi sdbondi added this to the v0.0.1 milestone Sep 27, 2024
@stringhandler
Contributor

Nice, I like it

@sdbondi sdbondi self-assigned this Oct 15, 2024
@sdbondi sdbondi moved this from In Progress to In Review in Tari Digital Assets Network (DAN) backlog Dec 11, 2024
@sdbondi
Member Author

sdbondi commented Dec 11, 2024

Closed in #1211

@sdbondi sdbondi closed this as completed Dec 11, 2024
@sdbondi sdbondi moved this from In Review to Done in Tari Digital Assets Network (DAN) backlog Dec 11, 2024