Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.18: blockstore: only consume duplicate proofs from root_slot + 1 on startup (backport of #1971) #2113

Merged
merged 2 commits into from
Aug 30, 2024

Conversation

mergify[bot]
Copy link

@mergify mergify bot commented Jul 12, 2024

Problem

Slots marked as root during startup (restart slot and any newly cluster confirmed slots / tower reconciliations) should also have duplicate proofs ignored, as otherwise we will end up attempting to mark a rooted slot as invalid on startup.

Summary of Changes

At replay initialization we use the root from bank forks and start consuming duplicate proofs for any slot after it.
The bank forks root takes into account any artificial root adjustment made during the startup process.


This is an automatic backport of pull request #1971 done by [Mergify](https://mergify.com).

@mergify mergify bot added the conflicts label Jul 12, 2024
@mergify mergify bot requested a review from a team as a code owner July 12, 2024 15:32
Copy link
Author

mergify bot commented Jul 12, 2024

Cherry-pick of 2a48564 has failed:

On branch mergify/bp/v1.18/pr-1971
Your branch is up to date with 'origin/v1.18'.

You are currently cherry-picking commit 2a48564b59.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   Cargo.lock
	modified:   core/Cargo.toml
	modified:   core/src/replay_stage.rs
	modified:   ledger/Cargo.toml
	modified:   programs/sbf/Cargo.lock

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   ledger/src/blockstore_processor.rs

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@AshwinSekar AshwinSekar force-pushed the mergify/bp/v1.18/pr-1971 branch 2 times, most recently from 5e13995 to c826fc9 Compare July 16, 2024 16:31
@AshwinSekar AshwinSekar requested a review from steviez July 16, 2024 16:31
// either during a previous run or artificially means that we should ignore any
// duplicate proofs for the root slot, thus we start consuming duplicate proofs
// from the root slot + 1
.duplicate_slots_iterator(root_bank.slot().saturating_add(1))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only functional change, remaining is test code.

This backport ensures that nodes are able to restart cleanly in the presence of duplicate blocks. The root chosen upon restart could have previously been a duplicate block that was resolved, or chosen as a restart slot anyway. It is important that we ignore the duplicate column for the restart root slot, otherwise we will panic on replay initialization.

Copy link

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as v2.0, I think this BP is important to make nodes more robust in cluster restart scenarios. I feel fairly good about the unit test proving validity of the change (especially given #1971 (comment)); however, we could potentially stage a one-off cluster restart with private cluster if we're wanting to be very cautious with v1.18 BP's.

Going to hold off on giving my ship-it until further discussion occurs

@t-nelson
Copy link

@AshwinSekar mind fixing the merge conflicts here so we can merge? mainly want to take steps out of any potential mb restart before 2.0

@AshwinSekar AshwinSekar force-pushed the mergify/bp/v1.18/pr-1971 branch from c826fc9 to 61143af Compare August 28, 2024 16:57
steviez
steviez previously approved these changes Aug 28, 2024
Copy link

@steviez steviez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could potentially stage a one-off cluster restart with private cluster if we're wanting to be very cautious with v1.18 BP's.

The recent testnet restart served as such a test case, and one of our nodes (td2) specifically had a duplicate on the restart slot. So, with the widespread testing we got from testnet, I think we can be comfortable with a v1.18 BP

AshwinSekar and others added 2 commits August 29, 2024 16:21
…up (#1971)

* blockstore: only consume duplicate proofs from root_slot + 1 on startup

* pr feedback: update test comments

* pr feedback: add pub behind dcou for test fns

(cherry picked from commit 2a48564)
@AshwinSekar AshwinSekar merged commit e5d267d into v1.18 Aug 30, 2024
35 checks passed
@AshwinSekar AshwinSekar deleted the mergify/bp/v1.18/pr-1971 branch August 30, 2024 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants