-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.18: blockstore: only consume duplicate proofs from root_slot + 1 on startup (backport of #1971) #2113
Conversation
Cherry-pick of 2a48564 has failed:
To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
5e13995
to
c826fc9
Compare
// either during a previous run or artificially means that we should ignore any | ||
// duplicate proofs for the root slot, thus we start consuming duplicate proofs | ||
// from the root slot + 1 | ||
.duplicate_slots_iterator(root_bank.slot().saturating_add(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only functional change, remaining is test code.
This backport ensures that nodes are able to restart cleanly in the presence of duplicate blocks. The root chosen upon restart could have previously been a duplicate block that was resolved, or chosen as a restart slot anyway. It is important that we ignore the duplicate column for the restart root slot, otherwise we will panic on replay initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as v2.0, I think this BP is important to make nodes more robust in cluster restart scenarios. I feel fairly good about the unit test proving validity of the change (especially given #1971 (comment)); however, we could potentially stage a one-off cluster restart with private cluster if we're wanting to be very cautious with v1.18 BP's.
Going to hold off on giving my ship-it until further discussion occurs
@AshwinSekar mind fixing the merge conflicts here so we can merge? mainly want to take steps out of any potential mb restart before 2.0 |
c826fc9
to
61143af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could potentially stage a one-off cluster restart with private cluster if we're wanting to be very cautious with v1.18 BP's.
The recent testnet restart served as such a test case, and one of our nodes (td2
) specifically had a duplicate on the restart slot. So, with the widespread testing we got from testnet, I think we can be comfortable with a v1.18 BP
61143af
to
107f19a
Compare
Problem
Slots marked as root during startup (restart slot and any newly cluster confirmed slots / tower reconciliations) should also have duplicate proofs ignored, as otherwise we will end up attempting to mark a rooted slot as invalid on startup.
Summary of Changes
At replay initialization we use the root from bank forks and start consuming duplicate proofs for any slot after it.
The bank forks root takes into account any artificial root adjustment made during the startup process.
This is an automatic backport of pull request #1971 done by [Mergify](https://mergify.com).