Fix(node): altda failover to ethda should keep finalizing l2 chain #12845
base: develop
… after failover to ethda. Currently it does not, as shown by the failing test TestAltDA_FinalizationAfterEthDAFailover.
Weiwei from Polymer found this bug and proposed a solution. This is an alternative solution that seems simpler, though I'm not 100% sure of its soundness.
op-e2e/actions/altda/altda_test.go
if i == 0 {
	// TODO: figure out why this is needed
	// I think it's because the L1 driven finalizedHead is set to L1FinalizedHead-ChallengeWindow (see damgr.go updateFinalizedFromL1),
	// so it trails behind by an extra challenge_window when we switch over to ethDA.
	harness.ActNewL2TxFinalized(t)
}
TODO: this felt weird to me, but maybe it's expected and fine?
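To make the lag described in the TODO concrete, here is a small worked sketch. It assumes, as the test comment speculates, that the L1-driven finalized head is the L1 finalized block number minus the challenge window. The helper `l1DrivenFinalized` and the 6-block window are hypothetical illustrations, not code from damgr.go.

```go
package main

import "fmt"

// l1DrivenFinalized returns the block number an L1-driven finalized head
// would point to, ASSUMING it is the L1 finalized block minus the challenge
// window (as the test comment guesses). Hypothetical helper, not the damgr API.
func l1DrivenFinalized(l1Finalized, challengeWindow uint64) uint64 {
	if l1Finalized < challengeWindow {
		return 0 // clamp near genesis instead of underflowing
	}
	return l1Finalized - challengeWindow
}

func main() {
	const challengeWindow = 6 // hypothetical value for illustration
	for _, l1Fin := range []uint64{4, 10, 16} {
		head := l1DrivenFinalized(l1Fin, challengeWindow)
		fmt.Printf("L1 finalized %d -> finalized head %d (lag %d)\n", l1Fin, head, l1Fin-head)
	}
}
```

Once past genesis, the head trails L1 finality by exactly one challenge window, which would explain why the test needs an extra finalization step right after switching to ethDA.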
Thanks for the fix!
op-alt-da/damgr.go
// If a commitment was pruned, set the finalized head to that commitment's inclusion block
// When no commitments are left to be pruned (one example is if we have failed over to ethda)
// then updateFinalizedFromL1 becomes the main driver of the finalized head.
// Note that updateFinalizedFromL1 is only called when len(d.state.commitments) == 0
nit: ... only called when d.state.NoCommitments() is true
op-e2e/actions/altda/altda_test.go
func TestAltDA_FinalizationAfterEthDAFailover(gt *testing.T) {
Nice, this test case covers altDA -> ethDA. Can we also have a similar case, altDA -> ethDA -> altDA, simulating a temporary altDA failure?
Done: ce0267b
I don't fully understand the finalization behavior; look at the weird if i == 0 cases. I would appreciate help figuring out whether this behavior is really what we want and, if so, how best to explain it in the comment/assert logic.
op-alt-da/damgr.go
// then updateFinalizedFromL1 becomes the main driver of the finalized head.
// Note that updateFinalizedFromL1 is only called when len(d.state.commitments) == 0
var zero eth.L1BlockRef
if lastPrunedCommIncBlock != zero {
nit: use the zero value directly, to be consistent with other zero-value checks in the repo
len(d.state.commitments) == 0 => d.state.NoCommitments() is true.
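A minimal sketch of the two review nits, using hypothetical, simplified stand-ins for `eth.L1BlockRef` and the damgr state (not the real types). Comparing against the composite literal `(L1BlockRef{})` works because the struct is comparable, which the original `!= zero` check already relies on; `NoCommitments` here simply wraps the length check, as the suggestion implies.

```go
package main

import "fmt"

// L1BlockRef is a hypothetical, simplified stand-in for eth.L1BlockRef.
// Like the real type, it is a comparable struct.
type L1BlockRef struct {
	Hash   [32]byte
	Number uint64
}

// state is a hypothetical stand-in for the damgr commitment state.
type state struct {
	commitments []L1BlockRef
}

// NoCommitments wraps the length check behind a named method,
// as suggested in review.
func (s *state) NoCommitments() bool {
	return len(s.commitments) == 0
}

func main() {
	var lastPrunedCommIncBlock L1BlockRef // zero value: nothing was pruned

	// Compare against the zero value directly instead of declaring
	// a separate `var zero L1BlockRef` first. The parentheses are
	// required for a composite literal inside an if condition.
	if lastPrunedCommIncBlock != (L1BlockRef{}) {
		fmt.Println("a commitment was pruned")
	}

	s := &state{}
	if s.NoCommitments() {
		fmt.Println("no commitments: L1 finalization drives the finalized head")
	}
}
```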
Description
Wenwei from Polymer raised this bug with us.
We are currently in the process of adding failover to altda (see this; batcher PR coming soon), and realized that when failover happens, the L2 finalized head completely stalls.
This PR consists of 2 commits:
Tests
The first commit (see above) contains a test that shows the buggy behavior. To run:
Fix
The problem with the damgr as it currently stands is that
is always run, no matter what. This means finalization is always driven by altDA commitments, but after failover the damgr stops seeing commitments (because they aren't altDA commitments anymore). The fix in the second commit is to let finalization be driven by L1 finalization when the damgr is not managing any altDA commitments.
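The finalization rule described here can be sketched as a toy model. Everything in it is hypothetical: `daManager`, `prune`, `onL1Finalized`, and the pruning condition (a commitment is prunable once its inclusion block plus the challenge window is at or below the finalized L1 block) are simplifications for illustration, not the op-alt-da implementation.

```go
package main

import "fmt"

// L1BlockRef is a hypothetical, simplified stand-in for eth.L1BlockRef.
type L1BlockRef struct {
	Number uint64
}

// daManager is a hypothetical toy model of the damgr finalization logic.
type daManager struct {
	commitments     []L1BlockRef // inclusion blocks of tracked altDA commitments, oldest first
	challengeWindow uint64
	finalizedHead   L1BlockRef
}

// prune drops commitments whose challenge window has elapsed at the given
// finalized L1 block and returns the inclusion block of the last one pruned
// (the zero value if nothing was pruned).
func (d *daManager) prune(l1Finalized uint64) L1BlockRef {
	var last L1BlockRef
	i := 0
	for ; i < len(d.commitments); i++ {
		c := d.commitments[i]
		if c.Number+d.challengeWindow > l1Finalized {
			break // still challengeable; later commitments are too
		}
		last = c
	}
	d.commitments = d.commitments[i:]
	return last
}

// onL1Finalized advances the finalized head when a new L1 block is finalized.
func (d *daManager) onL1Finalized(l1Finalized uint64) {
	if last := d.prune(l1Finalized); last != (L1BlockRef{}) {
		// A commitment was pruned: finalize up to its inclusion block.
		d.finalizedHead = last
		return
	}
	if len(d.commitments) == 0 && l1Finalized >= d.challengeWindow {
		// No altDA commitments are tracked (e.g. after failover to ethDA):
		// follow L1 finality, lagging by the challenge window.
		candidate := L1BlockRef{Number: l1Finalized - d.challengeWindow}
		if candidate.Number > d.finalizedHead.Number {
			d.finalizedHead = candidate
		}
	}
}

func main() {
	// No commitments are added after block 5, as if the batcher had failed over to ethDA.
	d := &daManager{challengeWindow: 6, commitments: []L1BlockRef{{Number: 3}, {Number: 5}}}
	for _, l1Fin := range []uint64{8, 10, 11, 20} {
		d.onL1Finalized(l1Fin)
		fmt.Printf("L1 finalized %d -> finalized head %d\n", l1Fin, d.finalizedHead.Number)
	}
}
```

In this model the head follows pruned commitments (3, then 5) while altDA commitments exist, then keeps advancing with L1 finality (14 at L1 block 20) once none are left. Without the `len(d.commitments) == 0` branch it would stay stuck at 5, which mirrors the stall described above.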
NOTE: I don't fully understand the subtleties of the interactions between the damgr and the derivation pipeline, so I might be doing something wrong here, but it seems sound to me, especially since it passes the test.
Additional Context
The test is currently a bit ugly (see the TODO comment below). Someone with a better understanding of op-node and its event processing order could surely help me clean it up. I use this patch to debug event processing, in case it is of use to anyone: