Fix(node): altda failover to ethda should keep finalizing l2 chain #12845

samlaf · 2024-11-06T15:11:39Z

Description

Wenwei from Polymer raised this bug with us.

We are currently adding failover to altda (see this; the batcher PR is coming soon) and realized that when failover happens, the L2 finalized head stalls completely.

This PR consists of 2 commits:

bb6be8d: test to show the behavior (fails on this commit)
672f820: fix damgr so that finalizedHead can keep advancing even after failover

Tests

The first commit (see above) contains a test that demonstrates the buggy behavior. To run it:

git checkout bb6be8d
go test -run ^TestAltDA_FinalizationAfterEthDAFailover$ github.com/ethereum-optimism/optimism/op-e2e/actions/altda

Fix

The problem with the current damgr is that the line

d.finalizedHead = d.state.lastPrunedCommitment

is always run, no matter what. This means finalization is always driven by altda commitments, but after failover the damgr stops seeing commitments (the batcher is no longer posting altda commitments). The fix in the second commit lets finalization be driven by L1 finalization whenever the damgr has no altda commitments to manage.
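To make the change concrete, here is a toy model of the fixed logic in Go. It is not the actual damgr code: BlockRef, daManager, pruneExpired and Finalize are hypothetical names, commitments are reduced to their L1 inclusion block numbers, and updateFinalizedFromL1 is assumed to set the finalized head to the finalized L1 block minus the challenge window (the behavior hypothesized in the test's TODO). The key idea is that a pruned commitment only moves the finalized head when something was actually pruned, and the L1-driven path takes over once no commitments are tracked.

```go
package main

import "fmt"

// BlockRef is a hypothetical stand-in for eth.L1BlockRef; only the number matters here.
type BlockRef struct{ Number uint64 }

// daManager is a toy model of the damgr finalization logic (hypothetical names).
type daManager struct {
	challengeWindow uint64
	commitments     []BlockRef // L1 inclusion blocks of altda commitments still tracked (sorted)
	finalizedHead   BlockRef
}

// pruneExpired removes commitments whose challenge window has passed by the
// finalized L1 block and returns the inclusion block of the last one removed,
// or the zero BlockRef if nothing was pruned.
func (d *daManager) pruneExpired(l1Finalized BlockRef) BlockRef {
	var last BlockRef
	for len(d.commitments) > 0 && d.commitments[0].Number+d.challengeWindow <= l1Finalized.Number {
		last = d.commitments[0]
		d.commitments = d.commitments[1:]
	}
	return last
}

// updateFinalizedFromL1 models the assumed L1-driven path: the finalized head
// trails the finalized L1 block by one challenge window.
func (d *daManager) updateFinalizedFromL1(l1Finalized BlockRef) {
	if l1Finalized.Number < d.challengeWindow {
		return
	}
	d.finalizedHead = BlockRef{Number: l1Finalized.Number - d.challengeWindow}
}

// Finalize applies the fixed logic on each L1 finalization signal.
func (d *daManager) Finalize(l1Finalized BlockRef) {
	// Only let a pruned commitment drive finalization if one was actually pruned.
	if pruned := d.pruneExpired(l1Finalized); pruned != (BlockRef{}) {
		d.finalizedHead = pruned
	}
	// With no altda commitments left (e.g. after failover to ethda), L1 drives finalization.
	if len(d.commitments) == 0 {
		d.updateFinalizedFromL1(l1Finalized)
	}
}

func main() {
	d := &daManager{challengeWindow: 16, commitments: []BlockRef{{Number: 10}}}
	d.Finalize(BlockRef{Number: 30})    // commitment at 10 pruned, none left: head = 30-16
	fmt.Println(d.finalizedHead.Number) // 14
	d.Finalize(BlockRef{Number: 40})    // "failover": no altda commitments, L1 keeps driving
	fmt.Println(d.finalizedHead.Number) // 24
}
```

In the model, the second Finalize call advances the head from 14 to 24 purely through the L1-driven path, which is the behavior the failover test expects.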

NOTE: I don't fully understand the subtleties of the interactions between the damgr and the derivation pipeline, so I might be doing something wrong here, but the fix seems sound to me, especially since it passes the test.

Additional Context
The test is currently a bit ugly (see the TODO comment below). Someone with a better understanding of op-node and its event-processing order could surely help me clean it up. I use this patch for debugging event processing, in case it is useful to anyone:

diff --git a/op-e2e/actions/altda/altda_test.go b/op-e2e/actions/altda/altda_test.go
index 9d60f2fee..c6ef361d7 100644
--- a/op-e2e/actions/altda/altda_test.go
+++ b/op-e2e/actions/altda/altda_test.go
@@ -194,7 +194,7 @@ func (a *L2AltDA) ActNewL2Tx(t helpers.Testing) {
 //
 // 17 makes sense because challengeWindow=16 and we create 1 extra block before that,
 // and 204 L2blocks = 17 L1blocks * 12 L2blocks/L1block (L1blocktime=12s, L2blocktime=1s)
-func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
+func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing, logEvents ...bool) {
 	// Include a new l2 batcher transaction, submitting an input commitment to the l1.
 	a.ActNewL2Tx(t)
 	// Create ChallengeWindow empty blocks so the above batcher blocks can finalize (can't be challenged anymore)
@@ -203,7 +203,20 @@ func (a *L2AltDA) ActNewL2TxFinalized(t helpers.Testing) {
 	// TODO: understand why we need to drain the pipeline before AND after actL1Finalized
 	a.sequencer.ActL2PipelineFull(t)
 	a.ActL1Finalized(t)
-	a.sequencer.ActL2PipelineFull(t)
+	if logEvents == nil {
+		a.sequencer.ActL2PipelineFull(t)
+	} else {
+		// Log all events until the end of the pipeline
+		count := 0
+		a.sequencer.ActL2EventsUntil(t, func(ev event.Event) bool {
+			count++
+			a.log.Info("new event", "event", ev, "count", count)
+			if count == 100 {
+				return true
+			}
+			return false
+		}, 100, false)
+	}
 
 	// Uncomment the below code to observe the behavior described in the TODO above
 	// syncStatus := a.sequencer.SyncStatus()
@@ -680,5 +693,12 @@ func TestAltDA_FinalizationAfterEthDAFailover(gt *testing.T) {
 		ssAfter := harness.sequencer.SyncStatus()
 		// Even after failover, the finalized head should continue advancing normally
 		require.Equal(t, ssBefore.FinalizedL2.Number+diffL2Blocks, ssAfter.FinalizedL2.Number)
+
+		harness.log.Info("Sync status before",
+			"unsafeL1", ssBefore.HeadL1.Number, "safeL1", ssBefore.SafeL1.Number, "finalizedL1", ssBefore.FinalizedL1.Number,
+			"unsafeL2", ssBefore.UnsafeL2.Number, "safeL2", ssBefore.SafeL2.Number, "finalizedL2", ssBefore.FinalizedL2.Number)
+		harness.log.Info("Sync status after",
+			"unsafeL1", ssAfter.HeadL1.Number, "safeL1", ssAfter.SafeL1.Number, "finalizedL1", ssAfter.FinalizedL1.Number,
+			"unsafeL2", ssAfter.UnsafeL2.Number, "safeL2", ssAfter.SafeL2.Number, "finalizedL2", ssAfter.FinalizedL2.Number)
 	}
 }
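The callback in the patch above counts events, logs each one, and stops once 100 have been seen. As a standalone sketch of that counting-predicate pattern, the Go below uses a hypothetical Event interface and a hypothetical runUntil driver in place of op-node's event.Event and the test helper's ActL2EventsUntil; it is only meant to show the closure, not the real op-e2e API.

```go
package main

import (
	"fmt"
	"log"
)

// Event is a hypothetical stand-in for op-node's event.Event.
type Event interface{ String() string }

// namedEvent is a trivial Event implementation used only for this sketch.
type namedEvent string

func (e namedEvent) String() string { return string(e) }

// stopAfter returns a predicate that logs every event it sees and reports
// true once n events have been observed.
func stopAfter(n int, logf func(format string, args ...any)) func(Event) bool {
	count := 0
	return func(ev Event) bool {
		count++
		logf("new event %v (count %d)", ev, count)
		return count >= n
	}
}

// runUntil is a hypothetical driver: it feeds events to fn until fn returns
// true or the events run out, and returns how many events were processed.
func runUntil(events []Event, fn func(Event) bool) int {
	for i, ev := range events {
		if fn(ev) {
			return i + 1
		}
	}
	return len(events)
}

func main() {
	events := []Event{namedEvent("a"), namedEvent("b"), namedEvent("c")}
	processed := runUntil(events, stopAfter(2, log.Printf))
	fmt.Println("processed:", processed) // processed: 2
}
```

Returning the comparison directly (return count >= n) replaces the if / return true / return false sequence in the patch without changing where processing stops.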


samlaf · 2024-11-06T15:22:45Z

op-e2e/actions/altda/altda_test.go

+		if i == 0 {
+			// TODO: figure out why this is needed
+			// I think it's because the L1 driven finalizedHead is set to L1FinalizedHead-ChallengeWindow (see damgr.go updateFinalizedFromL1),
+			// so it trails behind by an extra challenge_window when we switch over to ethDA.
+			harness.ActNewL2TxFinalized(t)
+		}


TODO: this felt weird to me, but maybe it's expected and fine?
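To put numbers on the hypothesis in the TODO, here is a small Go calculation. It assumes, as the comment suggests, that the L1-driven finalized head is the finalized L1 block minus challengeWindow; the parameters (challengeWindow = 16, 12s L1 blocks, 1s L2 blocks, 17 L1 blocks per ActNewL2TxFinalized) come from the test comments, and the helper names are hypothetical. If the hypothesis holds, switching to ethDA adds about one challenge window of lag (16 L1 blocks, or 192 L2 blocks), which would be consistent with needing one extra ActNewL2TxFinalized call when i == 0.

```go
package main

import "fmt"

// Parameters taken from the test comments.
const (
	challengeWindow = 16 // in L1 blocks
	l1BlockTime     = 12 // seconds
	l2BlockTime     = 1  // seconds
)

// l2BlocksPerL1Block is how many L2 blocks are produced per L1 block.
func l2BlocksPerL1Block() uint64 { return l1BlockTime / l2BlockTime }

// l1DrivenFinalized models the assumed L1-driven path: the finalized head is
// the finalized L1 block minus one challenge window (clamped at 0).
func l1DrivenFinalized(l1Finalized uint64) uint64 {
	if l1Finalized < challengeWindow {
		return 0
	}
	return l1Finalized - challengeWindow
}

func main() {
	// 17 L1 blocks (challengeWindow + 1) per ActNewL2TxFinalized call.
	fmt.Println((challengeWindow + 1) * l2BlocksPerL1Block()) // 204
	// Extra lag, in L2 blocks, from trailing by one challenge window.
	fmt.Println(challengeWindow * l2BlocksPerL1Block()) // 192
	fmt.Println(l1DrivenFinalized(100))                 // 84
}
```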

alfredo-stonk

Thanks for the fix!

alfredo-stonk · 2024-11-08T14:59:22Z

op-alt-da/damgr.go

+	// If a commitment was pruned, set the finalized head to that commitment's inclusion block
+	// When no commitments are left to be pruned (one example is if we have failed over to ethda)
+	// then updateFinalizedFromL1 becomes the main driver of the finalized head.
+	// Note that updateFinalizedFromL1 is only called when len(d.state.commitments) == 0


nit: ... only called when d.state.NoCommitments() is true

alfredo-stonk · 2024-11-08T15:36:02Z

op-e2e/actions/altda/altda_test.go

+
+func TestAltDA_FinalizationAfterEthDAFailover(gt *testing.T) {


Nice, this test case covers ethDA -> altDA. Can we also have a similar case ethDA -> altDA -> ethDA, simulating a temp altDA failure?

Done: ce0267b
I don't fully understand the finalization behavior. Please look at the odd if i == 0 cases. I would appreciate help figuring out whether this behavior is really what we want and, if so, how best to explain it in the comment/assert logic.

alfredo-stonk · 2024-11-08T15:37:39Z

op-alt-da/damgr.go

+	// then updateFinalizedFromL1 becomes the main driver of the finalized head.
+	// Note that updateFinalizedFromL1 is only called when len(d.state.commitments) == 0
+	var zero eth.L1BlockRef
+	if lastPrunedCommIncBlock != zero {


nit: use the zero value directly, to be consistent with other zero-value cases in the repo
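The nit relies on Go's ability to compare a comparable struct against an inline composite literal instead of declaring a separate zero variable. The sketch below uses a hypothetical, trimmed-down L1BlockRef (the real eth.L1BlockRef has more fields). Note that inside an if header the composite literal must be parenthesized; otherwise the parser treats its opening brace as the start of the if block.

```go
package main

import "fmt"

// L1BlockRef is a hypothetical, trimmed-down stand-in for eth.L1BlockRef.
// All fields are comparable, so the struct can be compared with != directly.
type L1BlockRef struct {
	Hash   [32]byte
	Number uint64
	Time   uint64
}

// isSet reports whether ref differs from the zero value. Outside statement
// headers (here, a return) no parentheses are needed.
func isSet(ref L1BlockRef) bool {
	return ref != L1BlockRef{}
}

func main() {
	var lastPrunedCommIncBlock L1BlockRef // nothing pruned yet
	// Inline zero-value comparison; the parentheses are required in an if header.
	if lastPrunedCommIncBlock != (L1BlockRef{}) {
		fmt.Println("pruned at", lastPrunedCommIncBlock.Number)
	} else {
		fmt.Println("nothing pruned") // this branch runs
	}
}
```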


samlaf added 2 commits November 6, 2024 19:01

test(altda): add a test to make sure altda node keeps finalizing even after failover to ethda

bb6be8d

Currently it does not, as shown by the failing test TestAltDA_FinalizationAfterEthDAFailover.

fix(damgr): ethda failover finalization stall bug

672f820

Weiwei from Polymer found this bug and proposed a solution. This is an alternative solution that seems simpler, but I am not 100% sure of its soundness.

samlaf commented Nov 6, 2024


alfredo-stonk reviewed Nov 8, 2024


docs(damgr): fix inaccurate comment

1bf1f68

len(d.state.commitments) == 0 => d.state.NoCommitments() is true.

samlaf requested review from a team as code owners November 18, 2024 10:25

samlaf requested a review from geoknee November 18, 2024 10:25

samlaf added 2 commits November 18, 2024 14:30

style(damgr): instantiate zero value inline

34b2df9

test: also test fallback (to ethda) behavior in TestAltDA_FinalizationAfterEthDAFailover

ce0267b

samlaf changed the title ~~Fix: altda failover to ethda should keep finalizing l2 chain~~ Fix(node): altda failover to ethda should keep finalizing l2 chain Nov 18, 2024
