-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: EigenDA failover #46
Conversation
arbnode/batch_poster.go
Outdated
@@ -125,6 +125,7 @@ type BatchPoster struct { | |||
batchReverted atomic.Bool // indicates whether data poster batch was reverted | |||
nextRevertCheckBlock int64 // the last parent block scanned for reverting batches | |||
postedFirstBatch bool // indicates if batch poster has posted the first batch | |||
failoverETHDA bool // indicates if batch poster should failover to ETHDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add more context here, like "when a failover signal is received from proxy". Or "indicates whether batch posted is allowed to failover" or something
arbnode/batch_poster.go
Outdated
@@ -148,6 +149,7 @@ type BatchPosterDangerousConfig struct { | |||
type BatchPosterConfig struct { | |||
Enable bool `koanf:"enable"` | |||
DisableDapFallbackStoreDataOnChain bool `koanf:"disable-dap-fallback-store-data-on-chain" reload:"hot"` | |||
EnableEigenDAFailover bool `koanf:"eigenda-failover-to-anytrust" reload:"hot"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh they use koanf, cool! Been meaning to give it a try. Do you like it?
Also check the lint issue, the flag name and tag are different.
arbnode/batch_poster.go
Outdated
@@ -1241,7 +1253,7 @@ func (b *BatchPoster) maybePostSequencerBatch(ctx context.Context) (bool, error) | |||
} | |||
|
|||
var useEigenDA bool | |||
if b.eigenDAWriter != nil { | |||
if b.eigenDAWriter != nil && !b.failoverETHDA { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment explaining the logic here. Why are you setting useEigenDA to true only when we don't failover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we still want to use eigenda until we failover?
arbnode/node.go
Outdated
return nil, errors.New("eigenDA and anytrust cannot both be enabled without EnableEigenDAFailover=true in batch poster config") | ||
} | ||
|
||
if config.EigenDA.Enable { // anytrust is enabled as an EigenDA failover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can I know that anytrust is enabled here? if just seems to check that eigenda is enabled
arbnode/batch_poster.go
Outdated
// to update an internal field and retrigger the poster's event loop. Since this update should be down across batch posters on a distributed cluster, we | ||
// must also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: down -> done
also comment unfinished
defer func() { | ||
cancel() | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer func() { | |
cancel() | |
}() | |
defer cancel() |
@epociask my baaaddddd..... :( :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nits
batchReverted atomic.Bool // indicates whether data poster batch was reverted | ||
nextRevertCheckBlock int64 // the last parent block scanned for reverting batches | ||
postedFirstBatch bool // indicates if batch poster has posted the first batch | ||
eigenDAFailoverToETHDA bool // indicates if batch poster should failover to ETHDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eigenDAFailoverToETHDA
sounds like a config variable and not a state variable. Consider renaming to failedOverToETHDA
or something more "stateful". Right now it sounds like a config flag that just failover is enabled.
if err != nil && errors.Is(err, eigenda.SvcUnavailableErr) && b.config().EnableEigenDAFailover && b.dapWriter != nil { // Failover to anytrust commitee if enabled | ||
log.Error("EigenDA service is unavailable, failing over to any trust mode") | ||
b.building.useEigenDA = false | ||
failOver = true | ||
} | ||
|
||
if err != nil && errors.Is(err, eigenda.SvcUnavailableErr) && b.config().EnableEigenDAFailover && b.dapWriter == nil { // Failover to ETH DA if enabled | ||
// when failing over to ETHDA (i.e 4844, calldata), we may need to re-encode the batch. To do this in compliance with the existing code, it's easiest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe refactor to have an outer if with the 3 same conditions and then an internal if/else only for the dapWriter. Took me a while to parse the difference, and to realized that dapWriter was probably a writer to anytrust.
|
||
if err != nil && errors.Is(err, eigenda.SvcUnavailableErr) && b.config().EnableEigenDAFailover && b.dapWriter == nil { // Failover to ETH DA if enabled | ||
// when failing over to ETHDA (i.e 4844, calldata), we may need to re-encode the batch. To do this in compliance with the existing code, it's easiest | ||
// to update an internal field and retrigger the poster's event loop. Since the batch poster can be distributed across mulitple nodes, there could be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodes
sounds a bit weird to me in this context. Is this standard arbitrum terminology? If not I would consider renaming to "multiple batch_posters". Node reminds me of too many different things like eth node, like k8s node, etc, but typically it means "a computer/physical server", and not a process (k8s pod)
// degraded temporary performance as each batch poster will re-encode the batch on another event loop tick using the coordination lock which could worst case | ||
// could require every batcher instance to fail dispersal to EigenDA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: could worst case could
if b.dapWriter != nil { | ||
var eigenDaBlobInfo *eigenda.EigenDABlobInfo | ||
eigenDADispersed := false | ||
failOver := false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this failOver
variable different from eigenDAFailoverToETHDA
?
log.Error("EigenDA service is unavailable and anytrust is disabled, failing over to ETH DA") | ||
b.eigenDAFailoverToETHDA = true | ||
|
||
// // if the batch's size exceeds the native DA max size limit, we must re-encode the batch to accomodate the AnyTrust, calldata, and 4844 size limits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// //
b.eigenDAFailoverToETHDA = true | ||
|
||
// // if the batch's size exceeds the native DA max size limit, we must re-encode the batch to accomodate the AnyTrust, calldata, and 4844 size limits | ||
if (len(sequencerMsg) > b.config().MaxSize && !b.building.use4844) || (len(sequencerMsg) > b.config().Max4844BatchSize && b.building.use4844) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consider changing order to make it cleaner:
if (b.building.use4844 && len(sequencerMsg) > b.config().Max4844BatchSize) || (!b.building.use4844 && len(sequencerMsg) > b.config().MaxSize)
|
||
batchPosterDASuccessCounter.Inc(1) | ||
batchPosterDALastSuccessfulActionGauge.Update(time.Now().Unix()) | ||
} else if failOver { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use eigenDAFailoverToETHDA
instead? Or are those variables different?
TLDR: Added failover from ETH DA to EigenDA in the event of detected 503 or service unavailable errors.