Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EFM] Recoverable Random Beacon State Machine follow up updates #6815

Open
wants to merge 12 commits into
base: feature/efm-recovery
Choose a base branch
from

Conversation

durkmurder
Copy link
Member

@durkmurder durkmurder commented Dec 13, 2024

Context

This PR addresses some open comments from PR #6771.
Specifically:

By far the biggest change is how we insert/upsert keys, now caller needs to provide evidence in form of EpochCommit so state machine can sanity check if it's inserting a valid private key.

@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 45.94595% with 60 lines in your changes missing coverage. Please review.

Project coverage is 41.74%. Comparing base (a35632f) to head (caef7c3).

Files with missing lines Patch % Lines
storage/badger/dkg_state.go 58.22% 23 Missing and 10 partials ⚠️
storage/mock/dkg_state.go 0.00% 12 Missing ⚠️
cmd/consensus/main.go 0.00% 6 Missing ⚠️
engine/consensus/dkg/reactor_engine.go 44.44% 4 Missing and 1 partial ⚠️
storage/mock/epoch_recovery_my_beacon_key.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@                   Coverage Diff                    @@
##           feature/efm-recovery    #6815      +/-   ##
========================================================
- Coverage                 41.81%   41.74%   -0.08%     
========================================================
  Files                      1588     2033     +445     
  Lines                    144098   181300   +37202     
========================================================
+ Hits                      60251    75677   +15426     
- Misses                    78887    99385   +20498     
- Partials                   4960     6238    +1278     
Flag Coverage Δ
unittests 41.74% <45.94%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@durkmurder durkmurder marked this pull request as ready for review December 17, 2024 14:29

// start up the engine
unittest.AssertClosesBefore(suite.T(), suite.engine.Ready(), time.Second)
unittest.AssertClosesBefore(suite.T(), suite.engine.Ready(), 100*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really not starting up within a second? 100s seems like a lot.. Maybe 5s or 10s?

// verify that the key is part of the EpochCommit
if err = ensureKeyIncludedInEpoch(epochCounter, key, commit); err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted,
"previously storred key has not been found in epoch commit event: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"previously storred key has not been found in epoch commit event: %w", err)
"previously stored key has not been found in epoch commit event: %w", err)

// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch.
// Effectively, this method transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// Caller needs to supply the [flow.EpochCommit] which is an evidence that the key has been indeed included for the given epoch.
// No errors are expected during normal operations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// No errors are expected during normal operations.
// If the current state is already [flow.RandomBeaconKeyCommitted], this function is a no-op regardless of input.
// No errors are expected during normal operations.

Describing the behaviour on line 238.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding Jordan's suggestion:

If the current state is already [flow.RandomBeaconKeyCommitted], this function is a no-op regardless of input.

This is probably the most minimal implementation, but I am not sure it would be the most robust. I think there are two non-happy path scenarios I think we should consider for the CommitMyBeaconPrivateKey:

  1. A random beacon key was previously committed (i.e. we are in the flow.RandomBeaconKeyCommitted state). If there is a call to CommitMyBeaconPrivateKey with an flow.EpochCommit that is inconsistent with the committed key, we have a clear failure condition.
  2. A random beacon key was previously committed (i.e. we are in the flow.RandomBeaconKeyCommitted state). The state machine receives a call to CommitMyBeaconPrivateKey with an flow.EpochCommit that is consistent with the committed key. Here, we could either return an exception and say that repeated calls are in principle not allowed. Alternatively, we can treat this call as confirming information that the state machine already has and ignore this call (as information is idempotent). The latter is my preference.

What makes me nervous is taking the response strategy for 2. and also applying it to 1., because we would be proceeding in a clear failure case. I think it would be comparatively easy to be more strict in this case.

@@ -344,9 +515,16 @@ func TestDKGState_RandomBeaconKeyCommittedState(t *testing.T) {
store, err := NewRecoverableRandomBeaconStateMachine(metrics, db)
require.NoError(t, err)

var evidence *flow.EpochCommit
var pkey crypto.PrivateKey
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var pkey crypto.PrivateKey
var privKey crypto.PrivateKey

Nitpick: avoiding just the "p" prefix (because public also starts with "p" 😄)

Comment on lines +522 to +527
pkey = unittest.StakingPrivKeyFixture()
evidence = unittest.EpochCommitFixture(func(commit *flow.EpochCommit) {
commit.Counter = epochCounter
commit.DKGParticipantKeys[0] = pkey.PublicKey()
})
err = store.UpsertMyBeaconPrivateKey(epochCounter, pkey, evidence)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pkey = unittest.StakingPrivKeyFixture()
evidence = unittest.EpochCommitFixture(func(commit *flow.EpochCommit) {
commit.Counter = epochCounter
commit.DKGParticipantKeys[0] = pkey.PublicKey()
})
err = store.UpsertMyBeaconPrivateKey(epochCounter, pkey, evidence)
privKey = unittest.StakingPrivKeyFixture()
evidence = unittest.EpochCommitFixture(func(commit *flow.EpochCommit) {
commit.Counter = epochCounter
commit.DKGParticipantKeys[0] = privKey.PublicKey()
})
err = store.UpsertMyBeaconPrivateKey(epochCounter, privKey, evidence)

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks for the detailed work and solid tests. My comments mostly fall into two categories:

  • how we represent the underlying logic in code - in particular, we want to make it as easy as possible for somebody reading the code to convince themself of correctness (and which parts can be modified / extended without breaking correctness)
  • clearly documenting under which conditions operations are idempotent and structuring the code accordingly for clarity.

Comment on lines +227 to +229
// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch.
// Effectively, this method transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// Caller needs to supply the [flow.EpochCommit] which is an evidence that the key has been indeed included for the given epoch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch.
// Effectively, this method transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// Caller needs to supply the [flow.EpochCommit] which is an evidence that the key has been indeed included for the given epoch.
// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch. Effectively, this method
// transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// The caller needs to supply the [flow.EpochCommit] as evidence that the stored key is valid for the specified epoch. Repeated
// calls for the same epoch are accepted (idempotent operation), if and only if the provided EpochCommit confirms the already
// committed key.

Comment on lines +237 to +251
// if we are in committed state then there is nothing to do
if currentState == flow.RandomBeaconKeyCommitted {
return nil
}
key, err := ds.keyCache.Get(epochCounter)(tx.DBTxn)
if err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted, "cannot transition without a valid random beacon key: %w", err)
}

// verify that the key is part of the EpochCommit
if err = ensureKeyIncludedInEpoch(epochCounter, key, commit); err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted,
"previously storred key has not been found in epoch commit event: %w", err)
}
return ds.processStateTransition(epochCounter, currentState, flow.RandomBeaconKeyCommitted)(tx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, I would suggest to always enforce that the stored random beacon key is consistent with the provided EpochCommit. Only the state transition itself should be omitted, if this is a repeated call. In a nutshell, the block if currentState == flow.RandomBeaconKeyCommitted would just move down.

Suggested change
// if we are in committed state then there is nothing to do
if currentState == flow.RandomBeaconKeyCommitted {
return nil
}
key, err := ds.keyCache.Get(epochCounter)(tx.DBTxn)
if err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted, "cannot transition without a valid random beacon key: %w", err)
}
// verify that the key is part of the EpochCommit
if err = ensureKeyIncludedInEpoch(epochCounter, key, commit); err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted,
"previously storred key has not been found in epoch commit event: %w", err)
}
return ds.processStateTransition(epochCounter, currentState, flow.RandomBeaconKeyCommitted)(tx)
// Repeated calls for same epoch are idempotent, but only if they consistently confirm the stored private key. We explicitly
// enforce consistency with the already committed key here. Repetitions are considered rare, so performance overhead is acceptable.
key, err := ds.keyCache.Get(epochCounter)(tx.DBTxn)
if err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted, "cannot transition without a valid random beacon key: %w", err)
}
// verify that the key is part of the EpochCommit
if err = ds.ensureKeyIncludedInEpoch(epochCounter, key, commit); err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted,
"according to EpochCommit event, my stored random beacon key is not valid for signing: %w", err)
}
// transition to RandomBeaconKeyCommitted, unless this is a repeated call, in which case there is nothing else to do
if currentState == flow.RandomBeaconKeyCommitted {
return nil
}
return ds.processStateTransition(epochCounter, currentState, flow.RandomBeaconKeyCommitted)(tx)

// verify that the key is part of the EpochCommit
if err = ensureKeyIncludedInEpoch(epochCounter, key, commit); err != nil {
return storage.NewInvalidDKGStateTransitionErrorf(currentState, flow.RandomBeaconKeyCommitted,
"previously storred key has not been found in epoch commit event: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"previously storred key has not been found in epoch commit event: %w", err)
"according to EpochCommit event, the input random beacon key is not valid for signing: %w", err)

err := operation.RetrieveDKGStateForEpoch(epochCounter, &currentState)(txn)
if err != nil && !errors.Is(err, storage.ErrNotFound) {
return currentState, fmt.Errorf("could not retrieve current state for epoch %d: %w", epochCounter, err)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Comment on lines +275 to +277
// if we are in committed state, we cannot overwrite the key, but we can ignore this input iff the provided key is the same
if currentState == flow.RandomBeaconKeyCommitted {
// check if the stored key is equal to the provided key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// if we are in committed state, we cannot overwrite the key, but we can ignore this input iff the provided key is the same
if currentState == flow.RandomBeaconKeyCommitted {
// check if the stored key is equal to the provided key
// Repeated calls for same epoch are idempotent, but only if they consistently confirm the stored private key. We explicitly
// enforce consistency with the already committed key here. Repetitions are considered rare, so performance overhead is acceptable.
if currentState == flow.RandomBeaconKeyCommitted {

Comment on lines +76 to +78
// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch.
// Effectively, this method transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// Caller needs to supply the [flow.EpochCommit] which is an evidence that the key has been indeed included for the given epoch.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch.
// Effectively, this method transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// Caller needs to supply the [flow.EpochCommit] which is an evidence that the key has been indeed included for the given epoch.
// CommitMyBeaconPrivateKey commits the previously inserted random beacon private key for an epoch. Effectively, this method
// transitions the state machine into the [flow.RandomBeaconKeyCommitted] state if the current state is [flow.DKGStateCompleted].
// The caller needs to supply the [flow.EpochCommit] as evidence that the stored key is valid for the specified epoch. Repeated
// calls for the same epoch are accepted (idempotent operation),if and only if the provided EpochCommit confirms the already
// committed key.

Comment on lines 95 to 96
// from Epoch Fallback Mode. State transitions are allowed if and only if the current state is not equal to
// [flow.RandomBeaconKeyCommitted]. The resulting state of this method call is [flow.RandomBeaconKeyCommitted].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// from Epoch Fallback Mode. State transitions are allowed if and only if the current state is not equal to
// [flow.RandomBeaconKeyCommitted]. The resulting state of this method call is [flow.RandomBeaconKeyCommitted].
// from Epoch Fallback Mode. The resulting state of this method call is [flow.RandomBeaconKeyCommitted].
// State transitions are allowed if and only if the current state is not equal to [flow.RandomBeaconKeyCommitted].
// Repeated calls for the same epoch are idempotent, if and only if the provided EpochCommit confirms the already
// committed key (error otherwise).

Comment on lines 256 to 257
// from Epoch Fallback Mode. State transitions are allowed if and only if the current state is not equal to
// [flow.RandomBeaconKeyCommitted]. The resulting state of this method call is [flow.RandomBeaconKeyCommitted].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// from Epoch Fallback Mode. State transitions are allowed if and only if the current state is not equal to
// [flow.RandomBeaconKeyCommitted]. The resulting state of this method call is [flow.RandomBeaconKeyCommitted].
// from Epoch Fallback Mode. The resulting state of this method call is [flow.RandomBeaconKeyCommitted].
// State transitions are allowed if and only if the current state is not equal to [flow.RandomBeaconKeyCommitted].
// Repeated calls for the same epoch are idempotent, if and only if the provided EpochCommit confirms the already
// committed key (error otherwise).

@@ -316,7 +316,7 @@ func main() {
return fmt.Errorf("could not load beacon key file: %w", err)
}

rootEpoch := node.State.AtBlockID(node.FinalizedRootBlock.ID()).Epochs().Current()
rootEpoch := rootSnapshot.Epochs().Current()
epochCounter, err := rootEpoch.Counter()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, could you rename this variable

Suggested change
epochCounter, err := rootEpoch.Counter()
rootEpochCounter, err := rootEpoch.Counter()

@@ -346,7 +346,11 @@ func main() {
// perform this only if state machine is in initial state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would suggest documenting the logical block of lines 341 to 357 as a whole. Something like

// store my beacon key for the first epoch post-spork (only if we haven't run this logic before, i.e. state machine is in initial state)
started, err := myBeaconKeyStateMachine.IsDKGStarted(epochCounter)
if err != nil {
  return fmt.Errorf("could not get DKG started flag for root epoch %d: %w", epochCounter, err)
}
if !started {
  epochProtocolState, err := rootSnapshot.EpochProtocolState()
  ⋮

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants