-
-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: use binary diff to persist finalized states #7005
base: feature/differential-archive
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/differential-archive #7005 +/- ##
================================================================
- Coverage 50.91% 50.90% -0.01%
================================================================
Files 594 594
Lines 39609 39602 -7
Branches 2245 2254 +9
================================================================
- Hits 20165 20160 -5
+ Misses 19444 19442 -2 |
Performance Report✔️ no performance regression detected Full benchmark results
|
packages/beacon-node/src/chain/historicalState/historicalState.ts
Outdated
Show resolved
Hide resolved
packages/beacon-node/src/chain/historicalState/historicalState.ts
Outdated
Show resolved
Hide resolved
packages/beacon-node/src/chain/historicalState/historicalState.ts
Outdated
Show resolved
Hide resolved
packages/beacon-node/src/chain/historicalState/historicalState.ts
Outdated
Show resolved
Hide resolved
docs/pages/contribution/advanced-topics/historical-state-regen.md
Outdated
Show resolved
Hide resolved
docs/pages/contribution/advanced-topics/historical-state-regen.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks great, I guess I need much more time to review through. Need to add much more comments on how the strategy works and the comment for classes/methods to make sure we have a maintainable code. I suggest to have this style of comment to make it easier to understand for everyone
* Persist states every some epochs to |
also need more insight for this work:
- how much disc space does it need for storing states with the current approach, compared to the default config of the new approach
- need benchmark results to come up with the default config
- the time to compute state diff and apply it, and the disc space needed
- need to download mainnet/holesky states for a good estimate
There is still some work left to optimize the diff size:
Afterwards we can have more realistic disk space estimation. |
could you add TODOs in the description? |
@nazarhussain
|
d3441b2
to
3c5ecef
Compare
} | ||
|
||
async storeHistoricalState(slot: number, stateBytes: Uint8Array): Promise<void> { | ||
return this.api.storeHistoricalState(slot, stateBytes, this.stateArchiveMode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should pass partialState
, balances
and whatever through worker thread boundary, instead of serializing on main thread and deserialize it on worker thread again because deserializing state takes a lot of memory and inefficient
state root and slot could be passed through worker thread boundary too
activeSlot: intermediateStateArchive.slot, | ||
activeStateSize: formatBytes(StateArchiveSSZType.serialize(activeStateArchive).byteLength), | ||
diffSlot: intermediateStateArchive.slot, | ||
diffStateSize: formatBytes(StateArchiveSSZType.serialize(intermediateStateArchive).byteLength), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not serialize just to have its length, same to above
there is value_serializedSize
api for it
|
||
export function stateBytesToStateArchive(stateBytes: Uint8Array, forkConfig: ChainForkConfig): StateArchive { | ||
const slot = getStateSlotFromBytes(stateBytes); | ||
return stateToStateArchive(forkConfig.getForkTypes(slot).BeaconState.deserialize(stateBytes), forkConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not performant to deserialize to state object here, if we have state ViewDU on main thread we should be able to compute partialState and balances there before passing through thread boundary
but if we have the full state bytes on the main thread we need to find a way to extract a full state bytes to partial state bytes and balances state bytes anyway. Maybe we can think about a ultility function to do that right on state bytes?
throw e; | ||
} | ||
blockCount++; | ||
if (Buffer.compare(state.hashTreeRoot(), block.message.stateRoot) !== 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do this inside the for loop for every block because if it happens there it will not reach here
|
||
return measure(metrics?.stateTransitionTime, async () => { | ||
let state = config.getForkTypes(toSlot).BeaconState.deserializeToViewDU(lastFullStateBytes); | ||
syncPubkeyCache(state, pubkey2index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sync pubkeys is expensive, we should not do it if really necessary
since we don't verify signatures, there's a chance we don't have to do it please double check
case HistoricalStateStorageType.Snapshot: { | ||
return measure(metrics?.loadSnapshotStateTime, async () => { | ||
const stateArchive = await db.hierarchicalStateArchiveRepository.get(slot); | ||
return stateArchive ? stateArchiveToStateBytes(stateArchive, config) : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
converting a Uint8Array to state archive object to full state bytes seems not efficient
we can think about an util to work on Uint8Array directly instead
it's tricky but if we have enough unit tests for different forks I think should be good
Motivation
Reduce the storage requirement and improve performance for the state regeneration for archival node.
Description
Steps to test or reproduce