From d48c8f794358c46e281332cdf018771a78a10c40 Mon Sep 17 00:00:00 2001
From: Stan Bondi
Date: Mon, 9 Oct 2023 12:50:24 +0400
Subject: [PATCH] fix(consensus)!: use temporary status updates for blocks > locked block (#706)

Description
---
- Creates state updates per block per transaction as blocks are processed, to allow for forks.
- Changes the sync protocol to send many smaller messages, because substate data for full blocks exceeded the 4 MB RPC limit and prevented sync.
- Only locks substate objects when a block has been justified (i.e. is a new leaf block).

Motivation and Context
---
Up to two justified blocks can be forked out. We process blocks as they come in to determine which commands to propose in the next block, and that processing requires state changes, so a fork can result in invalid new proposals. Forks occur in non-malicious cases (e.g. stress tests) because high message volumes can prevent a node from both sending and receiving proposal messages before the leader timeout, though this is notably improved after #681 and #693. After a leader failure, new proposals with dummy block parents supersede previously processed blocks.

This PR tracks state changes above the locked block and uses the correct transaction phases/stages from the locked block and current leaf for new proposals.

Stress testing (up to 1000 transactions in a batch) largely went better, and the chain always continued. Some issues were encountered when a node fell behind during stress tests and switched to sync. Some nodes were left with already finalised transactions in their pool (TBD BUG) and would re-propose them when it was their turn to be leader. The block would not be voted on, and the resulting leader failure would allow the chain to continue. Fixing this bug will be a focus for subsequent PRs.

How Has This Been Tested?
---
Existing consensus tests, manually with 8 VNs and stress testing, and existing cucumber tests.

What process can a PR reviewer use to test or verify this change?
---
Run a multi-node network

Breaking Changes
---

- [ ] None
- [x] Requires data directory to be deleted
- [ ] Other - Please specify
---
 .../src/transaction_executor.rs | 2 +-
 .../src/handlers/rpc.rs | 6 +-
 .../tari_signaling_server/src/data.rs | 10 +-
 .../tari_validator_node/src/bootstrap.rs | 42 +-
 .../src/comms/deserialize.rs | 27 +-
 .../src/consensus/handle.rs | 33 +
 .../tari_validator_node/src/consensus/mod.rs | 36 +-
 .../tari_validator_node/src/dan_node.rs | 2 +-
 .../src/event_subscription.rs | 1 +
 .../src/p2p/rpc/sync_task.rs | 58 +-
 .../src/p2p/services/mempool/handle.rs | 12 -
 .../src/p2p/services/mempool/initializer.rs | 5 +
 .../src/p2p/services/mempool/service.rs | 38 +-
 dan_layer/common_types/src/shard_id.rs | 2 +-
 dan_layer/comms_rpc_state_sync/src/manager.rs | 97 ++-
 dan_layer/consensus/src/hotstuff/common.rs | 6 +-
 dan_layer/consensus/src/hotstuff/error.rs | 9 +
 .../consensus/src/hotstuff/on_force_beat.rs | 2 +-
 .../src/hotstuff/on_next_sync_view.rs | 4 +-
 .../consensus/src/hotstuff/on_propose.rs | 10 +-
 .../src/hotstuff/on_receive_new_view.rs | 72 +-
 .../src/hotstuff/on_receive_proposal.rs | 703 +++++++++++-------
 .../on_receive_requested_transactions.rs | 1 +
 .../consensus/src/hotstuff/on_receive_vote.rs | 2 +-
 dan_layer/consensus/src/hotstuff/pacemaker.rs | 2 +-
 .../src/hotstuff/state_machine/mod.rs | 1 +
 .../src/hotstuff/state_machine/state.rs | 30 +
 .../src/hotstuff/state_machine/worker.rs | 13 +-
 dan_layer/consensus/src/hotstuff/worker.rs | 13 +-
 dan_layer/consensus_tests/src/consensus.rs | 22 +-
 .../src/support/epoch_manager.rs | 7 +-
 .../consensus_tests/src/support/harness.rs | 54 +-
 .../src/support/validator/builder.rs | 4 +-
 dan_layer/engine/src/transaction/processor.rs | 11 +-
 dan_layer/engine_types/src/commit_result.rs | 13 +-
 dan_layer/state_store_sqlite/Cargo.toml | 2 +-
 .../up.sql | 67 +-
 dan_layer/state_store_sqlite/src/reader.rs | 324 +++++++-
 dan_layer/state_store_sqlite/src/schema.rs | 30 +-
 .../src/sql_models/transaction_pool.rs | 70 +-
 dan_layer/state_store_sqlite/src/store.rs | 10 +
 dan_layer/state_store_sqlite/src/writer.rs | 213 ++++--
 dan_layer/state_store_sqlite/tests/tests.rs | 126 +++-
 .../storage/src/consensus_models/block.rs | 36 +-
 .../storage/src/consensus_models/command.rs | 25 +-
 .../consensus_models/executed_transaction.rs | 2 +-
 .../storage/src/consensus_models/high_qc.rs | 16 +-
 .../src/consensus_models/last_proposed.rs | 6 +
 .../src/consensus_models/leaf_block.rs | 2 +-
 .../src/consensus_models/locked_output.rs | 8 +
 dan_layer/storage/src/consensus_models/mod.rs | 2 +
 .../storage/src/consensus_models/substate.rs | 8 +
 .../src/consensus_models/transaction.rs | 29 +-
 .../consensus_models/transaction_decision.rs | 7 +
 .../src/consensus_models/transaction_pool.rs | 190 +++--
 .../transaction_pool_status_update.rs | 48 ++
 dan_layer/storage/src/state_store/mod.rs | 35 +-
 dan_layer/transaction/src/transaction_id.rs | 4 +
 dan_layer/validator_node_rpc/proto/rpc.proto | 18 +-
 .../validator_node_rpc/src/block_sync.rs | 48 ++
 dan_layer/validator_node_rpc/src/lib.rs | 2 +
 61 files changed, 2015 insertions(+), 663 deletions(-)
 create mode 100644 applications/tari_validator_node/src/consensus/handle.rs
 create mode 100644 dan_layer/storage/src/consensus_models/transaction_pool_status_update.rs
 create mode 100644 dan_layer/validator_node_rpc/src/block_sync.rs

diff --git a/applications/tari_dan_app_utilities/src/transaction_executor.rs b/applications/tari_dan_app_utilities/src/transaction_executor.rs
index 5d58109f4..0d331ec30 100644
--- a/applications/tari_dan_app_utilities/src/transaction_executor.rs
+++ b/applications/tari_dan_app_utilities/src/transaction_executor.rs
@@ -80,7 +80,7 @@ where TTemplateProvider: TemplateProvider