121 last quorum acked #93

drmingdrmer · 2024-07-24T08:45:49Z

Checklist

Updated guide with pertinent info (may not always apply).
Squash down commits to one or two logical commits which clearly describe the work you've done.
Unittest is a friend:)

…is received When a `SnapshotMismatch` is received, the sending end should re-send all snapshot data from the beginning.

When `Engine::handle_install_full_snapshot()` is called and the provided snapshot is not up-to-date, the snapshot should not be installed, and the response should be sent back immediately. Previously, the method might delay the response unnecessarily, waiting for an installation process that would not proceed. This commit adjusts the logic so that if the snapshot is recognized as outdated, it immediately returns a `None` `Condition`, ensuring the caller is informed straightaway that no installation will occur.

- Remove `generic-snapshot-data`. The snapshot data trait is now consistently bound to `OptionalSend + 'static`. - Introduction of `RaftNetworkV2::full_snapshot()` for transmitting snapshots in one piece. - Retention of the original chunk-based snapshot transmission API in `RaftNetwork::install_snapshot()`. From this commit onwards: - To use the one-piece snapshot transmission, implement `RaftNetworkV2` when creating a `Raft` instance. - To continue using the chunk-based transmission, implement `RaftNetwork`. Compatibility: - `RaftNetworkV2` is automatically implemented for any `RaftNetwork` if `RaftTypeConfig::SnapshotData` implements `tokio::io::AsyncRead`, `tokio::io::AsyncWrite`, `tokio::io::AsyncSeek`, and `Unpin`. Upgrade tip: - Remove the `generic-snapshot-data` feature flag. - If you prefer using the original chunk-based method, no further action is required. Otherwise, to adopt the new one-piece snapshot transmission: - Replace `RaftNetwork` and `RaftNetworkFactory` with `RaftNetworkV2` and `RaftNetworkFactoryV2`.

…kv-memstore` example (databendlabs#1121) - Avoid race conditions in `raft-kv-memstore` example - Add Storage tests for `raft-kv-memstore` - Test: add snapshotting test to Suite

…arence order in doc page (databendlabs#1122)

…abendlabs#1120)

`LogIOId` is utilized to uniquely identify an append-entries IO globally. It plays a crucial role in the log IO callback, informing `RaftCore` about the latest log that has been successfully flushed to disk. In this commit, `LogIOId` is defined as a tuple comprising `leader_id`, which submits the log IO, and `last_log_id`, which represents the final log ID in the append-entries IO operation. Given that the `leader_id` is monotonically increasing and a leader consistently submits log entries in a sequential order, this design ensures that `LogIOId` is also monotonically increasing. This feature allows for effective tracking of IO progress.

This commit addresses a critical issue where if a new leader does not flush the blank log to disk upon becoming established and then restarts immediately, there is a possibility that previously committed data becomes invisible to readers. Before the blank log is flushed, the leader (identified by vote `v3`) assumes it will be flushed and commits this log once (|quorum|-1) replication responses are received. If the blank log is lost and the server is restarted, data committed by a new leader (vote `v2`) may not be visible. This issue is addressed by utilizing `LeaderHandler::leader_append_entries()` instead of `ReplicationHandler::append_blank_log()`, where the former does not wait for the blank log to flush. Changes: - When assigning log IDs to log entries, the `Leading.last_log_id`, which represents the state of the log proposer (equivalent term in Paxos is Proposer), should be used instead of `RaftState.last_log_id`, which represents the state of the log receiver (equivalent term in Paxos is Acceptor). - Consequently, the method `assign_log_ids()` has been moved from `RaftState` to `Leading`. - Avoid manual implementation of duplicated logic: - During `initialize()`, reuse `FollowingHandler::do_append_entries()` to submit the very first log to storage. - In `establish_leader()`, reuse `LeaderHandler::leader_append_entries()` to submit log to storage and remove `ReplicationHandler::append_blank_log()`. - Remove `Command::AppendEntry`.

In this commit, `LeaderId` has been replaced with `CommittedLeaderId` for identifying log IO requests. This change is made because `CommittedLeaderId` is shorter and sufficiently identifies the necessary attributes since only an established leader (a committed leader) can write or replicate logs. Thus, using `CommittedLeaderId` streamlines the identification process for log IO requests.

Signed-off-by: Phoeniix Zhao <[email protected]>

This commit adds the `RaftLogReader::limited_get_log_entries()` method, which enables applications to fetch log entries that are equal to or smaller than a specified range. This functionality is particularly useful for customizing the size of AppendEntries requests at the storage API level. - Applications can now decide the number of log entries to return based on the input range. If the application determines that the requested log entries range is too large for a single RPC, it can opt to return only the first several requested log entries instead of the full range. - The method provides a default implementation that delegates the operation to `RaftLogReader::try_get_log_entries`. This enhancement allows for more flexible and efficient handling of log entries, particularly in scenarios where network constraints or performance considerations require smaller data transfers.

This commit updates `RaftNetworkV2` to treat any remote errors occurring during RPC, like a `Fatal` error, as an `Unreachable` error. This is due to Openraft's current inability to distinguish between an unreachable node and a broken node. Other changes: - **Helper trait `DecomposeResult`:** Introduced to simplify handling composite errors. It converts a result of the form `Result<R, ErrorAOrB>` into a nested result `Result<Result<R, ErrorA>, ErrorB>`. Upgrade tip: - Implementations of `RaftNetworkV2` or `RaftNetwork` should now convert remote errors such as `Raft::Fatal` to `Unreachable` errors to maintain consistency in error handling.

- **Election Update Delay:** - The `Voting` state is now updated only after `SaveVote` is successfully persisted on disk. This anticipates future modifications where `SaveVote` will be callback-based, ensuring state consistency. - **Deprecation of `VoteResponse.vote_granted`:** - Granting a `RequestVote` is now determined solely by whether `VoteResponse.vote` matches the candidate's vote. This simplifies the logic and reduces ambiguity in vote handling. - **Replication Stream Reorganization:** - Replication stream handles have been moved from `RaftCore::LeaderData` to `RaftCore`. This reorganization prepares for supporting `RequestVote` RPCs in the replication mechanism, although it currently continues to support only log and snapshot replication. - **Vote Comparison Enhancement:** - When filtering out stale messages (e.g., when a new Leader receives an `AppendEntries` response intended for a previous Leader), the responded vote is now compared with the vote in the `Leading` state. Previously, it was compared with `RaftCore.engine.raft_state.vote`, which might differ from the `Leading` state's vote in future scenarios. - **Election State Management:** - During election, `Engine.candidate` is `Some` and `Engine.leader` is `None`. Upon election completion, `Engine.candidate` is cleared, and a new `Leader` is assigned to `Engine.leader`. - Renamed `Engine.internal_server_state` to `leader`, as it now only stores a `Leader` instance. The candidate state is moved to `Engine.candidate`. - Moved the process of establishing a Leader to `EstablishHandler`. - A Leader can now be established in two ways: by election or by re-creating the Leader upon startup. - Replace `InternalServerState` with `Option<Box<Leader<...>>>` - **Logging Enhancements:** - Added `DisplaySliceExt` to create a displayable instance for slices of `impl Display`. - Added `DisplayInstant` to display `Instant` in a human-readable format, improving logging readability.

…` response This commit addresses an issue in the vote updating mechanism during the handling of `RequestVote` responses. Previously, encountering a higher `vote` in a response incorrectly led to an update of the local `state.vote`, which could break Raft consistency rules. **Issue Description:** - A higher `vote` seen in a `RequestVote` response does not necessarily mean that the vote is granted. The local `state.vote` should only be updated if the vote is actually granted, i.e., the responding node has a higher `vote` and a `last_log_id` that allows it to become a leader. **Resolution:** - The local `state.vote` will no longer be updated upon merely seeing a higher `vote` in the `RequestVote` response. Instead, this higher vote will be recorded in `last_seen_vote` for consideration in the next election cycle, without updating the current `state.vote`. This bug is introduced in `0.10`: 36d2c11

This commit introduces a new trait, `TypeConfigExt`, which extends `RaftTypeConfig`. The purpose of this trait is to simplify the access to various functionalities provided by the `RaftTypeConfig` trait, enhancing code readability and reducing complexity. **Methods Added to `TypeConfigExt`:** - `now()` - `sleep()` - `sleep_until()` - `timeout()` - `timeout_at()` - `oneshot()` - `spawn()` **Usage Improvement:** - Instead of using the `<<C as RaftTypeConfig>::AsyncRuntime as AsyncRuntime>::Instant::now()`, you can now simply call `C::now()`.

- `ReplicationSessionId<NID>` to `ReplicationSessionId<C>` - `LeaderQuorumSet<NID>` to `LeaderQuorumSet<C>` - `Condition<NID>` to `Condition<C>` - `ProgressEntry<NID>` to `ProgressEntry<C>`

- `StorageError<NID>` to `StorageError<C>` - `StorageIOError<NID>` to `StorageIOError<C>` - `ErrorSubject<NID>` to `ErrorSubject<C>` - `DefensiveError<NID>` to `DefensiveError<C>` - `Violation<NID>` to `Violation<C>` - `SnapshotSignature<NID>` to `SnapshotSignature<C>` - `ChangeMembers<NID, N>` to `ChangeMembers<C>` - `LogIdList<NID>` to `LogIdList<C>` Upgrade tip: Replace `StorageError<NID>` with `StorageError<C>`, such as: ```diff - async fn apply<I>(&mut self, entries: I) -> Result<Vec<Response>, StorageError<NodeId>> + async fn apply<I>(&mut self, entries: I) -> Result<Vec<Response>, StorageError<TypeConfig>> ``` Types to replace: - `StorageError<NID>` to `StorageError<C>` - `StorageIOError<NID>` to `StorageIOError<C>` - `ErrorSubject<NID>` to `ErrorSubject<C>` - `DefensiveError<NID>` to `DefensiveError<C>` - `Violation<NID>` to `Violation<C>` - `SnapshotSignature<NID>` to `SnapshotSignature<C>` - `ChangeMembers<NID, N>` to `ChangeMembers<C>` - `LogIdList<NID>` to `LogIdList<C>`

This commit introduces a new type, `CommittedVote`, which represents a committed vote. This enhancement is aimed at ensuring that only committed votes are considered valid for the leader's status. By using the `CommittedVote` type, we can eliminate several runtime assertion checks previously required to verify the commitment status of votes.

LogIdRange Inflight LogStateReader Accepted

- Part of: databendlabs#1013

When a replication response is handled, RaftCore should initiate next replication if there are more logs to to replicate. Before this commit this is done in response-handler. In this commit it is moved to `RaftCore::runtime_loop`: replications are re-initiated when all of the events are processed. This way the replication response handler is simplified.

This commit enhances the initialization process of a `Leader` instance by automating the setup of progress information and replication streams. Previously, these tasks were manually performed by the caller after the leader's creation, which could lead to inconsistencies or additional complexity.

* Refactor: Abstract the watch channel into a traits

Since this commit, `RaftCore` returns at once upon submitting ApendEntries IO request to `RaftLogStorage`, without waiting for the IO to be flushed to disk. When flushed, the result is responded to `RaftCore` via a `Notify` channel. This way `RaftCore` won't be blocked by AppendEntries IO operation: while entries being flushing to disk, `RaftCore` is still able to deal with other operations. Upgrade(non-breaking) tip: - Deprecated `LogFlushed`, use `IOFlushed` instead. - Deprecated `LogFlushed::log_io_completed()`, use `IOId::io_completed()` instead.

- `WatchSender` is added in: 1b4ec53

This commit refines our logging strategy by utilizing the `Display` trait instead of `Debug` for significant events. This change is aimed at producing logs that are easier to read and understand. Three kinds of significant events are logged at DEBUG level: - `input`: the `RaftMsg`s received by `RaftCore`, such as client-write or AppendEntries request from the Leader. - `cmd`: the `Command` outputted by `Engine` to execute by storage or network layer, such as `AppendInputEntries` or `ReplicateCommitted`. - `notify`: the `Notification`s received by `RaftCore` from storage or network. Example significant event logs: ``` RAFT_event id=0 cmd: Commit: seq: 5, (T1-N0.4, T1-N0.5] RAFT_event id=1 input: AppendEntries: vote=<T1-N0:Q>, prev_log_id=T1-N0.5, leader_commit=T1-N0.5, entries=[] RAFT_event id=0 notify: sm::Result(command_seq:5, Ok(ApplyResult([5, 6), last_applied=T1-N0.5, entries=[T1-N0.5]))) RAFT_event id=0 input: ClientWriteRequest ```

This commit addresses the premature sending of `Respond` messages in response to `AppendEntries` RPCs and similar requests. Previously, responses could be sent before the associated IO operations were confirmed as complete, potentially leading to inconsistencies. Changes: - The `Respond` now blocks until the corresponding IO operation is completed and properly recorded in `IOProgress`. - `RequestVote` and `AppendEntries` RPC handlers now include a `Condition` that remains unsatisfied until operations like `SaveVote` or `AppendInputEntries` are flushed to disk. - Add `IOProgress`: - **`accepted`:** Tracks the ID of the last IO operation that was accepted but not yet submitted. - **`submitted`:** The ID of the last IO operation submitted to either `RaftLogStorage` or `RaftStateMachine`. - **`flushed`:** The ID of the last IO operation successfully flushed to storage. - Remove `command_seq`: Instead of using `command_seq` to track the completion of state machine commands, this update uses `IOState.applied` and `IOState.snapshot`. This change accounts for the non-sequential execution of some SM commands, such as `BuildSnapshot`, which runs in a separate task. - Remove `IOState.vote`: The `IOState.vote` field has been removed. The system now utilizes `IOState.io_progress.flushed().voted_ref()` for tracking vote operations.

…d.md - Fix: databendlabs#1168

- Fix: databendlabs#1175

- Fix: databendlabs#124

This commit refines the error handling within Openraft by eliminating obsolete error variants previously used in defensive assertions. The transition to using `validit::Valid` has made these variants redundant. - **Removal of `Violation` and `DefensiveError`:** These components are no longer necessary and have been removed from the codebase. - **Refactoring of `StorageError`:** The `enum StorageError { Defensive:.., IO:... }` has been replaced with `StorageIOError`. Consequently, `StorageError` is now a type alias of `StorageIOError`. - **Deletion of `RaftLogReaderExt::get_log_entries()`:** This method, which relied on the removed `DefensiveError`, has been removed. Upgrade tips: - Replace any usage of `StorageError::IO{ source: StorageIOError }` with the simplified `StorageError`. Stop using `StorageIOError`. - Substitute calls to `RaftLogReaderExt::get_log_entries()` with `RaftLogReader::try_get_log_entries()`.

Whne a `StorageError` occurs locally during replication, the `RaftCore` has to be shutdown at once. in this case, just send a generic notification `Notification::StorageError`. There is no need for `replication::Response` to contain `StorageError`.

`#[test_harness::test(harness = func)]` builds the harness for test function in a more structural way.

…move existing learners

The unaffected Learner won't be removed during membership change. - Discussion: databendlabs#1191

…, LogId)`

`RaftMetrics::last_quorum_acked` is the absolute timestamp of the most recent time point that is accepted by a quorum via `AppendEntries` RPC. This field is a wrapped `Instant` type: `SerdeInstant` which support serde for `Instant`. This field is added as a replacement of `millis_since_quorum_ack`, which is a relative time. `SerdeInstant` serialize `Instant` into a string formatted as "%Y-%m-%dT%H:%M:%S%.9f%z", e.g., "2024-07-24T04:07:32.567025000+0000". Note: Serialization and deserialization are not perfectly accurate and can be indeterministic, resulting in minor variations each time. These deviations(could be smaller or greater) are typically less than a microsecond (10^-6 seconds).

coderabbitai · 2024-07-24T08:46:34Z

Important

Review skipped

More than 25% of the files skipped due to max files limit. The review is being skipped to prevent a low-quality review.

201 files out of 300 files are above the max files limit of 50. Please upgrade to Pro plan to get higher limits.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

mergify · 2024-07-24T08:48:51Z

⚠️ The sha of the head commit of this PR conflicts with #94. Mergify cannot evaluate rules on this PR. ⚠️

drmingdrmer and others added 30 commits May 1, 2024 10:20

Chore: fix typos

55cd84c

Fix: Chunked should reset offset 0 when a SnapshotMismatch error …

946dc3f

…is received When a `SnapshotMismatch` is received, the sending end should re-send all snapshot data from the beginning.

Test: Add snapshotting test to Suite; Avoid race conditions in `raft-…

8245f84

…kv-memstore` example (databendlabs#1121) - Avoid race conditions in `raft-kv-memstore` example - Add Storage tests for `raft-kv-memstore` - Test: add snapshotting test to Suite

Doc: Add link to Raft::trigger() for doc of Trigger, disable appe…

9e786de

…arence order in doc page (databendlabs#1122)

Improve: cancel passed to full_snapshot() should be static (dat…

6e178c4

…abendlabs#1120)

Doc: Add docs::components::state_machine

94600eb

Fixdoc: RemoveNodes -> add_leaner in dynamic-membership

9320022

Chore: adjust the comment of VecProgress::update_with method

77349f5

Signed-off-by: Phoeniix Zhao <[email protected]>

Chore: add initialize multi nodes cluster for raft-kv-memstore

39980cb

Signed-off-by: Phoeniix Zhao <[email protected]>

Doc: ensure connection to the correct node

d752e92

Chore: extend sleep time for rafk-kv-rocksdb example

2f1d57f

Refactor: remove some duplicated code in append_memvership_test.rs

86a338f

Chore: remove unused field Engine.snapshot_streaming

6cbfdc6

Chore: disable inline attribute

a977378

Chore: add since to new methods

7cef53a

Refactor: use RaftTypeConfig to replace NID

c6004cd

- `ReplicationSessionId<NID>` to `ReplicationSessionId<C>` - `LeaderQuorumSet<NID>` to `LeaderQuorumSet<C>` - `Condition<NID>` to `Condition<C>` - `ProgressEntry<NID>` to `ProgressEntry<C>`

Refactor: use RaftTypeConfig to replace NID

e486124

LogIdRange Inflight LogStateReader Accepted

Refactor: rename Command::DeleteConflictLog to TruncateLog

52f83c8

drmingdrmer and others added 27 commits July 8, 2024 14:59

Refactor: move async-runtime to dir type_config/async_runtime/

b51cb8d

Feature: AsyncRuntime::MpscUnbounded to abstract mpsc channels

f1792dc

- Part of: databendlabs#1013

Dep: Require tokio 1.22 for WeakUnboundedSender

25d9c9d

Refactor: Abstract the watch channel into a traits (databendlabs#1167)

1b4ec53

* Refactor: Abstract the watch channel into a traits

Fixdoc: cluster benchmark README.md

593b1f6

Change: Openraft does not require Clone for WatchSender

78a1d48

- `WatchSender` is added in: 1b4ec53

Refactor: rename notify to notification

3ae6b4b

Doc: explain default types used by RaftTypeConfig in getting-starte…

d9580ea

…d.md - Fix: databendlabs#1168

Doc: in FAQ: add: RaftServerMetrics to subscribe server state change

0117ce5

- Fix: databendlabs#1175

Doc: add algorithm AppendEntries uses to local last matching log id

c323974

- Fix: databendlabs#124

Feature: store heartbeat metric in RaftMetrics and RaftDataMetrics

d857a85

Test: allow deprecated std::panic::PanicInfo in example tests

fff880a

Chore: mark StorageIOError as deprecated

55f4e5a

Refactor: use test_harness to build unittest harness

4318f4e

`#[test_harness::test(harness = func)]` builds the harness for test function in a more structural way.

Doc: mention is_initialized() in initialize()'s doc

bd99077

Chore: add test to ensure change_membership(.., retain=false) wont re…

4bd821d

…move existing learners

Doc: fix description of retain in change_membership()

9ec4ae1

The unaffected Learner won't be removed during membership change. - Discussion: databendlabs#1191

Doc: Add example explaining why LogIOId must be `(CommittedLeaderId…

ab96d1a

…, LogId)`

Refactor: remove unused mod timer

ec284cb

drmingdrmer closed this Jul 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

121 last quorum acked #93

121 last quorum acked #93

drmingdrmer commented Jul 24, 2024

coderabbitai bot commented Jul 24, 2024

Review skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

mergify bot commented Jul 24, 2024

121 last quorum acked #93

121 last quorum acked #93

Conversation

drmingdrmer commented Jul 24, 2024

coderabbitai bot commented Jul 24, 2024

Review skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

mergify bot commented Jul 24, 2024

CodeRabbit Configuration File (`.coderabbit.yaml`)