-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
121 last quorum acked #93
Conversation
…is received When a `SnapshotMismatch` is received, the sending end should re-send all snapshot data from the beginning.
When `Engine::handle_install_full_snapshot()` is called and the provided snapshot is not up-to-date, the snapshot should not be installed, and the response should be sent back immediately. Previously, the method might delay the response unnecessarily, waiting for an installation process that would not proceed. This commit adjusts the logic so that if the snapshot is recognized as outdated, it immediately returns a `None` `Condition`, ensuring the caller is informed straightaway that no installation will occur.
- Remove `generic-snapshot-data`. The snapshot data trait is now consistently bound to `OptionalSend + 'static`. - Introduction of `RaftNetworkV2::full_snapshot()` for transmitting snapshots in one piece. - Retention of the original chunk-based snapshot transmission API in `RaftNetwork::install_snapshot()`. From this commit onwards: - To use the one-piece snapshot transmission, implement `RaftNetworkV2` when creating a `Raft` instance. - To continue using the chunk-based transmission, implement `RaftNetwork`. Compatibility: - `RaftNetworkV2` is automatically implemented for any `RaftNetwork` if `RaftTypeConfig::SnapshotData` implements `tokio::io::AsyncRead`, `tokio::io::AsyncWrite`, `tokio::io::AsyncSeek`, and `Unpin`. Upgrade tip: - Remove the `generic-snapshot-data` feature flag. - If you prefer using the original chunk-based method, no further action is required. Otherwise, to adopt the new one-piece snapshot transmission: - Replace `RaftNetwork` and `RaftNetworkFactory` with `RaftNetworkV2` and `RaftNetworkFactoryV2`.
…kv-memstore` example (databendlabs#1121) - Avoid race conditions in `raft-kv-memstore` example - Add Storage tests for `raft-kv-memstore` - Test: add snapshotting test to Suite
…arence order in doc page (databendlabs#1122)
`LogIOId` is utilized to uniquely identify an append-entries IO globally. It plays a crucial role in the log IO callback, informing `RaftCore` about the latest log that has been successfully flushed to disk. In this commit, `LogIOId` is defined as a tuple comprising `leader_id`, which submits the log IO, and `last_log_id`, which represents the final log ID in the append-entries IO operation. Given that the `leader_id` is monotonically increasing and a leader consistently submits log entries in a sequential order, this design ensures that `LogIOId` is also monotonically increasing. This feature allows for effective tracking of IO progress.
This commit addresses a critical issue where if a new leader does not flush the blank log to disk upon becoming established and then restarts immediately, there is a possibility that previously committed data becomes invisible to readers. Before the blank log is flushed, the leader (identified by vote `v3`) assumes it will be flushed and commits this log once (|quorum|-1) replication responses are received. If the blank log is lost and the server is restarted, data committed by a new leader (vote `v2`) may not be visible. This issue is addressed by utilizing `LeaderHandler::leader_append_entries()` instead of `ReplicationHandler::append_blank_log()`, where the former does not wait for the blank log to flush. Changes: - When assigning log IDs to log entries, the `Leading.last_log_id`, which represents the state of the log proposer (equivalent term in Paxos is Proposer), should be used instead of `RaftState.last_log_id`, which represents the state of the log receiver (equivalent term in Paxos is Acceptor). - Consequently, the method `assign_log_ids()` has been moved from `RaftState` to `Leading`. - Avoid manual implementation of duplicated logic: - During `initialize()`, reuse `FollowingHandler::do_append_entries()` to submit the very first log to storage. - In `establish_leader()`, reuse `LeaderHandler::leader_append_entries()` to submit log to storage and remove `ReplicationHandler::append_blank_log()`. - Remove `Command::AppendEntry`.
In this commit, `LeaderId` has been replaced with `CommittedLeaderId` for identifying log IO requests. This change is made because `CommittedLeaderId` is shorter and sufficiently identifies the necessary attributes since only an established leader (a committed leader) can write or replicate logs. Thus, using `CommittedLeaderId` streamlines the identification process for log IO requests.
Signed-off-by: Phoeniix Zhao <[email protected]>
Signed-off-by: Phoeniix Zhao <[email protected]>
This commit adds the `RaftLogReader::limited_get_log_entries()` method, which enables applications to fetch log entries that are equal to or smaller than a specified range. This functionality is particularly useful for customizing the size of AppendEntries requests at the storage API level. - Applications can now decide the number of log entries to return based on the input range. If the application determines that the requested log entries range is too large for a single RPC, it can opt to return only the first several requested log entries instead of the full range. - The method provides a default implementation that delegates the operation to `RaftLogReader::try_get_log_entries`. This enhancement allows for more flexible and efficient handling of log entries, particularly in scenarios where network constraints or performance considerations require smaller data transfers.
This commit updates `RaftNetworkV2` to treat any remote errors occurring during RPC, like a `Fatal` error, as an `Unreachable` error. This is due to Openraft's current inability to distinguish between an unreachable node and a broken node. Other changes: - **Helper trait `DecomposeResult`:** Introduced to simplify handling composite errors. It converts a result of the form `Result<R, ErrorAOrB>` into a nested result `Result<Result<R, ErrorA>, ErrorB>`. Upgrade tip: - Implementations of `RaftNetworkV2` or `RaftNetwork` should now convert remote errors such as `Raft::Fatal` to `Unreachable` errors to maintain consistency in error handling.
- **Election Update Delay:** - The `Voting` state is now updated only after `SaveVote` is successfully persisted on disk. This anticipates future modifications where `SaveVote` will be callback-based, ensuring state consistency. - **Deprecation of `VoteResponse.vote_granted`:** - Granting a `RequestVote` is now determined solely by whether `VoteResponse.vote` matches the candidate's vote. This simplifies the logic and reduces ambiguity in vote handling. - **Replication Stream Reorganization:** - Replication stream handles have been moved from `RaftCore::LeaderData` to `RaftCore`. This reorganization prepares for supporting `RequestVote` RPCs in the replication mechanism, although it currently continues to support only log and snapshot replication. - **Vote Comparison Enhancement:** - When filtering out stale messages (e.g., when a new Leader receives an `AppendEntries` response intended for a previous Leader), the responded vote is now compared with the vote in the `Leading` state. Previously, it was compared with `RaftCore.engine.raft_state.vote`, which might differ from the `Leading` state's vote in future scenarios. - **Election State Management:** - During election, `Engine.candidate` is `Some` and `Engine.leader` is `None`. Upon election completion, `Engine.candidate` is cleared, and a new `Leader` is assigned to `Engine.leader`. - Renamed `Engine.internal_server_state` to `leader`, as it now only stores a `Leader` instance. The candidate state is moved to `Engine.candidate`. - Moved the process of establishing a Leader to `EstablishHandler`. - A Leader can now be established in two ways: by election or by re-creating the Leader upon startup. - Replace `InternalServerState` with `Option<Box<Leader<...>>>` - **Logging Enhancements:** - Added `DisplaySliceExt` to create a displayable instance for slices of `impl Display`. - Added `DisplayInstant` to display `Instant` in a human-readable format, improving logging readability.
…` response This commit addresses an issue in the vote updating mechanism during the handling of `RequestVote` responses. Previously, encountering a higher `vote` in a response incorrectly led to an update of the local `state.vote`, which could break Raft consistency rules. **Issue Description:** - A higher `vote` seen in a `RequestVote` response does not necessarily mean that the vote is granted. The local `state.vote` should only be updated if the vote is actually granted, i.e., the responding node has a higher `vote` and a `last_log_id` that allows it to become a leader. **Resolution:** - The local `state.vote` will no longer be updated upon merely seeing a higher `vote` in the `RequestVote` response. Instead, this higher vote will be recorded in `last_seen_vote` for consideration in the next election cycle, without updating the current `state.vote`. This bug is introduced in `0.10`: 36d2c11
This commit introduces a new trait, `TypeConfigExt`, which extends `RaftTypeConfig`. The purpose of this trait is to simplify the access to various functionalities provided by the `RaftTypeConfig` trait, enhancing code readability and reducing complexity. **Methods Added to `TypeConfigExt`:** - `now()` - `sleep()` - `sleep_until()` - `timeout()` - `timeout_at()` - `oneshot()` - `spawn()` **Usage Improvement:** - Instead of using the `<<C as RaftTypeConfig>::AsyncRuntime as AsyncRuntime>::Instant::now()`, you can now simply call `C::now()`.
- `ReplicationSessionId<NID>` to `ReplicationSessionId<C>` - `LeaderQuorumSet<NID>` to `LeaderQuorumSet<C>` - `Condition<NID>` to `Condition<C>` - `ProgressEntry<NID>` to `ProgressEntry<C>`
- `StorageError<NID>` to `StorageError<C>` - `StorageIOError<NID>` to `StorageIOError<C>` - `ErrorSubject<NID>` to `ErrorSubject<C>` - `DefensiveError<NID>` to `DefensiveError<C>` - `Violation<NID>` to `Violation<C>` - `SnapshotSignature<NID>` to `SnapshotSignature<C>` - `ChangeMembers<NID, N>` to `ChangeMembers<C>` - `LogIdList<NID>` to `LogIdList<C>` Upgrade tip: Replace `StorageError<NID>` with `StorageError<C>`, such as: ```diff - async fn apply<I>(&mut self, entries: I) -> Result<Vec<Response>, StorageError<NodeId>> + async fn apply<I>(&mut self, entries: I) -> Result<Vec<Response>, StorageError<TypeConfig>> ``` Types to replace: - `StorageError<NID>` to `StorageError<C>` - `StorageIOError<NID>` to `StorageIOError<C>` - `ErrorSubject<NID>` to `ErrorSubject<C>` - `DefensiveError<NID>` to `DefensiveError<C>` - `Violation<NID>` to `Violation<C>` - `SnapshotSignature<NID>` to `SnapshotSignature<C>` - `ChangeMembers<NID, N>` to `ChangeMembers<C>` - `LogIdList<NID>` to `LogIdList<C>`
This commit introduces a new type, `CommittedVote`, which represents a committed vote. This enhancement is aimed at ensuring that only committed votes are considered valid for the leader's status. By using the `CommittedVote` type, we can eliminate several runtime assertion checks previously required to verify the commitment status of votes.
LogIdRange Inflight LogStateReader Accepted
When a replication response is handled, RaftCore should initiate next replication if there are more logs to to replicate. Before this commit this is done in response-handler. In this commit it is moved to `RaftCore::runtime_loop`: replications are re-initiated when all of the events are processed. This way the replication response handler is simplified.
This commit enhances the initialization process of a `Leader` instance by automating the setup of progress information and replication streams. Previously, these tasks were manually performed by the caller after the leader's creation, which could lead to inconsistencies or additional complexity.
* Refactor: Abstract the watch channel into a traits
Since this commit, `RaftCore` returns at once upon submitting ApendEntries IO request to `RaftLogStorage`, without waiting for the IO to be flushed to disk. When flushed, the result is responded to `RaftCore` via a `Notify` channel. This way `RaftCore` won't be blocked by AppendEntries IO operation: while entries being flushing to disk, `RaftCore` is still able to deal with other operations. Upgrade(non-breaking) tip: - Deprecated `LogFlushed`, use `IOFlushed` instead. - Deprecated `LogFlushed::log_io_completed()`, use `IOId::io_completed()` instead.
- `WatchSender` is added in: 1b4ec53
This commit refines our logging strategy by utilizing the `Display` trait instead of `Debug` for significant events. This change is aimed at producing logs that are easier to read and understand. Three kinds of significant events are logged at DEBUG level: - `input`: the `RaftMsg`s received by `RaftCore`, such as client-write or AppendEntries request from the Leader. - `cmd`: the `Command` outputted by `Engine` to execute by storage or network layer, such as `AppendInputEntries` or `ReplicateCommitted`. - `notify`: the `Notification`s received by `RaftCore` from storage or network. Example significant event logs: ``` RAFT_event id=0 cmd: Commit: seq: 5, (T1-N0.4, T1-N0.5] RAFT_event id=1 input: AppendEntries: vote=<T1-N0:Q>, prev_log_id=T1-N0.5, leader_commit=T1-N0.5, entries=[] RAFT_event id=0 notify: sm::Result(command_seq:5, Ok(ApplyResult([5, 6), last_applied=T1-N0.5, entries=[T1-N0.5]))) RAFT_event id=0 input: ClientWriteRequest ```
This commit addresses the premature sending of `Respond` messages in response to `AppendEntries` RPCs and similar requests. Previously, responses could be sent before the associated IO operations were confirmed as complete, potentially leading to inconsistencies. Changes: - The `Respond` now blocks until the corresponding IO operation is completed and properly recorded in `IOProgress`. - `RequestVote` and `AppendEntries` RPC handlers now include a `Condition` that remains unsatisfied until operations like `SaveVote` or `AppendInputEntries` are flushed to disk. - Add `IOProgress`: - **`accepted`:** Tracks the ID of the last IO operation that was accepted but not yet submitted. - **`submitted`:** The ID of the last IO operation submitted to either `RaftLogStorage` or `RaftStateMachine`. - **`flushed`:** The ID of the last IO operation successfully flushed to storage. - Remove `command_seq`: Instead of using `command_seq` to track the completion of state machine commands, this update uses `IOState.applied` and `IOState.snapshot`. This change accounts for the non-sequential execution of some SM commands, such as `BuildSnapshot`, which runs in a separate task. - Remove `IOState.vote`: The `IOState.vote` field has been removed. The system now utilizes `IOState.io_progress.flushed().voted_ref()` for tracking vote operations.
This commit refines the error handling within Openraft by eliminating obsolete error variants previously used in defensive assertions. The transition to using `validit::Valid` has made these variants redundant. - **Removal of `Violation` and `DefensiveError`:** These components are no longer necessary and have been removed from the codebase. - **Refactoring of `StorageError`:** The `enum StorageError { Defensive:.., IO:... }` has been replaced with `StorageIOError`. Consequently, `StorageError` is now a type alias of `StorageIOError`. - **Deletion of `RaftLogReaderExt::get_log_entries()`:** This method, which relied on the removed `DefensiveError`, has been removed. Upgrade tips: - Replace any usage of `StorageError::IO{ source: StorageIOError }` with the simplified `StorageError`. Stop using `StorageIOError`. - Substitute calls to `RaftLogReaderExt::get_log_entries()` with `RaftLogReader::try_get_log_entries()`.
Whne a `StorageError` occurs locally during replication, the `RaftCore` has to be shutdown at once. in this case, just send a generic notification `Notification::StorageError`. There is no need for `replication::Response` to contain `StorageError`.
`#[test_harness::test(harness = func)]` builds the harness for test function in a more structural way.
…move existing learners
The unaffected Learner won't be removed during membership change. - Discussion: databendlabs#1191
`RaftMetrics::last_quorum_acked` is the absolute timestamp of the most recent time point that is accepted by a quorum via `AppendEntries` RPC. This field is a wrapped `Instant` type: `SerdeInstant` which support serde for `Instant`. This field is added as a replacement of `millis_since_quorum_ack`, which is a relative time. `SerdeInstant` serialize `Instant` into a string formatted as "%Y-%m-%dT%H:%M:%S%.9f%z", e.g., "2024-07-24T04:07:32.567025000+0000". Note: Serialization and deserialization are not perfectly accurate and can be indeterministic, resulting in minor variations each time. These deviations(could be smaller or greater) are typically less than a microsecond (10^-6 seconds).
Important Review skippedMore than 25% of the files skipped due to max files limit. The review is being skipped to prevent a low-quality review. 201 files out of 300 files are above the max files limit of 50. Please upgrade to Pro plan to get higher limits. You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configuration File (
|
|
Checklist