Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Towards trace validation of real CCF nodes #5794

Closed
achamayou opened this issue Oct 27, 2023 · 6 comments
Closed

Towards trace validation of real CCF nodes #5794

achamayou opened this issue Oct 27, 2023 · 6 comments
Labels
tla TLA+ specifications

Comments

@achamayou
Copy link
Member

achamayou commented Oct 27, 2023

This is taking a step back from #5769, to document the existing gap between the raft driver, and actual nodes, with the goal of closing it.

As of af21407, using jq .msg -c to remove the less important fields such as timestamps etc.

A three node network startup, in the driver

{"committable_indices":[],"configurations":[],"function":"add_configuration","new_configuration":{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0},"state":{"commit_idx":0,"current_view":0,"last_idx":0,"leadership_state":"None","membership_state":"Active","new_view_idx":0,"node_id":"0"}}
{"committable_indices":[],"configurations":[],"function":"add_configuration","new_configuration":{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0},"state":{"commit_idx":0,"current_view":0,"last_idx":0,"leadership_state":"None","membership_state":"Active","new_view_idx":0,"node_id":"1"}}
{"committable_indices":[],"configurations":[],"function":"add_configuration","new_configuration":{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0},"state":{"commit_idx":0,"current_view":0,"last_idx":0,"leadership_state":"None","membership_state":"Active","new_view_idx":0,"node_id":"2"}}
{"committable_indices":[],"configurations":[{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0}],"function":"become_candidate","state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Candidate","membership_state":"Active","new_view_idx":0,"node_id":"0"}}
{"committable_indices":[],"function":"send_request_vote","packet":{"last_committable_idx":0,"msg":"raft_request_vote","term":1,"term_of_last_committable_idx":0},"state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Candidate","membership_state":"Active","new_view_idx":0,"node_id":"0"},"to_node_id":"2"}
{"committable_indices":[],"function":"send_request_vote","packet":{"last_committable_idx":0,"msg":"raft_request_vote","term":1,"term_of_last_committable_idx":0},"state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Candidate","membership_state":"Active","new_view_idx":0,"node_id":"0"},"to_node_id":"1"}
{"committable_indices":[],"from_node_id":"0","function":"recv_request_vote","packet":{"last_committable_idx":0,"msg":"raft_request_vote","term":1,"term_of_last_committable_idx":0},"state":{"commit_idx":0,"current_view":0,"last_idx":0,"leadership_state":"None","membership_state":"Active","new_view_idx":0,"node_id":"2"}}
{"committable_indices":[],"configurations":[{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0}],"function":"become_follower","state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Follower","membership_state":"Active","new_view_idx":0,"node_id":"2"}}
{"committable_indices":[],"from_node_id":"0","function":"recv_request_vote","packet":{"last_committable_idx":0,"msg":"raft_request_vote","term":1,"term_of_last_committable_idx":0},"state":{"commit_idx":0,"current_view":0,"last_idx":0,"leadership_state":"None","membership_state":"Active","new_view_idx":0,"node_id":"1"}}
{"committable_indices":[],"configurations":[{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0}],"function":"become_follower","state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Follower","membership_state":"Active","new_view_idx":0,"node_id":"1"}}
{"committable_indices":[],"from_node_id":"1","function":"recv_request_vote_response","packet":{"msg":"raft_request_vote_response","term":1,"vote_granted":true},"state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Candidate","membership_state":"Active","new_view_idx":0,"node_id":"0"}}
{"committable_indices":[],"configurations":[{"idx":0,"nodes":{"0":{"address":":"},"1":{"address":":"},"2":{"address":":"}},"rid":0}],"function":"become_leader","state":{"commit_idx":0,"current_view":1,"last_idx":0,"leadership_state":"Leader","membership_state":"Active","new_view_idx":0,"node_id":"0"}}

There are some discrepancies here with the way a CCF network starts, as outlined in the documentation.

  1. Driver nodes have an initial configuration each, containing all their ids. But CCF nodes only ever obtain a configuration through add_configuration, which is always a side-effect of a KV hook, either through replicate (when Leader) or recv_append_entries (when Backup).
  2. An election takes place over this initial configuration, pre-set in all nodes. But a CCF Start node will instead become leader without an election. Since it has no configuration, it wouldn't know who to ask for votes.
  3. The initial view is 0, but that value is used in CCF as a sentinel for no view.
@achamayou
Copy link
Member Author

For comparison, this is a trace extract from the schema test, a one-node end to end test:

{"committable_indices":[],"configurations":[],"function":"become_leader","state":{"commit_idx":0,"current_view":2,"last_idx":0,"leadership_state":"Leader","membership_state":"Active","new_view_idx":0,"node_id":"3489cba4157d54c52836c6db3c7fc4f8515d4d2dabc422304d5607ed24092d84"}}
{"committable_indices":[],"function":"replicate","globally_committable":false,"seqno":1,"state":{"commit_idx":0,"current_view":2,"last_idx":0,"leadership_state":"Leader","membership_state":"Active","new_view_idx":0,"node_id":"3489cba4157d54c52836c6db3c7fc4f8515d4d2dabc422304d5607ed24092d84"},"view":2}
{"committable_indices":[],"configurations":[],"function":"add_configuration","new_configuration":{"idx":1,"nodes":{"3489cba4157d54c52836c6db3c7fc4f8515d4d2dabc422304d5607ed24092d84":{"address":"127.187.23.188:44337"}},"rid":1},"state":{"commit_idx":0,"current_view":2,"last_idx":0,"leadership_state":"Leader","membership_state":"Active","new_view_idx":0,"node_id":"3489cba4157d54c52836c6db3c7fc4f8515d4d2dabc422304d5607ed24092d84"}}
{"committable_indices":[],"function":"replicate","globally_committable":true,"seqno":2,"state":{"commit_idx":0,"current_view":2,"last_idx":1,"leadership_state":"Leader","membership_state":"Active","new_view_idx":0,"node_id":"3489cba4157d54c52836c6db3c7fc4f8515d4d2dabc422304d5607ed24092d84"},"view":2}

@lemmy lemmy added this to the TLA+ Trace Validation milestone Oct 27, 2023
@lemmy lemmy added the tla TLA+ specifications label Oct 27, 2023
@achamayou
Copy link
Member Author

Note from discussion with @lemmy and @eddyashton:

We can simplify the trace spec if we log state before and after actions such as replicate, become_* etc. This is already done in a somewhat ad-hoc way with new_configuration in add_configuration for example, but could be generalised.

@lemmy
Copy link
Contributor

lemmy commented Oct 27, 2023

The JSON key in the "after" action should not be named state' because, during JSON deserialization, state' becomes the key of the TLA+ record.

@eddyashton
Copy link
Member

Summarising some more issues spotted in discussion with @achamayou:

  • committableIndices in the spec is incomplete, because we often log committable_indices in the implementation before we've modified it. It should contain all signatures after the commit_idx, but the spec invariant currently only says all entries are such values, not that this contains all such values.
  • AdvanceCommitIndex always picks the lowest next signature, but there are many cases where we advance commit past multiple signatures. Can create a non-matching scenario by duplicating an emit_signature line.

@achamayou
Copy link
Member Author

Second one has a repro in #5798

@achamayou
Copy link
Member Author

Superseded by #5822.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tla TLA+ specifications
Projects
None yet
Development

No branches or pull requests

3 participants