Paxos is a distributed consensus algorithm with a rather nice correctness proof of its safety property: the decisions made by a Paxos cluster will always be consistent even in the face of arbitrary network partitions or delays (although if the network goes down it will stop making progress).
The file Paxos.thy contains an Isabelle-checked proof of the safety property of Paxos and related theorems. This document is a less formal commentary on the proofs.
First we prove a couple of facts about quorums as defined by weighted majority: if the weighting function is integer then one of its values can be changed by one unit and every quorum of the new function intersects every quorum of the old function, and the weights can all be multiplied or divided by constants without changing the quorum structure.
This will be useful later when looking at managing the topology of a Paxos cluster using consensus within the cluster, as if the topology changes sufficiently slowly then this can be achieved without having to re-run the whole algorithm at each change.
Next we define the notion of a sequence of topologies, derived from a sequence of values as agreed by a Paxos cluster. Each topology must consist of pairwise-intersecting, finite, nonempty sets called 'quorums', and moreover quorums from adjacent pairs of topologies must all pairwise intersect too.
Next we prove (following Lamport's method) that any single instance of Paxos remains consistent, by describing an invariant and showing that this invariant implies consistency.
Let
pid
be the type of proposal identifiers,acc
the type of acceptors,val
the type of values of proposals,<
be an ordering onpid
which is transitive, wellfounded and total,QP
be a nonempty collection of finite subsets ofacc
,QL
be a functionpid -> Powerset(acc)
such that for every proposalp
every member ofQP
intersects every member ofQL p
,promised_free
be a binary relation onacc * pid
,promised_prev
be a ternary relation onacc * pid * pid
,accepted
be a binary relation onacc * pid
, andproposed
andchosen
be sets of proposal identifiers,v
be a functionpid -> val
, andvP
andvA
be functionsacc -> pid -> val
.
Intuitively, QP
and QL p
are the collections of quorums from the Proposers' and Learners' points of view respectively. (QP
and QL p
are typically equal but are permitted to vary like this to support changing the membership of the group of acceptors whilst remaining consistent). The function v
gives the values of proposals which have been proposed, and vP
(resp. vA
) gives the values of proposals that each acceptor has promised (resp. accepted). The predicates promised_free
, promised_prev
, proposed
, accepted
and chosen
represent the messages that have been sent (but not necessarily received or processed by all other agents):
promised_free a p
indicates that acceptora
promises to accept no proposals numbered belowp
, and that it has not yet accepted any proposal.promised_prev a p p'
indicates that acceptora
promises to accept no proposals numbered belowp
, and that it accepted proposalp'
but has accepted no later one.proposed p
indicates that proposalp
has been made by some proposer.accepted a p
indicates that acceptora
accepts proposalp
.chosen p
indicates that proposalp
has been chosen by some learner.
Given the following invariants:
- If
chosen p
then there exists a quorumS ∈ QL p
such that ifa ∈ S
thenaccepted a p
. - If
proposed p
then there exists a quorumS ∈ QP
such that ifa ∈ S
thenpromised_free a p
orpromised_prev a p _
, and eitherpromised_free a p Nothing
for alla ∈ S
, or- for each
a1 ∈ S
withpromised_prev a1 p p1
, eitherv p == vP a1 p
, or- there is an
a2 ∈ S
withpromised_prev a2 p p2
andp1 < p2
.
- If
accepted a p
thenproposed p
. - If
accepted a p
thenv p == vA a p
. - If
promised_prev a p0 p1
thenaccepted a p1
. - If
promised_prev a p0 p1
thenp1 < p0
. - If
promised_prev a p0 p1
andaccepted a p2
andp2 < p0
then eitherp1 == p2
andvP p0 == vA p1
, orp2 < p1
- If
promised_free a p0
andaccepted a p1
thenp0 < p1
orp0 == p1
.
Then any two chosen propositions p1
and p2
have v p1 == v p2
. This is theorem paxos_consistent
in the theory file.
This is the heart of the safety proof for Paxos. The predicates promised_free
, promised_prev
, proposed
and accepted
correspond to messages that may be sent. Pretty remarkable.
Interestingly, there's no need to consider the passage of time or similar, which is how it remains safe even given arbitrary message loss, delay, and reordering. The model is in terms of some kind of platonic whole-system state - no single agent can know the whole state.
Of course, other messages can be sent without affecting this safety proof - in particular, you need to pass the actual values around somehow, and hold leadership elections and send negative acknowledgements and so on, and actually get the acceptors to promise things by sending the initial prepare
message which is also notably absent here.
Next we extend our attention to a sequence of Paxos instances, including the ability for each instance to alter
the topology of subsequent instances. The multi-Paxos invariant is quite similar to the single-Paxos one with an extra
parameter i
indicating the instance number. Proposals identifiers now carry a topology version number, and instances
have a version number which is defined by the sequence of values agreed by previous instances. Topology versions
must be increasing in the proposal identifiers and instance number respectively.
There are also two properties, proposed_topology
and chosen_topology
, relating the topology version number of proposals
to the topology version of each instance: when making a proposal the topology version number must be no
greater than the instance's topology version and when choosing a proposal the proposal must
have a topology version no less than the instance's topology version minus one. That 'minus one' gives the extra wiggle room
needed to allow proposals to be chosen with a slightly-old topology. The condition on each topology to intersect all of
its subsequent topology is used to show that consistency is preserved even given this extra flexibility.
Notice that, because instance topology versions are increasing, proposers can safely continue to make proposals even if they have not yet learned that a new topology version has been chosen. Of course if they get too far out-of-date then their proposals will never be chosen.
However, it is important when choosing a proposal to know the instance's topology version,
so condition chosen_Suc
ensures that an instance cannot have a value chosen until the previous
instance has been chosen (and hence all prior instances have too).
There is one extra message type, multi_promised
, which is effectively a promised_free
message for all future instances simultaneously.
Consistency follows from these invariants by showing that each instance satisfies the single-Paxos invariant.
The remainder of the file shows that the empty model (containing no messages) satisfies the multi-Paxos invariant and shows some conditions under which an invariant-satisfying model can be modified (by sending a message or updating one of the value functions) into another invariant-satisfying model. These conditions are designed to be straightforward to verify in an implementation.
See lemmas multiPaxos_change_value_proposed
, multiPaxos_add_proposal_free
and multiPaxos_add_proposal_constrained
.
- If there's a quorum of acceptors
S ∈ QP
that all promised to accept a proposalp
with no prior value (i.e.promised_free a p
for alla ∈ S
) thenp
can be proposed. - If there's a quorum of acceptors
S ∈ QP
that all promised to accept a proposalp
but some of them sent a prior value (i.e.promised_free a p
orpromised_prev a p
for alla ∈ S
andpromised_prev a p p'
for somea ∈ S
) thenp
can be proposed as long as its value matches the value of the response with the highest identifier. - The value of a proposal can be freely changed as long as it has not already been proposed.
See lemmas multiPaxos_change_value_promised
, multiPaxos_change_value_accepted
,
multiPaxos_add_promise_free
, multiPaxos_add_multi_promise
and multiPaxos_add_promise_prev
.
- If an acceptor has accepted no proposals then it may promise to accept anything, with no prior value.
- If an acceptor has accepted some proposals then it may promise to accept any later proposals, as long as it includes the identifier of the highest proposal it has previously accepted and as long as the values
vP
andvA
agree where promises are made. - An acceptor may accept a proposal as long as it has been proposed, and it hasn't promised to accept later proposals, and the value of
vA
agrees with the value of the proposal itself. - An acceptor may freely update
vP
for any proposals for which it has not sent a promise. - An acceptor may freely update
vA
for any proposals which it has not accepted.
See lemma multiPaxos_add_choice
.
- If there's a quorum of acceptors
S ∈ QL p
that all accept a proposalp
then that proposal may be chosen. QL p
can be freely changed as long as its elements all remain quorums; in particular- extra quorums may be added to
QL p
, and - if
p
can never be chosen thenQL p
can be changed freely.