PoET 2.0 Consensus - Updated #20
Proposes a new PoET Consensus mechanism designed to provide the PoET functionality without requiring SGX Platform Services. Signed-off-by: kulkarniamol <[email protected]>
text/0002-poet2-consensus.md
Outdated
infrastructure.

This document details a new mechanism for the PoET algorithm that overcomes
some of the challenges with the original algorithm.
Suggest a wording more like "extends the original algorithm to new platforms"
text/0002-poet2-consensus.md
Outdated
Sawtooth includes an implementation which simulates the secure instructions.
This should make it easier for the community to work with the software but
also forgoes Byzantine fault tolerance.
Strictly speaking... it does give you BFT. The problem is that it is trivially easy to "compromise" nodes (so the 3f+1/2f+1 guarantees are easy to violate).
This should make it easier for the community to work with the software but
also forgoes Byzantine fault tolerance.

PoET 2.0 essentially works as follows:
Suggest that you link to the full description of PoET v1.
+ The `WaitCertificate` contains a `Duration` as well as a related `WaitTime`.
The `Duration` is a 256-bit random number generated using the secure
RNG available within the SGX. The `WaitTime` is derived from the
There is no reason to compute the WaitTime in the enclave, since the wait time is essentially meaningless to the enclave. All the other validators can do the conversion from the random number/duration into a time. That would allow you to just call Duration what it really is... a random number.
+ On the originating validator, the `WaitTime` is used to throttle broadcast of
claim blocks. Upon creating the `WaitCertificate`, the validator waits
until `WaitTime` seconds have elapsed before broadcasting the block over
the gossip network
I would suggest that you describe how the wait time is computed from the random number. The computation is more or less the same as the computation from PoET v1.
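As a concrete starting point for that description, here is a minimal sketch of the conversion. It assumes the PoET v1-style exponential mapping `WaitTime = minimum - LocalMean * ln(u)`, where `u` is the random number scaled into (0, 1]. The constant `MINIMUM_WAIT_TIME`, the helper names, and the choice to use only the low 64 bits of the 256-bit `Duration` are illustrative assumptions, not taken from the RFC text.

```python
import math

# Hypothetical floor on the wait, in seconds; not a value from the RFC.
MINIMUM_WAIT_TIME = 1.0


def duration_to_unit_interval(duration: bytes) -> float:
    """Map a 256-bit random Duration to a float in (0, 1]."""
    if len(duration) != 32:
        raise ValueError("Duration must be 32 bytes (256 bits)")
    # Assumption: the low 64 bits are enough, since a double cannot
    # represent more precision than that anyway.
    low64 = int.from_bytes(duration[-8:], "big")
    # Adding 1 keeps the result strictly above 0 so log() is defined.
    return (low64 + 1) / 2**64


def wait_time(duration: bytes, local_mean: float) -> float:
    """Convert the raw random number into a wait in seconds."""
    u = duration_to_unit_interval(duration)
    # -ln(u) is exponentially distributed with mean 1 when u is uniform.
    return MINIMUM_WAIT_TIME - local_mean * math.log(u)
```

Because the function is pure and takes only public inputs, any validator can recompute it outside the enclave, which is the point made in the earlier comment about treating `Duration` as a plain random number.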
double WaitTime # The number of seconds to wait, as a function of the
# Duration and the LocalMean
double LocalMean # The computed local mean
byte[32] BlockID # The BlockID passed in to the Enclave
see comments above. this information is redundant.
text/0002-poet2-consensus.md
Outdated
>The implication of this change is that the signup data is lost each time the
>enclave is unloaded or the platform is restarted. The enclave has to register
>afresh with a new set of keys each time it gets loaded.
i'm confused. poet 1 doesn't store signup data, it stores the identity of the monotonic counter. we regenerate data on reboot because the protocol REQUIRES that the enclave re-register, not because there is no sealed storage. Sealed storage without a monotonic counter cannot prevent replay attacks (just copy an old version of sealed storage into place if you want to have multiple signups for the processor).
current `WaitCertId_{n}` to upper layers for registration in
EndPoint registry.
* Goto 1
This appears to be missing the two delays that are necessary. The first delay is the time between registration & use of the registration. The second is the delay between subsequent registrations for the processor.
text/0002-poet2-consensus.md
Outdated
>Note 1: In practice, the WC may be calculated by recording the system time at
>the moment of the arrival of the Sync Block and subsequently subtracting this
>timestamp from the current time.
It's probably worth receiving several blocks and computing an average over those blocks to ensure that you don't favor a low-latency neighbor or a cheating neighbor.
text/0002-poet2-consensus.md
Outdated
An early arriving block (where `WC < CC'`) is considered 'Ineligible'. The block
is cached for `CC' - WC` seconds until it becomes 'Eligible'. It is then
broadcast to neighboring peers over the Gossip Network.
it will be broadcast assuming that another, better claim is not received before the timer expires.
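A rough sketch of the caching behaviour described in the excerpt, with the caveat from this comment folded in, might look like the following. The `Claim` and `PendingClaim` names are hypothetical, and "better claim" is assumed here to mean a smaller raw random number; the RFC may define it differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    block_id: str
    chain_clock: float  # CC' for the chain ending in this block, in seconds
    duration: int       # raw random number; smaller is ASSUMED to be better


def seconds_until_eligible(wall_clock: float, chain_clock: float) -> float:
    """How long an early block (WC < CC') must be cached: CC' - WC."""
    return max(0.0, chain_clock - wall_clock)


class PendingClaim:
    """Holds at most one early claim, keeping only the best one seen so far."""

    def __init__(self) -> None:
        self.best: Optional[Claim] = None

    def offer(self, claim: Claim) -> None:
        # A better claim arriving before the timer expires replaces the
        # cached one, so the cached one is never broadcast.
        if self.best is None or claim.duration < self.best.duration:
            self.best = claim

    def release(self, wall_clock: float) -> Optional[Claim]:
        """Return the cached claim once it is eligible, otherwise None."""
        if self.best is not None and wall_clock >= self.best.chain_clock:
            claim, self.best = self.best, None
            return claim
        return None
```

A caller would invoke `offer` for every early block and poll `release` (or schedule it using `seconds_until_eligible`) to decide when to gossip the surviving claim.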
>subtracting this timestamp from the current time (`WC = CurrentTime - BaseTime`).

>Note 2: Notice that the CC is a function of the WaitTime, which is computed within
>the enclave.
Not important that it's computed within the enclave. Only enclave trusted function is the RNG. If we compute WaitTime in the enclave it is for convenience / readability of having all the logic in one method.
EndPoint registry (otherwise sender needs to re-sign).

4. Verify the `WaitCertificate.LocalMean` is correct by comparing
against `LocalMean` computed locally.
Verify wait time similarly
I have a question about this fork resolution process. Can someone explain why we only need to compare chain lengths (rather than check the total amount of "work" done on the chain--i.e., incorporate the population estimate)?

Here's my thought process: suppose I controlled a small group of nodes. We could fork off everything to a side chain and then wait for the "difficulty" to go down. In the steady state, due to the population adjustment, we would be adding blocks at the same rate as the main chain. We could let this go on for a while--maybe we would get lucky and accumulate blocks at a slightly (sublinear) rate faster than the main chain. Then, at some point, we could add in a bunch of new members and use these to add many more blocks than expected and try to catch up to the main chain. It's possible there's some mathematical reasoning that says this is impossible, but it's not immediate (and needs to be written up if this becomes the spec).

Does anyone have an explanation for why this doesn't work? Am I misunderstanding something here? Thanks!
Working through the logic we developed...
First, every block should contain the raw random number that was generated
in the enclave. NOTE... This is different from PoETv1 where we put the
computed duration in the block. We can put the random number in the block
since all conversion and enforcement of the wait happens in untrusted space
anyway.
Second, just a reminder that there must be a maximum fork length for other
reasons mostly related to how quickly enclaves can be registered. Among
other things, this also prevents "from the beginning of time" forks.
Let me just define sum(C, i, j) as the sum of the random numbers in chain C
between blocks i and j.
And let's just call our chains C1 and C2, with L1 = length(C1) and L2 =
length(C2). To be clear... these are *VALID* chains, which means that the
chain clock through the chain is less than the wall clock. For now, let
CC(C1) be the chain clock of C1 and WC(C1) the wall clock for C1. For a
valid chain, CC(C) < WC(C).
Easy case #1: if L1 == L2, then we prefer the chain with the smaller sum.
For example, if sum(C1,0,L1) < sum(C2,0,L2) then we pick C1. Because the
distribution of the random numbers depends directly on the population
size, it is sufficient to assume that the chain with the smaller sum of
random numbers was generated by a larger population. You do not need to
worry about population adjustments since the random numbers are not
adjusted for the population. Note... this basic premise kills most of the
partitioned community attacks.
Now... assume that L1 < L2.
Easy case #2: if sum(C1, 0, L1) > sum(C2, 0, L1), then again we have an
obvious result (pick C2). That is, up through block L1 (the last in chain
C1), chain C2 has a lower sum of random numbers, then clearly C2 represents
a larger population for a longer time.
Easy case #3: if sum(C1, 0, L1) < sum(C2, 0, L1) and CC(C1) > CC(C2) then
again we have a fairly obvious result (pick C1). The chain clock represents
all of the population adjustments that were made. So over a given time
block, C1 represents the larger population. Again... this should take care
of the rest of the partitioned community attacks.
Hard case: what happens if sum(C1, 0, L1) < sum(C2, 0, L1) and CC(C1) <
CC(C2)? That is, the shorter chain C1 represents (up to block L1, the
length of the shorter chain) a larger population, but C2 covers a larger
time block (i.e. it is more recent). Some things to consider: how far is
the shorter chain's chain clock behind the wall clock? If the shorter
chain's chain clock is close to the wall clock (based on the expected time
of arrival of the next block), then you should probably pick C1 (the next
block will likely move it back into easy case #3). If the shorter chain's
chain clock is a long way behind the wall clock (or a long way behind
CC(C2)), then the shorter chain is likely partitioned (and out of contact
or abandoned). At this point... any reasonable deterministic policy is
probably sufficient.
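To make the case analysis easier to check, here is a small sketch that encodes it directly. Chains are modelled as lists of the raw random numbers, one per block. The `slack` threshold used in the hard case, the handling of exact equality, and the function names are assumptions made for illustration; the reply above only asks for "any reasonable deterministic policy" there.

```python
from typing import Sequence


def prefix_sum(chain: Sequence[int], n: int) -> int:
    """sum(C, 0, n): total of the first n random numbers in the chain."""
    return sum(chain[:n])


def choose_chain(c1: Sequence[int], c2: Sequence[int],
                 cc1: float, cc2: float,
                 wall_clock: float, slack: float) -> int:
    """Return 1 to prefer c1 or 2 to prefer c2.

    cc1 and cc2 are the chain clocks of the two (already valid) chains.
    """
    if len(c1) > len(c2):
        # Normalise so that c1 is never the longer chain, then map back.
        return 3 - choose_chain(c2, c1, cc2, cc1, wall_clock, slack)

    l1 = len(c1)
    s1, s2 = prefix_sum(c1, l1), prefix_sum(c2, l1)

    if len(c1) == len(c2):
        # Easy case 1: equal length, prefer the smaller sum.
        # Assumption: an exact tie goes to c1; a real policy would need a
        # tie-break that does not depend on argument order.
        return 1 if s1 <= s2 else 2

    if s1 >= s2:
        # Easy case 2: the longer chain also has the smaller sum up to L1.
        # Assumption: equality is treated the same way.
        return 2

    if cc1 >= cc2:
        # Easy case 3: shorter chain has the smaller sum and the later clock.
        return 1

    # Hard case: the shorter chain looks "bigger" but is behind in time.
    # Hypothetical policy: keep it only if it is within `slack` seconds of
    # the wall clock, otherwise treat it as partitioned or abandoned.
    return 1 if wall_clock - cc1 <= slack else 2
```

For example, `choose_chain([1, 2], [5, 5, 5], cc1=30, cc2=40, wall_clock=60, slack=5)` falls into the hard case and, under the assumed policy, picks the longer chain because the shorter one is 30 seconds behind the wall clock.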
Thanks for the in-depth explanation Mic. I guess my point is the following: since you're storing the randomness of the block winner in the chain, you can compute the population estimate trivially. Consider the following formula:

W(chain, block_start, block_end) = \sum_{i = block_start}^{block_end} Population_Estimate(block_i)

The function W can be computed (with the current and proposed estimators) solely from the raw random numbers generated in the enclave, and is probably the most direct measure of "work" on a blockchain using PoET that we can possibly manage.

Now consider the following rule for deciding which branch of a fork to take. Suppose we have two branches, branch_1 and branch_2, with b_1 and b_2 blocks each, respectively. Let block number b* be the last block that both have in common. We choose branch_1 if

W(branch_1, b* + 1, b* + b_1) > W(branch_2, b* + 1, b* + b_2)

and branch_2 otherwise (we can break ties based on equality in some deterministic manner, say based on the randomness of the most recent block).

Note that this functionality exactly agrees with your analysis above and simplifies it considerably, eliminating the need for a case-by-case analysis. Is there a reason something like this doesn't work? It seems like a much simpler rule than what you're proposing, and leads us nicely in the direction of provable security (which I care about, obviously). Thanks!
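For comparison with the case analysis, the rule proposed here can be sketched in a few lines. Each branch is a non-empty list of the raw random numbers for the blocks after the common ancestor b*. The population estimator is passed in as a function because this thread does not pin one down; `toy_estimate` below is only a placeholder (it inverts the expected minimum of N uniform draws) and is not the estimator from the RFC. The tie-break direction is likewise an assumption.

```python
from typing import Callable, Sequence

R_MAX = 2**256  # size of the random-number space for a 256-bit Duration


def toy_estimate(r: int) -> float:
    """Placeholder estimator: a smaller winning number suggests more nodes."""
    return R_MAX / (r + 1)


def work(branch: Sequence[int], estimate: Callable[[int], float]) -> float:
    """W over a branch: sum of per-block population estimates."""
    return sum(estimate(r) for r in branch)


def choose_branch(branch_1: Sequence[int], branch_2: Sequence[int],
                  estimate: Callable[[int], float] = toy_estimate) -> int:
    """Return 1 or 2 according to which branch carries more 'work'."""
    if not branch_1 or not branch_2:
        raise ValueError("both branches must contain at least one block")
    w1, w2 = work(branch_1, estimate), work(branch_2, estimate)
    if w1 != w2:
        return 1 if w1 > w2 else 2
    # Tie-break on the randomness of the most recent block.
    # Assumption: the smaller value wins, and branch_1 wins an exact tie.
    return 1 if branch_1[-1] <= branch_2[-1] else 2
```

Written this way, the comparison needs no chain-clock or wall-clock input, which is the simplification being argued for; whether it really agrees with every case in the earlier analysis depends on the estimator chosen.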
efficiency. Existing system clock synchronization mechanisms like NTP
etc. may be sufficient for PoET 2.0 requirements.

Network latencies may be exploited by malicious nodes to broadcast blocks
Some of the points in this section sound like they could be moved to, or repeated in, the currently empty drawbacks section.
Proposes a new PoET Consensus mechanism designed to provide the
PoET functionality without requiring SGX Platform Services.
Deprecates PR#12.
Signed-off-by: kulkarniamol [email protected]