Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing Diffie-Hellman operations with inverted keys #16

Closed
eaon opened this issue Oct 14, 2023 · 25 comments
Closed

Consider removing Diffie-Hellman operations with inverted keys #16

eaon opened this issue Oct 14, 2023 · 25 comments
Labels
enhancement New feature or request protocol research Issues for tracking protocol research and choices

Comments

@eaon
Copy link
Contributor

eaon commented Oct 14, 2023

It turns out with the exact same protocol flow, and the exact same properties of who knows what when, the "Sealed Assurance of Secret-Key Possession" can be achieved with regular DH operations, no party has to invert keys to arrive at the same functionality. An example implementation is here https://gist.github.com/eaon/aa46e54ae34aaf98cb670fba41c0647c

A quick summary: instead of using the "public key" as the "assurance", which requires information to be removed from a shared secret, the shared secret between three parties itself can act as the assurance.

@eaon
Copy link
Contributor Author

eaon commented Oct 14, 2023

Based on the scheme above, one is able to move the message ID discovery to the client side, leaving the server in the dark about whether a keyholder can open 1 message or 20, which hardens the scheme against correlation attacks by a malicious server.

It involves using the secret shared among three parties to encrypt the message ID, and sharing the resulting ciphertext along with the 2 party shared secret. An example implementation can be found here: https://gist.github.com/eaon/9e0b1f8bebd8cd64f6d625dac20ca590

I believe that this counts as a variation of an Oblivious Transfer, though I'm not certain and would appreciate a cryptographer's input to find a proper name for this.

@lsd-cat
Copy link
Member

lsd-cat commented Oct 14, 2023

Thanks for the contribution, this is actually a breakthrough toward both establishing and verifying the security of the message fetching messaging, and by moving the final step on the client side, we remove the in memory visibility of messages fetched together from the server side, which was the main security concern at the moment.

Here is a text summary of the improved protocol:

Keys recap

Source

long term:

  • SCSK, SCPK = G(KDF(challenge_salt || PW))

ephemeral, per message:

  • MESK, MEPK = G()

Server

ephemeral, per request:

  • RESK, REPK = G()

Journalist

long term:

  • JCSK, JCPK = G()

Protocol

Submission

Source sends:

  • Ciphertext
  • MEPK
  • DH(JCPK, MESK)

Fetching

Server calculates shared secret:

  • K = DH(DH(JCPK, MESK), RESK)

Server sends:

  • DH(MEPK, RESK)
  • C = Enc(K, message_id)

Journalist calculates:

  • K = DH(DH(MEPK, RESK), JCSK)
  • message_id = Dec(K, C)

Journalist can now query the server for individual message_id, without revealing message correlation. I will port this to a branch in this repo as well for testing purposes.

@lsd-cat lsd-cat closed this as completed Oct 14, 2023
@lsd-cat lsd-cat reopened this Oct 14, 2023
@eaon
Copy link
Contributor Author

eaon commented Oct 14, 2023

DH(MESK, JCPK)

Just noticed that on every other DH operation the secret key is on the right hand side except for here. Probably not confusing overall but maybe better to be consistent for the final documentation!

Glad to be of help 😄

@lsd-cat
Copy link
Member

lsd-cat commented Oct 14, 2023

Just noticed that on every other DH operation the secret key is on the right hand side except for here. Probably not confusing overall but maybe better to be consistent for the final documentation!

Glad to be of help 😄

Fixed thanks! I have started developing in the branch https://github.com/freedomofpress/securedrop-poc/tree/improved-challenge-response

I wanted to get rid of our custom ECDH lib wrapper that we modified to use the inverted keys, but it turns out python-ecdsa is not very friendly in doing cascades DH operations.

On a side note, the new protocol removes the need of a lot of things:

  • We do not need to save RESK/PK anymore
  • We do not need challenge_id
  • Clients will make a single request to the /challenge endpoint and never report back
  • As such, as a result the bandwidth change of adding the encrypted message_ids is minimal
  • Server logic for verifying the challenge response is deprecated and client logic for answering a specific challenge id for the server is deprecated

@lsd-cat
Copy link
Member

lsd-cat commented Oct 15, 2023

improved-challenge-response now fully implements this mechanism bidirectionally. It was also very quick to plug'n'play in the architecture!

@TheZ3ro
Copy link
Collaborator

TheZ3ro commented Oct 16, 2023

I really like this approach!

I will add some rationale/view from my prospective:

In the initial challenge-response protocol, the Entities (Source and Journalist) were directly exchanging sensitive data (the encrypted messages). This required the protocol to hide this information to the Server, through the use of inverted keys etc.
In a subsequent iteration we ditched this in favor of basing the challenge on message_ids generated from the Server. This approach is based on the fact that if an Entity knows a given message_id, then it should be able to download the related message (the message is also encrypted with ephemeral keys) and it should be trusted to instruct the Server delete the messages (see #8 and #10).
Now, message_ids are not sensitive (with respect to the Server) so it makes total sense to remove the complexity of the inverted key mechanism and adopt a 3-party group Diffie Hellman as proposed.
This is great since proofs that Diffie Hellman works in 3-party groups are already proven safe.


In @eaon proposal,

an Entity that wants to submit a message:

  1. takes the Recipient public key
  2. adds their per-message private key (in DH fashion) and get a shared key
  3. submit it to the Server (along with their per-message public key and the encrypted message)

an Entity that wants to read a message:

  1. fetch challenges from the Server:
    1. the Server adds per-request private keys (in DH fashion) to every message's shared key. At this point the shared key contains 3 shares (Journalist, Source, and Server)
    2. the Server adds per-request private keys (in DH fashion) to the per-message public key calculating an "attestation".
    3. the Server encrypt each message_id with the shared key
    4. the Server return every message's attestation and the encrypted message_id
  2. the Entity add to every "attestation" their private keys (in DH fashion) recovering the same shared secret that the Server used to encrypt the message_id.
  3. the Entity decrypt the message_id.

As for the name I wouldn't call it Oblivious Transfer since it is used to describe protocols that have some more cryptographic properties that this protocol doesn't provide.

@TheZ3ro
Copy link
Collaborator

TheZ3ro commented Oct 16, 2023

A great news now that inverted keys are not used anymore is that we can start creating formal proofs for the protocol.

I've started modeling it with Verifpal.
Even though it is not best fit since it has many drawbacks and limitation, it has an easy modeling language and it can be used to translate the model to more robust modeling language like ProVerif or Coq.

Here the output diagram:
sd2

And the (heavily commented) model:
https://gist.github.com/TheZ3ro/81270c2c62922c9ba25500d8a2f2d0b3
(I suggest using Verifpal VSCode extension for syntax highlight)

@lsd-cat
Copy link
Member

lsd-cat commented Oct 16, 2023

Thanks a lot for the contribution! It would be very useful to prove the properties we are most unsure about such as those regarding indistinguishability and correlation.

In the meantime, I also did a chart to simulate the full architecture :D

sd

@eaon
Copy link
Contributor Author

eaon commented Oct 16, 2023

Lovely to see this flurry of great activity in response to this!

As for the name I wouldn't call it Oblivious Transfer since it is used to describe protocols that have some more cryptographic properties that this protocol doesn't provide.

I agree that it's not a traditional OT one as there really isn't a choice on the receivers end, but other than that, I find it best maps onto the description of k-n Oblivious transfers. Edit: also relevant: PIR but that talks primarily about 1-out-of-n, which the scheme above is obviously not…

Staying with terminology for a moment, attestation strikes me as an odd choice for the name of the 2-thirds shared-secret that gets sent to the entity requesting a list of possible downloads. Do you have a rationale for using that term? I personally see the 2-thirds shared-secrets (i.e. the one sent by the sender, and the one sent to the receiver) as blinded public keys. On the sender side the blinded public key disguises the recipients identity from the server, and on the receiver side it prevents "passive" enumeration/tracking of particular messages.

@eaon
Copy link
Contributor Author

eaon commented Oct 16, 2023

More terminology nitpicking… I'm not sure if challenge-response is the best way to describe this anymore either: clients can learn message_ids (or even entire messages, if one doesn't care about size of transmission and computation for that much encryption) without having to reveal anything to the server other than their intent to download stuff. It's up to the client whether they decide to actually then request the message with the message_id, which they may not always do in order to make probabilistic identity inference by a malicious server harder.

@lsd-cat
Copy link
Member

lsd-cat commented Oct 16, 2023

I agree, there is no longer a response as message fetching is not interactive anymore. Before updating the documentation and the paper we need to define things better AND be consistent around all the part of the projects (code, docs, paper).

@TheZ3ro
Copy link
Collaborator

TheZ3ro commented Oct 16, 2023

I agree that it totally looks like an OT, especially during Message Retrieval!
There are some other DH-based OT already (by Chou et al and by Lai et al), but personally I'm not enough knowledgeable of the topic to name it, especially in a field where even naming a thing means having certain assumptions and proofs.
Maybe I'm just afraid about doing it wrong. 😄

Same thing about blinded public keys, see section 3 and section 4.1 of draft-irtf-cfrg-signature-key-blinding-01.
I choose attestation as a non-technical term for the information that was used by the retriever to verify the message were intended to them. Thinking about it I agree is not a best fit.

I also agree that now the challenge-response terminology should indeed be changed. This could even cause confusion.

Having said this, I'm down for whatever name we pick 👍

@eaon
Copy link
Contributor Author

eaon commented Oct 16, 2023

Maybe I'm just afraid about doing it wrong. 😄

Same, really. I'll poke some more people who I believe could have good insights, luckily this is compact and "standard" enough to be communicated relatively quickly.

Edit: my current preference of thinking about this is as ECDH and AEAD based Identity-derived-k-out-of-n Oblivious Transfer protocol. A true mouthful but potentially more correct and understandable?

@eaon eaon changed the title Consider removing Diffie-Hellman operations with inverted keys. Consider removing Diffie-Hellman operations with inverted keys Oct 16, 2023
@TheZ3ro
Copy link
Collaborator

TheZ3ro commented Oct 17, 2023

If we really want to name it, what about 3-party Oblivious Transfer ? IMHO that's the peculiarity of this protocol, since there are already (EC)DH and AEAD OT out there, and also considering that we do use AEAD algorithms but we never make use of AD for authentication purposes.

WDYT?

@lsd-cat
Copy link
Member

lsd-cat commented Oct 17, 2023

In the meantime, as part of code and Readme refactoring in light of these changes, I have dropped the challenge/response terminology in the whole source code (Readme is still in progress, it needs a lot of rewriting) and moved to a more generic and fitting "Message fetching protocol". I have also added a demo script that simulates a full server setup and an example of journalists/source submissions/reply flow with multiple parties.

@cfm
Copy link
Member

cfm commented Oct 17, 2023

This is a brilliant and exciting development. Thank you, @eaon, for proposing it; @lsd-cat for implementing it so quickly; and @TheZ3ro for modeling it already! Not being able to model the original protocol was always an unfortunate smell for me, so this is a very encouraging smell indeed. :-)

I'm very interested in the naming question here, which is to say the question of conceptually "What is this?" as opposed to mechanically "How does this work?" I'll think on that too.

@cfm
Copy link
Member

cfm commented Oct 18, 2023

In case useful for anyone else, here are my "notes" from following this discussion and especially #16 (comment), attempting to answer the question, "What secrets are constructed and reconstructed from what?"

https://gist.github.com/cfm/f27e60fa2f2d2a4db899499fd8d29737

@lsd-cat
Copy link
Member

lsd-cat commented Oct 18, 2023

Opened #17. I think it is worth merging as we are progressing towards a more established scheme. Beside the code, I have almost completely rewritten the README, not only to apply the latest development, but also by formalizing more the notation used and make it more consistent with the whitepaper and the charts.

@cfm
Copy link
Member

cfm commented Oct 19, 2023

I've updated https://gist.github.com/cfm/f27e60fa2f2d2a4db899499fd8d29737#file-2-md with the diagram I (at least :-) needed to understand the asymmetry between decryption/recovery of the message ID and decryption/recovery of the message "envelope" retrieved by that ID. Everything prior in that gist is just sketches to get me there. :-)

@cfm
Copy link
Member

cfm commented Oct 20, 2023

Re: naming: Is it possible that the two- and three-party layers of this scheme are worth naming separately, before naming their composition?

The recovery of one or more message IDs from the server is both anonymous and oblivious. The server doesn't know who's requesting the set of message IDs or what subset, if any, any given requester will be able to recover.

The retrieval of the message is anonymous but not oblivious. A specific ciphertext has been requested.

Oblivious disclosure before anonymous retrieval?

@lsd-cat
Copy link
Member

lsd-cat commented Oct 21, 2023

in 13c8a28
I have changed the per-request server ephemeral key to be also unique per-message. The reasoning is that since we dropped the challenge response mechanism, we no longer need to store server per-request ephemeral keys, even temporarily as we can discard them as soon as they are used to compute the partial Group DH. The overhead of generating new keys at the PoC scale is negligible, and we are already doing that for decoy payloads, which when testing are usually the majority.

I think that the security guarantees hold even before, but I think that rotating keys as much as possible, especially if it comes at a low performance cost, should be done to reduce the attack surface. The code change is minimal, as the key generation is just brought inside the per-message loop on the server side. Nothing change on clients side.

Re: naming: Is it possible that the two- and three-party layers of this scheme are worth naming separately, before naming their composition?

The recovery of one or more message IDs from the server is both anonymous and oblivious. The server doesn't know who's requesting the set of message IDs or what subset, if any, any given requester will be able to recover.

The retrieval of the message is anonymous but not oblivious. A specific ciphertext has been requested.

Oblivious disclosure before anonymous retrieval?

I am starting to think that Oblivious Transfer could be appropriate for the message_id.

@cfm
Copy link
Member

cfm commented Oct 26, 2023

@lsd-cat in #16 (comment):

I am starting to think that Oblivious Transfer could be appropriate for the message_id.

It occurred me while I was away that one way of thinking about this application of an oblivious-transfer-like scheme is to prevent enumeration: by making it both (a) constant (MAX_MESSAGES) and (b) opaque. That led me to oblivious enumeration....

@lsd-cat
Copy link
Member

lsd-cat commented Oct 26, 2023

Note that we merged #17 into main.

I do not want both @TheZ3ro and @cfm schemas and model to be lost in this issue, so we should think about merging/adding those to the main documentation as well, even if partial.

@cfm
Copy link
Member

cfm commented Oct 27, 2023

@lsd-cat, in #22 I've proposed adapting the useful iteration of https://gist.github.com/cfm/f27e60fa2f2d2a4db899499fd8d29737 into a supplemental document.

@lsd-cat lsd-cat added protocol research Issues for tracking protocol research and choices enhancement New feature or request labels Dec 22, 2023
@lsd-cat
Copy link
Member

lsd-cat commented Mar 27, 2024

Closing as we have merged the code, ported it to libsodium and merged documentation and diagrams. We also audited the protocol with these changes are are aiming at drafting formal proofs. Kudos to @eaon for the bright discovery and contributing it back.

@lsd-cat lsd-cat closed this as completed Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request protocol research Issues for tracking protocol research and choices
Projects
None yet
Development

No branches or pull requests

4 participants