From b2be69132e6d38b4ff910d3b91a66abb5c5fa63b Mon Sep 17 00:00:00 2001 From: Fredrik Dahlgren Date: Wed, 7 Feb 2024 17:38:06 +0100 Subject: [PATCH 1/6] Added initial content on formal verification and Tamarin --- content/_index.md | 8 +- content/docs/formal-verification/_index.md | 74 +++++++++ .../tamarin/00-installation.md | 151 ++++++++++++++++++ .../tamarin/10-rule-based-models.md | 52 ++++++ .../formal-verification/tamarin/_index.md | 35 ++++ 5 files changed, 319 insertions(+), 1 deletion(-) create mode 100644 content/docs/formal-verification/_index.md create mode 100644 content/docs/formal-verification/tamarin/00-installation.md create mode 100644 content/docs/formal-verification/tamarin/10-rule-based-models.md create mode 100644 content/docs/formal-verification/tamarin/_index.md diff --git a/content/_index.md b/content/_index.md index a92f79f..3e31057 100644 --- a/content/_index.md +++ b/content/_index.md @@ -67,6 +67,12 @@ We currently cover the following tools: - Fuzzing. _Coming soon!_ +<---> + +#### Formal verification + +- [Tamarin]({{< relref "tamarin" >}}) + {{< /columns >}} We are working on expanding the tools we cover here. We are also planning to @@ -107,4 +113,4 @@ codeql database analyze codeql.db --format=sarif-latest --output=results.sarif - We want to actively maintain the highest possible quality and expand the content of Testing Handbook. If you see a way to improve the Testing Handbook, please let us know! The best way to let us know is by raising an issue directly on the [Testing Handbook GitHub page](https://github.com/trailofbits/testing-handbook). - \ No newline at end of file + diff --git a/content/docs/formal-verification/_index.md b/content/docs/formal-verification/_index.md new file mode 100644 index 0000000..19de28c --- /dev/null +++ b/content/docs/formal-verification/_index.md @@ -0,0 +1,74 @@ +--- +weight: 2 +bookFlatSection: true +title: "Formal verification" +--- + +# Formal verification + +This section presents a number of tools that can be used to formally verify +cryptographic protocols. It currently covers the symbolic protocol verifier +Tamarin, but our long-term goal is to cover both symbolic and computational +verifiers. For each tool, we go though: + +- Installation and basic use +- How to define and formally verify a new model +- Tool-specific pain points, and potential workarounds + +{{< section >}} + +## Background + +Formal verification tools allow us to formally prove security properties of cryptographic protocols. Alternatively, they can often provide counterexamples (and sometimes real attacks) showing that a particular protocol does not guarantee the security properties that we expect it to. To formally verify a protocol we need to describe the protocol we are reviewing in a language understood by the tool, and we also need to describe the security properties that we would like the protocol to have. The tool will then automatically search for a formal proof that the properties we have specified are upheld by the protocol, and if it terminates, it will output either a proof showing that the properties hold, or a counterexample showing how some property fails to hold. + +For this reason, formal verification tools provide great value for anyone designing a new cryptographic protocol. They allow developers to verify that a new design meets the expected security guarantees. They allow us to experiment with the design and compare the security properties and trade-offs between different design choices. Formal verification tools also provide great value for anyone who is modifying or extending an already existing protocol. If the original protocol already has a formal model, modeling an extension to the protocol is typically cheap, allowing the developer to prove that the protocol extension is secure without having to model the entire protocol from scratch. + +{{< hint danger >}} + +### What is a cryptograpic protocol? + +You may think that only cryptographers design cryptographic protocols, and that you don't need complex tools to understand the security properties of your relatively simple use case. However, our experience shows that it makes more sense to take the following very broad view of the term _cryptographic protocol_, which also has implications for formal verification. + +**Anyone who is composing different cryptographic primitives (like encryption, signatures, and hashes) in a way that has not been previously specified by a standards document like an IETF RFC or NIST standard, and has not been analyzed in a public academic paper, is designing a new cryptographic protocol.** + +New cryptographic protocols need to be proven secure before they are deployed. Formal verification is one way of achieving this which ensures both correctness and auditability. At Trail of Bits, we recommend that _all_ new cryptographic protocols should be formally verified. + +{{< /hint >}} + +## The symbolic model or the computational model? + +Tools used to formally verify cryptographic protocols typically come in one of two different flavors. Tools like [ProVerif](https://bblanche.gitlabpages.inria.fr/proverif/), [Tamarin](https://tamarin-prover.com/), and [Verifpal](https://verifpal.com/) analyze protocols in the _symbolic model_. This means that cryptographic primitives are modeled as black boxes satisfying certain given equations. For example, symmetric encryption could be modeled using two functions `encrypt` and `decrypt` satisfying the following equation: + +```js +decrypt(key, encrypt(key, data)) = data +``` + +This allows the tool to replace any occurrence of the term `decrypt(key, encrypt(key, data))` with `data`, but it says nothing about the strength of the encryption algorithm. In a sense, it treats the encryption as perfect, since the only way to obtain the plaintext from the ciphertext is with knowledge of the correct key. Modeling cryptographic primitives in this way implies a number of trade-offs that may not be immediately obvious. On the one hand, it provides clean abstractions that allows us to specify the high-level building blocks of cryptographic protocols without having to worry too much about how each cryptographic primitive is implemented. This also allows us to instantiate the protocol with any primitive that satisfies the given equations and still be sure that the security proofs hold. On the other hand, it means that we cannot reason about some things like brute-force or padding-oracle attacks in a straight-forward manner. (If we want to do this, we need to model our primitives in a way that allow us to express these types of attacks within the model. Depending on the security properties we are interested in, this may either add complexity, which often has adverse effects on proving time, or may sometimes be impossible within the symbolic model.) + +Symbolic verification tools typically model the network as untrusted, allowing the attacker to intercept, delay, modify, and replay messages between participants at will. This is known as the _Dolev-Yao model_, and was first described in the paper [_On the Security of Public-Key Protocols_](https://www.cs.huji.ac.il/~dolev/pubs/dolev-yao-ieee-01056650.pdf). Individual tools often provide abstractions (like the `reliable-channel` builtin in Tamarin or the `passive` keyword in Verifpal) that can be used to restrict the capabilities of the attacker in different ways. + +Tools like [CryptoVerif](https://bblanche.gitlabpages.inria.fr/CryptoVerif/) analyze protocols in the _computational model_. Here, individual messages are modeled as bitstrings, cryptographic primitives are modeled as functions on bitstrings, and the adversary is modeled as a arbitrary probabilistic polynomial-time Turing Machine. The probability of the attacker successfully breaking the security of a given primitive has to be provided up-front by the user. The tool can then automatically prove indistinguishability between sequences of games (up to some probability), where the first game captures the analyzed protocol, and the final game is one where the required security property is obvious. The output is given as a bound on the probability of a successful attack as a function of the security parameters of the individual primitives used, and the number of protocol executions. A key benefit of the computational model is that this mimics how cryptographers usually prove security of cryptographic protocols. This makes it easier to carry over constructions from traditional pen-and-paper proofs to automated proofs. + +The computational model is clearly more expressive than the symbolic model. However, it is also pre-supposes a deep understanding of provable security and game-based proofs, which is not required when working in the symbolic model. + +Ultimately, which tool to use depends as much on your background as on the protocol you are trying to analyze. If you don't have a background in cryptography or formal verification we recommend that you start out with a simple symbolic verification tool like Verifpal. If you struggle to model your protocol in Verifpal, or need to express more complex security properties that are not supported by by the tool, we suggest switching to a more expressive and mature symbolic prover like ProVerif or Tamarin. Finally, if you want to bound attack probabilities or translate a game-based proof from a paper, you need to work in the computational model with a tool like CryptoVerif. + +{{< hint info >}} + +### Where do I start? + +[Verifpal](https://verifpal.com) is a great starting point if you have no previous exposure to formal verification. It has a small set of built-in primitives that cover a range of use-cases, and the syntax mimics how developers typically think about cryptographic protocols. Verifpal's intuitive syntax also makes the resulting models easier to parse and maintain for less experienced users. + +This is an important point. If you develop a formal model of a protocol that is too complex for the developers of the protocol to maintain, then the proof becomes a snapshot in time with limited usefulness. If developers understand the model and can update and extend it as the protocol develops, it can be built on to provide assurance for future versions of the protocol as well. + +{{< /hint >}} + +{{< hint info >}} + +### How do I start? + +Since most cryptographic protocols have a large number of valid states, formal verification tools often struggle because of state-space explosion. For this reason, it is generally a good idea to try to divide your protocol into smaller components, or sub-protocols, like _registering a new device_ or _sending and receiving a single message_, and try to prove that these individual components provide the expected security guarantees. + +This approach clearly limits what the attacker can do since each sub-protocol is analyzed in a vacuum. However, it makes each model more manageable and helps avoid the issue of state-space explosion. At Trail of Bits we often use this method to quickly model protocol components on cryptographic design reviews. It is both a useful tool to verify our understanding of the protocol, and to identify potential weaknesses in the design. + +{{< /hint >}} diff --git a/content/docs/formal-verification/tamarin/00-installation.md b/content/docs/formal-verification/tamarin/00-installation.md new file mode 100644 index 0000000..7c3cbd0 --- /dev/null +++ b/content/docs/formal-verification/tamarin/00-installation.md @@ -0,0 +1,151 @@ +--- +title: "Installation and basic concepts" +slug: basic-concepts +summary: "This section explains the basic concepts of Tamarin." +weight: 1 +--- + +# Installation and basic concepts + +## Installing Tamarin + +We recommend installing Tamarin using a package manager to ensure that the correct dependencies are installed together with the tool. Tamarin is currently packaged for Homebrew, Arch Linux, Nixpkgs, and NixOS. + +{{< tabs "installing-tamarin" >}} + +{{< tab "Using Homebrew" >}} +On MacOS and Linux, you can install Tamarin using Howebrew as follows: + +```sh +brew install tamarin-prover/tap/tamarin-prover +``` + +{{< /tab >}} + +{{< tab "Using Pacman" >}} +On Arch Linux you can install Tamarin using Pacman with the following command: + +```sh +pacman -S tamarin-prover +``` + +{{< /tab >}} + +{{< tab "Using the Nix package manager" >}} +You can install Tamarin from Nixpkgs using the Nix package manager with the following command: + +```sh +nix-env -i tamarin-prover +``` + +{{< /tab >}} + +{{< tab "On NixOS" >}} +If you are running NixOS you can install Tamarin by adding `tamarin-prover` to your `environment.systemPackages`. + +{{< /tab >}} + +{{< /tabs >}} + +## Tamarin basic concepts + +Tamarin models can be specified either using [multiset](https://en.wikipedia.org/wiki/Multiset) rewrite rules, or using a process calculus based on a variant of the applied pi-calculus. Protocol properties are specified using a fragment of temporal logic. This means that we can express properties of terms at given points in time. The global protocol state is given by a multiset of facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a statement about a set of terms. For example, "`Alice` has registered the public key `pk` with the server", or "the attacker has learned the private key `sk`". + +### Variables and sorts + +Variables in Tamarin range over five different sorts (or types). There is a top sort for messages that can be sent over the network, with subsorts for public values, fresh (or private) values, and natural numbers. There is also an sort for temporal values (essentially, points in time). To make the sort clear from context, each sort is prefixed by a single character. + +- `~x` denotes a fresh value. _Fresh values_ are random values like secret keys + and nonces and are not known to the attacker. +- `$x` denotes a public value. _Public values_ are publicly known. In + particular, they are known to the attacker. Constant public values are written as strings (without the `$`). For example, the constant 0x30 is written `'0x30'`, and a constant generator _g_ of a cyclic group is written `'g'`. (For example, the public key corresponding to the private key `~x` in a Diffie-Hellman based protocol is written `'g'^~x`.) +- `%n` denotes a natural number. _Natural numbers_ start at `%1` in Tamarin and + require the `natural-numbers` builtin to use. They are used to represent (small) numbers like counters, and are assumed to be known to the attacker. +- `#i` denotes a temporal value. _Temporal values_ are not used to describe the + protocol directly, but are used to express protocol properties like "If the attacker knows Alice's key at time `#j`, then Alice's key must have been leaked at some time `#i` before `#j`." + +Terms may be composed using functions. These are either user defined, or defined by importing a pre-defined theory (known as a `builtin`). + +### Functions and equations + +User-defined functions are declared using the `functions` keyword. New functions can be defined in two different ways. It is possible to define new functions by simply specifying the function name and arity (how many arguments it takes) as follows: + +```js +functions: + encrypt/2, + decrypt/2, + ... +``` + +New functions can also be defined by specifying the full signature of the function, using user-defined type names, as follows: + +```js +functions: + encrypt(SecretKey, Bytes): Bytes, + decrypt(SecretKey, Bytes): Bytes, + ... +``` + +Here, `SecretKey` and `Bytes` are arbitrary type names specified by the user. There are a number of benefits to using explicit types in this way when defining your own functions. Apart from making definitions more readable, all equations are type checked by Tamarin before the prover runs. This ensures that any issues due to inconsistent typing of arguments is caught early in the modeling process. For example, using the term `encrypt(key, message)` in one location would cause Tamarin to infer that `key` has type `SecretKey` and `message` has type `Bytes`. If we later introduced the term `decrypt(message, key)` Tamarin would complain, saying that it expected the first argument to `decrypt` to have the type `SecretKey`, but that `message` must be of type `Bytes`. Note that function types are ignored after the type-checking phase is complete. In particular, they do not affect how the attacker may use the defined function. + +{{< hint info >}} + +#### Include function signatures with sensible type names when declaring new functions + +We recommend always including the full function signature when introducing new functions. This makes function definitions easier to parse, allows you to take full advantage of Tamarin's build-in typing checking, and avoid potential typing inconsistencies. + +{{< /hint >}} + +### Facts + +Facts express properties about the current state of the protocol. New facts can be introduced by the user, but there are a few pre-defined facts that are useful to know about. + +- `Fr(x)` says that `x` is a fresh value. That is, a random value which cannot + be inferred by the attacker. +- `Out('c', x)` says that `x` is sent on the public channel with name + `'c'`. Since we're assuming a Dolev-Yao adversary who is in complete control of the network, this also means that `x` becomes available to the attacker as soon as it is sent over the network. If there is only one channel, the channel identifier `'c'` may be omitted. +- `In('c', x)` says that `x` is received on the public channel `'c'`. + Since the attacker controls the network she may drop or alter messages at + will. This means that we cannot assume that the message `x` is ever received just because it is sent at some point by some honest participant, or conversely, that `x` was sent by an honest participant, just because `x` was received. If there is only one channel, the channel identifier `'c'` may be omitted. +- `K(x)` says that the attacker knows the value `x`. +- `T` and `F` denote the boolean constants `true` and `false`. They are + sometimes useful when formulating security properties. (See below for an example.) + +## Protocol properties and lemmas + +Protocol properties are expressed as _lemmas_ in Tamarin. (In mathematics, a lemma is an intermediate step or proposition, typically used as a stepping stone in the proof of some other theorem.) Lemmas express properties of protocol execution traces, and by default, they contain an implicit quantification over all execution traces. For example, consider the following lemma, which informally says that the only way that the attacker could learn the private key of an initialized device, is if that private key is leaked to the attacker. + +```js +lemma device_key_confidentiality: + " + All private_key #i #j. ( + DeviceInitialized(to_public_key(private_key)) @ #i & K(private_key) @ #j + ==> + Ex #k. (k < j & DeviceKeyLeaked(to_public_key(private_key)) @ #k) + ) + " +``` + +There is a lot going on here. Following the `lemma` keyword is a name which identifies the lemma. This should be something expressive, describing the intended meaning of the statement. The `All` and `Ex` keywords represent universal and existential quantification. The operators `&` and `==>` represent conjunction and implication. (Disjunction is written `|`, and negation is written as `not(...)`). The variables `#i`, `#j`, and `#k` all range over temporal values. Undecorated variables like `private_key` range over messages. + +The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are special facts called _action facts_ in the rewrite rule-version of Tamarin, or _events_ in the pi calculus-version. `DeviceInitialized(...) @ #i` means that the fact occurs (that is, is added to the execution trace) at time `#i`. Facts in lemmas always have to be qualified with a temporal variable in this way. + +We note that this lemma implicitly contains a universal quantification over all execution traces of the analyzed protocol. It is really saying that "_for any execution of the protocol_, the private key will remain confidential as long as it is not leaked to the attacker." It is possible to make this explicit by adding the keyword `all-traces` before the opening quotation mark. It is also possible to write lemmas that use existential quantification over execution traces using the keyword `exists-trace`. This is useful for proving correctness properties, saying that there exists an execution trace where the protocol completes successfully. + +```js +lemma protocol_correctness: + exists-trace + " + Ex public_key #i #j. ( + DeviceInitialized(public_key) @ i & + ServerInitialized(public_key) @ j & + All #k. (DeviceKeyLeaked(public_key) @ k ==> F) + ) + " +``` + +This lemma informally says that there is an execution trace where at least one device is initialized, registers its public key with the server, without leaking the private key to the attacker. Note that the final statement, that the device key is not leaked, is formulated in a somewhat roundabout fashion. The reason for this is that lemmas must be expressed in the _guarded fragment_ of first-order temporal logic. In practice, this means that all existentially quantified formulas must be on the form `Ex x. (A(x) @ #i & B(x) @ #j)` and all universally quantified formulas must be on the form `All x. (A(x) @ #i ==> B(x) @ #j)`. (Since `F` represents `false`, the guarded formula `DeviceKeyLeaked(...) @k ==> F` is equivalent to `not(DeviceKeyLeaked(...) @ k)`.) + +## Restricting the execution trace + +## Rewrite rules or process calculus? diff --git a/content/docs/formal-verification/tamarin/10-rule-based-models.md b/content/docs/formal-verification/tamarin/10-rule-based-models.md new file mode 100644 index 0000000..b4a9c0a --- /dev/null +++ b/content/docs/formal-verification/tamarin/10-rule-based-models.md @@ -0,0 +1,52 @@ +--- +title: "Rule-based models" +slug: rule-based-models +summary: "This section describes how to define a Tamarin model using multiset rewrite rules." +weight: 10 +--- + +# Rule-based models + +A Tamarin model can be defined either using multiset rewrite rules, or as +processes using Sapic+. This section will go through the rewrite rule-based +approach. + +## Facts + +To model a protocol using rewrite rules we view the global state of the protocol +as a multiset of _facts_. In Tamarin, a fact is a statement like "Alice has +registered the public key _Y_ with the server", or "The server successfully +decrypted the message _m_ from Bob". Rewrite rules consume facts and produce +new facts. In this way, they define a transition system which describes how the +global state is updated during the execution of the protocol. Since rules may +consume facts, the global state is modeled as a multiset which may contain +multiple instances of the same fact. + +There are a few facts with predefined meanings that are useful to know about +before we start defining our own rewrite rules: + +- `Fr(x)`: This says that `x` is a _fresh_ value. This means that `x` is random + and cannot be inferred by the attacker. It also means that the value of `x` + is unique across different runs of the protocol. Fresh values are usually used + to model randomly generated keys, nonces, and random IVs. For convenience, + Tamarin has a built-in rule for creating fresh values. This means that any + rule that takes one or more fresh values as input can be executed at any time + since new fresh values can always be created. +- `Out('c', x)`: This says that `x` is sent on the public channel with name + `'c'`. Since we're assuming a Dolev-Yao adversary who is in complete control of + the network, this also means that `x` becomes available to the attacker. If there + is only one channel, the channel identifier `'c'` may be omitted. +- `In('c', x)`: This says that `x` is received on the public channel `'c'`. + Since the attacker controls the network she may drop or alter messages at + will. This means that we cannot assume that the message `x` is received, just + because it is sent by some protocol participant, or conversely, that `x` was sent by some participant, just because the fact `In('c', x)` is true. If there is only one channel, the channel identifier `'c'` may be omitted. +- `K(x)`: This means that the attacker has learned `x`. + +## Rewrite rules + +Rewrite rules are defined using the following syntax: + +```ml +rule Rule_name: + [ input facts ] --[ action facts ]-> [ output facts ] +``` diff --git a/content/docs/formal-verification/tamarin/_index.md b/content/docs/formal-verification/tamarin/_index.md new file mode 100644 index 0000000..fc99321 --- /dev/null +++ b/content/docs/formal-verification/tamarin/_index.md @@ -0,0 +1,35 @@ +--- +title: "Tamarin" +weight: 2 +summary: "Tamarin a a symbolic verification tool that can be used to formally +model security protocols. Protocols are described using either multiset rewrite +rules or a variant of the applied pi-calculus. Security properties are modeled using first-order temporal logic and can be proved either automatically or interactively." +bookCollapseSection: true +--- + +# Tamarin + +Tamarin is a formal verification tool that can be used to model the security +properties of a protocol in the symbolic model. Protocols are modelled either +using multiset term rewrite rules, or using a version of the applied +pi-calculus. Security properties are specified using a fragment of first-order +temporal logic, and can be proved either automatically or interactively using a +web-based interface. + +## Benefits of using Tamarin + +- Built-in support for common cryptographic construction such as symmetric and + asymmetric encryption, hashes, signatures, and Diffie-Hellman based constructions. +- Easy to define new primitives and corresponding equations. +- Security properties are specified using an expressive fragment of temporal + first-order logic. +- Security properties can be proved either automatically or interactively. + +## Ideal use case + +- If you need to define custom primitives or prove more complex security properties. + not supported by simpler tools like [Verifpal](https://verifpal.com). +- If you need to model concurrent, non-deterministic processes, which is not supported by [Verifpal](https://verifpal.com). +- If the protocol you are modeling uses multiplicative inverses to cancel out + exponents in Diffie-Hellman based sub-protocols, or if you want to consider + attacks based on multiplicative behavior against your protocol. From da3873356f1b7543820555b36135f0404b542a78 Mon Sep 17 00:00:00 2001 From: Fredrik Dahlgren Date: Thu, 8 Feb 2024 11:19:25 +0100 Subject: [PATCH 2/6] Added PR preview workflow --- .github/workflows/preview.yml | 55 +++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) create mode 100644 .github/workflows/preview.yml diff --git a/.github/workflows/preview.yml b/.github/workflows/preview.yml new file mode 100644 index 0000000..f3c0d10 --- /dev/null +++ b/.github/workflows/preview.yml @@ -0,0 +1,55 @@ +# .github/workflows/preview.yml +name: Deploy PR previews + +on: + pull_request: + types: + - opened + - reopened + - synchronize + - closed + +concurrency: preview-${{ github.ref }} + +permissions: + pull-requests: write + +# Default to bash +defaults: + run: + shell: bash + +jobs: + # Build job + build-deploy: + runs-on: ubuntu-latest + env: + HUGO_VERSION: 0.122.0 + steps: + - name: Install Hugo CLI + run: | + wget -O ${{ runner.temp }}/hugo.deb https://github.com/gohugoio/hugo/releases/download/v${HUGO_VERSION}/hugo_extended_${HUGO_VERSION}_linux-amd64.deb \ + && sudo dpkg -i ${{ runner.temp }}/hugo.deb + - name: Install Dart Sass Embedded + run: sudo snap install dart-sass-embedded + - name: Checkout + uses: actions/checkout@v3 + with: + submodules: recursive + - name: Install Node.js dependencies + run: "[[ -f package-lock.json || -f npm-shrinkwrap.json ]] && npm ci || true" + - name: Build with Hugo + env: + # For maximum backward compatibility with Hugo modules + HUGO_ENVIRONMENT: production + HUGO_ENV: production + run: | + hugo \ + --minify \ + --baseURL "https://trailofbits.github.io/testing-handbook-preview/pr-preview/pr-${{ github.event.number }}/" + - name: Deploy preview + uses: rossjrw/pr-preview-action@v1 + with: + source-dir: ./public/ + token: ${{ secrets.TESTING_HANDBOOK_PREVIEW_REPO }} + deploy-repository: trailofbits/testing-handbook-preview From 31c626ccb06b1e52d3dbdf23c0c5f6f93c5e4614 Mon Sep 17 00:00:00 2001 From: Fredrik Dahlgren Date: Thu, 8 Feb 2024 14:40:41 +0100 Subject: [PATCH 3/6] Moved markdownlint config to ensure it is found by VSCode --- .github/workflows/markdown.yml | 2 +- .github/workflows/.markdownlint.jsonc => .markdownlint.jsonc | 0 .../tamarin/{00-installation.md => 00-basic-concepts.md} | 0 3 files changed, 1 insertion(+), 1 deletion(-) rename .github/workflows/.markdownlint.jsonc => .markdownlint.jsonc (100%) rename content/docs/formal-verification/tamarin/{00-installation.md => 00-basic-concepts.md} (100%) diff --git a/.github/workflows/markdown.yml b/.github/workflows/markdown.yml index d6529a3..3827eea 100644 --- a/.github/workflows/markdown.yml +++ b/.github/workflows/markdown.yml @@ -25,7 +25,7 @@ jobs: with: command: config globs: | - .github/workflows/.markdownlint.jsonc + .markdownlint.jsonc **/*.md # Spellcheck Markdown files using `retext` and `remark` # Uses: a custom dictionary file diff --git a/.github/workflows/.markdownlint.jsonc b/.markdownlint.jsonc similarity index 100% rename from .github/workflows/.markdownlint.jsonc rename to .markdownlint.jsonc diff --git a/content/docs/formal-verification/tamarin/00-installation.md b/content/docs/formal-verification/tamarin/00-basic-concepts.md similarity index 100% rename from content/docs/formal-verification/tamarin/00-installation.md rename to content/docs/formal-verification/tamarin/00-basic-concepts.md From 78dca4af88da329a6436b33262e1db8de62638f9 Mon Sep 17 00:00:00 2001 From: Fredrik Dahlgren Date: Thu, 8 Feb 2024 14:41:33 +0100 Subject: [PATCH 4/6] Fixed spelling and formatting issues --- .github/workflows/dictionary.txt | 35 +++++ content/docs/formal-verification/_index.md | 105 +++++++++++--- .../tamarin/00-basic-concepts.md | 135 ++++++++++++++---- .../tamarin/10-rule-based-models.md | 42 +----- .../formal-verification/tamarin/_index.md | 19 ++- 5 files changed, 238 insertions(+), 98 deletions(-) diff --git a/.github/workflows/dictionary.txt b/.github/workflows/dictionary.txt index 8e37c08..2cc5d5b 100644 --- a/.github/workflows/dictionary.txt +++ b/.github/workflows/dictionary.txt @@ -1,20 +1,25 @@ .semgrepignore +0x30 0xdea 20-ci AFL APIs AppSec +arity AST ATL +auditability autobuild autocomplete autofix autofixes Autofix Autofixes +bitstrings bookCollapseSection bookFlatSection cargo-fuzz +ciphertext CLI CMake codebase @@ -24,6 +29,9 @@ CodeQL codeql-queries config Config +cryptographic +CryptoVerif +CryptoVerif's CSV CTFs Ctrl @@ -33,8 +41,11 @@ dataflow DeepSemgrep DevOps derefs +Disjunction +Diffie-Hellman dgryski Dockerfiles +Dolev-Yao elttam eval Exiv2 @@ -52,6 +63,8 @@ goroutine GuardDog Homebrew hugo +IETF +indistinguishability interprocedural intraprocedural ioutil-readdir-deprecated @@ -63,6 +76,7 @@ lifecycle LSP lxml-in-pandas macOS +MacOS memcpy Memcpy metavariable @@ -80,17 +94,29 @@ mindedsecurity misconfiguration multiline multivalued +multiset natively Neovim +NIST +NixOS +Nixpkgs NodeJsScan +nonce +nonces Nullcon OCaml OWASP +Pacman PCRE-compatible +plaintext Pohekar postMessage pre-built pre-compiled +propositionally +prover +ProVerif +ProVerif's README QL repo @@ -101,6 +127,7 @@ Rulesets sanitization SARIF SARIF-file +Sapic SAST SDLC semgrep @@ -115,13 +142,21 @@ Sheeraz Shreya src subdirectories +subsorts +Tamarin +Tamarin's trailofbits Triaging Trello +Undecorated unhandled unsanitized untracked untrusted +verifier +verifiers +Verifpal +Verifpal's VSCode vuln XSS diff --git a/content/docs/formal-verification/_index.md b/content/docs/formal-verification/_index.md index 19de28c..b08130e 100644 --- a/content/docs/formal-verification/_index.md +++ b/content/docs/formal-verification/_index.md @@ -19,47 +19,104 @@ verifiers. For each tool, we go though: ## Background -Formal verification tools allow us to formally prove security properties of cryptographic protocols. Alternatively, they can often provide counterexamples (and sometimes real attacks) showing that a particular protocol does not guarantee the security properties that we expect it to. To formally verify a protocol we need to describe the protocol we are reviewing in a language understood by the tool, and we also need to describe the security properties that we would like the protocol to have. The tool will then automatically search for a formal proof that the properties we have specified are upheld by the protocol, and if it terminates, it will output either a proof showing that the properties hold, or a counterexample showing how some property fails to hold. - -For this reason, formal verification tools provide great value for anyone designing a new cryptographic protocol. They allow developers to verify that a new design meets the expected security guarantees. They allow us to experiment with the design and compare the security properties and trade-offs between different design choices. Formal verification tools also provide great value for anyone who is modifying or extending an already existing protocol. If the original protocol already has a formal model, modeling an extension to the protocol is typically cheap, allowing the developer to prove that the protocol extension is secure without having to model the entire protocol from scratch. +Formal verification tools allow us to formally prove security properties of cryptographic protocols. Alternatively, they +can often provide counterexamples (and sometimes real attacks) showing that a particular protocol does not guarantee the +security properties that we expect it to. To formally verify a protocol we need to describe the protocol we are +reviewing in a language understood by the tool, and we also need to describe the security properties that we would like +the protocol to have. The tool will then automatically search for a formal proof that the properties we have specified +are upheld by the protocol, and if it terminates, it will output either a proof showing that the properties hold, or a +counterexample showing how some property fails to hold. + +For this reason, formal verification tools provide great value for anyone designing a new cryptographic protocol. They +allow developers to verify that a new design meets the expected security guarantees. They allow us to experiment with +the design and compare the security properties and trade-offs between different design choices. Formal verification +tools also provide great value for anyone who is modifying or extending an already existing protocol. If the original +protocol already has a formal model, modeling an extension to the protocol is typically cheap, allowing the developer to +prove that the protocol extension is secure without having to model the entire protocol from scratch. {{< hint danger >}} -### What is a cryptograpic protocol? +### What is a cryptographic protocol? -You may think that only cryptographers design cryptographic protocols, and that you don't need complex tools to understand the security properties of your relatively simple use case. However, our experience shows that it makes more sense to take the following very broad view of the term _cryptographic protocol_, which also has implications for formal verification. +You may think that only cryptographers design cryptographic protocols, and that you don't need complex tools to +understand the security properties of your relatively simple use case. However, our experience shows that it makes more +sense to take the following very broad view of the term _cryptographic protocol_, which also has implications for formal +verification. -**Anyone who is composing different cryptographic primitives (like encryption, signatures, and hashes) in a way that has not been previously specified by a standards document like an IETF RFC or NIST standard, and has not been analyzed in a public academic paper, is designing a new cryptographic protocol.** +**Anyone who is composing different cryptographic primitives (like encryption, signatures, and hashes) in a way that has +not been previously specified by a standards document like an IETF RFC or NIST standard, and has not been analyzed in a +public academic paper, is designing a new cryptographic protocol.** -New cryptographic protocols need to be proven secure before they are deployed. Formal verification is one way of achieving this which ensures both correctness and auditability. At Trail of Bits, we recommend that _all_ new cryptographic protocols should be formally verified. +New cryptographic protocols need to be proven secure before they are deployed. Formal verification is one way of +achieving this which ensures both correctness and auditability. At Trail of Bits, we recommend that _all_ new +cryptographic protocols should be formally verified. {{< /hint >}} ## The symbolic model or the computational model? -Tools used to formally verify cryptographic protocols typically come in one of two different flavors. Tools like [ProVerif](https://bblanche.gitlabpages.inria.fr/proverif/), [Tamarin](https://tamarin-prover.com/), and [Verifpal](https://verifpal.com/) analyze protocols in the _symbolic model_. This means that cryptographic primitives are modeled as black boxes satisfying certain given equations. For example, symmetric encryption could be modeled using two functions `encrypt` and `decrypt` satisfying the following equation: +Tools used to formally verify cryptographic protocols typically come in one of two different flavors. Tools like +[ProVerif](https://bblanche.gitlabpages.inria.fr/proverif/), [Tamarin](https://tamarin-prover.com/), and +[Verifpal](https://verifpal.com/) analyze protocols in the _symbolic model_. This means that cryptographic primitives +are modeled as black boxes satisfying certain given equations. For example, symmetric encryption could be modeled using +two functions `encrypt` and `decrypt` satisfying the following equation: ```js decrypt(key, encrypt(key, data)) = data ``` -This allows the tool to replace any occurrence of the term `decrypt(key, encrypt(key, data))` with `data`, but it says nothing about the strength of the encryption algorithm. In a sense, it treats the encryption as perfect, since the only way to obtain the plaintext from the ciphertext is with knowledge of the correct key. Modeling cryptographic primitives in this way implies a number of trade-offs that may not be immediately obvious. On the one hand, it provides clean abstractions that allows us to specify the high-level building blocks of cryptographic protocols without having to worry too much about how each cryptographic primitive is implemented. This also allows us to instantiate the protocol with any primitive that satisfies the given equations and still be sure that the security proofs hold. On the other hand, it means that we cannot reason about some things like brute-force or padding-oracle attacks in a straight-forward manner. (If we want to do this, we need to model our primitives in a way that allow us to express these types of attacks within the model. Depending on the security properties we are interested in, this may either add complexity, which often has adverse effects on proving time, or may sometimes be impossible within the symbolic model.) - -Symbolic verification tools typically model the network as untrusted, allowing the attacker to intercept, delay, modify, and replay messages between participants at will. This is known as the _Dolev-Yao model_, and was first described in the paper [_On the Security of Public-Key Protocols_](https://www.cs.huji.ac.il/~dolev/pubs/dolev-yao-ieee-01056650.pdf). Individual tools often provide abstractions (like the `reliable-channel` builtin in Tamarin or the `passive` keyword in Verifpal) that can be used to restrict the capabilities of the attacker in different ways. - -Tools like [CryptoVerif](https://bblanche.gitlabpages.inria.fr/CryptoVerif/) analyze protocols in the _computational model_. Here, individual messages are modeled as bitstrings, cryptographic primitives are modeled as functions on bitstrings, and the adversary is modeled as a arbitrary probabilistic polynomial-time Turing Machine. The probability of the attacker successfully breaking the security of a given primitive has to be provided up-front by the user. The tool can then automatically prove indistinguishability between sequences of games (up to some probability), where the first game captures the analyzed protocol, and the final game is one where the required security property is obvious. The output is given as a bound on the probability of a successful attack as a function of the security parameters of the individual primitives used, and the number of protocol executions. A key benefit of the computational model is that this mimics how cryptographers usually prove security of cryptographic protocols. This makes it easier to carry over constructions from traditional pen-and-paper proofs to automated proofs. - -The computational model is clearly more expressive than the symbolic model. However, it is also pre-supposes a deep understanding of provable security and game-based proofs, which is not required when working in the symbolic model. - -Ultimately, which tool to use depends as much on your background as on the protocol you are trying to analyze. If you don't have a background in cryptography or formal verification we recommend that you start out with a simple symbolic verification tool like Verifpal. If you struggle to model your protocol in Verifpal, or need to express more complex security properties that are not supported by by the tool, we suggest switching to a more expressive and mature symbolic prover like ProVerif or Tamarin. Finally, if you want to bound attack probabilities or translate a game-based proof from a paper, you need to work in the computational model with a tool like CryptoVerif. +This allows the tool to replace any occurrence of the term `decrypt(key, encrypt(key, data))` with `data`, but it says +nothing about the strength of the encryption algorithm. In a sense, it treats the encryption as perfect, since the only +way to obtain the plaintext from the ciphertext is with knowledge of the correct key. Modeling cryptographic primitives +in this way implies a number of trade-offs that may not be immediately obvious. On the one hand, it provides clean +abstractions that allows us to specify the high-level building blocks of cryptographic protocols without having to worry +too much about how each cryptographic primitive is implemented. This also allows us to instantiate the protocol with any +primitive that satisfies the given equations and still be sure that the security proofs hold. On the other hand, it +means that we cannot reason about some things like brute-force or padding-oracle attacks in a straight-forward manner. +(If we want to do this, we need to model our primitives in a way that allow us to express these types of attacks within +the model. Depending on the security properties we are interested in, this may either add complexity, which often has +adverse effects on proving time, or may sometimes be impossible within the symbolic model.) + +Symbolic verification tools typically model the network as untrusted, allowing the attacker to intercept, delay, modify, +and replay messages between participants at will. This is known as the _Dolev-Yao model_, and was first described in the +paper [_On the Security of Public-Key Protocols_](https://www.cs.huji.ac.il/~dolev/pubs/dolev-yao-ieee-01056650.pdf). +Individual tools often provide abstractions (like the `reliable-channel` builtin in Tamarin or the `passive` keyword in +Verifpal) that can be used to restrict the capabilities of the attacker in different ways. + +Tools like [CryptoVerif](https://bblanche.gitlabpages.inria.fr/CryptoVerif/) analyze protocols in the _computational +model_. Here, individual messages are modeled as bitstrings, cryptographic primitives are modeled as functions on +bitstrings, and the adversary is modeled as an arbitrary probabilistic polynomial-time Turing Machine. The probability +of the attacker successfully breaking the security of a given primitive has to be provided up-front by the user. The +tool can then automatically prove indistinguishability between sequences of games (up to some probability), where the +first game captures the analyzed protocol, and the final game is one where the required security property is obvious. +The output is given as a bound on the probability of a successful attack as a function of the security parameters of the +individual primitives used, and the number of protocol executions. A key benefit of the computational model is that this +mimics how cryptographers usually prove security of cryptographic protocols. This makes it easier to carry over +constructions from traditional pen-and-paper proofs to automated proofs. + +The computational model is clearly more expressive than the symbolic model. However, it is also presupposes a deep +understanding of provable security and game-based proofs, which is not required when working in the symbolic model. + +Ultimately, which tool to use depends as much on your background as on the protocol you are trying to analyze. If you +don't have a background in cryptography or formal verification we recommend that you start out with a simple symbolic +verification tool like Verifpal. If you struggle to model your protocol in Verifpal, or need to express more complex +security properties that are not supported by the tool, we suggest switching to a more expressive and mature symbolic +prover like ProVerif or Tamarin. Finally, if you want to bound attack probabilities or translate a game-based proof from +a paper, you need to work in the computational model with a tool like CryptoVerif. {{< hint info >}} ### Where do I start? -[Verifpal](https://verifpal.com) is a great starting point if you have no previous exposure to formal verification. It has a small set of built-in primitives that cover a range of use-cases, and the syntax mimics how developers typically think about cryptographic protocols. Verifpal's intuitive syntax also makes the resulting models easier to parse and maintain for less experienced users. +[Verifpal](https://verifpal.com) is a great starting point if you have no previous exposure to formal verification. It +has a small set of built-in primitives that cover a range of use-cases, and the syntax mimics how developers typically +think about cryptographic protocols. Verifpal's intuitive syntax also makes the resulting models easier to parse and +maintain for less experienced users. -This is an important point. If you develop a formal model of a protocol that is too complex for the developers of the protocol to maintain, then the proof becomes a snapshot in time with limited usefulness. If developers understand the model and can update and extend it as the protocol develops, it can be built on to provide assurance for future versions of the protocol as well. +This is an important point. If you develop a formal model of a protocol that is too complex for the developers of the +protocol to maintain, then the proof becomes a snapshot in time with limited usefulness. If developers understand the +model and can update and extend it as the protocol develops, it can be built on to provide assurance for future versions +of the protocol as well. {{< /hint >}} @@ -67,8 +124,14 @@ This is an important point. If you develop a formal model of a protocol that is ### How do I start? -Since most cryptographic protocols have a large number of valid states, formal verification tools often struggle because of state-space explosion. For this reason, it is generally a good idea to try to divide your protocol into smaller components, or sub-protocols, like _registering a new device_ or _sending and receiving a single message_, and try to prove that these individual components provide the expected security guarantees. +Since most cryptographic protocols have a large number of valid states, formal verification tools often struggle because +of state-space explosion. For this reason, it is generally a good idea to try to divide your protocol into smaller +components, or sub-protocols, like _registering a new device_ or _sending and receiving a single message_, and try to +prove that these individual components provide the expected security guarantees. -This approach clearly limits what the attacker can do since each sub-protocol is analyzed in a vacuum. However, it makes each model more manageable and helps avoid the issue of state-space explosion. At Trail of Bits we often use this method to quickly model protocol components on cryptographic design reviews. It is both a useful tool to verify our understanding of the protocol, and to identify potential weaknesses in the design. +This approach clearly limits what the attacker can do since each sub-protocol is analyzed in a vacuum. However, it makes +each model more manageable and helps avoid the issue of state-space explosion. At Trail of Bits we often use this method +to quickly model protocol components on cryptographic design reviews. It is both a useful tool to verify our +understanding of the protocol, and to identify potential weaknesses in the design. {{< /hint >}} diff --git a/content/docs/formal-verification/tamarin/00-basic-concepts.md b/content/docs/formal-verification/tamarin/00-basic-concepts.md index 7c3cbd0..9069807 100644 --- a/content/docs/formal-verification/tamarin/00-basic-concepts.md +++ b/content/docs/formal-verification/tamarin/00-basic-concepts.md @@ -1,5 +1,5 @@ --- -title: "Installation and basic concepts" +title: "Basic concepts" slug: basic-concepts summary: "This section explains the basic concepts of Tamarin." weight: 1 @@ -9,12 +9,13 @@ weight: 1 ## Installing Tamarin -We recommend installing Tamarin using a package manager to ensure that the correct dependencies are installed together with the tool. Tamarin is currently packaged for Homebrew, Arch Linux, Nixpkgs, and NixOS. +We recommend installing Tamarin using a package manager to ensure that the correct dependencies are installed together +with the tool. Tamarin is currently packaged for Homebrew, Arch Linux, Nixpkgs, and NixOS. {{< tabs "installing-tamarin" >}} {{< tab "Using Homebrew" >}} -On MacOS and Linux, you can install Tamarin using Howebrew as follows: +On MacOS and Linux, you can install Tamarin using Homebrew as follows: ```sh brew install tamarin-prover/tap/tamarin-prover @@ -49,26 +50,53 @@ If you are running NixOS you can install Tamarin by adding `tamarin-prover` to y ## Tamarin basic concepts -Tamarin models can be specified either using [multiset](https://en.wikipedia.org/wiki/Multiset) rewrite rules, or using a process calculus based on a variant of the applied pi-calculus. Protocol properties are specified using a fragment of temporal logic. This means that we can express properties of terms at given points in time. The global protocol state is given by a multiset of facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a statement about a set of terms. For example, "`Alice` has registered the public key `pk` with the server", or "the attacker has learned the private key `sk`". +Tamarin models can be specified either using [multiset](https://en.wikipedia.org/wiki/Multiset) rewrite rules, or using +a process calculus based on a variant of the applied pi-calculus. Protocol properties are specified using a fragment of +temporal logic. This means that we can express properties of terms at given points in time. The global protocol state is +given by a multiset of facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a +statement about a set of terms. For example, "`Alice` has registered the public key `pk` with the server", or "the +attacker has learned the private key `sk`". ### Variables and sorts -Variables in Tamarin range over five different sorts (or types). There is a top sort for messages that can be sent over the network, with subsorts for public values, fresh (or private) values, and natural numbers. There is also an sort for temporal values (essentially, points in time). To make the sort clear from context, each sort is prefixed by a single character. +Variables in Tamarin range over five different sorts (or types). There is a top sort for messages that can be sent over +the network, with subsorts for public values, fresh (or private) values, and natural numbers. There is also a sort for +temporal values (essentially, points in time). To make the sort clear from context, each sort is prefixed by a single +character. - `~x` denotes a fresh value. _Fresh values_ are random values like secret keys and nonces and are not known to the attacker. - `$x` denotes a public value. _Public values_ are publicly known. In - particular, they are known to the attacker. Constant public values are written as strings (without the `$`). For example, the constant 0x30 is written `'0x30'`, and a constant generator _g_ of a cyclic group is written `'g'`. (For example, the public key corresponding to the private key `~x` in a Diffie-Hellman based protocol is written `'g'^~x`.) + particular, they are known to the attacker. Constant public values are + written as strings (without the `$`). For example, the constant 0x30 is + written `'0x30'`, and a constant generator _g_ of a cyclic group is written + `'g'`. (For example, the public key corresponding to the private key `~x` in + a Diffie-Hellman based protocol is written `'g'^~x`.) - `%n` denotes a natural number. _Natural numbers_ start at `%1` in Tamarin and - require the `natural-numbers` builtin to use. They are used to represent (small) numbers like counters, and are assumed to be known to the attacker. + require the `natural-numbers` builtin to use. They are used to represent + (small) numbers like counters, and are assumed to be known to the attacker. - `#i` denotes a temporal value. _Temporal values_ are not used to describe the - protocol directly, but are used to express protocol properties like "If the attacker knows Alice's key at time `#j`, then Alice's key must have been leaked at some time `#i` before `#j`." + protocol directly, but are used to express protocol properties like "If the + attacker knows Alice's key at time `#j`, then Alice's key must have been + leaked at some time `#i` before `#j`." -Terms may be composed using functions. These are either user defined, or defined by importing a pre-defined theory (known as a `builtin`). +Terms may be composed using functions. These are either user defined, or defined by importing a predefined theory (known as a `builtin`). + +{{< hint info >}} + + +**Use descriptive variable and function names** + +Avoid using one-letter names for user-defined variables and functions. If you are modeling an existing protocol, reuse +names from the specification and include clear references to the section of the specification where they are defined. + +{{< /hint >}} ### Functions and equations -User-defined functions are declared using the `functions` keyword. New functions can be defined in two different ways. It is possible to define new functions by simply specifying the function name and arity (how many arguments it takes) as follows: +User-defined functions are declared using the `functions` keyword. New functions can be defined in two different ways. +It is possible to define new functions by simply specifying the function name and arity (how many arguments it takes) as +follows: ```js functions: @@ -86,34 +114,56 @@ functions: ... ``` -Here, `SecretKey` and `Bytes` are arbitrary type names specified by the user. There are a number of benefits to using explicit types in this way when defining your own functions. Apart from making definitions more readable, all equations are type checked by Tamarin before the prover runs. This ensures that any issues due to inconsistent typing of arguments is caught early in the modeling process. For example, using the term `encrypt(key, message)` in one location would cause Tamarin to infer that `key` has type `SecretKey` and `message` has type `Bytes`. If we later introduced the term `decrypt(message, key)` Tamarin would complain, saying that it expected the first argument to `decrypt` to have the type `SecretKey`, but that `message` must be of type `Bytes`. Note that function types are ignored after the type-checking phase is complete. In particular, they do not affect how the attacker may use the defined function. +Here, `SecretKey` and `Bytes` are arbitrary type names specified by the user. There are a number of benefits to using +explicit types in this way when defining your own functions. Apart from making definitions more readable, all equations +are type checked by Tamarin before the prover runs. This ensures that any issues due to inconsistent typing of arguments +is caught early in the modeling process. For example, using the term `encrypt(key, message)` in one location would +cause Tamarin to infer that `key` has type `SecretKey` and `message` has type `Bytes`. If we later introduced the term +`decrypt(message, key)` Tamarin would complain, saying that it expected the first argument to `decrypt` to have the type +`SecretKey`, but that `message` must be of type `Bytes`. Note that function types are ignored after the type-checking +phase is complete. In particular, they do not affect how the attacker may use the defined function. {{< hint info >}} -#### Include function signatures with sensible type names when declaring new functions + +**Include function signatures with descriptive type names when declaring new functions** -We recommend always including the full function signature when introducing new functions. This makes function definitions easier to parse, allows you to take full advantage of Tamarin's build-in typing checking, and avoid potential typing inconsistencies. +We recommend always including the full function signature when introducing new functions. This makes function +definitions easier to parse, allows you to take full advantage of Tamarin's build-in typing checking, and avoid +potential typing inconsistencies. {{< /hint >}} ### Facts -Facts express properties about the current state of the protocol. New facts can be introduced by the user, but there are a few pre-defined facts that are useful to know about. +Facts express properties about the current state of the protocol. New facts can be introduced by the user, but there are +a few predefined facts that are useful to know about. - `Fr(x)` says that `x` is a fresh value. That is, a random value which cannot be inferred by the attacker. -- `Out('c', x)` says that `x` is sent on the public channel with name - `'c'`. Since we're assuming a Dolev-Yao adversary who is in complete control of the network, this also means that `x` becomes available to the attacker as soon as it is sent over the network. If there is only one channel, the channel identifier `'c'` may be omitted. -- `In('c', x)` says that `x` is received on the public channel `'c'`. - Since the attacker controls the network she may drop or alter messages at - will. This means that we cannot assume that the message `x` is ever received just because it is sent at some point by some honest participant, or conversely, that `x` was sent by an honest participant, just because `x` was received. If there is only one channel, the channel identifier `'c'` may be omitted. +- `Out('c', x)` says that `x` is sent on the public channel with name `'c'`. + Since we're assuming a Dolev-Yao adversary who is in complete control of the + network, this also means that `x` becomes available to the attacker as soon as + it is sent over the network. If there is only one channel, the channel + identifier `'c'` may be omitted. +- `In('c', x)` says that `x` is received on the public channel `'c'`. Since the + attacker controls the network she may drop or alter messages at will. This + means that we cannot assume that the message `x` is ever received just + because it is sent at some point by some honest participant, or conversely, + that `x` was sent by an honest participant, just because `x` was received. If + there is only one channel, the channel identifier `'c'` may be omitted. - `K(x)` says that the attacker knows the value `x`. - `T` and `F` denote the boolean constants `true` and `false`. They are - sometimes useful when formulating security properties. (See below for an example.) + sometimes useful when formulating security properties. (See below for an + example.) ## Protocol properties and lemmas -Protocol properties are expressed as _lemmas_ in Tamarin. (In mathematics, a lemma is an intermediate step or proposition, typically used as a stepping stone in the proof of some other theorem.) Lemmas express properties of protocol execution traces, and by default, they contain an implicit quantification over all execution traces. For example, consider the following lemma, which informally says that the only way that the attacker could learn the private key of an initialized device, is if that private key is leaked to the attacker. +Protocol properties are expressed as _lemmas_ in Tamarin. (In mathematics, a lemma is an intermediate step or +proposition, typically used as a stepping stone in the proof of some other theorem.) Lemmas express properties of +protocol execution traces, and by default, they contain an implicit quantification over all execution traces. For +example, consider the following lemma, which informally says that the only way that the attacker could learn the private +key of an initialized device, is if that private key is leaked to the attacker. ```js lemma device_key_confidentiality: @@ -126,25 +176,54 @@ lemma device_key_confidentiality: " ``` -There is a lot going on here. Following the `lemma` keyword is a name which identifies the lemma. This should be something expressive, describing the intended meaning of the statement. The `All` and `Ex` keywords represent universal and existential quantification. The operators `&` and `==>` represent conjunction and implication. (Disjunction is written `|`, and negation is written as `not(...)`). The variables `#i`, `#j`, and `#k` all range over temporal values. Undecorated variables like `private_key` range over messages. +There is a lot going on here. Following the `lemma` keyword is a name which identifies the lemma. This should be +something expressive, describing the intended meaning of the statement. The `All` and `Ex` keywords represent universal +and existential quantification. The operators `&` and `==>` represent conjunction and implication. (Disjunction is +written `|`, and negation is written as `not(...)`). The variables `#i`, `#j`, and `#k` all range over temporal values. +Undecorated variables like `private_key` range over messages. -The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are special facts called _action facts_ in the rewrite rule-version of Tamarin, or _events_ in the pi calculus-version. `DeviceInitialized(...) @ #i` means that the fact occurs (that is, is added to the execution trace) at time `#i`. Facts in lemmas always have to be qualified with a temporal variable in this way. +The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are special facts called _action facts_ in the +rewrite rule-version of Tamarin, or _events_ in the pi calculus-version. `DeviceInitialized(...) @ #i` means that the +fact occurs (that is, is added to the execution trace) at time `#i`. Facts in lemmas always have to be qualified with a +temporal variable in this way. -We note that this lemma implicitly contains a universal quantification over all execution traces of the analyzed protocol. It is really saying that "_for any execution of the protocol_, the private key will remain confidential as long as it is not leaked to the attacker." It is possible to make this explicit by adding the keyword `all-traces` before the opening quotation mark. It is also possible to write lemmas that use existential quantification over execution traces using the keyword `exists-trace`. This is useful for proving correctness properties, saying that there exists an execution trace where the protocol completes successfully. +We note that this lemma implicitly contains a universal quantification over all execution traces of the analyzed +protocol. It is really saying that "_for any execution of the protocol_, the private key will remain confidential as +long as it is not leaked to the attacker." It is possible to make this explicit by adding the keyword `all-traces` +before the opening quotation mark. It is also possible to write lemmas that use existential quantification over +execution traces using the keyword `exists-trace`. This is useful for proving correctness properties, saying that there +exists an execution trace where the protocol completes successfully. ```js lemma protocol_correctness: - exists-trace - " + exists-trace + " Ex public_key #i #j. ( DeviceInitialized(public_key) @ i & ServerInitialized(public_key) @ j & All #k. (DeviceKeyLeaked(public_key) @ k ==> F) ) - " + " ``` -This lemma informally says that there is an execution trace where at least one device is initialized, registers its public key with the server, without leaking the private key to the attacker. Note that the final statement, that the device key is not leaked, is formulated in a somewhat roundabout fashion. The reason for this is that lemmas must be expressed in the _guarded fragment_ of first-order temporal logic. In practice, this means that all existentially quantified formulas must be on the form `Ex x. (A(x) @ #i & B(x) @ #j)` and all universally quantified formulas must be on the form `All x. (A(x) @ #i ==> B(x) @ #j)`. (Since `F` represents `false`, the guarded formula `DeviceKeyLeaked(...) @k ==> F` is equivalent to `not(DeviceKeyLeaked(...) @ k)`.) +This lemma informally says that there is an execution trace where at least one device is initialized, registers its +public key with the server, without leaking the private key to the attacker. Note that the final statement, that the +device key is not leaked, is formulated in a somewhat roundabout manner. The reason for this is that lemmas must be +expressed in the _guarded fragment_ of first-order temporal logic. In practice, this means that all existentially +quantified formulas must be on the form `Ex x. (A(x, y, ..., z) @ #i & ...)` and all universally quantified formulas +must be on the form `All x. (A(x, y, ..., z) @ #i ==> ...)`. Since `not(P)` is propositionally equivalent to `P ==> F`, +the final statement allows us to express `All #k. not(DeviceKeyLeaked(public_key) @ #k)` as a guarded formula accepted +by Tamarin. + +{{< hint info >}} + + +**Include a lemma ensuring that the entire protocol can be executed correctly from start to finish** + +This is almost always the first lemma to prove. It may be updated as you add more components of the protocol to your +model, and serves as a form of sanity check or integration test, ensuring that the model can be executed correctly. + +{{< /hint >}} ## Restricting the execution trace diff --git a/content/docs/formal-verification/tamarin/10-rule-based-models.md b/content/docs/formal-verification/tamarin/10-rule-based-models.md index b4a9c0a..85aa532 100644 --- a/content/docs/formal-verification/tamarin/10-rule-based-models.md +++ b/content/docs/formal-verification/tamarin/10-rule-based-models.md @@ -7,46 +7,10 @@ weight: 10 # Rule-based models -A Tamarin model can be defined either using multiset rewrite rules, or as -processes using Sapic+. This section will go through the rewrite rule-based -approach. +A Tamarin model can be defined either using multiset rewrite rules, or as processes using Sapic+. This section will go +through the rewrite rule-based approach. Rewrite rules are defined using the following syntax: -## Facts - -To model a protocol using rewrite rules we view the global state of the protocol -as a multiset of _facts_. In Tamarin, a fact is a statement like "Alice has -registered the public key _Y_ with the server", or "The server successfully -decrypted the message _m_ from Bob". Rewrite rules consume facts and produce -new facts. In this way, they define a transition system which describes how the -global state is updated during the execution of the protocol. Since rules may -consume facts, the global state is modeled as a multiset which may contain -multiple instances of the same fact. - -There are a few facts with predefined meanings that are useful to know about -before we start defining our own rewrite rules: - -- `Fr(x)`: This says that `x` is a _fresh_ value. This means that `x` is random - and cannot be inferred by the attacker. It also means that the value of `x` - is unique across different runs of the protocol. Fresh values are usually used - to model randomly generated keys, nonces, and random IVs. For convenience, - Tamarin has a built-in rule for creating fresh values. This means that any - rule that takes one or more fresh values as input can be executed at any time - since new fresh values can always be created. -- `Out('c', x)`: This says that `x` is sent on the public channel with name - `'c'`. Since we're assuming a Dolev-Yao adversary who is in complete control of - the network, this also means that `x` becomes available to the attacker. If there - is only one channel, the channel identifier `'c'` may be omitted. -- `In('c', x)`: This says that `x` is received on the public channel `'c'`. - Since the attacker controls the network she may drop or alter messages at - will. This means that we cannot assume that the message `x` is received, just - because it is sent by some protocol participant, or conversely, that `x` was sent by some participant, just because the fact `In('c', x)` is true. If there is only one channel, the channel identifier `'c'` may be omitted. -- `K(x)`: This means that the attacker has learned `x`. - -## Rewrite rules - -Rewrite rules are defined using the following syntax: - -```ml +```js rule Rule_name: [ input facts ] --[ action facts ]-> [ output facts ] ``` diff --git a/content/docs/formal-verification/tamarin/_index.md b/content/docs/formal-verification/tamarin/_index.md index fc99321..ff9bb40 100644 --- a/content/docs/formal-verification/tamarin/_index.md +++ b/content/docs/formal-verification/tamarin/_index.md @@ -1,20 +1,18 @@ --- title: "Tamarin" weight: 2 -summary: "Tamarin a a symbolic verification tool that can be used to formally -model security protocols. Protocols are described using either multiset rewrite -rules or a variant of the applied pi-calculus. Security properties are modeled using first-order temporal logic and can be proved either automatically or interactively." +summary: "Tamarin a symbolic verification tool that can be used to formally model security protocols. Protocols are +described using either multiset rewrite rules or a variant of the applied pi-calculus. Security properties are modeled +using first-order temporal logic and can be proved either automatically or interactively." bookCollapseSection: true --- # Tamarin -Tamarin is a formal verification tool that can be used to model the security -properties of a protocol in the symbolic model. Protocols are modelled either -using multiset term rewrite rules, or using a version of the applied -pi-calculus. Security properties are specified using a fragment of first-order -temporal logic, and can be proved either automatically or interactively using a -web-based interface. +Tamarin is a formal verification tool that can be used to model the security properties of a protocol in the symbolic +model. Protocols are modeled either using multiset term rewrite rules, or using a version of the applied pi-calculus. +Security properties are specified using a fragment of first-order temporal logic, and can be proved either automatically +or interactively using a web-based interface. ## Benefits of using Tamarin @@ -29,7 +27,8 @@ web-based interface. - If you need to define custom primitives or prove more complex security properties. not supported by simpler tools like [Verifpal](https://verifpal.com). -- If you need to model concurrent, non-deterministic processes, which is not supported by [Verifpal](https://verifpal.com). +- If you need to model concurrent, non-deterministic processes, which is not supported by + [Verifpal](https://verifpal.com). - If the protocol you are modeling uses multiplicative inverses to cancel out exponents in Diffie-Hellman based sub-protocols, or if you want to consider attacks based on multiplicative behavior against your protocol. From bf81eb9c15289f4adb35303ebe629708fbb09fb5 Mon Sep 17 00:00:00 2001 From: Fredrik Dahlgren Date: Thu, 8 Feb 2024 17:20:57 +0100 Subject: [PATCH 5/6] Added content on builtins and equations --- .github/workflows/dictionary.txt | 1 + content/docs/formal-verification/_index.md | 6 +- .../tamarin/00-basic-concepts.md | 69 +++++++++++++++++-- .../tamarin/10-rule-based-models.md | 9 ++- 4 files changed, 73 insertions(+), 12 deletions(-) diff --git a/.github/workflows/dictionary.txt b/.github/workflows/dictionary.txt index 2cc5d5b..1c47e2f 100644 --- a/.github/workflows/dictionary.txt +++ b/.github/workflows/dictionary.txt @@ -15,6 +15,7 @@ autofix autofixes Autofix Autofixes +autogenerated bitstrings bookCollapseSection bookFlatSection diff --git a/content/docs/formal-verification/_index.md b/content/docs/formal-verification/_index.md index b08130e..5de876d 100644 --- a/content/docs/formal-verification/_index.md +++ b/content/docs/formal-verification/_index.md @@ -17,7 +17,7 @@ verifiers. For each tool, we go though: {{< section >}} -## Background +## Who is formal verification for? Formal verification tools allow us to formally prove security properties of cryptographic protocols. Alternatively, they can often provide counterexamples (and sometimes real attacks) showing that a particular protocol does not guarantee the @@ -58,8 +58,8 @@ cryptographic protocols should be formally verified. Tools used to formally verify cryptographic protocols typically come in one of two different flavors. Tools like [ProVerif](https://bblanche.gitlabpages.inria.fr/proverif/), [Tamarin](https://tamarin-prover.com/), and [Verifpal](https://verifpal.com/) analyze protocols in the _symbolic model_. This means that cryptographic primitives -are modeled as black boxes satisfying certain given equations. For example, symmetric encryption could be modeled using -two functions `encrypt` and `decrypt` satisfying the following equation: +are modeled as black boxes satisfying certain given equations. For example, a symmetric encryption scheme could be +modeled using two functions `encrypt` and `decrypt` satisfying the following equation: ```js decrypt(key, encrypt(key, data)) = data diff --git a/content/docs/formal-verification/tamarin/00-basic-concepts.md b/content/docs/formal-verification/tamarin/00-basic-concepts.md index 9069807..29161f7 100644 --- a/content/docs/formal-verification/tamarin/00-basic-concepts.md +++ b/content/docs/formal-verification/tamarin/00-basic-concepts.md @@ -51,11 +51,12 @@ If you are running NixOS you can install Tamarin by adding `tamarin-prover` to y ## Tamarin basic concepts Tamarin models can be specified either using [multiset](https://en.wikipedia.org/wiki/Multiset) rewrite rules, or using -a process calculus based on a variant of the applied pi-calculus. Protocol properties are specified using a fragment of -temporal logic. This means that we can express properties of terms at given points in time. The global protocol state is -given by a multiset of facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a -statement about a set of terms. For example, "`Alice` has registered the public key `pk` with the server", or "the -attacker has learned the private key `sk`". +a process calculus based on a variant of the applied pi-calculus. The global protocol state is given by a multiset of +facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a statement about a set of +terms. For example, _`Alice` has registered the public key `pk` with the server_, or _The attacker has learned the +private key `sk`_. Protocol properties are expressed as properties of the set of possible executions of the protocol, +and are specified using a decidable fragment of temporal logic. (This means that we can express properties of terms at +different points in time.) ### Variables and sorts @@ -134,6 +135,32 @@ potential typing inconsistencies. {{< /hint >}} +To define equations that the introduced function symbols satisfy, we use the `equations` keyword. For example, we could +express that decryption is the (left-)inverse of encryption as follows: + +```js +equations: + decrypt(key, encrypt(key, data)) = data, + ... +``` + +### Built-in theories + +Tamarin comes with a number of cryptographic primitives such as symmetric and asymmetric encryption, signatures, and +hash functions already built-in. These built-in theories can be added to your project using the `builtins` keyword. For +example, the following would add two function symbols `senc` and `sdec` modeling symmetric encryption, and a function +symbol `h` modeling a hash function, to your project: + +```js +builtins: + symmetric-encryption, + hashing, + ... +``` + +For a complete list of the built-in theories supported by Tamarin, we refer to [the section on cryptographic messages in +the Tamarin user manual](https://tamarin-prover.com/manual/master/book/004_cryptographic-messages.html). + ### Facts Facts express properties about the current state of the protocol. New facts can be introduced by the user, but there are @@ -179,8 +206,8 @@ lemma device_key_confidentiality: There is a lot going on here. Following the `lemma` keyword is a name which identifies the lemma. This should be something expressive, describing the intended meaning of the statement. The `All` and `Ex` keywords represent universal and existential quantification. The operators `&` and `==>` represent conjunction and implication. (Disjunction is -written `|`, and negation is written as `not(...)`). The variables `#i`, `#j`, and `#k` all range over temporal values. -Undecorated variables like `private_key` range over messages. +written `|`, and negation of `P` is written `not(P)`). The variables `#i`, `#j`, and `#k` all range over temporal +values. Undecorated variables like `private_key` range over messages. The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are special facts called _action facts_ in the rewrite rule-version of Tamarin, or _events_ in the pi calculus-version. `DeviceInitialized(...) @ #i` means that the @@ -227,4 +254,32 @@ model, and serves as a form of sanity check or integration test, ensuring that t ## Restricting the execution trace +Tamarin uses _restrictions_ to restrict the space of execution traces explored by the prover. The most common use case +for restrictions is to model branching behavior. Consider the following restriction which says that if the fact +`EnsureEqual(x, y)` is emitted, then `x` and `y` must evaluate to the same term. + +```js +restriction ensure_equal: + " + All x y #i. (EnsureEqual(x, y) @ #i ==> x = y) + " +``` + +This restriction defines a new fact `EnsureEqual(x, y)` that can be used to restrict execution traces to those where the +two arguments are equal. For example, we can use the fact `EnsureEqual(verify(sig, msg, pub_key), true)` to express that +the signature `sig` must be valid for the protocol to progress. For details on how restrictions are used, see the +following sections which introduce multiset rewrite rules and Sapic+. + ## Rewrite rules or process calculus? + +Tamarin allows the user to specify protocols using either multiset rewrite rules, or as processes using a version of the +applied pi calculus known as Sapic+. There are benefits and drawbacks of both approaches. Tamarin originally only +supported multiset rewrite rules, and if you use processes to specify your protocol, the specification will be +translated into rewrite rules before the prover runs. This means rule names will be autogenerated, and hence +unrecognizable, if you run the prover in interactive mode. Since multiset rewrite rules have been supported from the +start, there are also more examples available describing this approach. This means that if you're looking for examples +for how to model a certain protocol component, you are more likely to find examples using rewrite rules than Sapic+. + +On the other hand, if you are interested in porting your model to ProVerif, Tamarin's process calculus is very similar +to the applied pi-calculus used by ProVerif. Thus, if you are already familiar with ProVerif, or if you are uncertain of +which tool would be best suited for your problem, it may make sense to use processes to model your protocol. diff --git a/content/docs/formal-verification/tamarin/10-rule-based-models.md b/content/docs/formal-verification/tamarin/10-rule-based-models.md index 85aa532..696c590 100644 --- a/content/docs/formal-verification/tamarin/10-rule-based-models.md +++ b/content/docs/formal-verification/tamarin/10-rule-based-models.md @@ -7,10 +7,15 @@ weight: 10 # Rule-based models -A Tamarin model can be defined either using multiset rewrite rules, or as processes using Sapic+. This section will go -through the rewrite rule-based approach. Rewrite rules are defined using the following syntax: +A Tamarin model can be defined either using multiset rewrite rules, or as processes using Sapic+. This section will +introduce the rewrite rule-based approach. + +Rewrite rules are defined using the following syntax: ```js rule Rule_name: [ input facts ] --[ action facts ]-> [ output facts ] ``` + +This should be interpreted as follows: if we have the given input facts, we can execute the rule to emit the action +facts and produce the corresponding output facts. From d95ed3a03cb31d74ce9f5b5268f212658d4adc4b Mon Sep 17 00:00:00 2001 From: Fredrik Dahlgren Date: Fri, 9 Feb 2024 15:20:11 +0100 Subject: [PATCH 6/6] Added more content on Tamarin rewrite rules --- .github/workflows/dictionary.txt | 2 + assets/mermaid.json | 21 ++++ .../tamarin/00-basic-concepts.md | 52 ++++----- .../tamarin/10-rule-based-models.md | 106 +++++++++++++++++- .../formal-verification/tamarin/_index.md | 8 +- 5 files changed, 155 insertions(+), 34 deletions(-) create mode 100644 assets/mermaid.json diff --git a/.github/workflows/dictionary.txt b/.github/workflows/dictionary.txt index 1c47e2f..3c0c462 100644 --- a/.github/workflows/dictionary.txt +++ b/.github/workflows/dictionary.txt @@ -131,6 +131,7 @@ SARIF-file Sapic SAST SDLC +se semgrep Semgrep SEMGREP_SEND_METRICS @@ -139,6 +140,7 @@ semgrep-rules Semgrep's semgrep-rules-manager semgrep-rules-android-security +sequenceDiagram Sheeraz Shreya src diff --git a/assets/mermaid.json b/assets/mermaid.json new file mode 100644 index 0000000..d30ea93 --- /dev/null +++ b/assets/mermaid.json @@ -0,0 +1,21 @@ +{ + "theme": "base", + "themeVariables": { + "actorBkg": "#dcefff", + "actorBorder": "#6d9eeb", + "actorTextColor": "#000000", + "actorLineColor": "#576675", + "noteBkgColor": "#e7edf4", + "noteTextColor": "#000000", + "noteBorderColor": "#d9d9d9" + }, + "fontSize": "20px", + "fontFamily": "monospace", + "sequence": { + "showSequenceNumbers": true + }, + "messageAlign": "center", + "wrap": true, + "useMaxWidth": false, + "noteAlign": "center" +} diff --git a/content/docs/formal-verification/tamarin/00-basic-concepts.md b/content/docs/formal-verification/tamarin/00-basic-concepts.md index 29161f7..ac74261 100644 --- a/content/docs/formal-verification/tamarin/00-basic-concepts.md +++ b/content/docs/formal-verification/tamarin/00-basic-concepts.md @@ -55,8 +55,8 @@ a process calculus based on a variant of the applied pi-calculus. The global pro facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a statement about a set of terms. For example, _`Alice` has registered the public key `pk` with the server_, or _The attacker has learned the private key `sk`_. Protocol properties are expressed as properties of the set of possible executions of the protocol, -and are specified using a decidable fragment of temporal logic. (This means that we can express properties of terms at -different points in time.) +and are specified using a fragment of temporal logic. (This means that we can express properties of terms at different +points in time.) ### Variables and sorts @@ -85,8 +85,7 @@ Terms may be composed using functions. These are either user defined, or defined {{< hint info >}} - -**Use descriptive variable and function names** +#### Use descriptive variable and function names Avoid using one-letter names for user-defined variables and functions. If you are modeling an existing protocol, reuse names from the specification and include clear references to the section of the specification where they are defined. @@ -99,7 +98,7 @@ User-defined functions are declared using the `functions` keyword. New functions It is possible to define new functions by simply specifying the function name and arity (how many arguments it takes) as follows: -```js +```ml functions: encrypt/2, decrypt/2, @@ -108,7 +107,7 @@ functions: New functions can also be defined by specifying the full signature of the function, using user-defined type names, as follows: -```js +```ml functions: encrypt(SecretKey, Bytes): Bytes, decrypt(SecretKey, Bytes): Bytes, @@ -126,8 +125,7 @@ phase is complete. In particular, they do not affect how the attacker may use th {{< hint info >}} - -**Include function signatures with descriptive type names when declaring new functions** +#### Include function signatures with descriptive type names when declaring new functions We recommend always including the full function signature when introducing new functions. This makes function definitions easier to parse, allows you to take full advantage of Tamarin's build-in typing checking, and avoid @@ -138,7 +136,7 @@ potential typing inconsistencies. To define equations that the introduced function symbols satisfy, we use the `equations` keyword. For example, we could express that decryption is the (left-)inverse of encryption as follows: -```js +```ml equations: decrypt(key, encrypt(key, data)) = data, ... @@ -151,7 +149,7 @@ hash functions already built-in. These built-in theories can be added to your pr example, the following would add two function symbols `senc` and `sdec` modeling symmetric encryption, and a function symbol `h` modeling a hash function, to your project: -```js +```ml builtins: symmetric-encryption, hashing, @@ -184,7 +182,7 @@ a few predefined facts that are useful to know about. sometimes useful when formulating security properties. (See below for an example.) -## Protocol properties and lemmas +### Protocol properties and lemmas Protocol properties are expressed as _lemmas_ in Tamarin. (In mathematics, a lemma is an intermediate step or proposition, typically used as a stepping stone in the proof of some other theorem.) Lemmas express properties of @@ -192,13 +190,13 @@ protocol execution traces, and by default, they contain an implicit quantificati example, consider the following lemma, which informally says that the only way that the attacker could learn the private key of an initialized device, is if that private key is leaked to the attacker. -```js +```ml lemma device_key_confidentiality: " - All private_key #i #j. ( - DeviceInitialized(to_public_key(private_key)) @ #i & K(private_key) @ #j + All priv_key #i #j. ( + DeviceInitialized(to_pub_key(priv_key)) @ #i & K(priv_key) @ #j ==> - Ex #k. (k < j & DeviceKeyLeaked(to_public_key(private_key)) @ #k) + Ex #k. (k < j & DeviceKeyLeaked(to_pub_key(priv_key)) @ #k) ) " ``` @@ -209,10 +207,9 @@ and existential quantification. The operators `&` and `==>` represent conjunctio written `|`, and negation of `P` is written `not(P)`). The variables `#i`, `#j`, and `#k` all range over temporal values. Undecorated variables like `private_key` range over messages. -The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are special facts called _action facts_ in the -rewrite rule-version of Tamarin, or _events_ in the pi calculus-version. `DeviceInitialized(...) @ #i` means that the -fact occurs (that is, is added to the execution trace) at time `#i`. Facts in lemmas always have to be qualified with a -temporal variable in this way. +The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are facts. `DeviceInitialized(...) @ #i` means that +the fact occurs (that is, is added to the execution trace or is emitted as an event or action fact) at time `#i`. Facts +in lemmas always have to be qualified with a temporal variable in this way. We note that this lemma implicitly contains a universal quantification over all execution traces of the analyzed protocol. It is really saying that "_for any execution of the protocol_, the private key will remain confidential as @@ -221,14 +218,14 @@ before the opening quotation mark. It is also possible to write lemmas that use execution traces using the keyword `exists-trace`. This is useful for proving correctness properties, saying that there exists an execution trace where the protocol completes successfully. -```js +```ml lemma protocol_correctness: exists-trace " - Ex public_key #i #j. ( - DeviceInitialized(public_key) @ i & - ServerInitialized(public_key) @ j & - All #k. (DeviceKeyLeaked(public_key) @ k ==> F) + Ex pub_key #i #j. ( + DeviceInitialized(pub_key) @ i & + ServerInitialized(pub_key) @ j & + All #k. (DeviceKeyLeaked(pub_key) @ k ==> F) ) " ``` @@ -244,21 +241,20 @@ by Tamarin. {{< hint info >}} - -**Include a lemma ensuring that the entire protocol can be executed correctly from start to finish** +#### Include a lemma ensuring that the entire protocol can be executed correctly from start to finish This is almost always the first lemma to prove. It may be updated as you add more components of the protocol to your model, and serves as a form of sanity check or integration test, ensuring that the model can be executed correctly. {{< /hint >}} -## Restricting the execution trace +### Restricting the execution trace Tamarin uses _restrictions_ to restrict the space of execution traces explored by the prover. The most common use case for restrictions is to model branching behavior. Consider the following restriction which says that if the fact `EnsureEqual(x, y)` is emitted, then `x` and `y` must evaluate to the same term. -```js +```ml restriction ensure_equal: " All x y #i. (EnsureEqual(x, y) @ #i ==> x = y) diff --git a/content/docs/formal-verification/tamarin/10-rule-based-models.md b/content/docs/formal-verification/tamarin/10-rule-based-models.md index 696c590..54a1fef 100644 --- a/content/docs/formal-verification/tamarin/10-rule-based-models.md +++ b/content/docs/formal-verification/tamarin/10-rule-based-models.md @@ -10,12 +10,114 @@ weight: 10 A Tamarin model can be defined either using multiset rewrite rules, or as processes using Sapic+. This section will introduce the rewrite rule-based approach. +## Execution traces + +In Tamarin, the execution state of a protocol is given by a multiset of facts. We execute the protocol by applying +rewrite rules to the state, which remove some facts and add others to the state. A single execution of the +protocol, starting from an empty state, is called an _execution trace_. + +## Defining rewrite rules + Rewrite rules are defined using the following syntax: -```js +```ml rule Rule_name: [ input facts ] --[ action facts ]-> [ output facts ] ``` This should be interpreted as follows: if we have the given input facts, we can execute the rule to emit the action -facts and produce the corresponding output facts. +facts and produce the corresponding output facts. By default, the input facts are replaced by the output fact in the +global state. _Action facts_ can be thought of as signals, indicating that some event has occurred, or that a term +satisfies a certain property. They can be referenced by lemmas and are useful for describing protocol properties that do +not directly reference the execution state. Some rules don't have action facts. In this case, we can abbreviate the rule +definition using the following shorter syntax. + +```ml +rule Rule_name: + [ input facts ] --> [ output facts ] +``` + +It is also possible to include definitions using `let` statements to make rules more readable: + +```ml +rule Rule_name: + let + x = f(a, b, ...) + y = g(a, b, ...) + in + [ input facts containing a, b, ... ] --> [ output facts containing x, y, a, b, ... ] +``` + +## Linear and persistent facts + +By default, Tamarin rules consume the set of input facts when they are executed. This means that if the global state is +given by `{DeviceState(~x, pk(~x)), F(~y)}` and we execute the following rule, then `Fr(~y)` would be consumed and the +updated global state would be given by `{DeviceState(~x, pk(~x)), DeviceState(~y, pk(~y))}`. + +```ml +rule Initialize_device: + let + public_key = pk(~private_key) + in + [ Fr(~private_key) ] --> [ DeviceState(~private_key, public_key) ] +``` + +Consuming `Fr(~y)` here ensures that the random value `~y` is not reused which would risk breaking the security +guarantees of the scheme. However, facts like `DeviceState(~y, pk(~y))`, which represents an initialized device, should +probably be reusable multiple times. For example, we may have a rule which signs a message using the private key `~y` +that we want to be able to execute multiple times. + +We can indicate to Tamarin that `DeviceState(~y, pk(~y))` is a _persistent fact_ and is not consumed when it is used +as input to a rule, by prefixing the name with an exclamation mark `!`. (Note that the exclamation mark must be used +wherever `DeviceState` is used to signal that `DeviceState` is a persistent fact.) Ordinary facts, that are consumed +when used, are called _linear facts_. + +## Modeling local state + +Linear facts can be used to model the current state of protocol participants. As an example, consider the following +protocol between Alice and Bob. + +{{< mermaid >}} +sequenceDiagram + Alice ->> Bob: e + Bob ->> Alice: e, ee, s, es + Alice ->> Bob: s, se +{{< /mermaid >}} + +The diagram is a high-level description of the Noise XX handshake between Alice and Bob. We don't need to worry about +the content of the individual messages sent. However, it is clear that Alice and Bob have to track their local state +to be able to progress and complete the handshake since Alice's message in step 3 of the protocol depends on what she +sent to Bob in step 1. Linear facts are useful when modeling this type of linear progression through a protocol. We can +denote Alice's local state after step 1 by `Alice_1(...)`, and have the rule modeling step 1 output Alice's local state. + +```ml +rule Noise_XX_1: + [ ... ] + --> + [ + Alice_1(...), + Out(...) + ] +``` + +In step 3, when Alice responds to Bob's message to complete the handshake, the local state `Alice_1(...)` is consumed. +This ensures that the attacker cannot rewind the protocol and complete the handshake multiple times from `Alice_1(...)`. + +```ml +rule Noise_XX_3: + [ + Alice_1(...), + In(...) + ] + --> + [ + Out(...) + ] +``` + +## Where do facts come from? + +Most rules take some facts as input and produce new facts as output. A natural question is, how do we get started? For +example, if each execution trace starts from the empty state, how would we ever be able to execute the +`Initialize_device` rule defined above? The answer is that Tamarin has a built-in rule to generate fresh values. In +fact, `Fr(...)` may only occur as input in user-defined rules to ensure that the corresponding value is unique. diff --git a/content/docs/formal-verification/tamarin/_index.md b/content/docs/formal-verification/tamarin/_index.md index ff9bb40..e82523a 100644 --- a/content/docs/formal-verification/tamarin/_index.md +++ b/content/docs/formal-verification/tamarin/_index.md @@ -9,10 +9,10 @@ bookCollapseSection: true # Tamarin -Tamarin is a formal verification tool that can be used to model the security properties of a protocol in the symbolic -model. Protocols are modeled either using multiset term rewrite rules, or using a version of the applied pi-calculus. -Security properties are specified using a fragment of first-order temporal logic, and can be proved either automatically -or interactively using a web-based interface. +Tamarin is a symbolic verification tool that can be used to model the security properties of cryptographic protocols. +Protocols are modeled either using multiset term rewrite rules, or using a version of the applied pi-calculus. Security +properties are specified using a fragment of first-order temporal logic, and can be proved either automatically or +interactively using a web-based interface. ## Benefits of using Tamarin