diff --git a/.github/workflows/dictionary.txt b/.github/workflows/dictionary.txt index d5c8194..b03100e 100644 --- a/.github/workflows/dictionary.txt +++ b/.github/workflows/dictionary.txt @@ -1,20 +1,26 @@ .semgrepignore +0x30 0xdea 20-ci AFL APIs AppSec +arity AST ATL +auditability autobuild autocomplete autofix autofixes Autofix Autofixes +autogenerated +bitstrings bookCollapseSection bookFlatSection cargo-fuzz +ciphertext CLI CMake codebase @@ -24,6 +30,9 @@ CodeQL codeql-queries config Config +cryptographic +CryptoVerif +CryptoVerif's CSV CTFs Ctrl @@ -33,8 +42,11 @@ dataflow DeepSemgrep DevOps derefs +Disjunction +Diffie-Hellman dgryski Dockerfiles +Dolev-Yao elttam eval Exiv2 @@ -52,6 +64,8 @@ goroutine GuardDog Homebrew hugo +IETF +indistinguishability interprocedural intraprocedural ioutil-readdir-deprecated @@ -63,6 +77,7 @@ lifecycle LSP lxml-in-pandas macOS +MacOS memcpy Memcpy metavariable @@ -80,17 +95,29 @@ mindedsecurity misconfiguration multiline multivalued +multiset natively Neovim +NIST +NixOS +Nixpkgs NodeJsScan +nonce +nonces Nullcon OCaml OWASP +Pacman PCRE-compatible +plaintext Pohekar postMessage pre-built pre-compiled +propositionally +prover +ProVerif +ProVerif's README QL repo @@ -101,8 +128,10 @@ Rulesets sanitization SARIF SARIF-file +Sapic SAST SDLC +se semgrep Semgrep SEMGREP_SEND_METRICS @@ -111,17 +140,26 @@ semgrep-rules Semgrep's semgrep-rules-manager semgrep-rules-android-security +sequenceDiagram Sheeraz Shreya src subdirectories +subsorts +Tamarin +Tamarin's trailofbits Triaging Trello +Undecorated unhandled unsanitized untracked untrusted +verifier +verifiers +Verifpal +Verifpal's VSCode vuln XSS diff --git a/assets/mermaid.json b/assets/mermaid.json new file mode 100644 index 0000000..d30ea93 --- /dev/null +++ b/assets/mermaid.json @@ -0,0 +1,21 @@ +{ + "theme": "base", + "themeVariables": { + "actorBkg": "#dcefff", + "actorBorder": "#6d9eeb", + "actorTextColor": "#000000", + "actorLineColor": "#576675", + "noteBkgColor": "#e7edf4", + "noteTextColor": "#000000", + "noteBorderColor": "#d9d9d9" + }, + "fontSize": "20px", + "fontFamily": "monospace", + "sequence": { + "showSequenceNumbers": true + }, + "messageAlign": "center", + "wrap": true, + "useMaxWidth": false, + "noteAlign": "center" +} diff --git a/content/_index.md b/content/_index.md index 53a5c69..d729a3f 100644 --- a/content/_index.md +++ b/content/_index.md @@ -66,6 +66,12 @@ We currently cover the following tools and techniques: - [Fuzzing]({{< relref "fuzzing" >}}) +<---> + +#### Formal verification + +- [Tamarin]({{< relref "tamarin" >}}) + {{< /columns >}} We are working on expanding the tools we cover here. We are also planning to @@ -106,4 +112,4 @@ codeql database analyze codeql.db --format=sarif-latest --output=results.sarif - We want to actively maintain the highest possible quality and expand the content of the Testing Handbook. If you see a way to improve the Testing Handbook, please let us know! The best way to let us know is by raising an issue directly on the [Testing Handbook GitHub page](https://github.com/trailofbits/testing-handbook). 
- \ No newline at end of file + diff --git a/content/docs/formal-verification/_index.md b/content/docs/formal-verification/_index.md new file mode 100644 index 0000000..5de876d --- /dev/null +++ b/content/docs/formal-verification/_index.md @@ -0,0 +1,137 @@ +--- +weight: 2 +bookFlatSection: true +title: "Formal verification" +--- + +# Formal verification + +This section presents a number of tools that can be used to formally verify +cryptographic protocols. It currently covers the symbolic protocol verifier +Tamarin, but our long-term goal is to cover both symbolic and computational +verifiers. For each tool, we go through: + +- Installation and basic use +- How to define and formally verify a new model +- Tool-specific pain points and potential workarounds + +{{< section >}} + +## Who is formal verification for? + +Formal verification tools allow us to formally prove security properties of cryptographic protocols. Alternatively, they +can often provide counterexamples (and sometimes real attacks) showing that a particular protocol does not guarantee the +security properties that we expect it to. To formally verify a protocol, we need to describe the protocol we are +reviewing in a language understood by the tool, and we also need to describe the security properties that we would like +the protocol to have. The tool will then automatically search for a formal proof that the properties we have specified +are upheld by the protocol, and if it terminates, it will output either a proof showing that the properties hold, or a +counterexample showing how some property fails to hold. + +For this reason, formal verification tools provide great value for anyone designing a new cryptographic protocol. They +allow developers to verify that a new design meets the expected security guarantees, and to experiment with +the design and compare the security properties and trade-offs of different design choices. Formal verification +tools also provide great value for anyone who is modifying or extending an existing protocol. If the original +protocol already has a formal model, modeling an extension to the protocol is typically cheap, allowing the developer to +prove that the protocol extension is secure without having to model the entire protocol from scratch. + +{{< hint danger >}} + +### What is a cryptographic protocol? + +You may think that only cryptographers design cryptographic protocols, and that you don't need complex tools to +understand the security properties of your relatively simple use case. However, our experience shows that it makes more +sense to take the following very broad view of the term _cryptographic protocol_, which also has implications for formal +verification. + +**Anyone who is composing different cryptographic primitives (like encryption, signatures, and hashes) in a way that has +not been previously specified by a standards document like an IETF RFC or NIST standard, and has not been analyzed in a +public academic paper, is designing a new cryptographic protocol.** + +New cryptographic protocols need to be proven secure before they are deployed. Formal verification is one way of +achieving this, and it ensures both correctness and auditability. At Trail of Bits, we recommend that _all_ new +cryptographic protocols be formally verified. + +{{< /hint >}} + +## The symbolic model or the computational model? + +Tools used to formally verify cryptographic protocols typically come in one of two different flavors.
Tools like +[ProVerif](https://bblanche.gitlabpages.inria.fr/proverif/), [Tamarin](https://tamarin-prover.com/), and +[Verifpal](https://verifpal.com/) analyze protocols in the _symbolic model_. This means that cryptographic primitives +are modeled as black boxes satisfying certain given equations. For example, a symmetric encryption scheme could be +modeled using two functions `encrypt` and `decrypt` satisfying the following equation: + +```js +decrypt(key, encrypt(key, data)) = data +``` + +This allows the tool to replace any occurrence of the term `decrypt(key, encrypt(key, data))` with `data`, but it says +nothing about the strength of the encryption algorithm. In a sense, it treats the encryption as perfect, since the only +way to obtain the plaintext from the ciphertext is with knowledge of the correct key. Modeling cryptographic primitives +in this way implies a number of trade-offs that may not be immediately obvious. On the one hand, it provides clean +abstractions that allow us to specify the high-level building blocks of cryptographic protocols without having to worry +too much about how each cryptographic primitive is implemented. This also allows us to instantiate the protocol with any +primitive that satisfies the given equations and still be sure that the security proofs hold. On the other hand, it +means that we cannot reason about things like brute-force or padding-oracle attacks in a straightforward manner. +(If we want to do this, we need to model our primitives in a way that allows us to express these types of attacks within +the model. Depending on the security properties we are interested in, this may either add complexity, which often has +adverse effects on proving time, or may sometimes be impossible within the symbolic model.) + +Symbolic verification tools typically model the network as untrusted, allowing the attacker to intercept, delay, modify, +and replay messages between participants at will. This is known as the _Dolev-Yao model_, and was first described in the +paper [_On the Security of Public-Key Protocols_](https://www.cs.huji.ac.il/~dolev/pubs/dolev-yao-ieee-01056650.pdf). +Individual tools often provide abstractions (like the `reliable-channel` builtin in Tamarin or the `passive` keyword in +Verifpal) that can be used to restrict the capabilities of the attacker in different ways. + +Tools like [CryptoVerif](https://bblanche.gitlabpages.inria.fr/CryptoVerif/) analyze protocols in the _computational +model_. Here, individual messages are modeled as bitstrings, cryptographic primitives are modeled as functions on +bitstrings, and the adversary is modeled as an arbitrary probabilistic polynomial-time Turing machine. The probability +of the attacker successfully breaking the security of a given primitive must be provided up front by the user. The +tool can then automatically prove indistinguishability between successive games in a sequence (up to some probability), where the +first game captures the analyzed protocol, and the final game is one where the required security property is obvious. +The output is given as a bound on the probability of a successful attack as a function of the security parameters of the +individual primitives used, and the number of protocol executions. A key benefit of the computational model is that this +mimics how cryptographers usually prove security of cryptographic protocols. This makes it easier to carry over +constructions from traditional pen-and-paper proofs to automated proofs.
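+ +As a rough, hypothetical illustration (not the output of any particular CryptoVerif model), a result in the computational +model typically takes the shape `Pr[attack] <= n * Adv_enc(t) + q^2 / 2^k`, where `n` is the number of protocol sessions, +`Adv_enc(t)` is the user-supplied advantage of an attacker running in time `t` against the encryption scheme, `q` is the +number of hash or oracle queries, and `k` is a security parameter.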
+ +The computational model is clearly more expressive than the symbolic model. However, it also presupposes a deep +understanding of provable security and game-based proofs, which is not required when working in the symbolic model. + +Ultimately, which tool to use depends as much on your background as on the protocol you are trying to analyze. If you +don't have a background in cryptography or formal verification, we recommend that you start out with a simple symbolic +verification tool like Verifpal. If you struggle to model your protocol in Verifpal, or need to express more complex +security properties that are not supported by the tool, we suggest switching to a more expressive and mature symbolic +prover like ProVerif or Tamarin. Finally, if you want to bound attack probabilities or translate a game-based proof from +a paper, you need to work in the computational model with a tool like CryptoVerif. + +{{< hint info >}} + +### Where do I start? + +[Verifpal](https://verifpal.com) is a great starting point if you have no previous exposure to formal verification. It +has a small set of built-in primitives that cover a range of use cases, and the syntax mimics how developers typically +think about cryptographic protocols. Verifpal's intuitive syntax also makes the resulting models easier to parse and +maintain for less experienced users. + +This is an important point. If you develop a formal model of a protocol that is too complex for the developers of the +protocol to maintain, then the proof becomes a snapshot in time with limited usefulness. If developers understand the +model and can update and extend it as the protocol develops, it can be built on to provide assurance for future versions +of the protocol as well. + +{{< /hint >}} + +{{< hint info >}} + +### How do I start? + +Since most cryptographic protocols have a large number of valid states, formal verification tools often struggle because +of state-space explosion. For this reason, it is generally a good idea to try to divide your protocol into smaller +components, or sub-protocols, like _registering a new device_ or _sending and receiving a single message_, and try to +prove that these individual components provide the expected security guarantees. + +This approach clearly limits what the attacker can do, since each sub-protocol is analyzed in a vacuum. However, it makes +each model more manageable and helps avoid the issue of state-space explosion. At Trail of Bits, we often use this method +to quickly model protocol components during cryptographic design reviews. It is a useful tool both for verifying our +understanding of the protocol and for identifying potential weaknesses in the design. + +{{< /hint >}} diff --git a/content/docs/formal-verification/tamarin/00-basic-concepts.md b/content/docs/formal-verification/tamarin/00-basic-concepts.md new file mode 100644 index 0000000..ac74261 --- /dev/null +++ b/content/docs/formal-verification/tamarin/00-basic-concepts.md @@ -0,0 +1,281 @@ +--- +title: "Basic concepts" +slug: basic-concepts +summary: "This section explains the basic concepts of Tamarin." +weight: 1 +--- + +# Installation and basic concepts + +## Installing Tamarin + +We recommend installing Tamarin using a package manager to ensure that the correct dependencies are installed together +with the tool. Tamarin is currently packaged for Homebrew, Arch Linux, Nixpkgs, and NixOS.
+ +{{< tabs "installing-tamarin" >}} + +{{< tab "Using Homebrew" >}} +On macOS and Linux, you can install Tamarin using Homebrew as follows: + +```sh +brew install tamarin-prover/tap/tamarin-prover +``` + +{{< /tab >}} + +{{< tab "Using Pacman" >}} +On Arch Linux, you can install Tamarin using Pacman with the following command: + +```sh +pacman -S tamarin-prover +``` + +{{< /tab >}} + +{{< tab "Using the Nix package manager" >}} +You can install Tamarin from Nixpkgs using the Nix package manager with the following command: + +```sh +nix-env -i tamarin-prover +``` + +{{< /tab >}} + +{{< tab "On NixOS" >}} +If you are running NixOS, you can install Tamarin by adding `tamarin-prover` to your `environment.systemPackages`. + +{{< /tab >}} + +{{< /tabs >}} + +## Tamarin basic concepts + +Tamarin models can be specified either using [multiset](https://en.wikipedia.org/wiki/Multiset) rewrite rules, or using +a process calculus based on a variant of the applied pi-calculus. The global protocol state is given by a multiset of +facts that is expanded and updated as the protocol progresses. Here, a fact is essentially a statement about a set of +terms. For example, _`Alice` has registered the public key `pk` with the server_, or _The attacker has learned the +private key `sk`_. Protocol properties are expressed as properties of the set of possible executions of the protocol, +and are specified using a fragment of temporal logic. (This means that we can express properties of terms at different +points in time.) + +### Variables and sorts + +Variables in Tamarin range over five different sorts (or types). There is a top sort for messages that can be sent over +the network, with subsorts for public values, fresh (or private) values, and natural numbers. There is also a sort for +temporal values (essentially, points in time). To make the sort clear from context, each variable name is prefixed by a +single character indicating its sort. + +- `~x` denotes a fresh value. _Fresh values_ are random values like secret keys + and nonces and are not known to the attacker. +- `$x` denotes a public value. _Public values_ are publicly known. In + particular, they are known to the attacker. Constant public values are + written as strings (without the `$`). For example, the constant 0x30 is + written `'0x30'`, and a constant generator _g_ of a cyclic group is written + `'g'`. (For example, the public key corresponding to the private key `~x` in + a Diffie-Hellman-based protocol is written `'g'^~x`.) +- `%n` denotes a natural number. _Natural numbers_ start at `%1` in Tamarin and + require the `natural-numbers` builtin. They are used to represent + (small) numbers like counters, and are assumed to be known to the attacker. +- `#i` denotes a temporal value. _Temporal values_ are not used to describe the + protocol directly, but are used to express protocol properties like "If the + attacker knows Alice's key at time `#j`, then Alice's key must have been + leaked at some time `#i` before `#j`." + +Terms may be composed using functions. These are either user-defined or defined by importing a predefined theory (known as a `builtin`). + +{{< hint info >}} + +#### Use descriptive variable and function names + +Avoid using one-letter names for user-defined variables and functions. If you are modeling an existing protocol, reuse +names from the specification and include clear references to the section of the specification where they are defined +(for example, a reader can map a function named `kdf_extract/2` back to the specification much more easily than `f/2`).
+ +{{< /hint >}} + +### Functions and equations + +User-defined functions are declared using the `functions` keyword. New functions can be defined in two different ways. +It is possible to define new functions by simply specifying the function name and arity (how many arguments it takes) as +follows: + +```ml +functions: + encrypt/2, + decrypt/2, + ... +``` + +New functions can also be defined by specifying the full signature of the function, using user-defined type names, as follows: + +```ml +functions: + encrypt(SecretKey, Bytes): Bytes, + decrypt(SecretKey, Bytes): Bytes, + ... +``` + +Here, `SecretKey` and `Bytes` are arbitrary type names specified by the user. There are a number of benefits to using +explicit types in this way when defining your own functions. Apart from making definitions more readable, all equations +are type checked by Tamarin before the prover runs. This ensures that any issues due to inconsistent typing of arguments +are caught early in the modeling process. For example, using the term `encrypt(key, message)` in one location would +cause Tamarin to infer that `key` has type `SecretKey` and `message` has type `Bytes`. If we later introduced the term +`decrypt(message, key)`, Tamarin would complain, saying that it expected the first argument of `decrypt` to have the type +`SecretKey`, but that `message` has type `Bytes`. Note that function types are ignored after the type-checking +phase is complete. In particular, they do not affect how the attacker may use the defined function. + +{{< hint info >}} + +#### Include function signatures with descriptive type names when declaring new functions + +We recommend always including the full function signature when introducing new functions. This makes function +definitions easier to parse, allows you to take full advantage of Tamarin's built-in type checking, and avoids +potential typing inconsistencies. + +{{< /hint >}} + +To define equations that the introduced function symbols satisfy, we use the `equations` keyword. For example, we could +express that decryption is the (left-)inverse of encryption as follows: + +```ml +equations: + decrypt(key, encrypt(key, data)) = data, + ... +``` + +### Built-in theories + +Tamarin comes with a number of cryptographic primitives such as symmetric and asymmetric encryption, signatures, and +hash functions already built in. These built-in theories can be added to your project using the `builtins` keyword. For +example, the following would add two function symbols `senc` and `sdec` modeling symmetric encryption, and a function +symbol `h` modeling a hash function, to your project: + +```ml +builtins: + symmetric-encryption, + hashing, + ... +``` + +For a complete list of the built-in theories supported by Tamarin, we refer to [the section on cryptographic messages in +the Tamarin user manual](https://tamarin-prover.com/manual/master/book/004_cryptographic-messages.html). + +### Facts + +Facts express properties about the current state of the protocol. New facts can be introduced by the user, but there are +a few predefined facts that are useful to know about. + +- `Fr(x)` says that `x` is a fresh value. That is, a random value that cannot + be inferred by the attacker. +- `Out('c', x)` says that `x` is sent on the public channel with name `'c'`. + Since we're assuming a Dolev-Yao adversary who is in complete control of the + network, this also means that `x` becomes available to the attacker as soon as + it is sent over the network.
If there is only one channel, the channel + identifier `'c'` may be omitted. +- `In('c', x)` says that `x` is received on the public channel `'c'`. Since the + attacker controls the network, she may drop or alter messages at will. This + means that we cannot assume that the message `x` is ever received just + because it is sent at some point by some honest participant, or conversely, + that `x` was sent by an honest participant, just because `x` was received. If + there is only one channel, the channel identifier `'c'` may be omitted. +- `K(x)` says that the attacker knows the value `x`. +- `T` and `F` denote the boolean constants `true` and `false`. They are + sometimes useful when formulating security properties. (See below for an + example.) + +### Protocol properties and lemmas + +Protocol properties are expressed as _lemmas_ in Tamarin. (In mathematics, a lemma is an intermediate step or +proposition, typically used as a stepping stone in the proof of some other theorem.) Lemmas express properties of +protocol execution traces, and by default, they contain an implicit quantification over all execution traces. For +example, consider the following lemma, which informally says that the only way the attacker could learn the private +key of an initialized device is if that private key is leaked to the attacker. + +```ml +lemma device_key_confidentiality: + " + All priv_key #i #j. ( + DeviceInitialized(to_pub_key(priv_key)) @ #i & K(priv_key) @ #j + ==> + Ex #k. (k < j & DeviceKeyLeaked(to_pub_key(priv_key)) @ #k) + ) + " +``` + +There is a lot going on here. Following the `lemma` keyword is a name that identifies the lemma. This should be +something expressive, describing the intended meaning of the statement. The `All` and `Ex` keywords represent universal +and existential quantification. The operators `&` and `==>` represent conjunction and implication. (Disjunction is +written `|`, and negation of `P` is written `not(P)`.) The variables `#i`, `#j`, and `#k` all range over temporal +values. Undecorated variables like `priv_key` range over messages. + +The statements `DeviceInitialized(...)` and `DeviceKeyLeaked(...)` are facts. `DeviceInitialized(...) @ #i` means that +the fact occurs (that is, is added to the execution trace or is emitted as an event or action fact) at time `#i`. Facts +in lemmas always have to be qualified with a temporal variable in this way. + +We note that this lemma implicitly contains a universal quantification over all execution traces of the analyzed +protocol. It is really saying that "_for any execution of the protocol_, the private key will remain confidential as +long as it is not leaked to the attacker." It is possible to make this explicit by adding the keyword `all-traces` +before the opening quotation mark. It is also possible to write lemmas that use existential quantification over +execution traces using the keyword `exists-trace`. This is useful for proving correctness properties, saying that there +exists an execution trace where the protocol completes successfully. + +```ml +lemma protocol_correctness: + exists-trace + " + Ex pub_key #i #j. ( + DeviceInitialized(pub_key) @ i & + ServerInitialized(pub_key) @ j & + All #k. (DeviceKeyLeaked(pub_key) @ k ==> F) + ) + " +```
This lemma informally says that there is an execution trace where at least one device is initialized and registers its +public key with the server, without leaking the private key to the attacker. Note that the final statement, that the +device key is not leaked, is formulated in a somewhat roundabout manner. The reason for this is that lemmas must be +expressed in the _guarded fragment_ of first-order temporal logic. In practice, this means that all existentially +quantified formulas must be of the form `Ex x. (A(x, y, ..., z) @ #i & ...)` and all universally quantified formulas +must be of the form `All x. (A(x, y, ..., z) @ #i ==> ...)`. Since `not(P)` is propositionally equivalent to `P ==> F`, +the final statement allows us to express `All #k. not(DeviceKeyLeaked(pub_key) @ #k)` as a guarded formula accepted +by Tamarin. + +{{< hint info >}} + +#### Include a lemma ensuring that the entire protocol can be executed correctly from start to finish + +This is almost always the first lemma to prove. It may be updated as you add more components of the protocol to your +model, and serves as a form of sanity check or integration test, ensuring that the model can be executed correctly. + +{{< /hint >}} + +### Restricting the execution trace + +Tamarin uses _restrictions_ to constrain the space of execution traces explored by the prover. The most common use case +for restrictions is to model branching behavior. Consider the following restriction, which says that if the fact +`EnsureEqual(x, y)` is emitted, then `x` and `y` must evaluate to the same term. + +```ml +restriction ensure_equal: + " + All x y #i. (EnsureEqual(x, y) @ #i ==> x = y) + " +``` + +This restriction defines a new fact `EnsureEqual(x, y)` that can be used to restrict execution traces to those where the +two arguments are equal. For example, we can use the fact `EnsureEqual(verify(sig, msg, pub_key), true)` to express that +the signature `sig` must be valid for the protocol to progress. For details on how restrictions are used, see the +following sections, which introduce multiset rewrite rules and Sapic+. + +## Rewrite rules or process calculus? + +Tamarin allows the user to specify protocols either using multiset rewrite rules or as processes in a version of the +applied pi calculus known as Sapic+. There are benefits and drawbacks to both approaches. Tamarin originally only +supported multiset rewrite rules, and if you use processes to specify your protocol, the specification will be +translated into rewrite rules before the prover runs. This means rule names will be autogenerated, and hence +unrecognizable, if you run the prover in interactive mode. Since multiset rewrite rules have been supported from the +start, there are also more examples available describing this approach. This means that if you're looking for examples +of how to model a certain protocol component, you are more likely to find examples using rewrite rules than Sapic+.
+ +On the other hand, if you are interested in porting your model to ProVerif, Tamarin's process calculus is very similar +to the applied pi-calculus used by ProVerif. Thus, if you are already familiar with ProVerif, or if you are uncertain +which tool would be best suited to your problem, it may make sense to use processes to model your protocol. diff --git a/content/docs/formal-verification/tamarin/10-rule-based-models.md b/content/docs/formal-verification/tamarin/10-rule-based-models.md new file mode 100644 index 0000000..54a1fef --- /dev/null +++ b/content/docs/formal-verification/tamarin/10-rule-based-models.md @@ -0,0 +1,123 @@ +--- +title: "Rule-based models" +slug: rule-based-models +summary: "This section describes how to define a Tamarin model using multiset rewrite rules." +weight: 10 +--- + +# Rule-based models + +A Tamarin model can be defined either using multiset rewrite rules, or as processes using Sapic+. This section +introduces the rewrite-rule-based approach. + +## Execution traces + +In Tamarin, the execution state of a protocol is given by a multiset of facts. We execute the protocol by applying +rewrite rules to the state, which remove some facts and add others to the state. A single execution of the +protocol, starting from an empty state, is called an _execution trace_. + +## Defining rewrite rules + +Rewrite rules are defined using the following syntax: + +```ml +rule Rule_name: + [ input facts ] --[ action facts ]-> [ output facts ] +``` + +This should be interpreted as follows: if we have the given input facts, we can execute the rule to emit the action +facts and produce the corresponding output facts. By default, the input facts are replaced by the output facts in the +global state. _Action facts_ can be thought of as signals, indicating that some event has occurred, or that a term +satisfies a certain property. They can be referenced by lemmas and are useful for describing protocol properties that do +not directly reference the execution state. Some rules don't have action facts. In this case, we can abbreviate the rule +definition using the following shorter syntax. + +```ml +rule Rule_name: + [ input facts ] --> [ output facts ] +``` + +It is also possible to include definitions using `let` statements to make rules more readable: + +```ml +rule Rule_name: + let + x = f(a, b, ...) + y = g(a, b, ...) + in + [ input facts containing a, b, ... ] --> [ output facts containing x, y, a, b, ... ] +``` + +## Linear and persistent facts + +By default, Tamarin rules consume the set of input facts when they are executed. This means that if the global state is +given by `{DeviceState(~x, pk(~x)), Fr(~y)}` and we execute the following rule, then `Fr(~y)` would be consumed and the +updated global state would be given by `{DeviceState(~x, pk(~x)), DeviceState(~y, pk(~y))}`. + +```ml +rule Initialize_device: + let + public_key = pk(~private_key) + in + [ Fr(~private_key) ] --> [ DeviceState(~private_key, public_key) ] +``` + +Consuming `Fr(~y)` here ensures that the random value `~y` is not reused, which would risk breaking the security +guarantees of the scheme. However, a fact like `DeviceState(~y, pk(~y))`, which represents an initialized device, should +probably be reusable multiple times. For example, we may have a rule that signs a message using the private key `~y`, +and we want to be able to execute that rule multiple times. + +We can indicate to Tamarin that `DeviceState(~y, pk(~y))` is a _persistent fact_, which is not consumed when it is used +as input to a rule, by prefixing the name with an exclamation mark `!`. (Note that the exclamation mark must be used +everywhere `DeviceState` occurs to signal that it is a persistent fact.) Ordinary facts, which are consumed +when used, are called _linear facts_. + +## Modeling local state + +Linear facts can be used to model the current state of protocol participants.
As an example, consider the following +protocol between Alice and Bob. + +{{< mermaid >}} +sequenceDiagram + Alice ->> Bob: e + Bob ->> Alice: e, ee, s, es + Alice ->> Bob: s, se +{{< /mermaid >}} + +The diagram is a high-level description of the Noise XX handshake between Alice and Bob. We don't need to worry about +the content of the individual messages sent. However, it is clear that Alice and Bob have to track their local state +to be able to progress and complete the handshake, since Alice's message in step 3 of the protocol depends on what she +sent to Bob in step 1. Linear facts are useful when modeling this type of linear progression through a protocol. We can +denote Alice's local state after step 1 by `Alice_1(...)`, and have the rule modeling step 1 output this fact. + +```ml +rule Noise_XX_1: + [ ... ] + --> + [ + Alice_1(...), + Out(...) + ] +``` + +In step 3, when Alice responds to Bob's message to complete the handshake, the local state `Alice_1(...)` is consumed. +This ensures that the attacker cannot rewind the protocol and complete the handshake multiple times from `Alice_1(...)`. + +```ml +rule Noise_XX_3: + [ + Alice_1(...), + In(...) + ] + --> + [ + Out(...) + ] +``` + +## Where do facts come from? + +Most rules take some facts as input and produce new facts as output. A natural question is: how do we get started? For +example, if each execution trace starts from the empty state, how would we ever be able to execute the +`Initialize_device` rule defined above? The answer is that Tamarin has a built-in rule to generate fresh values. This +built-in rule is the only rule that can produce `Fr(...)` facts, which ensures that each fresh value is unique; in +user-defined rules, `Fr(...)` may only appear on the input side. diff --git a/content/docs/formal-verification/tamarin/_index.md b/content/docs/formal-verification/tamarin/_index.md new file mode 100644 index 0000000..e82523a --- /dev/null +++ b/content/docs/formal-verification/tamarin/_index.md @@ -0,0 +1,34 @@ +--- +title: "Tamarin" +weight: 2 +summary: "Tamarin is a symbolic verification tool that can be used to formally model security protocols. Protocols are +described using either multiset rewrite rules or a variant of the applied pi-calculus. Security properties are modeled +using first-order temporal logic and can be proved either automatically or interactively." +bookCollapseSection: true +--- + +# Tamarin + +Tamarin is a symbolic verification tool that can be used to model the security properties of cryptographic protocols. +Protocols are modeled either using multiset term rewrite rules, or using a version of the applied pi-calculus. Security +properties are specified using a fragment of first-order temporal logic, and can be proved either automatically or +interactively using a web-based interface. + +## Benefits of using Tamarin + +- Built-in support for common cryptographic constructions such as symmetric and + asymmetric encryption, hashes, signatures, and Diffie-Hellman-based constructions. +- Easy to define new primitives and corresponding equations. +- Security properties are specified using an expressive fragment of temporal + first-order logic. +- Security properties can be proved either automatically or interactively. + +## Ideal use case + +- If you need to define custom primitives or prove more complex security properties + not supported by simpler tools like [Verifpal](https://verifpal.com). +- If you need to model concurrent, non-deterministic processes, which is not supported by + [Verifpal](https://verifpal.com).
+- If the protocol you are modeling uses multiplicative inverses to cancel out + exponents in Diffie-Hellman-based sub-protocols, or if you want to consider + attacks against your protocol that exploit this kind of multiplicative behavior.
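+
+## A minimal example
+
+To give a feel for the input language before diving into the details, the following is a minimal, hypothetical Tamarin
+theory. It contains a single rule that generates a fresh secret and publishes its hash, together with a lemma stating
+that the attacker never learns the secret itself. The individual concepts (builtins, rules, facts, and lemmas) are
+explained in the following sections.
+
+```ml
+theory MinimalExample
+begin
+
+builtins: hashing
+
+// Generate a fresh secret and publish only its hash on the network.
+rule Publish_hash:
+    [ Fr(~secret) ] --[ Secret(~secret) ]-> [ Out(h(~secret)) ]
+
+// The attacker never learns the secret itself.
+lemma secret_stays_secret:
+  "
+  All x #i #j. (Secret(x) @ #i & K(x) @ #j ==> F)
+  "
+
+end
+```
+
+Running `tamarin-prover --prove` on a file containing this theory should discharge the lemma automatically.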