Is CrossHair a Software Verification Tool? #156

pschanely · 2022-02-11T18:05:28Z

pschanely
Feb 11, 2022
Maintainer

Generally speaking, CrossHair does not perform software verification. (i.e. it does not prove that your contracts always hold)

Under some very specific assumptions, however, it can fulfill this purpose. The following post describes how this can be done and what assumptions are required. Further discussion may happen below, but this content will be maintained over time in an attempt to remain up-to-date. (last edit: 2022-07-01; many thanks to lmontand for helping to catalog these!)

CrossHair explores different "paths" through the code and uses the SMT solver to reason about all possible inputs that would follow this path. Sometimes, CrossHair runs out of paths to explore. If you supply the --report_all option to crosshair check, CrossHair will report Confirmed over all paths if it exhausted all paths. Apart from the cases detailed below, "Confirmed over all paths." may be interpreted as a verification of the property.

Cases where CrossHair might incorrectly return `CONFIRMED` instead of `REFUTED`

The following is a list of cases were CrossHair might report a wrong analysis result because it made some assumptions which your code does not satisfy.

Non-determinism: All functions, directly or indirectly invoked by the function (or its pre/post conditions) should exhibit the same behavior when given the same arguments. Here are some examples of non-determinism. For cases raising NotDeterministic, see section below. Examples of non-determinism are:
- Calls to functions with randomized output (example: random library), functions depending on the system's state (example: time and datetime libraries), or functions whose output depends on previous queries.
- Reading a mutable global variable.
- Using cached results. (though the functools cache operations like lru_cache are safe to use - CrossHair detects these and disables their usual caching behaviors)
- Functions raising exceptions depending on the state of the system (e.g. if memory is full). CrossHair will only trigger exceptions which depend on the function's args.
- Self-modifying code is also a kind of non-determinism, as different executions of the same function might lead to different results.
If you still wish to use such functions in your code, you might want to register contract for those to help CrossHair understand them. See plugins and contract registration.
Code termination: Verification requires that the code terminates on all inputs. (why? Think about it like an inductive proof - the inductive step only works if you also have a "base case"). A real verification tool will have you write a termination proof before reasoning about the property to be verified; CrossHair simply assumes termination. If your code does not terminate, CrossHair may declare a contract confirmed over all paths when it isn't. Here is an example.
All arguments have states that are directly constructable: Problematic examples: classes with mutable private members like this. A list containing itself as a member (this is also an example of the aliasing problem below). To ensure CrossHair correctly understand your classes, please follow these guidelines.
Aliases: Aliasing problems might happen when you are using containers/collections holding non-trivial objects. CrossHair remembers instances created before and is able to detect some cases. However, CrossHair is not able to correctly reason about all cases: see this issue which contains two interesting examples. More to this topic, you should avoid comparing objects with the is keyword. Here is an example of what could go wrong. However, you may safely make is comparisons with None and values of an Enum.
Closed type hierarchy: CrossHair will detect and attempt to use subclasses that have already been defined in the interpreter (warning, see exceptions below under "IMPORTANT"), but will not consider the possibility of additional subclasses. Path exhaustion will happen after all known subclasses have been attempted. As an example, suppose you run CrossHair on some module module1 containing the class class Class1. Suppose another module module2 contains class Class2(Class1) (i.e. a subclass of Class1). If module2 is not loaded by the interpreter, CrossHair will have no way of knowing that Class2 exists. So, the analysis result might be "confirmed over all paths" even if Class2 breaks some postconditions.

IMPORTANT: classes defining either __copy__, __deepcopy__, __reduce__ or __reduce_ex__ are not considered in the possible subclasses, because they have a non-trivial implementation of copying. Here is an example.

Another similar and important note is that CrossHair won't detect subclassing that is revealed via dynamic isinstance hooks, e.g. __subclasscheck__ and __subclasshook__.
Catching exceptions: If you do some "advanced" exception handling - not only based on the exception type itself, but on other information carried with the exception - your code handling the exception might not behave the correct way when CrossHair runs it. The reason for this is that CrossHair simplifies information of some exceptions of the standard library, to avoid unnecessary realization of symbolic variables. This only concerns exceptions thrown by the standard library, if your code throws some exceptions, they will remain untouched. Here is an example where handling the exception depends on the exception's message, which is missing when executing CrossHair.
Callables with TypeVar: If your function takes as input a Callable[X, Y], where X and/or Y is/contains a TypeVar, CrossHair might return "confirmed over all paths", even when the postcondition is wrong. See this issue as an example.
Parallelism: The code is assumed to be single-threaded and to not have any other kind of parallelism. Therefore, CrossHair provides no detection for deadlocks, concurrent writes, racing conditions, etc. Here is a sample example which works perfectly fine in a single-threaded setting, but could be wrong when adding parallelism.
Pointers: Using the ctypes library and modifying pointers, it's easy to have CrossHair return incorrect results. To demonstrate the power of ctypes, one could even redefine the value of 1, thus totally breaking the behavior of Python.
Python environment: Similarly, do not expect any guarantees if you try to modify part of the standard library. Results only apply to a fresh and unmodified python interpreter and library.

Additionally note that if you supply incorrect pre- or postconditions or that you write incorrect type annotations to your functions, analysis results have high chances to be incorrect as well.

Cases where the result is caped to `UNKNOWN`

Below is a list of more challenging cases where CrossHair's result is often UNKNOWN. You can find an example for most of them in this gist. Unlike the above list of cases, the UNKNOWN result here is "correct." We present these cases to help you understand why CrossHair may not be able to prove your property.

Because CrossHair approximates float values as real numbers, it will never report "Confirmed over all paths" when it uses a symbolic float.
If CrossHair needs to realize a symbolic value of type object or Any (both are treated the same way), it will only try the subtypes int and str and it will cap the result to UNKNOWN. This is because trying all subclasses of object is clearly not feasible in a reasonable amount of time. Note that in some rare circumstances where only some trivial operations are performed on symbolic objects, CrossHair does not need to realize the object and can still prove your result. Here is an example.
Using TypeVar with constraints (for example TypeVar("T", int, str)) is currently not supported and the result is caped to UNKNOWN. Note that this is different from using TypeVar("T", bound=Union[int, str]), which is supported. For more information about the difference, see this post.
Symbolic values are simulating real values. However, this illusion is not perfect. And some function might raise a TypeError when receiving a SymbolicInt instead of a regular int, for example. If CrossHair detects such a case, the result is caped to UNKNOWN. If this was not detected, the result will be REFUTED and it will directly report you the TypeError.
If the signature of a constructor is not found or incomplete, CrossHair will not be able to create a proxy for it (it will not know what types of symbolic values to feed to the constructor). Here again, the result is caped to UNKNOWN. This can happen for some of the builtins and for some external C-based modules.
Currently, CrossHair does not support symbolic functions with an ellipsis as argument type (i.e. Callable[..., <some_return_type>]) or with typing.Paramspec or typing.Concatenate. In such cases, the result is caped to UNKNOWN.
Slices with a step different from 1 are currently not supported and the result is caped to UNKNOWN.
Most operations over inputs of arbitrary size (lists, strings, dicts, etc) explicitly or implicitly trigger an infinite number of execution paths. In such cases, CrossHair timeouts and the result is UNKNOWN.
Similarly, if you write very complicated constraints, the SMT solver might have a hard time solving them and will timeout. In such cases, the result cannot be better than UNKNOWN, as some paths are not explored.

Note: In terms of design and evolutionary goals, CrossHair targets rapid counterexample discovery in practical use cases, rather than tactics for path exhaustion (verification). Often these two objectives are complementary, but not always. Therefore, issues that principally benefit verification use cases will not be prioritized as aggressively as other issues.

zouyonghao · 2024-09-26T14:33:38Z

zouyonghao
Sep 26, 2024

Thank you for this explanation. I recently am trying CrossHair to do symbolic execution. The above discussion seems not very severe for converting CrossHair to a pure symbolic executor except aliases and the case that functions refuse variable Symbolic*.

My biggest concern about CrossHair is that it does not directly execute the program in a way of other symbolic executors like Klee. For example, when there is a branch (e.g., if), CrossHair doesn't know that there's a branch but only knows that some bool variables are created. Are these two conditions (branch vs. bool variables creation) equivalent? Or maybe my understanding is wrong.

Thank you again!

1 reply

pschanely Sep 26, 2024
Maintainer Author

Ah, so, I have a few interesting things to say here.

First, we don't force a decision when a boolean is created; for example, x < 7 produces a SymbolicBool object. However, when that is used in a conditional (branch), the interpreter will call __bool__() on the SymbolicBoolean, and that's when we pick True or False, and record a fork in our execution path tree.

Second, there are several cases where we need to do something that regular Python doesn't support. One example here is that not x would normally (prematurely) force a decision; there is no special __not__() method to override Python's behavior. To avoid this, and several other similar issues, we install a tracer and monitor currently executing opcodes. When we are about to execute a UNARY_NOT opcode, we'll (with the help of a very invasive C extension module) check whether the input is symbolic and perform a symbolic negation instead. At some point in the future, we'll likely move the decision from inside __bool__() to opcode intercepts for the various branching opcodes, but we haven't had a need to do that yet.

All that said, I think the timing of the decision changes the efficiency of the path search, but not the completeness. From a completeness perspective, it's OK to move decisions forward, so long as you eventually exhaust the path tree.

If you decide to proceed with CrossHair, I would love it if you let me know what works well and what doesn't!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is CrossHair a Software Verification Tool? #156

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

Is CrossHair a Software Verification Tool? #156

pschanely Feb 11, 2022 Maintainer

Cases where CrossHair might incorrectly return CONFIRMED instead of REFUTED

Cases where the result is caped to UNKNOWN

Replies: 1 comment · 1 reply

zouyonghao Sep 26, 2024

pschanely Sep 26, 2024 Maintainer Author

pschanely
Feb 11, 2022
Maintainer

Cases where CrossHair might incorrectly return `CONFIRMED` instead of `REFUTED`

Cases where the result is caped to `UNKNOWN`

Replies: 1 comment 1 reply

zouyonghao
Sep 26, 2024

pschanely Sep 26, 2024
Maintainer Author