Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite IR Interpreter #241

Closed
wants to merge 65 commits into from
Closed
Show file tree
Hide file tree
Changes from 57 commits
Commits
Show all changes
65 commits
Select commit Hold shift + click to select a range
182c9e8
refactor call to a statement & add unreachable and return jumps
ailrst Aug 19, 2024
3711e4d
move transforms out of RunUtils.scala
ailrst Aug 19, 2024
fd56c9a
remove old cfg
ailrst Aug 9, 2024
bd6b2ad
update docs
ailrst Aug 19, 2024
96b9028
pe based expr evaluator
ailrst Aug 19, 2024
d264b97
interpreter test
ailrst Aug 19, 2024
a3bf83b
rewrite interpreter in functional style
ailrst Aug 22, 2024
74c4bc2
cleanup call/return
ailrst Aug 22, 2024
9e23558
cleanup memory ops to enter effects
ailrst Aug 22, 2024
7571aa3
tracing interpreter
ailrst Aug 26, 2024
f36c13e
compile with state monad
ailrst Aug 27, 2024
2c16c83
fix state monad interp
ailrst Aug 28, 2024
dd11175
indirect calls
ailrst Aug 28, 2024
c7b4a56
tracing interpreter
ailrst Aug 28, 2024
f64f904
breakpoints
ailrst Aug 28, 2024
897c565
improve breakpoints
ailrst Aug 29, 2024
cc97052
reorg
ailrst Aug 29, 2024
695b2d2
refactor with statemonad[s, either[v]]
ailrst Aug 29, 2024
00b482a
redo error handling
ailrst Aug 29, 2024
1a139d2
refactor call to a statement & add unreachable and return jumps
ailrst Aug 19, 2024
7e94536
move transforms out of RunUtils.scala
ailrst Aug 19, 2024
aae0064
remove old cfg
ailrst Aug 9, 2024
e2c0cd4
update docs
ailrst Aug 19, 2024
a3adee3
fix
ailrst Aug 30, 2024
4a92c99
fix externals
ailrst Aug 30, 2024
b994c99
disable IDE analyses if mainproc is external
ailrst Aug 30, 2024
b83c9ff
Merge branch 'call-statement' into interpreter
ailrst Aug 30, 2024
d6c5767
work on differential testing
ailrst Aug 30, 2024
2a03a23
hook for dynlinking
ailrst Sep 2, 2024
8b694c0
load full symtab
ailrst Sep 2, 2024
57ca6b3
update relf grammar
ailrst Sep 3, 2024
b276981
init bss
ailrst Sep 3, 2024
888d360
cleanup
ailrst Sep 3, 2024
0abf654
cleanup init trace
ailrst Sep 3, 2024
9192ce3
intrinsic stub and cleanup errors
ailrst Sep 3, 2024
5342358
init relocation table
ailrst Sep 3, 2024
87b0f86
pull stepper outside effects to fix interpreter composition again
ailrst Sep 4, 2024
f5a0590
cleanup
ailrst Sep 4, 2024
a6b58e8
cleanup
ailrst Sep 4, 2024
dd64c14
constprop test with interpreter
ailrst Sep 4, 2024
8be5cb6
interpreter docs
ailrst Sep 4, 2024
d2a2486
fix list
ailrst Sep 4, 2024
d06e268
paragraph
ailrst Sep 4, 2024
4ae3997
trap eval exceptions to monadic errors
ailrst Sep 4, 2024
4fc8ed2
improve interpretOne
ailrst Sep 4, 2024
8f966ce
notes on initialisation
ailrst Sep 4, 2024
f00b44b
simplify invoc funcs
ailrst Sep 4, 2024
ceef233
note missing features
ailrst Sep 5, 2024
cff6c97
run through all system tests
ailrst Sep 5, 2024
822c127
add resource limit
ailrst Sep 5, 2024
9b8f715
doc resource limit
ailrst Sep 5, 2024
648a41d
tweak doc
ailrst Sep 5, 2024
dfa7209
fix
ailrst Sep 5, 2024
38f871c
tweak interpretrlimit
ailrst Sep 5, 2024
6559180
basic malloc implementation
ailrst Sep 9, 2024
c6e938d
implement printf
ailrst Sep 10, 2024
285099b
cleanup intrins
ailrst Sep 10, 2024
aac7f5c
cleanup
ailrst Sep 23, 2024
1c2a5d5
Merge remote-tracking branch 'upstream/main' into interpreter
ailrst Sep 23, 2024
46ff66a
cleanup
ailrst Sep 23, 2024
970c55a
Merge branch 'main' into interpreter
Nov 8, 2024
1a8067c
make compiler options consistent
Nov 8, 2024
32bf439
fix use of deprecated examples
Nov 8, 2024
9cc3c31
fix use of deprecated examples
Nov 8, 2024
fe3678f
give State filename consistent with Package, formatting, clean up imp…
Nov 10, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 39 additions & 6 deletions docs/basil-ir.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
BASIL IR is the intermediate representation used during static analysis.
This is on contrast to Boogie IR which is used for specification annotation, and output to textual boogie syntax that can be run through the Boogie verifier.


The grammar is described below, note that the IR is a data-structure, without a concrete textual representation so the below grammar only represents the structure.
We omit the full description of the expression language because it is relatively standard.

Expand All @@ -13,13 +14,16 @@ The IR has a completely standard simple type system that is enforced at construc
Program ::=&~ Procedure* \\
Procedure ::=&~ (name: ProcID) (entryBlock: Block) (returnBlock: Block) (blocks: Block*) \\
&~ \text{Where }entryBlock, returnBlock \in blocks \\
Block ::=&~ BlockID \; (Statement*)\; Jump \; (fallthrough: (GoTo | None))\\
&~ \text{Where $fallthough$ may be $GoTo$ IF $Jump$ is $Call$} \\
Block_1 ::=&~ BlockID \; Statement*\; Call? \; Jump \; \\
Block_2 ::=&~ BlockID \; (Statement | Call)*\; Jump \; \\
\\
&~ Block = Block_1 \text{ is a structural invariant that holds during all the early analysis/transform stages}
\\
Statement ::=&~ MemoryAssign ~|~ LocalAssign ~|~ Assume ~|~ Assert ~|~ NOP \\
ProcID ::=&~ String \\
BlockID ::=&~ String \\
\\
Jump ::=&~ Call ~|~ GoTo \\
Jump ::=&~ GoTo ~|~ Unreachable ~|~ Return \\
GoTo ::=&~ \text{goto } BlockID* \\
Call ::=&~ DirectCall ~|~ IndirectCall \\
DirectCall ::=&~ \text{call } ProcID \\
Expand All @@ -46,6 +50,33 @@ Endian ::=&~ BigEndian ~|~ LittleEndian \\
\end{align*}
```

- The `GoTo` jump is a multi-target jump reprsenting non-deterministic choice between its targets.
Conditional structures are represented by these with a guard (an assume statement) beginning each target.
- The `Unreachable` jump is used to signify the absence of successors, it has the semantics of `assume false`.
- The `Return` jump passes control to the calling function, often this is over-approximated to all functions which call the statement's parent procedure.

## Translation Phases

#### IR With Returns

- Immediately after loading the IR return statements may appear in any block, or may be represented by indirect calls.
The transform pass below replaces all calls to the link register (R30) with return statements.
In the future, more proof is required to implement this soundly.

```
cilvisitor.visit_prog(transforms.ReplaceReturns(), ctx.program)
transforms.addReturnBlocks(ctx.program, true) // add return to all blocks because IDE solver expects it
cilvisitor.visit_prog(transforms.ConvertSingleReturn(), ctx.program)
```

This ensures that all returning, non-stub procedures have exactly one return statement residing in their `returnBlock`.

#### Calls appear only as the last statement in a block

- The structure of the IR allows a call may appear anywhere in the block but for all the analysis passes we hold the invariant that it
only appears as the last statement. This is checked with the function `singleCallBlockEnd(p: Program)`.
And it means for any call statement `c` we may `assert(c.parent.statements.lastOption.contains(c))`.

## Interaction with BASIL IR

### Constructing Programs in Code
Expand All @@ -62,10 +93,12 @@ var program: Program = prog(
block("first_call",
Assign(R0, bv64(1), None)
Assign(R1, bv64(1), None)
directCall("callee1", Some("second_call"))
directCall("callee1"),
goto("second_call"))
),
block("second_call",
directCall("callee2", Some("returnBlock"))
directCall("callee2"),
goto("returnBlock")
),
block("returnBlock",
ret
Expand All @@ -82,7 +115,7 @@ program ::= prog ( procedure+ )
procedure ::= proc (procname, block+)
block ::= block(blocklabel, statement+, jump)
statement ::= <BASIL IR Statement>
jump ::= call_s | goto_s | ret
jump ::= goto_s | ret | unreachable
call_s ::= directCall (procedurename, None | Some(blocklabel)) // target, fallthrough
goto_s ::= goto(blocklabel+) // targets
procname ::= String
Expand Down
203 changes: 203 additions & 0 deletions docs/development/interpreter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
# BASIL IR Interpreter

The interpreter is designed for testing, debugging, and validation of static analyses and code transforms.
This page describes first how it can be used for this purpose, and secondly its design.

## Basic Usage

The interpreter can be invoked from the command line, via the interpret flag, by default this prints a trace and checks that the interpreter
exited on a non-error stop state.

```shell
./mill run -i src/test/correct/indirect_call/gcc/indirect_call.adt -r src/test/correct/indirect_call/gcc/indirect_call.relf --interpret
[INFO] Interpreter Trace:
StoreVar(#5,Local,0xfd0:bv64)
StoreMem(mem,HashMap(0xfd0:bv64 -> 0xf0:bv8, 0xfd6:bv64 -> 0x0:bv8, 0xfd2:bv64 -> 0x0:bv8, 0xfd3:bv64 -> 0x0:bv8, 0xfd4:bv64 -> 0x0:bv8, 0xfd7:bv64 -> 0x0:bv8, 0xfd5:bv64 -> 0x0:bv8, 0xfd1
...
[INFO] Interpreter stopped normally.
```

The `--verbose` flag can also be used, which may print interpreter trace events as they are executed, but not this may not correspond to the actual
execution trace, and contain additional events not corresponding to the program.
E.g. this shows the memory intialisation events that precede the program execution. This is mainly useful for debugging the interpreter.

### Testing with Interpreter

The interpreter is invoked with `interpret(p: IRContext)` to interpret normally and return an `InterpreterState` object
containing the final state.

#### Traces

There is also, `interpretTrace(p: IRContext)` which returns a tuple of `(InterpreterState, Trace(t: List[ExecEffect]))`,
where the second argument contains a list of all the events generated by the interpreter in order.
This is useful for asserting a stronger equivalence between program executions, but in most cases events describing "unobservable"
behaviour, such as register accesses should be filtered out from this list before comparison.

To see an example of this used to validate the constant prop analysis see [/src/test/scala/DifferentialAnalysis.scala](../../src/test/scala/DifferentialAnalysis.scala).

#### BreakPoints

Finally `interpretBreakPoints(p: IRContext, breakpoints: List[BreakPoint])` is used to
run an interpreter and perform additional actions at specified code points. For example, this may be invoked such as:

```scala
val watch = IRWalk.firstInProc((program.procedures.find(_.name == "main")).get).get
val bp = BreakPoint("entrypoint", BreakPointLoc.CMD(watch), BreakPointAction(saveState=true, stop=true, evalExprs=List(("R0", Register("R0", 64))), log=true))
val res = interpretBreakPoints(program, List(bp))
```

The structure of a breakpoint is as follows:

```scala
case class BreakPoint(name: String = "", location: BreakPointLoc, action: BreakPointAction)

// the place to perform the breakpoint action
enum BreakPointLoc:
case CMD(c: Command) // at a command c
case CMDCond(c: Command, condition: Expr) // at a command c, when condition evaluates to TrueLiteral

// describes what to do when the breakpoint is triggered
case class BreakPointAction(
saveState: Boolean = true, // stash the state of the interpreter
stop: Boolean = false, // stop the interpreter with an error state
evalExprs: List[(String, Expr)] = List(), // Evaluate the rhs of the list of expressions, and stash them (lhs is an arbitrary human-readable name)
log: Boolean = false // Print a log message about passing the breakpoint describing the results of this action
)
```

To see an example of this used to validate the constant prop analysis see [/src/test/scala/InterpretTestConstProp.scala](../../src/test/scala/InterpretTestConstProp.scala).

### Resource Limit

This kills the interpreter in an error state once a specified instruction count is reached, to avoid the interpreter running forever on infinite loops.

It can be used simply with the function `interptretRLimit`, this automatically ignores the initialisation instructions.

```scala
def interpretRLimit(p: IRContext, instructionLimit: Int) : InterpreterState
```

It can also be combined with other interpreters as shown:

```scala
def interp(p: IRContext, instructionLimit: Int) : (InterpreterState, Trace) = {
val interpreter = LayerInterpreter(tracingInterpreter(NormalInterpreter), EffectsRLimit(instructionLimit))
val initialState = InterpFuns.initProgState(NormalInterpreter)(p, InterpreterState())
BASILInterpreter(interpreter).run((initialState, Trace(List())), 0)._1
}
```

## Implementation / Code Structure

### Summary

- [Bitvector.scala](../../src/main/scala/ir/eval/Bitvector.scala)
- Evaluation of bitvector operations, throws `IllegalArgumentException` on violation of contract
(e.g negative divisor, type mismatch)
- [ExprEval.scala](../../src/main/scala/ir/eval/ExprEval.scala)
- Evaluation of expressions, defined in terms of partial evaluation down to a Literal
- This can also be used to evaluate expressions in static analyses, by passing a function to query variable assignments and memory state from the value domain.
- [Interpreter.scala](../../src/main/scala/ir/eval/Interpreter.scala)
- Definition of core `Effects[S, E]` and `Interpreter[S, E]` types describing state transitions in
the interpreter
- Instantiation/definition of `Effects` for concrete state `InterpreterState`
- [InterpreterProduct.scala](../../src/main/scala/ir/eval/InterpreterProduct.scala)
- Definition of product and layering composition of generic `Effects[S, E]`s interpreters
- [InterpretBasilIR.scala](../../src/main/scala/ir/eval/InterpretBasilIR.scala)
- Definition of `Eval` object defining expression evaluation in terms of `Effects[S, InterpreterError]`
- Definition of `Interpreter` instance for BASIL IR, using a generic `Effects` instance and concrete state.
- Definition of ELF initialisation in terms of generic `Effects[S, InterpreterError]`
- [InterpretBreakpoints.scala](../../src/main/scala/ir/eval/InterpretBreakpoints.scala)
- Definition of a generic interpreter with a breakpoint checker layered on top
- [interpretRLimit.scala](../../src/main/scala/ir/eval/InterpretRLimit.scala)
- Definition of layered interpreter which terminates after a specified cycle count
- [InterpretTrace.scala](../../src/main/scala/ir/eval/InterpretTrace.scala)
- Definition of a generic interpreter which records a trace of calls to the `Effects[]` instance.

### Explanation

The interpreter is structured for compositionality, at its core is the `Effects[S, E]` type, defined in [Interpreter.scala](../../src/main/scala/ir/eval/Interpreter.scala).
This type defines a small set of functions which describe all the possible state transformations, over a concrete state `S`, and error type `E` (always `InterpreterError` in practice).

This is implemented using the state Monad, `State[S,V,E]` where `S` is the state, `V` the value, and `E` the error type.
This is a flattened `State[S, Either[E]]`, defined in [util/functional.scala](../../src/main/scala/util/functional.scala).
`Effects` methods return delayed computations, functions from an input state (`S`) to a resulting state and a value (`(S, Either[E, V])`).
These are sequenced using `flatMap` (monad bind), or the `for{} yield()` syntax sugar for flatMap.

This `Effects[S, E]` is instantiated for a given concrete state, the main example of which is `NormalInterpreter <: Effects[InterpreterState, InterpreterError]`,
also defined in `Interpreter.scala`. The memory component of the state is abstracted further into the `MemoryState` object.

The actual execution of code is defined on top of this, in the `Interpreter[S, E]` type, which takes an instance of the `Effects` by parameter,
and defines both the small step (`interpretOne`) over on instruction, and the fixed point to termination from some in initial state in `run()`.
The fact that the stepping is defined outside the effects is important, as it allows concrete states, and state transitions over them to be
composed somewhat arbitrarily, and the interpretatation of the language compiled down to calls to resulting instance of `Effects`.

This is defined in [InterpretBasilIR.scala](../../src/main/scala/ir/eval/InterpretBasilIR.scala). `BASILInterpreter` defines an
`Interpreter` over an arbitrary instance of `Effects[S, InterpreterError]`, encoding BASIL IR commands as effects.
This file also contains definitions of the initial memory state setup of the interpreter, based on the ELF sections and symbol table.

### Composition of interpreters

There are two ways to compose `Effects`, product and layer. Both produce an instance of `Effects[(L, R), E]`,
where `L` and `R` are the concrete state types of the two Effects being composed.

Product runs the two effects, over two different concrete state types, simultaneously without interaction.

Layer runs the `before` effect first, and passes its state to the `inner` effect whose value is returned.

```scala
case class ProductInterpreter[L, T, E](val inner: Effects[L, E], val before: Effects[T, E]) extends Effects[(L, T), E] {
case class LayerInterpreter[L, T, E](val inner: Effects[L, E], val before: Effects[(L, T), E])
```

Examples of using these are in the `interpretTrace` and `interpretWithBreakPoints` interpreters respectively.

Note, this only works by the aforementioned requirement that all effect calls come from outside the `Effects[]`
instance itself. In the simple case, the `Interpreter` instance is the only object calling `Effects`.
This means, `Effects` triggered by an inner `Effects[]` instance do not flow back to the `ProductInterpreter`,
but only appear from when `Interpreter` above the `ProductInterpreter` interprets the program via effect calls.
For this reason if, for example, `NormalInterpreter` makes effect calls they will not appear in a trace emitted by `interptretTrace`.

### Note on memory space initialisation

Most of the interpret functions are overloaded such that there is a version taking a program `interpret(p: Program)`,
and a version taking `IRContext`. The variant taking IRContext uses the ELF symbol information to initialise the
memory before interpretation. If you are interpreting a real program (i.e. not a synthetic example created through
the DSL), this is most likely required.

We initialise:

- The general interpreter state, stack and memory regions, stack pointer, a symbolic mapping from addresses functions
- The initial and readonly memory sections stored in Program
- The `.bss` section to zero
- The relocation table. Each listed offset is stored an address to either a real procedure in the program, or a
location storing a symbolic function pointer to an intrinsic function.

`.bss` is generally the top of the initialised data, the ELF symbol `__bss_end__` being equal to the symbol `__end__`.
Above this we can somewhat choose arbitrarily where to put things, usually the heap is above, followed by
dynamically linked symbols, then the stack. There is currently no stack overflow checking, or heap implemented in the
interpreter.

Unfortunately these details are defined by the load-time linker and the system's linker script, and it is hard to find a good description
of their behaviour. Some details are described here https://refspecs.linuxfoundation.org/elf/elf.pdf, and here
https://dl.acm.org/doi/abs/10.1145/2983990.2983996.

### Missing features

- There is functionality to implement external function calls via intrinsics written in Scala code, but currently only
basic printf style functions are implemented as no-ops. These can be extended to use a file IO abstraction, where
a memory region is created for each file (e.g. stdout), with a variable to keep track of the current write-point
such that a file write operation stores to the write-point address, and increments it by the size of the store.
Importantly, an implementation of malloc() and free() is needed, which can implement a simple greedy allocation
algorithm.
- Despite the presence of procedure parameters in the current IR, they are not used for by the boogie translation and
are hence similarly ignored in the interpreter.
- The interpreter's immutable state representation is motivated by the ability to easily implement a sound approach
to non-determinism, e.g. to implement GoTos with guessing and rollback rather than look-ahead. This is more
useful for checking specification constructs than executing real programs, so is not yet implemented.
- The trace does not clearly distinguish internal vs external calls, or observable
and non-observable behaviour.
- While the interpreter semantics supports memory regions, we do not initialise the memory regions (or the initial memory state)
based on those present in the program, we simply assume a flat `mem` and `stack` memory partitioning.


1 change: 1 addition & 0 deletions docs/development/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
- [tool-installation](tool-installation.md) Guide to lifter, etc. tool installation
- [scala](scala.md) Advice on Scala programming.
- [cfg](cfg.md) Explanation of the old CFG datastructure
- [interpreter](interpreter.md) Explanation of IR interpreter


## Scala
Expand Down
1 change: 1 addition & 0 deletions docs/readme.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ To get started on development, see [development](development).
- [editor-setup](development/editor-setup.md) Guide to basil development in IDEs
- [tool-installation](development/tool-installation.md) Guide to lifter, etc. tool installation
- [cfg](development/cfg.md) Explanation of the old CFG datastructure
- [interpreter](development/interpreter.md) Explanation of IR interpreter
- [basil-ir](basil-ir.md) explanation of BASIL's intermediate representation
- [compiler-explorer](compiler-explorer.md) guide to the compiler explorer basil interface
- [il-cfg](il-cfg.md) explanation of the IL cfg iterator design
Expand Down
2 changes: 1 addition & 1 deletion src/main/antlr4/ReadELF.g4
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ symbolTableHeader :
'Symbol table' tableName 'contains' HEX 'entries:' NEWLINE
'Num:' 'Value' 'Size' 'Type' 'Bind' 'Vis' 'Ndx' 'Name' // Mainly a sanity check for the column order
;
symbolTableRow : HEX ':' value=HEX size=HEX entrytype=STRING bind=STRING vis=STRING (HEX | STRING) name=(HEX | STRING)? STRING? NEWLINE;
symbolTableRow : (num=HEX) ':' value=HEX size=HEX entrytype=STRING bind=STRING vis=STRING ndx=(HEX | STRING) name=(HEX | STRING)? STRING? NEWLINE;
// symbolTableRow : HEX ':' HEX HEX symbolType bind vis ndx name? ;

tableName : '\'' STRING '\'' ;
Expand Down
Loading