Added first four exercises of start-guide #133
base: master
Hey @josh-nook,
There are a few small comments to address and the ending needs some polish (as you already know), but otherwise I think it is quite a good tutorial. Well done!
>
> **Modification:** The altering of a program

So altogether, a DBM _Tool_ is a program that can alter natively compiled user-space binary during runtime, with no source code required. We could take `simple_program` and pass it through to MAMBO as we did before, but instead of simply executing it, we could perform all sorts of modifications on it. Examples of these include:
A small detail. Technically MAMBO is a DBM framework, whereas a tool is MAMBO + a specific plugin (e.g., MAMBO memcheck). Admittedly, we have been quite bad at making that distinction ourselves, but if possible, let's try to call MAMBO a DBM framework.
>
> **Debugging:** Detecting memory faults within a program

MAMBO isn't by any means the first DBM Tool to exist. [Pin](https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html), [Qemu](https://www.qemu.org), and [DynamoRIO](https://dynamorio.org) are all examples of DBM-based tools. So if other options are available, what is the purpose of MAMBO?
Here the terminology is more important. It should say DBM frameworks, not DBM-based tools, since stuff like Pintools are clearly defined by PIN.

### Why MAMBO?

MAMBO was created as part of Cosmin Gorgovan's EPSRC-funded PhD in the School of Computer Science at the University of Manchester, with a handful of properties that distinguish it from other DBMs:
A nitpick. It reads like Cosmin created all those features; however, Guillermo optimised it for ARM64, and Alistair developed RISC-V support. I would say something along the lines of: "It was initially developed by Cosmin's ... with other people contributing since then".

This exercise will go through how a program like our `simple_program` is executed using MAMBO, step-by-step. It's not _necessary_ content for the rest of the tutorial, but it'll certainly help you fully grasp MAMBO if you want to contribute to the project.

This exercise will obfuscate for the sake of simplicity much of how MAMBO works, most notably with optimisations regarding branches.
The second part of the sentence needs a better flow. Maybe: "..., most notably branch optimisations".
</div>
<br>

For simplicity, portability, and full control over execution, DBM Tools often **load target programs within their own address space**. This cannot be done with `ld`, shown on the LHS of the diagram below, so we must implement a userspace loader which for MAMBO is `libelf`:
I would just add at the end: "Not to be confused with the Linux-provided libelf: https://archlinux.org/packages/core/x86_64/libelf/"

Most optimisations are to do with the main source of overhead in DBM tools: indirect branches. Descriptions of optimisations are out of the scope of this tutorial, so a handful of them are outlined below:

- **Inline hash lookups** are instrumented at the end of code blocks
- **Hot Paths** between basic blocks are identified and directly linked
This needs a bit of updating. Hot paths and traces are directly related, as traces are created for hot paths. Also, as far as I remember, traces can contain conditional branches. So the bullet points should be something like (not in these exact words):
- Indirect branch opts - I think that is correct
- Direct linking of direct branches (conditional and unconditional) to avoid calling the dispatcher
- Traces to optimise hot paths
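
To make the direct-linking bullet concrete, here is a toy Python model of why linking direct branches avoids calling the dispatcher. It is only a sketch and not MAMBO code: `count_dispatcher_entries`, the block addresses, and the executed-block sequence are all made up for illustration, and the real framework patches branch instructions in the code cache rather than keeping a set of edges.

```python
def count_dispatcher_entries(executed_blocks, direct_linking):
    """Count how often control returns to the dispatcher (toy model).

    executed_blocks: hypothetical sequence of basic block addresses, in the
    order they run, all connected by direct branches.
    direct_linking: if True, an edge resolved once by the dispatcher is
    treated as patched, so later traversals skip the dispatcher.
    """
    linked_edges = set()
    entries = 1  # the first block is always reached through the dispatcher
    for src, dst in zip(executed_blocks, executed_blocks[1:]):
        if direct_linking and (src, dst) in linked_edges:
            continue  # patched branch jumps straight to the translated block
        entries += 1  # dispatcher looks up (or translates) the target
        linked_edges.add((src, dst))
    return entries


# Hypothetical run: an entry block, a loop body executed four times, an exit.
executed = [0x1000, 0x2000, 0x2000, 0x2000, 0x2000, 0x3000]
without_linking = count_dispatcher_entries(executed, direct_linking=False)
with_linking = count_dispatcher_entries(executed, direct_linking=True)
```

In this model every distinct edge costs one dispatcher entry when linking is on, so the saving grows with the number of loop iterations.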
- Instruction specific events


Callback functions that we write for inserting code into basic blocks (instrumentation) are registered with **scantime** with *MAMBO generated scantime events* i.e. when our target program is passed through the code scanner:
What about: "... are registered on scan-time to be executed with MAMBO generated scan-time events".
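
A small toy model may help readers see the scan-time versus run-time distinction this sentence is describing. This is a hedged Python sketch, not the MAMBO plugin API: `ToyScanner`, `register_pre_bb_callback`, and `count_block` are hypothetical names, and the instruction strings are placeholders.

```python
class ToyScanner:
    """Toy code scanner: callbacks run once per block, at scan time."""

    def __init__(self):
        self.pre_bb_callbacks = []
        self.code_cache = {}  # source address -> translated instruction list

    def register_pre_bb_callback(self, callback):
        self.pre_bb_callbacks.append(callback)

    def scan(self, addr, instructions):
        # A block is scanned only the first time it is seen; afterwards the
        # cached translation (including any inserted code) is reused.
        if addr not in self.code_cache:
            inserted = []
            for callback in self.pre_bb_callbacks:
                inserted.extend(callback(addr))  # scan-time event fires here
            self.code_cache[addr] = inserted + list(instructions)
        return self.code_cache[addr]


scan_events = []


def count_block(addr):
    # Runs at scan time; the returned pseudo-instruction would run every
    # time the translated block executes.
    scan_events.append(addr)
    return [f"increment counter[{addr:#x}]"]


scanner = ToyScanner()
scanner.register_pre_bb_callback(count_block)
for _ in range(3):  # the same block is requested three times
    translated = scanner.scan(0x1000, ["add w8, w8, #1", "b .LBB0_1"])
```

The callback itself fires once, when the block is first scanned, while the code it inserted stays in the cached block for every later execution.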
}
```

We've included also included a print statement so we can see the filename location and start address of each basic block.
"included" is repeated twice.

The lighter sections within the label blocks represent a single basic block. For all but `LBB0_1`, they have one basic block as they end on a branch statement.

`LBB0_1` however has two basic blocks. This is because there are two branch statements: `b.ge .LBB0_4` and `b .LBB0_2`. Since a basic block is **strictly** single entry and single branch exit, `LBB0_1` will constitute two separate basic blocks in the code cache.
Branch-link (`bl`) also splits the assembly block into 2 basic blocks.
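
As a rough illustration of the splitting rule discussed here, the following Python sketch cuts a linear instruction listing into basic blocks at every branch, including `bl`. It is a simplification under stated assumptions: the `BRANCHES` set is an illustrative subset of AArch64 branch mnemonics, the non-branch instructions in the examples are invented, and a real DBM discovers blocks dynamically from branch targets rather than from a static listing.

```python
# Illustrative subset of AArch64 branch mnemonics (an assumption, not
# MAMBO's actual decoder tables).
BRANCHES = {"b", "bl", "br", "blr", "ret", "cbz", "cbnz", "tbz", "tbnz"}


def is_branch(mnemonic):
    # Conditional branches are written b.<cond>, for example b.ge
    return mnemonic in BRANCHES or mnemonic.startswith("b.")


def split_basic_blocks(instructions):
    """Split a straight-line listing so each block ends at a branch."""
    blocks, current = [], []
    for inst in instructions:
        current.append(inst)
        if is_branch(inst.split()[0]):
            blocks.append(current)
            current = []
    if current:  # trailing instructions with no terminating branch
        blocks.append(current)
    return blocks


# Two branches under one label give two basic blocks, as with LBB0_1.
label_block = ["ldr w8, [sp, #8]", "cmp w8, #10", "b.ge .LBB0_4", "b .LBB0_2"]
label_blocks = split_basic_blocks(label_block)

# A branch-link in the middle of a label block splits it as well.
call_block = ["mov w0, #1", "bl printf", "add w8, w8, #1", "b .LBB0_1"]
call_blocks = split_basic_blocks(call_block)
```

Both example listings come out as two basic blocks, which matches the `LBB0_1` explanation and the `bl` remark.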

// TODO

// I'm unsure as to why there are only 5 basic blocks and not 6.
I would run `objdump` on the binary and look at the addresses; this may shed some light on what is happening.
Thanks for the feedback, I'll work on these at some point this week
Ignore in current state, this is for myself and another developer to go through over voice call.