-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: New core API #341
base: master
Are you sure you want to change the base?
WIP: New core API #341
Conversation
…. add new Fun trait to enable this
… QT build * implement `Fun` for `neo::Function` * Make cli app_logic completely generic (can now properly compare the two) * Add generic asm printers to display * Add generic IL printer * Add SSAFunction trait for data-flow based analysis * Move ssa_conversion into cli, to be lazy * Impl SSAFunction as stubs for neo::Function (does nothing currently) * Fix QT build
* Completely port old function to use petgraph * add DataFlow trait which implements data flow algorithms * update data-flow to use petgraph * temporarily comment neo usage in cli as it cannot be passed into app_logic until il generics are implemented * fix tests in data-flow * remove postorder method in function
…aFlow impl for ssa_conversion()
generic function
… and in function using petgraph. remove unnecessary imports field from Project
completely remove graph_algos
I'm not sure I fully understand what happened. |
@m4b ok, why do you put all this Function trait stuff here? Shouldn't it be in the RRC crate? |
It was a shim for compat between old function and new. It’s conpletely removed in m4b/function_container branch. Also after porting I discovered that’s there’s no benefit whatsoever to the old function, it was just the RREIL being faster than neo in some cases. This will all be clearer if you checkout the branch above. So the function container branch is based off of this. I think the developments there are the light and the truth going forward. It presents a single container format, like vec, but for IL. Three IL are provided: RREIL, neo, and a noop language. Unfortunately I discovered that the changes w.rt perf and other stuff were escalating completely out of control so I’ve been doing everything I can to get it ported and building again. However the changes introduced here will 100% conflict causing more time loss while I rebase. That’s fine but we should endeavor to not touch same things after this. So I also ported abstract interpretation over to thE new function format, but only for RREIL, not neo. Only thing that remains is:
But I think I need help, I’m feeling pretty overwhelmed :/ So I suggest you let me rebase what you have here, and then review the merge I’ll create. To be clear I think the changes I’ve made are really awesome and will set the road going forward but I did not estimate the time required and I’m starting to burn out, even tho I’m close. |
E.g., here is a perf from a57a929 with some local changes:
The old format (rreil + ported to petgraph) is just slower than new function format + rreil |
I had a quick look at the function_container branch and I honestly have no idea what's happening there. It's 32 commits and thousands of lines. I appreciate your work on Panopticon, but you need to slow down a bit. First, there is no need to make Program, Function of whatever generic. We only need one IL, there is no benefit of supporting more. This stuff fits better into the RRC crate. Second, I won't merge your branch as it is now. It's way to much changes, many of them have nothing to do the core API and you nuked the AVR disassembler in the process. Also moving the old function type to petgraph is completely unnecessary. It will be replaced anyway. I simply need a Panopticon that's reasonable fast and has a simple API so I can build an decompiler, adding support for different ILs and rewriting the function type completely again does not help here. I want you as a contributor and get your code into the project but you're making it really hard 🤕 |
This is very long, sorry, but I would urge you to be patient and read :)
Unfortunately, the way panopticon is structured and written is that making any changes to core ripples through every single crate. I'm aware of the avr disassembler, but I'm not sure its no longer disassembling incorrectly, just that the tests are failing; that was due to changes in region, they can be reverted.
I will address this below, but to reiterate, the API of #[derive(Debug, Serialize, Deserialize)]
pub struct Function<IL = Bitcode> {
/// The name of this function
pub name: Str,
uuid: Uuid,
// sort by rev. post order
code: IL,
// sort by rev. post order
basic_blocks: Vec<BasicBlock>,
// sort by area.start
mnemonics: Vec<Mnemonic>,
cflow_graph: Graph<CfgNode,Guard>,
entry_point: BasicBlockIndex,
kind: FunctionKind,
aliases: Vec<String>,
} This affects your use of it hardly at all; I had to change precisely 0% of But to get at the heart of the matter, I believe I need to address the following:
Respectfully, I disagree, for several reasons. For one, consider the situation that you're in right now. You are porting some portions to a new IL.
Then if you succeed with all that, you have all of the qt binary to port, as well as cli. You then have all the tests to port. You cannot port the disassembler, because it lifts to Consequently, you now have 2 ILs, whether you like it or not. On top of that, the RREIL il will almost certainly always be faster, since it does not do encoding - as you said elsewhere, the bitcode encoding trades space for time. The benchmarks I've run confirm this. One theme here is how difficult it was to add/change an IL; having the function container generic makes that essentially a moot point. Why is that interesting? Perhaps you will change the IL in the future, like you did just now. Or perhaps you want to run a specialized IL for a different purpose. Or perhaps someone wants to implement VEX, but reuse all the disassembler logic, cfg construction, etc., using panopticon as a library. For example, all for my usecase, all I need is an IL that has a single call statement, and drops everything else. Disassembling to a full data-flow capable language with phi functions and executing operations is wasteful, in both space and time, for my purposes. With a generic container, just like a generic vector, that becomes easy and possible, and it will not require anyone to modify core. Indeed, I view the structure of Function now essentially complete, and clients need only worry about the language, or manipulating/iterating and performing whatever analysis they need with the function in their own crates. I can literally write this call-only language now, and use the With that being said, yes, I got quite carried away; some of the changes I'm happy to revert, though I think they were all done for very good reasons. For example, the region api changes added instant perf improvements for essentially free, by just making it simpler. Other changes outside of core are unavoidable, precisely for the reasons I detailed above; core as it's written now is completely tightly coupled with the rest of the crates - its really quite brittle, but again, the changes in function container imho will fix this. I also understand there's too many changes; I should have done it more incrementally, but it's quite hard, for the above reasons. To illustrate, making function generic changes client usage barely at all. E.g.: for (idx, bb) in function.basic_blocks() {
for statement in function.statements(idx) {
// etc will be either Bitcode or RREIL
}
} Only in tests typically, where a function is constructed and then iterated do you need to locally annotate (because rust type inference sucks in these cases), which is still as simple as Indeed, the way it is engineered right now, this day, is that this will all work for anything which implements the language trait: /// A sequence of statements in an Intermediate Language. Implement this for your new IL
pub trait Language {
/// A statement of this intermediate language; it currently must be able to convert itself from
/// standard panopticon RREIL
type Statement: From<Statement>;
/// Add a statement
fn push(&mut self, statement: Self::Statement) -> Result<usize>;
/// Add several statements
fn append(&mut self, statements: Vec<Self::Statement>) -> Result<Range<usize>>;
/// How many statements
fn len(&self) -> usize;
/// Hint for number of unique strings or variables in this sequence
fn number_of_strings(&self) -> Option<usize> {
None
}
} I need to emphasize, this is already implemented for both RREIL (actually a I personally think this is an absolutely tremendous gain for abstraction, flexibility, and maintainability. The only extra burden is when creating a completely generic algorithm for function, but this is always more verbose and harder to read, since its completely generic - and then it will work with anything, and you won't have to modify the code... Anyway, I can continue arguing for what I believe is the technically superior, future looking and easier to maintain, generic version, but if you are completely set that panopticon will never support multiple IL in the manner I've detailed, then I shouldn't bother continuing to write more. I will however be very disappointed. Lastly, I should also note that I spoke with you in depth about adding multiple IL and adding a generic function container format and you never expressed any of the opinions you are expressing now, so I'm quite shocked, frankly. If I had known you had strong opinions about there being no point to extra IL I would likely never have begun work on it, which I do find somewhat galling... On the other hand, if you are amenable to a more reasonable, incremental merge (again, apologies on so many changes, but much of it was experimentally adding/removing things, and meant to be squashed into single commit), I can look at decreasing the changes and the diff noise so you can better review it. Please let me know what you think, thanks for reading and sorry this was so long! |
While I do add/remove instructions to Panopticons RREIL dialect the majority of the language stays the same. This it different from supporting a whole different IL like VEX or ESIL. These have completely different concepts. I don't see how adding an trait to Function and everything it touches (nearly every function in Panopticon) helps with that. You would still need to change the abstract interpreter/abstract domains and all other parts in order to support another IL.
While I admit that I don't have the exact IL instruction set nailed yet, this is overkill. To be completely honest: Panopticon is not a experimentation field for ILs. It's graphical disassembler that implements static analysis in a way non-experts can use it. It's supposed to be fast and easy to deploy and use. I have no interest in appealing to academics. They have angr & BAP. I'm not completey opposed to people using Panopticon as a general binary program analysis library, it's just not no. 1 priority.
Ok, how about we add a simple flag that tells Function to discard all RREIL except calls when disassembling?
This is very sad. The reason it took me so long to answer is that I had no idea how to respond to this and I still don't know. I'm really sorry this happend. I want to prevent Panopticon losing it's focus and becoming a dumping ground for everything RE related like some other projects. While I don't believe Panopticon is the right place for an IL trait I still think the RRC/Illuminati crate needs something like this. Every Rust based RE project implements its own IL anyway so in order to have the same algorithm implementation to work with Falcon, Radeco & Panopticon the algos need to be parameterized for the IL. That said, the changes you made to Region, Layer and Disassembler to get rid of the graph-algos crate are great. In case you're still interested in contributing I'd welcome a PR the these changes cherry-picked onto master. |
Sorry for delay, been too busy. So first and foremost, everything is fine, don't stress :D I wrote a bunch of code; it didn't get merged, its fine ;) I don't want to distract you from working on the stuff you want to work on. That being said, I just don't have the time to dig through the commits, and extract them; also I don't really feel like it, since its drudgery and I honestly don't see the point of doing it. Of course feel free to cherry-pick anything you need. So to clarify, this PR attempted to:
On hindsight 4. was a serious tactical error and only increased the diff noise, and should not have been done, but I was impatient. Basically doing 2. more or less makes sense to do 1. And doing 3. adds almost no diff noise to 2, but as you said, its not what you have in mind (which is totally fine of course). Anyway, for a path forward, to put it bluntly, you have your work cut out for you. A huge amount of refactoring needs to be done to get the whole project properly compiling and using the new IL, let alone updating all of the tests. And you still won't be rid of the original RREIL since the disassembler uses it; which was one of my points attempting to motivate the function container idea, but I won't harp on that more :P I wish I'd known beforehand that generic function container was off the table to begin with, as some of the changes I made could have gone through, but alas, that's just not how it happened. And again, I'm sorry but I'm just not motivated to go through and extract the relevant parts, discarding imho a good technical decision. I very much look forward to seeing a disassembler come out of this; so I'd advise to focus on that instead of some of the optimizations here (and even in the bitcode PR)... Perhaps we can revisit some other time, I don't know. But frankly if your vision with this project is a integrated GUI, then I'm not sure whether our visions are aligned, which again is fine. Since I do think the decisions I made are good, and I mention this entirely in good faith, not as a threat of any kind, but there is a chance I may fork at some point and use the api I developed for my own purposes, specifically as a backend lib to a tool I've been toying with off and on for a bit, but thats probably somewhat into the future. Again, don't be stressed or worried, and I very much want to see a working disassembler! The world needs one (a libre one), and hoping that this misunderstanding didn't delay such a thing :) Take care, |
😭 I thought this was my PR, sorry for closing :P |
Continued from #333
Panopticon is (still) slow and has an awkward API. This PR implements the changes suggested in #308 in response to #313.
RREIL code is encoded to save space and cut down on allocations. Also. mnemonics and basic blocks can be iterated w/o traversing the cfg. The new disassembly code is ~2 times faster (2.7s vs. 4.3s for a static Hello World).
This PR adds a new crate with (micro) benchmarks to track performance improvements over the old code.