
WIP: New core API #333

Closed — wants to merge 9 commits into das-labor:master from flanfly:bitcode
Conversation

@flanfly (Member) commented Oct 10, 2017

Panopticon is (still) slow and has an awkward API. This PR implements the changes suggested in #308 in response to #313.
RREIL code is encoded to save space and cut down on allocations. Also, mnemonics and basic blocks can be iterated without traversing the CFG. The new disassembly code is ~2x faster (2.7s vs. 4.3s for a static Hello World).

This PR adds a new crate with (micro) benchmarks to track performance improvements over the old code.

@m4b (Collaborator) commented Oct 10, 2017

Just briefly: given the way you've implemented function in neo, I think this is the perfect opportunity to introduce a Function trait which both the old and new types implement; then analyze can be generic over the container (e.g., function) that it generates. I won't have to update the CLI at all, and can easily test performance differences between old and new by simply passing a command-line flag.

Similarly, program and project can be generic over their functions.

I also see no reason why liveness and the other data-flow analyses couldn't be completely generic, reusing all of that tested application logic.

It may be some extra work at the outset, but I think it is 100% worth it. And if we do come up with a nice generic function API, we can upstream it to the RRC.
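A minimal sketch of the trait boundary being proposed, assuming a hypothetical `Function` trait — every name below (`entry_address`, `basic_block_count`, the `analyze` signature) is illustrative, not Panopticon's actual API:

```rust
// Hypothetical trait both the old and neo function types could satisfy.
trait Function {
    fn name(&self) -> &str;
    fn entry_address(&self) -> u64;
    fn basic_block_count(&self) -> usize;
}

// Stand-ins for the two implementations.
struct OldFunction { name: String, entry: u64, blocks: usize }
struct NeoFunction { name: String, entry: u64, blocks: usize }

impl Function for OldFunction {
    fn name(&self) -> &str { &self.name }
    fn entry_address(&self) -> u64 { self.entry }
    fn basic_block_count(&self) -> usize { self.blocks }
}

impl Function for NeoFunction {
    fn name(&self) -> &str { &self.name }
    fn entry_address(&self) -> u64 { self.entry }
    fn basic_block_count(&self) -> usize { self.blocks }
}

// Analysis generic over the container, so the CLI can switch
// implementations with a flag instead of duplicating its logic.
fn analyze<F: Function>(f: &F) -> String {
    format!("{} @ {:#x}: {} blocks", f.name(), f.entry_address(), f.basic_block_count())
}

fn main() {
    let old = OldFunction { name: "printf".into(), entry: 0x1000, blocks: 4 };
    let neo = NeoFunction { name: "printf".into(), entry: 0x1000, blocks: 4 };
    // The same generic code path handles both.
    assert_eq!(analyze(&old), analyze(&neo));
    println!("{}", analyze(&neo));
}
```

With this shape, a `--neo` flag only has to pick which concrete type to construct; everything downstream of `analyze` is shared.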

Also, great work! ❤️

@flanfly (Member, Author) commented Oct 10, 2017

In the long run I plan to replace the old API completely, so I'm not sure whether it's worth it to implement the trait for it. Also, I'm not 100% sure everything is right yet. For example, in Falcon, functions have a single exit node. This comes in handy at times; I just don't know yet if it's worth the extra complexity.

@m4b (Collaborator) commented Oct 12, 2017

So I'm pretty convinced generics are totally worth it here. For starters:

  1. I think the data-flow algorithms should definitely be generic (we need only fix them to petgraph and be generic over the statement IL).
  2. By making Function a trait and having disassembly and analysis be generic, we can easily implement the "turn off RREIL" feature with a struct that implements the Function trait but simply disassembles and does not create an IL.
  3. Downstream clients could create a new IL tuned for a specific use case; they need only implement the various traits and they automatically gain access to the full power of any of the analyses. Essentially, it lets clients write extensible downstream code.
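Points 2 and 3 could be sketched roughly like this, assuming a hypothetical `Statement` trait for the IL (all names here are illustrative, not the crate's real types):

```rust
// Hypothetical trait a statement IL would implement for the analyses.
trait Statement {
    fn defined_vars(&self) -> Vec<String>;
}

// A stand-in RREIL-like statement that may define a variable.
struct RreilStmt { def: Option<String> }
impl Statement for RreilStmt {
    fn defined_vars(&self) -> Vec<String> {
        self.def.iter().cloned().collect()
    }
}

// "Turn off RREIL": a statement type that carries no IL at all,
// so disassembly still works but no IL is ever built.
struct NoIl;
impl Statement for NoIl {
    fn defined_vars(&self) -> Vec<String> { Vec::new() }
}

// A toy liveness-style pass that only needs the trait, so it works
// unchanged with either IL (or any downstream client's custom IL).
fn defined<S: Statement>(stmts: &[S]) -> Vec<String> {
    stmts.iter().flat_map(|s| s.defined_vars()).collect()
}

fn main() {
    let rreil = vec![RreilStmt { def: Some("eax".into()) }, RreilStmt { def: None }];
    assert_eq!(defined(&rreil), vec!["eax".to_string()]);
    // The no-IL variant flows through the same analysis for free.
    assert!(defined(&[NoIl, NoIl]).is_empty());
    println!("ok");
}
```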

And a bunch of other reasons, besides not having to copy-paste code all over with minor tweaks. E.g., the printer logic is basically unchanged, and the app logic is completely unchanged except for a modified signature.

Anyway, with very minimal effort, I got neo::Function working with app_logic in CLI by making it totally generic.

In that branch, https://github.com/das-labor/panopticon/tree/m4b/generic_function, you can now compare the two implementations more or less directly:

e.g., on my machine:

RUST_LOG=panop=info /usr/bin/time ./panop /usr/lib/libc.so.6
52.02user 1.20system 0:16.66elapsed 319%CPU (0avgtext+0avgdata 2756540maxresident)k
0inputs+0outputs (0major+658616minor)pagefaults 0swaps

RUST_LOG=panop=info /usr/bin/time ./panop --neo /usr/lib/libc.so.6
21.26user 0.34system 0:06.98elapsed 309%CPU (0avgtext+0avgdata 450880maxresident)k
104inputs+0outputs (0major+134643minor)pagefaults 0swaps

For larger binaries, the performance speedup is honestly astounding (and we haven't even started on easy gains like deduping strings across the program). Note the comparison above isn't quite fair, since ssa_convertion is called on every function; the latest commit of that branch now lazily performs it only on filtered functions, which decreases memory by a significant amount as well as time:

/usr/bin/time cargo run --release -- -f printf /usr/lib/libc.so.6 
30.32user 0.62system 0:10.28elapsed 300%CPU (0avgtext+0avgdata 1752408maxresident)k
0inputs+0outputs (0major+285285minor)pagefaults 0swaps

That's ~1.7 GB, but still 4.25x the size of the neo version!

It gets even better, the larger the binary, the better the gains. Here is a larger binary with about 3k functions:

m4b@efrit ::  [ ~/projects/panopticon/cli ] /usr/bin/time cargo run --release -- -f printf ~/binaries/big
419.39user 4.06system 3:28.99elapsed 202%CPU (0avgtext+0avgdata 6748104maxresident)k
346240inputs+0outputs (19990major+1688060minor)pagefaults 0swaps

m4b@efrit ::  [ ~/projects/panopticon/cli ] /usr/bin/time cargo run --release -- --neo -f printf ~/binaries/big
96.14user 1.81system 0:33.81elapsed 289%CPU (0avgtext+0avgdata 1720612maxresident)k
253376inputs+0outputs (883major+629224minor)pagefaults 0swaps

--neo uses almost 1/6th the memory, and runs 6x as fast!! (And this is with the lazy ssa_conversion optimization applied to the regular functions.)

@m4b (Collaborator) commented Oct 16, 2017

Well, I have good news and bad news. The good news is that the neo version definitely yields substantial memory savings; more good news is that petgraph is a really great library.

The "bad news" is that I ported the old function to use petgraph and re-ran the benchmarks, and the old function is actually faster than neo (though, as expected, neo uses less memory). Here are a couple of tables I created to illustrate this:

libc, 2999 functions:

| graph lib | neo func | time    | RES memory |
|-----------|----------|---------|------------|
| petgraph  | yes      | 0:18.16 | 504424 KB  |
| petgraph  | no       | 0:07.38 | 1822452 KB |

rust binary, 2843 (long) functions:

| graph lib | neo func | time    | RES memory |
|-----------|----------|---------|------------|
| petgraph  | yes      | 1:04.27 | 1808948 KB |
| petgraph  | no       | 0:53.57 | 6525708 KB |

What this indicates to me is that our old graph lib was a significant bottleneck, and it's good we switched; but some of the CPU savings attributed to the neo version are misleading, since most of them actually come from petgraph...

@m4b (Collaborator) commented Nov 1, 2017

This is still on my personal branch, FYI.

@coveralls commented

Coverage Status: coverage increased (+2.6%) to 61.086% when pulling 60348f1 on flanfly:bitcode into 1ec3057 on das-labor:master.

@flanfly flanfly closed this Nov 1, 2017
@flanfly flanfly mentioned this pull request Nov 1, 2017
3 participants