Please make "ir-parser" "disassembler" a first-class component #9
Comments
Ah, and yes, IR needs to be documented ;-).
I'm not sure what you mean by that. Converting assembly into IR is done by language-specific modules (see |
Then you strongly couple your decompiler to particular mundane disassemblers there. If I don't own IDA and have an architecture not supported by Capstone (and it supports only the popular-to-the-point-of-boring ones ;-) ), then I'm hosed, in the sense that I need to dig very deep into many aspects of your decompiler to interface it to something else, instead of just making that "something else" output a standard textual IR form and feeding it into your decompiler.
Ok, if your IR supports all those features, can you please consider extending the syntax and adding parsing support for that (I assume dump support already works)? And I assume you made your own IR syntax for a reason, and I can only give +1 on that, because when you look into some existing solution, you immediately get the impression that it's over-engineered, but well, a subset of LLVM syntax might work ;-).
Currently this decompiler only supports Intel assembly (and not all instructions either), so if your goal is to decompile anything else you will need to write the disassembler-to-IR code for whichever combination of host software and assembly language you wish to decompile.
The IR is not meant to be parsed from text with ir_parser.py; that is only for testing purposes. If you want to parse assembly into IR, you would not go through an intermediary "text" that can be dumped and parsed back. What you would do is write support for your target assembly language in |
Sorry, that's exactly what I will do, and that's the basic requirement. It's complex stuff, so having a good (human-friendly) representation for intermediate steps is vital. Also, nobody will be able to write a "decompiler for everything", so that reduces people to writing a "decompiler for X", which immediately and drastically prunes the target user base. That is the reason decompilation is where it is, with unmaintainable C crapware like Boomerang in ashes for a decade, and a bunch of folks writing new crippled toy-likes. E.g. this dude https://github.com/electrojustin/triad-decompiler has a segfaulting thing which can decompile (simple) loops, but can't eliminate superfluous assignments because it doesn't do SSA; yours does well at contracting expressions in acyclic code, but doesn't do loops, etc., etc. The only solution to that problem is to completely decouple the "decompiler" part from the "convert machine-specific asm to a generic IR" part. Then maybe there will be critical mass to work on the "decompiler" part. It's oh so sad that people don't see this obvious solution ;-).
These 2 parts are well decoupled in my code already. I guess you could write a dumper and parser for the IR as it is now, but currently ir_parser.py is just a toy for testing purposes; it was never meant to parse text for decompilation. I use it mostly for testing the SSA form. Right now the only way to dump out IR (or any other intermediate decompilation step) is to use the C output class (src/output/c.py), but that is a lossy translation, as it's meant to look like readable C. As I mentioned, it will lose tons of information about operands. You could very well write a new output module that is more verbose just for IR. I would merge a PR for this without problem, but it's not on my roadmap to write one.
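For illustration, a more verbose IR output that can also be read back might look like the sketch below. Everything in it is hypothetical: the `Reg`, `Const` and `Assign` classes, the `name@version:size` syntax and the `dump`/`parse` functions are assumptions made for this example, not the project's actual IR classes or the syntax understood by ir_parser.py. The point is only that the text keeps operand size and SSA version, which a C-like output drops.

```python
import re
from dataclasses import dataclass


# Hypothetical IR operand and statement types (not the project's real classes).
@dataclass(frozen=True)
class Reg:
    name: str
    size: int   # width in bits
    index: int  # SSA version


@dataclass(frozen=True)
class Const:
    value: int  # assumed non-negative in this sketch
    size: int   # width in bits


@dataclass(frozen=True)
class Assign:
    dest: Reg
    op: str
    args: tuple


def dump_operand(operand):
    """Render an operand without dropping its size or SSA version."""
    if isinstance(operand, Reg):
        return f"{operand.name}@{operand.index}:{operand.size}"
    return f"{operand.value:#x}:{operand.size}"


def dump(stmt):
    """Render one statement as a single line of text."""
    args = ", ".join(dump_operand(a) for a in stmt.args)
    return f"{dump_operand(stmt.dest)} = {stmt.op} {args}"


_REG = re.compile(r"^([A-Za-z_]\w*)@(\d+):(\d+)$")
_CONST = re.compile(r"^(0x[0-9a-fA-F]+):(\d+)$")


def parse_operand(text):
    """Inverse of dump_operand."""
    m = _REG.match(text)
    if m:
        return Reg(m.group(1), int(m.group(3)), int(m.group(2)))
    m = _CONST.match(text)
    if m:
        return Const(int(m.group(1), 16), int(m.group(2)))
    raise ValueError(f"unrecognised operand: {text!r}")


def parse(line):
    """Inverse of dump: rebuild the statement from its text form."""
    lhs, rhs = line.split(" = ", 1)
    op, _, rest = rhs.partition(" ")
    args = tuple(parse_operand(a.strip()) for a in rest.split(",")) if rest else ()
    return Assign(parse_operand(lhs), op, args)


stmt = Assign(Reg("eax", 32, 2), "add", (Reg("eax", 32, 1), Const(4, 32)))
text = dump(stmt)
assert parse(text) == stmt  # the text form round-trips without loss
```

A round-trip check like the last line is a cheap way to test that such an output module stays lossless as operand types are added.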
Thanks for the explanation. I'll think about it, but the lack of loop support prioritizes getting back to looking at other folks' stuff. For reference, proper conversion out of SSA in the presence of loops is where I got stuck with my crippled toy, a compiler-in-python https://github.com/pfalcon/llvm-codegen-py . I smartly left the boring parts to things like clang, and thought that using LLVM IR, which is already in SSA, would make my task much easier. Turns out, conversion out of SSA requires about the same effort and similar algos as converting to SSA, and actually one algo is similar to register allocation, so it itches to combine them, but then it only gets more complex... end result: project stuck. Ah, and also for reference, next in my queue is https://github.com/pfalcon-mirrors/decomp-6502-arm . That does conversion out of SSA, but bugs were reported for loops, surprise. Funnily, the guy eventually just deleted the repo, prompting me to mirror this GPL code, as a tribute to vain community efforts ;-).
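As a rough illustration of why leaving SSA gets tricky around loops, here is the naive approach: replace each phi with copies appended to the predecessor blocks. The block layout (dicts with `phis`, `body`, `term`) and the `eliminate_phis` name are made up for this sketch and are not taken from any of the projects mentioned.

```python
def eliminate_phis(blocks):
    """Naive out-of-SSA: turn every phi into copies in the predecessors.

    blocks maps a block name to a dict with (hypothetical layout):
      "phis": list of (dest, {predecessor_name: source}) pairs
      "body": list of statement tuples, terminator excluded
      "term": the terminator tuple
    Returns new blocks that contain no phis.
    """
    out = {
        name: {"body": list(block["body"]), "term": block["term"]}
        for name, block in blocks.items()
    }
    for block in blocks.values():
        for dest, sources in block.get("phis", []):
            for pred, src in sources.items():
                # The copy goes at the end of the predecessor's body,
                # i.e. just before its terminator.
                out[pred]["body"].append(("copy", dest, src))
    return out


# A counting loop: i1 = phi(entry: i0, loop: i2); i2 = i1 + 1
cfg = {
    "entry": {"phis": [], "body": [("const", "i0", 0)], "term": ("jmp", "loop")},
    "loop": {
        "phis": [("i1", {"entry": "i0", "loop": "i2"})],
        "body": [("add", "i2", "i1", 1)],
        "term": ("br", "loop", "exit"),
    },
    "exit": {"phis": [], "body": [], "term": ("ret",)},
}
lowered = eliminate_phis(cfg)
```

This naive version is only safe under extra conditions. It is known to go wrong on critical edges (the "lost copy" problem) and when phis in the same block read each other's destinations (the "swap" problem), and both situations typically show up in loops. Handling them requires edge splitting and ordering the copies as a parallel move, which is where the extra effort comes from.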
IR is the most important part of this project. Converting assembly to IR is straightforward grunt work. Please let interested parties skip that boring part and start straight at experimenting with decompilation. Thanks.