Reversible binary patch format #23379
Comments
use the -r flag |
As of 2578ff0, using
|
you can also use The proposal to create a decent, standardized and extensible binary patching file format in plain text looks quite interesting to me, and I would love to have support in r2 for that. btw there's also support for rapatch, but it's just part of r2, not a standalone tool. For consistency with radiff2 it probably makes sense to have a rapatch2 tool instead of having an r2 uppercase flag.
You can read more about this in doc/rapatch.md |
just created a new tool that can't be merged until r2-6.0. For now it is just a dummy thing, but I agree that your proposal is important and should be treated as a first class tool. Would you like to improve radiff2 to support this output? I'm not 100% sure about the LE/BE values, because radiff just spots changes in bytes and may not know whether the underlying data is a word or a qword. The patch format can specify that, or maybe we can make some happy assumptions about it. I think we have time during 5.9.x, until we reach 6.0, to break the ABI and provide such a new tool with a proper manpage and a working patch format for unified binary patching. |
Thanks for taking a look. I'll need to think about the patch format to come up with a spec that makes sense first. You can always fall back to plain hex bytes if you don't know the data type, but the patch format should allow you to use a more human-friendly format.

The problem with the flow of saving the binary and then diffing it against the original is that you lose the information about how the user specified the changes. It may be better to generate the patch from r2 itself, when you know how those changes were made. This way you can store comments in the patch, such as the instruction being changed or ASCII. I suspect that for any w command that you do with radare2, you can always find the opposite command that would revert that change, and specify it using the same value format. For example:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00100000,4 +0x00100000,4 @@
- wvf 1.23
+ wvf 3.21

This is nice for radare2 users because they will already be familiar with the commands, but it doesn't make a lot of sense for users of other tools. It also has the problem that there is no information about the byte order. Something like this:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00100000,4 +0x00100000,4 @@
- LE (float) 1.23
+ LE (float) 3.21

may be more understandable, especially if we use well-known types like the C ones. Another issue is that in the common case you will use the same LE/BE for the whole patch (although we should support the cases where that is not true), so you don't need to pollute every line with LE/BE. It is convenient to define a byte order at the start:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00100000,4 +0x00100000,4 @@
- (float) 1.23
+ (float) 3.21

Now, there is the case in which a user may specify integers in different bases (hex/dec/octal). In my case above:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,4 +0x00189ca8,4 @@
- (uint32_t) 1799000 # Using a type that can map a constant into bytes
+ (uint32_t) 1920000 # Notice how I use decimal here

I think it may be good to use the C format for numbers too: 0123 = octal, 123 = dec, 0x123 = hex. This may also be valid:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,12 +0x00189ca8,12 @@
- (uint32_t []) { 1799000, 1799001, 1799002 }
+ (uint32_t []) { 1920000, 0x123, 07 }

But it starts to complicate the syntax. Also, if I want to add another number, I would need to modify the 12 in the hunk header. So this may be simpler:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,uint32_t +0x00189ca8,uint32_t @@
- 1799000 # Using a type that can map a constant into bytes
+ 1920000 # Notice how I use decimal here

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,uint32_t[3] +0x00189ca8,uint32_t[3] @@
- 1799000, 1799001, 1799002
+ 1920000, 0x123, 0777 # Notice 0777 is octal

All those cases can be mapped to the basic format, where everything is a simple hex string. I don't like to just specify a hex string that could be confused with a number. Also, I think So maybe we can use something like this:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00189ca8,char[4] +0x00189ca8,char[4] @@
- '58 73 1b 00'
+ '00 4c 1d 00'

Notice there is no byte order, as hex strings don't need one. We can also probably specify that the default is LE, and only write BE when needed. There is also mixed endianness, but I think we can ignore it for now and fall back to hex strings if needed.

This basic hex format can probably be implemented in radiff2 without much effort, as we don't even need to output aligned words, just which bytes differ. I can try to modify radiff2, but I'm not familiar with the codebase, so it may take a while.

In a more advanced implementation, one could determine what type of data is placed at which addresses of a binary file, and then produce the appropriate representation in the patch when those bytes change, falling back to hex strings otherwise.

The hex format should allow you to split the lines as you want, so you can write instructions properly:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00189ca8,char[4] +0x00189ca8,char[4] @@
- '01 46' # mov r1, r0
- '68 46' # mov r0, sp
+ '4f f2 ba fc' # bl 0x254526

It would also be nice if this format were a superset of the patch format, so you can also apply normal patches with rapatch2 (or even mix hunks). I think this can easily be done by using the "type" specifier of the hunk. So This may be useful if you have a mix of source code and blobs and you want to specify a patch that changes both. |
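The claim above that every typed value can be mapped down to the basic hex-string form can be illustrated with a small Python sketch. It is not part of radare2 or of any proposed spec: the `FORMATS` table, `parse_c_int` and `encode` are hypothetical names, and the set of supported types is an assumption made for the example. Integers follow the C literal rules mentioned in the comment (0x123 hex, 0123 octal, 123 decimal).

```python
import struct

# Hypothetical type table (an assumption for this sketch): maps a C type name
# to the matching struct format character.
FORMATS = {
    "uint8_t": "B", "uint16_t": "H", "uint32_t": "I", "uint64_t": "Q",
    "float": "f", "double": "d",
}

def parse_c_int(text):
    """Parse an integer written with C literal rules: 0x.. hex, 0.. octal, else decimal."""
    text = text.strip()
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    if digits.lower().startswith("0x"):
        return sign * int(digits, 16)
    if len(digits) > 1 and digits.startswith("0"):
        return sign * int(digits, 8)
    return sign * int(digits, 10)

def encode(ctype, values, order="LE"):
    """Encode textual values of one C type into a space-separated hex string."""
    prefix = "<" if order == "LE" else ">"
    fmt = FORMATS[ctype]
    out = bytearray()
    for value in values:
        number = float(value) if fmt in "fd" else parse_c_int(value)
        out += struct.pack(prefix + fmt, number)
    return out.hex(" ")

print(encode("uint32_t", ["1799000"]))                  # 58 73 1b 00
print(encode("uint32_t", ["1920000", "0x123", "0777"]))  # 00 4c 1d 00 23 01 00 00 ff 01 00 00
print(encode("uint32_t", ["1799000"], order="BE"))      # 00 1b 73 58
```

The little-endian encoding of 1799000 gives the bytes 58 73 1b 00, which is the order in which they sit in the file; the byte order only matters while converting typed values, not once everything is a hex string.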
This one-liner more or less implements the hex diff (addresses are decimal and start at 1):

% bindiff() { diff -u0p <(od -An -vtx1 -w1 $1) <(od -An -vtx1 -w1 $2) | sed '/^@@/s/,\([0-9]*\)/,char[\1]/g' }
% bindiff v2.bin v3.bin
--- /proc/self/fd/11 2024-09-26 22:10:25.813980894 +0200
+++ /proc/self/fd/13 2024-09-26 22:10:25.813980894 +0200
@@ -1612969,char[3] +1612969,char[3] @@
- 58
- 73
- 1b
+ 00
+ 4c
+ 1d |
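For comparison, the same byte-level diff can be sketched in Python. This is only an illustration under stated assumptions: both inputs are assumed to have the same size (so no bytes are inserted or removed), `hex_hunks` and `format_hunk` are hypothetical helper names, and addresses are printed as zero-based hex offsets rather than the one-based decimal offsets of the one-liner.

```python
def hex_hunks(old, new):
    """Return (offset, old_bytes, new_bytes) for each run of differing bytes.

    Assumes both inputs have the same length.
    """
    if len(old) != len(new):
        raise ValueError("this sketch only handles equally sized inputs")
    hunks = []
    start = None
    # Iterate one position past the end so a run reaching the last byte is closed.
    for i in range(len(old) + 1):
        differs = i < len(old) and old[i] != new[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            hunks.append((start, old[start:i], new[start:i]))
            start = None
    return hunks

def format_hunk(offset, before, after):
    """Render one hunk using the char[N] hex-string form discussed in the thread."""
    n = len(before)
    return "\n".join([
        f"@@ -0x{offset:08x},char[{n}] +0x{offset:08x},char[{n}] @@",
        f"- '{before.hex(' ')}'",
        f"+ '{after.hex(' ')}'",
    ])

old = bytes.fromhex("aa 58 73 1b 00 bb")
new = bytes.fromhex("aa 00 4c 1d 00 bb")
for hunk in hex_hunks(old, new):
    print(format_hunk(*hunk))
```

Only the three bytes that actually differ end up in the hunk, which matches the char[3] hunk printed by the one-liner.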
Let's discuss it in here https://hackmd.io/@BCdr4EkGSKO51w6pf-JUow/r1h5idQRC/edit |
I have created a draft of the specification here: https://github.com/rodarima/xpatch/ For now I'm calling it "xpatch", as in extended patch. It is a superset of the patch(1) format, so you can mix plain text and binary patches in the same xpatch file. There is a simple xdiff program as an example, but to implement a full xpatch I'll probably need to bring in bison to parse the grammar properly and detect errors. I'm still thinking a bit about the format, but so far it seems suitable for all my use cases. I've also added support to parse hex strings directly by specifying the printf-like parsing format
The rapatch2 tool has been merged into master. Now it's time to start implementing the required bits. Want to submit a PR? |
I added a simple proof-of-concept xpatch that parses a small subset of the specification, but it is not complete. I would like to get some feedback on the syntax first, especially on the extended format, before I attempt to implement the whole thing. I think it covers all the use cases I wanted, and it also has enough expressive power to store all the file manipulations you may be doing with radare2 (or other tools). |
I would like to generate some binary patches that can be read in plain text, in the same way I do with diff(1) and patch(1). In particular, I want to be able to do these operations:
The default radiff2(1) format is close to what I want.
It has the benefit that it can be reversed by a simple awk(1) program:
However, AFAIK this format doesn't seem to be accepted by any tool.
The r2 format outputs radare2(1) commands, but they ignore what was in that address before:
This is not enough, as I want to know if a given patch collides with another one. It also prevents reverting an applied patch.
There is also the rapatch.md format, but it seems to be different from these two. It also seems to have the same problem: it cannot be reverted.
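Why keeping the previous bytes matters can be shown with a short Python sketch. It is not how radare2 or patch(1) work internally: `Reject` and `apply_hunk` are hypothetical names, and a hunk is assumed to be just an offset plus the old and new byte strings. Because the old bytes are stored, the same hunk can be verified before applying it and applied backwards afterwards.

```python
class Reject(Exception):
    """Raised when the bytes found in the file are not what the hunk expects."""

def apply_hunk(data, offset, old, new, reverse=False):
    """Apply one hunk to a bytearray in place, or revert it when reverse=True."""
    if reverse:
        old, new = new, old
    end = offset + len(old)
    found = bytes(data[offset:end])
    if found != old:
        # Something else already changed this range: refuse instead of overwriting.
        raise Reject(f"at 0x{offset:08x}: expected {old.hex(' ')}, found {found.hex(' ')}")
    # Slice assignment also covers hunks whose old and new sizes differ.
    data[offset:end] = new

data = bytearray.fromhex("aa 58 73 1b 00 bb")
old, new = bytes.fromhex("58 73 1b"), bytes.fromhex("00 4c 1d")

apply_hunk(data, 1, old, new)                 # data is now aa 00 4c 1d 00 bb
apply_hunk(data, 1, old, new, reverse=True)   # data is back to aa 58 73 1b 00 bb
```

A write command that only carries the new bytes cannot perform the comparison step, so it can neither detect a collision nor restore the original contents.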
Maybe this problem can be solved by implementing a reversible operator like wx that swaps bytes instead of overwriting an address. The problem with this approach is that when a swap command fails, it should output that hunk into a reject file, which is probably not what you want from an r2 session.
Maybe it would be a better idea to have another tool just for this workflow (which could also work with multiple files at once). You first perform all the changes you want with r2 w commands, then you save the file and generate a patch that can be further edited and applied/reversed:
Here is an example of what that patch may look like, which is very close to what patch(1) expects:
The benefit of such a format is that:
This format can also be used to insert or remove bytes, leaving a file of a different size. It also avoids the problem of multiple write commands hitting the same memory location, provided the hunk addresses are sorted. It also resembles the patch format closely enough that it gets the syntax colors of normal patches on GitHub.
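The point about sorted hunk addresses can be sketched as follows. This is an illustration only: `find_collisions` is a hypothetical helper, and each hunk is reduced to an assumed `(offset, length)` pair describing the range it replaces in the original file. Once the hunks are sorted by offset, a single pass is enough to spot any hunk that overlaps an earlier one.

```python
def find_collisions(hunks):
    """Return (earlier, later) pairs for every hunk that overlaps an earlier one.

    hunks is a list of (offset, length) pairs covering ranges of the old file.
    """
    collisions = []
    furthest = None  # the hunk seen so far whose range ends at the highest address
    for hunk in sorted(hunks):
        offset, length = hunk
        if furthest is not None and offset < furthest[0] + furthest[1]:
            collisions.append((furthest, hunk))
        if furthest is None or offset + length > furthest[0] + furthest[1]:
            furthest = hunk
    return collisions

print(find_collisions([(0x20, 4), (0x10, 4), (0x12, 2)]))  # [((16, 4), (18, 2))]
print(find_collisions([(0x00, 4), (0x04, 4)]))              # []
```

Adjacent hunks that merely touch are not reported, since they do not write to the same bytes.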
Patches of patches are also readable:
I think I could adapt radiff2.c to output such format, and maybe modify patch(1) to accept them.