-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PE: Add resource parser #431
base: master
Are you sure you want to change the base?
Conversation
/// Performs arbitrary alignment of values based on homogeneous numerical types. | ||
#[inline] | ||
pub(super) fn align_up<N>(value: N, align: N) -> N | ||
where | ||
N: core::ops::Add<Output = N> | ||
+ core::ops::Not<Output = N> | ||
+ core::ops::BitAnd<Output = N> | ||
+ core::ops::Sub<Output = N> | ||
+ core::cmp::PartialEq | ||
+ core::marker::Copy, | ||
u8: Into<N>, | ||
{ | ||
debug_assert!(align != 0u8.into(), "Align must be non-zero"); | ||
(value + align - 1u8.into()) & !(align - 1u8.into()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I impl this as pub
in pe::utils
like pe::utils::find_offset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure yes, as we have to have a duplicate function in CLR parser proposal at: https://github.com/m4b/goblin/pull/432/files#diff-cd5f7fb18a0ecca1bff11a2d304c2446f656baf68e045ae8ec0d78ba68600e8bR17-R31
pub(super) fn to_utf16_string(bytes: &[u8]) -> String { | ||
let u16_slice = bytes | ||
.chunks(2) | ||
.map(|chunk| u16::from_le_bytes([chunk[0], chunk[1]])) | ||
.take_while(|&wchar| wchar != 0) | ||
.collect::<Vec<_>>(); | ||
String::from_utf16_lossy(&u16_slice) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a lttle bit confusing part, but by design (ref ResourceString::parse
) it is guaranteed to be 2-bytes aligned within usage of this resource
module. I believe theres better way to do this, or should I remove alloc::string::String
s and let consumers impl higher implementations (thus keep our impl's lower impls)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a huge PR! Thank you for adding this though.
Some general feedback:
-
There is a lot of new pub fields, pub structs and pub functions
a. I believe i've mentioned in past, but to reiterate: goblin is very conservative withpub
, can consider it in your mind as basically 1.0, but occasionally letting through breaking changes; with that in mind, it's important to ask yourself/ourselves whether:i. will the new pub structs/enums, etc., have new fields, either by addition by the binary stakeholders (microsoft, linux (i guess?), apple), or by some current oversight; if yes, or likely to have new fields added, either needs to be marked `#[non_exhaustive]`, or a field hidden so that it isn't constructible, and so adding a new field is legal. the owners of the format adding new fields is unlikely, but it has happened during the existence of this crate (e.g., apple adding new binary variants, new `LC_COMMANDS`, etc.) ii. do you _really_ need a field `pub`? similarly with a function; if yes, that's fine, but if any doubt, start with private/`pub(crate)` and we can make it pub later if needed; once it's pub it generally can't go back.
-
I haven't investigate in detail, but I assume a lot of the offset math is checked/etc., before doing any kind of unprotected/panicable
&[offset..foo]
; from what I've seen it all looks good, just want to double check, and I know you've fixed that in the past, so i suspect you already know this one. -
be aware of allocations, etc., especially temporary ones, and as i mentioned, probably need to make a decision for the utf16 strings, as I mentioned in some comments.
Anyway, this is great work, other than some fairly minor nits, this looks pretty good to go, but just wanted to get some of the pub fields that don't need to be pub fixed, and etc.
Thank you for your patience and this amazing work!
pub const RT_MANIFEST: u16 = 24; | ||
|
||
/// Represents an image resource directory in the PE (Portable Executable) format. | ||
#[repr(C)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this probably does not need to be repr(C)
, unless it's planned on being shared across ffi boundaries, or some mechanism makes use of it's sizeof which might differ from how rust lays out the struct, etc.
It also doesn't hurt strictly (though it might use more memory than a non repr(C)
rust struct), and it could be argued if it exactly mirrors the (likely original C) struct that is in the PE binary, it also serves as a form of documentation as such. 🤷
/// | ||
/// Returns `true` if the name is a string, otherwise `false`. | ||
pub fn name_is_string(&self) -> bool { | ||
self.name_or_id & IMAGE_RESOURCE_NAME_IS_STRING != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency I would have called IMAGE_RESOURCE_NAME_IS_STRING_MASK
or IMAGE_RESOURCE_NAME_STRING_MASK
but maybe this mask comes from windows tooling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is exactly as-is defined in Windows SDK. So we better keep them.
pub comments: Option<&'a [u8]>, | ||
/// The name of the company that produced the file, e.g., "Microsoft Corporation". | ||
pub company_name: Option<&'a [u8]>, | ||
/// A description of the file suitable for presentation to users, e.g., "Keyboard driver for AT-style keyboards". | ||
pub file_description: Option<&'a [u8]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in general, for these utf16 strings, I would in general either pre-parse them (especially if this doesn't occur a lot) to the utf8 owned String, and keep the fields pub
; or i would make these fields not pub
and only expose the functions returning the parsed version.
in the case of this struct, since there are so many fields, I would almost prefer to pre-parse it, but maybe only if expect there to e.g,. be only one copy of this struct in the resource data; if it's in a vector or iterator and could have a perf issue due to eager pre-parsing, and maybe no one uses, then we can make it private, and have the function logic like you have below
src/pe/resource.rs
Outdated
opts: &options::ParseOptions, | ||
) -> error::Result<Option<Self>> { | ||
let it = it.collect::<Result<Vec<_>, _>>()?; | ||
let it = it.iter().find(|e| e.id() == Some(RT_VERSION)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc, you are finding the element that matches RT_VERSION; you shouldn't have to collect to do this, which is an extra allocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still be able to express like following but I don't like that kind of ugly expression:
) -> error::Result<Option<Self>> {
let mut maybe_entry = None;
for x in it {
if x?.id() == Some(RT_VERSION) {
entry = Some(x?);
break;
}
}
if let Some(entry) = maybe_entry {
This is semantically not equivalent since it hides the error in while converting Result<T,E>
into Option<T>
I really don't like:
if let Some(entry) = it
.filter_map(Result::ok)
.find(|x| x.id() == Some(RT_VERSION))
So that I concluded to following implementation:
impl<'a> ResourceEntryIterator<'a> {
/// Find the resource entry by its resource ID.
pub fn find_by_id(&self, id: u16) -> error::Result<Option<ResourceEntry>> {
self.map(|x| {
x.and_then(|x| {
if x.id() == Some(id) {
Ok(Some(x))
} else {
Ok(None)
}
})
})
.find_map(Result::transpose)
.transpose()
}
}
) -> error::Result<Option<Self>> {
if let Some(entry) = it.find_by_id(RT_VERSION)? {
This also provides shortcut for consumers as find_by_id
is pub
.
What do you think about this?
src/pe/resource.rs
Outdated
opts: &options::ParseOptions, | ||
) -> error::Result<Option<Self>> { | ||
let it = it.collect::<Result<Vec<_>, _>>()?; | ||
let it = it.iter().find(|e| e.id() == Some(RT_MANIFEST)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here; all of this code looks very similar, with only minor variations; my brain immediately wants to abstract it somehow, but it will likely have very little gain, and likely only add complication :)
I found a bug related to an alignment. It seems a buffer passed to
|
is this in a reviewable state ? i can't remember where we left off exactly, a lot of PRs lately :) |
Added resource parser implementation as proposed in the issue but with also manifests and fixed version info.
RT_VERSION
RT_MANIFEST