Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PE: Add resource parser #431

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

kkent030315
Copy link
Contributor

@kkent030315 kkent030315 commented Oct 31, 2024

Added resource parser implementation as proposed in the issue but with also manifests and fixed version info.

  • Added parser for resource entries
  • Added parser for RT_VERSION
  • Added parser for RT_MANIFEST
  • Added integration tests for the parsers

Comment on lines +27 to +41
/// Performs arbitrary alignment of values based on homogeneous numerical types.
#[inline]
pub(super) fn align_up<N>(value: N, align: N) -> N
where
N: core::ops::Add<Output = N>
+ core::ops::Not<Output = N>
+ core::ops::BitAnd<Output = N>
+ core::ops::Sub<Output = N>
+ core::cmp::PartialEq
+ core::marker::Copy,
u8: Into<N>,
{
debug_assert!(align != 0u8.into(), "Align must be non-zero");
(value + align - 1u8.into()) & !(align - 1u8.into())
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I impl this as pub in pe::utils like pe::utils::find_offset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure yes, as we have to have a duplicate function in CLR parser proposal at: https://github.com/m4b/goblin/pull/432/files#diff-cd5f7fb18a0ecca1bff11a2d304c2446f656baf68e045ae8ec0d78ba68600e8bR17-R31

Comment on lines +19 to +26
pub(super) fn to_utf16_string(bytes: &[u8]) -> String {
let u16_slice = bytes
.chunks(2)
.map(|chunk| u16::from_le_bytes([chunk[0], chunk[1]]))
.take_while(|&wchar| wchar != 0)
.collect::<Vec<_>>();
String::from_utf16_lossy(&u16_slice)
}
Copy link
Contributor Author

@kkent030315 kkent030315 Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a lttle bit confusing part, but by design (ref ResourceString::parse) it is guaranteed to be 2-bytes aligned within usage of this resource module. I believe theres better way to do this, or should I remove alloc::string::Strings and let consumers impl higher implementations (thus keep our impl's lower impls)?

src/pe/resource.rs Outdated Show resolved Hide resolved
src/pe/resource.rs Outdated Show resolved Hide resolved
Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a huge PR! Thank you for adding this though.

Some general feedback:

  1. There is a lot of new pub fields, pub structs and pub functions
    a. I believe i've mentioned in past, but to reiterate: goblin is very conservative with pub, can consider it in your mind as basically 1.0, but occasionally letting through breaking changes; with that in mind, it's important to ask yourself/ourselves whether:

    i. will the new pub structs/enums, etc., have new fields, either by addition by the binary stakeholders (microsoft, linux (i guess?), apple), or by some current oversight; if yes, or likely to have new fields added, either needs to be marked `#[non_exhaustive]`, or a field hidden so that it isn't constructible, and so adding a new field is legal. the owners of the format adding new fields is unlikely, but it has happened during the existence of this crate (e.g., apple adding new binary variants, new `LC_COMMANDS`, etc.)
    ii. do you _really_ need a field `pub`? similarly with a function; if yes, that's fine, but if any doubt, start with private/`pub(crate)` and we can make it pub later if needed; once it's pub it generally can't go back.
    
  2. I haven't investigate in detail, but I assume a lot of the offset math is checked/etc., before doing any kind of unprotected/panicable &[offset..foo]; from what I've seen it all looks good, just want to double check, and I know you've fixed that in the past, so i suspect you already know this one.

  3. be aware of allocations, etc., especially temporary ones, and as i mentioned, probably need to make a decision for the utf16 strings, as I mentioned in some comments.

Anyway, this is great work, other than some fairly minor nits, this looks pretty good to go, but just wanted to get some of the pub fields that don't need to be pub fixed, and etc.

Thank you for your patience and this amazing work!

pub const RT_MANIFEST: u16 = 24;

/// Represents an image resource directory in the PE (Portable Executable) format.
#[repr(C)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably does not need to be repr(C), unless it's planned on being shared across ffi boundaries, or some mechanism makes use of it's sizeof which might differ from how rust lays out the struct, etc.
It also doesn't hurt strictly (though it might use more memory than a non repr(C) rust struct), and it could be argued if it exactly mirrors the (likely original C) struct that is in the PE binary, it also serves as a form of documentation as such. 🤷

src/pe/resource.rs Outdated Show resolved Hide resolved
src/pe/resource.rs Outdated Show resolved Hide resolved
///
/// Returns `true` if the name is a string, otherwise `false`.
pub fn name_is_string(&self) -> bool {
self.name_or_id & IMAGE_RESOURCE_NAME_IS_STRING != 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency I would have called IMAGE_RESOURCE_NAME_IS_STRING_MASK or IMAGE_RESOURCE_NAME_STRING_MASK but maybe this mask comes from windows tooling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is exactly as-is defined in Windows SDK. So we better keep them.

src/pe/resource.rs Show resolved Hide resolved
src/pe/resource.rs Outdated Show resolved Hide resolved
Comment on lines +972 to +976
pub comments: Option<&'a [u8]>,
/// The name of the company that produced the file, e.g., "Microsoft Corporation".
pub company_name: Option<&'a [u8]>,
/// A description of the file suitable for presentation to users, e.g., "Keyboard driver for AT-style keyboards".
pub file_description: Option<&'a [u8]>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in general, for these utf16 strings, I would in general either pre-parse them (especially if this doesn't occur a lot) to the utf8 owned String, and keep the fields pub; or i would make these fields not pub and only expose the functions returning the parsed version.

in the case of this struct, since there are so many fields, I would almost prefer to pre-parse it, but maybe only if expect there to e.g,. be only one copy of this struct in the resource data; if it's in a vector or iterator and could have a perf issue due to eager pre-parsing, and maybe no one uses, then we can make it private, and have the function logic like you have below

src/pe/resource.rs Outdated Show resolved Hide resolved
opts: &options::ParseOptions,
) -> error::Result<Option<Self>> {
let it = it.collect::<Result<Vec<_>, _>>()?;
let it = it.iter().find(|e| e.id() == Some(RT_VERSION));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc, you are finding the element that matches RT_VERSION; you shouldn't have to collect to do this, which is an extra allocation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still be able to express like following but I don't like that kind of ugly expression:

    ) -> error::Result<Option<Self>> {
        let mut maybe_entry = None;
        for x in it {
            if x?.id() == Some(RT_VERSION) {
                entry = Some(x?);
                break;
            }
        }

        if let Some(entry) = maybe_entry {

This is semantically not equivalent since it hides the error in while converting Result<T,E> into Option<T> I really don't like:

        if let Some(entry) = it
            .filter_map(Result::ok)
            .find(|x| x.id() == Some(RT_VERSION))

So that I concluded to following implementation:

impl<'a> ResourceEntryIterator<'a> {
    /// Find the resource entry by its resource ID.
    pub fn find_by_id(&self, id: u16) -> error::Result<Option<ResourceEntry>> {
        self.map(|x| {
            x.and_then(|x| {
                if x.id() == Some(id) {
                    Ok(Some(x))
                } else {
                    Ok(None)
                }
            })
        })
        .find_map(Result::transpose)
        .transpose()
    }
}
    ) -> error::Result<Option<Self>> {
        if let Some(entry) = it.find_by_id(RT_VERSION)? {

This also provides shortcut for consumers as find_by_id is pub.
What do you think about this?

opts: &options::ParseOptions,
) -> error::Result<Option<Self>> {
let it = it.collect::<Result<Vec<_>, _>>()?;
let it = it.iter().find(|e| e.id() == Some(RT_MANIFEST));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here; all of this code looks very similar, with only minor variations; my brain immediately wants to abstract it somehow, but it will likely have very little gain, and likely only add complication :)

@kkent030315
Copy link
Contributor Author

I found a bug related to an alignment. It seems a buffer passed to ResourceStringIterator sometimes aligned with 8 bytes where the case has paddings at the tail (e.g., RT_VERSION in ntdll.dll), but sometimes it doesn't (e.g., PYTHON_INSTALLER_VERSION_INFO).

DEBUG - Parsed next resource string as size 0x24: ResourceString {
    len: 0x24,
    value_len: 0x4,
    type: 0 (Binary Data),
    key: "Translation",
    key_slice: [54, 00, 72, 00, 61, 00, 6e, 00, 73, 00, 6c, 00, 61, 00, 74, 00, 69, 00, 6f, 00, 6e, 00] (22 bytes),
    value_slice: [09, 04, b0, 04] (4 bytes, Correct),
}
DEBUG - 4 bytes remaining
type is too big (2) for 0

@m4b
Copy link
Owner

m4b commented Jan 5, 2025

is this in a reviewable state ? i can't remember where we left off exactly, a lot of PRs lately :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants