-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dylib Derivation & Scanning #77
Conversation
…es `LC_SYMTAB` to find the related object files, aka the debug map.
// Enqueue a task for (possibly asynchronous) execution. If the `parallel_processing` setting in the | ||
// ORC config file is true, the task will be enqueued for processing on a background thread pool. | ||
// Otherwise, the task will be executed immediately in the current thread. | ||
void do_work(std::function<void()>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were passing do_work
all over the code base as the scheduler for any ORC task we may want to do asynchronously. (With the parallel_processing
setting set to false
, the tasks are run immediately- that's how we're able to implement that switch.)
Given that do_work
is the only scheduler we were using, I have removed it as a callback (more on the death of callbacks
in a separate comment) and put it into its own header that the sources can reference themselves. This has made our asynchronous story much simpler, in terms of how we, you know, do work.
std::vector<odrv_report> orc_process(const std::vector<std::filesystem::path>&); | ||
std::vector<odrv_report> orc_process(std::vector<std::filesystem::path>&&); | ||
|
||
namespace orc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started putting ORC calls into an orc
namespace. We really need to address this more completely as part of issue #20.
odrv_reporting, | ||
}; | ||
|
||
struct macho_params { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macho_params
surrenders to the fact that all this file scanning, in the end, is Mach-O based. At one point I was writing ORC more from the perspective that it shouldn't matter what the ABI is and the ORC file scanners should be more generic. Perhaps I was thinking we'd be able to incorporate Windows more easily at some point in the future? Who knows.
At any rate, the added genericity resulted in passing around the callbacks
block everywhere, as some kind of pseudo-delegate pattern that wasn't very well thought through. I have since removed it and replaced it with macho_params
. This is a parameter block that is created at the top level orc_process
call and is sent into the file scanning system to tell the Mach-O parser exactly what it should be looking out for. Hopefully this setup is clearer and more extensible than the callbacks
we had before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of windows support, but at this point, I think that would be more about abstracting out the ORC core from the parser. ORC would begin to look more like a multi-platform app, where you have the "core" parts and the "platform/compiler" parts. At that point, we would need to abstract the macho params.
But it's all very theoretical - especially w/o a MSVC implementation, so I think the approach of streamlining what we actually have is the correct one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
@@ -0,0 +1,31 @@ | |||
# For documentation on `just`, see https://github.com/casey/just |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just (ha!) discovered the just
tool. It lets us write little scripts in this justfile
, and then we can run them from a command line e.g., just gen
, and it will spin off the proper CMake incantations so I don't have to remember them anymore. I'm an overnight fan.
@@ -0,0 +1,150 @@ | |||
// Copyright 2024 Adobe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this is taken from the orc
file.
@@ -148,61 +148,6 @@ auto& global_die_map() { | |||
|
|||
/**************************************************************************************************/ | |||
|
|||
void register_dies(dies die_vector) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to further down this file, out of the anonymous namespace, but otherwise unmodified. That let me wrap it in the orc
namespace without the compiler complaining about orc
being inside an anonymous one.
@@ -211,114 +156,14 @@ struct cmdline_results { | |||
|
|||
/**************************************************************************************************/ | |||
|
|||
struct work_counter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this moved to async.cpp
otherwise unmodified.
// First stage: process all the DIEs | ||
std::vector<odrv_report> orc_process(std::vector<std::filesystem::path>&& file_list) { | ||
// First stage: (optional) dependency/dylib preprocessing | ||
if (settings::instance()._dylib_scan_mode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main engine for ORC, and this new block adds the dylib scan mode as the first step (when asked for.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a big PR - I left some comments, but I think testing will have to be the primary check.
odrv_reporting, | ||
}; | ||
|
||
struct macho_params { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of windows support, but at this point, I think that would be more about abstracting out the ORC core from the parser. ORC would begin to look more like a multi-platform app, where you have the "core" parts and the "platform/compiler" parts. At that point, we would need to abstract the macho params.
But it's all very theoretical - especially w/o a MSVC implementation, so I think the approach of streamlining what we actually have is the correct one.
const macho_params _params; | ||
std::vector<std::string> _unresolved_dylibs; | ||
std::vector<std::string> _rpaths; | ||
struct dwarf _dwarf; // must be last |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? that's a little scary - is there a c header / struct thing going on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within the macho_reader
constructor, the constructor for _dwarf
takes copies of some other fields within the macho_reader
:
macho_reader(std::uint32_t ofd_index,
freader&& s,
file_details&& details,
macho_params&& params)
: _ofd_index(ofd_index),
_s(std::move(s)),
_details(std::move(details)),
_params(std::move(params)),
_dwarf(ofd_index, copy(_s), copy(_details)) { ...
Since _dwarf
took copies from other member fields in the same struct, I added the comment to make sure it is constructed after those fields. I guess it doesn't have to be last, so the comment is a bit misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - thanks for all the hard work on this one.
This PR adds a new mode to ORC, called "dylib scanning" or "dylib derivation" mode. This mode allows ORC to examine the dependencies of a post-link artifact, scanning for ODRVs between the host executable and its dylibs as best it can.
Dylib scanning mode should be seen as a compliment to and not a replacement of ORC's "classic" mode. Any final artifact is made "ODR clean" by the linker, which assumes ODR was upheld during the creation process. Thus, any ODRVs within an artifact cannot be discovered by this mode. However, ORC's "classic" mode does not scan across final artifacts (e.g., an executable and the dylibs it depends upon), which is a common distribution method for applications. Therefore, this mode is the second of a one-two punch intended to more thoroughly detect ODR violations in an application.