Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dylib Derivation & Scanning #77

Merged
merged 26 commits into from
May 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
6f89bea
wip
fosterbrereton May 1, 2024
b4b03b5
first pass at dependency scanning. For locally build objects, travers…
fosterbrereton May 1, 2024
f6d9f5e
adding justfile
fosterbrereton May 1, 2024
d60d48b
justfile tweaks
fosterbrereton May 2, 2024
3de9c00
refactoring to keep the right work in the right sources
fosterbrereton May 2, 2024
cf3dd79
checked in a typo
fosterbrereton May 2, 2024
6d0fbad
cleanup
fosterbrereton May 2, 2024
0e8dee7
cleanup
fosterbrereton May 2, 2024
5ccc0a2
cleanup
fosterbrereton May 2, 2024
c516b87
cleanup
fosterbrereton May 2, 2024
4b6a52f
reverting a change
fosterbrereton May 2, 2024
9d3aa33
cleanup
fosterbrereton May 2, 2024
f365a0c
more improvements and cleaning up the code so I can do recursive dyli…
fosterbrereton May 2, 2024
7ca396b
more improvements and cleaning up the code so I can do recursive dyli…
fosterbrereton May 2, 2024
d050908
more improvements and cleaning up the code so I can do recursive dyli…
fosterbrereton May 2, 2024
2978d1c
fix up `orc_test`
fosterbrereton May 3, 2024
c6a1114
replacing callbacks with a `macho_params` block
fosterbrereton May 3, 2024
5907afb
got the right `executable_path` wired up during dylib resolution
fosterbrereton May 3, 2024
e6ae50f
recursive dependency derivation is working
fosterbrereton May 3, 2024
e60946b
tweak
fosterbrereton May 7, 2024
041b105
Merge branch 'main' into fbrereto/dylib-scanning-mode
fosterbrereton May 14, 2024
e4dd201
performance improvements
fosterbrereton May 14, 2024
5841535
review change from @leethomason
fosterbrereton May 14, 2024
a4987e7
review change from @leethomason
fosterbrereton May 14, 2024
2c1d210
Yet another build break
fosterbrereton May 14, 2024
654d789
typo
fosterbrereton May 14, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,8 @@ set(CMAKE_CXX_EXTENSIONS OFF)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_XCODE_GENERATE_SCHEME OFF)
# This does not appear to be set via `CMAKE_CXX_STANDARD`; not sure why.
set(CMAKE_XCODE_ATTRIBUTE_CLANG_CXX_LANGUAGE_STANDARD "c++20")

file(GLOB SRC_FILES CONFIGURE_DEPENDS ${PROJECT_SOURCE_DIR}/src/*.cpp)
file(GLOB HEADER_FILES CONFIGURE_DEPENDS ${PROJECT_SOURCE_DIR}/include/orc/*.hpp)
Expand Down
8 changes: 8 additions & 0 deletions _orc-config
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,14 @@ log_level = 'warning'

standalone_mode = false

# Scan the binary passed in to the command line for its Mach-O dependencies, specifically any
# dylibs it depends on (and rpaths where they might be found.) Then perform a scan over that
# collection as if it were runtime-linked.
#
# The default value is `false`.

dylib_scan_mode = false

# If defined, ORC will log output to the specified file. (Normal stream output
# is unaffected by the `output_file`.)
#
Expand Down
2 changes: 1 addition & 1 deletion include/orc/ar.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,6 @@ void read_ar(object_ancestry&& ancestry,
freader& s,
std::istream::pos_type end_pos,
file_details details,
callbacks callbacks);
macho_params params);

/**************************************************************************************************/
30 changes: 30 additions & 0 deletions include/orc/async.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
// Copyright 2024 Adobe
// All Rights Reserved.
//
// NOTICE: Adobe permits you to use, modify, and distribute this file in accordance with the terms
// of the Adobe license agreement accompanying it.

#pragma once

#include <functional>

//======================================================================================================================

namespace orc {

//======================================================================================================================

// Enqueue a task for (possibly asynchronous) execution. If the `parallel_processing` setting in the
// ORC config file is true, the task will be enqueued for processing on a background thread pool.
// Otherwise, the task will be executed immediately in the current thread.
void do_work(std::function<void()>);
Copy link
Contributor Author

@fosterbrereton fosterbrereton May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were passing do_work all over the code base as the scheduler for any ORC task we may want to do asynchronously. (With the parallel_processing setting set to false, the tasks are run immediately- that's how we're able to implement that switch.)

Given that do_work is the only scheduler we were using, I have removed it as a callback (more on the death of callbacks in a separate comment) and put it into its own header that the sources can reference themselves. This has made our asynchronous story much simpler, in terms of how we, you know, do work.


// blocks the calling thread until all enqueued work items have completed. If the
// `parallel_processing` setting in the ORC config file is `false`, this will return immediately.
void block_on_work();

//======================================================================================================================

} // namespace orc

//======================================================================================================================
3 changes: 1 addition & 2 deletions include/orc/dwarf.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,7 @@ using die_pair = std::tuple<die, attribute_sequence>;
struct dwarf {
dwarf(std::uint32_t ofd_index,
freader&& s,
file_details&& details,
register_dies_callback&& callback);
file_details&& details);

void register_section(std::string name, std::size_t offset, std::size_t size);

Expand Down
2 changes: 1 addition & 1 deletion include/orc/fat.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,6 @@ void read_fat(object_ancestry&& ancestry,
freader& s,
std::istream::pos_type end_pos,
file_details details,
callbacks callbacks);
macho_params params);

/**************************************************************************************************/
31 changes: 0 additions & 31 deletions include/orc/mach_types.hpp

This file was deleted.

8 changes: 6 additions & 2 deletions include/orc/macho.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,10 +18,14 @@ void read_macho(object_ancestry&& ancestry,
freader s,
std::istream::pos_type end_pos,
file_details details,
callbacks callbacks);
macho_params params);

/**************************************************************************************************/

struct dwarf dwarf_from_macho(std::uint32_t ofd_index, register_dies_callback&& callback);
struct dwarf dwarf_from_macho(std::uint32_t ofd_index, macho_params params);

/**************************************************************************************************/

std::vector<std::filesystem::path> macho_derive_dylibs(const std::vector<std::filesystem::path>& root_binaries);

/**************************************************************************************************/
10 changes: 9 additions & 1 deletion include/orc/orc.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,13 @@ bool emit_report(const odrv_report& report);

/**************************************************************************************************/

std::vector<odrv_report> orc_process(const std::vector<std::filesystem::path>&);
std::vector<odrv_report> orc_process(std::vector<std::filesystem::path>&&);

namespace orc {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started putting ORC calls into an orc namespace. We really need to address this more completely as part of issue #20.


void register_dies(dies die_vector);

} // namespace orc

void orc_reset();

Expand Down Expand Up @@ -93,3 +99,5 @@ template <class F>
void cerr_safe(F&& f) {
ostream_safe(std::cerr, std::forward<F>(f));
}

/**************************************************************************************************/
20 changes: 13 additions & 7 deletions include/orc/parse_file.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -226,19 +226,25 @@ constexpr std::decay_t<T> copy(T&& value) noexcept(noexcept(std::decay_t<T>{

/**************************************************************************************************/

using register_dies_callback = std::function<void(dies)>;
using do_work_callback = std::function<void(std::function<void()>)>;
using empool_callback = std::function<pool_string(std::string_view)>;
enum class macho_reader_mode {
invalid,
register_dies,
derive_dylibs,
odrv_reporting,
};

struct macho_params {
Copy link
Contributor Author

@fosterbrereton fosterbrereton May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macho_params surrenders to the fact that all this file scanning, in the end, is Mach-O based. At one point I was writing ORC more from the perspective that it shouldn't matter what the ABI is and the ORC file scanners should be more generic. Perhaps I was thinking we'd be able to incorporate Windows more easily at some point in the future? Who knows.

At any rate, the added genericity resulted in passing around the callbacks block everywhere, as some kind of pseudo-delegate pattern that wasn't very well thought through. I have since removed it and replaced it with macho_params. This is a parameter block that is created at the top level orc_process call and is sent into the file scanning system to tell the Mach-O parser exactly what it should be looking out for. Hopefully this setup is clearer and more extensible than the callbacks we had before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of windows support, but at this point, I think that would be more about abstracting out the ORC core from the parser. ORC would begin to look more like a multi-platform app, where you have the "core" parts and the "platform/compiler" parts. At that point, we would need to abstract the macho params.

But it's all very theoretical - especially w/o a MSVC implementation, so I think the approach of streamlining what we actually have is the correct one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

using register_dependencies_callback = std::function<void(std::vector<std::filesystem::path>&&)>;

struct callbacks {
register_dies_callback _register_die;
do_work_callback _do_work;
macho_reader_mode _mode{macho_reader_mode::invalid};
std::filesystem::path _executable_path; // only required if mode == derive_dylibs
register_dependencies_callback _register_dependencies; // only required if mode == derive_dylibs
};

void parse_file(std::string_view object_name,
const object_ancestry& ancestry,
freader& s,
std::istream::pos_type end_pos,
callbacks callbacks);
macho_params params);

/**************************************************************************************************/
1 change: 1 addition & 0 deletions include/orc/settings.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@ struct settings {
bool _forward_to_linker{true};
log_level _log_level{log_level::silent};
bool _standalone_mode{false};
bool _dylib_scan_mode{false};
bool _print_object_file_list{false};
std::vector<std::string> _symbol_ignore;
std::vector<std::string> _violation_report;
Expand Down
8 changes: 5 additions & 3 deletions include/orc/tracy.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,10 @@
#endif

//==================================================================================================

namespace orc::tracy {
// This used to be orc::tracy, but then the compiler would complain about some of the Tracy macros
// failing to resolve to symbols in this nested namespace. It failed to find the global ::tracy
// namespace where they actually live. This has been renamed `profiler` to avoid the collision.
namespace orc::profiler {

//==================================================================================================
// returns a unique `const char*` per thread for the lifetime of the application. A _brief_ name,
Expand All @@ -57,6 +59,6 @@ void initialize();

//==================================================================================================

} // namespace orc::tracy
} // namespace orc::profiler

//==================================================================================================
31 changes: 31 additions & 0 deletions justfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# For documentation on `just`, see https://github.com/casey/just
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just (ha!) discovered the just tool. It lets us write little scripts in this justfile, and then we can run them from a command line e.g., just gen, and it will spin off the proper CMake incantations so I don't have to remember them anymore. I'm an overnight fan.

# and the online manual https://just.systems/man/en/
# This set of recipes have only been tested on macOS.

set shell := ["bash", "-uc"]

# Self-help
[private] # Don't print `default` as one of the recipes
default:
@just --list --justfile {{justfile()}} --list-heading $'Usage: `just recipe` where `recipe` is one of:\n'

# Make `build/` if it does not exist
mkdir:
#!/usr/bin/env bash
set -euxo pipefail
if [ ! -d build ]; then
mkdir build
fi

# Generate the cmake project (Tracy disabled)
gen: mkdir
cd build && cmake .. -GXcode -DTRACY_ENABLE=OFF

# Erase and rebuild `build/` folder
[confirm("Are you sure you want to delete `build/`? (y/N)")]
nuke: && mkdir gen
rm -rf build/

# Generate the cmake project (Tracy enabled)
tracy: mkdir
cd build && cmake .. -GXcode -DTRACY_ENABLE=ON
8 changes: 5 additions & 3 deletions src/ar.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

// application
#include "orc/str.hpp"
#include "orc/tracy.hpp"

/**************************************************************************************************/

Expand All @@ -33,11 +34,12 @@ void read_ar(object_ancestry&& ancestry,
freader& s,
std::istream::pos_type end_pos,
file_details details,
callbacks callbacks) {
macho_params params) {
std::string magic = read_fixed_string<8>(s);
assert(magic == "!<arch>\n");

// REVISIT: (fbrereto) opportunity to parallelize here.
// The .o files are stored serially within the ar file, so I am not sure how easily this can be
// parallelized (if at all)
while (s.tellg() < end_pos) {
std::string identifier = rstrip(read_fixed_string<16>(s));
std::string timestamp = rstrip(read_fixed_string<12>(s));
Expand All @@ -56,7 +58,7 @@ void read_ar(object_ancestry&& ancestry,

if (identifier.rfind(".o") == identifier.size() - 2) {
auto end_pos = s.tellg() + static_cast<std::streamoff>(file_size);
parse_file(identifier, ancestry, s, end_pos, callbacks);
parse_file(identifier, ancestry, s, end_pos, params);
s.seekg(end_pos); // parse_file could leave the read head anywhere.
} else {
// skip to next file in the archive.
Expand Down
Loading
Loading