Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for parsing Gsym inlined function information #315

Merged
merged 16 commits into from
Sep 22, 2023

Conversation

danielocfb
Copy link
Collaborator

Please see individual commits for descriptions.

@codecov
Copy link

codecov bot commented Sep 14, 2023

Codecov Report

Patch coverage: 93.79% and project coverage change: +0.77% 🎉

Comparison is base (b834508) 85.64% compared to head (db0bdeb) 86.41%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #315      +/-   ##
==========================================
+ Coverage   85.64%   86.41%   +0.77%     
==========================================
  Files          37       39       +2     
  Lines        4659     4865     +206     
==========================================
+ Hits         3990     4204     +214     
+ Misses        669      661       -8     
Files Changed Coverage Δ
src/gsym/parser.rs 93.84% <ø> (+0.76%) ⬆️
src/kernel.rs 0.00% <0.00%> (ø)
src/resolver.rs 0.00% <ø> (ø)
src/symbolize/mod.rs 81.81% <81.81%> (ø)
src/ksym.rs 83.33% <85.71%> (-4.09%) ⬇️
src/gsym/resolver.rs 78.51% <86.88%> (+15.72%) ⬆️
src/symbolize/symbolizer.rs 71.25% <90.66%> (+4.75%) ⬆️
src/dwarf/resolver.rs 92.85% <94.73%> (+0.54%) ⬆️
src/gsym/inline.rs 98.11% <98.11%> (ø)
src/c_api/symbolize.rs 71.25% <100.00%> (+0.34%) ⬆️
... and 3 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@danielocfb danielocfb force-pushed the topic/gsym-inline branch 2 times, most recently from ed75d18 to 4b5591b Compare September 15, 2023 21:06
@danielocfb
Copy link
Collaborator Author

The PR is in a reasonable state with respect to what it covers. However, what is still lacking is the higher level plumbing for blazesym::symbolize::Symbolizer to actually report inlined function information. I think we may want to take this as an opportunity to rework the return type of the symbolize function to begin with, which is why I left it it as a draft for now while I work on that.
That being said, happy to receive feedback and most of the stuff that is out is unlikely to change all that much, I'd guess, it's only that more will be added.

@danielocfb
Copy link
Collaborator Author

Also linking #192 here, which this PR will address partly.

Now that we have plumbed through all necessary bits and pieces for
reporting inlined function information, this change takes the logical
last step and adjusts the Sym type to contain inlined function
information as well.
We opted in favor of including this information into the Sym type
instead of creating fully blown Sym objects, to more closely track how
the various supported formats treat inlined function information and to
make it possible to have a 1:1 mapping between input address and output
Sym.

Refs: #192

Signed-off-by: Daniel Müller <[email protected]>
Now that all the pieces came together and we are able to report inlined
functions, this change adds a test checking that inlined function
symbolization works as expected for the Gsym file source.

Signed-off-by: Daniel Müller <[email protected]>
@danielocfb danielocfb force-pushed the topic/gsym-inline branch 2 times, most recently from 75d8014 to cb903af Compare September 19, 2023 22:17
@danielocfb danielocfb marked this pull request as ready for review September 19, 2023 22:18
@danielocfb danielocfb force-pushed the topic/gsym-inline branch 6 times, most recently from e672676 to efea57f Compare September 20, 2023 19:41
src/gsym/inline.rs Outdated Show resolved Hide resolved
src/gsym/resolver.rs Outdated Show resolved Hide resolved
src/symbolize/symbolizer.rs Outdated Show resolved Hide resolved
tests/blazesym.rs Show resolved Hide resolved
With upcoming changes we want to reorder elements of an array only
partly. To prepare for that, this change introduces the
with_ordered_elems_with_swap() helper function which effectively extend
with_ordered_elems() with a new argument that is used for swapping two
elements.

Signed-off-by: Daniel Müller <[email protected]>
The doubly nested Vec<Vec<>> construct being returned from the
Symbolizer::symbolize() method has basically been there since the
beginning of time. However, it was also shown to be one of the least
understood aspects of the library for users.
This change reworks the return type. We are now returning a single level
of symbols, but associate each with an index of the input address it
corresponds to. In so doing we can express both the lack of
symbolization as well as the presence of one or more inlined functions
at the address in question.

Signed-off-by: Daniel Müller <[email protected]>
@danielocfb danielocfb force-pushed the topic/gsym-inline branch 2 times, most recently from dfd820c to ac3464a Compare September 21, 2023 22:36
@d-e-s-o
Copy link
Collaborator

d-e-s-o commented Sep 21, 2023

Will do the enum addition we discussed in person as a follow on. This PR is way to unwieldy already and in any case I am not going to merge it in.

Our SymResolver::find_line_info() method currently returns a single
AddrCodeInfo object. That is insufficient when it comes to inline
function information.
Rather than going down the route of returning a vector of those objects,
let's structure things a bit more. With this change we introduce the
FrameCodeInfo type. AddrCodeInfo is reworked to contain such an object
for the function being symbolized, along with an (optional) array
representing inlined functions.
Long-term I hope that we will move to such a scheme for the public
facing API as well, as it is much more telling than returning a mere
array of AddrCodeInfo (or Sym) objects.

Signed-off-by: Daniel Müller <[email protected]>
Factor out the GsymResolver::parse_line_tab_info() method containing the
bits necessary for working with line table information. Having a
separate methods reduces complexity GsymResolver::find_line_info() and
prepares for upcoming changes.

Signed-off-by: Daniel Müller <[email protected]>
This change adds the necessary bits for parsing and reporting inlined
function information to the Gsym resolver. Note that while the Gsym
resolver now is able to parse and report this information, the
user-facing Symbolizer cannot yet package it up for consumption by
clients.

Refs: #192

Signed-off-by: Daniel Müller <[email protected]>
This change adds a test for the Gsym inlined function lookup logic.

Signed-off-by: Daniel Müller <[email protected]>
With recent changes we started supporting the look up of inlined
function calls when using the Gsym format. Inlined function lookup and
symbolization is a potentially costly operation and as such we want to
make it optional (but enabled by default).
With this patch we introduce the necessary configurability knobs to the
symbolize::Builder type that will ultimately control inlined function
symbolization.

Signed-off-by: Daniel Müller <[email protected]>
This change adds a new argument, `inlined_fns`, to the
SymResolver::find_line_info() method. This argument determines whether
inlined function information (when supported and available) will be
looked up and reported.
The method is also renamed to SymResolver::find_code_info() to be more
in line with the chosen naming throughout the crate.

Signed-off-by: Daniel Müller <[email protected]>
This change introduces a new publicly accessible type, CodeInfo, which
encompasses all the source code information for a symbol. In the future,
it will also be used as part of the inlined function reporting.

Signed-off-by: Daniel Müller <[email protected]>
The thing is a nightmare to maintain and with upcoming changes we are
going to 1) support only reporting of a single symbol and 2) use the
existing and well tested binary search of the util module. Hence, we
won't have any use for this test moving forward. Remove it.

Signed-off-by: Daniel Müller <[email protected]>
The general contract is that for each address there can be at most one
symbol. The only violation of that constraint are aliases at the level
of the symbolization source (e.g., ELF). For aliases, it is arguably
irrelevant which name we return, but in most cases it would be either
wrong to report all of them or require additional post-processing on the
client side, impacting accessibility negatively for what is at best a
niche feature.
Furthermore, reporting multiple symbols for a single address makes APIs
more complex and, as a result, decreases accessibility further.

As such, we only want to report at most one logical symbol for any
address. To that end, this change adjusts the internal
SymResolver::find_syms() accordingly and renames it to
SymResolver::find_sym() in the process.

Signed-off-by: Daniel Müller <[email protected]>
This change refactors the Symbolizer::symbolize_kernel_addrs() method by
renaming it to create_kernel_resolver() and making it return a resolver
that can then use the existing Symbolizer::symbolize_addrs() method to
perform symbolization at the call site. In so doing we make this code
path more similar to the majority of others that instantiate a resolver
and then use it in conjunction with symbolize_addrs().

Signed-off-by: Daniel Müller <[email protected]>
This change introduces a new API, Symbolizer::symbolize_single, that can
be used to symbolize a single address. Our "batch processing" is
somewhat cumbersome to use when it is clear that only a single input
address is available. On top of that, we can avoid the need for
dynamic memory allocation in that case, because we know that at most one
symbol can be returned.

Closes: #317

Signed-off-by: Daniel Müller <[email protected]>
Currently all our LEB128 decoding functions return 128 bit values, as
they are the largest integer type available to us and that barring any
heap allocated dynamically sized integers, that is the fixed size
integer most likely to fit all expressed values.
However, working with 128 bit values requires two registers on common 64
bit machines and in general Gsym, the only format decoder currently
using this functionality, is unlikely to need this data range.
Hence, with this change we switch to decoding all those values into 64
bit fixed size integers instead.

Signed-off-by: Daniel Müller <[email protected]>
Copy link
Member

@anakryiko anakryiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. This is a big PR, let's land it. We had a discussion separately about Sym and addr_idx, approach. If you decide to change it, let's do it in a separate PR. Also, if you aggree to refine the inline post-processing to be a single step, let's do it in a separate PR as well.

src/gsym/resolver.rs Show resolved Hide resolved
src/symbolize/symbolizer.rs Show resolved Hide resolved
}

/// Enable/disable inlined function reporting.
pub fn enable_inlined_fns(mut self, enable: bool) -> Builder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit of eye sore to read "inline_fns", would "enable_inlines" be confusing? Then you can also call the field in Sym "inlines" and be consistent in naming throughout?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can also call the field in Sym "inlines"

It's actually called inlined. And it currently is of type InlinedFn. I am not sure I want to shorten it to just a single word, to be honest. Similar for code info or debug syms. So if anything I'd rather change Sym::inlined to Sym::inlined_fns. I have no strong preference. The types involved basically add the necessary context from my perspective.

@danielocfb
Copy link
Collaborator Author

Going ahead with the merge as per the suggestion.

@danielocfb danielocfb merged commit 270f839 into libbpf:main Sep 22, 2023
22 checks passed
@danielocfb danielocfb deleted the topic/gsym-inline branch September 22, 2023 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants