-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for parsing Gsym inlined function information #315
Conversation
0ea4f93
to
abbe1af
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #315 +/- ##
==========================================
+ Coverage 85.64% 86.41% +0.77%
==========================================
Files 37 39 +2
Lines 4659 4865 +206
==========================================
+ Hits 3990 4204 +214
+ Misses 669 661 -8
☔ View full report in Codecov by Sentry. |
ed75d18
to
4b5591b
Compare
The PR is in a reasonable state with respect to what it covers. However, what is still lacking is the higher level plumbing for |
Also linking #192 here, which this PR will address partly. |
4b5591b
to
a7b4c22
Compare
Now that we have plumbed through all necessary bits and pieces for reporting inlined function information, this change takes the logical last step and adjusts the Sym type to contain inlined function information as well. We opted in favor of including this information into the Sym type instead of creating fully blown Sym objects, to more closely track how the various supported formats treat inlined function information and to make it possible to have a 1:1 mapping between input address and output Sym. Refs: #192 Signed-off-by: Daniel Müller <[email protected]>
Now that all the pieces came together and we are able to report inlined functions, this change adds a test checking that inlined function symbolization works as expected for the Gsym file source. Signed-off-by: Daniel Müller <[email protected]>
75d8014
to
cb903af
Compare
e672676
to
efea57f
Compare
efea57f
to
15b3b4f
Compare
With upcoming changes we want to reorder elements of an array only partly. To prepare for that, this change introduces the with_ordered_elems_with_swap() helper function which effectively extend with_ordered_elems() with a new argument that is used for swapping two elements. Signed-off-by: Daniel Müller <[email protected]>
The doubly nested Vec<Vec<>> construct being returned from the Symbolizer::symbolize() method has basically been there since the beginning of time. However, it was also shown to be one of the least understood aspects of the library for users. This change reworks the return type. We are now returning a single level of symbols, but associate each with an index of the input address it corresponds to. In so doing we can express both the lack of symbolization as well as the presence of one or more inlined functions at the address in question. Signed-off-by: Daniel Müller <[email protected]>
dfd820c
to
ac3464a
Compare
Will do the enum addition we discussed in person as a follow on. This PR is way to unwieldy already and in any case I am not going to merge it in. |
Our SymResolver::find_line_info() method currently returns a single AddrCodeInfo object. That is insufficient when it comes to inline function information. Rather than going down the route of returning a vector of those objects, let's structure things a bit more. With this change we introduce the FrameCodeInfo type. AddrCodeInfo is reworked to contain such an object for the function being symbolized, along with an (optional) array representing inlined functions. Long-term I hope that we will move to such a scheme for the public facing API as well, as it is much more telling than returning a mere array of AddrCodeInfo (or Sym) objects. Signed-off-by: Daniel Müller <[email protected]>
Factor out the GsymResolver::parse_line_tab_info() method containing the bits necessary for working with line table information. Having a separate methods reduces complexity GsymResolver::find_line_info() and prepares for upcoming changes. Signed-off-by: Daniel Müller <[email protected]>
This change adds the necessary bits for parsing and reporting inlined function information to the Gsym resolver. Note that while the Gsym resolver now is able to parse and report this information, the user-facing Symbolizer cannot yet package it up for consumption by clients. Refs: #192 Signed-off-by: Daniel Müller <[email protected]>
This change adds a test for the Gsym inlined function lookup logic. Signed-off-by: Daniel Müller <[email protected]>
With recent changes we started supporting the look up of inlined function calls when using the Gsym format. Inlined function lookup and symbolization is a potentially costly operation and as such we want to make it optional (but enabled by default). With this patch we introduce the necessary configurability knobs to the symbolize::Builder type that will ultimately control inlined function symbolization. Signed-off-by: Daniel Müller <[email protected]>
This change adds a new argument, `inlined_fns`, to the SymResolver::find_line_info() method. This argument determines whether inlined function information (when supported and available) will be looked up and reported. The method is also renamed to SymResolver::find_code_info() to be more in line with the chosen naming throughout the crate. Signed-off-by: Daniel Müller <[email protected]>
This change introduces a new publicly accessible type, CodeInfo, which encompasses all the source code information for a symbol. In the future, it will also be used as part of the inlined function reporting. Signed-off-by: Daniel Müller <[email protected]>
The thing is a nightmare to maintain and with upcoming changes we are going to 1) support only reporting of a single symbol and 2) use the existing and well tested binary search of the util module. Hence, we won't have any use for this test moving forward. Remove it. Signed-off-by: Daniel Müller <[email protected]>
The general contract is that for each address there can be at most one symbol. The only violation of that constraint are aliases at the level of the symbolization source (e.g., ELF). For aliases, it is arguably irrelevant which name we return, but in most cases it would be either wrong to report all of them or require additional post-processing on the client side, impacting accessibility negatively for what is at best a niche feature. Furthermore, reporting multiple symbols for a single address makes APIs more complex and, as a result, decreases accessibility further. As such, we only want to report at most one logical symbol for any address. To that end, this change adjusts the internal SymResolver::find_syms() accordingly and renames it to SymResolver::find_sym() in the process. Signed-off-by: Daniel Müller <[email protected]>
This change refactors the Symbolizer::symbolize_kernel_addrs() method by renaming it to create_kernel_resolver() and making it return a resolver that can then use the existing Symbolizer::symbolize_addrs() method to perform symbolization at the call site. In so doing we make this code path more similar to the majority of others that instantiate a resolver and then use it in conjunction with symbolize_addrs(). Signed-off-by: Daniel Müller <[email protected]>
This change introduces a new API, Symbolizer::symbolize_single, that can be used to symbolize a single address. Our "batch processing" is somewhat cumbersome to use when it is clear that only a single input address is available. On top of that, we can avoid the need for dynamic memory allocation in that case, because we know that at most one symbol can be returned. Closes: #317 Signed-off-by: Daniel Müller <[email protected]>
ac3464a
to
b238f10
Compare
Currently all our LEB128 decoding functions return 128 bit values, as they are the largest integer type available to us and that barring any heap allocated dynamically sized integers, that is the fixed size integer most likely to fit all expressed values. However, working with 128 bit values requires two registers on common 64 bit machines and in general Gsym, the only format decoder currently using this functionality, is unlikely to need this data range. Hence, with this change we switch to decoding all those values into 64 bit fixed size integers instead. Signed-off-by: Daniel Müller <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. This is a big PR, let's land it. We had a discussion separately about Sym and addr_idx, approach. If you decide to change it, let's do it in a separate PR. Also, if you aggree to refine the inline post-processing to be a single step, let's do it in a separate PR as well.
} | ||
|
||
/// Enable/disable inlined function reporting. | ||
pub fn enable_inlined_fns(mut self, enable: bool) -> Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit of eye sore to read "inline_fns", would "enable_inlines" be confusing? Then you can also call the field in Sym "inlines" and be consistent in naming throughout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you can also call the field in Sym "inlines"
It's actually called inlined
. And it currently is of type InlinedFn
. I am not sure I want to shorten it to just a single word, to be honest. Similar for code info
or debug syms
. So if anything I'd rather change Sym::inlined
to Sym::inlined_fns
. I have no strong preference. The types involved basically add the necessary context from my perspective.
Going ahead with the merge as per the suggestion. |
Please see individual commits for descriptions.