Refine types based on debug metadata #191

frabert · 2021-10-22T15:30:52Z

Solves #190

frabert · 2021-10-22T15:35:14Z

Issues with this right now:

As mentioned in Recover type information #190 (comment), const causes problems and is currently ignored
~~ASTBuilder does not implement outputting typedefs yet, so they're left out~~
~~Globals do not enjoy the same treatment as locals do, which is an issue that needed to be addressed in Use debug metadata in LLVM bitcode to infer names #180 probably. But that ship has sailed.~~
The current implementation is messy, but I'm still not sure whether a separate, subsequent pass could do a cleaner job

pgoodman · 2021-10-26T16:17:10Z

lib/AST/IRToASTVisitor.cpp

-        auto inttype{llvm::cast<llvm::DIBasicType>(ditype)};
-        sign =
-            inttype->getSignedness() == llvm::DIBasicType::Signedness::Signed;
+        // TODO(frabert): this path will not be taken when arguments will have


Can you file an issue for each of these cases, with an example of C code that reproduces it?

pgoodman · 2021-10-26T16:29:59Z

lib/AST/IRToASTVisitor.cpp

+      if (ditype) {
+        auto difunctype{llvm::cast<llvm::DISubroutineType>(ditype)};
+        auto arr{difunctype->getTypeArray()};
+        if (arr.size() == ditype_array.size()) {


Can you leave a comment on why the size check is needed? Is it a sanity check on the debug info? Is it to avoid the issue of dead argument elimination?
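
For illustration only, here is a minimal sketch of the kind of guard being discussed. It assumes the usual LLVM convention that a subroutine debug type's type array holds the return type first, followed by the parameter types. `DebugSignature`, `IrFunction`, and `RefinedArgType` are hypothetical stand-ins written for this example, not Rellic or LLVM names, and the check shown is not necessarily the one in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a subroutine debug type.
// Layout assumption: [return type, param0, param1, ...].
struct DebugSignature {
  std::vector<std::string> type_array;
};

// Hypothetical stand-in for an IR-level function signature.
struct IrFunction {
  std::vector<std::string> arg_types;
};

// Returns the debug type for argument `index`, but only when the debug
// signature and the IR function agree on the number of parameters.
// A mismatch (for example after dead argument elimination) means the two
// lists cannot be matched positionally, so no refinement is attempted.
std::optional<std::string> RefinedArgType(const IrFunction &func,
                                          const DebugSignature &sig,
                                          std::size_t index) {
  if (sig.type_array.empty()) {
    return std::nullopt;  // no debug info to work with
  }
  const std::size_t num_debug_params = sig.type_array.size() - 1;
  if (num_debug_params != func.arg_types.size()) {
    return std::nullopt;  // counts disagree: fall back to the IR type
  }
  if (index >= func.arg_types.size()) {
    return std::nullopt;  // out-of-range argument index
  }
  return sig.type_array[index + 1];  // skip the return type slot
}
```

Falling back to the plain IR type on a mismatch trades precision for safety: a wrong positional match would silently attach the wrong source type to an argument.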

frabert · 2021-10-26T16:50:31Z

Regarding the overall method: yes; thinking about it, the current way of trying to match IR and debug metadata is not very robust. The "correct" solution will probably involve some kind of more thorough, "holistic" approach. Question: where should it fit? Could it be that it would be easier to do that before trying to lift the bitcode, as a preprocessing step?

pgoodman · 2021-10-26T16:57:52Z

> Question: where should it fit? Could it be that it would be easier to do that before trying to lift the bitcode, as a preprocessing step?

I think doing so before might be a good place to start, but I don't know what that looks like. What is evident is that right now, in the middle of doing one thing, we're trying to reverse engineer debug info types, and integrate that info. I think that attempts to integrate more "smarts" into that process are going to lead to issues in trying to manage the complexity of what's going on. Some kind of pass, or multiple passes, that interprets bitcode values, types, and debug info locations/types ahead-of-time seems prudent.

Perhaps we can formulate this problem as the type of info that we think we should be able to present. For example, at each LLVM instruction, what logical source variables are "live" and where are their values, and what are their types? The "where are their values" is tricky, because their values may be embedded in other values (e.g. high bits, low bits, mid-bits [for the case of a bitfield], in a structure value, in a vector value, at some byte offset of an alloca). I think it would be prudent to work toward the ability to output this information, as a proof-of-capability for getting it, and a way of forcing it into a coherent API.
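
As a rough sketch of what such an API could expose, the types below record, for one instruction, which source variables are live, their source types, and where their values sit inside LLVM values. Every name here (`LocationKind`, `ValueLocation`, `LiveVariable`, `InstructionView`, `ExtractBits`) is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical classification of where a source variable's value lives.
enum class LocationKind {
  kWholeValue,     // the variable is exactly one LLVM value
  kBitRange,       // high, low, or middle bits of a wider value (bitfields)
  kAggregateElem,  // a field of a struct value or a lane of a vector value
  kMemoryOffset,   // some byte offset into an alloca
};

struct ValueLocation {
  LocationKind kind;
  std::string llvm_value;   // name of the containing LLVM value
  std::uint64_t offset;     // bit offset, element index, or byte offset
  std::uint64_t size_bits;  // width of the variable's storage in bits
};

struct LiveVariable {
  std::string source_name;
  std::string source_type;
  ValueLocation location;
};

// The answer to "what is live here, where, and with what type?" for a
// single instruction.
struct InstructionView {
  std::string instruction;
  std::vector<LiveVariable> live;
};

// Reads a variable stored as a bit range out of its 64-bit container.
std::uint64_t ExtractBits(std::uint64_t container, const ValueLocation &loc) {
  assert(loc.kind == LocationKind::kBitRange);
  assert(loc.size_bits >= 1 && loc.offset + loc.size_bits <= 64);
  const std::uint64_t mask = loc.size_bits == 64
                                 ? ~std::uint64_t{0}
                                 : (std::uint64_t{1} << loc.size_bits) - 1;
  return (container >> loc.offset) & mask;
}
```

Keeping the location as plain data makes it easy to print, which fits the proof-of-capability goal of first being able to output this information.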

pgoodman · 2021-10-26T16:59:39Z

This might open up opportunities. For example, if the debug info "tells" us that two LLVM values represent the same local variable, and if the two values have the same LLVM type, then we might be able to keep track of this as saying: these two values are in a "storage equivalence class."
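
One possible way to track such classes is a union-find structure over value indices, merging values that share both a debug variable and an LLVM type. This is a sketch under that assumption; `ValueInfo` and `StorageClasses` are hypothetical names, and strings stand in for real debug variables and types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical summary of one LLVM value.
struct ValueInfo {
  std::string debug_variable;  // source variable per debug info, "" if none
  std::string llvm_type;       // printed LLVM type, e.g. "i32"
};

// Groups values into storage equivalence classes: two values land in the
// same class when debug info maps them to the same variable and they have
// the same LLVM type.
class StorageClasses {
 public:
  explicit StorageClasses(const std::vector<ValueInfo> &values)
      : parent_(values.size()) {
    std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    std::map<std::pair<std::string, std::string>, std::size_t> first_seen;
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (values[i].debug_variable.empty()) {
        continue;  // no debug info: the value stays in its own class
      }
      auto key = std::make_pair(values[i].debug_variable, values[i].llvm_type);
      auto [it, inserted] = first_seen.emplace(key, i);
      if (!inserted) {
        Union(it->second, i);
      }
    }
  }

  // Finds the class representative, halving paths along the way.
  std::size_t Find(std::size_t v) {
    assert(v < parent_.size());
    while (parent_[v] != v) {
      parent_[v] = parent_[parent_[v]];
      v = parent_[v];
    }
    return v;
  }

  bool SameStorage(std::size_t a, std::size_t b) { return Find(a) == Find(b); }

 private:
  void Union(std::size_t a, std::size_t b) { parent_[Find(b)] = Find(a); }

  std::vector<std::size_t> parent_;
};
```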

frabert · 2021-10-28T11:49:38Z

Regarding this specific PR: due to the way the QualTypes work, I can't think of a good way to factor the new code out of IRToASTVisitor without essentially duplicating all of GetQualType.

Also, most tests need additional debug info for function prototypes, and I still haven't figured out a way to convince clang to consistently emit info for those. Even -O1 seemed to do the trick, but it doesn't always work.

pgoodman · 2021-11-04T21:17:58Z

lib/AST/StructFieldRenamer.cpp

@@ -60,6 +60,9 @@ bool StructFieldRenamer::VisitRecordDecl(clang::RecordDecl *decl) {
    // FIXME(frabert): Is a clash between field names actually possible?
    // Can this mechanism actually be left out?
    auto name{di_field->getName().str()};
+    if (di_field->getTag() == llvm::dwarf::DW_TAG_inheritance) {


Does this make an explicit field out of the base type in the case of inheritance? Can you add a comment here that shows what a simple C++ snippet would look like, and what we would generate as a result?

Can you add some examples that use multiple inheritance and virtual inheritance?

struct Base1 { int foo; }; struct Base2 { float bar; }; struct Derived : Base1 , Base2 { };

struct Base1 { int foo; }; struct Base2 : Base1 { float bar; }; struct Base3 : Base1 { float bar; }; struct Derived : virtual Base2 , virtual Base3 { };

Also, here is a particularly thorny example which shows when this method of embedding the base within the structure of the parent is going to break down:
C++: https://godbolt.org/z/bM4vrq6fW
C: https://godbolt.org/z/fYarYo5he

See this SO post for more detail: https://stackoverflow.com/questions/52818411/will-the-padding-of-base-class-be-copied-into-the-derived-class
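
To make the padding concern concrete, here is a small sketch that may or may not match the linked examples. It assumes a common ABI such as Itanium C++, where a derived class can place its own members in the tail padding of a base that is not a POD for layout purposes. The `Base_base` field name mirrors the `_base` suffix from the diff; `Flattened` is a hypothetical C-style rendering of `Derived`.

```cpp
#include <cassert>
#include <cstddef>

struct Base {
  Base() : i(0), c(0) {}  // user-provided constructor: not a POD for layout
  int i;
  char c;  // typically followed by tail padding up to the alignment of int
};

// On ABIs that reuse tail padding, `d` can be placed inside Base's padding.
struct Derived : Base {
  char d;
};

// Embedding the base as an explicit field: `d` must come after the whole
// Base object, padding included.
struct Flattened {
  Base Base_base;
  char d;
};

constexpr std::size_t kDerivedSize = sizeof(Derived);
constexpr std::size_t kFlattenedSize = sizeof(Flattened);

// True when the flattened struct reproduces the size of the derived class.
// When this is false, field offsets in the flattened form cannot all match.
constexpr bool LayoutsAgree() { return kDerivedSize == kFlattenedSize; }

static_assert(sizeof(Flattened) > sizeof(Base),
              "the explicit field layout always places d after Base");
```

Whether `LayoutsAgree()` holds depends on the target ABI, so a decompiler that emits the flattened form would need to check offsets from the debug info rather than rely on the C compiler to reproduce them.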

pgoodman · 2021-11-04T21:20:38Z

lib/AST/StructFieldRenamer.cpp

@@ -60,6 +60,9 @@ bool StructFieldRenamer::VisitRecordDecl(clang::RecordDecl *decl) {
    // FIXME(frabert): Is a clash between field names actually possible?
    // Can this mechanism actually be left out?
    auto name{di_field->getName().str()};
+    if (di_field->getTag() == llvm::dwarf::DW_TAG_inheritance) {
+      name = di_field->getBaseType()->getName().str() + "_base";
+    }
    if (seen_names.find(name) == seen_names.end()) {


What if you make seen_names a map from name -> unsigned? Then you could have:

    auto &name_count = seen_names[name];
    if (name_count) {
      name = name + "_" + std::to_string(name_count);
    }
    ++name_count;
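
Wrapped in a standalone function, the suggestion could look like the sketch below. `UniqueFieldName` is a hypothetical helper name, and `std::unordered_map` is an assumed choice of container; the PR does not say which map type `seen_names` would become.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Returns `name` unchanged the first time it is seen, and `name_<n>` on the
// n-th repeat. The counter for each name lives in `seen_names`.
std::string UniqueFieldName(
    std::unordered_map<std::string, unsigned> &seen_names, std::string name) {
  // operator[] value-initializes the counter to 0 for a new name.
  auto &name_count = seen_names[name];
  if (name_count) {
    name = name + "_" + std::to_string(name_count);
  }
  ++name_count;
  return name;
}
```

One caveat of this scheme: a generated name such as `a_1` is not itself recorded in the map, so it could still collide with a field that is literally named `a_1`.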

Initial work on refining types

9798e97

pgoodman suggested changes Oct 22, 2021

frabert added 4 commits October 25, 2021 16:51

Visit global variables, improve return types

49e2601

Implement typedef printing

b32d10a

Improve type refinement for fields and arguments

55d3a13

Fix struct members

2b1e41d

pgoodman suggested changes Oct 26, 2021

frabert added 18 commits October 28, 2021 14:22

Add explanation for checking argument count

7db935e

Add unit test for ASTBuilder::CreateTypedefDecl

1067041

Fix varargs debug type analysis

0486438

Use more debug info for prototypes

171e90e

Fix function argument type refinement

09f8a2e

Default to signed integers

bdd21e4

Fix tests

7c474eb

Desugar types for Z3 conversion

e0e6efe

Initial work on refining types

175c073

Visit global variables, improve return types

b55a8d0

Implement typedef printing

287e137

Improve type refinement for fields and arguments

2292193

Fix struct members

c3da195

Add explanation for checking argument count

36cb7c8

Add unit test for ASTBuilder::CreateTypedefDecl

33294d3

Fix varargs debug type analysis

e878d42

Use more debug info for prototypes

6b4d302

Fix function argument type refinement

b231b24

frabert added 4 commits November 1, 2021 14:34

Default to signed integers

e02bd8a

Fix tests

3676aa5

Desugar types for Z3 conversion

a9b5371

Merge branch 'use-debug-types' of github.com:lifting-bits/rellic into…

0386c2b

… use-debug-types

frabert mentioned this pull request Nov 1, 2021

Incorrect result when decompiling pass-by-value struct arguments via WebAssembly #195

Closed

frabert added 3 commits November 4, 2021 19:31

Merge branch 'master' into use-debug-types

af69ad5

Add utility functions

8f39d9d

Fix bugs

35599ba

pgoodman suggested changes Nov 4, 2021

frabert added 3 commits November 8, 2021 14:15

Merge branch 'master' into use-debug-types

2905038

Add void to ptr casts

dbd0d82

Use plain char when asking for signed char

38991a6
