Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine types based on debug metadata #191

Draft
wants to merge 33 commits into
base: master
Choose a base branch
from
Draft

Refine types based on debug metadata #191

wants to merge 33 commits into from

Conversation

frabert
Copy link
Collaborator

@frabert frabert commented Oct 22, 2021

Solves #190

@frabert
Copy link
Collaborator Author

frabert commented Oct 22, 2021

Issues with this right now:

  • as mentioned in Recover type information #190 (comment) const causes problems and is ignored right now
  • ASTBuilder does not implement outputting typedefs yet, so they're left out
  • Globals do not enjoy the same treatment as locals do, which is an issue that needed to be addressed in Use debug metadata in LLVM bitcode to infer names #180 probably. But that ship has sailed.
  • Current implementation is messy, but I'm still not sure if a separate successive pass can do a clean job either

include/rellic/AST/ASTBuilder.h Show resolved Hide resolved
lib/AST/IRToASTVisitor.cpp Show resolved Hide resolved
lib/AST/IRToASTVisitor.cpp Outdated Show resolved Hide resolved
auto inttype{llvm::cast<llvm::DIBasicType>(ditype)};
sign =
inttype->getSignedness() == llvm::DIBasicType::Signedness::Signed;
// TODO(frabert): this path will not be taken when arguments will have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you file issues for each of these cases, with an example of C code that reproduces the issue.

lib/AST/IRToASTVisitor.cpp Outdated Show resolved Hide resolved
if (ditype) {
auto difunctype{llvm::cast<llvm::DISubroutineType>(ditype)};
auto arr{difunctype->getTypeArray()};
if (arr.size() == ditype_array.size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave a comment on why the size check is needed. Is it a sanity check on the debug info? Is it to avoid the issue of dead argument elimination?

lib/AST/IRToASTVisitor.cpp Show resolved Hide resolved
@frabert
Copy link
Collaborator Author

frabert commented Oct 26, 2021

Regarding the overall method: yes; thinking about it, the current way of trying to match IR and debug metadata is not very robust. The "correct" solution will probably involve some kind of more thorough, "holistic" approach. Question: where should it fit? Could it be that it would be easier to do that before trying to lift the bitcode, as a preprocessing step?

@pgoodman
Copy link
Contributor

Question: where should it fit? Could it be that it would be easier to do that before trying to lift the bitcode, as a preprocessing step?

I think doing so before might be a good place to start, but I don't know what that looks like. What is evident is that right now, in the middle of doing one thing, we're trying to reverse engineer debug info types, and integrate that info. I think that attempts to integrate more "smarts" into that process are going to lead to issues in trying to manage the complexity of what's going on. Some kind of pass, or multiple passes, that interprets bitcode values, types, and debug info locations/types ahead-of-time seems prudent.

Perhaps we can formulate this problem as the type of info that we think we should be able to present. For example, at each LLVM instruction, what logical source variables are "live" and where are their values, and what are their types? The "where are their values" is tricky, because their values may be embedded in other values (e.g. high bits, low bits, mid-bits [for the case of a bitfield], in a structure value, in a vector value, at some byte offset of an alloca). I think it would be prudent to work toward the ability to output this information, as a proof-of-capability for getting it, and a way of forcing it into a coherent API.

@pgoodman
Copy link
Contributor

This might open up opportunities. For example, if the debug info "tells" us that two LLVM values represent the same local variable, and if the two values have the same LLVM type, then we might be able to keep track of this as saying: these two values are in a "storage equivalence class."

@frabert
Copy link
Collaborator Author

frabert commented Oct 28, 2021

Regarding this specific PR: due to the way the QualTypes work, I can't think of a good way to factor out the new code out of IRToASTVisitor without essentially duplicating all of GetQualType.

Also, most tests need additional debug info for function prototypes, and I still haven't figured out a way to convince clang to consistently emit info for those. Even -O1 seemed do the trick, but it doesn't always work.

@@ -60,6 +60,9 @@ bool StructFieldRenamer::VisitRecordDecl(clang::RecordDecl *decl) {
// FIXME(frabert): Is a clash between field names actually possible?
// Can this mechanism actually be left out?
auto name{di_field->getName().str()};
if (di_field->getTag() == llvm::dwarf::DW_TAG_inheritance) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make an explicit field out of the base type in the case of inheritance? Can you add a comment here that shows what a simple c++ code would look like, and what we would generate as a result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some examples that use multiple inheritance and virtual inheritance?

struct Base1 {
  int foo;
};
struct Base2 {
  float bar;
}
struct Derived : Base1 , Base2 {
};
struct Base1 {
  int foo;
};
struct Base2 : Base1 {
  float bar;
};
struct Base3 : Base1 {
  float bar;
};
struct Derived : virtual Base2 , virtual Base3 {
};

Also, here is a particularly thorny example which shows when this method of embedding the base within the structure of the parent is going to break down:
C++: https://godbolt.org/z/bM4vrq6fW
C: https://godbolt.org/z/fYarYo5he

See this SO post for more detail: https://stackoverflow.com/questions/52818411/will-the-padding-of-base-class-be-copied-into-the-derived-class

@@ -60,6 +60,9 @@ bool StructFieldRenamer::VisitRecordDecl(clang::RecordDecl *decl) {
// FIXME(frabert): Is a clash between field names actually possible?
// Can this mechanism actually be left out?
auto name{di_field->getName().str()};
if (di_field->getTag() == llvm::dwarf::DW_TAG_inheritance) {
name = di_field->getBaseType()->getName().str() + "_base";
}
if (seen_names.find(name) == seen_names.end()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you have seen_names be a map of seen_names -> unsinged, then you could have:

auto &name_count = seen_names[name];
if (name_count) {
  name = name + "_" + std::to_string(name_count);
}
++name_count;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants