[BOLT] Improve ICP for virtual method calls and jump tables using value profiling.

Summary:
Use value profiling data to remove the method pointer loads from vtables when doing ICP at virtual function and jump table callsites.

The basic process is the following:
1. Work backwards from the callsite to find the most recent def of the call register.
2. Work back from the call register def to find the instruction where the vtable is loaded.
3. Find out if there is any value profiling data associated with the vtable load.  If so, record all these addresses as potential vtables + method offsets.
4. Since the addresses extracted by #3 will be vtable + method offset, we need to figure out the method offset in order to determine the actual vtable base address.  At this point I virtually execute all the instructions that occur between #3 and #2 that touch the method pointer register.  The result of this execution should be the method offset.
5. Fetch the actual method address from the appropriate data section containing the vtable using the computed method offset.  Make sure that this address maps to an actual function symbol.
6. Try to associate a vtable pointer with each target address in SymTargets.  If every target has a vtable, then this is almost certainly a virtual method callsite.
7. Use the vtable address when generating the promoted call code.  It's basically the same as regular ICP code except that the compare is against the vtable and not the method pointer.  Additionally, the instructions to load up the method are dumped into the cold call block.
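Steps 3 to 5 can be modeled in a few dozen lines. The sketch below is a simplified illustration, not BOLT's implementation: the `Op`/`Insn` instruction model, `evalMethodOffset`, `findVtableTargets`, and the map-based stand-ins for the data section and symbol table are all hypothetical, and the real analysis works on MCInst operands and registers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical instruction model for the method-fetch sequence: only the
// operations that change the method pointer register are recorded.
enum class Op { AddImm, SubImm, LoadDisp };
struct Insn {
  Op Opcode;
  int64_t Imm; // immediate, or the displacement for LoadDisp
};

// Step 4: "virtually execute" the sequence with the vtable base taken as 0.
// The load through the register (the method pointer load) ends the walk and
// the accumulated value is the method offset within the vtable.
std::optional<int64_t> evalMethodOffset(const std::vector<Insn> &Seq) {
  int64_t Off = 0;
  for (const Insn &I : Seq) {
    switch (I.Opcode) {
    case Op::AddImm: Off += I.Imm; break;
    case Op::SubImm: Off -= I.Imm; break;
    case Op::LoadDisp: return Off + I.Imm;
    }
  }
  return std::nullopt; // no load found: not a recognizable method fetch
}

struct VtableTarget {
  uint64_t VTable; // inferred vtable base address
  uint64_t Method; // method address read from the data section
  uint64_t Count;  // profile count of the load
};

// Steps 3 and 5: each profiled load address is VTable + MethodOffset.
// Subtract the offset to get the vtable base, read the word stored in the
// slot, and keep it only if it is the address of a known function.
// Data maps an address to the 8-byte word stored there; Syms maps function
// addresses to names (both stand in for the binary's sections and symbols).
std::vector<VtableTarget>
findVtableTargets(const std::vector<std::pair<uint64_t, uint64_t>> &Profile,
                  int64_t MethodOffset,
                  const std::map<uint64_t, uint64_t> &Data,
                  const std::map<uint64_t, std::string> &Syms) {
  assert(MethodOffset >= 0 && "method offset should not be negative");
  const uint64_t Off = static_cast<uint64_t>(MethodOffset);
  std::vector<VtableTarget> Targets;
  for (const auto &[Addr, Count] : Profile) {
    const uint64_t VTable = Addr - Off;
    const auto Word = Data.find(VTable + Off);
    if (Word == Data.end() || Syms.count(Word->second) == 0)
      continue; // unreadable slot, or it does not point at a function symbol
    Targets.push_back({VTable, Word->second, Count});
  }
  return Targets;
}
```

For example, a fetch sequence ending in a load at displacement 0x18 gives a method offset of 0x18, so a profiled load from 0x2018 points at a vtable based at 0x2000. Step 7 would then compare the object's vtable pointer against 0x2000 and call the method found there directly, leaving the method-pointer load on the cold path.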

For jump tables, the basic idea is the same.  I use the memory profiling data to find the hottest slots in the jump table and then use that information to compute the indices of the hottest entries.  We can then compare the index register to the hot index values and avoid the load from the jump table.
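A minimal sketch of the jump-table variant, assuming the memory profile arrives as (load address, count) pairs and every table entry has the same size. `HotIndex`, `hottestJumpTableIndices`, `Handler` and `promotedDispatch` are hypothetical names, and the dispatch function only mirrors at source level the compare-and-branch code that ICP would emit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct HotIndex {
  uint64_t Index; // jump table entry index
  uint64_t Count; // number of profiled loads from that entry
};

// Map profiled load addresses back to jump table indices and keep the TopN
// hottest. Addresses outside the table or not on an entry boundary are ignored.
std::vector<HotIndex>
hottestJumpTableIndices(const std::vector<std::pair<uint64_t, uint64_t>> &MemProfile,
                        uint64_t TableBase, uint64_t EntrySize,
                        uint64_t NumEntries, std::size_t TopN) {
  assert(EntrySize != 0 && "jump table entries must have a size");
  std::vector<HotIndex> Hot;
  for (const auto &[Addr, Count] : MemProfile) {
    if (Addr < TableBase)
      continue;
    const uint64_t Delta = Addr - TableBase;
    if (Delta % EntrySize != 0 || Delta / EntrySize >= NumEntries)
      continue;
    Hot.push_back({Delta / EntrySize, Count});
  }
  // Hottest first; stable_sort keeps ties in profile order.
  std::stable_sort(Hot.begin(), Hot.end(), [](const HotIndex &A, const HotIndex &B) {
    return A.Count > B.Count;
  });
  if (Hot.size() > TopN)
    Hot.resize(TopN);
  return Hot;
}

// Shape of the promoted dispatch: compare the index against a hot value first
// and only fall back to loading the target from the table when it misses.
using Handler = int (*)();
int promotedDispatch(uint64_t Index, const std::vector<Handler> &Table,
                     uint64_t HotIdx, Handler HotTarget) {
  if (Index == HotIdx)
    return HotTarget();  // hot path: no load from the table
  return Table[Index](); // cold path: the original indirect jump
}
```

With a table at 0x5000 and 8-byte entries, a load from 0x5018 is entry 3. Sorting by count before truncating keeps only the indices worth a compare; every other index still goes through the table.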

Note: I'm assuming the whole call is in a single BB.  According to @rafaelauler, this isn't always the case on ARM.  It isn't always the case on X86 either: if there are non-trivial arguments that are passed by value, there could be branches between the setup and the call.  I'm going to leave fixing this until later since it makes things a bit more complicated.

I've also fixed a bug where ICP was introducing a conditional tail call.  I made sure that SCTC fixes these up afterwards.  I have no idea why I made it introduce a CTC in the first place.

(cherry picked from FBD6120768)
Bill Nell authored and memfrob committed Oct 4, 2022
1 parent c588b5e commit 81f2331
Showing 13 changed files with 1,000 additions and 162 deletions.
8 changes: 8 additions & 0 deletions bolt/BinaryBasicBlock.h
@@ -642,6 +642,14 @@ class BinaryBasicBlock {
return Instructions.erase(II);
}

+/// Erase instructions in the specified range.
+template <typename ItrType>
+void eraseInstructions(ItrType Begin, ItrType End) {
+while (End > Begin) {
+eraseInstruction(*--End);
+}
+}
+
/// Erase all instructions
void clear() {
Instructions.clear();
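`eraseInstructions` removes the range one instruction at a time, starting from the end. A standalone sketch of the same back-to-front idea on a `std::vector`; the index-based interface and the name `eraseRangeBackwards` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Erase the elements at positions [First, Last) one at a time, last to first.
// std::vector::erase only shifts elements after the erase point, so working
// backwards never moves an element that is still waiting to be erased, and
// each erase shifts only the tail that lies beyond the range.
template <typename T>
void eraseRangeBackwards(std::vector<T> &V, std::size_t First, std::size_t Last) {
  assert(First <= Last && Last <= V.size() && "invalid range");
  while (Last > First) {
    --Last;
    V.erase(V.begin() + static_cast<std::ptrdiff_t>(Last));
  }
}
```

When no per-element work is needed, a single `V.erase(V.begin() + First, V.begin() + Last)` does the same in one pass; erasing one element at a time is only worthwhile when each removal has side effects of its own.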
98 changes: 89 additions & 9 deletions bolt/BinaryContext.cpp
@@ -36,8 +36,28 @@ PrintDebugInfo("print-debug-info",
cl::Hidden,
cl::cat(BoltCategory));

+static cl::opt<bool>
+PrintRelocations("print-relocations",
+cl::desc("print relocations when printing functions"),
+cl::Hidden,
+cl::cat(BoltCategory));
+
+static cl::opt<bool>
+PrintMemData("print-mem-data",
+cl::desc("print memory data annotations when printing functions"),
+cl::Hidden,
+cl::cat(BoltCategory));
+
} // namespace opts

+namespace llvm {
+namespace bolt {
+extern void check_error(std::error_code EC, StringRef Message);
+}
+}
+
+Triple::ArchType Relocation::Arch;
+
BinaryContext::~BinaryContext() { }

MCObjectWriter *BinaryContext::createObjectWriter(raw_pwrite_stream &OS) {
@@ -326,7 +346,9 @@ void BinaryContext::printInstruction(raw_ostream &OS,
const MCInst &Instruction,
uint64_t Offset,
const BinaryFunction* Function,
-bool printMCInst) const {
+bool PrintMCInst,
+bool PrintMemData,
+bool PrintRelocations) const {
if (MIA->isEHLabel(Instruction)) {
OS << " EH_LABEL: " << *MIA->getTargetSymbol(Instruction) << '\n';
return;
@@ -392,24 +414,58 @@ void BinaryContext::printInstruction(raw_ostream &OS,
}
}

-auto *MD = Function ? DR.getFuncMemData(Function->getNames()) : nullptr;
-if (MD) {
-bool DidPrint = false;
-for (auto &MI : MD->getMemInfoRange(Offset)) {
-OS << (DidPrint ? ", " : " # Loads: ");
-OS << MI.Addr << "/" << MI.Count;
-DidPrint = true;
+if ((opts::PrintMemData || PrintMemData) && Function) {
+const auto *MD = Function->getMemData();
+const auto MemDataOffset =
+MIA->tryGetAnnotationAs<uint64_t>(Instruction, "MemDataOffset");
+if (MD && MemDataOffset) {
+bool DidPrint = false;
+for (auto &MI : MD->getMemInfoRange(MemDataOffset.get())) {
+OS << (DidPrint ? ", " : " # Loads: ");
+OS << MI.Addr << "/" << MI.Count;
+DidPrint = true;
+}
}
}

+if ((opts::PrintRelocations || PrintRelocations) && Function) {
+const auto Size = computeCodeSize(&Instruction, &Instruction + 1);
+Function->printRelocations(OS, Offset, Size);
+}
+
OS << "\n";

-if (printMCInst) {
+if (PrintMCInst) {
Instruction.dump_pretty(OS, InstPrinter.get());
OS << "\n";
}
}

+ErrorOr<ArrayRef<uint8_t>>
+BinaryContext::getFunctionData(const BinaryFunction &Function) const {
+auto Section = Function.getSection();
+assert(Section.getAddress() <= Function.getAddress() &&
+Section.getAddress() + Section.getSize()
+>= Function.getAddress() + Function.getSize() &&
+"wrong section for function");
+
+if (!Section.isText() || Section.isVirtual() || !Section.getSize()) {
+return std::make_error_code(std::errc::bad_address);
+}
+
+StringRef SectionContents;
+check_error(Section.getContents(SectionContents),
+"cannot get section contents");
+
+assert(SectionContents.size() == Section.getSize() &&
+"section size mismatch");
+
+// Function offset from the section start.
+auto FunctionOffset = Function.getAddress() - Section.getAddress();
+auto *Bytes = reinterpret_cast<const uint8_t *>(SectionContents.data());
+return ArrayRef<uint8_t>(Bytes + FunctionOffset, Function.getSize());
+}
+
ErrorOr<SectionRef> BinaryContext::getSectionForAddress(uint64_t Address) const{
auto SI = AllocatableSections.upper_bound(Address);
if (SI != AllocatableSections.begin()) {
@@ -640,3 +696,27 @@ size_t Relocation::emit(MCStreamer *Streamer) const {
}
return Size;
}
+
+#define ELF_RELOC(name, value) #name,
+
+void Relocation::print(raw_ostream &OS) const {
+static const char *X86RelocNames[] = {
+#include "llvm/Support/ELFRelocs/x86_64.def"
+};
+static const char *AArch64RelocNames[] = {
+#include "llvm/Support/ELFRelocs/AArch64.def"
+};
+if (Arch == Triple::aarch64)
+OS << AArch64RelocNames[Type];
+else
+OS << X86RelocNames[Type];
+OS << ", 0x" << Twine::utohexstr(Offset);
+if (Symbol) {
+OS << ", " << Symbol->getName();
+}
+if (int64_t(Addend) < 0)
+OS << ", -0x" << Twine::utohexstr(-int64_t(Addend));
+else
+OS << ", 0x" << Twine::utohexstr(Addend);
+OS << ", 0x" << Twine::utohexstr(Value);
+}
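`Relocation::print` builds its name tables with the X-macro idiom: `ELF_RELOC` is defined to stringize its first argument, and the `.def` file is then included inside an array initializer. A self-contained sketch with a three-entry stand-in list shows both that array form and a switch form; `DEMO_AARCH64_RELOCS`, `NAME_ENTRY`, `NAME_CASE` and `relocName` are hypothetical. The array form maps a relocation number to the right name only when the listed numbers are dense and start at zero. AArch64 relocation numbers are sparse (`R_AARCH64_ABS64` is 257), and the switch form handles that case.

```cpp
#include <cstdint>
#include <string_view>

// Stand-in for an LLVM ELFRelocs .def file: (name, value) pairs written as an
// X-macro that takes the per-entry macro as its parameter.
#define DEMO_AARCH64_RELOCS(X) \
  X(R_AARCH64_NONE, 0)         \
  X(R_AARCH64_ABS64, 0x101)    \
  X(R_AARCH64_ABS32, 0x102)

// Array form: names are stored in list order, so indexing by relocation
// number is only correct if the numbers are 0, 1, 2, ...
#define NAME_ENTRY(name, value) #name,
constexpr const char *RelocNamesArray[] = {DEMO_AARCH64_RELOCS(NAME_ENTRY)};
#undef NAME_ENTRY

// Switch form: each value maps to its own name, so sparse numbering works.
constexpr std::string_view relocName(uint64_t Type) {
  switch (Type) {
#define NAME_CASE(name, value) case value: return #name;
    DEMO_AARCH64_RELOCS(NAME_CASE)
#undef NAME_CASE
  default:
    return "<unknown>";
  }
}
```

Here `RelocNamesArray[1]` and `relocName(0x101)` both yield `R_AARCH64_ABS64`, but only the switch gets there from the relocation number itself.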
37 changes: 33 additions & 4 deletions bolt/BinaryContext.h
@@ -55,6 +55,7 @@ class DataReader;

/// Relocation class.
struct Relocation {
+static Triple::ArchType Arch; /// for printing, set by BinaryContext ctor.
uint64_t Offset;
mutable MCSymbol *Symbol; /// mutable to allow modification by emitter.
uint64_t Type;
@@ -78,13 +79,21 @@ struct Relocation {
/// Emit relocation at a current \p Streamer' position. The caller is
/// responsible for setting the position correctly.
size_t emit(MCStreamer *Streamer) const;
+
+/// Print a relocation to \p OS.
+void print(raw_ostream &OS) const;
};

/// Relocation ordering by offset.
inline bool operator<(const Relocation &A, const Relocation &B) {
return A.Offset < B.Offset;
}
+
+inline raw_ostream &operator<<(raw_ostream &OS, const Relocation &Rel) {
+Rel.print(OS);
+return OS;
+}

class BinaryContext {

BinaryContext() = delete;
@@ -199,7 +208,9 @@ class BinaryContext {
MIA(std::move(MIA)),
MRI(std::move(MRI)),
DisAsm(std::move(DisAsm)),
-DR(DR) {}
+DR(DR) {
+Relocation::Arch = this->TheTriple->getArch();
+}

~BinaryContext();

@@ -215,13 +226,26 @@ class BinaryContext {
/// global symbol was registered at the location.
MCSymbol *getGlobalSymbolAtAddress(uint64_t Address) const;

+/// Find the address of the global symbol with the given \p Name.
+/// return an error if no such symbol exists.
+ErrorOr<uint64_t> getAddressForGlobalSymbol(StringRef Name) const {
+auto Itr = GlobalSymbols.find(Name);
+if (Itr != GlobalSymbols.end())
+return Itr->second;
+return std::make_error_code(std::errc::bad_address);
+}
+
/// Return MCSymbol for the given \p Name or nullptr if no
/// global symbol with that name exists.
MCSymbol *getGlobalSymbolByName(const std::string &Name) const;

/// Print the global symbol table.
void printGlobalSymbols(raw_ostream& OS) const;

+/// Get the raw bytes for a given function.
+ErrorOr<ArrayRef<uint8_t>>
+getFunctionData(const BinaryFunction &Function) const;
+
/// Return (allocatable) section containing the given \p Address.
ErrorOr<SectionRef> getSectionForAddress(uint64_t Address) const;

@@ -340,7 +364,9 @@ class BinaryContext {
const MCInst &Instruction,
uint64_t Offset = 0,
const BinaryFunction *Function = nullptr,
-bool printMCInst = false) const;
+bool PrintMCInst = false,
+bool PrintMemData = false,
+bool PrintRelocations = false) const;

/// Print a range of instructions.
template <typename Itr>
@@ -349,9 +375,12 @@ class BinaryContext {
Itr End,
uint64_t Offset = 0,
const BinaryFunction *Function = nullptr,
-bool printMCInst = false) const {
+bool PrintMCInst = false,
+bool PrintMemData = false,
+bool PrintRelocations = false) const {
while (Begin != End) {
-printInstruction(OS, *Begin, Offset, Function, printMCInst);
+printInstruction(OS, *Begin, Offset, Function, PrintMCInst,
+PrintMemData, PrintRelocations);
Offset += computeCodeSize(Begin, Begin + 1);
++Begin;
}
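`getFunctionData`, declared in this header and defined in BinaryContext.cpp, returns a function's raw bytes by slicing its section's contents at the function's offset from the section start. A standalone sketch of the same bounds logic; `Section`, `ByteRange` and `functionBytes` are hypothetical stand-ins, and `std::optional` takes the place of `ErrorOr`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified section model (BOLT uses llvm::object::SectionRef).
struct Section {
  uint64_t Address;              // load address of the section
  std::vector<uint8_t> Contents; // raw section bytes
  bool IsText;                   // whether the section holds code
};

struct ByteRange {
  const uint8_t *Data;
  uint64_t Size;
};

// Return the bytes of [FuncAddr, FuncAddr + FuncSize). As in getFunctionData,
// a non-code or empty section yields no bytes, and passing a section that does
// not contain the function is a caller bug caught by the assertion.
std::optional<ByteRange> functionBytes(const Section &Sec, uint64_t FuncAddr,
                                       uint64_t FuncSize) {
  if (!Sec.IsText || Sec.Contents.empty())
    return std::nullopt;
  assert(FuncAddr >= Sec.Address &&
         FuncAddr + FuncSize <= Sec.Address + Sec.Contents.size() &&
         "wrong section for function");
  // Function offset from the section start.
  const uint64_t Offset = FuncAddr - Sec.Address;
  return ByteRange{Sec.Contents.data() + Offset, FuncSize};
}
```

The returned range points into `Sec.Contents`, so, like the `ArrayRef` in the diff, it is only valid while the section data is alive.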
80 changes: 78 additions & 2 deletions bolt/BinaryFunction.cpp
@@ -30,6 +30,7 @@
#include "llvm/Support/Debug.h"
#include "llvm/Support/GraphWriter.h"
#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/Regex.h"
#include <limits>
#include <queue>
#include <string>
@@ -137,8 +138,16 @@ PrintOnly("print-only",
cl::Hidden,
cl::cat(BoltCategory));

+static cl::list<std::string>
+PrintOnlyRegex("print-only-regex",
+cl::CommaSeparated,
+cl::desc("list of function regexes to print"),
+cl::value_desc("func1,func2,func3,..."),
+cl::Hidden,
+cl::cat(BoltCategory));
+
bool shouldPrint(const BinaryFunction &Function) {
-if (PrintOnly.empty())
+if (PrintOnly.empty() && PrintOnlyRegex.empty())
return true;

for (auto &Name : opts::PrintOnly) {
@@ -147,6 +156,12 @@ bool shouldPrint(const BinaryFunction &Function) {
}
}

+for (auto &Name : opts::PrintOnlyRegex) {
+if (Function.hasNameRegex(Name)) {
+return true;
+}
+}
+
return false;
}

@@ -160,6 +175,11 @@ constexpr unsigned BinaryFunction::MinAlign;

namespace {

+template <typename R>
+bool emptyRange(const R &Range) {
+return Range.begin() == Range.end();
+}
+
/// Gets debug line information for the instruction located at the given
/// address in the original binary. The SMLoc's pointer is used
/// to point to this information, which is represented by a
@@ -227,6 +247,14 @@ bool DynoStats::lessThan(const DynoStats &Other,

uint64_t BinaryFunction::Count = 0;

+bool BinaryFunction::hasNameRegex(const std::string &NameRegex) const {
+Regex MatchName(NameRegex);
+for (auto &Name : Names)
+if (MatchName.match(Name))
+return true;
+return false;
+}
+
BinaryBasicBlock *
BinaryFunction::getBasicBlockContainingOffset(uint64_t Offset) {
if (Offset > Size)
@@ -558,6 +586,31 @@ void BinaryFunction::print(raw_ostream &OS, std::string Annotation,
OS << "End of Function \"" << *this << "\"\n\n";
}

+void BinaryFunction::printRelocations(raw_ostream &OS,
+uint64_t Offset,
+uint64_t Size) const {
+const char* Sep = " # Relocs: ";
+
+auto RI = Relocations.lower_bound(Offset);
+while (RI != Relocations.end() && RI->first < Offset + Size) {
+OS << Sep << "(R: " << RI->second << ")";
+Sep = ", ";
+++RI;
+}
+
+RI = MoveRelocations.lower_bound(Offset);
+while (RI != MoveRelocations.end() && RI->first < Offset + Size) {
+OS << Sep << "(M: " << RI->second << ")";
+Sep = ", ";
+++RI;
+}
+
+auto PI = PCRelativeRelocationOffsets.lower_bound(Offset);
+if (PI != PCRelativeRelocationOffsets.end() && *PI < Offset + Size) {
+OS << Sep << "(pcrel)";
+}
+}
+
IndirectBranchType BinaryFunction::processIndirectBranch(MCInst &Instruction,
unsigned Size,
uint64_t Offset) {
@@ -566,7 +619,7 @@ IndirectBranchType BinaryFunction::processIndirectBranch(MCInst &Instruction,
// An instruction referencing memory used by jump instruction (directly or
// via register). This location could be an array of function pointers
// in case of indirect tail call, or a jump table.
-const MCInst *MemLocInstr;
+MCInst *MemLocInstr;

// Address of the table referenced by MemLocInstr. Could be either an
// array of function pointers, or a jump table.
@@ -834,6 +887,8 @@ void BinaryFunction::disassemble(ArrayRef<uint8_t> FunctionData) {

DWARFUnitLineTable ULT = getDWARFUnitLineTable();

+matchProfileMemData();
+
// Insert a label at the beginning of the function. This will be our first
// basic block.
Labels[0] = Ctx->createTempSymbol("BB0", false);
@@ -1181,6 +1236,10 @@ void BinaryFunction::disassemble(ArrayRef<uint8_t> FunctionData) {
findDebugLineInformationForInstructionAt(AbsoluteInstrAddr, ULT));
}

+if (MemData && !emptyRange(MemData->getMemInfoRange(Offset))) {
+MIA->addAnnotation(Ctx.get(), Instruction, "MemDataOffset", Offset);
+}
+
addInstruction(Offset, std::move(Instruction));
}

@@ -1892,6 +1951,23 @@ bool BinaryFunction::fetchProfileForOtherEntryPoints() {
return Updated;
}

+void BinaryFunction::matchProfileMemData() {
+const auto AllMemData = BC.DR.getFuncMemDataRegex(getNames());
+for (auto *NewMemData : AllMemData) {
+// Prevent functions from sharing the same profile.
+if (NewMemData->Used)
+continue;
+
+if (MemData)
+MemData->Used = false;
+
+// Update function profile data with the new set.
+MemData = NewMemData;
+MemData->Used = true;
+break;
+}
+}
+
void BinaryFunction::matchProfileData() {
// This functionality is available for LBR-mode only
// TODO: Implement evaluateProfileData() for samples, checking whether
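`printRelocations` collects the relocations that belong to one instruction with an ordered-container range query: `lower_bound(Offset)` jumps to the first entry at or after the instruction, and the loop stops at the first entry past its end. A standalone sketch, assuming string payloads in place of `Relocation` objects and leaving out the `MoveRelocations` pass; `describeRelocs` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>

// List the entries whose offsets fall inside one instruction,
// [Offset, Offset + Size), in the same " # Relocs: ..." style as
// printRelocations.
std::string describeRelocs(const std::map<uint64_t, std::string> &Relocs,
                           const std::set<uint64_t> &PCRelOffsets,
                           uint64_t Offset, uint64_t Size) {
  assert(Size > 0 && "an instruction occupies at least one byte");
  std::ostringstream OS;
  const char *Sep = " # Relocs: ";
  // First key >= Offset; stop at the first key past the instruction.
  for (auto RI = Relocs.lower_bound(Offset);
       RI != Relocs.end() && RI->first < Offset + Size; ++RI) {
    OS << Sep << "(R: " << RI->second << ")";
    Sep = ", ";
  }
  // A single marker is enough for PC-relative offsets inside the range.
  const auto PI = PCRelOffsets.lower_bound(Offset);
  if (PI != PCRelOffsets.end() && *PI < Offset + Size)
    OS << Sep << "(pcrel)";
  return OS.str();
}
```

Each lookup costs O(log n) plus the number of entries printed, so annotating every instruction stays cheap even for functions with many relocations.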
