Get ykllvm to provide enough info to identify a zero-length call.
PT has this clause where returns are not compressed if the call is both
direct and to the address immediately after the call.

Section 33.4.2.2:

> For near CALLs, push the Next IP onto the stack... Note that this
> excludes zero-length CALLs, which are direct near CALLs with
> displacement zero (to the next IP). These CALLs typically don’t have
> matching RETs.

For example, this kind of thing is never compressed:

```
0x1234: call 0x1239
0x1239: pop rax
```

On x86 the instruction pointer can't be read like an ordinary register,
so people sometimes use this trick to get its value (x86_64 code can also
use `lea rax, [rip]`).

This change makes the compiler emit the return address of each call,
which is the extra information the runtime needs to decide whether a
call was "zero-length".
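
As a rough sketch of the check the runtime could make with this information (the `CallInfo` struct and `is_zero_length_call` are hypothetical names for illustration, not taken from ykllvm or the yk runtime, and an indirect call is assumed to be encoded as a target of 0):

```cpp
#include <cstdint>

// Hypothetical per-call record; field names are illustrative only.
struct CallInfo {
  std::uint64_t call_addr;   // address of the call instruction
  std::uint64_t return_addr; // address of the instruction after the call
  std::uint64_t target_addr; // target of a direct call, or 0 if indirect
};

// A call is "zero-length" if it is direct and its target is its own
// return address, i.e. its displacement is zero.
constexpr bool is_zero_length_call(const CallInfo &c) {
  return c.target_addr != 0 && c.target_addr == c.return_addr;
}
```

A direct call at 0x4000 whose return address and target are both 0x4005 would be classed as zero-length; an indirect call never is, because its target is not statically known.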

It's not clear to me if this has ever bitten us, but it could be one of
the causes of the rare PT decoding crashes that occasionally crop up.
vext01 committed Nov 2, 2023
1 parent e69ab01 commit c27223f
Showing 2 changed files with 34 additions and 6 deletions.
9 changes: 8 additions & 1 deletion llvm/include/llvm/CodeGen/AsmPrinter.h
@@ -249,8 +249,15 @@ class AsmPrinter : public MachineFunctionPass {
   MCSection *YkLastBBAddrMapSection = nullptr;

   /// Symbols marking the call instructions of each block. Used for the Yk JIT.
+  ///
+  /// Values are a 3-tuple:
+  /// - A symbol marking the call instruction.
+  /// - A symbol marking the return address of the call (if it were to return
+  ///   by conventional means)
+  /// - If it's a direct call, a symbol marking the target of the call, or
+  ///   `nullptr` if the call is indirect.
   std::map<const MachineBasicBlock *,
-           SmallVector<std::tuple<MCSymbol *, MCSymbol *>>>
+           SmallVector<std::tuple<MCSymbol *, MCSymbol *, MCSymbol *>>>
       YkCallMarkerSyms;

 protected:
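
To make the shape of the new 3-tuple concrete, here is a minimal standalone sketch. `Sym` and `Block` are hypothetical stand-ins for `MCSymbol` and `MachineBasicBlock`, `std::vector` stands in for `SmallVector`, and `count_direct_calls` is an illustrative helper that does not exist in LLVM:

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for LLVM's MCSymbol and MachineBasicBlock.
struct Sym { const char *name; };
struct Block { int id; };

// (call instruction, return address, direct-call target or nullptr)
using CallMarker = std::tuple<Sym *, Sym *, Sym *>;
using CallMarkerMap = std::map<const Block *, std::vector<CallMarker>>;

// Count the direct calls recorded for a block, i.e. the entries whose
// third element (the target symbol) is non-null.
inline int count_direct_calls(const CallMarkerMap &m, const Block *bb) {
  assert(bb != nullptr);
  auto it = m.find(bb);
  if (it == m.end())
    return 0;
  int n = 0;
  for (const CallMarker &t : it->second)
    if (std::get<2>(t) != nullptr)
      ++n;
  return n;
}
```

The element order mirrors the doc comment above: index 0 is the call, index 1 the return address, index 2 the optional target.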
31 changes: 26 additions & 5 deletions llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1528,8 +1528,10 @@ void AsmPrinter::emitBBAddrMapSection(const MachineFunction &MF) {
   for (auto Tup : YkCallMarkerSyms[&MBB]) {
     // Emit address of the call instruction.
     OutStreamer->emitSymbolValue(std::get<0>(Tup), getPointerSize());
+    // Emit the return address of the call.
+    OutStreamer->emitSymbolValue(std::get<1>(Tup), getPointerSize());
     // Emit address of target if known, or 0.
-    MCSymbol *Target = std::get<1>(Tup);
+    MCSymbol *Target = std::get<2>(Tup);
     if (Target)
       OutStreamer->emitSymbolValue(Target, getPointerSize());
     else
@@ -1988,15 +1990,33 @@ void AsmPrinter::emitFunctionBody() {
       (MI.getOpcode() != TargetOpcode::STACKMAP) &&
       (MI.getOpcode() != TargetOpcode::PATCHPOINT) &&
       (MI.getOpcode() != TargetOpcode::STATEPOINT)) {
+    // Record the address of the call instruction itself.
     MCSymbol *YkPreCallSym =
         MF->getContext().createTempSymbol("yk_precall", true);
     OutStreamer->emitLabel(YkPreCallSym);
+
+    // Codegen it as usual.
+    emitInstruction(&MI);
+
+    // Record the address of the instruction following the call. In other
+    // words, this is the return address of the call.
+    MCSymbol *YkPostCallSym =
+        MF->getContext().createTempSymbol("yk_postcall", true);
+    OutStreamer->emitLabel(YkPostCallSym);
+
+    // Figure out if this is a direct or indirect call.
+    //
+    // If it's direct, then we know the call's target from the first
+    // operand alone.
     const MachineOperand CallOpnd = MI.getOperand(0);
     MCSymbol *CallTargetSym = nullptr;
     if (CallOpnd.isGlobal()) {
-      // Statically known function address.
+      // Direct call.
       CallTargetSym = getSymbol(CallOpnd.getGlobal());
-    }
+    } else if (CallOpnd.isMCSymbol()) {
+      // Also a direct call.
+      CallTargetSym = CallOpnd.getMCSymbol();
+    } // Otherwise it's an indirect call.

     // Ensure we are only working with near calls. This matters because
     // Intel PT optimises near calls, and it simplifies our implementation
@@ -2005,10 +2025,11 @@ void AsmPrinter::emitFunctionBody() {
     assert(!MF->getSubtarget().getInstrInfo()->isFarCall(MI));

     assert(YkCallMarkerSyms.find(&MBB) != YkCallMarkerSyms.end());
-    YkCallMarkerSyms[&MBB].push_back({YkPreCallSym, CallTargetSym});
+    YkCallMarkerSyms[&MBB].push_back({YkPreCallSym, YkPostCallSym, CallTargetSym});
+  } else {
+    emitInstruction(&MI);
   }

-  emitInstruction(&MI);
   // Generate labels for function calls so we can record the correct
   // instruction offset. The conditions for generating the label must be
   // the same as the ones for generating the stackmap call in
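
With this change, `emitBBAddrMapSection` writes three pointer-sized values per call: the call address, the return address, and the target (or 0). A consumer might read them back roughly as follows. This is only a sketch under stated assumptions: `CallRecord` and `parse_call_records` are hypothetical names, pointers are assumed to be 8 bytes in host byte order, and the buffer is assumed to hold nothing but call records, whereas the real section carries other fields around them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical decoded form of one per-call record.
struct CallRecord {
  std::uint64_t call_addr;   // address of the call instruction
  std::uint64_t return_addr; // address of the instruction after the call
  std::uint64_t target_addr; // direct-call target, or 0 if indirect
};

// Decode back-to-back records of three 8-byte values each.
// Assumes `bytes` contains only call records (an illustration-only
// simplification of the real section layout).
std::vector<CallRecord>
parse_call_records(const std::vector<std::uint8_t> &bytes) {
  constexpr std::size_t kRecordSize = 3 * sizeof(std::uint64_t);
  assert(bytes.size() % kRecordSize == 0);
  std::vector<CallRecord> out;
  for (std::size_t off = 0; off < bytes.size(); off += kRecordSize) {
    std::uint64_t fields[3];
    // memcpy avoids unaligned reads from the byte buffer.
    std::memcpy(fields, bytes.data() + off, sizeof(fields));
    out.push_back(CallRecord{fields[0], fields[1], fields[2]});
  }
  return out;
}
```

A record whose `target_addr` equals its `return_addr` would then correspond to a zero-length call in the sense of the commit message.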