[EraVM] Adding support of iterative libraries linkage.
This also includes the C-API for requesting the following info:
- If the given memory buffer contains an ELF file
- List of undefined linker symbols of the given ELF file
PavelKopyl authored and akiramenai committed Sep 23, 2024
1 parent 3e544ce commit a40666b
Showing 26 changed files with 1,281 additions and 58 deletions.
3 changes: 3 additions & 0 deletions lld/ELF/Arch/EraVM.cpp
@@ -59,6 +59,9 @@ void EraVM::relocate(uint8_t *loc, const Relocation &rel, uint64_t val) const {
  case R_ERAVM_16_SCALE_8:
    add16scaled(val, /*scale=*/8);
    break;
  case R_ERAVM_32:
    write32be(loc, static_cast<uint32_t>(val));
    break;
  default:
    error(getErrorLocation(loc) + "unrecognized relocation " +
          toString(rel.type));
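
The new `R_ERAVM_32` case truncates the relocation value to 32 bits and stores it most significant byte first. As a minimal standalone sketch of that behavior (the helper name `store32be` is hypothetical and is not LLD's `write32be`):

```c
#include <stdint.h>

/* Stores the low 32 bits of val at loc, most significant byte first.
   Hypothetical helper that mirrors the truncate-then-store step of the
   R_ERAVM_32 case; it is not part of LLD. */
static void store32be(uint8_t *loc, uint64_t val) {
  uint32_t v = (uint32_t)val; /* same truncation as the static_cast */
  loc[0] = (uint8_t)(v >> 24);
  loc[1] = (uint8_t)(v >> 16);
  loc[2] = (uint8_t)(v >> 8);
  loc[3] = (uint8_t)v;
}
```

For instance, storing `0x11223344AABBCCDD` writes the bytes `AA BB CC DD`; the upper 32 bits are discarded.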
112 changes: 106 additions & 6 deletions lld/include/lld-c/LLDAsLibraryC.h
@@ -6,8 +6,58 @@
//
//===----------------------------------------------------------------------===//
//
// This header declares the C interface to the lld-as-a-library, which can be
// used to invoke LLD functionality.
// This header declares the following EraVM C interfaces:
//  - 'LLVMLinkEraVM'
//    The interface to the LLD linker functionality (via lld-as-a-library)
//
//  - 'LLVMIsELFEraVM'
//    Checks if the given memory buffer contains an ELF EraVM object file
//
//  - 'LLVMGetUndefinedLinkerSymbolsEraVM'
//    Returns an array of undefined linker symbols
//
//  - 'LLVMDisposeUndefinedLinkerSymbolsEraVM'
//    Disposes an array returned by 'LLVMGetUndefinedLinkerSymbolsEraVM'
//
// These functions use the notion of a 'Linker Symbol', which is a
// generalization of a usual ELF global symbol. The main difference is that a
// 'Linker Symbol' has a 20-byte value, whereas the maximum value width of a
// usual ELF symbol is just 8 bytes. In order to represent a 20-byte symbol
// value with its relocation, the initial 'Linker Symbol' is split into five
// sub-symbols, which are usual 32-bit ELF symbols. This split is performed by
// the LLVM MC layer. For example, if the codegen needs to represent a 20-byte
// relocatable value associated with the symbol 'symbol_id', the MC layer
// emits the following undefined symbols sequentially (in the binary layout):
//
// '__linker_symbol_id_0'
// '__linker_symbol_id_1'
// '__linker_symbol_id_2'
// '__linker_symbol_id_3'
// '__linker_symbol_id_4'
//
// with associated 32-bit relocations. Each sub-symbol name is formed by
// prepending '__linker' and appending '_[0-4]'. The MC layer also sets the
// ELF::STO_ERAVM_LINKER_SYMBOL flag in the 'st_other' field of the symbol
// table entry to distinguish such symbols from all others.
// In EraVM, only these symbols are allowed to be undefined in object
// code. All other cases must be treated as unreachable and denote a bug
// in the FE/LLVM codegen/Linker implementation.
// 'Linker Symbols' are resolved, i.e. they receive their final 20-byte
// values, at the linkage stage when calling 'LLVMLinkEraVM'.
// For this, 'LLVMLinkEraVM' has the parameters:
//  - \p linkerSymbolNames, array of null-terminated linker symbol names
//  - \p linkerSymbolValues, array of symbol values
//
// For example, if linkerSymbolNames[0] points to the string 'symbol_id',
// the linker takes the linkerSymbolValues[0] value, which is a 20-byte array
// (0xAAAAAAAABB.....EEEEEEEE), and creates five symbol definitions in
// a linker script:
//
// "__linker_symbol_id_0" = 0xAAAAAAAA
// "__linker_symbol_id_1" = 0xBBBBBBBB
// "__linker_symbol_id_2" = 0xCCCCCCCC
// "__linker_symbol_id_3" = 0xDDDDDDDD
// "__linker_symbol_id_4" = 0xEEEEEEEE
//
//===----------------------------------------------------------------------===//
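
The split described in the header comment can be sketched in plain C. This is an illustration only, not the MC layer or linker code: the helper names are hypothetical, the big-endian word order is inferred from the `0xAAAAAAAA` ... `0xEEEEEEEE` example, and the `__linker_` prefix (with a trailing underscore) is inferred from the example sub-symbol names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define LINKER_SYMBOL_SIZE 20

/* Reads the idx-th 32-bit word (idx in 0..4) of a 20-byte linker symbol
   value. Assumption: sub-symbol 0 covers the first four bytes and each
   word is big-endian, as the header's example suggests. */
static uint32_t sub_symbol_value(const unsigned char value[LINKER_SYMBOL_SIZE],
                                 unsigned idx) {
  const unsigned char *p = value + 4 * idx;
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Writes "__linker_<name>_<idx>" into out and returns snprintf's result.
   Hypothetical helper; the naming scheme follows the example names. */
static int sub_symbol_name(char *out, size_t cap, const char *name,
                           unsigned idx) {
  return snprintf(out, cap, "__linker_%s_%u", name, idx);
}
```

With a value whose bytes are `AA` x4, `BB` x4, ..., `EE` x4, `sub_symbol_value(value, 0)` yields `0xAAAAAAAA` and `sub_symbol_name(buf, sizeof buf, "symbol_id", 3)` produces `__linker_symbol_id_3`.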

@@ -19,10 +69,60 @@

LLVM_C_EXTERN_C_BEGIN

/** Performs linkage operation of an object code via lld-as-a-library.
Input/output files are transferred via memory buffers. */
LLVMBool LLVMLinkEraVM(LLVMMemoryBufferRef inBuf, LLVMMemoryBufferRef *outBuf,
char **ErrorMessage);
// Currently, the size of a linker symbol is limited to 20 bytes, as its only
// usage is to represent Ethereum addresses, which are 160 bits wide.
#define LINKER_SYMBOL_SIZE 20

/** Performs linkage of the ELF object code passed in \p inBuffer. The result
 *  is returned in \p outBuffer.
 *  The EraVM platform has no notion of separate compilation units, so the
 *  whole program is represented by a single ELF object file. In this case,
 *  the linker has two tasks. The first is to emit the definitions, passed in
 *  \p linkerSymbolValues, of the linker symbols, passed in
 *  \p linkerSymbolNames, as shown above. The second is to perform symbol
 *  relocations.
 *  This function supports iterative linkage, i.e. it returns relocatable
 *  ELF object files until all library symbols are defined. Once all of them
 *  are defined, it returns the final bytecode with the ELF format stripped.
 *  For example, if the initial input object file has two undefined linker
 *  symbols, 'symbol_id' and 'symbol_id2', and only the 'symbol_id'
 *  definition is provided at the first call, the function returns an ELF
 *  object file where the symbol 'symbol_id' is defined, whereas
 *  'symbol_id2' is not. If the definition of 'symbol_id2' is provided
 *  at the second call, then the function returns the final bytecode.
 *  In case of an error the function returns 'true' and the error message
 *  is passed in \p errorMessage. The message should be disposed of with
 *  'LLVMDisposeMessage'. */
LLVMBool LLVMLinkEraVM(LLVMMemoryBufferRef inBuffer,
                       LLVMMemoryBufferRef *outBuffer,
                       const char *const *linkerSymbolNames,
                       const char linkerSymbolValues[][LINKER_SYMBOL_SIZE],
                       uint64_t numLinkerSymbols, char **errorMessage);

/** Returns true if the \p inBuffer contains an ELF object file. */
LLVMBool LLVMIsELFEraVM(LLVMMemoryBufferRef inBuffer);

/** Returns an array of undefined linker symbol names (null-terminated strings)
 *  of the ELF object file passed in \p inBuffer. The \p numLinkerSymbols
 *  contains the number of returned names.
 *  For example, if an ELF file has an undefined symbol which is represented
 *  via five sub-symbols:
 *    '__linker_symbol_id_0'
 *    '__linker_symbol_id_1'
 *    '__linker_symbol_id_2'
 *    '__linker_symbol_id_3'
 *    '__linker_symbol_id_4'
 *
 *  then 'symbol_id' will be returned (stripping the prefix and suffix) as
 *  the result.
 *  The caller should dispose of the memory allocated for the returned array
 *  using 'LLVMDisposeUndefinedLinkerSymbolsEraVM'. */
char **LLVMGetUndefinedLinkerSymbolsEraVM(LLVMMemoryBufferRef inBuffer,
                                          uint64_t *numLinkerSymbols);
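
The prefix and suffix stripping mentioned in this comment might look roughly like the sketch below. It is not the implementation from this commit: the function name is hypothetical, the `__linker_` prefix is inferred from the example names, and the real code presumably also consults the `ELF::STO_ERAVM_LINKER_SYMBOL` flag rather than relying on the name alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* If sub has the shape "__linker_<name>_<d>" with d in 0..4, copies <name>
   into out (NUL-terminated) and returns true. Returns false when the shape
   does not match or when out is too small. Hypothetical helper. */
static bool linker_symbol_name(const char *sub, char *out, size_t cap) {
  static const char prefix[] = "__linker_";
  const size_t plen = sizeof(prefix) - 1;
  const size_t len = strlen(sub);
  /* Need the prefix, at least one name character, and the "_<d>" suffix. */
  if (len < plen + 3 || strncmp(sub, prefix, plen) != 0)
    return false;
  if (sub[len - 2] != '_' || sub[len - 1] < '0' || sub[len - 1] > '4')
    return false;
  const size_t nlen = len - plen - 2;
  if (nlen + 1 > cap)
    return false;
  memcpy(out, sub + plen, nlen);
  out[nlen] = '\0';
  return true;
}
```

Given `__linker_symbol_id_0` the helper yields `symbol_id`; a name such as `main` or `__linker_symbol_id_7` is rejected.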

/** Disposes an array with linker symbols returned by
 *  LLVMGetUndefinedLinkerSymbolsEraVM(). */
void LLVMDisposeUndefinedLinkerSymbolsEraVM(char *linkerSymbolNames[],
                                            uint64_t numLinkerSymbols);
LLVM_C_EXTERN_C_END

#endif // LLD_C_LLDASLIBRARYC_H
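
The iterative contract documented for `LLVMLinkEraVM` can be illustrated with a toy model that involves no ELF handling at all. Everything in it (`ToyObject`, `toy_link`, `MAX_SYMBOLS`) is hypothetical: the "object" is just a list of unresolved names, and each call resolves the names it is given and reports whether the result is still relocatable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 8

/* Hypothetical stand-in for an object file: only its unresolved names. */
typedef struct {
  const char *undefined[MAX_SYMBOLS];
  size_t num_undefined;
} ToyObject;

/* Resolves every undefined name that appears in names[0..n). Returns true
   while the object is still "relocatable" (some names remain undefined)
   and false once everything is resolved, i.e. the "final bytecode" case. */
static bool toy_link(ToyObject *obj, const char *const *names, size_t n) {
  size_t kept = 0;
  for (size_t i = 0; i < obj->num_undefined; ++i) {
    bool defined = false;
    for (size_t j = 0; j < n && !defined; ++j)
      defined = strcmp(obj->undefined[i], names[j]) == 0;
    if (!defined)
      obj->undefined[kept++] = obj->undefined[i]; /* still unresolved */
  }
  obj->num_undefined = kept;
  return kept != 0;
}
```

Starting from `symbol_id` and `symbol_id2`, a first call that supplies only `symbol_id` leaves one undefined name and returns true; a second call that supplies `symbol_id2` returns false. With the real API, a caller would presumably make the same decision by checking the output buffer with `LLVMIsELFEraVM` and querying `LLVMGetUndefinedLinkerSymbolsEraVM` between calls.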