From 27da600071fa9b261a2eed41673277ad69389557 Mon Sep 17 00:00:00 2001
From: Scott Young
Date: Mon, 3 Feb 2020 18:36:53 -0400
Subject: [PATCH 01/25] This is a squashed commit of the development commit log from the MicroJIT project.

This project aims to introduce a lightweight templated JIT compiler to the OpenJ9 project. While this feature is not yet production ready, it is usable in short applications for compiling some methods.

Co-authored-by: Eric Coffin
Co-authored-by: Julie Brown
Co-authored-by: Shubham Verma
Co-authored-by: Harpreet Kaur
Co-authored-by: Aaron Graham
Signed-off-by: Scott Young

Added assembly macro to aid in future template generation for a new lightweight templated JIT (MicroJIT) //Signed-off-by: Scott Young
Added MicroJIT exception table and included MetaData creation functions for later use by a MicroJIT compiler. //Signed-off-by: Scott Young
Added an assert for MicroJIT to identify errors as separate from TR. //Signed-off-by: Scott Young
Added first templates to be used by a future MicroJIT code generator for x86_64. //Signed-off-by: Scott Young
Added a MicroJIT code generator. This code generator is not yet used to create runnable compiled methods. //Signed-off-by: Scott Young
Added links and options to allow MicroJIT code to be generated and executed. //Signed-off-by: Scott Young
Cleaned up or corrected rebased code where functionality tested on an earlier pull of OpenJ9 no longer integrated correctly. WIP. //Signed-off-by: Scott Young
Fixed issue setting initial MicroJIT threshold count for methods. //Signed-off-by: Scott Young
Added MicroJIT exception table and included MetaData creation functions for later use by a MicroJIT compiler. //Signed-off-by: Scott Young
Added links and options to allow MicroJIT code to be generated and executed. //Signed-off-by: Scott Young
Cleaned up or corrected rebased code where functionality tested on an earlier pull of OpenJ9 no longer integrated correctly. WIP. //Signed-off-by: Scott Young
Fixed issue setting initial MicroJIT threshold count for methods. //Signed-off-by: Scott Young
Fixed the MicroJIT threshold count. The counter in countAndCompile counts down from the TR threshold to zero; we need to set the MJIT threshold to the difference between the TR threshold and the provided MJIT count. For example, if the TR threshold is 1000 and the MJIT threshold is 100, this would set the extra2 counter to 900. After 100 iterations MJIT would compile the method. //Signed-off-by: Eric Coffin
Log the method signature after the method has been compiled. The log message can be used by a test script to ensure the target method has been compiled by the MicroJIT compiler. //Signed-off-by: Eric Coffin
Add isub bytecode support for MJIT //Signed-off-by: Eric Coffin
Add imul bytecode //Signed-off-by: Eric Coffin
Add idiv bytecode. TODO: Add support for ArithmeticException (divide by zero check) //Signed-off-by: Eric Coffin
Fix idiv bytecode. Given we're dealing with a DWORD rather than a QWORD, use eax rather than rax. Ensures the sign bit is extended into edx (via cdq). //Signed-off-by: Eric Coffin
Added new macros for MicroJIT assembly; moved peak stack allocation from CompilationInfoPerThreadBase::mjit to AMD64CodeGenerator; replaced specialized loading templates with a generic loading template. //Signed-off-by: Scott Young
Fixed an error related to tool-assisted merge conflict resolution and a bug related to the initial storing of arguments in the local array. //Signed-off-by: Scott Young
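A minimal sketch of the count arithmetic described in the "Fixed the MicroJIT threshold count" entry above; the variable names are illustrative only and are not the actual field names used in the code:

    // Illustrative C++ sketch of the MJIT trigger calculation.
    // The shared invocation counter counts down from the TR threshold toward zero,
    // so the MicroJIT trigger value is the TR threshold minus the requested MJIT count.
    int32_t trThreshold = 1000;                     // TR count-down start
    int32_t mjitCount   = 100;                      // invocations before MicroJIT should compile
    int32_t mjitTrigger = trThreshold - mjitCount;  // 900: after 100 invocations the counter
                                                    // reaches this value and MJIT compiles the method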
Added support for storing in the local array. Cleaned up Bytecode switch statement in generate body for loading and storing bytecodes using helper macros //Signed-off-by: Scott Young
Remove unused iload templates //Signed-off-by: Eric Coffin
Fixed error in calculating size of stack for MicroJIT //Signed-off-by: Scott Young
Changed the stack based parameter map to a table based map. //Signed-off-by: Scott Young
Initial commit of CodegenGC files used for creating GCMaps and Stack Atlas for MJIT //Signed-off-by: Scott Young
Support recompilation after MJIT failure //Signed-off-by: Eric Coffin
Add support for ladd and lreturn //Signed-off-by: Harpreet Kaur
Clean up formatting //Signed-off-by: Eric Coffin
Add JIT Hooks to mjit //Signed-off-by: Eric Coffin
Added Parameter map creation to MJIT AMD64CodegenGC, added aload and getfield bytecodes to templates and CodeGen, added support for storing params on the stack into the local array during prologue. //Signed-off-by: Scott Young
Release code cache after MJIT failure //Signed-off-by: Eric Coffin
Support long method arguments. TR appears to require 1 slot (8 bytes) when the last argument in a method is a long. //Signed-off-by: Eric Coffin
Implemented the getstatic and putstatic bytecode templates. Updated the return templates to support void return type functions and moved the return generation to a separate function similar to the load and store bytecodes. //Signed-off-by: Scott Young
Implemented the lsub and lmul bytecodes //Signed-off-by: Harpreet Kaur
Added or updated copyright on files created for MJIT. Completed new MJIT specific stack atlas and GC Maps, and supporting side tables. //Signed-off-by: Scott Young
Fixed errors created during rebase. //Signed-off-by: Scott Young
Added support for compiling OpenJ9 with or without MicroJIT //Signed-off-by: Scott Young
Updated outdated copyright information in files modified as part of initial PR to public Eclipse OpenJ9 repository //Signed-off-by: Scott Young
Misc changes to MicroJIT files, detailed below: Guarded against several inclusions of MicroJIT files in build processes using define guards. Guarded against changes for MicroJIT in files outside of compiler/microjit using define guards. Corrected file endings to include a newline at EOF for all modified or added files for MicroJIT. Removed typedef for J9MicroJITConfig and changed old usages to directly use struct J9JITConfig. Replaced magic numbers referencing pointer size to use an inlined call instead. Fixed copyright begin dates for newly created files for MicroJIT. Replaced accidentally removed lines for debug symbol stripping. //Signed-off-by: Scott Young
Misc changes to MicroJIT files, detailed below: Fixed problems with build files incorrectly updating includes. Updated spacing between `if` and opening paren in CompilationThread. Added descriptions to public methods of MJIT::CodeGenerator. Removed duplicate comment in BytecodeInterpreter. //Signed-off-by: Scott Young
Replaced erroneous tab characters with appropriate space counts. //Signed-off-by: Scott Young
Replaced erroneous tab characters with appropriate space counts in MicroJIT linkage. //Signed-off-by: Scott Young
Fixed incorrect usage of spaces over tabs when readding line to strip debug symbols. //Signed-off-by: Scott Young
Misc. changes for MicroJIT, detailed below: Added J9VM_OPT_MICROJIT comments to several `endif` statements. Fixed incorrectly ordered options in J9Options caused by a previous fix to an accidentally deleted option. //Signed-off-by: Scott Young
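The define-guard convention referenced in the "Misc changes to MicroJIT files" entries above is the same pattern that appears throughout the diff below, for example around the MicroJIT-only includes:

    #if defined(J9VM_OPT_MICROJIT)
    #include "microjit/SideTables.hpp"   /* MicroJIT-only code is built only when the flag is defined */
    #endif /* J9VM_OPT_MICROJIT */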
Fixed incorrect indexing when making local parameter table in MicroJIT prologue. //Signed-off-by: Scott Young
Added newline to end of MetaData.cpp. Changed types in setInitialMJITCountUnsynchronized to prevent compiler warnings about increasing size. Fixed inconsistent placement of pointer specifier in variable declarations in CompilationThread.cpp //Signed-off-by: Scott Young
Reverted copyright change in j9generated.h, since file is no longer modified from master version. //Signed-off-by: Scott Young
Updated pointer syntax in MicroJIT directory to match usage in CompilationThread.cpp and maintain internal consistency. //Signed-off-by: Scott Young
Partial implementation of GCR for MicroJIT //Signed-off-by: Scott Young
Modified MicroJIT to use GCR as a recompilation trigger to cause TR to compile. Removed the extra2 field. MicroJIT now shares an extra field with TR. //Signed-off-by: Scott Young
Fixed bug causing MicroJIT to compile even when not given the Xjit:mjitEnabled=1 option //Signed-off-by: Scott Young
Fixed incorrect options usage when getting the mjitEnabled flag for choosing whether to compile with MicroJIT or TR. Fixed a bug causing the MicroJIT default count to override the TR count in the extra field when not using mjitEnabled as an xjit option. //Signed-off-by: Scott Young
Modified the count setting to correctly set the extra field when using MicroJIT. Changed code formatting of added code in HookedByTheJit for MicroJIT to match surrounding code. Removed guard from counting in MicroJITed methods to prevent MicroJITed code from running for too long. //Signed-off-by: Scott Young
Fixed problems with patching after removing guard in MicroJIT GCR code generation. //Signed-off-by: Scott Young
Added support for TR compilations after failing to compile with MicroJIT. Removed TRCount field from J9Method and changed mechanism for setting counts for MicroJIT counting blocks. //Signed-off-by: Scott Young
Added CMakeLists.txt files to microjit to allow for compilation with CMake //Signed-off-by: Scott Young
Fixed issues with CMake build configuration for MicroJIT. Updated configuration files that prevented MicroJIT from being compiled correctly with CMake. Removed dead code from BytecodeInterpreter.hpp from previous version of MicroJIT. //Signed-off-by: Scott Young
Added missing Copyright information to newly added CMakeLists.txt files //Signed-off-by: Scott Young
Update copyright and removed dead code. Changed boolean to bit flag in J9PersistentInfo. Added debugging text entry for logging in rossa.cpp //Signed-off-by: Scott Young
Stashed changes for workstation transfer //Signed-off-by: Scott Young
Fixed signature logging and stack atlas for MJIT. Added newline to print format string for signature in the mjit method of CompilationInfoPerThreadBase. The stack atlas creator now correctly skips the second slot of two-slot data types when iterating over the local array. //Signed-off-by: Scott Young
Overhauled the creation of CFGs in MicroJIT. The CFG creation in MicroJIT was needlessly terse and did not hold all the information that was needed for getting profiling data from the interpreter profiler. The CFG creation process now stores all the required information to get these values from the profiler when using the JProfilingBlock optimization. //Signed-off-by: Scott Young
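For reference, mjitEnabled and mjitCount (added to the option table in J9Options.cpp below) are -Xjit suboptions, so a rough usage sketch for enabling MicroJIT looks like the following; MyApp is a placeholder class name and 20 is the default count from this patch:

    java -Xjit:mjitEnabled=1,mjitCount=20 MyApp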
Added MicroJIT methods for profiling to CFG. The J9CFG has received new methods which will only exist and be compiled when using MicroJIT. These methods will allow for the setting of edge frequencies without any IL trees in the CFG. //Signed-off-by: Scott Young
Custom MJIT ByteCodeIterator. Exposes printByteCodes() for printing the entire method and adds helpers for getting the mnemonic and opcode. Add _printByteCodes boolean to CodeGenerator to limit BC debug statements in MJIT. Fixes opcode on compilation failure. //Signed-off-by: Eric Coffin //Signed-off-by: Scott Young
Changed InternalMetaData to MJIT::ByteCodeIterator //Signed-off-by: Scott Young
Partial implementation of `invokestatic` support //Signed-off-by: Scott Young
Added support for `invokestatic` in MicroJIT code generation. Fixed problem with integers using full 64-bit registers, instead of the lower 32 bits, for MicroJIT code. //Signed-off-by: Scott Young
Support for conditionals and goto. Adds initial support for ifne, ificmpge and goto. //Signed-off-by: Eric Coffin //Signed-off-by: Scott Young
MicroJIT - Add constants and iand. Adds support for: iconst_, lconst_, fconst_, bipush, iand, land, ior, lor, ixor, lxor //Signed-off-by: Julie Brown
Add ificmpgt to MJIT. Adds ificmpgt and fixes the missing switch statements for other branching bytecodes. //Signed-off-by: Eric Coffin
MicroJIT - Patch missing break statement. This commit patches a previous commit in which there was a missing break statement //Signed-off-by: Julie Brown
MicroJIT - Patch iconst and lconst_. This commit updates the bytecode templates for iconst_ and lconst_ to use dword for integer types and qword for long types //Signed-off-by: Julie Brown
Added support for iinc bytecode //Signed-off-by: Harpreet Kaur
Fixes for MJIT. Fixes ifne by replacing cmp with test. Allocates correct array size on the stack, which was previously causing a crash. //Signed-off-by: Eric Coffin //Signed-off-by: Scott Young
Fixed accidental setting of extra field with MicroJIT start address during post compilation phase. Clearing registers in templates before moves to prevent incorrect values from being partially kept. Fixed accidental exclusion of J9VM_OPT_MICROJIT from nasm defines. Now saving caller-preserved registers used by MicroJIT before calling. //Signed-off-by: Scott Young
MicroJIT - Add support for dup bytecodes. This commit adds bytecode support for: dup, dup_x1, dup_x2, dup2, dup2_x1, dup2_x2. These bytecodes have not yet been tested for correctness; writing test files for these will require support for invokespecial and new. //Signed-off-by: Julie Brown
Added support for ldiv bytecode //Signed-off-by: Harpreet Kaur
MicroJIT - Add support for ineg and lneg. This commit adds support for ineg and lneg bytecodes //Signed-off-by: Julie Brown
Log compilation success //Signed-off-by: Eric Coffin
Wrap logging in define guards. To improve performance wrap all logging in define guards. //Signed-off-by: Eric Coffin //Signed-off-by: Scott Young
MicroJIT - Added support for irem and lrem //Signed-off-by: Harpreet Kaur
Changed extra2 to act similarly to the extra field, fixing a crash caused when MicroJIT is compiled in but not running. //Signed-off-by: Scott Young
MicroJIT: Add support for i2l and l2i. This commit adds support for the i2l and l2i bytecodes //Signed-off-by: Julie Brown
Fixed errors causing invokestatic to fail in some cases. MicroJIT now checks if the method has been compiled with TR before running it from the interpreter. Fixed problem causing invocations of MicroJITed methods from JITed methods to enter at the interpreter's entry point. //Signed-off-by: Scott Young
Removed unnecessary comment from j2iTransition. //Signed-off-by: Scott Young
Fixed problem with bipush patching. //Signed-off-by: Scott Young
MicroJIT - Added support for shift bytecodes //Signed-off-by: Harpreet Kaur
Some edits to the long shift bytecodes //Signed-off-by: Harpreet Kaur
Fixed long parameters and shortened shift templates //Signed-off-by: Harpreet Kaur //Signed-off-by: Scott Young
Function Declaration Edit //Signed-off-by: Harpreet Kaur
MicroJIT: Add support for i2b and i2s bytecodes. This commit adds support for the i2b and i2s bytecodes. //Signed-off-by: Julie Brown //Signed-off-by: Scott Young
MicroJIT: Add support for sipush bytecode. This commit adds support for the sipush bytecode. Also makes some edits to the iconst templates to specify the stack slot as being qword instead of dword. //Signed-off-by: Julie Brown
MicroJIT: Add support for i2c bytecode. This commit adds support for the i2c bytecode //Signed-off-by: Julie Brown
MicroJIT: Add support for if and if_icmp. This commit adds support for ifeq, iflt, ifle, ifgt, ifge, if_icmpeq, if_icmpne, if_icmplt, and if_icmple //Signed-off-by: Julie Brown
MicroJIT: Add support for lcmp. This commit adds support for the lcmp bytecode //Signed-off-by: Julie Brown //Signed-off-by: Scott Young
MicroJIT - Added support for fadd, fsub and fmul //Signed-off-by: Harpreet Kaur
Added fReturnTemplate for returning floats //Signed-off-by: Harpreet Kaur
MicroJIT - Added support for fdiv and fneg bytecodes //Signed-off-by: Harpreet Kaur
MicroJIT - Added support for double bytecodes //Signed-off-by: Harpreet Kaur
MicroJIT: Add support for fcmpg and fcmpl. This commit adds support for fcmpg and fcmpl. Also makes changes to lcmp which was not behaving properly. //Signed-off-by: Julie Brown
MicroJIT: Remove comment about fconst //Signed-off-by: Julie Brown
MicroJIT - Added support for ddiv and dneg //Signed-off-by: Harpreet Kaur
MicroJIT - Fixed bug for longs //Signed-off-by: Harpreet Kaur //Signed-off-by: Scott Young
Change MJIT computation stack longs to single slot. This commit makes changes to the MicroJIT computation stack to use two slots for longs and doubles. This change is necessary for the stack bytecodes to work properly. It is also beneficial for future work with TR, as this will make the MicroJIT computation stack match up with TR. //Signed-off-by: Julie Brown //Signed-off-by: Scott Young
MJIT: Insert missing break statement. This commit patches a previous commit in which a break statement was omitted //Signed-off-by: Julie Brown
MicroJIT: Add support for all stack bytecodes //Signed-off-by: Julie Brown
MicroJIT: Add support for dcmpl and dcmpg //Signed-off-by: Julie Brown
MicroJIT: Add support for dconst //Signed-off-by: Julie Brown
MicroJIT: Add support for i2d, l2d, d2i, d2l //Signed-off-by: Julie Brown
MicroJIT: Add support for ifnull, ifnonnull, if_acmpeq, if_acmpne. Adds support for the above listed bytecodes. These are untested //Signed-off-by: Julie Brown
MicroJIT - Added support for d2f and f2d bytecodes //Signed-off-by: Harpreet Kaur
MicroJIT: Add support for i2f, l2f, f2i, f2l //Signed-off-by: Harpreet Kaur
MicroJIT - Added support for frem and drem //Signed-off-by: Harpreet Kaur //Signed-off-by: Scott Young
MicroJIT - Added support for getfield and putfield //Signed-off-by: Harpreet Kaur //Signed-off-by: Scott Young
Removed references to the old extra2. During the rebase, references to the old extra2 field reappeared. These references were removed. //Signed-off-by: Scott Young
Misc.
changes to MicroJIT code detailed below Removed references to the now removed trtlookup IL Changed signed ints that were not meant to be negative to unsigned Changed iterator in MicroJIT MetaData creation to the MJIT iterator Fixed mismatched declaration and definition signatures from rebase Switched SSE remainder helper calls in MJIT to use the helper lookup //Signed-off-by: Scott Young Changed assertion to log in MJIT Change an assertion in the logic for finding the stack offsets to instead log the info and end compilation. //Signed-off-by: Scott Young Readded unintentionally removed newline //Signed-off-by: Scott Young Fixed placement of define guard in J9CFG //Signed-off-by: Scott Young Linkage & Properties Structure Changes //Signed-off-by: Harpreet Kaur Added define guard on include for microjit //Signed-off-by: Scott Young Misc changes to profiling for MicroJIT Added checks for calling MicroJIT frequency setter on methods compiled by MJIT. Fixed incorrect usage of heap memory by supporting classes for making CFGs in MicroJIT. Fixed logical errors in the supporting classes used to generate a CFG in MicroJIT. Added optimization strategy for MicroJIT. Added early exit in JProfilingBlock for MicroJIT to allow for early testing of MicroJIT frequency setting before creation of counter placement code path for MicroJIT. //Signed-off-by: Scott Young Fixed errors with JNI in MJIT //Signed-off-by: Scott Young Temporarily removed support for addresses in MJIT //Signed-off-by: Scott Young Uncommented disabled BCs and Fixed issue with sipush //Signed-off-by: Scott Young Added GotoW support (Untested) //Signed-off-by: Shubham Verma Changes to Codegen for invokestatic to fix interpreter calling and stack slot count problems. //Signed-off-by: Scott Young MicroJIT - Fix compile errors in gotow //Signed-off-by: Harpreet Kaur Fixed MicroJIT causing crashes when mis-handling object references. Added codesize estimation calculation. //Signed-off-by: Scott Young Fixed GCMap and Invoke static issues. Slotcount was not being multiplied by slot-size when setting offsets for saving and loading arguments. Local index was being incorrectly assigned when looping over the parameter map in meta-data creation phase. //Signed-off-by: Scott Young Fixed typo in comments for several templates Fixed slot errors on jumps Some conditional branches were reducing from 2 to 1 slot when they should have been reducing from 2 to 0. This has been corrected. //Signed-off-by: Scott Young MicroJIT Linkage Refactoring //Signed-off-by: Harpreet Kaur Replacing MJIT CodeGenerator class parameter with TR Compilation class in linkage constructor //Signed-off-by: Harpreet Kaur Passing comp->cg() as parameter to the linkage constructor //Signed-off-by: Harpreet Kaur Fetching TR properties initialization code //Signed-off-by: Harpreet Kaur Using cg and _properties for function parameters //Signed-off-by: Harpreet Kaur Parameter name changes and passing properties structure as a pointer TODO: MicroJIT to use the setOffsetToFirstParm method //Signed-off-by: Harpreet Kaur Fetching default return registers from properties structure //Signed-off-by: Harpreet Kaur Handling addresses and patching requisite slots for different datatypes //Signed-off-by: Harpreet Kaur Changes to generateBody //Signed-off-by: Harpreet Kaur Fetching default argument registers from properties structure //Signed-off-by: Harpreet Kaur Getting default return registers from linkage properties in remainder bytecodes //Signed-off-by: Harpreet Kaur Misc. 
Changes to linkage and bytecodes to fix bugs. Turned off support for get and put field until write barriers can be implemented. Added verbose log check for mjit log messages //Signed-off-by: Scott Young MJIT DSL for templates and new jmp template //Signed-off-by: Scott Young Added define guards to microjit hook in profiler //Signed-off-by: Scott Young Misc bug fixes to MJIT compiler Fixed preprologue incorrect use of 5 byte jump instead of 2 byte jump. Added field to identify uninitialized entries in local and param tables. Fixed handling of loading and storing of references. Fixed handling of passing references to callee methods. Added verbose log checks to verbose logging in MJIT compiler. //Signed-off-by: Scott Young Handle Special Case for Longs and Doubles in the First Slot of the Param Table //Signed-off-by: Aaron Graham Guard Against Negative Indices Being Accessible in the Param, or Local, Table Entries //Signed-off-by: Aaron Graham Fix Initialization Check in Param and Local Table Entries //Signed-off-by: Aaron Graham Fixing compile errors during linkage rebase //Signed-off-by: Harpreet Kaur Changing copyright dates and formatting in bytecodes.nasm //Signed-off-by: Harpreet Kaur Commenting out the untested goto_w bytecode and some copyright changes //Signed-off-by: Harpreet Kaur Fixed holes in codecache. Rewrote side tables. //Signed-off-by: Scott Fixed MJIT issue with references and static fields //Co-authored-by: Scott //Signed-off-by: Harpreet Kaur Removed trailing whitespace from modified cpp/hpp/nasm files //Signed-off-by: Scott Young Removed additional whitespace found in .mk files and missed .cpp files. //Signed-off-by: Scott Young Updated Copyright on files changed by this PR that were not changed this year to the current year. //Signed-off-by: Scott Young Fixed additional trailing whitepace missed from previous commits //Signed-off-by: Scott Young Co-authored-by: Eric Coffin Co-authored-by: Julie Brown Co-authored-by: Shubham Verma Co-authored-by: Harpreet Kaur Co-authored-by: Aaron Graham Signed-off-by: Scott Young --- buildspecs/core.feature | 1 + buildspecs/j9.flags | 4 + runtime/compiler/CMakeLists.txt | 43 +- runtime/compiler/build/files/common.mk | 5 + runtime/compiler/build/files/target/amd64.mk | 11 +- runtime/compiler/build/toolcfg/common.mk | 7 +- runtime/compiler/build/toolcfg/gnu/common.mk | 12 +- .../compiler/control/CompilationRuntime.hpp | 29 +- .../compiler/control/CompilationThread.cpp | 434 ++- .../compiler/control/CompilationThread.hpp | 5 + runtime/compiler/control/HookedByTheJit.cpp | 9 + runtime/compiler/control/J9Options.cpp | 11 + runtime/compiler/control/J9Options.hpp | 9 + runtime/compiler/control/J9Recompilation.cpp | 4 +- .../compiler/control/RecompilationInfo.hpp | 16 +- runtime/compiler/control/rossa.cpp | 15 +- runtime/compiler/control/rossa.h | 15 +- runtime/compiler/env/VMJ9.h | 3 + runtime/compiler/infra/J9Cfg.cpp | 755 +++- runtime/compiler/infra/J9Cfg.hpp | 20 +- .../compiler/microjit/ByteCodeIterator.hpp | 58 + runtime/compiler/microjit/CMakeLists.txt | 39 + runtime/compiler/microjit/ExceptionTable.cpp | 80 + runtime/compiler/microjit/ExceptionTable.hpp | 56 + .../compiler/microjit/HWMetaDataCreation.cpp | 629 ++++ .../compiler/microjit/LWMetaDataCreation.cpp | 28 + .../microjit/MethodBytecodeMetaData.hpp | 69 + runtime/compiler/microjit/SideTables.hpp | 338 ++ runtime/compiler/microjit/assembly/utils.nasm | 30 + runtime/compiler/microjit/utils.hpp | 39 + runtime/compiler/microjit/x/CMakeLists.txt | 24 + 
.../microjit/x/amd64/AMD64Codegen.cpp | 3096 +++++++++++++++++ .../microjit/x/amd64/AMD64Codegen.hpp | 444 +++ .../microjit/x/amd64/AMD64CodegenGC.cpp | 146 + .../microjit/x/amd64/AMD64CodegenGC.hpp | 37 + .../microjit/x/amd64/AMD64Linkage.cpp | 38 + .../microjit/x/amd64/AMD64Linkage.hpp | 52 + .../compiler/microjit/x/amd64/CMakeLists.txt | 27 + .../microjit/x/amd64/templates/CMakeLists.txt | 26 + .../microjit/x/amd64/templates/bytecodes.nasm | 1351 +++++++ .../microjit/x/amd64/templates/linkage.nasm | 488 +++ .../microjit/x/amd64/templates/microjit.nasm | 94 + runtime/compiler/optimizer/J9Optimizer.cpp | 14 + runtime/compiler/optimizer/J9Optimizer.hpp | 5 +- .../compiler/optimizer/JProfilingBlock.cpp | 34 +- runtime/compiler/runtime/MetaData.cpp | 289 ++ .../x/amd64/codegen/AMD64PrivateLinkage.cpp | 167 +- .../x/amd64/codegen/AMD64PrivateLinkage.hpp | 4 +- runtime/include/j9cfg.h.in | 1 + runtime/makelib/uma.properties | 4 +- runtime/oti/j9consts.h | 2 + 51 files changed, 8988 insertions(+), 129 deletions(-) create mode 100644 runtime/compiler/microjit/ByteCodeIterator.hpp create mode 100644 runtime/compiler/microjit/CMakeLists.txt create mode 100644 runtime/compiler/microjit/ExceptionTable.cpp create mode 100644 runtime/compiler/microjit/ExceptionTable.hpp create mode 100644 runtime/compiler/microjit/HWMetaDataCreation.cpp create mode 100644 runtime/compiler/microjit/LWMetaDataCreation.cpp create mode 100644 runtime/compiler/microjit/MethodBytecodeMetaData.hpp create mode 100644 runtime/compiler/microjit/SideTables.hpp create mode 100644 runtime/compiler/microjit/assembly/utils.nasm create mode 100644 runtime/compiler/microjit/utils.hpp create mode 100644 runtime/compiler/microjit/x/CMakeLists.txt create mode 100755 runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp create mode 100644 runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp create mode 100644 runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp create mode 100644 runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp create mode 100644 runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp create mode 100644 runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp create mode 100644 runtime/compiler/microjit/x/amd64/CMakeLists.txt create mode 100644 runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt create mode 100644 runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm create mode 100644 runtime/compiler/microjit/x/amd64/templates/linkage.nasm create mode 100644 runtime/compiler/microjit/x/amd64/templates/microjit.nasm diff --git a/buildspecs/core.feature b/buildspecs/core.feature index 155a8568b5b..5b5e4bf09aa 100644 --- a/buildspecs/core.feature +++ b/buildspecs/core.feature @@ -112,6 +112,7 @@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti + diff --git a/buildspecs/j9.flags b/buildspecs/j9.flags index 152c908e30b..128b0ccb639 100644 --- a/buildspecs/j9.flags +++ b/buildspecs/j9.flags @@ -1784,6 +1784,10 @@ Only available on zOS Enables common dependencies between OpenJ9 and OpenJDK MethodHandles. Disables common dependencies between OpenJ9 and OpenJDK MethodHandles. + + MicroJIT support is enabled in the buildspec. 
+ + Turns on module support diff --git a/runtime/compiler/CMakeLists.txt b/runtime/compiler/CMakeLists.txt index 2ac3bbfccbf..6e4d98fe365 100644 --- a/runtime/compiler/CMakeLists.txt +++ b/runtime/compiler/CMakeLists.txt @@ -38,15 +38,28 @@ if(OMR_ARCH_X86) endif() endif() enable_language(ASM_NASM) - # We have to manually append "/" to the paths as NASM versions older than v2.14 requires trailing / in the directory paths. - set(asm_inc_dirs - "-I${j9vm_SOURCE_DIR}/oti/" - "-I${j9vm_BINARY_DIR}/oti/" - "-I${CMAKE_CURRENT_SOURCE_DIR}/" - "-I${CMAKE_CURRENT_SOURCE_DIR}/x/runtime/" - "-I${CMAKE_CURRENT_SOURCE_DIR}/x/amd64/runtime/" - "-I${CMAKE_CURRENT_SOURCE_DIR}/x/i386/runtime/" - ) + # We have to manually append "/" to the paths as NASM versions older than v2.14 requires trailing / in the directory paths + if(J9VM_OPT_MICROJIT) + set(asm_inc_dirs + "-I${j9vm_SOURCE_DIR}/oti/" + "-I${j9vm_BINARY_DIR}/oti/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/x/runtime/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/x/amd64/runtime/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/x/i386/runtime/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/microjit/assembly/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/microjit/x/amd64/templates/" + ) + else() + set(asm_inc_dirs + "-I${j9vm_SOURCE_DIR}/oti/" + "-I${j9vm_BINARY_DIR}/oti/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/x/runtime/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/x/amd64/runtime/" + "-I${CMAKE_CURRENT_SOURCE_DIR}/x/i386/runtime/" + ) + endif() omr_append_flags(CMAKE_ASM_NASM_FLAGS ${asm_inc_dirs}) # For whatever reason cmake does not apply compile definitions when building nasm objects. # The if guard is here in case they ever start doing so. @@ -106,8 +119,12 @@ if(J9VM_OPT_JITSERVER) include_directories(${OPENSSL_INCLUDE_DIR}) endif() -# TODO We should get rid of this, but it's still required by the compiler_support module in OMR. -set(J9SRC ${j9vm_SOURCE_DIR}) +if(J9VM_OPT_MICROJIT) + message(STATUS "MicroJIT is enabled") +endif() + +#TODO We should get rid of this, but its still required by the compiler_support module in omr +set(J9SRC ${j9vm_SOURCE_DIR}) # include compiler support module from OMR include(OmrCompilerSupport) @@ -146,6 +163,10 @@ if(J9VM_OPT_JITSERVER) add_subdirectory(net) endif() +if(J9VM_OPT_MICROJIT) + add_subdirectory(microjit) +endif() + if(IS_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/${TR_HOST_ARCH}") add_subdirectory(${TR_HOST_ARCH}) endif() diff --git a/runtime/compiler/build/files/common.mk b/runtime/compiler/build/files/common.mk index d0efa8acf90..65e0b693e95 100644 --- a/runtime/compiler/build/files/common.mk +++ b/runtime/compiler/build/files/common.mk @@ -417,6 +417,11 @@ JIT_PRODUCT_SOURCE_FILES+=\ compiler/runtime/MetricsServer.cpp endif +ifneq ($(J9VM_OPT_MICROJIT),) +JIT_PRODUCT_SOURCE_FILES+=\ + compiler/microjit/ExceptionTable.cpp +endif + -include $(JIT_MAKE_DIR)/files/extra.mk include $(JIT_MAKE_DIR)/files/host/$(HOST_ARCH).mk include $(JIT_MAKE_DIR)/files/target/$(TARGET_ARCH).mk diff --git a/runtime/compiler/build/files/target/amd64.mk b/runtime/compiler/build/files/target/amd64.mk index c77c3e7debe..3c43a82e544 100644 --- a/runtime/compiler/build/files/target/amd64.mk +++ b/runtime/compiler/build/files/target/amd64.mk @@ -1,4 +1,4 @@ -# Copyright (c) 2000, 2021 IBM Corp. and others +# Copyright (c) 2000, 2022 IBM Corp. 
and others # # This program and the accompanying materials are made available under # the terms of the Eclipse Public License 2.0 which accompanies this @@ -35,3 +35,12 @@ JIT_PRODUCT_SOURCE_FILES+=\ compiler/x/amd64/codegen/AMD64JNILinkage.cpp \ compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp \ compiler/x/amd64/codegen/J9CodeGenerator.cpp + +ifneq ($(J9VM_OPT_MICROJIT),) +JIT_PRODUCT_SOURCE_FILES+=\ + compiler/microjit/x/amd64/AMD64Codegen.cpp \ + compiler/microjit/x/amd64/AMD64CodegenGC.cpp \ + compiler/microjit/x/amd64/AMD64Linkage.cpp \ + compiler/microjit/x/amd64/templates/linkage.nasm \ + compiler/microjit/x/amd64/templates/bytecodes.nasm +endif diff --git a/runtime/compiler/build/toolcfg/common.mk b/runtime/compiler/build/toolcfg/common.mk index b0b74aa6f63..749a577a9e6 100644 --- a/runtime/compiler/build/toolcfg/common.mk +++ b/runtime/compiler/build/toolcfg/common.mk @@ -1,4 +1,4 @@ -# Copyright (c) 2000, 2020 IBM Corp. and others +# Copyright (c) 2000, 2022 IBM Corp. and others # # This program and the accompanying materials are made available under # the terms of the Eclipse Public License 2.0 which accompanies this @@ -43,6 +43,11 @@ PRODUCT_INCLUDES=\ $(J9SRC)/oti \ $(J9SRC)/util +ifneq ($(J9VM_OPT_MICROJIT),) + PRODUCT_INCLUDES+=\ + $(FIXED_SRCBASE)/compiler/microjit +endif + PRODUCT_DEFINES+=\ BITVECTOR_BIT_NUMBERING_MSB \ J9_PROJECT_SPECIFIC diff --git a/runtime/compiler/build/toolcfg/gnu/common.mk b/runtime/compiler/build/toolcfg/gnu/common.mk index c4de3e711b4..b294cef0141 100644 --- a/runtime/compiler/build/toolcfg/gnu/common.mk +++ b/runtime/compiler/build/toolcfg/gnu/common.mk @@ -1,4 +1,4 @@ -# Copyright (c) 2000, 2021 IBM Corp. and others +# Copyright (c) 2000, 2022 IBM Corp. and others # # This program and the accompanying materials are made available under # the terms of the Eclipse Public License 2.0 which accompanies this @@ -290,6 +290,11 @@ ifeq ($(HOST_ARCH),x) $(J9SRC)/compiler \ $(J9SRC)/compiler/x/runtime + ifneq ($(J9VM_OPT_MICROJIT),) + NASM_INCLUDES+=\ + $(J9SRC)/compiler/microjit/assembly + endif + ifeq ($(HOST_BITS),32) NASM_OBJ_FORMAT=-felf32 @@ -310,6 +315,11 @@ ifeq ($(HOST_ARCH),x) TR_HOST_64BIT \ TR_TARGET_64BIT + ifneq ($(J9VM_OPT_MICROJIT),) + NASM_DEFINES+=\ + J9VM_OPT_MICROJIT + endif + NASM_INCLUDES+=\ $(J9SRC)/compiler/x/amd64/runtime endif diff --git a/runtime/compiler/control/CompilationRuntime.hpp b/runtime/compiler/control/CompilationRuntime.hpp index 9b41bc2fb1e..319c53a0961 100644 --- a/runtime/compiler/control/CompilationRuntime.hpp +++ b/runtime/compiler/control/CompilationRuntime.hpp @@ -674,12 +674,33 @@ class CompilationInfo #if defined(J9VM_OPT_JITSERVER) TR_ASSERT_FATAL(!TR::CompilationInfo::getStream(), "not yet implemented for JITServer"); #endif /* defined(J9VM_OPT_JITSERVER) */ - value = (value << 1) | J9_STARTPC_NOT_TRANSLATED; - if (value < 0) - value = INT_MAX; - method->extra = reinterpret_cast(static_cast(value)); +#if defined(J9VM_OPT_MICROJIT) + TR::Options *options = TR::Options::getJITCmdLineOptions(); + if(options->_mjitEnabled && value) + { + return; + } + else +#endif + { + value = (value << 1) | J9_STARTPC_NOT_TRANSLATED; + if (value < 0) + value = INT_MAX; + method->extra = reinterpret_cast(static_cast(value)); + } } +#if defined(J9VM_OPT_MICROJIT) + static void setInitialMJITCountUnsynchronized(J9Method *method, int32_t mjitThreshold) + { + if(TR::Options::getJITCmdLineOptions()->_mjitEnabled) + { + intptr_t value = (intptr_t)((mjitThreshold << 1) | J9_STARTPC_NOT_TRANSLATED); + method->extra = 
reinterpret_cast(static_cast(value)); + } + } +#endif + static uint32_t getMethodBytecodeSize(const J9ROMMethod * romMethod); static uint32_t getMethodBytecodeSize(J9Method* method); diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index 26d8eebd70f..af38e7e81fb 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -94,6 +94,16 @@ #include "env/J9SegmentCache.hpp" #include "env/SystemSegmentProvider.hpp" #include "env/DebugSegmentProvider.hpp" +#include "ilgen/J9ByteCodeIterator.hpp" +#include "ilgen/J9ByteCodeIteratorWithState.hpp" +#if defined(J9VM_OPT_MICROJIT) +#include "microjit/x/amd64/AMD64Codegen.hpp" +#include "microjit/x/amd64/AMD64CodegenGC.hpp" +#include "microjit/ByteCodeIterator.hpp" +#include "microjit/SideTables.hpp" +#include "microjit/utils.hpp" +#include "codegen/OMRLinkage_inlines.hpp" +#endif /* J9VM_OPT_MICROJIT */ #if defined(J9VM_OPT_JITSERVER) #include "control/JITClientCompilationThread.hpp" #include "control/JITServerCompilationThread.hpp" @@ -155,6 +165,15 @@ extern TR::OptionSet *findOptionSet(J9Method *, bool); #define DECOMPRESSION_FAILED -1 #define DECOMPRESSION_SUCCEEDED 0 +#if defined(J9VM_OPT_MICROJIT) +#define COMPILE_WITH_MICROJIT(extra, method, extendedFlags)\ + (TR::Options::getJITCmdLineOptions()->_mjitEnabled && \ + extra && \ + !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) &&\ + !J9_ARE_NO_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) &&\ + !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)) +#endif + #if defined(WINDOWS) void setThreadAffinity(unsigned _int64 handle, unsigned long mask) { @@ -2333,6 +2352,11 @@ bool TR::CompilationInfo::shouldRetryCompilation(TR_MethodToBeCompiled *entry, T entry->_optimizationPlan->setDisableGCR(); // GCR isn't needed tryCompilingAgain = true; break; +#if defined(J9VM_OPT_MICROJIT) + case mjitCompilationFailure: + tryCompilingAgain = true; + break; +#endif case compilationNullSubstituteCodeCache: case compilationCodeMemoryExhausted: case compilationCodeCacheError: @@ -9073,9 +9097,22 @@ TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary *portLib, void * Trc_JIT_compileStart(vmThread, hotnessString, compiler->signature()); TR_ASSERT(compiler->getMethodHotness() != unknownHotness, "Trying to compile at unknown hotness level"); - - metaData = that->compile(vmThread, compiler, compilee, *vm, p->_optimizationPlan, scratchSegmentProvider); - + { +#if defined(J9VM_OPT_MICROJIT) + J9Method *method = that->_methodBeingCompiled->getMethodDetails().getMethod(); + UDATA extra = (UDATA)method->extra; + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + if (COMPILE_WITH_MICROJIT(extra, method, extendedFlags)) + { + metaData = that->mjit(vmThread, compiler, compilee, *vm, p->_optimizationPlan, scratchSegmentProvider, p->trMemory()); + } + else +#endif + { + metaData = that->compile(vmThread, compiler, compilee, *vm, p->_optimizationPlan, scratchSegmentProvider); + } + } } try @@ -9297,6 +9334,390 @@ TR::CompilationInfoPerThreadBase::performAOTLoad( return metaData; } +#if defined(J9VM_OPT_MICROJIT) +// MicroJIT returns a code size of 0 when it encournters a compilation error +// We must use this to set the correct values and return NULL, this is the same +// no matter which phase of compilation fails, so we create a macro here to +// perform that work. 
+#define MJIT_COMPILE_ERROR(code_size, method, romMethod) do{\ + if (0 == code_size)\ + {\ + intptr_t trCountFromMJIT = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT;\ + trCountFromMJIT = (trCountFromMJIT << 1) | 1;\ + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread);\ + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method);\ + *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE;\ + method->extra = reinterpret_cast(trCountFromMJIT);\ + compiler->failCompilation("Cannot compile with MicroJIT.");\ + }\ +} while(0) + +// This routine should only be called from wrappedCompile +TR_MethodMetaData * +TR::CompilationInfoPerThreadBase::mjit( + J9VMThread *vmThread, + TR::Compilation *compiler, + TR_ResolvedMethod *compilee, + TR_J9VMBase &vm, + TR_OptimizationPlan *optimizationPlan, + TR::SegmentAllocator const &scratchSegmentProvider, + TR_Memory *trMemory + ) + { + + PORT_ACCESS_FROM_JITCONFIG(jitConfig); + + TR_MethodMetaData *metaData = NULL; + J9Method *method = NULL; + TR::CodeCache *codeCache = NULL; + + if (_methodBeingCompiled->_priority >= CP_SYNC_MIN) + ++_compInfo._numSyncCompilations; + else + ++_compInfo._numAsyncCompilations; + + if (_methodBeingCompiled->isDLTCompile()) + compiler->setDltBcIndex(static_cast(_methodBeingCompiled->getMethodDetails()).getByteCodeIndex()); + + bool volatile haveLockedClassUnloadMonitor = false; // used for compilation without VM access + int32_t estimated_size = 0; + char* buffer = NULL; + try + { + + InterruptibleOperation compilingMethodBody(*this); + + TR::IlGeneratorMethodDetails & details = _methodBeingCompiled->getMethodDetails(); + method = details.getMethod(); + + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); //TODO: fe can be replaced by vm and removed from here. + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + const char* signature = compilee->signature(compiler->trMemory()); + + TRIGGER_J9HOOK_JIT_COMPILING_START(_jitConfig->hookInterface, vmThread, method); + + // BEGIN MICROJIT + TR::FilePointer *logFileFP = this->getCompilation()->getOutFile(); + MJIT::ByteCodeIterator bcIterator(0, static_cast (compilee), static_cast (&vm), comp()); + + if (TR::Options::canJITCompile()) + { + TR::Options *options = TR::Options::getJITCmdLineOptions(); + + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + if(compiler->getOption(TR_TraceCG)){ + trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); + } + + MJIT_COMPILE_ERROR(!(compilee->isConstructor()), method, romMethod); //"MicroJIT does not currently compile constructors" + + bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not + if (!isStaticMethod){ + maxLength += 1; // Make space for an extra character for object reference in case of non-static methods + } + + char typeString[maxLength]; + if (MJIT::nativeSignature(method, typeString)) + { + return 0; + } + + // To insert 'L' for object reference with non-static methods + // before the input arguments starting at index 3. + // The last iteration sets typeString[3] to typeString[2], + // which is always MJIT::CLASSNAME_TYPE_CHARACTER, i.e., 'L' + // and other iterations just move the characters one index down. + // e.g. 
xxIIB -> + if (!isStaticMethod){ + for (int i = maxLength - 1; i > 2 ; i--) { + typeString[i] = typeString[i-1]; + } + } + + U_16 paramCount = MJIT::getParamCount(typeString, maxLength); + U_16 actualParamCount = MJIT::getActualParamCount(typeString, maxLength); + MJIT::ParamTableEntry paramTableEntries[actualParamCount*2]; + + for(int i=0; imaxBytecodeIndex(); + + MJIT::CodeGenGC mjitCGGC(logFileFP); + MJIT::CodeGenerator mjit_cg(_jitConfig, vmThread, logFileFP, vm, ¶mTable, compiler, &mjitCGGC, comp()->getPersistentInfo(), trMemory, compilee); + estimated_size = MAX_BUFFER_SIZE(maxBCI); + buffer = (char*)mjit_cg.allocateCodeCache(estimated_size, &vm, vmThread); + MJIT_COMPILE_ERROR(buffer, method, romMethod); // MicroJIT cannot compile if allocation fails. + codeCache = mjit_cg.getCodeCache(); + + + // provide enough space for CodeCacheMethodHeader + char *cursor = buffer; + + buffer_size_t buffer_size = 0; + + char *magicWordLocation, *first2BytesPatchLocation, *samplingRecompileCallLocation; + buffer_size_t code_size = mjit_cg.generatePrePrologue( + cursor, + method, + &magicWordLocation, + &first2BytesPatchLocation, + &samplingRecompileCallLocation); + + MJIT_COMPILE_ERROR(code_size, method, romMethod); + + compiler->cg()->setPrePrologueSize(code_size); + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after pre-prologue"); + +#ifdef MJIT_DEBUG + trfprintf(this->getCompilation()->getOutFile(), "\ngeneratePrePrologue\n"); + for(int32_t i = 0; i < code_size; i++) + { + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + } +#endif + cursor += code_size; + + //start point should point to prolog + char * prologue_address = cursor; + + // generate debug breakpoint + if (comp()->getOption(TR_EntryBreakPoints)) + { + code_size = mjit_cg.generateDebugBreakpoint(cursor); + cursor += code_size; + buffer_size += code_size; + } + + char * jitStackOverflowPatchLocation = NULL; + + mjit_cg.setPeakStackSize(romMethod->maxStack * mjit_cg.getPointerSize()); + char* firstInstructionLocation = NULL; + + code_size = mjit_cg.generatePrologue( + cursor, + method, + &jitStackOverflowPatchLocation, + magicWordLocation, + first2BytesPatchLocation, + samplingRecompileCallLocation, + &firstInstructionLocation, + &bcIterator); + MJIT_COMPILE_ERROR(code_size, method, romMethod); + + TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); + compiler->cg()->setStackAtlas(atlas); + compiler->cg()->setMethodStackMap(atlas->getLocalMap()); + //TODO: Find out why setting this correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point + //compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(firstInstructionLocation-cursor)); + compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(0)); + + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after prologue"); + + if(compiler->getOption(TR_TraceCG)) + { + trfprintf(this->getCompilation()->getOutFile(), "\ngeneratePrologue\n"); + for(int32_t i = 0; i < code_size; i++) + { + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + } + } + + cursor += code_size; + + // GENERATE BODY + bcIterator.setIndex(0); + + TR_Debug dbg(this->getCompilation()); + this->getCompilation()->setDebug(&dbg); + + if(compiler->getOption(TR_TraceCG)) + { + trfprintf( + this->getCompilation()->getOutFile(), + "\n" + "%s\n", + signature); + 
} + + code_size = mjit_cg.generateBody(cursor, &bcIterator); + + MJIT_COMPILE_ERROR(code_size, method, romMethod); + + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); + +#ifdef MJIT_DEBUG + trfprintf(this->getCompilation()->getOutFile(), "\ngenerateBody\n"); + for(int32_t i = 0; i < code_size; i++) + { + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + } +#endif + + cursor += code_size; + // END GENERATE BODY + + // GENERATE COLD AREA + code_size = mjit_cg.generateColdArea(cursor, method, jitStackOverflowPatchLocation); + + MJIT_COMPILE_ERROR(code_size, method, romMethod); + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); + +#ifdef MJIT_DEBUG + trfprintf(this->getCompilation()->getOutFile(), "\ngenerateColdArea\n"); + for(int32_t i = 0; i < code_size; i++) + { + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + } + + trfprintf(this->getCompilation()->getOutFile(), "\nfinal method\n"); + for(int32_t i = 0; i < buffer_size; i++) + { + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)buffer[i]) & (unsigned char)0xff); + } +#endif + + cursor += code_size; + + // As the body is finished, mark its profile info as active so that the JProfiler thread will inspect it + + //TODO: after adding profiling support, uncomment this. + //if (bodyInfo && bodyInfo->getProfileInfo()) + // { + // bodyInfo->getProfileInfo()->setActive(); + // } + + // Put a metaData pointer into the Code Cache Header(s). + // + compiler->cg()->setBinaryBufferCursor((uint8_t*)(cursor)); + compiler->cg()->setBinaryBufferStart((uint8_t*)(buffer)); + if(compiler->getOption(TR_TraceCG)) + { + trfprintf( + this->getCompilation()->getOutFile(), + "Compiled method binary buffer finalized [" POINTER_PRINTF_FORMAT " : " POINTER_PRINTF_FORMAT "] for %s @ %s", + ((void*)buffer), + ((void*)cursor), + compiler->signature(), + compiler->getHotnessName()); + } + + metaData = createMJITMethodMetaData(vm, compilee, compiler); + if (!metaData) + { + if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) + { + TR_VerboseLog::writeLineLocked( + TR_Vlog_DISPATCH, + "Failed to create metadata for %s @ %s", + compiler->signature(), + compiler->getHotnessName()); + } + compiler->failCompilation("Metadata creation failure"); + } + if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) + { + TR_VerboseLog::writeLineLocked( + TR_Vlog_DISPATCH, + "Successfully created metadata [" POINTER_PRINTF_FORMAT "] for %s @ %s", + metaData, + compiler->signature(), + compiler->getHotnessName()); + } + setMetadata(metaData); + uint8_t *warmMethodHeader = compiler->cg()->getBinaryBufferStart() - sizeof(OMR::CodeCacheMethodHeader); + memcpy( warmMethodHeader + offsetof(OMR::CodeCacheMethodHeader, _metaData), &metaData, sizeof(metaData) ); + // FAR: should we do postpone this copying until after CHTable commit? 
+ metaData->runtimeAssumptionList = *(compiler->getMetadataAssumptionList()); + TR::ClassTableCriticalSection chTableCommit(&vm); + TR_ASSERT(!chTableCommit.acquiredVMAccess(), "We should have already acquired VM access at this point."); + TR_CHTable *chTable = compiler->getCHTable(); + + if (prologue_address && compiler->getOption(TR_TraceCG)) + { + trfprintf( + this->getCompilation()->getOutFile(), + "\n" + "MJIT:%s", + compilee->signature(compiler->trMemory())); + } + + compiler->cg()->getCodeCache()->trimCodeMemoryAllocation(buffer, (cursor-buffer)); + + // END MICROJIT + if(compiler->getOption(TR_TraceCG)){ + trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiled %s Successfully!\n------------------------------\n", signature); + } + + } + + logCompilationSuccess(vmThread, vm, method, scratchSegmentProvider, compilee, compiler, metaData, optimizationPlan); + + TRIGGER_J9HOOK_JIT_COMPILING_END(_jitConfig->hookInterface, vmThread, method); + } + catch (const std::exception &e) + { + const char *exceptionName; + +#if defined(J9ZOS390) + // Compiling with -Wc,lp64 results in a crash on z/OS when trying + // to call the what() virtual method of the exception. + exceptionName = "std::exception"; +#else + exceptionName = e.what(); +#endif + + printCompFailureInfo(compiler, exceptionName); + processException(vmThread, scratchSegmentProvider, compiler, haveLockedClassUnloadMonitor, exceptionName); + if (codeCache) + { + if(estimated_size && buffer) + { + codeCache->trimCodeMemoryAllocation(buffer, 1); + } + TR::CodeCacheManager::instance()->unreserveCodeCache(codeCache); + } + metaData = 0; + } + + // At this point the compilation has either succeeded and compilation cannot be + // interrupted anymore, or it has failed. In either case _compilationShouldBeinterrupted flag + // is not needed anymore + setCompilationShouldBeInterrupted(0); + + // We should not have the classTableMutex at this point + TR_ASSERT_FATAL(!TR::MonitorTable::get()->getClassTableMutex()->owned_by_self(), + "Should not still own classTableMutex"); + + // Increment the number of JIT compilations (either successful or not) + // performed by this compilation thread + // + incNumJITCompilations(); + + return metaData; + } +#endif /* J9VM_OPT_MICROJIT */ + // This routine should only be called from wrappedCompile TR_MethodMetaData * TR::CompilationInfoPerThreadBase::compile( @@ -10919,6 +11340,13 @@ TR::CompilationInfoPerThreadBase::processException( _methodBeingCompiled->_compErrCode = compilationHeapLimitExceeded; } /* Allocation Exceptions End */ +#if defined(J9VM_OPT_MICROJIT) + catch (const MJIT::MJITCompilationFailure &e) + { + shouldProcessExceptionCommonTasks = false; + _methodBeingCompiled->_compErrCode = mjitCompilationFailure; + } +#endif /* IL Gen Exceptions Start */ catch (const J9::AOTHasInvokeHandle &e) diff --git a/runtime/compiler/control/CompilationThread.hpp b/runtime/compiler/control/CompilationThread.hpp index c12ddf76f86..972cf8e2028 100644 --- a/runtime/compiler/control/CompilationThread.hpp +++ b/runtime/compiler/control/CompilationThread.hpp @@ -157,6 +157,11 @@ class CompilationInfoPerThreadBase TR_MethodMetaData *getMetadata() {return _metadata;} void setMetadata(TR_MethodMetaData *m) {_metadata = m;} void *compile(J9VMThread *context, TR_MethodToBeCompiled *entry, J9::J9SegmentProvider &scratchSegmentProvider); +#if defined(J9VM_OPT_MICROJIT) + TR_MethodMetaData *mjit(J9VMThread *context, TR::Compilation *, + TR_ResolvedMethod *compilee, TR_J9VMBase &, TR_OptimizationPlan*, 
TR::SegmentAllocator const &scratchSegmentProvider, + TR_Memory *trMemory); +#endif TR_MethodMetaData *compile(J9VMThread *context, TR::Compilation *, TR_ResolvedMethod *compilee, TR_J9VMBase &, TR_OptimizationPlan*, TR::SegmentAllocator const &scratchSegmentProvider); TR_MethodMetaData *performAOTLoad(J9VMThread *context, TR::Compilation *, TR_ResolvedMethod *compilee, TR_J9VMBase *vm, J9Method *method); diff --git a/runtime/compiler/control/HookedByTheJit.cpp b/runtime/compiler/control/HookedByTheJit.cpp index 34121579404..c0be1163d06 100644 --- a/runtime/compiler/control/HookedByTheJit.cpp +++ b/runtime/compiler/control/HookedByTheJit.cpp @@ -610,6 +610,15 @@ static void jitHookInitializeSendTarget(J9HookInterface * * hook, UDATA eventNum TR_VerboseLog::writeLineLocked(TR_Vlog_PERF, "Setting count=%d for %s", count, buffer); } +#if defined(J9VM_OPT_MICROJIT) + if(optionsJIT->_mjitEnabled) + { + int32_t mjitCount = optionsJIT->_mjitInitialCount; + if(count) + TR::CompilationInfo::setInitialMJITCountUnsynchronized(method,mjitCount); + } +#endif + TR::CompilationInfo::setInitialInvocationCountUnsynchronized(method,count); if (TR::Options::getJITCmdLineOptions()->getOption(TR_DumpInitialMethodNamesAndCounts) || TR::Options::getAOTCmdLineOptions()->getOption(TR_DumpInitialMethodNamesAndCounts)) diff --git a/runtime/compiler/control/J9Options.cpp b/runtime/compiler/control/J9Options.cpp index 1e3b0fdfb81..ba076861edf 100644 --- a/runtime/compiler/control/J9Options.cpp +++ b/runtime/compiler/control/J9Options.cpp @@ -278,6 +278,11 @@ int32_t J9::Options::_dltPostponeThreshold = 2; int32_t J9::Options::_expensiveCompWeight = TR::CompilationInfo::JSR292_WEIGHT; int32_t J9::Options::_jProfilingEnablementSampleThreshold = 10000; +#if defined(J9VM_OPT_MICROJIT) +int32_t J9::Options::_mjitEnabled = 0; +int32_t J9::Options::_mjitInitialCount = 20; +#endif + bool J9::Options::_aggressiveLockReservation = false; //************************************************************************ @@ -919,6 +924,12 @@ TR::OptionTable OMR::Options::_feOptions[] = { TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_minSamplingPeriod, 0, "P%d", NOT_IN_SUBSET}, {"minSuperclassArraySize=", "I\t set the size of the minimum superclass array size", TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_minimumSuperclassArraySize, 0, "F%d", NOT_IN_SUBSET}, +#if defined(J9VM_OPT_MICROJIT) + {"mjitCount=", "C\tnumber of invocations before MicroJIT compiles methods without loops", + TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_mjitInitialCount, 0, "F%d"}, + {"mjitEnabled=", "C\tenable Microjit (set to 1 to enable, default=0)", + TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_mjitEnabled, 0, "F%d"}, +#endif {"noregmap", 0, RESET_JITCONFIG_RUNTIME_FLAG(J9JIT_CG_REGISTER_MAPS) }, {"numCodeCachesOnStartup=", "R\tnumber of code caches to create at startup", TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_numCodeCachesToCreateAtStartup, 0, "F%d", NOT_IN_SUBSET}, diff --git a/runtime/compiler/control/J9Options.hpp b/runtime/compiler/control/J9Options.hpp index b7c651470b7..a15592dd35e 100644 --- a/runtime/compiler/control/J9Options.hpp +++ b/runtime/compiler/control/J9Options.hpp @@ -333,6 +333,15 @@ class OMR_EXTENSIBLE Options : public OMR::OptionsConnector static int32_t _expensiveCompWeight; // weight of a comp request to be considered expensive static int32_t _jProfilingEnablementSampleThreshold; +#if defined(J9VM_OPT_MICROJIT) + static int32_t getMJITEnabled() { return _mjitEnabled; } + 
static int32_t _mjitEnabled; + + + static int32_t getMJITInitialCount() { return _mjitInitialCount; } + static int32_t _mjitInitialCount; +#endif + static bool _aggressiveLockReservation; static void printPID(); diff --git a/runtime/compiler/control/J9Recompilation.cpp b/runtime/compiler/control/J9Recompilation.cpp index 0e18e0c0c4c..781d632f019 100644 --- a/runtime/compiler/control/J9Recompilation.cpp +++ b/runtime/compiler/control/J9Recompilation.cpp @@ -497,7 +497,7 @@ J9::Recompilation::getExistingMethodInfo(TR_ResolvedMethod *method) /** * This method can extract a value profiler from the current list of * recompilation profilers. - * + * * \return The first TR_ValueProfiler in the current list of profilers, NULL if there are none. */ TR_ValueProfiler * @@ -739,7 +739,7 @@ TR_PersistentMethodInfo::setForSharedInfo(TR_PersistentProfileInfo** ptr, TR_Per // Before it can be accessed, inc ref count on new info if (newInfo) TR_PersistentProfileInfo::incRefCount(newInfo); - + // Update ptr as if it was unlocked // Doesn't matter what the old info was, as long as it was unlocked do { diff --git a/runtime/compiler/control/RecompilationInfo.hpp b/runtime/compiler/control/RecompilationInfo.hpp index b9f2a44fe3f..4eaab7564be 100644 --- a/runtime/compiler/control/RecompilationInfo.hpp +++ b/runtime/compiler/control/RecompilationInfo.hpp @@ -1,5 +1,5 @@ /******************************************************************************* - * Copyright (c) 2000, 2021 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this @@ -44,6 +44,9 @@ class TR_FrontEnd; class TR_OpaqueMethodBlock; class TR_OptimizationPlan; class TR_ResolvedMethod; +#if defined(J9VM_OPT_MICROJIT) +namespace MJIT { class CodeGenerator; } +#endif namespace TR { class Instruction; } namespace TR { class SymbolReference; } @@ -321,6 +324,9 @@ class TR_PersistentJittedBodyInfo friend class TR_EmilyPersistentJittedBodyInfo; friend class ::OMR::Options; friend class J9::Options; +#if defined(J9VM_OPT_MICROJIT) + friend class MJIT::CodeGenerator; +#endif #if defined(TR_HOST_X86) || defined(TR_HOST_POWER) || defined(TR_HOST_S390) || defined(TR_HOST_ARM) || defined(TR_HOST_ARM64) friend void fixPersistentMethodInfo(void *table, bool isJITClientAOTLoad); @@ -332,6 +338,10 @@ class TR_PersistentJittedBodyInfo static TR_PersistentJittedBodyInfo *get(void *startPC); bool getHasLoops() { return _flags.testAny(HasLoops); } +#if defined(J9VM_OPT_MICROJIT) + bool isMJITCompiledMethod() { return _flags.testAny(IsMJITCompiled); } + void setIsMJITCompiledMethod(bool b) { _flags.set(IsMJITCompiled, b); } +#endif /* defined(J9VM_OPT_MICROJIT) */ bool getUsesPreexistence() { return _flags.testAny(UsesPreexistence); } bool getDisableSampling() { return _flags.testAny(DisableSampling); } void setDisableSampling(bool b) { _flags.set(DisableSampling, b); } @@ -410,7 +420,9 @@ class TR_PersistentJittedBodyInfo enum { HasLoops = 0x0001, - //HasManyIterationsLoops = 0x0002, // Available +#if defined(J9VM_OPT_MICROJIT) + IsMJITCompiled = 0x0002, +#endif /* defined(J9VM_OPT_MICROJIT) */ UsesPreexistence = 0x0004, DisableSampling = 0x0008, // This flag disables sampling of this method even though its recompilable IsProfilingBody = 0x0010, diff --git a/runtime/compiler/control/rossa.cpp b/runtime/compiler/control/rossa.cpp index c6be8c1c9e5..ecfcd502843 100644 --- a/runtime/compiler/control/rossa.cpp +++ 
b/runtime/compiler/control/rossa.cpp @@ -204,13 +204,16 @@ char *compilationErrorNames[]={ "compilationAotPatchedCPConstant", // 45 "compilationAotHasInvokeSpecialInterface", // 46 "compilationRelocationFailure", // 47 +#if defined(J9VM_OPT_MICROJIT) + "mjitCompilationFailure", // 48 +#endif /* defined(J9VM_OPT_MICROJIT) */ #if defined(J9VM_OPT_JITSERVER) - "compilationStreamFailure", // compilationFirstJITServerFailure = 48 - "compilationStreamLostMessage", // compilationFirstJITServerFailure + 1 = 49 - "compilationStreamMessageTypeMismatch", // compilationFirstJITServerFailure + 2 = 50 - "compilationStreamVersionIncompatible", // compilationFirstJITServerFailure + 3 = 51 - "compilationStreamInterrupted", // compilationFirstJITServerFailure + 4 = 52 - "aotCacheDeserializationFailure", // compilationFirstJITServerFailure + 5 = 53 + "compilationStreamFailure", // compilationFirstJITServerFailure = 48 or 49 + "compilationStreamLostMessage", // compilationFirstJITServerFailure + 1 = 49 or 50 + "compilationStreamMessageTypeMismatch", // compilationFirstJITServerFailure + 2 = 50 or 51 + "compilationStreamVersionIncompatible", // compilationFirstJITServerFailure + 3 = 51 or 52 + "compilationStreamInterrupted", // compilationFirstJITServerFailure + 4 = 52 or 53 + "aotCacheDeserializationFailure", // compilationFirstJITServerFailure + 5 = 53 or 54 #endif /* defined(J9VM_OPT_JITSERVER) */ "compilationMaxError" }; diff --git a/runtime/compiler/control/rossa.h b/runtime/compiler/control/rossa.h index 03553736a0c..2b054a680f6 100644 --- a/runtime/compiler/control/rossa.h +++ b/runtime/compiler/control/rossa.h @@ -74,14 +74,17 @@ typedef enum { compilationAotPatchedCPConstant = 45, compilationAotHasInvokeSpecialInterface = 46, compilationRelocationFailure = 47, +#if defined(J9VM_OPT_MICROJIT) + mjitCompilationFailure = 48, +#endif /* defined(J9VM_OPT_MICROJIT) */ #if defined(J9VM_OPT_JITSERVER) compilationFirstJITServerFailure, - compilationStreamFailure = compilationFirstJITServerFailure, // 48 - compilationStreamLostMessage = compilationFirstJITServerFailure + 1, // 49 - compilationStreamMessageTypeMismatch = compilationFirstJITServerFailure + 2, // 50 - compilationStreamVersionIncompatible = compilationFirstJITServerFailure + 3, // 51 - compilationStreamInterrupted = compilationFirstJITServerFailure + 4, // 52 - aotCacheDeserializationFailure = compilationFirstJITServerFailure + 5, // 53 + compilationStreamFailure = compilationFirstJITServerFailure, // 48 or 49 + compilationStreamLostMessage = compilationFirstJITServerFailure + 1, // 49 or 50 + compilationStreamMessageTypeMismatch = compilationFirstJITServerFailure + 2, // 50 or 51 + compilationStreamVersionIncompatible = compilationFirstJITServerFailure + 3, // 51 or 52 + compilationStreamInterrupted = compilationFirstJITServerFailure + 4, // 52 or 53 + aotCacheDeserializationFailure = compilationFirstJITServerFailure + 5, // 53 or 54 #endif /* defined(J9VM_OPT_JITSERVER) */ /* must be the last one */ diff --git a/runtime/compiler/env/VMJ9.h b/runtime/compiler/env/VMJ9.h index 60d869cbfd9..9804142df02 100644 --- a/runtime/compiler/env/VMJ9.h +++ b/runtime/compiler/env/VMJ9.h @@ -1542,6 +1542,9 @@ struct TR_PCMap typedef J9JITExceptionTable TR_MethodMetaData; TR_MethodMetaData * createMethodMetaData(TR_J9VMBase &, TR_ResolvedMethod *, TR::Compilation *); +#if defined(J9VM_OPT_MICROJIT) +TR_MethodMetaData * createMJITMethodMetaData(TR_J9VMBase &, TR_ResolvedMethod *, TR::Compilation *); +#endif extern J9JITConfig * jitConfig; diff --git 
a/runtime/compiler/infra/J9Cfg.cpp b/runtime/compiler/infra/J9Cfg.cpp index 5ce30a81e5c..c2393f4c4cf 100644 --- a/runtime/compiler/infra/J9Cfg.cpp +++ b/runtime/compiler/infra/J9Cfg.cpp @@ -1,5 +1,5 @@ /******************************************************************************* - * Copyright (c) 2000, 2021 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this @@ -33,6 +33,9 @@ #include "cs2/arrayof.h" #include "cs2/bitvectr.h" #include "cs2/listof.h" +#if defined(J9VM_OPT_MICROJIT) +#include "env/j9method.h" +#endif /* defined(J9VM_OPT_MICROJIT) */ #include "env/TRMemory.hpp" #include "env/PersistentInfo.hpp" #include "env/VMJ9.h" @@ -47,6 +50,9 @@ #include "il/SymbolReference.hpp" #include "il/TreeTop.hpp" #include "il/TreeTop_inlines.hpp" +#if defined(J9VM_OPT_MICROJIT) +#include "ilgen/J9ByteCodeIterator.hpp" +#endif /* defined(J9VM_OPT_MICROJIT) */ #include "infra/Assert.hpp" #include "infra/Cfg.hpp" #include "infra/Link.hpp" @@ -1707,7 +1713,14 @@ void J9::CFG::propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profiler) { _externalProfiler = profiler; - +#if defined(J9VM_OPT_MICROJIT) + J9Method *method = static_cast<TR_ResolvedJ9Method *>(comp()->getMethodBeingCompiled())->ramMethod(); + U_8 *extendedFlags = reinterpret_cast<TR_J9VMBase *>(comp()->fe())->fetchMethodExtendedFlagsPointer(method); + if (((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1 && TR::Options::getJITCmdLineOptions()->_mjitEnabled) { + if (!profiler) return; + self()->setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT(); + } +#endif /* J9VM_OPT_MICROJIT */ if (profiler) { self()->setBlockFrequenciesBasedOnInterpreterProfiler(); @@ -1763,3 +1776,741 @@ J9::CFG::emitVerbosePseudoRandomFrequencies() comp()->fej9()->emitNewPseudoRandomVerboseSuffix(); return true; } + + +#if defined(J9VM_OPT_MICROJIT) +/* These methods are used for MicroJIT compilations, so we cannot make the assumptions about + * the availability of meta-data structures (such as IL trees) that TR can usually make. + * To that end, there are no real tree tops in this CFG, so we must use the exit + * nodes to get the last bytecode index for a given block, and from that get the bytecode. + * Since this will be done no less than 3 times, a helper function is provided. + */ +TR_J9ByteCode +J9::CFG::getBytecodeFromIndex(int32_t index) + { + TR_ResolvedJ9Method *method = static_cast<TR_ResolvedJ9Method *>(_method->getResolvedMethod()); + TR_J9VMBase * fe = _compilation->fej9(); + TR_J9ByteCodeIterator bcIterator(_method, method, fe, _compilation); + bcIterator.setIndex(index); + return bcIterator.next(); + } + +TR_J9ByteCode +J9::CFG::getLastRealBytecodeOfBlock(TR::CFGNode *start) + { + return this->getBytecodeFromIndex(start->asBlock()->getExit()->getNode()->getLocalIndex()); + } + +void +J9::CFG::getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken) + { + // We have no real tree tops here, so we need to work around that. + TR::Node *node = cfgNode->asBlock()->getExit()->getNode(); + + if (this != comp()->getFlowGraph()) + { + TR::TreeTop *entry = (cfgNode->asBlock()->getNextBlock()) ? 
cfgNode->asBlock()->getNextBlock()->getEntry() : NULL; + _externalProfiler->getBranchCounters(node, entry, taken, nottaken, comp()); + } + else + { + getBranchCounters(node, cfgNode->asBlock(), taken, nottaken, comp()); + } + + if (*taken || *nottaken) + { + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"If on node %p has branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + } + else if (isVirtualGuard(node)) + { + *taken = 0; + *nottaken = AVG_FREQ; + int32_t sumFreq = summarizeFrequencyFromPredecessors(cfgNode, self()); + + if (sumFreq>0) + *nottaken = sumFreq; + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Guard on node %p has default branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + } + else if (!cfgNode->asBlock()->isCold()) + { + if (node->getBranchDestination()->getNode()->getBlock()->isCold()) + *taken = 0; + else + *taken = LOW_FREQ; + + if (cfgNode->asBlock()->getNextBlock() && + cfgNode->asBlock()->getNextBlock()->isCold()) + *nottaken = 0; + else + *nottaken = LOW_FREQ; + + /*int32_t sumFreq = summarizeFrequencyFromPredecessors(cfgNode, this); + + if (sumFreq>0) + { + *nottaken = sumFreq>>1; + *taken = *nottaken; + } + else + { + if (node->getByteCodeIndex()==0) + { + int32_t callCount = getParentCallCount(this, node); + + // we don't know what the call count is, assign some + // moderate frequency + if (callCount<=0) + callCount = AVG_FREQ; + + *taken = 0; + *nottaken = callCount; + } + } + */ + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + } + } + +void +J9::CFG::setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilation *comp) + { + TR::Block *block = node->asBlock(); + TR::Node *treeNode = node->asBlock()->getExit()->getNode(); + int32_t sumFrequency = _externalProfiler->getSumSwitchCount(treeNode, comp); + + if (sumFrequency < 10) + { + if (comp->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp,"Low count switch I'll set frequencies using uniform edge distribution\n"); + + self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); + return; + } + + if (treeNode->getInlinedSiteIndex() < -1) + { + if (comp->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp,"Dummy switch generated in estimate code size I'll set frequencies using uniform edge distribution\n"); + + self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); + return; + } + + if (_externalProfiler->isSwitchProfileFlat(treeNode, comp)) + { + if (comp->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp,"Flat profile switch, setting average frequency on each case.\n"); + + self()->setUniformEdgeFrequenciesOnNode(node, _externalProfiler->getFlatSwitchProfileCounts(treeNode, comp), false, comp); + return; + } + + for ( int32_t count=1; count < treeNode->getNumChildren(); count++) + { + TR::Node *child = treeNode->getChild(count); + TR::CFGEdge *e = getCFGEdgeForNode(node, child); + + int32_t frequency = _externalProfiler->getSwitchCountForValue (treeNode, (count-1), comp); + e->setFrequency( std::max(frequency,1) ); + if (comp->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp,"Edge %p between %d and %d has freq %d (Switch)\n", e, e->getFrom()->getNumber(), e->getTo()->getNumber(), e->getFrequency()); + } + } + +//TODO: Uncomment this method and implement it using an updated traversal order call that won't look at real tree 
tops. +/* +TR_BitVector * +J9::CFG::setBlockAndEdgeFrequenciesBasedOnJITProfilerMicroJIT() + { + // Use the setBlockAndEdgeFrequenciesBasedOnJITProfiler above as a guide should this method need to be implemented. + } +*/ + +bool isBranch(TR_J9ByteCode bc) + { + switch(bc) + { + case J9BCifeq: + case J9BCifne: + case J9BCiflt: + case J9BCifge: + case J9BCifgt: + case J9BCifle: + case J9BCificmpeq: + case J9BCificmpne: + case J9BCificmplt: + case J9BCificmpge: + case J9BCificmpgt: + case J9BCificmple: + case J9BCifacmpeq: + case J9BCifacmpne: + case J9BCifnull: + case J9BCifnonnull: + return true; + default: + return false; + } + } + +bool isTableOp(TR_J9ByteCode bc) + { + switch(bc) + { + case J9BClookupswitch: + case J9BCtableswitch: + return true; + default: + return false; + } + } + +bool +J9::CFG::continueLoop(TR::CFGNode *temp) //TODO: Find a better name for this method + { + if((temp->getSuccessors().size() == 1) && !temp->asBlock()->getEntry()) + { + TR_J9ByteCode bc = this->getLastRealBytecodeOfBlock(temp); + return !(isBranch(bc) || isTableOp(bc)); + } + return false; + } + +void +J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() + { + TR::StackMemoryRegion stackMemoryRegion(*trMemory()); + int32_t numBlocks = getNextNodeNumber(); + + TR_BitVector *_seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *_seenNodesInCycle = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); + int32_t startFrequency = AVG_FREQ; + int32_t taken = AVG_FREQ; + int32_t nottaken = AVG_FREQ; + int32_t initialCallScanFreq = -1; + int32_t inlinedSiteIndex = -1; + + // Walk until we find if or switch statement + TR::CFGNode *start = getStart(); + TR::CFGNode *temp = start; + TR_ScratchList upStack(trMemory()); + + bool backEdgeExists = false; + TR::TreeTop *methodScanEntry = NULL; + //_maxFrequency = 0; + while (continueLoop(temp)) + { + if (_seenNodes->isSet(temp->getNumber())) + break; + + upStack.add(temp); + _seenNodes->set(temp->getNumber()); + + if (!backEdgeExists && !temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) + { + if (comp()->getHCRMode() != TR::osr) + backEdgeExists = true; + else + { + // Count the number of edges from non OSR blocks + int32_t nonOSREdges = 0; + for (auto edge = temp->getPredecessors().begin(); edge != temp->getPredecessors().end(); ++edge) + { + if ((*edge)->getFrom()->asBlock()->isOSRCodeBlock() || (*edge)->getFrom()->asBlock()->isOSRCatchBlock()) + continue; + if (nonOSREdges > 0) + { + backEdgeExists = true; + break; + } + nonOSREdges++; + } + } + } + + if (temp->asBlock()->getEntry() && initialCallScanFreq < 0) + { + int32_t methodScanFreq = scanForFrequencyOnSimpleMethod(temp->asBlock()->getEntry(), temp->asBlock()->getExit()); + if (methodScanFreq > 0) + initialCallScanFreq = methodScanFreq; + inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); + methodScanEntry = temp->asBlock()->getEntry(); + } + + if (!temp->getSuccessors().empty()) + { + if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) + { + temp = temp->asBlock()->getNextBlock(); + } + else + temp = temp->getSuccessors().front()->getTo(); + } + else + break; + } + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Propagation start block_%d\n", temp->getNumber()); + + if 
(temp->asBlock()->getEntry()) + inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); + + if ((temp->getSuccessors().size() == 2) && + temp->asBlock()->getEntry() && + isBranch(getLastRealBytecodeOfBlock(temp))) + { + getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); + startFrequency = taken + nottaken; + self()->setEdgeFrequenciesOnNode( temp, taken, nottaken, comp()); + } + else if (temp->asBlock()->getEntry() && + isTableOp(this->getBytecodeFromIndex(temp->asBlock()->getExit()->getNode()->getLocalIndex()))) + { + startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getExit()->getNode(), comp()); + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Switch with total frequency of %d\n", startFrequency); + setSwitchEdgeFrequenciesOnNode(temp, comp()); + } + else + { + if (_calledFrequency > 0) + startFrequency = _calledFrequency; + else if (initialCallScanFreq > 0) + startFrequency = initialCallScanFreq; + else + { + startFrequency = AVG_FREQ; + } + + if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && isBranch(getLastRealBytecodeOfBlock(temp))) + self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); + else + self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); + } + + setBlockFrequency (temp, startFrequency); + _initialBlockFrequency = startFrequency; + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); + + start = temp; + + // Walk backwards to the start and propagate this frequency + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Propagating start frequency backwards...\n"); + + ListIterator upit(&upStack); + for (temp = upit.getFirst(); temp; temp = upit.getNext()) + { + if (!temp->asBlock()->getEntry()) + continue; + if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) + self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); + else + self()->setUniformEdgeFrequenciesOnNode( temp, startFrequency, false, comp()); + setBlockFrequency (temp, startFrequency); + _seenNodes->set(temp->getNumber()); + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); + } + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Propagating block frequency forward...\n"); + + // Walk reverse post-order + // we start at the first if or switch statement + TR_ScratchList stack(trMemory()); + stack.add(start); + + while (!stack.isEmpty()) + { + TR::CFGNode *node = stack.popHead(); + + if (comp()->getOption(TR_TraceBFGeneration)) + traceMsg(comp(), "Considering block_%d\n", node->getNumber()); + + if (_seenNodes->isSet(node->getNumber())) + continue; + + _seenNodes->set(node->getNumber()); + + if (!node->asBlock()->getEntry()) + continue; + + inlinedSiteIndex = node->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); + + if (node->asBlock()->isCold()) + { + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Analyzing COLD block_%d\n", node->getNumber()); + //node->asBlock()->setFrequency(0); + int32_t freq = node->getFrequency(); + if ((node->getSuccessors().size() == 2) && (freq > 0) && node->asBlock()->getEntry() && 
node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) + self()->setEdgeFrequenciesOnNode( node, 0, freq, comp()); + else + self()->setUniformEdgeFrequenciesOnNode(node, freq, false, comp()); + setBlockFrequency (node, freq); + //continue; + } + + // ignore the first node + if (!node->asBlock()->isCold() && (node != start)) + { + if ((node->getSuccessors().size() == 2) && + node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) + { + // if it's an if and it's not JIT inserted artificial if then just + // read the outgoing branches and put that as node frequency + // else get frequency from predecessors + if (!isVirtualGuard(node->asBlock()->getLastRealTreeTop()->getNode()) && + !node->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) + { + _seenNodesInCycle->empty(); + getInterpreterProfilerBranchCountersOnDoubleton(node, &taken, ¬taken); + + if ((taken <= 0) && (nottaken <= 0)) + { + taken = LOW_FREQ; + nottaken = LOW_FREQ; + } + + self()->setEdgeFrequenciesOnNode( node, taken, nottaken, comp()); + setBlockFrequency (node, taken + nottaken); + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"If on node %p is not guard I'm using the taken and nottaken counts for producing block frequency\n", node->asBlock()->getLastRealTreeTop()->getNode()); + } + else + { + if (comp()->getOption(TR_TraceBFGeneration)) + { + TR_PredecessorIterator pit(node); + for (TR::CFGEdge *edge = pit.getFirst(); edge; edge = pit.getNext()) + { + TR::CFGNode *pred = edge->getFrom(); + + if (edge->getFrequency()<=0) + continue; + + int32_t edgeFreq = edge->getFrequency(); + + if (comp()->getOption(TR_TraceBFGeneration)) + traceMsg(comp(), "11Pred %d has freq %d\n", pred->getNumber(), edgeFreq); + } + } + + TR::CFGNode *predNotSet = NULL; + TR_PredecessorIterator pit(node); + for (TR::CFGEdge *edge = pit.getFirst(); edge; edge = pit.getNext()) + { + TR::CFGNode *pred = edge->getFrom(); + + if (pred->getFrequency()< 0) + { + predNotSet = pred; + break; + } + } + + if (predNotSet && + !_seenNodesInCycle->get(node->getNumber())) + { + stack.add(predNotSet); + _seenNodesInCycle->set(node->getNumber()); + _seenNodes->reset(node->getNumber()); + continue; + } + + if (!predNotSet) + _seenNodesInCycle->empty(); + + int32_t sumFreq = summarizeFrequencyFromPredecessors(node, self()); + if (sumFreq <= 0) + sumFreq = AVG_FREQ; + self()->setEdgeFrequenciesOnNode( node, 0, sumFreq, comp()); + setBlockFrequency (node, sumFreq); + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"If on node %p is guard I'm using the predecessor frequency sum\n", node->asBlock()->getLastRealTreeTop()->getNode()); + } + + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); + } + else if (node->asBlock()->getEntry() && + (node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + { + _seenNodesInCycle->empty(); + int32_t sumFreq = _externalProfiler->getSumSwitchCount(node->asBlock()->getLastRealTreeTop()->getNode(), comp()); + setSwitchEdgeFrequenciesOnNode(node, comp()); + setBlockFrequency (node, sumFreq); + if (comp()->getOption(TR_TraceBFGeneration)) + { + dumpOptDetails(comp(),"Found a Switch statement at exit of block_%d\n", node->getNumber()); + dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", 
node->asBlock()->getFrequency(), node->getNumber()); + } + } + else + { + TR::CFGNode *predNotSet = NULL; + int32_t sumFreq = 0; + TR_PredecessorIterator pit(node); + for (TR::CFGEdge *edge = pit.getFirst(); edge; edge = pit.getNext()) + { + TR::CFGNode *pred = edge->getFrom(); + + if (pred->getFrequency()< 0) + { + predNotSet = pred; + break; + } + + int32_t edgeFreq = edge->getFrequency(); + sumFreq += edgeFreq; + } + + if (predNotSet && + !_seenNodesInCycle->get(node->getNumber())) + { + _seenNodesInCycle->set(node->getNumber()); + _seenNodes->reset(node->getNumber()); + stack.add(predNotSet); + continue; + } + else + { + if (!predNotSet) + _seenNodesInCycle->empty(); + + if (comp()->getOption(TR_TraceBFGeneration)) + traceMsg(comp(), "2Setting block and uniform freqs\n"); + + if ((node->getSuccessors().size() == 2) && (sumFreq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) + self()->setEdgeFrequenciesOnNode( node, 0, sumFreq, comp()); + else + self()->setUniformEdgeFrequenciesOnNode( node, sumFreq, false, comp()); + setBlockFrequency (node, sumFreq); + } + + if (comp()->getOption(TR_TraceBFGeneration)) + { + dumpOptDetails(comp(),"Not an if (or unknown if) at exit of block %d (isSingleton=%d)\n", node->getNumber(), (node->getSuccessors().size() == 1)); + dumpOptDetails(comp(),"Set frequency of %d on block %d\n", node->asBlock()->getFrequency(), node->getNumber()); + } + } + } + + TR_SuccessorIterator sit(node); + // add the successors on the list + for (TR::CFGEdge *edge = sit.getFirst(); edge; edge = sit.getNext()) + { + TR::CFGNode *succ = edge->getTo(); + + if (!_seenNodes->isSet(succ->getNumber())) + stack.add(succ); + else + { + if (!(((succ->getSuccessors().size() == 2) && + succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) || + (succ->asBlock()->getEntry() && + (succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)))) + { + setBlockFrequency ( succ, edge->getFrequency(), true); + + // addup this edge to the frequency of the blocks following it + // propagate downward until you reach a block that doesn't end in + // goto or cycle + if (succ->getSuccessors().size() == 1) + { + TR::CFGNode *tempNode = succ->getSuccessors().front()->getTo(); + TR_BitVector *_seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + _seenGotoNodes->set(succ->getNumber()); + while ((tempNode->getSuccessors().size() == 1) && + (tempNode != succ) && + (tempNode != getEnd()) && + !_seenGotoNodes->isSet(tempNode->getNumber())) + { + TR::CFGNode *nextTempNode = tempNode->getSuccessors().front()->getTo(); + + if (comp()->getOption(TR_TraceBFGeneration)) + traceMsg(comp(), "3Setting block and uniform freqs\n"); + + self()->setUniformEdgeFrequenciesOnNode( tempNode, edge->getFrequency(), true, comp()); + setBlockFrequency (tempNode, edge->getFrequency(), true); + + _seenGotoNodes->set(tempNode->getNumber()); + tempNode = nextTempNode; + } + } + } + } + } + } + + + if (backEdgeExists) + { + while (!upStack.isEmpty()) + { + TR::CFGNode *temp = upStack.popHead(); + if (temp == start) + continue; + + if (backEdgeExists) + temp->setFrequency(std::max(MAX_COLD_BLOCK_COUNT+1, startFrequency/10)); + } + } + + + // now set all the blocks that haven't been seen to have 0 frequency + // catch blocks are one example, we don't try to set frequency there to + // save compilation 
time, since those are most frequently cold. Future might prove + // this otherwise. + + TR::CFGNode *node; + start = getStart(); + for (node = getFirstNode(); node; node=node->getNext()) + { + if ((node == getEnd()) || (node == start)) + node->asBlock()->setFrequency(0); + + if (_seenNodes->isSet(node->getNumber())) + continue; + + if (node->asBlock()->getEntry() && + (node->asBlock()->getFrequency()<0)) + node->asBlock()->setFrequency(0); + } + + //scaleEdgeFrequencies(); + + TR_BitVector *nodesToBeNormalized = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + nodesToBeNormalized->setAll(numBlocks); + _maxFrequency = -1; + _maxEdgeFrequency = -1; + normalizeFrequencies(nodesToBeNormalized); + _oldMaxFrequency = _maxFrequency; + _oldMaxEdgeFrequency = _maxEdgeFrequency; + + if (comp()->getOption(TR_TraceBFGeneration)) + traceMsg(comp(), "max freq %d max edge freq %d\n", _maxFrequency, _maxEdgeFrequency); + + } + +void +J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compilation *comp) + { + TR_ExternalProfiler* profiler = comp->fej9()->hasIProfilerBlockFrequencyInfo(*comp); + if (!profiler) + { + _initialBlockFrequency = AVG_FREQ; + return; + } + + _externalProfiler = profiler; + + TR::StackMemoryRegion stackMemoryRegion(*trMemory()); + int32_t numBlocks = getNextNodeNumber(); + + TR_BitVector *_seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); + int32_t startFrequency = AVG_FREQ; + int32_t taken = AVG_FREQ; + int32_t nottaken = AVG_FREQ; + int32_t initialCallScanFreq = -1; + int32_t inlinedSiteIndex = -1; + + // Walk until we find if or switch statement + TR::CFGNode *start = getStart(); + TR::CFGNode *temp = start; + + bool backEdgeExists = false; + TR::TreeTop *methodScanEntry = NULL; + + while ( (temp->getSuccessors().size() == 1) || + !temp->asBlock()->getEntry() || + !((temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() + && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + { + if (_seenNodes->isSet(temp->getNumber())) + break; + + _seenNodes->set(temp->getNumber()); + + if (!temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) + backEdgeExists = true; + + if (temp->asBlock()->getEntry() && initialCallScanFreq < 0) + { + int32_t methodScanFreq = scanForFrequencyOnSimpleMethod(temp->asBlock()->getEntry(), temp->asBlock()->getExit()); + if (methodScanFreq > 0) + initialCallScanFreq = methodScanFreq; + inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); + methodScanEntry = temp->asBlock()->getEntry(); + } + + if (!temp->getSuccessors().empty()) + { + if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) + { + temp = temp->asBlock()->getNextBlock(); + } + else + temp = temp->getSuccessors().front()->getTo(); + } + else + break; + } + + if (temp->asBlock()->getEntry()) + inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); + + if ((temp->getSuccessors().size() == 2) && + temp->asBlock()->getEntry() && + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() && + 
!temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) + { + getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); + if ((taken <= 0) && (nottaken <= 0)) + { + taken = LOW_FREQ; + nottaken = LOW_FREQ; + } + startFrequency = taken + nottaken; + } + else if (temp->asBlock()->getEntry() && + (temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + { + startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getLastRealTreeTop()->getNode(), comp); + } + else + { + if (_calledFrequency > 0) + startFrequency = _calledFrequency; + else if (initialCallScanFreq > 0) + startFrequency = initialCallScanFreq; + else + { + startFrequency = AVG_FREQ; + } + } + + if (startFrequency <= 0) + startFrequency = LOW_FREQ; + + _initialBlockFrequency = startFrequency; + } +#endif /* defined(J9VM_OPT_MICROJIT) */ \ No newline at end of file diff --git a/runtime/compiler/infra/J9Cfg.hpp b/runtime/compiler/infra/J9Cfg.hpp index 5f09b558eda..a863168f36c 100644 --- a/runtime/compiler/infra/J9Cfg.hpp +++ b/runtime/compiler/infra/J9Cfg.hpp @@ -1,5 +1,5 @@ /******************************************************************************* - * Copyright (c) 2000, 2019 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this @@ -39,6 +39,9 @@ namespace J9 { typedef J9::CFG CFGConnector; } #include "cs2/listof.h" #include "env/TRMemory.hpp" #include "il/Node.hpp" +#if defined(J9VM_OPT_MICROJIT) +#include "ilgen/J9ByteCode.hpp" +#endif /* defined(J9VM_OPT_MICROJIT) */ #include "infra/Assert.hpp" #include "infra/List.hpp" #include "infra/TRCfgEdge.hpp" @@ -88,6 +91,21 @@ class CFG : public OMR::CFGConnector TR_BitVector *setBlockAndEdgeFrequenciesBasedOnJITProfiler(); void setBlockFrequenciesBasedOnInterpreterProfiler(); void computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *comp); + // May need MicroJIT versions of each of previous 3 methods in case they are called from outside MicroJIT on MJIT methods + #if defined(J9VM_OPT_MICROJIT) + // A number of methods here differ from their TR counter parts only in whether the TR::Node + // used for information about bytecode is in getLastRealTreeTop(), or getExit() on the asBlock() + // of the cfgNode. Once this is working, first clean-up step will be to have these methods + // take the node as a parameter instead, to reduce code duplication. + TR_J9ByteCode getBytecodeFromIndex(int32_t index); // Helper for getting bytecode from an index + TR_J9ByteCode getLastRealBytecodeOfBlock(TR::CFGNode *start); // Helper for getting last byteode in the CFGNode's block + //TR_BitVector * setBlockAndEdgeFrequenciesBasedOnJITProfilerMicroJIT(); Uncomment after implementing in the cpp file. 
+ void getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken); + void setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilation *comp); + bool continueLoop(TR::CFGNode *temp); + void setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT(); + void computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compilation *comp); + #endif /* defined(J9VM_OPT_MICROJIT) */ void propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profiler); void getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken); void setSwitchEdgeFrequenciesOnNode(TR::CFGNode *node, TR::Compilation *comp); diff --git a/runtime/compiler/microjit/ByteCodeIterator.hpp b/runtime/compiler/microjit/ByteCodeIterator.hpp new file mode 100644 index 00000000000..572ae3e327b --- /dev/null +++ b/runtime/compiler/microjit/ByteCodeIterator.hpp @@ -0,0 +1,58 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ + + +#ifndef MJIT_BYTECODEITERATOR_INCL +#define MJIT_BYTECODEITERATOR_INCL +#include "ilgen/J9ByteCodeIterator.hpp" + +namespace MJIT { + class ByteCodeIterator : public TR_J9ByteCodeIterator + { + + public: + ByteCodeIterator(TR::ResolvedMethodSymbol *methodSymbol, TR_ResolvedJ9Method *method, TR_J9VMBase * fe, TR::Compilation * comp) : + TR_J9ByteCodeIterator(methodSymbol, method, fe, comp) + {}; + + void printByteCodes() + { + this->printByteCodePrologue(); + for (TR_J9ByteCode bc = this->first(); bc != J9BCunknown; bc = this->next()) + { + this->printByteCode(); + } + this->printByteCodeEpilogue(); + } + + const char * currentMnemonic() { + uint8_t opcode = nextByte(0); + return ((TR_J9VMBase *)fe())->getByteCodeName(opcode); + } + + uint8_t currentOpcode() { + return nextByte(0); + } + + }; +} +#endif /* MJIT_BYTECODEITERATOR_INCL */ \ No newline at end of file diff --git a/runtime/compiler/microjit/CMakeLists.txt b/runtime/compiler/microjit/CMakeLists.txt new file mode 100644 index 00000000000..d52eb7c3e4f --- /dev/null +++ b/runtime/compiler/microjit/CMakeLists.txt @@ -0,0 +1,39 @@ +################################################################################ +# Copyright (c) 2022, 2022 IBM Corp. 
and others +# +# This program and the accompanying materials are made available under +# the terms of the Eclipse Public License 2.0 which accompanies this +# distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +# or the Apache License, Version 2.0 which accompanies this distribution and +# is available at https://www.apache.org/licenses/LICENSE-2.0. +# +# This Source Code may also be made available under the following +# Secondary Licenses when the conditions for such availability set +# forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +# General Public License, version 2 with the GNU Classpath +# Exception [1] and GNU General Public License, version 2 with the +# OpenJDK Assembly Exception [2]. +# +# [1] https://www.gnu.org/software/classpath/license.html +# [2] http://openjdk.java.net/legal/assembly-exception.html +# +# SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception +################################################################################ +if(J9VM_OPT_MJIT_Standalone) +j9jit_files( + microjit/ExceptionTable.cpp + microjit/assembly/utils.nasm + microjit/LWMetaDataCreation.cpp +) +else() +j9jit_files( + microjit/ExceptionTable.cpp + microjit/assembly/utils.nasm + microjit/HWMetaDataCreation.cpp +) +endif() +if(TR_HOST_ARCH STREQUAL "x") + if(TR_TARGET_ARCH STREQUAL "x") + add_subdirectory(x) + endif() +endif() \ No newline at end of file diff --git a/runtime/compiler/microjit/ExceptionTable.cpp b/runtime/compiler/microjit/ExceptionTable.cpp new file mode 100644 index 00000000000..2bea77223e8 --- /dev/null +++ b/runtime/compiler/microjit/ExceptionTable.cpp @@ -0,0 +1,80 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at http://eclipse.org/legal/epl-2.0 + * or the Apache License, Version 2.0 which accompanies this distribution + * and is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following Secondary + * Licenses when the conditions for such availability set forth in the + * Eclipse Public License, v. 2.0 are satisfied: GNU General Public License, + * version 2 with the GNU Classpath Exception [1] and GNU General Public + * License, version 2 with the OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#include "microjit/ExceptionTable.hpp" + +#include +#include "compile/Compilation.hpp" +#include "env/TRMemory.hpp" +#include "env/jittypes.h" +#include "il/Block.hpp" +#include "il/Node.hpp" +#include "il/TreeTop.hpp" +#include "il/TreeTop_inlines.hpp" +#include "infra/Array.hpp" +#include "infra/Assert.hpp" +#include "infra/List.hpp" +#include "infra/CfgEdge.hpp" + +class TR_ResolvedMethod; + +//TODO: This Should create an Exception table using only information MicroJIT has. 
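MJIT_ExceptionTableEntryIterator, defined next, is still a stub: getFirst() and getNext() return NULL and size() reports zero entries. The shape it is presumably meant to grow into is a plain first/next walk over try-range entries, roughly like this self-contained sketch; the Entry fields and all names here are illustrative stand-ins rather than the real TR_ExceptionTableEntry layout.

#include <cstdio>
#include <utility>
#include <vector>

// Stand-in for an exception table entry: a try range plus its handler.
struct Entry
   {
   unsigned startPC;
   unsigned endPC;
   unsigned handlerPC;
   unsigned catchType;
   };

// Minimal iterator with the same getFirst()/getNext()/size() surface as
// MJIT_ExceptionTableEntryIterator.
class EntryIterator
   {
   std::vector<Entry> _entries;
   unsigned _pos = 0;
public:
   explicit EntryIterator(std::vector<Entry> entries) : _entries(std::move(entries)) {}
   Entry *getFirst() { _pos = 0; return getNext(); }
   Entry *getNext() { return _pos < _entries.size() ? &_entries[_pos++] : NULL; }
   unsigned size() const { return (unsigned)_entries.size(); }
   };

int main()
   {
   EntryIterator it({ {0, 12, 20, 0}, {30, 44, 50, 1} });
   // Typical consumption pattern when laying out method metadata.
   for (Entry *e = it.getFirst(); e; e = it.getNext())
      printf("try [%u,%u) -> handler %u (catchType %u)\n", e->startPC, e->endPC, e->handlerPC, e->catchType);
   return 0;
   }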
+MJIT_ExceptionTableEntryIterator::MJIT_ExceptionTableEntryIterator(TR::Compilation *comp) + : _compilation(comp) + { + int32_t i = 1; + _tableEntries = (TR_Array<List<TR_ExceptionTableEntry> > *)comp->trMemory()->allocateHeapMemory(sizeof(TR_Array<List<TR_ExceptionTableEntry> >)*i); + for (int32_t j = 0; j < i; ++j) + _tableEntries[j].init(comp->trMemory()); + + // lookup catch blocks + // + + // for each exception block: + // create exception ranges from the list of exception predecessors + // + } + +//TODO: This should create an exception table entry using only information MicroJIT has. +void +MJIT_ExceptionTableEntryIterator::addSnippetRanges( + List<TR_ExceptionTableEntry> & tableEntries, TR::Block * snippetBlock, TR::Block * catchBlock, uint32_t catchType, + TR_ResolvedMethod * method, TR::Compilation *comp) + { + //TODO: create a snippet for each exception and add it to the correct place. + } + +TR_ExceptionTableEntry * +MJIT_ExceptionTableEntryIterator::getFirst() + { + return NULL; + } + +TR_ExceptionTableEntry * +MJIT_ExceptionTableEntryIterator::getNext() + { + return NULL; + } + +uint32_t +MJIT_ExceptionTableEntryIterator::size() + { + return 0; + } diff --git a/runtime/compiler/microjit/ExceptionTable.hpp b/runtime/compiler/microjit/ExceptionTable.hpp new file mode 100644 index 00000000000..3ec44cd7f53 --- /dev/null +++ b/runtime/compiler/microjit/ExceptionTable.hpp @@ -0,0 +1,56 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at http://eclipse.org/legal/epl-2.0 + * or the Apache License, Version 2.0 which accompanies this distribution + * and is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following Secondary + * Licenses when the conditions for such availability set forth in the + * Eclipse Public License, v. 2.0 are satisfied: GNU General Public License, + * version 2 with the GNU Classpath Exception [1] and GNU General Public + * License, version 2 with the OpenJDK Assembly Exception [2]. 
+ * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ + +#ifndef MJIT_EXCEPTIONTABLE_INCL +#define MJIT_EXCEPTIONTABLE_INCL + +#include "infra/Array.hpp" + +#include +#include "env/TRMemory.hpp" +#include "env/jittypes.h" +#include "infra/List.hpp" +#include "env/ExceptionTable.hpp" + +class TR_ResolvedMethod; +namespace TR { class Block; } +namespace TR { class Compilation; } +template class TR_Array; + +struct MJIT_ExceptionTableEntryIterator + { + MJIT_ExceptionTableEntryIterator(TR::Compilation *comp); + + TR_ExceptionTableEntry * getFirst(); + TR_ExceptionTableEntry * getNext(); + + uint32_t size(); +private: + void addSnippetRanges(List &, TR::Block *, TR::Block *, uint32_t, TR_ResolvedMethod *, TR::Compilation *); + + TR::Compilation * _compilation; + TR_Array > * _tableEntries; + ListIterator _entryIterator; + int32_t _inlineDepth; + uint32_t _handlerIndex; + }; + +#endif /* MJIT_EXCEPTIONTABLE_INCL */ diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp new file mode 100644 index 00000000000..0b6d3943f71 --- /dev/null +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -0,0 +1,629 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at http://eclipse.org/legal/epl-2.0 + * or the Apache License, Version 2.0 which accompanies this distribution + * and is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following Secondary + * Licenses when the conditions for such availability set forth in the + * Eclipse Public License, v. 2.0 are satisfied: GNU General Public License, + * version 2 with the GNU Classpath Exception [1] and GNU General Public + * License, version 2 with the OpenJDK Assembly Exception [2]. 
+ * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ + +#include +#include "il/Block.hpp" +#include "il/J9Node_inlines.hpp" +#include "infra/Cfg.hpp" + +#include "microjit/MethodBytecodeMetaData.hpp" + +#if defined(J9VM_OPT_MJIT_Standalone) +/* do nothing */ +#else + +void +setupNode( + TR::Node *node, + uint32_t bcIndex, + TR_ResolvedMethod *feMethod, + TR::Compilation *comp +){ + node->getByteCodeInfo().setDoNotProfile(0); + node->setByteCodeIndex(bcIndex); + node->setInlinedSiteIndex(-10); + node->setMethod(feMethod->getPersistentIdentifier()); +} + +TR::Block * +makeBlock( + TR::Compilation *comp, + TR_ResolvedMethod *feMethod, + uint32_t start, + uint32_t end, + TR::CFG &cfg +){ + TR::TreeTop *startTree = TR::TreeTop::create(comp, TR::Node::create(NULL, TR::BBStart, 0)); + TR::TreeTop *endTree = TR::TreeTop::create(comp, TR::Node::create(NULL, TR::BBEnd, 0)); + startTree->join(endTree); + TR::Block *block = TR::Block::createBlock(startTree, endTree, cfg); + block->setBlockBCIndex(start); + block->setNumber(cfg.getNextNodeNumber()); + setupNode(startTree->getNode(), start, feMethod, comp); + setupNode(endTree->getNode(), end, feMethod, comp); + cfg.addNode(block); + return block; +} + +void +join( + TR::CFG &cfg, + TR::Block *src, + TR::Block *dst, + bool isBranchDest +){ + TR::CFGEdge *e = cfg.addEdge(src, dst); + + // TODO: add case for switch + if (isBranchDest) { + src->getExit()->getNode()->setBranchDestination(dst->getEntry()); + if (dst->getEntry()->getNode()->getByteCodeIndex() < src->getExit()->getNode()->getByteCodeIndex()) + src->setBranchesBackwards(); + } else { + src->getExit()->join(dst->getEntry()); + } + src->addSuccessor(e); + dst->addPredecessor(e); +} + +class BytecodeRange{ + public: + int32_t _start; + int32_t _end; + BytecodeRange(int32_t start) + :_start(start) + ,_end(-1) + {} + inline bool isClosed() { return _end > _start; } + BytecodeRange() = delete; +}; + +class BytecodeRangeList{ + private: + BytecodeRange *_range; + public: + TR_ALLOC(TR_Memory::UnknownType); //MicroJIT TODO: create a memory object type for this and get it added to OMR + BytecodeRangeList *prev; + BytecodeRangeList *next; + BytecodeRangeList(BytecodeRange *range) + :_range(range) + ,next(NULL) + ,prev(NULL) + {} + inline uint32_t getStart() { return _range->_start; } + inline uint32_t getEnd() { return _range->_end; } + inline BytecodeRangeList *insert(BytecodeRangeList *newListEntry) { + if (this == newListEntry || next == newListEntry) return this; + while(_range->_start < newListEntry->getStart()){ + if(next){ + next->insert(newListEntry); + return this; + } else { //end of list + next = newListEntry; + newListEntry->prev = this; + newListEntry->next = NULL; + return this; + } + } + MJIT_ASSERT_NO_MSG(_range->_start != newListEntry->getStart()); + if (prev) prev->next = newListEntry; + newListEntry->next = this; + newListEntry->prev = prev; + prev = newListEntry; + return newListEntry; + } +}; + +class BytecodeIndexList{ +private: + BytecodeIndexList *_prev; + BytecodeIndexList *_next; + TR::CFG &_cfg; +public: + TR_ALLOC(TR_Memory::UnknownType); //MicroJIT TODO: create a memory object type for this and get it added to OMR + uint32_t _index; + BytecodeIndexList(uint32_t index, 
BytecodeIndexList *prev, BytecodeIndexList *next, TR::CFG &cfg) + :_index(index) + ,_prev(prev) + ,_next(next) + ,_cfg(cfg) + {} + BytecodeIndexList() = delete; + BytecodeIndexList * appendBCI(uint32_t i){ + if (i == _index) return this; + if (_next) return _next->appendBCI(i); + BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); + _next = newIndex; + return newIndex; + } + BytecodeIndexList * getBCIListEntry(uint32_t i){ + if (i == _index) { + return this; + }else if (i < _index) { + return _prev->getBCIListEntry(i); + } else if(_next && _next->_index <= i) { // (i > _index) + return _next->getBCIListEntry(i); + } else { + BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); + return newIndex; + } + } + inline uint32_t getPreviousIndex(){ + return _prev->_index; + } + inline BytecodeIndexList * getPrev(){ return _prev; } + BytecodeIndexList * getTail(){ + if (_next) return _next->getTail(); + return this; + } + inline void cutNext() { + if (!_next) return; + _next->_prev = NULL; + _next = NULL; + } +}; + +class BlockList{ +private: + BlockList *_next; + BlockList *_prev; + BytecodeRangeList *_successorList; + uint32_t _successorCount; + TR::CFG &_cfg; + BlockList** _tail; + BytecodeIndexList* _bcListHead; + BytecodeIndexList* _bcListCurrent; +public: + TR_ALLOC(TR_Memory::UnknownType) //MicroJIT TODO: create a memory object type for this and get it added to OMR + BytecodeRange _range; + TR::Block* _block; + + BlockList(TR::CFG &cfg, uint32_t start, BlockList** tail) + :_next(NULL) + ,_prev(NULL) + ,_successorList(NULL) + ,_successorCount(0) + ,_cfg(cfg) + ,_tail(tail) + ,_bcListHead(NULL) + ,_bcListCurrent(NULL) + ,_range(start) + ,_block(NULL) + { + _bcListHead = new (_cfg.getInternalRegion()) BytecodeIndexList(start, NULL, NULL, _cfg); + _bcListCurrent = _bcListHead; + } + + inline void addBCIndex(uint32_t index){ + _bcListCurrent = _bcListCurrent->appendBCI(index); + } + + inline void setBCListHead(BytecodeIndexList* list){ + _bcListHead = list; + } + + inline void setBCListCurrent(BytecodeIndexList* list){ + _bcListCurrent = list; + } + + inline void close(){ + _range._end = _bcListCurrent->_index; + } + + inline void insertAfter(BlockList* bl) + { + MJIT_ASSERT_NO_MSG(bl); + bl->_next = this->_next; + bl->_prev = this; + if(this->_next){ + this->_next->_prev = bl; + } + this->_next = bl; + if(this == *_tail){ + *_tail = bl; + } + } + + inline void addSuccessor(BytecodeRange *successor){ + BytecodeRangeList *brl = new (_cfg.getInternalRegion()) BytecodeRangeList(successor); + if(_successorList){ + _successorList = _successorList->insert(brl); + } else { + _successorList = brl; + } + _successorCount++; + //Until we support switch and other multiple end point bytecodes this should never go over 2 + MJIT_ASSERT_NO_MSG(_successorCount > 0 && _successorCount <= 2); + } + + BlockList * getBlockListByStartIndex(uint32_t index){ + //NOTE always start searching from a block with a lower start than the index + MJIT_ASSERT_NO_MSG(_range._start <= index); + + // If this is the target block return it + if (_range._start == index) return this; + if (_range._start < index && _next && _next->_range._start > index) return this; + + // If the branch is after this block, and there is more list to search, then continue searching + if (_next) return _next->getBlockListByStartIndex(index); + + //There was no block in the BlockList for this index + return NULL; + } + + BlockList * 
getOrMakeBlockListByBytecodeIndex(uint32_t index){ + //NOTE always start searching from a block with a lower start than the index + MJIT_ASSERT_NO_MSG(_range._start <= index); + + // If this is the target block return it + if (_range._start == index) return this; + + // If the branch targets the middle of this block (which may be under construction) split it on that index + if (_range.isClosed() && index > _range._start && index <= _range._end) return splitBlockListOnIndex(index); + if (!_range.isClosed() && _next && index > _range._start && index < _next->_range._start) return splitBlockListOnIndex(index); + + // If the branch is after this block, and there is more list to search, then continue searching + if (_next && index > _next->_range._start) return _next->getOrMakeBlockListByBytecodeIndex(index); + + // The target isn't before this block, isn't this block, isn't in this block, and there is no more list to search. + // Time to make a new block and add it to the list + BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); + this->insertAfter(target); + return target; + } + + BlockList * splitBlockListOnIndex(uint32_t index){ + BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); + insertAfter(target); + + BytecodeIndexList *splitPoint = _bcListHead->getBCIListEntry(index); + target->setBCListHead(splitPoint); + target->setBCListCurrent(splitPoint->getTail()); + + _range._end = splitPoint->getPreviousIndex(); + _bcListCurrent = splitPoint->getPrev(); + _bcListCurrent->cutNext(); + + return target; + } + + BlockList * getNext(){ + return _next; + } + + BlockList * getPrev(){ + return _prev; + } + + inline uint32_t getSuccessorCount(){ + return _successorCount; + } + + inline void getDestinations(uint32_t *destinations){ + BytecodeRangeList *range = _successorList; + for(uint32_t i=0; i<_successorCount; i++){ + destinations[i] = range->getStart(); + range = _successorList->next; + } + } +}; + +class CFGCreator{ +private: + BlockList *_head; + BlockList *_current; + BlockList *_tail; + bool _newBlock; + bool _fallThrough; + uint32_t _blockCount; + TR::CFG &_cfg; + TR_ResolvedMethod *_compilee; + TR::Compilation *_comp; + +public: + CFGCreator(TR::CFG &cfg, TR_ResolvedMethod *compilee, TR::Compilation *comp) + :_head(NULL) + ,_current(NULL) + ,_tail(NULL) + ,_newBlock(false) + ,_fallThrough(true) + ,_blockCount(0) + ,_cfg(cfg) + ,_compilee(compilee) + ,_comp(comp) + {} + void addBytecodeIndexToCFG( + TR_J9ByteCodeIterator *bci + ){ + //Just started parsing, make the block list. + if(!_head){ + _head = new (_cfg.getInternalRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); + _tail = _head; + _current = _head; + } + + //Given current state and current bytecode, update BlockList + BlockList *nextBlock = _head->getBlockListByStartIndex(bci->currentByteCodeIndex()); + + //If we have a block for this (possibly from a forward jump) + // we should use that block, and close the current one. + //If the current block is the right block to work on, then don't close it. 
+ if(nextBlock && nextBlock != _current) { + _current->close(); + _current = _head->getBlockListByStartIndex(bci->currentByteCodeIndex()); + //If we had thought we needed a new block, override it + _newBlock = false; + _fallThrough = true; + } + + if(_newBlock){//Current block has ended in a branch, this is a new block now + nextBlock = new (_cfg.getInternalRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); + _current->insertAfter(nextBlock); + if(_fallThrough) _current->addSuccessor(&(nextBlock->_range)); + _newBlock = false; + _fallThrough = true; + _current = nextBlock; + } else if(bci->isBranch()) { //Get the target block, make it if it isn't there already + _current->_range._end = bci->currentByteCodeIndex(); + nextBlock = _head->getOrMakeBlockListByBytecodeIndex(bci->branchDestination(bci->currentByteCodeIndex())); + if(bci->current() == J9BCgoto || bci->current() == J9BCgotow) _fallThrough = false; + _current->addSuccessor(&(nextBlock->_range)); + _newBlock = true; + } else { //Still working on current block + // TODO: determine if there are any edge cases that need to be filled in here. + } + _current->addBCIndex(bci->currentByteCodeIndex()); + } + + void buildCFG(){ + // Make the blocks + BlockList * temp = _head; + while(temp != NULL){ + uint32_t start = temp->_range._start; + uint32_t end = temp->_range._end; + temp->_block = makeBlock(_comp, _compilee, start, end, _cfg); + temp = temp->getNext(); + } + + // Use successor lists to build CFG + temp = _head; + while(temp != NULL){ + uint32_t successorCount = temp->getSuccessorCount(); + uint32_t destinations[successorCount]; + temp->getDestinations(destinations); + for(uint32_t i=0; igetBlockListByStartIndex(destinations[i]); + join(_cfg, temp->_block, dst->_block, (dst == temp->getNext())); + } + temp = temp->getNext(); + } + _cfg.setStart(_head->_block); + } +}; + + +TR::Optimizer * createOptimizer(TR::Compilation *comp, TR::ResolvedMethodSymbol *methodSymbol) +{ + return new (comp->trHeapMemory()) TR::Optimizer(comp, methodSymbol, true, J9::Optimizer::microJITOptimizationStrategy(comp)); +} + + +MJIT::InternalMetaData +MJIT::createInternalMethodMetadata( + MJIT::ByteCodeIterator *bci, + MJIT::LocalTableEntry *localTableEntries, + U_16 entries, + int32_t offsetToFirstLocal, + TR_ResolvedMethod *compilee, + TR::Compilation *comp, + MJIT::ParamTable *paramTable, + uint8_t pointerSize, + bool *resolvedAllCallees +){ + int32_t totalSize = 0; + + int32_t localIndex = 0; // Indexes are 0 based and positive + int32_t lastParamSlot = 0; + int32_t gcMapOffset = 0; // Offset will always be greater than 0 + U_16 size = 0; //Sizes are all multiples of 8 and greater than 0 + bool isRef = false; //No good default here, both options are valid. 
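The bytecode switch that follows reduces each load or store to three facts: the local's index, its width in slots, and whether it holds a reference; those are folded into a local-table entry placed at a growing offset from the first local. A simplified, self-contained sketch of that bookkeeping is below; the offset value, names, and types are made up for illustration, and the real code additionally merges in the parameter table entries and tracks callee argument sizes.

#include <cstdint>
#include <cstdio>

// Simplified local-table entry: where a local lives relative to the first
// local, how many bytes it occupies, and whether the GC must treat it as a
// reference.
struct LocalEntry
   {
   int32_t index;
   int32_t offset;
   uint16_t size;
   bool isRef;
   };

static LocalEntry makeEntry(int32_t index, int32_t offsetToFirstLocal, int32_t runningSize,
                            uint16_t slots, uint8_t pointerSize, bool isRef)
   {
   LocalEntry e;
   e.index = index;
   e.offset = offsetToFirstLocal + runningSize; // plays the role of gcMapOffset above
   e.size = (uint16_t)(slots * pointerSize);
   e.isRef = isRef;
   return e;
   }

int main()
   {
   const uint8_t pointerSize = 8;
   const int32_t offsetToFirstLocal = 16; // made-up frame offset
   int32_t runningSize = 0;
   // e.g. istore_0 -> one-slot int, lstore_1 -> two-slot long, astore_2 -> reference.
   LocalEntry l0 = makeEntry(0, offsetToFirstLocal, runningSize, 1, pointerSize, false); runningSize += l0.size;
   LocalEntry l1 = makeEntry(1, offsetToFirstLocal, runningSize, 2, pointerSize, false); runningSize += l1.size;
   LocalEntry l2 = makeEntry(2, offsetToFirstLocal, runningSize, 1, pointerSize, true);  runningSize += l2.size;
   printf("local %d at offset %d, size %u, isRef=%d\n", l2.index, l2.offset, (unsigned)l2.size, (int)l2.isRef);
   return 0;
   }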
+ + uint16_t maxCalleeArgsSize = 0; + + //TODO: Create the CFG before here and pass reference + bool profilingCompilation = comp->getOption(TR_EnableJProfiling) || comp->getProfilingMode() == JProfiling; + + TR::CFG *cfg; + if(profilingCompilation){ + cfg = new (comp->trHeapMemory()) TR::CFG(comp, compilee->findOrCreateJittedMethodSymbol(comp)); + } + CFGCreator creator(*cfg, compilee, comp); + ParamTableEntry entry; + for(int i=0; igetParamCount(); i+=entry.slots){ + MJIT_ASSERT_NO_MSG(paramTable->getEntry(i, &entry)); + size = entry.slots*pointerSize; + localTableEntries[i] = makeLocalTableEntry(i, totalSize + offsetToFirstLocal, size, entry.isReference); + totalSize += size; + lastParamSlot += localTableEntries[i].slots; + } + + for(TR_J9ByteCode bc = bci->first(); bc != J9BCunknown; bc = bci->next()){ + /* It's okay to overwrite entries here. + * We are not storing whether the entry is used for a + * load or a store because it could be both. Identifying entries + * that we have already created would require zeroing the + * array before use and/or remembering which entries we've created. + * Both might be as much work as just recreating entries, depending + * on how many times a local is stored/loaded during a method. + * This may be worth doing later, but requires some research first. + * + * TODO: Arguments are not guaranteed to be used in the bytecode of a method + * e.g. class MyClass { public int ret3() {return 3;} } + * The above method is not a static method (arg 0 is an object reference) but the BC + * will be [iconst_3, return]. + * We must find a way to take the parameter table in and create the required entries + * in the local table. + */ + + if (profilingCompilation) creator.addBytecodeIndexToCFG(bci); + gcMapOffset = 0; + isRef = false; + switch(bc){ + MakeEntry: + gcMapOffset = offsetToFirstLocal + totalSize; + localTableEntries[localIndex] = makeLocalTableEntry(localIndex, gcMapOffset, size, isRef); + totalSize += size; + break; + case J9BCiload: + case J9BCistore: + case J9BCfstore: + case J9BCfload: + localIndex = (int32_t)bci->nextByte(); + size = pointerSize; + goto MakeEntry; + case J9BCiload0: + case J9BCistore0: + case J9BCfload0: + case J9BCfstore0: + localIndex = 0; + size = pointerSize; + goto MakeEntry; + case J9BCiload1: + case J9BCistore1: + case J9BCfstore1: + case J9BCfload1: + localIndex = 1; + size = pointerSize; + goto MakeEntry; + case J9BCiload2: + case J9BCistore2: + case J9BCfstore2: + case J9BCfload2: + localIndex = 2; + size = pointerSize; + goto MakeEntry; + case J9BCiload3: + case J9BCistore3: + case J9BCfstore3: + case J9BCfload3: + localIndex = 3; + size = pointerSize; + goto MakeEntry; + case J9BClstore: + case J9BClload: + case J9BCdstore: + case J9BCdload: + localIndex = (int32_t)bci->nextByte(); + size = pointerSize*2; + goto MakeEntry; + case J9BClstore0: + case J9BClload0: + case J9BCdload0: + case J9BCdstore0: + localIndex = 0; + size = pointerSize*2; + goto MakeEntry; + case J9BClstore1: + case J9BCdstore1: + case J9BCdload1: + case J9BClload1: + localIndex = 1; + size = pointerSize*2; + goto MakeEntry; + case J9BClstore2: + case J9BClload2: + case J9BCdstore2: + case J9BCdload2: + localIndex = 2; + size = pointerSize*2; + goto MakeEntry; + case J9BClstore3: + case J9BClload3: + case J9BCdstore3: + case J9BCdload3: + localIndex = 3; + size = pointerSize*2; + goto MakeEntry; + case J9BCaload: + case J9BCastore: + localIndex = (int32_t)bci->nextByte(); + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload0: + case 
J9BCastore0: + localIndex = 0; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload1: + case J9BCastore1: + localIndex = 1; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload2: + case J9BCastore2: + localIndex = 2; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload3: + case J9BCastore3: + localIndex = 3; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCinvokestatic: + { + int32_t cpIndex = (int32_t)bci->next2Bytes(); + bool isUnresolvedInCP; + TR_ResolvedMethod* resolved = compilee->getResolvedStaticMethod(comp, cpIndex, &isUnresolvedInCP); + if (!resolved) { + *resolvedAllCallees = false; + break; + } + J9Method *ramMethod = static_cast<TR_ResolvedJ9Method *>(resolved)->ramMethod(); + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); + //TODO replace this with a more accurate count. This is a clear upper bound. + uint16_t calleeArgsSize = romMethod->argCount * pointerSize * 2; //assume 2 slots for now + maxCalleeArgsSize = (calleeArgsSize > maxCalleeArgsSize) ? calleeArgsSize : maxCalleeArgsSize; + } + default: + break; + } + } + + if(profilingCompilation) { + creator.buildCFG(); + comp->getMethodSymbol()->setFlowGraph(cfg); + TR::Optimizer *optimizer = createOptimizer(comp, comp->getMethodSymbol()); + comp->setOptimizer(optimizer); + optimizer->optimize(); + } + + MJIT::LocalTable localTable(localTableEntries, entries); + + MJIT::InternalMetaData internalMetaData(localTable, cfg, maxCalleeArgsSize); + return internalMetaData; +} +#endif /* TR_MJIT_Interop */ \ No newline at end of file diff --git a/runtime/compiler/microjit/LWMetaDataCreation.cpp b/runtime/compiler/microjit/LWMetaDataCreation.cpp new file mode 100644 index 00000000000..ea0f38c08a7 --- /dev/null +++ b/runtime/compiler/microjit/LWMetaDataCreation.cpp @@ -0,0 +1,28 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at http://eclipse.org/legal/epl-2.0 + * or the Apache License, Version 2.0 which accompanies this distribution + * and is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following Secondary + * Licenses when the conditions for such availability set forth in the + * Eclipse Public License, v. 2.0 are satisfied: GNU General Public License, + * version 2 with the GNU Classpath Exception [1] and GNU General Public + * License, version 2 with the OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ + +#include "microjit/MethodBytecodeMetaData.hpp" + +#if defined(J9VM_OPT_MJIT_Standalone) +//TODO: Complete this section so that MicroJIT can be used in a future case where MicroJIT won't deploy with TR.
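+//      In a standalone build InternalMetaData only carries the LocalTable (see the + //      J9VM_OPT_MJIT_Standalone branch of MethodBytecodeMetaData.hpp), so this path would + //      have to build the local table without relying on TR::Compilation or TR::CFG.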
+#else +/* do nothing */ +#endif /* TR_MJIT_Interop */ \ No newline at end of file diff --git a/runtime/compiler/microjit/MethodBytecodeMetaData.hpp b/runtime/compiler/microjit/MethodBytecodeMetaData.hpp new file mode 100644 index 00000000000..6d5eaf866ec --- /dev/null +++ b/runtime/compiler/microjit/MethodBytecodeMetaData.hpp @@ -0,0 +1,69 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at http://eclipse.org/legal/epl-2.0 + * or the Apache License, Version 2.0 which accompanies this distribution + * and is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following Secondary + * Licenses when the conditions for such availability set forth in the + * Eclipse Public License, v. 2.0 are satisfied: GNU General Public License, + * version 2 with the GNU Classpath Exception [1] and GNU General Public + * License, version 2 with the OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ + +#ifndef MJIT_METHOD_BYTECODE_META_DATA +#define MJIT_METHOD_BYTECODE_META_DATA + +#include "ilgen/J9ByteCodeIterator.hpp" +#include "microjit/ByteCodeIterator.hpp" +#include "microjit/SideTables.hpp" + + +namespace MJIT { + +#if defined(J9VM_OPT_MJIT_Standalone) +class InternalMetaData{ +public: + MJIT::LocalTable _localTable; + InternalMetaData(MJIT::LocalTable table) + :_localTable(table) + {} +}; +#else +class InternalMetaData{ +public: + MJIT::LocalTable _localTable; + TR::CFG *_cfg; + uint16_t _maxCalleeArgSize; + InternalMetaData(MJIT::LocalTable table, TR::CFG *cfg, uint16_t maxCalleeArgSize) + :_localTable(table) + ,_cfg(cfg) + ,_maxCalleeArgSize(maxCalleeArgSize) + {} +}; +#endif + +MJIT::InternalMetaData +createInternalMethodMetadata( + MJIT::ByteCodeIterator *bci, + MJIT::LocalTableEntry *localTableEntries, + U_16 entries, + int32_t offsetToFirstLocal, + TR_ResolvedMethod *compilee, + TR::Compilation *comp, + ParamTable* paramTable, + uint8_t pointerSize, + bool *resolvedAllCallees +); + +} + +#endif /* MJIT_METHOD_BYTECODE_META_DATA */ \ No newline at end of file diff --git a/runtime/compiler/microjit/SideTables.hpp b/runtime/compiler/microjit/SideTables.hpp new file mode 100644 index 00000000000..ff2b60ddb0a --- /dev/null +++ b/runtime/compiler/microjit/SideTables.hpp @@ -0,0 +1,338 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. 
+ * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#ifndef MJIT_SIDETABLE_HPP +#define MJIT_SIDETABLE_HPP + +#include +#include "microjit/x/amd64/AMD64Linkage.hpp" +#include "microjit/utils.hpp" + +/* +| Architecture | Endian | Return address | vTable index | +| x86-64 | Little | Stack | R8 (receiver in RAX) + +| Integer Return value registers | Integer Preserved registers | Integer Argument registers +| EAX (32-bit) RAX (64-bit) | RBX R9 | RAX RSI RDX RCX + +| Float Return value registers | Float Preserved registers | Float Argument registers +| XMM0 | XMM8-XMM15 | XMM0-XMM7 +*/ +namespace MJIT { + +struct RegisterStack { + unsigned char useRAX : 2; + unsigned char useRSI : 2; + unsigned char useRDX : 2; + unsigned char useRCX : 2; + unsigned char useXMM0 : 2; + unsigned char useXMM1 : 2; + unsigned char useXMM2 : 2; + unsigned char useXMM3 : 2; + unsigned char useXMM4 : 2; + unsigned char useXMM5 : 2; + unsigned char useXMM6 : 2; + unsigned char useXMM7 : 2; + unsigned char stackSlotsUsed : 8; +}; + +inline U_16 +calculateOffset(RegisterStack *stack) +{ + return 8*(stack->useRAX + + stack->useRSI + + stack->useRDX + + stack->useRCX + + stack->useXMM0 + + stack->useXMM1 + + stack->useXMM2 + + stack->useXMM3 + + stack->useXMM4 + + stack->useXMM5 + + stack->useXMM6 + + stack->useXMM7 + + stack->stackSlotsUsed); +} + +inline void +initParamStack(RegisterStack *stack) +{ + stack->useRAX = 0; + stack->useRSI = 0; + stack->useRDX = 0; + stack->useRCX = 0; + stack->useXMM0 = 0; + stack->useXMM1 = 0; + stack->useXMM2 = 0; + stack->useXMM3 = 0; + stack->useXMM4 = 0; + stack->useXMM5 = 0; + stack->useXMM6 = 0; + stack->useXMM7 = 0; + stack->stackSlotsUsed = 0; +} + +inline int +addParamIntToStack(RegisterStack *stack, U_16 size) +{ + if(!stack->useRAX){ + stack->useRAX = size/8; + return TR::RealRegister::eax; + } else if(!stack->useRSI){ + stack->useRSI = size/8; + return TR::RealRegister::esi; + } else if(!stack->useRDX){ + stack->useRDX = size/8; + return TR::RealRegister::edx; + } else if(!stack->useRCX){ + stack->useRCX = size/8; + return TR::RealRegister::ecx; + } else { + stack->stackSlotsUsed += size/8; + return TR::RealRegister::NoReg; + } +} + +inline int +addParamFloatToStack(RegisterStack *stack, U_16 size) +{ + if(!stack->useXMM0){ + stack->useXMM0 = size/8; + return TR::RealRegister::xmm0; + } else if(!stack->useXMM1){ + stack->useXMM1 = size/8; + return TR::RealRegister::xmm1; + } else if(!stack->useXMM2){ + stack->useXMM2 = size/8; + return TR::RealRegister::xmm2; + } else if(!stack->useXMM3){ + stack->useXMM3 = size/8; + return TR::RealRegister::xmm3; + } else if(!stack->useXMM4){ + stack->useXMM4 = size/8; + return TR::RealRegister::xmm4; + } else if(!stack->useXMM5){ + stack->useXMM5 = size/8; + return TR::RealRegister::xmm5; + } else if(!stack->useXMM6){ + stack->useXMM6 = size/8; + return TR::RealRegister::xmm6; + } 
else if(!stack->useXMM7){ + stack->useXMM7 = size/8; + return TR::RealRegister::xmm7; + } else { + stack->stackSlotsUsed += size/8; + return TR::RealRegister::NoReg; + } +} + +inline int +removeParamIntFromStack(RegisterStack *stack, U_16 *size) +{ + if(stack->useRAX){ + *size = stack->useRAX*8; + stack->useRAX = 0; + return TR::RealRegister::eax; + } else if(stack->useRSI){ + *size = stack->useRSI*8; + stack->useRSI = 0; + return TR::RealRegister::esi; + } else if(stack->useRDX){ + *size = stack->useRDX*8; + stack->useRDX = 0; + return TR::RealRegister::edx; + } else if(stack->useRCX){ + *size = stack->useRCX*8; + stack->useRCX = 0; + return TR::RealRegister::ecx; + } else { + *size = 0; + return TR::RealRegister::NoReg; + } +} + +inline int +removeParamFloatFromStack(RegisterStack *stack, U_16 *size) +{ + if(stack->useXMM0){ + *size = stack->useXMM0*8; + stack->useXMM0 = 0; + return TR::RealRegister::xmm0; + } else if(stack->useXMM1){ + *size = stack->useXMM1*8; + stack->useXMM1 = 0; + return TR::RealRegister::xmm1; + } else if(stack->useXMM2){ + *size = stack->useXMM2*8; + stack->useXMM2 = 0; + return TR::RealRegister::xmm2; + } else if(stack->useXMM3){ + *size = stack->useXMM3*8; + stack->useXMM3 = 0; + return TR::RealRegister::xmm3; + } else if(stack->useXMM4){ + *size = stack->useXMM4*8; + stack->useXMM4 = 0; + return TR::RealRegister::xmm4; + } else if(stack->useXMM5){ + *size = stack->useXMM5*8; + stack->useXMM5 = 0; + return TR::RealRegister::xmm5; + } else if(stack->useXMM6){ + *size = stack->useXMM6*8; + stack->useXMM6 = 0; + return TR::RealRegister::xmm6; + } else if(stack->useXMM7){ + *size = stack->useXMM7*8; + stack->useXMM7 = 0; + return TR::RealRegister::xmm7; + } else { + *size = 0; + return TR::RealRegister::NoReg; + } +} + +struct ParamTableEntry { + int32_t offset; + int32_t regNo; + int32_t gcMapOffset; + //Stored here so we can look up when saving to local array + char slots; + char size; + bool isReference; + bool onStack; + bool notInitialized; +}; + +inline MJIT::ParamTableEntry +initializeParamTableEntry(){ + MJIT::ParamTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = 0; + entry.gcMapOffset = 0; + entry.onStack = false; + entry.slots = 0; + entry.size = 0; + entry.isReference = false; + entry.notInitialized = true; + return entry; +} + + +// This function is only used for paramaters +inline ParamTableEntry +makeRegisterEntry(int32_t regNo, int32_t stackOffset, uint16_t size, uint16_t slots, bool isRef){ + ParamTableEntry entry; + entry.regNo = regNo; + entry.offset = stackOffset; + entry.gcMapOffset = stackOffset; + entry.onStack = false; + entry.slots = slots; + entry.size = size; + entry.isReference = isRef; + entry.notInitialized = false; + return entry; +} + +// This function is only used for paramaters +inline ParamTableEntry +makeStackEntry(int32_t stackOffset, uint16_t size, uint16_t slots, bool isRef){ + ParamTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = stackOffset; + entry.gcMapOffset = stackOffset; + entry.onStack = true; + entry.slots = slots; + entry.size = size; + entry.isReference = isRef; + entry.notInitialized = false; + return entry; +} + +class ParamTable +{ + private: + ParamTableEntry *_tableEntries; + U_16 _paramCount; + U_16 _actualParamCount; + RegisterStack *_registerStack; + public: + ParamTable(ParamTableEntry*, uint16_t, uint16_t, RegisterStack*); + bool getEntry(uint16_t, ParamTableEntry*); + bool setEntry(uint16_t, ParamTableEntry*); + uint16_t getTotalParamSize(); + uint16_t 
getParamCount(); + uint16_t getActualParamCount(); +}; + +using LocalTableEntry=ParamTableEntry; + +inline LocalTableEntry +makeLocalTableEntry(int32_t gcMapOffset, uint16_t size, uint16_t slots, bool isRef) +{ + LocalTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = -1; //This field isn't used for locals + entry.gcMapOffset = gcMapOffset; + entry.onStack = true; + entry.slots = slots; + entry.size = size; + entry.isReference = isRef; + entry.notInitialized = false; + return entry; +} + +inline LocalTableEntry +initializeLocalTableEntry(){ + ParamTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = 0; + entry.gcMapOffset = 0; + entry.onStack = false; + entry.slots = 0; + entry.size = 0; + entry.isReference = false; + entry.notInitialized = true; + return entry; +} + +class LocalTable +{ + private: + LocalTableEntry *_tableEntries; + uint16_t _localCount; + public: + LocalTable(LocalTableEntry*, uint16_t); + bool getEntry(uint16_t, LocalTableEntry*); + uint16_t getTotalLocalSize(); + uint16_t getLocalCount(); +}; + +struct JumpTableEntry { + uint32_t byteCodeIndex; + char * codeCacheAddress; + JumpTableEntry() {} + JumpTableEntry(uint32_t bci, char* cca) : byteCodeIndex(bci), codeCacheAddress(cca) {} +}; + + +} +#endif /* MJIT_SIDETABLE_HPP */ diff --git a/runtime/compiler/microjit/assembly/utils.nasm b/runtime/compiler/microjit/assembly/utils.nasm new file mode 100644 index 00000000000..b19cb9d4dbb --- /dev/null +++ b/runtime/compiler/microjit/assembly/utils.nasm @@ -0,0 +1,30 @@ +; Copyright (c) 2022, 2022 IBM Corp. and others +; +; This program and the accompanying materials are made available under +; the terms of the Eclipse Public License 2.0 which accompanies this +; distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +; or the Apache License, Version 2.0 which accompanies this distribution and +; is available at https://www.apache.org/licenses/LICENSE-2.0. +; +; This Source Code may also be made available under the following +; Secondary Licenses when the conditions for such availability set +; forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +; General Public License, version 2 with the GNU Classpath +; Exception [1] and GNU General Public License, version 2 with the +; OpenJDK Assembly Exception [2]. +; +; [1] https://www.gnu.org/software/classpath/license.html +; [2] http://openjdk.java.net/legal/assembly-exception.html +; +; SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + +%macro template_start 1 +global %1 +global %1 %+ Size +%1: +%endmacro + +%macro template_end 1 +%1 %+ _end: +%1 %+ Size: dw %1 %+ _end - %1 +%endmacro diff --git a/runtime/compiler/microjit/utils.hpp b/runtime/compiler/microjit/utils.hpp new file mode 100644 index 00000000000..c0296c3742f --- /dev/null +++ b/runtime/compiler/microjit/utils.hpp @@ -0,0 +1,39 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. 
+ * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#ifndef MJIT_UTILS_HPP +#define MJIT_UTILS_HPP + +#include +#include +#include "env/IO.hpp" + +inline void MJIT_ASSERT(TR::FilePointer *filePtr, bool condition, const char * const error_string){ + if(!condition){ + trfprintf(filePtr, "%s", error_string); + assert(condition); + } +} + +inline void MJIT_ASSERT_NO_MSG(bool condition){ + assert(condition); +} +#endif /* MJIT_UTILS_HPP */ diff --git a/runtime/compiler/microjit/x/CMakeLists.txt b/runtime/compiler/microjit/x/CMakeLists.txt new file mode 100644 index 00000000000..2e9e18cb0b9 --- /dev/null +++ b/runtime/compiler/microjit/x/CMakeLists.txt @@ -0,0 +1,24 @@ +################################################################################ +# Copyright (c) 2022, 2022 IBM Corp. and others +# +# This program and the accompanying materials are made available under +# the terms of the Eclipse Public License 2.0 which accompanies this +# distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +# or the Apache License, Version 2.0 which accompanies this distribution and +# is available at https://www.apache.org/licenses/LICENSE-2.0. +# +# This Source Code may also be made available under the following +# Secondary Licenses when the conditions for such availability set +# forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +# General Public License, version 2 with the GNU Classpath +# Exception [1] and GNU General Public License, version 2 with the +# OpenJDK Assembly Exception [2]. +# +# [1] https://www.gnu.org/software/classpath/license.html +# [2] http://openjdk.java.net/legal/assembly-exception.html +# +# SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception +################################################################################ +if(TR_HOST_BITS STREQUAL 64) + add_subdirectory(amd64) +endif() \ No newline at end of file diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp new file mode 100755 index 00000000000..11b54199989 --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp @@ -0,0 +1,3096 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. 
+ * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#include "j9.h" +#include "env/VMJ9.h" + +#include "il/Block.hpp" +#include "infra/Cfg.hpp" +#include "compile/Compilation.hpp" +#include "codegen/CodeGenerator.hpp" +#include "control/Recompilation.hpp" +#include "microjit/MethodBytecodeMetaData.hpp" +#include "microjit/x/amd64/AMD64Codegen.hpp" +#include "runtime/CodeCacheManager.hpp" +#include "env/PersistentInfo.hpp" +#include "runtime/J9Runtime.hpp" + +#define MJIT_JITTED_BODY_INFO_PTR_SIZE 8 +#define MJIT_SAVE_AREA_SIZE 2 +#define MJIT_LINKAGE_INFO_SIZE 4 +#define GENERATE_SWITCH_TO_INTERP_PREPROLOGUE 1 + +#define BIT_MASK_32 0xffffffff +#define BIT_MASK_64 0xffffffffffffffff + +//Debug Params, comment/uncomment undef to get omit/enable debugging info +//#define MJIT_DEBUG 1 +#undef MJIT_DEBUG + +#ifdef MJIT_DEBUG +#define MJIT_DEBUG_MAP_PARAMS 1 +#else +#undef MJIT_DEBUG_MAP_PARAMS +#endif +#ifdef MJIT_DEBUG +#define MJIT_DEBUG_BC_WALKING 1 +#define MJIT_DEBUG_BC_LOG(logFile, text) trfprintf(logFile, text) +#else +#undef MJIT_DEBUG_BC_WALKING +#define MJIT_DEBUG_BC_LOG(logFile, text) do {} while(0) +#endif + +#define STACKCHECKBUFFER 512 + +#include "j9comp.h" +extern J9_CDATA char * const JavaBCNames[]; + +#include +#include +#include + +#define COPY_TEMPLATE(buffer, INSTRUCTION, size) do {\ + memcpy(buffer, INSTRUCTION, INSTRUCTION##Size);\ + buffer=(buffer+INSTRUCTION##Size);\ + size += INSTRUCTION##Size;\ +} while(0) + +#define PATCH_RELATIVE_ADDR_8_BIT(buffer, absAddress) do {\ + intptr_t relativeAddress = (intptr_t)(buffer);\ + intptr_t minAddress = (relativeAddress < (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ + intptr_t maxAddress = (relativeAddress > (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ + intptr_t absDistance = maxAddress - minAddress;\ + MJIT_ASSERT_NO_MSG(absDistance < (intptr_t)0x00000000000000ff);\ + relativeAddress = (relativeAddress < (intptr_t)(absAddress)) ? absDistance : (-1*(intptr_t)(absDistance));\ + patchImm1(buffer, (U_32)(relativeAddress & 0x00000000000000ff));\ +} while(0) + +//TODO: Find out how to jump to trampolines for far calls +#define PATCH_RELATIVE_ADDR_32_BIT(buffer, absAddress) do {\ + intptr_t relativeAddress = (intptr_t)(buffer);\ + intptr_t minAddress = (relativeAddress < (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ + intptr_t maxAddress = (relativeAddress > (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ + intptr_t absDistance = maxAddress - minAddress;\ + MJIT_ASSERT_NO_MSG(absDistance < (intptr_t)0x00000000ffffffff);\ + relativeAddress = (relativeAddress < (intptr_t)(absAddress)) ? 
absDistance : (-1*(intptr_t)(absDistance));\ + patchImm4(buffer, (U_32)(relativeAddress & 0x00000000ffffffff));\ +} while(0) + +// Using char[] to signify that we are treating this +// like data, even though they are instructions. +#define EXTERN_TEMPLATE(TEMPLATE_NAME) extern char TEMPLATE_NAME[] +#define EXTERN_TEMPLATE_SIZE(TEMPLATE_NAME) extern unsigned short TEMPLATE_NAME##Size +#define DECLARE_TEMPLATE(TEMPLATE_NAME) EXTERN_TEMPLATE(TEMPLATE_NAME);\ + EXTERN_TEMPLATE_SIZE(TEMPLATE_NAME) + +// Labels linked from templates. +DECLARE_TEMPLATE(movRSPR10); +DECLARE_TEMPLATE(movR10R14); +DECLARE_TEMPLATE(subR10Imm4); +DECLARE_TEMPLATE(addR10Imm4); +DECLARE_TEMPLATE(addR14Imm4); +DECLARE_TEMPLATE(subR14Imm4); +DECLARE_TEMPLATE(subRSPImm8); +DECLARE_TEMPLATE(subRSPImm4); +DECLARE_TEMPLATE(subRSPImm2); +DECLARE_TEMPLATE(subRSPImm1); +DECLARE_TEMPLATE(addRSPImm8); +DECLARE_TEMPLATE(addRSPImm4); +DECLARE_TEMPLATE(addRSPImm2); +DECLARE_TEMPLATE(addRSPImm1); +DECLARE_TEMPLATE(jbe4ByteRel); +DECLARE_TEMPLATE(cmpRspRbpDerefOffset); +DECLARE_TEMPLATE(loadEBPOffset); +DECLARE_TEMPLATE(loadESPOffset); +DECLARE_TEMPLATE(loadEAXOffset); +DECLARE_TEMPLATE(loadESIOffset); +DECLARE_TEMPLATE(loadEDXOffset); +DECLARE_TEMPLATE(loadECXOffset); +DECLARE_TEMPLATE(loadEBXOffset); +DECLARE_TEMPLATE(loadEBPOffset); +DECLARE_TEMPLATE(loadESPOffset); +DECLARE_TEMPLATE(loadRAXOffset); +DECLARE_TEMPLATE(loadRSIOffset); +DECLARE_TEMPLATE(loadRDXOffset); +DECLARE_TEMPLATE(loadRCXOffset); +DECLARE_TEMPLATE(loadRBXOffset); +DECLARE_TEMPLATE(loadRBPOffset); +DECLARE_TEMPLATE(loadRSPOffset); +DECLARE_TEMPLATE(loadR9Offset); +DECLARE_TEMPLATE(loadR10Offset); +DECLARE_TEMPLATE(loadR11Offset); +DECLARE_TEMPLATE(loadR12Offset); +DECLARE_TEMPLATE(loadR13Offset); +DECLARE_TEMPLATE(loadR14Offset); +DECLARE_TEMPLATE(loadR15Offset); +DECLARE_TEMPLATE(loadXMM0Offset); +DECLARE_TEMPLATE(loadXMM1Offset); +DECLARE_TEMPLATE(loadXMM2Offset); +DECLARE_TEMPLATE(loadXMM3Offset); +DECLARE_TEMPLATE(loadXMM4Offset); +DECLARE_TEMPLATE(loadXMM5Offset); +DECLARE_TEMPLATE(loadXMM6Offset); +DECLARE_TEMPLATE(loadXMM7Offset); +DECLARE_TEMPLATE(saveEAXOffset); +DECLARE_TEMPLATE(saveESIOffset); +DECLARE_TEMPLATE(saveEDXOffset); +DECLARE_TEMPLATE(saveECXOffset); +DECLARE_TEMPLATE(saveEBXOffset); +DECLARE_TEMPLATE(saveEBPOffset); +DECLARE_TEMPLATE(saveESPOffset); +DECLARE_TEMPLATE(saveRAXOffset); +DECLARE_TEMPLATE(saveRSIOffset); +DECLARE_TEMPLATE(saveRDXOffset); +DECLARE_TEMPLATE(saveRCXOffset); +DECLARE_TEMPLATE(saveRBXOffset); +DECLARE_TEMPLATE(saveRBPOffset); +DECLARE_TEMPLATE(saveRSPOffset); +DECLARE_TEMPLATE(saveR9Offset); +DECLARE_TEMPLATE(saveR10Offset); +DECLARE_TEMPLATE(saveR11Offset); +DECLARE_TEMPLATE(saveR12Offset); +DECLARE_TEMPLATE(saveR13Offset); +DECLARE_TEMPLATE(saveR14Offset); +DECLARE_TEMPLATE(saveR15Offset); +DECLARE_TEMPLATE(saveXMM0Offset); +DECLARE_TEMPLATE(saveXMM1Offset); +DECLARE_TEMPLATE(saveXMM2Offset); +DECLARE_TEMPLATE(saveXMM3Offset); +DECLARE_TEMPLATE(saveXMM4Offset); +DECLARE_TEMPLATE(saveXMM5Offset); +DECLARE_TEMPLATE(saveXMM6Offset); +DECLARE_TEMPLATE(saveXMM7Offset); +DECLARE_TEMPLATE(saveXMM0Local); +DECLARE_TEMPLATE(saveXMM1Local); +DECLARE_TEMPLATE(saveXMM2Local); +DECLARE_TEMPLATE(saveXMM3Local); +DECLARE_TEMPLATE(saveXMM4Local); +DECLARE_TEMPLATE(saveXMM5Local); +DECLARE_TEMPLATE(saveXMM6Local); +DECLARE_TEMPLATE(saveXMM7Local); +DECLARE_TEMPLATE(saveRAXLocal); +DECLARE_TEMPLATE(saveRSILocal); +DECLARE_TEMPLATE(saveRDXLocal); +DECLARE_TEMPLATE(saveRCXLocal); +DECLARE_TEMPLATE(saveR11Local); 
+DECLARE_TEMPLATE(callByteRel); +DECLARE_TEMPLATE(call4ByteRel); +DECLARE_TEMPLATE(jump4ByteRel); +DECLARE_TEMPLATE(jumpByteRel); +DECLARE_TEMPLATE(nopInstruction); +DECLARE_TEMPLATE(movRDIImm64); +DECLARE_TEMPLATE(movEDIImm32); +DECLARE_TEMPLATE(movRAXImm64); +DECLARE_TEMPLATE(movEAXImm32); +DECLARE_TEMPLATE(movRSPOffsetR11); +DECLARE_TEMPLATE(jumpRDI); +DECLARE_TEMPLATE(jumpRAX); +DECLARE_TEMPLATE(paintRegister); +DECLARE_TEMPLATE(paintLocal); +DECLARE_TEMPLATE(moveCountAndRecompile); +DECLARE_TEMPLATE(checkCountAndRecompile); +DECLARE_TEMPLATE(loadCounter); +DECLARE_TEMPLATE(decrementCounter); +DECLARE_TEMPLATE(incrementCounter); +DECLARE_TEMPLATE(jgCount); +DECLARE_TEMPLATE(callRetranslateArg1); +DECLARE_TEMPLATE(callRetranslateArg2); +DECLARE_TEMPLATE(callRetranslate); +DECLARE_TEMPLATE(setCounter); +DECLARE_TEMPLATE(jmpToBody); + +// bytecodes +DECLARE_TEMPLATE(debugBreakpoint); +DECLARE_TEMPLATE(aloadTemplatePrologue); +DECLARE_TEMPLATE(iloadTemplatePrologue); +DECLARE_TEMPLATE(lloadTemplatePrologue); +DECLARE_TEMPLATE(loadTemplate); +DECLARE_TEMPLATE(astoreTemplate); +DECLARE_TEMPLATE(istoreTemplate); +DECLARE_TEMPLATE(lstoreTemplate); +DECLARE_TEMPLATE(popTemplate); +DECLARE_TEMPLATE(pop2Template); +DECLARE_TEMPLATE(swapTemplate); +DECLARE_TEMPLATE(dupTemplate); +DECLARE_TEMPLATE(dupx1Template); +DECLARE_TEMPLATE(dupx2Template); +DECLARE_TEMPLATE(dup2Template); +DECLARE_TEMPLATE(dup2x1Template); +DECLARE_TEMPLATE(dup2x2Template); +DECLARE_TEMPLATE(getFieldTemplatePrologue); +DECLARE_TEMPLATE(intGetFieldTemplate); +DECLARE_TEMPLATE(longGetFieldTemplate); +DECLARE_TEMPLATE(floatGetFieldTemplate); +DECLARE_TEMPLATE(doubleGetFieldTemplate); +DECLARE_TEMPLATE(intPutFieldTemplatePrologue); +DECLARE_TEMPLATE(addrPutFieldTemplatePrologue); +DECLARE_TEMPLATE(longPutFieldTemplatePrologue); +DECLARE_TEMPLATE(floatPutFieldTemplatePrologue); +DECLARE_TEMPLATE(doublePutFieldTemplatePrologue); +DECLARE_TEMPLATE(intPutFieldTemplate); +DECLARE_TEMPLATE(longPutFieldTemplate); +DECLARE_TEMPLATE(floatPutFieldTemplate); +DECLARE_TEMPLATE(doublePutFieldTemplate); +DECLARE_TEMPLATE(invokeStaticTemplate); +DECLARE_TEMPLATE(staticTemplatePrologue); +DECLARE_TEMPLATE(intGetStaticTemplate); +DECLARE_TEMPLATE(addrGetStaticTemplate); +DECLARE_TEMPLATE(addrGetStaticTemplatePrologue); +DECLARE_TEMPLATE(longGetStaticTemplate); +DECLARE_TEMPLATE(floatGetStaticTemplate); +DECLARE_TEMPLATE(doubleGetStaticTemplate); +DECLARE_TEMPLATE(intPutStaticTemplate); +DECLARE_TEMPLATE(addrPutStaticTemplatePrologue); +DECLARE_TEMPLATE(addrPutStaticTemplate); +DECLARE_TEMPLATE(longPutStaticTemplate); +DECLARE_TEMPLATE(floatPutStaticTemplate); +DECLARE_TEMPLATE(doublePutStaticTemplate); +DECLARE_TEMPLATE(iAddTemplate); +DECLARE_TEMPLATE(iSubTemplate); +DECLARE_TEMPLATE(iMulTemplate); +DECLARE_TEMPLATE(iDivTemplate); +DECLARE_TEMPLATE(iRemTemplate); +DECLARE_TEMPLATE(iNegTemplate); +DECLARE_TEMPLATE(iShlTemplate); +DECLARE_TEMPLATE(iShrTemplate); +DECLARE_TEMPLATE(iUshrTemplate); +DECLARE_TEMPLATE(iAndTemplate); +DECLARE_TEMPLATE(iOrTemplate); +DECLARE_TEMPLATE(iXorTemplate); +DECLARE_TEMPLATE(i2lTemplate); +DECLARE_TEMPLATE(l2iTemplate); +DECLARE_TEMPLATE(i2bTemplate); +DECLARE_TEMPLATE(i2sTemplate); +DECLARE_TEMPLATE(i2cTemplate); +DECLARE_TEMPLATE(i2dTemplate); +DECLARE_TEMPLATE(l2dTemplate); +DECLARE_TEMPLATE(d2iTemplate); +DECLARE_TEMPLATE(d2lTemplate); +DECLARE_TEMPLATE(iconstm1Template); +DECLARE_TEMPLATE(iconst0Template); +DECLARE_TEMPLATE(iconst1Template); +DECLARE_TEMPLATE(iconst2Template); 
+DECLARE_TEMPLATE(iconst3Template); +DECLARE_TEMPLATE(iconst4Template); +DECLARE_TEMPLATE(iconst5Template); +DECLARE_TEMPLATE(bipushTemplate); +DECLARE_TEMPLATE(sipushTemplatePrologue); +DECLARE_TEMPLATE(sipushTemplate); +DECLARE_TEMPLATE(iIncTemplate_01_load); +DECLARE_TEMPLATE(iIncTemplate_02_add); +DECLARE_TEMPLATE(iIncTemplate_03_store); +DECLARE_TEMPLATE(lAddTemplate); +DECLARE_TEMPLATE(lSubTemplate); +DECLARE_TEMPLATE(lMulTemplate); +DECLARE_TEMPLATE(lDivTemplate); +DECLARE_TEMPLATE(lRemTemplate); +DECLARE_TEMPLATE(lNegTemplate); +DECLARE_TEMPLATE(lShlTemplate); +DECLARE_TEMPLATE(lShrTemplate); +DECLARE_TEMPLATE(lUshrTemplate); +DECLARE_TEMPLATE(lAndTemplate); +DECLARE_TEMPLATE(lOrTemplate); +DECLARE_TEMPLATE(lXorTemplate); +DECLARE_TEMPLATE(lconst0Template); +DECLARE_TEMPLATE(lconst1Template); +DECLARE_TEMPLATE(fAddTemplate); +DECLARE_TEMPLATE(fSubTemplate); +DECLARE_TEMPLATE(fMulTemplate); +DECLARE_TEMPLATE(fDivTemplate); +DECLARE_TEMPLATE(fRemTemplate); +DECLARE_TEMPLATE(fNegTemplate); +DECLARE_TEMPLATE(fconst0Template); +DECLARE_TEMPLATE(fconst1Template); +DECLARE_TEMPLATE(fconst2Template); +DECLARE_TEMPLATE(dAddTemplate); +DECLARE_TEMPLATE(dSubTemplate); +DECLARE_TEMPLATE(dMulTemplate); +DECLARE_TEMPLATE(dDivTemplate); +DECLARE_TEMPLATE(dRemTemplate); +DECLARE_TEMPLATE(dNegTemplate); +DECLARE_TEMPLATE(dconst0Template); +DECLARE_TEMPLATE(dconst1Template); +DECLARE_TEMPLATE(i2fTemplate); +DECLARE_TEMPLATE(f2iTemplate); +DECLARE_TEMPLATE(l2fTemplate); +DECLARE_TEMPLATE(f2lTemplate); +DECLARE_TEMPLATE(d2fTemplate); +DECLARE_TEMPLATE(f2dTemplate); +DECLARE_TEMPLATE(eaxReturnTemplate); +DECLARE_TEMPLATE(raxReturnTemplate); +DECLARE_TEMPLATE(xmm0ReturnTemplate); +DECLARE_TEMPLATE(retTemplate_add); +DECLARE_TEMPLATE(vReturnTemplate); +DECLARE_TEMPLATE(moveeaxForCall); +DECLARE_TEMPLATE(moveesiForCall); +DECLARE_TEMPLATE(moveedxForCall); +DECLARE_TEMPLATE(moveecxForCall); +DECLARE_TEMPLATE(moveraxRefForCall); +DECLARE_TEMPLATE(moversiRefForCall); +DECLARE_TEMPLATE(moverdxRefForCall); +DECLARE_TEMPLATE(movercxRefForCall); +DECLARE_TEMPLATE(moveraxForCall); +DECLARE_TEMPLATE(moversiForCall); +DECLARE_TEMPLATE(moverdxForCall); +DECLARE_TEMPLATE(movercxForCall); +DECLARE_TEMPLATE(movexmm0ForCall); +DECLARE_TEMPLATE(movexmm1ForCall); +DECLARE_TEMPLATE(movexmm2ForCall); +DECLARE_TEMPLATE(movexmm3ForCall); +DECLARE_TEMPLATE(moveDxmm0ForCall); +DECLARE_TEMPLATE(moveDxmm1ForCall); +DECLARE_TEMPLATE(moveDxmm2ForCall); +DECLARE_TEMPLATE(moveDxmm3ForCall); +DECLARE_TEMPLATE(retTemplate_sub); +DECLARE_TEMPLATE(loadeaxReturn); +DECLARE_TEMPLATE(loadraxReturn); +DECLARE_TEMPLATE(loadxmm0Return); +DECLARE_TEMPLATE(loadDxmm0Return); +DECLARE_TEMPLATE(lcmpTemplate); +DECLARE_TEMPLATE(fcmplTemplate); +DECLARE_TEMPLATE(fcmpgTemplate); +DECLARE_TEMPLATE(dcmplTemplate); +DECLARE_TEMPLATE(dcmpgTemplate); +DECLARE_TEMPLATE(gotoTemplate); +DECLARE_TEMPLATE(ifneTemplate); +DECLARE_TEMPLATE(ifeqTemplate); +DECLARE_TEMPLATE(ifltTemplate); +DECLARE_TEMPLATE(ifleTemplate); +DECLARE_TEMPLATE(ifgtTemplate); +DECLARE_TEMPLATE(ifgeTemplate); +DECLARE_TEMPLATE(ificmpeqTemplate); +DECLARE_TEMPLATE(ificmpneTemplate); +DECLARE_TEMPLATE(ificmpltTemplate); +DECLARE_TEMPLATE(ificmpleTemplate); +DECLARE_TEMPLATE(ificmpgeTemplate); +DECLARE_TEMPLATE(ificmpgtTemplate); +DECLARE_TEMPLATE(ifacmpeqTemplate); +DECLARE_TEMPLATE(ifacmpneTemplate); +DECLARE_TEMPLATE(ifnullTemplate); +DECLARE_TEMPLATE(ifnonnullTemplate); +DECLARE_TEMPLATE(decompressReferenceTemplate); +DECLARE_TEMPLATE(compressReferenceTemplate); 
+DECLARE_TEMPLATE(decompressReference1Template); +DECLARE_TEMPLATE(compressReference1Template); +DECLARE_TEMPLATE(addrGetFieldTemplatePrologue); +DECLARE_TEMPLATE(addrGetFieldTemplate); + + + +static void +debug_print_hex(char *buffer, unsigned long long size) +{ + printf("Start of dump:\n"); + while(size--){ + printf("%x ", 0xc0 & *(buffer++)); + } + printf("\nEnd of dump:"); +} + +inline uint32_t align(uint32_t number, uint32_t requirement, TR::FilePointer *fp) +{ + MJIT_ASSERT(fp, requirement && ((requirement & (requirement -1)) == 0), "INCORRECT ALIGNMENT"); + return (number + requirement - 1) & ~(requirement - 1); +} + +static bool +getRequiredAlignment(uintptr_t cursor, uintptr_t boundary, uintptr_t margin, uintptr_t *alignment) +{ + if((boundary & (boundary-1)) != 0) + return true; + *alignment = (-cursor - margin) & (boundary-1); + return false; +} + +static intptr_t +getHelperOrTrampolineAddress(TR_RuntimeHelper h, uintptr_t callsite){ + uintptr_t helperAddress = (uintptr_t)runtimeHelperValue(h); + uintptr_t minAddress = (callsite < (uintptr_t)(helperAddress)) ? callsite : (uintptr_t)(helperAddress); + uintptr_t maxAddress = (callsite > (uintptr_t)(helperAddress)) ? callsite : (uintptr_t)(helperAddress); + uintptr_t distance = maxAddress - minAddress; + if(distance > 0x00000000ffffffff){ + helperAddress = (uintptr_t)TR::CodeCacheManager::instance()->findHelperTrampoline(h, (void *)callsite); + } + return helperAddress; +} + +bool +MJIT::nativeSignature(J9Method *method, char *resultBuffer) +{ + J9UTF8 *methodSig; + UDATA arg; + U_16 i, ch; + BOOLEAN parsingReturnType = FALSE, processingBracket = FALSE; + char nextType = '\0'; + + methodSig = J9ROMMETHOD_SIGNATURE(J9_ROM_METHOD_FROM_RAM_METHOD(method)); + i = 0; + arg = 3; /* skip the return type slot and JNI standard slots, they will be filled in later. 
*/ + + while(i < J9UTF8_LENGTH(methodSig)) { + ch = J9UTF8_DATA(methodSig)[i++]; + switch(ch) { + case '(': /* start of signature -- skip */ + continue; + case ')': /* End of signature -- done args, find return type */ + parsingReturnType = TRUE; + continue; + case MJIT::CLASSNAME_TYPE_CHARACTER: + nextType = MJIT::CLASSNAME_TYPE_CHARACTER; + while(J9UTF8_DATA(methodSig)[i++] != ';') {} /* a type string - loop scanning for ';' to end it - i points past ';' when done loop */ + break; + case MJIT::BOOLEAN_TYPE_CHARACTER: + nextType = MJIT::BOOLEAN_TYPE_CHARACTER; + break; + case MJIT::BYTE_TYPE_CHARACTER: + nextType = MJIT::BYTE_TYPE_CHARACTER; + break; + case MJIT::CHAR_TYPE_CHARACTER: + nextType = MJIT::CHAR_TYPE_CHARACTER; + break; + case MJIT::SHORT_TYPE_CHARACTER: + nextType = MJIT::SHORT_TYPE_CHARACTER; + break; + case MJIT::INT_TYPE_CHARACTER: + nextType = MJIT::INT_TYPE_CHARACTER; + break; + case MJIT::LONG_TYPE_CHARACTER: + nextType = MJIT::LONG_TYPE_CHARACTER; + break; + case MJIT::FLOAT_TYPE_CHARACTER: + nextType = MJIT::FLOAT_TYPE_CHARACTER; + break; + case MJIT::DOUBLE_TYPE_CHARACTER: + nextType = MJIT::DOUBLE_TYPE_CHARACTER; + break; + case '[': + processingBracket = TRUE; + continue; /* go back to top of loop for next char */ + case MJIT::VOID_TYPE_CHARACTER: + if(!parsingReturnType) { + return true; + } + nextType = MJIT::VOID_TYPE_CHARACTER; + break; + + default: + nextType = '\0'; + return true; + } + if(processingBracket) { + if(parsingReturnType) { + resultBuffer[0] = MJIT::CLASSNAME_TYPE_CHARACTER; + break; /* from the while loop */ + } else { + resultBuffer[arg] = MJIT::CLASSNAME_TYPE_CHARACTER; + arg++; + processingBracket = FALSE; + } + } else if(parsingReturnType) { + resultBuffer[0] = nextType; + break; /* from the while loop */ + } else { + resultBuffer[arg] = nextType; + arg++; + } + } + + resultBuffer[1] = MJIT::CLASSNAME_TYPE_CHARACTER; /* the JNIEnv */ + resultBuffer[2] = MJIT::CLASSNAME_TYPE_CHARACTER; /* the jobject or jclass */ + resultBuffer[arg] = '\0'; + return false; +} + +/** + * Assuming that 'buffer' is the end of the immediate value location, + * this function replaces the value supplied value 'imm' + * + * Assumes 64-bit value + */ +static void +patchImm8(char *buffer, U_64 imm){ + *(((unsigned long long int*)buffer)-1) = imm; +} + +/** + * Assuming that 'buffer' is the end of the immediate value location, + * this function replaces the value supplied value 'imm' + * + * Assumes 32-bit value + */ +static void +patchImm4(char *buffer, U_32 imm){ + *(((unsigned int*)buffer)-1) = imm; +} + +/** + * Assuming that 'buffer' is the end of the immediate value location, + * this function replaces the value supplied value 'imm' + * + * Assumes 16-bit value + */ +static void +patchImm2(char *buffer, U_16 imm){ + *(((unsigned short int*)buffer)-1) = imm; +} + +/** + * Assuming that 'buffer' is the end of the immediate value location, + * this function replaces the value supplied value 'imm' + * + * Assumes 8-bit value + */ +static void +patchImm1(char *buffer, U_8 imm){ + *(unsigned char*)(buffer-1) = imm; +} + +/* +| Architecture | Endian | Return address | vTable index | +| x86-64 | Little | Stack | R8 (receiver in RAX) | + +| Integer Return value registers | Integer Preserved registers | Integer Argument registers | +| EAX (32-bit) RAX (64-bit) | RBX R9 RSP | RAX RSI RDX RCX | + +| Long Return value registers | Long Preserved registers | Long Argument registers | +| RAX (64-bit) | RBX R9 RSP | RAX RSI RDX RCX | + +| Float Return value registers | Float 
Preserved registers | Float Argument registers | +| XMM0 | XMM8-XMM15 | XMM0-XMM7 | + +| Double Return value registers | Double Preserved registers | Double Argument registers | +| XMM0 | XMM8-XMM15 | XMM0-XMM7 | +*/ + +inline buffer_size_t +savePreserveRegisters(char *buffer, int32_t offset) +{ + buffer_size_t size = 0; + COPY_TEMPLATE(buffer, saveRBXOffset, size); + patchImm4(buffer, offset); + COPY_TEMPLATE(buffer, saveR9Offset, size); + patchImm4(buffer, offset-8); + COPY_TEMPLATE(buffer, saveRSPOffset, size); + patchImm4(buffer, offset-16); + return size; +} + +inline buffer_size_t +loadPreserveRegisters(char *buffer, int32_t offset) +{ + buffer_size_t size = 0; + COPY_TEMPLATE(buffer, loadRBXOffset, size); + patchImm4(buffer, offset); + COPY_TEMPLATE(buffer, loadR9Offset, size); + patchImm4(buffer, offset-8); + COPY_TEMPLATE(buffer, loadRSPOffset, size); + patchImm4(buffer, offset-16); + return size; +} + +#define COPY_IMM1_TEMPLATE(buffer, size, value) do{\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + patchImm1(buffer, (U_8)(value));\ +}while(0) + +#define COPY_IMM2_TEMPLATE(buffer, size, value) do{\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + patchImm2(buffer, (U_16)(value));\ +}while(0) + +#define COPY_IMM4_TEMPLATE(buffer, size, value) do{\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + patchImm4(buffer, (U_32)(value));\ +}while(0) + +#define COPY_IMM8_TEMPLATE(buffer, size, value) do{\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + COPY_TEMPLATE(buffer, nopInstruction, size);\ + patchImm8(buffer, (U_64)(value));\ +}while(0) + +MJIT::RegisterStack +MJIT::mapIncomingParams( + char *typeString, + U_16 maxLength, + int *error_code, + ParamTableEntry *paramTable, + U_16 actualParamCount, + TR::FilePointer *logFileFP +){ + MJIT::RegisterStack stack; + initParamStack(&stack); + + uint16_t index = 0; + intptr_t offset = 0; + + for(int i = 0; i= paramIndex) ? false : (paramIndex < _paramCount)); + if(success) + *entry = _tableEntries[paramIndex]; + return success; +} + +bool +MJIT::ParamTable::setEntry(U_16 paramIndex, MJIT::ParamTableEntry *entry) +{ + bool success = ((-1 >= paramIndex) ? false : (paramIndex < _paramCount)); + if(success) + _tableEntries[paramIndex] = *entry; + return success; +} + +U_16 +MJIT::ParamTable::getTotalParamSize() +{ + return calculateOffset(_registerStack); +} + +U_16 +MJIT::ParamTable::getParamCount() +{ + return _paramCount; +} + +U_16 +MJIT::ParamTable::getActualParamCount() +{ + return _actualParamCount; +} + +MJIT::LocalTable::LocalTable(LocalTableEntry *tableEntries, U_16 localCount) + : _tableEntries(tableEntries) + , _localCount(localCount) +{} + +bool +MJIT::LocalTable::getEntry(U_16 localIndex, MJIT::LocalTableEntry *entry) +{ + bool success = ((-1 >= localIndex) ? 
false : (localIndex < _localCount)); + if(success) + *entry = _tableEntries[localIndex]; + return success; +} + +U_16 +MJIT::LocalTable::getTotalLocalSize() +{ + U_16 localSize = 0; + for(int i=0; i<_localCount; i++){ + localSize += _tableEntries[i].slots*8; + } + return localSize; +} + +U_16 +MJIT::LocalTable::getLocalCount() +{ + return _localCount; +} + +buffer_size_t +MJIT::CodeGenerator::generateGCR( + char *buffer, + int32_t initialCount, + J9Method *method, + uintptr_t startPC +){ + // see TR_J9ByteCodeIlGenerator::prependGuardedCountForRecompilation(TR::Block * originalFirstBlock) + // guardBlock: if (!countForRecompile) goto originalFirstBlock; + // bumpCounterBlock: count--; + // if (bodyInfo->count > 0) goto originalFirstBlock; + // callRecompileBlock: call jitRetranslateCallerWithPreparation(j9method, startPC); + // bodyInfo->count=10000 + // goto originalFirstBlock + // bumpCounter: count + // originalFirstBlock: ... + // + buffer_size_t size = 0; + + //MicroJIT either needs its own guard, or it needs to not have a guard block + // The following is kept here so that it can be restored if a new guard is added later + /* + COPY_TEMPLATE(buffer, moveCountAndRecompile, size); + char *guardBlock = buffer; + COPY_TEMPLATE(buffer, checkCountAndRecompile, size); + char *guardBlockJump = buffer; + */ + + //bumpCounterBlock + COPY_TEMPLATE(buffer, loadCounter, size); + char *counterLoader = buffer; + COPY_TEMPLATE(buffer, decrementCounter, size); + char *counterStore = buffer; + COPY_TEMPLATE(buffer, jgCount, size); + char *bumpCounterBlockJump = buffer; + + //callRecompileBlock + COPY_TEMPLATE(buffer, callRetranslateArg1, size); + char *retranslateArg1Patch = buffer; + COPY_TEMPLATE(buffer, callRetranslateArg2, size); + char *retranslateArg2Patch = buffer; + COPY_TEMPLATE(buffer, callRetranslate, size); + char *callSite = buffer; + COPY_TEMPLATE(buffer, setCounter, size); + char *counterSetter = buffer; + COPY_TEMPLATE(buffer, jmpToBody, size); + char *callRecompileBlockJump = buffer; + + //bumpCounter + char *counterLocation = buffer; + COPY_IMM4_TEMPLATE(buffer, size, initialCount); + + uintptr_t jitRetranslateCallerWithPrepHelperAddress = getHelperOrTrampolineAddress(TR_jitRetranslateCallerWithPrep, (uintptr_t)buffer); + + // Find the countForRecompile's address and patch for the guard block.
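+ // The retranslate helper receives the J9Method and the startPC as its two arguments; the + // three counter patch sites below all point at the in-body counter slot emitted just above.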
+ patchImm8(retranslateArg1Patch, (uintptr_t)(method)); + patchImm8(retranslateArg2Patch, (uintptr_t)(startPC)); + //patchImm8(guardBlock, (uintptr_t)(&(_persistentInfo->_countForRecompile))); + PATCH_RELATIVE_ADDR_32_BIT(callSite, jitRetranslateCallerWithPrepHelperAddress); + patchImm8(counterLoader, (uintptr_t)counterLocation); + patchImm8(counterStore, (uintptr_t)counterLocation); + patchImm8(counterSetter, (uintptr_t)counterLocation); + //PATCH_RELATIVE_ADDR_32_BIT(guardBlockJump, buffer); + PATCH_RELATIVE_ADDR_32_BIT(bumpCounterBlockJump, buffer); + PATCH_RELATIVE_ADDR_32_BIT(callRecompileBlockJump, buffer); + + return size; +} + +buffer_size_t +MJIT::CodeGenerator::saveArgsInLocalArray( + char *buffer, + buffer_size_t stack_alloc_space, + char *typeString +){ + buffer_size_t saveSize = 0; + U_16 slot = 0; + TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; + char typeChar; + int index = 0; + ParamTableEntry entry; + while(index<_paramTable->getParamCount()){ + MJIT_ASSERT(_logFileFP, _paramTable->getEntry(index, &entry), "Bad index for table entry"); + if(entry.notInitialized) { + index++; + continue; + } + if(entry.onStack) break; + int32_t regNum = entry.regNo; + switch(regNum){ + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, saveXMM0Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm1: + COPY_TEMPLATE(buffer, saveXMM1Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm2: + COPY_TEMPLATE(buffer, saveXMM2Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm3: + COPY_TEMPLATE(buffer, saveXMM3Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm4: + COPY_TEMPLATE(buffer, saveXMM4Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm5: + COPY_TEMPLATE(buffer, saveXMM5Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm6: + COPY_TEMPLATE(buffer, saveXMM6Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm7: + COPY_TEMPLATE(buffer, saveXMM7Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::eax: + COPY_TEMPLATE(buffer, saveRAXLocal, saveSize); + goto PatchAndBreak; + case TR::RealRegister::esi: + COPY_TEMPLATE(buffer, saveRSILocal, saveSize); + goto PatchAndBreak; + case TR::RealRegister::edx: + COPY_TEMPLATE(buffer, saveRDXLocal, saveSize); + goto PatchAndBreak; + case TR::RealRegister::ecx: + COPY_TEMPLATE(buffer, saveRCXLocal, saveSize); + goto PatchAndBreak; + PatchAndBreak: + patchImm4(buffer, (U_32)((slot*8) & BIT_MASK_32)); + break; + case TR::RealRegister::NoReg: + break; + } + slot += entry.slots; + index += entry.slots; + } + + while(index<_paramTable->getParamCount()){ + MJIT_ASSERT(_logFileFP, _paramTable->getEntry(index, &entry), "Bad index for table entry"); + if(entry.notInitialized) { + index++; + continue; + } + uintptr_t offset = entry.offset; + //Our linkage templates for loading from the stack assume that the offset will always be less than 0xff. + // This is however not always true for valid code. If this becomes a problem in the future we will + // have to add support for larger offsets by creating new templates that use 2 byte offsets from rsp. + // Whether that will be a change to the current templates, or new templates with supporting code + // is a decision yet to be determined.
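+ // Rough illustration: 0xff bytes is only about 31 8-byte slots of headroom, so a large + // stack allocation or a very long argument list would make the check below bail out.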
+ if((offset + stack_alloc_space) >= uintptr_t(0xff)){ + if(_comp->getOption(TR_TraceCG)){ + trfprintf(_logFileFP, "Offset too large, add support for larger offsets"); + } + return -1; + } + COPY_TEMPLATE(buffer, movRSPOffsetR11, saveSize); + patchImm1(buffer, (U_8)(offset+stack_alloc_space)); + COPY_TEMPLATE(buffer, saveR11Local, saveSize); + patchImm4(buffer, (U_32)((slot*8) & BIT_MASK_32)); + slot += entry.slots; + index += entry.slots; + } + return saveSize; +} + +buffer_size_t +MJIT::CodeGenerator::saveArgsOnStack( + char *buffer, + buffer_size_t stack_alloc_space, + ParamTable *paramTable +){ + buffer_size_t saveArgsSize = 0; + U_16 offset = paramTable->getTotalParamSize(); + TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; + ParamTableEntry entry; + for(int i=0; i<paramTable->getParamCount();){ + MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); + if(entry.notInitialized) { + i++; + continue; + } + if(entry.onStack) + break; + int32_t regNum = entry.regNo; + if(i != 0) + offset -= entry.size; + switch(regNum){ + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, saveXMM0Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm1: + COPY_TEMPLATE(buffer, saveXMM1Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm2: + COPY_TEMPLATE(buffer, saveXMM2Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm3: + COPY_TEMPLATE(buffer, saveXMM3Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm4: + COPY_TEMPLATE(buffer, saveXMM4Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm5: + COPY_TEMPLATE(buffer, saveXMM5Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm6: + COPY_TEMPLATE(buffer, saveXMM6Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm7: + COPY_TEMPLATE(buffer, saveXMM7Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::eax: + COPY_TEMPLATE(buffer, saveRAXOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::esi: + COPY_TEMPLATE(buffer, saveRSIOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::edx: + COPY_TEMPLATE(buffer, saveRDXOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::ecx: + COPY_TEMPLATE(buffer, saveRCXOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::NoReg: + break; + } + i += entry.slots; + } + return saveArgsSize; +} + +buffer_size_t +MJIT::CodeGenerator::loadArgsFromStack( + char *buffer, + buffer_size_t stack_alloc_space, + ParamTable *paramTable +){ + U_16 offset = paramTable->getTotalParamSize(); + buffer_size_t argLoadSize = 0; + TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; + ParamTableEntry entry; + for(int i=0; i<paramTable->getParamCount();){ + MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad
index for table entry"); + if(entry.notInitialized) { + i++; + continue; + } + if(entry.onStack) + break; + int32_t regNum = entry.regNo; + if(i != 0) + offset -= entry.size; + switch(regNum){ + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, loadXMM0Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm1: + COPY_TEMPLATE(buffer, loadXMM1Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm2: + COPY_TEMPLATE(buffer, loadXMM2Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm3: + COPY_TEMPLATE(buffer, loadXMM3Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm4: + COPY_TEMPLATE(buffer, loadXMM4Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm5: + COPY_TEMPLATE(buffer, loadXMM5Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm6: + COPY_TEMPLATE(buffer, loadXMM6Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm7: + COPY_TEMPLATE(buffer, loadXMM7Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::eax: + COPY_TEMPLATE(buffer, loadRAXOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::esi: + COPY_TEMPLATE(buffer, loadRSIOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::edx: + COPY_TEMPLATE(buffer, loadRDXOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::ecx: + COPY_TEMPLATE(buffer, loadRCXOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::NoReg: + break; + } + i += entry.slots; + } + return argLoadSize; +} + +MJIT::CodeGenerator::CodeGenerator( + struct J9JITConfig* config, + J9VMThread* thread, + TR::FilePointer* fp, + TR_J9VMBase& vm, + ParamTable* paramTable, + TR::Compilation* comp, + MJIT::CodeGenGC* mjitCGGC, + TR::PersistentInfo* persistentInfo, + TR_Memory *trMemory, + TR_ResolvedMethod *compilee) + :_linkage(Linkage(comp->cg())) + ,_logFileFP(fp) + ,_vm(vm) + ,_codeCache(NULL) + ,_stackPeakSize(0) + ,_paramTable(paramTable) + ,_comp(comp) + ,_mjitCGGC(mjitCGGC) + ,_atlas(NULL) + ,_persistentInfo(persistentInfo) + ,_trMemory(trMemory) + ,_compilee(compilee) + ,_maxCalleeArgsSize(0) +{ + _linkage._properties.setOutgoingArgAlignment(lcm(16, _vm.getLocalObjectAlignmentInBytes())); +} + +TR::GCStackAtlas* +MJIT::CodeGenerator::getStackAtlas() +{ + return _atlas; +} + +buffer_size_t +MJIT::CodeGenerator::generateSwitchToInterpPrePrologue( + char *buffer, + J9Method *method, + buffer_size_t boundary, + buffer_size_t margin, + char *typeString, + U_16 maxLength +){ + + uintptr_t startOfMethod = (uintptr_t)buffer; + buffer_size_t switchSize = 0; + + // Store the J9Method address in edi and store the label of this jitted method. 
+ uintptr_t j2iTransitionPtr = getHelperOrTrampolineAddress(TR_j2iTransition, (uintptr_t)buffer); + uintptr_t methodPtr = (uintptr_t)method; + COPY_TEMPLATE(buffer, movRDIImm64, switchSize); + patchImm8(buffer, methodPtr); + + //if (comp->getOption(TR_EnableHCR)) + // comp->getStaticHCRPICSites()->push_front(prev); + + // Pass args on stack, expects the method to run in RDI + buffer_size_t saveArgSize = saveArgsOnStack(buffer, 0, _paramTable); + buffer += saveArgSize; + switchSize += saveArgSize; + + // Generate jump to the TR_j2iTransition function + COPY_TEMPLATE(buffer, jump4ByteRel, switchSize); + PATCH_RELATIVE_ADDR_32_BIT(buffer, j2iTransitionPtr); + + margin += 2; + uintptr_t requiredAlignment = 0; + if(getRequiredAlignment((uintptr_t)buffer, boundary, margin, &requiredAlignment)){ + buffer = (char*)startOfMethod; + return 0; + } + for(U_8 i = 0; i<=requiredAlignment; i++){ + COPY_TEMPLATE(buffer, nopInstruction, switchSize); + } + + //Generate relative jump to start + COPY_TEMPLATE(buffer, jumpByteRel, switchSize); + PATCH_RELATIVE_ADDR_8_BIT(buffer, startOfMethod); + + return switchSize; +} + +buffer_size_t +MJIT::CodeGenerator::generatePrePrologue( + char* buffer, + J9Method* method, + char** magicWordLocation, + char** first2BytesPatchLocation, + char** samplingRecompileCallLocation +){ + // Return size (in bytes) of pre-prologue on success + + buffer_size_t preprologueSize = 0; + int alignmentMargin = 0; + int alignmentBoundary = 8; + uintptr_t endOfPreviousMethod = (uintptr_t)buffer; + + J9ROMClass *romClass = J9_CLASS_FROM_METHOD(method)->romClass; + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + // Not sure whether maxLength needs to be changed here for non-static methods or not + char typeString[maxLength]; + if(nativeSignature(method, typeString) //If this had no signature, we can't compile it. + || !_comp->getRecompilationInfo() //If this has no recompilation info we won't be able to recompile, so let TR compile it later. + ){ + return 0; + } + + // Save area for the first two bytes of the method + jitted body info pointer + linkageinfo in margin. + alignmentMargin += MJIT_SAVE_AREA_SIZE + MJIT_JITTED_BODY_INFO_PTR_SIZE + MJIT_LINKAGE_INFO_SIZE; + + + // Make sure the startPC is at least 4-byte aligned. This is important, since the VM + // depends on the alignment (it uses the low order bits as tag bits). + // + // TODO: `mustGenerateSwitchToInterpreterPrePrologue` checks that the code generator's compilation is not: + // `usesPreexistence`, + // `getOption(TR_EnableHCR)`, + // `!fej9()->isAsyncCompilation`, + // `getOption(TR_FullSpeedDebug)`, + // + if(GENERATE_SWITCH_TO_INTERP_PREPROLOGUE){ // TODO: Replace with a check that performs the above checks.
+ //generateSwitchToInterpPrePrologue will align data for use + char *old_buff = buffer; + preprologueSize += generateSwitchToInterpPrePrologue(buffer, method, alignmentBoundary, alignmentMargin, typeString, maxLength); + buffer += preprologueSize; +#ifdef MJIT_DEBUG + trfprintf(_logFileFP, "\ngenerateSwitchToInterpPrePrologue: %d\n", preprologueSize); + for(int32_t i = 0; i < preprologueSize; i++){ + trfprintf(_logFileFP, "%02x\n", ((unsigned char)old_buff[i]) & (unsigned char)0xff); + } +#endif + } else { + uintptr_t requiredAlignment = 0; + if(getRequiredAlignment((uintptr_t)buffer, alignmentBoundary, alignmentMargin, &requiredAlignment)){ + buffer = (char*)endOfPreviousMethod; + return 0; + } + for(U_8 i = 0; igetRecompilationInfo()->getJittedBodyInfo(); + bodyInfo->setUsesGCR(); + bodyInfo->setIsMJITCompiledMethod(true); + COPY_IMM8_TEMPLATE(buffer, preprologueSize, bodyInfo); + + + // Allow 4 bytes for private linkage return type info. Allocate the 4 bytes + // even if the linkage is not private, so that all the offsets are + // predictable. + uint32_t magicWord = 0x00000010; // All MicroJIT compilations are set to recompile with GCR. + switch(typeString[0]){ + case MJIT::VOID_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_VoidReturn); + break; + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::BOOLEAN_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_IntReturn); + break; + case MJIT::LONG_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_LongReturn); + break; + case MJIT::CLASSNAME_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_ObjectReturn); + break; + case MJIT::FLOAT_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_FloatXMMReturn); + break; + case MJIT::DOUBLE_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_DoubleXMMReturn); + break; + } + *magicWordLocation = buffer; + return preprologueSize; +} + +/** + * Write the prologue to the buffer and return the size written to the buffer. + */ +buffer_size_t +MJIT::CodeGenerator::generatePrologue( + char *buffer, + J9Method *method, + char **jitStackOverflowJumpPatchLocation, + char *magicWordLocation, + char *first2BytesPatchLocation, + char *samplingRecompileCallLocation, + char **firstInstLocation, + MJIT::ByteCodeIterator *bci +){ + buffer_size_t prologueSize = 0; + uintptr_t prologueStart = (uintptr_t)buffer; + + J9ROMClass *romClass = J9_CLASS_FROM_METHOD(method)->romClass; + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + // Not sure whether maxLength needs to be changed here for non-static methods or not + char typeString[maxLength]; + if(nativeSignature(method, typeString)){ + return 0; + } + uintptr_t startPC = (uintptr_t)buffer; + //Same order as saving (See generateSwitchToInterpPrePrologue) + buffer_size_t loadArgSize = loadArgsFromStack(buffer, 0, _paramTable); + buffer += loadArgSize; + prologueSize += loadArgSize; + + uint32_t magicWord = *(magicWordLocation-4); //We save the end of the location for patching, not the start. 
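+ // The word read back here already holds the GCR flag and the return-type
+ // info written in generatePrePrologue; bits 16-31 are now filled in with the
+ // number of bytes emitted by loadArgsFromStack above (apparently the offset
+ // between the interpreter entry point and the jitted-caller entry point).
+ // For example, if the reload sequence were 0x18 bytes, the patched word
+ // would be (0x18 << 16) | 0x00000010 | <return-type flag>.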
+ magicWord |= (((uintptr_t)buffer) - prologueStart) << 16; + patchImm4(magicWordLocation, magicWord); + + //check for stack overflow + *firstInstLocation = buffer; + COPY_TEMPLATE(buffer, cmpRspRbpDerefOffset, prologueSize); + patchImm4(buffer, (U_32)(0x50)); + patchImm2(first2BytesPatchLocation, *(uint16_t*)(*firstInstLocation)); + COPY_TEMPLATE(buffer, jbe4ByteRel, prologueSize); + *jitStackOverflowJumpPatchLocation = buffer; + + // Compute frame size + // + // allocSize: bytes to be subtracted from the stack pointer when allocating the frame + // peakSize: maximum bytes of stack this method might consume before encountering another stack check + // + struct TR::X86LinkageProperties properties = _linkage._properties; + //Max amount of data to be preserved. It is overkill, but as long as the prologue/epilogue save/load everything + // it is a workable solution for now. Later try to determine what registers need to be preserved ahead of time. + const int32_t pointerSize = properties.getPointerSize(); + U_32 preservedRegsSize = properties._numPreservedRegisters * pointerSize; + const int32_t localSize = (romMethod->argCount + romMethod->tempCount)*pointerSize; + MJIT_ASSERT(_logFileFP, localSize >= 0, "assertion failure"); + int32_t frameSize = localSize + preservedRegsSize + ( properties.getReservesOutgoingArgsInPrologue() ? pointerSize : 0 ); + uint32_t stackSize = frameSize + properties.getRetAddressWidth(); + uint32_t adjust = align(stackSize, properties.getOutgoingArgAlignment(), _logFileFP) - stackSize; + auto allocSize = frameSize + adjust; + + const int32_t mjitRegisterSize = (6 * pointerSize); + + // return address is allocated by call instruction + // Here we conservatively assume there is a call + _stackPeakSize = + allocSize + //space for stack frame + mjitRegisterSize +//saved registers, conservatively expecting a call + _stackPeakSize; //space for mjit value stack (set in CompilationInfoPerThreadBase::mjit) + + // Small: entire stack usage fits in STACKCHECKBUFFER, so if sp is within + // the soft limit before buying the frame, then the whole frame will fit + // within the hard limit. + // + // Medium: the additional stack required after bumping the sp fits in + // STACKCHECKBUFFER, so if sp after the bump is within the soft limit, the + // whole frame will fit within the hard limit. + // + // Large: No shortcuts. Calculate the maximum extent of stack needed and + // compare that against the soft limit. (We have to use the soft limit here + // if for no other reason than that's the one used for asyncchecks.) + // + + COPY_TEMPLATE(buffer, subRSPImm4, prologueSize); + patchImm4(buffer, _stackPeakSize); + + buffer_size_t savePreserveSize = savePreserveRegisters(buffer, _stackPeakSize-8); + buffer += savePreserveSize; + prologueSize += savePreserveSize; + + buffer_size_t saveArgSize = saveArgsOnStack(buffer, _stackPeakSize, _paramTable); + buffer += saveArgSize; + prologueSize += saveArgSize; + + /* + * MJIT stack frames must conform to TR stack frame shapes. However, so long as we are able to have our frames walked, we should be + * free to allocate more things on the stack before the required data which is used for walking. The following is the general shape of the + * initial MJIT stack frame + * +-----------------------------+ + * | ... | + * | Paramaters passed by caller | <-- The space for saving params passed in registers is also here + * | ... 
| + * +-----------------------------+ + * | Return Address | <-- RSP points here before we sub the _stackPeakSize + * +-----------------------------+ + * | Alignment space | + * +-----------------------------+ <-- Caller/Callee stack boundary. + * | ... | + * | Callee Preserved Registers | + * | ... | + * +-----------------------------+ + * | ... | + * | Local Variables | + * | ... | <-- R10 points here at the end of the prologue, R14 always points here. + * +-----------------------------+ + * | ... | + * | MJIT Computation Stack | + * | ... | + * +-----------------------------+ + * | ... | + * | Caller Preserved Registers | + * | ... | <-- RSP points here after we sub _stackPeakSize the first time + * +-----------------------------+ + * | ... | + * | Paramaters passed to callee | + * | ... | <-- RSP points here after we sub _stackPeakSize the second time + * +-----------------------------+ + * | Return Address | <-- RSP points here after a callq instruction + * +-----------------------------+ <-- End of stack frame + */ + + //Set up MJIT value stack + COPY_TEMPLATE(buffer, movRSPR10, prologueSize); + COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); + patchImm4(buffer, (U_32)(mjitRegisterSize)); + COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); + patchImm4(buffer, (U_32)(romMethod->maxStack*8)); + //TODO: Find a way to cleanly preserve this value before overriding it in the _stackAllocSize + + //Set up MJIT local array + COPY_TEMPLATE(buffer, movR10R14, prologueSize); + + // Move parameters to where the method body will expect to find them + buffer_size_t saveSize = saveArgsInLocalArray(buffer, _stackPeakSize, typeString); + if((int32_t)saveSize == -1) + return 0; + buffer += saveSize; + prologueSize += saveSize; + if (!saveSize && romMethod->argCount) return 0; + + int32_t firstLocalOffset = (preservedRegsSize+8); + // This is the number of slots, not variables. + U_16 localVariableCount = romMethod->argCount + romMethod->tempCount; + MJIT::LocalTableEntry localTableEntries[localVariableCount]; + for(int i=0; icg()->setStackFramePaddingSizeInBytes(adjust); + _comp->cg()->setFrameSizeInBytes(_stackPeakSize); + + //The Parameters are now in the local array, so we update the entries in the ParamTable with their values from the LocalTable + ParamTableEntry entry; + for(int i=0; i<_paramTable->getParamCount(); i += ((entry.notInitialized) ? 1 : entry.slots)){ + //The indexes should be the same in both tables + // because (as far as we know) the parameters + // are indexed the same as locals and parameters + localTable->getEntry(i, &entry); + _paramTable->setEntry(i, &entry); + } + + TR::GCStackAtlas *atlas = _mjitCGGC->createStackAtlas(_comp, _paramTable, localTable); + if (!atlas) return 0; + _atlas = atlas; + + //Support to paint allocated frame slots. + if (( _comp->getOption(TR_PaintAllocatedFrameSlotsDead) || _comp->getOption(TR_PaintAllocatedFrameSlotsFauxObject) ) && allocSize!=0) { + uint64_t paintValue = 0; + + //Paint the slots with deadf00d + if (_comp->getOption(TR_PaintAllocatedFrameSlotsDead)) { + paintValue = (uint64_t)CONSTANT64(0xdeadf00ddeadf00d); + } else { //Paint stack slots with a arbitrary object aligned address. 
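+ // (heap base + 4096 is used presumably so that a painted slot, if read by
+ // mistake, at least looks like an aligned address inside the heap rather
+ // than an arbitrary bit pattern.)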
+ paintValue = ((uintptr_t) ((uintptr_t)_comp->getOptions()->getHeapBase() + (uintptr_t) 4096)); + } + + COPY_TEMPLATE(buffer, paintRegister, prologueSize); + patchImm8(buffer, paintValue); + for(int32_t i=_paramTable->getParamCount(); igetLocalCount();){ + if(entry.notInitialized){ + i++; + continue; + } + localTable->getEntry(i, &entry); + COPY_TEMPLATE(buffer, paintLocal, prologueSize); + patchImm4(buffer, i * pointerSize); + i += entry.slots; + } + } + uint32_t count = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; + prologueSize += generateGCR(buffer, count, method, startPC); + + return prologueSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateEpologue(char *buffer){ + buffer_size_t epologueSize = loadPreserveRegisters(buffer, _stackPeakSize-8); + buffer += epologueSize; + COPY_TEMPLATE(buffer, addRSPImm4, epologueSize); + patchImm4(buffer, _stackPeakSize - _maxCalleeArgsSize); + return epologueSize; +} + +// Macros to clean up the switch case for generate body +#define loadCasePrologue(byteCodeCaseType)\ +case TR_J9ByteCode::byteCodeCaseType +#define loadCaseBody(byteCodeCaseType)\ + goto GenericLoadCall +#define storeCasePrologue(byteCodeCaseType)\ +case TR_J9ByteCode::byteCodeCaseType +#define storeCaseBody(byteCodeCaseType)\ + goto GenericStoreCall + +// used for each conditional and jump except gotow (wide) +#define branchByteCode(byteCode, template)\ +case byteCode: {\ + _targetIndex = bci->next2BytesSigned() + bci->currentByteCodeIndex();\ + COPY_TEMPLATE(buffer, template, codeGenSize);\ + MJIT::JumpTableEntry _entry(_targetIndex, buffer);\ + _jumpTable[_targetCounter] = _entry;\ + _targetCounter++;\ + break;\ + } +buffer_size_t +MJIT::CodeGenerator::generateBody(char *buffer, MJIT::ByteCodeIterator *bci){ + buffer_size_t codeGenSize = 0; + buffer_size_t calledCGSize = 0; + int32_t _currentByteCodeIndex = 0; + int32_t _targetIndex = 0; + int32_t _targetCounter = 0; + +#ifdef MJIT_DEBUG_BC_WALKING + std::string _bcMnemonic, _bcText; + u_int8_t _bcOpcode; +#endif + + uintptr_t SSEfloatRemainder; + uintptr_t SSEdoubleRemainder; + + // Map between ByteCode Index to generated JITed code address + char* _byteCodeIndexToJitTable[bci->maxByteCodeIndex()]; + + // Each time we encounter a branch or goto + // add an entry. After codegen completes use + // these entries to patch addresses using _byteCodeIndexToJitTable. 
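+ // For example, an ifeq at bytecode index 4 that targets index 14 records
+ // {byteCodeIndex = 14, codeCacheAddress = <patch site after the jcc>} here
+ // on the first pass; the fix-up loop after the main switch then rewrites the
+ // branch's rel32 using _byteCodeIndexToJitTable[14].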
+ MJIT::JumpTableEntry _jumpTable[bci->maxByteCodeIndex()]; + + +#ifdef MJIT_DEBUG_BC_WALKING + MJIT_DEBUG_BC_LOG(_logFileFP, "\nMicroJIT Bytecode Dump:\n"); + bci->printByteCodes(); +#endif + + struct TR::X86LinkageProperties *properties = &_linkage._properties; + TR::RealRegister::RegNum returnRegIndex; + U_16 slotCountRem = 0; + bool isDouble = false; + for(TR_J9ByteCode bc = bci->first(); bc != J9BCunknown; bc = bci->next()) + { + +#ifdef MJIT_DEBUG_BC_WALKING + // In cases of conjunction of two bytecodes, the _bcMnemonic needs to be replaced by correct bytecodes, + // i.e., JBaload0getfield needs to be replaced by JBaload0 + // and JBnewdup by JBnew + if(!strcmp(bci->currentMnemonic(), "JBaload0getfield")) + { + const char * newMnemonic = "JBaload0"; + _bcMnemonic = std::string(newMnemonic); + } + else if(!strcmp(bci->currentMnemonic(), "JBnewdup")) + { + const char * newMnemonic = "JBnew"; + _bcMnemonic = std::string(newMnemonic); + } + else + { + _bcMnemonic = std::string(bci->currentMnemonic()); + } + _bcOpcode = bci->currentOpcode(); + _bcText = _bcMnemonic + "\n"; + MJIT_DEBUG_BC_LOG(_logFileFP, _bcText.c_str()); +#endif + + _currentByteCodeIndex = bci->currentByteCodeIndex(); + + // record BCI to JIT address + _byteCodeIndexToJitTable[_currentByteCodeIndex] = buffer; + + switch(bc) { + loadCasePrologue(J9BCiload0): + loadCaseBody(J9BCiload0); + loadCasePrologue(J9BCiload1): + loadCaseBody(J9BCiload1); + loadCasePrologue(J9BCiload2): + loadCaseBody(J9BCiload2); + loadCasePrologue(J9BCiload3): + loadCaseBody(J9BCiload3); + loadCasePrologue(J9BCiload): + loadCaseBody(J9BCiload); + loadCasePrologue(J9BClload0): + loadCaseBody(J9BClload0); + loadCasePrologue(J9BClload1): + loadCaseBody(J9BClload1); + loadCasePrologue(J9BClload2): + loadCaseBody(J9BClload2); + loadCasePrologue(J9BClload3): + loadCaseBody(J9BClload3); + loadCasePrologue(J9BClload): + loadCaseBody(J9BClload); + loadCasePrologue(J9BCfload0): + loadCaseBody(J9BCfload0); + loadCasePrologue(J9BCfload1): + loadCaseBody(J9BCfload1); + loadCasePrologue(J9BCfload2): + loadCaseBody(J9BCfload2); + loadCasePrologue(J9BCfload3): + loadCaseBody(J9BCfload3); + loadCasePrologue(J9BCfload): + loadCaseBody(J9BCfload); + loadCasePrologue(J9BCdload0): + loadCaseBody(J9BCdload0); + loadCasePrologue(J9BCdload1): + loadCaseBody(J9BCdload1); + loadCasePrologue(J9BCdload2): + loadCaseBody(J9BCdload2); + loadCasePrologue(J9BCdload3): + loadCaseBody(J9BCdload3); + loadCasePrologue(J9BCdload): + loadCaseBody(J9BCdload); + loadCasePrologue(J9BCaload0): + loadCaseBody(J9BCaload0); + loadCasePrologue(J9BCaload1): + loadCaseBody(J9BCaload1); + loadCasePrologue(J9BCaload2): + loadCaseBody(J9BCaload2); + loadCasePrologue(J9BCaload3): + loadCaseBody(J9BCaload3); + loadCasePrologue(J9BCaload): + loadCaseBody(J9BCaload); + GenericLoadCall: + if(calledCGSize = generateLoad(buffer, bc, bci)){ + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported load bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + break; + storeCasePrologue(J9BCistore0): + storeCaseBody(J9BCistore0); + storeCasePrologue(J9BCistore1): + storeCaseBody(J9BCistore1); + storeCasePrologue(J9BCistore2): + storeCaseBody(J9BCistore2); + storeCasePrologue(J9BCistore3): + storeCaseBody(J9BCistore3); + storeCasePrologue(J9BCistore): + storeCaseBody(J9BCistore); + storeCasePrologue(J9BClstore0): + storeCaseBody(J9BClstore0); + storeCasePrologue(J9BClstore1): + storeCaseBody(J9BClstore1); + 
storeCasePrologue(J9BClstore2): + storeCaseBody(J9BClstore2); + storeCasePrologue(J9BClstore3): + storeCaseBody(J9BClstore3); + storeCasePrologue(J9BClstore): + storeCaseBody(J9BClstore); + storeCasePrologue(J9BCfstore0): + storeCaseBody(J9BCfstore0); + storeCasePrologue(J9BCfstore1): + storeCaseBody(J9BCfstore1); + storeCasePrologue(J9BCfstore2): + storeCaseBody(J9BCfstore2); + storeCasePrologue(J9BCfstore3): + storeCaseBody(J9BCfstore3); + storeCasePrologue(J9BCfstore): + storeCaseBody(J9BCfstore); + storeCasePrologue(J9BCdstore0): + storeCaseBody(J9BCdstore0); + storeCasePrologue(J9BCdstore1): + storeCaseBody(J9BCdstore1); + storeCasePrologue(J9BCdstore2): + storeCaseBody(J9BCdstore2); + storeCasePrologue(J9BCdstore3): + storeCaseBody(J9BCdstore3); + storeCasePrologue(J9BCdstore): + storeCaseBody(J9BCdstore); + storeCasePrologue(J9BCastore0): + storeCaseBody(J9BCastore0); + storeCasePrologue(J9BCastore1): + storeCaseBody(J9BCastore1); + storeCasePrologue(J9BCastore2): + storeCaseBody(J9BCastore2); + storeCasePrologue(J9BCastore3): + storeCaseBody(J9BCastore3); + storeCasePrologue(J9BCastore): + storeCaseBody(J9BCastore); + GenericStoreCall: + if(calledCGSize = generateStore(buffer, bc, bci)){ + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported store bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + break; + branchByteCode(TR_J9ByteCode::J9BCgoto,gotoTemplate); + branchByteCode(TR_J9ByteCode::J9BCifne,ifneTemplate); + branchByteCode(TR_J9ByteCode::J9BCifeq,ifeqTemplate); + branchByteCode(TR_J9ByteCode::J9BCiflt,ifltTemplate); + branchByteCode(TR_J9ByteCode::J9BCifle,ifleTemplate); + branchByteCode(TR_J9ByteCode::J9BCifgt,ifgtTemplate); + branchByteCode(TR_J9ByteCode::J9BCifge,ifgeTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpeq,ificmpeqTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpne,ificmpneTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmplt,ificmpltTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmple,ificmpleTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpge,ificmpgeTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpgt,ificmpgtTemplate); + branchByteCode(TR_J9ByteCode::J9BCifacmpeq,ifacmpeqTemplate); + branchByteCode(TR_J9ByteCode::J9BCifacmpne,ifacmpneTemplate); + branchByteCode(TR_J9ByteCode::J9BCifnull,ifnullTemplate); + branchByteCode(TR_J9ByteCode::J9BCifnonnull,ifnonnullTemplate); + /* Commenting out as this is currently untested + case TR_J9ByteCode::J9BCgotow : + { + _targetIndex = bci->next4BytesSigned() + bci->currentByteCodeIndex(); //Copied because of 4 byte jump + COPY_TEMPLATE(buffer, gotoTemplate, codeGenSize); + MJIT::JumpTableEntry _entry(_targetIndex, buffer); + _jumpTable[_targetCounter] = _entry; + _targetCounter++; + } + break; + */ + case TR_J9ByteCode::J9BClcmp: + COPY_TEMPLATE(buffer, lcmpTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfcmpl: + COPY_TEMPLATE(buffer, fcmplTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfcmpg: + COPY_TEMPLATE(buffer, fcmpgTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdcmpl: + COPY_TEMPLATE(buffer, dcmplTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdcmpg: + COPY_TEMPLATE(buffer, dcmpgTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCpop: + COPY_TEMPLATE(buffer, popTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCpop2: + COPY_TEMPLATE(buffer, pop2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCswap: + 
COPY_TEMPLATE(buffer, swapTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup: + COPY_TEMPLATE(buffer, dupTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdupx1: + COPY_TEMPLATE(buffer, dupx1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdupx2: + COPY_TEMPLATE(buffer, dupx2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup2: + COPY_TEMPLATE(buffer, dup2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup2x1: + COPY_TEMPLATE(buffer, dup2x1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup2x2: + COPY_TEMPLATE(buffer, dup2x2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCgetfield: + if(calledCGSize = generateGetField(buffer, bci)) { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported getField bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + break; + case TR_J9ByteCode::J9BCputfield: + return 0; + if(calledCGSize = generatePutField(buffer, bci)) { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported putField bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + break; + case TR_J9ByteCode::J9BCgetstatic: + if(calledCGSize = generateGetStatic(buffer, bci)){ + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported getstatic bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + break; + case TR_J9ByteCode::J9BCputstatic: + return 0; + if(calledCGSize = generatePutStatic(buffer, bci)){ + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported putstatic bytecode - %s %d\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + break; + case TR_J9ByteCode::J9BCiadd: + COPY_TEMPLATE(buffer, iAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCisub: + COPY_TEMPLATE(buffer, iSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCimul: + COPY_TEMPLATE(buffer, iMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCidiv: + COPY_TEMPLATE(buffer, iDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCirem: + COPY_TEMPLATE(buffer, iRemTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCineg: + COPY_TEMPLATE(buffer, iNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCishl: + COPY_TEMPLATE(buffer, iShlTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCishr: + COPY_TEMPLATE(buffer, iShrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiushr: + COPY_TEMPLATE(buffer, iUshrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiand: + COPY_TEMPLATE(buffer, iAndTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCior: + COPY_TEMPLATE(buffer, iOrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCixor: + COPY_TEMPLATE(buffer, iXorTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2l: + COPY_TEMPLATE(buffer, i2lTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCl2i: + COPY_TEMPLATE(buffer, l2iTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2b: + COPY_TEMPLATE(buffer, i2bTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2s: + COPY_TEMPLATE(buffer, i2sTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2c: + COPY_TEMPLATE(buffer, i2cTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2d: + 
COPY_TEMPLATE(buffer, i2dTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCl2d: + COPY_TEMPLATE(buffer, l2dTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCd2i: + COPY_TEMPLATE(buffer, d2iTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCd2l: + COPY_TEMPLATE(buffer, d2lTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconstm1: + COPY_TEMPLATE(buffer, iconstm1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst0: + COPY_TEMPLATE(buffer, iconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst1: + COPY_TEMPLATE(buffer, iconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst2: + COPY_TEMPLATE(buffer, iconst2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst3: + COPY_TEMPLATE(buffer, iconst3Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst4: + COPY_TEMPLATE(buffer, iconst4Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst5: + COPY_TEMPLATE(buffer, iconst5Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCbipush: + COPY_TEMPLATE(buffer, bipushTemplate, codeGenSize); + patchImm4(buffer, (U_32)bci->nextByteSigned()); + break; + case TR_J9ByteCode::J9BCsipush: + COPY_TEMPLATE(buffer, sipushTemplatePrologue, codeGenSize); + patchImm2(buffer, (U_32)bci->next2Bytes()); + COPY_TEMPLATE(buffer, sipushTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiinc: { + uint8_t index = bci->nextByte(); + int8_t value = bci->nextByteSigned(2); + COPY_TEMPLATE(buffer, iIncTemplate_01_load, codeGenSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, iIncTemplate_02_add, codeGenSize); + patchImm1(buffer, value); + COPY_TEMPLATE(buffer, iIncTemplate_03_store, codeGenSize); + patchImm4(buffer, (U_32)(index*8)); + break; } + case TR_J9ByteCode::J9BCladd: + COPY_TEMPLATE(buffer, lAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClsub: + COPY_TEMPLATE(buffer, lSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClmul: + COPY_TEMPLATE(buffer, lMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCldiv: + COPY_TEMPLATE(buffer, lDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClrem: + COPY_TEMPLATE(buffer, lRemTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClneg: + COPY_TEMPLATE(buffer, lNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClshl: + COPY_TEMPLATE(buffer, lShlTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClshr: + COPY_TEMPLATE(buffer, lShrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClushr: + COPY_TEMPLATE(buffer, lUshrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCland: + COPY_TEMPLATE(buffer, lAndTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClor: + COPY_TEMPLATE(buffer, lOrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClxor: + COPY_TEMPLATE(buffer, lXorTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClconst0: + COPY_TEMPLATE(buffer, lconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BClconst1: + COPY_TEMPLATE(buffer, lconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCfadd: + COPY_TEMPLATE(buffer, fAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfsub: + COPY_TEMPLATE(buffer, fSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfmul: + COPY_TEMPLATE(buffer, fMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfdiv: + COPY_TEMPLATE(buffer, fDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfrem: + COPY_TEMPLATE(buffer, fRemTemplate, codeGenSize); + 
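+ // frem (and drem below) call the TR_AMD64floatRemainder/doubleRemainder
+ // runtime helpers instead of computing the remainder inline, since SSE
+ // provides no instruction for the Java-specified remainder; the shared
+ // genRem tail then moves the XMM0 result back onto the MJIT value stack.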
SSEfloatRemainder = getHelperOrTrampolineAddress(TR_AMD64floatRemainder, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, (intptr_t)SSEfloatRemainder); + slotCountRem = 1; + goto genRem; + case TR_J9ByteCode::J9BCdrem: + COPY_TEMPLATE(buffer, dRemTemplate, codeGenSize); + SSEdoubleRemainder = getHelperOrTrampolineAddress(TR_AMD64doubleRemainder, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, (intptr_t)SSEdoubleRemainder); + slotCountRem = 2; + isDouble = true; + goto genRem; + genRem: + returnRegIndex = properties->getFloatReturnRegister(); + COPY_TEMPLATE(buffer, retTemplate_sub, codeGenSize); + patchImm1(buffer, slotCountRem*8); + switch(returnRegIndex){ + case TR::RealRegister::xmm0: + if(isDouble) { + COPY_TEMPLATE(buffer, loadDxmm0Return, codeGenSize); + } else { + COPY_TEMPLATE(buffer, loadxmm0Return, codeGenSize); + } + break; + } + break; + case TR_J9ByteCode::J9BCfneg: + COPY_TEMPLATE(buffer, fNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfconst0: + COPY_TEMPLATE(buffer, fconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCfconst1: + COPY_TEMPLATE(buffer, fconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCfconst2: + COPY_TEMPLATE(buffer, fconst2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdconst0: + COPY_TEMPLATE(buffer, dconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdconst1: + COPY_TEMPLATE(buffer, dconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdadd: + COPY_TEMPLATE(buffer, dAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdsub: + COPY_TEMPLATE(buffer, dSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdmul: + COPY_TEMPLATE(buffer, dMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCddiv: + COPY_TEMPLATE(buffer, dDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdneg: + COPY_TEMPLATE(buffer, dNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2f: + COPY_TEMPLATE(buffer, i2fTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCf2i: + COPY_TEMPLATE(buffer, f2iTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCl2f: + COPY_TEMPLATE(buffer, l2fTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCf2l: + COPY_TEMPLATE(buffer, f2lTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCd2f: + COPY_TEMPLATE(buffer, d2fTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCf2d: + COPY_TEMPLATE(buffer, f2dTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCReturnB: // fallthrough + case TR_J9ByteCode::J9BCReturnC: // fallthrough + case TR_J9ByteCode::J9BCReturnS: // fallthrough + case TR_J9ByteCode::J9BCReturnZ: // fallthrough + case TR_J9ByteCode::J9BCgenericReturn: + if(calledCGSize = generateReturn(buffer, _compilee->returnType(), properties)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else if(_compilee->returnType() == TR::Int64) { + buffer_size_t epilogueSize = generateEpologue(buffer); + buffer += epilogueSize; + codeGenSize += epilogueSize; + COPY_TEMPLATE(buffer, eaxReturnTemplate, codeGenSize); + COPY_TEMPLATE(buffer, retTemplate_add, codeGenSize); + patchImm1(buffer, 8); + COPY_TEMPLATE(buffer, vReturnTemplate, codeGenSize); + } else { +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unknown Return type: %d\n", _compilee->returnType()); +#endif + return 0; + } + break; + case TR_J9ByteCode::J9BCinvokestatic: + if(calledCGSize = generateInvokeStatic(buffer, bci, properties)) { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } else { +#ifdef 
MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "Unsupported invokestatic bytecode %d\n", bc); +#endif + return 0; + } + break; + default: +#ifdef MJIT_DEBUG_BC_WALKING + trfprintf(_logFileFP, "no match for bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); +#endif + return 0; + } + } + + // patch branches and goto addresses + for(int i = 0; i < _targetCounter; i++) { + MJIT::JumpTableEntry e = _jumpTable[i]; + char * _target = _byteCodeIndexToJitTable[e.byteCodeIndex]; + PATCH_RELATIVE_ADDR_32_BIT(e.codeCacheAddress, (uintptr_t)_target); + } + + return codeGenSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateEdgeCounter(char *buffer, TR_J9ByteCodeIterator *bci) +{ + buffer_size_t returnSize = 0; + // TODO: Get the pointer to the profiling counter + uint32_t* profilingCounterLocation = NULL; + + COPY_TEMPLATE(buffer, loadCounter, returnSize); + patchImm8(buffer, (uint64_t)profilingCounterLocation); + COPY_TEMPLATE(buffer, incrementCounter, returnSize); + patchImm8(buffer, (uint64_t)profilingCounterLocation); + + return returnSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateReturn(char *buffer, TR::DataType dt, struct TR::X86LinkageProperties *properties) +{ + buffer_size_t returnSize = 0; + buffer_size_t calledCGSize = 0; + TR::RealRegister::RegNum returnRegIndex; + U_16 slotCount = 0; + bool isAddressOrLong = false; + switch(dt){ + case TR::Address: + slotCount = 1; + isAddressOrLong = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genReturn; + case TR::Int64: + slotCount = 2; + isAddressOrLong = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genReturn; + case TR::Int8: // fallthrough + case TR::Int16: // fallthrough + case TR::Int32: // fallthrough + slotCount = 1; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genReturn; + case TR::Float: + slotCount = 1; + returnRegIndex = properties->getFloatReturnRegister(); + goto genReturn; + case TR::Double: + slotCount = 2; + returnRegIndex = properties->getFloatReturnRegister(); + goto genReturn; + case TR::NoType: + returnRegIndex = TR::RealRegister::NoReg; + goto genReturn; + genReturn: + calledCGSize = generateEpologue(buffer); + buffer += calledCGSize; + returnSize += calledCGSize; + switch(returnRegIndex){ + case TR::RealRegister::eax: + if(isAddressOrLong) { + COPY_TEMPLATE(buffer, raxReturnTemplate, returnSize); + } else { + COPY_TEMPLATE(buffer, eaxReturnTemplate, returnSize); + } + goto genSlots; + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, xmm0ReturnTemplate, returnSize); + goto genSlots; + case TR::RealRegister::NoReg: + goto genSlots; + genSlots: + if(slotCount){ + COPY_TEMPLATE(buffer, retTemplate_add, returnSize); + patchImm1(buffer, slotCount*8); + } + COPY_TEMPLATE(buffer, vReturnTemplate, returnSize); + break; + } + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", dt.toString()); + break; + } + return returnSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateLoad(char *buffer, TR_J9ByteCode bc, MJIT::ByteCodeIterator *bci) +{ + char *signature = _compilee->signatureChars(); + int index = -1; + buffer_size_t loadSize = 0; + switch(bc) { + GenerateATemplate: + COPY_TEMPLATE(buffer, aloadTemplatePrologue, loadSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, loadTemplate, loadSize); + break; + GenerateITemplate: + COPY_TEMPLATE(buffer, iloadTemplatePrologue, loadSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, loadTemplate, loadSize); + break; + GenerateLTemplate: 
+ COPY_TEMPLATE(buffer, lloadTemplatePrologue, loadSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, loadTemplate, loadSize); + break; + case TR_J9ByteCode::J9BClload0: + case TR_J9ByteCode::J9BCdload0: + index = 0; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload0: + case TR_J9ByteCode::J9BCiload0: + index = 0; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload0: + index = 0; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload1: + case TR_J9ByteCode::J9BCdload1: + index = 1; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload1: + case TR_J9ByteCode::J9BCiload1: + index = 1; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload1: + index = 1; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload2: + case TR_J9ByteCode::J9BCdload2: + index = 2; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload2: + case TR_J9ByteCode::J9BCiload2: + index = 2; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload2: + index = 2; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload3: + case TR_J9ByteCode::J9BCdload3: + index = 3; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload3: + case TR_J9ByteCode::J9BCiload3: + index = 3; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload3: + index = 3; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload: + case TR_J9ByteCode::J9BCdload: + index = bci->nextByte(); + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload: + case TR_J9ByteCode::J9BCiload: + index = bci->nextByte(); + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload: + index = bci->nextByte(); + goto GenerateATemplate; + default: + return 0; + } + return loadSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateStore(char *buffer, TR_J9ByteCode bc, MJIT::ByteCodeIterator *bci) +{ + char *signature = _compilee->signatureChars(); + int index = -1; + buffer_size_t storeSize = 0; + switch(bc) { + GenerateATemplate: + COPY_TEMPLATE(buffer, astoreTemplate, storeSize); + patchImm4(buffer, (U_32)(index*8)); + break; + GenerateLTemplate: + COPY_TEMPLATE(buffer, lstoreTemplate, storeSize); + patchImm4(buffer, (U_32)(index*8)); + break; + GenerateITemplate: + COPY_TEMPLATE(buffer, istoreTemplate, storeSize); + patchImm4(buffer, (U_32)(index*8)); + break; + case TR_J9ByteCode::J9BClstore0: + case TR_J9ByteCode::J9BCdstore0: + index = 0; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore0: + case TR_J9ByteCode::J9BCistore0: + index = 0; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore0: + index = 0; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore1: + case TR_J9ByteCode::J9BCdstore1: + index = 1; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore1: + case TR_J9ByteCode::J9BCistore1: + index = 1; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore1: + index = 1; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore2: + case TR_J9ByteCode::J9BCdstore2: + index = 2; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore2: + case TR_J9ByteCode::J9BCistore2: + index = 2; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore2: + index = 2; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore3: + case TR_J9ByteCode::J9BCdstore3: + index = 3; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore3: + case TR_J9ByteCode::J9BCistore3: + index = 3; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore3: + index = 3; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore: + case TR_J9ByteCode::J9BCdstore: + index = bci->nextByte(); + goto GenerateLTemplate; 
+ case TR_J9ByteCode::J9BCfstore: + case TR_J9ByteCode::J9BCistore: + index = bci->nextByte(); + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore: + index = bci->nextByte(); + goto GenerateATemplate; + default: + return 0; + } + return storeSize; +} + + +//TODO: see ::emitSnippitCode and redo this to use snippits. +buffer_size_t +MJIT::CodeGenerator::generateArgumentMoveForStaticMethod(char *buffer, TR_ResolvedMethod *staticMethod, char *typeString, U_16 typeStringLength, U_16 slotCount, struct TR::X86LinkageProperties *properties){ + buffer_size_t argumentMoveSize = 0; + + //Pull out parameters and find their place for calling + U_16 paramCount = MJIT::getParamCount(typeString, typeStringLength); + U_16 actualParamCount = MJIT::getActualParamCount(typeString, typeStringLength); + //TODO: Implement passing arguments on stack when surpassing register count. + if(actualParamCount > 4){ + return 0; + } + + if(!paramCount){ + return -1; + } + MJIT::ParamTableEntry paramTableEntries[paramCount]; + for(int i=0; i0){ + if(paramTableEntries[lastIndex].notInitialized) { + lastIndex--; + } else { + break; + } + } + bool isLongOrDouble; + bool isReference; + TR::RealRegister::RegNum argRegNum; + MJIT::ParamTableEntry entry; + while(actualParamCount>0){ + isLongOrDouble = false; + isReference = false; + argRegNum = TR::RealRegister::NoReg; + switch((actualParamCount-1)){ + case 0: // fallthrough + case 1: // fallthrough + case 2: // fallthrough + case 3:{ + switch(typeString[(actualParamCount-1)+3]){ + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::BOOLEAN_TYPE_CHARACTER: + argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::LONG_TYPE_CHARACTER: + isLongOrDouble = true; + argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::CLASSNAME_TYPE_CHARACTER: + isReference = true; + argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::DOUBLE_TYPE_CHARACTER: + isLongOrDouble = true; + argRegNum = properties->getFloatArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::FLOAT_TYPE_CHARACTER: + argRegNum = properties->getFloatArgumentRegister(actualParamCount-1); + goto genArg; + genArg: + switch(argRegNum){ + case TR::RealRegister::eax: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moveraxForCall, argumentMoveSize); + } else if(isReference) { + COPY_TEMPLATE(buffer, moveraxRefForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, moveeaxForCall, argumentMoveSize); + } + break; + case TR::RealRegister::esi: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moversiForCall, argumentMoveSize); + } else if(isReference) { + COPY_TEMPLATE(buffer, moversiRefForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, moveesiForCall, argumentMoveSize); + } + break; + case TR::RealRegister::edx: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moverdxForCall, argumentMoveSize); + } else if(isReference) { + COPY_TEMPLATE(buffer, moverdxRefForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, moveedxForCall, argumentMoveSize); + } + break; + case TR::RealRegister::ecx: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, movercxForCall, argumentMoveSize); + } else if(isReference) { + COPY_TEMPLATE(buffer, movercxRefForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, moveecxForCall, argumentMoveSize); + } + break; + case TR::RealRegister::xmm0: + 
if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moveDxmm0ForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, movexmm0ForCall, argumentMoveSize); + } + break; + case TR::RealRegister::xmm1: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moveDxmm1ForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, movexmm1ForCall, argumentMoveSize); + } + break; + case TR::RealRegister::xmm2: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moveDxmm2ForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, movexmm2ForCall, argumentMoveSize); + } + break; + case TR::RealRegister::xmm3: + if(isLongOrDouble) { + COPY_TEMPLATE(buffer, moveDxmm3ForCall, argumentMoveSize); + } else { + COPY_TEMPLATE(buffer, movexmm3ForCall, argumentMoveSize); + } + break; + } + break; + } + break; + } + default: + //TODO: Add stack argument passing here. + break; + } + actualParamCount--; + } + + argumentMoveSize += saveArgsOnStack(buffer, -8, ¶mTable); + + return argumentMoveSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateInvokeStatic(char *buffer, MJIT::ByteCodeIterator *bci, struct TR::X86LinkageProperties *properties) +{ + buffer_size_t invokeStaticSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + //In the future research project we could attempt runtime resolution. TR does this with self modifying code. + //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. + bool isUnresolvedInCP; + TR_ResolvedMethod* resolved = _compilee->getResolvedStaticMethod(_comp, cpIndex, &isUnresolvedInCP); + //TODO: Split this into generateInvokeStaticJava and generateInvokeStaticNative + //and call the correct one with required args. Do not turn this method into a mess. + if(!resolved || resolved->isNative()) return 0; + + //Get method signature + J9Method *ramMethod = static_cast(resolved)->ramMethod(); + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + char typeString[maxLength]; + if(MJIT::nativeSignature(ramMethod, typeString)){ + return 0; + } + + //TODO: Find the maximum size that will need to be saved + // for calling a method and reserve it in the prologue. + // This could be done during the meta-data gathering phase. 
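+ // The call sequence built below is, in outline: move the arguments from the
+ // MJIT value stack into the linkage registers, spill R10-R15 above the
+ // outgoing-argument area (at slotCount*8 + 0x00 .. 0x28), load the callee's
+ // J9Method* and call the interpreterStaticAndSpecialGlue helper, restore
+ // R10-R15, and finally push any return value back onto the value stack.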
+ U_16 slotCount = MJIT::getMJITStackSlotCount(typeString, maxLength); + + //Generate template and instantiate immediate values + buffer_size_t argMoveSize = generateArgumentMoveForStaticMethod(buffer, resolved, typeString, maxLength, slotCount, properties); + + if(!argMoveSize){ + return 0; + } else if(-1 == argMoveSize) { + argMoveSize = 0; + } + buffer += argMoveSize; + invokeStaticSize += argMoveSize; + + //Save Caller preserved registers + COPY_TEMPLATE(buffer, saveR10Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x0 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR11Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x8 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR12Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x10 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR13Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x18 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR14Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x20 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR15Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x28 + (slotCount * 8))); + COPY_TEMPLATE(buffer, invokeStaticTemplate, invokeStaticSize); + patchImm8(buffer, (U_64)ramMethod); + + //Call the glue code + COPY_TEMPLATE(buffer, call4ByteRel, invokeStaticSize); + intptr_t interpreterStaticAndSpecialGlue = getHelperOrTrampolineAddress(TR_X86interpreterStaticAndSpecialGlue, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, interpreterStaticAndSpecialGlue); + + //Load Caller preserved registers + COPY_TEMPLATE(buffer, loadR10Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x00 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR11Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x08 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR12Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x10 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR13Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x18 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR14Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x20 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR15Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x28 + (slotCount * 8))); + TR::RealRegister::RegNum returnRegIndex; + slotCount = 0; + bool isAddressOrLongorDouble = false; + + switch(typeString[0]){ + case MJIT::VOID_TYPE_CHARACTER: + break; + case MJIT::BOOLEAN_TYPE_CHARACTER: + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + slotCount = 1; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genStatic; + case MJIT::CLASSNAME_TYPE_CHARACTER: + slotCount = 1; + isAddressOrLongorDouble = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genStatic; + case MJIT::LONG_TYPE_CHARACTER: + slotCount = 2; + isAddressOrLongorDouble = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genStatic; + case MJIT::FLOAT_TYPE_CHARACTER: + slotCount = 1; + returnRegIndex = properties->getFloatReturnRegister(); + goto genStatic; + case MJIT::DOUBLE_TYPE_CHARACTER: + slotCount = 2; + isAddressOrLongorDouble = true; + returnRegIndex = properties->getFloatReturnRegister(); + goto genStatic; + genStatic: + if(slotCount){ + COPY_TEMPLATE(buffer, retTemplate_sub, invokeStaticSize); + patchImm1(buffer, slotCount*8); + } + switch(returnRegIndex){ + case TR::RealRegister::eax: + if(isAddressOrLongorDouble) { + COPY_TEMPLATE(buffer, loadraxReturn, invokeStaticSize); + } else { + 
COPY_TEMPLATE(buffer, loadeaxReturn, invokeStaticSize); + } + break; + case TR::RealRegister::xmm0: + if(isAddressOrLongorDouble) { + COPY_TEMPLATE(buffer, loadDxmm0Return, invokeStaticSize); + } else { + COPY_TEMPLATE(buffer, loadxmm0Return, invokeStaticSize); + } + break; + } + break; + default: + MJIT_ASSERT(_logFileFP, false, "Bad return type."); + } + return invokeStaticSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateGetField(char *buffer, MJIT::ByteCodeIterator *bci) +{ + buffer_size_t getFieldSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + //Attempt to resolve the offset at compile time. This can fail, and if it does we must fail the compilation for now. + //In the future research project we could attempt runtime resolution. TR does this with self modifying code. + //These are all from TR and likely have implications on how we use this offset. However, we do not know that as of yet. + TR::DataType type = TR::NoType; + U_32 fieldOffset = 0; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->fieldAttributes(_comp, cpIndex, &fieldOffset, &type, &isVolatile, &isFinal, &isPrivate, false, &isUnresolvedInCP); + if(!resolved) return 0; + + switch(type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, intGetFieldTemplate, getFieldSize); + break; + case TR::Address: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, addrGetFieldTemplatePrologue, getFieldSize); + if(TR::Compiler->om.compressObjectReferences()){ + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, decompressReferenceTemplate, getFieldSize); + patchImm1(buffer, shift); + } + COPY_TEMPLATE(buffer, addrGetFieldTemplate, getFieldSize); + break; + case TR::Int64: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, longGetFieldTemplate, getFieldSize); + break; + case TR::Float: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, floatGetFieldTemplate, getFieldSize); + break; + case TR::Double: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, doubleGetFieldTemplate, getFieldSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + if(_comp->getOption(TR_TraceCG)){ + trfprintf(_logFileFP, "Field Offset: %u\n", fieldOffset); + } + + return getFieldSize; + +} + +buffer_size_t +MJIT::CodeGenerator::generatePutField(char *buffer, MJIT::ByteCodeIterator *bci) +{ + buffer_size_t putFieldSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + //In the future research project we could attempt runtime resolution. TR does this with self modifying code. + //These are all from TR and likely have implications on how we use this address. 
However, we do not know that as of yet. + U_32 fieldOffset = 0; + TR::DataType type = TR::NoType; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->fieldAttributes(_comp, cpIndex, &fieldOffset, &type, &isVolatile, &isFinal, &isPrivate, true, &isUnresolvedInCP); + + if(!resolved) return 0; + + switch(type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, intPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, intPutFieldTemplate, putFieldSize); + break; + case TR::Address: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, addrPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + if(TR::Compiler->om.compressObjectReferences()){ + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, compressReferenceTemplate, putFieldSize); + patchImm1(buffer, shift); + } + COPY_TEMPLATE(buffer, intPutFieldTemplate, putFieldSize); + break; + case TR::Int64: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, longPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, longPutFieldTemplate, putFieldSize); + break; + case TR::Float: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, floatPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, floatPutFieldTemplate, putFieldSize); + break; + case TR::Double: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, doublePutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, doublePutFieldTemplate, putFieldSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + if(_comp->getOption(TR_TraceCG)){ + trfprintf(_logFileFP, "Field Offset: %u\n", fieldOffset); + } + return putFieldSize; + +} + +buffer_size_t +MJIT::CodeGenerator::generateGetStatic(char *buffer, MJIT::ByteCodeIterator *bci) +{ + buffer_size_t getStaticSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + //In the future research project we could attempt runtime resolution. TR does this with self modifying code. + void *dataAddress; + //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. 
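+ // Note that the resolved static's data address is patched into the template
+ // as a 4-byte immediate, which is why each case below asserts that the
+ // address fits below the 4GB boundary; statics above that range are not
+ // handled by the current templates.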
+ TR::DataType type = TR::NoType; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->staticAttributes(_comp, cpIndex, &dataAddress, &type, &isVolatile, &isFinal, &isPrivate, false, &isUnresolvedInCP); + if(!resolved) return 0; + + switch(type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, intGetStaticTemplate, getStaticSize); + break; + case TR::Address: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, addrGetStaticTemplatePrologue, getStaticSize); + /* + if(TR::Compiler->om.compressObjectReferences()){ + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, decompressReferenceTemplate, getStaticSize); + patchImm1(buffer, shift); + } + */ + COPY_TEMPLATE(buffer, addrGetStaticTemplate, getStaticSize); + break; + case TR::Int64: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, longGetStaticTemplate, getStaticSize); + break; + case TR::Float: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, floatGetStaticTemplate, getStaticSize); + break; + case TR::Double: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, doubleGetStaticTemplate, getStaticSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + + return getStaticSize; +} + +buffer_size_t +MJIT::CodeGenerator::generatePutStatic(char *buffer, MJIT::ByteCodeIterator *bci) +{ + buffer_size_t putStaticSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + //In the future research project we could attempt runtime resolution. TR does this with self modifying code. + void * dataAddress; + //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. 
+ TR::DataType type = TR::NoType; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->staticAttributes(_comp, cpIndex, &dataAddress, &type, &isVolatile, &isFinal, &isPrivate, true, &isUnresolvedInCP); + if(!resolved) return 0; + + switch(type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, intPutStaticTemplate, putStaticSize); + break; + case TR::Address: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, addrPutStaticTemplatePrologue, putStaticSize); + /* + if(TR::Compiler->om.compressObjectReferences()){ + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, compressReferenceTemplate, putStaticSize); + patchImm1(buffer, shift); + } + */ + COPY_TEMPLATE(buffer, addrPutStaticTemplate, putStaticSize); + break; + case TR::Int64: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, longPutStaticTemplate, putStaticSize); + break; + case TR::Float: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, floatPutStaticTemplate, putStaticSize); + break; + case TR::Double: + //Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, doublePutStaticTemplate, putStaticSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + + return putStaticSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateColdArea(char *buffer, J9Method *method, char *jitStackOverflowJumpPatchLocation){ + buffer_size_t coldAreaSize = 0; + PATCH_RELATIVE_ADDR_32_BIT(jitStackOverflowJumpPatchLocation, (intptr_t)buffer); + COPY_TEMPLATE(buffer, movEDIImm32, coldAreaSize); + patchImm4(buffer, _stackPeakSize); + + COPY_TEMPLATE(buffer, call4ByteRel, coldAreaSize); + uintptr_t jitStackOverflowHelperAddress = getHelperOrTrampolineAddress(TR_stackOverflow, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, jitStackOverflowHelperAddress); + + COPY_TEMPLATE(buffer, jump4ByteRel, coldAreaSize); + PATCH_RELATIVE_ADDR_32_BIT(buffer, jitStackOverflowJumpPatchLocation); + + return coldAreaSize; +} + +buffer_size_t +MJIT::CodeGenerator::generateDebugBreakpoint(char *buffer){ + buffer_size_t codeGenSize = 0; + COPY_TEMPLATE(buffer, debugBreakpoint, codeGenSize); + return 
codeGenSize; +} + +TR::CodeCache* +MJIT::CodeGenerator::getCodeCache(){ + return _codeCache; +} + +U_8 * +MJIT::CodeGenerator::allocateCodeCache(int32_t length, TR_J9VMBase *vmBase, J9VMThread *vmThread){ + TR::CodeCacheManager *manager = TR::CodeCacheManager::instance(); + int32_t compThreadID = vmBase->getCompThreadIDForVMThread(vmThread); + int32_t numReserved; + + if(!_codeCache){ + _codeCache = manager->reserveCodeCache(false, length, compThreadID, &numReserved); + if (!_codeCache){ + return NULL; + } + _comp->cg()->setCodeCache(_codeCache); + } + + uint8_t *coldCode; + U_8 *codeStart = manager->allocateCodeMemory(length, 0, &_codeCache, &coldCode, false); + if(!codeStart){ + return NULL; + } + + return codeStart; +} \ No newline at end of file diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp new file mode 100644 index 00000000000..5ff1b22b1d9 --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp @@ -0,0 +1,444 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. 
+ * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#ifndef MJIT_AMD64CODEGEN_HPP +#define MJIT_AMD64CODEGEN_HPP + +#include +#include "microjit/SideTables.hpp" +#include "microjit/utils.hpp" +#include "microjit/x/amd64/AMD64Linkage.hpp" +#include "microjit/x/amd64/AMD64CodegenGC.hpp" +#include "microjit/ByteCodeIterator.hpp" + +#include "control/RecompilationInfo.hpp" +#include "env/IO.hpp" +#include "env/TRMemory.hpp" +#include "ilgen/J9ByteCodeIterator.hpp" +#include "oti/j9generated.h" + +namespace TR { class Compilation; } +namespace TR { class CFG; } + +typedef unsigned int buffer_size_t; +namespace MJIT { + +enum TypeCharacters { + BYTE_TYPE_CHARACTER = 'B', + CHAR_TYPE_CHARACTER = 'C', + DOUBLE_TYPE_CHARACTER = 'D', + FLOAT_TYPE_CHARACTER = 'F', + INT_TYPE_CHARACTER = 'I', + LONG_TYPE_CHARACTER = 'J', + CLASSNAME_TYPE_CHARACTER = 'L', + SHORT_TYPE_CHARACTER = 'S', + BOOLEAN_TYPE_CHARACTER = 'Z', + VOID_TYPE_CHARACTER = 'V', +}; + +inline static U_8 +typeSignatureSize(char typeBuffer) +{ + switch(typeBuffer){ + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + case MJIT::FLOAT_TYPE_CHARACTER: + case MJIT::CLASSNAME_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::BOOLEAN_TYPE_CHARACTER: + return 8; + case MJIT::DOUBLE_TYPE_CHARACTER: + case MJIT::LONG_TYPE_CHARACTER: + return 16; + default: + return 0; + } +} + +RegisterStack +mapIncomingParams(char*, U_16, int*, ParamTableEntry*, U_16, TR::FilePointer*); + +bool +nativeSignature(J9Method *method, char *resultBuffer); + +int +getParamCount(char *typeString, U_16 maxLength); + +int +getActualParamCount(char *typeString, U_16 maxLength); + +int +getMJITStackSlotCount(char *typeString, U_16 maxLength); + +class CodeGenerator { + private: + Linkage _linkage; + TR::FilePointer *_logFileFP; + TR_J9VMBase& _vm; + TR::CodeCache *_codeCache; + int32_t _stackPeakSize; + ParamTable *_paramTable; + TR::Compilation *_comp; + MJIT::CodeGenGC *_mjitCGGC; + TR::GCStackAtlas *_atlas; + TR::PersistentInfo *_persistentInfo; + TR_Memory *_trMemory; + TR_ResolvedMethod *_compilee; + uint16_t _maxCalleeArgsSize; + bool _printByteCodes; + + buffer_size_t generateSwitchToInterpPrePrologue( + char*, + J9Method*, + buffer_size_t, + buffer_size_t, + char*, + U_16 + ); + + buffer_size_t generateEpologue(char*); + + buffer_size_t loadArgsFromStack( + char*, + buffer_size_t, + ParamTable* + ); + + buffer_size_t saveArgsOnStack( + char*, + buffer_size_t, + ParamTable* + ); + + buffer_size_t + saveArgsInLocalArray( + char*, + buffer_size_t, + char* + ); + + /** + * Generates a load instruction based on the type. + * + * @param buffer code buffer + * @param bc Bytecode that generated the load instruction + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateLoad( + char *buffer, + TR_J9ByteCode bc, + MJIT::ByteCodeIterator *bci + ); + + /** + * Generates a store instruction based on the type. 
+ * + * @param buffer code buffer + * @param bc Bytecode that generated the store instruction + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateStore( + char *buffer, + TR_J9ByteCode bc, + MJIT::ByteCodeIterator *bci + ); + + /** + * Generates argument moves for static method invocation. + * + * @param buffer code buffer + * @param staticMethod method for introspection + * @param typeString type signature string used to determine the arguments + * @param typeStringLength the length of the string stored in typeString + * @return size of generated code; 0 if method failed, -1 if method has no arguments + */ + buffer_size_t + generateArgumentMoveForStaticMethod( + char* buffer, + TR_ResolvedMethod* staticMethod, + char* typeString, + U_16 typeStringLength, + U_16 slotCount, + struct TR::X86LinkageProperties *properties + ); + + /** + * Generates an invoke static instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateInvokeStatic( + char* buffer, + MJIT::ByteCodeIterator* bci, + struct TR::X86LinkageProperties *properties + ); + + /** + * Generates a getField instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateGetField( + char* buffer, + MJIT::ByteCodeIterator* bci + ); + + /** + * Generates a putField instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePutField( + char* buffer, + MJIT::ByteCodeIterator* bci + ); + + /** + * Generates a putStatic instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePutStatic( + char *buffer, + MJIT::ByteCodeIterator *bci + ); + + /** + * Generates a getStatic instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateGetStatic( + char *buffer, + MJIT::ByteCodeIterator *bci + ); + + /** + * Generates a return instruction based on the type. + * + * @param buffer code buffer + * @param dt The data type to be returned by the TR-compiled method + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateReturn( + char *buffer, + TR::DataType dt, + struct TR::X86LinkageProperties *properties + ); + + /** + * Generates a guarded counter for recompilation through JITed counting.
+ * + * @param buffer code buffer + * @param initialCount invocations before compiling with TR + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateGCR( + char *buffer, + int32_t initialCount, + J9Method *method, + uintptr_t startPC + ); + + public: + CodeGenerator() = delete; + CodeGenerator( + struct J9JITConfig*, + J9VMThread*, + TR::FilePointer*, + TR_J9VMBase&, + ParamTable*, + TR::Compilation*, + MJIT::CodeGenGC*, + TR::PersistentInfo*, + TR_Memory*, + TR_ResolvedMethod* + ); + + inline void + setPeakStackSize(int32_t newSize) + { + _stackPeakSize = newSize; + } + + inline int32_t + getPeakStackSize() + { + return _stackPeakSize; + } + + /** + * Write the pre-prologue to the buffer and return the size written to the buffer. + * + * @param buffer code buffer + * @param method method for introspection + * @param magicWordLocation location of variable to hold pointer to space allocated for the magic word + * @param first2BytesPatchLocation location of variable to hold pointer to first 2 bytes of method to be executed + * @param samplingRecompileCallLocation location of variable to hold pointer to the sampling recompile call site + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePrePrologue( + char* buffer, + J9Method* method, + char** magicWordLocation, + char** first2BytesPatchLocation, + char** samplingRecompileCallLocation + ); + + /** + * Write the prologue to the buffer and return the size written to the buffer. + * + * @param buffer code buffer + * @param method method for introspection + * @param jitStackOverflowJumpPatchLocation location of variable to hold pointer to the instruction that jumps to the stack overflow helper + * @param magicWordLocation pointer to magic word location, written to during prologue generation + * @param first2BytesPatchLocation pointer to first 2 bytes location, written to during prologue generation + * @param firstInstLocation location of variable to hold pointer to first instruction, later used to set entry point + * @param bci bytecode iterator, used to gather type information + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePrologue( + char* buffer, + J9Method* method, + char** jitStackOverflowJumpPatchLocation, + char* magicWordLocation, + char* first2BytesPatchLocation, + char* samplingRecompileCallLocation, + char** firstInstLocation, + MJIT::ByteCodeIterator* bci + ); + + /** + * Generates the cold area used for helpers, such as the jit stack overflow checker. + * + * @param buffer code buffer + * @param method method for introspection + * @param jitStackOverflowJumpPatchLocation location to be patched with the jit stack overflow helper relative address + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateColdArea( + char *buffer, + J9Method *method, + char *jitStackOverflowJumpPatchLocation + ); + + /** + * Generates the body for a method. + * + * @param buffer code buffer + * @param bci byte code iterator used in the hot loop to generate the body of the compiled method + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateBody( + char *buffer, + MJIT::ByteCodeIterator *bci + ); + + /** + * Generates a profiling counter for an edge.
+ * + * @param buffer code buffer + * @param bci byte code iterator, currently pointing to the edge that needs to be profiled + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateEdgeCounter( + char *buffer, + TR_J9ByteCodeIterator *bci + ); + + /** + * Generate an int3 for signaling debug breakpoint for Linux x86 + * + * @param buffer code buffer + */ + buffer_size_t + generateDebugBreakpoint(char *buffer); + + /** + * Allocate space in the code cache of size length and + * copy the buffer to the cache. + * + * @param length length of code in the buffer + * @param vmBase To check if compilation should be interrupted + * @param vmThread Thread Id is stored in cache + * @return pointer to the new code cache segment or NULL if failed. + */ + U_8 * + allocateCodeCache( + int32_t length, + TR_J9VMBase *vmBase, + J9VMThread *vmThread + ); + + /** + * Get the code cache + */ + TR::CodeCache *getCodeCache(); + + /** + * Get a pointer to the Stack Atlas + */ + TR::GCStackAtlas *getStackAtlas(); + + inline uint8_t + getPointerSize() + { + return _linkage._properties.getPointerSize(); + } +}; + +class MJITCompilationFailure: public virtual std::exception { + public: + MJITCompilationFailure() { } + virtual const char *what() const throw(){ + return "Unable to compile method."; + } +}; + + +} //namespace MJIT +#endif /* MJIT_AMD64CODEGEN_HPP */ diff --git a/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp new file mode 100644 index 00000000000..fb51ef186cc --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp @@ -0,0 +1,146 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. 
+ * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#include "microjit/x/amd64/AMD64CodegenGC.hpp" + +#include +#include +#include "env/StackMemoryRegion.hpp" +#include "codegen/CodeGenerator.hpp" +#include "codegen/CodeGenerator_inlines.hpp" +#include "codegen/GCStackAtlas.hpp" +#include "codegen/GCStackMap.hpp" +#include "codegen/Linkage.hpp" +#include "codegen/Linkage_inlines.hpp" +#include "compile/Compilation.hpp" +#include "control/Options.hpp" +#include "control/Options_inlines.hpp" +#include "env/ObjectModel.hpp" +#include "env/CompilerEnv.hpp" +#include "env/TRMemory.hpp" +#include "env/jittypes.h" +#include "il/AutomaticSymbol.hpp" +#include "il/Node.hpp" +#include "il/ParameterSymbol.hpp" +#include "il/ResolvedMethodSymbol.hpp" +#include "il/Symbol.hpp" +#include "il/SymbolReference.hpp" +#include "infra/Assert.hpp" +#include "infra/BitVector.hpp" +#include "infra/IGNode.hpp" +#include "infra/InterferenceGraph.hpp" +#include "infra/List.hpp" +#include "microjit/SideTables.hpp" +#include "microjit/utils.hpp" +#include "ras/Debug.hpp" + + +MJIT::CodeGenGC::CodeGenGC(TR::FilePointer *logFileFP) + :_logFileFP(logFileFP) +{} + +TR::GCStackAtlas* +MJIT::CodeGenGC::createStackAtlas(TR::Compilation *comp, MJIT::ParamTable *paramTable, MJIT::LocalTable *localTable) + { + + // -------------------------------------------------------------------------------- + // Construct the parameter map for mapped reference parameters + // + intptr_t stackSlotSize = TR::Compiler->om.sizeofReferenceAddress(); + U_16 paramCount = paramTable->getParamCount(); + U_16 ParamSlots = paramTable->getTotalParamSize()/stackSlotSize; + TR_GCStackMap *parameterMap = new (comp->trHeapMemory(), paramCount) TR_GCStackMap(paramCount); + + int32_t firstMappedParmOffsetInBytes = -1; + MJIT::ParamTableEntry entry; + for(int i=0; i<paramCount;){ + MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); + if(entry.notInitialized) { + i++; + continue; + } + if(!entry.notInitialized && entry.isReference){ + firstMappedParmOffsetInBytes = entry.gcMapOffset; + break; + } + i += entry.slots; + } + + for(int i=0; i<paramCount;){ + MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); + if(entry.notInitialized) { + i++; + continue; + } + if(!entry.notInitialized && entry.isReference){ + int32_t entryOffsetInBytes = entry.gcMapOffset; + parameterMap->setBit(((entryOffsetInBytes-firstMappedParmOffsetInBytes)/stackSlotSize)); + if(!entry.onStack){ + parameterMap->setRegisterBits(TR::RealRegister::gprMask((TR::RealRegister::RegNum)entry.regNo)); + } + } + i += entry.slots; + } + + // -------------------------------------------------------------------------------- + // Construct the local map for mapped reference locals. + // This does not duplicate items mapped from the parameter table. + // + firstMappedParmOffsetInBytes = + (firstMappedParmOffsetInBytes >= 0) ?
firstMappedParmOffsetInBytes : 0; + U_16 localCount = localTable->getLocalCount(); + TR_GCStackMap *localMap = new (comp->trHeapMemory(), localCount) TR_GCStackMap(localCount); + localMap->copy(parameterMap); + for(int i=paramCount; i<localCount;){ + MJIT_ASSERT(_logFileFP, localTable->getEntry(i, &entry), "Bad index for table entry"); + if(entry.notInitialized) { + i++; + continue; + } + if(!entry.notInitialized && entry.offset == -1 && entry.isReference){ + int32_t entryOffsetInBytes = entry.gcMapOffset; + localMap->setBit(((entryOffsetInBytes-firstMappedParmOffsetInBytes)/stackSlotSize)); + if(!entry.onStack){ + localMap->setRegisterBits(TR::RealRegister::gprMask((TR::RealRegister::RegNum)entry.regNo)); + } + } + i += entry.slots; + } + + // -------------------------------------------------------------------------- + // Now create the stack atlas + // + TR::GCStackAtlas * atlas = new (comp->trHeapMemory()) TR::GCStackAtlas(paramCount, localCount, comp->trMemory()); + atlas->setParmBaseOffset(firstMappedParmOffsetInBytes); + atlas->setParameterMap(parameterMap); + atlas->setLocalMap(localMap); + // MJIT does not create stack allocated objects + atlas->setStackAllocMap(NULL); + // MJIT must initialize all locals which are not parameters + atlas->setNumberOfSlotsToBeInitialized(localCount - paramCount); + atlas->setIndexOfFirstSpillTemp(localCount); + // MJIT does not use internal pointers + atlas->setInternalPointerMap(NULL); + // MJIT does not mimic interpreter frame shape + atlas->setNumberOfPendingPushSlots(0); + atlas->setNumberOfPaddingSlots(0); + return atlas; + } diff --git a/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp new file mode 100644 index 00000000000..24cfbfa2942 --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp @@ -0,0 +1,37 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2].
+ * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#ifndef MJIT_AMD64_CODEGENGC_HPP +#define MJIT_AMD64_CODEGENGC_HPP +#include "codegen/GCStackAtlas.hpp" +#include "microjit/SideTables.hpp" +#include "env/IO.hpp" + +namespace MJIT { + class CodeGenGC { + private: + TR::FilePointer *_logFileFP; + public: + CodeGenGC(TR::FilePointer *logFileFP); + TR::GCStackAtlas *createStackAtlas(TR::Compilation*, MJIT::ParamTable*, MJIT::LocalTable*); + }; +} +#endif /* MJIT_AMD64_CODEGENGC_HPP */ diff --git a/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp b/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp new file mode 100644 index 00000000000..cdb4f70ee4c --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp @@ -0,0 +1,38 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#include "AMD64Linkage.hpp" +#include + +/* +| Architecture | Endian | Return address | Argument registers | +| x86-64 | Little | Stack | RAX RSI RDX RCX | + +| Return value registers | Preserved registers | vTable index | +| EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | +*/ + +MJIT::Linkage::Linkage(TR::CodeGenerator *cg) +{ + // TODO: MicroJIT: Use the method setOffsetToFirstParm(RETURN_ADDRESS_SIZE); + _properties._offsetToFirstParm = RETURN_ADDRESS_SIZE; + setProperties(cg,&_properties); +} diff --git a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp new file mode 100644 index 00000000000..84e24af61e2 --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp @@ -0,0 +1,52 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. 
and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at https://www.eclipse.org/legal/epl-2.0/ + * or the Apache License, Version 2.0 which accompanies this distribution and + * is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following + * Secondary Licenses when the conditions for such availability set + * forth in the Eclipse Public License, v. 2.0 are satisfied: GNU + * General Public License, version 2 with the GNU Classpath + * Exception [1] and GNU General Public License, version 2 with the + * OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ +#ifndef AMD64LINKAGE_HPP +#define AMD64LINKAGE_HPP + +#include +#include "codegen/AMD64PrivateLinkage.hpp" +#include "codegen/OMRLinkage_inlines.hpp" + +/* +| Architecture | Endian | Return address | Argument registers +| x86-64 | Little | Stack | RAX RSI RDX RCX + +| Return value registers | Preserved registers | vTable index | +| EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | +*/ + +namespace MJIT { + +enum + { + RETURN_ADDRESS_SIZE=8, + }; + +class Linkage { +public: + struct TR::X86LinkageProperties _properties; + Linkage() = delete; + Linkage(TR::CodeGenerator*); +}; + +} // namespace MJIT +#endif /* AMD64LINKAGE_HPP */ diff --git a/runtime/compiler/microjit/x/amd64/CMakeLists.txt b/runtime/compiler/microjit/x/amd64/CMakeLists.txt new file mode 100644 index 00000000000..2c2ebafd739 --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/CMakeLists.txt @@ -0,0 +1,27 @@ +################################################################################ +# Copyright (c) 2022, 2022 IBM Corp. and others +# +# This program and the accompanying materials are made available under +# the terms of the Eclipse Public License 2.0 which accompanies this +# distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +# or the Apache License, Version 2.0 which accompanies this distribution and +# is available at https://www.apache.org/licenses/LICENSE-2.0. +# +# This Source Code may also be made available under the following +# Secondary Licenses when the conditions for such availability set +# forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +# General Public License, version 2 with the GNU Classpath +# Exception [1] and GNU General Public License, version 2 with the +# OpenJDK Assembly Exception [2]. 
+# +# [1] https://www.gnu.org/software/classpath/license.html +# [2] http://openjdk.java.net/legal/assembly-exception.html +# +# SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception +################################################################################ +j9jit_files( + microjit/x/amd64/AMD64Codegen.cpp + microjit/x/amd64/AMD64CodegenGC.cpp + microjit/x/amd64/AMD64Linkage.cpp +) +add_subdirectory(templates) \ No newline at end of file diff --git a/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt b/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt new file mode 100644 index 00000000000..0861b09fafd --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt @@ -0,0 +1,26 @@ +################################################################################ +# Copyright (c) 2022, 2022 IBM Corp. and others +# +# This program and the accompanying materials are made available under +# the terms of the Eclipse Public License 2.0 which accompanies this +# distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +# or the Apache License, Version 2.0 which accompanies this distribution and +# is available at https://www.apache.org/licenses/LICENSE-2.0. +# +# This Source Code may also be made available under the following +# Secondary Licenses when the conditions for such availability set +# forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +# General Public License, version 2 with the GNU Classpath +# Exception [1] and GNU General Public License, version 2 with the +# OpenJDK Assembly Exception [2]. +# +# [1] https://www.gnu.org/software/classpath/license.html +# [2] http://openjdk.java.net/legal/assembly-exception.html +# +# SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception +################################################################################ +j9jit_files( + microjit/x/amd64/templates/microjit.nasm + microjit/x/amd64/templates/bytecodes.nasm + microjit/x/amd64/templates/linkage.nasm +) \ No newline at end of file diff --git a/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm b/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm new file mode 100644 index 00000000000..9188a0c2fee --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm @@ -0,0 +1,1351 @@ +; Copyright (c) 2022, 2022 IBM Corp. and others +; +; This program and the accompanying materials are made available under +; the terms of the Eclipse Public License 2.0 which accompanies this +; distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +; or the Apache License, Version 2.0 which accompanies this distribution and +; is available at https://www.apache.org/licenses/LICENSE-2.0. +; +; This Source Code may also be made available under the following +; Secondary Licenses when the conditions for such availability set +; forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +; General Public License, version 2 with the GNU Classpath +; Exception [1] and GNU General Public License, version 2 with the +; OpenJDK Assembly Exception [2]. 
+; +; [1] https://www.gnu.org/software/classpath/license.html +; [2] http://openjdk.java.net/legal/assembly-exception.html +; +; SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception +%include "utils.nasm" +%include "microjit.nasm" + +;MicroJIT Virtual Machine to X86-64 mapping +; rsp: +; is the stack extent for the java value stack pointer. +; r10: +; will hold the java value stack pointer. +; r11: +; stores the accumulator, or +; stores a pointer to an object +; r12: +; stores any value which will act on the accumulator, or +; stores the value to be written to an object field, or +; stores the value read from an object field +; r13: +; holds addresses for absolute addressing +; r14: +; holds a pointer to the start of the local array +; r15: +; stores values loaded from memory for storing on the stack or in the local array + +template_start debugBreakpoint + int3 ; trigger hardware interrupt +template_end debugBreakpoint + +; All integer loads require adding stack space of 8 bytes +; and the moving of values from a stack slot to a temp register +template_start aloadTemplatePrologue + push_single_slot + _64bit_local_to_rXX_PATCH r15 +template_end aloadTemplatePrologue + +; All integer loads require adding stack space of 8 bytes +; and the moving of values from a stack slot to a temp register +template_start iloadTemplatePrologue + push_single_slot + _32bit_local_to_rXX_PATCH r15 +template_end iloadTemplatePrologue + +; All long loads require adding 2 stack spaces of 8 bytes (16 total) +; and the moving of values from a stack slot to a temp register +template_start lloadTemplatePrologue + push_dual_slot + _64bit_local_to_rXX_PATCH r15 +template_end lloadTemplatePrologue + +template_start loadTemplate + mov [r10], r15 +template_end loadTemplate + +template_start invokeStaticTemplate + mov rdi, qword 0xefbeaddeefbeadde +template_end invokeStaticTemplate + +template_start retTemplate_sub + sub r10, byte 0xFF +template_end retTemplate_sub + +template_start loadeaxReturn + _32bit_slot_stack_from_eXX ax,0 +template_end loadeaxReturn + +template_start loadraxReturn + _64bit_slot_stack_from_rXX rax,0 +template_end loadraxReturn + +template_start loadxmm0Return + movd [r10], xmm0 +template_end loadxmm0Return + +template_start loadDxmm0Return + movq [r10], xmm0 +template_end loadDxmm0Return + +template_start moveeaxForCall + _32bit_slot_stack_to_eXX ax,0 + pop_single_slot +template_end moveeaxForCall + +template_start moveesiForCall + _32bit_slot_stack_to_eXX si,0 + pop_single_slot +template_end moveesiForCall + +template_start moveedxForCall + _32bit_slot_stack_to_eXX dx,0 + pop_single_slot +template_end moveedxForCall + +template_start moveecxForCall + _32bit_slot_stack_to_eXX cx,0 + pop_single_slot +template_end moveecxForCall + +template_start moveraxRefForCall + _64bit_slot_stack_to_rXX rax,0 + pop_single_slot +template_end moveraxRefForCall + +template_start moversiRefForCall + _64bit_slot_stack_to_rXX rsi,0 + pop_single_slot +template_end moversiRefForCall + +template_start moverdxRefForCall + _64bit_slot_stack_to_rXX rdx,0 + pop_single_slot +template_end moverdxRefForCall + +template_start movercxRefForCall + _64bit_slot_stack_to_rXX rcx,0 + pop_single_slot +template_end movercxRefForCall + +template_start moveraxForCall + _64bit_slot_stack_to_rXX rax,0 + pop_dual_slot +template_end moveraxForCall + +template_start moversiForCall + _64bit_slot_stack_to_rXX rsi,0 + pop_dual_slot +template_end moversiForCall + 
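+; ------------------------------------------------------------------------------
+; Illustrative sketch only (not an emitted template): for an invokestatic with
+; signature (IJ)V, generateArgumentMoveForStaticMethod stitches together one
+; move*ForCall template per argument, popping each value from the java value
+; stack into the private-linkage argument registers (RAX RSI RDX RCX). Assuming
+; the int argument maps to eax and the long to rsi, one plausible pair is:
+;
+;   moversiForCall   ; pop the long (two slots) from the stack top into rsi
+;   moveeaxForCall   ; pop the int (one slot) into eax
+;
+; followed by invokeStaticTemplate, whose 8-byte placeholder is patched by the
+; code generator. The exact register assignment and ordering are decided by
+; generateArgumentMoveForStaticMethod; the mapping above is an assumption.
+; ------------------------------------------------------------------------------
+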
+template_start moverdxForCall + _64bit_slot_stack_to_rXX rdx,0 + pop_dual_slot +template_end moverdxForCall + +template_start movercxForCall + _64bit_slot_stack_to_rXX rcx,0 + pop_dual_slot +template_end movercxForCall + +template_start movexmm0ForCall + psrldq xmm0, 8 ; clear xmm0 by shifting right + movss xmm0, [r10] + pop_single_slot +template_end movexmm0ForCall + +template_start movexmm1ForCall + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] + pop_single_slot +template_end movexmm1ForCall + +template_start movexmm2ForCall + psrldq xmm2, 8 ; clear xmm2 by shifting right + movss xmm2, [r10] + pop_single_slot +template_end movexmm2ForCall + +template_start movexmm3ForCall + psrldq xmm3, 8 ; clear xmm3 by shifting right + movss xmm3, [r10] + pop_single_slot +template_end movexmm3ForCall + +template_start moveDxmm0ForCall + psrldq xmm0, 8 ; clear xmm0 by shifting right + movsd xmm0, [r10] + pop_dual_slot +template_end moveDxmm0ForCall + +template_start moveDxmm1ForCall + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] + pop_dual_slot +template_end moveDxmm1ForCall + +template_start moveDxmm2ForCall + psrldq xmm2, 8 ; clear xmm2 by shifting right + movsd xmm2, [r10] + pop_dual_slot +template_end moveDxmm2ForCall + +template_start moveDxmm3ForCall + psrldq xmm3, 8 ; clear xmm3 by shifting right + movsd xmm3, [r10] + pop_dual_slot +template_end moveDxmm3ForCall + +template_start astoreTemplate + _64bit_slot_stack_to_rXX r15,0 + pop_single_slot + _64bit_local_from_rXX_PATCH r15 +template_end astoreTemplate + +template_start istoreTemplate + _32bit_slot_stack_to_rXX r15,0 + pop_single_slot + _32bit_local_from_rXX_PATCH r15 +template_end istoreTemplate + +template_start lstoreTemplate + _64bit_slot_stack_to_rXX r15,0 + pop_dual_slot + _64bit_local_from_rXX_PATCH r15 +template_end lstoreTemplate + +; stack before: ..., value → +; stack after: ... +template_start popTemplate + pop_single_slot ; remove the top value of the stack +template_end popTemplate + +; stack before: ..., value2, value1 → +; stack after: ... 
+template_start pop2Template + pop_dual_slot ; remove the top two values from the stack +template_end pop2Template + +; stack before: ..., value2, value1 → +; stack after: ..., value1, value2 +template_start swapTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_from_rXX r12,8 ; write value1 to the second stack slot + _64bit_slot_stack_from_rXX r11,0 ; write value2 to the top stack slot +template_end swapTemplate + +; stack before: ..., value1 → +; stack after: ..., value1, value1 +template_start dupTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy the top value into r12 + push_single_slot ; increase the stack by 1 slot + _64bit_slot_stack_from_rXX r12,0 ; push the duplicate on top of the stack +template_end dupTemplate + +; stack before: ..., value2, value1 → +; stack after: ..., value1, value2, value1 +template_start dupx1Template + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_from_rXX r12,8 ; store value1 in third stack slot + _64bit_slot_stack_from_rXX r11,0 ; store value2 in second stack slot + push_single_slot ; increase stack by 1 slot + _64bit_slot_stack_from_rXX r12,0 ; store value1 on top stack slot +template_end dupx1Template + +; stack before: ..., value3, value2, value1 → +; stack after: ..., value1, value3, value2, value1 +template_start dupx2Template + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_to_rXX r15,16 ; copy value3 into r15 + _64bit_slot_stack_from_rXX r12,16 ; store value1 in fourth stack slot + _64bit_slot_stack_from_rXX r15,8 ; store value3 in third stack slot + _64bit_slot_stack_from_rXX r11,0 ; store value2 in second stack slot + push_single_slot ; increase stack by 1 slot + _64bit_slot_stack_from_rXX r12,0 ; store value1 in top stack slot +template_end dupx2Template + +; stack before: ..., value2, value1 → +; stack after: ..., value2, value1, value2, value1 +template_start dup2Template + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + push_dual_slot ; increase stack by 2 slots + _64bit_slot_stack_from_rXX r11,8 ; store value2 in second stack slot + _64bit_slot_stack_from_rXX r12,0 ; store value1 in top stack slot +template_end dup2Template + +; stack before: ..., value3, value2, value1 → +; stack after: ..., value2, value1, value3, value2, value1 +template_start dup2x1Template + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_to_rXX r15,16 ; copy value3 into r15 + _64bit_slot_stack_from_rXX r11,16 ; store value2 in fifth stack slot + _64bit_slot_stack_from_rXX r12,8 ; store value1 in fourth stack slot + _64bit_slot_stack_from_rXX r15,0 ; store value3 in third stack slot + push_dual_slot ; increase stack by 2 slots + _64bit_slot_stack_from_rXX r11,8 ; store value2 in second stack slot + _64bit_slot_stack_from_rXX r12,0 ; store value in top stack slot +template_end dup2x1Template + +; stack before: ..., value4, value3, value2, value1 → +; stack after: ..., value2, value1, value4, value3, value2, value1 +template_start dup2x2Template + _64bit_slot_stack_to_rXX r11,0 ; mov value1 into r11 + _64bit_slot_stack_to_rXX r12,8 ; mov value2 into r12 + _64bit_slot_stack_to_rXX r15,16 ; mov value3 into r15 + _64bit_slot_stack_from_rXX r11,16 ; mov value1 into old value3 slot 
+ _64bit_slot_stack_from_rXX r15,0 ; mov value3 into old value1 slot + _64bit_slot_stack_to_rXX r15,24 ; mov value4 into r15 + _64bit_slot_stack_from_rXX r15,8 ; mov value4 into old value2 slot + _64bit_slot_stack_from_rXX r12,24 ; mov value2 into old value4 slot + push_single_slot ; increase the stack + _64bit_slot_stack_from_rXX r12,0 ; mov value2 into second slot + push_single_slot ; increase the stack + _64bit_slot_stack_from_rXX r11,0 ; mov value1 into top stack slot +template_end dup2x2Template + +template_start getFieldTemplatePrologue + _64bit_slot_stack_to_rXX r11,0 ; get the objectref from the stack + lea r13, [r11 + 0xefbeadde] ; load the effective address of the object field +template_end getFieldTemplatePrologue + +template_start intGetFieldTemplate + mov r12d, dword [r13] ; load only 32-bits + _32bit_slot_stack_from_rXX r12,0 ; put it on the stack +template_end intGetFieldTemplate + +template_start addrGetFieldTemplatePrologue + mov r12d, dword [r13] ; load only 32-bits +template_end addrGetFieldTemplatePrologue + +template_start decompressReferenceTemplate + shl r12, 0xff ; Reference from field is compressed, decompress by shift amount +template_end decompressReferenceTemplate + +template_start decompressReference1Template + shl r12, 0x01 ; Reference from field is compressed, decompress by shift amount +template_end decompressReference1Template + +template_start addrGetFieldTemplate + _64bit_slot_stack_from_rXX r12,0 ; put it on the stack +template_end addrGetFieldTemplate + +template_start longGetFieldTemplate + push_single_slot ; increase the stack pointer to make room for extra slot + mov r12, [r13] ; load the value + _64bit_slot_stack_from_rXX r12,0 ; put it on the stack +template_end longGetFieldTemplate + +template_start floatGetFieldTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + movss xmm0, dword [r13] ; load the value + movss [r10], xmm0 ; put it on the stack +template_end floatGetFieldTemplate + +template_start doubleGetFieldTemplate + push_single_slot ; increase the stack pointer to make room for extra slot + movsd xmm0, [r13] ; load the value + movsd [r10], xmm0 ; put it on the stack +template_end doubleGetFieldTemplate + +template_start intPutFieldTemplatePrologue + _32bit_slot_stack_to_rXX r11,0 ; copy value into r11 + pop_single_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field +template_end intPutFieldTemplatePrologue + +template_start addrPutFieldTemplatePrologue + _64bit_slot_stack_to_rXX r11,0 ; copy value into r11 + pop_single_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field +template_end addrPutFieldTemplatePrologue + +template_start longPutFieldTemplatePrologue + _64bit_slot_stack_to_rXX r11,0 ; copy value into r11 + pop_dual_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field +template_end longPutFieldTemplatePrologue + +template_start floatPutFieldTemplatePrologue + movss xmm0, [r10] ; copy value into r11 + pop_single_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 
+ 0xefbeadde] ; load the effective address of the object field +template_end floatPutFieldTemplatePrologue + +template_start doublePutFieldTemplatePrologue + movsd xmm0, [r10] ; copy value into xmm0 + pop_dual_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field +template_end doublePutFieldTemplatePrologue + +template_start intPutFieldTemplate + mov [r13], r11d ; Move value to memory +template_end intPutFieldTemplate + +template_start compressReferenceTemplate + shr r11, 0xff ; Reference is compressed before the store, shift amount is patched +template_end compressReferenceTemplate + +template_start compressReference1Template + shr r11, 0x01 ; Reference is compressed before the store by a shift of 1 +template_end compressReference1Template + +template_start longPutFieldTemplate + mov [r13], r11 ; Move value to memory +template_end longPutFieldTemplate + +template_start floatPutFieldTemplate + movss [r13], xmm0 ; Move value to memory +template_end floatPutFieldTemplate + +template_start doublePutFieldTemplate + movsd [r13], xmm0 ; Move value to memory +template_end doublePutFieldTemplate + +template_start staticTemplatePrologue + lea r13, [0xefbeadde] ; load the absolute address of a static value +template_end staticTemplatePrologue + +template_start intGetStaticTemplate + push_single_slot ; allocate a stack slot + mov r11d, dword [r13] ; load 32-bit value into r11 + _32bit_slot_stack_from_rXX r11,0 ; move it onto the stack +template_end intGetStaticTemplate + +template_start addrGetStaticTemplatePrologue + push_single_slot ; allocate a stack slot + mov r11, qword [r13] ; load 64-bit value into r11 +template_end addrGetStaticTemplatePrologue + +template_start addrGetStaticTemplate + _64bit_slot_stack_from_rXX r11,0 ; move it onto the stack +template_end addrGetStaticTemplate + +template_start longGetStaticTemplate + push_dual_slot ; allocate two stack slots + mov r11, [r13] ; load 64-bit value into r11 + _64bit_slot_stack_from_rXX r11,0 ; move it onto the stack +template_end longGetStaticTemplate + +template_start floatGetStaticTemplate + push_single_slot ; allocate a stack slot + movss xmm0, [r13] ; load value into xmm0 + movss [r10], xmm0 ; move it onto the stack +template_end floatGetStaticTemplate + +template_start doubleGetStaticTemplate + push_dual_slot ; allocate two stack slots + movsd xmm0, [r13] ; load value into xmm0 + movsd [r10], xmm0 ; move it onto the stack +template_end doubleGetStaticTemplate + +template_start intPutStaticTemplate + _32bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 + pop_single_slot ; Pop the value off the stack + mov dword [r13], r11d ; Move it to memory +template_end intPutStaticTemplate + +template_start addrPutStaticTemplatePrologue + _64bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 + pop_single_slot ; Pop the value off the stack +template_end addrPutStaticTemplatePrologue + +template_start addrPutStaticTemplate + mov qword [r13], r11 ; Move it to memory +template_end addrPutStaticTemplate + +template_start longPutStaticTemplate + _64bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 + pop_dual_slot ; Pop the value off the stack + mov [r13], r11 ; Move it to memory +template_end longPutStaticTemplate + +template_start floatPutStaticTemplate + movss xmm0, [r10] ; Move stack value into xmm0 + pop_single_slot ; Pop the value off the stack + movss [r13], xmm0 ; Move it to memory +template_end floatPutStaticTemplate + +template_start doublePutStaticTemplate + movsd xmm0, [r10] ; Move stack value into xmm0 +
pop_dual_slot ; Pop the value off the stack + movsd [r13], xmm0 ; Move it to memory +template_end doublePutStaticTemplate + +template_start iAddTemplate + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; which means reducing the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + add r11d, r12d ; add the value to the accumulator + _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg +template_end iAddTemplate + +template_start iSubTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; copy second value to the accumulator register + sub r11d, r12d ; subtract the value from the accumulator + _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg +template_end iSubTemplate + +template_start iMulTemplate + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; which means reducing the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + imul r11d, r12d ; multiply accumulator by the value to the accumulator + _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg +template_end iMulTemplate + +template_start iDivTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value (dividend) to eax + cdq ; extend sign bit from eax to edx + idiv r12d ; divide edx:eax by lower 32 bits of r12 value + _32bit_slot_stack_from_eXX ax,0 ; store the quotient in accumulator +template_end iDivTemplate + +template_start iRemTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value (dividend) to eax + cdq ; extend sign bit from eax to edx + idiv r12d ; divide edx:eax by lower 32 bits of r12 value + _32bit_slot_stack_from_eXX dx,0 ; store the remainder in accumulator +template_end iRemTemplate + +template_start iNegTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy the top value from the stack into r12 + neg r12d ; two's complement negation of r12 + _32bit_slot_stack_from_rXX r12,0 ; store the result back on the stack +template_end iNegTemplate + +template_start iShlTemplate + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax + sal eax, cl ; arithmetic left shift eax by cl bits + _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack +template_end iShlTemplate + +template_start iShrTemplate + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax + sar eax, cl ; arithmetic right shift eax by cl bits + _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack +template_end iShrTemplate + +template_start iUshrTemplate + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax + shr eax, cl ; logical right shift eax 
by cl bits + _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack +template_end iUshrTemplate + +template_start iAndTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_single_slot ; reduce the stack size by 1 slot + _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + and r11d, r12d ; perform the bitwise and operation + _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack +template_end iAndTemplate + +template_start iOrTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_single_slot ; reduce the stack size by 1 slot + _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + or r11d, r12d ; perform the bitwise or operation + _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack +template_end iOrTemplate + +template_start iXorTemplate + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_single_slot ; reduce the stack size by 1 slot + _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + xor r11d, r12d ; perform the bitwise xor operation + _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack +template_end iXorTemplate + +template_start i2lTemplate + movsxd r12, dword [r10] ; takes the bottom 32 from the stack bits and sign extends + push_single_slot ; add slot for long (1 slot -> 2 slot) + _64bit_slot_stack_from_rXX r12,0 ; push the result back on the stack +template_end i2lTemplate + +template_start l2iTemplate + movsxd r12, [r10] ; takes the bottom 32 from the stack bits and sign extends + pop_single_slot ; remove extra slot for long (2 slot -> 1 slot) + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack +template_end l2iTemplate + +template_start i2bTemplate + movsx r12, byte [r10] ; takes the top byte from the stack and sign extends + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack +template_end i2bTemplate + +template_start i2sTemplate + movsx r12, word [r10] ; takes the top word from the stack and sign extends + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack +template_end i2sTemplate + +template_start i2cTemplate + movzx r12, word [r10] ; takes the top word from the stack and zero extends + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack +template_end i2cTemplate + +template_start i2dTemplate + cvtsi2sd xmm0, dword [r10] ; convert signed 32 bit integer to double-presision fp + push_single_slot ; make room for the double + movsd [r10], xmm0 ; push on top of stack +template_end i2dTemplate + +template_start l2dTemplate + cvtsi2sd xmm0, qword [r10] ; convert signed 64 bit integer to double-presision fp + movsd [r10], xmm0 ; push on top of stack +template_end l2dTemplate + +template_start d2iTemplate + movsd xmm0, [r10] ; load value from top of stack + mov r11, 2147483647 ; store max int in r11 + mov r12, -2147483648 ; store min int in r12 + mov rcx, 0 ; store 0 in rcx + cvtsi2sd xmm1, r11d ; convert max int to double + cvtsi2sd xmm2, r12d ; convert min int to double + cvttsd2si rax, xmm0 ; convert double to int with truncation + ucomisd xmm0, xmm1 ; compare value with max int + cmovae rax, r11 ; if value is above or equal to max int, store max int in rax + ucomisd xmm0, xmm2 ; compare value with min int + cmovb rax, r12 ; if value is below min int, store min int in rax 
+ cmovp rax, rcx ; if value is NaN, store 0 in rax + pop_single_slot ; remove extra slot for double (2 slot -> 1 slot) + _64bit_slot_stack_from_rXX rax,0 ; store result back on stack +template_end d2iTemplate + +template_start d2lTemplate + movsd xmm0, [r10] ; load value from top of stack + mov r11, 9223372036854775807 ; store max long in r11 + mov r12, -9223372036854775808 ; store min long in r12 + mov rcx, 0 ; store 0 in rcx + cvtsi2sd xmm1, r11 ; convert max long to double + cvtsi2sd xmm2, r12 ; convert min long to double + cvttsd2si rax, xmm0 ; convert double to long with truncation + ucomisd xmm0, xmm1 ; compare value with max long + cmovae rax, r11 ; if value is above or equal to max long, store max long in rax + ucomisd xmm0, xmm2 ; compare value with min long + cmovb rax, r12 ; if value is below min long, store min long in rax + cmovp rax, rcx ; if value is NaN, store 0 in rax + _64bit_slot_stack_from_rXX rax,0 ; store back on stack +template_end d2lTemplate + +template_start iconstm1Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], -1 ; push -1 to the java stack +template_end iconstm1Template + +template_start iconst0Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 0 ; push 0 to the java stack +template_end iconst0Template + +template_start iconst1Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 1 ; push 1 to the java stack +template_end iconst1Template + +template_start iconst2Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 2 ; push 2 to the java stack +template_end iconst2Template + +template_start iconst3Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 3 ; push 3 to the java stack +template_end iconst3Template + +template_start iconst4Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 4 ; push 4 to the java stack +template_end iconst4Template + +template_start iconst5Template + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 5 ; push 5 to the java stack +template_end iconst5Template + +template_start bipushTemplate + push_single_slot ; increase the stack by 1 slot + mov qword [r10], 0xFF ; push immediate value to stack +template_end bipushTemplate + +template_start sipushTemplatePrologue + mov r11w, 0xFF ; put immediate value in accumulator for sign extension +template_end sipushTemplatePrologue + +template_start sipushTemplate + movsx r11, r11w ; sign extend the contents of r11w through the rest of r11 + push_single_slot ; increase the stack by 1 slot + _32bit_slot_stack_from_rXX r11,0 ; push immediate value to stack +template_end sipushTemplate + +template_start iIncTemplate_01_load + _32bit_local_to_rXX_PATCH r11 +template_end iIncTemplate_01_load + +template_start iIncTemplate_02_add + add r11, byte 0xFF +template_end iIncTemplate_02_add + +template_start iIncTemplate_03_store + _32bit_local_from_rXX_PATCH r11 +template_end iIncTemplate_03_store + +template_start lAddTemplate + _64bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + add r11, r12 ; add the value to the accumulator + _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg +template_end lAddTemplate +
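+; ------------------------------------------------------------------------------
+; Illustrative sketch only (not an emitted template): the iinc bytecode is
+; assembled from the three iIncTemplate_* pieces above. The local offset behind
+; the _PATCH macros and the byte immediate 0xFF are placeholders that the code
+; generator patches after copying each template, so the concatenated result is
+; roughly equivalent to:
+;
+;   _32bit_local_to_rXX_PATCH r11    ; load the local (patched offset) into r11
+;   add r11, byte 0xFF               ; add the patched increment
+;   _32bit_local_from_rXX_PATCH r11  ; store r11 back to the same local
+; ------------------------------------------------------------------------------
+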
+template_start lSubTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy second value to the accumulator register + sub r11, r12 ; subtract the value from the accumulator + _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg +template_end lSubTemplate + +template_start lMulTemplate + _64bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + imul r11, r12 ; multiply accumulator by the value to the accumulator + _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg +template_end lMulTemplate + +template_start lDivTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value (dividend) to rax + cqo ; extend sign bit from rax to rdx + idiv r12 ; divide rdx:rax by r12 value + _64bit_slot_stack_from_rXX rax,0 ; store the quotient in accumulator +template_end lDivTemplate + +template_start lRemTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value (dividend) to rax + cqo ; extend sign bit from rax to rdx + idiv r12 ; divide rdx:rax by r12 value + _64bit_slot_stack_from_rXX rdx,0 ; store the remainder in accumulator +template_end lRemTemplate + +template_start lNegTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy the top value from the stack into r12 + neg r12 ; two's complement negation of r12 + _64bit_slot_stack_from_rXX r12,0 ; store the result back on the stack +template_end lNegTemplate + +template_start lShlTemplate + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax + sal rax, cl ; arithmetic left shift rax by cl bits + _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack +template_end lShlTemplate + +template_start lShrTemplate + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax + sar rax, cl ; arithmetic right shift rax by cl bits + _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack +template_end lShrTemplate + +template_start lUshrTemplate + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax + shr rax, cl ; logical right shift rax by cl bits + _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack +template_end lUshrTemplate + +template_start lAndTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + and r11, r12 ; perform the bitwise and operation + _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack +template_end lAndTemplate + +template_start lOrTemplate + 
_64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + or r11 , r12 ; perform the bitwise or operation + _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack +template_end lOrTemplate + +template_start lXorTemplate + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + xor r11 , r12 ; perform the bitwise xor operation + _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack +template_end lXorTemplate + +template_start lconst0Template + push_dual_slot ; increase the stack size by 2 slots (16 bytes) + mov qword [r10], 0 ; push 0 to the java stack +template_end lconst0Template + +template_start lconst1Template + push_dual_slot ; increase the stack size by 2 slots (16 bytes) + mov qword [r10], 1 ; push 1 to the java stack +template_end lconst1Template + +template_start fAddTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + addss xmm0, xmm1 ; add the values in xmm registers and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end fAddTemplate + +template_start fSubTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + subss xmm0, xmm1 ; subtract the value of xmm1 from xmm0 and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end fSubTemplate + +template_start fMulTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + mulss xmm0, xmm1 ; multiply the values in xmm registers and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end fMulTemplate + +template_start fDivTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + divss xmm0, xmm1 ; divide the value of xmm0 by xmm1 and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end fDivTemplate + +template_start fRemTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm1, [r10] ; copy the second value in xmm1 + call 0xefbeadde ; for function call +template_end fRemTemplate + +template_start fNegTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + mov eax, 0x80000000 ; loading mask 
immediate value to xmm1 through eax + movd xmm1, eax ; as it can't be directly loaded to xmm registers + xorps xmm0, xmm1 ; XOR xmm0 and xmm1 mask bits to negate xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end fNegTemplate + +template_start fconst0Template + push_single_slot ; increase the java stack size by 1 slot + fldz ; push 0 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack +template_end fconst0Template + +template_start fconst1Template + push_single_slot ; increase the java stack size by 1 slot + fld1 ; push 1 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack +template_end fconst1Template + +template_start fconst2Template + push_single_slot ; increase the java stack size by 1 slot + fld1 ; push 1 on the fpu register stack + fld1 ; push another 1 on the fpu register stack + fadd st0, st1 ; add the top two values together and store them back on the top of the stack + fbstp [r10] ; stores st0 on the java stack +template_end fconst2Template + +template_start dconst0Template + push_dual_slot ; increase the java stack size by 2 slots + fldz ; push 0 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack +template_end dconst0Template + +template_start dconst1Template + push_dual_slot ; increase the java stack size by 2 slots + fld1 ; push 1 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack +template_end dconst1Template + +template_start dAddTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + addsd xmm0, xmm1 ; add the values in xmm registers and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack +template_end dAddTemplate + +template_start dSubTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + subsd xmm0, xmm1 ; subtract the value of xmm1 from xmm0 and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack +template_end dSubTemplate + +template_start dMulTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + mulsd xmm0, xmm1 ; multiply the values in xmm registers and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack +template_end dMulTemplate + +template_start dDivTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + divsd xmm0, xmm1 ; divide the value of xmm0 by xmm1 and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack +template_end dDivTemplate + +template_start dRemTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm0, [r10] ; copy top value of stack in xmm0 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + 
movsd xmm1, [r10] ; copy the second value in xmm1 + call 0xefbeadde ; for function call +template_end dRemTemplate + +template_start dNegTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm0, [r10] ; copy top value of stack in xmm0 + mov rax, 0x8000000000000000 ; loading mask immediate value to xmm1 through rax + movq xmm1, rax ; as it can't be directly loaded to xmm registers + xorpd xmm0, xmm1 ; XOR xmm0 and xmm1 mask bits to negate xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack +template_end dNegTemplate + +template_start i2fTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + cvtsi2ss xmm0, [r10] ; convert integer to float and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end i2fTemplate + +template_start f2iTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + psrldq xmm2, 8 ; clear xmm2 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + mov r11d, 2147483647 ; store max int in r11 + mov r12d, -2147483648 ; store min int in r12 + mov ecx, 0 ; store 0 in rcx + cvtsi2ss xmm1, r11d ; convert max int to float + cvtsi2ss xmm2, r12d ; convert min int to float + cvttss2si eax, xmm0 ; convert float to int with truncation + ucomiss xmm0, xmm1 ; compare value with max int + cmovae eax, r11d ; if value is above or equal to max int, store max int in rax + ucomiss xmm0, xmm2 ; compare value with min int + cmovb eax, r12d ; if value is below min int, store min int in rax + cmovp eax, ecx ; if value is NaN, store 0 in rax + _32bit_slot_stack_from_eXX ax,0 ; store result back on stack +template_end f2iTemplate + +template_start l2fTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + cvtsi2ss xmm0, qword [r10] ; convert long to float and store in xmm0 + pop_single_slot ; remove extra slot (2 slot -> 1 slot) + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end l2fTemplate + +template_start f2lTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + psrldq xmm2, 8 ; clear xmm2 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + mov r11, 9223372036854775807 ; store max long in r11 + mov r12, -9223372036854775808 ; store min long in r12 + mov rcx, 0 ; store 0 in rcx + cvtsi2ss xmm1, r11 ; convert max long to float + cvtsi2ss xmm2, r12 ; convert min long to float + cvttss2si rax, xmm0 ; convert float to long with truncation + ucomiss xmm0, xmm1 ; compare value with max long + cmovae rax, r11 ; if value is above or equal to max long, store max long in rax + ucomiss xmm0, xmm2 ; compare value with min int + cmovb rax, r12 ; if value is below min long, store min long in rax + cmovp rax, rcx ; if value is NaN, store 0 in rax + push_single_slot ; add extra slot for long (1 slot -> 2 slot) + _64bit_slot_stack_from_rXX rax,0 ; store result back on stack +template_end f2lTemplate + +template_start d2fTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + movsd xmm0, [r10] ; copy top value of stack in xmm0 + cvtsd2ss xmm0, xmm0 ; convert double to float and store in xmm0 + pop_single_slot ; remove extra slot from double (2 slot -> 1 slot) + movss [r10], xmm0 ; store the result in xmm0 on the java stack +template_end d2fTemplate + +template_start f2dTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + cvtss2sd xmm0, xmm0 ; convert float to 
double and store in xmm0 + push_single_slot ; add slot to stack for double (1 slot -> 2 slot) + movsd [r10], xmm0 ; store the result in xmm0 on the java stack +template_end f2dTemplate + +template_start eaxReturnTemplate + xor rax, rax ; clear rax + _32bit_slot_stack_to_eXX ax,0 ; move the stack top into eax register +template_end eaxReturnTemplate + +template_start raxReturnTemplate + _64bit_slot_stack_to_rXX rax,0 ; move the stack top into rax register +template_end raxReturnTemplate + +template_start xmm0ReturnTemplate + psrldq xmm0, 8 ; clear xmm0 by shifting right + movd xmm0, [r10] ; move the stack top into xmm0 register +template_end xmm0ReturnTemplate + +template_start retTemplate_add + add r10, byte 0xFF ; decrease stack size +template_end retTemplate_add + +template_start vReturnTemplate + ret ; return from the JITed method +template_end vReturnTemplate + +; Stack before: ..., value1, value2 -> +; Stack after: ..., result +; if value1 > value2, result = 1 +; if value1 == value 2, result = 0 +; if value1 < value 2, result = -1 +template_start lcmpTemplate + _64bit_slot_stack_to_rXX r11,0 ; grab value2 from the stack + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r12,0 ; grab value1 from the stack + pop_single_slot ; reduce long slot to int slot + xor ecx, ecx ; clear ecx (store 0) + mov dword edx, -1 ; store -1 in edx + mov dword ebx, 1 ; store 1 in ebx + cmp r12, r11 ; compare the two values + cmovg eax, ebx ; mov 1 into eax if ZF = 0 and CF = 0 + cmovl eax, edx ; mov -1 into eax if CF = 1 + cmovz eax, ecx ; mov 0 into eax if ZF = 1 + _32bit_slot_stack_from_eXX ax,0 ; store result on top of stack +template_end lcmpTemplate + +; Stack before: ..., value1, value2 -> +; Stack after: ..., result +; if value1 > value2, result = 1 Flags: ZF,PF,CF --> 000 +; else if value1 == value 2, result = 0 Flags: ZF,PF,CF --> 100 +; else if value1 < value 2, result = -1 Flags: ZF,PF,CF --> 001 +; else, one of value1 or value2 must be NaN -> result = -1 Flags: ZF,PF,CF --> 111 +template_start fcmplTemplate + movss xmm0, dword [r10] ; copy value2 into xmm0 + pop_single_slot ; decrease the stack by 1 slot + movss xmm1, dword [r10] ; copy value1 into xmm1 + xor r11d, r11d ; clear r11 (store 0) + mov dword eax, 1 ; store 1 in rax + mov dword ecx, -1 ; store -1 in rcx + comiss xmm1, xmm0 ; compare the values + cmove r12d, r11d ; mov 0 into r12 if ZF = 1 + cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 + cmovb r12d, ecx ; mov -1 into r12 if CF = 1 + cmovp r12d, ecx ; mov -1 into r12 if PF = 1 + _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack +template_end fcmplTemplate + +; Stack before: ..., value1, value2 -> +; Stack after: ..., result +; if value1 > value2, result = 1 Flags: ZF,PF,CF --> 000 +; else if value1 == value 2, result = 0 Flags: ZF,PF,CF --> 100 +; else if value1 < value 2, result = -1 Flags: ZF,PF,CF --> 001 +; else, one of value1 or value2 must be NaN -> result = 1 Flags: ZF,PF,CF --> 111 +template_start fcmpgTemplate + movss xmm0, dword [r10] ; copy value2 into xmm0 + pop_single_slot ; decrease the stack by 1 slot + movss xmm1, dword [r10] ; copy value1 into xmm1 + xor r11d, r11d ; clear r11 (store 0) + mov dword eax, 1 ; store 1 in rax + mov dword ecx, -1 ; store -1 in rcx + comiss xmm1, xmm0 ; compare the values + cmove r12d, r11d ; mov 0 into r12 if ZF = 1 + cmovb r12d, ecx ; mov -1 into r12 if CF = 1 + cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 + cmovp r12d, eax ; mov 1 into r12 if PF = 1 + 
    _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack
+template_end fcmpgTemplate
+
+; Stack before: ..., value1, value2 ->
+; Stack after: ..., result
+; if value1 > value2, result = 1 Flags: ZF,PF,CF --> 000
+; else if value1 == value2, result = 0 Flags: ZF,PF,CF --> 100
+; else if value1 < value2, result = -1 Flags: ZF,PF,CF --> 001
+; else, one of value1 or value2 must be NaN -> result = -1 Flags: ZF,PF,CF --> 111
+template_start dcmplTemplate
+    movsd xmm0, qword [r10] ; copy value2 into xmm0
+    pop_dual_slot ; decrease the stack by 2 slots
+    movsd xmm1, qword [r10] ; copy value1 into xmm1
+    pop_single_slot ; decrease the stack by 1 slot
+    xor r11d, r11d ; clear r11 (store 0)
+    mov dword eax, 1 ; store 1 in eax
+    mov dword ecx, -1 ; store -1 in ecx
+    comisd xmm1, xmm0 ; compare the values
+    cmove r12d, r11d ; mov 0 into r12 if ZF = 1
+    cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0
+    cmovb r12d, ecx ; mov -1 into r12 if CF = 1
+    cmovp r12d, ecx ; mov -1 into r12 if PF = 1
+    _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack
+template_end dcmplTemplate
+
+; Stack before: ..., value1, value2 ->
+; Stack after: ..., result
+; if value1 > value2, result = 1 Flags: ZF,PF,CF --> 000
+; else if value1 == value2, result = 0 Flags: ZF,PF,CF --> 100
+; else if value1 < value2, result = -1 Flags: ZF,PF,CF --> 001
+; else, one of value1 or value2 must be NaN -> result = 1 Flags: ZF,PF,CF --> 111
+template_start dcmpgTemplate
+    movsd xmm0, qword [r10] ; copy value2 into xmm0
+    pop_dual_slot ; decrease the stack by 2 slots
+    movsd xmm1, qword [r10] ; copy value1 into xmm1
+    pop_single_slot ; decrease the stack by 1 slot
+    xor r11d, r11d ; clear r11 (store 0)
+    mov dword eax, 1 ; store 1 in eax
+    mov dword ecx, -1 ; store -1 in ecx
+    comisd xmm1, xmm0 ; compare the values
+    cmove r12d, r11d ; mov 0 into r12 if ZF = 1
+    cmovb r12d, ecx ; mov -1 into r12 if CF = 1
+    cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0
+    cmovp r12d, eax ; mov 1 into r12 if PF = 1
+    _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack
+template_end dcmpgTemplate
+
+; ifne succeeds if and only if value ≠ 0
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifneTemplate
+    _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    test r11d, r11d ; r11 - r11
+    jnz 0xefbeadde ; Used for generating jumps to far labels if ZF = 0
+template_end ifneTemplate
+
+; ifeq succeeds if and only if value = 0
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifeqTemplate
+    _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    test r11d, r11d ; r11 - r11
+    jz 0xefbeadde ; Used for generating jumps to far labels if ZF = 1
+template_end ifeqTemplate
+
+; iflt succeeds if and only if value < 0
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifltTemplate
+    _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    test r11d, r11d ; r11 - r11
+    jl 0xefbeadde ; Used for generating jumps to far labels if SF <> OF
+template_end ifltTemplate
+
+; ifge succeeds if and only if value >= 0
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifgeTemplate
+    _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    test r11d, r11d ; r11 - r11
+    jge 0xefbeadde ; Used for generating jumps to far labels if SF = OF
+template_end ifgeTemplate
+
+; ifgt succeeds if and only if value > 0
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifgtTemplate
+    _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    test r11d, r11d ; r11 - r11
+    jg 0xefbeadde ; Used for generating jumps to far labels if ZF = 0 and SF = OF
+template_end ifgtTemplate
+
+; ifle succeeds if and only if value <= 0
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifleTemplate
+    _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    test r11d, r11d ; r11 - r11
+    jle 0xefbeadde ; Used for generating jumps to far labels if ZF = 1 or SF <> OF
+template_end ifleTemplate
+
+; if_icmpeq succeeds if and only if value1 = value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ificmpeqTemplate
+    _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into the accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11d, r12d ; r11 (value1) - r12 (value2) and set EFLAGS
+    je 0xefbeadde ; Used for generating jump-if-equal to far labels
+template_end ificmpeqTemplate
+
+; if_acmpeq succeeds if and only if value1 = value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ifacmpeqTemplate
+    _64bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _64bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11, r12 ; r11 (value1) - r12 (value2) and set EFLAGS
+    je 0xefbeadde ; Used for generating jump-if-equal to far labels
+template_end ifacmpeqTemplate
+
+; if_icmpne succeeds if and only if value1 ≠ value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ificmpneTemplate
+    _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11d, r12d ; r11 (value1) - r12 (value2) and set EFLAGS
+    jne 0xefbeadde ; Used for generating jump-if-not-equal to far labels
+template_end ificmpneTemplate
+
+; if_acmpne succeeds if and only if value1 ≠ value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ifacmpneTemplate
+    _64bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _64bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11, r12 ; r11 (value1) - r12 (value2) and set EFLAGS
+    jne 0xefbeadde ; Used for generating jump-if-not-equal to far labels
+template_end ifacmpneTemplate
+
+; if_icmplt succeeds if and only if value1 < value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ificmpltTemplate
+    _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11d, r12d ; r11 (value1) - r12 (value2) and set EFLAGS
+    jl 0xefbeadde ; Used for generating jump-if-less-than to far labels
+template_end ificmpltTemplate
+
+; if_icmple succeeds if and only if value1 <= value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ificmpleTemplate
+    _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11d, r12d ; r11 (value1) - r12 (value2) and set EFLAGS
+    jle 0xefbeadde ; Used for generating jump-if-less-or-equal to far labels
+template_end ificmpleTemplate
+
+; if_icmpge succeeds if and only if value1 ≥ value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ificmpgeTemplate
+    _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11d, r12d ; r11 (value1) - r12 (value2) and set EFLAGS
+    jge 0xefbeadde ; Used for generating jump-if-greater-or-equal to far labels
+template_end ificmpgeTemplate
+
+; if_icmpgt succeeds if and only if value1 > value2
+; Stack before: ..., value1, value2 ->
+; Stack after: ...
+template_start ificmpgtTemplate
+    _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator
+    pop_single_slot ; reduce the stack size by 1 slot (8 bytes)
+    cmp r11d, r12d ; r11 (value1) - r12 (value2) and set EFLAGS
+    jg 0xefbeadde ; Used for generating jump-if-greater-than to far labels
+template_end ificmpgtTemplate
+
+template_start gotoTemplate
+    jmp 0xefbeadde ; Used for generating unconditional jumps to far labels
+template_end gotoTemplate
+
+; Below bytecodes are untested
+
+; ifnull succeeds if and only if value == null
+; Stack before: ..., value ->
+; Stack after: ...
+template_start ifnullTemplate + _64bit_slot_stack_to_rXX rax,0 ; grab the reference from the operand stack + pop_single_slot ; reduce the mjit stack by one slot + cmp rax, 0 ; compare against 0 (null) + je 0xefbeadde ; jump if rax is null +template_end ifnullTemplate + +; if_nonnull succeeds if and only if value != 0 +; Stack before: ..., value -> +; Stack after: ... +template_start ifnonnullTemplate + _64bit_slot_stack_to_rXX rax,0 ; grab the reference from the operand stack + pop_single_slot ; reduce the mjit stack by one slot + cmp rax, 0 ; compare against 0 (null) + jne 0xefbeadde ; jump if rax is not null +template_end ifnonnullTemplate diff --git a/runtime/compiler/microjit/x/amd64/templates/linkage.nasm b/runtime/compiler/microjit/x/amd64/templates/linkage.nasm new file mode 100644 index 00000000000..ec5981cf4ad --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/templates/linkage.nasm @@ -0,0 +1,488 @@ +; Copyright (c) 2022, 2022 IBM Corp. and others +; +; This program and the accompanying materials are made available under +; the terms of the Eclipse Public License 2.0 which accompanies this +; distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +; or the Apache License, Version 2.0 which accompanies this distribution and +; is available at https://www.apache.org/licenses/LICENSE-2.0. +; +; This Source Code may also be made available under the following +; Secondary Licenses when the conditions for such availability set +; forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +; General Public License, version 2 with the GNU Classpath +; Exception [1] and GNU General Public License, version 2 with the +; OpenJDK Assembly Exception [2]. +; +; [1] https://www.gnu.org/software/classpath/license.html +; [2] http://openjdk.java.net/legal/assembly-exception.html +; +; SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception +%include "utils.nasm" + +; +;Create templates +; + +section .text + +template_start saveXMM0Local + movq [r14 + 0xefbeadde], xmm0 +template_end saveXMM0Local + +template_start saveXMM1Local + movq [r14 + 0xefbeadde], xmm1 +template_end saveXMM1Local + +template_start saveXMM2Local + movq [r14 + 0xefbeadde], xmm2 +template_end saveXMM2Local + +template_start saveXMM3Local + movq [r14 + 0xefbeadde], xmm3 +template_end saveXMM3Local + +template_start saveXMM4Local + movq [r14 + 0xefbeadde], xmm4 +template_end saveXMM4Local + +template_start saveXMM5Local + movq [r14 + 0xefbeadde], xmm5 +template_end saveXMM5Local + +template_start saveXMM6Local + movq [r14 + 0xefbeadde], xmm6 +template_end saveXMM6Local + +template_start saveXMM7Local + movq [r14 + 0xefbeadde], xmm7 +template_end saveXMM7Local + +template_start saveRAXLocal + mov [r14 + 0xefbeadde], rax +template_end saveRAXLocal + +template_start saveRSILocal + mov [r14 + 0xefbeadde], rsi +template_end saveRSILocal + +template_start saveRDXLocal + mov [r14 + 0xefbeadde], rdx +template_end saveRDXLocal + +template_start saveRCXLocal + mov [r14 + 0xefbeadde], rcx +template_end saveRCXLocal + +template_start saveR11Local + mov [r14 + 0xefbeadde], r11 +template_end saveR11Local + +template_start movRSPOffsetR11 + mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end movRSPOffsetR11 + +template_start movRSPR10 + mov r10, rsp; Used for allocating space on the stack. +template_end movRSPR10 + +template_start movR10R14 + mov r14, r10; Used for allocating space on the stack. 
+template_end movR10R14 + +template_start subR10Imm4 + sub r10, 0x7fffffff; Used for allocating space on the stack. +template_end subR10Imm4 + +template_start addR10Imm4 + add r10, 0x7fffffff; Used for allocating space on the stack. +template_end addR10Imm4 + +template_start subR14Imm4 + sub r14, 0x7fffffff; Used for allocating space on the stack. +template_end subR14Imm4 + +template_start addR14Imm4 + add r14, 0x7fffffff; Used for allocating space on the stack. +template_end addR14Imm4 + +template_start saveRBPOffset + mov [rsp+0xff], rbp; Used ff as all other labels look valid and 255 will be rare. +template_end saveRBPOffset + +template_start saveEBPOffset + mov [rsp+0xff], ebp; Used ff as all other labels look valid and 255 will be rare. +template_end saveEBPOffset + +template_start saveRSPOffset + mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. +template_end saveRSPOffset + +template_start saveESPOffset + mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. +template_end saveESPOffset + +template_start loadRBPOffset + mov rbp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRBPOffset + +template_start loadEBPOffset + mov ebp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadEBPOffset + +template_start loadRSPOffset + mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRSPOffset + +template_start loadESPOffset + mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadESPOffset + +template_start saveEAXOffset + mov [rsp+0xff], eax; Used ff as all other labels look valid and 255 will be rare. +template_end saveEAXOffset + +template_start saveESIOffset + mov [rsp+0xff], esi; Used ff as all other labels look valid and 255 will be rare. +template_end saveESIOffset + +template_start saveEDXOffset + mov [rsp+0xff], edx; Used ff as all other labels look valid and 255 will be rare. +template_end saveEDXOffset + +template_start saveECXOffset + mov [rsp+0xff], ecx; Used ff as all other labels look valid and 255 will be rare. +template_end saveECXOffset + +template_start saveEBXOffset + mov [rsp+0xff], ebx; Used ff as all other labels look valid and 255 will be rare. +template_end saveEBXOffset + +template_start loadEAXOffset + mov eax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadEAXOffset + +template_start loadESIOffset + mov esi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadESIOffset + +template_start loadEDXOffset + mov edx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadEDXOffset + +template_start loadECXOffset + mov ecx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadECXOffset + +template_start loadEBXOffset + mov ebx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadEBXOffset + +template_start addRSPImm8 + add rsp, 0x7fffffffffffffff; Used for allocating space on the stack. +template_end addRSPImm8 + +template_start addRSPImm4 + add rsp, 0x7fffffff; Used for allocating space on the stack. +template_end addRSPImm4 + +template_start addRSPImm2 + add rsp, 0x7fff; Used for allocating space on the stack. +template_end addRSPImm2 + +template_start addRSPImm1 + add rsp, 0x7f; Used for allocating space on the stack. 
+template_end addRSPImm1 + +template_start je4ByteRel + je 0xff; +template_end je4ByteRel + +template_start jeByteRel + je 0xff; +template_end jeByteRel + +template_start subRSPImm8 + sub rsp, 0x7fffffffffffffff; Used for allocating space on the stack. +template_end subRSPImm8 + +template_start subRSPImm4 + sub rsp, 0x7fffffff; Used for allocating space on the stack. +template_end subRSPImm4 + +template_start subRSPImm2 + sub rsp, 0x7fff; Used for allocating space on the stack. +template_end subRSPImm2 + +template_start subRSPImm1 + sub rsp, 0x7f; Used for allocating space on the stack. +template_end subRSPImm1 + +template_start jbe4ByteRel + jbe 0xefbeadde; Used for generating jumps to far labels. +template_end jbe4ByteRel + +template_start cmpRspRbpDerefOffset + cmp rsp, [rbp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end cmpRspRbpDerefOffset + +template_start loadRAXOffset + mov rax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRAXOffset + +template_start loadRSIOffset + mov rsi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRSIOffset + +template_start loadRDXOffset + mov rdx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRDXOffset + +template_start loadRCXOffset + mov rcx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRCXOffset + +template_start loadRBXOffset + mov rbx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadRBXOffset + +template_start loadR9Offset + mov r9, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR9Offset + +template_start loadR10Offset + mov r10, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR10Offset + +template_start loadR11Offset + mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR11Offset + +template_start loadR12Offset + mov r12, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR12Offset + +template_start loadR13Offset + mov r13, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR13Offset + +template_start loadR14Offset + mov r14, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR14Offset + +template_start loadR15Offset + mov r15, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadR15Offset + +template_start loadXMM0Offset + movq xmm0, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM0Offset + +template_start loadXMM1Offset + movq xmm1, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM1Offset + +template_start loadXMM2Offset + movq xmm2, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM2Offset + +template_start loadXMM3Offset + movq xmm3, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM3Offset + +template_start loadXMM4Offset + movq xmm4, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM4Offset + +template_start loadXMM5Offset + movq xmm5, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. 
+template_end loadXMM5Offset + +template_start loadXMM6Offset + movq xmm6, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM6Offset + +template_start loadXMM7Offset + movq xmm7, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. +template_end loadXMM7Offset + +template_start saveRAXOffset + mov [rsp+0xff], rax; Used ff as all other labels look valid and 255 will be rare. +template_end saveRAXOffset + +template_start saveRSIOffset + mov [rsp+0xff], rsi; Used ff as all other labels look valid and 255 will be rare. +template_end saveRSIOffset + +template_start saveRDXOffset + mov [rsp+0xff], rdx; Used ff as all other labels look valid and 255 will be rare. +template_end saveRDXOffset + +template_start saveRCXOffset + mov [rsp+0xff], rcx; Used ff as all other labels look valid and 255 will be rare. +template_end saveRCXOffset + +template_start saveRBXOffset + mov [rsp+0xff], rbx; Used ff as all other labels look valid and 255 will be rare. +template_end saveRBXOffset + +template_start saveR9Offset + mov [rsp+0xff], r9; Used ff as all other labels look valid and 255 will be rare. +template_end saveR9Offset + +template_start saveR10Offset + mov [rsp+0xff], r10; Used ff as all other labels look valid and 255 will be rare. +template_end saveR10Offset + +template_start saveR11Offset + mov [rsp+0xff], r11; Used ff as all other labels look valid and 255 will be rare. +template_end saveR11Offset + +template_start saveR12Offset + mov [rsp+0xff], r12; Used ff as all other labels look valid and 255 will be rare. +template_end saveR12Offset + +template_start saveR13Offset + mov [rsp+0xff], r13; Used ff as all other labels look valid and 255 will be rare. +template_end saveR13Offset + +template_start saveR14Offset + mov [rsp+0xff], r14; Used ff as all other labels look valid and 255 will be rare. +template_end saveR14Offset + +template_start saveR15Offset + mov [rsp+0xff], r15; Used ff as all other labels look valid and 255 will be rare. +template_end saveR15Offset + +template_start saveXMM0Offset + movq [rsp+0xff], xmm0; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM0Offset + +template_start saveXMM1Offset + movq [rsp+0xff], xmm1; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM1Offset + +template_start saveXMM2Offset + movq [rsp+0xff], xmm2; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM2Offset + +template_start saveXMM3Offset + movq [rsp+0xff], xmm3; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM3Offset + +template_start saveXMM4Offset + movq [rsp+0xff], xmm4; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM4Offset + +template_start saveXMM5Offset + movq [rsp+0xff], xmm5; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM5Offset + +template_start saveXMM6Offset + movq [rsp+0xff], xmm6; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM6Offset + +template_start saveXMM7Offset + movq [rsp+0xff], xmm7; Used ff as all other labels look valid and 255 will be rare. +template_end saveXMM7Offset + +template_start call4ByteRel + call 0xefbeadde; Used for generating jumps to far labels. +template_end call4ByteRel + +template_start callByteRel + call 0xff; Used for generating jumps to nearby labels. Used ff as all other labels look valid and 255 will be rare. 
+template_end callByteRel + +template_start jump4ByteRel + jmp 0xefbeadde; Used for generating jumps to far labels. +template_end jump4ByteRel + +template_start jumpByteRel + jmp short $+0x02; Used for generating jumps to nearby labels. shows up as eb 00 in memory (easy to see) +template_end jumpByteRel + +template_start nopInstruction + nop; Used for alignment. One byte wide +template_end nopInstruction + +template_start movRDIImm64 + mov rdi, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump +template_end movRDIImm64 + +template_start movEDIImm32 + mov edi, 0xefbeadde; looks like 'deadbeef' in objdump +template_end movEDIImm32 + +template_start movRAXImm64 + mov rax, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump +template_end movRAXImm64 + +template_start movEAXImm32 + mov eax, 0xefbeadde; looks like 'deadbeef' in objdump +template_end movEAXImm32 + +template_start jumpRDI + jmp rdi; Useful for jumping to absolute addresses. Store in RDI then generate this. +template_end jumpRDI + +template_start jumpRAX + jmp rax; Useful for jumping to absolute addresses. Store in RAX then generate this. +template_end jumpRAX + +template_start paintRegister + mov rax, qword 0x0df0adde0df0adde +template_end paintRegister + +template_start paintLocal + mov [r14+0xefbeadde], rax +template_end paintLocal + +template_start moveCountAndRecompile + mov eax, [qword 0xefbeaddeefbeadde] +template_end moveCountAndRecompile + +template_start checkCountAndRecompile + test eax,eax + je 0xefbeadde +template_end checkCountAndRecompile + +template_start loadCounter + mov eax, [qword 0xefbeaddeefbeadde] +template_end loadCounter + +template_start decrementCounter + sub eax, 1 + mov [qword 0xefbeaddeefbeadde], eax +template_end decrementCounter + +template_start incrementCounter + inc rax + mov [qword 0xefbeaddeefbeadde], eax +template_end incrementCounter + +template_start jgCount + test eax, eax + jg 0xefbeadde +template_end jgCount + +template_start callRetranslateArg1 + mov rax, qword 0x0df0adde0df0adde +template_end callRetranslateArg1 + +template_start callRetranslateArg2 + mov rsi, qword 0x0df0adde0df0adde +template_end callRetranslateArg2 + +template_start callRetranslate + mov edi, 0x00080000 + call 0xefbeadde +template_end callRetranslate + +template_start setCounter + mov rax, 0x2710 + mov [qword 0xefbeaddeefbeadde], eax +template_end setCounter + +template_start jmpToBody + jmp 0xefbeadde +template_end jmpToBody \ No newline at end of file diff --git a/runtime/compiler/microjit/x/amd64/templates/microjit.nasm b/runtime/compiler/microjit/x/amd64/templates/microjit.nasm new file mode 100644 index 00000000000..cd5267d8361 --- /dev/null +++ b/runtime/compiler/microjit/x/amd64/templates/microjit.nasm @@ -0,0 +1,94 @@ +; Copyright (c) 2022, 2022 IBM Corp. and others +; +; This program and the accompanying materials are made available under +; the terms of the Eclipse Public License 2.0 which accompanies this +; distribution and is available at https://www.eclipse.org/legal/epl-2.0/ +; or the Apache License, Version 2.0 which accompanies this distribution and +; is available at https://www.apache.org/licenses/LICENSE-2.0. +; +; This Source Code may also be made available under the following +; Secondary Licenses when the conditions for such availability set +; forth in the Eclipse Public License, v. 2.0 are satisfied: GNU +; General Public License, version 2 with the GNU Classpath +; Exception [1] and GNU General Public License, version 2 with the +; OpenJDK Assembly Exception [2]. 
+; +; [1] https://www.gnu.org/software/classpath/license.html +; [2] http://openjdk.java.net/legal/assembly-exception.html +; +; SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + +;MicroJIT Virtual Machine to X86-64 mapping +; rsp: +; is the stack extent for the java value stack pointer. +; r10: +; will hold the java value stack pointer. +; r11: +; stores the accumulator, or +; stores a pointer to an object +; r12: +; stores any value which will act on the accumulator, or +; stores the value to be written to an object field, or +; stores the value read from an object field +; r13: +; holds addresses for absolute addressing +; r14: +; holds a pointer to the start of the local array +; r15: +; stores values loaded from memory for storing on the stack or in the local array + +%macro pop_single_slot 0 +add r10, 8 +%endmacro + +%macro pop_dual_slot 0 +add r10, 16 +%endmacro + +%macro push_single_slot 0 +sub r10, 8 +%endmacro + +%macro push_dual_slot 0 +sub r10, 16 +%endmacro + +%macro _32bit_local_to_rXX_PATCH 1 +mov %1d, dword [r14 + 0xefbeadde] +%endmacro + +%macro _64bit_local_to_rXX_PATCH 1 +mov %{1}, qword [r14 + 0xefbeadde] +%endmacro + +%macro _32bit_local_from_rXX_PATCH 1 +mov dword [r14 + 0xefbeadde], %1d +%endmacro + +%macro _64bit_local_from_rXX_PATCH 1 +mov qword [r14 + 0xefbeadde], %1 +%endmacro + +%macro _32bit_slot_stack_to_rXX 2 +mov %1d, dword [r10 + %2] +%endmacro + +%macro _32bit_slot_stack_to_eXX 2 +mov e%{1}, dword [r10 + %2] +%endmacro + +%macro _64bit_slot_stack_to_rXX 2 +mov %{1}, qword [r10 + %2] +%endmacro + +%macro _32bit_slot_stack_from_rXX 2 +mov dword [r10 + %2], %1d +%endmacro + +%macro _32bit_slot_stack_from_eXX 2 +mov dword [r10 + %2], e%1 +%endmacro + +%macro _64bit_slot_stack_from_rXX 2 +mov qword [r10 + %2], %1 +%endmacro diff --git a/runtime/compiler/optimizer/J9Optimizer.cpp b/runtime/compiler/optimizer/J9Optimizer.cpp index 78b690a26cb..efa2d2bc4b4 100644 --- a/runtime/compiler/optimizer/J9Optimizer.cpp +++ b/runtime/compiler/optimizer/J9Optimizer.cpp @@ -86,6 +86,13 @@ #include "optimizer/MethodHandleTransformer.hpp" #include "optimizer/VectorAPIExpansion.hpp" +#if defined(J9VM_OPT_MICROJIT) +static const OptimizationStrategy J9MicroJITOpts[] = + { + { OMR::jProfilingBlock, OMR::MustBeDone }, + { OMR::endOpts } + }; +#endif static const OptimizationStrategy J9EarlyGlobalOpts[] = { @@ -1018,6 +1025,13 @@ J9::Optimizer::optimizationStrategy(TR::Compilation *c) } } +#if defined(J9VM_OPT_MICROJIT) +const OptimizationStrategy * +J9::Optimizer::microJITOptimizationStrategy(TR::Compilation *c) + { + return J9MicroJITOpts; + } +#endif /* J9VM_OPT_MICROJIT */ ValueNumberInfoBuildType J9::Optimizer::valueNumberInfoBuildType() diff --git a/runtime/compiler/optimizer/J9Optimizer.hpp b/runtime/compiler/optimizer/J9Optimizer.hpp index 600b62c27a3..1b5992252c9 100644 --- a/runtime/compiler/optimizer/J9Optimizer.hpp +++ b/runtime/compiler/optimizer/J9Optimizer.hpp @@ -1,5 +1,5 @@ /******************************************************************************* - * Copyright (c) 2000, 2019 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. 
and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this @@ -61,6 +61,9 @@ class Optimizer : public OMR::OptimizerConnector bool switchToProfiling(); static const OptimizationStrategy *optimizationStrategy(TR::Compilation *c); +#if defined(J9VM_OPT_MICROJIT) + static const OptimizationStrategy *microJITOptimizationStrategy(TR::Compilation *c); +#endif static ValueNumberInfoBuildType valueNumberInfoBuildType(); private: diff --git a/runtime/compiler/optimizer/JProfilingBlock.cpp b/runtime/compiler/optimizer/JProfilingBlock.cpp index 34947676d4b..03557670847 100644 --- a/runtime/compiler/optimizer/JProfilingBlock.cpp +++ b/runtime/compiler/optimizer/JProfilingBlock.cpp @@ -1,5 +1,5 @@ /******************************************************************************* - * Copyright (c) 2000, 2021 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this @@ -35,6 +35,7 @@ #include "control/Recompilation.hpp" #include "control/RecompilationInfo.hpp" #include "optimizer/TransformUtil.hpp" +#include "env/j9method.h" // Global thresholds for the number of method enters required to trip // method recompilation - these are adjusted in the JIT hook control logic @@ -595,6 +596,12 @@ void TR_JProfilingBlock::computeMinimumSpanningTree(BlockParents &parent, BlockP BlockWeights weights(cfg->getNextNodeNumber(), stackMemoryRegion); TR::BlockChecklist inMST(comp()); + char* classname = comp()->getMethodBeingCompiled()->classNameChars(); + uint16_t classnamelength = comp()->getMethodBeingCompiled()->classNameLength(); + + char* name = comp()->getMethodBeingCompiled()->nameChars(); + uint16_t namelength = comp()->getMethodBeingCompiled()->nameLength(); + // Prim's init { TR::Block *first = optimizer()->getMethodSymbol()->getFirstTreeTop()->getNode()->getBlock(); @@ -602,6 +609,11 @@ void TR_JProfilingBlock::computeMinimumSpanningTree(BlockParents &parent, BlockP Q.push(std::make_pair(0,first)); } + if (trace()) + { + traceMsg(comp(), "class.method:from_block:to_block:frequency\n"); + } + while (!Q.empty()) { TR::Block *block = Q.top().second; @@ -609,14 +621,19 @@ void TR_JProfilingBlock::computeMinimumSpanningTree(BlockParents &parent, BlockP inMST.add(block); if (trace()) - traceMsg(comp(), "Add block_%d to the MST\n", block->getNumber()); + { + //traceMsg(comp(), "Add block_%d to the MST\n", block->getNumber()); + } AdjacentBlockIterator adj(comp(), block); while (adj.current()) { TR::Block *candidate = adj.current(); if (trace()) - traceMsg(comp(), " adj block_%d weight %d\n", candidate->getNumber(), adj.frequency()); + { + //traceMsg(comp(), " adj block_%d weight %d\n", candidate->getNumber(), adj.frequency()); + traceMsg(comp(), "%s.%s:%d:%d:%d\n", classname, name, block->getNumber(), candidate->getNumber(), adj.frequency()); + } if (!inMST.contains(candidate) && weights[candidate] > -adj.frequency()) { weights[candidate] = -adj.frequency(); @@ -923,6 +940,7 @@ void TR_JProfilingBlock::addRecompilationTests(TR_BlockFrequencyInfo *blockFrequ */ int32_t TR_JProfilingBlock::perform() { + // Can we perform these tests before the perform call? if (comp()->getOption(TR_EnableJProfiling)) { if (trace()) @@ -963,6 +981,16 @@ int32_t TR_JProfilingBlock::perform() if (trace()) parent.dump(comp()); +//TODO: Remove this when ready to implement counter insertion. 
+#if defined(J9VM_OPT_MICROJIT) + J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); + U_8 *extendedFlags = reinterpret_cast(fe())->fetchMethodExtendedFlagsPointer(method); + if( ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1 ){ + comp()->setSkippedJProfilingBlock(); + return 0; + } +#endif + TR::BlockChecklist countedBlocks(comp()); int32_t numEdges = processCFGForCounting(parent, countedBlocks, loopBack); diff --git a/runtime/compiler/runtime/MetaData.cpp b/runtime/compiler/runtime/MetaData.cpp index 3160aeb2507..a6c4cbc848a 100644 --- a/runtime/compiler/runtime/MetaData.cpp +++ b/runtime/compiler/runtime/MetaData.cpp @@ -59,6 +59,9 @@ #include "il/Symbol.hpp" #include "il/TreeTop.hpp" #include "il/TreeTop_inlines.hpp" +#if defined(J9VM_OPT_MICROJIT) +#include "microjit/ExceptionTable.hpp" +#endif #include "runtime/ArtifactManager.hpp" #include "runtime/CodeCache.hpp" #include "runtime/MethodMetaData.h" @@ -227,6 +230,52 @@ createExceptionTable( } } +#if defined(J9VM_OPT_MICROJIT) +static void +createMJITExceptionTable( + TR_MethodMetaData * data, + MJIT_ExceptionTableEntryIterator & exceptionIterator, + bool fourByteOffsets, + TR::Compilation *comp) + { + uint8_t * cursor = (uint8_t *)data + sizeof(TR_MethodMetaData); + + for (TR_ExceptionTableEntry * e = exceptionIterator.getFirst(); e; e = exceptionIterator.getNext()) + { + if (fourByteOffsets) + { + *(uint32_t *)cursor = e->_instructionStartPC, cursor += 4; + *(uint32_t *)cursor = e->_instructionEndPC, cursor += 4; + *(uint32_t *)cursor = e->_instructionHandlerPC, cursor += 4; + *(uint32_t *)cursor = e->_catchType, cursor += 4; + if (comp->fej9()->isAOT_DEPRECATED_DO_NOT_USE()) + *(uintptr_t *)cursor = (uintptr_t)e->_byteCodeInfo.getCallerIndex(), cursor += sizeof(uintptr_t); + else + *(uintptr_t *)cursor = (uintptr_t)e->_method->resolvedMethodAddress(), cursor += sizeof(uintptr_t); + } + else + { + TR_ASSERT(e->_catchType <= 0xFFFF, "the cp index for the catch type requires 17 bits!"); + + *(uint16_t *)cursor = e->_instructionStartPC, cursor += 2; + *(uint16_t *)cursor = e->_instructionEndPC, cursor += 2; + *(uint16_t *)cursor = e->_instructionHandlerPC, cursor += 2; + *(uint16_t *)cursor = e->_catchType, cursor += 2; + } + + // Ensure that InstructionBoundaries are initialized properly. 
+ // + TR_ASSERT(e->_instructionStartPC != UINT_MAX, "Uninitialized startPC in exception table entry: %p", e); + TR_ASSERT(e->_instructionEndPC != UINT_MAX, "Uninitialized endPC in exception table entry: %p", e); + + if (comp->getOption(TR_FullSpeedDebug) && !debug("dontEmitFSDInfo")) + { + *(uint32_t *)cursor = e->_byteCodeInfo.getByteCodeIndex(); + cursor += 4; + } + } + } +#endif /* J9VM_OPT_MICROJIT */ // This method is used to calculate the size (in number of bytes) // that this internal pointer map will require in the J9 GC map @@ -1087,6 +1136,47 @@ static int32_t calculateExceptionsSize( return exceptionsSize; } +#if defined(J9VM_OPT_MICROJIT) +static int32_t calculateMJITExceptionsSize( + TR::Compilation* comp, + MJIT_ExceptionTableEntryIterator& exceptionIterator, + bool& fourByteExceptionTableEntries, + uint32_t& numberOfExceptionRangesWithBits) + { + uint32_t exceptionsSize = 0; + uint32_t numberOfExceptionRanges = exceptionIterator.size(); + numberOfExceptionRangesWithBits = numberOfExceptionRanges; + if (numberOfExceptionRanges) + { + if (numberOfExceptionRanges > 0x3FFF) + return -1; // our meta data representation only has 14 bits for the number of exception ranges + + if (!fourByteExceptionTableEntries) + for (TR_ExceptionTableEntry * e = exceptionIterator.getFirst(); e; e = exceptionIterator.getNext()) + if (e->_catchType > 0xFFFF || !e->_method->isSameMethod(comp->getCurrentMethod())) + { fourByteExceptionTableEntries = true; break; } + + int32_t entrySize; + if (fourByteExceptionTableEntries) + { + entrySize = sizeof(J9JIT32BitExceptionTableEntry); + numberOfExceptionRangesWithBits |= 0x8000; + } + else + entrySize = sizeof(J9JIT16BitExceptionTableEntry); + + if (comp->getOption(TR_FullSpeedDebug)) + { + numberOfExceptionRangesWithBits |= 0x4000; + entrySize += 4; + } + + exceptionsSize = numberOfExceptionRanges * entrySize; + } + return exceptionsSize; + } +#endif /* J9VM_OPT_MICROJIT */ + static void populateBodyInfo( TR::Compilation *comp, @@ -1749,3 +1839,202 @@ createMethodMetaData( return data; } + +#if defined(J9VM_OPT_MICROJIT) +// +//the routine that sequences the creation of the meta-data for +// a method compiled for MJIT +// +TR_MethodMetaData * +createMJITMethodMetaData( + TR_J9VMBase & vmArg, + TR_ResolvedMethod *vmMethod, + TR::Compilation *comp + ) + { + //--------------------------------------------------------------------------- + //This function needs to create the meta-data structures for: + // GC maps, Exceptions, inlining calls, stack atlas, + // GPU optimizations and internal pointer table + TR_J9VMBase * vm = &vmArg; + MJIT_ExceptionTableEntryIterator exceptionIterator(comp); + TR::ResolvedMethodSymbol * methodSymbol = comp->getJittedMethodSymbol(); + TR::CodeGenerator * cg = comp->cg(); + TR::GCStackAtlas * trStackAtlas = cg->getStackAtlas(); + + // -------------------------------------------------------------------------- + // Find unmergeable GC maps + //TODO: Create mechanism like treetop to support this at a MicroJIT level. 
+ GCStackMapSet nonmergeableBCI(std::less(), cg->trMemory()->heapMemoryRegion()); + // if (comp->getOptions()->getReportByteCodeInfoAtCatchBlock()) + // { + // for (TR::TreeTop* treetop = comp->getStartTree(); treetop; treetop = treetop->getNextTreeTop()) + // { + // if (treetop->getNode()->getOpCodeValue() == TR::BBStart) + // { + // TR::Block* block = treetop->getNode()->getBlock(); + // if (block->getCatchBlockExtension()) + // { + // nonmergeableBCI.insert(block->getFirstInstruction()->getGCMap()); + // } + // } + // } + // } + + bool fourByteOffsets = RANGE_NEEDS_FOUR_BYTE_OFFSET(cg->getCodeLength()); + + uint32_t tableSize = sizeof(TR_MethodMetaData); + + uint32_t numberOfSlotsMapped = trStackAtlas->getNumberOfSlotsMapped(); + uint32_t numberOfMapBytes = (numberOfSlotsMapped + 1 + 7) >> 3; // + 1 to indicate whether there's a live monitor map + if (comp->isAlignStackMaps()) + { + numberOfMapBytes = (numberOfMapBytes + 3) & ~3; + } + + // -------------------------------------------------------------------------- + // Computing the size of the exception table + // fourByteExceptionTableEntries and numberOfExceptionRangesWithBits will be + // computed in calculateExceptionSize + // + bool fourByteExceptionTableEntries = fourByteOffsets; + uint32_t numberOfExceptionRangesWithBits = 0; + int32_t exceptionsSize = calculateMJITExceptionsSize( + comp, + exceptionIterator, + fourByteExceptionTableEntries, + numberOfExceptionRangesWithBits); + + if (exceptionsSize == -1) + { + return NULL; + } + // TODO: once exception support is added to MicroJIT uncomment this line. + // tableSize += exceptionsSize; + uint32_t exceptionTableSize = tableSize; //Size of the meta data header and exception entries + + // -------------------------------------------------------------------------- + // Computing the size of the inlinedCall + // + + // TODO: Do we care about inline calls? + uint32_t inlinedCallSize = 0; + + // Add size of stack atlas to allocate + // + int32_t sizeOfStackAtlasInBytes = calculateSizeOfStackAtlas( + vm, + cg, + fourByteOffsets, + numberOfSlotsMapped, + numberOfMapBytes, + comp, + nonmergeableBCI); + + tableSize += sizeOfStackAtlasInBytes; + + // Add size of internal ptr data structure to allocate + // + tableSize += calculateSizeOfInternalPtrMap(comp); + + // Escape analysis change for compressed pointers + // + TR_GCStackAllocMap * stackAllocMap = trStackAtlas->getStackAllocMap(); + if (stackAllocMap) + { + tableSize += numberOfMapBytes + sizeof(uintptr_t); + } + + // Internal pointer map + // + + /* Legend of the info stored at "data". 
From top to bottom, the address increases + Exception Info + -------------- + Inline Info + -------------- + Stack Atlas + -------------- + Stack alloc map (if exists) + -------------- + internal pointer map + */ + + TR_MethodMetaData * data = (TR_MethodMetaData*) vmMethod->allocateException(tableSize, comp); + + populateBodyInfo(comp, vm, data); + + data->startPC = (UDATA)comp->cg()->getCodeStart(); + data->endPC = (UDATA)comp->cg()->getCodeEnd(); + data->startColdPC = (UDATA)0; + data->endWarmPC = data->endPC; + data->codeCacheAlloc = (UDATA)comp->cg()->getBinaryBufferStart(); + data->flags |= JIT_METADATA_GC_MAP_32_BIT_OFFSETS; + + data->hotness = comp->getMethodHotness(); + data->totalFrameSize = comp->cg()->getFrameSizeInBytes()/TR::Compiler->om.sizeofReferenceAddress(); + data->slots = vmMethod->numberOfParameterSlots(); + data->scalarTempSlots = methodSymbol->getScalarTempSlots(); + data->objectTempSlots = methodSymbol->getObjectTempSlots(); + data->prologuePushes = methodSymbol->getProloguePushSlots(); + data->numExcptionRanges = numberOfExceptionRangesWithBits; + data->tempOffset = comp->cg()->getStackAtlas()->getNumberOfPendingPushSlots(); + data->size = tableSize; + + data->gcStackAtlas = createStackAtlas( + vm, + comp->cg(), + fourByteOffsets, + numberOfSlotsMapped, + numberOfMapBytes, + comp, + ((uint8_t *)data + exceptionTableSize + inlinedCallSize), + sizeOfStackAtlasInBytes, + nonmergeableBCI); + + createMJITExceptionTable(data, exceptionIterator, fourByteExceptionTableEntries, comp); + + data->registerSaveDescription = comp->cg()->getRegisterSaveDescription(); + + data->inlinedCalls = NULL; + data->riData = NULL; + + if (!(vm->_jitConfig->runtimeFlags & J9JIT_TOSS_CODE) && !vm->isAOT_DEPRECATED_DO_NOT_USE()) + { + TR_TranslationArtifactManager *artifactManager = TR_TranslationArtifactManager::getGlobalArtifactManager(); + TR_TranslationArtifactManager::CriticalSection updateMetaData; + + if ( !(artifactManager->insertArtifact( static_cast(data) ) ) ) + { + // Insert trace point here for insertion failure + } + if (vm->isAnonymousClass( ((TR_ResolvedJ9Method*)vmMethod)->romClassPtr())) + { + J9Class *j9clazz = ((TR_ResolvedJ9Method*)vmMethod)->constantPoolHdr(); + J9CLASS_EXTENDED_FLAGS_SET(j9clazz, J9ClassContainsJittedMethods); + data->prevMethod = NULL; + data->nextMethod = j9clazz->jitMetaDataList; + if (j9clazz->jitMetaDataList) + j9clazz->jitMetaDataList->prevMethod = data; + j9clazz->jitMetaDataList = data; + } + else + { + J9ClassLoader * classLoader = ((TR_ResolvedJ9Method*)vmMethod)->getClassLoader(); + classLoader->flags |= J9CLASSLOADER_CONTAINS_JITTED_METHODS; + data->prevMethod = NULL; + data->nextMethod = classLoader->jitMetaDataList; + if (classLoader->jitMetaDataList) + classLoader->jitMetaDataList->prevMethod = data; + classLoader->jitMetaDataList = data; + } + } + + if (comp->getOption(TR_TraceCG) && comp->getOutFile() != NULL) + { + comp->getDebug()->print(data, vmMethod, fourByteOffsets); + } + + return data; + } +#endif /* J9VM_OPT_MICROJIT */ diff --git a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp index d75b6187ab4..b56bd49c2d3 100644 --- a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp +++ b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp @@ -86,14 +86,21 @@ enum J9::X86::AMD64::PrivateLinkage::PrivateLinkage(TR::CodeGenerator *cg) : J9::X86::PrivateLinkage(cg) + { + setOffsetToFirstParm(RETURN_ADDRESS_SIZE); + setProperties(cg,&_properties); + } + +void 
J9::X86::AMD64::setProperties( + TR::CodeGenerator *cg, + TR::X86LinkageProperties *properties) { TR_J9VMBase *fej9 = (TR_J9VMBase *)(cg->fe()); const TR::RealRegister::RegNum noReg = TR::RealRegister::NoReg; uint8_t r, p; TR::RealRegister::RegNum metaReg = TR::RealRegister::ebp; - - _properties._properties = + properties->_properties = EightBytePointers | EightByteParmSlots | IntegersInRegisters | LongsInRegisters | FloatsInRegisters | NeedsThunksForIndirectCalls @@ -101,98 +108,97 @@ J9::X86::AMD64::PrivateLinkage::PrivateLinkage(TR::CodeGenerator *cg) ; if (!fej9->pushesOutgoingArgsAtCallSite(cg->comp())) - _properties._properties |= CallerCleanup | ReservesOutgoingArgsInPrologue; + properties->_properties |= CallerCleanup | ReservesOutgoingArgsInPrologue; // Integer arguments p=0; - _properties._firstIntegerArgumentRegister = p; - _properties._argumentRegisters[p++] = TR::RealRegister::eax; - _properties._argumentRegisters[p++] = TR::RealRegister::esi; - _properties._argumentRegisters[p++] = TR::RealRegister::edx; - _properties._argumentRegisters[p++] = TR::RealRegister::ecx; + properties->_firstIntegerArgumentRegister = p; + properties->_argumentRegisters[p++] = TR::RealRegister::eax; + properties->_argumentRegisters[p++] = TR::RealRegister::esi; + properties->_argumentRegisters[p++] = TR::RealRegister::edx; + properties->_argumentRegisters[p++] = TR::RealRegister::ecx; TR_ASSERT(p == NUM_INTEGER_LINKAGE_REGS, "assertion failure"); - _properties._numIntegerArgumentRegisters = NUM_INTEGER_LINKAGE_REGS; + properties->_numIntegerArgumentRegisters = NUM_INTEGER_LINKAGE_REGS; // Float arguments - _properties._firstFloatArgumentRegister = p; + properties->_firstFloatArgumentRegister = p; for(r=0; r<=7; r++) - _properties._argumentRegisters[p++] = TR::RealRegister::xmmIndex(r); + properties->_argumentRegisters[p++] = TR::RealRegister::xmmIndex(r); TR_ASSERT(p == NUM_INTEGER_LINKAGE_REGS + NUM_FLOAT_LINKAGE_REGS, "assertion failure"); - _properties._numFloatArgumentRegisters = NUM_FLOAT_LINKAGE_REGS; + properties->_numFloatArgumentRegisters = NUM_FLOAT_LINKAGE_REGS; // Preserved p=0; - _properties._preservedRegisters[p++] = TR::RealRegister::ebx; - _properties._preservedRegisterMapForGC = TR::RealRegister::ebxMask; + properties->_preservedRegisters[p++] = TR::RealRegister::ebx; + properties->_preservedRegisterMapForGC = TR::RealRegister::ebxMask; int32_t lastPreservedRegister = 9; // changed to 9 for liberty, it used to be 15 for (r=9; r<=lastPreservedRegister; r++) { - _properties._preservedRegisters[p++] = TR::RealRegister::rIndex(r); - _properties._preservedRegisterMapForGC |= TR::RealRegister::gprMask(TR::RealRegister::rIndex(r)); + properties->_preservedRegisters[p++] = TR::RealRegister::rIndex(r); + properties->_preservedRegisterMapForGC |= TR::RealRegister::gprMask(TR::RealRegister::rIndex(r)); } - _properties._numberOfPreservedGPRegisters = p; + properties->_numberOfPreservedGPRegisters = p; if (!INTERPRETER_CLOBBERS_XMMS) for (r=8; r<=15; r++) { - _properties._preservedRegisters[p++] = TR::RealRegister::xmmIndex(r); - _properties._preservedRegisterMapForGC |= TR::RealRegister::xmmrMask(TR::RealRegister::xmmIndex(r)); + properties->_preservedRegisters[p++] = TR::RealRegister::xmmIndex(r); + properties->_preservedRegisterMapForGC |= TR::RealRegister::xmmrMask(TR::RealRegister::xmmIndex(r)); } - _properties._numberOfPreservedXMMRegisters = p - _properties._numberOfPreservedGPRegisters; - _properties._maxRegistersPreservedInPrologue = p; - _properties._preservedRegisters[p++] = metaReg; 
- _properties._preservedRegisters[p++] = TR::RealRegister::esp; - _properties._numPreservedRegisters = p; + properties->_numberOfPreservedXMMRegisters = p - properties->_numberOfPreservedGPRegisters; + properties->_maxRegistersPreservedInPrologue = p; + properties->_preservedRegisters[p++] = metaReg; + properties->_preservedRegisters[p++] = TR::RealRegister::esp; + properties->_numPreservedRegisters = p; // Other - _properties._returnRegisters[0] = TR::RealRegister::eax; - _properties._returnRegisters[1] = TR::RealRegister::xmm0; - _properties._returnRegisters[2] = noReg; + properties->_returnRegisters[0] = TR::RealRegister::eax; + properties->_returnRegisters[1] = TR::RealRegister::xmm0; + properties->_returnRegisters[2] = noReg; - _properties._scratchRegisters[0] = TR::RealRegister::edi; - _properties._scratchRegisters[1] = TR::RealRegister::r8; - _properties._numScratchRegisters = 2; + properties->_scratchRegisters[0] = TR::RealRegister::edi; + properties->_scratchRegisters[1] = TR::RealRegister::r8; + properties->_numScratchRegisters = 2; - _properties._vtableIndexArgumentRegister = TR::RealRegister::r8; - _properties._j9methodArgumentRegister = TR::RealRegister::edi; - _properties._framePointerRegister = TR::RealRegister::esp; - _properties._methodMetaDataRegister = metaReg; + properties->_vtableIndexArgumentRegister = TR::RealRegister::r8; + properties->_j9methodArgumentRegister = TR::RealRegister::edi; + properties->_framePointerRegister = TR::RealRegister::esp; + properties->_methodMetaDataRegister = metaReg; - _properties._numberOfVolatileGPRegisters = 6; // rax, rsi, rdx, rcx, rdi, r8 - _properties._numberOfVolatileXMMRegisters = INTERPRETER_CLOBBERS_XMMS? 16 : 8; // xmm0-xmm7 + properties->_numberOfVolatileGPRegisters = 6; // rax, rsi, rdx, rcx, rdi, r8 + properties->_numberOfVolatileXMMRegisters = INTERPRETER_CLOBBERS_XMMS? 16 : 8; // xmm0-xmm7 // Offsets relative to where the frame pointer *would* point if we had one; // namely, the local with the highest address (ie. 
the "first" local) - setOffsetToFirstParm(RETURN_ADDRESS_SIZE); - _properties._offsetToFirstLocal = 0; + properties->_offsetToFirstLocal = 0; // TODO: Need a better way to build the flags so they match the info above // - memset(_properties._registerFlags, 0, sizeof(_properties._registerFlags)); + memset(properties->_registerFlags, 0, sizeof(properties->_registerFlags)); // Integer arguments/return - _properties._registerFlags[TR::RealRegister::eax] = IntegerArgument | IntegerReturn; - _properties._registerFlags[TR::RealRegister::esi] = IntegerArgument; - _properties._registerFlags[TR::RealRegister::edx] = IntegerArgument; - _properties._registerFlags[TR::RealRegister::ecx] = IntegerArgument; + properties->_registerFlags[TR::RealRegister::eax] = IntegerArgument | IntegerReturn; + properties->_registerFlags[TR::RealRegister::esi] = IntegerArgument; + properties->_registerFlags[TR::RealRegister::edx] = IntegerArgument; + properties->_registerFlags[TR::RealRegister::ecx] = IntegerArgument; // Float arguments/return - _properties._registerFlags[TR::RealRegister::xmm0] = FloatArgument | FloatReturn; + properties->_registerFlags[TR::RealRegister::xmm0] = FloatArgument | FloatReturn; for(r=1; r <= 7; r++) - _properties._registerFlags[TR::RealRegister::xmmIndex(r)] = FloatArgument; + properties->_registerFlags[TR::RealRegister::xmmIndex(r)] = FloatArgument; // Preserved - _properties._registerFlags[TR::RealRegister::ebx] = Preserved; - _properties._registerFlags[TR::RealRegister::esp] = Preserved; - _properties._registerFlags[metaReg] = Preserved; + properties->_registerFlags[TR::RealRegister::ebx] = Preserved; + properties->_registerFlags[TR::RealRegister::esp] = Preserved; + properties->_registerFlags[metaReg] = Preserved; for(r=9; r <= lastPreservedRegister; r++) - _properties._registerFlags[TR::RealRegister::rIndex(r)] = Preserved; + properties->_registerFlags[TR::RealRegister::rIndex(r)] = Preserved; if(!INTERPRETER_CLOBBERS_XMMS) for(r=8; r <= 15; r++) - _properties._registerFlags[TR::RealRegister::xmmIndex(r)] = Preserved; + properties->_registerFlags[TR::RealRegister::xmmIndex(r)] = Preserved; p = 0; @@ -201,8 +207,8 @@ J9::X86::AMD64::PrivateLinkage::PrivateLinkage(TR::CodeGenerator *cg) // Volatiles that aren't linkage regs if (TR::Machine::numGPRRegsWithheld(cg) == 0) { - _properties._allocationOrder[p++] = TR::RealRegister::edi; - _properties._allocationOrder[p++] = TR::RealRegister::r8; + properties->_allocationOrder[p++] = TR::RealRegister::edi; + properties->_allocationOrder[p++] = TR::RealRegister::r8; } else { @@ -210,20 +216,20 @@ J9::X86::AMD64::PrivateLinkage::PrivateLinkage(TR::CodeGenerator *cg) } // Linkage regs in reverse order - _properties._allocationOrder[p++] = TR::RealRegister::ecx; - _properties._allocationOrder[p++] = TR::RealRegister::edx; - _properties._allocationOrder[p++] = TR::RealRegister::esi; - _properties._allocationOrder[p++] = TR::RealRegister::eax; + properties->_allocationOrder[p++] = TR::RealRegister::ecx; + properties->_allocationOrder[p++] = TR::RealRegister::edx; + properties->_allocationOrder[p++] = TR::RealRegister::esi; + properties->_allocationOrder[p++] = TR::RealRegister::eax; } // Preserved regs - _properties._allocationOrder[p++] = TR::RealRegister::ebx; - _properties._allocationOrder[p++] = TR::RealRegister::r9; - _properties._allocationOrder[p++] = TR::RealRegister::r10; - _properties._allocationOrder[p++] = TR::RealRegister::r11; - _properties._allocationOrder[p++] = TR::RealRegister::r12; - _properties._allocationOrder[p++] = 
TR::RealRegister::r13; - _properties._allocationOrder[p++] = TR::RealRegister::r14; - _properties._allocationOrder[p++] = TR::RealRegister::r15; + properties->_allocationOrder[p++] = TR::RealRegister::ebx; + properties->_allocationOrder[p++] = TR::RealRegister::r9; + properties->_allocationOrder[p++] = TR::RealRegister::r10; + properties->_allocationOrder[p++] = TR::RealRegister::r11; + properties->_allocationOrder[p++] = TR::RealRegister::r12; + properties->_allocationOrder[p++] = TR::RealRegister::r13; + properties->_allocationOrder[p++] = TR::RealRegister::r14; + properties->_allocationOrder[p++] = TR::RealRegister::r15; TR_ASSERT(p == machine()->getNumGlobalGPRs(), "assertion failure"); @@ -232,34 +238,33 @@ J9::X86::AMD64::PrivateLinkage::PrivateLinkage(TR::CodeGenerator *cg) // Linkage regs in reverse order if (TR::Machine::numRegsWithheld(cg) == 0) { - _properties._allocationOrder[p++] = TR::RealRegister::xmm7; - _properties._allocationOrder[p++] = TR::RealRegister::xmm6; + properties->_allocationOrder[p++] = TR::RealRegister::xmm7; + properties->_allocationOrder[p++] = TR::RealRegister::xmm6; } else { TR_ASSERT(TR::Machine::numRegsWithheld(cg) == 2, "numRegsWithheld: only 0 and 2 currently supported"); } - _properties._allocationOrder[p++] = TR::RealRegister::xmm5; - _properties._allocationOrder[p++] = TR::RealRegister::xmm4; - _properties._allocationOrder[p++] = TR::RealRegister::xmm3; - _properties._allocationOrder[p++] = TR::RealRegister::xmm2; - _properties._allocationOrder[p++] = TR::RealRegister::xmm1; - _properties._allocationOrder[p++] = TR::RealRegister::xmm0; + properties->_allocationOrder[p++] = TR::RealRegister::xmm5; + properties->_allocationOrder[p++] = TR::RealRegister::xmm4; + properties->_allocationOrder[p++] = TR::RealRegister::xmm3; + properties->_allocationOrder[p++] = TR::RealRegister::xmm2; + properties->_allocationOrder[p++] = TR::RealRegister::xmm1; + properties->_allocationOrder[p++] = TR::RealRegister::xmm0; } // Preserved regs - _properties._allocationOrder[p++] = TR::RealRegister::xmm8; - _properties._allocationOrder[p++] = TR::RealRegister::xmm9; - _properties._allocationOrder[p++] = TR::RealRegister::xmm10; - _properties._allocationOrder[p++] = TR::RealRegister::xmm11; - _properties._allocationOrder[p++] = TR::RealRegister::xmm12; - _properties._allocationOrder[p++] = TR::RealRegister::xmm13; - _properties._allocationOrder[p++] = TR::RealRegister::xmm14; - _properties._allocationOrder[p++] = TR::RealRegister::xmm15; + properties->_allocationOrder[p++] = TR::RealRegister::xmm8; + properties->_allocationOrder[p++] = TR::RealRegister::xmm9; + properties->_allocationOrder[p++] = TR::RealRegister::xmm10; + properties->_allocationOrder[p++] = TR::RealRegister::xmm11; + properties->_allocationOrder[p++] = TR::RealRegister::xmm12; + properties->_allocationOrder[p++] = TR::RealRegister::xmm13; + properties->_allocationOrder[p++] = TR::RealRegister::xmm14; + properties->_allocationOrder[p++] = TR::RealRegister::xmm15; TR_ASSERT(p == (machine()->getNumGlobalGPRs() + machine()->_numGlobalFPRs), "assertion failure"); } - //////////////////////////////////////////////// // // Argument manipulation diff --git a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp index 10bdc3fe637..2f330d401cb 100644 --- a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp +++ b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp @@ -1,5 +1,5 @@ 
/******************************************************************************* - * Copyright (c) 2000, 2020 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this @@ -38,7 +38,7 @@ namespace X86 namespace AMD64 { - +void setProperties(TR::CodeGenerator *cg,TR::X86LinkageProperties *properties); class PrivateLinkage : public J9::X86::PrivateLinkage { public: diff --git a/runtime/include/j9cfg.h.in b/runtime/include/j9cfg.h.in index f2755a06b39..251eaa0e615 100644 --- a/runtime/include/j9cfg.h.in +++ b/runtime/include/j9cfg.h.in @@ -225,6 +225,7 @@ extern "C" { #cmakedefine J9VM_OPT_MEMORY_CHECK_SUPPORT #cmakedefine J9VM_OPT_METHOD_HANDLE #cmakedefine J9VM_OPT_METHOD_HANDLE_COMMON +#cmakedefine J9VM_OPT_MICROJIT #cmakedefine J9VM_OPT_MODULE #cmakedefine J9VM_OPT_MULTI_VM #cmakedefine J9VM_OPT_NATIVE_CHARACTER_CONVERTER diff --git a/runtime/makelib/uma.properties b/runtime/makelib/uma.properties index e013dcc74c9..93186af50d6 100644 --- a/runtime/makelib/uma.properties +++ b/runtime/makelib/uma.properties @@ -1,4 +1,4 @@ -# Copyright (c) 2009, 2020 IBM Corp. and others +# Copyright (c) 2009, 2022 IBM Corp. and others # # This program and the accompanying materials are made available under # the terms of the Eclipse Public License 2.0 which accompanies this @@ -33,4 +33,4 @@ JCL_HS60=jclhs60_ makefile_phase_prefix=.NOTPARALLEL # Define phases -phase_list=omr core jit mjit quick sjit util j2se jvmti_tests rom offload +phase_list=omr core jit quick sjit util j2se jvmti_tests rom offload diff --git a/runtime/oti/j9consts.h b/runtime/oti/j9consts.h index d58b8a56d20..7951bebdeb2 100644 --- a/runtime/oti/j9consts.h +++ b/runtime/oti/j9consts.h @@ -700,12 +700,14 @@ extern "C" { #define J9_BCLOOP_EXIT_INTERPRETER 0x16 #define J9_BCLOOP_FILL_OSR_BUFFER 0x17 +/* Extended Method Flags */ #define J9_RAS_METHOD_UNSEEN 0x0 #define J9_RAS_METHOD_SEEN 0x1 #define J9_RAS_METHOD_TRACING 0x2 #define J9_RAS_METHOD_TRACE_ARGS 0x4 #define J9_RAS_METHOD_TRIGGERING 0x8 #define J9_RAS_MASK 0xF +#define J9_MJIT_FAILED_COMPILE 0x10 #define J9_JNI_OFFLOAD_SWITCH_CREATE_JAVA_VM 0x1 #define J9_JNI_OFFLOAD_SWITCH_DEALLOCATE_VM_THREAD 0x2 From 2d204d56c7edffba39971f202aa7608790e575ea Mon Sep 17 00:00:00 2001 From: Scott Young Date: Thu, 17 Mar 2022 20:52:29 +0000 Subject: [PATCH 02/25] Added documentation for the MicroJIT feature Signed-off-by: Scott Young --- doc/compiler/MicroJIT/DSL.md | 60 +++++++++++++++++++++++ doc/compiler/MicroJIT/Platforms.md | 30 ++++++++++++ doc/compiler/MicroJIT/README.md | 74 +++++++++++++++++++++++++++++ doc/compiler/MicroJIT/SideTables.md | 65 +++++++++++++++++++++++++ doc/compiler/MicroJIT/Stages.md | 74 +++++++++++++++++++++++++++++ doc/compiler/MicroJIT/support.md | 72 ++++++++++++++++++++++++++++ doc/compiler/MicroJIT/vISA.md | 49 +++++++++++++++++++ 7 files changed, 424 insertions(+) create mode 100644 doc/compiler/MicroJIT/DSL.md create mode 100644 doc/compiler/MicroJIT/Platforms.md create mode 100644 doc/compiler/MicroJIT/README.md create mode 100644 doc/compiler/MicroJIT/SideTables.md create mode 100644 doc/compiler/MicroJIT/Stages.md create mode 100644 doc/compiler/MicroJIT/support.md create mode 100644 doc/compiler/MicroJIT/vISA.md diff --git a/doc/compiler/MicroJIT/DSL.md b/doc/compiler/MicroJIT/DSL.md new file mode 100644 index 00000000000..d9875453f49 --- /dev/null +++ b/doc/compiler/MicroJIT/DSL.md @@ -0,0 +1,60 @@ + + 
+# Overview + +MicroJIT's DSL exists to help new contributors from making common mistakes +when writing new, or editing existsing, templates. It helps developers keep +sizes of data types and number of slots separate from each other conceptually +while also keeping their own footprint small. + +# Current Macros + +The current existing DSL macros are: + pop_single_slot + add the size of 1 stack slots from the computation stack pointer + pop_dual_slot + add the size of 2 stack slots from the computation stack pointer + push_single_slot + subtract the size of 1 stack slots from the computation stack pointer + push_dual_slot + subtract the size of 2 stack slots from the computation stack pointer + _32bit_local_to_rXX_PATCH + Move a 32 bit value to a r11d-r15d register from a local array slot + _64bit_local_to_rXX_PATCH + Move a 64 bit value to a 64-bit register from a local array slot + _32bit_local_from_rXX_PATCH + Move a 32 bit value from a r11d-r15d register to a local array slot + _64bit_local_from_rXX_PATCH + Move a 64 bit value from a 64-bit register to a local array slot + _32bit_slot_stack_to_rXX + Move a 64 bit value to an r11-r15 register from the computation stack + _32bit_slot_stack_to_eXX + Move a 32 bit value to an eax-edx register from the computation stack + _64bit_slot_stack_to_rXX + Move a 64 bit value to a 64 bit register from the computation stack + _32bit_slot_stack_from_rXX + Move a 64 bit value from an r11d-r15d register to the computation stack + _32bit_slot_stack_from_eXX + Move a 32 bit value from an eax-edx register to the computation stack + _64bit_slot_stack_from_rXX + Move a 64 bit value from a 64 bit register to the computation stack \ No newline at end of file diff --git a/doc/compiler/MicroJIT/Platforms.md b/doc/compiler/MicroJIT/Platforms.md new file mode 100644 index 00000000000..10554a70949 --- /dev/null +++ b/doc/compiler/MicroJIT/Platforms.md @@ -0,0 +1,30 @@ + + +# List of currently supported platforms + +AMD64 (a.k.a. x86-64) + +# List of expected future platforms + +RISC-V +aarch64 \ No newline at end of file diff --git a/doc/compiler/MicroJIT/README.md b/doc/compiler/MicroJIT/README.md new file mode 100644 index 00000000000..4450f25bc20 --- /dev/null +++ b/doc/compiler/MicroJIT/README.md @@ -0,0 +1,74 @@ + + +# Overview + +MicroJIT is a lightweight template-based Just-In-Time (JIT) compiler that integrates +seamlessly with TR. MicroJIT is designed to be used either in-place-of TR +(where CPU resources are scarce) or to reduce the time between start-up and +TR's first compile (where CPU resources are not a constraint). + +MicroJIT's entry point is at the `wrapped compile` stage of TR's compile process. +When building with a `--enable-microjit` configuration, the VM checks whether +a method should be compiled with MicroJIT or TR before handing off the method to +code generation ([MicroJIT Compile Stages](Stages.md)) + +Being template-based, MicroJIT does not build an intermediate language (IL) the +way that TR does. It instead performs a tight loop over the bytecodes and generates +the patchable assembly for each instance of a JVM Bytecode. This means that porting +MicroJIT to a new platform requires rewriting much of the code generator in that +target platform. The current list of available platforms is available [here](Platforms.md). + +MicroJIT uses a stack based [vISA](vISA.md) for its internal model of how to execute Bytecodes. 
+This maps cleanly onto the JVM Bytecode, so MicroJIT uses the number of stack slots +for JVM data types that the JVM spec requires where ever the target architecture allows. + +To keep track of the location of parameters and locals, as well as for creating the +required GCStackAtlas, MicroJIT makes use of [side tables](SideTables.md). These +tables are arrays of structures which represent a row in their table. Using these +tables, and some upper bounding, MicroJIT can allocate the required number of rows +in these tables on the stack rather than the heap, helping MicroJIT compile faster. + +The templates MicroJIT uses for its code generation are written in a mix of the +target architecture's assembly (for the NASM assembler) and assembler macros used to +abstract some of the vISA operations away into a domain specific language ([DSL](DSL.md)). + +To try MicroJIT, first build your JVM with a configuration using the `--enable-microjit` +option (see the openj9-openjdk-jdk8 project). Then, once built, use the +`-Xjit:mjitEnabled=1,mjitCount=20` options. You can use other values for `mjitCount` than +20, but peer-reviewed researh shows that this value is a good estimation of the best +count for similar template-based JITs on JVM benchmarks. + +MicroJIT is an experimental feature. It currently does not compile all methods, and will +fail to compile methods with unsupported bytecodes or calling conventions. For details +on supported bytecodes and calling conventions, see our [supported compilations](support.md) +documentation. + +# Topics + +[MicroJIT Compile Stages](Stages.md) +[Supported Platforms](Platforms.md) +[vISA](vISA.md) +[Supporting Side Tables](SideTables.md) +[Domain Specific Language](DSL.md) +[Supported Compilations](support.md) +
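To make the template-based approach above concrete, here is a minimal C++ sketch of the kind of tight loop the README describes: copy a pre-assembled template for each bytecode into the code buffer and patch its operands in place. The `Template` struct, `getTemplate` lookup, and `generateBody` function are illustrative stand-ins only, not the real `MJIT::CodeGenerator` interface.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Illustrative only: a pre-assembled machine-code template for one JVM bytecode.
struct Template
   {
   const uint8_t *bytes;        // template machine code
   size_t         size;         // template length in bytes
   int            operandBytes; // bytecode operand bytes consumed (0, 1 or 2)
   size_t         patchOffset;  // where the operand is patched into the copy
   };

// Assumed lookup from opcode to template; returns NULL for unsupported bytecodes.
extern const Template *getTemplate(uint8_t opcode);

// Returns bytes emitted, or -1 when an unsupported bytecode forces a failure.
ptrdiff_t
generateBody(const uint8_t *bytecodes, size_t length, uint8_t *buffer)
   {
   uint8_t *cursor = buffer;
   size_t pc = 0;
   while (pc < length)
      {
      const Template *t = getTemplate(bytecodes[pc]);
      if (!t)
         return -1; // bubble the failure up; TR compiles the method later
      memcpy(cursor, t->bytes, t->size);
      if (t->operandBytes == 1)
         {
         uint8_t operand = bytecodes[pc + 1];
         memcpy(cursor + t->patchOffset, &operand, sizeof(operand));
         }
      else if (t->operandBytes == 2)
         {
         // JVM operands are big-endian in the bytecode stream
         uint16_t operand = (uint16_t)((bytecodes[pc + 1] << 8) | bytecodes[pc + 2]);
         memcpy(cursor + t->patchOffset, &operand, sizeof(operand));
         }
      cursor += t->size;
      pc += 1 + (size_t)t->operandBytes;
      }
   return cursor - buffer;
   }
```

The important property is that no IL is built: an unsupported bytecode simply aborts the loop, and the method is left to the regular TR compile path.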
diff --git a/doc/compiler/MicroJIT/SideTables.md b/doc/compiler/MicroJIT/SideTables.md new file mode 100644 index 00000000000..fa57c630205 --- /dev/null +++ b/doc/compiler/MicroJIT/SideTables.md @@ -0,0 +1,65 @@ + + +# Overview + +This document contains descriptions of all the side tables used by MicroJIT to +generate code while limiting heap allocations. + +# Register Stack + +The register stack contains the number of stack slots contained +in each register used by TR's Linkage. This allows for MicroJIT +to quickly calculate stack offsets for interpreter arguments +on the fly. + +RegisterStack: + useRAX + useRSI + useRDX + useRCX + useXMM0 + useXMM1 + useXMM2 + useXMM3 + useXMM4 + useXMM5 + useXMM6 + useXMM7 + stackSlotsUsed + +# ParamTableEntry and LocalTableEntry + +This entry is used for both tables, and contains information used +to copy parameters from their linkage locations to their local +array slots, map their locations into GCMaps, and get the size of +their data without storing their data types. + +ParamTableEntry: + offset //Offset into stack for loading from stack + regNo //Register containing parameter when called by interpreter + gcMapOffset //Offset used for lookup by GCMap + slots //Number of slots used by this variable when storing in local array + size //Number of bytes used by this variable when stored on the stack + isReference //Is this variable an object reference + onStack //Is this variable currently on the stack + notInitialized //Is this entry uninitialized by MicroJIT diff --git a/doc/compiler/MicroJIT/Stages.md b/doc/compiler/MicroJIT/Stages.md new file mode 100644 index 00000000000..4ada44a690a --- /dev/null +++ b/doc/compiler/MicroJIT/Stages.md @@ -0,0 +1,74 @@ + + +# Overview + +This document contains information on the compilation stages of MicroJIT +from where to diverges from TR. + +# MetaData + +At the entry point to MicroJIT (the `mjit` method of `TR::CompilationInfoPerThreadBase`), +MicroJIT enters the first stage of compilation, building MetaData. It parses the +method signature to learn the types of its parameters, and bytecodes to learn the +number and type of its locals. It then estimates an upperbound for how much space +the compiled method will need and requests it from the allocator. + +During this stage, it also learns the amount of space to save on the stack for calling +child methods, and will eventually also create a control flow graph here to perform +profiling code generation later. The code generator object is constructed and the next +steps begin. + +# Pre-prologue + +The pre-prologue is generated first. This contains many useful code snippits, jump points, +and small bits of meta-data used throughout the JVM that are relative to an individual +compilation of a method. + +# Prologue + +The prologue is then generated. Copying paramters into the appropreate places and updating +tables as needed. Once these tables are created, the structures used by the JVM are created, +namely the `GCStackAtlas` and its initial GC Maps. At this point, the required size of the +stack is known, the required size of the local array is known, and the code generator can +generate the code for allocating the stack space needed, moving the parameters into their +local array slots, and setting the Computation Stack and Local Array pointers (See [vISA](vISA.md)). + +# Body + +The code generator then itterates in a tight loop over the bytecode intruction stream, +generating the required template for each bytecode, one at a time, and patching them as +needed. 
In this phase, if a bytecode that is unsupported (See [supported](support.md) +is reached, the code generator bubbles the error up to the top level and compilation +is marked as a failure. The method is set to not be attempted with MicroJIT again, and +its compilation threshold is set to the default TR threshold. + +# Cold Area + +The snippets required by MicroJIT are generated last. These snippets can serve a +variety of purposes, but are usually cold code not expected to be executed on every +execution of the method. + +# Clean up + +The remaining meta-data structures are created, MicroJIT updates the internal counts +for compilations, and the unused space allocated for compilation in the start is reclaimed. \ No newline at end of file diff --git a/doc/compiler/MicroJIT/support.md b/doc/compiler/MicroJIT/support.md new file mode 100644 index 00000000000..deac0bf89c4 --- /dev/null +++ b/doc/compiler/MicroJIT/support.md @@ -0,0 +1,72 @@ + + +MicroJIT does not currently support the compilation of: + System methods + Methods which call System methods + Methods which call JNI methods + Methods which use any of the following bytecodes: + 01 (0x01) aconst_null + 18 (0x12) ldc + 19 (0x13) ldc_w + 20 (0x14) ldc2_w + 46 (0x2e) iaload + 47 (0x2f) laload + 48 (0x30) faload + 49 (0x31) daload + 50 (0x32) aaload + 51 (0x33) baload + 52 (0x34) caload + 53 (0x35) saload + 79 (0x4f) iastore + 80 (0x50) lastore + 81 (0x51) fastore + 82 (0x52) dastore + 83 (0x53) aastore + 84 (0x54) bastore + 85 (0x55) castore + 86 (0x56) sastore + 168 (0xa8) jsr + 169 (0xa9) ret + 170 (0xaa) tableswitch + 171 (0xab) lookupswitch + 182 (0xb6) invokevirtual HK + 183 (0xb7) invokespecial + 185 (0xb9) invokeinterface + 186 (0xba) invokedynamic + 187 (0xbb) new + 188 (0xbc) newarray + 189 (0xbd) anewarray + 190 (0xbe) arraylength + 191 (0xbf) athrow + 192 (0xc0) checkcast + 193 (0xc1) instanceof + 194 (0xc2) monitorenter + 195 (0xc3) monitorexit + 196 (0xc4) wide + 197 (0xc5) multianewarray + 200 (0xc8) goto_w + 201 (0xc9) jsr_w + 202 (0xca) breakpoint + 254 (0xfe) impdep1 + 255 (0xff) impdep2 + MicroJIT is also currently known to have issue with methods that are far down the call stack which contain live object references \ No newline at end of file diff --git a/doc/compiler/MicroJIT/vISA.md b/doc/compiler/MicroJIT/vISA.md new file mode 100644 index 00000000000..21d0a3e0c62 --- /dev/null +++ b/doc/compiler/MicroJIT/vISA.md @@ -0,0 +1,49 @@ + + +# Overview + +MicroJIT uses a stack-based vISA. This means values are pushed onto +the computation stack, popped off the computation stack, saved and +loaded from a local array, and that operands for a given instruction +are either in the instruction stream, or on the computation stack. + +MicroJIT is implemented on register based architectures, and makes use +of those registers as pointers onto the stack and temporary holding +cells for values either being used by an operation, or being saved +and loaded to and from the local array. Below are the mappings for +the supported architectures. + +# AMD64 + +rsp: The stack extent of the computation stack pointer. +r10: The computation stack pointer. 
+r11: Temporary register for storing: + The accumulator + A pointer to an object +r12: Temporary register for storing: + Any value which will act on the accumulator + The value to be written to an object field + The value read from an object field +r13: Holds addresses for absolute addressing +r14: The pointer to the start of the local array +r15: Values loaded from memory for storing on the stack or in the local array \ No newline at end of file From ed205cbc3b541249c0e2ca2280c38f713ccee5a8 Mon Sep 17 00:00:00 2001 From: Scott Young Date: Thu, 17 Mar 2022 20:59:57 +0000 Subject: [PATCH 03/25] Fixed unordered lists in documentation for the MicroJIT feature Signed-off-by: Scott Young --- doc/compiler/MicroJIT/DSL.md | 56 ++++++++--------- doc/compiler/MicroJIT/Platforms.md | 6 +- doc/compiler/MicroJIT/README.md | 12 ++-- doc/compiler/MicroJIT/SideTables.md | 46 +++++++------- doc/compiler/MicroJIT/support.md | 98 ++++++++++++++--------------- doc/compiler/MicroJIT/vISA.md | 24 +++---- 6 files changed, 121 insertions(+), 121 deletions(-) diff --git a/doc/compiler/MicroJIT/DSL.md b/doc/compiler/MicroJIT/DSL.md index d9875453f49..6316debc6a3 100644 --- a/doc/compiler/MicroJIT/DSL.md +++ b/doc/compiler/MicroJIT/DSL.md @@ -30,31 +30,31 @@ while also keeping their own footprint small. # Current Macros The current existing DSL macros are: - pop_single_slot - add the size of 1 stack slots from the computation stack pointer - pop_dual_slot - add the size of 2 stack slots from the computation stack pointer - push_single_slot - subtract the size of 1 stack slots from the computation stack pointer - push_dual_slot - subtract the size of 2 stack slots from the computation stack pointer - _32bit_local_to_rXX_PATCH - Move a 32 bit value to a r11d-r15d register from a local array slot - _64bit_local_to_rXX_PATCH - Move a 64 bit value to a 64-bit register from a local array slot - _32bit_local_from_rXX_PATCH - Move a 32 bit value from a r11d-r15d register to a local array slot - _64bit_local_from_rXX_PATCH - Move a 64 bit value from a 64-bit register to a local array slot - _32bit_slot_stack_to_rXX - Move a 64 bit value to an r11-r15 register from the computation stack - _32bit_slot_stack_to_eXX - Move a 32 bit value to an eax-edx register from the computation stack - _64bit_slot_stack_to_rXX - Move a 64 bit value to a 64 bit register from the computation stack - _32bit_slot_stack_from_rXX - Move a 64 bit value from an r11d-r15d register to the computation stack - _32bit_slot_stack_from_eXX - Move a 32 bit value from an eax-edx register to the computation stack - _64bit_slot_stack_from_rXX - Move a 64 bit value from a 64 bit register to the computation stack \ No newline at end of file +- pop_single_slot +- - add the size of 1 stack slots from the computation stack pointer +- pop_dual_slot +- - add the size of 2 stack slots from the computation stack pointer +- push_single_slot +- - subtract the size of 1 stack slots from the computation stack pointer +- push_dual_slot +- - subtract the size of 2 stack slots from the computation stack pointer +- _32bit_local_to_rXX_PATCH +- - Move a 32 bit value to a r11d-r15d register from a local array slot +- _64bit_local_to_rXX_PATCH +- - Move a 64 bit value to a 64-bit register from a local array slot +- _32bit_local_from_rXX_PATCH +- - Move a 32 bit value from a r11d-r15d register to a local array slot +- _64bit_local_from_rXX_PATCH +- - Move a 64 bit value from a 64-bit register to a local array slot +- _32bit_slot_stack_to_rXX +- - Move a 64 bit value to an r11-r15 
register from the computation stack +- _32bit_slot_stack_to_eXX +- - Move a 32 bit value to an eax-edx register from the computation stack +- _64bit_slot_stack_to_rXX +- - Move a 64 bit value to a 64 bit register from the computation stack +- _32bit_slot_stack_from_rXX +- - Move a 64 bit value from an r11d-r15d register to the computation stack +- _32bit_slot_stack_from_eXX +- - Move a 32 bit value from an eax-edx register to the computation stack +- _64bit_slot_stack_from_rXX +- - Move a 64 bit value from a 64 bit register to the computation stack \ No newline at end of file diff --git a/doc/compiler/MicroJIT/Platforms.md b/doc/compiler/MicroJIT/Platforms.md index 10554a70949..23683388347 100644 --- a/doc/compiler/MicroJIT/Platforms.md +++ b/doc/compiler/MicroJIT/Platforms.md @@ -22,9 +22,9 @@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti # List of currently supported platforms -AMD64 (a.k.a. x86-64) +- AMD64 (a.k.a. x86-64) # List of expected future platforms -RISC-V -aarch64 \ No newline at end of file +- RISC-V +- aarch64 \ No newline at end of file diff --git a/doc/compiler/MicroJIT/README.md b/doc/compiler/MicroJIT/README.md index 4450f25bc20..f9d5ba636e3 100644 --- a/doc/compiler/MicroJIT/README.md +++ b/doc/compiler/MicroJIT/README.md @@ -65,10 +65,10 @@ documentation. # Topics -[MicroJIT Compile Stages](Stages.md) -[Supported Platforms](Platforms.md) -[vISA](vISA.md) -[Supporting Side Tables](SideTables.md) -[Domain Specific Language](DSL.md) -[Supported Compilations](support.md) +- [MicroJIT Compile Stages](Stages.md) +- [Supported Platforms](Platforms.md) +- [vISA](vISA.md) +- [Supporting Side Tables](SideTables.md) +- [Domain Specific Language](DSL.md) +- [Supported Compilations](support.md)
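The README above notes that MicroJIT keeps its side tables on the stack by allocating an upper-bounded number of rows, and SideTables.md lists the fields of a ParamTableEntry row. A rough C++ sketch of that idea follows; the field layout, types, and the compileWithStackSideTables function are simplified stand-ins, not the actual definitions under runtime/compiler/microjit.

```cpp
#include <stdint.h>

// Simplified stand-in for one ParamTableEntry row (see SideTables.md).
struct ParamTableEntry
   {
   int32_t offset;         // offset into the stack for loading from the stack
   int32_t regNo;          // register holding the parameter when called by the interpreter
   int32_t gcMapOffset;    // offset used for lookup by the GC map
   uint8_t slots;          // slots used when storing in the local array
   uint8_t size;           // bytes used when stored on the stack
   bool    isReference;    // is this an object reference
   bool    onStack;        // is this value currently on the stack
   bool    notInitialized; // true while MicroJIT has not yet filled this row in
   };

// The JVM limits a method to 255 parameter slots, so the whole table can be
// placed on the compilation thread's stack instead of the heap.
void
compileWithStackSideTables(uint32_t paramSlotCount)
   {
   enum { kMaxParamSlots = 255 };
   ParamTableEntry paramTable[kMaxParamSlots] = {};
   // ... walk the method signature and fill paramTable[0 .. paramSlotCount-1] ...
   (void)paramSlotCount;
   (void)paramTable;
   }
```

Bounding the row count up front is what lets the compile avoid heap allocation for these tables, which is part of how MicroJIT keeps compile times low.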
diff --git a/doc/compiler/MicroJIT/SideTables.md b/doc/compiler/MicroJIT/SideTables.md index fa57c630205..c52e5c76e47 100644 --- a/doc/compiler/MicroJIT/SideTables.md +++ b/doc/compiler/MicroJIT/SideTables.md @@ -32,20 +32,20 @@ in each register used by TR's Linkage. This allows for MicroJIT to quickly calculate stack offsets for interpreter arguments on the fly. -RegisterStack: - useRAX - useRSI - useRDX - useRCX - useXMM0 - useXMM1 - useXMM2 - useXMM3 - useXMM4 - useXMM5 - useXMM6 - useXMM7 - stackSlotsUsed +- RegisterStack: +- - useRAX +- - useRSI +- - useRDX +- - useRCX +- - useXMM0 +- - useXMM1 +- - useXMM2 +- - useXMM3 +- - useXMM4 +- - useXMM5 +- - useXMM6 +- - useXMM7 +- - stackSlotsUsed # ParamTableEntry and LocalTableEntry @@ -54,12 +54,12 @@ to copy parameters from their linkage locations to their local array slots, map their locations into GCMaps, and get the size of their data without storing their data types. -ParamTableEntry: - offset //Offset into stack for loading from stack - regNo //Register containing parameter when called by interpreter - gcMapOffset //Offset used for lookup by GCMap - slots //Number of slots used by this variable when storing in local array - size //Number of bytes used by this variable when stored on the stack - isReference //Is this variable an object reference - onStack //Is this variable currently on the stack - notInitialized //Is this entry uninitialized by MicroJIT +- ParamTableEntry: +- - offset //Offset into stack for loading from stack +- - regNo //Register containing parameter when called by interpreter +- - gcMapOffset //Offset used for lookup by GCMap +- - slots //Number of slots used by this variable when storing in local array +- - size //Number of bytes used by this variable when stored on the stack +- - isReference //Is this variable an object reference +- - onStack //Is this variable currently on the stack +- - notInitialized //Is this entry uninitialized by MicroJIT diff --git a/doc/compiler/MicroJIT/support.md b/doc/compiler/MicroJIT/support.md index deac0bf89c4..f08e9df00de 100644 --- a/doc/compiler/MicroJIT/support.md +++ b/doc/compiler/MicroJIT/support.md @@ -21,52 +21,52 @@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti --> MicroJIT does not currently support the compilation of: - System methods - Methods which call System methods - Methods which call JNI methods - Methods which use any of the following bytecodes: - 01 (0x01) aconst_null - 18 (0x12) ldc - 19 (0x13) ldc_w - 20 (0x14) ldc2_w - 46 (0x2e) iaload - 47 (0x2f) laload - 48 (0x30) faload - 49 (0x31) daload - 50 (0x32) aaload - 51 (0x33) baload - 52 (0x34) caload - 53 (0x35) saload - 79 (0x4f) iastore - 80 (0x50) lastore - 81 (0x51) fastore - 82 (0x52) dastore - 83 (0x53) aastore - 84 (0x54) bastore - 85 (0x55) castore - 86 (0x56) sastore - 168 (0xa8) jsr - 169 (0xa9) ret - 170 (0xaa) tableswitch - 171 (0xab) lookupswitch - 182 (0xb6) invokevirtual HK - 183 (0xb7) invokespecial - 185 (0xb9) invokeinterface - 186 (0xba) invokedynamic - 187 (0xbb) new - 188 (0xbc) newarray - 189 (0xbd) anewarray - 190 (0xbe) arraylength - 191 (0xbf) athrow - 192 (0xc0) checkcast - 193 (0xc1) instanceof - 194 (0xc2) monitorenter - 195 (0xc3) monitorexit - 196 (0xc4) wide - 197 (0xc5) multianewarray - 200 (0xc8) goto_w - 201 (0xc9) jsr_w - 202 (0xca) breakpoint - 254 (0xfe) impdep1 - 255 (0xff) impdep2 - MicroJIT is also currently known to have issue with methods that are far down the call stack which contain live object references \ No newline at end of 
file +- System methods +- Methods which call System methods +- Methods which call JNI methods +- Methods which use any of the following bytecodes: +- - 01 (0x01) aconst_null +- - 18 (0x12) ldc +- - 19 (0x13) ldc_w +- - 20 (0x14) ldc2_w +- - 46 (0x2e) iaload +- - 47 (0x2f) laload +- - 48 (0x30) faload +- - 49 (0x31) daload +- - 50 (0x32) aaload +- - 51 (0x33) baload +- - 52 (0x34) caload +- - 53 (0x35) saload +- - 79 (0x4f) iastore +- - 80 (0x50) lastore +- - 81 (0x51) fastore +- - 82 (0x52) dastore +- - 83 (0x53) aastore +- - 84 (0x54) bastore +- - 85 (0x55) castore +- - 86 (0x56) sastore +- - 168 (0xa8) jsr +- - 169 (0xa9) ret +- - 170 (0xaa) tableswitch +- - 171 (0xab) lookupswitch +- - 182 (0xb6) invokevirtual HK +- - 183 (0xb7) invokespecial +- - 185 (0xb9) invokeinterface +- - 186 (0xba) invokedynamic +- - 187 (0xbb) new +- - 188 (0xbc) newarray +- - 189 (0xbd) anewarray +- - 190 (0xbe) arraylength +- - 191 (0xbf) athrow +- - 192 (0xc0) checkcast +- - 193 (0xc1) instanceof +- - 194 (0xc2) monitorenter +- - 195 (0xc3) monitorexit +- - 196 (0xc4) wide +- - 197 (0xc5) multianewarray +- - 200 (0xc8) goto_w +- - 201 (0xc9) jsr_w +- - 202 (0xca) breakpoint +- - 254 (0xfe) impdep1 +- - 255 (0xff) impdep2 +- MicroJIT is also currently known to have issue with methods that are far down the call stack which contain live object references \ No newline at end of file diff --git a/doc/compiler/MicroJIT/vISA.md b/doc/compiler/MicroJIT/vISA.md index 21d0a3e0c62..38ad8ee7bd3 100644 --- a/doc/compiler/MicroJIT/vISA.md +++ b/doc/compiler/MicroJIT/vISA.md @@ -35,15 +35,15 @@ the supported architectures. # AMD64 -rsp: The stack extent of the computation stack pointer. -r10: The computation stack pointer. -r11: Temporary register for storing: - The accumulator - A pointer to an object -r12: Temporary register for storing: - Any value which will act on the accumulator - The value to be written to an object field - The value read from an object field -r13: Holds addresses for absolute addressing -r14: The pointer to the start of the local array -r15: Values loaded from memory for storing on the stack or in the local array \ No newline at end of file +- rsp: The stack extent of the computation stack pointer. +- r10: The computation stack pointer. +- r11: Temporary register for storing: +- - The accumulator +- - A pointer to an object +- r12: Temporary register for storing: +- - Any value which will act on the accumulator +- - The value to be written to an object field +- - The value read from an object field +- r13: Holds addresses for absolute addressing +- r14: The pointer to the start of the local array +- r15: Values loaded from memory for storing on the stack or in the local array \ No newline at end of file From 278b0692daeed16bed0f34a281381c177f392e0d Mon Sep 17 00:00:00 2001 From: Scott Young Date: Thu, 17 Mar 2022 21:08:03 +0000 Subject: [PATCH 04/25] Fixed tables in documentation for the MicroJIT feature Signed-off-by: Scott Young --- doc/compiler/MicroJIT/SideTables.md | 48 +++++++++++++++-------------- 1 file changed, 25 insertions(+), 23 deletions(-) diff --git a/doc/compiler/MicroJIT/SideTables.md b/doc/compiler/MicroJIT/SideTables.md index c52e5c76e47..ba8d4f2ab8d 100644 --- a/doc/compiler/MicroJIT/SideTables.md +++ b/doc/compiler/MicroJIT/SideTables.md @@ -32,20 +32,20 @@ in each register used by TR's Linkage. This allows for MicroJIT to quickly calculate stack offsets for interpreter arguments on the fly. 
-- RegisterStack: -- - useRAX -- - useRSI -- - useRDX -- - useRCX -- - useXMM0 -- - useXMM1 -- - useXMM2 -- - useXMM3 -- - useXMM4 -- - useXMM5 -- - useXMM6 -- - useXMM7 -- - stackSlotsUsed +RegisterStack: +- useRAX +- useRSI +- useRDX +- useRCX +- useXMM0 +- useXMM1 +- useXMM2 +- useXMM3 +- useXMM4 +- useXMM5 +- useXMM6 +- useXMM7 +- stackSlotsUsed # ParamTableEntry and LocalTableEntry @@ -54,12 +54,14 @@ to copy parameters from their linkage locations to their local array slots, map their locations into GCMaps, and get the size of their data without storing their data types. -- ParamTableEntry: -- - offset //Offset into stack for loading from stack -- - regNo //Register containing parameter when called by interpreter -- - gcMapOffset //Offset used for lookup by GCMap -- - slots //Number of slots used by this variable when storing in local array -- - size //Number of bytes used by this variable when stored on the stack -- - isReference //Is this variable an object reference -- - onStack //Is this variable currently on the stack -- - notInitialized //Is this entry uninitialized by MicroJIT +ParamTableEntry: +| Field | Description | +|:--------------:|:-----------------------------------------------------------------:| +| offset | Offset into stack for loading from stack | +| regNo | Register containing parameter when called by interpreter | +| gcMapOffset | Offset used for lookup by GCMap | +| slots | Number of slots used by this variable when storing in local array | +| size | Number of bytes used by this variable when stored on the stack | +| isReference | Is this variable an object reference | +| onStack | Is this variable currently on the stack | +| notInitialized | Is this entry uninitialized by MicroJIT | From e3f886c461ddde3392a3be5816e7ba2f5523debf Mon Sep 17 00:00:00 2001 From: Scott Young Date: Thu, 17 Mar 2022 21:15:13 +0000 Subject: [PATCH 05/25] Changed vISA documentation to a table Signed-off-by: Scott Young --- doc/compiler/MicroJIT/vISA.md | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/doc/compiler/MicroJIT/vISA.md b/doc/compiler/MicroJIT/vISA.md index 38ad8ee7bd3..37f692ec730 100644 --- a/doc/compiler/MicroJIT/vISA.md +++ b/doc/compiler/MicroJIT/vISA.md @@ -34,16 +34,12 @@ and loaded to and from the local array. Below are the mappings for the supported architectures. # AMD64 - -- rsp: The stack extent of the computation stack pointer. -- r10: The computation stack pointer. 
-- r11: Temporary register for storing: -- - The accumulator -- - A pointer to an object -- r12: Temporary register for storing: -- - Any value which will act on the accumulator -- - The value to be written to an object field -- - The value read from an object field -- r13: Holds addresses for absolute addressing -- r14: The pointer to the start of the local array -- r15: Values loaded from memory for storing on the stack or in the local array \ No newline at end of file +| Register | Description | +|:--------:|:------------------------------------------------------------------------------------------------:| +| rsp | The stack extent of the computation stack pointer | +| r10 | The computation stack pointer | +| r11 | Storing the accumulator or pointer to an object | +| r12 | Storing values that act on the accumulator, are to be written to, or have been read from a field | +| r13 | Holds addresses for absolute addressing | +| r14 | The pointer to the start of the local array | +| r15 | Values loaded from memory for storing on the stack or in the local array | \ No newline at end of file From 932eb492fd40287cd3d3d9df15a090920cd1380c Mon Sep 17 00:00:00 2001 From: Scott Young Date: Tue, 22 Mar 2022 17:45:03 +0000 Subject: [PATCH 06/25] Moved MJIT exception table into MJIT namespace Signed-off-by: Scott Young --- runtime/compiler/microjit/ExceptionTable.cpp | 10 +++++----- runtime/compiler/microjit/ExceptionTable.hpp | 8 ++++++-- runtime/compiler/runtime/MetaData.cpp | 6 +++--- 3 files changed, 14 insertions(+), 10 deletions(-) diff --git a/runtime/compiler/microjit/ExceptionTable.cpp b/runtime/compiler/microjit/ExceptionTable.cpp index 2bea77223e8..1a67f9faf0f 100644 --- a/runtime/compiler/microjit/ExceptionTable.cpp +++ b/runtime/compiler/microjit/ExceptionTable.cpp @@ -36,7 +36,7 @@ class TR_ResolvedMethod; //TODO: This Should create an Exception table using only information MicroJIT has. -MJIT_ExceptionTableEntryIterator::MJIT_ExceptionTableEntryIterator(TR::Compilation *comp) +MJIT::ExceptionTableEntryIterator::ExceptionTableEntryIterator(TR::Compilation *comp) : _compilation(comp) { int32_t i = 1; @@ -54,7 +54,7 @@ MJIT_ExceptionTableEntryIterator::MJIT_ExceptionTableEntryIterator(TR::Compilati //TODO: This Should create an Exception table entry using only information MicroJIT has. 
void -MJIT_ExceptionTableEntryIterator::addSnippetRanges( +MJIT::ExceptionTableEntryIterator::addSnippetRanges( List & tableEntries, TR::Block * snippetBlock, TR::Block * catchBlock, uint32_t catchType, TR_ResolvedMethod * method, TR::Compilation *comp) { @@ -62,19 +62,19 @@ MJIT_ExceptionTableEntryIterator::addSnippetRanges( } TR_ExceptionTableEntry * -MJIT_ExceptionTableEntryIterator::getFirst() +MJIT::ExceptionTableEntryIterator::getFirst() { return NULL; } TR_ExceptionTableEntry * -MJIT_ExceptionTableEntryIterator::getNext() +MJIT::ExceptionTableEntryIterator::getNext() { return NULL; } uint32_t -MJIT_ExceptionTableEntryIterator::size() +MJIT::ExceptionTableEntryIterator::size() { return 0; } diff --git a/runtime/compiler/microjit/ExceptionTable.hpp b/runtime/compiler/microjit/ExceptionTable.hpp index 3ec44cd7f53..f42501e4f84 100644 --- a/runtime/compiler/microjit/ExceptionTable.hpp +++ b/runtime/compiler/microjit/ExceptionTable.hpp @@ -35,9 +35,11 @@ namespace TR { class Block; } namespace TR { class Compilation; } template class TR_Array; -struct MJIT_ExceptionTableEntryIterator +namespace MJIT { + +struct ExceptionTableEntryIterator { - MJIT_ExceptionTableEntryIterator(TR::Compilation *comp); + ExceptionTableEntryIterator(TR::Compilation *comp); TR_ExceptionTableEntry * getFirst(); TR_ExceptionTableEntry * getNext(); @@ -53,4 +55,6 @@ struct MJIT_ExceptionTableEntryIterator uint32_t _handlerIndex; }; +} + #endif /* MJIT_EXCEPTIONTABLE_INCL */ diff --git a/runtime/compiler/runtime/MetaData.cpp b/runtime/compiler/runtime/MetaData.cpp index a6c4cbc848a..f37e5e72283 100644 --- a/runtime/compiler/runtime/MetaData.cpp +++ b/runtime/compiler/runtime/MetaData.cpp @@ -234,7 +234,7 @@ createExceptionTable( static void createMJITExceptionTable( TR_MethodMetaData * data, - MJIT_ExceptionTableEntryIterator & exceptionIterator, + MJIT::ExceptionTableEntryIterator & exceptionIterator, bool fourByteOffsets, TR::Compilation *comp) { @@ -1139,7 +1139,7 @@ static int32_t calculateExceptionsSize( #if defined(J9VM_OPT_MICROJIT) static int32_t calculateMJITExceptionsSize( TR::Compilation* comp, - MJIT_ExceptionTableEntryIterator& exceptionIterator, + MJIT::ExceptionTableEntryIterator& exceptionIterator, bool& fourByteExceptionTableEntries, uint32_t& numberOfExceptionRangesWithBits) { @@ -1857,7 +1857,7 @@ createMJITMethodMetaData( // GC maps, Exceptions, inlining calls, stack atlas, // GPU optimizations and internal pointer table TR_J9VMBase * vm = &vmArg; - MJIT_ExceptionTableEntryIterator exceptionIterator(comp); + MJIT::ExceptionTableEntryIterator exceptionIterator(comp); TR::ResolvedMethodSymbol * methodSymbol = comp->getJittedMethodSymbol(); TR::CodeGenerator * cg = comp->cg(); TR::GCStackAtlas * trStackAtlas = cg->getStackAtlas(); From 0d0872ce2a538611b67aaaf34317f77539a4461e Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Fri, 25 Mar 2022 16:10:29 -0300 Subject: [PATCH 07/25] MicroJIT: Fixing indentation and line endings Signed-off-by: Harpreet Kaur --- buildspecs/j9.flags | 2 +- doc/compiler/MicroJIT/DSL.md | 32 +- doc/compiler/MicroJIT/Platforms.md | 2 +- doc/compiler/MicroJIT/README.md | 12 +- doc/compiler/MicroJIT/SideTables.md | 2 +- doc/compiler/MicroJIT/Stages.md | 16 +- .../MicroJIT/{support.md => Support.md} | 88 +- doc/compiler/MicroJIT/vISA.md | 4 +- .../compiler/control/CompilationRuntime.hpp | 8 +- .../compiler/control/CompilationThread.cpp | 190 +- runtime/compiler/control/HookedByTheJit.cpp | 8 +- runtime/compiler/control/J9Options.cpp | 4 +- runtime/compiler/infra/J9Cfg.cpp | 
158 +- .../compiler/microjit/ByteCodeIterator.hpp | 61 +- runtime/compiler/microjit/CMakeLists.txt | 2 +- runtime/compiler/microjit/ExceptionTable.cpp | 27 +- runtime/compiler/microjit/ExceptionTable.hpp | 21 +- .../compiler/microjit/HWMetaDataCreation.cpp | 1173 ++-- .../compiler/microjit/LWMetaDataCreation.cpp | 4 +- .../microjit/MethodBytecodeMetaData.hpp | 65 +- runtime/compiler/microjit/SideTables.hpp | 573 +- runtime/compiler/microjit/utils.hpp | 21 +- runtime/compiler/microjit/x/CMakeLists.txt | 2 +- .../microjit/x/amd64/AMD64Codegen.cpp | 4955 +++++++++-------- .../microjit/x/amd64/AMD64Codegen.hpp | 752 ++- .../microjit/x/amd64/AMD64CodegenGC.cpp | 61 +- .../microjit/x/amd64/AMD64CodegenGC.hpp | 22 +- .../microjit/x/amd64/AMD64Linkage.cpp | 5 +- .../microjit/x/amd64/AMD64Linkage.hpp | 14 +- .../compiler/microjit/x/amd64/CMakeLists.txt | 2 +- .../microjit/x/amd64/templates/CMakeLists.txt | 2 +- .../microjit/x/amd64/templates/bytecodes.nasm | 1378 ++--- .../microjit/x/amd64/templates/linkage.nasm | 242 +- .../microjit/x/amd64/templates/microjit.nasm | 2 +- runtime/compiler/optimizer/J9Optimizer.cpp | 2 +- .../compiler/optimizer/JProfilingBlock.cpp | 12 +- runtime/compiler/runtime/MetaData.cpp | 83 +- .../x/amd64/codegen/AMD64PrivateLinkage.cpp | 4 +- .../x/amd64/codegen/AMD64PrivateLinkage.hpp | 1 + 39 files changed, 5080 insertions(+), 4932 deletions(-) rename doc/compiler/MicroJIT/{support.md => Support.md} (56%) diff --git a/buildspecs/j9.flags b/buildspecs/j9.flags index 128b0ccb639..b5644ef97c8 100644 --- a/buildspecs/j9.flags +++ b/buildspecs/j9.flags @@ -1785,7 +1785,7 @@ Only available on zOS Disables common dependencies between OpenJ9 and OpenJDK MethodHandles.
- MicroJIT support is enabled in the buildspec. + Enables MicroJIT support. diff --git a/doc/compiler/MicroJIT/DSL.md b/doc/compiler/MicroJIT/DSL.md index 6316debc6a3..8a834d93408 100644 --- a/doc/compiler/MicroJIT/DSL.md +++ b/doc/compiler/MicroJIT/DSL.md @@ -22,8 +22,8 @@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti # Overview -MicroJIT's DSL exists to help new contributors from making common mistakes -when writing new, or editing existsing, templates. It helps developers keep +MicroJIT's DSL helps new contributors to avoid making common mistakes +when writing new, or editing existing, templates. It helps developers keep sizes of data types and number of slots separate from each other conceptually while also keeping their own footprint small. @@ -31,30 +31,30 @@ while also keeping their own footprint small. The current existing DSL macros are: - pop_single_slot -- - add the size of 1 stack slots from the computation stack pointer + - Add the size of 1 stack slot to the computation stack pointer - pop_dual_slot -- - add the size of 2 stack slots from the computation stack pointer + - Add the size of 2 stack slots to the computation stack pointer - push_single_slot -- - subtract the size of 1 stack slots from the computation stack pointer + - Subtract the size of 1 stack slot from the computation stack pointer - push_dual_slot -- - subtract the size of 2 stack slots from the computation stack pointer + - Subtract the size of 2 stack slots from the computation stack pointer - _32bit_local_to_rXX_PATCH -- - Move a 32 bit value to a r11d-r15d register from a local array slot + - Move a 32-bit value to an r11d-r15d register from a local array slot - _64bit_local_to_rXX_PATCH -- - Move a 64 bit value to a 64-bit register from a local array slot + - Move a 64-bit value to a 64-bit register from a local array slot - _32bit_local_from_rXX_PATCH -- - Move a 32 bit value from a r11d-r15d register to a local array slot + - Move a 32-bit value from an r11d-r15d register to a local array slot - _64bit_local_from_rXX_PATCH -- - Move a 64 bit value from a 64-bit register to a local array slot + - Move a 64-bit value from a 64-bit register to a local array slot - _32bit_slot_stack_to_rXX -- - Move a 64 bit value to an r11-r15 register from the computation stack + - Move a 64-bit value to an r11-r15 register from the computation stack - _32bit_slot_stack_to_eXX -- - Move a 32 bit value to an eax-edx register from the computation stack + - Move a 32-bit value to an eax-edx register from the computation stack - _64bit_slot_stack_to_rXX -- - Move a 64 bit value to a 64 bit register from the computation stack + - Move a 64-bit value to a 64-bit register from the computation stack - _32bit_slot_stack_from_rXX -- - Move a 64 bit value from an r11d-r15d register to the computation stack + - Move a 64-bit value from an r11d-r15d register to the computation stack - _32bit_slot_stack_from_eXX -- - Move a 32 bit value from an eax-edx register to the computation stack + - Move a 32-bit value from an eax-edx register to the computation stack - _64bit_slot_stack_from_rXX -- - Move a 64 bit value from a 64 bit register to the computation stack \ No newline at end of file + - Move a 64-bit value from a 64-bit register to the computation stack \ No newline at end of file diff --git a/doc/compiler/MicroJIT/Platforms.md b/doc/compiler/MicroJIT/Platforms.md index 23683388347..17f15e4613d 100644 --- a/doc/compiler/MicroJIT/Platforms.md +++ b/doc/compiler/MicroJIT/Platforms.md @@ -27,4 +27,4 
@@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti # List of expected future platforms - RISC-V -- aarch64 \ No newline at end of file +- AArch64 \ No newline at end of file diff --git a/doc/compiler/MicroJIT/README.md b/doc/compiler/MicroJIT/README.md index f9d5ba636e3..2a99249dd2d 100644 --- a/doc/compiler/MicroJIT/README.md +++ b/doc/compiler/MicroJIT/README.md @@ -30,7 +30,7 @@ TR's first compile (where CPU resources are not a constraint). MicroJIT's entry point is at the `wrapped compile` stage of TR's compile process. When building with a `--enable-microjit` configuration, the VM checks whether a method should be compiled with MicroJIT or TR before handing off the method to -code generation ([MicroJIT Compile Stages](Stages.md)) +code generation ([MicroJIT Compile Stages](Stages.md)). Being template-based, MicroJIT does not build an intermediate language (IL) the way that TR does. It instead performs a tight loop over the bytecodes and generates @@ -38,9 +38,9 @@ the patchable assembly for each instance of a JVM Bytecode. This means that port MicroJIT to a new platform requires rewriting much of the code generator in that target platform. The current list of available platforms is available [here](Platforms.md). -MicroJIT uses a stack based [vISA](vISA.md) for its internal model of how to execute Bytecodes. +MicroJIT uses a stack-based [vISA](vISA.md) for its internal model of how to execute bytecodes. This maps cleanly onto the JVM Bytecode, so MicroJIT uses the number of stack slots -for JVM data types that the JVM spec requires where ever the target architecture allows. +for JVM data types that the JVM specification requires wherever the target architecture allows. To keep track of the location of parameters and locals, as well as for creating the required GCStackAtlas, MicroJIT makes use of [side tables](SideTables.md). These @@ -50,12 +50,12 @@ in these tables on the stack rather than the heap, helping MicroJIT compile fast The templates MicroJIT uses for its code generation are written in a mix of the target architecture's assembly (for the NASM assembler) and assembler macros used to -abstract some of the vISA operations away into a domain specific language ([DSL](DSL.md)). +abstract some of the vISA operations away into a domain-specific language ([DSL](DSL.md)). To try MicroJIT, first build your JVM with a configuration using the `--enable-microjit` option (see the openj9-openjdk-jdk8 project). Then, once built, use the `-Xjit:mjitEnabled=1,mjitCount=20` options. You can use other values for `mjitCount` than -20, but peer-reviewed researh shows that this value is a good estimation of the best +20, but peer-reviewed research shows that this value is a good estimation of the best count for similar template-based JITs on JVM benchmarks. MicroJIT is an experimental feature. It currently does not compile all methods, and will @@ -69,6 +69,6 @@ documentation. - [Supported Platforms](Platforms.md) - [vISA](vISA.md) - [Supporting Side Tables](SideTables.md) -- [Domain Specific Language](DSL.md) +- [Domain-Specific Language](DSL.md) - [Supported Compilations](support.md)
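As the surrounding text notes, a MicroJIT compile can fail on an unsupported bytecode or calling convention, after which the method falls back to a normal TR compile. The sketch below shows the general shape of that bookkeeping using the J9_MJIT_FAILED_COMPILE bit this patch adds to j9consts.h; the MethodRecord type and helper names are hypothetical, not the real VM structures.

```cpp
// Mirrors the flag value added to runtime/oti/j9consts.h in this patch.
#define J9_MJIT_FAILED_COMPILE 0x10

// Hypothetical stand-in for wherever the per-method extended flags live.
struct MethodRecord
   {
   unsigned int extendedFlags;
   };

// After a failed MicroJIT compile, flag the method so only TR retries it.
static void
markMJITCompileFailed(MethodRecord *method)
   {
   method->extendedFlags |= J9_MJIT_FAILED_COMPILE;
   }

// Checked before attempting a MicroJIT compile of a method.
static bool
canAttemptMJITCompile(const MethodRecord *method)
   {
   return (method->extendedFlags & J9_MJIT_FAILED_COMPILE) == 0;
   }
```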
diff --git a/doc/compiler/MicroJIT/SideTables.md b/doc/compiler/MicroJIT/SideTables.md index ba8d4f2ab8d..6e05437845b 100644 --- a/doc/compiler/MicroJIT/SideTables.md +++ b/doc/compiler/MicroJIT/SideTables.md @@ -28,7 +28,7 @@ generate code while limiting heap allocations. # Register Stack The register stack contains the number of stack slots contained -in each register used by TR's Linkage. This allows for MicroJIT +in each register used by TR's linkage. This allows for MicroJIT to quickly calculate stack offsets for interpreter arguments on the fly. diff --git a/doc/compiler/MicroJIT/Stages.md b/doc/compiler/MicroJIT/Stages.md index 4ada44a690a..62f2589c083 100644 --- a/doc/compiler/MicroJIT/Stages.md +++ b/doc/compiler/MicroJIT/Stages.md @@ -23,7 +23,7 @@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti # Overview This document contains information on the compilation stages of MicroJIT -from where to diverges from TR. +from where it diverges from TR. # MetaData @@ -40,26 +40,26 @@ steps begin. # Pre-prologue -The pre-prologue is generated first. This contains many useful code snippits, jump points, +The pre-prologue is generated first. This contains many useful code snippets, jump points, and small bits of meta-data used throughout the JVM that are relative to an individual compilation of a method. # Prologue -The prologue is then generated. Copying paramters into the appropreate places and updating +The prologue is then generated, copying parameters into appropriate places and updating tables as needed. Once these tables are created, the structures used by the JVM are created, -namely the `GCStackAtlas` and its initial GC Maps. At this point, the required size of the -stack is known, the required size of the local array is known, and the code generator can +namely the `GCStackAtlas` and its initial GC Maps. At this point, the required sizes of the +stack and local array are known. Hence, the code generator can generate the code for allocating the stack space needed, moving the parameters into their local array slots, and setting the Computation Stack and Local Array pointers (See [vISA](vISA.md)). # Body -The code generator then itterates in a tight loop over the bytecode intruction stream, +The code generator then iterates in a tight loop over the bytecode instruction stream, generating the required template for each bytecode, one at a time, and patching them as -needed. In this phase, if a bytecode that is unsupported (See [supported](support.md) +needed. During this phase, if a bytecode that is unsupported (See [supported](support.md)) is reached, the code generator bubbles the error up to the top level and compilation -is marked as a failure. The method is set to not be attempted with MicroJIT again, and +is marked as a failure. The method is set not to be attempted with MicroJIT again, and its compilation threshold is set to the default TR threshold. 
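To make the failure path above concrete, here is a hedged, hypothetical example: the method below compiles to `newarray` and `iastore`, both of which are on the unsupported list, so the body-generation loop would bubble the error up, the compilation would be marked as a failure, and the method would be left for TR at its default threshold.

```java
public class UnsupportedExample {
    // Hypothetical example: newarray (0xbc) and iastore (0x4f) are on the
    // unsupported-bytecode list, so MicroJIT's body loop rejects this
    // method and it falls back to a normal TR compilation.
    static int[] firstSquares(int n) {
        int[] squares = new int[n]; // newarray
        for (int i = 0; i < n; i++) {
            squares[i] = i * i;     // iastore
        }
        return squares;             // areturn
    }
}
```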
# Cold Area diff --git a/doc/compiler/MicroJIT/support.md b/doc/compiler/MicroJIT/Support.md similarity index 56% rename from doc/compiler/MicroJIT/support.md rename to doc/compiler/MicroJIT/Support.md index f08e9df00de..6abd13d06bd 100644 --- a/doc/compiler/MicroJIT/support.md +++ b/doc/compiler/MicroJIT/Support.md @@ -25,48 +25,48 @@ MicroJIT does not currently support the compilation of: - Methods which call System methods - Methods which call JNI methods - Methods which use any of the following bytecodes: -- - 01 (0x01) aconst_null -- - 18 (0x12) ldc -- - 19 (0x13) ldc_w -- - 20 (0x14) ldc2_w -- - 46 (0x2e) iaload -- - 47 (0x2f) laload -- - 48 (0x30) faload -- - 49 (0x31) daload -- - 50 (0x32) aaload -- - 51 (0x33) baload -- - 52 (0x34) caload -- - 53 (0x35) saload -- - 79 (0x4f) iastore -- - 80 (0x50) lastore -- - 81 (0x51) fastore -- - 82 (0x52) dastore -- - 83 (0x53) aastore -- - 84 (0x54) bastore -- - 85 (0x55) castore -- - 86 (0x56) sastore -- - 168 (0xa8) jsr -- - 169 (0xa9) ret -- - 170 (0xaa) tableswitch -- - 171 (0xab) lookupswitch -- - 182 (0xb6) invokevirtual HK -- - 183 (0xb7) invokespecial -- - 185 (0xb9) invokeinterface -- - 186 (0xba) invokedynamic -- - 187 (0xbb) new -- - 188 (0xbc) newarray -- - 189 (0xbd) anewarray -- - 190 (0xbe) arraylength -- - 191 (0xbf) athrow -- - 192 (0xc0) checkcast -- - 193 (0xc1) instanceof -- - 194 (0xc2) monitorenter -- - 195 (0xc3) monitorexit -- - 196 (0xc4) wide -- - 197 (0xc5) multianewarray -- - 200 (0xc8) goto_w -- - 201 (0xc9) jsr_w -- - 202 (0xca) breakpoint -- - 254 (0xfe) impdep1 -- - 255 (0xff) impdep2 + - 01 (0x01) aconst_null + - 18 (0x12) ldc + - 19 (0x13) ldc_w + - 20 (0x14) ldc2_w + - 46 (0x2e) iaload + - 47 (0x2f) laload + - 48 (0x30) faload + - 49 (0x31) daload + - 50 (0x32) aaload + - 51 (0x33) baload + - 52 (0x34) caload + - 53 (0x35) saload + - 79 (0x4f) iastore + - 80 (0x50) lastore + - 81 (0x51) fastore + - 82 (0x52) dastore + - 83 (0x53) aastore + - 84 (0x54) bastore + - 85 (0x55) castore + - 86 (0x56) sastore + - 168 (0xa8) jsr + - 169 (0xa9) ret + - 170 (0xaa) tableswitch + - 171 (0xab) lookupswitch + - 182 (0xb6) invokevirtual + - 183 (0xb7) invokespecial + - 185 (0xb9) invokeinterface + - 186 (0xba) invokedynamic + - 187 (0xbb) new + - 188 (0xbc) newarray + - 189 (0xbd) anewarray + - 190 (0xbe) arraylength + - 191 (0xbf) athrow + - 192 (0xc0) checkcast + - 193 (0xc1) instanceof + - 194 (0xc2) monitorenter + - 195 (0xc3) monitorexit + - 196 (0xc4) wide + - 197 (0xc5) multianewarray + - 200 (0xc8) goto_w + - 201 (0xc9) jsr_w + - 202 (0xca) breakpoint + - 254 (0xfe) impdep1 + - 255 (0xff) impdep2 - MicroJIT is also currently known to have issue with methods that are far down the call stack which contain live object references \ No newline at end of file diff --git a/doc/compiler/MicroJIT/vISA.md b/doc/compiler/MicroJIT/vISA.md index 37f692ec730..16b920fa7af 100644 --- a/doc/compiler/MicroJIT/vISA.md +++ b/doc/compiler/MicroJIT/vISA.md @@ -22,12 +22,12 @@ SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-excepti # Overview -MicroJIT uses a stack-based vISA. This means values are pushed onto +MicroJIT uses a stack-based virtual ISA. This means values are pushed onto the computation stack, popped off the computation stack, saved and loaded from a local array, and that operands for a given instruction are either in the instruction stream, or on the computation stack. 
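As a sketch of the stack model just described (the method name is hypothetical), the bytecodes for a simple expression keep every intermediate value on the computation stack and every named value in the local array; each `int` occupies one stack slot, while a `long` or `double` would occupy two, matching the single- and dual-slot DSL macros.

```java
public class StackModelExample {
    // Hypothetical trace of the computation stack for `a + b * c`:
    //   iload_0   push local[0] (a)          stack: a
    //   iload_1   push local[1] (b)          stack: a, b
    //   iload_2   push local[2] (c)          stack: a, b, c
    //   imul      pop b and c, push b*c      stack: a, b*c
    //   iadd      pop both, push the sum     stack: a+b*c
    //   ireturn   pop the result and return  stack: (empty)
    static int fusedMultiplyAdd(int a, int b, int c) {
        return a + b * c;
    }
}
```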
-MicroJIT is implemented on register based architectures, and makes use +MicroJIT is implemented on register-based architectures, and makes use of those registers as pointers onto the stack and temporary holding cells for values either being used by an operation, or being saved and loaded to and from the local array. Below are the mappings for diff --git a/runtime/compiler/control/CompilationRuntime.hpp b/runtime/compiler/control/CompilationRuntime.hpp index 319c53a0961..75190c72772 100644 --- a/runtime/compiler/control/CompilationRuntime.hpp +++ b/runtime/compiler/control/CompilationRuntime.hpp @@ -676,11 +676,11 @@ class CompilationInfo #endif /* defined(J9VM_OPT_JITSERVER) */ #if defined(J9VM_OPT_MICROJIT) TR::Options *options = TR::Options::getJITCmdLineOptions(); - if(options->_mjitEnabled && value) + if (options->_mjitEnabled && value) { - return; + return; } - else + else #endif { value = (value << 1) | J9_STARTPC_NOT_TRANSLATED; @@ -693,7 +693,7 @@ class CompilationInfo #if defined(J9VM_OPT_MICROJIT) static void setInitialMJITCountUnsynchronized(J9Method *method, int32_t mjitThreshold) { - if(TR::Options::getJITCmdLineOptions()->_mjitEnabled) + if (TR::Options::getJITCmdLineOptions()->_mjitEnabled) { intptr_t value = (intptr_t)((mjitThreshold << 1) | J9_STARTPC_NOT_TRANSLATED); method->extra = reinterpret_cast(static_cast(value)); diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index af38e7e81fb..1a206b34790 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -9097,7 +9097,6 @@ TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary *portLib, void * Trc_JIT_compileStart(vmThread, hotnessString, compiler->signature()); TR_ASSERT(compiler->getMethodHotness() != unknownHotness, "Trying to compile at unknown hotness level"); - { #if defined(J9VM_OPT_MICROJIT) J9Method *method = that->_methodBeingCompiled->getMethodDetails().getMethod(); UDATA extra = (UDATA)method->extra; @@ -9112,7 +9111,6 @@ TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary *portLib, void * { metaData = that->compile(vmThread, compiler, compilee, *vm, p->_optimizationPlan, scratchSegmentProvider); } - } } try @@ -9335,7 +9333,7 @@ TR::CompilationInfoPerThreadBase::performAOTLoad( } #if defined(J9VM_OPT_MICROJIT) -// MicroJIT returns a code size of 0 when it encournters a compilation error +// MicroJIT returns a code size of 0 when it encounters a compilation error // We must use this to set the correct values and return NULL, this is the same // no matter which phase of compilation fails, so we create a macro here to // perform that work. @@ -9390,7 +9388,7 @@ TR::CompilationInfoPerThreadBase::mjit( TR::IlGeneratorMethodDetails & details = _methodBeingCompiled->getMethodDetails(); method = details.getMethod(); - TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); //TODO: fe can be replaced by vm and removed from here. + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); // TODO: fe can be replaced by vm and removed from here. 
U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); const char* signature = compilee->signature(compiler->trMemory()); @@ -9406,22 +9404,18 @@ TR::CompilationInfoPerThreadBase::mjit( J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); - if(compiler->getOption(TR_TraceCG)){ + if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); - } - MJIT_COMPILE_ERROR(!(compilee->isConstructor()), method, romMethod); //"MicroJIT does not currently compile constructors" + MJIT_COMPILE_ERROR(!(compilee->isConstructor()), method, romMethod); // "MicroJIT does not currently compile constructors" bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not - if (!isStaticMethod){ + if (!isStaticMethod) maxLength += 1; // Make space for an extra character for object reference in case of non-static methods - } char typeString[maxLength]; if (MJIT::nativeSignature(method, typeString)) - { return 0; - } // To insert 'L' for object reference with non-static methods // before the input arguments starting at index 3. @@ -9429,20 +9423,19 @@ TR::CompilationInfoPerThreadBase::mjit( // which is always MJIT::CLASSNAME_TYPE_CHARACTER, i.e., 'L' // and other iterations just move the characters one index down. // e.g. xxIIB -> - if (!isStaticMethod){ - for (int i = maxLength - 1; i > 2 ; i--) { + if (!isStaticMethod) + { + for (int i = maxLength - 1; i > 2 ; i--) typeString[i] = typeString[i-1]; } - } U_16 paramCount = MJIT::getParamCount(typeString, maxLength); U_16 actualParamCount = MJIT::getActualParamCount(typeString, maxLength); MJIT::ParamTableEntry paramTableEntries[actualParamCount*2]; - for(int i=0; igetCompilation()->getOutFile(), "\ngeneratePrePrologue\n"); - for(int32_t i = 0; i < code_size; i++) - { - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); - } + trfprintf(this->getCompilation()->getOutFile(), "\ngeneratePrePrologue\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + #endif cursor += code_size; - //start point should point to prolog - char * prologue_address = cursor; + // start point should point to prolog + char *prologue_address = cursor; // generate debug breakpoint if (comp()->getOption(TR_EntryBreakPoints)) @@ -9506,39 +9492,29 @@ TR::CompilationInfoPerThreadBase::mjit( buffer_size += code_size; } - char * jitStackOverflowPatchLocation = NULL; + char *jitStackOverflowPatchLocation = NULL; mjit_cg.setPeakStackSize(romMethod->maxStack * mjit_cg.getPointerSize()); - char* firstInstructionLocation = NULL; + char *firstInstructionLocation = NULL; - code_size = mjit_cg.generatePrologue( - cursor, - method, - &jitStackOverflowPatchLocation, - magicWordLocation, - first2BytesPatchLocation, - samplingRecompileCallLocation, - &firstInstructionLocation, - &bcIterator); + code_size = mjit_cg.generatePrologue(cursor, method, &jitStackOverflowPatchLocation, magicWordLocation, first2BytesPatchLocation, samplingRecompileCallLocation, &firstInstructionLocation, &bcIterator); MJIT_COMPILE_ERROR(code_size, method, romMethod); TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); compiler->cg()->setStackAtlas(atlas); compiler->cg()->setMethodStackMap(atlas->getLocalMap()); - //TODO: Find out why setting this 
correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point - //compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(firstInstructionLocation-cursor)); + // TODO: Find out why setting this correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point + // compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(firstInstructionLocation-cursor)); compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(0)); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after prologue"); - if(compiler->getOption(TR_TraceCG)) + if (compiler->getOption(TR_TraceCG)) { trfprintf(this->getCompilation()->getOutFile(), "\ngeneratePrologue\n"); - for(int32_t i = 0; i < code_size; i++) - { - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); - } + for (int32_t i = 0; i < code_size; i++) + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); } cursor += code_size; @@ -9549,14 +9525,8 @@ TR::CompilationInfoPerThreadBase::mjit( TR_Debug dbg(this->getCompilation()); this->getCompilation()->setDebug(&dbg); - if(compiler->getOption(TR_TraceCG)) - { - trfprintf( - this->getCompilation()->getOutFile(), - "\n" - "%s\n", - signature); - } + if (compiler->getOption(TR_TraceCG)) + trfprintf(this->getCompilation()->getOutFile(), "\n%s\n", signature); code_size = mjit_cg.generateBody(cursor, &bcIterator); @@ -9566,11 +9536,9 @@ TR::CompilationInfoPerThreadBase::mjit( MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); #ifdef MJIT_DEBUG - trfprintf(this->getCompilation()->getOutFile(), "\ngenerateBody\n"); - for(int32_t i = 0; i < code_size; i++) - { - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); - } + trfprintf(this->getCompilation()->getOutFile(), "\ngenerateBody\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); #endif cursor += code_size; @@ -9584,25 +9552,21 @@ TR::CompilationInfoPerThreadBase::mjit( MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); #ifdef MJIT_DEBUG - trfprintf(this->getCompilation()->getOutFile(), "\ngenerateColdArea\n"); - for(int32_t i = 0; i < code_size; i++) - { - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); - } - - trfprintf(this->getCompilation()->getOutFile(), "\nfinal method\n"); - for(int32_t i = 0; i < buffer_size; i++) - { - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)buffer[i]) & (unsigned char)0xff); - } + trfprintf(this->getCompilation()->getOutFile(), "\ngenerateColdArea\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + + trfprintf(this->getCompilation()->getOutFile(), "\nfinal method\n"); + for (int32_t i = 0; i < buffer_size; i++) + trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)buffer[i]) & (unsigned char)0xff); #endif cursor += code_size; // As the body is finished, mark its profile info as active so that the JProfiler thread will inspect it - //TODO: after adding profiling support, uncomment this. 
- //if (bodyInfo && bodyInfo->getProfileInfo()) + // TODO: after adding profiling support, uncomment this. + // if (bodyInfo && bodyInfo->getProfileInfo()) // { // bodyInfo->getProfileInfo()->setActive(); // } @@ -9611,42 +9575,21 @@ TR::CompilationInfoPerThreadBase::mjit( // compiler->cg()->setBinaryBufferCursor((uint8_t*)(cursor)); compiler->cg()->setBinaryBufferStart((uint8_t*)(buffer)); - if(compiler->getOption(TR_TraceCG)) - { - trfprintf( - this->getCompilation()->getOutFile(), - "Compiled method binary buffer finalized [" POINTER_PRINTF_FORMAT " : " POINTER_PRINTF_FORMAT "] for %s @ %s", - ((void*)buffer), - ((void*)cursor), - compiler->signature(), - compiler->getHotnessName()); - } + if (compiler->getOption(TR_TraceCG)) + trfprintf(this->getCompilation()->getOutFile(), "Compiled method binary buffer finalized [" POINTER_PRINTF_FORMAT " : " POINTER_PRINTF_FORMAT "] for %s @ %s", ((void*)buffer), ((void*)cursor), compiler->signature(), compiler->getHotnessName()); metaData = createMJITMethodMetaData(vm, compilee, compiler); if (!metaData) { if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) - { - TR_VerboseLog::writeLineLocked( - TR_Vlog_DISPATCH, - "Failed to create metadata for %s @ %s", - compiler->signature(), - compiler->getHotnessName()); - } + TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Failed to create metadata for %s @ %s", compiler->signature(), compiler->getHotnessName()); compiler->failCompilation("Metadata creation failure"); } if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) - { - TR_VerboseLog::writeLineLocked( - TR_Vlog_DISPATCH, - "Successfully created metadata [" POINTER_PRINTF_FORMAT "] for %s @ %s", - metaData, - compiler->signature(), - compiler->getHotnessName()); - } + TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Successfully created metadata [" POINTER_PRINTF_FORMAT "] for %s @ %s", metaData, compiler->signature(), compiler->getHotnessName()); setMetadata(metaData); uint8_t *warmMethodHeader = compiler->cg()->getBinaryBufferStart() - sizeof(OMR::CodeCacheMethodHeader); - memcpy( warmMethodHeader + offsetof(OMR::CodeCacheMethodHeader, _metaData), &metaData, sizeof(metaData) ); + memcpy(warmMethodHeader + offsetof(OMR::CodeCacheMethodHeader, _metaData), &metaData, sizeof(metaData)); // FAR: should we do postpone this copying until after CHTable commit? 
metaData->runtimeAssumptionList = *(compiler->getMetadataAssumptionList()); TR::ClassTableCriticalSection chTableCommit(&vm); @@ -9654,20 +9597,13 @@ TR::CompilationInfoPerThreadBase::mjit( TR_CHTable *chTable = compiler->getCHTable(); if (prologue_address && compiler->getOption(TR_TraceCG)) - { - trfprintf( - this->getCompilation()->getOutFile(), - "\n" - "MJIT:%s", - compilee->signature(compiler->trMemory())); - } + trfprintf(this->getCompilation()->getOutFile(), "\nMJIT:%s\n", compilee->signature(compiler->trMemory())); compiler->cg()->getCodeCache()->trimCodeMemoryAllocation(buffer, (cursor-buffer)); // END MICROJIT - if(compiler->getOption(TR_TraceCG)){ + if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiled %s Successfully!\n------------------------------\n", signature); - } } @@ -9675,33 +9611,33 @@ TR::CompilationInfoPerThreadBase::mjit( TRIGGER_J9HOOK_JIT_COMPILING_END(_jitConfig->hookInterface, vmThread, method); } - catch (const std::exception &e) - { - const char *exceptionName; + catch (const std::exception &e) + { + const char *exceptionName; #if defined(J9ZOS390) - // Compiling with -Wc,lp64 results in a crash on z/OS when trying - // to call the what() virtual method of the exception. - exceptionName = "std::exception"; + // Compiling with -Wc,lp64 results in a crash on z/OS when trying + // to call the what() virtual method of the exception. + exceptionName = "std::exception"; #else - exceptionName = e.what(); + exceptionName = e.what(); #endif - printCompFailureInfo(compiler, exceptionName); - processException(vmThread, scratchSegmentProvider, compiler, haveLockedClassUnloadMonitor, exceptionName); - if (codeCache) + printCompFailureInfo(compiler, exceptionName); + processException(vmThread, scratchSegmentProvider, compiler, haveLockedClassUnloadMonitor, exceptionName); + if (codeCache) + { + if (estimated_size && buffer) { - if(estimated_size && buffer) - { - codeCache->trimCodeMemoryAllocation(buffer, 1); - } - TR::CodeCacheManager::instance()->unreserveCodeCache(codeCache); + codeCache->trimCodeMemoryAllocation(buffer, 1); } - metaData = 0; + TR::CodeCacheManager::instance()->unreserveCodeCache(codeCache); } + metaData = 0; + } // At this point the compilation has either succeeded and compilation cannot be - // interrupted anymore, or it has failed. In either case _compilationShouldBeinterrupted flag + // interrupted anymore, or it has failed. 
In either case _compilationShouldBeInterrupted flag // is not needed anymore setCompilationShouldBeInterrupted(0); @@ -11343,7 +11279,7 @@ TR::CompilationInfoPerThreadBase::processException( #if defined(J9VM_OPT_MICROJIT) catch (const MJIT::MJITCompilationFailure &e) { - shouldProcessExceptionCommonTasks = false; + shouldProcessExceptionCommonTasks = false; _methodBeingCompiled->_compErrCode = mjitCompilationFailure; } #endif diff --git a/runtime/compiler/control/HookedByTheJit.cpp b/runtime/compiler/control/HookedByTheJit.cpp index c0be1163d06..8941b28af39 100644 --- a/runtime/compiler/control/HookedByTheJit.cpp +++ b/runtime/compiler/control/HookedByTheJit.cpp @@ -611,15 +611,15 @@ static void jitHookInitializeSendTarget(J9HookInterface * * hook, UDATA eventNum } #if defined(J9VM_OPT_MICROJIT) - if(optionsJIT->_mjitEnabled) + if (optionsJIT->_mjitEnabled) { int32_t mjitCount = optionsJIT->_mjitInitialCount; - if(count) - TR::CompilationInfo::setInitialMJITCountUnsynchronized(method,mjitCount); + if (count) + TR::CompilationInfo::setInitialMJITCountUnsynchronized(method, mjitCount); } #endif - TR::CompilationInfo::setInitialInvocationCountUnsynchronized(method,count); + TR::CompilationInfo::setInitialInvocationCountUnsynchronized(method, count); if (TR::Options::getJITCmdLineOptions()->getOption(TR_DumpInitialMethodNamesAndCounts) || TR::Options::getAOTCmdLineOptions()->getOption(TR_DumpInitialMethodNamesAndCounts)) { diff --git a/runtime/compiler/control/J9Options.cpp b/runtime/compiler/control/J9Options.cpp index ba076861edf..b9173175374 100644 --- a/runtime/compiler/control/J9Options.cpp +++ b/runtime/compiler/control/J9Options.cpp @@ -925,9 +925,9 @@ TR::OptionTable OMR::Options::_feOptions[] = { {"minSuperclassArraySize=", "I\t set the size of the minimum superclass array size", TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_minimumSuperclassArraySize, 0, "F%d", NOT_IN_SUBSET}, #if defined(J9VM_OPT_MICROJIT) - {"mjitCount=", "C\tnumber of invocations before MicroJIT compiles methods without loops", + {"mjitCount=", "C\tnumber of invocations before MicroJIT compiles methods without loops", TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_mjitInitialCount, 0, "F%d"}, - {"mjitEnabled=", "C\tenable Microjit (set to 1 to enable, default=0)", + {"mjitEnabled=", "C\tenable MicroJIT (set to 1 to enable, default=0)", TR::Options::setStaticNumeric, (intptr_t)&TR::Options::_mjitEnabled, 0, "F%d"}, #endif {"noregmap", 0, RESET_JITCONFIG_RUNTIME_FLAG(J9JIT_CG_REGISTER_MAPS) }, diff --git a/runtime/compiler/infra/J9Cfg.cpp b/runtime/compiler/infra/J9Cfg.cpp index c2393f4c4cf..0abb161d7cb 100644 --- a/runtime/compiler/infra/J9Cfg.cpp +++ b/runtime/compiler/infra/J9Cfg.cpp @@ -1716,10 +1716,12 @@ J9::CFG::propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profile #if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); U_8 *extendedFlags = reinterpret_cast(comp()->fe())->fetchMethodExtendedFlagsPointer(method); - if( ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1 && TR::Options::getJITCmdLineOptions()->_mjitEnabled){ - if (!profiler) return; + if (((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1 && TR::Options::getJITCmdLineOptions()->_mjitEnabled) + { + if (!profiler) + return; self()->setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT(); - } + } #endif /* J9VM_OPT_MICROJIT */ if (profiler) { @@ -1789,7 +1791,7 @@ TR_J9ByteCode J9::CFG::getBytecodeFromIndex(int32_t 
index) { TR_ResolvedJ9Method *method = static_cast(_method->getResolvedMethod()); - TR_J9VMBase * fe = _compilation->fej9(); + TR_J9VMBase *fe = _compilation->fej9(); TR_J9ByteCodeIterator bcIterator(_method, method, fe, _compilation); bcIterator.setIndex(index); return bcIterator.next(); @@ -1820,7 +1822,7 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cf if (*taken || *nottaken) { if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If on node %p has branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + dumpOptDetails(comp(), "If on node %p has branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } else if (isVirtualGuard(node)) { @@ -1832,7 +1834,7 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cf *nottaken = sumFreq; if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Guard on node %p has default branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + dumpOptDetails(comp(), "Guard on node %p has default branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } else if (!cfgNode->asBlock()->isCold()) { @@ -1871,7 +1873,7 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cf } */ if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + dumpOptDetails(comp(), "If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } } @@ -1885,7 +1887,7 @@ J9::CFG::setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilati if (sumFrequency < 10) { if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Low count switch I'll set frequencies using uniform edge distribution\n"); + dumpOptDetails(comp, "Low count switch I'll set frequencies using uniform edge distribution\n"); self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); return; @@ -1894,34 +1896,34 @@ J9::CFG::setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilati if (treeNode->getInlinedSiteIndex() < -1) { if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Dummy switch generated in estimate code size I'll set frequencies using uniform edge distribution\n"); + dumpOptDetails(comp, "Dummy switch generated in estimate code size I'll set frequencies using uniform edge distribution\n"); - self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); + self()->setUniformEdgeFrequenciesOnNode(node, sumFrequency, false, comp); return; } if (_externalProfiler->isSwitchProfileFlat(treeNode, comp)) { if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Flat profile switch, setting average frequency on each case.\n"); + dumpOptDetails(comp, "Flat profile switch, setting average frequency on each case.\n"); self()->setUniformEdgeFrequenciesOnNode(node, _externalProfiler->getFlatSwitchProfileCounts(treeNode, comp), false, comp); return; } - for ( int32_t count=1; count < treeNode->getNumChildren(); count++) + for (int32_t count=1; count < treeNode->getNumChildren(); count++) { TR::Node *child = treeNode->getChild(count); TR::CFGEdge *e = getCFGEdgeForNode(node, child); - int32_t frequency = _externalProfiler->getSwitchCountForValue (treeNode, (count-1), comp); - e->setFrequency( std::max(frequency,1) ); + int32_t frequency = _externalProfiler->getSwitchCountForValue(treeNode, 
(count-1), comp); + e->setFrequency(std::max(frequency,1)); if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Edge %p between %d and %d has freq %d (Switch)\n", e, e->getFrom()->getNumber(), e->getTo()->getNumber(), e->getFrequency()); + dumpOptDetails(comp, "Edge %p between %d and %d has freq %d (Switch)\n", e, e->getFrom()->getNumber(), e->getTo()->getNumber(), e->getFrequency()); } } -//TODO: Uncomment this method and implement it using an updated traversal order call that won't look at real tree tops. +// TODO: Uncomment this method and implement it using an updated traversal order call that won't look at real tree tops. /* TR_BitVector * J9::CFG::setBlockAndEdgeFrequenciesBasedOnJITProfilerMicroJIT() @@ -1930,7 +1932,8 @@ J9::CFG::setBlockAndEdgeFrequenciesBasedOnJITProfilerMicroJIT() } */ -bool isBranch(TR_J9ByteCode bc) +bool +isBranch(TR_J9ByteCode bc) { switch(bc) { @@ -1956,7 +1959,8 @@ bool isBranch(TR_J9ByteCode bc) } } -bool isTableOp(TR_J9ByteCode bc) +bool +isTableOp(TR_J9ByteCode bc) { switch(bc) { @@ -1969,9 +1973,9 @@ bool isTableOp(TR_J9ByteCode bc) } bool -J9::CFG::continueLoop(TR::CFGNode *temp) //TODO: Find a better name for this method +J9::CFG::continueLoop(TR::CFGNode *temp) // TODO: Find a better name for this method { - if((temp->getSuccessors().size() == 1) && !temp->asBlock()->getEntry()) + if ((temp->getSuccessors().size() == 1) && !temp->asBlock()->getEntry()) { TR_J9ByteCode bc = this->getLastRealBytecodeOfBlock(temp); return !(isBranch(bc) || isTableOp(bc)); @@ -2044,9 +2048,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (!temp->getSuccessors().empty()) { if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) - { temp = temp->asBlock()->getNextBlock(); - } else temp = temp->getSuccessors().front()->getTo(); } @@ -2055,7 +2057,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() } if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Propagation start block_%d\n", temp->getNumber()); + dumpOptDetails(comp(), "Propagation start block_%d\n", temp->getNumber()); if (temp->asBlock()->getEntry()) inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); @@ -2066,14 +2068,14 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() { getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); startFrequency = taken + nottaken; - self()->setEdgeFrequenciesOnNode( temp, taken, nottaken, comp()); + self()->setEdgeFrequenciesOnNode(temp, taken, nottaken, comp()); } else if (temp->asBlock()->getEntry() && isTableOp(this->getBytecodeFromIndex(temp->asBlock()->getExit()->getNode()->getLocalIndex()))) { startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getExit()->getNode(), comp()); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Switch with total frequency of %d\n", startFrequency); + dumpOptDetails(comp(), "Switch with total frequency of %d\n", startFrequency); setSwitchEdgeFrequenciesOnNode(temp, comp()); } else @@ -2083,28 +2085,26 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() else if (initialCallScanFreq > 0) startFrequency = initialCallScanFreq; else - { startFrequency = AVG_FREQ; - } if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && isBranch(getLastRealBytecodeOfBlock(temp))) - self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); + self()->setEdgeFrequenciesOnNode(temp, 0, 
startFrequency, comp()); else self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); } - setBlockFrequency (temp, startFrequency); + setBlockFrequency(temp, startFrequency); _initialBlockFrequency = startFrequency; if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); start = temp; // Walk backwards to the start and propagate this frequency if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Propagating start frequency backwards...\n"); + dumpOptDetails(comp(), "Propagating start frequency backwards...\n"); ListIterator upit(&upStack); for (temp = upit.getFirst(); temp; temp = upit.getNext()) @@ -2112,18 +2112,18 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (!temp->asBlock()->getEntry()) continue; if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); + self()->setEdgeFrequenciesOnNode(temp, 0, startFrequency, comp()); else - self()->setUniformEdgeFrequenciesOnNode( temp, startFrequency, false, comp()); - setBlockFrequency (temp, startFrequency); + self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); + setBlockFrequency(temp, startFrequency); _seenNodes->set(temp->getNumber()); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); } if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Propagating block frequency forward...\n"); + dumpOptDetails(comp(), "Propagating block frequency forward...\n"); // Walk reverse post-order // we start at the first if or switch statement @@ -2150,11 +2150,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (node->asBlock()->isCold()) { if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Analyzing COLD block_%d\n", node->getNumber()); + dumpOptDetails(comp(), "Analyzing COLD block_%d\n", node->getNumber()); //node->asBlock()->setFrequency(0); int32_t freq = node->getFrequency(); if ((node->getSuccessors().size() == 2) && (freq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( node, 0, freq, comp()); + self()->setEdgeFrequenciesOnNode(node, 0, freq, comp()); else self()->setUniformEdgeFrequenciesOnNode(node, freq, false, comp()); setBlockFrequency (node, freq); @@ -2182,11 +2182,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() nottaken = LOW_FREQ; } - self()->setEdgeFrequenciesOnNode( node, taken, nottaken, comp()); - setBlockFrequency (node, taken + nottaken); + self()->setEdgeFrequenciesOnNode(node, taken, nottaken, comp()); + setBlockFrequency(node, taken + nottaken); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If on node %p is not guard I'm using the taken and nottaken counts for producing block frequency\n", node->asBlock()->getLastRealTreeTop()->getNode()); + dumpOptDetails(comp(), "If on node %p is not guard I'm using the taken and nottaken counts 
for producing block frequency\n", node->asBlock()->getLastRealTreeTop()->getNode()); } else { @@ -2235,28 +2235,28 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() int32_t sumFreq = summarizeFrequencyFromPredecessors(node, self()); if (sumFreq <= 0) sumFreq = AVG_FREQ; - self()->setEdgeFrequenciesOnNode( node, 0, sumFreq, comp()); - setBlockFrequency (node, sumFreq); + self()->setEdgeFrequenciesOnNode(node, 0, sumFreq, comp()); + setBlockFrequency(node, sumFreq); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If on node %p is guard I'm using the predecessor frequency sum\n", node->asBlock()->getLastRealTreeTop()->getNode()); + dumpOptDetails(comp(), "If on node %p is guard I'm using the predecessor frequency sum\n", node->asBlock()->getLastRealTreeTop()->getNode()); } if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); } else if (node->asBlock()->getEntry() && (node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { _seenNodesInCycle->empty(); int32_t sumFreq = _externalProfiler->getSumSwitchCount(node->asBlock()->getLastRealTreeTop()->getNode(), comp()); setSwitchEdgeFrequenciesOnNode(node, comp()); - setBlockFrequency (node, sumFreq); + setBlockFrequency(node, sumFreq); if (comp()->getOption(TR_TraceBFGeneration)) { - dumpOptDetails(comp(),"Found a Switch statement at exit of block_%d\n", node->getNumber()); - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); + dumpOptDetails(comp(), "Found a Switch statement at exit of block_%d\n", node->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); } } else @@ -2268,15 +2268,15 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() { TR::CFGNode *pred = edge->getFrom(); - if (pred->getFrequency()< 0) + if (pred->getFrequency() < 0) { predNotSet = pred; break; } - int32_t edgeFreq = edge->getFrequency(); - sumFreq += edgeFreq; - } + int32_t edgeFreq = edge->getFrequency(); + sumFreq += edgeFreq; + } if (predNotSet && !_seenNodesInCycle->get(node->getNumber())) @@ -2294,17 +2294,17 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "2Setting block and uniform freqs\n"); - if ((node->getSuccessors().size() == 2) && (sumFreq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( node, 0, sumFreq, comp()); - else - self()->setUniformEdgeFrequenciesOnNode( node, sumFreq, false, comp()); - setBlockFrequency (node, sumFreq); + if ((node->getSuccessors().size() == 2) && (sumFreq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) + self()->setEdgeFrequenciesOnNode(node, 0, sumFreq, comp()); + else + self()->setUniformEdgeFrequenciesOnNode(node, sumFreq, false, comp()); + setBlockFrequency(node, sumFreq); } if (comp()->getOption(TR_TraceBFGeneration)) { - dumpOptDetails(comp(),"Not an if (or unknown if) at exit of block %d 
(isSingleton=%d)\n", node->getNumber(), (node->getSuccessors().size() == 1)); - dumpOptDetails(comp(),"Set frequency of %d on block %d\n", node->asBlock()->getFrequency(), node->getNumber()); + dumpOptDetails(comp(), "Not an if (or unknown if) at exit of block %d (isSingleton=%d)\n", node->getNumber(), (node->getSuccessors().size() == 1)); + dumpOptDetails(comp(), "Set frequency of %d on block %d\n", node->asBlock()->getFrequency(), node->getNumber()); } } } @@ -2320,12 +2320,12 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() else { if (!(((succ->getSuccessors().size() == 2) && - succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) || + succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) || (succ->asBlock()->getEntry() && - (succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)))) + (succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)))) { - setBlockFrequency ( succ, edge->getFrequency(), true); + setBlockFrequency(succ, edge->getFrequency(), true); // addup this edge to the frequency of the blocks following it // propagate downward until you reach a block that doesn't end in @@ -2334,7 +2334,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() { TR::CFGNode *tempNode = succ->getSuccessors().front()->getTo(); TR_BitVector *_seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - _seenGotoNodes->set(succ->getNumber()); + _seenGotoNodes->set(succ->getNumber()); while ((tempNode->getSuccessors().size() == 1) && (tempNode != succ) && (tempNode != getEnd()) && @@ -2345,8 +2345,8 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "3Setting block and uniform freqs\n"); - self()->setUniformEdgeFrequenciesOnNode( tempNode, edge->getFrequency(), true, comp()); - setBlockFrequency (tempNode, edge->getFrequency(), true); + self()->setUniformEdgeFrequenciesOnNode(tempNode, edge->getFrequency(), true, comp()); + setBlockFrequency(tempNode, edge->getFrequency(), true); _seenGotoNodes->set(tempNode->getNumber()); tempNode = nextTempNode; @@ -2382,13 +2382,13 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() for (node = getFirstNode(); node; node=node->getNext()) { if ((node == getEnd()) || (node == start)) - node->asBlock()->setFrequency(0); + node->asBlock()->setFrequency(0); if (_seenNodes->isSet(node->getNumber())) continue; if (node->asBlock()->getEntry() && - (node->asBlock()->getFrequency()<0)) + (node->asBlock()->getFrequency()<0)) node->asBlock()->setFrequency(0); } @@ -2410,7 +2410,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() void J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compilation *comp) { - TR_ExternalProfiler* profiler = comp->fej9()->hasIProfilerBlockFrequencyInfo(*comp); + TR_ExternalProfiler *profiler = comp->fej9()->hasIProfilerBlockFrequencyInfo(*comp); if (!profiler) { _initialBlockFrequency = AVG_FREQ; @@ -2437,12 +2437,12 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila bool backEdgeExists = false; TR::TreeTop *methodScanEntry = NULL; - while ( (temp->getSuccessors().size() == 1) || - !temp->asBlock()->getEntry() || - 
!((temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() - && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + while ((temp->getSuccessors().size() == 1) || + !temp->asBlock()->getEntry() || + !((temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() + && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { if (_seenNodes->isSet(temp->getNumber())) break; @@ -2464,9 +2464,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila if (!temp->getSuccessors().empty()) { if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) - { temp = temp->asBlock()->getNextBlock(); - } else temp = temp->getSuccessors().front()->getTo(); } @@ -2503,9 +2501,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila else if (initialCallScanFreq > 0) startFrequency = initialCallScanFreq; else - { startFrequency = AVG_FREQ; - } } if (startFrequency <= 0) @@ -2513,4 +2509,4 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila _initialBlockFrequency = startFrequency; } -#endif /* defined(J9VM_OPT_MICROJIT) */ \ No newline at end of file +#endif /* defined(J9VM_OPT_MICROJIT) */ diff --git a/runtime/compiler/microjit/ByteCodeIterator.hpp b/runtime/compiler/microjit/ByteCodeIterator.hpp index 572ae3e327b..1ce97c02883 100644 --- a/runtime/compiler/microjit/ByteCodeIterator.hpp +++ b/runtime/compiler/microjit/ByteCodeIterator.hpp @@ -20,39 +20,42 @@ * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception *******************************************************************************/ - #ifndef MJIT_BYTECODEITERATOR_INCL #define MJIT_BYTECODEITERATOR_INCL #include "ilgen/J9ByteCodeIterator.hpp" -namespace MJIT { - class ByteCodeIterator : public TR_J9ByteCodeIterator - { +namespace MJIT +{ + +class ByteCodeIterator : public TR_J9ByteCodeIterator + { - public: - ByteCodeIterator(TR::ResolvedMethodSymbol *methodSymbol, TR_ResolvedJ9Method *method, TR_J9VMBase * fe, TR::Compilation * comp) : - TR_J9ByteCodeIterator(methodSymbol, method, fe, comp) - {}; + public: + ByteCodeIterator(TR::ResolvedMethodSymbol *methodSymbol, TR_ResolvedJ9Method *method, TR_J9VMBase *fe, TR::Compilation *comp) + : TR_J9ByteCodeIterator(methodSymbol, method, fe, comp) + {}; - void printByteCodes() - { - this->printByteCodePrologue(); - for (TR_J9ByteCode bc = this->first(); bc != J9BCunknown; bc = this->next()) - { + void printByteCodes() + { + this->printByteCodePrologue(); + for (TR_J9ByteCode bc = this->first(); bc != J9BCunknown; bc = this->next()) + { this->printByteCode(); - } - this->printByteCodeEpilogue(); - } - - const char * currentMnemonic() { - uint8_t opcode = nextByte(0); - return ((TR_J9VMBase *)fe())->getByteCodeName(opcode); - } - - uint8_t currentOpcode() { - return nextByte(0); - } - - }; -} -#endif /* MJIT_BYTECODEITERATOR_INCL */ \ No newline at end of file + } + this->printByteCodeEpilogue(); + } + + const char *currentMnemonic() + { + uint8_t opcode = nextByte(0); + return 
((TR_J9VMBase *)fe())->getByteCodeName(opcode); + } + + uint8_t currentOpcode() + { + return nextByte(0); + } + }; + +} // namespace MJIT +#endif /* MJIT_BYTECODEITERATOR_INCL */ diff --git a/runtime/compiler/microjit/CMakeLists.txt b/runtime/compiler/microjit/CMakeLists.txt index d52eb7c3e4f..ad53b424d63 100644 --- a/runtime/compiler/microjit/CMakeLists.txt +++ b/runtime/compiler/microjit/CMakeLists.txt @@ -36,4 +36,4 @@ if(TR_HOST_ARCH STREQUAL "x") if(TR_TARGET_ARCH STREQUAL "x") add_subdirectory(x) endif() -endif() \ No newline at end of file +endif() diff --git a/runtime/compiler/microjit/ExceptionTable.cpp b/runtime/compiler/microjit/ExceptionTable.cpp index 1a67f9faf0f..2851efed7f8 100644 --- a/runtime/compiler/microjit/ExceptionTable.cpp +++ b/runtime/compiler/microjit/ExceptionTable.cpp @@ -18,9 +18,8 @@ * * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception *******************************************************************************/ -#include "microjit/ExceptionTable.hpp" - #include +#include "microjit/ExceptionTable.hpp" #include "compile/Compilation.hpp" #include "env/TRMemory.hpp" #include "env/jittypes.h" @@ -35,7 +34,7 @@ class TR_ResolvedMethod; -//TODO: This Should create an Exception table using only information MicroJIT has. +// TODO: This Should create an Exception table using only information MicroJIT has. MJIT::ExceptionTableEntryIterator::ExceptionTableEntryIterator(TR::Compilation *comp) : _compilation(comp) { @@ -48,33 +47,37 @@ MJIT::ExceptionTableEntryIterator::ExceptionTableEntryIterator(TR::Compilation * // // for each exception block: - // create exception ranges from the list of exception predecessors - // + // create exception ranges from the list of exception predecessors + } -//TODO: This Should create an Exception table entry using only information MicroJIT has. +// TODO: This Should create an Exception table entry using only information MicroJIT has. void MJIT::ExceptionTableEntryIterator::addSnippetRanges( - List & tableEntries, TR::Block * snippetBlock, TR::Block * catchBlock, uint32_t catchType, - TR_ResolvedMethod * method, TR::Compilation *comp) + List & tableEntries, + TR::Block *snippetBlock, + TR::Block *catchBlock, + uint32_t catchType, + TR_ResolvedMethod *method, + TR::Compilation *comp) { - //TODO: create a snippet for each exception and add it to the correct place. + // TODO: create a snippet for each exception and add it to the correct place. 
} TR_ExceptionTableEntry * MJIT::ExceptionTableEntryIterator::getFirst() { - return NULL; + return NULL; } TR_ExceptionTableEntry * MJIT::ExceptionTableEntryIterator::getNext() { - return NULL; + return NULL; } uint32_t MJIT::ExceptionTableEntryIterator::size() { - return 0; + return 0; } diff --git a/runtime/compiler/microjit/ExceptionTable.hpp b/runtime/compiler/microjit/ExceptionTable.hpp index f42501e4f84..118765abbf2 100644 --- a/runtime/compiler/microjit/ExceptionTable.hpp +++ b/runtime/compiler/microjit/ExceptionTable.hpp @@ -22,9 +22,8 @@ #ifndef MJIT_EXCEPTIONTABLE_INCL #define MJIT_EXCEPTIONTABLE_INCL -#include "infra/Array.hpp" - #include +#include "infra/Array.hpp" #include "env/TRMemory.hpp" #include "env/jittypes.h" #include "infra/List.hpp" @@ -35,26 +34,28 @@ namespace TR { class Block; } namespace TR { class Compilation; } template class TR_Array; -namespace MJIT { +namespace MJIT +{ struct ExceptionTableEntryIterator { ExceptionTableEntryIterator(TR::Compilation *comp); - TR_ExceptionTableEntry * getFirst(); - TR_ExceptionTableEntry * getNext(); + TR_ExceptionTableEntry *getFirst(); + TR_ExceptionTableEntry *getNext(); + + uint32_t size(); - uint32_t size(); -private: - void addSnippetRanges(List &, TR::Block *, TR::Block *, uint32_t, TR_ResolvedMethod *, TR::Compilation *); + private: + void addSnippetRanges(List &, TR::Block *, TR::Block *, uint32_t, TR_ResolvedMethod *, TR::Compilation *); - TR::Compilation * _compilation; + TR::Compilation * _compilation; TR_Array > * _tableEntries; ListIterator _entryIterator; int32_t _inlineDepth; uint32_t _handlerIndex; }; -} +} // namespace MJIT #endif /* MJIT_EXCEPTIONTABLE_INCL */ diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index 0b6d3943f71..f11abcd5faf 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -23,7 +23,6 @@ #include "il/Block.hpp" #include "il/J9Node_inlines.hpp" #include "infra/Cfg.hpp" - #include "microjit/MethodBytecodeMetaData.hpp" #if defined(J9VM_OPT_MJIT_Standalone) @@ -32,598 +31,682 @@ void setupNode( - TR::Node *node, - uint32_t bcIndex, - TR_ResolvedMethod *feMethod, - TR::Compilation *comp -){ - node->getByteCodeInfo().setDoNotProfile(0); - node->setByteCodeIndex(bcIndex); - node->setInlinedSiteIndex(-10); - node->setMethod(feMethod->getPersistentIdentifier()); -} + TR::Node *node, + uint32_t bcIndex, + TR_ResolvedMethod *feMethod, + TR::Compilation *comp) + { + node->getByteCodeInfo().setDoNotProfile(0); + node->setByteCodeIndex(bcIndex); + node->setInlinedSiteIndex(-10); + node->setMethod(feMethod->getPersistentIdentifier()); + } TR::Block * makeBlock( - TR::Compilation *comp, - TR_ResolvedMethod *feMethod, - uint32_t start, - uint32_t end, - TR::CFG &cfg -){ - TR::TreeTop *startTree = TR::TreeTop::create(comp, TR::Node::create(NULL, TR::BBStart, 0)); - TR::TreeTop *endTree = TR::TreeTop::create(comp, TR::Node::create(NULL, TR::BBEnd, 0)); - startTree->join(endTree); - TR::Block *block = TR::Block::createBlock(startTree, endTree, cfg); - block->setBlockBCIndex(start); - block->setNumber(cfg.getNextNodeNumber()); - setupNode(startTree->getNode(), start, feMethod, comp); - setupNode(endTree->getNode(), end, feMethod, comp); - cfg.addNode(block); - return block; -} + TR::Compilation *comp, + TR_ResolvedMethod *feMethod, + uint32_t start, + uint32_t end, + TR::CFG &cfg) + { + TR::TreeTop *startTree = TR::TreeTop::create(comp, TR::Node::create(NULL, TR::BBStart, 0)); + 
TR::TreeTop *endTree = TR::TreeTop::create(comp, TR::Node::create(NULL, TR::BBEnd, 0)); + startTree->join(endTree); + TR::Block *block = TR::Block::createBlock(startTree, endTree, cfg); + block->setBlockBCIndex(start); + block->setNumber(cfg.getNextNodeNumber()); + setupNode(startTree->getNode(), start, feMethod, comp); + setupNode(endTree->getNode(), end, feMethod, comp); + cfg.addNode(block); + return block; + } void join( - TR::CFG &cfg, - TR::Block *src, - TR::Block *dst, - bool isBranchDest -){ - TR::CFGEdge *e = cfg.addEdge(src, dst); - - // TODO: add case for switch - if (isBranchDest) { - src->getExit()->getNode()->setBranchDestination(dst->getEntry()); - if (dst->getEntry()->getNode()->getByteCodeIndex() < src->getExit()->getNode()->getByteCodeIndex()) - src->setBranchesBackwards(); - } else { - src->getExit()->join(dst->getEntry()); - } - src->addSuccessor(e); - dst->addPredecessor(e); -} - -class BytecodeRange{ - public: - int32_t _start; - int32_t _end; - BytecodeRange(int32_t start) - :_start(start) - ,_end(-1) - {} - inline bool isClosed() { return _end > _start; } - BytecodeRange() = delete; -}; - -class BytecodeRangeList{ - private: - BytecodeRange *_range; - public: - TR_ALLOC(TR_Memory::UnknownType); //MicroJIT TODO: create a memory object type for this and get it added to OMR - BytecodeRangeList *prev; - BytecodeRangeList *next; - BytecodeRangeList(BytecodeRange *range) - :_range(range) - ,next(NULL) - ,prev(NULL) - {} - inline uint32_t getStart() { return _range->_start; } - inline uint32_t getEnd() { return _range->_end; } - inline BytecodeRangeList *insert(BytecodeRangeList *newListEntry) { - if (this == newListEntry || next == newListEntry) return this; - while(_range->_start < newListEntry->getStart()){ - if(next){ - next->insert(newListEntry); - return this; - } else { //end of list - next = newListEntry; - newListEntry->prev = this; - newListEntry->next = NULL; - return this; + TR::CFG &cfg, + TR::Block *src, + TR::Block *dst, + bool isBranchDest) + { + TR::CFGEdge *e = cfg.addEdge(src, dst); + + // TODO: add case for switch + if (isBranchDest) + { + src->getExit()->getNode()->setBranchDestination(dst->getEntry()); + if (dst->getEntry()->getNode()->getByteCodeIndex() < src->getExit()->getNode()->getByteCodeIndex()) + src->setBranchesBackwards(); + } + else + { + src->getExit()->join(dst->getEntry()); + } + src->addSuccessor(e); + dst->addPredecessor(e); + } + +class BytecodeRange + { + + public: + int32_t _start; + int32_t _end; + BytecodeRange(int32_t start) + :_start(start) + ,_end(-1) + {} + inline bool isClosed() + { + return _end > _start; + } + BytecodeRange() = delete; + }; + +class BytecodeRangeList + { + + private: + BytecodeRange *_range; + + public: + TR_ALLOC(TR_Memory::UnknownType); // TODO: MicroJIT: create a memory object type for this and get it added to OMR + BytecodeRangeList *prev; + BytecodeRangeList *next; + BytecodeRangeList(BytecodeRange *range) + :_range(range) + ,next(NULL) + ,prev(NULL) + {} + inline uint32_t getStart() + { + return _range->_start; + } + inline uint32_t getEnd() + { + return _range->_end; + } + inline BytecodeRangeList *insert(BytecodeRangeList *newListEntry) + { + if (this == newListEntry || next == newListEntry) + return this; + while (_range->_start < newListEntry->getStart()) + { + if (next) + { + next->insert(newListEntry); + return this; + } + else + { // end of list + next = newListEntry; + newListEntry->prev = this; + newListEntry->next = NULL; + return this; + } } - } - MJIT_ASSERT_NO_MSG(_range->_start != 
newListEntry->getStart()); - if (prev) prev->next = newListEntry; - newListEntry->next = this; - newListEntry->prev = prev; - prev = newListEntry; - return newListEntry; - } -}; - -class BytecodeIndexList{ -private: - BytecodeIndexList *_prev; - BytecodeIndexList *_next; - TR::CFG &_cfg; -public: - TR_ALLOC(TR_Memory::UnknownType); //MicroJIT TODO: create a memory object type for this and get it added to OMR - uint32_t _index; - BytecodeIndexList(uint32_t index, BytecodeIndexList *prev, BytecodeIndexList *next, TR::CFG &cfg) - :_index(index) - ,_prev(prev) - ,_next(next) - ,_cfg(cfg) - {} - BytecodeIndexList() = delete; - BytecodeIndexList * appendBCI(uint32_t i){ - if (i == _index) return this; - if (_next) return _next->appendBCI(i); - BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); - _next = newIndex; - return newIndex; - } - BytecodeIndexList * getBCIListEntry(uint32_t i){ - if (i == _index) { + MJIT_ASSERT_NO_MSG(_range->_start != newListEntry->getStart()); + if (prev) + prev->next = newListEntry; + newListEntry->next = this; + newListEntry->prev = prev; + prev = newListEntry; + return newListEntry; + } + }; + +class BytecodeIndexList + { + + private: + BytecodeIndexList *_prev; + BytecodeIndexList *_next; + TR::CFG &_cfg; + + public: + TR_ALLOC(TR_Memory::UnknownType); //MicroJIT TODO: create a memory object type for this and get it added to OMR + uint32_t _index; + BytecodeIndexList(uint32_t index, BytecodeIndexList *prev, BytecodeIndexList *next, TR::CFG &cfg) + :_index(index) + ,_prev(prev) + ,_next(next) + ,_cfg(cfg) + {} + BytecodeIndexList() = delete; + BytecodeIndexList *appendBCI(uint32_t i) + { + if (i == _index) return this; - }else if (i < _index) { + if (_next) + return _next->appendBCI(i); + BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); + _next = newIndex; + return newIndex; + } + BytecodeIndexList *getBCIListEntry(uint32_t i) + { + if (i == _index) + { + return this; + } + else if (i < _index) + { return _prev->getBCIListEntry(i); - } else if(_next && _next->_index <= i) { // (i > _index) + } + else if (_next && _next->_index <= i) + { // (i > _index) return _next->getBCIListEntry(i); - } else { + } + else + { BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); return newIndex; - } - } - inline uint32_t getPreviousIndex(){ - return _prev->_index; - } - inline BytecodeIndexList * getPrev(){ return _prev; } - BytecodeIndexList * getTail(){ - if (_next) return _next->getTail(); - return this; - } - inline void cutNext() { - if (!_next) return; - _next->_prev = NULL; - _next = NULL; - } -}; - -class BlockList{ -private: - BlockList *_next; - BlockList *_prev; - BytecodeRangeList *_successorList; - uint32_t _successorCount; - TR::CFG &_cfg; - BlockList** _tail; - BytecodeIndexList* _bcListHead; - BytecodeIndexList* _bcListCurrent; -public: - TR_ALLOC(TR_Memory::UnknownType) //MicroJIT TODO: create a memory object type for this and get it added to OMR - BytecodeRange _range; - TR::Block* _block; - - BlockList(TR::CFG &cfg, uint32_t start, BlockList** tail) - :_next(NULL) - ,_prev(NULL) - ,_successorList(NULL) - ,_successorCount(0) - ,_cfg(cfg) - ,_tail(tail) - ,_bcListHead(NULL) - ,_bcListCurrent(NULL) - ,_range(start) - ,_block(NULL) - { - _bcListHead = new (_cfg.getInternalRegion()) BytecodeIndexList(start, NULL, NULL, _cfg); - _bcListCurrent = _bcListHead; - } - - inline void addBCIndex(uint32_t index){ - 
_bcListCurrent = _bcListCurrent->appendBCI(index); - } - - inline void setBCListHead(BytecodeIndexList* list){ - _bcListHead = list; - } - - inline void setBCListCurrent(BytecodeIndexList* list){ - _bcListCurrent = list; - } - - inline void close(){ - _range._end = _bcListCurrent->_index; - } - - inline void insertAfter(BlockList* bl) - { - MJIT_ASSERT_NO_MSG(bl); - bl->_next = this->_next; - bl->_prev = this; - if(this->_next){ + } + } + inline uint32_t getPreviousIndex() + { + return _prev->_index; + } + inline BytecodeIndexList *getPrev() + { + return _prev; + } + BytecodeIndexList *getTail() + { + if (_next) + return _next->getTail(); + return this; + } + inline void cutNext() + { + if (!_next) + return; + _next->_prev = NULL; + _next = NULL; + } + }; + +class BlockList + { + + private: + BlockList *_next; + BlockList *_prev; + BytecodeRangeList *_successorList; + uint32_t _successorCount; + TR::CFG &_cfg; + BlockList** _tail; + BytecodeIndexList *_bcListHead; + BytecodeIndexList *_bcListCurrent; + public: + TR_ALLOC(TR_Memory::UnknownType) // TODO: MicroJIT: create a memory object type for this and get it added to OMR + BytecodeRange _range; + TR::Block *_block; + + BlockList(TR::CFG &cfg, uint32_t start, BlockList** tail) + :_next(NULL) + ,_prev(NULL) + ,_successorList(NULL) + ,_successorCount(0) + ,_cfg(cfg) + ,_tail(tail) + ,_bcListHead(NULL) + ,_bcListCurrent(NULL) + ,_range(start) + ,_block(NULL) + { + _bcListHead = new (_cfg.getInternalRegion()) BytecodeIndexList(start, NULL, NULL, _cfg); + _bcListCurrent = _bcListHead; + } + + inline void addBCIndex(uint32_t index) + { + _bcListCurrent = _bcListCurrent->appendBCI(index); + } + + inline void setBCListHead(BytecodeIndexList *list) + { + _bcListHead = list; + } + + inline void setBCListCurrent(BytecodeIndexList *list) + { + _bcListCurrent = list; + } + + inline void close() + { + _range._end = _bcListCurrent->_index; + } + + inline void insertAfter(BlockList *bl) + { + MJIT_ASSERT_NO_MSG(bl); + bl->_next = this->_next; + bl->_prev = this; + if (this->_next) this->_next->_prev = bl; - } - this->_next = bl; - if(this == *_tail){ + this->_next = bl; + if (this == *_tail) *_tail = bl; - } - } + } - inline void addSuccessor(BytecodeRange *successor){ - BytecodeRangeList *brl = new (_cfg.getInternalRegion()) BytecodeRangeList(successor); - if(_successorList){ + inline void addSuccessor(BytecodeRange *successor) + { + BytecodeRangeList *brl = new (_cfg.getInternalRegion()) BytecodeRangeList(successor); + if (_successorList) _successorList = _successorList->insert(brl); - } else { + else _successorList = brl; - } - _successorCount++; - //Until we support switch and other multiple end point bytecodes this should never go over 2 - MJIT_ASSERT_NO_MSG(_successorCount > 0 && _successorCount <= 2); - } - - BlockList * getBlockListByStartIndex(uint32_t index){ - //NOTE always start searching from a block with a lower start than the index - MJIT_ASSERT_NO_MSG(_range._start <= index); - - // If this is the target block return it - if (_range._start == index) return this; - if (_range._start < index && _next && _next->_range._start > index) return this; - - // If the branch is after this block, and there is more list to search, then continue searching - if (_next) return _next->getBlockListByStartIndex(index); - - //There was no block in the BlockList for this index - return NULL; - } - - BlockList * getOrMakeBlockListByBytecodeIndex(uint32_t index){ - //NOTE always start searching from a block with a lower start than the index - 
MJIT_ASSERT_NO_MSG(_range._start <= index); - - // If this is the target block return it - if (_range._start == index) return this; - - // If the branch targets the middle of this block (which may be under construction) split it on that index - if (_range.isClosed() && index > _range._start && index <= _range._end) return splitBlockListOnIndex(index); - if (!_range.isClosed() && _next && index > _range._start && index < _next->_range._start) return splitBlockListOnIndex(index); - - // If the branch is after this block, and there is more list to search, then continue searching - if (_next && index > _next->_range._start) return _next->getOrMakeBlockListByBytecodeIndex(index); - - // The target isn't before this block, isn't this block, isn't in this block, and there is no more list to search. - // Time to make a new block and add it to the list - BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); - this->insertAfter(target); - return target; - } - - BlockList * splitBlockListOnIndex(uint32_t index){ - BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); - insertAfter(target); - - BytecodeIndexList *splitPoint = _bcListHead->getBCIListEntry(index); - target->setBCListHead(splitPoint); - target->setBCListCurrent(splitPoint->getTail()); - - _range._end = splitPoint->getPreviousIndex(); - _bcListCurrent = splitPoint->getPrev(); - _bcListCurrent->cutNext(); - - return target; - } - - BlockList * getNext(){ - return _next; - } - - BlockList * getPrev(){ - return _prev; - } - - inline uint32_t getSuccessorCount(){ - return _successorCount; - } - - inline void getDestinations(uint32_t *destinations){ - BytecodeRangeList *range = _successorList; - for(uint32_t i=0; i<_successorCount; i++){ + _successorCount++; + // Until we support switch and other multiple end point bytecodes this should never go over 2 + MJIT_ASSERT_NO_MSG(_successorCount > 0 && _successorCount <= 2); + } + + BlockList *getBlockListByStartIndex(uint32_t index) + { + // NOTE always start searching from a block with a lower start than the index + MJIT_ASSERT_NO_MSG(_range._start <= index); + + // If this is the target block return it + if (_range._start == index) + return this; + if (_range._start < index && _next && _next->_range._start > index) + return this; + + // If the branch is after this block, and there is more list to search, then continue searching + if (_next) + return _next->getBlockListByStartIndex(index); + + // There was no block in the BlockList for this index + return NULL; + } + + BlockList *getOrMakeBlockListByBytecodeIndex(uint32_t index) + { + // NOTE always start searching from a block with a lower start than the index + MJIT_ASSERT_NO_MSG(_range._start <= index); + + // If this is the target block return it + if (_range._start == index) + return this; + + // If the branch targets the middle of this block (which may be under construction) split it on that index + if (_range.isClosed() && index > _range._start && index <= _range._end) + return splitBlockListOnIndex(index); + if (!_range.isClosed() && _next && index > _range._start && index < _next->_range._start) + return splitBlockListOnIndex(index); + + // If the branch is after this block, and there is more list to search, then continue searching + if (_next && index > _next->_range._start) + return _next->getOrMakeBlockListByBytecodeIndex(index); + + // The target isn't before this block, isn't this block, isn't in this block, and there is no more list to search. 
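// An illustrative sketch (not MicroJIT's actual data structure) of the
// find-or-split behaviour implemented above: when a branch targets the middle
// of a block whose range is already known, the block is split at the target so
// the target becomes a new leader. A std::map from start index to end index
// stands in for the BlockList; blockStarts and findOrMakeBlockStart are
// hypothetical names.
#include <cstdint>
#include <map>

static std::map<uint32_t, uint32_t> blockStarts;   // start index -> inclusive end index

static uint32_t findOrMakeBlockStart(uint32_t target)
   {
   auto it = blockStarts.upper_bound(target);       // first block starting after target
   if (it != blockStarts.begin())
      {
      --it;                                         // candidate block with start <= target
      if (it->first == target)
         return target;                             // target is already a leader
      if (target <= it->second)
         {
         uint32_t oldEnd = it->second;              // target falls inside the candidate:
         it->second = target - 1;                   // shorten the existing block and
         blockStarts[target] = oldEnd;              // record a new block starting at target
         return target;
         }
      }
   blockStarts[target] = target;                    // otherwise open a brand new block
   return target;
   }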
+ // Time to make a new block and add it to the list + BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); + this->insertAfter(target); + return target; + } + + BlockList *splitBlockListOnIndex(uint32_t index) + { + BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); + insertAfter(target); + + BytecodeIndexList *splitPoint = _bcListHead->getBCIListEntry(index); + target->setBCListHead(splitPoint); + target->setBCListCurrent(splitPoint->getTail()); + + _range._end = splitPoint->getPreviousIndex(); + _bcListCurrent = splitPoint->getPrev(); + _bcListCurrent->cutNext(); + + return target; + } + + BlockList *getNext() + { + return _next; + } + + BlockList *getPrev() + { + return _prev; + } + + inline uint32_t getSuccessorCount() + { + return _successorCount; + } + + inline void getDestinations(uint32_t *destinations) + { + BytecodeRangeList *range = _successorList; + for (uint32_t i=0; i<_successorCount; i++) + { destinations[i] = range->getStart(); range = _successorList->next; - } - } -}; - -class CFGCreator{ -private: - BlockList *_head; - BlockList *_current; - BlockList *_tail; - bool _newBlock; - bool _fallThrough; - uint32_t _blockCount; - TR::CFG &_cfg; - TR_ResolvedMethod *_compilee; - TR::Compilation *_comp; - -public: - CFGCreator(TR::CFG &cfg, TR_ResolvedMethod *compilee, TR::Compilation *comp) - :_head(NULL) - ,_current(NULL) - ,_tail(NULL) - ,_newBlock(false) - ,_fallThrough(true) - ,_blockCount(0) - ,_cfg(cfg) - ,_compilee(compilee) - ,_comp(comp) - {} - void addBytecodeIndexToCFG( - TR_J9ByteCodeIterator *bci - ){ - //Just started parsing, make the block list. - if(!_head){ + } + } + }; + +class CFGCreator + { + + private: + BlockList *_head; + BlockList *_current; + BlockList *_tail; + bool _newBlock; + bool _fallThrough; + uint32_t _blockCount; + TR::CFG &_cfg; + TR_ResolvedMethod *_compilee; + TR::Compilation *_comp; + + public: + CFGCreator(TR::CFG &cfg, TR_ResolvedMethod *compilee, TR::Compilation *comp) + :_head(NULL) + ,_current(NULL) + ,_tail(NULL) + ,_newBlock(false) + ,_fallThrough(true) + ,_blockCount(0) + ,_cfg(cfg) + ,_compilee(compilee) + ,_comp(comp) + {} + + void addBytecodeIndexToCFG(TR_J9ByteCodeIterator *bci) + { + // Just started parsing, make the block list. + if (!_head) + { _head = new (_cfg.getInternalRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); _tail = _head; _current = _head; - } + } - //Given current state and current bytecode, update BlockList - BlockList *nextBlock = _head->getBlockListByStartIndex(bci->currentByteCodeIndex()); + // Given current state and current bytecode, update BlockList + BlockList *nextBlock = _head->getBlockListByStartIndex(bci->currentByteCodeIndex()); - //If we have a block for this (possibly from a forward jump) - // we should use that block, and close the current one. - //If the current block is the right block to work on, then don't close it. - if(nextBlock && nextBlock != _current) { + // If we have a block for this (possibly from a forward jump) + // we should use that block, and close the current one. + // If the current block is the right block to work on, then don't close it. 
+ if (nextBlock && nextBlock != _current) + { _current->close(); _current = _head->getBlockListByStartIndex(bci->currentByteCodeIndex()); - //If we had thought we needed a new block, override it + // If we had thought we needed a new block, override it _newBlock = false; _fallThrough = true; - } + } - if(_newBlock){//Current block has ended in a branch, this is a new block now + if (_newBlock) + { // Current block has ended in a branch, this is a new block now nextBlock = new (_cfg.getInternalRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); _current->insertAfter(nextBlock); - if(_fallThrough) _current->addSuccessor(&(nextBlock->_range)); + if(_fallThrough) + _current->addSuccessor(&(nextBlock->_range)); _newBlock = false; _fallThrough = true; _current = nextBlock; - } else if(bci->isBranch()) { //Get the target block, make it if it isn't there already + } + else if (bci->isBranch()) + { // Get the target block, make it if it isn't there already _current->_range._end = bci->currentByteCodeIndex(); nextBlock = _head->getOrMakeBlockListByBytecodeIndex(bci->branchDestination(bci->currentByteCodeIndex())); - if(bci->current() == J9BCgoto || bci->current() == J9BCgotow) _fallThrough = false; + if (bci->current() == J9BCgoto || bci->current() == J9BCgotow) + _fallThrough = false; _current->addSuccessor(&(nextBlock->_range)); _newBlock = true; - } else { //Still working on current block - // TODO: determine if there are any edge cases that need to be filled in here. - } - _current->addBCIndex(bci->currentByteCodeIndex()); - } - - void buildCFG(){ - // Make the blocks - BlockList * temp = _head; - while(temp != NULL){ + } + else + { // Still working on current block + // TODO: determine if there are any edge cases that need to be filled in here. 
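// An illustrative sketch of the successor bookkeeping above: a goto/gotow ends
// its block with a single taken edge, a conditional branch contributes the
// taken edge plus the fall-through edge, and any other bytecode simply falls
// through. successorEdgeCount is a hypothetical helper, not MicroJIT code.
inline int successorEdgeCount(bool isConditionalBranch, bool isGoto)
   {
   if (isGoto)
      return 1;                  // unconditional jump: branch target only
   if (isConditionalBranch)
      return 2;                  // taken target plus fall-through
   return 1;                     // plain fall-through into the next block
   }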
+ } + _current->addBCIndex(bci->currentByteCodeIndex()); + } + + void buildCFG() + { + // Make the blocks + BlockList *temp = _head; + while (temp != NULL) + { uint32_t start = temp->_range._start; uint32_t end = temp->_range._end; temp->_block = makeBlock(_comp, _compilee, start, end, _cfg); temp = temp->getNext(); - } + } - // Use successor lists to build CFG - temp = _head; - while(temp != NULL){ + // Use successor lists to build CFG + temp = _head; + while (temp != NULL) + { uint32_t successorCount = temp->getSuccessorCount(); uint32_t destinations[successorCount]; temp->getDestinations(destinations); - for(uint32_t i=0; i<successorCount; i++){ - BlockList *dst = _head->getBlockListByStartIndex(destinations[i]); - join(_cfg, temp->_block, dst->_block, (dst == temp->getNext())); - } + for (uint32_t i=0; i<successorCount; i++) + { + BlockList *dst = _head->getBlockListByStartIndex(destinations[i]); + join(_cfg, temp->_block, dst->_block, (dst == temp->getNext())); + } temp = temp->getNext(); - } - _cfg.setStart(_head->_block); - } -}; - - -TR::Optimizer * createOptimizer(TR::Compilation *comp, TR::ResolvedMethodSymbol *methodSymbol) -{ - return new (comp->trHeapMemory()) TR::Optimizer(comp, methodSymbol, true, J9::Optimizer::microJITOptimizationStrategy(comp)); -} + } + _cfg.setStart(_head->_block); + } + }; +TR::Optimizer *createOptimizer(TR::Compilation *comp, TR::ResolvedMethodSymbol *methodSymbol) + { + return new (comp->trHeapMemory()) TR::Optimizer(comp, methodSymbol, true, J9::Optimizer::microJITOptimizationStrategy(comp)); + } MJIT::InternalMetaData MJIT::createInternalMethodMetadata( - MJIT::ByteCodeIterator *bci, - MJIT::LocalTableEntry *localTableEntries, - U_16 entries, - int32_t offsetToFirstLocal, - TR_ResolvedMethod *compilee, - TR::Compilation *comp, - MJIT::ParamTable *paramTable, - uint8_t pointerSize, - bool *resolvedAllCallees -){ - int32_t totalSize = 0; - - int32_t localIndex = 0; // Indexes are 0 based and positive - int32_t lastParamSlot = 0; - int32_t gcMapOffset = 0; // Offset will always be greater than 0 - U_16 size = 0; //Sizes are all multiples of 8 and greater than 0 - bool isRef = false; //No good default here, both options are valid. - - uint16_t maxCalleeArgsSize = 0; - - //TODO: Create the CFG before here and pass reference - bool profilingCompilation = comp->getOption(TR_EnableJProfiling) || comp->getProfilingMode() == JProfiling; - - TR::CFG *cfg; - if(profilingCompilation){ - cfg = new (comp->trHeapMemory()) TR::CFG(comp, compilee->findOrCreateJittedMethodSymbol(comp)); - } - CFGCreator creator(*cfg, compilee, comp); - ParamTableEntry entry; - for(int i=0; i<paramTable->getParamCount(); i+=entry.slots){ - MJIT_ASSERT_NO_MSG(paramTable->getEntry(i, &entry)); - size = entry.slots*pointerSize; - localTableEntries[i] = makeLocalTableEntry(i, totalSize + offsetToFirstLocal, size, entry.isReference); - totalSize += size; - lastParamSlot += localTableEntries[i].slots; - } - - for(TR_J9ByteCode bc = bci->first(); bc != J9BCunknown; bc = bci->next()){ - /* It's okay to overwrite entries here. - * We are not storing whether the entry is used for a - * load or a store because it could be both. Identifying entries - * that we have already created would require zeroing the - * array before use and/or remembering which entries we've created. - * Both might be as much work as just recreating entries, depending - * on how many times a local is stored/loaded during a method. - * This may be worth doing later, but requires some research first. - * - * TODO: Arguments are not guaranteed to be used in the bytecode of a method - * e.g. 
class MyClass { public int ret3() {return 3;} } - * The above method is not a static method (arg 0 is an object reference) but the BC - * will be [iconst_3, return]. - * We must find a way to take the parameter table in and create the required entries - * in the local table. - */ - - if (profilingCompilation) creator.addBytecodeIndexToCFG(bci); - gcMapOffset = 0; - isRef = false; - switch(bc){ - MakeEntry: - gcMapOffset = offsetToFirstLocal + totalSize; - localTableEntries[localIndex] = makeLocalTableEntry(localIndex, gcMapOffset, size, isRef); - totalSize += size; - break; - case J9BCiload: - case J9BCistore: - case J9BCfstore: - case J9BCfload: - localIndex = (int32_t)bci->nextByte(); - size = pointerSize; - goto MakeEntry; - case J9BCiload0: - case J9BCistore0: - case J9BCfload0: - case J9BCfstore0: - localIndex = 0; - size = pointerSize; - goto MakeEntry; - case J9BCiload1: - case J9BCistore1: - case J9BCfstore1: - case J9BCfload1: - localIndex = 1; - size = pointerSize; - goto MakeEntry; - case J9BCiload2: - case J9BCistore2: - case J9BCfstore2: - case J9BCfload2: - localIndex = 2; - size = pointerSize; - goto MakeEntry; - case J9BCiload3: - case J9BCistore3: - case J9BCfstore3: - case J9BCfload3: - localIndex = 3; - size = pointerSize; - goto MakeEntry; - case J9BClstore: - case J9BClload: - case J9BCdstore: - case J9BCdload: - localIndex = (int32_t)bci->nextByte(); - size = pointerSize*2; - goto MakeEntry; - case J9BClstore0: - case J9BClload0: - case J9BCdload0: - case J9BCdstore0: - localIndex = 0; - size = pointerSize*2; - goto MakeEntry; - case J9BClstore1: - case J9BCdstore1: - case J9BCdload1: - case J9BClload1: - localIndex = 1; - size = pointerSize*2; - goto MakeEntry; - case J9BClstore2: - case J9BClload2: - case J9BCdstore2: - case J9BCdload2: - localIndex = 2; - size = pointerSize*2; - goto MakeEntry; - case J9BClstore3: - case J9BClload3: - case J9BCdstore3: - case J9BCdload3: - localIndex = 3; - size = pointerSize*2; - goto MakeEntry; - case J9BCaload: - case J9BCastore: - localIndex = (int32_t)bci->nextByte(); - size = pointerSize; - isRef = true; - goto MakeEntry; - case J9BCaload0: - case J9BCastore0: - localIndex = 0; - size = pointerSize; - isRef = true; - goto MakeEntry; - case J9BCaload1: - case J9BCastore1: - localIndex = 1; - size = pointerSize; - isRef = true; - goto MakeEntry; - case J9BCaload2: - case J9BCastore2: - localIndex = 2; - size = pointerSize; - isRef = true; - goto MakeEntry; - case J9BCaload3: - case J9BCastore3: - localIndex = 3; - size = pointerSize; - isRef = true; - goto MakeEntry; - case J9BCinvokestatic: - { - int32_t cpIndex = (int32_t)bci->next2Bytes(); - bool isUnresolvedInCP; - TR_ResolvedMethod* resolved = compilee->getResolvedStaticMethod(comp, cpIndex, &isUnresolvedInCP); - if (!resolved) { - *resolvedAllCallees = false; - break; - } - J9Method *ramMethod = static_cast(resolved)->ramMethod(); - J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); - //TODO replace this with a more accurate count. This is a clear upper bound; - uint16_t calleeArgsSize = romMethod->argCount * pointerSize * 2; //assume 2 slot for now - maxCalleeArgsSize = (calleeArgsSize > maxCalleeArgsSize) ? 
calleeArgsSize : maxCalleeArgsSize; - } - default: - break; - } - } - - if(profilingCompilation) { - creator.buildCFG(); - comp->getMethodSymbol()->setFlowGraph(cfg); - TR::Optimizer *optimizer = createOptimizer(comp, comp->getMethodSymbol()); - comp->setOptimizer(optimizer); - optimizer->optimize(); - } - - MJIT::LocalTable localTable(localTableEntries, entries); - - MJIT::InternalMetaData internalMetaData(localTable, cfg, maxCalleeArgsSize); - return internalMetaData; -} -#endif /* TR_MJIT_Interop */ \ No newline at end of file + MJIT::ByteCodeIterator *bci, + MJIT::LocalTableEntry *localTableEntries, + U_16 entries, + int32_t offsetToFirstLocal, + TR_ResolvedMethod *compilee, + TR::Compilation *comp, + MJIT::ParamTable *paramTable, + uint8_t pointerSize, + bool *resolvedAllCallees) + { + int32_t totalSize = 0; + + int32_t localIndex = 0; // Indexes are 0 based and positive + int32_t lastParamSlot = 0; + int32_t gcMapOffset = 0; // Offset will always be greater than 0 + U_16 size = 0; // Sizes are all multiples of 8 and greater than 0 + bool isRef = false; // No good default here, both options are valid. + + uint16_t maxCalleeArgsSize = 0; + + // TODO: Create the CFG before here and pass reference + bool profilingCompilation = comp->getOption(TR_EnableJProfiling) || comp->getProfilingMode() == JProfiling; + + TR::CFG *cfg; + if (profilingCompilation) + cfg = new (comp->trHeapMemory()) TR::CFG(comp, compilee->findOrCreateJittedMethodSymbol(comp)); + + CFGCreator creator(*cfg, compilee, comp); + ParamTableEntry entry; + for (int i=0; igetParamCount(); i+=entry.slots) + { + MJIT_ASSERT_NO_MSG(paramTable->getEntry(i, &entry)); + size = entry.slots*pointerSize; + localTableEntries[i] = makeLocalTableEntry(i, totalSize + offsetToFirstLocal, size, entry.isReference); + totalSize += size; + lastParamSlot += localTableEntries[i].slots; + } + + for (TR_J9ByteCode bc = bci->first(); bc != J9BCunknown; bc = bci->next()) + { + /* It's okay to overwrite entries here. + * We are not storing whether the entry is used for a + * load or a store because it could be both. Identifying entries + * that we have already created would require zeroing the + * array before use and/or remembering which entries we've created. + * Both might be as much work as just recreating entries, depending + * on how many times a local is stored/loaded during a method. + * This may be worth doing later, but requires some research first. + * + * TODO: Arguments are not guaranteed to be used in the bytecode of a method + * e.g. class MyClass { public int ret3() {return 3;} } + * The above method is not a static method (arg 0 is an object reference) but the BC + * will be [iconst_3, return]. + * We must find a way to take the parameter table in and create the required entries + * in the local table. 
+ */ + + if (profilingCompilation) + creator.addBytecodeIndexToCFG(bci); + gcMapOffset = 0; + isRef = false; + switch (bc) + { + MakeEntry: + gcMapOffset = offsetToFirstLocal + totalSize; + localTableEntries[localIndex] = makeLocalTableEntry(localIndex, gcMapOffset, size, isRef); + totalSize += size; + break; + case J9BCiload: + case J9BCistore: + case J9BCfstore: + case J9BCfload: + localIndex = (int32_t)bci->nextByte(); + size = pointerSize; + goto MakeEntry; + case J9BCiload0: + case J9BCistore0: + case J9BCfload0: + case J9BCfstore0: + localIndex = 0; + size = pointerSize; + goto MakeEntry; + case J9BCiload1: + case J9BCistore1: + case J9BCfstore1: + case J9BCfload1: + localIndex = 1; + size = pointerSize; + goto MakeEntry; + case J9BCiload2: + case J9BCistore2: + case J9BCfstore2: + case J9BCfload2: + localIndex = 2; + size = pointerSize; + goto MakeEntry; + case J9BCiload3: + case J9BCistore3: + case J9BCfstore3: + case J9BCfload3: + localIndex = 3; + size = pointerSize; + goto MakeEntry; + case J9BClstore: + case J9BClload: + case J9BCdstore: + case J9BCdload: + localIndex = (int32_t)bci->nextByte(); + size = pointerSize*2; + goto MakeEntry; + case J9BClstore0: + case J9BClload0: + case J9BCdload0: + case J9BCdstore0: + localIndex = 0; + size = pointerSize*2; + goto MakeEntry; + case J9BClstore1: + case J9BCdstore1: + case J9BCdload1: + case J9BClload1: + localIndex = 1; + size = pointerSize*2; + goto MakeEntry; + case J9BClstore2: + case J9BClload2: + case J9BCdstore2: + case J9BCdload2: + localIndex = 2; + size = pointerSize*2; + goto MakeEntry; + case J9BClstore3: + case J9BClload3: + case J9BCdstore3: + case J9BCdload3: + localIndex = 3; + size = pointerSize*2; + goto MakeEntry; + case J9BCaload: + case J9BCastore: + localIndex = (int32_t)bci->nextByte(); + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload0: + case J9BCastore0: + localIndex = 0; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload1: + case J9BCastore1: + localIndex = 1; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload2: + case J9BCastore2: + localIndex = 2; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCaload3: + case J9BCastore3: + localIndex = 3; + size = pointerSize; + isRef = true; + goto MakeEntry; + case J9BCinvokestatic: + { + int32_t cpIndex = (int32_t)bci->next2Bytes(); + bool isUnresolvedInCP; + TR_ResolvedMethod *resolved = compilee->getResolvedStaticMethod(comp, cpIndex, &isUnresolvedInCP); + if (!resolved) + { + *resolvedAllCallees = false; + break; + } + J9Method *ramMethod = static_cast(resolved)->ramMethod(); + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); + // TODO replace this with a more accurate count. This is a clear upper bound; + uint16_t calleeArgsSize = romMethod->argCount * pointerSize * 2; // assume 2 slot for now + maxCalleeArgsSize = (calleeArgsSize > maxCalleeArgsSize) ? 
calleeArgsSize : maxCalleeArgsSize; + } + default: + break; + } + } + + if(profilingCompilation) + { + creator.buildCFG(); + comp->getMethodSymbol()->setFlowGraph(cfg); + TR::Optimizer *optimizer = createOptimizer(comp, comp->getMethodSymbol()); + comp->setOptimizer(optimizer); + optimizer->optimize(); + } + + MJIT::LocalTable localTable(localTableEntries, entries); + + MJIT::InternalMetaData internalMetaData(localTable, cfg, maxCalleeArgsSize); + return internalMetaData; + } +#endif /* TR_MJIT_Interop */ diff --git a/runtime/compiler/microjit/LWMetaDataCreation.cpp b/runtime/compiler/microjit/LWMetaDataCreation.cpp index ea0f38c08a7..3ca6afd5d2c 100644 --- a/runtime/compiler/microjit/LWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/LWMetaDataCreation.cpp @@ -22,7 +22,7 @@ #include "microjit/MethodBytecodeMetaData.hpp" #if defined(J9VM_OPT_MJIT_Standalone) -//TODO: Complete this section so that MicroJIT can be used in a future case where MicroJIT won't deploy with TR. +// TODO: Complete this section so that MicroJIT can be used in a future case where MicroJIT won't deploy with TR. #else /* do nothing */ -#endif /* TR_MJIT_Interop */ \ No newline at end of file +#endif /* TR_MJIT_Interop */ diff --git a/runtime/compiler/microjit/MethodBytecodeMetaData.hpp b/runtime/compiler/microjit/MethodBytecodeMetaData.hpp index 6d5eaf866ec..347b67e69aa 100644 --- a/runtime/compiler/microjit/MethodBytecodeMetaData.hpp +++ b/runtime/compiler/microjit/MethodBytecodeMetaData.hpp @@ -26,44 +26,45 @@ #include "microjit/ByteCodeIterator.hpp" #include "microjit/SideTables.hpp" - -namespace MJIT { +namespace MJIT +{ #if defined(J9VM_OPT_MJIT_Standalone) -class InternalMetaData{ -public: - MJIT::LocalTable _localTable; - InternalMetaData(MJIT::LocalTable table) - :_localTable(table) - {} -}; +class InternalMetaData + { + public: + MJIT::LocalTable _localTable; + InternalMetaData(MJIT::LocalTable table) + :_localTable(table) + {} + }; #else -class InternalMetaData{ -public: - MJIT::LocalTable _localTable; - TR::CFG *_cfg; - uint16_t _maxCalleeArgSize; - InternalMetaData(MJIT::LocalTable table, TR::CFG *cfg, uint16_t maxCalleeArgSize) - :_localTable(table) - ,_cfg(cfg) - ,_maxCalleeArgSize(maxCalleeArgSize) - {} -}; +class InternalMetaData + { + public: + MJIT::LocalTable _localTable; + TR::CFG *_cfg; + uint16_t _maxCalleeArgSize; + InternalMetaData(MJIT::LocalTable table, TR::CFG *cfg, uint16_t maxCalleeArgSize) + :_localTable(table) + ,_cfg(cfg) + ,_maxCalleeArgSize(maxCalleeArgSize) + {} + }; #endif MJIT::InternalMetaData createInternalMethodMetadata( - MJIT::ByteCodeIterator *bci, - MJIT::LocalTableEntry *localTableEntries, - U_16 entries, - int32_t offsetToFirstLocal, - TR_ResolvedMethod *compilee, - TR::Compilation *comp, - ParamTable* paramTable, - uint8_t pointerSize, - bool *resolvedAllCallees -); + MJIT::ByteCodeIterator *bci, + MJIT::LocalTableEntry *localTableEntries, + U_16 entries, + int32_t offsetToFirstLocal, + TR_ResolvedMethod *compilee, + TR::Compilation *comp, + ParamTable* paramTable, + uint8_t pointerSize, + bool *resolvedAllCallees); -} +} // namespace MJIT -#endif /* MJIT_METHOD_BYTECODE_META_DATA */ \ No newline at end of file +#endif /* MJIT_METHOD_BYTECODE_META_DATA */ diff --git a/runtime/compiler/microjit/SideTables.hpp b/runtime/compiler/microjit/SideTables.hpp index ff2b60ddb0a..12c9c97c65f 100644 --- a/runtime/compiler/microjit/SideTables.hpp +++ b/runtime/compiler/microjit/SideTables.hpp @@ -36,303 +36,366 @@ | Float Return value registers | Float Preserved registers | Float 
Argument registers | XMM0 | XMM8-XMM15 | XMM0-XMM7 */ -namespace MJIT { +namespace MJIT +{ -struct RegisterStack { - unsigned char useRAX : 2; - unsigned char useRSI : 2; - unsigned char useRDX : 2; - unsigned char useRCX : 2; - unsigned char useXMM0 : 2; - unsigned char useXMM1 : 2; - unsigned char useXMM2 : 2; - unsigned char useXMM3 : 2; - unsigned char useXMM4 : 2; - unsigned char useXMM5 : 2; - unsigned char useXMM6 : 2; - unsigned char useXMM7 : 2; - unsigned char stackSlotsUsed : 8; -}; +struct RegisterStack + { + unsigned char useRAX : 2; + unsigned char useRSI : 2; + unsigned char useRDX : 2; + unsigned char useRCX : 2; + unsigned char useXMM0 : 2; + unsigned char useXMM1 : 2; + unsigned char useXMM2 : 2; + unsigned char useXMM3 : 2; + unsigned char useXMM4 : 2; + unsigned char useXMM5 : 2; + unsigned char useXMM6 : 2; + unsigned char useXMM7 : 2; + unsigned char stackSlotsUsed : 8; + }; inline U_16 calculateOffset(RegisterStack *stack) -{ - return 8*(stack->useRAX + - stack->useRSI + - stack->useRDX + - stack->useRCX + - stack->useXMM0 + - stack->useXMM1 + - stack->useXMM2 + - stack->useXMM3 + - stack->useXMM4 + - stack->useXMM5 + - stack->useXMM6 + - stack->useXMM7 + - stack->stackSlotsUsed); -} + { + return 8*(stack->useRAX + + stack->useRSI + + stack->useRDX + + stack->useRCX + + stack->useXMM0 + + stack->useXMM1 + + stack->useXMM2 + + stack->useXMM3 + + stack->useXMM4 + + stack->useXMM5 + + stack->useXMM6 + + stack->useXMM7 + + stack->stackSlotsUsed); + } inline void initParamStack(RegisterStack *stack) -{ - stack->useRAX = 0; - stack->useRSI = 0; - stack->useRDX = 0; - stack->useRCX = 0; - stack->useXMM0 = 0; - stack->useXMM1 = 0; - stack->useXMM2 = 0; - stack->useXMM3 = 0; - stack->useXMM4 = 0; - stack->useXMM5 = 0; - stack->useXMM6 = 0; - stack->useXMM7 = 0; - stack->stackSlotsUsed = 0; -} + { + stack->useRAX = 0; + stack->useRSI = 0; + stack->useRDX = 0; + stack->useRCX = 0; + stack->useXMM0 = 0; + stack->useXMM1 = 0; + stack->useXMM2 = 0; + stack->useXMM3 = 0; + stack->useXMM4 = 0; + stack->useXMM5 = 0; + stack->useXMM6 = 0; + stack->useXMM7 = 0; + stack->stackSlotsUsed = 0; + } inline int addParamIntToStack(RegisterStack *stack, U_16 size) -{ - if(!stack->useRAX){ - stack->useRAX = size/8; - return TR::RealRegister::eax; - } else if(!stack->useRSI){ - stack->useRSI = size/8; - return TR::RealRegister::esi; - } else if(!stack->useRDX){ - stack->useRDX = size/8; - return TR::RealRegister::edx; - } else if(!stack->useRCX){ - stack->useRCX = size/8; - return TR::RealRegister::ecx; - } else { - stack->stackSlotsUsed += size/8; - return TR::RealRegister::NoReg; - } -} + { + if (!stack->useRAX) + { + stack->useRAX = size/8; + return TR::RealRegister::eax; + } + else if (!stack->useRSI) + { + stack->useRSI = size/8; + return TR::RealRegister::esi; + } + else if (!stack->useRDX) + { + stack->useRDX = size/8; + return TR::RealRegister::edx; + } + else if (!stack->useRCX) + { + stack->useRCX = size/8; + return TR::RealRegister::ecx; + } + else + { + stack->stackSlotsUsed += size/8; + return TR::RealRegister::NoReg; + } + } inline int addParamFloatToStack(RegisterStack *stack, U_16 size) -{ - if(!stack->useXMM0){ - stack->useXMM0 = size/8; - return TR::RealRegister::xmm0; - } else if(!stack->useXMM1){ - stack->useXMM1 = size/8; - return TR::RealRegister::xmm1; - } else if(!stack->useXMM2){ - stack->useXMM2 = size/8; - return TR::RealRegister::xmm2; - } else if(!stack->useXMM3){ - stack->useXMM3 = size/8; - return TR::RealRegister::xmm3; - } else if(!stack->useXMM4){ - 
stack->useXMM4 = size/8; - return TR::RealRegister::xmm4; - } else if(!stack->useXMM5){ - stack->useXMM5 = size/8; - return TR::RealRegister::xmm5; - } else if(!stack->useXMM6){ - stack->useXMM6 = size/8; - return TR::RealRegister::xmm6; - } else if(!stack->useXMM7){ - stack->useXMM7 = size/8; - return TR::RealRegister::xmm7; - } else { - stack->stackSlotsUsed += size/8; - return TR::RealRegister::NoReg; - } -} + { + if (!stack->useXMM0) + { + stack->useXMM0 = size/8; + return TR::RealRegister::xmm0; + } + else if (!stack->useXMM1) + { + stack->useXMM1 = size/8; + return TR::RealRegister::xmm1; + } + else if (!stack->useXMM2) + { + stack->useXMM2 = size/8; + return TR::RealRegister::xmm2; + } + else if (!stack->useXMM3) + { + stack->useXMM3 = size/8; + return TR::RealRegister::xmm3; + } + else if (!stack->useXMM4) + { + stack->useXMM4 = size/8; + return TR::RealRegister::xmm4; + } + else if (!stack->useXMM5) + { + stack->useXMM5 = size/8; + return TR::RealRegister::xmm5; + } + else if (!stack->useXMM6) + { + stack->useXMM6 = size/8; + return TR::RealRegister::xmm6; + } + else if (!stack->useXMM7) + { + stack->useXMM7 = size/8; + return TR::RealRegister::xmm7; + } + else + { + stack->stackSlotsUsed += size/8; + return TR::RealRegister::NoReg; + } + } inline int removeParamIntFromStack(RegisterStack *stack, U_16 *size) -{ - if(stack->useRAX){ - *size = stack->useRAX*8; - stack->useRAX = 0; - return TR::RealRegister::eax; - } else if(stack->useRSI){ - *size = stack->useRSI*8; - stack->useRSI = 0; - return TR::RealRegister::esi; - } else if(stack->useRDX){ - *size = stack->useRDX*8; - stack->useRDX = 0; - return TR::RealRegister::edx; - } else if(stack->useRCX){ - *size = stack->useRCX*8; - stack->useRCX = 0; - return TR::RealRegister::ecx; - } else { - *size = 0; - return TR::RealRegister::NoReg; - } -} + { + if (stack->useRAX) + { + *size = stack->useRAX*8; + stack->useRAX = 0; + return TR::RealRegister::eax; + } + else if (stack->useRSI) + { + *size = stack->useRSI*8; + stack->useRSI = 0; + return TR::RealRegister::esi; + } + else if (stack->useRDX) + { + *size = stack->useRDX*8; + stack->useRDX = 0; + return TR::RealRegister::edx; + } + else if (stack->useRCX) + { + *size = stack->useRCX*8; + stack->useRCX = 0; + return TR::RealRegister::ecx; + } + else + { + *size = 0; + return TR::RealRegister::NoReg; + } + } inline int removeParamFloatFromStack(RegisterStack *stack, U_16 *size) -{ - if(stack->useXMM0){ - *size = stack->useXMM0*8; - stack->useXMM0 = 0; - return TR::RealRegister::xmm0; - } else if(stack->useXMM1){ - *size = stack->useXMM1*8; - stack->useXMM1 = 0; - return TR::RealRegister::xmm1; - } else if(stack->useXMM2){ - *size = stack->useXMM2*8; - stack->useXMM2 = 0; - return TR::RealRegister::xmm2; - } else if(stack->useXMM3){ - *size = stack->useXMM3*8; - stack->useXMM3 = 0; - return TR::RealRegister::xmm3; - } else if(stack->useXMM4){ - *size = stack->useXMM4*8; - stack->useXMM4 = 0; - return TR::RealRegister::xmm4; - } else if(stack->useXMM5){ - *size = stack->useXMM5*8; - stack->useXMM5 = 0; - return TR::RealRegister::xmm5; - } else if(stack->useXMM6){ - *size = stack->useXMM6*8; - stack->useXMM6 = 0; - return TR::RealRegister::xmm6; - } else if(stack->useXMM7){ - *size = stack->useXMM7*8; - stack->useXMM7 = 0; - return TR::RealRegister::xmm7; - } else { - *size = 0; - return TR::RealRegister::NoReg; - } -} + { + if (stack->useXMM0) + { + *size = stack->useXMM0*8; + stack->useXMM0 = 0; + return TR::RealRegister::xmm0; + } + else if (stack->useXMM1) + { + *size = 
stack->useXMM1*8; + stack->useXMM1 = 0; + return TR::RealRegister::xmm1; + } + else if (stack->useXMM2) + { + *size = stack->useXMM2*8; + stack->useXMM2 = 0; + return TR::RealRegister::xmm2; + } + else if (stack->useXMM3) + { + *size = stack->useXMM3*8; + stack->useXMM3 = 0; + return TR::RealRegister::xmm3; + } + else if (stack->useXMM4) + { + *size = stack->useXMM4*8; + stack->useXMM4 = 0; + return TR::RealRegister::xmm4; + } + else if (stack->useXMM5) + { + *size = stack->useXMM5*8; + stack->useXMM5 = 0; + return TR::RealRegister::xmm5; + } + else if (stack->useXMM6) + { + *size = stack->useXMM6*8; + stack->useXMM6 = 0; + return TR::RealRegister::xmm6; + } + else if (stack->useXMM7) + { + *size = stack->useXMM7*8; + stack->useXMM7 = 0; + return TR::RealRegister::xmm7; + } + else + { + *size = 0; + return TR::RealRegister::NoReg; + } + } -struct ParamTableEntry { - int32_t offset; - int32_t regNo; - int32_t gcMapOffset; - //Stored here so we can look up when saving to local array - char slots; - char size; - bool isReference; - bool onStack; - bool notInitialized; -}; +struct ParamTableEntry + { + int32_t offset; + int32_t regNo; + int32_t gcMapOffset; + // Stored here so we can look up when saving to local array + char slots; + char size; + bool isReference; + bool onStack; + bool notInitialized; + }; inline MJIT::ParamTableEntry -initializeParamTableEntry(){ - MJIT::ParamTableEntry entry; - entry.regNo = TR::RealRegister::NoReg; - entry.offset = 0; - entry.gcMapOffset = 0; - entry.onStack = false; - entry.slots = 0; - entry.size = 0; - entry.isReference = false; - entry.notInitialized = true; - return entry; -} +initializeParamTableEntry() + { + MJIT::ParamTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = 0; + entry.gcMapOffset = 0; + entry.onStack = false; + entry.slots = 0; + entry.size = 0; + entry.isReference = false; + entry.notInitialized = true; + return entry; + } // This function is only used for paramaters inline ParamTableEntry -makeRegisterEntry(int32_t regNo, int32_t stackOffset, uint16_t size, uint16_t slots, bool isRef){ - ParamTableEntry entry; - entry.regNo = regNo; - entry.offset = stackOffset; - entry.gcMapOffset = stackOffset; - entry.onStack = false; - entry.slots = slots; - entry.size = size; - entry.isReference = isRef; - entry.notInitialized = false; - return entry; -} +makeRegisterEntry(int32_t regNo, int32_t stackOffset, uint16_t size, uint16_t slots, bool isRef) + { + ParamTableEntry entry; + entry.regNo = regNo; + entry.offset = stackOffset; + entry.gcMapOffset = stackOffset; + entry.onStack = false; + entry.slots = slots; + entry.size = size; + entry.isReference = isRef; + entry.notInitialized = false; + return entry; + } // This function is only used for paramaters inline ParamTableEntry -makeStackEntry(int32_t stackOffset, uint16_t size, uint16_t slots, bool isRef){ - ParamTableEntry entry; - entry.regNo = TR::RealRegister::NoReg; - entry.offset = stackOffset; - entry.gcMapOffset = stackOffset; - entry.onStack = true; - entry.slots = slots; - entry.size = size; - entry.isReference = isRef; - entry.notInitialized = false; - return entry; -} +makeStackEntry(int32_t stackOffset, uint16_t size, uint16_t slots, bool isRef) + { + ParamTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = stackOffset; + entry.gcMapOffset = stackOffset; + entry.onStack = true; + entry.slots = slots; + entry.size = size; + entry.isReference = isRef; + entry.notInitialized = false; + return entry; + } class ParamTable -{ - private: - 
ParamTableEntry *_tableEntries; - U_16 _paramCount; - U_16 _actualParamCount; - RegisterStack *_registerStack; - public: - ParamTable(ParamTableEntry*, uint16_t, uint16_t, RegisterStack*); - bool getEntry(uint16_t, ParamTableEntry*); - bool setEntry(uint16_t, ParamTableEntry*); - uint16_t getTotalParamSize(); - uint16_t getParamCount(); - uint16_t getActualParamCount(); -}; + { + + private: + ParamTableEntry *_tableEntries; + U_16 _paramCount; + U_16 _actualParamCount; + RegisterStack *_registerStack; + + public: + ParamTable(ParamTableEntry*, uint16_t, uint16_t, RegisterStack*); + bool getEntry(uint16_t, ParamTableEntry*); + bool setEntry(uint16_t, ParamTableEntry*); + uint16_t getTotalParamSize(); + uint16_t getParamCount(); + uint16_t getActualParamCount(); + }; using LocalTableEntry=ParamTableEntry; inline LocalTableEntry makeLocalTableEntry(int32_t gcMapOffset, uint16_t size, uint16_t slots, bool isRef) -{ - LocalTableEntry entry; - entry.regNo = TR::RealRegister::NoReg; - entry.offset = -1; //This field isn't used for locals - entry.gcMapOffset = gcMapOffset; - entry.onStack = true; - entry.slots = slots; - entry.size = size; - entry.isReference = isRef; - entry.notInitialized = false; - return entry; -} + { + LocalTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = -1; // This field isn't used for locals + entry.gcMapOffset = gcMapOffset; + entry.onStack = true; + entry.slots = slots; + entry.size = size; + entry.isReference = isRef; + entry.notInitialized = false; + return entry; + } inline LocalTableEntry -initializeLocalTableEntry(){ - ParamTableEntry entry; - entry.regNo = TR::RealRegister::NoReg; - entry.offset = 0; - entry.gcMapOffset = 0; - entry.onStack = false; - entry.slots = 0; - entry.size = 0; - entry.isReference = false; - entry.notInitialized = true; - return entry; -} +initializeLocalTableEntry() + { + ParamTableEntry entry; + entry.regNo = TR::RealRegister::NoReg; + entry.offset = 0; + entry.gcMapOffset = 0; + entry.onStack = false; + entry.slots = 0; + entry.size = 0; + entry.isReference = false; + entry.notInitialized = true; + return entry; + } class LocalTable -{ - private: - LocalTableEntry *_tableEntries; - uint16_t _localCount; - public: - LocalTable(LocalTableEntry*, uint16_t); - bool getEntry(uint16_t, LocalTableEntry*); - uint16_t getTotalLocalSize(); - uint16_t getLocalCount(); -}; + { + + private: + LocalTableEntry *_tableEntries; + uint16_t _localCount; -struct JumpTableEntry { - uint32_t byteCodeIndex; - char * codeCacheAddress; - JumpTableEntry() {} - JumpTableEntry(uint32_t bci, char* cca) : byteCodeIndex(bci), codeCacheAddress(cca) {} -}; + public: + LocalTable(LocalTableEntry*, uint16_t); + bool getEntry(uint16_t, LocalTableEntry*); + uint16_t getTotalLocalSize(); + uint16_t getLocalCount(); + }; +struct JumpTableEntry + { + uint32_t byteCodeIndex; + char *codeCacheAddress; + JumpTableEntry() {} + JumpTableEntry(uint32_t bci, char* cca) : byteCodeIndex(bci), codeCacheAddress(cca) {} + }; } #endif /* MJIT_SIDETABLE_HPP */ diff --git a/runtime/compiler/microjit/utils.hpp b/runtime/compiler/microjit/utils.hpp index c0296c3742f..e4d9b42a665 100644 --- a/runtime/compiler/microjit/utils.hpp +++ b/runtime/compiler/microjit/utils.hpp @@ -26,14 +26,17 @@ #include #include "env/IO.hpp" -inline void MJIT_ASSERT(TR::FilePointer *filePtr, bool condition, const char * const error_string){ - if(!condition){ - trfprintf(filePtr, "%s", error_string); - assert(condition); - } -} +inline void MJIT_ASSERT(TR::FilePointer *filePtr, bool 
condition, const char * const error_string) + { + if (!condition) + { + trfprintf(filePtr, "%s", error_string); + assert(condition); + } + } -inline void MJIT_ASSERT_NO_MSG(bool condition){ - assert(condition); -} +inline void MJIT_ASSERT_NO_MSG(bool condition) + { + assert(condition); + } #endif /* MJIT_UTILS_HPP */ diff --git a/runtime/compiler/microjit/x/CMakeLists.txt b/runtime/compiler/microjit/x/CMakeLists.txt index 2e9e18cb0b9..c0933947552 100644 --- a/runtime/compiler/microjit/x/CMakeLists.txt +++ b/runtime/compiler/microjit/x/CMakeLists.txt @@ -21,4 +21,4 @@ ################################################################################ if(TR_HOST_BITS STREQUAL 64) add_subdirectory(amd64) -endif() \ No newline at end of file +endif() diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp index 11b54199989..12a26ed959e 100755 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp @@ -19,9 +19,12 @@ * * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception *******************************************************************************/ +#include +#include +#include +#include "j9comp.h" #include "j9.h" #include "env/VMJ9.h" - #include "il/Block.hpp" #include "infra/Cfg.hpp" #include "compile/Compilation.hpp" @@ -41,8 +44,8 @@ #define BIT_MASK_32 0xffffffff #define BIT_MASK_64 0xffffffffffffffff -//Debug Params, comment/uncomment undef to get omit/enable debugging info -//#define MJIT_DEBUG 1 +// Debug Params, comment/uncomment undef to get omit/enable debugging info +// #define MJIT_DEBUG 1 #undef MJIT_DEBUG #ifdef MJIT_DEBUG @@ -60,46 +63,44 @@ #define STACKCHECKBUFFER 512 -#include "j9comp.h" extern J9_CDATA char * const JavaBCNames[]; -#include -#include -#include - -#define COPY_TEMPLATE(buffer, INSTRUCTION, size) do {\ - memcpy(buffer, INSTRUCTION, INSTRUCTION##Size);\ - buffer=(buffer+INSTRUCTION##Size);\ - size += INSTRUCTION##Size;\ -} while(0) - -#define PATCH_RELATIVE_ADDR_8_BIT(buffer, absAddress) do {\ - intptr_t relativeAddress = (intptr_t)(buffer);\ - intptr_t minAddress = (relativeAddress < (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ - intptr_t maxAddress = (relativeAddress > (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ - intptr_t absDistance = maxAddress - minAddress;\ - MJIT_ASSERT_NO_MSG(absDistance < (intptr_t)0x00000000000000ff);\ - relativeAddress = (relativeAddress < (intptr_t)(absAddress)) ? absDistance : (-1*(intptr_t)(absDistance));\ - patchImm1(buffer, (U_32)(relativeAddress & 0x00000000000000ff));\ -} while(0) - -//TODO: Find out how to jump to trampolines for far calls -#define PATCH_RELATIVE_ADDR_32_BIT(buffer, absAddress) do {\ - intptr_t relativeAddress = (intptr_t)(buffer);\ - intptr_t minAddress = (relativeAddress < (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ - intptr_t maxAddress = (relativeAddress > (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress);\ - intptr_t absDistance = maxAddress - minAddress;\ - MJIT_ASSERT_NO_MSG(absDistance < (intptr_t)0x00000000ffffffff);\ - relativeAddress = (relativeAddress < (intptr_t)(absAddress)) ? 
absDistance : (-1*(intptr_t)(absDistance));\ - patchImm4(buffer, (U_32)(relativeAddress & 0x00000000ffffffff));\ -} while(0) +#define COPY_TEMPLATE(buffer, INSTRUCTION, size) \ + do { \ + memcpy(buffer, INSTRUCTION, INSTRUCTION##Size); \ + buffer=(buffer+INSTRUCTION##Size); \ + size += INSTRUCTION##Size; \ + } while (0) + +#define PATCH_RELATIVE_ADDR_8_BIT(buffer, absAddress) \ + do { \ + intptr_t relativeAddress = (intptr_t)(buffer); \ + intptr_t minAddress = (relativeAddress < (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress); \ + intptr_t maxAddress = (relativeAddress > (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress); \ + intptr_t absDistance = maxAddress - minAddress; \ + MJIT_ASSERT_NO_MSG(absDistance < (intptr_t)0x00000000000000ff); \ + relativeAddress = (relativeAddress < (intptr_t)(absAddress)) ? absDistance : (-1*(intptr_t)(absDistance)); \ + patchImm1(buffer, (U_32)(relativeAddress & 0x00000000000000ff)); \ + } while (0) + +// TODO: Find out how to jump to trampolines for far calls +#define PATCH_RELATIVE_ADDR_32_BIT(buffer, absAddress) \ + do { \ + intptr_t relativeAddress = (intptr_t)(buffer); \ + intptr_t minAddress = (relativeAddress < (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress); \ + intptr_t maxAddress = (relativeAddress > (intptr_t)(absAddress)) ? relativeAddress : (intptr_t)(absAddress); \ + intptr_t absDistance = maxAddress - minAddress; \ + MJIT_ASSERT_NO_MSG(absDistance < (intptr_t)0x00000000ffffffff); \ + relativeAddress = (relativeAddress < (intptr_t)(absAddress)) ? absDistance : (-1*(intptr_t)(absDistance)); \ + patchImm4(buffer, (U_32)(relativeAddress & 0x00000000ffffffff)); \ + } while (0) // Using char[] to signify that we are treating this // like data, even though they are instructions. #define EXTERN_TEMPLATE(TEMPLATE_NAME) extern char TEMPLATE_NAME[] #define EXTERN_TEMPLATE_SIZE(TEMPLATE_NAME) extern unsigned short TEMPLATE_NAME##Size -#define DECLARE_TEMPLATE(TEMPLATE_NAME) EXTERN_TEMPLATE(TEMPLATE_NAME);\ - EXTERN_TEMPLATE_SIZE(TEMPLATE_NAME) +#define DECLARE_TEMPLATE(TEMPLATE_NAME) EXTERN_TEMPLATE(TEMPLATE_NAME); \ + EXTERN_TEMPLATE_SIZE(TEMPLATE_NAME) // Labels linked from templates. 
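// An illustrative sketch (assumptions, not the macro itself) of the rel32
// patching that PATCH_RELATIVE_ADDR_32_BIT performs after a template is
// copied: the displacement is measured from the byte following the 4-byte
// immediate, range-checked, and written back over the immediate. patchRel32
// and patchSite are hypothetical names.
#include <cstdint>
#include <cstring>

static bool patchRel32(char *patchSite, uintptr_t absTarget)
   {
   intptr_t displacement = (intptr_t)absTarget - (intptr_t)patchSite;  // rel32 is relative to the next instruction
   if (displacement > INT32_MAX || displacement < INT32_MIN)
      return false;                                                    // out of range; a trampoline would be needed
   int32_t imm = (int32_t)displacement;
   memcpy(patchSite - 4, &imm, sizeof(imm));                           // overwrite the 4-byte immediate behind the cursor
   return true;
   }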
DECLARE_TEMPLATE(movRSPR10); @@ -397,127 +398,135 @@ DECLARE_TEMPLATE(addrGetFieldTemplate); static void debug_print_hex(char *buffer, unsigned long long size) -{ - printf("Start of dump:\n"); - while(size--){ - printf("%x ", 0xc0 & *(buffer++)); - } - printf("\nEnd of dump:"); -} - -inline uint32_t align(uint32_t number, uint32_t requirement, TR::FilePointer *fp) -{ - MJIT_ASSERT(fp, requirement && ((requirement & (requirement -1)) == 0), "INCORRECT ALIGNMENT"); - return (number + requirement - 1) & ~(requirement - 1); -} + { + printf("Start of dump:\n"); + while (size--) + printf("%x ", 0xc0 & *(buffer++)); + printf("\nEnd of dump:"); + } + +inline uint32_t +align(uint32_t number, uint32_t requirement, TR::FilePointer *fp) + { + MJIT_ASSERT(fp, requirement && ((requirement & (requirement -1)) == 0), "INCORRECT ALIGNMENT"); + return (number + requirement - 1) & ~(requirement - 1); + } static bool getRequiredAlignment(uintptr_t cursor, uintptr_t boundary, uintptr_t margin, uintptr_t *alignment) -{ - if((boundary & (boundary-1)) != 0) - return true; - *alignment = (-cursor - margin) & (boundary-1); - return false; -} + { + if ((boundary & (boundary-1)) != 0) + return true; + *alignment = (-cursor - margin) & (boundary-1); + return false; + } static intptr_t -getHelperOrTrampolineAddress(TR_RuntimeHelper h, uintptr_t callsite){ - uintptr_t helperAddress = (uintptr_t)runtimeHelperValue(h); - uintptr_t minAddress = (callsite < (uintptr_t)(helperAddress)) ? callsite : (uintptr_t)(helperAddress); - uintptr_t maxAddress = (callsite > (uintptr_t)(helperAddress)) ? callsite : (uintptr_t)(helperAddress); - uintptr_t distance = maxAddress - minAddress; - if(distance > 0x00000000ffffffff){ - helperAddress = (uintptr_t)TR::CodeCacheManager::instance()->findHelperTrampoline(h, (void *)callsite); - } - return helperAddress; -} +getHelperOrTrampolineAddress(TR_RuntimeHelper h, uintptr_t callsite) + { + uintptr_t helperAddress = (uintptr_t)runtimeHelperValue(h); + uintptr_t minAddress = (callsite < (uintptr_t)(helperAddress)) ? callsite : (uintptr_t)(helperAddress); + uintptr_t maxAddress = (callsite > (uintptr_t)(helperAddress)) ? callsite : (uintptr_t)(helperAddress); + uintptr_t distance = maxAddress - minAddress; + if (distance > 0x00000000ffffffff) + helperAddress = (uintptr_t)TR::CodeCacheManager::instance()->findHelperTrampoline(h, (void *)callsite); + return helperAddress; + } bool MJIT::nativeSignature(J9Method *method, char *resultBuffer) -{ - J9UTF8 *methodSig; - UDATA arg; - U_16 i, ch; - BOOLEAN parsingReturnType = FALSE, processingBracket = FALSE; - char nextType = '\0'; - - methodSig = J9ROMMETHOD_SIGNATURE(J9_ROM_METHOD_FROM_RAM_METHOD(method)); - i = 0; - arg = 3; /* skip the return type slot and JNI standard slots, they will be filled in later. 
*/ - - while(i < J9UTF8_LENGTH(methodSig)) { - ch = J9UTF8_DATA(methodSig)[i++]; - switch(ch) { - case '(': /* start of signature -- skip */ - continue; - case ')': /* End of signature -- done args, find return type */ - parsingReturnType = TRUE; - continue; - case MJIT::CLASSNAME_TYPE_CHARACTER: - nextType = MJIT::CLASSNAME_TYPE_CHARACTER; - while(J9UTF8_DATA(methodSig)[i++] != ';') {} /* a type string - loop scanning for ';' to end it - i points past ';' when done loop */ - break; - case MJIT::BOOLEAN_TYPE_CHARACTER: - nextType = MJIT::BOOLEAN_TYPE_CHARACTER; - break; - case MJIT::BYTE_TYPE_CHARACTER: - nextType = MJIT::BYTE_TYPE_CHARACTER; - break; - case MJIT::CHAR_TYPE_CHARACTER: - nextType = MJIT::CHAR_TYPE_CHARACTER; - break; - case MJIT::SHORT_TYPE_CHARACTER: - nextType = MJIT::SHORT_TYPE_CHARACTER; - break; - case MJIT::INT_TYPE_CHARACTER: - nextType = MJIT::INT_TYPE_CHARACTER; - break; - case MJIT::LONG_TYPE_CHARACTER: - nextType = MJIT::LONG_TYPE_CHARACTER; - break; - case MJIT::FLOAT_TYPE_CHARACTER: - nextType = MJIT::FLOAT_TYPE_CHARACTER; - break; - case MJIT::DOUBLE_TYPE_CHARACTER: - nextType = MJIT::DOUBLE_TYPE_CHARACTER; - break; - case '[': - processingBracket = TRUE; - continue; /* go back to top of loop for next char */ - case MJIT::VOID_TYPE_CHARACTER: - if(!parsingReturnType) { - return true; - } - nextType = MJIT::VOID_TYPE_CHARACTER; - break; - - default: - nextType = '\0'; - return true; - } - if(processingBracket) { - if(parsingReturnType) { - resultBuffer[0] = MJIT::CLASSNAME_TYPE_CHARACTER; - break; /* from the while loop */ - } else { - resultBuffer[arg] = MJIT::CLASSNAME_TYPE_CHARACTER; - arg++; - processingBracket = FALSE; - } - } else if(parsingReturnType) { - resultBuffer[0] = nextType; + { + J9UTF8 *methodSig; + UDATA arg; + U_16 i, ch; + BOOLEAN parsingReturnType = FALSE, processingBracket = FALSE; + char nextType = '\0'; + + methodSig = J9ROMMETHOD_SIGNATURE(J9_ROM_METHOD_FROM_RAM_METHOD(method)); + i = 0; + arg = 3; /* skip the return type slot and JNI standard slots, they will be filled in later. 
*/ + + while (i < J9UTF8_LENGTH(methodSig)) + { + ch = J9UTF8_DATA(methodSig)[i++]; + switch (ch) + { + case '(': /* start of signature -- skip */ + continue; + case ')': /* End of signature -- done args, find return type */ + parsingReturnType = TRUE; + continue; + case MJIT::CLASSNAME_TYPE_CHARACTER: + nextType = MJIT::CLASSNAME_TYPE_CHARACTER; + while(J9UTF8_DATA(methodSig)[i++] != ';') {} /* a type string - loop scanning for ';' to end it - i points past ';' when done loop */ + break; + case MJIT::BOOLEAN_TYPE_CHARACTER: + nextType = MJIT::BOOLEAN_TYPE_CHARACTER; + break; + case MJIT::BYTE_TYPE_CHARACTER: + nextType = MJIT::BYTE_TYPE_CHARACTER; + break; + case MJIT::CHAR_TYPE_CHARACTER: + nextType = MJIT::CHAR_TYPE_CHARACTER; + break; + case MJIT::SHORT_TYPE_CHARACTER: + nextType = MJIT::SHORT_TYPE_CHARACTER; + break; + case MJIT::INT_TYPE_CHARACTER: + nextType = MJIT::INT_TYPE_CHARACTER; + break; + case MJIT::LONG_TYPE_CHARACTER: + nextType = MJIT::LONG_TYPE_CHARACTER; + break; + case MJIT::FLOAT_TYPE_CHARACTER: + nextType = MJIT::FLOAT_TYPE_CHARACTER; + break; + case MJIT::DOUBLE_TYPE_CHARACTER: + nextType = MJIT::DOUBLE_TYPE_CHARACTER; + break; + case '[': + processingBracket = TRUE; + continue; /* go back to top of loop for next char */ + case MJIT::VOID_TYPE_CHARACTER: + if (!parsingReturnType) + return true; + nextType = MJIT::VOID_TYPE_CHARACTER; + break; + default: + nextType = '\0'; + return true; + } + if (processingBracket) + { + if (parsingReturnType) + { + resultBuffer[0] = MJIT::CLASSNAME_TYPE_CHARACTER; break; /* from the while loop */ - } else { - resultBuffer[arg] = nextType; + } + else + { + resultBuffer[arg] = MJIT::CLASSNAME_TYPE_CHARACTER; arg++; - } - } - - resultBuffer[1] = MJIT::CLASSNAME_TYPE_CHARACTER; /* the JNIEnv */ - resultBuffer[2] = MJIT::CLASSNAME_TYPE_CHARACTER; /* the jobject or jclass */ - resultBuffer[arg] = '\0'; - return false; -} + processingBracket = FALSE; + } + } + else if (parsingReturnType) + { + resultBuffer[0] = nextType; + break; /* from the while loop */ + } + else + { + resultBuffer[arg] = nextType; + arg++; + } + } + + resultBuffer[1] = MJIT::CLASSNAME_TYPE_CHARACTER; /* the JNIEnv */ + resultBuffer[2] = MJIT::CLASSNAME_TYPE_CHARACTER; /* the jobject or jclass */ + resultBuffer[arg] = '\0'; + return false; + } /** * Assuming that 'buffer' is the end of the immediate value location, @@ -526,9 +535,10 @@ MJIT::nativeSignature(J9Method *method, char *resultBuffer) * Assumes 64-bit value */ static void -patchImm8(char *buffer, U_64 imm){ - *(((unsigned long long int*)buffer)-1) = imm; -} +patchImm8(char *buffer, U_64 imm) + { + *(((unsigned long long int*)buffer)-1) = imm; + } /** * Assuming that 'buffer' is the end of the immediate value location, @@ -537,9 +547,10 @@ patchImm8(char *buffer, U_64 imm){ * Assumes 32-bit value */ static void -patchImm4(char *buffer, U_32 imm){ - *(((unsigned int*)buffer)-1) = imm; -} +patchImm4(char *buffer, U_32 imm) + { + *(((unsigned int*)buffer)-1) = imm; + } /** * Assuming that 'buffer' is the end of the immediate value location, @@ -548,9 +559,10 @@ patchImm4(char *buffer, U_32 imm){ * Assumes 16-bit value */ static void -patchImm2(char *buffer, U_16 imm){ - *(((unsigned short int*)buffer)-1) = imm; -} +patchImm2(char *buffer, U_16 imm) + { + *(((unsigned short int*)buffer)-1) = imm; + } /** * Assuming that 'buffer' is the end of the immediate value location, @@ -559,9 +571,10 @@ patchImm2(char *buffer, U_16 imm){ * Assumes 8-bit value */ static void -patchImm1(char *buffer, U_8 imm){ - *(unsigned 
char*)(buffer-1) = imm; -} +patchImm1(char *buffer, U_8 imm) + { + *(unsigned char*)(buffer-1) = imm; + } /* | Architecture | Endian | Return address | vTable index | @@ -582,2515 +595,2577 @@ patchImm1(char *buffer, U_8 imm){ inline buffer_size_t savePreserveRegisters(char *buffer, int32_t offset) -{ - buffer_size_t size = 0; - COPY_TEMPLATE(buffer, saveRBXOffset, size); - patchImm4(buffer, offset); - COPY_TEMPLATE(buffer, saveR9Offset, size); - patchImm4(buffer, offset-8); - COPY_TEMPLATE(buffer, saveRSPOffset, size); - patchImm4(buffer, offset-16); - return size; -} + { + buffer_size_t size = 0; + COPY_TEMPLATE(buffer, saveRBXOffset, size); + patchImm4(buffer, offset); + COPY_TEMPLATE(buffer, saveR9Offset, size); + patchImm4(buffer, offset-8); + COPY_TEMPLATE(buffer, saveRSPOffset, size); + patchImm4(buffer, offset-16); + return size; + } inline buffer_size_t loadPreserveRegisters(char *buffer, int32_t offset) -{ - buffer_size_t size = 0; - COPY_TEMPLATE(buffer, loadRBXOffset, size); - patchImm4(buffer, offset); - COPY_TEMPLATE(buffer, loadR9Offset, size); - patchImm4(buffer, offset-8); - COPY_TEMPLATE(buffer, loadRSPOffset, size); - patchImm4(buffer, offset-16); - return size; -} - -#define COPY_IMM1_TEMPLATE(buffer, size, value) do{\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - patchImm1(buffer, (U_8)(value));\ -}while(0) - -#define COPY_IMM2_TEMPLATE(buffer, size, value) do{\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - patchImm2(buffer, (U_16)(value));\ -}while(0) - -#define COPY_IMM4_TEMPLATE(buffer, size, value) do{\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - patchImm4(buffer, (U_32)(value));\ -}while(0) - -#define COPY_IMM8_TEMPLATE(buffer, size, value) do{\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - COPY_TEMPLATE(buffer, nopInstruction, size);\ - patchImm8(buffer, (U_64)(value));\ -}while(0) + { + buffer_size_t size = 0; + COPY_TEMPLATE(buffer, loadRBXOffset, size); + patchImm4(buffer, offset); + COPY_TEMPLATE(buffer, loadR9Offset, size); + patchImm4(buffer, offset-8); + COPY_TEMPLATE(buffer, loadRSPOffset, size); + patchImm4(buffer, offset-16); + return size; + } + +#define COPY_IMM1_TEMPLATE(buffer, size, value) \ + do { \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + patchImm1(buffer, (U_8)(value)); \ + } while (0) + +#define COPY_IMM2_TEMPLATE(buffer, size, value) \ + do { \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + patchImm2(buffer, (U_16)(value)); \ + } while (0) + +#define COPY_IMM4_TEMPLATE(buffer, size, value) \ + do { \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + patchImm4(buffer, (U_32)(value)); \ + } while (0) + +#define COPY_IMM8_TEMPLATE(buffer, size, value) \ + do { \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, 
size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + COPY_TEMPLATE(buffer, nopInstruction, size); \ + patchImm8(buffer, (U_64)(value)); \ + } while (0) MJIT::RegisterStack MJIT::mapIncomingParams( - char *typeString, - U_16 maxLength, - int *error_code, - ParamTableEntry *paramTable, - U_16 actualParamCount, - TR::FilePointer *logFileFP -){ - MJIT::RegisterStack stack; - initParamStack(&stack); - - uint16_t index = 0; - intptr_t offset = 0; - - for(int i = 0; i= paramIndex) ? false : (paramIndex < _paramCount)); - if(success) - *entry = _tableEntries[paramIndex]; - return success; -} + { + bool success = ((-1 >= paramIndex) ? false : (paramIndex < _paramCount)); + if (success) + *entry = _tableEntries[paramIndex]; + return success; + } bool MJIT::ParamTable::setEntry(U_16 paramIndex, MJIT::ParamTableEntry *entry) -{ - bool success = ((-1 >= paramIndex) ? false : (paramIndex < _paramCount)); - if(success) - _tableEntries[paramIndex] = *entry; - return success; -} + { + bool success = ((-1 >= paramIndex) ? false : (paramIndex < _paramCount)); + if (success) + _tableEntries[paramIndex] = *entry; + return success; + } U_16 MJIT::ParamTable::getTotalParamSize() -{ - return calculateOffset(_registerStack); -} + { + return calculateOffset(_registerStack); + } U_16 MJIT::ParamTable::getParamCount() -{ - return _paramCount; -} + { + return _paramCount; + } U_16 MJIT::ParamTable::getActualParamCount() -{ - return _actualParamCount; -} + { + return _actualParamCount; + } MJIT::LocalTable::LocalTable(LocalTableEntry *tableEntries, U_16 localCount) - : _tableEntries(tableEntries) - , _localCount(localCount) -{} + : _tableEntries(tableEntries) + , _localCount(localCount) + { + + } bool MJIT::LocalTable::getEntry(U_16 localIndex, MJIT::LocalTableEntry *entry) -{ - bool success = ((-1 >= localIndex) ? false : (localIndex < _localCount)); - if(success) - *entry = _tableEntries[localIndex]; - return success; -} + { + bool success = ((-1 >= localIndex) ? false : (localIndex < _localCount)); + if (success) + *entry = _tableEntries[localIndex]; + return success; + } U_16 MJIT::LocalTable::getTotalLocalSize() -{ - U_16 localSize = 0; - for(int i=0; i<_localCount; i++){ - localSize += _tableEntries[i].slots*8; - } - return localSize; -} + { + U_16 localSize = 0; + for (int i=0; i<_localCount; i++) + localSize += _tableEntries[i].slots*8; + return localSize; + } U_16 MJIT::LocalTable::getLocalCount() -{ - return _localCount; -} + { + return _localCount; + } buffer_size_t MJIT::CodeGenerator::generateGCR( - char *buffer, - int32_t initialCount, - J9Method *method, - uintptr_t startPC -){ - // see TR_J9ByteCodeIlGenerator::prependGuardedCountForRecompilation(TR::Block * originalFirstBlock) - // guardBlock: if (!countForRecompile) goto originalFirstBlock; - // bumpCounterBlock: count--; - // if (bodyInfo->count > 0) goto originalFirstBlock; - // callRecompileBlock: call jitRetranslateCallerWithPreparation(j9method, startPC); - // bodyInfo->count=10000 - // goto originalFirstBlock - // bumpCounter: count - // originalFirstBlock: ... 
- // - buffer_size_t size = 0; - - //MicroJIT either needs its own gaurd, or it needs to not have a guard block - // The following is kept here so that if a new gaurd it - /* - COPY_TEMPLATE(buffer, moveCountAndRecompile, size); - char *guardBlock = buffer; - COPY_TEMPLATE(buffer, checkCountAndRecompile, size); - char *guardBlockJump = buffer; - */ - - //bumpCounterBlock - COPY_TEMPLATE(buffer, loadCounter, size); - char *counterLoader = buffer; - COPY_TEMPLATE(buffer, decrementCounter, size); - char *counterStore = buffer; - COPY_TEMPLATE(buffer, jgCount, size); - char *bumpCounterBlockJump = buffer; - - //callRecompileBlock - COPY_TEMPLATE(buffer, callRetranslateArg1, size); - char *retranslateArg1Patch = buffer; - COPY_TEMPLATE(buffer, callRetranslateArg2, size); - char *retranslateArg2Patch = buffer; - COPY_TEMPLATE(buffer, callRetranslate, size); - char *callSite = buffer; - COPY_TEMPLATE(buffer, setCounter, size); - char *counterSetter = buffer; - COPY_TEMPLATE(buffer, jmpToBody, size); - char *callRecompileBlockJump = buffer; - - //bumpCounter - char *counterLocation = buffer; - COPY_IMM4_TEMPLATE(buffer, size, initialCount); - - uintptr_t jitRetranslateCallerWithPrepHelperAddress = getHelperOrTrampolineAddress(TR_jitRetranslateCallerWithPrep, (uintptr_t)buffer); - - // Find the countForRecompile's address and patch for the guard block. - patchImm8(retranslateArg1Patch, (uintptr_t)(method)); - patchImm8(retranslateArg2Patch, (uintptr_t)(startPC)); - //patchImm8(guardBlock, (uintptr_t)(&(_persistentInfo->_countForRecompile))); - PATCH_RELATIVE_ADDR_32_BIT(callSite, jitRetranslateCallerWithPrepHelperAddress); - patchImm8(counterLoader, (uintptr_t)counterLocation); - patchImm8(counterStore, (uintptr_t)counterLocation); - patchImm8(counterSetter, (uintptr_t)counterLocation); - //PATCH_RELATIVE_ADDR_32_BIT(guardBlockJump, buffer); - PATCH_RELATIVE_ADDR_32_BIT(bumpCounterBlockJump, buffer); - PATCH_RELATIVE_ADDR_32_BIT(callRecompileBlockJump, buffer); - - return size; -} + char *buffer, + int32_t initialCount, + J9Method *method, + uintptr_t startPC) + { + // see TR_J9ByteCodeIlGenerator::prependGuardedCountForRecompilation(TR::Block * originalFirstBlock) + // guardBlock: if (!countForRecompile) goto originalFirstBlock; + // bumpCounterBlock: count--; + // if (bodyInfo->count > 0) goto originalFirstBlock; + // callRecompileBlock: call jitRetranslateCallerWithPreparation(j9method, startPC); + // bodyInfo->count=10000 + // goto originalFirstBlock + // bumpCounter: count + // originalFirstBlock: ... 
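// For orientation: this function, like most of the code generator, relies on an
// emit-then-patch idiom. COPY_TEMPLATE appends a pre-assembled template and
// advances `buffer` past it, so the patchImmN helpers above can write an
// immediate into the bytes that were just emitted, and PATCH_RELATIVE_ADDR_32_BIT
// can fill in a jump displacement once its target exists. A condensed sketch of
// the counter handling below (control flow taken from this function, comments added):
//
//     COPY_TEMPLATE(buffer, loadCounter, size);              // template ends in an address placeholder
//     char *counterLoader = buffer;                          // remember where to back-patch
//     /* ... emit decrement, branch, and retranslate call ... */
//     char *counterLocation = buffer;                        // data slot holding the counter itself
//     COPY_IMM4_TEMPLATE(buffer, size, initialCount);        // reserve 4 bytes and write initialCount
//     patchImm8(counterLoader, (uintptr_t)counterLocation);  // back-patch now that the address is known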
+ // + buffer_size_t size = 0; + + // MicroJIT either needs its own guard, or it needs to not have a guard block + // The following is kept here so that if a new guard it + /* + COPY_TEMPLATE(buffer, moveCountAndRecompile, size); + char *guardBlock = buffer; + COPY_TEMPLATE(buffer, checkCountAndRecompile, size); + char *guardBlockJump = buffer; + */ + + // bumpCounterBlock + COPY_TEMPLATE(buffer, loadCounter, size); + char *counterLoader = buffer; + COPY_TEMPLATE(buffer, decrementCounter, size); + char *counterStore = buffer; + COPY_TEMPLATE(buffer, jgCount, size); + char *bumpCounterBlockJump = buffer; + + // callRecompileBlock + COPY_TEMPLATE(buffer, callRetranslateArg1, size); + char *retranslateArg1Patch = buffer; + COPY_TEMPLATE(buffer, callRetranslateArg2, size); + char *retranslateArg2Patch = buffer; + COPY_TEMPLATE(buffer, callRetranslate, size); + char *callSite = buffer; + COPY_TEMPLATE(buffer, setCounter, size); + char *counterSetter = buffer; + COPY_TEMPLATE(buffer, jmpToBody, size); + char *callRecompileBlockJump = buffer; + + // bumpCounter + char *counterLocation = buffer; + COPY_IMM4_TEMPLATE(buffer, size, initialCount); + + uintptr_t jitRetranslateCallerWithPrepHelperAddress = getHelperOrTrampolineAddress(TR_jitRetranslateCallerWithPrep, (uintptr_t)buffer); + + // Find the countForRecompile's address and patch for the guard block. + patchImm8(retranslateArg1Patch, (uintptr_t)(method)); + patchImm8(retranslateArg2Patch, (uintptr_t)(startPC)); + // patchImm8(guardBlock, (uintptr_t)(&(_persistentInfo->_countForRecompile))); + PATCH_RELATIVE_ADDR_32_BIT(callSite, jitRetranslateCallerWithPrepHelperAddress); + patchImm8(counterLoader, (uintptr_t)counterLocation); + patchImm8(counterStore, (uintptr_t)counterLocation); + patchImm8(counterSetter, (uintptr_t)counterLocation); + // PATCH_RELATIVE_ADDR_32_BIT(guardBlockJump, buffer); + PATCH_RELATIVE_ADDR_32_BIT(bumpCounterBlockJump, buffer); + PATCH_RELATIVE_ADDR_32_BIT(callRecompileBlockJump, buffer); + + return size; + } buffer_size_t MJIT::CodeGenerator::saveArgsInLocalArray( - char *buffer, - buffer_size_t stack_alloc_space, - char *typeString -){ - buffer_size_t saveSize = 0; - U_16 slot = 0; - TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; - char typeChar; - int index = 0; - ParamTableEntry entry; - while(index<_paramTable->getParamCount()){ - MJIT_ASSERT(_logFileFP, _paramTable->getEntry(index, &entry), "Bad index for table entry"); - if(entry.notInitialized) { - index++; - continue; - } - if(entry.onStack) break; - int32_t regNum = entry.regNo; - switch(regNum){ - case TR::RealRegister::xmm0: - COPY_TEMPLATE(buffer, saveXMM0Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm1: - COPY_TEMPLATE(buffer, saveXMM1Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm2: - COPY_TEMPLATE(buffer, saveXMM2Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm3: - COPY_TEMPLATE(buffer, saveXMM3Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm4: - COPY_TEMPLATE(buffer, saveXMM4Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm5: - COPY_TEMPLATE(buffer, saveXMM5Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm6: - COPY_TEMPLATE(buffer, saveXMM6Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::xmm7: - COPY_TEMPLATE(buffer, saveXMM7Local, saveSize); - goto PatchAndBreak; - case TR::RealRegister::eax: - COPY_TEMPLATE(buffer, saveRAXLocal, saveSize); - goto PatchAndBreak; - case TR::RealRegister::esi: - 
COPY_TEMPLATE(buffer, saveRSILocal, saveSize); - goto PatchAndBreak; - case TR::RealRegister::edx: - COPY_TEMPLATE(buffer, saveRDXLocal, saveSize); - goto PatchAndBreak; - case TR::RealRegister::ecx: - COPY_TEMPLATE(buffer, saveRCXLocal, saveSize); - goto PatchAndBreak; - PatchAndBreak: - patchImm4(buffer, (U_32)((slot*8) & BIT_MASK_32)); - break; - case TR::RealRegister::NoReg: - break; - } - slot += entry.slots; - index += entry.slots; - } - - while(index<_paramTable->getParamCount()){ - MJIT_ASSERT(_logFileFP, _paramTable->getEntry(index, &entry), "Bad index for table entry"); - if(entry.notInitialized) { - index++; - continue; - } - uintptr_t offset = entry.offset; - //Our linkage templates for loading from the stack assume that the offset will always be less than 0xff. - // This is however not always true for valid code. If this becomes a problem in the future we will - // have to add support for larger offsets by creating new templates that use 2 byte offsets from rsp. - // Whether that will be a change to the current tempaltes, or new templates with supporting code - // is a decision yet to be determined. - if((offset + stack_alloc_space) < uintptr_t(0xff)){ - if(_comp->getOption(TR_TraceCG)){ - trfprintf(_logFileFP, "Offset too large, add support for larger offsets"); - } - return -1; - } - COPY_TEMPLATE(buffer, movRSPOffsetR11, saveSize); - patchImm1(buffer, (U_8)(offset+stack_alloc_space)); - COPY_TEMPLATE(buffer, saveR11Local, saveSize); - patchImm4(buffer, (U_32)((slot*8) & BIT_MASK_32)); - slot += entry.slots; - index += entry.slots; - } - return saveSize; -} + char *buffer, + buffer_size_t stack_alloc_space, + char *typeString) + { + buffer_size_t saveSize = 0; + U_16 slot = 0; + TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; + char typeChar; + int index = 0; + ParamTableEntry entry; + while (index<_paramTable->getParamCount()) + { + MJIT_ASSERT(_logFileFP, _paramTable->getEntry(index, &entry), "Bad index for table entry"); + if (entry.notInitialized) + { + index++; + continue; + } + if (entry.onStack) + break; + int32_t regNum = entry.regNo; + switch (regNum) + { + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, saveXMM0Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm1: + COPY_TEMPLATE(buffer, saveXMM1Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm2: + COPY_TEMPLATE(buffer, saveXMM2Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm3: + COPY_TEMPLATE(buffer, saveXMM3Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm4: + COPY_TEMPLATE(buffer, saveXMM4Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm5: + COPY_TEMPLATE(buffer, saveXMM5Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm6: + COPY_TEMPLATE(buffer, saveXMM6Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::xmm7: + COPY_TEMPLATE(buffer, saveXMM7Local, saveSize); + goto PatchAndBreak; + case TR::RealRegister::eax: + COPY_TEMPLATE(buffer, saveRAXLocal, saveSize); + goto PatchAndBreak; + case TR::RealRegister::esi: + COPY_TEMPLATE(buffer, saveRSILocal, saveSize); + goto PatchAndBreak; + case TR::RealRegister::edx: + COPY_TEMPLATE(buffer, saveRDXLocal, saveSize); + goto PatchAndBreak; + case TR::RealRegister::ecx: + COPY_TEMPLATE(buffer, saveRCXLocal, saveSize); + goto PatchAndBreak; + PatchAndBreak: + patchImm4(buffer, (U_32)((slot*8) & BIT_MASK_32)); + break; + case TR::RealRegister::NoReg: + break; + } + slot += entry.slots; + index += entry.slots; + } + + while 
(index<_paramTable->getParamCount()) + { + MJIT_ASSERT(_logFileFP, _paramTable->getEntry(index, &entry), "Bad index for table entry"); + if (entry.notInitialized) + { + index++; + continue; + } + uintptr_t offset = entry.offset; + // Our linkage templates for loading from the stack assume that the offset will always be less than 0xff. + // This is however not always true for valid code. If this becomes a problem in the future we will + // have to add support for larger offsets by creating new templates that use 2 byte offsets from rsp. + // Whether that will be a change to the current templates, or new templates with supporting code + // is a decision yet to be determined. + if ((offset + stack_alloc_space) < uintptr_t(0xff)) + { + if (_comp->getOption(TR_TraceCG)) + trfprintf(_logFileFP, "Offset too large, add support for larger offsets"); + return -1; + } + COPY_TEMPLATE(buffer, movRSPOffsetR11, saveSize); + patchImm1(buffer, (U_8)(offset+stack_alloc_space)); + COPY_TEMPLATE(buffer, saveR11Local, saveSize); + patchImm4(buffer, (U_32)((slot*8) & BIT_MASK_32)); + slot += entry.slots; + index += entry.slots; + } + return saveSize; + } buffer_size_t MJIT::CodeGenerator::saveArgsOnStack( - char *buffer, - buffer_size_t stack_alloc_space, - ParamTable *paramTable -){ - buffer_size_t saveArgsSize = 0; - U_16 offset = paramTable->getTotalParamSize(); - TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; - ParamTableEntry entry; - for(int i=0; igetParamCount();){ - MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); - if(entry.notInitialized) { - i++; - continue; - } - if(entry.onStack) + char *buffer, + buffer_size_t stack_alloc_space, + ParamTable *paramTable) + { + buffer_size_t saveArgsSize = 0; + U_16 offset = paramTable->getTotalParamSize(); + TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; + ParamTableEntry entry; + for (int i=0; igetParamCount();) + { + MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); + if (entry.notInitialized) + { + i++; + continue; + } + if (entry.onStack) + break; + int32_t regNum = entry.regNo; + if (i != 0) + offset -= entry.size; + switch (regNum) + { + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, saveXMM0Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); break; - int32_t regNum = entry.regNo; - if(i != 0) - offset -= entry.size; - switch(regNum){ - case TR::RealRegister::xmm0: - COPY_TEMPLATE(buffer, saveXMM0Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm1: - COPY_TEMPLATE(buffer, saveXMM1Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm2: - COPY_TEMPLATE(buffer, saveXMM2Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm3: - COPY_TEMPLATE(buffer, saveXMM3Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm4: - COPY_TEMPLATE(buffer, saveXMM4Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm5: - COPY_TEMPLATE(buffer, saveXMM5Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm6: - COPY_TEMPLATE(buffer, saveXMM6Offset, saveArgsSize); - 
patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm7: - COPY_TEMPLATE(buffer, saveXMM7Offset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::eax: - COPY_TEMPLATE(buffer, saveRAXOffset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::esi: - COPY_TEMPLATE(buffer, saveRSIOffset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::edx: - COPY_TEMPLATE(buffer, saveRDXOffset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::ecx: - COPY_TEMPLATE(buffer, saveRCXOffset, saveArgsSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::NoReg: - break; - } - i += entry.slots; - } - return saveArgsSize; -} + case TR::RealRegister::xmm1: + COPY_TEMPLATE(buffer, saveXMM1Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm2: + COPY_TEMPLATE(buffer, saveXMM2Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm3: + COPY_TEMPLATE(buffer, saveXMM3Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm4: + COPY_TEMPLATE(buffer, saveXMM4Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm5: + COPY_TEMPLATE(buffer, saveXMM5Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm6: + COPY_TEMPLATE(buffer, saveXMM6Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm7: + COPY_TEMPLATE(buffer, saveXMM7Offset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::eax: + COPY_TEMPLATE(buffer, saveRAXOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::esi: + COPY_TEMPLATE(buffer, saveRSIOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::edx: + COPY_TEMPLATE(buffer, saveRDXOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::ecx: + COPY_TEMPLATE(buffer, saveRCXOffset, saveArgsSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::NoReg: + break; + } + i += entry.slots; + } + return saveArgsSize; + } buffer_size_t MJIT::CodeGenerator::loadArgsFromStack( - char *buffer, - buffer_size_t stack_alloc_space, - ParamTable *paramTable -){ - U_16 offset = paramTable->getTotalParamSize(); - buffer_size_t argLoadSize = 0; - TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; - ParamTableEntry entry; - for(int i=0; igetParamCount();){ - MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); - if(entry.notInitialized) { - i++; - continue; - } - if(entry.onStack) + char *buffer, + buffer_size_t stack_alloc_space, + ParamTable *paramTable) + { + U_16 offset = paramTable->getTotalParamSize(); + buffer_size_t 
argLoadSize = 0; + TR::RealRegister::RegNum regNum = TR::RealRegister::NoReg; + ParamTableEntry entry; + for (int i=0; igetParamCount();) + { + MJIT_ASSERT(_logFileFP, paramTable->getEntry(i, &entry), "Bad index for table entry"); + if (entry.notInitialized) + { + i++; + continue; + } + if (entry.onStack) + break; + int32_t regNum = entry.regNo; + if (i != 0) + offset -= entry.size; + switch(regNum) + { + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, loadXMM0Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); break; - int32_t regNum = entry.regNo; - if(i != 0) - offset -= entry.size; - switch(regNum){ - case TR::RealRegister::xmm0: - COPY_TEMPLATE(buffer, loadXMM0Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm1: - COPY_TEMPLATE(buffer, loadXMM1Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm2: - COPY_TEMPLATE(buffer, loadXMM2Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm3: - COPY_TEMPLATE(buffer, loadXMM3Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm4: - COPY_TEMPLATE(buffer, loadXMM4Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm5: - COPY_TEMPLATE(buffer, loadXMM5Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm6: - COPY_TEMPLATE(buffer, loadXMM6Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::xmm7: - COPY_TEMPLATE(buffer, loadXMM7Offset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::eax: - COPY_TEMPLATE(buffer, loadRAXOffset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::esi: - COPY_TEMPLATE(buffer, loadRSIOffset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::edx: - COPY_TEMPLATE(buffer, loadRDXOffset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::ecx: - COPY_TEMPLATE(buffer, loadRCXOffset, argLoadSize); - patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); - break; - case TR::RealRegister::NoReg: - break; - } - i += entry.slots; - } - return argLoadSize; -} + case TR::RealRegister::xmm1: + COPY_TEMPLATE(buffer, loadXMM1Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm2: + COPY_TEMPLATE(buffer, loadXMM2Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm3: + COPY_TEMPLATE(buffer, loadXMM3Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm4: + COPY_TEMPLATE(buffer, loadXMM4Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm5: + COPY_TEMPLATE(buffer, loadXMM5Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case 
TR::RealRegister::xmm6: + COPY_TEMPLATE(buffer, loadXMM6Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::xmm7: + COPY_TEMPLATE(buffer, loadXMM7Offset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::eax: + COPY_TEMPLATE(buffer, loadRAXOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::esi: + COPY_TEMPLATE(buffer, loadRSIOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::edx: + COPY_TEMPLATE(buffer, loadRDXOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::ecx: + COPY_TEMPLATE(buffer, loadRCXOffset, argLoadSize); + patchImm4(buffer, (U_32)((offset+ stack_alloc_space) & 0xffffffff)); + break; + case TR::RealRegister::NoReg: + break; + } + i += entry.slots; + } + return argLoadSize; + } MJIT::CodeGenerator::CodeGenerator( - struct J9JITConfig* config, - J9VMThread* thread, - TR::FilePointer* fp, - TR_J9VMBase& vm, - ParamTable* paramTable, - TR::Compilation* comp, - MJIT::CodeGenGC* mjitCGGC, - TR::PersistentInfo* persistentInfo, - TR_Memory *trMemory, - TR_ResolvedMethod *compilee) - :_linkage(Linkage(comp->cg())) - ,_logFileFP(fp) - ,_vm(vm) - ,_codeCache(NULL) - ,_stackPeakSize(0) - ,_paramTable(paramTable) - ,_comp(comp) - ,_mjitCGGC(mjitCGGC) - ,_atlas(NULL) - ,_persistentInfo(persistentInfo) - ,_trMemory(trMemory) - ,_compilee(compilee) - ,_maxCalleeArgsSize(0) -{ - _linkage._properties.setOutgoingArgAlignment(lcm(16, _vm.getLocalObjectAlignmentInBytes())); -} - -TR::GCStackAtlas* + struct J9JITConfig *config, + J9VMThread *thread, + TR::FilePointer *fp, + TR_J9VMBase& vm, + ParamTable *paramTable, + TR::Compilation *comp, + MJIT::CodeGenGC *mjitCGGC, + TR::PersistentInfo *persistentInfo, + TR_Memory *trMemory, + TR_ResolvedMethod *compilee) + :_linkage(Linkage(comp->cg())) + ,_logFileFP(fp) + ,_vm(vm) + ,_codeCache(NULL) + ,_stackPeakSize(0) + ,_paramTable(paramTable) + ,_comp(comp) + ,_mjitCGGC(mjitCGGC) + ,_atlas(NULL) + ,_persistentInfo(persistentInfo) + ,_trMemory(trMemory) + ,_compilee(compilee) + ,_maxCalleeArgsSize(0) + { + _linkage._properties.setOutgoingArgAlignment(lcm(16, _vm.getLocalObjectAlignmentInBytes())); + } + +TR::GCStackAtlas * MJIT::CodeGenerator::getStackAtlas() -{ - return _atlas; -} + { + return _atlas; + } buffer_size_t MJIT::CodeGenerator::generateSwitchToInterpPrePrologue( - char *buffer, - J9Method *method, - buffer_size_t boundary, - buffer_size_t margin, - char *typeString, - U_16 maxLength -){ - - uintptr_t startOfMethod = (uintptr_t)buffer; - buffer_size_t switchSize = 0; - - // Store the J9Method address in edi and store the label of this jitted method. 
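// The switch-to-interpreter pre-prologue reduces to three steps: load the
// J9Method into RDI, spill any register-passed arguments back to their caller
// stack slots (RAX/RSI/RDX/RCX and XMM0-XMM7, per the switch in saveArgsOnStack),
// and jump to the j2iTransition helper. A condensed sketch of the emitted
// sequence, using the template names from this file and omitting the
// `buffer +=` bookkeeping the real code performs:
//
//     COPY_TEMPLATE(buffer, movRDIImm64, switchSize);        // mov rdi, imm64
//     patchImm8(buffer, (uintptr_t)method);                  // imm64 = the J9Method*
//     switchSize += saveArgsOnStack(buffer, 0, _paramTable); // registers -> argument slots
//     COPY_TEMPLATE(buffer, jump4ByteRel, switchSize);       // jmp rel32
//     PATCH_RELATIVE_ADDR_32_BIT(buffer, j2iTransitionPtr);  // rel32 -> j2iTransition helper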
- uintptr_t j2iTransitionPtr = getHelperOrTrampolineAddress(TR_j2iTransition, (uintptr_t)buffer); - uintptr_t methodPtr = (uintptr_t)method; - COPY_TEMPLATE(buffer, movRDIImm64, switchSize); - patchImm8(buffer, methodPtr); - - //if (comp->getOption(TR_EnableHCR)) - // comp->getStaticHCRPICSites()->push_front(prev); - - // Pass args on stack, expects the method to run in RDI - buffer_size_t saveArgSize = saveArgsOnStack(buffer, 0, _paramTable); - buffer += saveArgSize; - switchSize += saveArgSize; - - // Generate jump to the TR_i2jTransition function - COPY_TEMPLATE(buffer, jump4ByteRel, switchSize); - PATCH_RELATIVE_ADDR_32_BIT(buffer, j2iTransitionPtr); - - margin += 2; - uintptr_t requiredAlignment = 0; - if(getRequiredAlignment((uintptr_t)buffer, boundary, margin, &requiredAlignment)){ - buffer = (char*)startOfMethod; - return 0; - } - for(U_8 i = 0; i<=requiredAlignment; i++){ - COPY_TEMPLATE(buffer, nopInstruction, switchSize); - } - - //Generate relative jump to start - COPY_TEMPLATE(buffer, jumpByteRel, switchSize); - PATCH_RELATIVE_ADDR_8_BIT(buffer, startOfMethod); - - return switchSize; -} + char *buffer, + J9Method *method, + buffer_size_t boundary, + buffer_size_t margin, + char *typeString, + U_16 maxLength) + { + uintptr_t startOfMethod = (uintptr_t)buffer; + buffer_size_t switchSize = 0; + + // Store the J9Method address in edi and store the label of this jitted method. + uintptr_t j2iTransitionPtr = getHelperOrTrampolineAddress(TR_j2iTransition, (uintptr_t)buffer); + uintptr_t methodPtr = (uintptr_t)method; + COPY_TEMPLATE(buffer, movRDIImm64, switchSize); + patchImm8(buffer, methodPtr); + + // if (comp->getOption(TR_EnableHCR)) + // comp->getStaticHCRPICSites()->push_front(prev); + + // Pass args on stack, expects the method to run in RDI + buffer_size_t saveArgSize = saveArgsOnStack(buffer, 0, _paramTable); + buffer += saveArgSize; + switchSize += saveArgSize; + + // Generate jump to the TR_i2jTransition function + COPY_TEMPLATE(buffer, jump4ByteRel, switchSize); + PATCH_RELATIVE_ADDR_32_BIT(buffer, j2iTransitionPtr); + + margin += 2; + uintptr_t requiredAlignment = 0; + if (getRequiredAlignment((uintptr_t)buffer, boundary, margin, &requiredAlignment)) + { + buffer = (char*)startOfMethod; + return 0; + } + for (U_8 i = 0; i<=requiredAlignment; i++) + COPY_TEMPLATE(buffer, nopInstruction, switchSize); + + // Generate relative jump to start + COPY_TEMPLATE(buffer, jumpByteRel, switchSize); + PATCH_RELATIVE_ADDR_8_BIT(buffer, startOfMethod); + + return switchSize; + } buffer_size_t MJIT::CodeGenerator::generatePrePrologue( - char* buffer, - J9Method* method, - char** magicWordLocation, - char** first2BytesPatchLocation, - char** samplingRecompileCallLocation -){ - // Return size (in bytes) of pre-proglue on success - - buffer_size_t preprologueSize = 0; - int alignmentMargin = 0; - int alignmentBoundary = 8; - uintptr_t endOfPreviousMethod = (uintptr_t)buffer; - - J9ROMClass *romClass = J9_CLASS_FROM_METHOD(method)->romClass; - J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); - U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); - // Not sure whether maxLength needs to be changed here for non-static methods or not - char typeString[maxLength]; - if(nativeSignature(method, typeString) //If this had no signature, we can't compile it. - || !_comp->getRecompilationInfo() //If this has no recompilation info we won't be able to recompile, so let TR compile it later. 
- ){ - return 0; - } - - // Save area for the first two bytes of the method + jitted body info pointer + linkageinfo in margin. - alignmentMargin += MJIT_SAVE_AREA_SIZE + MJIT_JITTED_BODY_INFO_PTR_SIZE + MJIT_LINKAGE_INFO_SIZE; - - - // Make sure the startPC at least 4-byte aligned. This is important, since the VM - // depends on the alignment (it uses the low order bits as tag bits). - // - // TODO: `mustGenerateSwitchToInterpreterPrePrologue` checks that the code generator's compilation is not: - // `usesPreexistence`, - // `getOption(TR_EnableHCR)`, - // `!fej9()->isAsyncCompilation`, - // `getOption(TR_FullSpeedDebug)`, - // - if(GENERATE_SWITCH_TO_INTERP_PREPROLOGUE){ // TODO: Replace with a check that perfroms the above checks. - //generateSwitchToInterpPrePrologue will align data for use - char *old_buff = buffer; - preprologueSize += generateSwitchToInterpPrePrologue(buffer, method, alignmentBoundary, alignmentMargin, typeString, maxLength); - buffer += preprologueSize; + char *buffer, + J9Method *method, + char** magicWordLocation, + char** first2BytesPatchLocation, + char** samplingRecompileCallLocation) + { + // Return size (in bytes) of pre-prologue on success + + buffer_size_t preprologueSize = 0; + int alignmentMargin = 0; + int alignmentBoundary = 8; + uintptr_t endOfPreviousMethod = (uintptr_t)buffer; + + J9ROMClass *romClass = J9_CLASS_FROM_METHOD(method)->romClass; + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + char typeString[maxLength]; + if (nativeSignature(method, typeString) // If this had no signature, we can't compile it. + || !_comp->getRecompilationInfo()) // If this has no recompilation info we won't be able to recompile, so let TR compile it later. + { + return 0; + } + + // Save area for the first two bytes of the method + jitted body info pointer + linkageinfo in margin. + alignmentMargin += MJIT_SAVE_AREA_SIZE + MJIT_JITTED_BODY_INFO_PTR_SIZE + MJIT_LINKAGE_INFO_SIZE; + + // Make sure the startPC at least 4-byte aligned. This is important, since the VM + // depends on the alignment (it uses the low order bits as tag bits). + // + // TODO: `mustGenerateSwitchToInterpreterPrePrologue` checks that the code generator's compilation is not: + // `usesPreexistence`, + // `getOption(TR_EnableHCR)`, + // `!fej9()->isAsyncCompilation`, + // `getOption(TR_FullSpeedDebug)`, + // + if (GENERATE_SWITCH_TO_INTERP_PREPROLOGUE) + { + // TODO: Replace with a check that performs the above checks. 
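// The alignment handling in the else-branch below mirrors what
// generateSwitchToInterpPrePrologue does above: getRequiredAlignment reports how
// many padding bytes are needed (every call site treats a nonzero return as
// failure), and the gap is filled with single-byte nops. Sketch of the
// call-site pattern; getRequiredAlignment's exact contract is defined elsewhere
// in MicroJIT, so this is a reading of the call sites rather than a specification:
//
//     uintptr_t requiredAlignment = 0;
//     if (getRequiredAlignment((uintptr_t)buffer, alignmentBoundary, alignmentMargin, &requiredAlignment))
//         return 0;
//     for (U_8 i = 0; i <= requiredAlignment; i++)
//         COPY_TEMPLATE(buffer, nopInstruction, preprologueSize);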
+ // generateSwitchToInterpPrePrologue will align data for use + char *old_buff = buffer; + preprologueSize += generateSwitchToInterpPrePrologue(buffer, method, alignmentBoundary, alignmentMargin, typeString, maxLength); + buffer += preprologueSize; #ifdef MJIT_DEBUG - trfprintf(_logFileFP, "\ngenerateSwitchToInterpPrePrologue: %d\n", preprologueSize); - for(int32_t i = 0; i < preprologueSize; i++){ - trfprintf(_logFileFP, "%02x\n", ((unsigned char)old_buff[i]) & (unsigned char)0xff); - } + trfprintf(_logFileFP, "\ngenerateSwitchToInterpPrePrologue: %d\n", preprologueSize); + for (int32_t i = 0; i < preprologueSize; i++) + trfprintf(_logFileFP, "%02x\n", ((unsigned char)old_buff[i]) & (unsigned char)0xff); #endif - } else { - uintptr_t requiredAlignment = 0; - if(getRequiredAlignment((uintptr_t)buffer, alignmentBoundary, alignmentMargin, &requiredAlignment)){ - buffer = (char*)endOfPreviousMethod; - return 0; - } - for(U_8 i = 0; igetRecompilationInfo()->getJittedBodyInfo(); - bodyInfo->setUsesGCR(); - bodyInfo->setIsMJITCompiledMethod(true); - COPY_IMM8_TEMPLATE(buffer, preprologueSize, bodyInfo); - - - // Allow 4 bytes for private linkage return type info. Allocate the 4 bytes - // even if the linkage is not private, so that all the offsets are - // predictable. - uint32_t magicWord = 0x00000010; // All MicroJIT compilations are set to recompile with GCR. - switch(typeString[0]){ - case MJIT::VOID_TYPE_CHARACTER: - COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_VoidReturn); - break; - case MJIT::BYTE_TYPE_CHARACTER: - case MJIT::SHORT_TYPE_CHARACTER: - case MJIT::CHAR_TYPE_CHARACTER: - case MJIT::BOOLEAN_TYPE_CHARACTER: - case MJIT::INT_TYPE_CHARACTER: - COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_IntReturn); - break; - case MJIT::LONG_TYPE_CHARACTER: - COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_LongReturn); - break; - case MJIT::CLASSNAME_TYPE_CHARACTER: - COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_ObjectReturn); - break; - case MJIT::FLOAT_TYPE_CHARACTER: - COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_FloatXMMReturn); - break; - case MJIT::DOUBLE_TYPE_CHARACTER: - COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_DoubleXMMReturn); - break; - } - *magicWordLocation = buffer; - return preprologueSize; -} + } + else + { + uintptr_t requiredAlignment = 0; + if (getRequiredAlignment((uintptr_t)buffer, alignmentBoundary, alignmentMargin, &requiredAlignment)) + { + buffer = (char*)endOfPreviousMethod; + return 0; + } + for (U_8 i = 0; igetRecompilationInfo()->getJittedBodyInfo(); + bodyInfo->setUsesGCR(); + bodyInfo->setIsMJITCompiledMethod(true); + COPY_IMM8_TEMPLATE(buffer, preprologueSize, bodyInfo); + + // Allow 4 bytes for private linkage return type info. Allocate the 4 bytes + // even if the linkage is not private, so that all the offsets are + // predictable. + uint32_t magicWord = 0x00000010; // All MicroJIT compilations are set to recompile with GCR. 
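// A reading of how this linkage-info word is assembled, taken from the code in
// this file rather than from a spec:
//
//     uint32_t magicWord = 0x00000010;                        // per the comment above: recompile via GCR
//     magicWord |= TR_IntReturn;                              // or TR_VoidReturn / TR_LongReturn / ... (switch below)
//     /* later, in generatePrologue: */
//     magicWord |= (((uintptr_t)buffer) - prologueStart) << 16;  // distance to the end of the argument reload
//     patchImm4(magicWordLocation, magicWord);
//
// so the low half-word carries the GCR flag plus the private-linkage return
// type, and the high half-word is back-patched with a prologue offset once that
// offset is known.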
+ switch (typeString[0]) + { + case MJIT::VOID_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_VoidReturn); + break; + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::BOOLEAN_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_IntReturn); + break; + case MJIT::LONG_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_LongReturn); + break; + case MJIT::CLASSNAME_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_ObjectReturn); + break; + case MJIT::FLOAT_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_FloatXMMReturn); + break; + case MJIT::DOUBLE_TYPE_CHARACTER: + COPY_IMM4_TEMPLATE(buffer, preprologueSize, magicWord | TR_DoubleXMMReturn); + break; + } + *magicWordLocation = buffer; + return preprologueSize; + } -/** +/* * Write the prologue to the buffer and return the size written to the buffer. */ buffer_size_t MJIT::CodeGenerator::generatePrologue( - char *buffer, - J9Method *method, - char **jitStackOverflowJumpPatchLocation, - char *magicWordLocation, - char *first2BytesPatchLocation, - char *samplingRecompileCallLocation, - char **firstInstLocation, - MJIT::ByteCodeIterator *bci -){ - buffer_size_t prologueSize = 0; - uintptr_t prologueStart = (uintptr_t)buffer; - - J9ROMClass *romClass = J9_CLASS_FROM_METHOD(method)->romClass; - J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); - U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); - // Not sure whether maxLength needs to be changed here for non-static methods or not - char typeString[maxLength]; - if(nativeSignature(method, typeString)){ - return 0; - } - uintptr_t startPC = (uintptr_t)buffer; - //Same order as saving (See generateSwitchToInterpPrePrologue) - buffer_size_t loadArgSize = loadArgsFromStack(buffer, 0, _paramTable); - buffer += loadArgSize; - prologueSize += loadArgSize; - - uint32_t magicWord = *(magicWordLocation-4); //We save the end of the location for patching, not the start. - magicWord |= (((uintptr_t)buffer) - prologueStart) << 16; - patchImm4(magicWordLocation, magicWord); - - //check for stack overflow - *firstInstLocation = buffer; - COPY_TEMPLATE(buffer, cmpRspRbpDerefOffset, prologueSize); - patchImm4(buffer, (U_32)(0x50)); - patchImm2(first2BytesPatchLocation, *(uint16_t*)(*firstInstLocation)); - COPY_TEMPLATE(buffer, jbe4ByteRel, prologueSize); - *jitStackOverflowJumpPatchLocation = buffer; - - // Compute frame size - // - // allocSize: bytes to be subtracted from the stack pointer when allocating the frame - // peakSize: maximum bytes of stack this method might consume before encountering another stack check - // - struct TR::X86LinkageProperties properties = _linkage._properties; - //Max amount of data to be preserved. It is overkill, but as long as the prologue/epilogue save/load everything - // it is a workable solution for now. Later try to determine what registers need to be preserved ahead of time. - const int32_t pointerSize = properties.getPointerSize(); - U_32 preservedRegsSize = properties._numPreservedRegisters * pointerSize; - const int32_t localSize = (romMethod->argCount + romMethod->tempCount)*pointerSize; - MJIT_ASSERT(_logFileFP, localSize >= 0, "assertion failure"); - int32_t frameSize = localSize + preservedRegsSize + ( properties.getReservesOutgoingArgsInPrologue() ? 
pointerSize : 0 ); - uint32_t stackSize = frameSize + properties.getRetAddressWidth(); - uint32_t adjust = align(stackSize, properties.getOutgoingArgAlignment(), _logFileFP) - stackSize; - auto allocSize = frameSize + adjust; - - const int32_t mjitRegisterSize = (6 * pointerSize); - - // return address is allocated by call instruction - // Here we conservatively assume there is a call - _stackPeakSize = - allocSize + //space for stack frame - mjitRegisterSize +//saved registers, conservatively expecting a call - _stackPeakSize; //space for mjit value stack (set in CompilationInfoPerThreadBase::mjit) - - // Small: entire stack usage fits in STACKCHECKBUFFER, so if sp is within - // the soft limit before buying the frame, then the whole frame will fit - // within the hard limit. - // - // Medium: the additional stack required after bumping the sp fits in - // STACKCHECKBUFFER, so if sp after the bump is within the soft limit, the - // whole frame will fit within the hard limit. - // - // Large: No shortcuts. Calculate the maximum extent of stack needed and - // compare that against the soft limit. (We have to use the soft limit here - // if for no other reason than that's the one used for asyncchecks.) - // - - COPY_TEMPLATE(buffer, subRSPImm4, prologueSize); - patchImm4(buffer, _stackPeakSize); - - buffer_size_t savePreserveSize = savePreserveRegisters(buffer, _stackPeakSize-8); - buffer += savePreserveSize; - prologueSize += savePreserveSize; - - buffer_size_t saveArgSize = saveArgsOnStack(buffer, _stackPeakSize, _paramTable); - buffer += saveArgSize; - prologueSize += saveArgSize; - - /* - * MJIT stack frames must conform to TR stack frame shapes. However, so long as we are able to have our frames walked, we should be - * free to allocate more things on the stack before the required data which is used for walking. The following is the general shape of the - * initial MJIT stack frame - * +-----------------------------+ - * | ... | - * | Paramaters passed by caller | <-- The space for saving params passed in registers is also here - * | ... | - * +-----------------------------+ - * | Return Address | <-- RSP points here before we sub the _stackPeakSize - * +-----------------------------+ - * | Alignment space | - * +-----------------------------+ <-- Caller/Callee stack boundary. - * | ... | - * | Callee Preserved Registers | - * | ... | - * +-----------------------------+ - * | ... | - * | Local Variables | - * | ... | <-- R10 points here at the end of the prologue, R14 always points here. - * +-----------------------------+ - * | ... | - * | MJIT Computation Stack | - * | ... | - * +-----------------------------+ - * | ... | - * | Caller Preserved Registers | - * | ... | <-- RSP points here after we sub _stackPeakSize the first time - * +-----------------------------+ - * | ... | - * | Paramaters passed to callee | - * | ... 
| <-- RSP points here after we sub _stackPeakSize the second time - * +-----------------------------+ - * | Return Address | <-- RSP points here after a callq instruction - * +-----------------------------+ <-- End of stack frame - */ - - //Set up MJIT value stack - COPY_TEMPLATE(buffer, movRSPR10, prologueSize); - COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); - patchImm4(buffer, (U_32)(mjitRegisterSize)); - COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); - patchImm4(buffer, (U_32)(romMethod->maxStack*8)); - //TODO: Find a way to cleanly preserve this value before overriding it in the _stackAllocSize - - //Set up MJIT local array - COPY_TEMPLATE(buffer, movR10R14, prologueSize); - - // Move parameters to where the method body will expect to find them - buffer_size_t saveSize = saveArgsInLocalArray(buffer, _stackPeakSize, typeString); - if((int32_t)saveSize == -1) - return 0; - buffer += saveSize; - prologueSize += saveSize; - if (!saveSize && romMethod->argCount) return 0; - - int32_t firstLocalOffset = (preservedRegsSize+8); - // This is the number of slots, not variables. - U_16 localVariableCount = romMethod->argCount + romMethod->tempCount; - MJIT::LocalTableEntry localTableEntries[localVariableCount]; - for(int i=0; icg()->setStackFramePaddingSizeInBytes(adjust); - _comp->cg()->setFrameSizeInBytes(_stackPeakSize); - - //The Parameters are now in the local array, so we update the entries in the ParamTable with their values from the LocalTable - ParamTableEntry entry; - for(int i=0; i<_paramTable->getParamCount(); i += ((entry.notInitialized) ? 1 : entry.slots)){ - //The indexes should be the same in both tables - // because (as far as we know) the parameters - // are indexed the same as locals and parameters - localTable->getEntry(i, &entry); - _paramTable->setEntry(i, &entry); - } - - TR::GCStackAtlas *atlas = _mjitCGGC->createStackAtlas(_comp, _paramTable, localTable); - if (!atlas) return 0; - _atlas = atlas; - - //Support to paint allocated frame slots. - if (( _comp->getOption(TR_PaintAllocatedFrameSlotsDead) || _comp->getOption(TR_PaintAllocatedFrameSlotsFauxObject) ) && allocSize!=0) { - uint64_t paintValue = 0; - - //Paint the slots with deadf00d - if (_comp->getOption(TR_PaintAllocatedFrameSlotsDead)) { - paintValue = (uint64_t)CONSTANT64(0xdeadf00ddeadf00d); - } else { //Paint stack slots with a arbitrary object aligned address. 
- paintValue = ((uintptr_t) ((uintptr_t)_comp->getOptions()->getHeapBase() + (uintptr_t) 4096)); - } - - COPY_TEMPLATE(buffer, paintRegister, prologueSize); - patchImm8(buffer, paintValue); - for(int32_t i=_paramTable->getParamCount(); igetLocalCount();){ - if(entry.notInitialized){ - i++; - continue; + char *buffer, + J9Method *method, + char **jitStackOverflowJumpPatchLocation, + char *magicWordLocation, + char *first2BytesPatchLocation, + char *samplingRecompileCallLocation, + char **firstInstLocation, + MJIT::ByteCodeIterator *bci) + { + buffer_size_t prologueSize = 0; + uintptr_t prologueStart = (uintptr_t)buffer; + + J9ROMClass *romClass = J9_CLASS_FROM_METHOD(method)->romClass; + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + char typeString[maxLength]; + if (nativeSignature(method, typeString)) + return 0; + uintptr_t startPC = (uintptr_t)buffer; + // Same order as saving (See generateSwitchToInterpPrePrologue) + buffer_size_t loadArgSize = loadArgsFromStack(buffer, 0, _paramTable); + buffer += loadArgSize; + prologueSize += loadArgSize; + + uint32_t magicWord = *(magicWordLocation-4); // We save the end of the location for patching, not the start. + magicWord |= (((uintptr_t)buffer) - prologueStart) << 16; + patchImm4(magicWordLocation, magicWord); + + // check for stack overflow + *firstInstLocation = buffer; + COPY_TEMPLATE(buffer, cmpRspRbpDerefOffset, prologueSize); + patchImm4(buffer, (U_32)(0x50)); + patchImm2(first2BytesPatchLocation, *(uint16_t*)(*firstInstLocation)); + COPY_TEMPLATE(buffer, jbe4ByteRel, prologueSize); + *jitStackOverflowJumpPatchLocation = buffer; + + // Compute frame size + // + // allocSize: bytes to be subtracted from the stack pointer when allocating the frame + // peakSize: maximum bytes of stack this method might consume before encountering another stack check + // + struct TR::X86LinkageProperties properties = _linkage._properties; + // Max amount of data to be preserved. It is overkill, but as long as the prologue/epilogue save/load everything + // it is a workable solution for now. Later try to determine what registers need to be preserved ahead of time. + const int32_t pointerSize = properties.getPointerSize(); + U_32 preservedRegsSize = properties._numPreservedRegisters * pointerSize; + const int32_t localSize = (romMethod->argCount + romMethod->tempCount)*pointerSize; + MJIT_ASSERT(_logFileFP, localSize >= 0, "assertion failure"); + int32_t frameSize = localSize + preservedRegsSize + ( properties.getReservesOutgoingArgsInPrologue() ? pointerSize : 0 ); + uint32_t stackSize = frameSize + properties.getRetAddressWidth(); + uint32_t adjust = align(stackSize, properties.getOutgoingArgAlignment(), _logFileFP) - stackSize; + auto allocSize = frameSize + adjust; + + const int32_t mjitRegisterSize = (6 * pointerSize); + + // return address is allocated by call instruction + // Here we conservatively assume there is a call + _stackPeakSize = + allocSize + // space for stack frame + mjitRegisterSize + // saved registers, conservatively expecting a call + _stackPeakSize; // space for mjit value stack (set in CompilationInfoPerThreadBase::mjit) + + // Small: entire stack usage fits in STACKCHECKBUFFER, so if sp is within + // the soft limit before buying the frame, then the whole frame will fit + // within the hard limit. 
+ // + // Medium: the additional stack required after bumping the sp fits in + // STACKCHECKBUFFER, so if sp after the bump is within the soft limit, the + // whole frame will fit within the hard limit. + // + // Large: No shortcuts. Calculate the maximum extent of stack needed and + // compare that against the soft limit. (We have to use the soft limit here + // if for no other reason than that's the one used for asyncchecks.) + + COPY_TEMPLATE(buffer, subRSPImm4, prologueSize); + patchImm4(buffer, _stackPeakSize); + + buffer_size_t savePreserveSize = savePreserveRegisters(buffer, _stackPeakSize-8); + buffer += savePreserveSize; + prologueSize += savePreserveSize; + + buffer_size_t saveArgSize = saveArgsOnStack(buffer, _stackPeakSize, _paramTable); + buffer += saveArgSize; + prologueSize += saveArgSize; + + /* + * MJIT stack frames must conform to TR stack frame shapes. However, so long as we are able to have our frames walked, we should be + * free to allocate more things on the stack before the required data which is used for walking. The following is the general shape of the + * initial MJIT stack frame + * +-----------------------------+ + * | ... | + * | Paramaters passed by caller | <-- The space for saving params passed in registers is also here + * | ... | + * +-----------------------------+ + * | Return Address | <-- RSP points here before we sub the _stackPeakSize + * +-----------------------------+ + * | Alignment space | + * +-----------------------------+ <-- Caller/Callee stack boundary + * | ... | + * | Callee Preserved Registers | + * | ... | + * +-----------------------------+ + * | ... | + * | Local Variables | + * | ... | <-- R10 points here at the end of the prologue, R14 always points here + * +-----------------------------+ + * | ... | + * | MJIT Computation Stack | + * | ... | + * +-----------------------------+ + * | ... | + * | Caller Preserved Registers | + * | ... | <-- RSP points here after we sub _stackPeakSize the first time + * +-----------------------------+ + * | ... | + * | Paramaters passed to callee | + * | ... | <-- RSP points here after we sub _stackPeakSize the second time + * +-----------------------------+ + * | Return Address | <-- RSP points here after a callq instruction + * +-----------------------------+ <-- End of stack frame + */ + + // Set up MJIT value stack + COPY_TEMPLATE(buffer, movRSPR10, prologueSize); + COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); + patchImm4(buffer, (U_32)(mjitRegisterSize)); + COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); + patchImm4(buffer, (U_32)(romMethod->maxStack*8)); + // TODO: Find a way to cleanly preserve this value before overriding it in the _stackAllocSize + + // Set up MJIT local array + COPY_TEMPLATE(buffer, movR10R14, prologueSize); + + // Move parameters to where the method body will expect to find them + buffer_size_t saveSize = saveArgsInLocalArray(buffer, _stackPeakSize, typeString); + if ((int32_t)saveSize == -1) + return 0; + buffer += saveSize; + prologueSize += saveSize; + if (!saveSize && romMethod->argCount) + return 0; + + int32_t firstLocalOffset = (preservedRegsSize+8); + // This is the number of slots, not variables. 
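// Two-slot types are what make the distinction matter: a long or double
// occupies two 8-byte slots in the local array (the follow-on slot's table
// entry is the one skipped as notInitialized in the loops above), so a
// (long, int, double) parameter list takes five slots for three variables.
// Locals are addressed off the local-array base kept in R14 (see the frame
// diagram above); the saveXXXLocal templates used by saveArgsInLocalArray
// appear to store a register at a fixed displacement from that base, along
// the lines of
//
//     mov qword ptr [r14 + slot*8], rax     ; displacement patched in via patchImm4(buffer, slot*8)
//
// with the actual encodings living in the MicroJIT template definitions.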
+ U_16 localVariableCount = romMethod->argCount + romMethod->tempCount; + MJIT::LocalTableEntry localTableEntries[localVariableCount]; + for (int i=0; icg()->setStackFramePaddingSizeInBytes(adjust); + _comp->cg()->setFrameSizeInBytes(_stackPeakSize); + + // The Parameters are now in the local array, so we update the entries in the ParamTable with their values from the LocalTable + ParamTableEntry entry; + for (int i=0; i<_paramTable->getParamCount(); i += ((entry.notInitialized) ? 1 : entry.slots)) + { + // The indexes should be the same in both tables + // because (as far as we know) the parameters + // are indexed the same as locals and parameters + localTable->getEntry(i, &entry); + _paramTable->setEntry(i, &entry); + } + + TR::GCStackAtlas *atlas = _mjitCGGC->createStackAtlas(_comp, _paramTable, localTable); + if (!atlas) + return 0; + _atlas = atlas; + + // Support to paint allocated frame slots. + if (( _comp->getOption(TR_PaintAllocatedFrameSlotsDead) || _comp->getOption(TR_PaintAllocatedFrameSlotsFauxObject) ) && allocSize!=0) + { + uint64_t paintValue = 0; + + // Paint the slots with deadf00d + if (_comp->getOption(TR_PaintAllocatedFrameSlotsDead)) + paintValue = (uint64_t)CONSTANT64(0xdeadf00ddeadf00d); + else //Paint stack slots with a arbitrary object aligned address. + paintValue = ((uintptr_t) ((uintptr_t)_comp->getOptions()->getHeapBase() + (uintptr_t) 4096)); + + COPY_TEMPLATE(buffer, paintRegister, prologueSize); + patchImm8(buffer, paintValue); + for (int32_t i=_paramTable->getParamCount(); igetLocalCount();) + { + if (entry.notInitialized) + { + i++; + continue; } - localTable->getEntry(i, &entry); - COPY_TEMPLATE(buffer, paintLocal, prologueSize); - patchImm4(buffer, i * pointerSize); - i += entry.slots; - } - } - uint32_t count = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; - prologueSize += generateGCR(buffer, count, method, startPC); - - return prologueSize; -} + localTable->getEntry(i, &entry); + COPY_TEMPLATE(buffer, paintLocal, prologueSize); + patchImm4(buffer, i * pointerSize); + i += entry.slots; + } + } + uint32_t count = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? 
TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; + prologueSize += generateGCR(buffer, count, method, startPC); + + return prologueSize; + } buffer_size_t -MJIT::CodeGenerator::generateEpologue(char *buffer){ - buffer_size_t epologueSize = loadPreserveRegisters(buffer, _stackPeakSize-8); - buffer += epologueSize; - COPY_TEMPLATE(buffer, addRSPImm4, epologueSize); - patchImm4(buffer, _stackPeakSize - _maxCalleeArgsSize); - return epologueSize; -} +MJIT::CodeGenerator::generateEpologue(char *buffer) + { + buffer_size_t epologueSize = loadPreserveRegisters(buffer, _stackPeakSize-8); + buffer += epologueSize; + COPY_TEMPLATE(buffer, addRSPImm4, epologueSize); + patchImm4(buffer, _stackPeakSize - _maxCalleeArgsSize); + return epologueSize; + } // Macros to clean up the switch case for generate body -#define loadCasePrologue(byteCodeCaseType)\ -case TR_J9ByteCode::byteCodeCaseType -#define loadCaseBody(byteCodeCaseType)\ - goto GenericLoadCall -#define storeCasePrologue(byteCodeCaseType)\ -case TR_J9ByteCode::byteCodeCaseType -#define storeCaseBody(byteCodeCaseType)\ - goto GenericStoreCall +#define loadCasePrologue(byteCodeCaseType) \ + case TR_J9ByteCode::byteCodeCaseType +#define loadCaseBody(byteCodeCaseType) \ + goto GenericLoadCall +#define storeCasePrologue(byteCodeCaseType) \ + case TR_J9ByteCode::byteCodeCaseType +#define storeCaseBody(byteCodeCaseType) \ + goto GenericStoreCall // used for each conditional and jump except gotow (wide) -#define branchByteCode(byteCode, template)\ -case byteCode: {\ - _targetIndex = bci->next2BytesSigned() + bci->currentByteCodeIndex();\ - COPY_TEMPLATE(buffer, template, codeGenSize);\ - MJIT::JumpTableEntry _entry(_targetIndex, buffer);\ - _jumpTable[_targetCounter] = _entry;\ - _targetCounter++;\ - break;\ - } +#define branchByteCode(byteCode, template) \ + case byteCode: \ + { \ + _targetIndex = bci->next2BytesSigned() + bci->currentByteCodeIndex(); \ + COPY_TEMPLATE(buffer, template, codeGenSize); \ + MJIT::JumpTableEntry _entry(_targetIndex, buffer); \ + _jumpTable[_targetCounter] = _entry; \ + _targetCounter++; \ + break; \ + } + buffer_size_t -MJIT::CodeGenerator::generateBody(char *buffer, MJIT::ByteCodeIterator *bci){ - buffer_size_t codeGenSize = 0; - buffer_size_t calledCGSize = 0; - int32_t _currentByteCodeIndex = 0; - int32_t _targetIndex = 0; - int32_t _targetCounter = 0; +MJIT::CodeGenerator::generateBody(char *buffer, MJIT::ByteCodeIterator *bci) + { + buffer_size_t codeGenSize = 0; + buffer_size_t calledCGSize = 0; + int32_t _currentByteCodeIndex = 0; + int32_t _targetIndex = 0; + int32_t _targetCounter = 0; #ifdef MJIT_DEBUG_BC_WALKING - std::string _bcMnemonic, _bcText; - u_int8_t _bcOpcode; + std::string _bcMnemonic, _bcText; + u_int8_t _bcOpcode; #endif - uintptr_t SSEfloatRemainder; - uintptr_t SSEdoubleRemainder; + uintptr_t SSEfloatRemainder; + uintptr_t SSEdoubleRemainder; - // Map between ByteCode Index to generated JITed code address - char* _byteCodeIndexToJitTable[bci->maxByteCodeIndex()]; - - // Each time we encounter a branch or goto - // add an entry. After codegen completes use - // these entries to patch addresses using _byteCodeIndexToJitTable. - MJIT::JumpTableEntry _jumpTable[bci->maxByteCodeIndex()]; + // Map between ByteCode Index to generated JITed code address + char *_byteCodeIndexToJitTable[bci->maxByteCodeIndex()]; + // Each time we encounter a branch or goto + // add an entry. After codegen completes use + // these entries to patch addresses using _byteCodeIndexToJitTable. 
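// Branch fixup is two-pass: the main loop records, for every bytecode index,
// the JIT address where its code begins (_byteCodeIndexToJitTable), and each
// conditional or goto records its target bytecode index together with the end
// of its emitted jump (the branchByteCode macro above fills _jumpTable, declared
// just below). Once the walk finishes, every recorded jump can be resolved
// roughly as follows; the member names are placeholders, since JumpTableEntry
// itself is defined in the MicroJIT headers:
//
//     for (int j = 0; j < _targetCounter; j++)
//         PATCH_RELATIVE_ADDR_32_BIT(
//             _jumpTable[j].patchSite,                               // end of the emitted branch
//             _byteCodeIndexToJitTable[_jumpTable[j].targetIndex]);  // JIT address of the target bytecode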
+ MJIT::JumpTableEntry _jumpTable[bci->maxByteCodeIndex()]; #ifdef MJIT_DEBUG_BC_WALKING - MJIT_DEBUG_BC_LOG(_logFileFP, "\nMicroJIT Bytecode Dump:\n"); - bci->printByteCodes(); + MJIT_DEBUG_BC_LOG(_logFileFP, "\nMicroJIT Bytecode Dump:\n"); + bci->printByteCodes(); #endif - struct TR::X86LinkageProperties *properties = &_linkage._properties; - TR::RealRegister::RegNum returnRegIndex; - U_16 slotCountRem = 0; - bool isDouble = false; - for(TR_J9ByteCode bc = bci->first(); bc != J9BCunknown; bc = bci->next()) - { - + struct TR::X86LinkageProperties *properties = &_linkage._properties; + TR::RealRegister::RegNum returnRegIndex; + U_16 slotCountRem = 0; + bool isDouble = false; + for (TR_J9ByteCode bc = bci->first(); bc != J9BCunknown; bc = bci->next()) + { #ifdef MJIT_DEBUG_BC_WALKING - // In cases of conjunction of two bytecodes, the _bcMnemonic needs to be replaced by correct bytecodes, - // i.e., JBaload0getfield needs to be replaced by JBaload0 - // and JBnewdup by JBnew - if(!strcmp(bci->currentMnemonic(), "JBaload0getfield")) - { - const char * newMnemonic = "JBaload0"; - _bcMnemonic = std::string(newMnemonic); - } - else if(!strcmp(bci->currentMnemonic(), "JBnewdup")) - { - const char * newMnemonic = "JBnew"; - _bcMnemonic = std::string(newMnemonic); - } - else - { - _bcMnemonic = std::string(bci->currentMnemonic()); - } - _bcOpcode = bci->currentOpcode(); - _bcText = _bcMnemonic + "\n"; - MJIT_DEBUG_BC_LOG(_logFileFP, _bcText.c_str()); + // In cases of conjunction of two bytecodes, the _bcMnemonic needs to be replaced by correct bytecodes, + // i.e., JBaload0getfield needs to be replaced by JBaload0 + // and JBnewdup by JBnew + if (!strcmp(bci->currentMnemonic(), "JBaload0getfield")) + { + const char * newMnemonic = "JBaload0"; + _bcMnemonic = std::string(newMnemonic); + } + else if (!strcmp(bci->currentMnemonic(), "JBnewdup")) + { + const char * newMnemonic = "JBnew"; + _bcMnemonic = std::string(newMnemonic); + } + else + { + _bcMnemonic = std::string(bci->currentMnemonic()); + } + _bcOpcode = bci->currentOpcode(); + _bcText = _bcMnemonic + "\n"; + MJIT_DEBUG_BC_LOG(_logFileFP, _bcText.c_str()); #endif - _currentByteCodeIndex = bci->currentByteCodeIndex(); + _currentByteCodeIndex = bci->currentByteCodeIndex(); - // record BCI to JIT address - _byteCodeIndexToJitTable[_currentByteCodeIndex] = buffer; + // record BCI to JIT address + _byteCodeIndexToJitTable[_currentByteCodeIndex] = buffer; - switch(bc) { - loadCasePrologue(J9BCiload0): + switch (bc) + { + loadCasePrologue(J9BCiload0): loadCaseBody(J9BCiload0); - loadCasePrologue(J9BCiload1): + loadCasePrologue(J9BCiload1): loadCaseBody(J9BCiload1); - loadCasePrologue(J9BCiload2): + loadCasePrologue(J9BCiload2): loadCaseBody(J9BCiload2); - loadCasePrologue(J9BCiload3): + loadCasePrologue(J9BCiload3): loadCaseBody(J9BCiload3); - loadCasePrologue(J9BCiload): + loadCasePrologue(J9BCiload): loadCaseBody(J9BCiload); - loadCasePrologue(J9BClload0): + loadCasePrologue(J9BClload0): loadCaseBody(J9BClload0); - loadCasePrologue(J9BClload1): + loadCasePrologue(J9BClload1): loadCaseBody(J9BClload1); - loadCasePrologue(J9BClload2): + loadCasePrologue(J9BClload2): loadCaseBody(J9BClload2); - loadCasePrologue(J9BClload3): + loadCasePrologue(J9BClload3): loadCaseBody(J9BClload3); - loadCasePrologue(J9BClload): + loadCasePrologue(J9BClload): loadCaseBody(J9BClload); - loadCasePrologue(J9BCfload0): + loadCasePrologue(J9BCfload0): loadCaseBody(J9BCfload0); - loadCasePrologue(J9BCfload1): + loadCasePrologue(J9BCfload1): loadCaseBody(J9BCfload1); - 
loadCasePrologue(J9BCfload2): + loadCasePrologue(J9BCfload2): loadCaseBody(J9BCfload2); - loadCasePrologue(J9BCfload3): + loadCasePrologue(J9BCfload3): loadCaseBody(J9BCfload3); - loadCasePrologue(J9BCfload): + loadCasePrologue(J9BCfload): loadCaseBody(J9BCfload); - loadCasePrologue(J9BCdload0): + loadCasePrologue(J9BCdload0): loadCaseBody(J9BCdload0); - loadCasePrologue(J9BCdload1): + loadCasePrologue(J9BCdload1): loadCaseBody(J9BCdload1); - loadCasePrologue(J9BCdload2): + loadCasePrologue(J9BCdload2): loadCaseBody(J9BCdload2); - loadCasePrologue(J9BCdload3): + loadCasePrologue(J9BCdload3): loadCaseBody(J9BCdload3); - loadCasePrologue(J9BCdload): + loadCasePrologue(J9BCdload): loadCaseBody(J9BCdload); - loadCasePrologue(J9BCaload0): + loadCasePrologue(J9BCaload0): loadCaseBody(J9BCaload0); - loadCasePrologue(J9BCaload1): + loadCasePrologue(J9BCaload1): loadCaseBody(J9BCaload1); - loadCasePrologue(J9BCaload2): + loadCasePrologue(J9BCaload2): loadCaseBody(J9BCaload2); - loadCasePrologue(J9BCaload3): + loadCasePrologue(J9BCaload3): loadCaseBody(J9BCaload3); - loadCasePrologue(J9BCaload): + loadCasePrologue(J9BCaload): loadCaseBody(J9BCaload); - GenericLoadCall: - if(calledCGSize = generateLoad(buffer, bc, bci)){ - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + GenericLoadCall: + if(calledCGSize = generateLoad(buffer, bc, bci)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported load bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "Unsupported load bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - break; - storeCasePrologue(J9BCistore0): + return 0; + } + break; + storeCasePrologue(J9BCistore0): storeCaseBody(J9BCistore0); - storeCasePrologue(J9BCistore1): + storeCasePrologue(J9BCistore1): storeCaseBody(J9BCistore1); - storeCasePrologue(J9BCistore2): + storeCasePrologue(J9BCistore2): storeCaseBody(J9BCistore2); - storeCasePrologue(J9BCistore3): + storeCasePrologue(J9BCistore3): storeCaseBody(J9BCistore3); - storeCasePrologue(J9BCistore): + storeCasePrologue(J9BCistore): storeCaseBody(J9BCistore); - storeCasePrologue(J9BClstore0): + storeCasePrologue(J9BClstore0): storeCaseBody(J9BClstore0); - storeCasePrologue(J9BClstore1): + storeCasePrologue(J9BClstore1): storeCaseBody(J9BClstore1); - storeCasePrologue(J9BClstore2): + storeCasePrologue(J9BClstore2): storeCaseBody(J9BClstore2); - storeCasePrologue(J9BClstore3): + storeCasePrologue(J9BClstore3): storeCaseBody(J9BClstore3); - storeCasePrologue(J9BClstore): + storeCasePrologue(J9BClstore): storeCaseBody(J9BClstore); - storeCasePrologue(J9BCfstore0): + storeCasePrologue(J9BCfstore0): storeCaseBody(J9BCfstore0); - storeCasePrologue(J9BCfstore1): + storeCasePrologue(J9BCfstore1): storeCaseBody(J9BCfstore1); - storeCasePrologue(J9BCfstore2): + storeCasePrologue(J9BCfstore2): storeCaseBody(J9BCfstore2); - storeCasePrologue(J9BCfstore3): + storeCasePrologue(J9BCfstore3): storeCaseBody(J9BCfstore3); - storeCasePrologue(J9BCfstore): + storeCasePrologue(J9BCfstore): storeCaseBody(J9BCfstore); - storeCasePrologue(J9BCdstore0): + storeCasePrologue(J9BCdstore0): storeCaseBody(J9BCdstore0); - storeCasePrologue(J9BCdstore1): + storeCasePrologue(J9BCdstore1): storeCaseBody(J9BCdstore1); - storeCasePrologue(J9BCdstore2): + storeCasePrologue(J9BCdstore2): storeCaseBody(J9BCdstore2); - storeCasePrologue(J9BCdstore3): + storeCasePrologue(J9BCdstore3): storeCaseBody(J9BCdstore3); - 
storeCasePrologue(J9BCdstore): + storeCasePrologue(J9BCdstore): storeCaseBody(J9BCdstore); - storeCasePrologue(J9BCastore0): + storeCasePrologue(J9BCastore0): storeCaseBody(J9BCastore0); - storeCasePrologue(J9BCastore1): + storeCasePrologue(J9BCastore1): storeCaseBody(J9BCastore1); - storeCasePrologue(J9BCastore2): + storeCasePrologue(J9BCastore2): storeCaseBody(J9BCastore2); - storeCasePrologue(J9BCastore3): + storeCasePrologue(J9BCastore3): storeCaseBody(J9BCastore3); - storeCasePrologue(J9BCastore): + storeCasePrologue(J9BCastore): storeCaseBody(J9BCastore); - GenericStoreCall: - if(calledCGSize = generateStore(buffer, bc, bci)){ - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + GenericStoreCall: + if (calledCGSize = generateStore(buffer, bc, bci)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported store bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "Unsupported store bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - break; - branchByteCode(TR_J9ByteCode::J9BCgoto,gotoTemplate); - branchByteCode(TR_J9ByteCode::J9BCifne,ifneTemplate); - branchByteCode(TR_J9ByteCode::J9BCifeq,ifeqTemplate); - branchByteCode(TR_J9ByteCode::J9BCiflt,ifltTemplate); - branchByteCode(TR_J9ByteCode::J9BCifle,ifleTemplate); - branchByteCode(TR_J9ByteCode::J9BCifgt,ifgtTemplate); - branchByteCode(TR_J9ByteCode::J9BCifge,ifgeTemplate); - branchByteCode(TR_J9ByteCode::J9BCificmpeq,ificmpeqTemplate); - branchByteCode(TR_J9ByteCode::J9BCificmpne,ificmpneTemplate); - branchByteCode(TR_J9ByteCode::J9BCificmplt,ificmpltTemplate); - branchByteCode(TR_J9ByteCode::J9BCificmple,ificmpleTemplate); - branchByteCode(TR_J9ByteCode::J9BCificmpge,ificmpgeTemplate); - branchByteCode(TR_J9ByteCode::J9BCificmpgt,ificmpgtTemplate); - branchByteCode(TR_J9ByteCode::J9BCifacmpeq,ifacmpeqTemplate); - branchByteCode(TR_J9ByteCode::J9BCifacmpne,ifacmpneTemplate); - branchByteCode(TR_J9ByteCode::J9BCifnull,ifnullTemplate); - branchByteCode(TR_J9ByteCode::J9BCifnonnull,ifnonnullTemplate); - /* Commenting out as this is currently untested - case TR_J9ByteCode::J9BCgotow : - { - _targetIndex = bci->next4BytesSigned() + bci->currentByteCodeIndex(); //Copied because of 4 byte jump - COPY_TEMPLATE(buffer, gotoTemplate, codeGenSize); - MJIT::JumpTableEntry _entry(_targetIndex, buffer); - _jumpTable[_targetCounter] = _entry; - _targetCounter++; - } - break; - */ - case TR_J9ByteCode::J9BClcmp: - COPY_TEMPLATE(buffer, lcmpTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfcmpl: - COPY_TEMPLATE(buffer, fcmplTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfcmpg: - COPY_TEMPLATE(buffer, fcmpgTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdcmpl: - COPY_TEMPLATE(buffer, dcmplTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdcmpg: - COPY_TEMPLATE(buffer, dcmpgTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCpop: - COPY_TEMPLATE(buffer, popTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCpop2: - COPY_TEMPLATE(buffer, pop2Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCswap: - COPY_TEMPLATE(buffer, swapTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdup: - COPY_TEMPLATE(buffer, dupTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdupx1: - COPY_TEMPLATE(buffer, dupx1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCdupx2: - COPY_TEMPLATE(buffer, dupx2Template, codeGenSize); - break; - 
case TR_J9ByteCode::J9BCdup2: - COPY_TEMPLATE(buffer, dup2Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCdup2x1: - COPY_TEMPLATE(buffer, dup2x1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCdup2x2: - COPY_TEMPLATE(buffer, dup2x2Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCgetfield: - if(calledCGSize = generateGetField(buffer, bci)) { - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + return 0; + } + break; + branchByteCode(TR_J9ByteCode::J9BCgoto,gotoTemplate); + branchByteCode(TR_J9ByteCode::J9BCifne,ifneTemplate); + branchByteCode(TR_J9ByteCode::J9BCifeq,ifeqTemplate); + branchByteCode(TR_J9ByteCode::J9BCiflt,ifltTemplate); + branchByteCode(TR_J9ByteCode::J9BCifle,ifleTemplate); + branchByteCode(TR_J9ByteCode::J9BCifgt,ifgtTemplate); + branchByteCode(TR_J9ByteCode::J9BCifge,ifgeTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpeq,ificmpeqTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpne,ificmpneTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmplt,ificmpltTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmple,ificmpleTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpge,ificmpgeTemplate); + branchByteCode(TR_J9ByteCode::J9BCificmpgt,ificmpgtTemplate); + branchByteCode(TR_J9ByteCode::J9BCifacmpeq,ifacmpeqTemplate); + branchByteCode(TR_J9ByteCode::J9BCifacmpne,ifacmpneTemplate); + branchByteCode(TR_J9ByteCode::J9BCifnull,ifnullTemplate); + branchByteCode(TR_J9ByteCode::J9BCifnonnull,ifnonnullTemplate); + /* Commenting out as this is currently untested + case TR_J9ByteCode::J9BCgotow: + { + _targetIndex = bci->next4BytesSigned() + bci->currentByteCodeIndex(); // Copied because of 4 byte jump + COPY_TEMPLATE(buffer, gotoTemplate, codeGenSize); + MJIT::JumpTableEntry _entry(_targetIndex, buffer); + _jumpTable[_targetCounter] = _entry; + _targetCounter++; + break; + } + */ + case TR_J9ByteCode::J9BClcmp: + COPY_TEMPLATE(buffer, lcmpTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfcmpl: + COPY_TEMPLATE(buffer, fcmplTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfcmpg: + COPY_TEMPLATE(buffer, fcmpgTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdcmpl: + COPY_TEMPLATE(buffer, dcmplTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdcmpg: + COPY_TEMPLATE(buffer, dcmpgTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCpop: + COPY_TEMPLATE(buffer, popTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCpop2: + COPY_TEMPLATE(buffer, pop2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCswap: + COPY_TEMPLATE(buffer, swapTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup: + COPY_TEMPLATE(buffer, dupTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdupx1: + COPY_TEMPLATE(buffer, dupx1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdupx2: + COPY_TEMPLATE(buffer, dupx2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup2: + COPY_TEMPLATE(buffer, dup2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup2x1: + COPY_TEMPLATE(buffer, dup2x1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdup2x2: + COPY_TEMPLATE(buffer, dup2x2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCgetfield: + if (calledCGSize = generateGetField(buffer, bci)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported getField bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "Unsupported getField bytecode - %s (%d)\n", 
_bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - break; - case TR_J9ByteCode::J9BCputfield: - return 0; - if(calledCGSize = generatePutField(buffer, bci)) { - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + return 0; + } + break; + case TR_J9ByteCode::J9BCputfield: + return 0; + if (calledCGSize = generatePutField(buffer, bci)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported putField bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "Unsupported putField bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - break; - case TR_J9ByteCode::J9BCgetstatic: - if(calledCGSize = generateGetStatic(buffer, bci)){ - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + return 0; + } + break; + case TR_J9ByteCode::J9BCgetstatic: + if (calledCGSize = generateGetStatic(buffer, bci)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported getstatic bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "Unsupported getstatic bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - break; - case TR_J9ByteCode::J9BCputstatic: - return 0; - if(calledCGSize = generatePutStatic(buffer, bci)){ - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + return 0; + } + break; + case TR_J9ByteCode::J9BCputstatic: + return 0; + if (calledCGSize = generatePutStatic(buffer, bci)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported putstatic bytecode - %s %d\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "Unsupported putstatic bytecode - %s %d\n", _bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - break; - case TR_J9ByteCode::J9BCiadd: - COPY_TEMPLATE(buffer, iAddTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCisub: - COPY_TEMPLATE(buffer, iSubTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCimul: - COPY_TEMPLATE(buffer, iMulTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCidiv: - COPY_TEMPLATE(buffer, iDivTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCirem: - COPY_TEMPLATE(buffer, iRemTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCineg: - COPY_TEMPLATE(buffer, iNegTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCishl: - COPY_TEMPLATE(buffer, iShlTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCishr: - COPY_TEMPLATE(buffer, iShrTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCiushr: - COPY_TEMPLATE(buffer, iUshrTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCiand: - COPY_TEMPLATE(buffer, iAndTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCior: - COPY_TEMPLATE(buffer, iOrTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCixor: - COPY_TEMPLATE(buffer, iXorTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCi2l: - COPY_TEMPLATE(buffer, i2lTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCl2i: - COPY_TEMPLATE(buffer, l2iTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCi2b: - COPY_TEMPLATE(buffer, i2bTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCi2s: - COPY_TEMPLATE(buffer, i2sTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCi2c: - COPY_TEMPLATE(buffer, i2cTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCi2d: 
- COPY_TEMPLATE(buffer, i2dTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCl2d: - COPY_TEMPLATE(buffer, l2dTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCd2i: - COPY_TEMPLATE(buffer, d2iTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCd2l: - COPY_TEMPLATE(buffer, d2lTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconstm1: - COPY_TEMPLATE(buffer, iconstm1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconst0: - COPY_TEMPLATE(buffer, iconst0Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconst1: - COPY_TEMPLATE(buffer, iconst1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconst2: - COPY_TEMPLATE(buffer, iconst2Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconst3: - COPY_TEMPLATE(buffer, iconst3Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconst4: - COPY_TEMPLATE(buffer, iconst4Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCiconst5: - COPY_TEMPLATE(buffer, iconst5Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCbipush: - COPY_TEMPLATE(buffer, bipushTemplate, codeGenSize); - patchImm4(buffer, (U_32)bci->nextByteSigned()); - break; - case TR_J9ByteCode::J9BCsipush: - COPY_TEMPLATE(buffer, sipushTemplatePrologue, codeGenSize); - patchImm2(buffer, (U_32)bci->next2Bytes()); - COPY_TEMPLATE(buffer, sipushTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCiinc: { - uint8_t index = bci->nextByte(); - int8_t value = bci->nextByteSigned(2); - COPY_TEMPLATE(buffer, iIncTemplate_01_load, codeGenSize); - patchImm4(buffer, (U_32)(index*8)); - COPY_TEMPLATE(buffer, iIncTemplate_02_add, codeGenSize); - patchImm1(buffer, value); - COPY_TEMPLATE(buffer, iIncTemplate_03_store, codeGenSize); - patchImm4(buffer, (U_32)(index*8)); - break; } - case TR_J9ByteCode::J9BCladd: - COPY_TEMPLATE(buffer, lAddTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClsub: - COPY_TEMPLATE(buffer, lSubTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClmul: - COPY_TEMPLATE(buffer, lMulTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCldiv: - COPY_TEMPLATE(buffer, lDivTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClrem: - COPY_TEMPLATE(buffer, lRemTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClneg: - COPY_TEMPLATE(buffer, lNegTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClshl: - COPY_TEMPLATE(buffer, lShlTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClshr: - COPY_TEMPLATE(buffer, lShrTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClushr: - COPY_TEMPLATE(buffer, lUshrTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCland: - COPY_TEMPLATE(buffer, lAndTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClor: - COPY_TEMPLATE(buffer, lOrTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClxor: - COPY_TEMPLATE(buffer, lXorTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BClconst0: - COPY_TEMPLATE(buffer, lconst0Template, codeGenSize); - break; - case TR_J9ByteCode::J9BClconst1: - COPY_TEMPLATE(buffer, lconst1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCfadd: - COPY_TEMPLATE(buffer, fAddTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfsub: - COPY_TEMPLATE(buffer, fSubTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfmul: - COPY_TEMPLATE(buffer, fMulTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfdiv: - COPY_TEMPLATE(buffer, fDivTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfrem: - COPY_TEMPLATE(buffer, fRemTemplate, codeGenSize); 
- SSEfloatRemainder = getHelperOrTrampolineAddress(TR_AMD64floatRemainder, (uintptr_t)buffer); - PATCH_RELATIVE_ADDR_32_BIT(buffer, (intptr_t)SSEfloatRemainder); - slotCountRem = 1; - goto genRem; - case TR_J9ByteCode::J9BCdrem: - COPY_TEMPLATE(buffer, dRemTemplate, codeGenSize); - SSEdoubleRemainder = getHelperOrTrampolineAddress(TR_AMD64doubleRemainder, (uintptr_t)buffer); - PATCH_RELATIVE_ADDR_32_BIT(buffer, (intptr_t)SSEdoubleRemainder); - slotCountRem = 2; - isDouble = true; - goto genRem; - genRem: - returnRegIndex = properties->getFloatReturnRegister(); - COPY_TEMPLATE(buffer, retTemplate_sub, codeGenSize); - patchImm1(buffer, slotCountRem*8); - switch(returnRegIndex){ - case TR::RealRegister::xmm0: - if(isDouble) { - COPY_TEMPLATE(buffer, loadDxmm0Return, codeGenSize); - } else { - COPY_TEMPLATE(buffer, loadxmm0Return, codeGenSize); - } - break; - } - break; - case TR_J9ByteCode::J9BCfneg: - COPY_TEMPLATE(buffer, fNegTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCfconst0: - COPY_TEMPLATE(buffer, fconst0Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCfconst1: - COPY_TEMPLATE(buffer, fconst1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCfconst2: - COPY_TEMPLATE(buffer, fconst2Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCdconst0: - COPY_TEMPLATE(buffer, dconst0Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCdconst1: - COPY_TEMPLATE(buffer, dconst1Template, codeGenSize); - break; - case TR_J9ByteCode::J9BCdadd: - COPY_TEMPLATE(buffer, dAddTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdsub: - COPY_TEMPLATE(buffer, dSubTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdmul: - COPY_TEMPLATE(buffer, dMulTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCddiv: - COPY_TEMPLATE(buffer, dDivTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCdneg: - COPY_TEMPLATE(buffer, dNegTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCi2f: - COPY_TEMPLATE(buffer, i2fTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCf2i: - COPY_TEMPLATE(buffer, f2iTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCl2f: - COPY_TEMPLATE(buffer, l2fTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCf2l: - COPY_TEMPLATE(buffer, f2lTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCd2f: - COPY_TEMPLATE(buffer, d2fTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCf2d: - COPY_TEMPLATE(buffer, f2dTemplate, codeGenSize); - break; - case TR_J9ByteCode::J9BCReturnB: // fallthrough - case TR_J9ByteCode::J9BCReturnC: // fallthrough - case TR_J9ByteCode::J9BCReturnS: // fallthrough - case TR_J9ByteCode::J9BCReturnZ: // fallthrough - case TR_J9ByteCode::J9BCgenericReturn: - if(calledCGSize = generateReturn(buffer, _compilee->returnType(), properties)) - { - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else if(_compilee->returnType() == TR::Int64) { - buffer_size_t epilogueSize = generateEpologue(buffer); - buffer += epilogueSize; - codeGenSize += epilogueSize; - COPY_TEMPLATE(buffer, eaxReturnTemplate, codeGenSize); - COPY_TEMPLATE(buffer, retTemplate_add, codeGenSize); - patchImm1(buffer, 8); - COPY_TEMPLATE(buffer, vReturnTemplate, codeGenSize); - } else { + return 0; + } + break; + case TR_J9ByteCode::J9BCiadd: + COPY_TEMPLATE(buffer, iAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCisub: + COPY_TEMPLATE(buffer, iSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCimul: + COPY_TEMPLATE(buffer, iMulTemplate, codeGenSize); + break; + case 
TR_J9ByteCode::J9BCidiv: + COPY_TEMPLATE(buffer, iDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCirem: + COPY_TEMPLATE(buffer, iRemTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCineg: + COPY_TEMPLATE(buffer, iNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCishl: + COPY_TEMPLATE(buffer, iShlTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCishr: + COPY_TEMPLATE(buffer, iShrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiushr: + COPY_TEMPLATE(buffer, iUshrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiand: + COPY_TEMPLATE(buffer, iAndTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCior: + COPY_TEMPLATE(buffer, iOrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCixor: + COPY_TEMPLATE(buffer, iXorTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2l: + COPY_TEMPLATE(buffer, i2lTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCl2i: + COPY_TEMPLATE(buffer, l2iTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2b: + COPY_TEMPLATE(buffer, i2bTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2s: + COPY_TEMPLATE(buffer, i2sTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2c: + COPY_TEMPLATE(buffer, i2cTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2d: + COPY_TEMPLATE(buffer, i2dTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCl2d: + COPY_TEMPLATE(buffer, l2dTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCd2i: + COPY_TEMPLATE(buffer, d2iTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCd2l: + COPY_TEMPLATE(buffer, d2lTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconstm1: + COPY_TEMPLATE(buffer, iconstm1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst0: + COPY_TEMPLATE(buffer, iconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst1: + COPY_TEMPLATE(buffer, iconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst2: + COPY_TEMPLATE(buffer, iconst2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst3: + COPY_TEMPLATE(buffer, iconst3Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst4: + COPY_TEMPLATE(buffer, iconst4Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCiconst5: + COPY_TEMPLATE(buffer, iconst5Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCbipush: + COPY_TEMPLATE(buffer, bipushTemplate, codeGenSize); + patchImm4(buffer, (U_32)bci->nextByteSigned()); + break; + case TR_J9ByteCode::J9BCsipush: + COPY_TEMPLATE(buffer, sipushTemplatePrologue, codeGenSize); + patchImm2(buffer, (U_32)bci->next2Bytes()); + COPY_TEMPLATE(buffer, sipushTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCiinc: + { + uint8_t index = bci->nextByte(); + int8_t value = bci->nextByteSigned(2); + COPY_TEMPLATE(buffer, iIncTemplate_01_load, codeGenSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, iIncTemplate_02_add, codeGenSize); + patchImm1(buffer, value); + COPY_TEMPLATE(buffer, iIncTemplate_03_store, codeGenSize); + patchImm4(buffer, (U_32)(index*8)); + break; + } + case TR_J9ByteCode::J9BCladd: + COPY_TEMPLATE(buffer, lAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClsub: + COPY_TEMPLATE(buffer, lSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClmul: + COPY_TEMPLATE(buffer, lMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCldiv: + COPY_TEMPLATE(buffer, lDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClrem: + COPY_TEMPLATE(buffer, lRemTemplate, 
codeGenSize); + break; + case TR_J9ByteCode::J9BClneg: + COPY_TEMPLATE(buffer, lNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClshl: + COPY_TEMPLATE(buffer, lShlTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClshr: + COPY_TEMPLATE(buffer, lShrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClushr: + COPY_TEMPLATE(buffer, lUshrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCland: + COPY_TEMPLATE(buffer, lAndTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClor: + COPY_TEMPLATE(buffer, lOrTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClxor: + COPY_TEMPLATE(buffer, lXorTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BClconst0: + COPY_TEMPLATE(buffer, lconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BClconst1: + COPY_TEMPLATE(buffer, lconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCfadd: + COPY_TEMPLATE(buffer, fAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfsub: + COPY_TEMPLATE(buffer, fSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfmul: + COPY_TEMPLATE(buffer, fMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfdiv: + COPY_TEMPLATE(buffer, fDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfrem: + COPY_TEMPLATE(buffer, fRemTemplate, codeGenSize); + SSEfloatRemainder = getHelperOrTrampolineAddress(TR_AMD64floatRemainder, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, (intptr_t)SSEfloatRemainder); + slotCountRem = 1; + goto genRem; + case TR_J9ByteCode::J9BCdrem: + COPY_TEMPLATE(buffer, dRemTemplate, codeGenSize); + SSEdoubleRemainder = getHelperOrTrampolineAddress(TR_AMD64doubleRemainder, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, (intptr_t)SSEdoubleRemainder); + slotCountRem = 2; + isDouble = true; + goto genRem; + genRem: + returnRegIndex = properties->getFloatReturnRegister(); + COPY_TEMPLATE(buffer, retTemplate_sub, codeGenSize); + patchImm1(buffer, slotCountRem*8); + switch (returnRegIndex) + { + case TR::RealRegister::xmm0: + if (isDouble) + COPY_TEMPLATE(buffer, loadDxmm0Return, codeGenSize); + else + COPY_TEMPLATE(buffer, loadxmm0Return, codeGenSize); + break; + } + break; + case TR_J9ByteCode::J9BCfneg: + COPY_TEMPLATE(buffer, fNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCfconst0: + COPY_TEMPLATE(buffer, fconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCfconst1: + COPY_TEMPLATE(buffer, fconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCfconst2: + COPY_TEMPLATE(buffer, fconst2Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdconst0: + COPY_TEMPLATE(buffer, dconst0Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdconst1: + COPY_TEMPLATE(buffer, dconst1Template, codeGenSize); + break; + case TR_J9ByteCode::J9BCdadd: + COPY_TEMPLATE(buffer, dAddTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdsub: + COPY_TEMPLATE(buffer, dSubTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdmul: + COPY_TEMPLATE(buffer, dMulTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCddiv: + COPY_TEMPLATE(buffer, dDivTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCdneg: + COPY_TEMPLATE(buffer, dNegTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCi2f: + COPY_TEMPLATE(buffer, i2fTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCf2i: + COPY_TEMPLATE(buffer, f2iTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCl2f: + COPY_TEMPLATE(buffer, l2fTemplate, codeGenSize); + break; + case 
TR_J9ByteCode::J9BCf2l: + COPY_TEMPLATE(buffer, f2lTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCd2f: + COPY_TEMPLATE(buffer, d2fTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCf2d: + COPY_TEMPLATE(buffer, f2dTemplate, codeGenSize); + break; + case TR_J9ByteCode::J9BCReturnB: /* FALLTHROUGH */ + case TR_J9ByteCode::J9BCReturnC: /* FALLTHROUGH */ + case TR_J9ByteCode::J9BCReturnS: /* FALLTHROUGH */ + case TR_J9ByteCode::J9BCReturnZ: /* FALLTHROUGH */ + case TR_J9ByteCode::J9BCgenericReturn: + if (calledCGSize = generateReturn(buffer, _compilee->returnType(), properties)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else if (_compilee->returnType() == TR::Int64) + { + buffer_size_t epilogueSize = generateEpologue(buffer); + buffer += epilogueSize; + codeGenSize += epilogueSize; + COPY_TEMPLATE(buffer, eaxReturnTemplate, codeGenSize); + COPY_TEMPLATE(buffer, retTemplate_add, codeGenSize); + patchImm1(buffer, 8); + COPY_TEMPLATE(buffer, vReturnTemplate, codeGenSize); + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unknown Return type: %d\n", _compilee->returnType()); + trfprintf(_logFileFP, "Unknown Return type: %d\n", _compilee->returnType()); #endif - return 0; - } - break; - case TR_J9ByteCode::J9BCinvokestatic: - if(calledCGSize = generateInvokeStatic(buffer, bci, properties)) { - buffer += calledCGSize; - codeGenSize += calledCGSize; - } else { + return 0; + } + break; + case TR_J9ByteCode::J9BCinvokestatic: + if (calledCGSize = generateInvokeStatic(buffer, bci, properties)) + { + buffer += calledCGSize; + codeGenSize += calledCGSize; + } + else + { #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "Unsupported invokestatic bytecode %d\n", bc); + trfprintf(_logFileFP, "Unsupported invokestatic bytecode %d\n", bc); #endif - return 0; - } - break; - default: + return 0; + } + break; + default: #ifdef MJIT_DEBUG_BC_WALKING - trfprintf(_logFileFP, "no match for bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); + trfprintf(_logFileFP, "no match for bytecode - %s (%d)\n", _bcMnemonic.c_str(), _bcOpcode); #endif - return 0; - } - } - - // patch branches and goto addresses - for(int i = 0; i < _targetCounter; i++) { - MJIT::JumpTableEntry e = _jumpTable[i]; - char * _target = _byteCodeIndexToJitTable[e.byteCodeIndex]; - PATCH_RELATIVE_ADDR_32_BIT(e.codeCacheAddress, (uintptr_t)_target); - } - - return codeGenSize; -} + return 0; + } + } + + // patch branches and goto addresses + for (int i = 0; i < _targetCounter; i++) + { + MJIT::JumpTableEntry e = _jumpTable[i]; + char *_target = _byteCodeIndexToJitTable[e.byteCodeIndex]; + PATCH_RELATIVE_ADDR_32_BIT(e.codeCacheAddress, (uintptr_t)_target); + } + + return codeGenSize; + } buffer_size_t MJIT::CodeGenerator::generateEdgeCounter(char *buffer, TR_J9ByteCodeIterator *bci) -{ - buffer_size_t returnSize = 0; - // TODO: Get the pointer to the profiling counter - uint32_t* profilingCounterLocation = NULL; + { + buffer_size_t returnSize = 0; + // TODO: Get the pointer to the profiling counter + uint32_t *profilingCounterLocation = NULL; - COPY_TEMPLATE(buffer, loadCounter, returnSize); - patchImm8(buffer, (uint64_t)profilingCounterLocation); - COPY_TEMPLATE(buffer, incrementCounter, returnSize); - patchImm8(buffer, (uint64_t)profilingCounterLocation); + COPY_TEMPLATE(buffer, loadCounter, returnSize); + patchImm8(buffer, (uint64_t)profilingCounterLocation); + COPY_TEMPLATE(buffer, incrementCounter, returnSize); + patchImm8(buffer, (uint64_t)profilingCounterLocation); - return 
returnSize; -} + return returnSize; + } buffer_size_t MJIT::CodeGenerator::generateReturn(char *buffer, TR::DataType dt, struct TR::X86LinkageProperties *properties) -{ - buffer_size_t returnSize = 0; - buffer_size_t calledCGSize = 0; - TR::RealRegister::RegNum returnRegIndex; - U_16 slotCount = 0; - bool isAddressOrLong = false; - switch(dt){ - case TR::Address: - slotCount = 1; - isAddressOrLong = true; - returnRegIndex = properties->getIntegerReturnRegister(); - goto genReturn; - case TR::Int64: - slotCount = 2; - isAddressOrLong = true; - returnRegIndex = properties->getIntegerReturnRegister(); - goto genReturn; - case TR::Int8: // fallthrough - case TR::Int16: // fallthrough - case TR::Int32: // fallthrough - slotCount = 1; - returnRegIndex = properties->getIntegerReturnRegister(); - goto genReturn; - case TR::Float: - slotCount = 1; - returnRegIndex = properties->getFloatReturnRegister(); - goto genReturn; - case TR::Double: - slotCount = 2; - returnRegIndex = properties->getFloatReturnRegister(); - goto genReturn; - case TR::NoType: - returnRegIndex = TR::RealRegister::NoReg; - goto genReturn; - genReturn: - calledCGSize = generateEpologue(buffer); - buffer += calledCGSize; - returnSize += calledCGSize; - switch(returnRegIndex){ - case TR::RealRegister::eax: - if(isAddressOrLong) { - COPY_TEMPLATE(buffer, raxReturnTemplate, returnSize); - } else { - COPY_TEMPLATE(buffer, eaxReturnTemplate, returnSize); - } - goto genSlots; - case TR::RealRegister::xmm0: - COPY_TEMPLATE(buffer, xmm0ReturnTemplate, returnSize); - goto genSlots; - case TR::RealRegister::NoReg: - goto genSlots; - genSlots: - if(slotCount){ - COPY_TEMPLATE(buffer, retTemplate_add, returnSize); - patchImm1(buffer, slotCount*8); - } - COPY_TEMPLATE(buffer, vReturnTemplate, returnSize); - break; + { + buffer_size_t returnSize = 0; + buffer_size_t calledCGSize = 0; + TR::RealRegister::RegNum returnRegIndex; + U_16 slotCount = 0; + bool isAddressOrLong = false; + switch (dt) + { + case TR::Address: + slotCount = 1; + isAddressOrLong = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genReturn; + case TR::Int64: + slotCount = 2; + isAddressOrLong = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genReturn; + case TR::Int8: /* FALLTHROUGH */ + case TR::Int16: /* FALLTHROUGH */ + case TR::Int32: /* FALLTHROUGH */ + slotCount = 1; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genReturn; + case TR::Float: + slotCount = 1; + returnRegIndex = properties->getFloatReturnRegister(); + goto genReturn; + case TR::Double: + slotCount = 2; + returnRegIndex = properties->getFloatReturnRegister(); + goto genReturn; + case TR::NoType: + returnRegIndex = TR::RealRegister::NoReg; + goto genReturn; + genReturn: + calledCGSize = generateEpologue(buffer); + buffer += calledCGSize; + returnSize += calledCGSize; + switch (returnRegIndex) + { + case TR::RealRegister::eax: + if (isAddressOrLong) + COPY_TEMPLATE(buffer, raxReturnTemplate, returnSize); + else + COPY_TEMPLATE(buffer, eaxReturnTemplate, returnSize); + goto genSlots; + case TR::RealRegister::xmm0: + COPY_TEMPLATE(buffer, xmm0ReturnTemplate, returnSize); + goto genSlots; + case TR::RealRegister::NoReg: + goto genSlots; + genSlots: + if (slotCount) + { + COPY_TEMPLATE(buffer, retTemplate_add, returnSize); + patchImm1(buffer, slotCount*8); + } + COPY_TEMPLATE(buffer, vReturnTemplate, returnSize); + break; } - break; - default: - trfprintf(_logFileFP, "Argument type %s is not supported\n", dt.toString()); - break; - } + break; 
+ default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", dt.toString()); + break; + } return returnSize; -} + } buffer_size_t MJIT::CodeGenerator::generateLoad(char *buffer, TR_J9ByteCode bc, MJIT::ByteCodeIterator *bci) -{ - char *signature = _compilee->signatureChars(); - int index = -1; - buffer_size_t loadSize = 0; - switch(bc) { - GenerateATemplate: - COPY_TEMPLATE(buffer, aloadTemplatePrologue, loadSize); - patchImm4(buffer, (U_32)(index*8)); - COPY_TEMPLATE(buffer, loadTemplate, loadSize); - break; - GenerateITemplate: - COPY_TEMPLATE(buffer, iloadTemplatePrologue, loadSize); - patchImm4(buffer, (U_32)(index*8)); - COPY_TEMPLATE(buffer, loadTemplate, loadSize); - break; - GenerateLTemplate: - COPY_TEMPLATE(buffer, lloadTemplatePrologue, loadSize); - patchImm4(buffer, (U_32)(index*8)); - COPY_TEMPLATE(buffer, loadTemplate, loadSize); - break; - case TR_J9ByteCode::J9BClload0: - case TR_J9ByteCode::J9BCdload0: - index = 0; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfload0: - case TR_J9ByteCode::J9BCiload0: - index = 0; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCaload0: - index = 0; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClload1: - case TR_J9ByteCode::J9BCdload1: - index = 1; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfload1: - case TR_J9ByteCode::J9BCiload1: - index = 1; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCaload1: - index = 1; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClload2: - case TR_J9ByteCode::J9BCdload2: - index = 2; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfload2: - case TR_J9ByteCode::J9BCiload2: - index = 2; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCaload2: - index = 2; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClload3: - case TR_J9ByteCode::J9BCdload3: - index = 3; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfload3: - case TR_J9ByteCode::J9BCiload3: - index = 3; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCaload3: - index = 3; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClload: - case TR_J9ByteCode::J9BCdload: - index = bci->nextByte(); - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfload: - case TR_J9ByteCode::J9BCiload: - index = bci->nextByte(); - goto GenerateITemplate; - case TR_J9ByteCode::J9BCaload: - index = bci->nextByte(); - goto GenerateATemplate; - default: - return 0; - } - return loadSize; -} + { + char *signature = _compilee->signatureChars(); + int index = -1; + buffer_size_t loadSize = 0; + switch (bc) + { + GenerateATemplate: + COPY_TEMPLATE(buffer, aloadTemplatePrologue, loadSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, loadTemplate, loadSize); + break; + GenerateITemplate: + COPY_TEMPLATE(buffer, iloadTemplatePrologue, loadSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, loadTemplate, loadSize); + break; + GenerateLTemplate: + COPY_TEMPLATE(buffer, lloadTemplatePrologue, loadSize); + patchImm4(buffer, (U_32)(index*8)); + COPY_TEMPLATE(buffer, loadTemplate, loadSize); + break; + case TR_J9ByteCode::J9BClload0: + case TR_J9ByteCode::J9BCdload0: + index = 0; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload0: + case TR_J9ByteCode::J9BCiload0: + index = 0; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload0: + index = 0; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload1: + case TR_J9ByteCode::J9BCdload1: + index = 1; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload1: + case TR_J9ByteCode::J9BCiload1: + index = 1; + goto GenerateITemplate; + case 
TR_J9ByteCode::J9BCaload1: + index = 1; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload2: + case TR_J9ByteCode::J9BCdload2: + index = 2; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload2: + case TR_J9ByteCode::J9BCiload2: + index = 2; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload2: + index = 2; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload3: + case TR_J9ByteCode::J9BCdload3: + index = 3; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload3: + case TR_J9ByteCode::J9BCiload3: + index = 3; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload3: + index = 3; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClload: + case TR_J9ByteCode::J9BCdload: + index = bci->nextByte(); + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfload: + case TR_J9ByteCode::J9BCiload: + index = bci->nextByte(); + goto GenerateITemplate; + case TR_J9ByteCode::J9BCaload: + index = bci->nextByte(); + goto GenerateATemplate; + default: + return 0; + } + return loadSize; + } buffer_size_t MJIT::CodeGenerator::generateStore(char *buffer, TR_J9ByteCode bc, MJIT::ByteCodeIterator *bci) -{ - char *signature = _compilee->signatureChars(); - int index = -1; - buffer_size_t storeSize = 0; - switch(bc) { - GenerateATemplate: - COPY_TEMPLATE(buffer, astoreTemplate, storeSize); - patchImm4(buffer, (U_32)(index*8)); - break; - GenerateLTemplate: - COPY_TEMPLATE(buffer, lstoreTemplate, storeSize); - patchImm4(buffer, (U_32)(index*8)); - break; - GenerateITemplate: - COPY_TEMPLATE(buffer, istoreTemplate, storeSize); - patchImm4(buffer, (U_32)(index*8)); - break; - case TR_J9ByteCode::J9BClstore0: - case TR_J9ByteCode::J9BCdstore0: - index = 0; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfstore0: - case TR_J9ByteCode::J9BCistore0: - index = 0; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCastore0: - index = 0; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClstore1: - case TR_J9ByteCode::J9BCdstore1: - index = 1; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfstore1: - case TR_J9ByteCode::J9BCistore1: - index = 1; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCastore1: - index = 1; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClstore2: - case TR_J9ByteCode::J9BCdstore2: - index = 2; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfstore2: - case TR_J9ByteCode::J9BCistore2: - index = 2; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCastore2: - index = 2; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClstore3: - case TR_J9ByteCode::J9BCdstore3: - index = 3; - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfstore3: - case TR_J9ByteCode::J9BCistore3: - index = 3; - goto GenerateITemplate; - case TR_J9ByteCode::J9BCastore3: - index = 3; - goto GenerateATemplate; - case TR_J9ByteCode::J9BClstore: - case TR_J9ByteCode::J9BCdstore: - index = bci->nextByte(); - goto GenerateLTemplate; - case TR_J9ByteCode::J9BCfstore: - case TR_J9ByteCode::J9BCistore: - index = bci->nextByte(); - goto GenerateITemplate; - case TR_J9ByteCode::J9BCastore: - index = bci->nextByte(); - goto GenerateATemplate; - default: - return 0; - } - return storeSize; -} - + { + char *signature = _compilee->signatureChars(); + int index = -1; + buffer_size_t storeSize = 0; + switch (bc) + { + GenerateATemplate: + COPY_TEMPLATE(buffer, astoreTemplate, storeSize); + patchImm4(buffer, (U_32)(index*8)); + break; + GenerateLTemplate: + COPY_TEMPLATE(buffer, lstoreTemplate, storeSize); + patchImm4(buffer, (U_32)(index*8)); + break; + GenerateITemplate: + COPY_TEMPLATE(buffer, 
istoreTemplate, storeSize); + patchImm4(buffer, (U_32)(index*8)); + break; + case TR_J9ByteCode::J9BClstore0: + case TR_J9ByteCode::J9BCdstore0: + index = 0; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore0: + case TR_J9ByteCode::J9BCistore0: + index = 0; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore0: + index = 0; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore1: + case TR_J9ByteCode::J9BCdstore1: + index = 1; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore1: + case TR_J9ByteCode::J9BCistore1: + index = 1; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore1: + index = 1; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore2: + case TR_J9ByteCode::J9BCdstore2: + index = 2; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore2: + case TR_J9ByteCode::J9BCistore2: + index = 2; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore2: + index = 2; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore3: + case TR_J9ByteCode::J9BCdstore3: + index = 3; + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore3: + case TR_J9ByteCode::J9BCistore3: + index = 3; + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore3: + index = 3; + goto GenerateATemplate; + case TR_J9ByteCode::J9BClstore: + case TR_J9ByteCode::J9BCdstore: + index = bci->nextByte(); + goto GenerateLTemplate; + case TR_J9ByteCode::J9BCfstore: + case TR_J9ByteCode::J9BCistore: + index = bci->nextByte(); + goto GenerateITemplate; + case TR_J9ByteCode::J9BCastore: + index = bci->nextByte(); + goto GenerateATemplate; + default: + return 0; + } + return storeSize; + } -//TODO: see ::emitSnippitCode and redo this to use snippits. +// TODO: see ::emitSnippitCode and redo this to use snippits. buffer_size_t -MJIT::CodeGenerator::generateArgumentMoveForStaticMethod(char *buffer, TR_ResolvedMethod *staticMethod, char *typeString, U_16 typeStringLength, U_16 slotCount, struct TR::X86LinkageProperties *properties){ - buffer_size_t argumentMoveSize = 0; - - //Pull out parameters and find their place for calling - U_16 paramCount = MJIT::getParamCount(typeString, typeStringLength); - U_16 actualParamCount = MJIT::getActualParamCount(typeString, typeStringLength); - //TODO: Implement passing arguments on stack when surpassing register count. 
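The register selection that follows walks the callee's signature from the last argument back to the first. Each parameter's type character picks an argument register from the linkage properties: float and double characters take the next XMM argument register, everything else takes the next integer argument register, and the long/double and reference cases then choose the wide or Ref variants of the move templates. The helper below is only an illustrative compression of that dispatch, using the type-character constants and linkage accessors already used in this file; it is not part of the patch.

   // Illustrative only: which register class a parameter's type character maps to.
   static TR::RealRegister::RegNum
   argumentRegisterForSketch(char typeChar, int argIndex, struct TR::X86LinkageProperties *properties)
      {
      switch (typeChar)
         {
         case MJIT::FLOAT_TYPE_CHARACTER:  /* FALLTHROUGH */
         case MJIT::DOUBLE_TYPE_CHARACTER:
            return properties->getFloatArgumentRegister(argIndex);   // xmm0..xmm3 in the cases below
         default:
            return properties->getIntegerArgumentRegister(argIndex); // one of eax/esi/edx/ecx in the cases below
         }
      }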
- if(actualParamCount > 4){ - return 0; - } - - if(!paramCount){ - return -1; - } - MJIT::ParamTableEntry paramTableEntries[paramCount]; - for(int i=0; i0){ - if(paramTableEntries[lastIndex].notInitialized) { - lastIndex--; - } else { - break; - } - } - bool isLongOrDouble; - bool isReference; - TR::RealRegister::RegNum argRegNum; - MJIT::ParamTableEntry entry; - while(actualParamCount>0){ - isLongOrDouble = false; - isReference = false; - argRegNum = TR::RealRegister::NoReg; - switch((actualParamCount-1)){ - case 0: // fallthrough - case 1: // fallthrough - case 2: // fallthrough - case 3:{ - switch(typeString[(actualParamCount-1)+3]){ - case MJIT::BYTE_TYPE_CHARACTER: - case MJIT::CHAR_TYPE_CHARACTER: - case MJIT::INT_TYPE_CHARACTER: - case MJIT::SHORT_TYPE_CHARACTER: - case MJIT::BOOLEAN_TYPE_CHARACTER: - argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); - goto genArg; - case MJIT::LONG_TYPE_CHARACTER: - isLongOrDouble = true; - argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); - goto genArg; - case MJIT::CLASSNAME_TYPE_CHARACTER: - isReference = true; - argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); - goto genArg; - case MJIT::DOUBLE_TYPE_CHARACTER: - isLongOrDouble = true; - argRegNum = properties->getFloatArgumentRegister(actualParamCount-1); - goto genArg; - case MJIT::FLOAT_TYPE_CHARACTER: - argRegNum = properties->getFloatArgumentRegister(actualParamCount-1); - goto genArg; - genArg: - switch(argRegNum){ - case TR::RealRegister::eax: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moveraxForCall, argumentMoveSize); - } else if(isReference) { - COPY_TEMPLATE(buffer, moveraxRefForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, moveeaxForCall, argumentMoveSize); - } - break; - case TR::RealRegister::esi: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moversiForCall, argumentMoveSize); - } else if(isReference) { - COPY_TEMPLATE(buffer, moversiRefForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, moveesiForCall, argumentMoveSize); - } - break; - case TR::RealRegister::edx: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moverdxForCall, argumentMoveSize); - } else if(isReference) { - COPY_TEMPLATE(buffer, moverdxRefForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, moveedxForCall, argumentMoveSize); - } - break; - case TR::RealRegister::ecx: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, movercxForCall, argumentMoveSize); - } else if(isReference) { - COPY_TEMPLATE(buffer, movercxRefForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, moveecxForCall, argumentMoveSize); - } - break; - case TR::RealRegister::xmm0: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moveDxmm0ForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, movexmm0ForCall, argumentMoveSize); - } - break; - case TR::RealRegister::xmm1: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moveDxmm1ForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, movexmm1ForCall, argumentMoveSize); - } - break; - case TR::RealRegister::xmm2: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moveDxmm2ForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, movexmm2ForCall, argumentMoveSize); - } - break; - case TR::RealRegister::xmm3: - if(isLongOrDouble) { - COPY_TEMPLATE(buffer, moveDxmm3ForCall, argumentMoveSize); - } else { - COPY_TEMPLATE(buffer, movexmm3ForCall, argumentMoveSize); - } - break; - } +MJIT::CodeGenerator::generateArgumentMoveForStaticMethod( + char *buffer, + TR_ResolvedMethod *staticMethod, + 
char *typeString, + U_16 typeStringLength, + U_16 slotCount, + struct TR::X86LinkageProperties *properties) + { + buffer_size_t argumentMoveSize = 0; + + // Pull out parameters and find their place for calling + U_16 paramCount = MJIT::getParamCount(typeString, typeStringLength); + U_16 actualParamCount = MJIT::getActualParamCount(typeString, typeStringLength); + // TODO: Implement passing arguments on stack when surpassing register count. + if (actualParamCount > 4) + return 0; + + if (!paramCount) + return -1; + + MJIT::ParamTableEntry paramTableEntries[paramCount]; + for (int i=0; i 0) + { + if (paramTableEntries[lastIndex].notInitialized) + lastIndex--; + else + break; + } + bool isLongOrDouble; + bool isReference; + TR::RealRegister::RegNum argRegNum; + MJIT::ParamTableEntry entry; + while (actualParamCount > 0) + { + isLongOrDouble = false; + isReference = false; + argRegNum = TR::RealRegister::NoReg; + switch (actualParamCount-1) + { + case 0: /* FALLTHROUGH */ + case 1: /* FALLTHROUGH */ + case 2: /* FALLTHROUGH */ + case 3: + { + switch (typeString[(actualParamCount-1)+3]) + { + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::BOOLEAN_TYPE_CHARACTER: + argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::LONG_TYPE_CHARACTER: + isLongOrDouble = true; + argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::CLASSNAME_TYPE_CHARACTER: + isReference = true; + argRegNum = properties->getIntegerArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::DOUBLE_TYPE_CHARACTER: + isLongOrDouble = true; + argRegNum = properties->getFloatArgumentRegister(actualParamCount-1); + goto genArg; + case MJIT::FLOAT_TYPE_CHARACTER: + argRegNum = properties->getFloatArgumentRegister(actualParamCount-1); + goto genArg; + genArg: + switch (argRegNum) + { + case TR::RealRegister::eax: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moveraxForCall, argumentMoveSize); + else if (isReference) + COPY_TEMPLATE(buffer, moveraxRefForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, moveeaxForCall, argumentMoveSize); + break; + case TR::RealRegister::esi: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moversiForCall, argumentMoveSize); + else if (isReference) + COPY_TEMPLATE(buffer, moversiRefForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, moveesiForCall, argumentMoveSize); break; - } - break; + case TR::RealRegister::edx: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moverdxForCall, argumentMoveSize); + else if (isReference) + COPY_TEMPLATE(buffer, moverdxRefForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, moveedxForCall, argumentMoveSize); + break; + case TR::RealRegister::ecx: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, movercxForCall, argumentMoveSize); + else if (isReference) + COPY_TEMPLATE(buffer, movercxRefForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, moveecxForCall, argumentMoveSize); + break; + case TR::RealRegister::xmm0: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moveDxmm0ForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, movexmm0ForCall, argumentMoveSize); + break; + case TR::RealRegister::xmm1: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moveDxmm1ForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, movexmm1ForCall, argumentMoveSize); + break; + case TR::RealRegister::xmm2: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moveDxmm2ForCall, 
argumentMoveSize); + else + COPY_TEMPLATE(buffer, movexmm2ForCall, argumentMoveSize); + break; + case TR::RealRegister::xmm3: + if (isLongOrDouble) + COPY_TEMPLATE(buffer, moveDxmm3ForCall, argumentMoveSize); + else + COPY_TEMPLATE(buffer, movexmm3ForCall, argumentMoveSize); + break; + } + break; + } + break; } - default: - //TODO: Add stack argument passing here. - break; - } - actualParamCount--; - } - argumentMoveSize += saveArgsOnStack(buffer, -8, &paramTable); - return argumentMoveSize; -} + default: + // TODO: Add stack argument passing here. + break; + } + actualParamCount--; + } + argumentMoveSize += saveArgsOnStack(buffer, -8, &paramTable); + return argumentMoveSize; + } buffer_size_t MJIT::CodeGenerator::generateInvokeStatic(char *buffer, MJIT::ByteCodeIterator *bci, struct TR::X86LinkageProperties *properties) -{ - buffer_size_t invokeStaticSize = 0; - int32_t cpIndex = (int32_t)bci->next2Bytes(); - - //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. - //In the future research project we could attempt runtime resolution. TR does this with self modifying code. - //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. - bool isUnresolvedInCP; - TR_ResolvedMethod* resolved = _compilee->getResolvedStaticMethod(_comp, cpIndex, &isUnresolvedInCP); - //TODO: Split this into generateInvokeStaticJava and generateInvokeStaticNative - //and call the correct one with required args. Do not turn this method into a mess. - if(!resolved || resolved->isNative()) return 0; - - //Get method signature - J9Method *ramMethod = static_cast(resolved)->ramMethod(); - J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); - U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); - char typeString[maxLength]; - if(MJIT::nativeSignature(ramMethod, typeString)){ - return 0; - } - - //TODO: Find the maximum size that will need to be saved - // for calling a method and reserve it in the prologue. - // This could be done during the meta-data gathering phase.
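One convention is worth spelling out before the invokestatic path below: each generate* helper returns the number of bytes it emitted into the buffer, and a return of 0 means the bytecode or constant-pool entry could not be handled, which abandons the whole MJIT compilation (the method can still be handled by the regular compiler later). generateArgumentMoveForStaticMethod additionally returns -1 when the callee takes no arguments, and the caller treats that as zero bytes, as the checks just below show. The fragment that follows is a hypothetical helper, not part of this patch, illustrating the same contract with a placeholder template name.

   // Hypothetical example of the helper contract; 'exampleTemplate' is a placeholder name.
   static buffer_size_t
   generateExampleBytecodeSketch(char *buffer)
      {
      buffer_size_t size = 0;
      COPY_TEMPLATE(buffer, exampleTemplate, size); // append the prebuilt machine-code template
      patchImm4(buffer, (U_32)0);                   // back-fill its 4-byte immediate placeholder
      return size;                                  // returning 0 here would abandon the MJIT compile
      }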
- U_16 slotCount = MJIT::getMJITStackSlotCount(typeString, maxLength); - - //Generate template and instantiate immediate values - buffer_size_t argMoveSize = generateArgumentMoveForStaticMethod(buffer, resolved, typeString, maxLength, slotCount, properties); - - if(!argMoveSize){ - return 0; - } else if(-1 == argMoveSize) { - argMoveSize = 0; - } - buffer += argMoveSize; - invokeStaticSize += argMoveSize; - - //Save Caller preserved registers - COPY_TEMPLATE(buffer, saveR10Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x0 + (slotCount * 8))); - COPY_TEMPLATE(buffer, saveR11Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x8 + (slotCount * 8))); - COPY_TEMPLATE(buffer, saveR12Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x10 + (slotCount * 8))); - COPY_TEMPLATE(buffer, saveR13Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x18 + (slotCount * 8))); - COPY_TEMPLATE(buffer, saveR14Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x20 + (slotCount * 8))); - COPY_TEMPLATE(buffer, saveR15Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x28 + (slotCount * 8))); - COPY_TEMPLATE(buffer, invokeStaticTemplate, invokeStaticSize); - patchImm8(buffer, (U_64)ramMethod); - - //Call the glue code - COPY_TEMPLATE(buffer, call4ByteRel, invokeStaticSize); - intptr_t interpreterStaticAndSpecialGlue = getHelperOrTrampolineAddress(TR_X86interpreterStaticAndSpecialGlue, (uintptr_t)buffer); - PATCH_RELATIVE_ADDR_32_BIT(buffer, interpreterStaticAndSpecialGlue); - - //Load Caller preserved registers - COPY_TEMPLATE(buffer, loadR10Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x00 + (slotCount * 8))); - COPY_TEMPLATE(buffer, loadR11Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x08 + (slotCount * 8))); - COPY_TEMPLATE(buffer, loadR12Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x10 + (slotCount * 8))); - COPY_TEMPLATE(buffer, loadR13Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x18 + (slotCount * 8))); - COPY_TEMPLATE(buffer, loadR14Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x20 + (slotCount * 8))); - COPY_TEMPLATE(buffer, loadR15Offset, invokeStaticSize); - patchImm4(buffer, (U_32)(0x28 + (slotCount * 8))); - TR::RealRegister::RegNum returnRegIndex; - slotCount = 0; - bool isAddressOrLongorDouble = false; - - switch(typeString[0]){ - case MJIT::VOID_TYPE_CHARACTER: - break; - case MJIT::BOOLEAN_TYPE_CHARACTER: - case MJIT::BYTE_TYPE_CHARACTER: - case MJIT::CHAR_TYPE_CHARACTER: - case MJIT::SHORT_TYPE_CHARACTER: - case MJIT::INT_TYPE_CHARACTER: - slotCount = 1; - returnRegIndex = properties->getIntegerReturnRegister(); - goto genStatic; - case MJIT::CLASSNAME_TYPE_CHARACTER: - slotCount = 1; - isAddressOrLongorDouble = true; - returnRegIndex = properties->getIntegerReturnRegister(); - goto genStatic; - case MJIT::LONG_TYPE_CHARACTER: - slotCount = 2; - isAddressOrLongorDouble = true; - returnRegIndex = properties->getIntegerReturnRegister(); - goto genStatic; - case MJIT::FLOAT_TYPE_CHARACTER: - slotCount = 1; - returnRegIndex = properties->getFloatReturnRegister(); - goto genStatic; - case MJIT::DOUBLE_TYPE_CHARACTER: - slotCount = 2; - isAddressOrLongorDouble = true; - returnRegIndex = properties->getFloatReturnRegister(); - goto genStatic; - genStatic: - if(slotCount){ - COPY_TEMPLATE(buffer, retTemplate_sub, invokeStaticSize); - patchImm1(buffer, slotCount*8); + { + buffer_size_t invokeStaticSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + // Attempt to resolve the address at compile time. 
This can fail, and if it does we must fail the compilation for now. + // In the future research project we could attempt runtime resolution. TR does this with self modifying code. + // These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. + bool isUnresolvedInCP; + TR_ResolvedMethod *resolved = _compilee->getResolvedStaticMethod(_comp, cpIndex, &isUnresolvedInCP); + // TODO: Split this into generateInvokeStaticJava and generateInvokeStaticNative + // and call the correct one with required args. Do not turn this method into a mess. + if (!resolved || resolved->isNative()) + return 0; + + // Get method signature + J9Method *ramMethod = static_cast(resolved)->ramMethod(); + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + char typeString[maxLength]; + if (MJIT::nativeSignature(ramMethod, typeString)) + return 0; + + // TODO: Find the maximum size that will need to be saved + // for calling a method and reserve it in the prologue. + // This could be done during the meta-data gathering phase. + U_16 slotCount = MJIT::getMJITStackSlotCount(typeString, maxLength); + + // Generate template and instantiate immediate values + buffer_size_t argMoveSize = generateArgumentMoveForStaticMethod(buffer, resolved, typeString, maxLength, slotCount, properties); + + if (!argMoveSize) + return 0; + else if(-1 == argMoveSize) + argMoveSize = 0; + buffer += argMoveSize; + invokeStaticSize += argMoveSize; + + // Save Caller preserved registers + COPY_TEMPLATE(buffer, saveR10Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x0 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR11Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x8 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR12Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x10 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR13Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x18 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR14Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x20 + (slotCount * 8))); + COPY_TEMPLATE(buffer, saveR15Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x28 + (slotCount * 8))); + COPY_TEMPLATE(buffer, invokeStaticTemplate, invokeStaticSize); + patchImm8(buffer, (U_64)ramMethod); + + // Call the glue code + COPY_TEMPLATE(buffer, call4ByteRel, invokeStaticSize); + intptr_t interpreterStaticAndSpecialGlue = getHelperOrTrampolineAddress(TR_X86interpreterStaticAndSpecialGlue, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, interpreterStaticAndSpecialGlue); + + // Load Caller preserved registers + COPY_TEMPLATE(buffer, loadR10Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x00 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR11Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x08 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR12Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x10 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR13Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x18 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR14Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x20 + (slotCount * 8))); + COPY_TEMPLATE(buffer, loadR15Offset, invokeStaticSize); + patchImm4(buffer, (U_32)(0x28 + (slotCount * 8))); + TR::RealRegister::RegNum returnRegIndex; + slotCount = 0; + bool isAddressOrLongorDouble = false; + + switch (typeString[0]) + { + case MJIT::VOID_TYPE_CHARACTER: + break; + case 
MJIT::BOOLEAN_TYPE_CHARACTER: + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + slotCount = 1; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genStatic; + case MJIT::CLASSNAME_TYPE_CHARACTER: + slotCount = 1; + isAddressOrLongorDouble = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genStatic; + case MJIT::LONG_TYPE_CHARACTER: + slotCount = 2; + isAddressOrLongorDouble = true; + returnRegIndex = properties->getIntegerReturnRegister(); + goto genStatic; + case MJIT::FLOAT_TYPE_CHARACTER: + slotCount = 1; + returnRegIndex = properties->getFloatReturnRegister(); + goto genStatic; + case MJIT::DOUBLE_TYPE_CHARACTER: + slotCount = 2; + isAddressOrLongorDouble = true; + returnRegIndex = properties->getFloatReturnRegister(); + goto genStatic; + genStatic: + if (slotCount) + { + COPY_TEMPLATE(buffer, retTemplate_sub, invokeStaticSize); + patchImm1(buffer, slotCount*8); } - switch(returnRegIndex){ - case TR::RealRegister::eax: - if(isAddressOrLongorDouble) { - COPY_TEMPLATE(buffer, loadraxReturn, invokeStaticSize); - } else { - COPY_TEMPLATE(buffer, loadeaxReturn, invokeStaticSize); - } - break; - case TR::RealRegister::xmm0: - if(isAddressOrLongorDouble) { - COPY_TEMPLATE(buffer, loadDxmm0Return, invokeStaticSize); - } else { - COPY_TEMPLATE(buffer, loadxmm0Return, invokeStaticSize); - } - break; + switch (returnRegIndex) + { + case TR::RealRegister::eax: + if (isAddressOrLongorDouble) + COPY_TEMPLATE(buffer, loadraxReturn, invokeStaticSize); + else + COPY_TEMPLATE(buffer, loadeaxReturn, invokeStaticSize); + break; + case TR::RealRegister::xmm0: + if (isAddressOrLongorDouble) + COPY_TEMPLATE(buffer, loadDxmm0Return, invokeStaticSize); + else + COPY_TEMPLATE(buffer, loadxmm0Return, invokeStaticSize); + break; } - break; - default: - MJIT_ASSERT(_logFileFP, false, "Bad return type."); - } - return invokeStaticSize; -} + break; + default: + MJIT_ASSERT(_logFileFP, false, "Bad return type."); + } + return invokeStaticSize; + } buffer_size_t MJIT::CodeGenerator::generateGetField(char *buffer, MJIT::ByteCodeIterator *bci) -{ - buffer_size_t getFieldSize = 0; - int32_t cpIndex = (int32_t)bci->next2Bytes(); - - //Attempt to resolve the offset at compile time. This can fail, and if it does we must fail the compilation for now. - //In the future research project we could attempt runtime resolution. TR does this with self modifying code. - //These are all from TR and likely have implications on how we use this offset. However, we do not know that as of yet. 
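The return-type switch in generateInvokeStatic above boils down to a small table: the first character of the signature chooses the slot count, the return register class, and whether the full 64-bit (or reference) value must be stored. The following is an illustrative summary only; ReturnKind and describeReturn are hypothetical names, not part of this patch.

    // Summary of the return handling above. The real code asserts on an
    // unrecognized return type instead of returning a default.
    struct ReturnKind
       {
       int slots;       // 0 for void, 1 for int/float/reference, 2 for long/double
       bool useXmm;     // value is returned in xmm0 rather than rax/eax
       bool fullWidth;  // store all 64 bits (references, long, double)
       };

    static ReturnKind describeReturn(char typeChar)
       {
       switch (typeChar)
          {
          case 'V': return {0, false, false};
          case 'Z': case 'B': case 'C': case 'S':
          case 'I': return {1, false, false};  // eax
          case 'L': return {1, false, true};   // rax, reference
          case 'J': return {2, false, true};   // rax, long
          case 'F': return {1, true,  false};  // xmm0, float
          case 'D': return {2, true,  true};   // xmm0, double
          default:  return {0, false, false};  // unsupported
          }
       }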
- TR::DataType type = TR::NoType; - U_32 fieldOffset = 0; - bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; - bool resolved = _compilee->fieldAttributes(_comp, cpIndex, &fieldOffset, &type, &isVolatile, &isFinal, &isPrivate, false, &isUnresolvedInCP); - if(!resolved) return 0; - - switch(type) - { - case TR::Int8: - case TR::Int16: - case TR::Int32: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, intGetFieldTemplate, getFieldSize); - break; - case TR::Address: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, addrGetFieldTemplatePrologue, getFieldSize); - if(TR::Compiler->om.compressObjectReferences()){ - int32_t shift = TR::Compiler->om.compressedReferenceShift(); - COPY_TEMPLATE(buffer, decompressReferenceTemplate, getFieldSize); - patchImm1(buffer, shift); + { + buffer_size_t getFieldSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + // Attempt to resolve the offset at compile time. This can fail, and if it does we must fail the compilation for now. + // In the future research project we could attempt runtime resolution. TR does this with self modifying code. + // These are all from TR and likely have implications on how we use this offset. However, we do not know that as of yet. + TR::DataType type = TR::NoType; + U_32 fieldOffset = 0; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->fieldAttributes(_comp, cpIndex, &fieldOffset, &type, &isVolatile, &isFinal, &isPrivate, false, &isUnresolvedInCP); + if (!resolved) + return 0; + + switch (type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, intGetFieldTemplate, getFieldSize); + break; + case TR::Address: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, addrGetFieldTemplatePrologue, getFieldSize); + if (TR::Compiler->om.compressObjectReferences()) + { + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, decompressReferenceTemplate, getFieldSize); + patchImm1(buffer, shift); } - COPY_TEMPLATE(buffer, addrGetFieldTemplate, getFieldSize); - break; - case TR::Int64: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, longGetFieldTemplate, getFieldSize); - break; - case TR::Float: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, floatGetFieldTemplate, getFieldSize); - break; - case TR::Double: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, doubleGetFieldTemplate, getFieldSize); - break; - default: - trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); - break; - } - if(_comp->getOption(TR_TraceCG)){ - trfprintf(_logFileFP, "Field 
Offset: %u\n", fieldOffset); - } - - return getFieldSize; - -} + COPY_TEMPLATE(buffer, addrGetFieldTemplate, getFieldSize); + break; + case TR::Int64: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, longGetFieldTemplate, getFieldSize); + break; + case TR::Float: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, floatGetFieldTemplate, getFieldSize); + break; + case TR::Double: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, getFieldTemplatePrologue, getFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, doubleGetFieldTemplate, getFieldSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + if (_comp->getOption(TR_TraceCG)) + trfprintf(_logFileFP, "Field Offset: %u\n", fieldOffset); + + return getFieldSize; + } buffer_size_t MJIT::CodeGenerator::generatePutField(char *buffer, MJIT::ByteCodeIterator *bci) -{ - buffer_size_t putFieldSize = 0; - int32_t cpIndex = (int32_t)bci->next2Bytes(); - - //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. - //In the future research project we could attempt runtime resolution. TR does this with self modifying code. - //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. - U_32 fieldOffset = 0; - TR::DataType type = TR::NoType; - bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; - bool resolved = _compilee->fieldAttributes(_comp, cpIndex, &fieldOffset, &type, &isVolatile, &isFinal, &isPrivate, true, &isUnresolvedInCP); - - if(!resolved) return 0; - - switch(type) - { - case TR::Int8: - case TR::Int16: - case TR::Int32: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, intPutFieldTemplatePrologue, putFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, intPutFieldTemplate, putFieldSize); - break; - case TR::Address: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, addrPutFieldTemplatePrologue, putFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - if(TR::Compiler->om.compressObjectReferences()){ - int32_t shift = TR::Compiler->om.compressedReferenceShift(); - COPY_TEMPLATE(buffer, compressReferenceTemplate, putFieldSize); - patchImm1(buffer, shift); + { + buffer_size_t putFieldSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + // Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + // In the future research project we could attempt runtime resolution. TR does this with self modifying code. + // These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. 
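For the TR::Address case of generateGetField above, when compressed object references are enabled the emitted decompressReferenceTemplate widens the 32-bit compressed value back to a full pointer by shifting. A minimal sketch of the equivalent computation, assuming the shift-only scheme the template encodes (no heap-base adjustment is shown, and decompressReference is a hypothetical name):

    #include <cstdint>

    // What the decompression amounts to at run time; shift comes from
    // TR::Compiler->om.compressedReferenceShift() at compile time.
    static inline uintptr_t decompressReference(uint32_t compressed, int32_t shift)
       {
       return ((uintptr_t)compressed) << shift;
       }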
+ U_32 fieldOffset = 0; + TR::DataType type = TR::NoType; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->fieldAttributes(_comp, cpIndex, &fieldOffset, &type, &isVolatile, &isFinal, &isPrivate, true, &isUnresolvedInCP); + + if (!resolved) + return 0; + + switch (type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, intPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, intPutFieldTemplate, putFieldSize); + break; + case TR::Address: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, addrPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + if (TR::Compiler->om.compressObjectReferences()) + { + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, compressReferenceTemplate, putFieldSize); + patchImm1(buffer, shift); } - COPY_TEMPLATE(buffer, intPutFieldTemplate, putFieldSize); - break; - case TR::Int64: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, longPutFieldTemplatePrologue, putFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, longPutFieldTemplate, putFieldSize); - break; - case TR::Float: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, floatPutFieldTemplatePrologue, putFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, floatPutFieldTemplate, putFieldSize); - break; - case TR::Double: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, doublePutFieldTemplatePrologue, putFieldSize); - patchImm4(buffer, (U_32)(fieldOffset)); - COPY_TEMPLATE(buffer, doublePutFieldTemplate, putFieldSize); - break; - default: - trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); - break; - } - if(_comp->getOption(TR_TraceCG)){ - trfprintf(_logFileFP, "Field Offset: %u\n", fieldOffset); - } - return putFieldSize; - -} + COPY_TEMPLATE(buffer, intPutFieldTemplate, putFieldSize); + break; + case TR::Int64: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, longPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, longPutFieldTemplate, putFieldSize); + break; + case TR::Float: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, floatPutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, floatPutFieldTemplate, putFieldSize); + break; + case TR::Double: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, doublePutFieldTemplatePrologue, putFieldSize); + patchImm4(buffer, (U_32)(fieldOffset)); + COPY_TEMPLATE(buffer, doublePutFieldTemplate, putFieldSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + if (_comp->getOption(TR_TraceCG)) + trfprintf(_logFileFP, "Field Offset: %u\n", fieldOffset); + + return putFieldSize; + } buffer_size_t MJIT::CodeGenerator::generateGetStatic(char *buffer, MJIT::ByteCodeIterator *bci) -{ - buffer_size_t getStaticSize = 0; - int32_t cpIndex = (int32_t)bci->next2Bytes(); - - //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. - //In the future research project we could attempt runtime resolution. 
TR does this with self modifying code. - void *dataAddress; - //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. - TR::DataType type = TR::NoType; - bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; - bool resolved = _compilee->staticAttributes(_comp, cpIndex, &dataAddress, &type, &isVolatile, &isFinal, &isPrivate, false, &isUnresolvedInCP); - if(!resolved) return 0; - - switch(type) - { - case TR::Int8: - case TR::Int16: - case TR::Int32: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, intGetStaticTemplate, getStaticSize); - break; - case TR::Address: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, addrGetStaticTemplatePrologue, getStaticSize); - /* - if(TR::Compiler->om.compressObjectReferences()){ - int32_t shift = TR::Compiler->om.compressedReferenceShift(); - COPY_TEMPLATE(buffer, decompressReferenceTemplate, getStaticSize); - patchImm1(buffer, shift); + { + buffer_size_t getStaticSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + // Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + // In the future research project we could attempt runtime resolution. TR does this with self modifying code. + void *dataAddress; + //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. 
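The MJIT_ASSERT checks in the switch below exist because staticTemplatePrologue only leaves room for a 4-byte immediate: the resolved static data address is patched with patchImm4, so it must lie in the low 4 GB of the address space or this template cannot be used. A small sketch of that constraint (fitsInPatchableImm4 is a hypothetical helper, not part of the patch):

    #include <cstdint>

    // True when a static's data address can be encoded in the template's
    // 32-bit immediate field.
    static bool fitsInPatchableImm4(const void *dataAddress)
       {
       return (uint64_t)(uintptr_t)dataAddress <= 0x00000000ffffffffULL;
       }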
+ TR::DataType type = TR::NoType; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->staticAttributes(_comp, cpIndex, &dataAddress, &type, &isVolatile, &isFinal, &isPrivate, false, &isUnresolvedInCP); + if (!resolved) + return 0; + + switch (type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, intGetStaticTemplate, getStaticSize); + break; + case TR::Address: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, addrGetStaticTemplatePrologue, getStaticSize); + /* + if (TR::Compiler->om.compressObjectReferences()) + { + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, decompressReferenceTemplate, getStaticSize); + patchImm1(buffer, shift); } - */ - COPY_TEMPLATE(buffer, addrGetStaticTemplate, getStaticSize); - break; - case TR::Int64: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, longGetStaticTemplate, getStaticSize); - break; - case TR::Float: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, floatGetStaticTemplate, getStaticSize); - break; - case TR::Double: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, doubleGetStaticTemplate, getStaticSize); - break; - default: - trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); - break; - } - - return getStaticSize; -} + */ + COPY_TEMPLATE(buffer, addrGetStaticTemplate, getStaticSize); + break; + case TR::Int64: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, longGetStaticTemplate, getStaticSize); + break; + case TR::Float: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, floatGetStaticTemplate, getStaticSize); + break; + case TR::Double: + // Generate template and instantiate 
immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, getStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, doubleGetStaticTemplate, getStaticSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + + return getStaticSize; + } buffer_size_t MJIT::CodeGenerator::generatePutStatic(char *buffer, MJIT::ByteCodeIterator *bci) -{ - buffer_size_t putStaticSize = 0; - int32_t cpIndex = (int32_t)bci->next2Bytes(); - - //Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. - //In the future research project we could attempt runtime resolution. TR does this with self modifying code. - void * dataAddress; - //These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. - TR::DataType type = TR::NoType; - bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; - bool resolved = _compilee->staticAttributes(_comp, cpIndex, &dataAddress, &type, &isVolatile, &isFinal, &isPrivate, true, &isUnresolvedInCP); - if(!resolved) return 0; - - switch(type) - { - case TR::Int8: - case TR::Int16: - case TR::Int32: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, intPutStaticTemplate, putStaticSize); - break; - case TR::Address: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, addrPutStaticTemplatePrologue, putStaticSize); - /* - if(TR::Compiler->om.compressObjectReferences()){ - int32_t shift = TR::Compiler->om.compressedReferenceShift(); - COPY_TEMPLATE(buffer, compressReferenceTemplate, putStaticSize); - patchImm1(buffer, shift); + { + buffer_size_t putStaticSize = 0; + int32_t cpIndex = (int32_t)bci->next2Bytes(); + + // Attempt to resolve the address at compile time. This can fail, and if it does we must fail the compilation for now. + // In the future research project we could attempt runtime resolution. TR does this with self modifying code. + void *dataAddress; + // These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. 
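Storing a reference is the inverse of the load path: generatePutField above emits compressReferenceTemplate with the same shift, while the static variants in this patch keep the corresponding blocks commented out. A sketch of the inverse operation under the same shift-only assumption (compressReference is a hypothetical name):

    #include <cstdint>

    // Inverse of the decompression sketch shown earlier; again assumes a
    // shift-only compressed-references scheme with no heap base.
    static inline uint32_t compressReference(uintptr_t reference, int32_t shift)
       {
       return (uint32_t)(reference >> shift);
       }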
+ TR::DataType type = TR::NoType; + bool isVolatile, isFinal, isPrivate, isUnresolvedInCP; + bool resolved = _compilee->staticAttributes(_comp, cpIndex, &dataAddress, &type, &isVolatile, &isFinal, &isPrivate, true, &isUnresolvedInCP); + if (!resolved) + return 0; + + switch (type) + { + case TR::Int8: + case TR::Int16: + case TR::Int32: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, intPutStaticTemplate, putStaticSize); + break; + case TR::Address: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, addrPutStaticTemplatePrologue, putStaticSize); + /* + if (TR::Compiler->om.compressObjectReferences()) + { + int32_t shift = TR::Compiler->om.compressedReferenceShift(); + COPY_TEMPLATE(buffer, compressReferenceTemplate, putStaticSize); + patchImm1(buffer, shift); } - */ - COPY_TEMPLATE(buffer, addrPutStaticTemplate, putStaticSize); - break; - case TR::Int64: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, longPutStaticTemplate, putStaticSize); - break; - case TR::Float: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, floatPutStaticTemplate, putStaticSize); - break; - case TR::Double: - //Generate template and instantiate immediate values - COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); - MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); - patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); - COPY_TEMPLATE(buffer, doublePutStaticTemplate, putStaticSize); - break; - default: - trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); - break; - } - - return putStaticSize; -} + */ + COPY_TEMPLATE(buffer, addrPutStaticTemplate, putStaticSize); + break; + case TR::Int64: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, longPutStaticTemplate, putStaticSize); + break; + case TR::Float: + // Generate template and instantiate immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, floatPutStaticTemplate, putStaticSize); + break; + case TR::Double: + // Generate template and instantiate 
immediate values + COPY_TEMPLATE(buffer, staticTemplatePrologue, putStaticSize); + MJIT_ASSERT(_logFileFP, (U_64)dataAddress <= 0x00000000ffffffff, "data is above 4-byte boundary"); + patchImm4(buffer, (U_32)((U_64)dataAddress & 0x00000000ffffffff)); + COPY_TEMPLATE(buffer, doublePutStaticTemplate, putStaticSize); + break; + default: + trfprintf(_logFileFP, "Argument type %s is not supported\n", type.toString()); + break; + } + + return putStaticSize; + } buffer_size_t -MJIT::CodeGenerator::generateColdArea(char *buffer, J9Method *method, char *jitStackOverflowJumpPatchLocation){ - buffer_size_t coldAreaSize = 0; - PATCH_RELATIVE_ADDR_32_BIT(jitStackOverflowJumpPatchLocation, (intptr_t)buffer); - COPY_TEMPLATE(buffer, movEDIImm32, coldAreaSize); - patchImm4(buffer, _stackPeakSize); +MJIT::CodeGenerator::generateColdArea(char *buffer, J9Method *method, char *jitStackOverflowJumpPatchLocation) + { + buffer_size_t coldAreaSize = 0; + PATCH_RELATIVE_ADDR_32_BIT(jitStackOverflowJumpPatchLocation, (intptr_t)buffer); + COPY_TEMPLATE(buffer, movEDIImm32, coldAreaSize); + patchImm4(buffer, _stackPeakSize); - COPY_TEMPLATE(buffer, call4ByteRel, coldAreaSize); - uintptr_t jitStackOverflowHelperAddress = getHelperOrTrampolineAddress(TR_stackOverflow, (uintptr_t)buffer); - PATCH_RELATIVE_ADDR_32_BIT(buffer, jitStackOverflowHelperAddress); + COPY_TEMPLATE(buffer, call4ByteRel, coldAreaSize); + uintptr_t jitStackOverflowHelperAddress = getHelperOrTrampolineAddress(TR_stackOverflow, (uintptr_t)buffer); + PATCH_RELATIVE_ADDR_32_BIT(buffer, jitStackOverflowHelperAddress); - COPY_TEMPLATE(buffer, jump4ByteRel, coldAreaSize); - PATCH_RELATIVE_ADDR_32_BIT(buffer, jitStackOverflowJumpPatchLocation); + COPY_TEMPLATE(buffer, jump4ByteRel, coldAreaSize); + PATCH_RELATIVE_ADDR_32_BIT(buffer, jitStackOverflowJumpPatchLocation); - return coldAreaSize; -} + return coldAreaSize; + } buffer_size_t -MJIT::CodeGenerator::generateDebugBreakpoint(char *buffer){ - buffer_size_t codeGenSize = 0; - COPY_TEMPLATE(buffer, debugBreakpoint, codeGenSize); - return codeGenSize; -} +MJIT::CodeGenerator::generateDebugBreakpoint(char *buffer) + { + buffer_size_t codeGenSize = 0; + COPY_TEMPLATE(buffer, debugBreakpoint, codeGenSize); + return codeGenSize; + } -TR::CodeCache* -MJIT::CodeGenerator::getCodeCache(){ - return _codeCache; -} +TR::CodeCache * +MJIT::CodeGenerator::getCodeCache() + { + return _codeCache; + } U_8 * -MJIT::CodeGenerator::allocateCodeCache(int32_t length, TR_J9VMBase *vmBase, J9VMThread *vmThread){ - TR::CodeCacheManager *manager = TR::CodeCacheManager::instance(); - int32_t compThreadID = vmBase->getCompThreadIDForVMThread(vmThread); - int32_t numReserved; - - if(!_codeCache){ - _codeCache = manager->reserveCodeCache(false, length, compThreadID, &numReserved); - if (!_codeCache){ - return NULL; - } - _comp->cg()->setCodeCache(_codeCache); - } - - uint8_t *coldCode; - U_8 *codeStart = manager->allocateCodeMemory(length, 0, &_codeCache, &coldCode, false); - if(!codeStart){ - return NULL; - } - - return codeStart; -} \ No newline at end of file +MJIT::CodeGenerator::allocateCodeCache(int32_t length, TR_J9VMBase *vmBase, J9VMThread *vmThread) + { + TR::CodeCacheManager *manager = TR::CodeCacheManager::instance(); + int32_t compThreadID = vmBase->getCompThreadIDForVMThread(vmThread); + int32_t numReserved; + + if (!_codeCache) + { + _codeCache = manager->reserveCodeCache(false, length, compThreadID, &numReserved); + if (!_codeCache) + return NULL; + _comp->cg()->setCodeCache(_codeCache); + } + + uint8_t *coldCode; + 
U_8 *codeStart = manager->allocateCodeMemory(length, 0, &_codeCache, &coldCode, false); + if (!codeStart) + return NULL; + + return codeStart; + } diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp index 5ff1b22b1d9..eb37e602027 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp @@ -28,7 +28,6 @@ #include "microjit/x/amd64/AMD64Linkage.hpp" #include "microjit/x/amd64/AMD64CodegenGC.hpp" #include "microjit/ByteCodeIterator.hpp" - #include "control/RecompilationInfo.hpp" #include "env/IO.hpp" #include "env/TRMemory.hpp" @@ -39,40 +38,43 @@ namespace TR { class Compilation; } namespace TR { class CFG; } typedef unsigned int buffer_size_t; -namespace MJIT { - -enum TypeCharacters { - BYTE_TYPE_CHARACTER = 'B', - CHAR_TYPE_CHARACTER = 'C', - DOUBLE_TYPE_CHARACTER = 'D', - FLOAT_TYPE_CHARACTER = 'F', - INT_TYPE_CHARACTER = 'I', - LONG_TYPE_CHARACTER = 'J', - CLASSNAME_TYPE_CHARACTER = 'L', - SHORT_TYPE_CHARACTER = 'S', - BOOLEAN_TYPE_CHARACTER = 'Z', - VOID_TYPE_CHARACTER = 'V', -}; +namespace MJIT +{ + +enum TypeCharacters + { + BYTE_TYPE_CHARACTER = 'B', + CHAR_TYPE_CHARACTER = 'C', + DOUBLE_TYPE_CHARACTER = 'D', + FLOAT_TYPE_CHARACTER = 'F', + INT_TYPE_CHARACTER = 'I', + LONG_TYPE_CHARACTER = 'J', + CLASSNAME_TYPE_CHARACTER = 'L', + SHORT_TYPE_CHARACTER = 'S', + BOOLEAN_TYPE_CHARACTER = 'Z', + VOID_TYPE_CHARACTER = 'V', + }; inline static U_8 typeSignatureSize(char typeBuffer) -{ - switch(typeBuffer){ - case MJIT::BYTE_TYPE_CHARACTER: - case MJIT::CHAR_TYPE_CHARACTER: - case MJIT::INT_TYPE_CHARACTER: - case MJIT::FLOAT_TYPE_CHARACTER: - case MJIT::CLASSNAME_TYPE_CHARACTER: - case MJIT::SHORT_TYPE_CHARACTER: - case MJIT::BOOLEAN_TYPE_CHARACTER: - return 8; - case MJIT::DOUBLE_TYPE_CHARACTER: - case MJIT::LONG_TYPE_CHARACTER: - return 16; - default: - return 0; - } -} + { + switch (typeBuffer) + { + case MJIT::BYTE_TYPE_CHARACTER: + case MJIT::CHAR_TYPE_CHARACTER: + case MJIT::INT_TYPE_CHARACTER: + case MJIT::FLOAT_TYPE_CHARACTER: + case MJIT::CLASSNAME_TYPE_CHARACTER: + case MJIT::SHORT_TYPE_CHARACTER: + case MJIT::BOOLEAN_TYPE_CHARACTER: + return 8; + case MJIT::DOUBLE_TYPE_CHARACTER: + case MJIT::LONG_TYPE_CHARACTER: + return 16; + default: + return 0; + } + } RegisterStack mapIncomingParams(char*, U_16, int*, ParamTableEntry*, U_16, TR::FilePointer*); @@ -89,356 +91,338 @@ getActualParamCount(char *typeString, U_16 maxLength); int getMJITStackSlotCount(char *typeString, U_16 maxLength); -class CodeGenerator { - private: - Linkage _linkage; - TR::FilePointer *_logFileFP; - TR_J9VMBase& _vm; - TR::CodeCache *_codeCache; - int32_t _stackPeakSize; - ParamTable *_paramTable; - TR::Compilation *_comp; - MJIT::CodeGenGC *_mjitCGGC; - TR::GCStackAtlas *_atlas; - TR::PersistentInfo *_persistentInfo; - TR_Memory *_trMemory; - TR_ResolvedMethod *_compilee; - uint16_t _maxCalleeArgsSize; - bool _printByteCodes; - - buffer_size_t generateSwitchToInterpPrePrologue( - char*, - J9Method*, - buffer_size_t, - buffer_size_t, - char*, - U_16 - ); - - buffer_size_t generateEpologue(char*); - - buffer_size_t loadArgsFromStack( - char*, - buffer_size_t, - ParamTable* - ); - - buffer_size_t saveArgsOnStack( - char*, - buffer_size_t, - ParamTable* - ); - - buffer_size_t - saveArgsInLocalArray( - char*, - buffer_size_t, - char* - ); - - /** - * Generates a load instruction based on the type. 
- * - * @param buffer code buffer - * @param bc Bytecode that generated the load instruction - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateLoad( - char *buffer, - TR_J9ByteCode bc, - MJIT::ByteCodeIterator *bci - ); - - /** - * Generates a store instruction based on the type. - * - * @param buffer code buffer - * @param bc Bytecode that generated the load instruction - * @param bci Bytecode iterator from which the bytecode was retreived - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateStore( - char *buffer, - TR_J9ByteCode bc, - MJIT::ByteCodeIterator *bci - ); - - /** - * Generates argument moves for static method invokation. - * - * @param buffer code buffer - * @param staticMethod method for introspection - * @param typeString typeString for getting arguments - * @param typeStringLength the length of the string stored in typeString - * @return size of generated code; 0 if method failed, -1 if method has no arguments - */ - buffer_size_t - generateArgumentMoveForStaticMethod( - char* buffer, - TR_ResolvedMethod* staticMethod, - char* typeString, - U_16 typeStringLength, - U_16 slotCount, - struct TR::X86LinkageProperties *properties - ); - - /** - * Generates an invoke static instruction based on the type. - * - * @param buffer code buffer - * @param bci Bytecode iterator from which the bytecode was retreived - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateInvokeStatic( - char* buffer, - MJIT::ByteCodeIterator* bci, - struct TR::X86LinkageProperties *properties - ); - - /** - * Generates a getField instruction based on the type. - * - * @param buffer code buffer - * @param bci Bytecode iterator from which the bytecode was retreived - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateGetField( - char* buffer, - MJIT::ByteCodeIterator* bci - ); - - /** - * Generates a putField instruction based on the type. - * - * @param buffer code buffer - * @param bci Bytecode iterator from which the bytecode was retreived - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generatePutField( - char* buffer, - MJIT::ByteCodeIterator* bci - ); - - /** - * Generates a putStatic instruction based on the type. - * - * @param buffer code buffer - * @param bci Bytecode iterator from which the bytecode was retreived - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generatePutStatic( - char *buffer, - MJIT::ByteCodeIterator *bci - ); - - /** - * Generates a getStatic instruction based on the type. - * - * @param buffer code buffer - * @param bci Bytecode iterator from which the bytecode was retreived - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateGetStatic( - char *buffer, - MJIT::ByteCodeIterator *bci - ); - - /** - * Generates a return instruction based on the type. - * - * @param buffer code buffer - * @param dt The data type to be returned by the TR-compiled method - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateReturn( - char *buffer, - TR::DataType dt, - struct TR::X86LinkageProperties *properties - ); - - /** - * Generates a gaurded counter for recompilation through JITed counting. 
- * - * @param buffer code buffer - * @param initialCount invocations before compiling with TR - * @return size of generate code; 0 if method failed - */ - buffer_size_t - generateGCR( - char *buffer, - int32_t initialCount, - J9Method *method, - uintptr_t startPC - ); - - public: - CodeGenerator() = delete; - CodeGenerator( - struct J9JITConfig*, - J9VMThread*, - TR::FilePointer*, - TR_J9VMBase&, - ParamTable*, - TR::Compilation*, - MJIT::CodeGenGC*, - TR::PersistentInfo*, - TR_Memory*, - TR_ResolvedMethod* - ); - - inline void - setPeakStackSize(int32_t newSize) - { - _stackPeakSize = newSize; - } - - inline int32_t - getPeakStackSize() - { - return _stackPeakSize; - } - - /** - * Write the pre-prologue to the buffer and return the size written to the buffer. - * - * @param buffer code buffer - * @param method method for introspection - * @param magicWordLocation, location of variable to hold pointer to space allocated for magic word - * @param first2BytesPatchLocation, location of variable to hold pointer to first 2 bytes of method to be executed - * @param bodyInfo location of variable to hold pointer to the bodyInfo, allocated inside this method - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generatePrePrologue( - char* buffer, - J9Method* method, - char** magicWordLocation, - char** first2BytesPatchLocation, - char** samplingRecompileCallLocation - ); - - /** - * Write the prologue to the buffer and return the size written to the buffer. - * - * @param buffer code buffer - * @param method method for introspection - * @param jitStackOverflowJumpPatchLocation, location of variable to hold pointer to line that jumps to the stack overflow helper. - * @param magicWordLocation, pointer to magic word location, written to during prologue generation - * @param first2BytesPatchLocation, pointer to first 2 bytes location, written to during prologue generation - * @param firstInstLocation, location of variable to hold pointer to first instruction, later used to set entry point - * @param bci bytecode iterator, used to gather type information - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generatePrologue( - char* buffer, - J9Method* method, - char** jitStackOverflowJumpPatchLocation, - char* magicWordLocation, - char* first2BytesPatchLocation, - char* samplingRecompileCallLocation, - char** firstInstLocation, - MJIT::ByteCodeIterator* bci - ); - - /** - * Generates the cold area used for helpers, such as the jit stack overflow checker. - * - * @param buffer code buffer - * @param method method for introspection - * @param jitStackOverflowJumpPatchLocation location to be patched with jit stack overflow helper relative address. - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateColdArea( - char *buffer, - J9Method *method, - char *jitStackOverflowJumpPatchLocation - ); - - /** - * Generates the body for a method. - * - * @param buffer code buffer - * @param bci byte code iterator used in hot loop to generate body of compiled method - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateBody( - char *buffer, - MJIT::ByteCodeIterator *bci - ); - - /** - * Generates a profiling counter for an edge. 
- * - * @param buffer code buffer - * @param bci byte code iterator, currently pointing to the edge that needs to be profiled - * @return size of generated code; 0 if method failed - */ - buffer_size_t - generateEdgeCounter( - char *buffer, - TR_J9ByteCodeIterator *bci - ); - - /** - * Generate an int3 for signaling debug breakpoint for Linux x86 - * - * @param buffer code buffer - */ - buffer_size_t - generateDebugBreakpoint(char *buffer); - - /** - * Allocate space in the code cache of size length and - * copy the buffer to the cache. - * - * @param length length of code in the buffer - * @param vmBase To check if compilation should be interrupted - * @param vmThread Thread Id is stored in cache - * @return pointer to the new code cache segment or NULL if failed. - */ - U_8 * - allocateCodeCache( - int32_t length, - TR_J9VMBase *vmBase, - J9VMThread *vmThread - ); - - /** - * Get the code cache - */ - TR::CodeCache *getCodeCache(); - - /** - * Get a pointer to the Stack Atlas - */ - TR::GCStackAtlas *getStackAtlas(); - - inline uint8_t - getPointerSize() - { - return _linkage._properties.getPointerSize(); - } -}; - -class MJITCompilationFailure: public virtual std::exception { - public: - MJITCompilationFailure() { } - virtual const char *what() const throw(){ - return "Unable to compile method."; - } -}; - - -} //namespace MJIT +class CodeGenerator + { + + private: + Linkage _linkage; + TR::FilePointer *_logFileFP; + TR_J9VMBase& _vm; + TR::CodeCache *_codeCache; + int32_t _stackPeakSize; + ParamTable *_paramTable; + TR::Compilation *_comp; + MJIT::CodeGenGC *_mjitCGGC; + TR::GCStackAtlas *_atlas; + TR::PersistentInfo *_persistentInfo; + TR_Memory *_trMemory; + TR_ResolvedMethod *_compilee; + uint16_t _maxCalleeArgsSize; + bool _printByteCodes; + + buffer_size_t generateSwitchToInterpPrePrologue( + char*, + J9Method*, + buffer_size_t, + buffer_size_t, + char*, + U_16); + + buffer_size_t generateEpologue(char*); + + buffer_size_t loadArgsFromStack( + char*, + buffer_size_t, + ParamTable*); + + buffer_size_t saveArgsOnStack( + char*, + buffer_size_t, + ParamTable*); + + buffer_size_t + saveArgsInLocalArray( + char*, + buffer_size_t, + char*); + + /** + * Generates a load instruction based on the type. + * + * @param buffer code buffer + * @param bc Bytecode that generated the load instruction + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateLoad( + char *buffer, + TR_J9ByteCode bc, + MJIT::ByteCodeIterator *bci); + + /** + * Generates a store instruction based on the type. + * + * @param buffer code buffer + * @param bc Bytecode that generated the load instruction + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateStore( + char *buffer, + TR_J9ByteCode bc, + MJIT::ByteCodeIterator *bci); + + /** + * Generates argument moves for static method invocation. + * + * @param buffer code buffer + * @param staticMethod method for introspection + * @param typeString typeString for getting arguments + * @param typeStringLength the length of the string stored in typeString + * @return size of generated code; 0 if method failed, -1 if method has no arguments + */ + buffer_size_t + generateArgumentMoveForStaticMethod( + char* buffer, + TR_ResolvedMethod* staticMethod, + char* typeString, + U_16 typeStringLength, + U_16 slotCount, + struct TR::X86LinkageProperties *properties); + + /** + * Generates an invoke static instruction based on the type. 
+ * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateInvokeStatic( + char* buffer, + MJIT::ByteCodeIterator* bci, + struct TR::X86LinkageProperties *properties); + + /** + * Generates a getField instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateGetField( + char* buffer, + MJIT::ByteCodeIterator* bci); + + /** + * Generates a putField instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePutField( + char* buffer, + MJIT::ByteCodeIterator* bci); + + /** + * Generates a putStatic instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePutStatic( + char *buffer, + MJIT::ByteCodeIterator *bci); + + /** + * Generates a getStatic instruction based on the type. + * + * @param buffer code buffer + * @param bci Bytecode iterator from which the bytecode was retrieved + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateGetStatic( + char *buffer, + MJIT::ByteCodeIterator *bci); + + /** + * Generates a return instruction based on the type. + * + * @param buffer code buffer + * @param dt The data type to be returned by the TR-compiled method + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateReturn( + char *buffer, + TR::DataType dt, + struct TR::X86LinkageProperties *properties); + + /** + * Generates a guarded counter for recompilation through JITed counting. + * + * @param buffer code buffer + * @param initialCount invocations before compiling with TR + * @return size of generate code; 0 if method failed + */ + buffer_size_t + generateGCR( + char *buffer, + int32_t initialCount, + J9Method *method, + uintptr_t startPC); + + public: + CodeGenerator() = delete; + CodeGenerator( + struct J9JITConfig*, + J9VMThread*, + TR::FilePointer*, + TR_J9VMBase&, + ParamTable*, + TR::Compilation*, + MJIT::CodeGenGC*, + TR::PersistentInfo*, + TR_Memory*, + TR_ResolvedMethod*); + + inline void + setPeakStackSize(int32_t newSize) + { + _stackPeakSize = newSize; + } + + inline int32_t + getPeakStackSize() + { + return _stackPeakSize; + } + + /** + * Write the pre-prologue to the buffer and return the size written to the buffer. + * + * @param buffer code buffer + * @param method method for introspection + * @param magicWordLocation, location of variable to hold pointer to space allocated for magic word + * @param first2BytesPatchLocation, location of variable to hold pointer to first 2 bytes of method to be executed + * @param bodyInfo location of variable to hold pointer to the bodyInfo, allocated inside this method + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePrePrologue( + char *buffer, + J9Method *method, + char** magicWordLocation, + char** first2BytesPatchLocation, + char** samplingRecompileCallLocation); + + /** + * Write the prologue to the buffer and return the size written to the buffer. 
+ * + * @param buffer code buffer + * @param method method for introspection + * @param jitStackOverflowJumpPatchLocation, location of variable to hold pointer to line that jumps to the stack overflow helper. + * @param magicWordLocation, pointer to magic word location, written to during prologue generation + * @param first2BytesPatchLocation, pointer to first 2 bytes location, written to during prologue generation + * @param firstInstLocation, location of variable to hold pointer to first instruction, later used to set entry point + * @param bci bytecode iterator, used to gather type information + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generatePrologue( + char *buffer, + J9Method *method, + char** jitStackOverflowJumpPatchLocation, + char *magicWordLocation, + char *first2BytesPatchLocation, + char *samplingRecompileCallLocation, + char** firstInstLocation, + MJIT::ByteCodeIterator* bci); + + /** + * Generates the cold area used for helpers, such as the jit stack overflow checker. + * + * @param buffer code buffer + * @param method method for introspection + * @param jitStackOverflowJumpPatchLocation location to be patched with jit stack overflow helper relative address. + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateColdArea( + char *buffer, + J9Method *method, + char *jitStackOverflowJumpPatchLocation); + + /** + * Generates the body for a method. + * + * @param buffer code buffer + * @param bci byte code iterator used in hot loop to generate body of compiled method + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateBody( + char *buffer, + MJIT::ByteCodeIterator *bci); + + /** + * Generates a profiling counter for an edge. + * + * @param buffer code buffer + * @param bci byte code iterator, currently pointing to the edge that needs to be profiled + * @return size of generated code; 0 if method failed + */ + buffer_size_t + generateEdgeCounter( + char *buffer, + TR_J9ByteCodeIterator *bci); + + /** + * Generate an int3 for signaling debug breakpoint for Linux x86 + * + * @param buffer code buffer + */ + buffer_size_t + generateDebugBreakpoint(char *buffer); + + /** + * Allocate space in the code cache of size length and + * copy the buffer to the cache. + * + * @param length length of code in the buffer + * @param vmBase To check if compilation should be interrupted + * @param vmThread Thread Id is stored in cache + * @return pointer to the new code cache segment or NULL if failed. 
+ */ + U_8 * + allocateCodeCache( + int32_t length, + TR_J9VMBase *vmBase, + J9VMThread *vmThread); + + /** + * Get the code cache + */ + TR::CodeCache *getCodeCache(); + + /** + * Get a pointer to the Stack Atlas + */ + TR::GCStackAtlas *getStackAtlas(); + + inline uint8_t + getPointerSize() + { + return _linkage._properties.getPointerSize(); + } + }; + +class MJITCompilationFailure: public virtual std::exception + { + public: + MJITCompilationFailure() { } + virtual const char *what() const throw() + { + return "Unable to compile method."; + } + }; + +} // namespace MJIT #endif /* MJIT_AMD64CODEGEN_HPP */ diff --git a/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp index fb51ef186cc..134c72d52ba 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp +++ b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.cpp @@ -19,10 +19,9 @@ * * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception *******************************************************************************/ -#include "microjit/x/amd64/AMD64CodegenGC.hpp" - #include #include +#include "microjit/x/amd64/AMD64CodegenGC.hpp" #include "env/StackMemoryRegion.hpp" #include "codegen/CodeGenerator.hpp" #include "codegen/CodeGenerator_inlines.hpp" @@ -52,12 +51,11 @@ #include "microjit/utils.hpp" #include "ras/Debug.hpp" - MJIT::CodeGenGC::CodeGenGC(TR::FilePointer *logFileFP) - :_logFileFP(logFileFP) -{} + : _logFileFP(logFileFP) + {} -TR::GCStackAtlas* +TR::GCStackAtlas * MJIT::CodeGenGC::createStackAtlas(TR::Compilation *comp, MJIT::ParamTable *paramTable, MJIT::LocalTable *localTable) { @@ -71,34 +69,41 @@ MJIT::CodeGenGC::createStackAtlas(TR::Compilation *comp, MJIT::ParamTable *param int32_t firstMappedParmOffsetInBytes = -1; MJIT::ParamTableEntry entry; - for(int i=0; igetEntry(i, &entry), "Bad index for table entry"); - if(entry.notInitialized) { + if (entry.notInitialized) + { i++; continue; - } - if(!entry.notInitialized && entry.isReference){ + } + if (!entry.notInitialized && entry.isReference) + { firstMappedParmOffsetInBytes = entry.gcMapOffset; break; - } + } i += entry.slots; - } + } - for(int i=0; igetEntry(i, &entry), "Bad index for table entry"); - if(entry.notInitialized) { + if (entry.notInitialized) + { i++; continue; - } - if(!entry.notInitialized && entry.isReference){ + } + if (!entry.notInitialized && entry.isReference) + { int32_t entryOffsetInBytes = entry.gcMapOffset; parameterMap->setBit(((entryOffsetInBytes-firstMappedParmOffsetInBytes)/stackSlotSize)); - if(!entry.onStack){ + if (!entry.onStack) + { parameterMap->setRegisterBits(TR::RealRegister::gprMask((TR::RealRegister::RegNum)entry.regNo)); + } } - } i += entry.slots; - } + } // -------------------------------------------------------------------------------- // Construct the local map for mapped reference locals. 
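Both the parameter map built above and the local map built in the next hunk use the same bit-index rule: a reference slot's bit is its GC-map offset relative to the first mapped parameter, divided by the stack slot size. A worked example, assuming 8-byte stack slots:

    // Illustrative only: with 8-byte slots, a reference whose GC-map offset is
    // 24 bytes past the first mapped parameter sets bit 3 of the map.
    int32_t stackSlotSize = 8;
    int32_t firstMappedParmOffsetInBytes = 0;
    int32_t entryOffsetInBytes = 24;
    int32_t bitIndex = (entryOffsetInBytes - firstMappedParmOffsetInBytes) / stackSlotSize;   // == 3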
@@ -109,21 +114,25 @@ MJIT::CodeGenGC::createStackAtlas(TR::Compilation *comp, MJIT::ParamTable *param U_16 localCount = localTable->getLocalCount(); TR_GCStackMap *localMap = new (comp->trHeapMemory(), localCount) TR_GCStackMap(localCount); localMap->copy(parameterMap); - for(int i=paramCount; igetEntry(i, &entry), "Bad index for table entry"); - if(entry.notInitialized) { + if (entry.notInitialized) + { i++; continue; - } - if(!entry.notInitialized && entry.offset == -1 && entry.isReference){ + } + if (!entry.notInitialized && entry.offset == -1 && entry.isReference) + { int32_t entryOffsetInBytes = entry.gcMapOffset; localMap->setBit(((entryOffsetInBytes-firstMappedParmOffsetInBytes)/stackSlotSize)); - if(!entry.onStack){ + if (!entry.onStack) + { localMap->setRegisterBits(TR::RealRegister::gprMask((TR::RealRegister::RegNum)entry.regNo)); + } } - } i += entry.slots; - } + } // -------------------------------------------------------------------------- // Now create the stack atlas diff --git a/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp index 24cfbfa2942..da2aba640a3 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64CodegenGC.hpp @@ -25,13 +25,19 @@ #include "microjit/SideTables.hpp" #include "env/IO.hpp" -namespace MJIT { - class CodeGenGC { - private: - TR::FilePointer *_logFileFP; - public: - CodeGenGC(TR::FilePointer *logFileFP); - TR::GCStackAtlas *createStackAtlas(TR::Compilation*, MJIT::ParamTable*, MJIT::LocalTable*); +namespace MJIT +{ + +class CodeGenGC + { + + private: + TR::FilePointer *_logFileFP; + + public: + CodeGenGC(TR::FilePointer *logFileFP); + TR::GCStackAtlas *createStackAtlas(TR::Compilation*, MJIT::ParamTable*, MJIT::LocalTable*); }; -} + +} // namespace MJIT #endif /* MJIT_AMD64_CODEGENGC_HPP */ diff --git a/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp b/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp index cdb4f70ee4c..864711d7393 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Linkage.cpp @@ -20,7 +20,6 @@ * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception *******************************************************************************/ #include "AMD64Linkage.hpp" -#include /* | Architecture | Endian | Return address | Argument registers | @@ -31,8 +30,8 @@ */ MJIT::Linkage::Linkage(TR::CodeGenerator *cg) -{ + { // TODO: MicroJIT: Use the method setOffsetToFirstParm(RETURN_ADDRESS_SIZE); _properties._offsetToFirstParm = RETURN_ADDRESS_SIZE; setProperties(cg,&_properties); -} + } diff --git a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp index 84e24af61e2..14c6309ae00 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp @@ -22,7 +22,6 @@ #ifndef AMD64LINKAGE_HPP #define AMD64LINKAGE_HPP -#include #include "codegen/AMD64PrivateLinkage.hpp" #include "codegen/OMRLinkage_inlines.hpp" @@ -34,18 +33,21 @@ | EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | */ -namespace MJIT { +namespace MJIT +{ enum { RETURN_ADDRESS_SIZE=8, }; -class Linkage { +class Linkage +{ + public: - struct TR::X86LinkageProperties _properties; - Linkage() = delete; - Linkage(TR::CodeGenerator*); + struct TR::X86LinkageProperties _properties; + Linkage() = delete; 
+ Linkage(TR::CodeGenerator*); }; } // namespace MJIT diff --git a/runtime/compiler/microjit/x/amd64/CMakeLists.txt b/runtime/compiler/microjit/x/amd64/CMakeLists.txt index 2c2ebafd739..446738f44f4 100644 --- a/runtime/compiler/microjit/x/amd64/CMakeLists.txt +++ b/runtime/compiler/microjit/x/amd64/CMakeLists.txt @@ -24,4 +24,4 @@ j9jit_files( microjit/x/amd64/AMD64CodegenGC.cpp microjit/x/amd64/AMD64Linkage.cpp ) -add_subdirectory(templates) \ No newline at end of file +add_subdirectory(templates) diff --git a/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt b/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt index 0861b09fafd..6e174d6e12b 100644 --- a/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt +++ b/runtime/compiler/microjit/x/amd64/templates/CMakeLists.txt @@ -23,4 +23,4 @@ j9jit_files( microjit/x/amd64/templates/microjit.nasm microjit/x/amd64/templates/bytecodes.nasm microjit/x/amd64/templates/linkage.nasm -) \ No newline at end of file +) diff --git a/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm b/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm index 9188a0c2fee..3a2f44eb6b0 100644 --- a/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm +++ b/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm @@ -40,1029 +40,1029 @@ ; stores values loaded from memory for storing on the stack or in the local array template_start debugBreakpoint - int3 ; trigger hardware interrupt + int3 ; trigger hardware interrupt template_end debugBreakpoint ; All integer loads require adding stack space of 8 bytes ; and the moving of values from a stack slot to a temp register template_start aloadTemplatePrologue - push_single_slot - _64bit_local_to_rXX_PATCH r15 + push_single_slot + _64bit_local_to_rXX_PATCH r15 template_end aloadTemplatePrologue ; All integer loads require adding stack space of 8 bytes ; and the moving of values from a stack slot to a temp register template_start iloadTemplatePrologue - push_single_slot - _32bit_local_to_rXX_PATCH r15 + push_single_slot + _32bit_local_to_rXX_PATCH r15 template_end iloadTemplatePrologue ; All long loads require adding 2 stack spaces of 8 bytes (16 total) ; and the moving of values from a stack slot to a temp register template_start lloadTemplatePrologue - push_dual_slot - _64bit_local_to_rXX_PATCH r15 + push_dual_slot + _64bit_local_to_rXX_PATCH r15 template_end lloadTemplatePrologue template_start loadTemplate - mov [r10], r15 + mov [r10], r15 template_end loadTemplate template_start invokeStaticTemplate - mov rdi, qword 0xefbeaddeefbeadde + mov rdi, qword 0xefbeaddeefbeadde template_end invokeStaticTemplate template_start retTemplate_sub - sub r10, byte 0xFF + sub r10, byte 0xFF template_end retTemplate_sub template_start loadeaxReturn - _32bit_slot_stack_from_eXX ax,0 + _32bit_slot_stack_from_eXX ax,0 template_end loadeaxReturn template_start loadraxReturn - _64bit_slot_stack_from_rXX rax,0 + _64bit_slot_stack_from_rXX rax,0 template_end loadraxReturn template_start loadxmm0Return - movd [r10], xmm0 + movd [r10], xmm0 template_end loadxmm0Return template_start loadDxmm0Return - movq [r10], xmm0 + movq [r10], xmm0 template_end loadDxmm0Return template_start moveeaxForCall - _32bit_slot_stack_to_eXX ax,0 - pop_single_slot + _32bit_slot_stack_to_eXX ax,0 + pop_single_slot template_end moveeaxForCall template_start moveesiForCall - _32bit_slot_stack_to_eXX si,0 - pop_single_slot + _32bit_slot_stack_to_eXX si,0 + pop_single_slot template_end moveesiForCall template_start moveedxForCall - 
_32bit_slot_stack_to_eXX dx,0 - pop_single_slot + _32bit_slot_stack_to_eXX dx,0 + pop_single_slot template_end moveedxForCall template_start moveecxForCall - _32bit_slot_stack_to_eXX cx,0 - pop_single_slot + _32bit_slot_stack_to_eXX cx,0 + pop_single_slot template_end moveecxForCall template_start moveraxRefForCall - _64bit_slot_stack_to_rXX rax,0 - pop_single_slot + _64bit_slot_stack_to_rXX rax,0 + pop_single_slot template_end moveraxRefForCall template_start moversiRefForCall - _64bit_slot_stack_to_rXX rsi,0 - pop_single_slot + _64bit_slot_stack_to_rXX rsi,0 + pop_single_slot template_end moversiRefForCall template_start moverdxRefForCall - _64bit_slot_stack_to_rXX rdx,0 - pop_single_slot + _64bit_slot_stack_to_rXX rdx,0 + pop_single_slot template_end moverdxRefForCall template_start movercxRefForCall - _64bit_slot_stack_to_rXX rcx,0 - pop_single_slot + _64bit_slot_stack_to_rXX rcx,0 + pop_single_slot template_end movercxRefForCall template_start moveraxForCall - _64bit_slot_stack_to_rXX rax,0 - pop_dual_slot + _64bit_slot_stack_to_rXX rax,0 + pop_dual_slot template_end moveraxForCall template_start moversiForCall - _64bit_slot_stack_to_rXX rsi,0 - pop_dual_slot + _64bit_slot_stack_to_rXX rsi,0 + pop_dual_slot template_end moversiForCall template_start moverdxForCall - _64bit_slot_stack_to_rXX rdx,0 - pop_dual_slot + _64bit_slot_stack_to_rXX rdx,0 + pop_dual_slot template_end moverdxForCall template_start movercxForCall - _64bit_slot_stack_to_rXX rcx,0 - pop_dual_slot + _64bit_slot_stack_to_rXX rcx,0 + pop_dual_slot template_end movercxForCall template_start movexmm0ForCall - psrldq xmm0, 8 ; clear xmm0 by shifting right - movss xmm0, [r10] - pop_single_slot + psrldq xmm0, 8 ; clear xmm0 by shifting right + movss xmm0, [r10] + pop_single_slot template_end movexmm0ForCall template_start movexmm1ForCall - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm1, [r10] - pop_single_slot + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] + pop_single_slot template_end movexmm1ForCall template_start movexmm2ForCall - psrldq xmm2, 8 ; clear xmm2 by shifting right - movss xmm2, [r10] - pop_single_slot + psrldq xmm2, 8 ; clear xmm2 by shifting right + movss xmm2, [r10] + pop_single_slot template_end movexmm2ForCall template_start movexmm3ForCall - psrldq xmm3, 8 ; clear xmm3 by shifting right - movss xmm3, [r10] - pop_single_slot + psrldq xmm3, 8 ; clear xmm3 by shifting right + movss xmm3, [r10] + pop_single_slot template_end movexmm3ForCall template_start moveDxmm0ForCall - psrldq xmm0, 8 ; clear xmm0 by shifting right - movsd xmm0, [r10] - pop_dual_slot + psrldq xmm0, 8 ; clear xmm0 by shifting right + movsd xmm0, [r10] + pop_dual_slot template_end moveDxmm0ForCall template_start moveDxmm1ForCall - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm1, [r10] - pop_dual_slot + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] + pop_dual_slot template_end moveDxmm1ForCall template_start moveDxmm2ForCall - psrldq xmm2, 8 ; clear xmm2 by shifting right - movsd xmm2, [r10] - pop_dual_slot + psrldq xmm2, 8 ; clear xmm2 by shifting right + movsd xmm2, [r10] + pop_dual_slot template_end moveDxmm2ForCall template_start moveDxmm3ForCall - psrldq xmm3, 8 ; clear xmm3 by shifting right - movsd xmm3, [r10] - pop_dual_slot + psrldq xmm3, 8 ; clear xmm3 by shifting right + movsd xmm3, [r10] + pop_dual_slot template_end moveDxmm3ForCall template_start astoreTemplate - _64bit_slot_stack_to_rXX r15,0 - pop_single_slot - _64bit_local_from_rXX_PATCH r15 + 
_64bit_slot_stack_to_rXX r15,0 + pop_single_slot + _64bit_local_from_rXX_PATCH r15 template_end astoreTemplate template_start istoreTemplate - _32bit_slot_stack_to_rXX r15,0 - pop_single_slot - _32bit_local_from_rXX_PATCH r15 + _32bit_slot_stack_to_rXX r15,0 + pop_single_slot + _32bit_local_from_rXX_PATCH r15 template_end istoreTemplate template_start lstoreTemplate - _64bit_slot_stack_to_rXX r15,0 - pop_dual_slot - _64bit_local_from_rXX_PATCH r15 + _64bit_slot_stack_to_rXX r15,0 + pop_dual_slot + _64bit_local_from_rXX_PATCH r15 template_end lstoreTemplate ; stack before: ..., value → ; stack after: ... template_start popTemplate - pop_single_slot ; remove the top value of the stack + pop_single_slot ; remove the top value of the stack template_end popTemplate ; stack before: ..., value2, value1 → ; stack after: ... template_start pop2Template - pop_dual_slot ; remove the top two values from the stack + pop_dual_slot ; remove the top two values from the stack template_end pop2Template ; stack before: ..., value2, value1 → ; stack after: ..., value1, value2 template_start swapTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 - _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 - _64bit_slot_stack_from_rXX r12,8 ; write value1 to the second stack slot - _64bit_slot_stack_from_rXX r11,0 ; write value2 to the top stack slot + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_from_rXX r12,8 ; write value1 to the second stack slot + _64bit_slot_stack_from_rXX r11,0 ; write value2 to the top stack slot template_end swapTemplate ; stack before: ..., value1 → ; stack after: ..., value1, value1 template_start dupTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy the top value into r12 - push_single_slot ; increase the stack by 1 slot - _64bit_slot_stack_from_rXX r12,0 ; push the duplicate on top of the stack + _64bit_slot_stack_to_rXX r12,0 ; copy the top value into r12 + push_single_slot ; increase the stack by 1 slot + _64bit_slot_stack_from_rXX r12,0 ; push the duplicate on top of the stack template_end dupTemplate ; stack before: ..., value2, value1 → ; stack after: ..., value1, value2, value1 template_start dupx1Template - _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 - _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 - _64bit_slot_stack_from_rXX r12,8 ; store value1 in third stack slot - _64bit_slot_stack_from_rXX r11,0 ; store value2 in second stack slot - push_single_slot ; increase stack by 1 slot - _64bit_slot_stack_from_rXX r12,0 ; store value1 on top stack slot + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_from_rXX r12,8 ; store value1 in third stack slot + _64bit_slot_stack_from_rXX r11,0 ; store value2 in second stack slot + push_single_slot ; increase stack by 1 slot + _64bit_slot_stack_from_rXX r12,0 ; store value1 on top stack slot template_end dupx1Template ; stack before: ..., value3, value2, value1 → ; stack after: ..., value1, value3, value2, value1 template_start dupx2Template - _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 - _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 - _64bit_slot_stack_to_rXX r15,16 ; copy value3 into r15 - _64bit_slot_stack_from_rXX r12,16 ; store value1 in fourth stack slot - _64bit_slot_stack_from_rXX r15,8 ; store value3 in third stack slot - _64bit_slot_stack_from_rXX r11,0 ; store value2 in second stack slot - push_single_slot ; increase stack 
by 1 slot - _64bit_slot_stack_from_rXX r12,0 ; store value1 in top stack slot + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_to_rXX r15,16 ; copy value3 into r15 + _64bit_slot_stack_from_rXX r12,16 ; store value1 in fourth stack slot + _64bit_slot_stack_from_rXX r15,8 ; store value3 in third stack slot + _64bit_slot_stack_from_rXX r11,0 ; store value2 in second stack slot + push_single_slot ; increase stack by 1 slot + _64bit_slot_stack_from_rXX r12,0 ; store value1 in top stack slot template_end dupx2Template ; stack before: ..., value2, value1 → ; stack after: ..., value2, value1, value2, value1 template_start dup2Template - _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 - _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 - push_dual_slot ; increase stack by 2 slots - _64bit_slot_stack_from_rXX r11,8 ; store value2 in second stack slot - _64bit_slot_stack_from_rXX r12,0 ; store value1 in top stack slot + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + push_dual_slot ; increase stack by 2 slots + _64bit_slot_stack_from_rXX r11,8 ; store value2 in second stack slot + _64bit_slot_stack_from_rXX r12,0 ; store value1 in top stack slot template_end dup2Template ; stack before: ..., value3, value2, value1 → ; stack after: ..., value2, value1, value3, value2, value1 template_start dup2x1Template - _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 - _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 - _64bit_slot_stack_to_rXX r15,16 ; copy value3 into r15 - _64bit_slot_stack_from_rXX r11,16 ; store value2 in fifth stack slot - _64bit_slot_stack_from_rXX r12,8 ; store value1 in fourth stack slot - _64bit_slot_stack_from_rXX r15,0 ; store value3 in third stack slot - push_dual_slot ; increase stack by 2 slots - _64bit_slot_stack_from_rXX r11,8 ; store value2 in second stack slot - _64bit_slot_stack_from_rXX r12,0 ; store value in top stack slot + _64bit_slot_stack_to_rXX r12,0 ; copy value1 into r12 + _64bit_slot_stack_to_rXX r11,8 ; copy value2 into r11 + _64bit_slot_stack_to_rXX r15,16 ; copy value3 into r15 + _64bit_slot_stack_from_rXX r11,16 ; store value2 in fifth stack slot + _64bit_slot_stack_from_rXX r12,8 ; store value1 in fourth stack slot + _64bit_slot_stack_from_rXX r15,0 ; store value3 in third stack slot + push_dual_slot ; increase stack by 2 slots + _64bit_slot_stack_from_rXX r11,8 ; store value2 in second stack slot + _64bit_slot_stack_from_rXX r12,0 ; store value in top stack slot template_end dup2x1Template ; stack before: ..., value4, value3, value2, value1 → ; stack after: ..., value2, value1, value4, value3, value2, value1 template_start dup2x2Template - _64bit_slot_stack_to_rXX r11,0 ; mov value1 into r11 - _64bit_slot_stack_to_rXX r12,8 ; mov value2 into r12 - _64bit_slot_stack_to_rXX r15,16 ; mov value3 into r15 - _64bit_slot_stack_from_rXX r11,16 ; mov value1 into old value3 slot - _64bit_slot_stack_from_rXX r15,0 ; mov value3 into old value1 slot - _64bit_slot_stack_to_rXX r15,24 ; mov value4 into r15 - _64bit_slot_stack_from_rXX r15,8 ; mov value4 into old value2 slot - _64bit_slot_stack_from_rXX r12,24 ; mov value2 into old value4 slot - push_single_slot ; increase the stack - _64bit_slot_stack_from_rXX r12,0 ; mov value2 into second slot - push_single_slot ; increase the stack - _64bit_slot_stack_from_rXX r11,0 ; mov value1 into top stack slot + _64bit_slot_stack_to_rXX r11,0 ; mov value1 into r11 + 
_64bit_slot_stack_to_rXX r12,8 ; mov value2 into r12 + _64bit_slot_stack_to_rXX r15,16 ; mov value3 into r15 + _64bit_slot_stack_from_rXX r11,16 ; mov value1 into old value3 slot + _64bit_slot_stack_from_rXX r15,0 ; mov value3 into old value1 slot + _64bit_slot_stack_to_rXX r15,24 ; mov value4 into r15 + _64bit_slot_stack_from_rXX r15,8 ; mov value4 into old value2 slot + _64bit_slot_stack_from_rXX r12,24 ; mov value2 into old value4 slot + push_single_slot ; increase the stack + _64bit_slot_stack_from_rXX r12,0 ; mov value2 into second slot + push_single_slot ; increase the stack + _64bit_slot_stack_from_rXX r11,0 ; mov value1 into top stack slot template_end dup2x2Template template_start getFieldTemplatePrologue - _64bit_slot_stack_to_rXX r11,0 ; get the objectref from the stack - lea r13, [r11 + 0xefbeadde] ; load the effective address of the object field + _64bit_slot_stack_to_rXX r11,0 ; get the objectref from the stack + lea r13, [r11 + 0xefbeadde] ; load the effective address of the object field template_end getFieldTemplatePrologue template_start intGetFieldTemplate - mov r12d, dword [r13] ; load only 32-bits - _32bit_slot_stack_from_rXX r12,0 ; put it on the stack + mov r12d, dword [r13] ; load only 32-bits + _32bit_slot_stack_from_rXX r12,0 ; put it on the stack template_end intGetFieldTemplate template_start addrGetFieldTemplatePrologue - mov r12d, dword [r13] ; load only 32-bits + mov r12d, dword [r13] ; load only 32-bits template_end addrGetFieldTemplatePrologue template_start decompressReferenceTemplate - shl r12, 0xff ; Reference from field is compressed, decompress by shift amount + shl r12, 0xff ; Reference from field is compressed, decompress by shift amount template_end decompressReferenceTemplate template_start decompressReference1Template - shl r12, 0x01 ; Reference from field is compressed, decompress by shift amount + shl r12, 0x01 ; Reference from field is compressed, decompress by shift amount template_end decompressReference1Template template_start addrGetFieldTemplate - _64bit_slot_stack_from_rXX r12,0 ; put it on the stack + _64bit_slot_stack_from_rXX r12,0 ; put it on the stack template_end addrGetFieldTemplate template_start longGetFieldTemplate - push_single_slot ; increase the stack pointer to make room for extra slot - mov r12, [r13] ; load the value - _64bit_slot_stack_from_rXX r12,0 ; put it on the stack + push_single_slot ; increase the stack pointer to make room for extra slot + mov r12, [r13] ; load the value + _64bit_slot_stack_from_rXX r12,0 ; put it on the stack template_end longGetFieldTemplate template_start floatGetFieldTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - movss xmm0, dword [r13] ; load the value - movss [r10], xmm0 ; put it on the stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + movss xmm0, dword [r13] ; load the value + movss [r10], xmm0 ; put it on the stack template_end floatGetFieldTemplate template_start doubleGetFieldTemplate - push_single_slot ; increase the stack pointer to make room for extra slot - movsd xmm0, [r13] ; load the value - movsd [r10], xmm0 ; put it on the stack + push_single_slot ; increase the stack pointer to make room for extra slot + movsd xmm0, [r13] ; load the value + movsd [r10], xmm0 ; put it on the stack template_end doubleGetFieldTemplate template_start intPutFieldTemplatePrologue - _32bit_slot_stack_to_rXX r11,0 ; copy value into r11 - pop_single_slot ; reduce stack pointer - _64bit_slot_stack_to_rXX r12,0 ; get the object reference - pop_single_slot ; reduce the stack pointer - lea 
r13, [r12 + 0xefbeadde] ; load the effective address of the object field + _32bit_slot_stack_to_rXX r11,0 ; copy value into r11 + pop_single_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field template_end intPutFieldTemplatePrologue template_start addrPutFieldTemplatePrologue - _64bit_slot_stack_to_rXX r11,0 ; copy value into r11 - pop_single_slot ; reduce stack pointer - _64bit_slot_stack_to_rXX r12,0 ; get the object reference - pop_single_slot ; reduce the stack pointer - lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field + _64bit_slot_stack_to_rXX r11,0 ; copy value into r11 + pop_single_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field template_end addrPutFieldTemplatePrologue template_start longPutFieldTemplatePrologue - _64bit_slot_stack_to_rXX r11,0 ; copy value into r11 - pop_dual_slot ; reduce stack pointer - _64bit_slot_stack_to_rXX r12,0 ; get the object reference - pop_single_slot ; reduce the stack pointer - lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field + _64bit_slot_stack_to_rXX r11,0 ; copy value into r11 + pop_dual_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field template_end longPutFieldTemplatePrologue template_start floatPutFieldTemplatePrologue - movss xmm0, [r10] ; copy value into r11 - pop_single_slot ; reduce stack pointer - _64bit_slot_stack_to_rXX r12,0 ; get the object reference - pop_single_slot ; reduce the stack pointer - lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field + movss xmm0, [r10] ; copy value into r11 + pop_single_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field template_end floatPutFieldTemplatePrologue template_start doublePutFieldTemplatePrologue - movsd xmm0, [r10] ; copy value into r11 - pop_dual_slot ; reduce stack pointer - _64bit_slot_stack_to_rXX r12,0 ; get the object reference - pop_single_slot ; reduce the stack pointer - lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field + movsd xmm0, [r10] ; copy value into r11 + pop_dual_slot ; reduce stack pointer + _64bit_slot_stack_to_rXX r12,0 ; get the object reference + pop_single_slot ; reduce the stack pointer + lea r13, [r12 + 0xefbeadde] ; load the effective address of the object field template_end doublePutFieldTemplatePrologue template_start intPutFieldTemplate - mov [r13], r11d ; Move value to memory + mov [r13], r11d ; Move value to memory template_end intPutFieldTemplate template_start compressReferenceTemplate - shr r11, 0xff + shr r11, 0xff template_end compressReferenceTemplate template_start compressReference1Template - shr r11, 0x01 + shr r11, 0x01 template_end compressReference1Template template_start longPutFieldTemplate - mov [r13], r11 ; Move value to memory + mov [r13], r11 ; Move value to memory template_end longPutFieldTemplate template_start floatPutFieldTemplate - movss [r13], xmm0 ; Move value to memory + movss [r13], xmm0 ; 
Move value to memory template_end floatPutFieldTemplate template_start doublePutFieldTemplate - movsd [r13], xmm0 ; Move value to memory + movsd [r13], xmm0 ; Move value to memory template_end doublePutFieldTemplate template_start staticTemplatePrologue - lea r13, [0xefbeadde] ; load the absolute address of a static value + lea r13, [0xefbeadde] ; load the absolute address of a static value template_end staticTemplatePrologue template_start intGetStaticTemplate - push_single_slot ; allocate a stack slot - mov r11d, dword [r13] ; load 32-bit value into r11 - _32bit_slot_stack_from_rXX r11,0 ; move it onto the stack + push_single_slot ; allocate a stack slot + mov r11d, dword [r13] ; load 32-bit value into r11 + _32bit_slot_stack_from_rXX r11,0 ; move it onto the stack template_end intGetStaticTemplate template_start addrGetStaticTemplatePrologue - push_single_slot ; allocate a stack slot - mov r11, qword [r13] ; load 32-bit value into r11 + push_single_slot ; allocate a stack slot + mov r11, qword [r13] ; load 32-bit value into r11 template_end addrGetStaticTemplatePrologue template_start addrGetStaticTemplate - _64bit_slot_stack_from_rXX r11,0 ; move it onto the stack + _64bit_slot_stack_from_rXX r11,0 ; move it onto the stack template_end addrGetStaticTemplate template_start longGetStaticTemplate - push_dual_slot ; allocate a stack slot - mov r11, [r13] ; load 64-bit value into r11 - _64bit_slot_stack_from_rXX r11,0 ; move it onto the stack + push_dual_slot ; allocate a stack slot + mov r11, [r13] ; load 64-bit value into r11 + _64bit_slot_stack_from_rXX r11,0 ; move it onto the stack template_end longGetStaticTemplate template_start floatGetStaticTemplate - push_single_slot ; allocate a stack slot - movss xmm0, [r13] ; load value into r11 - movss [r10], xmm0 ; move it onto the stack + push_single_slot ; allocate a stack slot + movss xmm0, [r13] ; load value into r11 + movss [r10], xmm0 ; move it onto the stack template_end floatGetStaticTemplate template_start doubleGetStaticTemplate - push_dual_slot ; allocate a stack slot - movsd xmm0, [r13] ; load value into r11 - movsd [r10], xmm0 ; move it onto the stack + push_dual_slot ; allocate a stack slot + movsd xmm0, [r13] ; load value into r11 + movsd [r10], xmm0 ; move it onto the stack template_end doubleGetStaticTemplate template_start intPutStaticTemplate - _32bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 - pop_single_slot ; Pop the value off the stack - mov dword [r13], r11d ; Move it to memory + _32bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 + pop_single_slot ; Pop the value off the stack + mov dword [r13], r11d ; Move it to memory template_end intPutStaticTemplate template_start addrPutStaticTemplatePrologue - _64bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 - pop_single_slot ; Pop the value off the stack + _64bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 + pop_single_slot ; Pop the value off the stack template_end addrPutStaticTemplatePrologue template_start addrPutStaticTemplate - mov qword [r13], r11 ; Move it to memory + mov qword [r13], r11 ; Move it to memory template_end addrPutStaticTemplate template_start longPutStaticTemplate - _64bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 - pop_dual_slot ; Pop the value off the stack - mov [r13], r11 ; Move it to memory + _64bit_slot_stack_to_rXX r11,0 ; Move stack value into r11 + pop_dual_slot ; Pop the value off the stack + mov [r13], r11 ; Move it to memory template_end longPutStaticTemplate template_start floatPutStaticTemplate - 
movss xmm0, [r10] ; Move stack value into r11 - pop_single_slot ; Pop the value off the stack - movss [r13], xmm0 ; Move it to memory + movss xmm0, [r10] ; Move stack value into r11 + pop_single_slot ; Pop the value off the stack + movss [r13], xmm0 ; Move it to memory template_end floatPutStaticTemplate template_start doublePutStaticTemplate - movsd xmm0, [r10] ; Move stack value into r11 - pop_dual_slot ; Pop the value off the stack - movsd [r13], xmm0 ; Move it to memory + movsd xmm0, [r10] ; Move stack value into r11 + pop_dual_slot ; Pop the value off the stack + movsd [r13], xmm0 ; Move it to memory template_end doublePutStaticTemplate template_start iAddTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; which means reducing the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r12,0 ; copy second value to the value register - add r11d, r12d ; add the value to the accumulator - _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; which means reducing the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + add r11d, r12d ; add the value to the accumulator + _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg template_end iAddTemplate template_start iSubTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; copy second value to the accumulator register - sub r11d, r12d ; subtract the value from the accumulator - _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; copy second value to the accumulator register + sub r11d, r12d ; subtract the value from the accumulator + _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg template_end iSubTemplate template_start iMulTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; which means reducing the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r12,0 ; copy second value to the value register - imul r11d, r12d ; multiply accumulator by the value to the accumulator - _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; which means reducing the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + imul r11d, r12d ; multiply accumulator by the value to the accumulator + _32bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg template_end iMulTemplate template_start iDivTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_eXX ax,0 ; copy second value (dividend) to eax - cdq ; extend sign bit from eax to edx - idiv r12d ; divide edx:eax by lower 32 bits of r12 value - _32bit_slot_stack_from_eXX ax,0 ; store the quotient in accumulator + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register 
(divisor) + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value (dividend) to eax + cdq ; extend sign bit from eax to edx + idiv r12d ; divide edx:eax by lower 32 bits of r12 value + _32bit_slot_stack_from_eXX ax,0 ; store the quotient in accumulator template_end iDivTemplate template_start iRemTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_eXX ax,0 ; copy second value (dividend) to eax - cdq ; extend sign bit from eax to edx - idiv r12d ; divide edx:eax by lower 32 bits of r12 value - _32bit_slot_stack_from_eXX dx,0 ; store the remainder in accumulator + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value (dividend) to eax + cdq ; extend sign bit from eax to edx + idiv r12d ; divide edx:eax by lower 32 bits of r12 value + _32bit_slot_stack_from_eXX dx,0 ; store the remainder in accumulator template_end iRemTemplate template_start iNegTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy the top value from the stack into r12 - neg r12d ; two's complement negation of r12 - _32bit_slot_stack_from_rXX r12,0 ; store the result back on the stack + _32bit_slot_stack_to_rXX r12,0 ; copy the top value from the stack into r12 + neg r12d ; two's complement negation of r12 + _32bit_slot_stack_from_rXX r12,0 ; store the result back on the stack template_end iNegTemplate template_start iShlTemplate - mov cl, [r10] ; copy top value of stack in the cl register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax - sal eax, cl ; arithmetic left shift eax by cl bits - _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax + sal eax, cl ; arithmetic left shift eax by cl bits + _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack template_end iShlTemplate template_start iShrTemplate - mov cl, [r10] ; copy top value of stack in the cl register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax - sar eax, cl ; arithmetic right shift eax by cl bits - _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax + sar eax, cl ; arithmetic right shift eax by cl bits + _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack template_end iShrTemplate template_start iUshrTemplate - mov cl, [r10] ; copy top value of stack in the cl register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax - shr eax, cl ; logical right shift eax by cl bits - _32bit_slot_stack_from_eXX ax,0 ; store the result back on the stack + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_eXX ax,0 ; copy second value to eax + shr eax, cl ; logical right shift eax by cl bits + _32bit_slot_stack_from_eXX 
ax,0 ; store the result back on the stack template_end iUshrTemplate template_start iAndTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) - pop_single_slot ; reduce the stack size by 1 slot - _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) - and r11d, r12d ; perform the bitwise and operation - _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_single_slot ; reduce the stack size by 1 slot + _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + and r11d, r12d ; perform the bitwise and operation + _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack template_end iAndTemplate template_start iOrTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) - pop_single_slot ; reduce the stack size by 1 slot - _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) - or r11d, r12d ; perform the bitwise or operation - _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_single_slot ; reduce the stack size by 1 slot + _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + or r11d, r12d ; perform the bitwise or operation + _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack template_end iOrTemplate template_start iXorTemplate - _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) - pop_single_slot ; reduce the stack size by 1 slot - _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) - xor r11d, r12d ; perform the bitwise xor operation - _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack + _32bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_single_slot ; reduce the stack size by 1 slot + _32bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + xor r11d, r12d ; perform the bitwise xor operation + _32bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack template_end iXorTemplate template_start i2lTemplate - movsxd r12, dword [r10] ; takes the bottom 32 from the stack bits and sign extends - push_single_slot ; add slot for long (1 slot -> 2 slot) - _64bit_slot_stack_from_rXX r12,0 ; push the result back on the stack + movsxd r12, dword [r10] ; takes the bottom 32 from the stack bits and sign extends + push_single_slot ; add slot for long (1 slot -> 2 slot) + _64bit_slot_stack_from_rXX r12,0 ; push the result back on the stack template_end i2lTemplate template_start l2iTemplate - movsxd r12, [r10] ; takes the bottom 32 from the stack bits and sign extends - pop_single_slot ; remove extra slot for long (2 slot -> 1 slot) - _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack + movsxd r12, [r10] ; takes the bottom 32 from the stack bits and sign extends + pop_single_slot ; remove extra slot for long (2 slot -> 1 slot) + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack template_end l2iTemplate template_start i2bTemplate - movsx r12, byte [r10] ; takes the top byte from the stack and sign extends - _32bit_slot_stack_from_rXX r12,0 ; push the result back on the 
stack + movsx r12, byte [r10] ; takes the top byte from the stack and sign extends + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack template_end i2bTemplate template_start i2sTemplate - movsx r12, word [r10] ; takes the top word from the stack and sign extends - _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack + movsx r12, word [r10] ; takes the top word from the stack and sign extends + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack template_end i2sTemplate template_start i2cTemplate - movzx r12, word [r10] ; takes the top word from the stack and zero extends - _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack + movzx r12, word [r10] ; takes the top word from the stack and zero extends + _32bit_slot_stack_from_rXX r12,0 ; push the result back on the stack template_end i2cTemplate template_start i2dTemplate - cvtsi2sd xmm0, dword [r10] ; convert signed 32 bit integer to double-presision fp - push_single_slot ; make room for the double - movsd [r10], xmm0 ; push on top of stack + cvtsi2sd xmm0, dword [r10] ; convert signed 32 bit integer to double-presision fp + push_single_slot ; make room for the double + movsd [r10], xmm0 ; push on top of stack template_end i2dTemplate template_start l2dTemplate - cvtsi2sd xmm0, qword [r10] ; convert signed 64 bit integer to double-presision fp - movsd [r10], xmm0 ; push on top of stack + cvtsi2sd xmm0, qword [r10] ; convert signed 64 bit integer to double-presision fp + movsd [r10], xmm0 ; push on top of stack template_end l2dTemplate template_start d2iTemplate - movsd xmm0, [r10] ; load value from top of stack - mov r11, 2147483647 ; store max int in r11 - mov r12, -2147483648 ; store min int in r12 - mov rcx, 0 ; store 0 in rcx - cvtsi2sd xmm1, r11d ; convert max int to double - cvtsi2sd xmm2, r12d ; convert min int to double - cvttsd2si rax, xmm0 ; convert double to int with truncation - ucomisd xmm0, xmm1 ; compare value with max int - cmovae rax, r11 ; if value is above or equal to max int, store max int in rax - ucomisd xmm0, xmm2 ; compare value with min int - cmovb rax, r12 ; if value is below min int, store min int in rax - cmovp rax, rcx ; if value is NaN, store 0 in rax - pop_single_slot ; remove extra slot for double (2 slot -> 1 slot) - _64bit_slot_stack_from_rXX rax,0 ; store result back on stack + movsd xmm0, [r10] ; load value from top of stack + mov r11, 2147483647 ; store max int in r11 + mov r12, -2147483648 ; store min int in r12 + mov rcx, 0 ; store 0 in rcx + cvtsi2sd xmm1, r11d ; convert max int to double + cvtsi2sd xmm2, r12d ; convert min int to double + cvttsd2si rax, xmm0 ; convert double to int with truncation + ucomisd xmm0, xmm1 ; compare value with max int + cmovae rax, r11 ; if value is above or equal to max int, store max int in rax + ucomisd xmm0, xmm2 ; compare value with min int + cmovb rax, r12 ; if value is below min int, store min int in rax + cmovp rax, rcx ; if value is NaN, store 0 in rax + pop_single_slot ; remove extra slot for double (2 slot -> 1 slot) + _64bit_slot_stack_from_rXX rax,0 ; store result back on stack template_end d2iTemplate template_start d2lTemplate - movsd xmm0, [r10] ; load value from top of stack - mov r11, 9223372036854775807 ; store max long in r11 - mov r12, -9223372036854775808 ; store min long in r12 - mov rcx, 0 ; store 0 in rcx - cvtsi2sd xmm1, r11 ; convert max long to double - cvtsi2sd xmm2, r12 ; convert min long to double - cvttsd2si rax, xmm0 ; convert double to long with truncation - ucomisd 
xmm0, xmm1 ; compare value with max int - cmovae rax, r11 ; if value is above or equal to max long, store max int in rax - ucomisd xmm0, xmm2 ; compare value with min int - cmovb rax, r12 ; if value is below min int, store min int in rax - cmovp rax, rcx ; if value is NaN, store 0 in rax - _64bit_slot_stack_from_rXX rax,0 ; store back on stack + movsd xmm0, [r10] ; load value from top of stack + mov r11, 9223372036854775807 ; store max long in r11 + mov r12, -9223372036854775808 ; store min long in r12 + mov rcx, 0 ; store 0 in rcx + cvtsi2sd xmm1, r11 ; convert max long to double + cvtsi2sd xmm2, r12 ; convert min long to double + cvttsd2si rax, xmm0 ; convert double to long with truncation + ucomisd xmm0, xmm1 ; compare value with max int + cmovae rax, r11 ; if value is above or equal to max long, store max int in rax + ucomisd xmm0, xmm2 ; compare value with min int + cmovb rax, r12 ; if value is below min int, store min int in rax + cmovp rax, rcx ; if value is NaN, store 0 in rax + _64bit_slot_stack_from_rXX rax,0 ; store back on stack template_end d2lTemplate template_start iconstm1Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], -1 ; push -1 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], -1 ; push -1 to the java stack template_end iconstm1Template template_start iconst0Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], 0 ; push 0 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 0 ; push 0 to the java stack template_end iconst0Template template_start iconst1Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], 1 ; push 1 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 1 ; push 1 to the java stack template_end iconst1Template template_start iconst2Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], 2 ; push 2 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 2 ; push 2 to the java stack template_end iconst2Template template_start iconst3Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], 3 ; push 3 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 3 ; push 3 to the java stack template_end iconst3Template template_start iconst4Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], 4 ; push 4 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 4 ; push 4 to the java stack template_end iconst4Template template_start iconst5Template - push_single_slot ; increase the java stack size by 1 slot (8 bytes) - mov qword [r10], 5 ; push 5 to the java stack + push_single_slot ; increase the java stack size by 1 slot (8 bytes) + mov qword [r10], 5 ; push 5 to the java stack template_end iconst5Template template_start bipushTemplate - push_single_slot ; increase the stack by 1 slot - mov qword [r10], 0xFF ; push immediate value to stack + push_single_slot ; increase the stack by 1 slot + mov qword [r10], 0xFF ; push immediate value to stack template_end bipushTemplate template_start sipushTemplatePrologue - mov r11w, 0xFF ; put immediate value in accumulator for sign extension + mov r11w, 
0xFF ; put immediate value in accumulator for sign extension template_end sipushTemplatePrologue template_start sipushTemplate - movsx r11, r11w ; sign extend the contents of r11w through the rest of r11 - push_single_slot ; increase the stack by 1 slot - _32bit_slot_stack_from_rXX r11,0 ; push immediate value to stack + movsx r11, r11w ; sign extend the contents of r11w through the rest of r11 + push_single_slot ; increase the stack by 1 slot + _32bit_slot_stack_from_rXX r11,0 ; push immediate value to stack template_end sipushTemplate template_start iIncTemplate_01_load - _32bit_local_to_rXX_PATCH r11 + _32bit_local_to_rXX_PATCH r11 template_end iIncTemplate_01_load template_start iIncTemplate_02_add - add r11, byte 0xFF + add r11, byte 0xFF template_end iIncTemplate_02_add template_start iIncTemplate_03_store - _32bit_local_from_rXX_PATCH r11 + _32bit_local_from_rXX_PATCH r11 template_end iIncTemplate_03_store template_start lAddTemplate - _64bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r12,0 ; copy second value to the value register - add r11, r12 ; add the value to the accumulator - _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg + _64bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + add r11, r12 ; add the value to the accumulator + _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg template_end lAddTemplate template_start lSubTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r11,0 ; copy second value to the accumulator register - sub r11, r12 ; subtract the value from the accumulator - _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy second value to the accumulator register + sub r11, r12 ; subtract the value from the accumulator + _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg template_end lSubTemplate template_start lMulTemplate - _64bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r12,0 ; copy second value to the value register - imul r11, r12 ; multiply accumulator by the value to the accumulator - _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg + _64bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r12,0 ; copy second value to the value register + imul r11, r12 ; multiply accumulator by the value to the accumulator + _64bit_slot_stack_from_rXX r11,0 ; write the accumulator over the second arg template_end lMulTemplate template_start lDivTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX rax,0 ; copy second value (dividend) to rax - cqo ; extend sign bit from rax to rdx 
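; Note (illustrative only, not part of the patched templates): the lDivTemplate/lRemTemplate
; pair here relies on the x86-64 convention that after `cqo` / `idiv r12` the quotient is left
; in rax and the remainder in rdx, which is why the divide template stores rax while the
; remainder template stores rdx; the 32-bit iDivTemplate/iRemTemplate pair mirrors this with
; cdq, eax and edx. idiv truncates toward zero, matching Java's division and remainder semantics.
; Worked example, with rax = -7 (dividend) and r12 = 2 (divisor):
;     cqo            ; rdx:rax = -7, sign-extended
;     idiv r12       ; rax = -3 (quotient), rdx = -1 (remainder)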
- idiv r12 ; divide rdx:rax by r12 value - _64bit_slot_stack_from_rXX rax,0 ; store the quotient in accumulator + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value (dividend) to rax + cqo ; extend sign bit from rax to rdx + idiv r12 ; divide rdx:rax by r12 value + _64bit_slot_stack_from_rXX rax,0 ; store the quotient in accumulator template_end lDivTemplate template_start lRemTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX rax,0 ; copy second value (dividend) to rax - cqo ; extend sign bit from rax to rdx - idiv r12 ; divide rdx:rax by r12 value - _64bit_slot_stack_from_rXX rdx,0 ; store the remainder in accumulator + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the value register (divisor) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value (dividend) to rax + cqo ; extend sign bit from rax to rdx + idiv r12 ; divide rdx:rax by r12 value + _64bit_slot_stack_from_rXX rdx,0 ; store the remainder in accumulator template_end lRemTemplate template_start lNegTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy the top value from the stack into r12 - neg r12 ; two's complement negation of r12 - _64bit_slot_stack_from_rXX r12,0 ; store the result back on the stack + _64bit_slot_stack_to_rXX r12,0 ; copy the top value from the stack into r12 + neg r12 ; two's complement negation of r12 + _64bit_slot_stack_from_rXX r12,0 ; store the result back on the stack template_end lNegTemplate template_start lShlTemplate - mov cl, [r10] ; copy top value of stack in the cl register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax - sal rax, cl ; arithmetic left shift rax by cl bits - _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax + sal rax, cl ; arithmetic left shift rax by cl bits + _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack template_end lShlTemplate template_start lShrTemplate - mov cl, [r10] ; copy top value of stack in the cl register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax - sar rax, cl ; arithmetic right shift rax by cl bits - _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax + sar rax, cl ; arithmetic right shift rax by cl bits + _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack template_end lShrTemplate template_start lUshrTemplate - mov cl, [r10] ; copy top value of stack in the cl register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax - shr rax, cl ; logical right shift rax by cl bits - _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack + mov cl, [r10] ; copy top value of stack in the cl register + pop_single_slot ; reduce the stack size by 1 
slot (8 bytes) + _64bit_slot_stack_to_rXX rax,0 ; copy second value to rax + shr rax, cl ; logical right shift rax by cl bits + _64bit_slot_stack_from_rXX rax,0 ; store the result back on the stack template_end lUshrTemplate template_start lAndTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) - and r11, r12 ; perform the bitwise and operation - _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + and r11, r12 ; perform the bitwise and operation + _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack template_end lAndTemplate template_start lOrTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) - or r11 , r12 ; perform the bitwise or operation - _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + or r11 , r12 ; perform the bitwise or operation + _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack template_end lOrTemplate template_start lXorTemplate - _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) - xor r11 , r12 ; perform the bitwise xor operation - _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack + _64bit_slot_stack_to_rXX r12,0 ; copy top value of stack in the accumulator (first operand) + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r11,0 ; copy the second value into a register (second operand) + xor r11 , r12 ; perform the bitwise xor operation + _64bit_slot_stack_from_rXX r11,0 ; store the result back on the java stack template_end lXorTemplate template_start lconst0Template - push_dual_slot ; increase the stack size by 2 slots (16 bytes) - mov qword [r10], 0 ; push 0 to the java stack + push_dual_slot ; increase the stack size by 2 slots (16 bytes) + mov qword [r10], 0 ; push 0 to the java stack template_end lconst0Template template_start lconst1Template - push_dual_slot ; increase the stack size by 2 slots (16 bytes) - mov qword [r10], 1 ; push 1 to the java stack + push_dual_slot ; increase the stack size by 2 slots (16 bytes) + mov qword [r10], 1 ; push 1 to the java stack template_end lconst1Template template_start fAddTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm1, [r10] ; copy top value of stack in xmm1 - pop_single_slot ; reduce the stack size by 1 slot - movss xmm0, [r10] ; copy the second value in xmm0 - addss xmm0, xmm1 ; add the values in xmm registers and store in xmm0 
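; Note (illustrative only, not part of this patch): the two-operand float templates in this
; file all share one shape -- clear the XMM registers, load the top-of-stack value into xmm1,
; pop one slot, load the second value into xmm0, apply the SSE instruction, then write xmm0
; back over the remaining slot. A hypothetical further template reusing that shape might look
; like the following; the name fMinTemplate and the use of minss are assumptions made purely
; for illustration:
;
;     template_start fMinTemplate
;         psrldq xmm0, 8                    ; clear xmm0 by shifting right
;         psrldq xmm1, 8                    ; clear xmm1 by shifting right
;         movss xmm1, [r10]                 ; copy top value of stack in xmm1
;         pop_single_slot                   ; reduce the stack size by 1 slot
;         movss xmm0, [r10]                 ; copy the second value in xmm0
;         minss xmm0, xmm1                  ; keep the smaller of the two values
;         movss [r10], xmm0                 ; store the result in xmm0 on the java stack
;     template_end fMinTemplate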
- movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + addss xmm0, xmm1 ; add the values in xmm registers and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end fAddTemplate template_start fSubTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm1, [r10] ; copy top value of stack in xmm1 - pop_single_slot ; reduce the stack size by 1 slot - movss xmm0, [r10] ; copy the second value in xmm0 - subss xmm0, xmm1 ; subtract the value of xmm1 from xmm0 and store in xmm0 - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + subss xmm0, xmm1 ; subtract the value of xmm1 from xmm0 and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end fSubTemplate template_start fMulTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm1, [r10] ; copy top value of stack in xmm1 - pop_single_slot ; reduce the stack size by 1 slot - movss xmm0, [r10] ; copy the second value in xmm0 - mulss xmm0, xmm1 ; multiply the values in xmm registers and store in xmm0 - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + mulss xmm0, xmm1 ; multiply the values in xmm registers and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end fMulTemplate template_start fDivTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm1, [r10] ; copy top value of stack in xmm1 - pop_single_slot ; reduce the stack size by 1 slot - movss xmm0, [r10] ; copy the second value in xmm0 - divss xmm0, xmm1 ; divide the value of xmm0 by xmm1 and store in xmm0 - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm1, [r10] ; copy top value of stack in xmm1 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm0, [r10] ; copy the second value in xmm0 + divss xmm0, xmm1 ; divide the value of xmm0 by xmm1 and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end fDivTemplate template_start fRemTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm0, [r10] ; copy top value of stack in xmm0 - pop_single_slot ; reduce the stack size by 1 slot - movss xmm1, [r10] ; copy the second value in xmm1 - call 0xefbeadde ; for function call + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + pop_single_slot ; reduce the stack size by 1 slot + movss xmm1, [r10] ; copy 
the second value in xmm1 + call 0xefbeadde ; for function call template_end fRemTemplate template_start fNegTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movss xmm0, [r10] ; copy top value of stack in xmm0 - mov eax, 0x80000000 ; loading mask immediate value to xmm1 through eax - movd xmm1, eax ; as it can't be directly loaded to xmm registers - xorps xmm0, xmm1 ; XOR xmm0 and xmm1 mask bits to negate xmm0 - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + mov eax, 0x80000000 ; loading mask immediate value to xmm1 through eax + movd xmm1, eax ; as it can't be directly loaded to xmm registers + xorps xmm0, xmm1 ; XOR xmm0 and xmm1 mask bits to negate xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end fNegTemplate template_start fconst0Template - push_single_slot ; increase the java stack size by 1 slot - fldz ; push 0 onto the fpu register stack - fbstp [r10] ; stores st0 on the java stack + push_single_slot ; increase the java stack size by 1 slot + fldz ; push 0 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack template_end fconst0Template template_start fconst1Template - push_single_slot ; increase the java stack size by 1 slot - fld1 ; push 1 onto the fpu register stack - fbstp [r10] ; stores st0 on the java stack + push_single_slot ; increase the java stack size by 1 slot + fld1 ; push 1 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack template_end fconst1Template template_start fconst2Template - push_single_slot ; increase the java stack size by 1 slot - fld1 ; push 1 on the fpu register stack - fld1 ; push another 1 on the fpu register stack - fadd st0, st1 ; add the top two values together and store them back on the top of the stack - fbstp [r10] ; stores st0 on the java stack + push_single_slot ; increase the java stack size by 1 slot + fld1 ; push 1 on the fpu register stack + fld1 ; push another 1 on the fpu register stack + fadd st0, st1 ; add the top two values together and store them back on the top of the stack + fbstp [r10] ; stores st0 on the java stack template_end fconst2Template template_start dconst0Template - push_dual_slot ; increase the java stack size by 2 slots - fldz ; push 0 onto the fpu register stack - fbstp [r10] ; stores st0 on the java stack + push_dual_slot ; increase the java stack size by 2 slots + fldz ; push 0 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack template_end dconst0Template template_start dconst1Template - push_dual_slot ; increase the java stack size by 2 slots - fld1 ; push 1 onto the fpu register stack - fbstp [r10] ; stores st0 on the java stack + push_dual_slot ; increase the java stack size by 2 slots + fld1 ; push 1 onto the fpu register stack + fbstp [r10] ; stores st0 on the java stack template_end dconst1Template template_start dAddTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm1, [r10] ; copy top value of stack in xmm1 - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - movsd xmm0, [r10] ; copy the second value in xmm0 - addsd xmm0, xmm1 ; add the values in xmm registers and store in xmm0 - movsd [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear 
xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + addsd xmm0, xmm1 ; add the values in xmm registers and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack template_end dAddTemplate template_start dSubTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm1, [r10] ; copy top value of stack in xmm1 - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - movsd xmm0, [r10] ; copy the second value in xmm0 - subsd xmm0, xmm1 ; subtract the value of xmm1 from xmm0 and store in xmm0 - movsd [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + subsd xmm0, xmm1 ; subtract the value of xmm1 from xmm0 and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack template_end dSubTemplate template_start dMulTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm1, [r10] ; copy top value of stack in xmm1 - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - movsd xmm0, [r10] ; copy the second value in xmm0 - mulsd xmm0, xmm1 ; multiply the values in xmm registers and store in xmm0 - movsd [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + mulsd xmm0, xmm1 ; multiply the values in xmm registers and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack template_end dMulTemplate template_start dDivTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm1, [r10] ; copy top value of stack in xmm1 - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - movsd xmm0, [r10] ; copy the second value in xmm0 - divsd xmm0, xmm1 ; divide the value of xmm0 by xmm1 and store in xmm0 - movsd [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm1, [r10] ; copy top value of stack in xmm1 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm0, [r10] ; copy the second value in xmm0 + divsd xmm0, xmm1 ; divide the value of xmm0 by xmm1 and store in xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack template_end dDivTemplate template_start dRemTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm0, [r10] ; copy top value of stack in xmm0 - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - movsd xmm1, [r10] ; copy the second value in xmm1 - call 0xefbeadde ; for function call + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm0, [r10] ; copy top value of stack in xmm0 + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + movsd xmm1, [r10] ; copy the second value in xmm1 + call 0xefbeadde ; for 
function call template_end dRemTemplate template_start dNegTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - movsd xmm0, [r10] ; copy top value of stack in xmm0 - mov rax, 0x8000000000000000 ; loading mask immediate value to xmm1 through rax - movq xmm1, rax ; as it can't be directly loaded to xmm registers - xorpd xmm0, xmm1 ; XOR xmm0 and xmm1 mask bits to negate xmm0 - movsd [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + movsd xmm0, [r10] ; copy top value of stack in xmm0 + mov rax, 0x8000000000000000 ; loading mask immediate value to xmm1 through rax + movq xmm1, rax ; as it can't be directly loaded to xmm registers + xorpd xmm0, xmm1 ; XOR xmm0 and xmm1 mask bits to negate xmm0 + movsd [r10], xmm0 ; store the result in xmm0 on the java stack template_end dNegTemplate template_start i2fTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - cvtsi2ss xmm0, [r10] ; convert integer to float and store in xmm0 - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + cvtsi2ss xmm0, [r10] ; convert integer to float and store in xmm0 + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end i2fTemplate template_start f2iTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - psrldq xmm2, 8 ; clear xmm2 by shifting right - movss xmm0, [r10] ; copy top value of stack in xmm0 - mov r11d, 2147483647 ; store max int in r11 - mov r12d, -2147483648 ; store min int in r12 - mov ecx, 0 ; store 0 in rcx - cvtsi2ss xmm1, r11d ; convert max int to float - cvtsi2ss xmm2, r12d ; convert min int to float - cvttss2si eax, xmm0 ; convert float to int with truncation - ucomiss xmm0, xmm1 ; compare value with max int - cmovae eax, r11d ; if value is above or equal to max int, store max int in rax - ucomiss xmm0, xmm2 ; compare value with min int - cmovb eax, r12d ; if value is below min int, store min int in rax - cmovp eax, ecx ; if value is NaN, store 0 in rax - _32bit_slot_stack_from_eXX ax,0 ; store result back on stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + psrldq xmm2, 8 ; clear xmm2 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + mov r11d, 2147483647 ; store max int in r11 + mov r12d, -2147483648 ; store min int in r12 + mov ecx, 0 ; store 0 in rcx + cvtsi2ss xmm1, r11d ; convert max int to float + cvtsi2ss xmm2, r12d ; convert min int to float + cvttss2si eax, xmm0 ; convert float to int with truncation + ucomiss xmm0, xmm1 ; compare value with max int + cmovae eax, r11d ; if value is above or equal to max int, store max int in rax + ucomiss xmm0, xmm2 ; compare value with min int + cmovb eax, r12d ; if value is below min int, store min int in rax + cmovp eax, ecx ; if value is NaN, store 0 in rax + _32bit_slot_stack_from_eXX ax,0 ; store result back on stack template_end f2iTemplate template_start l2fTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - cvtsi2ss xmm0, qword [r10] ; convert long to float and store in xmm0 - pop_single_slot ; remove extra slot (2 slot -> 1 slot) - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + cvtsi2ss xmm0, qword [r10] ; convert long to float and store in xmm0 + pop_single_slot ; remove extra slot (2 slot -> 1 slot) + movss 
[r10], xmm0 ; store the result in xmm0 on the java stack template_end l2fTemplate template_start f2lTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - psrldq xmm1, 8 ; clear xmm1 by shifting right - psrldq xmm2, 8 ; clear xmm2 by shifting right - movss xmm0, [r10] ; copy top value of stack in xmm0 - mov r11, 9223372036854775807 ; store max long in r11 - mov r12, -9223372036854775808 ; store min long in r12 - mov rcx, 0 ; store 0 in rcx - cvtsi2ss xmm1, r11 ; convert max long to float - cvtsi2ss xmm2, r12 ; convert min long to float - cvttss2si rax, xmm0 ; convert float to long with truncation - ucomiss xmm0, xmm1 ; compare value with max long - cmovae rax, r11 ; if value is above or equal to max long, store max long in rax - ucomiss xmm0, xmm2 ; compare value with min int - cmovb rax, r12 ; if value is below min long, store min long in rax - cmovp rax, rcx ; if value is NaN, store 0 in rax - push_single_slot ; add extra slot for long (1 slot -> 2 slot) - _64bit_slot_stack_from_rXX rax,0 ; store result back on stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + psrldq xmm1, 8 ; clear xmm1 by shifting right + psrldq xmm2, 8 ; clear xmm2 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + mov r11, 9223372036854775807 ; store max long in r11 + mov r12, -9223372036854775808 ; store min long in r12 + mov rcx, 0 ; store 0 in rcx + cvtsi2ss xmm1, r11 ; convert max long to float + cvtsi2ss xmm2, r12 ; convert min long to float + cvttss2si rax, xmm0 ; convert float to long with truncation + ucomiss xmm0, xmm1 ; compare value with max long + cmovae rax, r11 ; if value is above or equal to max long, store max long in rax + ucomiss xmm0, xmm2 ; compare value with min int + cmovb rax, r12 ; if value is below min long, store min long in rax + cmovp rax, rcx ; if value is NaN, store 0 in rax + push_single_slot ; add extra slot for long (1 slot -> 2 slot) + _64bit_slot_stack_from_rXX rax,0 ; store result back on stack template_end f2lTemplate template_start d2fTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - movsd xmm0, [r10] ; copy top value of stack in xmm0 - cvtsd2ss xmm0, xmm0 ; convert double to float and store in xmm0 - pop_single_slot ; remove extra slot from double (2 slot -> 1 slot) - movss [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + movsd xmm0, [r10] ; copy top value of stack in xmm0 + cvtsd2ss xmm0, xmm0 ; convert double to float and store in xmm0 + pop_single_slot ; remove extra slot from double (2 slot -> 1 slot) + movss [r10], xmm0 ; store the result in xmm0 on the java stack template_end d2fTemplate template_start f2dTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - movss xmm0, [r10] ; copy top value of stack in xmm0 - cvtss2sd xmm0, xmm0 ; convert float to double and store in xmm0 - push_single_slot ; add slot to stack for double (1 slot -> 2 slot) - movsd [r10], xmm0 ; store the result in xmm0 on the java stack + psrldq xmm0, 8 ; clear xmm0 by shifting right + movss xmm0, [r10] ; copy top value of stack in xmm0 + cvtss2sd xmm0, xmm0 ; convert float to double and store in xmm0 + push_single_slot ; add slot to stack for double (1 slot -> 2 slot) + movsd [r10], xmm0 ; store the result in xmm0 on the java stack template_end f2dTemplate template_start eaxReturnTemplate - xor rax, rax ; clear rax - _32bit_slot_stack_to_eXX ax,0 ; move the stack top into eax register + xor rax, rax ; clear rax + _32bit_slot_stack_to_eXX ax,0 ; move the stack top into eax register template_end 
eaxReturnTemplate template_start raxReturnTemplate - _64bit_slot_stack_to_rXX rax,0 ; move the stack top into rax register + _64bit_slot_stack_to_rXX rax,0 ; move the stack top into rax register template_end raxReturnTemplate template_start xmm0ReturnTemplate - psrldq xmm0, 8 ; clear xmm0 by shifting right - movd xmm0, [r10] ; move the stack top into xmm0 register + psrldq xmm0, 8 ; clear xmm0 by shifting right + movd xmm0, [r10] ; move the stack top into xmm0 register template_end xmm0ReturnTemplate template_start retTemplate_add - add r10, byte 0xFF ; decrease stack size + add r10, byte 0xFF ; decrease stack size template_end retTemplate_add template_start vReturnTemplate - ret ; return from the JITed method + ret ; return from the JITed method template_end vReturnTemplate ; Stack before: ..., value1, value2 -> @@ -1071,18 +1071,18 @@ template_end vReturnTemplate ; if value1 == value 2, result = 0 ; if value1 < value 2, result = -1 template_start lcmpTemplate - _64bit_slot_stack_to_rXX r11,0 ; grab value2 from the stack - pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) - _64bit_slot_stack_to_rXX r12,0 ; grab value1 from the stack - pop_single_slot ; reduce long slot to int slot - xor ecx, ecx ; clear ecx (store 0) - mov dword edx, -1 ; store -1 in edx - mov dword ebx, 1 ; store 1 in ebx - cmp r12, r11 ; compare the two values - cmovg eax, ebx ; mov 1 into eax if ZF = 0 and CF = 0 - cmovl eax, edx ; mov -1 into eax if CF = 1 - cmovz eax, ecx ; mov 0 into eax if ZF = 1 - _32bit_slot_stack_from_eXX ax,0 ; store result on top of stack + _64bit_slot_stack_to_rXX r11,0 ; grab value2 from the stack + pop_dual_slot ; reduce the stack size by 2 slots (16 bytes) + _64bit_slot_stack_to_rXX r12,0 ; grab value1 from the stack + pop_single_slot ; reduce long slot to int slot + xor ecx, ecx ; clear ecx (store 0) + mov dword edx, -1 ; store -1 in edx + mov dword ebx, 1 ; store 1 in ebx + cmp r12, r11 ; compare the two values + cmovg eax, ebx ; mov 1 into eax if ZF = 0 and CF = 0 + cmovl eax, edx ; mov -1 into eax if CF = 1 + cmovz eax, ecx ; mov 0 into eax if ZF = 1 + _32bit_slot_stack_from_eXX ax,0 ; store result on top of stack template_end lcmpTemplate ; Stack before: ..., value1, value2 -> @@ -1092,18 +1092,18 @@ template_end lcmpTemplate ; else if value1 < value 2, result = -1 Flags: ZF,PF,CF --> 001 ; else, one of value1 or value2 must be NaN -> result = -1 Flags: ZF,PF,CF --> 111 template_start fcmplTemplate - movss xmm0, dword [r10] ; copy value2 into xmm0 - pop_single_slot ; decrease the stack by 1 slot - movss xmm1, dword [r10] ; copy value1 into xmm1 - xor r11d, r11d ; clear r11 (store 0) - mov dword eax, 1 ; store 1 in rax - mov dword ecx, -1 ; store -1 in rcx - comiss xmm1, xmm0 ; compare the values - cmove r12d, r11d ; mov 0 into r12 if ZF = 1 - cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 - cmovb r12d, ecx ; mov -1 into r12 if CF = 1 - cmovp r12d, ecx ; mov -1 into r12 if PF = 1 - _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack + movss xmm0, dword [r10] ; copy value2 into xmm0 + pop_single_slot ; decrease the stack by 1 slot + movss xmm1, dword [r10] ; copy value1 into xmm1 + xor r11d, r11d ; clear r11 (store 0) + mov dword eax, 1 ; store 1 in rax + mov dword ecx, -1 ; store -1 in rcx + comiss xmm1, xmm0 ; compare the values + cmove r12d, r11d ; mov 0 into r12 if ZF = 1 + cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 + cmovb r12d, ecx ; mov -1 into r12 if CF = 1 + cmovp r12d, ecx ; mov -1 into r12 if PF = 1 + _32bit_slot_stack_from_rXX r12,0 ; 
store result on top of stack template_end fcmplTemplate ; Stack before: ..., value1, value2 -> @@ -1113,18 +1113,18 @@ template_end fcmplTemplate ; else if value1 < value 2, result = -1 Flags: ZF,PF,CF --> 001 ; else, one of value1 or value2 must be NaN -> result = 1 Flags: ZF,PF,CF --> 111 template_start fcmpgTemplate - movss xmm0, dword [r10] ; copy value2 into xmm0 - pop_single_slot ; decrease the stack by 1 slot - movss xmm1, dword [r10] ; copy value1 into xmm1 - xor r11d, r11d ; clear r11 (store 0) - mov dword eax, 1 ; store 1 in rax - mov dword ecx, -1 ; store -1 in rcx - comiss xmm1, xmm0 ; compare the values - cmove r12d, r11d ; mov 0 into r12 if ZF = 1 - cmovb r12d, ecx ; mov -1 into r12 if CF = 1 - cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 - cmovp r12d, eax ; mov 1 into r12 if PF = 1 - _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack + movss xmm0, dword [r10] ; copy value2 into xmm0 + pop_single_slot ; decrease the stack by 1 slot + movss xmm1, dword [r10] ; copy value1 into xmm1 + xor r11d, r11d ; clear r11 (store 0) + mov dword eax, 1 ; store 1 in rax + mov dword ecx, -1 ; store -1 in rcx + comiss xmm1, xmm0 ; compare the values + cmove r12d, r11d ; mov 0 into r12 if ZF = 1 + cmovb r12d, ecx ; mov -1 into r12 if CF = 1 + cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 + cmovp r12d, eax ; mov 1 into r12 if PF = 1 + _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack template_end fcmpgTemplate ; Stack before: ..., value1, value2 -> @@ -1134,19 +1134,19 @@ template_end fcmpgTemplate ; else if value1 < value 2, result = -1 Flags: ZF,PF,CF --> 001 ; else, one of value1 or value2 must be NaN -> result = -1 Flags: ZF,PF,CF --> 111 template_start dcmplTemplate - movsd xmm0, qword [r10] ; copy value2 into xmm0 - pop_dual_slot ; decrease the stack by 2 slots - movsd xmm1, qword [r10] ; copy value1 into xmm1 - pop_single_slot ; decrease the stack by 1 slot - xor r11d, r11d ; clear r11 (store 0) - mov dword eax, 1 ; store 1 in rax - mov dword ecx, -1 ; store -1 in rcx - comisd xmm1, xmm0 ; compare the values - cmove r12d, r11d ; mov 0 into r12 if ZF = 1 - cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 - cmovb r12d, ecx ; mov -1 into r12 if CF = 1 - cmovp r12d, ecx ; mov -1 into r12 if PF = 1 - _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack + movsd xmm0, qword [r10] ; copy value2 into xmm0 + pop_dual_slot ; decrease the stack by 2 slots + movsd xmm1, qword [r10] ; copy value1 into xmm1 + pop_single_slot ; decrease the stack by 1 slot + xor r11d, r11d ; clear r11 (store 0) + mov dword eax, 1 ; store 1 in rax + mov dword ecx, -1 ; store -1 in rcx + comisd xmm1, xmm0 ; compare the values + cmove r12d, r11d ; mov 0 into r12 if ZF = 1 + cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 + cmovb r12d, ecx ; mov -1 into r12 if CF = 1 + cmovp r12d, ecx ; mov -1 into r12 if PF = 1 + _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack template_end dcmplTemplate ; Stack before: ..., value1, value2 -> @@ -1156,176 +1156,176 @@ template_end dcmplTemplate ; else if value1 < value 2, result = -1 Flags: ZF,PF,CF --> 001 ; else, one of value1 or value2 must be NaN -> result = 1 Flags: ZF,PF,CF --> 111 template_start dcmpgTemplate - movsd xmm0, qword [r10] ; copy value2 into xmm0 - pop_dual_slot ; decrease the stack by 1 slot - movsd xmm1, qword [r10] ; copy value1 into xmm1 - pop_single_slot ; decrease the stack by 1 slot - xor r11d, r11d ; clear r11 (store 0) - mov dword eax, 1 ; store 1 in rax - mov dword ecx, -1 ; store 
-1 in rcx - comisd xmm1, xmm0 ; compare the values - cmove r12d, r11d ; mov 0 into r12 if ZF = 1 - cmovb r12d, ecx ; mov -1 into r12 if CF = 1 - cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 - cmovp r12d, eax ; mov 1 into r12 if PF = 1 - _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack + movsd xmm0, qword [r10] ; copy value2 into xmm0 + pop_dual_slot ; decrease the stack by 1 slot + movsd xmm1, qword [r10] ; copy value1 into xmm1 + pop_single_slot ; decrease the stack by 1 slot + xor r11d, r11d ; clear r11 (store 0) + mov dword eax, 1 ; store 1 in rax + mov dword ecx, -1 ; store -1 in rcx + comisd xmm1, xmm0 ; compare the values + cmove r12d, r11d ; mov 0 into r12 if ZF = 1 + cmovb r12d, ecx ; mov -1 into r12 if CF = 1 + cmova r12d, eax ; mov 1 into r12 if ZF = 0 and CF = 0 + cmovp r12d, eax ; mov 1 into r12 if PF = 1 + _32bit_slot_stack_from_rXX r12,0 ; store result on top of stack template_end dcmpgTemplate ; ifne succeeds if and only if value ≠ 0 ; Stack before: ..., value -> ; Stack after: ... template_start ifneTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - test r11d, r11d ; r11 - r11 - jnz 0xefbeadde ; Used for generating jump not equal to far labels + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + test r11d, r11d ; r11 - r11 + jnz 0xefbeadde ; Used for generating jump not equal to far labels template_end ifneTemplate ; ifeq succeeds if and only if value = 0 ; Stack before: ..., value -> ; Stack after: ... template_start ifeqTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - test r11d, r11d ; r11 - r11 - jz 0xefbeadde ; Used for generating jump to far labels if ZF = 1 + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + test r11d, r11d ; r11 - r11 + jz 0xefbeadde ; Used for generating jump to far labels if ZF = 1 template_end ifeqTemplate ; iflt succeeds if and only if value < 0 ; Stack before: ..., value -> ; Stack after: ... template_start ifltTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - test r11d, r11d ; r11 - r11 - jl 0xefbeadde ; Used for generating jump to far labels if SF <> OF + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + test r11d, r11d ; r11 - r11 + jl 0xefbeadde ; Used for generating jump to far labels if SF <> OF template_end ifltTemplate ; ifge succeeds if and only if value >= 0 ; Stack before: ..., value -> ; Stack after: ... 
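Illustrative aside, not part of the patch: the fcmplTemplate/fcmpgTemplate and dcmplTemplate/dcmpgTemplate above decode the comiss/comisd flag triple (ZF, PF, CF) into the -1/0/1 result the JVM requires; the only difference between the l and g variants is which way an unordered (NaN) comparison is biased, which is why the cmovp operand differs between them. A minimal C++ reference model of that rule, written here purely for illustration, is:

#include <cmath>

// Reference model of the JVM fcmpl/fcmpg result; the dcmpl/dcmpg templates
// implement the same rule for doubles. 'gVariant' selects the NaN bias
// (fcmpg/dcmpg push 1 on NaN, fcmpl/dcmpl push -1). Illustrative only.
static int jvmFloatCompare(float value1, float value2, bool gVariant)
   {
   if (std::isnan(value1) || std::isnan(value2))
      return gVariant ? 1 : -1;
   if (value1 > value2) return 1;
   if (value1 < value2) return -1;
   return 0;
   }

In the templates the NaN case is the cmovp line, since comiss/comisd set PF only for an unordered compare. The single-operand branch templates continue below with ifge, reusing the pop-test-branch pattern already shown for ifne, ifeq and iflt.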
template_start ifgeTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - test r11d, r11d ; r11 - r11 - jge 0xefbeadde ; Used for generating jump to far labels if SF = OF + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + test r11d, r11d ; r11 - r11 + jge 0xefbeadde ; Used for generating jump to far labels if SF = OF template_end ifgeTemplate ; ifgt succeeds if and only if value > 0 ; Stack before: ..., value -> ; Stack after: ... template_start ifgtTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - test r11d, r11d ; r11 - r11 - jg 0xefbeadde ; Used for generating jump to far labels if ZF = 0 and SF = OF + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + test r11d, r11d ; r11 - r11 + jg 0xefbeadde ; Used for generating jump to far labels if ZF = 0 and SF = OF template_end ifgtTemplate ; ifle succeeds if and only if value <= 0 ; Stack before: ..., value -> ; Stack after: ... template_start ifleTemplate - _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - test r11d, r11d ; r11 - r11 - jle 0xefbeadde ; Used for generating jump to far labels if ZF = 1 or SF <> OF + _32bit_slot_stack_to_rXX r11,0 ; pop first value off java stack into the accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + test r11d, r11d ; r11 - r11 + jle 0xefbeadde ; Used for generating jump to far labels if ZF = 1 or SF <> OF template_end ifleTemplate ; if_icmpeq succeeds if and only if value1 = value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... 
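Illustrative aside, not part of the patch: these branch templates all end in a jump to the placeholder constant 0xefbeadde, which the code generator presumably rewrites with the real 32-bit relative displacement after copying the template into the code cache and resolving the target label. How MicroJIT locates the field is not shown in this hunk, but the displacement arithmetic itself is the usual x86-64 rel32 rule, sketched below with hypothetical names:

#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: rewrite the rel32 displacement of a branch emitted from
// a template. 'fieldOffset' is the byte offset of the 32-bit displacement
// within the emitted code; the displacement is measured from the end of the
// instruction, i.e. the byte immediately after the field.
static void patchRel32(uint8_t *code, size_t fieldOffset, const uint8_t *target)
   {
   const uint8_t *nextInstruction = code + fieldOffset + sizeof(int32_t);
   int32_t displacement = static_cast<int32_t>(target - nextInstruction);
   std::memcpy(code + fieldOffset, &displacement, sizeof(displacement));
   }

The two-operand if_icmp* templates that follow use the same placeholder, differing from the single-operand forms only in popping a second value before the cmp.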
template_start ificmpeqTemplate - _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - je 0xefbeadde ; Used for generating jump greater than equal to far labels + _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + je 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ificmpeqTemplate template_start ifacmpeqTemplate - _64bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _64bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11, r12 ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - je 0xefbeadde ; Used for generating jump greater than equal to far labels + _64bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11, r12 ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + je 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ifacmpeqTemplate ; if_icmpne succeeds if and only if value1 ≠ value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... template_start ificmpneTemplate - _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - jne 0xefbeadde ; Used for generating jump greater than equal to far labels + _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + jne 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ificmpneTemplate ; if_icmpne succeeds if and only if value1 ≠ value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... 
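Illustrative aside, not part of the patch: the push_single_slot / pop_single_slot / pop_dual_slot macros used throughout these templates are defined in microjit.nasm rather than in this hunk. The comments imply that r10 tracks the top of the Java operand stack, every slot is 8 bytes, and popping moves r10 toward higher addresses (retTemplate_add "decreases stack size" by adding to r10). Under that assumption, a toy C++ model of the convention is:

#include <cstdint>

// Toy model (an assumption drawn from the template comments, not patch code)
// of the operand-stack convention: 'top' plays the role of r10, slots are
// 64 bits wide, the stack grows toward lower addresses, and longs/doubles
// occupy two slots.
struct OperandStackModel
   {
   uint64_t *top;                                     // r10: points at the current top slot
   void pushSlot(uint64_t value) { *--top = value; }  // push_single_slot, then store to [r10]
   uint64_t popSlot()            { return *top++; }   // load from [r10], then pop_single_slot
   };

The ifacmpne template below is the 64-bit (reference) counterpart of ificmpne above, so it compares the full registers rather than their 32-bit halves.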
template_start ifacmpneTemplate - _64bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _64bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11, r12 ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - jne 0xefbeadde ; Used for generating jump greater than equal to far labels + _64bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _64bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11, r12 ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + jne 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ifacmpneTemplate ; if_icmplt succeeds if and only if value1 < value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... template_start ificmpltTemplate - _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - jl 0xefbeadde ; Used for generating jump greater than equal to far labels + _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + jl 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ificmpltTemplate ; if_icmple succeeds if and only if value1 < value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... template_start ificmpleTemplate - _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - jle 0xefbeadde ; Used for generating jump greater than equal to far labels + _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + jle 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ificmpleTemplate ; if_icmpge succeeds if and only if value1 ≥ value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... 
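Illustrative aside, not part of the patch: the choice of 0xefbeadde as the placeholder value is deliberate. Stored little-endian it is the byte sequence de ad be ef, so any immediate the code generator forgot to patch reads as "deadbeef" in a hex dump, as the linkage.nasm comments later in this patch point out. A two-line check, for illustration:

#include <cstdint>
#include <cstdio>

int main()
   {
   uint32_t placeholder = 0xefbeadde;
   const uint8_t *bytes = reinterpret_cast<const uint8_t *>(&placeholder);
   std::printf("%02x %02x %02x %02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
   return 0;   // prints "de ad be ef" on a little-endian target such as x86-64
   }

The remaining if_icmp* templates below keep the same placeholder for their branch targets.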
template_start ificmpgeTemplate - _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - jge 0xefbeadde ; Used for generating jump greater than equal to far labels + _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + jge 0xefbeadde ; Used for generating jump greater than equal to far labels template_end ificmpgeTemplate ; if_icmpgt succeeds if and only if value1 > value2 ; Stack before: ..., value1, value2 -> ; Stack after: ... template_start ificmpgtTemplate - _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator - pop_single_slot ; reduce the stack size by 1 slot (8 bytes) - cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS - jg 0xefbeadde ; Used for generating jump greater than to far labels + _32bit_slot_stack_to_rXX r12,0 ; pop first value off java stack into the value register + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + _32bit_slot_stack_to_rXX r11,0 ; pop second value off java stack into accumulator + pop_single_slot ; reduce the stack size by 1 slot (8 bytes) + cmp r11d, r12d ; r12 (minuend) - r11 (subtrahend) and set EFLAGS + jg 0xefbeadde ; Used for generating jump greater than to far labels template_end ificmpgtTemplate template_start gotoTemplate - jmp 0xefbeadde ; Used for generating jump not equal to far labels + jmp 0xefbeadde ; Used for generating jump not equal to far labels template_end gotoTemplate ; Below bytecodes are untested @@ -1334,18 +1334,18 @@ template_end gotoTemplate ; Stack before: ..., value -> ; Stack after: ... template_start ifnullTemplate - _64bit_slot_stack_to_rXX rax,0 ; grab the reference from the operand stack - pop_single_slot ; reduce the mjit stack by one slot - cmp rax, 0 ; compare against 0 (null) - je 0xefbeadde ; jump if rax is null + _64bit_slot_stack_to_rXX rax,0 ; grab the reference from the operand stack + pop_single_slot ; reduce the mjit stack by one slot + cmp rax, 0 ; compare against 0 (null) + je 0xefbeadde ; jump if rax is null template_end ifnullTemplate ; if_nonnull succeeds if and only if value != 0 ; Stack before: ..., value -> ; Stack after: ... 
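Illustrative aside, not part of the patch: further up, f2iTemplate and f2lTemplate pair cvttss2si with a chain of cmov instructions because cvttss2si alone produces the "integer indefinite" value for NaN and out-of-range inputs, while the JVM requires NaN to become 0 and overflows to saturate. A C++ reference model of the f2i rule those cmovs implement (illustrative only, not patch code):

#include <cmath>
#include <cstdint>
#include <limits>

// JVM f2i semantics: NaN -> 0, overflow saturates to Integer.MAX_VALUE or
// Integer.MIN_VALUE, everything else truncates toward zero. f2l follows the
// same rule with the 64-bit limits.
static int32_t jvmF2I(float value)
   {
   if (std::isnan(value))
      return 0;
   if (value >= static_cast<float>(std::numeric_limits<int32_t>::max()))
      return std::numeric_limits<int32_t>::max();
   if (value <= static_cast<float>(std::numeric_limits<int32_t>::min()))
      return std::numeric_limits<int32_t>::min();
   return static_cast<int32_t>(value);
   }

The ifnonnull template that follows mirrors ifnull above with the branch condition inverted.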
template_start ifnonnullTemplate - _64bit_slot_stack_to_rXX rax,0 ; grab the reference from the operand stack - pop_single_slot ; reduce the mjit stack by one slot - cmp rax, 0 ; compare against 0 (null) - jne 0xefbeadde ; jump if rax is not null + _64bit_slot_stack_to_rXX rax,0 ; grab the reference from the operand stack + pop_single_slot ; reduce the mjit stack by one slot + cmp rax, 0 ; compare against 0 (null) + jne 0xefbeadde ; jump if rax is not null template_end ifnonnullTemplate diff --git a/runtime/compiler/microjit/x/amd64/templates/linkage.nasm b/runtime/compiler/microjit/x/amd64/templates/linkage.nasm index ec5981cf4ad..53dba321fde 100644 --- a/runtime/compiler/microjit/x/amd64/templates/linkage.nasm +++ b/runtime/compiler/microjit/x/amd64/templates/linkage.nasm @@ -26,463 +26,463 @@ section .text template_start saveXMM0Local - movq [r14 + 0xefbeadde], xmm0 + movq [r14 + 0xefbeadde], xmm0 template_end saveXMM0Local template_start saveXMM1Local - movq [r14 + 0xefbeadde], xmm1 + movq [r14 + 0xefbeadde], xmm1 template_end saveXMM1Local template_start saveXMM2Local - movq [r14 + 0xefbeadde], xmm2 + movq [r14 + 0xefbeadde], xmm2 template_end saveXMM2Local template_start saveXMM3Local - movq [r14 + 0xefbeadde], xmm3 + movq [r14 + 0xefbeadde], xmm3 template_end saveXMM3Local template_start saveXMM4Local - movq [r14 + 0xefbeadde], xmm4 + movq [r14 + 0xefbeadde], xmm4 template_end saveXMM4Local template_start saveXMM5Local - movq [r14 + 0xefbeadde], xmm5 + movq [r14 + 0xefbeadde], xmm5 template_end saveXMM5Local template_start saveXMM6Local - movq [r14 + 0xefbeadde], xmm6 + movq [r14 + 0xefbeadde], xmm6 template_end saveXMM6Local template_start saveXMM7Local - movq [r14 + 0xefbeadde], xmm7 + movq [r14 + 0xefbeadde], xmm7 template_end saveXMM7Local template_start saveRAXLocal - mov [r14 + 0xefbeadde], rax + mov [r14 + 0xefbeadde], rax template_end saveRAXLocal template_start saveRSILocal - mov [r14 + 0xefbeadde], rsi + mov [r14 + 0xefbeadde], rsi template_end saveRSILocal template_start saveRDXLocal - mov [r14 + 0xefbeadde], rdx + mov [r14 + 0xefbeadde], rdx template_end saveRDXLocal template_start saveRCXLocal - mov [r14 + 0xefbeadde], rcx + mov [r14 + 0xefbeadde], rcx template_end saveRCXLocal template_start saveR11Local - mov [r14 + 0xefbeadde], r11 + mov [r14 + 0xefbeadde], r11 template_end saveR11Local template_start movRSPOffsetR11 - mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end movRSPOffsetR11 template_start movRSPR10 - mov r10, rsp; Used for allocating space on the stack. + mov r10, rsp; Used for allocating space on the stack. template_end movRSPR10 template_start movR10R14 - mov r14, r10; Used for allocating space on the stack. + mov r14, r10; Used for allocating space on the stack. template_end movR10R14 template_start subR10Imm4 - sub r10, 0x7fffffff; Used for allocating space on the stack. + sub r10, 0x7fffffff; Used for allocating space on the stack. template_end subR10Imm4 template_start addR10Imm4 - add r10, 0x7fffffff; Used for allocating space on the stack. + add r10, 0x7fffffff; Used for allocating space on the stack. template_end addR10Imm4 template_start subR14Imm4 - sub r14, 0x7fffffff; Used for allocating space on the stack. + sub r14, 0x7fffffff; Used for allocating space on the stack. template_end subR14Imm4 template_start addR14Imm4 - add r14, 0x7fffffff; Used for allocating space on the stack. 
+ add r14, 0x7fffffff; Used for allocating space on the stack. template_end addR14Imm4 template_start saveRBPOffset - mov [rsp+0xff], rbp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rbp; Used ff as all other labels look valid and 255 will be rare. template_end saveRBPOffset template_start saveEBPOffset - mov [rsp+0xff], ebp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], ebp; Used ff as all other labels look valid and 255 will be rare. template_end saveEBPOffset template_start saveRSPOffset - mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. template_end saveRSPOffset template_start saveESPOffset - mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. template_end saveESPOffset template_start loadRBPOffset - mov rbp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rbp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadRBPOffset template_start loadEBPOffset - mov ebp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov ebp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadEBPOffset template_start loadRSPOffset - mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadRSPOffset template_start loadESPOffset - mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadESPOffset template_start saveEAXOffset - mov [rsp+0xff], eax; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], eax; Used ff as all other labels look valid and 255 will be rare. template_end saveEAXOffset template_start saveESIOffset - mov [rsp+0xff], esi; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], esi; Used ff as all other labels look valid and 255 will be rare. template_end saveESIOffset template_start saveEDXOffset - mov [rsp+0xff], edx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], edx; Used ff as all other labels look valid and 255 will be rare. template_end saveEDXOffset template_start saveECXOffset - mov [rsp+0xff], ecx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], ecx; Used ff as all other labels look valid and 255 will be rare. template_end saveECXOffset template_start saveEBXOffset - mov [rsp+0xff], ebx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], ebx; Used ff as all other labels look valid and 255 will be rare. template_end saveEBXOffset template_start loadEAXOffset - mov eax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov eax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadEAXOffset template_start loadESIOffset - mov esi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov esi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. 
template_end loadESIOffset template_start loadEDXOffset - mov edx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov edx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadEDXOffset template_start loadECXOffset - mov ecx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov ecx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadECXOffset template_start loadEBXOffset - mov ebx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov ebx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadEBXOffset template_start addRSPImm8 - add rsp, 0x7fffffffffffffff; Used for allocating space on the stack. + add rsp, 0x7fffffffffffffff; Used for allocating space on the stack. template_end addRSPImm8 template_start addRSPImm4 - add rsp, 0x7fffffff; Used for allocating space on the stack. + add rsp, 0x7fffffff; Used for allocating space on the stack. template_end addRSPImm4 template_start addRSPImm2 - add rsp, 0x7fff; Used for allocating space on the stack. + add rsp, 0x7fff; Used for allocating space on the stack. template_end addRSPImm2 template_start addRSPImm1 - add rsp, 0x7f; Used for allocating space on the stack. + add rsp, 0x7f; Used for allocating space on the stack. template_end addRSPImm1 template_start je4ByteRel - je 0xff; + je 0xff; template_end je4ByteRel template_start jeByteRel - je 0xff; + je 0xff; template_end jeByteRel template_start subRSPImm8 - sub rsp, 0x7fffffffffffffff; Used for allocating space on the stack. + sub rsp, 0x7fffffffffffffff; Used for allocating space on the stack. template_end subRSPImm8 template_start subRSPImm4 - sub rsp, 0x7fffffff; Used for allocating space on the stack. + sub rsp, 0x7fffffff; Used for allocating space on the stack. template_end subRSPImm4 template_start subRSPImm2 - sub rsp, 0x7fff; Used for allocating space on the stack. + sub rsp, 0x7fff; Used for allocating space on the stack. template_end subRSPImm2 template_start subRSPImm1 - sub rsp, 0x7f; Used for allocating space on the stack. + sub rsp, 0x7f; Used for allocating space on the stack. template_end subRSPImm1 template_start jbe4ByteRel - jbe 0xefbeadde; Used for generating jumps to far labels. + jbe 0xefbeadde; Used for generating jumps to far labels. template_end jbe4ByteRel template_start cmpRspRbpDerefOffset - cmp rsp, [rbp+0xff]; Used ff as all other labels look valid and 255 will be rare. + cmp rsp, [rbp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end cmpRspRbpDerefOffset template_start loadRAXOffset - mov rax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadRAXOffset template_start loadRSIOffset - mov rsi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rsi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadRSIOffset template_start loadRDXOffset - mov rdx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rdx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadRDXOffset template_start loadRCXOffset - mov rcx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rcx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. 
template_end loadRCXOffset template_start loadRBXOffset - mov rbx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rbx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadRBXOffset template_start loadR9Offset - mov r9, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r9, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR9Offset template_start loadR10Offset - mov r10, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r10, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR10Offset template_start loadR11Offset - mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR11Offset template_start loadR12Offset - mov r12, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r12, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR12Offset template_start loadR13Offset - mov r13, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r13, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR13Offset template_start loadR14Offset - mov r14, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r14, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR14Offset template_start loadR15Offset - mov r15, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r15, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadR15Offset template_start loadXMM0Offset - movq xmm0, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm0, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM0Offset template_start loadXMM1Offset - movq xmm1, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm1, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM1Offset template_start loadXMM2Offset - movq xmm2, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm2, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM2Offset template_start loadXMM3Offset - movq xmm3, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm3, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM3Offset template_start loadXMM4Offset - movq xmm4, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm4, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM4Offset template_start loadXMM5Offset - movq xmm5, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm5, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM5Offset template_start loadXMM6Offset - movq xmm6, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm6, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM6Offset template_start loadXMM7Offset - movq xmm7, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. 
+ movq xmm7, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM7Offset template_start saveRAXOffset - mov [rsp+0xff], rax; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rax; Used ff as all other labels look valid and 255 will be rare. template_end saveRAXOffset template_start saveRSIOffset - mov [rsp+0xff], rsi; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rsi; Used ff as all other labels look valid and 255 will be rare. template_end saveRSIOffset template_start saveRDXOffset - mov [rsp+0xff], rdx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rdx; Used ff as all other labels look valid and 255 will be rare. template_end saveRDXOffset template_start saveRCXOffset - mov [rsp+0xff], rcx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rcx; Used ff as all other labels look valid and 255 will be rare. template_end saveRCXOffset template_start saveRBXOffset - mov [rsp+0xff], rbx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rbx; Used ff as all other labels look valid and 255 will be rare. template_end saveRBXOffset template_start saveR9Offset - mov [rsp+0xff], r9; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r9; Used ff as all other labels look valid and 255 will be rare. template_end saveR9Offset template_start saveR10Offset - mov [rsp+0xff], r10; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r10; Used ff as all other labels look valid and 255 will be rare. template_end saveR10Offset template_start saveR11Offset - mov [rsp+0xff], r11; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r11; Used ff as all other labels look valid and 255 will be rare. template_end saveR11Offset template_start saveR12Offset - mov [rsp+0xff], r12; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r12; Used ff as all other labels look valid and 255 will be rare. template_end saveR12Offset template_start saveR13Offset - mov [rsp+0xff], r13; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r13; Used ff as all other labels look valid and 255 will be rare. template_end saveR13Offset template_start saveR14Offset - mov [rsp+0xff], r14; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r14; Used ff as all other labels look valid and 255 will be rare. template_end saveR14Offset template_start saveR15Offset - mov [rsp+0xff], r15; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r15; Used ff as all other labels look valid and 255 will be rare. template_end saveR15Offset template_start saveXMM0Offset - movq [rsp+0xff], xmm0; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm0; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM0Offset template_start saveXMM1Offset - movq [rsp+0xff], xmm1; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm1; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM1Offset template_start saveXMM2Offset - movq [rsp+0xff], xmm2; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm2; Used ff as all other labels look valid and 255 will be rare. 
template_end saveXMM2Offset template_start saveXMM3Offset - movq [rsp+0xff], xmm3; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm3; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM3Offset template_start saveXMM4Offset - movq [rsp+0xff], xmm4; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm4; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM4Offset template_start saveXMM5Offset - movq [rsp+0xff], xmm5; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm5; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM5Offset template_start saveXMM6Offset - movq [rsp+0xff], xmm6; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm6; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM6Offset template_start saveXMM7Offset - movq [rsp+0xff], xmm7; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm7; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM7Offset template_start call4ByteRel - call 0xefbeadde; Used for generating jumps to far labels. + call 0xefbeadde; Used for generating jumps to far labels. template_end call4ByteRel template_start callByteRel - call 0xff; Used for generating jumps to nearby labels. Used ff as all other labels look valid and 255 will be rare. + call 0xff; Used for generating jumps to nearby labels. Used ff as all other labels look valid and 255 will be rare. template_end callByteRel template_start jump4ByteRel - jmp 0xefbeadde; Used for generating jumps to far labels. + jmp 0xefbeadde; Used for generating jumps to far labels. template_end jump4ByteRel template_start jumpByteRel - jmp short $+0x02; Used for generating jumps to nearby labels. shows up as eb 00 in memory (easy to see) + jmp short $+0x02; Used for generating jumps to nearby labels. shows up as eb 00 in memory (easy to see) template_end jumpByteRel template_start nopInstruction - nop; Used for alignment. One byte wide + nop; Used for alignment. One byte wide template_end nopInstruction template_start movRDIImm64 - mov rdi, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump + mov rdi, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump template_end movRDIImm64 template_start movEDIImm32 - mov edi, 0xefbeadde; looks like 'deadbeef' in objdump + mov edi, 0xefbeadde; looks like 'deadbeef' in objdump template_end movEDIImm32 template_start movRAXImm64 - mov rax, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump + mov rax, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump template_end movRAXImm64 template_start movEAXImm32 - mov eax, 0xefbeadde; looks like 'deadbeef' in objdump + mov eax, 0xefbeadde; looks like 'deadbeef' in objdump template_end movEAXImm32 template_start jumpRDI - jmp rdi; Useful for jumping to absolute addresses. Store in RDI then generate this. + jmp rdi; Useful for jumping to absolute addresses. Store in RDI then generate this. template_end jumpRDI template_start jumpRAX - jmp rax; Useful for jumping to absolute addresses. Store in RAX then generate this. + jmp rax; Useful for jumping to absolute addresses. Store in RAX then generate this. 
template_end jumpRAX template_start paintRegister - mov rax, qword 0x0df0adde0df0adde + mov rax, qword 0x0df0adde0df0adde template_end paintRegister template_start paintLocal - mov [r14+0xefbeadde], rax + mov [r14+0xefbeadde], rax template_end paintLocal template_start moveCountAndRecompile - mov eax, [qword 0xefbeaddeefbeadde] + mov eax, [qword 0xefbeaddeefbeadde] template_end moveCountAndRecompile template_start checkCountAndRecompile - test eax,eax - je 0xefbeadde + test eax,eax + je 0xefbeadde template_end checkCountAndRecompile template_start loadCounter - mov eax, [qword 0xefbeaddeefbeadde] + mov eax, [qword 0xefbeaddeefbeadde] template_end loadCounter template_start decrementCounter - sub eax, 1 - mov [qword 0xefbeaddeefbeadde], eax + sub eax, 1 + mov [qword 0xefbeaddeefbeadde], eax template_end decrementCounter template_start incrementCounter - inc rax - mov [qword 0xefbeaddeefbeadde], eax + inc rax + mov [qword 0xefbeaddeefbeadde], eax template_end incrementCounter template_start jgCount - test eax, eax - jg 0xefbeadde + test eax, eax + jg 0xefbeadde template_end jgCount template_start callRetranslateArg1 - mov rax, qword 0x0df0adde0df0adde + mov rax, qword 0x0df0adde0df0adde template_end callRetranslateArg1 template_start callRetranslateArg2 - mov rsi, qword 0x0df0adde0df0adde + mov rsi, qword 0x0df0adde0df0adde template_end callRetranslateArg2 template_start callRetranslate - mov edi, 0x00080000 - call 0xefbeadde + mov edi, 0x00080000 + call 0xefbeadde template_end callRetranslate template_start setCounter - mov rax, 0x2710 - mov [qword 0xefbeaddeefbeadde], eax + mov rax, 0x2710 + mov [qword 0xefbeaddeefbeadde], eax template_end setCounter template_start jmpToBody - jmp 0xefbeadde -template_end jmpToBody \ No newline at end of file + jmp 0xefbeadde +template_end jmpToBody diff --git a/runtime/compiler/microjit/x/amd64/templates/microjit.nasm b/runtime/compiler/microjit/x/amd64/templates/microjit.nasm index cd5267d8361..812c2a81b4f 100644 --- a/runtime/compiler/microjit/x/amd64/templates/microjit.nasm +++ b/runtime/compiler/microjit/x/amd64/templates/microjit.nasm @@ -86,7 +86,7 @@ mov dword [r10 + %2], %1d %endmacro %macro _32bit_slot_stack_from_eXX 2 -mov dword [r10 + %2], e%1 +mov dword [r10 + %2], e%1 %endmacro %macro _64bit_slot_stack_from_rXX 2 diff --git a/runtime/compiler/optimizer/J9Optimizer.cpp b/runtime/compiler/optimizer/J9Optimizer.cpp index efa2d2bc4b4..0496b1fdea3 100644 --- a/runtime/compiler/optimizer/J9Optimizer.cpp +++ b/runtime/compiler/optimizer/J9Optimizer.cpp @@ -92,7 +92,7 @@ static const OptimizationStrategy J9MicroJITOpts[] = { OMR::jProfilingBlock, OMR::MustBeDone }, { OMR::endOpts } }; -#endif +#endif /* J9VM_OPT_MICROJIT */ static const OptimizationStrategy J9EarlyGlobalOpts[] = { diff --git a/runtime/compiler/optimizer/JProfilingBlock.cpp b/runtime/compiler/optimizer/JProfilingBlock.cpp index 03557670847..6437d38cd5b 100644 --- a/runtime/compiler/optimizer/JProfilingBlock.cpp +++ b/runtime/compiler/optimizer/JProfilingBlock.cpp @@ -596,10 +596,10 @@ void TR_JProfilingBlock::computeMinimumSpanningTree(BlockParents &parent, BlockP BlockWeights weights(cfg->getNextNodeNumber(), stackMemoryRegion); TR::BlockChecklist inMST(comp()); - char* classname = comp()->getMethodBeingCompiled()->classNameChars(); + char *classname = comp()->getMethodBeingCompiled()->classNameChars(); uint16_t classnamelength = comp()->getMethodBeingCompiled()->classNameLength(); - char* name = comp()->getMethodBeingCompiled()->nameChars(); + char *name = 
comp()->getMethodBeingCompiled()->nameChars(); uint16_t namelength = comp()->getMethodBeingCompiled()->nameLength(); // Prim's init @@ -940,7 +940,6 @@ void TR_JProfilingBlock::addRecompilationTests(TR_BlockFrequencyInfo *blockFrequ */ int32_t TR_JProfilingBlock::perform() { - // Can we perform these tests before the perform call? if (comp()->getOption(TR_EnableJProfiling)) { if (trace()) @@ -981,14 +980,15 @@ int32_t TR_JProfilingBlock::perform() if (trace()) parent.dump(comp()); -//TODO: Remove this when ready to implement counter insertion. +// TODO: Remove this when ready to implement counter insertion. #if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); U_8 *extendedFlags = reinterpret_cast(fe())->fetchMethodExtendedFlagsPointer(method); - if( ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1 ){ + if (((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1) + { comp()->setSkippedJProfilingBlock(); return 0; - } + } #endif TR::BlockChecklist countedBlocks(comp()); diff --git a/runtime/compiler/runtime/MetaData.cpp b/runtime/compiler/runtime/MetaData.cpp index f37e5e72283..80f95206957 100644 --- a/runtime/compiler/runtime/MetaData.cpp +++ b/runtime/compiler/runtime/MetaData.cpp @@ -61,7 +61,7 @@ #include "il/TreeTop_inlines.hpp" #if defined(J9VM_OPT_MICROJIT) #include "microjit/ExceptionTable.hpp" -#endif +#endif /* J9VM_OPT_MICROJIT */ #include "runtime/ArtifactManager.hpp" #include "runtime/CodeCache.hpp" #include "runtime/MethodMetaData.h" @@ -233,14 +233,14 @@ createExceptionTable( #if defined(J9VM_OPT_MICROJIT) static void createMJITExceptionTable( - TR_MethodMetaData * data, + TR_MethodMetaData *data, MJIT::ExceptionTableEntryIterator & exceptionIterator, bool fourByteOffsets, TR::Compilation *comp) { - uint8_t * cursor = (uint8_t *)data + sizeof(TR_MethodMetaData); + uint8_t *cursor = (uint8_t *)data + sizeof(TR_MethodMetaData); - for (TR_ExceptionTableEntry * e = exceptionIterator.getFirst(); e; e = exceptionIterator.getNext()) + for (TR_ExceptionTableEntry *e = exceptionIterator.getFirst(); e; e = exceptionIterator.getNext()) { if (fourByteOffsets) { @@ -1137,11 +1137,12 @@ static int32_t calculateExceptionsSize( } #if defined(J9VM_OPT_MICROJIT) -static int32_t calculateMJITExceptionsSize( - TR::Compilation* comp, - MJIT::ExceptionTableEntryIterator& exceptionIterator, - bool& fourByteExceptionTableEntries, - uint32_t& numberOfExceptionRangesWithBits) +static int32_t +calculateMJITExceptionsSize( + TR::Compilation *comp, + MJIT::ExceptionTableEntryIterator& exceptionIterator, + bool& fourByteExceptionTableEntries, + uint32_t& numberOfExceptionRangesWithBits) { uint32_t exceptionsSize = 0; uint32_t numberOfExceptionRanges = exceptionIterator.size(); @@ -1152,9 +1153,12 @@ static int32_t calculateMJITExceptionsSize( return -1; // our meta data representation only has 14 bits for the number of exception ranges if (!fourByteExceptionTableEntries) - for (TR_ExceptionTableEntry * e = exceptionIterator.getFirst(); e; e = exceptionIterator.getNext()) + for (TR_ExceptionTableEntry *e = exceptionIterator.getFirst(); e; e = exceptionIterator.getNext()) if (e->_catchType > 0xFFFF || !e->_method->isSameMethod(comp->getCurrentMethod())) - { fourByteExceptionTableEntries = true; break; } + { + fourByteExceptionTableEntries = true; + break; + } int32_t entrySize; if (fourByteExceptionTableEntries) @@ -1847,24 +1851,23 @@ createMethodMetaData( // TR_MethodMetaData * createMJITMethodMetaData( - 
TR_J9VMBase & vmArg, - TR_ResolvedMethod *vmMethod, - TR::Compilation *comp - ) + TR_J9VMBase & vmArg, + TR_ResolvedMethod *vmMethod, + TR::Compilation *comp) { - //--------------------------------------------------------------------------- - //This function needs to create the meta-data structures for: + // --------------------------------------------------------------------------- + // This function needs to create the meta-data structures for: // GC maps, Exceptions, inlining calls, stack atlas, // GPU optimizations and internal pointer table - TR_J9VMBase * vm = &vmArg; + TR_J9VMBase *vm = &vmArg; MJIT::ExceptionTableEntryIterator exceptionIterator(comp); - TR::ResolvedMethodSymbol * methodSymbol = comp->getJittedMethodSymbol(); - TR::CodeGenerator * cg = comp->cg(); - TR::GCStackAtlas * trStackAtlas = cg->getStackAtlas(); + TR::ResolvedMethodSymbol *methodSymbol = comp->getJittedMethodSymbol(); + TR::CodeGenerator *cg = comp->cg(); + TR::GCStackAtlas *trStackAtlas = cg->getStackAtlas(); // -------------------------------------------------------------------------- // Find unmergeable GC maps - //TODO: Create mechanism like treetop to support this at a MicroJIT level. + // TODO: MicroJIT: Create mechanism like treetop to support this at a MicroJIT level. GCStackMapSet nonmergeableBCI(std::less(), cg->trMemory()->heapMemoryRegion()); // if (comp->getOptions()->getReportByteCodeInfoAtCatchBlock()) // { @@ -1899,11 +1902,7 @@ createMJITMethodMetaData( // bool fourByteExceptionTableEntries = fourByteOffsets; uint32_t numberOfExceptionRangesWithBits = 0; - int32_t exceptionsSize = calculateMJITExceptionsSize( - comp, - exceptionIterator, - fourByteExceptionTableEntries, - numberOfExceptionRangesWithBits); + int32_t exceptionsSize = calculateMJITExceptionsSize(comp, exceptionIterator, fourByteExceptionTableEntries, numberOfExceptionRangesWithBits); if (exceptionsSize == -1) { @@ -1911,7 +1910,7 @@ createMJITMethodMetaData( } // TODO: once exception support is added to MicroJIT uncomment this line. 
// tableSize += exceptionsSize; - uint32_t exceptionTableSize = tableSize; //Size of the meta data header and exception entries + uint32_t exceptionTableSize = tableSize; // Size of the meta data header and exception entries // -------------------------------------------------------------------------- // Computing the size of the inlinedCall @@ -1922,14 +1921,7 @@ createMJITMethodMetaData( // Add size of stack atlas to allocate // - int32_t sizeOfStackAtlasInBytes = calculateSizeOfStackAtlas( - vm, - cg, - fourByteOffsets, - numberOfSlotsMapped, - numberOfMapBytes, - comp, - nonmergeableBCI); + int32_t sizeOfStackAtlasInBytes = calculateSizeOfStackAtlas(vm, cg, fourByteOffsets, numberOfSlotsMapped, numberOfMapBytes, comp, nonmergeableBCI); tableSize += sizeOfStackAtlasInBytes; @@ -1939,7 +1931,7 @@ createMJITMethodMetaData( // Escape analysis change for compressed pointers // - TR_GCStackAllocMap * stackAllocMap = trStackAtlas->getStackAllocMap(); + TR_GCStackAllocMap *stackAllocMap = trStackAtlas->getStackAllocMap(); if (stackAllocMap) { tableSize += numberOfMapBytes + sizeof(uintptr_t); @@ -1960,7 +1952,7 @@ createMJITMethodMetaData( internal pointer map */ - TR_MethodMetaData * data = (TR_MethodMetaData*) vmMethod->allocateException(tableSize, comp); + TR_MethodMetaData *data = (TR_MethodMetaData*) vmMethod->allocateException(tableSize, comp); populateBodyInfo(comp, vm, data); @@ -1981,16 +1973,7 @@ createMJITMethodMetaData( data->tempOffset = comp->cg()->getStackAtlas()->getNumberOfPendingPushSlots(); data->size = tableSize; - data->gcStackAtlas = createStackAtlas( - vm, - comp->cg(), - fourByteOffsets, - numberOfSlotsMapped, - numberOfMapBytes, - comp, - ((uint8_t *)data + exceptionTableSize + inlinedCallSize), - sizeOfStackAtlasInBytes, - nonmergeableBCI); + data->gcStackAtlas = createStackAtlas(vm, comp->cg(), fourByteOffsets, numberOfSlotsMapped, numberOfMapBytes, comp, ((uint8_t *)data + exceptionTableSize + inlinedCallSize), sizeOfStackAtlasInBytes, nonmergeableBCI); createMJITExceptionTable(data, exceptionIterator, fourByteExceptionTableEntries, comp); @@ -2008,7 +1991,7 @@ createMJITMethodMetaData( { // Insert trace point here for insertion failure } - if (vm->isAnonymousClass( ((TR_ResolvedJ9Method*)vmMethod)->romClassPtr())) + if (vm->isAnonymousClass(((TR_ResolvedJ9Method*)vmMethod)->romClassPtr())) { J9Class *j9clazz = ((TR_ResolvedJ9Method*)vmMethod)->constantPoolHdr(); J9CLASS_EXTENDED_FLAGS_SET(j9clazz, J9ClassContainsJittedMethods); @@ -2020,7 +2003,7 @@ createMJITMethodMetaData( } else { - J9ClassLoader * classLoader = ((TR_ResolvedJ9Method*)vmMethod)->getClassLoader(); + J9ClassLoader *classLoader = ((TR_ResolvedJ9Method*)vmMethod)->getClassLoader(); classLoader->flags |= J9CLASSLOADER_CONTAINS_JITTED_METHODS; data->prevMethod = NULL; data->nextMethod = classLoader->jitMetaDataList; diff --git a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp index b56bd49c2d3..04ec1ed9aa7 100644 --- a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp +++ b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.cpp @@ -87,8 +87,8 @@ enum J9::X86::AMD64::PrivateLinkage::PrivateLinkage(TR::CodeGenerator *cg) : J9::X86::PrivateLinkage(cg) { - setOffsetToFirstParm(RETURN_ADDRESS_SIZE); - setProperties(cg,&_properties); + setOffsetToFirstParm(RETURN_ADDRESS_SIZE); + setProperties(cg,&_properties); } void J9::X86::AMD64::setProperties( diff --git a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp 
b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp index 2f330d401cb..f0137068489 100644 --- a/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp +++ b/runtime/compiler/x/amd64/codegen/AMD64PrivateLinkage.hpp @@ -39,6 +39,7 @@ namespace X86 namespace AMD64 { void setProperties(TR::CodeGenerator *cg,TR::X86LinkageProperties *properties); + class PrivateLinkage : public J9::X86::PrivateLinkage { public: From 83e34fbbcca3032b1282f4075dbe7aca60d7d04e Mon Sep 17 00:00:00 2001 From: Scott Young Date: Mon, 28 Mar 2022 15:07:22 +0000 Subject: [PATCH 08/25] Changedmulti-line MJIT C macros to functions Signed-off-by: Scott Young --- .../compiler/control/CompilationThread.cpp | 71 +++++++++++-------- .../compiler/microjit/HWMetaDataCreation.cpp | 8 +-- 2 files changed, 45 insertions(+), 34 deletions(-) diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index 1a206b34790..7635691d796 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -166,12 +166,14 @@ extern TR::OptionSet *findOptionSet(J9Method *, bool); #define DECOMPRESSION_SUCCEEDED 0 #if defined(J9VM_OPT_MICROJIT) -#define COMPILE_WITH_MICROJIT(extra, method, extendedFlags)\ - (TR::Options::getJITCmdLineOptions()->_mjitEnabled && \ - extra && \ - !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) &&\ - !J9_ARE_NO_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) &&\ - !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)) +static bool +COMPILE_WITH_MICROJIT(UDATA extra, J9Method *method, U_8 *extendedFlags) + { + return (TR::Options::getJITCmdLineOptions()->_mjitEnabled && //MicroJIT needs to be enabled + J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) && //MicroJIT is not a target for recompilation + !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) && //MicroJIT does not compile native methods + !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); //MicroJIT failed to compile this mehtod already + } #endif #if defined(WINDOWS) @@ -9333,22 +9335,31 @@ TR::CompilationInfoPerThreadBase::performAOTLoad( } #if defined(J9VM_OPT_MICROJIT) -// MicroJIT returns a code size of 0 when it encounters a compilation error -// We must use this to set the correct values and return NULL, this is the same -// no matter which phase of compilation fails, so we create a macro here to -// perform that work. -#define MJIT_COMPILE_ERROR(code_size, method, romMethod) do{\ - if (0 == code_size)\ - {\ - intptr_t trCountFromMJIT = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT;\ - trCountFromMJIT = (trCountFromMJIT << 1) | 1;\ - TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread);\ - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method);\ - *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE;\ - method->extra = reinterpret_cast(trCountFromMJIT);\ - compiler->failCompilation("Cannot compile with MicroJIT.");\ - }\ -} while(0) +// MicroJIT returns a code size of 0 when it encournters a compilation error +// We must use this to set the correct values and fail compilation, this is +// the same no matter which phase of compilation fails. +// This function will bubble the exception up to the caller after clean up. 
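+// As a worked illustration (the real counts come from TR_DEFAULT_INITIAL_BCOUNT/TR_DEFAULT_INITIAL_COUNT
+// and may differ): for a chosen count of 1000 the value stored in method->extra is ((1000 << 1) | 1) == 2001.
+// The low bit keeps the J9_STARTPC_NOT_TRANSLATED check above treating the method as still interpreted,
+// while the J9_MJIT_FAILED_COMPILE flag stops MicroJIT from selecting it again, so the next attempt
+// goes to the regular JIT.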
+static void +MJIT_COMPILE_ERROR( + uintptr_t code_size, + J9Method *method, + J9ROMMethod *romMethod, + J9JITConfig *jitConfig, + J9VMThread *vmThread, + TR::Compilation *compiler + ) + { + if (0 == code_size) + { + intptr_t trCountFromMJIT = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; + trCountFromMJIT = (trCountFromMJIT << 1) | 1; + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; + method->extra = reinterpret_cast(trCountFromMJIT); + compiler->failCompilation("Cannot compile with MicroJIT."); + } + } // This routine should only be called from wrappedCompile TR_MethodMetaData * @@ -9406,8 +9417,8 @@ TR::CompilationInfoPerThreadBase::mjit( U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); - - MJIT_COMPILE_ERROR(!(compilee->isConstructor()), method, romMethod); // "MicroJIT does not currently compile constructors" + if ((compilee->isConstructor())) + MJIT_COMPILE_ERROR(0, method, romMethod, jitConfig, vmThread, compiler); // "MicroJIT does not currently compile constructors" bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not if (!isStaticMethod) @@ -9439,7 +9450,7 @@ TR::CompilationInfoPerThreadBase::mjit( int error_code = 0; MJIT::RegisterStack stack = MJIT::mapIncomingParams(typeString, maxLength, &error_code, paramTableEntries, actualParamCount, logFileFP); if (error_code) - MJIT_COMPILE_ERROR(0, method, romMethod); + MJIT_COMPILE_ERROR(0, method, romMethod, jitConfig, vmThread, compiler); MJIT::ParamTable paramTable(paramTableEntries, paramCount, actualParamCount, &stack); #define MAX_INSTRUCTION_SIZE 15 @@ -9456,7 +9467,7 @@ TR::CompilationInfoPerThreadBase::mjit( MJIT::CodeGenerator mjit_cg(_jitConfig, vmThread, logFileFP, vm, ¶mTable, compiler, &mjitCGGC, comp()->getPersistentInfo(), trMemory, compilee); estimated_size = MAX_BUFFER_SIZE(maxBCI); buffer = (char*)mjit_cg.allocateCodeCache(estimated_size, &vm, vmThread); - MJIT_COMPILE_ERROR(buffer, method, romMethod); // MicroJIT cannot compile if allocation fails. + MJIT_COMPILE_ERROR((uintptr_t)buffer, method, romMethod, jitConfig, vmThread, compiler); // MicroJIT cannot compile if allocation fails. 
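+   // A failed code cache allocation yields a NULL buffer, which casts to a code_size of 0 above and
+   // therefore takes the same failure path as any later code generation phase.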
codeCache = mjit_cg.getCodeCache(); // provide enough space for CodeCacheMethodHeader @@ -9467,7 +9478,7 @@ TR::CompilationInfoPerThreadBase::mjit( char *magicWordLocation, *first2BytesPatchLocation, *samplingRecompileCallLocation; buffer_size_t code_size = mjit_cg.generatePrePrologue(cursor, method, &magicWordLocation, &first2BytesPatchLocation, &samplingRecompileCallLocation); - MJIT_COMPILE_ERROR(code_size, method, romMethod); + MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); compiler->cg()->setPrePrologueSize(code_size); buffer_size += code_size; @@ -9498,7 +9509,7 @@ TR::CompilationInfoPerThreadBase::mjit( char *firstInstructionLocation = NULL; code_size = mjit_cg.generatePrologue(cursor, method, &jitStackOverflowPatchLocation, magicWordLocation, first2BytesPatchLocation, samplingRecompileCallLocation, &firstInstructionLocation, &bcIterator); - MJIT_COMPILE_ERROR(code_size, method, romMethod); + MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); compiler->cg()->setStackAtlas(atlas); @@ -9530,7 +9541,7 @@ TR::CompilationInfoPerThreadBase::mjit( code_size = mjit_cg.generateBody(cursor, &bcIterator); - MJIT_COMPILE_ERROR(code_size, method, romMethod); + MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); @@ -9547,7 +9558,7 @@ TR::CompilationInfoPerThreadBase::mjit( // GENERATE COLD AREA code_size = mjit_cg.generateColdArea(cursor, method, jitStackOverflowPatchLocation); - MJIT_COMPILE_ERROR(code_size, method, romMethod); + MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index f11abcd5faf..2535a9acbce 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -29,7 +29,7 @@ /* do nothing */ #else -void +static void setupNode( TR::Node *node, uint32_t bcIndex, @@ -42,7 +42,7 @@ setupNode( node->setMethod(feMethod->getPersistentIdentifier()); } -TR::Block * +static TR::Block * makeBlock( TR::Compilation *comp, TR_ResolvedMethod *feMethod, @@ -62,7 +62,7 @@ makeBlock( return block; } -void +static void join( TR::CFG &cfg, TR::Block *src, @@ -499,7 +499,7 @@ class CFGCreator } }; -TR::Optimizer *createOptimizer(TR::Compilation *comp, TR::ResolvedMethodSymbol *methodSymbol) +static TR::Optimizer *createOptimizer(TR::Compilation *comp, TR::ResolvedMethodSymbol *methodSymbol) { return new (comp->trHeapMemory()) TR::Optimizer(comp, methodSymbol, true, J9::Optimizer::microJITOptimizationStrategy(comp)); } From c0cf8de4c81f088fa14eda0f9bbb74b9686c75b7 Mon Sep 17 00:00:00 2001 From: Scott Young Date: Thu, 31 Mar 2022 12:49:42 -0300 Subject: [PATCH 09/25] Renamed former MJIT macros and pushed overhead closer to tests Signed-off-by: Scott Young --- .../compiler/control/CompilationThread.cpp | 34 +++++++++++-------- 1 file changed, 19 insertions(+), 15 deletions(-) diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index 7635691d796..ac822b72b5c 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ 
b/runtime/compiler/control/CompilationThread.cpp @@ -167,10 +167,17 @@ extern TR::OptionSet *findOptionSet(J9Method *, bool); #if defined(J9VM_OPT_MICROJIT) static bool -COMPILE_WITH_MICROJIT(UDATA extra, J9Method *method, U_8 *extendedFlags) +shouldCompileWithMicroJIT(J9Method *method, J9JITConfig *jitConfig, J9VMThread *vmThread) { - return (TR::Options::getJITCmdLineOptions()->_mjitEnabled && //MicroJIT needs to be enabled - J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) && //MicroJIT is not a target for recompilation + // If MicroJIT is disabled, return early and avoid overhead + bool compileWithMicroJIT = TR::Options::getJITCmdLineOptions()->_mjitEnabled + if (!compileWithMicroJIT) + return false; + + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); + UDATA extra = (UDATA)method->extra; + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + return (J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) && //MicroJIT is not a target for recompilation !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) && //MicroJIT does not compile native methods !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); //MicroJIT failed to compile this mehtod already } @@ -9101,10 +9108,7 @@ TR::CompilationInfoPerThreadBase::wrappedCompile(J9PortLibrary *portLib, void * TR_ASSERT(compiler->getMethodHotness() != unknownHotness, "Trying to compile at unknown hotness level"); #if defined(J9VM_OPT_MICROJIT) J9Method *method = that->_methodBeingCompiled->getMethodDetails().getMethod(); - UDATA extra = (UDATA)method->extra; - TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); - if (COMPILE_WITH_MICROJIT(extra, method, extendedFlags)) + if (shouldCompileWithMicroJIT(method, jitConfig, vmThread)) { metaData = that->mjit(vmThread, compiler, compilee, *vm, p->_optimizationPlan, scratchSegmentProvider, p->trMemory()); } @@ -9340,7 +9344,7 @@ TR::CompilationInfoPerThreadBase::performAOTLoad( // the same no matter which phase of compilation fails. // This function will bubble the exception up to the caller after clean up. 
static void -MJIT_COMPILE_ERROR( +testMicroJITCompilationForErrors( uintptr_t code_size, J9Method *method, J9ROMMethod *romMethod, @@ -9418,7 +9422,7 @@ TR::CompilationInfoPerThreadBase::mjit( if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); if ((compilee->isConstructor())) - MJIT_COMPILE_ERROR(0, method, romMethod, jitConfig, vmThread, compiler); // "MicroJIT does not currently compile constructors" + testMicroJITCompilationForErrors(0, method, romMethod, jitConfig, vmThread, compiler); // "MicroJIT does not currently compile constructors" bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not if (!isStaticMethod) @@ -9450,7 +9454,7 @@ TR::CompilationInfoPerThreadBase::mjit( int error_code = 0; MJIT::RegisterStack stack = MJIT::mapIncomingParams(typeString, maxLength, &error_code, paramTableEntries, actualParamCount, logFileFP); if (error_code) - MJIT_COMPILE_ERROR(0, method, romMethod, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors(0, method, romMethod, jitConfig, vmThread, compiler); MJIT::ParamTable paramTable(paramTableEntries, paramCount, actualParamCount, &stack); #define MAX_INSTRUCTION_SIZE 15 @@ -9467,7 +9471,7 @@ TR::CompilationInfoPerThreadBase::mjit( MJIT::CodeGenerator mjit_cg(_jitConfig, vmThread, logFileFP, vm, ¶mTable, compiler, &mjitCGGC, comp()->getPersistentInfo(), trMemory, compilee); estimated_size = MAX_BUFFER_SIZE(maxBCI); buffer = (char*)mjit_cg.allocateCodeCache(estimated_size, &vm, vmThread); - MJIT_COMPILE_ERROR((uintptr_t)buffer, method, romMethod, jitConfig, vmThread, compiler); // MicroJIT cannot compile if allocation fails. + testMicroJITCompilationForErrors((uintptr_t)buffer, method, romMethod, jitConfig, vmThread, compiler); // MicroJIT cannot compile if allocation fails. 
codeCache = mjit_cg.getCodeCache(); // provide enough space for CodeCacheMethodHeader @@ -9478,7 +9482,7 @@ TR::CompilationInfoPerThreadBase::mjit( char *magicWordLocation, *first2BytesPatchLocation, *samplingRecompileCallLocation; buffer_size_t code_size = mjit_cg.generatePrePrologue(cursor, method, &magicWordLocation, &first2BytesPatchLocation, &samplingRecompileCallLocation); - MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); compiler->cg()->setPrePrologueSize(code_size); buffer_size += code_size; @@ -9509,7 +9513,7 @@ TR::CompilationInfoPerThreadBase::mjit( char *firstInstructionLocation = NULL; code_size = mjit_cg.generatePrologue(cursor, method, &jitStackOverflowPatchLocation, magicWordLocation, first2BytesPatchLocation, samplingRecompileCallLocation, &firstInstructionLocation, &bcIterator); - MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); compiler->cg()->setStackAtlas(atlas); @@ -9541,7 +9545,7 @@ TR::CompilationInfoPerThreadBase::mjit( code_size = mjit_cg.generateBody(cursor, &bcIterator); - MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); @@ -9558,7 +9562,7 @@ TR::CompilationInfoPerThreadBase::mjit( // GENERATE COLD AREA code_size = mjit_cg.generateColdArea(cursor, method, jitStackOverflowPatchLocation); - MJIT_COMPILE_ERROR((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); From 9497cbfb02ac7d6f8053ccb7fa0ee6dc30ffa054 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Tue, 29 Mar 2022 14:39:56 -0300 Subject: [PATCH 10/25] MicroJIT: Formatting and alignment Signed-off-by: Harpreet Kaur --- .../compiler/control/CompilationThread.cpp | 31 ++-- runtime/compiler/microjit/SideTables.hpp | 13 +- .../microjit/x/amd64/AMD64Codegen.cpp | 2 +- .../microjit/x/amd64/AMD64Linkage.hpp | 8 +- .../microjit/x/amd64/templates/bytecodes.nasm | 8 +- .../microjit/x/amd64/templates/linkage.nasm | 174 +++++++++--------- 6 files changed, 118 insertions(+), 118 deletions(-) diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index ac822b72b5c..1ca36423a34 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -170,16 +170,16 @@ static bool shouldCompileWithMicroJIT(J9Method *method, J9JITConfig *jitConfig, J9VMThread *vmThread) { // If MicroJIT is disabled, return early and avoid overhead - bool compileWithMicroJIT = TR::Options::getJITCmdLineOptions()->_mjitEnabled + bool compileWithMicroJIT = TR::Options::getJITCmdLineOptions()->_mjitEnabled; if (!compileWithMicroJIT) return false; TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); UDATA extra = (UDATA)method->extra; U_8 *extendedFlags = 
fe->fetchMethodExtendedFlagsPointer(method); - return (J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) && //MicroJIT is not a target for recompilation - !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) && //MicroJIT does not compile native methods - !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); //MicroJIT failed to compile this mehtod already + return (J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) && // MicroJIT is not a target for recompilation + !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) && // MicroJIT does not compile native methods + !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); // MicroJIT failed to compile this method already } #endif @@ -9339,19 +9339,18 @@ TR::CompilationInfoPerThreadBase::performAOTLoad( } #if defined(J9VM_OPT_MICROJIT) -// MicroJIT returns a code size of 0 when it encournters a compilation error +// MicroJIT returns a code size of 0 when it encounters a compilation error // We must use this to set the correct values and fail compilation, this is -// the same no matter which phase of compilation fails. +// the same no matter which phase of compilation fails. // This function will bubble the exception up to the caller after clean up. static void testMicroJITCompilationForErrors( - uintptr_t code_size, - J9Method *method, - J9ROMMethod *romMethod, - J9JITConfig *jitConfig, - J9VMThread *vmThread, - TR::Compilation *compiler - ) + uintptr_t code_size, + J9Method *method, + J9ROMMethod *romMethod, + J9JITConfig *jitConfig, + J9VMThread *vmThread, + TR::Compilation *compiler) { if (0 == code_size) { @@ -9582,9 +9581,9 @@ TR::CompilationInfoPerThreadBase::mjit( // TODO: after adding profiling support, uncomment this. // if (bodyInfo && bodyInfo->getProfileInfo()) - // { - // bodyInfo->getProfileInfo()->setActive(); - // } + // { + // bodyInfo->getProfileInfo()->setActive(); + // } // Put a metaData pointer into the Code Cache Header(s). // diff --git a/runtime/compiler/microjit/SideTables.hpp b/runtime/compiler/microjit/SideTables.hpp index 12c9c97c65f..8f881f869ce 100644 --- a/runtime/compiler/microjit/SideTables.hpp +++ b/runtime/compiler/microjit/SideTables.hpp @@ -27,15 +27,16 @@ #include "microjit/utils.hpp" /* -| Architecture | Endian | Return address | vTable index | -| x86-64 | Little | Stack | R8 (receiver in RAX) +| Architecture | Endian | Return address | vTable index | +| x86-64 | Little | Stack | R8 (receiver in RAX) | -| Integer Return value registers | Integer Preserved registers | Integer Argument registers -| EAX (32-bit) RAX (64-bit) | RBX R9 | RAX RSI RDX RCX +| Integer Return value registers | Integer Preserved registers | Integer Argument registers | +| EAX (32-bit) RAX (64-bit) | RBX R9 | RAX RSI RDX RCX | -| Float Return value registers | Float Preserved registers | Float Argument registers -| XMM0 | XMM8-XMM15 | XMM0-XMM7 +| Float Return value registers | Float Preserved registers | Float Argument registers | +| XMM0 | XMM8-XMM15 | XMM0-XMM7 | */ + namespace MJIT { diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp index 12a26ed959e..82cdb53a153 100755 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp @@ -1518,7 +1518,7 @@ MJIT::CodeGenerator::generatePrologue( /* * MJIT stack frames must conform to TR stack frame shapes. 
However, so long as we are able to have our frames walked, we should be * free to allocate more things on the stack before the required data which is used for walking. The following is the general shape of the - * initial MJIT stack frame + * initial MJIT stack frame: * +-----------------------------+ * | ... | * | Paramaters passed by caller | <-- The space for saving params passed in registers is also here diff --git a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp index 14c6309ae00..9e1159714d4 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp @@ -26,11 +26,11 @@ #include "codegen/OMRLinkage_inlines.hpp" /* -| Architecture | Endian | Return address | Argument registers -| x86-64 | Little | Stack | RAX RSI RDX RCX +| Architecture | Endian | Return address | Argument registers | +| x86-64 | Little | Stack | RAX RSI RDX RCX | -| Return value registers | Preserved registers | vTable index | -| EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | +| Return value registers | Preserved registers | vTable index | +| EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | */ namespace MJIT diff --git a/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm b/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm index 3a2f44eb6b0..fde94ab08b9 100644 --- a/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm +++ b/runtime/compiler/microjit/x/amd64/templates/bytecodes.nasm @@ -43,22 +43,22 @@ template_start debugBreakpoint int3 ; trigger hardware interrupt template_end debugBreakpoint -; All integer loads require adding stack space of 8 bytes -; and the moving of values from a stack slot to a temp register +; All reference loads require adding stack space of 8 bytes +; and moving values from a stack slot to a temp register template_start aloadTemplatePrologue push_single_slot _64bit_local_to_rXX_PATCH r15 template_end aloadTemplatePrologue ; All integer loads require adding stack space of 8 bytes -; and the moving of values from a stack slot to a temp register +; and moving values from a stack slot to a temp register template_start iloadTemplatePrologue push_single_slot _32bit_local_to_rXX_PATCH r15 template_end iloadTemplatePrologue ; All long loads require adding 2 stack spaces of 8 bytes (16 total) -; and the moving of values from a stack slot to a temp register +; and moving values from a stack slot to a temp register template_start lloadTemplatePrologue push_dual_slot _64bit_local_to_rXX_PATCH r15 diff --git a/runtime/compiler/microjit/x/amd64/templates/linkage.nasm b/runtime/compiler/microjit/x/amd64/templates/linkage.nasm index 53dba321fde..448418e2fb9 100644 --- a/runtime/compiler/microjit/x/amd64/templates/linkage.nasm +++ b/runtime/compiler/microjit/x/amd64/templates/linkage.nasm @@ -20,7 +20,7 @@ %include "utils.nasm" ; -;Create templates +; Create templates ; section .text @@ -78,119 +78,119 @@ template_start saveR11Local template_end saveR11Local template_start movRSPOffsetR11 - mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r11, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end movRSPOffsetR11 template_start movRSPR10 - mov r10, rsp; Used for allocating space on the stack. + mov r10, rsp ; Used for allocating space on the stack. 
template_end movRSPR10 template_start movR10R14 - mov r14, r10; Used for allocating space on the stack. + mov r14, r10 ; Used for allocating space on the stack. template_end movR10R14 template_start subR10Imm4 - sub r10, 0x7fffffff; Used for allocating space on the stack. + sub r10, 0x7fffffff ; Used for allocating space on the stack. template_end subR10Imm4 template_start addR10Imm4 - add r10, 0x7fffffff; Used for allocating space on the stack. + add r10, 0x7fffffff ; Used for allocating space on the stack. template_end addR10Imm4 template_start subR14Imm4 - sub r14, 0x7fffffff; Used for allocating space on the stack. + sub r14, 0x7fffffff ; Used for allocating space on the stack. template_end subR14Imm4 template_start addR14Imm4 - add r14, 0x7fffffff; Used for allocating space on the stack. + add r14, 0x7fffffff ; Used for allocating space on the stack. template_end addR14Imm4 template_start saveRBPOffset - mov [rsp+0xff], rbp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rbp ; Used ff as all other labels look valid and 255 will be rare. template_end saveRBPOffset template_start saveEBPOffset - mov [rsp+0xff], ebp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], ebp ; Used ff as all other labels look valid and 255 will be rare. template_end saveEBPOffset template_start saveRSPOffset - mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rsp ; Used ff as all other labels look valid and 255 will be rare. template_end saveRSPOffset template_start saveESPOffset - mov [rsp+0xff], rsp; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rsp ; Used ff as all other labels look valid and 255 will be rare. template_end saveESPOffset template_start loadRBPOffset - mov rbp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rbp, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRBPOffset template_start loadEBPOffset - mov ebp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov ebp, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadEBPOffset template_start loadRSPOffset - mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rsp, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRSPOffset template_start loadESPOffset - mov rsp, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rsp, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadESPOffset template_start saveEAXOffset - mov [rsp+0xff], eax; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], eax ; Used ff as all other labels look valid and 255 will be rare. template_end saveEAXOffset template_start saveESIOffset - mov [rsp+0xff], esi; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], esi ; Used ff as all other labels look valid and 255 will be rare. template_end saveESIOffset template_start saveEDXOffset - mov [rsp+0xff], edx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], edx ; Used ff as all other labels look valid and 255 will be rare. template_end saveEDXOffset template_start saveECXOffset - mov [rsp+0xff], ecx; Used ff as all other labels look valid and 255 will be rare. 
+ mov [rsp+0xff], ecx ; Used ff as all other labels look valid and 255 will be rare. template_end saveECXOffset template_start saveEBXOffset - mov [rsp+0xff], ebx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], ebx ; Used ff as all other labels look valid and 255 will be rare. template_end saveEBXOffset template_start loadEAXOffset - mov eax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov eax, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadEAXOffset template_start loadESIOffset - mov esi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov esi, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadESIOffset template_start loadEDXOffset - mov edx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov edx, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadEDXOffset template_start loadECXOffset - mov ecx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov ecx, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadECXOffset template_start loadEBXOffset - mov ebx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov ebx, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadEBXOffset template_start addRSPImm8 - add rsp, 0x7fffffffffffffff; Used for allocating space on the stack. + add rsp, 0x7fffffffffffffff ; Used for allocating space on the stack. template_end addRSPImm8 template_start addRSPImm4 - add rsp, 0x7fffffff; Used for allocating space on the stack. + add rsp, 0x7fffffff ; Used for allocating space on the stack. template_end addRSPImm4 template_start addRSPImm2 - add rsp, 0x7fff; Used for allocating space on the stack. + add rsp, 0x7fff ; Used for allocating space on the stack. template_end addRSPImm2 template_start addRSPImm1 - add rsp, 0x7f; Used for allocating space on the stack. + add rsp, 0x7f ; Used for allocating space on the stack. template_end addRSPImm1 template_start je4ByteRel @@ -202,231 +202,231 @@ template_start jeByteRel template_end jeByteRel template_start subRSPImm8 - sub rsp, 0x7fffffffffffffff; Used for allocating space on the stack. + sub rsp, 0x7fffffffffffffff ; Used for allocating space on the stack. template_end subRSPImm8 template_start subRSPImm4 - sub rsp, 0x7fffffff; Used for allocating space on the stack. + sub rsp, 0x7fffffff ; Used for allocating space on the stack. template_end subRSPImm4 template_start subRSPImm2 - sub rsp, 0x7fff; Used for allocating space on the stack. + sub rsp, 0x7fff ; Used for allocating space on the stack. template_end subRSPImm2 template_start subRSPImm1 - sub rsp, 0x7f; Used for allocating space on the stack. + sub rsp, 0x7f ; Used for allocating space on the stack. template_end subRSPImm1 template_start jbe4ByteRel - jbe 0xefbeadde; Used for generating jumps to far labels. + jbe 0xefbeadde ; Used for generating jumps to far labels. template_end jbe4ByteRel template_start cmpRspRbpDerefOffset - cmp rsp, [rbp+0xff]; Used ff as all other labels look valid and 255 will be rare. + cmp rsp, [rbp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end cmpRspRbpDerefOffset template_start loadRAXOffset - mov rax, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. 
+ mov rax, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRAXOffset template_start loadRSIOffset - mov rsi, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rsi, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRSIOffset template_start loadRDXOffset - mov rdx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rdx, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRDXOffset template_start loadRCXOffset - mov rcx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rcx, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRCXOffset template_start loadRBXOffset - mov rbx, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov rbx, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadRBXOffset template_start loadR9Offset - mov r9, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r9, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR9Offset template_start loadR10Offset - mov r10, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r10, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR10Offset template_start loadR11Offset - mov r11, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r11, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR11Offset template_start loadR12Offset - mov r12, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r12, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR12Offset template_start loadR13Offset - mov r13, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r13, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR13Offset template_start loadR14Offset - mov r14, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r14, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR14Offset template_start loadR15Offset - mov r15, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + mov r15, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadR15Offset template_start loadXMM0Offset - movq xmm0, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm0, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM0Offset template_start loadXMM1Offset - movq xmm1, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm1, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM1Offset template_start loadXMM2Offset - movq xmm2, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm2, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM2Offset template_start loadXMM3Offset - movq xmm3, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm3, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. 
template_end loadXMM3Offset template_start loadXMM4Offset - movq xmm4, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm4, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM4Offset template_start loadXMM5Offset - movq xmm5, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm5, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM5Offset template_start loadXMM6Offset - movq xmm6, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm6, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM6Offset template_start loadXMM7Offset - movq xmm7, [rsp+0xff]; Used ff as all other labels look valid and 255 will be rare. + movq xmm7, [rsp+0xff] ; Used ff as all other labels look valid and 255 will be rare. template_end loadXMM7Offset template_start saveRAXOffset - mov [rsp+0xff], rax; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rax ; Used ff as all other labels look valid and 255 will be rare. template_end saveRAXOffset template_start saveRSIOffset - mov [rsp+0xff], rsi; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rsi ; Used ff as all other labels look valid and 255 will be rare. template_end saveRSIOffset template_start saveRDXOffset - mov [rsp+0xff], rdx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rdx ; Used ff as all other labels look valid and 255 will be rare. template_end saveRDXOffset template_start saveRCXOffset - mov [rsp+0xff], rcx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rcx ; Used ff as all other labels look valid and 255 will be rare. template_end saveRCXOffset template_start saveRBXOffset - mov [rsp+0xff], rbx; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], rbx ; Used ff as all other labels look valid and 255 will be rare. template_end saveRBXOffset template_start saveR9Offset - mov [rsp+0xff], r9; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r9 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR9Offset template_start saveR10Offset - mov [rsp+0xff], r10; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r10 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR10Offset template_start saveR11Offset - mov [rsp+0xff], r11; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r11 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR11Offset template_start saveR12Offset - mov [rsp+0xff], r12; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r12 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR12Offset template_start saveR13Offset - mov [rsp+0xff], r13; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r13 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR13Offset template_start saveR14Offset - mov [rsp+0xff], r14; Used ff as all other labels look valid and 255 will be rare. + mov [rsp+0xff], r14 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR14Offset template_start saveR15Offset - mov [rsp+0xff], r15; Used ff as all other labels look valid and 255 will be rare. 
+ mov [rsp+0xff], r15 ; Used ff as all other labels look valid and 255 will be rare. template_end saveR15Offset template_start saveXMM0Offset - movq [rsp+0xff], xmm0; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm0 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM0Offset template_start saveXMM1Offset - movq [rsp+0xff], xmm1; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm1 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM1Offset template_start saveXMM2Offset - movq [rsp+0xff], xmm2; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm2 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM2Offset template_start saveXMM3Offset - movq [rsp+0xff], xmm3; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm3 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM3Offset template_start saveXMM4Offset - movq [rsp+0xff], xmm4; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm4 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM4Offset template_start saveXMM5Offset - movq [rsp+0xff], xmm5; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm5 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM5Offset template_start saveXMM6Offset - movq [rsp+0xff], xmm6; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm6 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM6Offset template_start saveXMM7Offset - movq [rsp+0xff], xmm7; Used ff as all other labels look valid and 255 will be rare. + movq [rsp+0xff], xmm7 ; Used ff as all other labels look valid and 255 will be rare. template_end saveXMM7Offset template_start call4ByteRel - call 0xefbeadde; Used for generating jumps to far labels. + call 0xefbeadde ; Used for generating jumps to far labels. template_end call4ByteRel template_start callByteRel - call 0xff; Used for generating jumps to nearby labels. Used ff as all other labels look valid and 255 will be rare. + call 0xff ; Used for generating jumps to nearby labels, used ff as all other labels look valid and 255 will be rare. template_end callByteRel template_start jump4ByteRel - jmp 0xefbeadde; Used for generating jumps to far labels. + jmp 0xefbeadde ; Used for generating jumps to far labels. template_end jump4ByteRel template_start jumpByteRel - jmp short $+0x02; Used for generating jumps to nearby labels. shows up as eb 00 in memory (easy to see) + jmp short $+0x02 ; Used for generating jumps to nearby labels, shows up as eb 00 in memory (easy to see). template_end jumpByteRel template_start nopInstruction - nop; Used for alignment. One byte wide + nop ; Used for alignment, one byte wide. template_end nopInstruction template_start movRDIImm64 - mov rdi, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump + mov rdi, 0xefbeaddeefbeadde ; looks like 'deadbeef' in objdump. template_end movRDIImm64 template_start movEDIImm32 - mov edi, 0xefbeadde; looks like 'deadbeef' in objdump + mov edi, 0xefbeadde ; looks like 'deadbeef' in objdump. template_end movEDIImm32 template_start movRAXImm64 - mov rax, 0xefbeaddeefbeadde; looks like 'deadbeef' in objdump + mov rax, 0xefbeaddeefbeadde ; looks like 'deadbeef' in objdump. 
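+    ; The 64-bit immediate is a placeholder for the code generator to patch; stored little-endian its
+    ; bytes read back as 'de ad be ef ...', so any slot left unpatched stands out in a disassembly.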
template_end movRAXImm64 template_start movEAXImm32 - mov eax, 0xefbeadde; looks like 'deadbeef' in objdump + mov eax, 0xefbeadde ; looks like 'deadbeef' in objdump. template_end movEAXImm32 template_start jumpRDI - jmp rdi; Useful for jumping to absolute addresses. Store in RDI then generate this. + jmp rdi ; Used for jumping to absolute addresses, store in RDI then generate this. template_end jumpRDI template_start jumpRAX - jmp rax; Useful for jumping to absolute addresses. Store in RAX then generate this. + jmp rax ; Used for jumping to absolute addresses, store in RAX then generate this. template_end jumpRAX template_start paintRegister From 9b9841348490cae0320a3e60c2e0722da4244caf Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Wed, 30 Mar 2022 11:12:38 -0300 Subject: [PATCH 11/25] MicroJIT: Moving relevant functions and classes inside MJIT namespace and documentation for HW and LW Signed-off-by: Harpreet Kaur --- runtime/cmake/platform/toolcfg/gnu.cmake | 2 +- runtime/compiler/microjit/HWMetaDataCreation.cpp | 7 ++++++- runtime/compiler/microjit/LWMetaDataCreation.cpp | 6 ++++++ 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/runtime/cmake/platform/toolcfg/gnu.cmake b/runtime/cmake/platform/toolcfg/gnu.cmake index aaba270e2a2..b3b8bf93b22 100644 --- a/runtime/cmake/platform/toolcfg/gnu.cmake +++ b/runtime/cmake/platform/toolcfg/gnu.cmake @@ -1,5 +1,5 @@ ################################################################################ -# Copyright (c) 2020, 2022 IBM Corp. and others +# Copyright (c) 2020, 2021 IBM Corp. and others # # This program and the accompanying materials are made available under # the terms of the Eclipse Public License 2.0 which accompanies this diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index 2535a9acbce..d8cffe794d5 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -29,6 +29,9 @@ /* do nothing */ #else +namespace MJIT +{ + static void setupNode( TR::Node *node, @@ -505,7 +508,7 @@ static TR::Optimizer *createOptimizer(TR::Compilation *comp, TR::ResolvedMethodS } MJIT::InternalMetaData -MJIT::createInternalMethodMetadata( +createInternalMethodMetadata( MJIT::ByteCodeIterator *bci, MJIT::LocalTableEntry *localTableEntries, U_16 entries, @@ -709,4 +712,6 @@ MJIT::createInternalMethodMetadata( MJIT::InternalMetaData internalMetaData(localTable, cfg, maxCalleeArgsSize); return internalMetaData; } + +} // namespace MJIT #endif /* TR_MJIT_Interop */ diff --git a/runtime/compiler/microjit/LWMetaDataCreation.cpp b/runtime/compiler/microjit/LWMetaDataCreation.cpp index 3ca6afd5d2c..ba77826f19c 100644 --- a/runtime/compiler/microjit/LWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/LWMetaDataCreation.cpp @@ -21,6 +21,12 @@ #include "microjit/MethodBytecodeMetaData.hpp" +/* HWMetaDataCreation is for heavyweight and + LWMetaDataCreation is for lightweight. + Lightweight is meant to be for low-memory environments + where TR would use too many resources (e.g. embedded). + */ + #if defined(J9VM_OPT_MJIT_Standalone) // TODO: Complete this section so that MicroJIT can be used in a future case where MicroJIT won't deploy with TR. 
#else From e5d829edc0c9c1f4c915f9c67273f76927f1a8c4 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 31 Mar 2022 14:42:48 -0300 Subject: [PATCH 12/25] MicroJIT: Adding list of supported and unsupported bytecodes Signed-off-by: Harpreet Kaur --- doc/compiler/MicroJIT/Support.md | 281 ++++++++++++++---- .../compiler/control/CompilationRuntime.hpp | 6 + .../compiler/control/CompilationThread.cpp | 3 + runtime/compiler/microjit/SideTables.hpp | 6 +- .../microjit/x/amd64/AMD64Linkage.hpp | 8 +- 5 files changed, 247 insertions(+), 57 deletions(-) diff --git a/doc/compiler/MicroJIT/Support.md b/doc/compiler/MicroJIT/Support.md index 6abd13d06bd..007dfbf070f 100644 --- a/doc/compiler/MicroJIT/Support.md +++ b/doc/compiler/MicroJIT/Support.md @@ -20,53 +20,234 @@ OpenJDK Assembly Exception [2]. SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception --> -MicroJIT does not currently support the compilation of: -- System methods -- Methods which call System methods -- Methods which call JNI methods -- Methods which use any of the following bytecodes: - - 01 (0x01) aconst_null - - 18 (0x12) ldc - - 19 (0x13) ldc_w - - 20 (0x14) ldc2_w - - 46 (0x2e) iaload - - 47 (0x2f) laload - - 48 (0x30) faload - - 49 (0x31) daload - - 50 (0x32) aaload - - 51 (0x33) baload - - 52 (0x34) caload - - 53 (0x35) saload - - 79 (0x4f) iastore - - 80 (0x50) lastore - - 81 (0x51) fastore - - 82 (0x52) dastore - - 83 (0x53) aastore - - 84 (0x54) bastore - - 85 (0x55) castore - - 86 (0x56) sastore - - 168 (0xa8) jsr - - 169 (0xa9) ret - - 170 (0xaa) tableswitch - - 171 (0xab) lookupswitch - - 182 (0xb6) invokevirtual - - 183 (0xb7) invokespecial - - 185 (0xb9) invokeinterface - - 186 (0xba) invokedynamic - - 187 (0xbb) new - - 188 (0xbc) newarray - - 189 (0xbd) anewarray - - 190 (0xbe) arraylength - - 191 (0xbf) athrow - - 192 (0xc0) checkcast - - 193 (0xc1) instanceof - - 194 (0xc2) monitorenter - - 195 (0xc3) monitorexit - - 196 (0xc4) wide - - 197 (0xc5) multianewarray - - 200 (0xc8) goto_w - - 201 (0xc9) jsr_w - - 202 (0xca) breakpoint - - 254 (0xfe) impdep1 - - 255 (0xff) impdep2 -- MicroJIT is also currently known to have issue with methods that are far down the call stack which contain live object references \ No newline at end of file +MicroJIT does not currently support the compilation of system methods, methods which call system methods and methods which call JNI methods. 
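+
+For example, a small static method like the one below compiles to bytecodes that are all marked as supported
+in the list that follows, making it a candidate for MicroJIT compilation (the exact bytecodes emitted depend
+on the Java compiler, so this is only an illustration):
+
+```java
+public class Example {
+    // Uses only iconst/iload/istore/iinc, iadd, if_icmpge, goto and ireturn,
+    // all of which appear as supported (✅) below.
+    static int sumTo(int n) {
+        int total = 0;
+        for (int i = 0; i < n; i++) {
+            total += i;
+        }
+        return total;
+    }
+}
+```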
+Bytecodes currently supported (marked as ✅) and unsupported (marked as ❌) by MicroJIT are: + +**Constants** +- 00 (0x00) nop ✅ +- 01 (0x01) aconst_null ❌ +- 02 (0x02) iconst_m1 ✅ +- 03 (0x03) iconst_0 ✅ +- 04 (0x04) iconst_1 ✅ +- 05 (0x05) iconst_2 ✅ +- 06 (0x06) iconst_3 ✅ +- 07 (0x07) iconst_4 ✅ +- 08 (0x08) iconst_5 ✅ +- 09 (0x09) lconst_0 ✅ +- 10 (0x0a) lconst_1 ✅ +- 11 (0x0b) fconst_0 ✅ +- 12 (0x0c) fconst_1 ✅ +- 13 (0x0d) fconst_2 ✅ +- 14 (0x0e) dconst_0 ✅ +- 15 (0x0f) dconst_1 ✅ +- 16 (0x10) bipush ✅ +- 17 (0x11) sipush ✅ +- 18 (0x12) ldc ❌ +- 19 (0x13) ldc_w ❌ +- 20 (0x14) ldc2_w ❌ + +**Loads** +- 21 (0x15) iload ✅ +- 22 (0x16) lload ✅ +- 23 (0x17) fload ✅ +- 24 (0x18) dload ✅ +- 25 (0x19) aload ✅ +- 26 (0x1a) iload_0 ✅ +- 27 (0x1b) iload_1 ✅ +- 28 (0x1c) iload_2 ✅ +- 29 (0x1d) iload_3 ✅ +- 30 (0x1e) lload_0 ✅ +- 31 (0x1f) lload_1 ✅ +- 32 (0x20) lload_2 ✅ +- 33 (0x21) lload_3 ✅ +- 34 (0x22) fload_0 ✅ +- 35 (0x23) fload_1 ✅ +- 36 (0x24) fload_2 ✅ +- 37 (0x25) fload_3 ✅ +- 38 (0x26) dload_0 ✅ +- 39 (0x27) dload_1 ✅ +- 40 (0x28) dload_2 ✅ +- 41 (0x29) dload_3 ✅ +- 42 (0x2a) aload_0 ✅ +- 43 (0x2b) aload_1 ✅ +- 44 (0x2c) aload_2 ✅ +- 45 (0x2d) aload_3 ✅ +- 46 (0x2e) iaload ❌ +- 47 (0x2f) laload ❌ +- 48 (0x30) faload ❌ +- 49 (0x31) daload ❌ +- 50 (0x32) aaload ❌ +- 51 (0x33) baload ❌ +- 52 (0x34) caload ❌ +- 53 (0x35) saload ❌ + +**Stores** +- 54 (0x36) istore ✅ +- 55 (0x37) lstore ✅ +- 56 (0x38) fstore ✅ +- 57 (0x39) dstore ✅ +- 58 (0x3a) astore ✅ +- 59 (0x3b) istore_0 ✅ +- 60 (0x3c) istore_1 ✅ +- 61 (0x3d) istore_2 ✅ +- 62 (0x3e) istore_3 ✅ +- 63 (0x3f) lstore_0 ✅ +- 64 (0x40) lstore_1 ✅ +- 65 (0x41) lstore_2 ✅ +- 66 (0x42) lstore_3 ✅ +- 67 (0x43) fstore_0 ✅ +- 68 (0x44) fstore_1 ✅ +- 69 (0x45) fstore_2 ✅ +- 70 (0x46) fstore_3 ✅ +- 71 (0x47) dstore_0 ✅ +- 72 (0x48) dstore_1 ✅ +- 73 (0x49) dstore_2 ✅ +- 74 (0x4a) dstore_3 ✅ +- 75 (0x4b) astore_0 ✅ +- 76 (0x4c) astore_1 ✅ +- 77 (0x4d) astore_2 ✅ +- 78 (0x4e) astore_3 ✅ +- 79 (0x4f) iastore ❌ +- 80 (0x50) lastore ❌ +- 81 (0x51) fastore ❌ +- 82 (0x52) dastore ❌ +- 83 (0x53) aastore ❌ +- 84 (0x54) bastore ❌ +- 85 (0x55) castore ❌ +- 86 (0x56) sastore ❌ + +**Stack** +- 87 (0x57) pop ✅ +- 88 (0x58) pop2 ✅ +- 89 (0x59) dup ✅ +- 90 (0x5a) dup_x1 ✅ +- 91 (0x5b) dup_x2 ✅ +- 92 (0x5c) dup2 ✅ +- 93 (0x5d) dup2_x1 ✅ +- 94 (0x5e) dup2_x2 ✅ +- 95 (0x5f) swap ✅ + +**Math** +- 96 (0x60) iadd ✅ +- 97 (0x61) ladd ✅ +- 98 (0x62) fadd ✅ +- 99 (0x63) dadd ✅ +- 100 (0x64) isub ✅ +- 101 (0x65) lsub ✅ +- 102 (0x66) fsub ✅ +- 103 (0x67) dsub ✅ +- 104 (0x68) imul ✅ +- 105 (0x69) lmul ✅ +- 106 (0x6a) fmul ✅ +- 107 (0x6b) dmul ✅ +- 108 (0x6c) idiv ✅ +- 109 (0x6d) ldiv ✅ +- 110 (0x6e) fdiv ✅ +- 111 (0x6f) ddiv ✅ +- 112 (0x70) irem ✅ +- 113 (0x71) lrem ✅ +- 114 (0x72) frem ✅ +- 115 (0x73) drem ✅ +- 116 (0x74) ineg ✅ +- 117 (0x75) lneg ✅ +- 118 (0x76) fneg ✅ +- 119 (0x77) dneg ✅ +- 120 (0x78) ishl ✅ +- 121 (0x79) lshl ✅ +- 122 (0x7a) ishr ✅ +- 123 (0x7b) lshr ✅ +- 124 (0x7c) iushr ✅ +- 125 (0x7d) lushr ✅ +- 126 (0x7e) iand ✅ +- 127 (0x7f) land ✅ +- 128 (0x80) ior ✅ +- 129 (0x81) lor ✅ +- 130 (0x82) ixor ✅ +- 131 (0x83) lxor ✅ +- 132 (0x84) iinc ✅ + +**Conversions** +- 133 (0x85) i2l ✅ +- 134 (0x86) i2f ✅ +- 135 (0x87) i2d ✅ +- 136 (0x88) l2i ✅ +- 137 (0x89) l2f ✅ +- 138 (0x8a) l2d ✅ +- 139 (0x8b) f2i ✅ +- 140 (0x8c) f2l ✅ +- 141 (0x8d) f2d ✅ +- 142 (0x8e) d2i ✅ +- 143 (0x8f) d2l ✅ +- 144 (0x90) d2f ✅ +- 145 (0x91) i2b ✅ +- 146 (0x92) i2c ✅ +- 147 (0x93) i2s ✅ + +**Comparisons** +- 148 (0x94) lcmp ✅ +- 149 (0x95) fcmpl ✅ +- 150 (0x96) fcmpg ✅ +- 151 (0x97) dcmpl ✅ +- 
152 (0x98) dcmpg ✅ +- 153 (0x99) ifeq ✅ +- 154 (0x9a) ifne ✅ +- 155 (0x9b) iflt ✅ +- 156 (0x9c) ifge ✅ +- 157 (0x9d) ifgt ✅ +- 158 (0x9e) ifle ✅ +- 159 (0x9f) if_icmpeq ✅ +- 160 (0xa0) if_icmpne ✅ +- 161 (0xa1) if_icmplt ✅ +- 162 (0xa2) if_icmpge ✅ +- 163 (0xa3) if_icmpgt ✅ +- 164 (0xa4) if_icmple ✅ +- 165 (0xa5) if_acmpeq ✅ +- 166 (0xa6) if_acmpne ✅ + +**Control** +- 167 (0xa7) goto ✅ +- 168 (0xa8) jsr ❌ +- 169 (0xa9) ret ❌ +- 170 (0xaa) tableswitch ❌ +- 171 (0xab) lookupswitch ❌ +- 172 (0xac) ireturn ✅ +- 173 (0xad) lreturn ✅ +- 174 (0xae) freturn ✅ +- 175 (0xaf) dreturn ✅ +- 176 (0xb0) areturn ❌ +- 177 (0xb1) return ✅ + +**References** +- 178 (0xb2) getstatic ✅ +- 179 (0xb3) putstatic ✅ +- 180 (0xb4) getfield ✅ +- 181 (0xb5) putfield ✅ +- 182 (0xb6) invokevirtual ❌ +- 183 (0xb7) invokespecial ❌ +- 184 (0xb8) invokestatic ✅ +- 185 (0xb9) invokeinterface ❌ +- 186 (0xba) invokedynamic ❌ +- 187 (0xbb) new ❌ +- 188 (0xbc) newarray ❌ +- 189 (0xbd) anewarray ❌ +- 190 (0xbe) arraylength ❌ +- 191 (0xbf) athrow ❌ +- 192 (0xc0) checkcast ❌ +- 193 (0xc1) instanceof ❌ +- 194 (0xc2) monitorenter ❌ +- 195 (0xc3) monitorexit ❌ + +**Extended** +- 196 (0xc4) wide ❌ +- 197 (0xc5) multianewarray ❌ +- 198 (0xc6) ifnull ✅ +- 199 (0xc7) ifnonnull ✅ +- 200 (0xc8) goto_w ❌ +- 201 (0xc9) jsr_w ❌ + +**Reserved** +- 202 (0xca) breakpoint ❌ +- 254 (0xfe) impdep1 ❌ +- 255 (0xff) impdep2 ❌ + +MicroJIT is also currently known to have issue with methods that are far down the call stack which contain live object references. \ No newline at end of file diff --git a/runtime/compiler/control/CompilationRuntime.hpp b/runtime/compiler/control/CompilationRuntime.hpp index 75190c72772..8aa352b29e5 100644 --- a/runtime/compiler/control/CompilationRuntime.hpp +++ b/runtime/compiler/control/CompilationRuntime.hpp @@ -676,6 +676,12 @@ class CompilationInfo #endif /* defined(J9VM_OPT_JITSERVER) */ #if defined(J9VM_OPT_MICROJIT) TR::Options *options = TR::Options::getJITCmdLineOptions(); + /* Setting MicroJIT invocation count to 0 + causes some methods to not get compiled. + Hence we generally set it to a non-zero value. + 20 has been proven to be a good estimate for count value + for similar template-based JITs. + */ if (options->_mjitEnabled && value) { return; diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index 1ca36423a34..d1f4ca4d72c 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -2363,6 +2363,9 @@ bool TR::CompilationInfo::shouldRetryCompilation(TR_MethodToBeCompiled *entry, T break; #if defined(J9VM_OPT_MICROJIT) case mjitCompilationFailure: + /* MicroJIT generates this failure when it fails to compile. + The next time this method is queued for compilation, + it will be compiled by the regular JIT compiler, Testarossa. 
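+          Because the failed attempt also set the J9_MJIT_FAILED_COMPILE bit in the method's
+          extended flags, shouldCompileWithMicroJIT will reject the method when it is requeued.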
*/ tryCompilingAgain = true; break; #endif diff --git a/runtime/compiler/microjit/SideTables.hpp b/runtime/compiler/microjit/SideTables.hpp index 8f881f869ce..7809a195377 100644 --- a/runtime/compiler/microjit/SideTables.hpp +++ b/runtime/compiler/microjit/SideTables.hpp @@ -27,14 +27,14 @@ #include "microjit/utils.hpp" /* -| Architecture | Endian | Return address | vTable index | -| x86-64 | Little | Stack | R8 (receiver in RAX) | +| Architecture | Endian | Return address | vTable index | +| x86-64 | Little | Stack | R8 (receiver in RAX) | | Integer Return value registers | Integer Preserved registers | Integer Argument registers | | EAX (32-bit) RAX (64-bit) | RBX R9 | RAX RSI RDX RCX | | Float Return value registers | Float Preserved registers | Float Argument registers | -| XMM0 | XMM8-XMM15 | XMM0-XMM7 | +| XMM0 | XMM8-XMM15 | XMM0-XMM7 | */ namespace MJIT diff --git a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp index 9e1159714d4..47263dc3d26 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Linkage.hpp @@ -26,11 +26,11 @@ #include "codegen/OMRLinkage_inlines.hpp" /* -| Architecture | Endian | Return address | Argument registers | -| x86-64 | Little | Stack | RAX RSI RDX RCX | +| Architecture | Endian | Return address | Argument registers | +| x86-64 | Little | Stack | RAX RSI RDX RCX | -| Return value registers | Preserved registers | vTable index | -| EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | +| Return value registers | Preserved registers | vTable index | +| EAX (32-bit) RAX (64-bit) XMM0 (float/double) | RBX R9 RSP | R8 (receiver in RAX) | */ namespace MJIT From 2f7d7587b1c1fa5612a1bbef050d5878c2d8a486 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 31 Mar 2022 16:48:48 -0300 Subject: [PATCH 13/25] Moving MJIT compilation thread to a new file Signed-off-by: Harpreet Kaur --- runtime/compiler/build/files/common.mk | 1 + .../compiler/control/CompilationThread.cpp | 330 ---------------- runtime/compiler/microjit/CMakeLists.txt | 2 + .../microjit/CompilationInfoPerThreadBase.cpp | 367 ++++++++++++++++++ 4 files changed, 370 insertions(+), 330 deletions(-) create mode 100644 runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp diff --git a/runtime/compiler/build/files/common.mk b/runtime/compiler/build/files/common.mk index 65e0b693e95..511fc5abe76 100644 --- a/runtime/compiler/build/files/common.mk +++ b/runtime/compiler/build/files/common.mk @@ -420,6 +420,7 @@ endif ifneq ($(J9VM_OPT_MICROJIT),) JIT_PRODUCT_SOURCE_FILES+=\ compiler/microjit/ExceptionTable.cpp + compiler/microjit/CompilationInfoPerThreadBase.cpp endif -include $(JIT_MAKE_DIR)/files/extra.mk diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index d1f4ca4d72c..a192437a7c6 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -9341,336 +9341,6 @@ TR::CompilationInfoPerThreadBase::performAOTLoad( return metaData; } -#if defined(J9VM_OPT_MICROJIT) -// MicroJIT returns a code size of 0 when it encounters a compilation error -// We must use this to set the correct values and fail compilation, this is -// the same no matter which phase of compilation fails. -// This function will bubble the exception up to the caller after clean up. 
-static void -testMicroJITCompilationForErrors( - uintptr_t code_size, - J9Method *method, - J9ROMMethod *romMethod, - J9JITConfig *jitConfig, - J9VMThread *vmThread, - TR::Compilation *compiler) - { - if (0 == code_size) - { - intptr_t trCountFromMJIT = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; - trCountFromMJIT = (trCountFromMJIT << 1) | 1; - TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); - *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; - method->extra = reinterpret_cast(trCountFromMJIT); - compiler->failCompilation("Cannot compile with MicroJIT."); - } - } - -// This routine should only be called from wrappedCompile -TR_MethodMetaData * -TR::CompilationInfoPerThreadBase::mjit( - J9VMThread *vmThread, - TR::Compilation *compiler, - TR_ResolvedMethod *compilee, - TR_J9VMBase &vm, - TR_OptimizationPlan *optimizationPlan, - TR::SegmentAllocator const &scratchSegmentProvider, - TR_Memory *trMemory - ) - { - - PORT_ACCESS_FROM_JITCONFIG(jitConfig); - - TR_MethodMetaData *metaData = NULL; - J9Method *method = NULL; - TR::CodeCache *codeCache = NULL; - - if (_methodBeingCompiled->_priority >= CP_SYNC_MIN) - ++_compInfo._numSyncCompilations; - else - ++_compInfo._numAsyncCompilations; - - if (_methodBeingCompiled->isDLTCompile()) - compiler->setDltBcIndex(static_cast(_methodBeingCompiled->getMethodDetails()).getByteCodeIndex()); - - bool volatile haveLockedClassUnloadMonitor = false; // used for compilation without VM access - int32_t estimated_size = 0; - char* buffer = NULL; - try - { - - InterruptibleOperation compilingMethodBody(*this); - - TR::IlGeneratorMethodDetails & details = _methodBeingCompiled->getMethodDetails(); - method = details.getMethod(); - - TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); // TODO: fe can be replaced by vm and removed from here. - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); - const char* signature = compilee->signature(compiler->trMemory()); - - TRIGGER_J9HOOK_JIT_COMPILING_START(_jitConfig->hookInterface, vmThread, method); - - // BEGIN MICROJIT - TR::FilePointer *logFileFP = this->getCompilation()->getOutFile(); - MJIT::ByteCodeIterator bcIterator(0, static_cast (compilee), static_cast (&vm), comp()); - - if (TR::Options::canJITCompile()) - { - TR::Options *options = TR::Options::getJITCmdLineOptions(); - - J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); - U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); - if (compiler->getOption(TR_TraceCG)) - trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); - if ((compilee->isConstructor())) - testMicroJITCompilationForErrors(0, method, romMethod, jitConfig, vmThread, compiler); // "MicroJIT does not currently compile constructors" - - bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not - if (!isStaticMethod) - maxLength += 1; // Make space for an extra character for object reference in case of non-static methods - - char typeString[maxLength]; - if (MJIT::nativeSignature(method, typeString)) - return 0; - - // To insert 'L' for object reference with non-static methods - // before the input arguments starting at index 3. 
- // The last iteration sets typeString[3] to typeString[2], - // which is always MJIT::CLASSNAME_TYPE_CHARACTER, i.e., 'L' - // and other iterations just move the characters one index down. - // e.g. xxIIB -> - if (!isStaticMethod) - { - for (int i = maxLength - 1; i > 2 ; i--) - typeString[i] = typeString[i-1]; - } - - U_16 paramCount = MJIT::getParamCount(typeString, maxLength); - U_16 actualParamCount = MJIT::getActualParamCount(typeString, maxLength); - MJIT::ParamTableEntry paramTableEntries[actualParamCount*2]; - - for (int i=0; imaxBytecodeIndex(); - - MJIT::CodeGenGC mjitCGGC(logFileFP); - MJIT::CodeGenerator mjit_cg(_jitConfig, vmThread, logFileFP, vm, ¶mTable, compiler, &mjitCGGC, comp()->getPersistentInfo(), trMemory, compilee); - estimated_size = MAX_BUFFER_SIZE(maxBCI); - buffer = (char*)mjit_cg.allocateCodeCache(estimated_size, &vm, vmThread); - testMicroJITCompilationForErrors((uintptr_t)buffer, method, romMethod, jitConfig, vmThread, compiler); // MicroJIT cannot compile if allocation fails. - codeCache = mjit_cg.getCodeCache(); - - // provide enough space for CodeCacheMethodHeader - char *cursor = buffer; - - buffer_size_t buffer_size = 0; - - char *magicWordLocation, *first2BytesPatchLocation, *samplingRecompileCallLocation; - buffer_size_t code_size = mjit_cg.generatePrePrologue(cursor, method, &magicWordLocation, &first2BytesPatchLocation, &samplingRecompileCallLocation); - - testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); - - compiler->cg()->setPrePrologueSize(code_size); - buffer_size += code_size; - MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after pre-prologue"); - -#ifdef MJIT_DEBUG - trfprintf(this->getCompilation()->getOutFile(), "\ngeneratePrePrologue\n"); - for (int32_t i = 0; i < code_size; i++) - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); - -#endif - cursor += code_size; - - // start point should point to prolog - char *prologue_address = cursor; - - // generate debug breakpoint - if (comp()->getOption(TR_EntryBreakPoints)) - { - code_size = mjit_cg.generateDebugBreakpoint(cursor); - cursor += code_size; - buffer_size += code_size; - } - - char *jitStackOverflowPatchLocation = NULL; - - mjit_cg.setPeakStackSize(romMethod->maxStack * mjit_cg.getPointerSize()); - char *firstInstructionLocation = NULL; - - code_size = mjit_cg.generatePrologue(cursor, method, &jitStackOverflowPatchLocation, magicWordLocation, first2BytesPatchLocation, samplingRecompileCallLocation, &firstInstructionLocation, &bcIterator); - testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); - - TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); - compiler->cg()->setStackAtlas(atlas); - compiler->cg()->setMethodStackMap(atlas->getLocalMap()); - // TODO: Find out why setting this correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point - // compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(firstInstructionLocation-cursor)); - compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(0)); - - buffer_size += code_size; - MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after prologue"); - - if (compiler->getOption(TR_TraceCG)) - { - trfprintf(this->getCompilation()->getOutFile(), "\ngeneratePrologue\n"); - for (int32_t i = 0; i < code_size; i++) - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned 
char)cursor[i]) & (unsigned char)0xff); - } - - cursor += code_size; - - // GENERATE BODY - bcIterator.setIndex(0); - - TR_Debug dbg(this->getCompilation()); - this->getCompilation()->setDebug(&dbg); - - if (compiler->getOption(TR_TraceCG)) - trfprintf(this->getCompilation()->getOutFile(), "\n%s\n", signature); - - code_size = mjit_cg.generateBody(cursor, &bcIterator); - - testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); - - buffer_size += code_size; - MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); - -#ifdef MJIT_DEBUG - trfprintf(this->getCompilation()->getOutFile(), "\ngenerateBody\n"); - for (int32_t i = 0; i < code_size; i++) - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); -#endif - - cursor += code_size; - // END GENERATE BODY - - // GENERATE COLD AREA - code_size = mjit_cg.generateColdArea(cursor, method, jitStackOverflowPatchLocation); - - testMicroJITCompilationForErrors((uintptr_t)code_size, method, romMethod, jitConfig, vmThread, compiler); - buffer_size += code_size; - MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); - -#ifdef MJIT_DEBUG - trfprintf(this->getCompilation()->getOutFile(), "\ngenerateColdArea\n"); - for (int32_t i = 0; i < code_size; i++) - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); - - trfprintf(this->getCompilation()->getOutFile(), "\nfinal method\n"); - for (int32_t i = 0; i < buffer_size; i++) - trfprintf(this->getCompilation()->getOutFile(), "%02x\n", ((unsigned char)buffer[i]) & (unsigned char)0xff); -#endif - - cursor += code_size; - - // As the body is finished, mark its profile info as active so that the JProfiler thread will inspect it - - // TODO: after adding profiling support, uncomment this. - // if (bodyInfo && bodyInfo->getProfileInfo()) - // { - // bodyInfo->getProfileInfo()->setActive(); - // } - - // Put a metaData pointer into the Code Cache Header(s). - // - compiler->cg()->setBinaryBufferCursor((uint8_t*)(cursor)); - compiler->cg()->setBinaryBufferStart((uint8_t*)(buffer)); - if (compiler->getOption(TR_TraceCG)) - trfprintf(this->getCompilation()->getOutFile(), "Compiled method binary buffer finalized [" POINTER_PRINTF_FORMAT " : " POINTER_PRINTF_FORMAT "] for %s @ %s", ((void*)buffer), ((void*)cursor), compiler->signature(), compiler->getHotnessName()); - - metaData = createMJITMethodMetaData(vm, compilee, compiler); - if (!metaData) - { - if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) - TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Failed to create metadata for %s @ %s", compiler->signature(), compiler->getHotnessName()); - compiler->failCompilation("Metadata creation failure"); - } - if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) - TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Successfully created metadata [" POINTER_PRINTF_FORMAT "] for %s @ %s", metaData, compiler->signature(), compiler->getHotnessName()); - setMetadata(metaData); - uint8_t *warmMethodHeader = compiler->cg()->getBinaryBufferStart() - sizeof(OMR::CodeCacheMethodHeader); - memcpy(warmMethodHeader + offsetof(OMR::CodeCacheMethodHeader, _metaData), &metaData, sizeof(metaData)); - // FAR: should we do postpone this copying until after CHTable commit? 
- metaData->runtimeAssumptionList = *(compiler->getMetadataAssumptionList()); - TR::ClassTableCriticalSection chTableCommit(&vm); - TR_ASSERT(!chTableCommit.acquiredVMAccess(), "We should have already acquired VM access at this point."); - TR_CHTable *chTable = compiler->getCHTable(); - - if (prologue_address && compiler->getOption(TR_TraceCG)) - trfprintf(this->getCompilation()->getOutFile(), "\nMJIT:%s\n", compilee->signature(compiler->trMemory())); - - compiler->cg()->getCodeCache()->trimCodeMemoryAllocation(buffer, (cursor-buffer)); - - // END MICROJIT - if (compiler->getOption(TR_TraceCG)) - trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiled %s Successfully!\n------------------------------\n", signature); - - } - - logCompilationSuccess(vmThread, vm, method, scratchSegmentProvider, compilee, compiler, metaData, optimizationPlan); - - TRIGGER_J9HOOK_JIT_COMPILING_END(_jitConfig->hookInterface, vmThread, method); - } - catch (const std::exception &e) - { - const char *exceptionName; - -#if defined(J9ZOS390) - // Compiling with -Wc,lp64 results in a crash on z/OS when trying - // to call the what() virtual method of the exception. - exceptionName = "std::exception"; -#else - exceptionName = e.what(); -#endif - - printCompFailureInfo(compiler, exceptionName); - processException(vmThread, scratchSegmentProvider, compiler, haveLockedClassUnloadMonitor, exceptionName); - if (codeCache) - { - if (estimated_size && buffer) - { - codeCache->trimCodeMemoryAllocation(buffer, 1); - } - TR::CodeCacheManager::instance()->unreserveCodeCache(codeCache); - } - metaData = 0; - } - - // At this point the compilation has either succeeded and compilation cannot be - // interrupted anymore, or it has failed. In either case _compilationShouldBeInterrupted flag - // is not needed anymore - setCompilationShouldBeInterrupted(0); - - // We should not have the classTableMutex at this point - TR_ASSERT_FATAL(!TR::MonitorTable::get()->getClassTableMutex()->owned_by_self(), - "Should not still own classTableMutex"); - - // Increment the number of JIT compilations (either successful or not) - // performed by this compilation thread - // - incNumJITCompilations(); - - return metaData; - } -#endif /* J9VM_OPT_MICROJIT */ - // This routine should only be called from wrappedCompile TR_MethodMetaData * TR::CompilationInfoPerThreadBase::compile( diff --git a/runtime/compiler/microjit/CMakeLists.txt b/runtime/compiler/microjit/CMakeLists.txt index ad53b424d63..bebab4ae699 100644 --- a/runtime/compiler/microjit/CMakeLists.txt +++ b/runtime/compiler/microjit/CMakeLists.txt @@ -22,12 +22,14 @@ if(J9VM_OPT_MJIT_Standalone) j9jit_files( microjit/ExceptionTable.cpp + microjit/CompilationInfoPerThreadBase.cpp microjit/assembly/utils.nasm microjit/LWMetaDataCreation.cpp ) else() j9jit_files( microjit/ExceptionTable.cpp + microjit/CompilationInfoPerThreadBase.cpp microjit/assembly/utils.nasm microjit/HWMetaDataCreation.cpp ) diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp new file mode 100644 index 00000000000..dc8aa341b79 --- /dev/null +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -0,0 +1,367 @@ +/******************************************************************************* + * Copyright (c) 2022, 2022 IBM Corp. 
and others + * + * This program and the accompanying materials are made available under + * the terms of the Eclipse Public License 2.0 which accompanies this + * distribution and is available at http://eclipse.org/legal/epl-2.0 + * or the Apache License, Version 2.0 which accompanies this distribution + * and is available at https://www.apache.org/licenses/LICENSE-2.0. + * + * This Source Code may also be made available under the following Secondary + * Licenses when the conditions for such availability set forth in the + * Eclipse Public License, v. 2.0 are satisfied: GNU General Public License, + * version 2 with the GNU Classpath Exception [1] and GNU General Public + * License, version 2 with the OpenJDK Assembly Exception [2]. + * + * [1] https://www.gnu.org/software/classpath/license.html + * [2] http://openjdk.java.net/legal/assembly-exception.html + * + * SPDX-License-Identifier: EPL-2.0 OR Apache-2.0 OR GPL-2.0 WITH Classpath-exception-2.0 OR LicenseRef-GPL-2.0 WITH Assembly-exception + *******************************************************************************/ + +#include "control/CompilationThread.hpp" +#include "microjit/x/amd64/AMD64Codegen.hpp" +#include "control/CompilationRuntime.hpp" +#include "compile/Compilation.hpp" +#include "codegen/OMRCodeGenerator.hpp" +#include "env/VerboseLog.hpp" +#include "env/ClassTableCriticalSection.hpp" +#include "runtime/CodeCacheTypes.hpp" +#include "runtime/CodeCache.hpp" +#include "runtime/CodeCacheManager.hpp" + +#if defined(J9VM_OPT_MICROJIT) + +static void +printCompFailureInfo(TR::Compilation * comp, const char * reason) + { + if (comp && comp->getOptions()->getAnyOption(TR_TraceAll)) + traceMsg(comp, "\n=== EXCEPTION THROWN (%s) ===\n", reason); + } + +// MicroJIT returns a code size of 0 when it encounters a compilation error +// We must use this to set the correct values and fail compilation, this is +// the same no matter which phase of compilation fails. +// This function will bubble the exception up to the caller after clean up. +static void +testMicroJITCompilationForErrors( + uintptr_t code_size, + J9Method *method, + J9JITConfig *jitConfig, + J9VMThread *vmThread, + TR::Compilation *compiler) + { + if (0 == code_size) + { + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + intptr_t trCountFromMJIT = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? 
TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; + trCountFromMJIT = (trCountFromMJIT << 1) | 1; + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; + method->extra = reinterpret_cast(trCountFromMJIT); + compiler->failCompilation("Cannot compile with MicroJIT."); + } + } + +// This routine should only be called from wrappedCompile +TR_MethodMetaData * +TR::CompilationInfoPerThreadBase::mjit( + J9VMThread *vmThread, + TR::Compilation *compiler, + TR_ResolvedMethod *compilee, + TR_J9VMBase &vm, + TR_OptimizationPlan *optimizationPlan, + TR::SegmentAllocator const &scratchSegmentProvider, + TR_Memory *trMemory + ) + { + + PORT_ACCESS_FROM_JITCONFIG(jitConfig); + + TR_MethodMetaData *metaData = NULL; + J9Method *method = NULL; + TR::CodeCache *codeCache = NULL; + + if (_methodBeingCompiled->_priority >= CP_SYNC_MIN) + ++_compInfo._numSyncCompilations; + else + ++_compInfo._numAsyncCompilations; + + if (_methodBeingCompiled->isDLTCompile()) + testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler); + + bool volatile haveLockedClassUnloadMonitor = false; // used for compilation without VM access + int32_t estimated_size = 0; + char* buffer = NULL; + try + { + + InterruptibleOperation compilingMethodBody(*this); + + TR::IlGeneratorMethodDetails & details = _methodBeingCompiled->getMethodDetails(); + method = details.getMethod(); + + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); // TODO: fe can be replaced by vm and removed from here. + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + const char* signature = compilee->signature(compiler->trMemory()); + + TRIGGER_J9HOOK_JIT_COMPILING_START(_jitConfig->hookInterface, vmThread, method); + + // BEGIN MICROJIT + TR::FilePointer *logFileFP = this->getCompilation()->getOutFile(); + MJIT::ByteCodeIterator bcIterator(0, static_cast (compilee), static_cast (&vm), comp()); + + if (TR::Options::canJITCompile()) + { + TR::Options *options = TR::Options::getJITCmdLineOptions(); + + J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(method); + U_16 maxLength = J9UTF8_LENGTH(J9ROMMETHOD_SIGNATURE(romMethod)); + if (compiler->getOption(TR_TraceCG)) + trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); + if ((compilee->isConstructor())) + testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler); // "MicroJIT does not currently compile constructors" + + bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not + if (!isStaticMethod) + maxLength += 1; // Make space for an extra character for object reference in case of non-static methods + + char typeString[maxLength]; + if (MJIT::nativeSignature(method, typeString)) + return 0; + + // To insert 'L' for object reference with non-static methods + // before the input arguments starting at index 3. + // The last iteration sets typeString[3] to typeString[2], + // which is always MJIT::CLASSNAME_TYPE_CHARACTER, i.e., 'L' + // and other iterations just move the characters one index down. + // e.g. 
xxIIB -> + if (!isStaticMethod) + { + for (int i = maxLength - 1; i > 2 ; i--) + typeString[i] = typeString[i-1]; + } + + U_16 paramCount = MJIT::getParamCount(typeString, maxLength); + U_16 actualParamCount = MJIT::getActualParamCount(typeString, maxLength); + MJIT::ParamTableEntry paramTableEntries[actualParamCount*2]; + + for (int i=0; imaxBytecodeIndex(); + + MJIT::CodeGenGC mjitCGGC(logFileFP); + MJIT::CodeGenerator mjit_cg(_jitConfig, vmThread, logFileFP, vm, ¶mTable, compiler, &mjitCGGC, comp()->getPersistentInfo(), trMemory, compilee); + estimated_size = MAX_BUFFER_SIZE(maxBCI); + buffer = (char*)mjit_cg.allocateCodeCache(estimated_size, &vm, vmThread); + testMicroJITCompilationForErrors((uintptr_t)buffer, method, jitConfig, vmThread, compiler); // MicroJIT cannot compile if allocation fails. + codeCache = mjit_cg.getCodeCache(); + + // provide enough space for CodeCacheMethodHeader + char *cursor = buffer; + + buffer_size_t buffer_size = 0; + + char *magicWordLocation, *first2BytesPatchLocation, *samplingRecompileCallLocation; + buffer_size_t code_size = mjit_cg.generatePrePrologue(cursor, method, &magicWordLocation, &first2BytesPatchLocation, &samplingRecompileCallLocation); + + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + + compiler->cg()->setPrePrologueSize(code_size); + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after pre-prologue"); + +#ifdef MJIT_DEBUG + trfprintf(logFileFP, "\ngeneratePrePrologue\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(logFileFP, "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + +#endif + cursor += code_size; + + // start point should point to prolog + char *prologue_address = cursor; + + // generate debug breakpoint + if (comp()->getOption(TR_EntryBreakPoints)) + { + code_size = mjit_cg.generateDebugBreakpoint(cursor); + cursor += code_size; + buffer_size += code_size; + } + + char *jitStackOverflowPatchLocation = NULL; + + mjit_cg.setPeakStackSize(romMethod->maxStack * mjit_cg.getPointerSize()); + char *firstInstructionLocation = NULL; + + code_size = mjit_cg.generatePrologue(cursor, method, &jitStackOverflowPatchLocation, magicWordLocation, first2BytesPatchLocation, samplingRecompileCallLocation, &firstInstructionLocation, &bcIterator); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + + TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); + compiler->cg()->setStackAtlas(atlas); + compiler->cg()->setMethodStackMap(atlas->getLocalMap()); + // TODO: Find out why setting this correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point + // compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(firstInstructionLocation-cursor)); + compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(0)); + + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after prologue"); + + if (compiler->getOption(TR_TraceCG)) + { + trfprintf(logFileFP, "\ngeneratePrologue\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(logFileFP, "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + } + + cursor += code_size; + + // GENERATE BODY + bcIterator.setIndex(0); + + TR_Debug dbg(this->getCompilation()); + this->getCompilation()->setDebug(&dbg); + + if (compiler->getOption(TR_TraceCG)) + trfprintf(logFileFP, "\n%s\n", signature); + + code_size = mjit_cg.generateBody(cursor, &bcIterator); 
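+ // A code size of 0 from generateBody means the body could not be generated (for example, an unsupported bytecode was encountered); the check below records the failure flag and fails the compilation so the method can later be compiled by Testarossa.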
+ + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); + +#ifdef MJIT_DEBUG + trfprintf(logFileFP, "\ngenerateBody\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(logFileFP, "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); +#endif + + cursor += code_size; + // END GENERATE BODY + + // GENERATE COLD AREA + code_size = mjit_cg.generateColdArea(cursor, method, jitStackOverflowPatchLocation); + + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + buffer_size += code_size; + MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); + +#ifdef MJIT_DEBUG + trfprintf(logFileFP, "\ngenerateColdArea\n"); + for (int32_t i = 0; i < code_size; i++) + trfprintf(logFileFP, "%02x\n", ((unsigned char)cursor[i]) & (unsigned char)0xff); + + trfprintf(logFileFP, "\nfinal method\n"); + for (int32_t i = 0; i < buffer_size; i++) + trfprintf(logFileFP, "%02x\n", ((unsigned char)buffer[i]) & (unsigned char)0xff); +#endif + + cursor += code_size; + + // As the body is finished, mark its profile info as active so that the JProfiler thread will inspect it + + // TODO: after adding profiling support, uncomment this. + // if (bodyInfo && bodyInfo->getProfileInfo()) + // { + // bodyInfo->getProfileInfo()->setActive(); + // } + + // Put a metaData pointer into the Code Cache Header(s). + compiler->cg()->setBinaryBufferCursor((uint8_t*)(cursor)); + compiler->cg()->setBinaryBufferStart((uint8_t*)(buffer)); + if (compiler->getOption(TR_TraceCG)) + trfprintf(logFileFP, "Compiled method binary buffer finalized [" POINTER_PRINTF_FORMAT " : " POINTER_PRINTF_FORMAT "] for %s @ %s", ((void*)buffer), ((void*)cursor), compiler->signature(), compiler->getHotnessName()); + + metaData = createMJITMethodMetaData(vm, compilee, compiler); + if (!metaData) + { + if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) + TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Failed to create metadata for %s @ %s", compiler->signature(), compiler->getHotnessName()); + compiler->failCompilation("Metadata creation failure"); + } + if (TR::Options::getVerboseOption(TR_VerboseCompilationDispatch)) + TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Successfully created metadata [" POINTER_PRINTF_FORMAT "] for %s @ %s", metaData, compiler->signature(), compiler->getHotnessName()); + setMetadata(metaData); + uint8_t *warmMethodHeader = compiler->cg()->getBinaryBufferStart() - sizeof(OMR::CodeCacheMethodHeader); + memcpy(warmMethodHeader + offsetof(OMR::CodeCacheMethodHeader, _metaData), &metaData, sizeof(metaData)); + // FAR: should we do postpone this copying until after CHTable commit? 
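+ // Attach the compilation's runtime assumptions to the metadata, then enter the class table critical section before fetching the CHTable.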
+ metaData->runtimeAssumptionList = *(compiler->getMetadataAssumptionList()); + TR::ClassTableCriticalSection chTableCommit(&vm); + TR_ASSERT(!chTableCommit.acquiredVMAccess(), "We should have already acquired VM access at this point."); + TR_CHTable *chTable = compiler->getCHTable(); + + if (prologue_address && compiler->getOption(TR_TraceCG)) + trfprintf(logFileFP, "\nMJIT:%s\n", compilee->signature(compiler->trMemory())); + + compiler->cg()->getCodeCache()->trimCodeMemoryAllocation(buffer, (cursor-buffer)); + + // END MICROJIT + if (compiler->getOption(TR_TraceCG)) + trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiled %s Successfully!\n------------------------------\n", signature); + + } + + logCompilationSuccess(vmThread, vm, method, scratchSegmentProvider, compilee, compiler, metaData, optimizationPlan); + + TRIGGER_J9HOOK_JIT_COMPILING_END(_jitConfig->hookInterface, vmThread, method); + } + catch (const std::exception &e) + { + const char *exceptionName; + +#if defined(J9ZOS390) + // Compiling with -Wc,lp64 results in a crash on z/OS when trying + // to call the what() virtual method of the exception. + exceptionName = "std::exception"; +#else + exceptionName = e.what(); +#endif + + printCompFailureInfo(compiler, exceptionName); + processException(vmThread, scratchSegmentProvider, compiler, haveLockedClassUnloadMonitor, exceptionName); + if (codeCache) + { + if (estimated_size && buffer) + { + codeCache->trimCodeMemoryAllocation(buffer, 1); + } + TR::CodeCacheManager::instance()->unreserveCodeCache(codeCache); + } + metaData = 0; + } + + // At this point the compilation has either succeeded and compilation cannot be + // interrupted anymore, or it has failed. In either case _compilationShouldBeInterrupted flag + // is not needed anymore + setCompilationShouldBeInterrupted(0); + + // We should not have the classTableMutex at this point + TR_ASSERT_FATAL(!TR::MonitorTable::get()->getClassTableMutex()->owned_by_self(), + "Should not still own classTableMutex"); + + // Increment the number of JIT compilations (either successful or not) + // performed by this compilation thread + incNumJITCompilations(); + + return metaData; + } +#endif /* J9VM_OPT_MICROJIT */ From fc7e697724a7dd8fecb388a16dce6d7996a8bd7f Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Tue, 5 Apr 2022 16:35:47 -0300 Subject: [PATCH 14/25] Preserving count for AOT loads from being overwritten by MJIT count Signed-off-by: Harpreet Kaur --- runtime/compiler/control/CompilationRuntime.hpp | 9 ++++++++- runtime/compiler/control/HookedByTheJit.cpp | 4 ++-- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/runtime/compiler/control/CompilationRuntime.hpp b/runtime/compiler/control/CompilationRuntime.hpp index 8aa352b29e5..8c6a75b8b17 100644 --- a/runtime/compiler/control/CompilationRuntime.hpp +++ b/runtime/compiler/control/CompilationRuntime.hpp @@ -697,11 +697,18 @@ class CompilationInfo } #if defined(J9VM_OPT_MICROJIT) - static void setInitialMJITCountUnsynchronized(J9Method *method, int32_t mjitThreshold) + static void setInitialMJITCountUnsynchronized(J9Method *method, int32_t mjitThreshold, int32_t trCount, J9JITConfig *jitConfig, J9VMThread *vmThread) { if (TR::Options::getJITCmdLineOptions()->_mjitEnabled) { intptr_t value = (intptr_t)((mjitThreshold << 1) | J9_STARTPC_NOT_TRANSLATED); + if (trCount < value) + { + value = trCount; + TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); + U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + *extendedFlags = 
(*extendedFlags) | J9_MJIT_FAILED_COMPILE; + } method->extra = reinterpret_cast(static_cast(value)); } } diff --git a/runtime/compiler/control/HookedByTheJit.cpp b/runtime/compiler/control/HookedByTheJit.cpp index 8941b28af39..3a96a51c75a 100644 --- a/runtime/compiler/control/HookedByTheJit.cpp +++ b/runtime/compiler/control/HookedByTheJit.cpp @@ -614,8 +614,8 @@ static void jitHookInitializeSendTarget(J9HookInterface * * hook, UDATA eventNum if (optionsJIT->_mjitEnabled) { int32_t mjitCount = optionsJIT->_mjitInitialCount; - if (count) - TR::CompilationInfo::setInitialMJITCountUnsynchronized(method, mjitCount); + if (mjitCount && !(TR::Options::sharedClassCache() && jitConfig->javaVM->sharedClassConfig->existsCachedCodeForROMMethod(vmThread, romMethod))) + TR::CompilationInfo::setInitialMJITCountUnsynchronized(method, mjitCount, count, jitConfig, vmThread); } #endif From 1433d556f5a097192bc4329c889bf7196bb6ced4 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 14 Apr 2022 12:18:06 -0300 Subject: [PATCH 15/25] Modify encoding for setting value, formatting and some suggested changes Signed-off-by: Harpreet Kaur --- runtime/compiler/CMakeLists.txt | 4 ++-- .../compiler/control/CompilationRuntime.hpp | 12 ++++++---- .../compiler/control/CompilationThread.cpp | 8 +++---- .../compiler/control/CompilationThread.hpp | 24 ++++++++++++++----- .../compiler/control/RecompilationInfo.hpp | 4 ++-- runtime/compiler/infra/J9Cfg.cpp | 8 +++---- .../microjit/CompilationInfoPerThreadBase.cpp | 4 ++-- .../microjit/x/amd64/AMD64Codegen.hpp | 20 ++++++++-------- .../compiler/optimizer/JProfilingBlock.cpp | 4 ++-- runtime/compiler/runtime/MetaData.cpp | 7 +++--- 10 files changed, 55 insertions(+), 40 deletions(-) diff --git a/runtime/compiler/CMakeLists.txt b/runtime/compiler/CMakeLists.txt index 6e4d98fe365..0065626da41 100644 --- a/runtime/compiler/CMakeLists.txt +++ b/runtime/compiler/CMakeLists.txt @@ -123,8 +123,8 @@ if(J9VM_OPT_MICROJIT) message(STATUS "MicroJIT is enabled") endif() -#TODO We should get rid of this, but its still required by the compiler_support module in omr -set(J9SRC ${j9vm_SOURCE_DIR}) +# TODO We should get rid of this, but it's still required by the compiler_support module in OMR. 
+set(J9SRC ${j9vm_SOURCE_DIR}) # include compiler support module from OMR include(OmrCompilerSupport) diff --git a/runtime/compiler/control/CompilationRuntime.hpp b/runtime/compiler/control/CompilationRuntime.hpp index 8c6a75b8b17..7682e2e3282 100644 --- a/runtime/compiler/control/CompilationRuntime.hpp +++ b/runtime/compiler/control/CompilationRuntime.hpp @@ -701,14 +701,18 @@ class CompilationInfo { if (TR::Options::getJITCmdLineOptions()->_mjitEnabled) { - intptr_t value = (intptr_t)((mjitThreshold << 1) | J9_STARTPC_NOT_TRANSLATED); - if (trCount < value) + intptr_t value; + if (trCount < mjitThreshold) { - value = trCount; + value = (intptr_t)((trCount << 1) | J9_STARTPC_NOT_TRANSLATED); TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; } + else + { + value = (intptr_t)((mjitThreshold << 1) | J9_STARTPC_NOT_TRANSLATED); + } method->extra = reinterpret_cast(static_cast(value)); } } diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index a192437a7c6..18f9d89b494 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -176,10 +176,10 @@ shouldCompileWithMicroJIT(J9Method *method, J9JITConfig *jitConfig, J9VMThread * TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); UDATA extra = (UDATA)method->extra; - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); - return (J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) && // MicroJIT is not a target for recompilation - !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) && // MicroJIT does not compile native methods - !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); // MicroJIT failed to compile this method already + uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + return (J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) // MicroJIT is not a target for recompilation + && !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) // MicroJIT does not compile native methods + && !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); // MicroJIT failed to compile this method already } #endif diff --git a/runtime/compiler/control/CompilationThread.hpp b/runtime/compiler/control/CompilationThread.hpp index 972cf8e2028..af6a17736e3 100644 --- a/runtime/compiler/control/CompilationThread.hpp +++ b/runtime/compiler/control/CompilationThread.hpp @@ -158,13 +158,25 @@ class CompilationInfoPerThreadBase void setMetadata(TR_MethodMetaData *m) {_metadata = m;} void *compile(J9VMThread *context, TR_MethodToBeCompiled *entry, J9::J9SegmentProvider &scratchSegmentProvider); #if defined(J9VM_OPT_MICROJIT) - TR_MethodMetaData *mjit(J9VMThread *context, TR::Compilation *, - TR_ResolvedMethod *compilee, TR_J9VMBase &, TR_OptimizationPlan*, TR::SegmentAllocator const &scratchSegmentProvider, - TR_Memory *trMemory); + TR_MethodMetaData *mjit(J9VMThread *context, + TR::Compilation *, + TR_ResolvedMethod *compilee, + TR_J9VMBase &, + TR_OptimizationPlan*, + TR::SegmentAllocator const &scratchSegmentProvider, + TR_Memory *trMemory); #endif - TR_MethodMetaData *compile(J9VMThread *context, TR::Compilation *, - TR_ResolvedMethod *compilee, TR_J9VMBase &, TR_OptimizationPlan*, TR::SegmentAllocator const &scratchSegmentProvider); - TR_MethodMetaData *performAOTLoad(J9VMThread *context, 
TR::Compilation *, TR_ResolvedMethod *compilee, TR_J9VMBase *vm, J9Method *method); + TR_MethodMetaData *compile(J9VMThread *context, + TR::Compilation *, + TR_ResolvedMethod *compilee, + TR_J9VMBase &, + TR_OptimizationPlan*, + TR::SegmentAllocator const &scratchSegmentProvider); + TR_MethodMetaData *performAOTLoad(J9VMThread *context, + TR::Compilation *, + TR_ResolvedMethod *compilee, + TR_J9VMBase *vm, + J9Method *method); void preCompilationTasks(J9VMThread * vmThread, TR_MethodToBeCompiled *entry, diff --git a/runtime/compiler/control/RecompilationInfo.hpp b/runtime/compiler/control/RecompilationInfo.hpp index 4eaab7564be..6d041008156 100644 --- a/runtime/compiler/control/RecompilationInfo.hpp +++ b/runtime/compiler/control/RecompilationInfo.hpp @@ -339,8 +339,8 @@ class TR_PersistentJittedBodyInfo bool getHasLoops() { return _flags.testAny(HasLoops); } #if defined(J9VM_OPT_MICROJIT) - bool isMJITCompiledMethod() { return _flags.testAny(IsMJITCompiled); } - void setIsMJITCompiledMethod(bool b) { _flags.set(IsMJITCompiled, b); } + bool isMJITCompiledMethod() { return _flags.testAny(IsMJITCompiled); } + void setIsMJITCompiledMethod(bool b){ _flags.set(IsMJITCompiled, b); } #endif /* defined(J9VM_OPT_MICROJIT) */ bool getUsesPreexistence() { return _flags.testAny(UsesPreexistence); } bool getDisableSampling() { return _flags.testAny(DisableSampling); } diff --git a/runtime/compiler/infra/J9Cfg.cpp b/runtime/compiler/infra/J9Cfg.cpp index 0abb161d7cb..dd4cda5e7c9 100644 --- a/runtime/compiler/infra/J9Cfg.cpp +++ b/runtime/compiler/infra/J9Cfg.cpp @@ -1715,8 +1715,8 @@ J9::CFG::propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profile _externalProfiler = profiler; #if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); - U_8 *extendedFlags = reinterpret_cast(comp()->fe())->fetchMethodExtendedFlagsPointer(method); - if (((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1 && TR::Options::getJITCmdLineOptions()->_mjitEnabled) + uint8_t *extendedFlags = comp()->fej9()->fetchMethodExtendedFlagsPointer(method); + if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1) { if (!profiler) return; @@ -1791,8 +1791,8 @@ TR_J9ByteCode J9::CFG::getBytecodeFromIndex(int32_t index) { TR_ResolvedJ9Method *method = static_cast(_method->getResolvedMethod()); - TR_J9VMBase *fe = _compilation->fej9(); - TR_J9ByteCodeIterator bcIterator(_method, method, fe, _compilation); + TR_J9VMBase *fe = comp()->fej9(); + TR_J9ByteCodeIterator bcIterator(_method, method, fe, comp()); bcIterator.setIndex(index); return bcIterator.next(); } diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index dc8aa341b79..c159986239b 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -57,7 +57,7 @@ testMicroJITCompilationForErrors( intptr_t trCountFromMJIT = J9ROMMETHOD_HAS_BACKWARDS_BRANCHES(romMethod) ? 
TR_DEFAULT_INITIAL_BCOUNT : TR_DEFAULT_INITIAL_COUNT; trCountFromMJIT = (trCountFromMJIT << 1) | 1; TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; method->extra = reinterpret_cast(trCountFromMJIT); compiler->failCompilation("Cannot compile with MicroJIT."); @@ -103,7 +103,7 @@ TR::CompilationInfoPerThreadBase::mjit( method = details.getMethod(); TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); // TODO: fe can be replaced by vm and removed from here. - U_8 *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); const char* signature = compilee->signature(compiler->trMemory()); TRIGGER_J9HOOK_JIT_COMPILING_START(_jitConfig->hookInterface, vmThread, method); diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp index eb37e602027..12e0dde6ddf 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp @@ -174,9 +174,9 @@ class CodeGenerator */ buffer_size_t generateArgumentMoveForStaticMethod( - char* buffer, - TR_ResolvedMethod* staticMethod, - char* typeString, + char *buffer, + TR_ResolvedMethod *staticMethod, + char *typeString, U_16 typeStringLength, U_16 slotCount, struct TR::X86LinkageProperties *properties); @@ -190,8 +190,8 @@ class CodeGenerator */ buffer_size_t generateInvokeStatic( - char* buffer, - MJIT::ByteCodeIterator* bci, + char *buffer, + MJIT::ByteCodeIterator *bci, struct TR::X86LinkageProperties *properties); /** @@ -203,8 +203,8 @@ class CodeGenerator */ buffer_size_t generateGetField( - char* buffer, - MJIT::ByteCodeIterator* bci); + char *buffer, + MJIT::ByteCodeIterator *bci); /** * Generates a putField instruction based on the type. @@ -215,8 +215,8 @@ class CodeGenerator */ buffer_size_t generatePutField( - char* buffer, - MJIT::ByteCodeIterator* bci); + char *buffer, + MJIT::ByteCodeIterator *bci); /** * Generates a putStatic instruction based on the type. @@ -334,7 +334,7 @@ class CodeGenerator char *first2BytesPatchLocation, char *samplingRecompileCallLocation, char** firstInstLocation, - MJIT::ByteCodeIterator* bci); + MJIT::ByteCodeIterator *bci); /** * Generates the cold area used for helpers, such as the jit stack overflow checker. diff --git a/runtime/compiler/optimizer/JProfilingBlock.cpp b/runtime/compiler/optimizer/JProfilingBlock.cpp index 6437d38cd5b..9bc9f79ed0a 100644 --- a/runtime/compiler/optimizer/JProfilingBlock.cpp +++ b/runtime/compiler/optimizer/JProfilingBlock.cpp @@ -983,8 +983,8 @@ int32_t TR_JProfilingBlock::perform() // TODO: Remove this when ready to implement counter insertion. 
#if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); - U_8 *extendedFlags = reinterpret_cast(fe())->fetchMethodExtendedFlagsPointer(method); - if (((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1) + uint8_t *extendedFlags = comp()->fej9()->fetchMethodExtendedFlagsPointer(method); + if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1) { comp()->setSkippedJProfilingBlock(); return 0; diff --git a/runtime/compiler/runtime/MetaData.cpp b/runtime/compiler/runtime/MetaData.cpp index 80f95206957..6229a150c2e 100644 --- a/runtime/compiler/runtime/MetaData.cpp +++ b/runtime/compiler/runtime/MetaData.cpp @@ -1845,10 +1845,9 @@ createMethodMetaData( } #if defined(J9VM_OPT_MICROJIT) -// -//the routine that sequences the creation of the meta-data for +// --------------------------------------------------------------------------- +// The routine that sequences the creation of the meta-data for // a method compiled for MJIT -// TR_MethodMetaData * createMJITMethodMetaData( TR_J9VMBase & vmArg, @@ -1865,7 +1864,7 @@ createMJITMethodMetaData( TR::CodeGenerator *cg = comp->cg(); TR::GCStackAtlas *trStackAtlas = cg->getStackAtlas(); - // -------------------------------------------------------------------------- + // --------------------------------------------------------------------------- // Find unmergeable GC maps // TODO: MicroJIT: Create mechanism like treetop to support this at a MicroJIT level. GCStackMapSet nonmergeableBCI(std::less(), cg->trMemory()->heapMemoryRegion()); From 5327dcf36d5379d3e5d48f5f1325bbee554574ba Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Mon, 25 Apr 2022 12:22:25 -0300 Subject: [PATCH 16/25] Using query for checking if the method is interpreted and for MJIT extended flags Signed-off-by: Harpreet Kaur --- runtime/compiler/control/CompilationRuntime.hpp | 2 +- runtime/compiler/control/CompilationThread.cpp | 2 +- runtime/compiler/env/VMJ9.cpp | 9 +++++++++ runtime/compiler/env/VMJ9.h | 3 +++ runtime/compiler/infra/J9Cfg.cpp | 3 +-- .../compiler/microjit/CompilationInfoPerThreadBase.cpp | 2 +- runtime/compiler/optimizer/JProfilingBlock.cpp | 3 +-- 7 files changed, 17 insertions(+), 7 deletions(-) diff --git a/runtime/compiler/control/CompilationRuntime.hpp b/runtime/compiler/control/CompilationRuntime.hpp index 7682e2e3282..3608b298ded 100644 --- a/runtime/compiler/control/CompilationRuntime.hpp +++ b/runtime/compiler/control/CompilationRuntime.hpp @@ -707,7 +707,7 @@ class CompilationInfo value = (intptr_t)((trCount << 1) | J9_STARTPC_NOT_TRANSLATED); TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); - *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; + *extendedFlags = *extendedFlags | J9_MJIT_FAILED_COMPILE; } else { diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index 18f9d89b494..2487745be1d 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -179,7 +179,7 @@ shouldCompileWithMicroJIT(J9Method *method, J9JITConfig *jitConfig, J9VMThread * uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); return (J9_ARE_ALL_BITS_SET(extra, J9_STARTPC_NOT_TRANSLATED) // MicroJIT is not a target for recompilation && !(J9_ROM_METHOD_FROM_RAM_METHOD(method)->modifiers & J9AccNative) // MicroJIT does not 
compile native methods - && !((*extendedFlags) & J9_MJIT_FAILED_COMPILE)); // MicroJIT failed to compile this method already + && !(*extendedFlags & J9_MJIT_FAILED_COMPILE)); // MicroJIT failed to compile this method already } #endif diff --git a/runtime/compiler/env/VMJ9.cpp b/runtime/compiler/env/VMJ9.cpp index cb1eb5d1721..bb9f75d9196 100644 --- a/runtime/compiler/env/VMJ9.cpp +++ b/runtime/compiler/env/VMJ9.cpp @@ -6680,6 +6680,15 @@ TR_J9VMBase::isGetImplInliningSupported() return jvm->memoryManagerFunctions->j9gc_modron_isFeatureSupported(jvm, j9gc_modron_feature_inline_reference_get) != 0; } +#if defined(J9VM_OPT_MICROJIT) +bool +TR_J9VMBase::isMJITExtendedFlagsMethod(J9Method *method) + { + uint8_t *extendedFlags = fetchMethodExtendedFlagsPointer(method); + return ((*extendedFlags & J9_MJIT_FAILED_COMPILE) != J9_MJIT_FAILED_COMPILE); + } +#endif + /** \brief * Get the raw modifier from the class pointer. * diff --git a/runtime/compiler/env/VMJ9.h b/runtime/compiler/env/VMJ9.h index 9804142df02..44de88a2cb6 100644 --- a/runtime/compiler/env/VMJ9.h +++ b/runtime/compiler/env/VMJ9.h @@ -252,6 +252,9 @@ class TR_J9VMBase : public TR_FrontEnd virtual bool canAllowDifferingNumberOrTypesOfArgsAndParmsInInliner(); ///// virtual bool isGetImplInliningSupported(); +#if defined(J9VM_OPT_MICROJIT) + virtual bool isMJITExtendedFlagsMethod(J9Method *); +#endif virtual uintptr_t getClassDepthAndFlagsValue(TR_OpaqueClassBlock * classPointer); virtual uintptr_t getClassFlagsValue(TR_OpaqueClassBlock * classPointer); diff --git a/runtime/compiler/infra/J9Cfg.cpp b/runtime/compiler/infra/J9Cfg.cpp index dd4cda5e7c9..5e4f6417f19 100644 --- a/runtime/compiler/infra/J9Cfg.cpp +++ b/runtime/compiler/infra/J9Cfg.cpp @@ -1715,8 +1715,7 @@ J9::CFG::propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profile _externalProfiler = profiler; #if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); - uint8_t *extendedFlags = comp()->fej9()->fetchMethodExtendedFlagsPointer(method); - if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1) + if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && comp()->fej9()->isMJITExtendedFlagsMethod(method) && comp()->getMethodBeingCompiled()->isInterpreted()) { if (!profiler) return; diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index c159986239b..933648da9f4 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -58,7 +58,7 @@ testMicroJITCompilationForErrors( trCountFromMJIT = (trCountFromMJIT << 1) | 1; TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); - *extendedFlags = (*extendedFlags) | J9_MJIT_FAILED_COMPILE; + *extendedFlags = *extendedFlags | J9_MJIT_FAILED_COMPILE; method->extra = reinterpret_cast(trCountFromMJIT); compiler->failCompilation("Cannot compile with MicroJIT."); } diff --git a/runtime/compiler/optimizer/JProfilingBlock.cpp b/runtime/compiler/optimizer/JProfilingBlock.cpp index 9bc9f79ed0a..07f7489211e 100644 --- a/runtime/compiler/optimizer/JProfilingBlock.cpp +++ b/runtime/compiler/optimizer/JProfilingBlock.cpp @@ -983,8 +983,7 @@ int32_t TR_JProfilingBlock::perform() // TODO: Remove this when ready to implement counter insertion. 
#if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); - uint8_t *extendedFlags = comp()->fej9()->fetchMethodExtendedFlagsPointer(method); - if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && ((*extendedFlags & 0x10) != 0x10) && (uintptr_t)(method->extra) == 0x1) + if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && comp()->fej9()->isMJITExtendedFlagsMethod(method) && comp()->getMethodBeingCompiled()->isInterpreted()) { comp()->setSkippedJProfilingBlock(); return 0; From ba9cdd7fb2d9cadf313f2b9b5bcbf58ac962270d Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Mon, 25 Apr 2022 17:13:31 -0300 Subject: [PATCH 17/25] Removing underscore in non-members of the CFG class and other formatting suggestions Signed-off-by: Harpreet Kaur --- runtime/compiler/infra/J9Cfg.cpp | 166 +++++++----------- .../microjit/CompilationInfoPerThreadBase.cpp | 6 +- .../compiler/microjit/HWMetaDataCreation.cpp | 10 +- 3 files changed, 75 insertions(+), 107 deletions(-) diff --git a/runtime/compiler/infra/J9Cfg.cpp b/runtime/compiler/infra/J9Cfg.cpp index 5e4f6417f19..da689a2e6c2 100644 --- a/runtime/compiler/infra/J9Cfg.cpp +++ b/runtime/compiler/infra/J9Cfg.cpp @@ -842,8 +842,8 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() TR::StackMemoryRegion stackMemoryRegion(*trMemory()); int32_t numBlocks = getNextNodeNumber(); - TR_BitVector *_seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - TR_BitVector *_seenNodesInCycle = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *seenNodesInCycle = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); int32_t startFrequency = AVG_FREQ; int32_t taken = AVG_FREQ; @@ -866,11 +866,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { - if (_seenNodes->isSet(temp->getNumber())) + if (seenNodes->isSet(temp->getNumber())) break; upStack.add(temp); - _seenNodes->set(temp->getNumber()); + seenNodes->set(temp->getNumber()); if (!backEdgeExists && !temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) { @@ -980,7 +980,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() else self()->setUniformEdgeFrequenciesOnNode( temp, startFrequency, false, comp()); setBlockFrequency (temp, startFrequency); - _seenNodes->set(temp->getNumber()); + seenNodes->set(temp->getNumber()); if (comp()->getOption(TR_TraceBFGeneration)) dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); @@ -1001,10 +1001,10 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "Considering block_%d\n", node->getNumber()); - if (_seenNodes->isSet(node->getNumber())) + if (seenNodes->isSet(node->getNumber())) continue; - _seenNodes->set(node->getNumber()); + seenNodes->set(node->getNumber()); if (!node->asBlock()->getEntry()) continue; @@ -1037,7 +1037,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (!isVirtualGuard(node->asBlock()->getLastRealTreeTop()->getNode()) && 
!node->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) { - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); getInterpreterProfilerBranchCountersOnDoubleton(node, &taken, ¬taken); if ((taken <= 0) && (nottaken <= 0)) @@ -1077,7 +1077,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() { TR::CFGNode *pred = edge->getFrom(); - if (pred->getFrequency()< 0) + if (pred->getFrequency() < 0) { predNotSet = pred; break; @@ -1085,16 +1085,16 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() } if (predNotSet && - !_seenNodesInCycle->get(node->getNumber())) + !seenNodesInCycle->get(node->getNumber())) { stack.add(predNotSet); - _seenNodesInCycle->set(node->getNumber()); - _seenNodes->reset(node->getNumber()); + seenNodesInCycle->set(node->getNumber()); + seenNodes->reset(node->getNumber()); continue; } if (!predNotSet) - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); int32_t sumFreq = summarizeFrequencyFromPredecessors(node, self()); if (sumFreq <= 0) @@ -1113,7 +1113,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() (node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); int32_t sumFreq = _externalProfiler->getSumSwitchCount(node->asBlock()->getLastRealTreeTop()->getNode(), comp()); setSwitchEdgeFrequenciesOnNode(node, comp()); setBlockFrequency (node, sumFreq); @@ -1132,28 +1132,28 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() { TR::CFGNode *pred = edge->getFrom(); - if (pred->getFrequency()< 0) + if (pred->getFrequency() < 0) { predNotSet = pred; break; } - int32_t edgeFreq = edge->getFrequency(); - sumFreq += edgeFreq; - } + int32_t edgeFreq = edge->getFrequency(); + sumFreq += edgeFreq; + } if (predNotSet && - !_seenNodesInCycle->get(node->getNumber())) + !seenNodesInCycle->get(node->getNumber())) { - _seenNodesInCycle->set(node->getNumber()); - _seenNodes->reset(node->getNumber()); + seenNodesInCycle->set(node->getNumber()); + seenNodes->reset(node->getNumber()); stack.add(predNotSet); continue; } else { if (!predNotSet) - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "2Setting block and uniform freqs\n"); @@ -1179,7 +1179,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() { TR::CFGNode *succ = edge->getTo(); - if (!_seenNodes->isSet(succ->getNumber())) + if (!seenNodes->isSet(succ->getNumber())) stack.add(succ); else { @@ -1197,12 +1197,12 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (succ->getSuccessors().size() == 1) { TR::CFGNode *tempNode = succ->getSuccessors().front()->getTo(); - TR_BitVector *_seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - _seenGotoNodes->set(succ->getNumber()); + TR_BitVector *seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + seenGotoNodes->set(succ->getNumber()); while ((tempNode->getSuccessors().size() == 1) && (tempNode != succ) && (tempNode != getEnd()) && - !_seenGotoNodes->isSet(tempNode->getNumber())) + !seenGotoNodes->isSet(tempNode->getNumber())) { TR::CFGNode *nextTempNode = tempNode->getSuccessors().front()->getTo(); @@ -1212,7 +1212,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() self()->setUniformEdgeFrequenciesOnNode( tempNode, edge->getFrequency(), true, comp()); 
setBlockFrequency (tempNode, edge->getFrequency(), true); - _seenGotoNodes->set(tempNode->getNumber()); + seenGotoNodes->set(tempNode->getNumber()); tempNode = nextTempNode; } } @@ -1246,9 +1246,9 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() for (node = getFirstNode(); node; node=node->getNext()) { if ((node == getEnd()) || (node == start)) - node->asBlock()->setFrequency(0); + node->asBlock()->setFrequency(0); - if (_seenNodes->isSet(node->getNumber())) + if (seenNodes->isSet(node->getNumber())) continue; if (node->asBlock()->getEntry() && @@ -1287,7 +1287,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *co TR::StackMemoryRegion stackMemoryRegion(*trMemory()); int32_t numBlocks = getNextNodeNumber(); - TR_BitVector *_seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); int32_t startFrequency = AVG_FREQ; int32_t taken = AVG_FREQ; @@ -1309,10 +1309,10 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *co temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { - if (_seenNodes->isSet(temp->getNumber())) + if (seenNodes->isSet(temp->getNumber())) break; - _seenNodes->set(temp->getNumber()); + seenNodes->set(temp->getNumber()); if (!temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) backEdgeExists = true; @@ -1799,7 +1799,7 @@ J9::CFG::getBytecodeFromIndex(int32_t index) TR_J9ByteCode J9::CFG::getLastRealBytecodeOfBlock(TR::CFGNode *start) { - return this->getBytecodeFromIndex(start->asBlock()->getExit()->getNode()->getLocalIndex()); + return getBytecodeFromIndex(start->asBlock()->getExit()->getNode()->getLocalIndex()); } void @@ -1848,29 +1848,6 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cf else *nottaken = LOW_FREQ; - /*int32_t sumFreq = summarizeFrequencyFromPredecessors(cfgNode, this); - - if (sumFreq>0) - { - *nottaken = sumFreq>>1; - *taken = *nottaken; - } - else - { - if (node->getByteCodeIndex()==0) - { - int32_t callCount = getParentCallCount(this, node); - - // we don't know what the call count is, assign some - // moderate frequency - if (callCount<=0) - callCount = AVG_FREQ; - - *taken = 0; - *nottaken = callCount; - } - } - */ if (comp()->getOption(TR_TraceBFGeneration)) dumpOptDetails(comp(), "If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } @@ -1888,7 +1865,7 @@ J9::CFG::setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilati if (comp->getOption(TR_TraceBFGeneration)) dumpOptDetails(comp, "Low count switch I'll set frequencies using uniform edge distribution\n"); - self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); + self()->setUniformEdgeFrequenciesOnNode(node, sumFrequency, false, comp); return; } @@ -1922,15 +1899,6 @@ J9::CFG::setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilati } } -// TODO: Uncomment this method and implement it using an updated traversal order call that won't look at real tree tops. 
-/* -TR_BitVector * -J9::CFG::setBlockAndEdgeFrequenciesBasedOnJITProfilerMicroJIT() - { - // Use the setBlockAndEdgeFrequenciesBasedOnJITProfiler above as a guide should this method need to be implemented. - } -*/ - bool isBranch(TR_J9ByteCode bc) { @@ -1972,11 +1940,11 @@ isTableOp(TR_J9ByteCode bc) } bool -J9::CFG::continueLoop(TR::CFGNode *temp) // TODO: Find a better name for this method +J9::CFG::continueLoop(TR::CFGNode *temp) { if ((temp->getSuccessors().size() == 1) && !temp->asBlock()->getEntry()) { - TR_J9ByteCode bc = this->getLastRealBytecodeOfBlock(temp); + TR_J9ByteCode bc = getLastRealBytecodeOfBlock(temp); return !(isBranch(bc) || isTableOp(bc)); } return false; @@ -1988,8 +1956,8 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() TR::StackMemoryRegion stackMemoryRegion(*trMemory()); int32_t numBlocks = getNextNodeNumber(); - TR_BitVector *_seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - TR_BitVector *_seenNodesInCycle = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *seenNodesInCycle = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); int32_t startFrequency = AVG_FREQ; int32_t taken = AVG_FREQ; @@ -2007,11 +1975,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() //_maxFrequency = 0; while (continueLoop(temp)) { - if (_seenNodes->isSet(temp->getNumber())) + if (seenNodes->isSet(temp->getNumber())) break; upStack.add(temp); - _seenNodes->set(temp->getNumber()); + seenNodes->set(temp->getNumber()); if (!backEdgeExists && !temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) { @@ -2065,12 +2033,12 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() temp->asBlock()->getEntry() && isBranch(getLastRealBytecodeOfBlock(temp))) { - getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); + getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(temp, &taken, ¬taken); startFrequency = taken + nottaken; self()->setEdgeFrequenciesOnNode(temp, taken, nottaken, comp()); } else if (temp->asBlock()->getEntry() && - isTableOp(this->getBytecodeFromIndex(temp->asBlock()->getExit()->getNode()->getLocalIndex()))) + isTableOp(getBytecodeFromIndex(temp->asBlock()->getExit()->getNode()->getLocalIndex()))) { startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getExit()->getNode(), comp()); if (comp()->getOption(TR_TraceBFGeneration)) @@ -2115,7 +2083,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() else self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); setBlockFrequency(temp, startFrequency); - _seenNodes->set(temp->getNumber()); + seenNodes->set(temp->getNumber()); if (comp()->getOption(TR_TraceBFGeneration)) dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); @@ -2136,10 +2104,10 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "Considering block_%d\n", node->getNumber()); - if (_seenNodes->isSet(node->getNumber())) + if (seenNodes->isSet(node->getNumber())) continue; - _seenNodes->set(node->getNumber()); + seenNodes->set(node->getNumber()); if 
(!node->asBlock()->getEntry()) continue; @@ -2172,8 +2140,8 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (!isVirtualGuard(node->asBlock()->getLastRealTreeTop()->getNode()) && !node->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) { - _seenNodesInCycle->empty(); - getInterpreterProfilerBranchCountersOnDoubleton(node, &taken, ¬taken); + seenNodesInCycle->empty(); + getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(node, &taken, ¬taken); if ((taken <= 0) && (nottaken <= 0)) { @@ -2212,7 +2180,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() { TR::CFGNode *pred = edge->getFrom(); - if (pred->getFrequency()< 0) + if (pred->getFrequency() < 0) { predNotSet = pred; break; @@ -2220,16 +2188,16 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() } if (predNotSet && - !_seenNodesInCycle->get(node->getNumber())) + !seenNodesInCycle->get(node->getNumber())) { stack.add(predNotSet); - _seenNodesInCycle->set(node->getNumber()); - _seenNodes->reset(node->getNumber()); + seenNodesInCycle->set(node->getNumber()); + seenNodes->reset(node->getNumber()); continue; } if (!predNotSet) - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); int32_t sumFreq = summarizeFrequencyFromPredecessors(node, self()); if (sumFreq <= 0) @@ -2248,7 +2216,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() (node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); int32_t sumFreq = _externalProfiler->getSumSwitchCount(node->asBlock()->getLastRealTreeTop()->getNode(), comp()); setSwitchEdgeFrequenciesOnNode(node, comp()); setBlockFrequency(node, sumFreq); @@ -2278,17 +2246,17 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() } if (predNotSet && - !_seenNodesInCycle->get(node->getNumber())) + !seenNodesInCycle->get(node->getNumber())) { - _seenNodesInCycle->set(node->getNumber()); - _seenNodes->reset(node->getNumber()); + seenNodesInCycle->set(node->getNumber()); + seenNodes->reset(node->getNumber()); stack.add(predNotSet); continue; } else { if (!predNotSet) - _seenNodesInCycle->empty(); + seenNodesInCycle->empty(); if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "2Setting block and uniform freqs\n"); @@ -2314,7 +2282,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() { TR::CFGNode *succ = edge->getTo(); - if (!_seenNodes->isSet(succ->getNumber())) + if (!seenNodes->isSet(succ->getNumber())) stack.add(succ); else { @@ -2332,12 +2300,12 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if (succ->getSuccessors().size() == 1) { TR::CFGNode *tempNode = succ->getSuccessors().front()->getTo(); - TR_BitVector *_seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - _seenGotoNodes->set(succ->getNumber()); + TR_BitVector *seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + seenGotoNodes->set(succ->getNumber()); while ((tempNode->getSuccessors().size() == 1) && (tempNode != succ) && (tempNode != getEnd()) && - !_seenGotoNodes->isSet(tempNode->getNumber())) + !seenGotoNodes->isSet(tempNode->getNumber())) { TR::CFGNode *nextTempNode = tempNode->getSuccessors().front()->getTo(); @@ -2347,7 +2315,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() 
self()->setUniformEdgeFrequenciesOnNode(tempNode, edge->getFrequency(), true, comp()); setBlockFrequency(tempNode, edge->getFrequency(), true); - _seenGotoNodes->set(tempNode->getNumber()); + seenGotoNodes->set(tempNode->getNumber()); tempNode = nextTempNode; } } @@ -2383,7 +2351,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() if ((node == getEnd()) || (node == start)) node->asBlock()->setFrequency(0); - if (_seenNodes->isSet(node->getNumber())) + if (seenNodes->isSet(node->getNumber())) continue; if (node->asBlock()->getEntry() && @@ -2421,7 +2389,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila TR::StackMemoryRegion stackMemoryRegion(*trMemory()); int32_t numBlocks = getNextNodeNumber(); - TR_BitVector *_seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); + TR_BitVector *seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); int32_t startFrequency = AVG_FREQ; int32_t taken = AVG_FREQ; @@ -2443,10 +2411,10 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) { - if (_seenNodes->isSet(temp->getNumber())) + if (seenNodes->isSet(temp->getNumber())) break; - _seenNodes->set(temp->getNumber()); + seenNodes->set(temp->getNumber()); if (!temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) backEdgeExists = true; @@ -2479,7 +2447,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compila temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) { - getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); + getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(temp, &taken, ¬taken); if ((taken <= 0) && (nottaken <= 0)) { taken = LOW_FREQ; diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index 933648da9f4..c8983cbcd2c 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -109,7 +109,7 @@ TR::CompilationInfoPerThreadBase::mjit( TRIGGER_J9HOOK_JIT_COMPILING_START(_jitConfig->hookInterface, vmThread, method); // BEGIN MICROJIT - TR::FilePointer *logFileFP = this->getCompilation()->getOutFile(); + TR::FilePointer *logFileFP = getCompilation()->getOutFile(); MJIT::ByteCodeIterator bcIterator(0, static_cast (compilee), static_cast (&vm), comp()); if (TR::Options::canJITCompile()) @@ -236,8 +236,8 @@ TR::CompilationInfoPerThreadBase::mjit( // GENERATE BODY bcIterator.setIndex(0); - TR_Debug dbg(this->getCompilation()); - this->getCompilation()->setDebug(&dbg); + TR_Debug dbg(getCompilation()); + getCompilation()->setDebug(&dbg); if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n%s\n", signature); diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index d8cffe794d5..883c8086564 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -285,11 +285,11 @@ class BlockList inline void insertAfter(BlockList *bl) { 
MJIT_ASSERT_NO_MSG(bl); - bl->_next = this->_next; + bl->_next = _next; bl->_prev = this; - if (this->_next) - this->_next->_prev = bl; - this->_next = bl; + if (_next) + _next->_prev = bl; + _next = bl; if (this == *_tail) *_tail = bl; } @@ -347,7 +347,7 @@ class BlockList // The target isn't before this block, isn't this block, isn't in this block, and there is no more list to search. // Time to make a new block and add it to the list BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); - this->insertAfter(target); + insertAfter(target); return target; } From 70a9123a8224a6d1202453fd3e29c8207d2976c3 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 28 Apr 2022 15:25:51 -0300 Subject: [PATCH 18/25] Moving methods to TR_J9ByteCodeIterator class Signed-off-by: Harpreet Kaur --- .../compiler/control/CompilationThread.cpp | 1 - runtime/compiler/ilgen/J9ByteCodeIterator.cpp | 24 ++++++++ runtime/compiler/ilgen/J9ByteCodeIterator.hpp | 3 + .../compiler/microjit/ByteCodeIterator.hpp | 61 ------------------- .../microjit/CompilationInfoPerThreadBase.cpp | 2 +- .../compiler/microjit/HWMetaDataCreation.cpp | 2 +- .../microjit/MethodBytecodeMetaData.hpp | 3 +- .../microjit/x/amd64/AMD64Codegen.cpp | 18 +++--- .../microjit/x/amd64/AMD64Codegen.hpp | 19 +++--- 9 files changed, 48 insertions(+), 85 deletions(-) delete mode 100644 runtime/compiler/microjit/ByteCodeIterator.hpp diff --git a/runtime/compiler/control/CompilationThread.cpp b/runtime/compiler/control/CompilationThread.cpp index 2487745be1d..60f50a3f749 100644 --- a/runtime/compiler/control/CompilationThread.cpp +++ b/runtime/compiler/control/CompilationThread.cpp @@ -99,7 +99,6 @@ #if defined(J9VM_OPT_MICROJIT) #include "microjit/x/amd64/AMD64Codegen.hpp" #include "microjit/x/amd64/AMD64CodegenGC.hpp" -#include "microjit/ByteCodeIterator.hpp" #include "microjit/SideTables.hpp" #include "microjit/utils.hpp" #include "codegen/OMRLinkage_inlines.hpp" diff --git a/runtime/compiler/ilgen/J9ByteCodeIterator.cpp b/runtime/compiler/ilgen/J9ByteCodeIterator.cpp index 76e5dd027ec..f99b176cd70 100644 --- a/runtime/compiler/ilgen/J9ByteCodeIterator.cpp +++ b/runtime/compiler/ilgen/J9ByteCodeIterator.cpp @@ -310,6 +310,30 @@ TR_J9ByteCodeIterator::printByteCode() } } +void +TR_J9ByteCodeIterator::printByteCodes() + { + printByteCodePrologue(); + for (TR_J9ByteCode bc = first(); bc != J9BCunknown; bc = next()) + { + printByteCode(); + } + printByteCodeEpilogue(); + } + +const char * +TR_J9ByteCodeIterator::currentMnemonic() + { + uint8_t opcode = nextByte(0); + return fe()->getByteCodeName(opcode); + } + +uint8_t +TR_J9ByteCodeIterator::currentOpcode() + { + return nextByte(0); + } + const TR_J9ByteCode TR_J9ByteCodeIterator::_opCodeToByteCodeEnum[] = { /* 0 */ J9BCnop, diff --git a/runtime/compiler/ilgen/J9ByteCodeIterator.hpp b/runtime/compiler/ilgen/J9ByteCodeIterator.hpp index 9607d0b7928..95ee99c631a 100644 --- a/runtime/compiler/ilgen/J9ByteCodeIterator.hpp +++ b/runtime/compiler/ilgen/J9ByteCodeIterator.hpp @@ -158,6 +158,9 @@ class TR_J9ByteCodeIterator : public TR_ByteCodeIteratorprintByteCodePrologue(); - for (TR_J9ByteCode bc = this->first(); bc != J9BCunknown; bc = this->next()) - { - this->printByteCode(); - } - this->printByteCodeEpilogue(); - } - - const char *currentMnemonic() - { - uint8_t opcode = nextByte(0); - return ((TR_J9VMBase *)fe())->getByteCodeName(opcode); - } - - uint8_t currentOpcode() - { - return nextByte(0); - } - }; - -} // namespace MJIT -#endif /* MJIT_BYTECODEITERATOR_INCL */ diff 
--git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index c8983cbcd2c..b6afea53e35 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -110,7 +110,7 @@ TR::CompilationInfoPerThreadBase::mjit( // BEGIN MICROJIT TR::FilePointer *logFileFP = getCompilation()->getOutFile(); - MJIT::ByteCodeIterator bcIterator(0, static_cast (compilee), static_cast (&vm), comp()); + TR_J9ByteCodeIterator bcIterator(0, static_cast (compilee), static_cast (&vm), comp()); if (TR::Options::canJITCompile()) { diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index 883c8086564..c40ed9b2ec7 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -509,7 +509,7 @@ static TR::Optimizer *createOptimizer(TR::Compilation *comp, TR::ResolvedMethodS MJIT::InternalMetaData createInternalMethodMetadata( - MJIT::ByteCodeIterator *bci, + TR_J9ByteCodeIterator *bci, MJIT::LocalTableEntry *localTableEntries, U_16 entries, int32_t offsetToFirstLocal, diff --git a/runtime/compiler/microjit/MethodBytecodeMetaData.hpp b/runtime/compiler/microjit/MethodBytecodeMetaData.hpp index 347b67e69aa..ec28eb8418f 100644 --- a/runtime/compiler/microjit/MethodBytecodeMetaData.hpp +++ b/runtime/compiler/microjit/MethodBytecodeMetaData.hpp @@ -23,7 +23,6 @@ #define MJIT_METHOD_BYTECODE_META_DATA #include "ilgen/J9ByteCodeIterator.hpp" -#include "microjit/ByteCodeIterator.hpp" #include "microjit/SideTables.hpp" namespace MJIT @@ -55,7 +54,7 @@ class InternalMetaData MJIT::InternalMetaData createInternalMethodMetadata( - MJIT::ByteCodeIterator *bci, + TR_J9ByteCodeIterator *bci, MJIT::LocalTableEntry *localTableEntries, U_16 entries, int32_t offsetToFirstLocal, diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp index 82cdb53a153..f67614a7c15 100755 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp @@ -1437,7 +1437,7 @@ MJIT::CodeGenerator::generatePrologue( char *first2BytesPatchLocation, char *samplingRecompileCallLocation, char **firstInstLocation, - MJIT::ByteCodeIterator *bci) + TR_J9ByteCodeIterator *bci) { buffer_size_t prologueSize = 0; uintptr_t prologueStart = (uintptr_t)buffer; @@ -1673,7 +1673,7 @@ MJIT::CodeGenerator::generateEpologue(char *buffer) } buffer_size_t -MJIT::CodeGenerator::generateBody(char *buffer, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generateBody(char *buffer, TR_J9ByteCodeIterator *bci) { buffer_size_t codeGenSize = 0; buffer_size_t calledCGSize = 0; @@ -2379,7 +2379,7 @@ MJIT::CodeGenerator::generateReturn(char *buffer, TR::DataType dt, struct TR::X8 } buffer_size_t -MJIT::CodeGenerator::generateLoad(char *buffer, TR_J9ByteCode bc, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generateLoad(char *buffer, TR_J9ByteCode bc, TR_J9ByteCodeIterator *bci) { char *signature = _compilee->signatureChars(); int index = -1; @@ -2463,7 +2463,7 @@ MJIT::CodeGenerator::generateLoad(char *buffer, TR_J9ByteCode bc, MJIT::ByteCode } buffer_size_t -MJIT::CodeGenerator::generateStore(char *buffer, TR_J9ByteCode bc, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generateStore(char *buffer, TR_J9ByteCode bc, TR_J9ByteCodeIterator *bci) { char *signature = _compilee->signatureChars(); int index = -1; @@ -2702,7 
+2702,7 @@ MJIT::CodeGenerator::generateArgumentMoveForStaticMethod( } buffer_size_t -MJIT::CodeGenerator::generateInvokeStatic(char *buffer, MJIT::ByteCodeIterator *bci, struct TR::X86LinkageProperties *properties) +MJIT::CodeGenerator::generateInvokeStatic(char *buffer, TR_J9ByteCodeIterator *bci, struct TR::X86LinkageProperties *properties) { buffer_size_t invokeStaticSize = 0; int32_t cpIndex = (int32_t)bci->next2Bytes(); @@ -2838,7 +2838,7 @@ MJIT::CodeGenerator::generateInvokeStatic(char *buffer, MJIT::ByteCodeIterator * } buffer_size_t -MJIT::CodeGenerator::generateGetField(char *buffer, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generateGetField(char *buffer, TR_J9ByteCodeIterator *bci) { buffer_size_t getFieldSize = 0; int32_t cpIndex = (int32_t)bci->next2Bytes(); @@ -2905,7 +2905,7 @@ MJIT::CodeGenerator::generateGetField(char *buffer, MJIT::ByteCodeIterator *bci) } buffer_size_t -MJIT::CodeGenerator::generatePutField(char *buffer, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generatePutField(char *buffer, TR_J9ByteCodeIterator *bci) { buffer_size_t putFieldSize = 0; int32_t cpIndex = (int32_t)bci->next2Bytes(); @@ -2972,7 +2972,7 @@ MJIT::CodeGenerator::generatePutField(char *buffer, MJIT::ByteCodeIterator *bci) } buffer_size_t -MJIT::CodeGenerator::generateGetStatic(char *buffer, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generateGetStatic(char *buffer, TR_J9ByteCodeIterator *bci) { buffer_size_t getStaticSize = 0; int32_t cpIndex = (int32_t)bci->next2Bytes(); @@ -3044,7 +3044,7 @@ MJIT::CodeGenerator::generateGetStatic(char *buffer, MJIT::ByteCodeIterator *bci } buffer_size_t -MJIT::CodeGenerator::generatePutStatic(char *buffer, MJIT::ByteCodeIterator *bci) +MJIT::CodeGenerator::generatePutStatic(char *buffer, TR_J9ByteCodeIterator *bci) { buffer_size_t putStaticSize = 0; int32_t cpIndex = (int32_t)bci->next2Bytes(); diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp index 12e0dde6ddf..cb07b272a82 100644 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.hpp @@ -27,7 +27,6 @@ #include "microjit/utils.hpp" #include "microjit/x/amd64/AMD64Linkage.hpp" #include "microjit/x/amd64/AMD64CodegenGC.hpp" -#include "microjit/ByteCodeIterator.hpp" #include "control/RecompilationInfo.hpp" #include "env/IO.hpp" #include "env/TRMemory.hpp" @@ -147,7 +146,7 @@ class CodeGenerator generateLoad( char *buffer, TR_J9ByteCode bc, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates a store instruction based on the type. @@ -161,7 +160,7 @@ class CodeGenerator generateStore( char *buffer, TR_J9ByteCode bc, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates argument moves for static method invocation. @@ -191,7 +190,7 @@ class CodeGenerator buffer_size_t generateInvokeStatic( char *buffer, - MJIT::ByteCodeIterator *bci, + TR_J9ByteCodeIterator *bci, struct TR::X86LinkageProperties *properties); /** @@ -204,7 +203,7 @@ class CodeGenerator buffer_size_t generateGetField( char *buffer, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates a putField instruction based on the type. @@ -216,7 +215,7 @@ class CodeGenerator buffer_size_t generatePutField( char *buffer, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates a putStatic instruction based on the type. 
@@ -228,7 +227,7 @@ class CodeGenerator buffer_size_t generatePutStatic( char *buffer, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates a getStatic instruction based on the type. @@ -240,7 +239,7 @@ class CodeGenerator buffer_size_t generateGetStatic( char *buffer, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates a return instruction based on the type. @@ -334,7 +333,7 @@ class CodeGenerator char *first2BytesPatchLocation, char *samplingRecompileCallLocation, char** firstInstLocation, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates the cold area used for helpers, such as the jit stack overflow checker. @@ -360,7 +359,7 @@ class CodeGenerator buffer_size_t generateBody( char *buffer, - MJIT::ByteCodeIterator *bci); + TR_J9ByteCodeIterator *bci); /** * Generates a profiling counter for an edge. From 312a50369324f70a14ae3c6fce96cddd2aa76d39 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 28 Apr 2022 16:46:50 -0300 Subject: [PATCH 19/25] Passing reason to failCompilation and using reference to TR_J9VMBase Signed-off-by: Harpreet Kaur --- .../microjit/CompilationInfoPerThreadBase.cpp | 28 +++++++++---------- runtime/oti/j9consts.h | 2 +- 2 files changed, 14 insertions(+), 16 deletions(-) diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index b6afea53e35..55bc23a9701 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -49,7 +49,8 @@ testMicroJITCompilationForErrors( J9Method *method, J9JITConfig *jitConfig, J9VMThread *vmThread, - TR::Compilation *compiler) + TR::Compilation *compiler, + const char *reason) { if (0 == code_size) { @@ -60,7 +61,7 @@ testMicroJITCompilationForErrors( uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); *extendedFlags = *extendedFlags | J9_MJIT_FAILED_COMPILE; method->extra = reinterpret_cast(trCountFromMJIT); - compiler->failCompilation("Cannot compile with MicroJIT."); + compiler->failCompilation("Cannot compile with MicroJIT because %s.", reason); } } @@ -89,7 +90,7 @@ TR::CompilationInfoPerThreadBase::mjit( ++_compInfo._numAsyncCompilations; if (_methodBeingCompiled->isDLTCompile()) - testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler, "MicroJIT does not support DLT compilations"); bool volatile haveLockedClassUnloadMonitor = false; // used for compilation without VM access int32_t estimated_size = 0; @@ -102,8 +103,7 @@ TR::CompilationInfoPerThreadBase::mjit( TR::IlGeneratorMethodDetails & details = _methodBeingCompiled->getMethodDetails(); method = details.getMethod(); - TR_J9VMBase *fe = TR_J9VMBase::get(jitConfig, vmThread); // TODO: fe can be replaced by vm and removed from here. 
- uint8_t *extendedFlags = fe->fetchMethodExtendedFlagsPointer(method); + uint8_t *extendedFlags = static_cast (&vm)->fetchMethodExtendedFlagsPointer(method); const char* signature = compilee->signature(compiler->trMemory()); TRIGGER_J9HOOK_JIT_COMPILING_START(_jitConfig->hookInterface, vmThread, method); @@ -121,7 +121,7 @@ TR::CompilationInfoPerThreadBase::mjit( if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiling %s...\n------------------------------\n", signature); if ((compilee->isConstructor())) - testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler); // "MicroJIT does not currently compile constructors" + testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler, "MicroJIT does not currently compile constructors"); bool isStaticMethod = compilee->isStatic(); // To know if the method we are compiling is static or not if (!isStaticMethod) @@ -153,7 +153,7 @@ TR::CompilationInfoPerThreadBase::mjit( int error_code = 0; MJIT::RegisterStack stack = MJIT::mapIncomingParams(typeString, maxLength, &error_code, paramTableEntries, actualParamCount, logFileFP); if (error_code) - testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors(0, method, jitConfig, vmThread, compiler, "mapping parameters failed"); MJIT::ParamTable paramTable(paramTableEntries, paramCount, actualParamCount, &stack); #define MAX_INSTRUCTION_SIZE 15 @@ -170,7 +170,7 @@ TR::CompilationInfoPerThreadBase::mjit( MJIT::CodeGenerator mjit_cg(_jitConfig, vmThread, logFileFP, vm, ¶mTable, compiler, &mjitCGGC, comp()->getPersistentInfo(), trMemory, compilee); estimated_size = MAX_BUFFER_SIZE(maxBCI); buffer = (char*)mjit_cg.allocateCodeCache(estimated_size, &vm, vmThread); - testMicroJITCompilationForErrors((uintptr_t)buffer, method, jitConfig, vmThread, compiler); // MicroJIT cannot compile if allocation fails. 
+ testMicroJITCompilationForErrors((uintptr_t)buffer, method, jitConfig, vmThread, compiler, "code cache allocation failed"); codeCache = mjit_cg.getCodeCache(); // provide enough space for CodeCacheMethodHeader @@ -180,8 +180,7 @@ TR::CompilationInfoPerThreadBase::mjit( char *magicWordLocation, *first2BytesPatchLocation, *samplingRecompileCallLocation; buffer_size_t code_size = mjit_cg.generatePrePrologue(cursor, method, &magicWordLocation, &first2BytesPatchLocation, &samplingRecompileCallLocation); - - testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler, "generatePrePrologue failed"); compiler->cg()->setPrePrologueSize(code_size); buffer_size += code_size; @@ -212,7 +211,7 @@ TR::CompilationInfoPerThreadBase::mjit( char *firstInstructionLocation = NULL; code_size = mjit_cg.generatePrologue(cursor, method, &jitStackOverflowPatchLocation, magicWordLocation, first2BytesPatchLocation, samplingRecompileCallLocation, &firstInstructionLocation, &bcIterator); - testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler, "generatePrologue failed"); TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); compiler->cg()->setStackAtlas(atlas); @@ -243,8 +242,7 @@ TR::CompilationInfoPerThreadBase::mjit( trfprintf(logFileFP, "\n%s\n", signature); code_size = mjit_cg.generateBody(cursor, &bcIterator); - - testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler, "generateBody failed"); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after body"); @@ -260,8 +258,8 @@ TR::CompilationInfoPerThreadBase::mjit( // GENERATE COLD AREA code_size = mjit_cg.generateColdArea(cursor, method, jitStackOverflowPatchLocation); + testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler, "generateColdArea failed"); - testMicroJITCompilationForErrors((uintptr_t)code_size, method, jitConfig, vmThread, compiler); buffer_size += code_size; MJIT_ASSERT(logFileFP, buffer_size < MAX_BUFFER_SIZE(maxBCI), "Buffer overflow after cold-area"); @@ -302,7 +300,7 @@ TR::CompilationInfoPerThreadBase::mjit( TR_VerboseLog::writeLineLocked(TR_Vlog_DISPATCH, "Successfully created metadata [" POINTER_PRINTF_FORMAT "] for %s @ %s", metaData, compiler->signature(), compiler->getHotnessName()); setMetadata(metaData); uint8_t *warmMethodHeader = compiler->cg()->getBinaryBufferStart() - sizeof(OMR::CodeCacheMethodHeader); - memcpy(warmMethodHeader + offsetof(OMR::CodeCacheMethodHeader, _metaData), &metaData, sizeof(metaData)); + reinterpret_cast(warmMethodHeader)->_metaData = metaData; // FAR: should we do postpone this copying until after CHTable commit? 
metaData->runtimeAssumptionList = *(compiler->getMetadataAssumptionList()); TR::ClassTableCriticalSection chTableCommit(&vm); diff --git a/runtime/oti/j9consts.h b/runtime/oti/j9consts.h index 7951bebdeb2..d5f5ae850ed 100644 --- a/runtime/oti/j9consts.h +++ b/runtime/oti/j9consts.h @@ -706,8 +706,8 @@ extern "C" { #define J9_RAS_METHOD_TRACING 0x2 #define J9_RAS_METHOD_TRACE_ARGS 0x4 #define J9_RAS_METHOD_TRIGGERING 0x8 -#define J9_RAS_MASK 0xF #define J9_MJIT_FAILED_COMPILE 0x10 +#define J9_RAS_MASK 0xF #define J9_JNI_OFFLOAD_SWITCH_CREATE_JAVA_VM 0x1 #define J9_JNI_OFFLOAD_SWITCH_DEALLOCATE_VM_THREAD 0x2 From c9dc6e5d53a8ccdeee0caa0b91dada6b7b484421 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Mon, 2 May 2022 13:14:49 -0300 Subject: [PATCH 20/25] Labelling MicroJIT TODOs as TODO: MicroJIT Signed-off-by: Harpreet Kaur --- .../microjit/CompilationInfoPerThreadBase.cpp | 4 ++-- runtime/compiler/microjit/ExceptionTable.cpp | 6 ++--- .../compiler/microjit/HWMetaDataCreation.cpp | 14 +++++------ .../compiler/microjit/LWMetaDataCreation.cpp | 2 +- .../microjit/x/amd64/AMD64Codegen.cpp | 24 +++++++++---------- .../compiler/optimizer/JProfilingBlock.cpp | 2 +- runtime/compiler/runtime/MetaData.cpp | 6 ++--- 7 files changed, 29 insertions(+), 29 deletions(-) diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index 55bc23a9701..df0d53cb4a2 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -216,7 +216,7 @@ TR::CompilationInfoPerThreadBase::mjit( TR::GCStackAtlas *atlas = mjit_cg.getStackAtlas(); compiler->cg()->setStackAtlas(atlas); compiler->cg()->setMethodStackMap(atlas->getLocalMap()); - // TODO: Find out why setting this correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point + // TODO: MicroJIT: Find out why setting this correctly causes the startPC to report the jitToJit startPC and not the interpreter entry point // compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(firstInstructionLocation-cursor)); compiler->cg()->setJitMethodEntryPaddingSize((uint32_t)(0)); @@ -277,7 +277,7 @@ TR::CompilationInfoPerThreadBase::mjit( // As the body is finished, mark its profile info as active so that the JProfiler thread will inspect it - // TODO: after adding profiling support, uncomment this. + // TODO: MicroJIT: after adding profiling support, uncomment this. // if (bodyInfo && bodyInfo->getProfileInfo()) // { // bodyInfo->getProfileInfo()->setActive(); diff --git a/runtime/compiler/microjit/ExceptionTable.cpp b/runtime/compiler/microjit/ExceptionTable.cpp index 2851efed7f8..9ca96442f41 100644 --- a/runtime/compiler/microjit/ExceptionTable.cpp +++ b/runtime/compiler/microjit/ExceptionTable.cpp @@ -34,7 +34,7 @@ class TR_ResolvedMethod; -// TODO: This Should create an Exception table using only information MicroJIT has. +// TODO: MicroJIT: This should create an exception table using only information MicroJIT has MJIT::ExceptionTableEntryIterator::ExceptionTableEntryIterator(TR::Compilation *comp) : _compilation(comp) { @@ -51,7 +51,7 @@ MJIT::ExceptionTableEntryIterator::ExceptionTableEntryIterator(TR::Compilation * } -// TODO: This Should create an Exception table entry using only information MicroJIT has. 
+// TODO: MicroJIT: This should create an exception table entry using only information MicroJIT has void MJIT::ExceptionTableEntryIterator::addSnippetRanges( List & tableEntries, @@ -61,7 +61,7 @@ MJIT::ExceptionTableEntryIterator::addSnippetRanges( TR_ResolvedMethod *method, TR::Compilation *comp) { - // TODO: create a snippet for each exception and add it to the correct place. + // TODO: MicroJIT: Create a snippet for each exception and add it to the correct place } TR_ExceptionTableEntry * diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index c40ed9b2ec7..95a5dc36bef 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -74,7 +74,7 @@ join( { TR::CFGEdge *e = cfg.addEdge(src, dst); - // TODO: add case for switch + // TODO: MicroJIT: add case for switch if (isBranchDest) { src->getExit()->getNode()->setBranchDestination(dst->getEntry()); @@ -167,7 +167,7 @@ class BytecodeIndexList TR::CFG &_cfg; public: - TR_ALLOC(TR_Memory::UnknownType); //MicroJIT TODO: create a memory object type for this and get it added to OMR + TR_ALLOC(TR_Memory::UnknownType); // TODO: MicroJIT: create a memory object type for this and get it added to OMR uint32_t _index; BytecodeIndexList(uint32_t index, BytecodeIndexList *prev, BytecodeIndexList *next, TR::CFG &cfg) :_index(index) @@ -466,7 +466,7 @@ class CFGCreator } else { // Still working on current block - // TODO: determine if there are any edge cases that need to be filled in here. + // TODO: MicroJIT: determine if there are any edge cases that need to be filled in here } _current->addBCIndex(bci->currentByteCodeIndex()); } @@ -529,7 +529,7 @@ createInternalMethodMetadata( uint16_t maxCalleeArgsSize = 0; - // TODO: Create the CFG before here and pass reference + // TODO: MicroJIT: Create the CFG before here and pass reference bool profilingCompilation = comp->getOption(TR_EnableJProfiling) || comp->getProfilingMode() == JProfiling; TR::CFG *cfg; @@ -558,7 +558,7 @@ createInternalMethodMetadata( * on how many times a local is stored/loaded during a method. * This may be worth doing later, but requires some research first. * - * TODO: Arguments are not guaranteed to be used in the bytecode of a method + * TODO: MicroJIT: Arguments are not guaranteed to be used in the bytecode of a method * e.g. class MyClass { public int ret3() {return 3;} } * The above method is not a static method (arg 0 is an object reference) but the BC * will be [iconst_3, return]. @@ -689,8 +689,8 @@ createInternalMethodMetadata( } J9Method *ramMethod = static_cast(resolved)->ramMethod(); J9ROMMethod *romMethod = J9_ROM_METHOD_FROM_RAM_METHOD(ramMethod); - // TODO replace this with a more accurate count. This is a clear upper bound; - uint16_t calleeArgsSize = romMethod->argCount * pointerSize * 2; // assume 2 slot for now + // TODO: MicroJIT: replace this with a more accurate count; this is a clear upper bound + uint16_t calleeArgsSize = romMethod->argCount * pointerSize * 2; // assume 2 slots for now maxCalleeArgsSize = (calleeArgsSize > maxCalleeArgsSize) ? 
calleeArgsSize : maxCalleeArgsSize; } default: diff --git a/runtime/compiler/microjit/LWMetaDataCreation.cpp b/runtime/compiler/microjit/LWMetaDataCreation.cpp index ba77826f19c..d3c304c4be6 100644 --- a/runtime/compiler/microjit/LWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/LWMetaDataCreation.cpp @@ -28,7 +28,7 @@ */ #if defined(J9VM_OPT_MJIT_Standalone) -// TODO: Complete this section so that MicroJIT can be used in a future case where MicroJIT won't deploy with TR. +// TODO: MicroJIT: Complete this section so that MicroJIT can be used in a future case where MicroJIT won't deploy with TR #else /* do nothing */ #endif /* TR_MJIT_Interop */ diff --git a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp index f67614a7c15..b61a730b1f1 100755 --- a/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp +++ b/runtime/compiler/microjit/x/amd64/AMD64Codegen.cpp @@ -83,7 +83,7 @@ extern J9_CDATA char * const JavaBCNames[]; patchImm1(buffer, (U_32)(relativeAddress & 0x00000000000000ff)); \ } while (0) -// TODO: Find out how to jump to trampolines for far calls +// TODO: MicroJIT: Find out how to jump to trampolines for far calls #define PATCH_RELATIVE_ADDR_32_BIT(buffer, absAddress) \ do { \ intptr_t relativeAddress = (intptr_t)(buffer); \ @@ -1338,7 +1338,7 @@ MJIT::CodeGenerator::generatePrePrologue( // Make sure the startPC at least 4-byte aligned. This is important, since the VM // depends on the alignment (it uses the low order bits as tag bits). // - // TODO: `mustGenerateSwitchToInterpreterPrePrologue` checks that the code generator's compilation is not: + // TODO: MicroJIT: `mustGenerateSwitchToInterpreterPrePrologue` checks that the code generator's compilation is not: // `usesPreexistence`, // `getOption(TR_EnableHCR)`, // `!fej9()->isAsyncCompilation`, @@ -1346,7 +1346,7 @@ MJIT::CodeGenerator::generatePrePrologue( // if (GENERATE_SWITCH_TO_INTERP_PREPROLOGUE) { - // TODO: Replace with a check that performs the above checks. + // TODO: MicroJIT: Replace with a check that performs the above checks // generateSwitchToInterpPrePrologue will align data for use char *old_buff = buffer; preprologueSize += generateSwitchToInterpPrePrologue(buffer, method, alignmentBoundary, alignmentMargin, typeString, maxLength); @@ -1558,7 +1558,7 @@ MJIT::CodeGenerator::generatePrologue( patchImm4(buffer, (U_32)(mjitRegisterSize)); COPY_TEMPLATE(buffer, addR10Imm4, prologueSize); patchImm4(buffer, (U_32)(romMethod->maxStack*8)); - // TODO: Find a way to cleanly preserve this value before overriding it in the _stackAllocSize + // TODO: MicroJIT: Find a way to cleanly preserve this value before overriding it in the _stackAllocSize // Set up MJIT local array COPY_TEMPLATE(buffer, movR10R14, prologueSize); @@ -2296,7 +2296,7 @@ buffer_size_t MJIT::CodeGenerator::generateEdgeCounter(char *buffer, TR_J9ByteCodeIterator *bci) { buffer_size_t returnSize = 0; - // TODO: Get the pointer to the profiling counter + // TODO: MicroJIT: Get the pointer to the profiling counter uint32_t *profilingCounterLocation = NULL; COPY_TEMPLATE(buffer, loadCounter, returnSize); @@ -2543,7 +2543,7 @@ MJIT::CodeGenerator::generateStore(char *buffer, TR_J9ByteCode bc, TR_J9ByteCode return storeSize; } -// TODO: see ::emitSnippitCode and redo this to use snippits. 
+// TODO: MicroJIT: see ::emitSnippitCode and redo this to use snippits buffer_size_t MJIT::CodeGenerator::generateArgumentMoveForStaticMethod( char *buffer, @@ -2558,7 +2558,7 @@ MJIT::CodeGenerator::generateArgumentMoveForStaticMethod( // Pull out parameters and find their place for calling U_16 paramCount = MJIT::getParamCount(typeString, typeStringLength); U_16 actualParamCount = MJIT::getActualParamCount(typeString, typeStringLength); - // TODO: Implement passing arguments on stack when surpassing register count. + // TODO: MicroJIT: Implement passing arguments on stack when surpassing register count if (actualParamCount > 4) return 0; @@ -2690,7 +2690,7 @@ MJIT::CodeGenerator::generateArgumentMoveForStaticMethod( break; } default: - // TODO: Add stack argument passing here. + // TODO: MicroJIT: Add stack argument passing here break; } actualParamCount--; @@ -2712,7 +2712,7 @@ MJIT::CodeGenerator::generateInvokeStatic(char *buffer, TR_J9ByteCodeIterator *b // These are all from TR and likely have implications on how we use this address. However, we do not know that as of yet. bool isUnresolvedInCP; TR_ResolvedMethod *resolved = _compilee->getResolvedStaticMethod(_comp, cpIndex, &isUnresolvedInCP); - // TODO: Split this into generateInvokeStaticJava and generateInvokeStaticNative + // TODO: MicroJIT: Split this into generateInvokeStaticJava and generateInvokeStaticNative // and call the correct one with required args. Do not turn this method into a mess. if (!resolved || resolved->isNative()) return 0; @@ -2725,9 +2725,9 @@ MJIT::CodeGenerator::generateInvokeStatic(char *buffer, TR_J9ByteCodeIterator *b if (MJIT::nativeSignature(ramMethod, typeString)) return 0; - // TODO: Find the maximum size that will need to be saved - // for calling a method and reserve it in the prologue. - // This could be done during the meta-data gathering phase. + // TODO: MicroJIT: Find the maximum size that will need to be saved + // for calling a method and reserve it in the prologue. + // This could be done during the meta-data gathering phase. U_16 slotCount = MJIT::getMJITStackSlotCount(typeString, maxLength); // Generate template and instantiate immediate values diff --git a/runtime/compiler/optimizer/JProfilingBlock.cpp b/runtime/compiler/optimizer/JProfilingBlock.cpp index 07f7489211e..bf75ab8c814 100644 --- a/runtime/compiler/optimizer/JProfilingBlock.cpp +++ b/runtime/compiler/optimizer/JProfilingBlock.cpp @@ -980,7 +980,7 @@ int32_t TR_JProfilingBlock::perform() if (trace()) parent.dump(comp()); -// TODO: Remove this when ready to implement counter insertion. +// TODO: MicroJIT: Remove this when ready to implement counter insertion #if defined(J9VM_OPT_MICROJIT) J9Method *method = static_cast(comp()->getMethodBeingCompiled())->ramMethod(); if (TR::Options::getJITCmdLineOptions()->_mjitEnabled && comp()->fej9()->isMJITExtendedFlagsMethod(method) && comp()->getMethodBeingCompiled()->isInterpreted()) diff --git a/runtime/compiler/runtime/MetaData.cpp b/runtime/compiler/runtime/MetaData.cpp index 6229a150c2e..eb38212d69c 100644 --- a/runtime/compiler/runtime/MetaData.cpp +++ b/runtime/compiler/runtime/MetaData.cpp @@ -1866,7 +1866,7 @@ createMJITMethodMetaData( // --------------------------------------------------------------------------- // Find unmergeable GC maps - // TODO: MicroJIT: Create mechanism like treetop to support this at a MicroJIT level. 
+ // TODO: MicroJIT: Create mechanism like treetop to support this at a MicroJIT level GCStackMapSet nonmergeableBCI(std::less(), cg->trMemory()->heapMemoryRegion()); // if (comp->getOptions()->getReportByteCodeInfoAtCatchBlock()) // { @@ -1907,7 +1907,7 @@ createMJITMethodMetaData( { return NULL; } - // TODO: once exception support is added to MicroJIT uncomment this line. + // TODO: MicroJIT: once exception support is added to MicroJIT uncomment this line // tableSize += exceptionsSize; uint32_t exceptionTableSize = tableSize; // Size of the meta data header and exception entries @@ -1915,7 +1915,7 @@ // Computing the size of the inlinedCall // - // TODO: Do we care about inline calls? + // TODO: MicroJIT: Do we care about inline calls? uint32_t inlinedCallSize = 0; // Add size of stack atlas to allocate From 55855765e83f07594c3b8df4658a398a3c61cc83 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Wed, 4 May 2022 18:53:56 -0300 Subject: [PATCH 21/25] Removing the ClassTableCriticalSection acquisition Signed-off-by: Harpreet Kaur --- runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index df0d53cb4a2..af914d5f0c3 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -25,7 +25,6 @@ #include "compile/Compilation.hpp" #include "codegen/OMRCodeGenerator.hpp" #include "env/VerboseLog.hpp" -#include "env/ClassTableCriticalSection.hpp" #include "runtime/CodeCacheTypes.hpp" #include "runtime/CodeCache.hpp" #include "runtime/CodeCacheManager.hpp" @@ -303,9 +302,9 @@ TR::CompilationInfoPerThreadBase::mjit( reinterpret_cast(warmMethodHeader)->_metaData = metaData; // FAR: should we do postpone this copying until after CHTable commit?
metaData->runtimeAssumptionList = *(compiler->getMetadataAssumptionList()); - TR::ClassTableCriticalSection chTableCommit(&vm); - TR_ASSERT(!chTableCommit.acquiredVMAccess(), "We should have already acquired VM access at this point."); + TR_CHTable *chTable = compiler->getCHTable(); + TR_ASSERT_FATAL(!chTable || chTable->canSkipCommit(compiler), "MicroJIT should not use CHTable"); if (prologue_address && compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\nMJIT:%s\n", compilee->signature(compiler->trMemory())); From a81b851fcc46f7a3e8ac8c3c53f0a61f158dbfbf Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 5 May 2022 12:06:20 -0300 Subject: [PATCH 22/25] Moving logCompilationSuccess to appropriate place Signed-off-by: Harpreet Kaur --- runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp index af914d5f0c3..51b1b58d0e1 100644 --- a/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp +++ b/runtime/compiler/microjit/CompilationInfoPerThreadBase.cpp @@ -311,13 +311,11 @@ TR::CompilationInfoPerThreadBase::mjit( compiler->cg()->getCodeCache()->trimCodeMemoryAllocation(buffer, (cursor-buffer)); - // END MICROJIT if (compiler->getOption(TR_TraceCG)) trfprintf(logFileFP, "\n------------------------------\nMicroJIT Compiled %s Successfully!\n------------------------------\n", signature); - + logCompilationSuccess(vmThread, vm, method, scratchSegmentProvider, compilee, compiler, metaData, optimizationPlan); } - - logCompilationSuccess(vmThread, vm, method, scratchSegmentProvider, compilee, compiler, metaData, optimizationPlan); + // END MICROJIT TRIGGER_J9HOOK_JIT_COMPILING_END(_jitConfig->hookInterface, vmThread, method); } From 6e2c58a42173cf9f1dc28abdea17a284cdb34864 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Mon, 9 May 2022 11:32:33 -0300 Subject: [PATCH 23/25] Resolving failing copyright check Signed-off-by: Harpreet Kaur --- runtime/compiler/ilgen/J9ByteCodeIterator.hpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/runtime/compiler/ilgen/J9ByteCodeIterator.hpp b/runtime/compiler/ilgen/J9ByteCodeIterator.hpp index 95ee99c631a..07f0372e254 100644 --- a/runtime/compiler/ilgen/J9ByteCodeIterator.hpp +++ b/runtime/compiler/ilgen/J9ByteCodeIterator.hpp @@ -1,5 +1,5 @@ /******************************************************************************* - * Copyright (c) 2000, 2020 IBM Corp. and others + * Copyright (c) 2000, 2022 IBM Corp. 
and others * * This program and the accompanying materials are made available under * the terms of the Eclipse Public License 2.0 which accompanies this From 035a4278edf44ecdba5c3577e599c5902a24e094 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 12 May 2022 12:17:25 -0300 Subject: [PATCH 24/25] Code de-duplication Signed-off-by: Harpreet Kaur --- runtime/compiler/infra/J9Cfg.cpp | 901 ++++++------------------------- runtime/compiler/infra/J9Cfg.hpp | 26 +- 2 files changed, 168 insertions(+), 759 deletions(-) diff --git a/runtime/compiler/infra/J9Cfg.cpp b/runtime/compiler/infra/J9Cfg.cpp index da689a2e6c2..f4c67aff695 100644 --- a/runtime/compiler/infra/J9Cfg.cpp +++ b/runtime/compiler/infra/J9Cfg.cpp @@ -33,9 +33,7 @@ #include "cs2/arrayof.h" #include "cs2/bitvectr.h" #include "cs2/listof.h" -#if defined(J9VM_OPT_MICROJIT) #include "env/j9method.h" -#endif /* defined(J9VM_OPT_MICROJIT) */ #include "env/TRMemory.hpp" #include "env/PersistentInfo.hpp" #include "env/VMJ9.h" @@ -50,9 +48,7 @@ #include "il/SymbolReference.hpp" #include "il/TreeTop.hpp" #include "il/TreeTop_inlines.hpp" -#if defined(J9VM_OPT_MICROJIT) #include "ilgen/J9ByteCodeIterator.hpp" -#endif /* defined(J9VM_OPT_MICROJIT) */ #include "infra/Assert.hpp" #include "infra/Cfg.hpp" #include "infra/Link.hpp" @@ -837,7 +833,7 @@ static int32_t summarizeFrequencyFromPredecessors(TR::CFGNode *node, TR::CFG *cf void -J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() +J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler(bool mjit) { TR::StackMemoryRegion stackMemoryRegion(*trMemory()); int32_t numBlocks = getNextNodeNumber(); @@ -859,12 +855,21 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() bool backEdgeExists = false; TR::TreeTop *methodScanEntry = NULL; //_maxFrequency = 0; - while ( (temp->getSuccessors().size() == 1) || - !temp->asBlock()->getEntry() || - !((temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() - && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + bool loopVar = false; + if (false == mjit) // TR Compilation + { + loopVar = (temp->getSuccessors().size() == 1) || + !temp->asBlock()->getEntry() || + !((temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() + && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table); + } + else // MicroJIT Compilation + { + loopVar = continueLoop(temp); + } + while (loopVar) { if (seenNodes->isSet(temp->getNumber())) break; @@ -906,9 +911,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (!temp->getSuccessors().empty()) { if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) - { temp = temp->asBlock()->getNextBlock(); - } else temp = temp->getSuccessors().front()->getTo(); } @@ -917,58 +920,92 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() } if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Propagation start block_%d\n", temp->getNumber()); + dumpOptDetails(comp(), "Propagation start block_%d\n", temp->getNumber()); if (temp->asBlock()->getEntry()) inlinedSiteIndex = 
temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); - if ((temp->getSuccessors().size() == 2) && - temp->asBlock()->getEntry() && - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() && - !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) - { - getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); - startFrequency = taken + nottaken; - self()->setEdgeFrequenciesOnNode( temp, taken, nottaken, comp()); - } - else if (temp->asBlock()->getEntry() && - (temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + if (false == mjit) { - startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getLastRealTreeTop()->getNode(), comp()); - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Switch with total frequency of %d\n", startFrequency); - setSwitchEdgeFrequenciesOnNode(temp, comp()); + if ((temp->getSuccessors().size() == 2) && + temp->asBlock()->getEntry() && + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() && + !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) + { + getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken, mjit); + startFrequency = taken + nottaken; + self()->setEdgeFrequenciesOnNode(temp, taken, nottaken, comp()); + } + else if (temp->asBlock()->getEntry() && + (temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || + temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) + { + startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getLastRealTreeTop()->getNode(), comp()); + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(), "Switch with total frequency of %d\n", startFrequency); + setSwitchEdgeFrequenciesOnNode(temp, comp()); + } + else + { + if (_calledFrequency > 0) + startFrequency = _calledFrequency; + else if (initialCallScanFreq > 0) + startFrequency = initialCallScanFreq; + else + startFrequency = AVG_FREQ; + + if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) + self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); + else + self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); + } } else { - if (_calledFrequency > 0) - startFrequency = _calledFrequency; - else if (initialCallScanFreq > 0) - startFrequency = initialCallScanFreq; - else + if ((temp->getSuccessors().size() == 2) && + temp->asBlock()->getEntry() && + isBranch(getLastRealBytecodeOfBlock(temp))) { - startFrequency = AVG_FREQ; + getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken, mjit); + startFrequency = taken + nottaken; + self()->setEdgeFrequenciesOnNode(temp, taken, nottaken, comp()); + } + else if (temp->asBlock()->getEntry() && + isTableOp(getBytecodeFromIndex(temp->asBlock()->getExit()->getNode()->getLocalIndex()))) + { + startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getExit()->getNode(), comp()); + if (comp()->getOption(TR_TraceBFGeneration)) + dumpOptDetails(comp(), "Switch with total frequency of %d\n", startFrequency); + setSwitchEdgeFrequenciesOnNode(temp, comp()); } - - if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && 
temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); else - self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); + { + if (_calledFrequency > 0) + startFrequency = _calledFrequency; + else if (initialCallScanFreq > 0) + startFrequency = initialCallScanFreq; + else + startFrequency = AVG_FREQ; + + if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && isBranch(getLastRealBytecodeOfBlock(temp))) + self()->setEdgeFrequenciesOnNode(temp, 0, startFrequency, comp()); + else + self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); + } } - setBlockFrequency (temp, startFrequency); + setBlockFrequency(temp, startFrequency); _initialBlockFrequency = startFrequency; if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); start = temp; // Walk backwards to the start and propagate this frequency if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Propagating start frequency backwards...\n"); + dumpOptDetails(comp(), "Propagating start frequency backwards...\n"); ListIterator upit(&upStack); for (temp = upit.getFirst(); temp; temp = upit.getNext()) @@ -976,18 +1013,18 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (!temp->asBlock()->getEntry()) continue; if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( temp, 0, startFrequency, comp()); + self()->setEdgeFrequenciesOnNode(temp, 0, startFrequency, comp()); else - self()->setUniformEdgeFrequenciesOnNode( temp, startFrequency, false, comp()); - setBlockFrequency (temp, startFrequency); + self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); + setBlockFrequency(temp, startFrequency); seenNodes->set(temp->getNumber()); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); } if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Propagating block frequency forward...\n"); + dumpOptDetails(comp(), "Propagating block frequency forward...\n"); // Walk reverse post-order // we start at the first if or switch statement @@ -1014,11 +1051,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (node->asBlock()->isCold()) { if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Analyzing COLD block_%d\n", node->getNumber()); + dumpOptDetails(comp(), "Analyzing COLD block_%d\n", node->getNumber()); //node->asBlock()->setFrequency(0); int32_t freq = node->getFrequency(); if ((node->getSuccessors().size() == 2) && (freq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( node, 0, freq, comp()); + self()->setEdgeFrequenciesOnNode(node, 0, freq, comp()); else self()->setUniformEdgeFrequenciesOnNode(node, freq, false, comp()); setBlockFrequency (node, freq); @@ -1038,7 +1075,7 @@ 
J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() !node->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) { seenNodesInCycle->empty(); - getInterpreterProfilerBranchCountersOnDoubleton(node, &taken, ¬taken); + getInterpreterProfilerBranchCountersOnDoubleton(node, &taken, ¬taken, mjit); if ((taken <= 0) && (nottaken <= 0)) { @@ -1046,11 +1083,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() nottaken = LOW_FREQ; } - self()->setEdgeFrequenciesOnNode( node, taken, nottaken, comp()); - setBlockFrequency (node, taken + nottaken); + self()->setEdgeFrequenciesOnNode(node, taken, nottaken, comp()); + setBlockFrequency(node, taken + nottaken); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If on node %p is not guard I'm using the taken and nottaken counts for producing block frequency\n", node->asBlock()->getLastRealTreeTop()->getNode()); + dumpOptDetails(comp(), "If on node %p is not guard I'm using the taken and nottaken counts for producing block frequency\n", node->asBlock()->getLastRealTreeTop()->getNode()); } else { @@ -1099,15 +1136,15 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() int32_t sumFreq = summarizeFrequencyFromPredecessors(node, self()); if (sumFreq <= 0) sumFreq = AVG_FREQ; - self()->setEdgeFrequenciesOnNode( node, 0, sumFreq, comp()); - setBlockFrequency (node, sumFreq); + self()->setEdgeFrequenciesOnNode(node, 0, sumFreq, comp()); + setBlockFrequency(node, sumFreq); if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If on node %p is guard I'm using the predecessor frequency sum\n", node->asBlock()->getLastRealTreeTop()->getNode()); + dumpOptDetails(comp(), "If on node %p is guard I'm using the predecessor frequency sum\n", node->asBlock()->getLastRealTreeTop()->getNode()); } if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); } else if (node->asBlock()->getEntry() && (node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || @@ -1116,11 +1153,11 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() seenNodesInCycle->empty(); int32_t sumFreq = _externalProfiler->getSumSwitchCount(node->asBlock()->getLastRealTreeTop()->getNode(), comp()); setSwitchEdgeFrequenciesOnNode(node, comp()); - setBlockFrequency (node, sumFreq); + setBlockFrequency(node, sumFreq); if (comp()->getOption(TR_TraceBFGeneration)) { - dumpOptDetails(comp(),"Found a Switch statement at exit of block_%d\n", node->getNumber()); - dumpOptDetails(comp(),"Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); + dumpOptDetails(comp(), "Found a Switch statement at exit of block_%d\n", node->getNumber()); + dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); } } else @@ -1159,16 +1196,16 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() traceMsg(comp(), "2Setting block and uniform freqs\n"); if ((node->getSuccessors().size() == 2) && (sumFreq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode( node, 0, sumFreq, comp()); + self()->setEdgeFrequenciesOnNode(node, 0, sumFreq, comp()); else - self()->setUniformEdgeFrequenciesOnNode( node, sumFreq, false, comp()); - 
setBlockFrequency (node, sumFreq); + self()->setUniformEdgeFrequenciesOnNode(node, sumFreq, false, comp()); + setBlockFrequency(node, sumFreq); } if (comp()->getOption(TR_TraceBFGeneration)) { - dumpOptDetails(comp(),"Not an if (or unknown if) at exit of block %d (isSingleton=%d)\n", node->getNumber(), (node->getSuccessors().size() == 1)); - dumpOptDetails(comp(),"Set frequency of %d on block %d\n", node->asBlock()->getFrequency(), node->getNumber()); + dumpOptDetails(comp(), "Not an if (or unknown if) at exit of block %d (isSingleton=%d)\n", node->getNumber(), (node->getSuccessors().size() == 1)); + dumpOptDetails(comp(), "Set frequency of %d on block %d\n", node->asBlock()->getFrequency(), node->getNumber()); } } } @@ -1189,7 +1226,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() (succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)))) { - setBlockFrequency ( succ, edge->getFrequency(), true); + setBlockFrequency(succ, edge->getFrequency(), true); // addup this edge to the frequency of the blocks following it // propagate downward until you reach a block that doesn't end in @@ -1209,8 +1246,8 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() if (comp()->getOption(TR_TraceBFGeneration)) traceMsg(comp(), "3Setting block and uniform freqs\n"); - self()->setUniformEdgeFrequenciesOnNode( tempNode, edge->getFrequency(), true, comp()); - setBlockFrequency (tempNode, edge->getFrequency(), true); + self()->setUniformEdgeFrequenciesOnNode(tempNode, edge->getFrequency(), true, comp()); + setBlockFrequency(tempNode, edge->getFrequency(), true); seenGotoNodes->set(tempNode->getNumber()); tempNode = nextTempNode; @@ -1275,7 +1312,7 @@ J9::CFG::setBlockFrequenciesBasedOnInterpreterProfiler() void J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *comp) { - TR_ExternalProfiler* profiler = comp->fej9()->hasIProfilerBlockFrequencyInfo(*comp); + TR_ExternalProfiler *profiler = comp->fej9()->hasIProfilerBlockFrequencyInfo(*comp); if (!profiler) { _initialBlockFrequency = AVG_FREQ; @@ -1329,9 +1366,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *co if (!temp->getSuccessors().empty()) { if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) - { temp = temp->asBlock()->getNextBlock(); - } else temp = temp->getSuccessors().front()->getTo(); } @@ -1347,7 +1382,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *co temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) { - getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken); + getInterpreterProfilerBranchCountersOnDoubleton(temp, &taken, ¬taken, false); if ((taken <= 0) && (nottaken <= 0)) { taken = LOW_FREQ; @@ -1368,9 +1403,7 @@ J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *co else if (initialCallScanFreq > 0) startFrequency = initialCallScanFreq; else - { startFrequency = AVG_FREQ; - } } if (startFrequency <= 0) @@ -1411,9 +1444,13 @@ getParentCallCount(TR::CFG *cfg, TR::Node *node) void -J9::CFG::getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken) +J9::CFG::getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken, bool mjitFlag) { - TR::Node 
*node = cfgNode->asBlock()->getLastRealTreeTop()->getNode(); + TR::Node *node; + if (false == mjitFlag) + node = cfgNode->asBlock()->getLastRealTreeTop()->getNode(); + else // We don't have Real Tree Tops in MicroJIT, so we need to work around that + node = cfgNode->asBlock()->getExit()->getNode(); if (this != comp()->getFlowGraph()) { @@ -1428,7 +1465,7 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, i if (*taken || *nottaken) { if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If on node %p has branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + dumpOptDetails(comp(), "If on node %p has branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } else if (isVirtualGuard(node)) { @@ -1440,7 +1477,7 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, i *nottaken = sumFreq; if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"Guard on node %p has default branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + dumpOptDetails(comp(), "Guard on node %p has default branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } else if (!cfgNode->asBlock()->isCold()) { @@ -1479,7 +1516,7 @@ J9::CFG::getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, i } */ if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(),"If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); + dumpOptDetails(comp(), "If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); } } @@ -1501,44 +1538,49 @@ J9::CFG::setSwitchEdgeFrequenciesOnNode(TR::CFGNode *node, TR::Compilation *comp { TR::Block *block = node->asBlock(); TR::Node *treeNode = node->asBlock()->getLastRealTreeTop()->getNode(); + /* This and a few other methods here in TR differ from their MJIT counterparts only in whether the TR::Node + used for information about bytecode is in getLastRealTreeTop() or getExit() on the asBlock() of the CFGNode. 
+ Hence, use the following if this needs to be implemented for MicroJIT: + TR::Node *treeNode = node->asBlock()->getExit()->getNode(); + */ int32_t sumFrequency = _externalProfiler->getSumSwitchCount(treeNode, comp); if (sumFrequency < 10) { if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Low count switch I'll set frequencies using uniform edge distribution\n"); + dumpOptDetails(comp, "Low count switch I'll set frequencies using uniform edge distribution\n"); - self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); + self()->setUniformEdgeFrequenciesOnNode(node, sumFrequency, false, comp); return; } if (treeNode->getInlinedSiteIndex() < -1) { if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Dummy switch generated in estimate code size I'll set frequencies using uniform edge distribution\n"); + dumpOptDetails(comp, "Dummy switch generated in estimate code size I'll set frequencies using uniform edge distribution\n"); - self()->setUniformEdgeFrequenciesOnNode (node, sumFrequency, false, comp); + self()->setUniformEdgeFrequenciesOnNode(node, sumFrequency, false, comp); return; } if (_externalProfiler->isSwitchProfileFlat(treeNode, comp)) { if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Flat profile switch, setting average frequency on each case.\n"); + dumpOptDetails(comp, "Flat profile switch, setting average frequency on each case.\n"); self()->setUniformEdgeFrequenciesOnNode(node, _externalProfiler->getFlatSwitchProfileCounts(treeNode, comp), false, comp); return; } - for ( int32_t count=1; count < treeNode->getNumChildren(); count++) + for (int32_t count=1; count < treeNode->getNumChildren(); count++) { TR::Node *child = treeNode->getChild(count); TR::CFGEdge *e = getCFGEdgeForNode(node, child); int32_t frequency = _externalProfiler->getSwitchCountForValue (treeNode, (count-1), comp); - e->setFrequency( std::max(frequency,1) ); + e->setFrequency(std::max(frequency,1)); if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp,"Edge %p between %d and %d has freq %d (Switch)\n", e, e->getFrom()->getNumber(), e->getTo()->getNumber(), e->getFrequency()); + dumpOptDetails(comp, "Edge %p between %d and %d has freq %d (Switch)\n", e, e->getFrom()->getNumber(), e->getTo()->getNumber(), e->getFrequency()); } } @@ -1719,12 +1761,13 @@ J9::CFG::propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profile { if (!profiler) return; - self()->setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT(); + self()->setBlockFrequenciesBasedOnInterpreterProfiler(true); + return; } #endif /* J9VM_OPT_MICROJIT */ if (profiler) { - self()->setBlockFrequenciesBasedOnInterpreterProfiler(); + self()->setBlockFrequenciesBasedOnInterpreterProfiler(false); return; } @@ -1779,128 +1822,8 @@ J9::CFG::emitVerbosePseudoRandomFrequencies() } -#if defined(J9VM_OPT_MICROJIT) -/* These methods all use MicroJIT, so we cannot assume all the assumptions about - * availability of meta-data structures (such as IL Trees) that TR can usually make. - * To that end, there are no real tree tops in this CFG, so we must use the exit - * nodes to get the last bytecode index for a given block, and from that get the bytecode. - * Since this will be done no less than 3 times, a helper function is provided. 
- */ -TR_J9ByteCode -J9::CFG::getBytecodeFromIndex(int32_t index) - { - TR_ResolvedJ9Method *method = static_cast(_method->getResolvedMethod()); - TR_J9VMBase *fe = comp()->fej9(); - TR_J9ByteCodeIterator bcIterator(_method, method, fe, comp()); - bcIterator.setIndex(index); - return bcIterator.next(); - } - -TR_J9ByteCode -J9::CFG::getLastRealBytecodeOfBlock(TR::CFGNode *start) - { - return getBytecodeFromIndex(start->asBlock()->getExit()->getNode()->getLocalIndex()); - } - -void -J9::CFG::getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken) - { - // We have not Real Tree Tops, so we need to work around that. - TR::Node *node = cfgNode->asBlock()->getExit()->getNode(); - - if (this != comp()->getFlowGraph()) - { - TR::TreeTop *entry = (cfgNode->asBlock()->getNextBlock()) ? cfgNode->asBlock()->getNextBlock()->getEntry() : NULL; - _externalProfiler->getBranchCounters(node, entry, taken, nottaken, comp()); - } - else - { - getBranchCounters(node, cfgNode->asBlock(), taken, nottaken, comp()); - } - - if (*taken || *nottaken) - { - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "If on node %p has branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); - } - else if (isVirtualGuard(node)) - { - *taken = 0; - *nottaken = AVG_FREQ; - int32_t sumFreq = summarizeFrequencyFromPredecessors(cfgNode, self()); - - if (sumFreq>0) - *nottaken = sumFreq; - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Guard on node %p has default branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); - } - else if (!cfgNode->asBlock()->isCold()) - { - if (node->getBranchDestination()->getNode()->getBlock()->isCold()) - *taken = 0; - else - *taken = LOW_FREQ; - - if (cfgNode->asBlock()->getNextBlock() && - cfgNode->asBlock()->getNextBlock()->isCold()) - *nottaken = 0; - else - *nottaken = LOW_FREQ; - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "If with no profiling information on node %p has low branch counts: taken=%d, not taken=%d\n", node, *taken, *nottaken); - } - } - -void -J9::CFG::setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilation *comp) - { - TR::Block *block = node->asBlock(); - TR::Node *treeNode = node->asBlock()->getExit()->getNode(); - int32_t sumFrequency = _externalProfiler->getSumSwitchCount(treeNode, comp); - - if (sumFrequency < 10) - { - if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp, "Low count switch I'll set frequencies using uniform edge distribution\n"); - - self()->setUniformEdgeFrequenciesOnNode(node, sumFrequency, false, comp); - return; - } - - if (treeNode->getInlinedSiteIndex() < -1) - { - if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp, "Dummy switch generated in estimate code size I'll set frequencies using uniform edge distribution\n"); - - self()->setUniformEdgeFrequenciesOnNode(node, sumFrequency, false, comp); - return; - } - - if (_externalProfiler->isSwitchProfileFlat(treeNode, comp)) - { - if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp, "Flat profile switch, setting average frequency on each case.\n"); - - self()->setUniformEdgeFrequenciesOnNode(node, _externalProfiler->getFlatSwitchProfileCounts(treeNode, comp), false, comp); - return; - } - - for (int32_t count=1; count < treeNode->getNumChildren(); count++) - { - TR::Node *child = treeNode->getChild(count); - TR::CFGEdge *e = getCFGEdgeForNode(node, child); - - int32_t frequency = 
_externalProfiler->getSwitchCountForValue(treeNode, (count-1), comp); - e->setFrequency(std::max(frequency,1)); - if (comp->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp, "Edge %p between %d and %d has freq %d (Switch)\n", e, e->getFrom()->getNumber(), e->getTo()->getNumber(), e->getFrequency()); - } - } - bool -isBranch(TR_J9ByteCode bc) +J9::CFG::isBranch(TR_J9ByteCode bc) { switch(bc) { @@ -1926,8 +1849,9 @@ isBranch(TR_J9ByteCode bc) } } + bool -isTableOp(TR_J9ByteCode bc) +J9::CFG::isTableOp(TR_J9ByteCode bc) { switch(bc) { @@ -1939,6 +1863,31 @@ isTableOp(TR_J9ByteCode bc) } } + +/* These methods all use MicroJIT, so we cannot make all the assumptions about + * availability of meta-data structures (such as IL Trees) that TR can usually make. + * To that end, there are no real tree tops in this CFG, so we must use the exit + * nodes to get the last bytecode index for a given block, and from that get the bytecode. + * Since this will be done no less than 3 times, a helper function is provided. + */ +TR_J9ByteCode +J9::CFG::getBytecodeFromIndex(int32_t index) + { + TR_ResolvedJ9Method *method = static_cast(_method->getResolvedMethod()); + TR_J9VMBase *fe = comp()->fej9(); + TR_J9ByteCodeIterator bcIterator(_method, method, fe, comp()); + bcIterator.setIndex(index); + return bcIterator.next(); + } + + +TR_J9ByteCode +J9::CFG::getLastRealBytecodeOfBlock(TR::CFGNode *start) + { + return getBytecodeFromIndex(start->asBlock()->getExit()->getNode()->getLocalIndex()); + } + + bool J9::CFG::continueLoop(TR::CFGNode *temp) { @@ -1949,531 +1898,3 @@ J9::CFG::continueLoop(TR::CFGNode *temp) } return false; } - -void -J9::CFG::setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT() - { - TR::StackMemoryRegion stackMemoryRegion(*trMemory()); - int32_t numBlocks = getNextNodeNumber(); - - TR_BitVector *seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - TR_BitVector *seenNodesInCycle = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); - int32_t startFrequency = AVG_FREQ; - int32_t taken = AVG_FREQ; - int32_t nottaken = AVG_FREQ; - int32_t initialCallScanFreq = -1; - int32_t inlinedSiteIndex = -1; - - // Walk until we find if or switch statement - TR::CFGNode *start = getStart(); - TR::CFGNode *temp = start; - TR_ScratchList upStack(trMemory()); - - bool backEdgeExists = false; - TR::TreeTop *methodScanEntry = NULL; - //_maxFrequency = 0; - while (continueLoop(temp)) - { - if (seenNodes->isSet(temp->getNumber())) - break; - - upStack.add(temp); - seenNodes->set(temp->getNumber()); - - if (!backEdgeExists && !temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) - { - if (comp()->getHCRMode() != TR::osr) - backEdgeExists = true; - else - { - // Count the number of edges from non OSR blocks - int32_t nonOSREdges = 0; - for (auto edge = temp->getPredecessors().begin(); edge != temp->getPredecessors().end(); ++edge) - { - if ((*edge)->getFrom()->asBlock()->isOSRCodeBlock() || (*edge)->getFrom()->asBlock()->isOSRCatchBlock()) - continue; - if (nonOSREdges > 0) - { - backEdgeExists = true; - break; - } - nonOSREdges++; - } - } - } - - if (temp->asBlock()->getEntry() && initialCallScanFreq < 0) - { - int32_t methodScanFreq = scanForFrequencyOnSimpleMethod(temp->asBlock()->getEntry(), temp->asBlock()->getExit()); - if (methodScanFreq > 0) - initialCallScanFreq = methodScanFreq; - inlinedSiteIndex = 
temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); - methodScanEntry = temp->asBlock()->getEntry(); - } - - if (!temp->getSuccessors().empty()) - { - if ((temp->getSuccessors().size() == 2) && temp->asBlock() && temp->asBlock()->getNextBlock()) - temp = temp->asBlock()->getNextBlock(); - else - temp = temp->getSuccessors().front()->getTo(); - } - else - break; - } - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Propagation start block_%d\n", temp->getNumber()); - - if (temp->asBlock()->getEntry()) - inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); - - if ((temp->getSuccessors().size() == 2) && - temp->asBlock()->getEntry() && - isBranch(getLastRealBytecodeOfBlock(temp))) - { - getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(temp, &taken, ¬taken); - startFrequency = taken + nottaken; - self()->setEdgeFrequenciesOnNode(temp, taken, nottaken, comp()); - } - else if (temp->asBlock()->getEntry() && - isTableOp(getBytecodeFromIndex(temp->asBlock()->getExit()->getNode()->getLocalIndex()))) - { - startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getExit()->getNode(), comp()); - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Switch with total frequency of %d\n", startFrequency); - setSwitchEdgeFrequenciesOnNode(temp, comp()); - } - else - { - if (_calledFrequency > 0) - startFrequency = _calledFrequency; - else if (initialCallScanFreq > 0) - startFrequency = initialCallScanFreq; - else - startFrequency = AVG_FREQ; - - if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && isBranch(getLastRealBytecodeOfBlock(temp))) - self()->setEdgeFrequenciesOnNode(temp, 0, startFrequency, comp()); - else - self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); - } - - setBlockFrequency(temp, startFrequency); - _initialBlockFrequency = startFrequency; - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); - - start = temp; - - // Walk backwards to the start and propagate this frequency - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Propagating start frequency backwards...\n"); - - ListIterator upit(&upStack); - for (temp = upit.getFirst(); temp; temp = upit.getNext()) - { - if (!temp->asBlock()->getEntry()) - continue; - if ((temp->getSuccessors().size() == 2) && (startFrequency > 0) && temp->asBlock()->getEntry() && temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode(temp, 0, startFrequency, comp()); - else - self()->setUniformEdgeFrequenciesOnNode(temp, startFrequency, false, comp()); - setBlockFrequency(temp, startFrequency); - seenNodes->set(temp->getNumber()); - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", temp->asBlock()->getFrequency(), temp->getNumber()); - } - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Propagating block frequency forward...\n"); - - // Walk reverse post-order - // we start at the first if or switch statement - TR_ScratchList stack(trMemory()); - stack.add(start); - - while (!stack.isEmpty()) - { - TR::CFGNode *node = stack.popHead(); - - if (comp()->getOption(TR_TraceBFGeneration)) - traceMsg(comp(), "Considering block_%d\n", node->getNumber()); - - if (seenNodes->isSet(node->getNumber())) - continue; - - 
seenNodes->set(node->getNumber()); - - if (!node->asBlock()->getEntry()) - continue; - - inlinedSiteIndex = node->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); - - if (node->asBlock()->isCold()) - { - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Analyzing COLD block_%d\n", node->getNumber()); - //node->asBlock()->setFrequency(0); - int32_t freq = node->getFrequency(); - if ((node->getSuccessors().size() == 2) && (freq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode(node, 0, freq, comp()); - else - self()->setUniformEdgeFrequenciesOnNode(node, freq, false, comp()); - setBlockFrequency (node, freq); - //continue; - } - - // ignore the first node - if (!node->asBlock()->isCold() && (node != start)) - { - if ((node->getSuccessors().size() == 2) && - node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - { - // if it's an if and it's not JIT inserted artificial if then just - // read the outgoing branches and put that as node frequency - // else get frequency from predecessors - if (!isVirtualGuard(node->asBlock()->getLastRealTreeTop()->getNode()) && - !node->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) - { - seenNodesInCycle->empty(); - getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(node, &taken, ¬taken); - - if ((taken <= 0) && (nottaken <= 0)) - { - taken = LOW_FREQ; - nottaken = LOW_FREQ; - } - - self()->setEdgeFrequenciesOnNode(node, taken, nottaken, comp()); - setBlockFrequency(node, taken + nottaken); - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "If on node %p is not guard I'm using the taken and nottaken counts for producing block frequency\n", node->asBlock()->getLastRealTreeTop()->getNode()); - } - else - { - if (comp()->getOption(TR_TraceBFGeneration)) - { - TR_PredecessorIterator pit(node); - for (TR::CFGEdge *edge = pit.getFirst(); edge; edge = pit.getNext()) - { - TR::CFGNode *pred = edge->getFrom(); - - if (edge->getFrequency()<=0) - continue; - - int32_t edgeFreq = edge->getFrequency(); - - if (comp()->getOption(TR_TraceBFGeneration)) - traceMsg(comp(), "11Pred %d has freq %d\n", pred->getNumber(), edgeFreq); - } - } - - TR::CFGNode *predNotSet = NULL; - TR_PredecessorIterator pit(node); - for (TR::CFGEdge *edge = pit.getFirst(); edge; edge = pit.getNext()) - { - TR::CFGNode *pred = edge->getFrom(); - - if (pred->getFrequency() < 0) - { - predNotSet = pred; - break; - } - } - - if (predNotSet && - !seenNodesInCycle->get(node->getNumber())) - { - stack.add(predNotSet); - seenNodesInCycle->set(node->getNumber()); - seenNodes->reset(node->getNumber()); - continue; - } - - if (!predNotSet) - seenNodesInCycle->empty(); - - int32_t sumFreq = summarizeFrequencyFromPredecessors(node, self()); - if (sumFreq <= 0) - sumFreq = AVG_FREQ; - self()->setEdgeFrequenciesOnNode(node, 0, sumFreq, comp()); - setBlockFrequency(node, sumFreq); - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "If on node %p is guard I'm using the predecessor frequency sum\n", node->asBlock()->getLastRealTreeTop()->getNode()); - } - - if (comp()->getOption(TR_TraceBFGeneration)) - dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); - } - else if (node->asBlock()->getEntry() && - (node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - 
node->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) - { - seenNodesInCycle->empty(); - int32_t sumFreq = _externalProfiler->getSumSwitchCount(node->asBlock()->getLastRealTreeTop()->getNode(), comp()); - setSwitchEdgeFrequenciesOnNode(node, comp()); - setBlockFrequency(node, sumFreq); - if (comp()->getOption(TR_TraceBFGeneration)) - { - dumpOptDetails(comp(), "Found a Switch statement at exit of block_%d\n", node->getNumber()); - dumpOptDetails(comp(), "Set frequency of %d on block_%d\n", node->asBlock()->getFrequency(), node->getNumber()); - } - } - else - { - TR::CFGNode *predNotSet = NULL; - int32_t sumFreq = 0; - TR_PredecessorIterator pit(node); - for (TR::CFGEdge *edge = pit.getFirst(); edge; edge = pit.getNext()) - { - TR::CFGNode *pred = edge->getFrom(); - - if (pred->getFrequency() < 0) - { - predNotSet = pred; - break; - } - - int32_t edgeFreq = edge->getFrequency(); - sumFreq += edgeFreq; - } - - if (predNotSet && - !seenNodesInCycle->get(node->getNumber())) - { - seenNodesInCycle->set(node->getNumber()); - seenNodes->reset(node->getNumber()); - stack.add(predNotSet); - continue; - } - else - { - if (!predNotSet) - seenNodesInCycle->empty(); - - if (comp()->getOption(TR_TraceBFGeneration)) - traceMsg(comp(), "2Setting block and uniform freqs\n"); - - if ((node->getSuccessors().size() == 2) && (sumFreq > 0) && node->asBlock()->getEntry() && node->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) - self()->setEdgeFrequenciesOnNode(node, 0, sumFreq, comp()); - else - self()->setUniformEdgeFrequenciesOnNode(node, sumFreq, false, comp()); - setBlockFrequency(node, sumFreq); - } - - if (comp()->getOption(TR_TraceBFGeneration)) - { - dumpOptDetails(comp(), "Not an if (or unknown if) at exit of block %d (isSingleton=%d)\n", node->getNumber(), (node->getSuccessors().size() == 1)); - dumpOptDetails(comp(), "Set frequency of %d on block %d\n", node->asBlock()->getFrequency(), node->getNumber()); - } - } - } - - TR_SuccessorIterator sit(node); - // add the successors on the list - for (TR::CFGEdge *edge = sit.getFirst(); edge; edge = sit.getNext()) - { - TR::CFGNode *succ = edge->getTo(); - - if (!seenNodes->isSet(succ->getNumber())) - stack.add(succ); - else - { - if (!(((succ->getSuccessors().size() == 2) && - succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch()) || - (succ->asBlock()->getEntry() && - (succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - succ->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)))) - { - setBlockFrequency(succ, edge->getFrequency(), true); - - // addup this edge to the frequency of the blocks following it - // propagate downward until you reach a block that doesn't end in - // goto or cycle - if (succ->getSuccessors().size() == 1) - { - TR::CFGNode *tempNode = succ->getSuccessors().front()->getTo(); - TR_BitVector *seenGotoNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - seenGotoNodes->set(succ->getNumber()); - while ((tempNode->getSuccessors().size() == 1) && - (tempNode != succ) && - (tempNode != getEnd()) && - !seenGotoNodes->isSet(tempNode->getNumber())) - { - TR::CFGNode *nextTempNode = tempNode->getSuccessors().front()->getTo(); - - if (comp()->getOption(TR_TraceBFGeneration)) - traceMsg(comp(), "3Setting block and uniform freqs\n"); - - self()->setUniformEdgeFrequenciesOnNode(tempNode, edge->getFrequency(), true, comp()); - setBlockFrequency(tempNode, edge->getFrequency(), true); - - 
seenGotoNodes->set(tempNode->getNumber()); - tempNode = nextTempNode; - } - } - } - } - } - } - - - if (backEdgeExists) - { - while (!upStack.isEmpty()) - { - TR::CFGNode *temp = upStack.popHead(); - if (temp == start) - continue; - - if (backEdgeExists) - temp->setFrequency(std::max(MAX_COLD_BLOCK_COUNT+1, startFrequency/10)); - } - } - - - // now set all the blocks that haven't been seen to have 0 frequency - // catch blocks are one example, we don't try to set frequency there to - // save compilation time, since those are most frequently cold. Future might prove - // this otherwise. - - TR::CFGNode *node; - start = getStart(); - for (node = getFirstNode(); node; node=node->getNext()) - { - if ((node == getEnd()) || (node == start)) - node->asBlock()->setFrequency(0); - - if (seenNodes->isSet(node->getNumber())) - continue; - - if (node->asBlock()->getEntry() && - (node->asBlock()->getFrequency()<0)) - node->asBlock()->setFrequency(0); - } - - //scaleEdgeFrequencies(); - - TR_BitVector *nodesToBeNormalized = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - nodesToBeNormalized->setAll(numBlocks); - _maxFrequency = -1; - _maxEdgeFrequency = -1; - normalizeFrequencies(nodesToBeNormalized); - _oldMaxFrequency = _maxFrequency; - _oldMaxEdgeFrequency = _maxEdgeFrequency; - - if (comp()->getOption(TR_TraceBFGeneration)) - traceMsg(comp(), "max freq %d max edge freq %d\n", _maxFrequency, _maxEdgeFrequency); - - } - -void -J9::CFG::computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compilation *comp) - { - TR_ExternalProfiler *profiler = comp->fej9()->hasIProfilerBlockFrequencyInfo(*comp); - if (!profiler) - { - _initialBlockFrequency = AVG_FREQ; - return; - } - - _externalProfiler = profiler; - - TR::StackMemoryRegion stackMemoryRegion(*trMemory()); - int32_t numBlocks = getNextNodeNumber(); - - TR_BitVector *seenNodes = new (trStackMemory()) TR_BitVector(numBlocks, trMemory(), stackAlloc, notGrowable); - _frequencySet = new (trHeapMemory()) TR_BitVector(numBlocks, trMemory(), heapAlloc, notGrowable); - int32_t startFrequency = AVG_FREQ; - int32_t taken = AVG_FREQ; - int32_t nottaken = AVG_FREQ; - int32_t initialCallScanFreq = -1; - int32_t inlinedSiteIndex = -1; - - // Walk until we find if or switch statement - TR::CFGNode *start = getStart(); - TR::CFGNode *temp = start; - - bool backEdgeExists = false; - TR::TreeTop *methodScanEntry = NULL; - - while ((temp->getSuccessors().size() == 1) || - !temp->asBlock()->getEntry() || - !((temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() - && !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) - { - if (seenNodes->isSet(temp->getNumber())) - break; - - seenNodes->set(temp->getNumber()); - - if (!temp->getPredecessors().empty() && !(temp->getPredecessors().size() == 1)) - backEdgeExists = true; - - if (temp->asBlock()->getEntry() && initialCallScanFreq < 0) - { - int32_t methodScanFreq = scanForFrequencyOnSimpleMethod(temp->asBlock()->getEntry(), temp->asBlock()->getExit()); - if (methodScanFreq > 0) - initialCallScanFreq = methodScanFreq; - inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); - methodScanEntry = temp->asBlock()->getEntry(); - } - - if (!temp->getSuccessors().empty()) - { - if ((temp->getSuccessors().size() == 2) && temp->asBlock() && 
temp->asBlock()->getNextBlock()) - temp = temp->asBlock()->getNextBlock(); - else - temp = temp->getSuccessors().front()->getTo(); - } - else - break; - } - - if (temp->asBlock()->getEntry()) - inlinedSiteIndex = temp->asBlock()->getEntry()->getNode()->getInlinedSiteIndex(); - - if ((temp->getSuccessors().size() == 2) && - temp->asBlock()->getEntry() && - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCode().isBranch() && - !temp->asBlock()->getLastRealTreeTop()->getNode()->getByteCodeInfo().doNotProfile()) - { - getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(temp, &taken, ¬taken); - if ((taken <= 0) && (nottaken <= 0)) - { - taken = LOW_FREQ; - nottaken = LOW_FREQ; - } - startFrequency = taken + nottaken; - } - else if (temp->asBlock()->getEntry() && - (temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::lookup || - temp->asBlock()->getLastRealTreeTop()->getNode()->getOpCodeValue() == TR::table)) - { - startFrequency = _externalProfiler->getSumSwitchCount(temp->asBlock()->getLastRealTreeTop()->getNode(), comp); - } - else - { - if (_calledFrequency > 0) - startFrequency = _calledFrequency; - else if (initialCallScanFreq > 0) - startFrequency = initialCallScanFreq; - else - startFrequency = AVG_FREQ; - } - - if (startFrequency <= 0) - startFrequency = LOW_FREQ; - - _initialBlockFrequency = startFrequency; - } -#endif /* defined(J9VM_OPT_MICROJIT) */ diff --git a/runtime/compiler/infra/J9Cfg.hpp b/runtime/compiler/infra/J9Cfg.hpp index a863168f36c..83f514a2839 100644 --- a/runtime/compiler/infra/J9Cfg.hpp +++ b/runtime/compiler/infra/J9Cfg.hpp @@ -39,9 +39,7 @@ namespace J9 { typedef J9::CFG CFGConnector; } #include "cs2/listof.h" #include "env/TRMemory.hpp" #include "il/Node.hpp" -#if defined(J9VM_OPT_MICROJIT) #include "ilgen/J9ByteCode.hpp" -#endif /* defined(J9VM_OPT_MICROJIT) */ #include "infra/Assert.hpp" #include "infra/List.hpp" #include "infra/TRCfgEdge.hpp" @@ -89,25 +87,10 @@ class CFG : public OMR::CFGConnector void setBlockAndEdgeFrequenciesBasedOnStructure(); TR_BitVector *setBlockAndEdgeFrequenciesBasedOnJITProfiler(); - void setBlockFrequenciesBasedOnInterpreterProfiler(); + void setBlockFrequenciesBasedOnInterpreterProfiler(bool mjit); // mjit set to true for MicroJIT compilations; false otherwise void computeInitialBlockFrequencyBasedOnExternalProfiler(TR::Compilation *comp); - // May need MicroJIT versions of each of previous 3 methods in case they are called from outside MicroJIT on MJIT methods - #if defined(J9VM_OPT_MICROJIT) - // A number of methods here differ from their TR counter parts only in whether the TR::Node - // used for information about bytecode is in getLastRealTreeTop(), or getExit() on the asBlock() - // of the cfgNode. Once this is working, first clean-up step will be to have these methods - // take the node as a parameter instead, to reduce code duplication. - TR_J9ByteCode getBytecodeFromIndex(int32_t index); // Helper for getting bytecode from an index - TR_J9ByteCode getLastRealBytecodeOfBlock(TR::CFGNode *start); // Helper for getting last byteode in the CFGNode's block - //TR_BitVector * setBlockAndEdgeFrequenciesBasedOnJITProfilerMicroJIT(); Uncomment after implementing in the cpp file. 
- void getInterpreterProfilerBranchCountersOnDoubletonMicroJIT(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken); - void setSwitchEdgeFrequenciesOnNodeMicroJIT(TR::CFGNode *node, TR::Compilation *comp); - bool continueLoop(TR::CFGNode *temp); - void setBlockFrequenciesBasedOnInterpreterProfilerMicroJIT(); - void computeInitialBlockFrequencyBasedOnExternalProfilerMicroJIT(TR::Compilation *comp); - #endif /* defined(J9VM_OPT_MICROJIT) */ void propagateFrequencyInfoFromExternalProfiler(TR_ExternalProfiler *profiler); - void getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken); + void getInterpreterProfilerBranchCountersOnDoubleton(TR::CFGNode *cfgNode, int32_t *taken, int32_t *nottaken, bool mjitFlag); void setSwitchEdgeFrequenciesOnNode(TR::CFGNode *node, TR::Compilation *comp); void setBlockFrequency(TR::CFGNode *node, int32_t frequency, bool addFrequency = false); int32_t scanForFrequencyOnSimpleMethod(TR::TreeTop *tt, TR::TreeTop *endTT); @@ -116,6 +99,11 @@ class CFG : public OMR::CFGConnector void getBranchCountersFromProfilingData(TR::Node *node, TR::Block *block, int32_t *taken, int32_t *notTaken); bool emitVerbosePseudoRandomFrequencies(); + bool isBranch(TR_J9ByteCode bc); + bool isTableOp(TR_J9ByteCode bc); + TR_J9ByteCode getBytecodeFromIndex(int32_t index); // Helper for getting bytecode from an index + TR_J9ByteCode getLastRealBytecodeOfBlock(TR::CFGNode *start); // Helper for getting last byteode in the CFGNode's block + bool continueLoop(TR::CFGNode *temp); protected: From 0de87ac1bac7f20262b3e2b5f42d044564704c24 Mon Sep 17 00:00:00 2001 From: Harpreet Kaur Date: Thu, 12 May 2022 20:53:34 -0300 Subject: [PATCH 25/25] Rebase changes Signed-off-by: Harpreet Kaur --- runtime/cmake/platform/toolcfg/gnu.cmake | 2 +- runtime/compiler/microjit/HWMetaDataCreation.cpp | 16 ++++++++-------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/runtime/cmake/platform/toolcfg/gnu.cmake b/runtime/cmake/platform/toolcfg/gnu.cmake index b3b8bf93b22..aaba270e2a2 100644 --- a/runtime/cmake/platform/toolcfg/gnu.cmake +++ b/runtime/cmake/platform/toolcfg/gnu.cmake @@ -1,5 +1,5 @@ ################################################################################ -# Copyright (c) 2020, 2021 IBM Corp. and others +# Copyright (c) 2020, 2022 IBM Corp. 
and others # # This program and the accompanying materials are made available under # the terms of the Eclipse Public License 2.0 which accompanies this diff --git a/runtime/compiler/microjit/HWMetaDataCreation.cpp b/runtime/compiler/microjit/HWMetaDataCreation.cpp index 95a5dc36bef..4c746ac7b2e 100644 --- a/runtime/compiler/microjit/HWMetaDataCreation.cpp +++ b/runtime/compiler/microjit/HWMetaDataCreation.cpp @@ -182,7 +182,7 @@ class BytecodeIndexList return this; if (_next) return _next->appendBCI(i); - BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); + BytecodeIndexList *newIndex = new (_cfg.getInternalMemoryRegion()) BytecodeIndexList(i, this, _next, _cfg); _next = newIndex; return newIndex; } @@ -202,7 +202,7 @@ class BytecodeIndexList } else { - BytecodeIndexList *newIndex = new (_cfg.getInternalRegion()) BytecodeIndexList(i, this, _next, _cfg); + BytecodeIndexList *newIndex = new (_cfg.getInternalMemoryRegion()) BytecodeIndexList(i, this, _next, _cfg); return newIndex; } } @@ -258,7 +258,7 @@ class BlockList ,_range(start) ,_block(NULL) { - _bcListHead = new (_cfg.getInternalRegion()) BytecodeIndexList(start, NULL, NULL, _cfg); + _bcListHead = new (_cfg.getInternalMemoryRegion()) BytecodeIndexList(start, NULL, NULL, _cfg); _bcListCurrent = _bcListHead; } @@ -296,7 +296,7 @@ class BlockList inline void addSuccessor(BytecodeRange *successor) { - BytecodeRangeList *brl = new (_cfg.getInternalRegion()) BytecodeRangeList(successor); + BytecodeRangeList *brl = new (_cfg.getInternalMemoryRegion()) BytecodeRangeList(successor); if (_successorList) _successorList = _successorList->insert(brl); else @@ -346,14 +346,14 @@ class BlockList // The target isn't before this block, isn't this block, isn't in this block, and there is no more list to search. // Time to make a new block and add it to the list - BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); + BlockList *target = new (_cfg.getInternalMemoryRegion()) BlockList(_cfg, index, _tail); insertAfter(target); return target; } BlockList *splitBlockListOnIndex(uint32_t index) { - BlockList *target = new (_cfg.getInternalRegion()) BlockList(_cfg, index, _tail); + BlockList *target = new (_cfg.getInternalMemoryRegion()) BlockList(_cfg, index, _tail); insertAfter(target); BytecodeIndexList *splitPoint = _bcListHead->getBCIListEntry(index); @@ -425,7 +425,7 @@ class CFGCreator // Just started parsing, make the block list. if (!_head) { - _head = new (_cfg.getInternalRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); + _head = new (_cfg.getInternalMemoryRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); _tail = _head; _current = _head; } @@ -447,7 +447,7 @@ class CFGCreator if (_newBlock) { // Current block has ended in a branch, this is a new block now - nextBlock = new (_cfg.getInternalRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); + nextBlock = new (_cfg.getInternalMemoryRegion()) BlockList(_cfg, bci->currentByteCodeIndex(), &_tail); _current->insertAfter(nextBlock); if(_fallThrough) _current->addSuccessor(&(nextBlock->_range));
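Patch 24 folds the MicroJIT-specific CFG frequency methods into their TR counterparts by threading a single mjit flag through setBlockFrequenciesBasedOnInterpreterProfiler and getInterpreterProfilerBranchCountersOnDoubleton; per the comment retained in setSwitchEdgeFrequenciesOnNode, the variants differ mainly in whether the profiled TR::Node is read from getLastRealTreeTop() or from getExit() on the block. The following is a minimal, self-contained sketch of that de-duplication pattern only; the Node, Block, profiledNodeOf, and reportBranchCounters names are stand-ins invented for illustration and are not OpenJ9 APIs.

#include <cstdio>

// Stand-ins for the two ways a block can expose its final profiled node.
struct Node { int byteCodeIndex; };
struct Block
   {
   Node lastRealTreeTopNode; // populated only for full TR compilations
   Node exitNode;            // always available; the MicroJIT path reads this one
   };

// The single differing step is selected by a flag, mirroring the merged
// getInterpreterProfilerBranchCountersOnDoubleton(..., bool mjitFlag) signature.
static const Node *profiledNodeOf(const Block &block, bool mjit)
   {
   return mjit ? &block.exitNode : &block.lastRealTreeTopNode;
   }

static void reportBranchCounters(const Block &block, bool mjit)
   {
   const Node *node = profiledNodeOf(block, mjit);
   std::printf("profiling node at bytecode index %d (%s path)\n",
               node->byteCodeIndex, mjit ? "MicroJIT" : "TR");
   }

int main()
   {
   Block block = { {12}, {17} };
   reportBranchCounters(block, false); // TR: last real tree top node
   reportBranchCounters(block, true);  // MicroJIT: exit node
   return 0;
   }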