[JIT] Add legacy extended EVEX encoding and EVEX.ND/NF feature to x64 emitter backend #108796

Open: wants to merge 89 commits into base: main

Commits (89)
1820567
Ruihan: POC with REX2
Ruihan-Yin Mar 25, 2024
d1afc68
resolve comments
Ruihan-Yin May 17, 2024
2335aa3
refactor register encoding for REX2
Ruihan-Yin May 20, 2024
6578c58
merge REX2 path to legacy path
Ruihan-Yin May 21, 2024
01eeb80
Enable REX2 in more instructions.
Ruihan-Yin May 30, 2024
690aee3
Avoid repeatedly estimate the size of REX2 prefix
Ruihan-Yin Jun 3, 2024
31d7fb4
Enable REX2 encoding on RI and SV path
Ruihan-Yin Jun 5, 2024
a995878
Add rex2 support to rotate and shift.
Ruihan-Yin Jun 6, 2024
74aacf6
CR session.
Ruihan-Yin Jun 7, 2024
c330927
Testing infra updates: assert REX2 is enabled.
Ruihan-Yin Jun 11, 2024
fbf20d1
revert rcl_N and rcr_N, tp and latency data for these instructions is…
Ruihan-Yin Jun 11, 2024
ea02e70
partially enable REX2 on emitOutputAM, case covered: R_AR and AR_R.
Ruihan-Yin Jun 12, 2024
c74b801
Adding unit tests.
Ruihan-Yin Jun 13, 2024
34980b4
push, pop, inc, dec, neg, not, xadd, shld, shrd, cmpxchg, setcc, bswap.
Ruihan-Yin Jun 26, 2024
2ffdbeb
bug fix for bswap
Ruihan-Yin Jun 27, 2024
3a729bb
bt
Ruihan-Yin Jun 28, 2024
d943b03
xchg, idiv
Ruihan-Yin Jul 1, 2024
c8fee9c
Make sure add REX2 prefix if register encoding for EGPRs are being ca…
Ruihan-Yin Jul 2, 2024
6ec0e97
Ensure code size is correctly computed in R_R_I path.
Ruihan-Yin Jul 8, 2024
1d01003
clean up
Ruihan-Yin Jul 9, 2024
1acc219
Change all AddSimdPrefix to AddX86Prefix
Ruihan-Yin Jul 15, 2024
87ad443
div, mulEAX
Ruihan-Yin Jul 16, 2024
bb9905a
filter out test from REX2 encoding when using ACC form.
Ruihan-Yin Jul 19, 2024
86083b2
Make sure REX prefix will not be added when emitting with REX2.
Ruihan-Yin Jul 24, 2024
dfe8760
resolve comments.
Ruihan-Yin Aug 5, 2024
64761cd
make sure the APX debug knob is only available under debug build.
Ruihan-Yin Oct 24, 2024
f1aba62
clean up some out-dated code.
Ruihan-Yin Nov 12, 2024
f5cc5a8
enable movsxd
Ruihan-Yin Nov 12, 2024
7ca8433
Enable "Call"
Ruihan-Yin Nov 13, 2024
bc4d225
Enable "JMP"
Ruihan-Yin Nov 15, 2024
deb3814
resolve merge errors
Ruihan-Yin Nov 18, 2024
0d63230
formatting
Ruihan-Yin Nov 18, 2024
13b8076
remote coredistools.dll for internal tests only
Ruihan-Yin Nov 18, 2024
42c6cfc
bug fix
Ruihan-Yin Nov 19, 2024
b1a9617
SUB reg, reg, reg
Ruihan-Yin Aug 8, 2024
ec5d5ca
enable NDD on genCodeForBinary
Ruihan-Yin Aug 28, 2024
ebeaf04
consolidate TakesLegacyPromotedEvexPrefix logics.
Ruihan-Yin Aug 30, 2024
547f01d
ensure register encoding is correct under legacy-promoted-evex encoding.
Ruihan-Yin Aug 30, 2024
3566464
Make sure the overflow check is correctly emitted.
Ruihan-Yin Sep 4, 2024
f8e9c4d
simplify the compiler setup logics.
Ruihan-Yin Sep 4, 2024
6bfd050
emitInsNddBinary
Ruihan-Yin Sep 6, 2024
4b0085d
make sure REX will not be added when EVEX presents.
Ruihan-Yin Sep 7, 2024
5701b1c
resolve comment and clean up.
Ruihan-Yin Sep 11, 2024
6d30388
enable more NDD instructions.
Ruihan-Yin Sep 13, 2024
5d3768c
bug fixes
Ruihan-Yin Sep 13, 2024
a5619e4
enable imul
Ruihan-Yin Sep 13, 2024
c71ace6
add emitter unit tests, and fix encoding error for CMOVcc
Ruihan-Yin Sep 16, 2024
ca92da9
bug fixes:
Ruihan-Yin Sep 18, 2024
5d10aef
refactor emitInsBinary
Ruihan-Yin Sep 19, 2024
5f288a6
clean up
Ruihan-Yin Sep 19, 2024
f4e96b0
clean up and refactor some code
Ruihan-Yin Sep 20, 2024
637c413
make sure the code size estimation is correct for some apx promoted i…
Ruihan-Yin Sep 25, 2024
a203a4d
add tuning knob to EVEX.ND feature.
Ruihan-Yin Sep 30, 2024
a99705a
flip the Evex.nd knob.
Ruihan-Yin Oct 1, 2024
b5fa5bf
put NDD control knob to the correct place.
Ruihan-Yin Oct 3, 2024
b69d01e
resolve merge errors
Ruihan-Yin Nov 20, 2024
52539c3
Make sure APX related knobs are defined properly across platforms
Ruihan-Yin Nov 20, 2024
25d66bf
Add Evex.nf to instrDesc
Ruihan-Yin Oct 2, 2024
a19da9e
{nf} add reg, reg
Ruihan-Yin Oct 8, 2024
2e8d714
Enable EVEX.NF in more instructions
Ruihan-Yin Oct 9, 2024
df59342
more instructions
Ruihan-Yin Oct 10, 2024
226fabb
comments.
Ruihan-Yin Oct 10, 2024
36c6631
lzcnt, tzcnt, popcnt
Ruihan-Yin Oct 10, 2024
5f8a01d
Exclude ACC form from EVEX promotion.
Ruihan-Yin Oct 15, 2024
0453630
BMI instructions.
Ruihan-Yin Oct 15, 2024
07868bc
bug fixes
Ruihan-Yin Oct 16, 2024
69f7e8b
Tweak the code size calculation to make sure REX2 and APX-EVEX are pr…
Ruihan-Yin Oct 18, 2024
1c1a894
bug fixes for stress mode
Ruihan-Yin Oct 29, 2024
1be4b12
Add idEvexNoPromotion to emitter to exclude the APX-EVEX promotion fr…
Ruihan-Yin Nov 4, 2024
bfb06c7
resolve merge error
Ruihan-Yin Nov 20, 2024
9541a99
fix merge error
Ruihan-Yin Nov 21, 2024
543d949
Revert "Add idEvexNoPromotion to emitter to exclude the APX-EVEX prom…
Ruihan-Yin Nov 21, 2024
a879019
bug fix
Ruihan-Yin Nov 22, 2024
55cbda6
introduce _no_evex suffix for some instructions for cases when LOCK w…
Ruihan-Yin Nov 22, 2024
a9a3d5c
Merge remote-tracking branch 'origin/main' into apx-evex-nf-nov
Ruihan-Yin Dec 17, 2024
0eef560
resolve merge comflict
Ruihan-Yin Dec 17, 2024
0480c02
fix merge error.
Ruihan-Yin Dec 17, 2024
48cec5f
fix comments and some checks.
Ruihan-Yin Dec 19, 2024
7171e0e
formatting
Ruihan-Yin Dec 19, 2024
5f7606c
remove unneeded env var.
Ruihan-Yin Dec 19, 2024
6e33640
Make sure the BMI instruction is properly hidden behind APX stress kn…
Ruihan-Yin Jan 3, 2025
02786a1
resolve merge error.
Ruihan-Yin Jan 3, 2025
924ba0e
Resolve comments.
Ruihan-Yin Jan 6, 2025
7bc388b
formatting.
Ruihan-Yin Jan 7, 2025
4bf45ae
Merge remote-tracking branch 'origin/main' into apx-evex-legacy-jan
Ruihan-Yin Jan 23, 2025
fd73268
Add rcr/rcl emitter unit tests for extended EVEX.
Ruihan-Yin Jan 23, 2025
521f978
Merge remote-tracking branch 'origin/main' into apx-evex-legacy-jan
Ruihan-Yin Jan 24, 2025
b48e3d1
Resolve merge error.
Ruihan-Yin Jan 24, 2025
5340893
formatting
Ruihan-Yin Jan 24, 2025
213 changes: 194 additions & 19 deletions src/coreclr/jit/codegenxarch.cpp
@@ -402,12 +402,13 @@ void CodeGen::instGen_Set_Reg_To_Imm(emitAttr size,
else
{
// For section constant, the immediate will be relocatable
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm DEBUGARG(targetHandle) DEBUGARG(gtFlags));
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm,
INS_OPTS_NONE DEBUGARG(targetHandle) DEBUGARG(gtFlags));
}
}
else
{
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm DEBUGARG(targetHandle) DEBUGARG(gtFlags));
GetEmitter()->emitIns_R_I(INS_mov, size, reg, imm, INS_OPTS_NONE DEBUGARG(targetHandle) DEBUGARG(gtFlags));
}
}
regSet.verifyRegUsed(reg);
@@ -738,12 +739,18 @@ void CodeGen::genCodeForNegNot(GenTree* tree)
{
GenTree* operand = tree->gtGetOp1();
assert(operand->isUsedFromReg());
regNumber operandReg = genConsumeReg(operand);
regNumber operandReg = genConsumeReg(operand);
instruction ins = genGetInsForOper(tree->OperGet(), targetType);

inst_Mov(targetType, targetReg, operandReg, /* canSkip */ true);

instruction ins = genGetInsForOper(tree->OperGet(), targetType);
inst_RV(ins, targetReg, targetType);
if (GetEmitter()->DoJitUseApxNDD(ins) && (targetReg != operandReg))
{
GetEmitter()->emitIns_R_R(ins, emitTypeSize(operand), targetReg, operandReg, INS_OPTS_EVEX_nd);
}
else
{
inst_Mov(targetType, targetReg, operandReg, /* canSkip */ true);
inst_RV(ins, targetReg, targetType);
}
Comment on lines +745 to +753
Member @tannergooding, Jan 24, 2025:

This general pattern is repeated quite a lot (with some variations), so I wonder if we should have a helper like I added for SIMD.

For example, we have emitIns_SIMD_R_R_R which looks like: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emitxarch.cpp#L8855-L8880 (other variations exist for handling things like memory operands or immediate; and higher level helpers like genHWIntrinsic_R_R_RM exist for determining which of the variations to call between emitIns_SIMD_R_R_R, emitIns_SIMD_R_R_A, emitIns_SIMD_R_R_C, and emitIns_SIMD_R_R_S)

This lets us correctly represent any SIMD dst = src1 op src2 operation given the raw registers and then internally handles the RMW consideration, so that the rest of codegen can remain simpler and more readable.

Member:

In this case, for example, it seems like we "could" have simplified this down to something like:

GetEmitter()->emitIns_BASE_R_R(ins, emitTypeSize(operand), targetReg, operandReg);

and then had this helper handle the distinction between APX/NDD and inserting the Mov for the regular case, etc.

Presumably this would also make the diffs for other APX support much simpler as well, since we have fewer centralized helpers to update.
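
To make the suggestion concrete, here is a minimal sketch of what such a centralized helper could look like. It is a toy model, not JIT code: `ToyEmitter`, `Reg`, `useNdd`, and the text-based output are assumptions made purely for illustration. Only the name `emitIns_BASE_R_R` comes from the comment above, and its real signature has not been decided in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: none of these types exist in the JIT.
enum class Reg { RAX, RCX, R10 };

static std::string regName(Reg r)
{
    switch (r)
    {
        case Reg::RAX: return "rax";
        case Reg::RCX: return "rcx";
        case Reg::R10: return "r10";
    }
    return "?";
}

struct ToyEmitter
{
    bool useNdd = false;          // stands in for DoJitUseApxNDD(ins)
    std::vector<std::string> out; // emitted "instructions", as text

    // Hypothetical helper: dst = op(src) for a unary RMW instruction (neg, not, ...).
    void emitIns_BASE_R_R(const std::string& ins, Reg dst, Reg src)
    {
        if (useNdd && (dst != src))
        {
            // The NDD form writes a separate destination, so no mov is needed.
            out.push_back(ins + " " + regName(dst) + ", " + regName(src));
            return;
        }
        if (dst != src)
        {
            // The legacy form is read-modify-write: copy the source first.
            out.push_back("mov " + regName(dst) + ", " + regName(src));
        }
        out.push_back(ins + " " + regName(dst));
    }
};
```

With a helper of this shape, a call site such as genCodeForNegNot would pass the raw registers and no longer branch on NDD itself. Whether the real helper should live in the emitter or in codegen is left open here.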

}

genProduceReg(tree);
@@ -1158,12 +1165,49 @@ void CodeGen::genCodeForBinary(GenTreeOp* treeNode)
// reg3 = reg3 op reg2
else
{
var_types op1Type = op1->TypeGet();
inst_Mov(op1Type, targetReg, op1reg, /* canSkip */ false);
regSet.verifyRegUsed(targetReg);
gcInfo.gcMarkRegPtrVal(targetReg, op1Type);
dst = treeNode;
src = op2;
if (emit->DoJitUseApxNDD(ins) && !varTypeIsFloating(treeNode))
{
// TODO-xarch-apx:
// APX can provide optimal code gen in this case using NDD feature:
// reg3 = op1 op op2 without extra mov

// see if it can be optimized by inc/dec
if (oper == GT_ADD && op2->isContainedIntOrIImmed() && !treeNode->gtOverflowEx())
Member:

The handling here of ADD into INC/DEC is also repeated in multiple locations, so probably another place where having centralized helpers is beneficial and ensures we're not missing it anywhere.

It's much better as a peephole in emit than something codegen must directly consider, IMO.
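
As a rough illustration of that peephole, the selection could be isolated in one small function. This is a sketch under stated assumptions: `ToyIns` and `selectAddIns` are hypothetical names, and the guard simply mirrors the `!treeNode->gtOverflowEx()` condition visible in the diff (INC and DEC leave the carry flag unchanged, so they cannot replace an ADD whose flags feed an overflow check).

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: ToyIns is not a JIT type.
enum class ToyIns { Add, Inc, Dec };

// Hypothetical peephole: prefer inc/dec for add-by-(+/-)1 when no overflow
// check depends on the flags produced by the add.
ToyIns selectAddIns(int64_t imm, bool needsOverflowCheck)
{
    if (!needsOverflowCheck)
    {
        if (imm == 1)
        {
            return ToyIns::Inc;
        }
        if (imm == -1)
        {
            return ToyIns::Dec;
        }
    }
    return ToyIns::Add;
}
```

Keeping the decision in one place means every caller (NDD or legacy) gets the same rewrite without repeating the constant checks.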

{
if (op2->IsIntegralConst(1))
{
emit->emitIns_R_R(INS_inc, emitTypeSize(treeNode), targetReg, op1reg, INS_OPTS_EVEX_nd);
genProduceReg(treeNode);
return;
}
else if (op2->IsIntegralConst(-1))
{
emit->emitIns_R_R(INS_dec, emitTypeSize(treeNode), targetReg, op1reg, INS_OPTS_EVEX_nd);
genProduceReg(treeNode);
return;
}
}

assert(op1reg != targetReg);
assert(op2reg != targetReg);
emit->emitInsBinary(ins, emitTypeSize(treeNode), op1, op2, targetReg);
if (treeNode->gtOverflowEx())
{
assert(oper == GT_ADD || oper == GT_SUB);
genCheckOverflow(treeNode);
}
genProduceReg(treeNode);
return;
}
else
{
var_types op1Type = op1->TypeGet();
inst_Mov(op1Type, targetReg, op1reg, /* canSkip */ false);
regSet.verifyRegUsed(targetReg);
gcInfo.gcMarkRegPtrVal(targetReg, op1Type);
dst = treeNode;
src = op2;
}
}

// try to use an inc or dec
@@ -1182,6 +1226,7 @@ void CodeGen::genCodeForBinary(GenTreeOp* treeNode)
return;
}
}

regNumber r = emit->emitInsBinary(ins, emitTypeSize(treeNode), dst, src);
noway_assert(r == targetReg);

@@ -1295,6 +1340,24 @@ void CodeGen::genCodeForMul(GenTreeOp* treeNode)
}
assert(regOp->isUsedFromReg());

if (emit->DoJitUseApxNDD(ins) && regOp->GetRegNum() != mulTargetReg)
{
// use NDD form to optimize this form:
// mov targetReg, regOp
// imul targetReg, rmOp
// to imul targetReg, regOp rmOp.
emit->emitInsBinary(ins, size, regOp, rmOp, mulTargetReg);
if (requiresOverflowCheck)
{
// Overflow checking is only used for non-floating point types
noway_assert(!varTypeIsFloating(treeNode));

genCheckOverflow(treeNode);
}
genProduceReg(treeNode);
return;
}

// Setup targetReg when neither of the source operands was a matching register
inst_Mov(targetType, mulTargetReg, regOp->GetRegNum(), /* canSkip */ true);

@@ -4406,23 +4469,23 @@ void CodeGen::genCodeForLockAdd(GenTreeOp* node)
if (imm == 1)
{
// inc [addr]
GetEmitter()->emitIns_AR(INS_inc, size, addr->GetRegNum(), 0);
GetEmitter()->emitIns_AR(INS_inc_no_evex, size, addr->GetRegNum(), 0);
Member:

nit: We should probably keep the existing name since it's the baseline instruction. We should rather give the APX-specific variant a new name, like INS_inc_apx or similar, to help ensure other paths don't accidentally use the wrong one.

Contributor, Author:

The changes here were made because the LOCK prefix cannot be used with an EVEX-prefixed instruction, but it is legal with a REX2-prefixed instruction. This happens in very limited cases, with inc, dec, and, and or.

I definitely agree that the new name variants should point to the instructions with the new features, and that we should only use them when new features such as EGPRs, NDD, and NF are needed. But I will probably need to preserve the REX2 functionality in the original INS_inc to get EGPR support. That might stray a bit from the semantics of the names INS_inc/INS_inc_apx. Will that be acceptable?

Member @tannergooding, Jan 24, 2025:

But I will probably need to preserve the REX2 functionality in the original INS_inc to get EGPRs support.

It's definitely fine for an instruction like INS_inc to allow opportunistic lightup for the REX2 encoding; we have the same thing where INS_addps is used for legacy, vex, and evex all together for example.

The main consideration is simply that we don't want the "good name" like INS_inc to be the thing that requires higher level checks (i.e. requires checking APX is supported). Such a case would inevitably cause issues down the road because someone thinks it is simply the inc instruction that's been around for 40+ years now.

If there must be two different entries for the same instruction because the opcodes conflict, then names like INS_inc and INS_inc_apx sound good to me. However, if it's just a restriction that something like LOCK can't use the EVEX encoding, and the opcode and base information otherwise remain the same, that sounds like we don't actually need "two instructions" defined; it is rather something that LSRA handles in the allowed registers and codegen handles in the INS_OPTS it passes down.
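
One way to picture the "single instruction, encoding chosen from options" idea is the sketch below. Everything in it is hypothetical (`ToyEncoding`, `ToyInsRequest`, `chooseEncoding`); the only facts taken from the thread are that LOCK cannot be combined with an EVEX-prefixed instruction and that it is legal with REX2.

```cpp
#include <cassert>

// Toy model only: these names are not part of the JIT.
enum class ToyEncoding { Legacy, Rex2, PromotedEvex };

struct ToyInsRequest
{
    bool apxAvailable;  // target supports APX
    bool hasLockPrefix; // instruction is emitted under LOCK
    bool usesEgpr;      // an operand lives in one of the extended GPRs
    bool prefersEvex;   // caller options (ND/NF) or a stress knob ask for promoted EVEX
};

ToyEncoding chooseEncoding(const ToyInsRequest& req)
{
    if (!req.apxAvailable)
    {
        return ToyEncoding::Legacy;
    }
    // Per the discussion: LOCK cannot be combined with an EVEX-prefixed
    // instruction, so a locked instruction never takes the promoted EVEX path.
    if (req.prefersEvex && !req.hasLockPrefix)
    {
        return ToyEncoding::PromotedEvex;
    }
    // REX2 is legal under LOCK, so EGPR operands stay reachable.
    return req.usesEgpr ? ToyEncoding::Rex2 : ToyEncoding::Legacy;
}
```

In this shape the instruction identity stays the same and the LOCK restriction becomes an input to encoding selection, which is the alternative to separate `_no_evex` entries that the comment describes.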

}
else if (imm == -1)
{
// dec [addr]
GetEmitter()->emitIns_AR(INS_dec, size, addr->GetRegNum(), 0);
GetEmitter()->emitIns_AR(INS_dec_no_evex, size, addr->GetRegNum(), 0);
}
else
{
// add [addr], imm
GetEmitter()->emitIns_I_AR(INS_add, size, imm, addr->GetRegNum(), 0);
GetEmitter()->emitIns_I_AR(INS_add_no_evex, size, imm, addr->GetRegNum(), 0);
}
}
else
{
// add [addr], data
GetEmitter()->emitIns_AR_R(INS_add, size, data->GetRegNum(), addr->GetRegNum(), 0);
GetEmitter()->emitIns_AR_R(INS_add_no_evex, size, data->GetRegNum(), addr->GetRegNum(), 0);
}
}

@@ -4449,7 +4512,7 @@ void CodeGen::genLockedInstructions(GenTreeOp* node)

if (node->OperIs(GT_XORR, GT_XAND))
{
const instruction ins = node->OperIs(GT_XORR) ? INS_or : INS_and;
const instruction ins = node->OperIs(GT_XORR) ? INS_or_no_evex : INS_and_no_evex;
Member:

Same comment on dec, add, or, etc.


if (node->IsUnusedValue())
{
@@ -4841,6 +4904,24 @@ void CodeGen::genCodeForShift(GenTree* tree)
genProduceReg(tree);
return;
}

if (GetEmitter()->DoJitUseApxNDD(ins) && (tree->GetRegNum() != operandReg))
{
ins = genMapShiftInsToShiftByConstantIns(ins, shiftByValue);
// If APX is available, we can use NDD to optimize the case when LSRA failed to avoid explicit mov.
// this case might be rarely hit.
if (shiftByValue == 1)
{
GetEmitter()->emitIns_R_R(ins, emitTypeSize(tree), tree->GetRegNum(), operandReg, INS_OPTS_EVEX_nd);
}
else
{
GetEmitter()->emitIns_R_R_I(ins, emitTypeSize(tree), tree->GetRegNum(), operandReg, shiftByValue,
INS_OPTS_EVEX_nd);
}
genProduceReg(tree);
return;
}
#endif
// First, move the operand to the destination register and
// later on perform the shift in-place.
@@ -4887,6 +4968,15 @@ void CodeGen::genCodeForShift(GenTree* tree)
// The operand to be shifted must not be in ECX
noway_assert(operandReg != REG_RCX);

if (GetEmitter()->DoJitUseApxNDD(ins) && (tree->GetRegNum() != operandReg))
{
// If APX is available, we can use NDD to optimize the case when LSRA failed to avoid explicit mov.
// this case might be rarely hit.
GetEmitter()->emitIns_R_R(ins, emitTypeSize(tree), tree->GetRegNum(), operandReg, INS_OPTS_EVEX_nd);
genProduceReg(tree);
return;
}

inst_Mov(targetType, tree->GetRegNum(), operandReg, /* canSkip */ true);
inst_RV(ins, tree->GetRegNum(), targetType);
}
@@ -9237,6 +9327,91 @@ void CodeGen::genAmd64EmitterUnitTestsApx()

theEmitter->emitIns_S(INS_neg, EA_2BYTE, 0, 0);
theEmitter->emitIns_S(INS_not, EA_2BYTE, 0, 0);

// APX-EVEX

theEmitter->emitIns_R_R_R(INS_add, EA_8BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_sub, EA_2BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_or, EA_2BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_and, EA_2BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_R(INS_xor, EA_1BYTE, REG_R10, REG_EAX, REG_ECX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R_I(INS_or, EA_2BYTE, REG_R10, REG_EAX, 10565, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_or, EA_8BYTE, REG_R10, REG_EAX, 10, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_S(INS_or, EA_8BYTE, REG_R10, REG_EAX, 0, 1, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_neg, EA_2BYTE, REG_R10, REG_ECX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_shl, EA_2BYTE, REG_R11, REG_EAX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R(INS_shl_1, EA_2BYTE, REG_R11, REG_EAX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_shl_N, EA_2BYTE, REG_R11, REG_ECX, 7, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_shl_N, EA_2BYTE, REG_R11, REG_ECX, 7, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_rcr_N, EA_2BYTE, REG_R11, REG_ECX, 7, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_I(INS_rcl_N, EA_2BYTE, REG_R11, REG_ECX, 7, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_inc, EA_2BYTE, REG_R11, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R(INS_dec, EA_2BYTE, REG_R11, REG_ECX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R_R(INS_cmovo, EA_4BYTE, REG_R12, REG_R11, REG_EAX, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R_R(INS_imul, EA_4BYTE, REG_R12, REG_R11, REG_ECX, INS_OPTS_EVEX_nd);
theEmitter->emitIns_R_R_S(INS_imul, EA_4BYTE, REG_R12, REG_R11, 0, 1, INS_OPTS_EVEX_nd);

theEmitter->emitIns_R_R(INS_add, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_sub, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_and, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_or, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_xor, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_inc, EA_4BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_dec, EA_4BYTE, REG_R12, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_I(INS_add, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_sub, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_and, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_or, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_xor, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_S(INS_add, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_sub, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_and, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_or, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_xor, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R(INS_neg, EA_2BYTE, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_shl, EA_2BYTE, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_shl_1, EA_2BYTE, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_shl_N, EA_2BYTE, REG_R11, 7, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_shl_N, EA_2BYTE, REG_R11, 7, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_rcr_N, EA_2BYTE, REG_R11, 7, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_I(INS_rcl_N, EA_2BYTE, REG_R11, 7, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R(INS_imul, EA_4BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_imul, EA_4BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_I(INS_imul_15, EA_4BYTE, REG_R12, 5, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R(INS_imulEAX, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_mulEAX, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_div, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R(INS_idiv, EA_8BYTE, REG_R12, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R(INS_tzcnt_evex, EA_8BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_lzcnt_evex, EA_8BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_popcnt_evex, EA_8BYTE, REG_R12, REG_R11, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_S(INS_tzcnt_evex, EA_8BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_lzcnt_evex, EA_8BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_popcnt_evex, EA_8BYTE, REG_R12, 0, 1, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R_R(INS_add, EA_2BYTE, REG_R12, REG_R13, REG_R11,
(insOpts)(INS_OPTS_EVEX_nf | INS_OPTS_EVEX_nd));

theEmitter->emitIns_R_R_R(INS_andn, EA_8BYTE, REG_R11, REG_R13, REG_R11, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R_R(INS_bextr, EA_8BYTE, REG_R11, REG_R13, REG_R11, INS_OPTS_EVEX_nf);

theEmitter->emitIns_R_R(INS_blsi, EA_8BYTE, REG_R11, REG_R13, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_R(INS_blsmsk, EA_8BYTE, REG_R11, REG_R13, INS_OPTS_EVEX_nf);
theEmitter->emitIns_R_S(INS_blsr, EA_8BYTE, REG_R11, 0, 1);
}

void CodeGen::genAmd64EmitterUnitTestsAvx10v2()
@@ -11434,7 +11609,7 @@ void CodeGen::instGen_MemoryBarrier(BarrierKind barrierKind)
if (barrierKind == BARRIER_FULL)
{
instGen(INS_lock);
GetEmitter()->emitIns_I_AR(INS_or, EA_4BYTE, 0, REG_SPBASE, 0);
GetEmitter()->emitIns_I_AR(INS_or_no_evex, EA_4BYTE, 0, REG_SPBASE, 0);
}
}

1 change: 1 addition & 0 deletions src/coreclr/jit/compiler.cpp
@@ -2299,6 +2299,7 @@ void Compiler::compSetProcessor()
if (canUseApxEncoding())
{
codeGen->GetEmitter()->SetUseRex2Encoding(true);
codeGen->GetEmitter()->SetUsePromotedEVEXEncoding(true);
}
}
#endif // TARGET_XARCH
25 changes: 21 additions & 4 deletions src/coreclr/jit/compiler.h
@@ -4012,7 +4012,7 @@ class Compiler

// false: we can add new tracked variables.
// true: We cannot add new 'tracked' variable
bool lvaTrackedFixed = false;
bool lvaTrackedFixed = false;

unsigned lvaCount; // total number of locals, which includes function arguments,
// special arguments, IL local variables, and JIT temporary variables
@@ -6921,15 +6921,15 @@ class Compiler
unsigned acdCount = 0;

// Get the index to use as part of the AddCodeDsc key for sharing throw blocks
unsigned bbThrowIndex(BasicBlock* blk, AcdKeyDesignator* dsg);
unsigned bbThrowIndex(BasicBlock* blk, AcdKeyDesignator* dsg);

struct AddCodeDscKey
{
public:
AddCodeDscKey(): acdKind(SCK_NONE), acdData(0) {}
AddCodeDscKey(SpecialCodeKind kind, BasicBlock* block, Compiler* comp);
AddCodeDscKey(AddCodeDsc* add);

static bool Equals(const AddCodeDscKey& x, const AddCodeDscKey& y)
{
return (x.acdData == y.acdData) && (x.acdKind == y.acdKind);
@@ -10080,13 +10080,30 @@ class Compiler
// JitStressEvexEncoding- Answer the question: Is Evex stress knob set
//
// Returns:
// `true` if user requests REX2 encoding.
// `true` if user requests EVEX encoding.
//
bool JitStressEvexEncoding() const
{
#ifdef DEBUG
return JitConfig.JitStressEvexEncoding() || JitConfig.JitStressRex2Encoding();
#endif // DEBUG
return false;
}

//------------------------------------------------------------------------
// DoJitStressPromotedEvexEncoding- Answer the question: Do we force promoted EVEX encoding.
//
// Returns:
// `true` if user requests promoted EVEX encoding.
//
bool DoJitStressPromotedEvexEncoding() const
{
#ifdef DEBUG
if (JitConfig.JitStressPromotedEvexEncoding() && compOpportunisticallyDependsOn(InstructionSet_APX))
{
return true;
}
#endif // DEBUG

return false;
}