[EraVM] Enable MachineCopyPropagation optimization #712

Merged 14 commits on Oct 10, 2024


vladimirradosavljevic
Contributor

This PR enables MachineCopyPropagation for EraVM (by adding `let AllowRegisterRenaming = 1` in EraVM.td), cherry-picks several upstream improvements, and adds a few more on top.


github-actions bot commented Oct 7, 2024

╔═╡ Size (-%) ╞════════════════╡ All M3B3 ╞═╗
║ Best                               15.385 ║
║ Worst                              -8.000 ║
║ Total                               0.923 ║
╠═╡ Cycles (-%) ╞══════════════╡ All M3B3 ╞═╣
║ Best                               14.286 ║
║ Worst                              -3.488 ║
║ Total                               0.314 ║
╠═╡ Ergs (-%) ╞════════════════╡ All M3B3 ╞═╣
║ Best                               14.634 ║
║ Worst                              -0.707 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════════════╡ All MzB3 ╞═╗
║ Best                               15.385 ║
║ Worst                              -8.000 ║
║ Total                               0.492 ║
╠═╡ Cycles (-%) ╞══════════════╡ All MzB3 ╞═╣
║ Best                               14.286 ║
║ Worst                              -3.448 ║
║ Total                               0.284 ║
╠═╡ Ergs (-%) ╞════════════════╡ All MzB3 ╞═╣
║ Best                               14.634 ║
║ Worst                              -0.705 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞═════╡ EVMInterpreter M3B3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                                 NaN ║
╠═╡ Cycles (-%) ╞═══╡ EVMInterpreter M3B3 ╞═╣
║ Best                                2.505 ║
║ Worst                               0.000 ║
║ Total                               0.693 ║
╠═╡ Ergs (-%) ╞═════╡ EVMInterpreter M3B3 ╞═╣
║ Best                                0.334 ║
║ Worst                               0.000 ║
║ Total                               0.134 ║
╠═╡ Ergs/gas ╞══════╡ EVMInterpreter M3B3 ╞═╣
║ ADD                                38.750 ║
║ MUL                                23.250 ║
║ SUB                                38.750 ║
║ DIV                                25.650 ║
║ SDIV                               41.250 ║
║ MOD                                25.650 ║
║ SMOD                               38.850 ║
║ ADDMOD                             20.656 ║
║ MULMOD                             23.656 ║
║ EXP                                 7.138 ║
║ SIGNEXTEND                         24.450 ║
║ LT                                 42.750 ║
║ GT                                 42.750 ║
║ SLT                                64.750 ║
║ SGT                                64.750 ║
║ EQ                                 42.750 ║
║ ISZERO                             36.417 ║
║ AND                                38.750 ║
║ OR                                 38.750 ║
║ XOR                                38.750 ║
║ NOT                                32.417 ║
║ BYTE                               46.750 ║
║ SHL                                42.750 ║
║ SHR                                42.750 ║
║ SAR                                60.750 ║
║ SGT                                64.750 ║
║ SHA3                               26.326 ║
║ ADDRESS                            47.812 ║
║ BALANCE                            39.059 ║
║ ORIGIN                           1351.625 ║
║ CALLER                             47.812 ║
║ CALLVALUE                          47.812 ║
║ CALLDATALOAD                       34.750 ║
║ CALLDATASIZE                       48.125 ║
║ CALLDATACOPY                       49.492 ║
║ CODESIZE                           48.625 ║
║ CODECOPY                           60.487 ║
║ GASPRICE                         1348.438 ║
║ EXTCODESIZE                         3.702 ║
║ EXTCODECOPY                         3.769 ║
║ RETURNDATASIZE                     46.500 ║
║ RETURNDATACOPY                     43.222 ║
║ EXTCODEHASH                         4.884 ║
║ BLOCKHASH                         240.119 ║
║ COINBASE                         1348.625 ║
║ TIMESTAMP                        1342.625 ║
║ NUMBER                           1342.625 ║
║ PREVRANDAO                       1342.625 ║
║ GASLIMIT                         1348.625 ║
║ CHAINID                          1336.625 ║
║ SELFBALANCE                       639.250 ║
║ BASEFEE                          1342.625 ║
║ POP                                38.625 ║
║ MLOAD                              51.590 ║
║ MSTORE                             55.248 ║
║ MSTORE8                            64.716 ║
║ SLOAD                              20.632 ║
║ SSTORE                              4.644 ║
║ JUMP                               17.000 ║
║ JUMPI                              16.727 ║
║ PC                                 48.312 ║
║ MSIZE                              54.812 ║
║ GAS                                45.312 ║
║ JUMPDEST                           65.625 ║
║ PUSH0                              45.312 ║
║ PUSH1                              41.958 ║
║ PUSH2                              47.375 ║
║ PUSH4                              50.208 ║
║ PUSH5                              51.625 ║
║ PUSH6                              53.042 ║
║ PUSH7                              54.458 ║
║ PUSH8                              55.875 ║
║ PUSH9                              57.292 ║
║ PUSH10                             58.708 ║
║ PUSH11                             60.125 ║
║ PUSH12                             61.542 ║
║ PUSH13                             62.958 ║
║ PUSH14                             64.375 ║
║ PUSH15                             65.792 ║
║ PUSH16                             67.208 ║
║ PUSH17                             68.625 ║
║ PUSH18                             70.042 ║
║ PUSH19                             71.458 ║
║ PUSH20                             72.875 ║
║ PUSH21                             74.292 ║
║ PUSH22                             75.708 ║
║ PUSH23                             77.125 ║
║ PUSH24                             78.542 ║
║ PUSH25                             79.958 ║
║ PUSH26                             81.375 ║
║ PUSH27                             82.792 ║
║ PUSH28                             84.208 ║
║ PUSH29                             85.625 ║
║ PUSH30                             87.042 ║
║ PUSH31                             88.458 ║
║ PUSH32                             87.875 ║
║ DUP1                               34.417 ║
║ DUP2                               36.417 ║
║ DUP3                               36.417 ║
║ DUP4                               36.417 ║
║ DUP5                               36.417 ║
║ DUP6                               36.417 ║
║ DUP7                               36.417 ║
║ DUP8                               36.417 ║
║ DUP9                               36.417 ║
║ DUP10                              36.417 ║
║ DUP11                              36.417 ║
║ DUP12                              36.417 ║
║ DUP13                              36.417 ║
║ DUP14                              36.417 ║
║ DUP15                              36.417 ║
║ DUP16                              36.417 ║
║ SWAP1                              41.083 ║
║ SWAP2                              41.083 ║
║ SWAP3                              41.083 ║
║ SWAP4                              41.083 ║
║ SWAP5                              41.083 ║
║ SWAP6                              41.083 ║
║ SWAP7                              41.083 ║
║ SWAP8                              41.083 ║
║ SWAP9                              41.083 ║
║ SWAP10                             41.083 ║
║ SWAP11                             41.083 ║
║ SWAP12                             41.083 ║
║ SWAP13                             41.083 ║
║ SWAP14                             41.083 ║
║ SWAP15                             41.083 ║
║ SWAP16                             41.083 ║
║ CALL                               36.115 ║
║ STATICCALL                         35.161 ║
║ DELEGATECALL                       34.205 ║
║ CREATE                              4.075 ║
║ CREATE2                             5.547 ║
║ RETURN                              1.000 ║
║ REVERT                              1.000 ║
╠═╡ Ergs/gas (-%) ╞═╡ EVMInterpreter M3B3 ╞═╣
║ BALANCE                             0.465 ║
║ ORIGIN                              0.231 ║
║ GASPRICE                            0.231 ║
║ EXTCODESIZE                         0.619 ║
║ EXTCODECOPY                         0.608 ║
║ EXTCODEHASH                         0.610 ║
║ BLOCKHASH                           0.125 ║
║ COINBASE                            0.231 ║
║ TIMESTAMP                           0.232 ║
║ NUMBER                              0.232 ║
║ PREVRANDAO                          0.232 ║
║ GASLIMIT                            0.231 ║
║ CHAINID                             0.233 ║
║ SELFBALANCE                         0.195 ║
║ BASEFEE                             0.232 ║
║ SLOAD                               0.368 ║
║ SSTORE                              0.327 ║
║ CALL                                0.211 ║
║ STATICCALL                          0.210 ║
║ DELEGATECALL                        0.204 ║
║ CREATE                              0.185 ║
║ CREATE2                             0.195 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞═════╡ EVMInterpreter MzB3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                                 NaN ║
╠═╡ Cycles (-%) ╞═══╡ EVMInterpreter MzB3 ╞═╣
║ Best                                2.505 ║
║ Worst                               0.000 ║
║ Total                               0.693 ║
╠═╡ Ergs (-%) ╞═════╡ EVMInterpreter MzB3 ╞═╣
║ Best                                0.334 ║
║ Worst                               0.000 ║
║ Total                               0.134 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════╡ Precompiles M3B3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Cycles (-%) ╞══════╡ Precompiles M3B3 ╞═╣
║ Best                                2.597 ║
║ Worst                               0.000 ║
║ Total                               0.019 ║
╠═╡ Ergs (-%) ╞════════╡ Precompiles M3B3 ╞═╣
║ Best                                0.853 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════╡ Precompiles MzB3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Cycles (-%) ╞══════╡ Precompiles MzB3 ╞═╣
║ Best                                2.410 ║
║ Worst                               0.000 ║
║ Total                               0.009 ║
╠═╡ Ergs (-%) ╞════════╡ Precompiles MzB3 ╞═╣
║ Best                                0.822 ║
║ Worst                              -0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞══════════╡ Real life M3B3 ╞═╗
║ Best                                0.969 ║
║ Worst                               0.000 ║
║ Total                               0.324 ║
╠═╡ Cycles (-%) ╞════════╡ Real life M3B3 ╞═╣
║ Best                                4.348 ║
║ Worst                               0.000 ║
║ Total                               0.367 ║
╠═╡ Ergs (-%) ╞══════════╡ Real life M3B3 ╞═╣
║ Best                                1.791 ║
║ Worst                               0.000 ║
║ Total                               0.106 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞══════════╡ Real life MzB3 ╞═╗
║ Best                                0.881 ║
║ Worst                               0.000 ║
║ Total                               0.097 ║
╠═╡ Cycles (-%) ╞════════╡ Real life MzB3 ╞═╣
║ Best                                3.704 ║
║ Worst                               0.000 ║
║ Total                               0.297 ║
╠═╡ Ergs (-%) ╞══════════╡ Real life MzB3 ╞═╣
║ Best                                1.193 ║
║ Worst                               0.000 ║
║ Total                               0.093 ║
╚═══════════════════════════════════════════╝

@vladimirradosavljevic
Contributor Author

One regression test is failing for commit 20c39f6, since it introduces two new parameters to the copyPhysReg function, and we only apply that change to EVM and EraVM in the following commit.
I'm not sure we can make CI fully happy, since we are cherry-picking from upstream and only adapting our backends in the next commit.

@akiramenai
Collaborator

Theoretically, we could use a pre-commit XFAIL and re-enable it later, but since we periodically update LLVM and rebase our commits on top of theirs, I recommend not worrying about it. We have admin privileges to bypass the CI if needed.

Collaborator

@akiramenai akiramenai left a comment

Could we add tests to:
- [EraVM][EVM] Add renamable bit to copyPhysReg
- [EraVM] Enable MachineCopyPropagation optimization

Also, links to upstream counterparts in the commit messages are very welcome for future LLVM updates.

@vladimirradosavljevic
Contributor Author

vladimirradosavljevic commented Oct 9, 2024

> Could we add tests to [EraVM] Enable MachineCopyPropagation optimization.

This is the name of the PR, and there is no such commit. Did you have some other commit in mind?

> Also links to upstream counterparts in the commit messages are very welcome for future LLVM updates.

This is for c8750aa and b653cb6, right?

@akiramenai
Collaborator

@vladimirradosavljevic Sorry, this one: [EraVM] Enable MachineCopyPropagation optimization

@akiramenai
Collaborator

akiramenai commented Oct 9, 2024

> This is for c8750aa and b653cb6, right?

Yes, to make checking their status easier.

@vladimirradosavljevic
Contributor Author

> @vladimirradosavljevic Sorry, this one: [EraVM] Enable MachineCopyPropagation optimization

Copy paste typo. Did you mean [EraVM] Set AllowRegisterRenaming to 1?

@vladimirradosavljevic
Contributor Author

Added a test for [EraVM][EVM] Add renamable bit to copyPhysReg. I needed to reorganize the commits, since testing this requires d34e85b to come first.
Added a test for [EraVM] Set AllowRegisterRenaming to 1 and added links to the upstream PRs.
@akiramenai PTAL.

kazutakahirata and others added 14 commits October 9, 2024 16:56
Once we modernize CopyInfo with default member initializations,

  Copies.insert({Unit, ...})

becomes equivalent to:

  Copies.try_emplace(Unit)

which we can simplify further down to Copies[Unit].
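
A minimal sketch of the equivalence this commit message describes, assuming std::map as a stand-in for LLVM's DenseMap and a hypothetical two-field CopyInfo (the real struct has different members). With default member initializers, all three forms create the same default entry when the key is absent and leave an existing entry untouched:

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for CopyInfo; the fields are illustrative only.
struct CopyInfo {
  int LastSeenUse = -1; // default member initializers give every new
  bool Avail = false;   // entry a well-defined starting state
};

// Assumption: std::map is used here in place of llvm::DenseMap.
using CopyMap = std::map<unsigned, CopyInfo>;

// Explicitly insert a default-constructed value.
CopyInfo &viaInsert(CopyMap &Copies, unsigned Unit) {
  return Copies.insert({Unit, CopyInfo{}}).first->second;
}

// Construct the value in place only if the key is missing.
CopyInfo &viaTryEmplace(CopyMap &Copies, unsigned Unit) {
  return Copies.try_emplace(Unit).first->second;
}

// Shortest form: operator[] default-constructs a missing entry.
CopyInfo &viaSubscript(CopyMap &Copies, unsigned Unit) {
  return Copies[Unit];
}
```
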
Previously we wouldn't remove dead copies from basic blocks with
successors. The comment said we didn't want to trust the live-in lists.
The comment is very old so I'm not sure if that's still a concern today.

This patch checks the live-in lists and removes copies from
MaybeDeadCopies if they are referenced by any live-ins in any
successors. We only do this if the tracksLiveness property is set. If
that property is not set, we retain the old behavior.
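
The live-in check described above can be sketched as follows. This is a toy model, not the pass itself: the `Block` type and the `removableCopies` helper are hypothetical, registers are plain integers, and register aliasing (sub- and super-registers) is ignored.

```cpp
#include <cassert>
#include <set>
#include <vector>

using Reg = unsigned;

// Toy successor block: only its live-in list matters for this check.
struct Block {
  std::set<Reg> LiveIns;
};

// Given the destination registers of copies that look dead at the end of a
// block, return those that may really be deleted. A copy whose destination
// is live into any successor is kept. Without liveness tracking, the old
// behavior is retained: nothing is deleted when the block has successors.
std::vector<Reg> removableCopies(const std::vector<Reg> &MaybeDeadCopyDefs,
                                 const std::vector<Block> &Successors,
                                 bool TracksLiveness) {
  if (!Successors.empty() && !TracksLiveness)
    return {};
  std::vector<Reg> Dead;
  for (Reg Def : MaybeDeadCopyDefs) {
    bool LiveOut = false;
    for (const Block &Succ : Successors) {
      if (Succ.LiveIns.count(Def)) {
        LiveOut = true;
        break;
      }
    }
    if (!LiveOut)
      Dead.push_back(Def);
  }
  return Dead;
}
```
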
Tail duplication generates a redundant move before return, because
MachineCopyPropagation can't recognize a COPY after the post-RA
pseudoExpand.

This patch makes MachineCopyPropagation recognize `%0 = ADDI %1, 0` as
a COPY.
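
To illustrate the idea of treating an add-immediate of zero as a copy, here is a small self-contained sketch. The `Instr` type, the `Opcode` enum, and the `isCopyLike` function are hypothetical stand-ins, not the LLVM target hook or its signature:

```cpp
#include <cassert>
#include <optional>
#include <utility>

enum class Opcode { COPY, ADDI, ADD };

// Toy instruction of the form "Dst = Op Src, Imm" (Imm is unused by COPY).
struct Instr {
  Opcode Op;
  unsigned Dst;
  unsigned Src;
  long Imm;
};

// Returns {Dst, Src} when the instruction only moves Src into Dst,
// and std::nullopt otherwise.
std::optional<std::pair<unsigned, unsigned>> isCopyLike(const Instr &I) {
  if (I.Op == Opcode::COPY)
    return std::make_pair(I.Dst, I.Src);
  // Adding zero leaves the value unchanged, so it behaves like a copy.
  if (I.Op == Opcode::ADDI && I.Imm == 0)
    return std::make_pair(I.Dst, I.Src);
  return std::nullopt;
}
```

A pass that asks this kind of question instead of checking for the COPY opcode directly can keep propagating copies after pseudo instructions have been expanded.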
…EIMM

Some optimizations, like MachineCopyPropagation, make
decisions based on the register flags, so in order for
these optimizations to work correctly, we need to preserve
flags after expansion of pseudo instructions.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
Even though registers in tests are marked with the
renamable flag, MachineOperand::isRenamable
returns false for them, since AllowRegisterRenaming
is 0 by default.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
MachineCopyPropagation optimization only works if registers
are renamable, so we need to allow register renaming to enable
this optimization for EraVM.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
The renamable flag is useful during MachineCopyPropagation, but the
renamable flag is dropped after lowerCopy in some cases.

This patch introduces extra arguments to pass the renamable flag to
copyPhysReg.
Run MCP one more time after pseudo expansion,
since we can produce copy-like instructions
(e.g. from PTR_TO_INT) that can be optimized.
This patch implements isCopyInstrImpl target
hook which is called from MCP to detect copy
instructions after expansion from COPY is
performed.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
…g forward propagation

Signed-off-by: Vladimir Radosavljevic <[email protected]>
Before this patch, redundant COPY couldn't be removed
for the following case:
  %reg1 = COPY %const-reg
  ... // No use of %reg1 but there is a def/use of %const-reg
  %reg2 = COPY killed %reg1

where this can be optimized to:
  ... // No use of %reg1 but there is a def/use of %const-reg
  %reg2 = COPY %const-reg

This patch allows for such optimization by not
invalidating constant registers. This is safe
even where constant registers are defined, as
architectures like AArch64 and RISCV replace a
dead definition of a GPR with a zero constant
register for certain instructions.

Upstream PR:
llvm/llvm-project#111129

Signed-off-by: Vladimir Radosavljevic <[email protected]>
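
The forward-propagation case above can be modeled with a toy single-block pass. This is only a sketch under simplifying assumptions: the `Inst` type and `forwardPropagate` function are hypothetical, registers are plain integers, and the real pass additionally checks register classes, renamable and kill flags, and deletes the copy that becomes dead.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Reg = unsigned;

// Toy instruction "Def = OP Uses..."; IsCopy means "Def = COPY Uses[0]".
struct Inst {
  bool IsCopy;
  Reg Def;
  std::vector<Reg> Uses;
};

// Forward-propagate copy sources within one block.
// Avail[D] == S records that D currently holds the same value as S.
void forwardPropagate(std::vector<Inst> &Block,
                      const std::set<Reg> &ConstRegs) {
  std::map<Reg, Reg> Avail;
  for (Inst &I : Block) {
    // If the source of this COPY is itself a known copy, read the original.
    if (I.IsCopy) {
      auto It = Avail.find(I.Uses[0]);
      if (It != Avail.end())
        I.Uses[0] = It->second;
    }
    // A definition invalidates what we knew about the defined register.
    Avail.erase(I.Def);
    // It also invalidates copies *from* that register, unless the register
    // is constant: a write to it does not change what later reads return.
    if (!ConstRegs.count(I.Def)) {
      for (auto It = Avail.begin(); It != Avail.end();) {
        if (It->second == I.Def)
          It = Avail.erase(It);
        else
          ++It;
      }
    }
    if (I.IsCopy)
      Avail[I.Def] = I.Uses[0];
  }
}
```

With register 0 marked constant, the sequence `1 = COPY 0; 0 = OP 0; 2 = COPY 1` has its last copy rewritten to read register 0; with an empty constant set the intervening definition blocks the rewrite.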
…ng backward propagation

Signed-off-by: Vladimir Radosavljevic <[email protected]>
Before this patch, redundant COPY couldn't be removed
for the following case:
  $R0 = OP ...
  ... // Read of $R0
  $R1 = COPY killed $R0

This patch adds support for tracking the users of
the source register during backward propagation, so
that we can remove the redundant COPY in the above case
and optimize it to:
  $R1 = OP ...
  ... // Replace all uses of $R0 with $R1

Upstream PR:
llvm/llvm-project#111130

Signed-off-by: Vladimir Radosavljevic <[email protected]>
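
The backward-propagation case above can be sketched in the same toy style. The `Inst` type and the `backwardPropagate` helper are hypothetical; the sketch assumes the source register is killed by the COPY (so it has no later readers) and ignores register classes, aliasing, and renamable flags that the real pass must check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Reg = unsigned;

// Toy instruction "Def = OP Uses..."; IsCopy means "Def = COPY Uses[0]".
struct Inst {
  bool IsCopy;
  Reg Def;
  std::vector<Reg> Uses;
};

static bool readsOrWrites(const Inst &I, Reg R) {
  if (I.Def == R)
    return true;
  for (Reg U : I.Uses)
    if (U == R)
      return true;
  return false;
}

// Try to fold "Dst = COPY killed Src" at CopyIdx into the earlier
// instruction that defines Src. Readers of Src in between are tracked and
// retargeted to Dst. Returns true if the COPY was removed.
bool backwardPropagate(std::vector<Inst> &Block, std::size_t CopyIdx) {
  assert(CopyIdx < Block.size() && Block[CopyIdx].IsCopy);
  const Reg Dst = Block[CopyIdx].Def;
  const Reg Src = Block[CopyIdx].Uses[0];
  std::vector<std::size_t> SrcUsers; // instructions between def and COPY
  for (std::size_t I = CopyIdx; I-- > 0;) {
    Inst &MI = Block[I];
    if (MI.Def == Src) {
      MI.Def = Dst; // define Dst directly
      for (std::size_t U : SrcUsers)
        for (Reg &R : Block[U].Uses)
          if (R == Src)
            R = Dst;
      Block.erase(Block.begin() + static_cast<std::ptrdiff_t>(CopyIdx));
      return true;
    }
    if (readsOrWrites(MI, Dst))
      return false; // Dst is in use in between; renaming would clobber it
    if (readsOrWrites(MI, Src))
      SrcUsers.push_back(I); // a reader of Src that must be retargeted
  }
  return false; // Src is not defined in this block
}
```
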
Collaborator

@akiramenai akiramenai left a comment

LGTM, thanks!
Will wait for the rest of the checkers and then merge.

@akiramenai akiramenai merged commit 801623b into main Oct 10, 2024
36 of 37 checks passed
@akiramenai akiramenai deleted the enable_mcp branch October 10, 2024 07:33