[EraVM] Add support for splitting live ranges of PHI nodes in loops #703

Merged 3 commits on Sep 25, 2024

vladimirradosavljevic
Contributor

No description provided.


github-actions bot commented Sep 12, 2024

╔═╡ Size (-%) ╞════════════════╡ All M3B3 ╞═╗
║ Best                                1.050 ║
║ Worst                               0.000 ║
║ Total                               0.003 ║
╠═╡ Cycles (-%) ╞══════════════╡ All M3B3 ╞═╣
║ Best                                6.724 ║
║ Worst                               0.000 ║
║ Total                               0.031 ║
╠═╡ Ergs (-%) ╞════════════════╡ All M3B3 ╞═╣
║ Best                                2.610 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════════════╡ All MzB3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Cycles (-%) ╞══════════════╡ All MzB3 ╞═╣
║ Best                                6.724 ║
║ Worst                               0.000 ║
║ Total                               0.028 ║
╠═╡ Ergs (-%) ╞════════════════╡ All MzB3 ╞═╣
║ Best                                2.610 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞═════╡ EVMInterpreter M3B3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                                 NaN ║
╠═╡ Cycles (-%) ╞═══╡ EVMInterpreter M3B3 ╞═╣
║ Best                                6.724 ║
║ Worst                               0.000 ║
║ Total                               4.266 ║
╠═╡ Ergs (-%) ╞═════╡ EVMInterpreter M3B3 ╞═╣
║ Best                                2.610 ║
║ Worst                               0.000 ║
║ Total                               0.637 ║
╠═╡ Ergs/gas ╞══════╡ EVMInterpreter M3B3 ╞═╣
║ ADD                                38.750 ║
║ MUL                                23.250 ║
║ SUB                                38.750 ║
║ DIV                                25.650 ║
║ SDIV                               41.250 ║
║ MOD                                25.650 ║
║ SMOD                               38.850 ║
║ ADDMOD                             20.656 ║
║ MULMOD                             23.656 ║
║ EXP                                 7.138 ║
║ SIGNEXTEND                         24.450 ║
║ LT                                 42.750 ║
║ GT                                 42.750 ║
║ SLT                                64.750 ║
║ SGT                                64.750 ║
║ EQ                                 42.750 ║
║ ISZERO                             36.417 ║
║ AND                                38.750 ║
║ OR                                 38.750 ║
║ XOR                                38.750 ║
║ NOT                                32.417 ║
║ BYTE                               46.750 ║
║ SHL                                42.750 ║
║ SHR                                42.750 ║
║ SAR                                60.750 ║
║ SGT                                64.750 ║
║ SHA3                               26.326 ║
║ ADDRESS                            47.812 ║
║ BALANCE                            39.901 ║
║ ORIGIN                           1354.750 ║
║ CALLER                             47.812 ║
║ CALLVALUE                          47.812 ║
║ CALLDATALOAD                       34.750 ║
║ CALLDATASIZE                       48.125 ║
║ CALLDATACOPY                       66.231 ║
║ CODESIZE                           48.625 ║
║ CODECOPY                           58.518 ║
║ GASPRICE                         1351.562 ║
║ EXTCODESIZE                         3.758 ║
║ EXTCODECOPY                         3.825 ║
║ RETURNDATASIZE                     46.500 ║
║ RETURNDATACOPY                     43.222 ║
║ EXTCODEHASH                         4.982 ║
║ BLOCKHASH                         240.419 ║
║ COINBASE                         1351.750 ║
║ TIMESTAMP                        1345.750 ║
║ NUMBER                           1345.750 ║
║ PREVRANDAO                       1345.750 ║
║ GASLIMIT                         1351.750 ║
║ CHAINID                          1339.750 ║
║ SELFBALANCE                       640.500 ║
║ BASEFEE                          1345.750 ║
║ POP                                38.625 ║
║ MLOAD                              51.590 ║
║ MSTORE                             55.248 ║
║ MSTORE8                            64.716 ║
║ SLOAD                              20.865 ║
║ SSTORE                              4.729 ║
║ JUMP                               17.000 ║
║ JUMPI                              16.727 ║
║ PC                                 48.312 ║
║ MSIZE                              54.812 ║
║ GAS                                45.312 ║
║ JUMPDEST                           65.625 ║
║ PUSH0                              45.312 ║
║ PUSH1                              41.958 ║
║ PUSH2                              47.375 ║
║ PUSH4                              50.208 ║
║ PUSH5                              51.625 ║
║ PUSH6                              53.042 ║
║ PUSH7                              54.458 ║
║ PUSH8                              55.875 ║
║ PUSH9                              57.292 ║
║ PUSH10                             58.708 ║
║ PUSH11                             60.125 ║
║ PUSH12                             61.542 ║
║ PUSH13                             62.958 ║
║ PUSH14                             64.375 ║
║ PUSH15                             65.792 ║
║ PUSH16                             67.208 ║
║ PUSH17                             68.625 ║
║ PUSH18                             70.042 ║
║ PUSH19                             71.458 ║
║ PUSH20                             72.875 ║
║ PUSH21                             74.292 ║
║ PUSH22                             75.708 ║
║ PUSH23                             77.125 ║
║ PUSH24                             78.542 ║
║ PUSH25                             79.958 ║
║ PUSH26                             81.375 ║
║ PUSH27                             82.792 ║
║ PUSH28                             84.208 ║
║ PUSH29                             85.625 ║
║ PUSH30                             87.042 ║
║ PUSH31                             88.458 ║
║ PUSH32                             87.875 ║
║ DUP1                               34.417 ║
║ DUP2                               36.417 ║
║ DUP3                               36.417 ║
║ DUP4                               36.417 ║
║ DUP5                               36.417 ║
║ DUP6                               36.417 ║
║ DUP7                               36.417 ║
║ DUP8                               36.417 ║
║ DUP9                               36.417 ║
║ DUP10                              36.417 ║
║ DUP11                              36.417 ║
║ DUP12                              36.417 ║
║ DUP13                              36.417 ║
║ DUP14                              36.417 ║
║ DUP15                              36.417 ║
║ DUP16                              36.417 ║
║ SWAP1                              41.083 ║
║ SWAP2                              41.083 ║
║ SWAP3                              41.083 ║
║ SWAP4                              41.083 ║
║ SWAP5                              41.083 ║
║ SWAP6                              41.083 ║
║ SWAP7                              41.083 ║
║ SWAP8                              41.083 ║
║ SWAP9                              41.083 ║
║ SWAP10                             41.083 ║
║ SWAP11                             41.083 ║
║ SWAP12                             41.083 ║
║ SWAP13                             41.083 ║
║ SWAP14                             41.083 ║
║ SWAP15                             41.083 ║
║ SWAP16                             41.083 ║
║ CALL                               36.640 ║
║ STATICCALL                         35.663 ║
║ DELEGATECALL                       34.682 ║
║ CREATE                              4.067 ║
║ CREATE2                             5.532 ║
║ RETURN                              1.000 ║
║ REVERT                              1.000 ║
╠═╡ Ergs/gas (-%) ╞═╡ EVMInterpreter M3B3 ╞═╣
║ ADD                                 4.908 ║
║ MUL                                 4.908 ║
║ SUB                                 4.908 ║
║ DIV                                 4.469 ║
║ SDIV                                2.827 ║
║ MOD                                 4.469 ║
║ SMOD                                2.996 ║
║ ADDMOD                              6.770 ║
║ MULMOD                              5.963 ║
║ EXP                                 1.382 ║
║ SIGNEXTEND                          4.678 ║
║ LT                                  4.469 ║
║ GT                                  4.469 ║
║ SLT                                 2.996 ║
║ SGT                                 2.996 ║
║ EQ                                  4.469 ║
║ ISZERO                              5.206 ║
║ AND                                 4.908 ║
║ OR                                  4.908 ║
║ XOR                                 4.908 ║
║ NOT                                 5.811 ║
║ BYTE                                4.103 ║
║ SHL                                 4.469 ║
║ SHR                                 4.469 ║
║ SAR                                 3.187 ║
║ SGT                                 2.996 ║
║ SHA3                                2.215 ║
║ ADDRESS                             5.904 ║
║ BALANCE                             0.598 ║
║ ORIGIN                              0.221 ║
║ CALLER                              5.904 ║
║ CALLVALUE                           5.904 ║
║ CALLDATALOAD                        5.442 ║
║ CALLDATASIZE                        5.868 ║
║ CALLDATACOPY                        2.887 ║
║ CODESIZE                            5.811 ║
║ CODECOPY                            3.256 ║
║ GASPRICE                            0.221 ║
║ EXTCODESIZE                         0.184 ║
║ EXTCODECOPY                         0.120 ║
║ RETURNDATASIZE                      6.061 ║
║ RETURNDATACOPY                      2.993 ║
║ EXTCODEHASH                         0.231 ║
║ BLOCKHASH                           0.125 ║
║ COINBASE                            0.221 ║
║ TIMESTAMP                           0.222 ║
║ NUMBER                              0.222 ║
║ PREVRANDAO                          0.222 ║
║ GASLIMIT                            0.221 ║
║ CHAINID                             0.223 ║
║ SELFBALANCE                         0.187 ║
║ BASEFEE                             0.222 ║
║ POP                                 7.207 ║
║ MLOAD                               3.423 ║
║ MSTORE                              3.204 ║
║ MSTORE8                             2.826 ║
║ SLOAD                               0.353 ║
║ SSTORE                              0.161 ║
║ PC                                  5.847 ║
║ MSIZE                               5.189 ║
║ GAS                                 6.210 ║
║ JUMPDEST                            8.377 ║
║ PUSH0                               6.210 ║
║ PUSH1                               4.550 ║
║ DUP2                                9.897 ║
║ DUP3                                9.897 ║
║ DUP4                                9.897 ║
║ DUP5                                9.897 ║
║ DUP6                                9.897 ║
║ DUP7                                9.897 ║
║ DUP8                                9.897 ║
║ DUP9                                9.897 ║
║ DUP10                               9.897 ║
║ DUP11                               9.897 ║
║ DUP12                               9.897 ║
║ DUP13                               9.897 ║
║ DUP14                               9.897 ║
║ DUP15                               9.897 ║
║ DUP16                               9.897 ║
║ SWAP1                               4.642 ║
║ SWAP2                               4.642 ║
║ SWAP3                               4.642 ║
║ SWAP4                               4.642 ║
║ SWAP5                               4.642 ║
║ SWAP6                               4.642 ║
║ SWAP7                               4.642 ║
║ SWAP8                               4.642 ║
║ SWAP9                               4.642 ║
║ SWAP10                              4.642 ║
║ SWAP11                              4.642 ║
║ SWAP12                              4.642 ║
║ SWAP13                              4.642 ║
║ SWAP14                              4.642 ║
║ SWAP15                              4.642 ║
║ SWAP16                              4.642 ║
║ CALL                                0.761 ║
║ STATICCALL                          0.781 ║
║ DELEGATECALL                        0.803 ║
║ CREATE                              0.172 ║
║ CREATE2                             0.183 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞═════╡ EVMInterpreter MzB3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                                 NaN ║
╠═╡ Cycles (-%) ╞═══╡ EVMInterpreter MzB3 ╞═╣
║ Best                                6.724 ║
║ Worst                               0.000 ║
║ Total                               4.266 ║
╠═╡ Ergs (-%) ╞═════╡ EVMInterpreter MzB3 ╞═╣
║ Best                                2.610 ║
║ Worst                               0.000 ║
║ Total                               0.637 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════╡ Precompiles M3B3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Cycles (-%) ╞══════╡ Precompiles M3B3 ╞═╣
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Ergs (-%) ╞════════╡ Precompiles M3B3 ╞═╣
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞════════╡ Precompiles MzB3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Cycles (-%) ╞══════╡ Precompiles MzB3 ╞═╣
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Ergs (-%) ╞════════╡ Precompiles MzB3 ╞═╣
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞══════════╡ Real life M3B3 ╞═╗
║ Best                                1.050 ║
║ Worst                               0.000 ║
║ Total                               0.069 ║
╠═╡ Cycles (-%) ╞════════╡ Real life M3B3 ╞═╣
║ Best                                0.417 ║
║ Worst                               0.000 ║
║ Total                               0.200 ║
╠═╡ Ergs (-%) ╞══════════╡ Real life M3B3 ╞═╣
║ Best                                0.346 ║
║ Worst                               0.000 ║
║ Total                               0.050 ║
╚═══════════════════════════════════════════╝

╔═╡ Size (-%) ╞══════════╡ Real life MzB3 ╞═╗
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Cycles (-%) ╞════════╡ Real life MzB3 ╞═╣
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╠═╡ Ergs (-%) ╞══════════╡ Real life MzB3 ╞═╣
║ Best                                0.000 ║
║ Worst                               0.000 ║
║ Total                               0.000 ║
╚═══════════════════════════════════════════╝

Collaborator

@akiramenai akiramenai left a comment

I appreciate the documentation in the code; it helped me a lot. Still, I can't fully comprehend the patch, as the logic seems complex to me.

llvm/lib/Target/EraVM/EraVMPostCodegenPrepare.cpp (9 review threads, outdated and resolved)
This is mainly used to test this pass by running it with opt.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
… of PHI nodes in loops

Signed-off-by: Vladimir Radosavljevic <[email protected]>
This is useful for loops with large switch statements where
PHI nodes are used frequently, and we want to keep these
variables in a single register.

Signed-off-by: Vladimir Radosavljevic <[email protected]>
@vladimirradosavljevic vladimirradosavljevic force-pushed the split_live_ranges_of_phi_in_loops branch from 0f61c65 to 28ada23 Compare September 25, 2024 14:11
@vladimirradosavljevic
Contributor Author

vladimirradosavljevic commented Sep 25, 2024

I appreciate the documentation in the code; it helped me a lot. Still, I can't fully comprehend the patch, as the logic seems complex to me.

The idea for this optimization came while I was investigating a performance drop after one of the changes in the EVMInterpreter. The drop was caused by regalloc failing to keep a variable in a single register, so it added unnecessary copy instructions in all opcodes. We had the following case in one of the opcodes in the EVMInterpreter:

bb1:
  %add1 = add %phi, 64
  bcc bb2
bb2:
  %add2 = add %phi, -64
  b latch

After I manually changed this to:

bb1:
  %add1 = add %phi, 64
  bcc bb2
bb2:
  %add2 = add %add1, -128
  b latch

regalloc managed to keep the variable in a single register and removed a lot of copy instructions.
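
The manual rewrite above is value-preserving because `phi + 64 - 128` equals `phi - 64` under wraparound arithmetic. A minimal sketch of that check follows; the 256-bit word size and the helper names `add`, `original` and `rewritten` are assumptions made for illustration, not code from the patch.

```python
WORD_BITS = 256  # assumption: registers hold 256-bit words
MASK = (1 << WORD_BITS) - 1


def add(a: int, b: int) -> int:
    """Wrapping addition, modelling the `add` in the snippets above."""
    return (a + b) & MASK


def original(phi: int) -> tuple[int, int]:
    add1 = add(phi, 64)     # bb1: %add1 = add %phi, 64
    add2 = add(phi, -64)    # bb2: %add2 = add %phi, -64
    return add1, add2


def rewritten(phi: int) -> tuple[int, int]:
    add1 = add(phi, 64)     # bb1: %add1 = add %phi, 64
    add2 = add(add1, -128)  # bb2: %add2 = add %add1, -128
    return add1, add2
```

In the rewritten form `%phi` is not read after `%add1` is computed, which is presumably what lets regalloc keep the value in one register; the sketch only demonstrates that both forms compute the same values.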
This is far from ideal: the proper fix would be to change regalloc. However, I don't think a loop with a big switch statement is a common case, and other architectures won't have these problems given the number of registers they have. A copy is also not expensive for them, whereas we pay gas for each instruction. Given the time we have, this seemed to me the better option.
This support is essential for the lazy stack implementation (you can see numbers here), since it removes a lot of copy instructions.
The improvements we see here are a fortunate side effect: for the current implementation of the EVMInterpreter, this optimization removes one instruction, because it does the following:
Before this optimization:

... Loop header
	sub.s	31, r14, r1
	ldm.h	r1, r1

... ADD opcode
	add	1, r14, r14      (ip + 1)

After this optimization:

... Loop header
	sub.s	31, r14, r1     (ip - 31)
	ldmi.h	r1, r2, r14     (ip - 31 + 32 = ip + 1)

... ADD opcode (no ip + 1, since it is calculated by above ldmi.h instruction)
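
The address arithmetic in the two listings can be checked with a small model. This is a sketch under stated assumptions: a heap load reads one 32-byte big-endian word, and `ldmi.h` writes the load address plus 32 into its third operand, as the `(ip - 31 + 32 = ip + 1)` annotation suggests. The function names and the byte-string heap are hypothetical.

```python
WORD_BYTES = 32  # assumption: one heap load reads a 32-byte word


def load_word(heap: bytes, addr: int) -> int:
    """Model of a heap load: read 32 bytes at addr as a big-endian integer."""
    return int.from_bytes(heap[addr:addr + WORD_BYTES], "big")


def before(heap: bytes, ip: int) -> tuple[int, int]:
    r1 = ip - 31                 # sub.s 31, r14, r1
    r1 = load_word(heap, r1)     # ldm.h r1, r1
    ip = ip + 1                  # add 1, r14, r14 (inside the ADD opcode)
    return r1, ip


def after(heap: bytes, ip: int) -> tuple[int, int]:
    r1 = ip - 31                 # sub.s 31, r14, r1
    r2 = load_word(heap, r1)     # ldmi.h r1, r2, r14 (load part)
    ip = r1 + WORD_BYTES         # ldmi.h r1, r2, r14 (increment part)
    return r2, ip
```

Both functions return the same loaded word and the same updated `ip`, so the separate `add 1, r14, r14` in the opcode body becomes redundant once the indexed load is used.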

I will create a ticket for this kind of optimization, since we could benefit from indexed loads and stores (DAGCombine already has optimizations for this), and we can do something similar in MIR to create more opportunities for post-increment loads and stores.

Btw, my idea is to disable this optimization by default and enable it only for the EVMInterpreter. I don't think we should do this optimization for other contracts.

Collaborator

@akiramenai akiramenai left a comment

LGTM. Thank you for putting effort into refactoring.

@akiramenai akiramenai merged commit 4c59238 into main Sep 25, 2024
15 checks passed
@akiramenai akiramenai deleted the split_live_ranges_of_phi_in_loops branch September 25, 2024 18:56