Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance degradation on M1 when moving from Rust 1.69 to 1.70 #917

Closed
bobbinth opened this issue Jun 1, 2023 · 5 comments
Closed

Performance degradation on M1 when moving from Rust 1.69 to 1.70 #917

bobbinth opened this issue Jun 1, 2023 · 5 comments

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Jun 1, 2023

Seems like when compiled with Rust 1.70, Miden VM runs about 13% slower than when compiled with Rust 1.69. I've included the benchmarks for running the VM for $2^{20}$ cycles below. Interestingly, it seems like the thing that really slowed down is hashing (which itself could be caused by a number of things), while most other steps have actually improved in performance.

Compiled with Rust 1.69:

Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 73 columns and 1048576 steps in 316 ms
Built domain of 2^23 elements in 8 ms
Extended execution trace of 73 columns from 2^20 to 2^23 steps (8x blowup) in 2255 ms
Computed execution trace commitment (Merkle tree of depth 23) in 909 ms
Built auxiliary trace segment of 9 columns and 2^20 steps in 39 ms
Extended execution trace of 9 columns from 2^20 to 2^23 steps (8x blowup) in 660 ms
Computed execution trace commitment (Merkle tree of depth 23) in 355 ms
Evaluated constraints over domain of 2^23 elements in 2454 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 184 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 398 ms
Computed constraint evaluation commitment (Merkle tree of depth 23) in 281 ms
Built DEEP composition polynomial of degree 1048574 in 331 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 62 ms
Computed 4 FRI layers from composition polynomial evaluations in 74 ms
Determined 27 query positions in 0 ms
Built proof object in 59 ms
--------------------------------
Executed program in 8490 ms
Stack outputs: [18256939052688735311]
Execution proof size: 100 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 2 ms

Compiled with Rust 1.70:

Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 73 columns and 1048576 steps in 319 ms
Built domain of 2^23 elements in 7 ms
Extended execution trace of 73 columns from 2^20 to 2^23 steps (8x blowup) in 2063 ms
Computed execution trace commitment (Merkle tree of depth 23) in 1832 ms
Built auxiliary trace segment of 9 columns and 2^20 steps in 39 ms
Extended execution trace of 9 columns from 2^20 to 2^23 steps (8x blowup) in 598 ms
Computed execution trace commitment (Merkle tree of depth 23) in 676 ms
Evaluated constraints over domain of 2^23 elements in 2302 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 181 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 348 ms
Computed constraint evaluation commitment (Merkle tree of depth 23) in 522 ms
Built DEEP composition polynomial of degree 1048574 in 372 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 58 ms
Computed 4 FRI layers from composition polynomial evaluations in 116 ms
Determined 27 query positions in 2 ms
Built proof object in 47 ms
--------------------------------
Executed program in 9617 ms
Stack outputs: [18256939052688735311]
Execution proof size: 100 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 2 ms
@bobbinth
Copy link
Contributor Author

bobbinth commented Jun 2, 2023

Seems like the issue is with BLAKE3 hash function. For w/e reason it became more than 50% slower.

@Al-Kindi-0
Copy link
Collaborator

Very interesting! Have you by chance compared using RPO?

@bobbinth
Copy link
Contributor Author

bobbinth commented Jun 3, 2023

Have you by chance compared using RPO?

Good question! It actually seems a similar thing happened with RPO - so, maybe something else is going on here.

Compiled with Rust 1.69:

Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 73 columns and 1048576 steps in 312 ms
Built domain of 2^23 elements in 8 ms
Extended execution trace of 73 columns from 2^20 to 2^23 steps (8x blowup) in 2225 ms
Computed execution trace commitment (Merkle tree of depth 23) in 58374 ms
Built auxiliary trace segment of 9 columns and 2^20 steps in 42 ms
Extended execution trace of 9 columns from 2^20 to 2^23 steps (8x blowup) in 659 ms
Computed execution trace commitment (Merkle tree of depth 23) in 21333 ms
Evaluated constraints over domain of 2^23 elements in 2493 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 185 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 383 ms
Computed constraint evaluation commitment (Merkle tree of depth 23) in 16005 ms
Built DEEP composition polynomial of degree 1048574 in 315 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 65 ms
Computed 9 FRI layers from composition polynomial evaluations in 3678 ms
Determined 27 query positions in 7 ms
Built proof object in 45 ms
--------------------------------
Executed program in 106227 ms
Stack outputs: [18256939052688735311]
Execution proof size: 142 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 25 ms

Compiled with Rust 1.70:

============================================================
Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 73 columns and 1048576 steps in 304 ms
Built domain of 2^23 elements in 8 ms
Extended execution trace of 73 columns from 2^20 to 2^23 steps (8x blowup) in 2048 ms
Computed execution trace commitment (Merkle tree of depth 23) in 71330 ms
Built auxiliary trace segment of 9 columns and 2^20 steps in 37 ms
Extended execution trace of 9 columns from 2^20 to 2^23 steps (8x blowup) in 588 ms
Computed execution trace commitment (Merkle tree of depth 23) in 25772 ms
Evaluated constraints over domain of 2^23 elements in 2355 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 173 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 350 ms
Computed constraint evaluation commitment (Merkle tree of depth 23) in 19072 ms
Built DEEP composition polynomial of degree 1048574 in 354 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 55 ms
Computed 9 FRI layers from composition polynomial evaluations in 4275 ms
Determined 27 query positions in 93 ms
Built proof object in 34 ms
--------------------------------
Executed program in 126937 ms
Stack outputs: [18256939052688735311]
Execution proof size: 142 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 28 ms

GPU-accelerated

When running in GPU-accelerated mode, the differences are barely noticeable:

Compiled with Rust 1.69:

Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 73 columns and 1048576 steps in 309 ms
Built domain of 2^23 elements in 8 ms
Extended (on CPU) and committed (on GPU) to an execution trace of 73 columns from 2^20 to 2^23 steps in 24102 ms
Built auxiliary trace segment of 9 columns and 2^20 steps in 39 ms
Extended (on CPU) and committed (on GPU) to an execution trace of 9 columns from 2^20 to 2^23 steps in 9359 ms
Evaluated constraints over domain of 2^23 elements in 2543 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 184 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 344 ms
Computed constraint evaluation commitment on the GPU (Merkle tree of depth 23) in 7057 ms
Built DEEP composition polynomial of degree 1048574 in 351 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 60 ms
Computed 9 FRI layers from composition polynomial evaluations in 3665 ms
Determined 27 query positions in 27 ms
Built proof object in 29 ms
--------------------------------
Executed program in 48189 ms
Stack outputs: [18256939052688735311]
Execution proof size: 142 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 24 ms

Compiled with Rust 1.70:

Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 73 columns and 1048576 steps in 320 ms
Built domain of 2^23 elements in 8 ms
Extended (on CPU) and committed (on GPU) to an execution trace of 73 columns from 2^20 to 2^23 steps in 24076 ms
Built auxiliary trace segment of 9 columns and 2^20 steps in 38 ms
Extended (on CPU) and committed (on GPU) to an execution trace of 9 columns from 2^20 to 2^23 steps in 9346 ms
Evaluated constraints over domain of 2^23 elements in 2369 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 183 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 313 ms
Computed constraint evaluation commitment on the GPU (Merkle tree of depth 23) in 7051 ms
Built DEEP composition polynomial of degree 1048574 in 411 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 57 ms
Computed 9 FRI layers from composition polynomial evaluations in 4284 ms
Determined 27 query positions in 76 ms
Built proof object in 32 ms
--------------------------------
Executed program in 48695 ms
Stack outputs: [18256939052688735311]
Execution proof size: 141 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 28 ms

@bobbinth
Copy link
Contributor Author

bobbinth commented Aug 29, 2023

It seems like this issue has resolved itself with the current Rust nightly (1.74). And in general, the proving to quite a bit faster due to latest trace optimizations:

Generated a program to compute 200000-th Fibonacci term; expected result: 18256939052688735311
--------------------------------
Generated execution trace of 70 columns and 1048576 steps in 316 ms
Built domain of 2^23 elements in 8 ms
Extended execution trace of 70 columns from 2^20 to 2^23 steps (8x blowup) in 1869 ms
Computed execution trace commitment (Merkle tree of depth 23) in 842 ms
Built auxiliary trace segment of 7 columns and 2^20 steps in 26 ms
Extended execution trace of 7 columns from 2^20 to 2^23 steps (8x blowup) in 421 ms
Computed execution trace commitment (Merkle tree of depth 23) in 296 ms
Evaluated constraints over domain of 2^23 elements in 2100 ms
Converted constraint evaluations into 8 composition polynomial columns of degree 1048575 in 182 ms
Evaluated 8 composition polynomial columns over LDE domain (2^23 elements) in 350 ms
Computed constraint evaluation commitment (Merkle tree of depth 23) in 283 ms
Built DEEP composition polynomial of degree 1048574 in 347 ms
Evaluated DEEP composition polynomial over LDE domain (2^23 elements) in 53 ms
Computed 4 FRI layers from composition polynomial evaluations in 89 ms
Determined 27 query positions in 0 ms
Built proof object in 66 ms
--------------------------------
Executed program in 7359 ms
Stack outputs: [18256939052688735311]
Execution proof size: 98 KB
Execution proof security: 96 bits
--------------------------------
Execution verified in 1 ms

Assuming the performance stays the same, we should set the minimum supported Rust version to 1.74 once it becomes stable.

@bobbinth
Copy link
Contributor Author

Closed by #1084 since we now set min required rustc version to 1.73 (and there is no performance degradation for this version).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants