
perf: Add extensive_hints feature to prevent performance regression for the common use-case #1503

Merged
merged 14 commits into main from load_program_feature
Dec 5, 2023

Conversation

fmoletta
Contributor

@fmoletta fmoletta commented Dec 4, 2023

Gates the behaviour added by #1491 behind an `extensive_hints` feature, to prevent performance regressions in the most common use cases.
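As a minimal sketch of the gating pattern described here (the type and function names are illustrative, not the crate's actual API): with an `extensive_hints = []` entry under `[features]` in Cargo.toml, the same lookup can compile to a `HashMap` walk when the feature is enabled and to direct `Vec` indexing otherwise.

```rust
#[cfg(feature = "extensive_hints")]
use std::collections::HashMap;

// Hypothetical hint index: keyed by arbitrary pc when `extensive_hints`
// is on, indexed directly by pc on the default fast path.
#[cfg(feature = "extensive_hints")]
type HintIndex = HashMap<usize, Vec<u32>>;

#[cfg(not(feature = "extensive_hints"))]
type HintIndex = Vec<Vec<u32>>;

fn hints_at(index: &HintIndex, pc: usize) -> &[u32] {
    #[cfg(feature = "extensive_hints")]
    return index.get(&pc).map(Vec::as_slice).unwrap_or(&[]);
    #[cfg(not(feature = "extensive_hints"))]
    return index.get(pc).map(Vec::as_slice).unwrap_or(&[]);
}

fn main() {
    // Built for the default (feature-off) representation.
    let index: HintIndex = vec![vec![], vec![7, 8]];
    assert_eq!(hints_at(&index, 1).to_vec(), vec![7, 8]);
    assert!(hints_at(&index, 0).is_empty());
}
```

Callers go through one function, so only the representation behind the feature flag changes, not the call sites.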

@fmoletta fmoletta changed the title Add load_program feature to prevent performance regression for the broader use case Add extensive_hints feature to prevent performance regression for the common use-case Dec 4, 2023
@fmoletta fmoletta changed the title Add extensive_hints feature to prevent performance regression for the common use-case perf: Add extensive_hints feature to prevent performance regression for the common use-case Dec 4, 2023

github-actions bot commented Dec 4, 2023

Benchmark Results for unmodified programs 🚀

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| base big_factorial | 2.809 ± 0.039 | 2.781 | 2.905 | 1.00 ± 0.01 |
| head big_factorial | 2.802 ± 0.013 | 2.787 | 2.820 | 1.00 |
| base big_fibonacci | 2.402 ± 0.008 | 2.390 | 2.413 | 1.00 |
| head big_fibonacci | 2.413 ± 0.021 | 2.386 | 2.463 | 1.00 ± 0.01 |
| base blake2s_integration_benchmark | 8.571 ± 0.233 | 8.447 | 9.219 | 1.23 ± 0.03 |
| head blake2s_integration_benchmark | 6.942 ± 0.052 | 6.882 | 7.077 | 1.00 |
| base compare_arrays_200000 | 2.876 ± 0.031 | 2.853 | 2.959 | 1.15 ± 0.01 |
| head compare_arrays_200000 | 2.507 ± 0.012 | 2.484 | 2.527 | 1.00 |
| base dict_integration_benchmark | 1.875 ± 0.006 | 1.866 | 1.885 | 1.10 ± 0.01 |
| head dict_integration_benchmark | 1.703 ± 0.007 | 1.695 | 1.714 | 1.00 |
| base field_arithmetic_get_square_benchmark | 1.379 ± 0.003 | 1.375 | 1.386 | 1.17 ± 0.01 |
| head field_arithmetic_get_square_benchmark | 1.179 ± 0.010 | 1.166 | 1.201 | 1.00 |
| base integration_builtins | 6.924 ± 0.134 | 6.853 | 7.303 | 1.00 ± 0.02 |
| head integration_builtins | 6.908 ± 0.036 | 6.875 | 6.977 | 1.00 |
| base keccak_integration_benchmark | 8.644 ± 0.134 | 8.567 | 9.015 | 1.23 ± 0.02 |
| head keccak_integration_benchmark | 7.026 ± 0.034 | 6.988 | 7.113 | 1.00 |
| base linear_search | 2.897 ± 0.010 | 2.883 | 2.910 | 1.14 ± 0.01 |
| head linear_search | 2.545 ± 0.007 | 2.538 | 2.560 | 1.00 |
| base math_cmp_and_pow_integration_benchmark | 1.881 ± 0.026 | 1.857 | 1.934 | 1.12 ± 0.02 |
| head math_cmp_and_pow_integration_benchmark | 1.673 ± 0.009 | 1.663 | 1.692 | 1.00 |
| base math_integration_benchmark | 1.797 ± 0.015 | 1.779 | 1.832 | 1.12 ± 0.01 |
| head math_integration_benchmark | 1.604 ± 0.008 | 1.586 | 1.613 | 1.00 |
| base memory_integration_benchmark | 1.596 ± 0.020 | 1.577 | 1.649 | 1.12 ± 0.02 |
| head memory_integration_benchmark | 1.426 ± 0.017 | 1.414 | 1.472 | 1.00 |
| base operations_with_data_structures_benchmarks | 1.746 ± 0.007 | 1.738 | 1.760 | 1.12 ± 0.01 |
| head operations_with_data_structures_benchmarks | 1.557 ± 0.010 | 1.538 | 1.568 | 1.00 |
| base pedersen \* | 618.7 ± 8.5 | 612.8 | 641.3 | 1.00 ± 0.02 |
| head pedersen \* | 616.5 ± 4.5 | 613.2 | 629.1 | 1.00 |
| base poseidon_integration_benchmark | 1.231 ± 0.005 | 1.222 | 1.238 | 1.03 ± 0.01 |
| head poseidon_integration_benchmark | 1.196 ± 0.012 | 1.179 | 1.224 | 1.00 |
| base secp_integration_benchmark | 2.304 ± 0.011 | 2.287 | 2.323 | 1.09 ± 0.01 |
| head secp_integration_benchmark | 2.112 ± 0.007 | 2.103 | 2.126 | 1.00 |
| base set_integration_benchmark | 1.212 ± 0.007 | 1.202 | 1.220 | 1.03 ± 0.01 |
| head set_integration_benchmark | 1.181 ± 0.003 | 1.177 | 1.185 | 1.00 |
| base uint256_integration_benchmark | 5.062 ± 0.010 | 5.048 | 5.083 | 1.17 ± 0.01 |
| head uint256_integration_benchmark | 4.331 ± 0.047 | 4.267 | 4.405 | 1.00 |

\* pedersen times are reported in ms, not s.

@fmoletta fmoletta marked this pull request as ready for review December 4, 2023 22:02

codecov bot commented Dec 4, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (db708e3) 96.82% compared to head (c023ce7) 96.82%.

| Files | Patch % | Lines |
|:---|---:|:---|
| vm/src/types/program.rs | 98.27% | 1 Missing ⚠️ |
| vm/src/vm/runners/cairo_runner.rs | 96.15% | 1 Missing ⚠️ |

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main    #1503   +/-   ##
=======================================
  Coverage   96.82%   96.82%
=======================================
  Files          96       96
  Lines       39684    39744   +60
=======================================
+ Hits        38423    38482   +59
- Misses       1261     1262    +1
```


@Oppen
Contributor

Oppen commented Dec 5, 2023

I suggest we first see whether we can alleviate the hit with a faster hash. It may not be as fast as the array approach, but it would be significantly easier to maintain than having two separate code paths, and could come quite close. Popular alternatives:

- https://crates.io/crates/ahash
- https://crates.io/crates/ritehash
- https://crates.io/crates/fnv_rs
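Swapping the hasher is a matter of changing the map's `BuildHasher` type parameter. A self-contained illustration of the mechanism, using a minimal inline FNV-1a purely so the sketch has no dependencies (the crates above provide tuned, production-quality implementations of the same idea):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Minimal FNV-1a 64-bit hasher, for illustration only.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }
}

// Same HashMap API, different hashing backend.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut m: FastMap<u64, &str> = FastMap::default();
    m.insert(42, "hint");
    assert_eq!(m.get(&42), Some(&"hint"));
    assert_eq!(m.get(&7), None);
}
```

With `ahash`, the equivalent would be its own `HashMap` alias or `BuildHasher`; no call sites change, only the map's type.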

We can also make the input to the hash much smaller by converting the `Relocatable` to its compact form before hashing, for example:

```rust
impl Hash for Relocatable {
    fn hash<H: Hasher>(&self, state: &mut H) {
        ((self.segment_index.unsigned_abs() as u64) << 48 | self.offset as u64).hash(state);
    }
}
```

This halves the number of bytes fed to the hash for nearly zero effort, and hashing cost is typically about linear in input size. I used `unsigned_abs` for convenience, but of course this would make a temporary segment collide with the real segment that shares the same index magnitude and offset. I'm not sure we store temporary segments in any `HashMap`, but if we do, we could use the MSB to store the sign.
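If the sign ever matters, the packing could keep it in the top bit. A sketch of that idea (the field widths and the free `pack` function are illustrative, not the crate's actual layout):

```rust
/// Pack a (segment_index, offset) pair into one u64 before hashing:
/// 1 sign bit | 15 bits of |segment_index| | 48 bits of offset.
/// Widths are an assumption for illustration; collisions between
/// temporary (negative) and real segments are avoided by the sign bit.
fn pack(segment_index: isize, offset: usize) -> u64 {
    let sign = (segment_index < 0) as u64;
    (sign << 63)
        | ((segment_index.unsigned_abs() as u64 & 0x7fff) << 48)
        | (offset as u64 & 0xffff_ffff_ffff)
}

fn main() {
    // Temporary segment -1 and real segment 1 with the same offset
    // no longer collide once the sign is kept.
    assert_ne!(pack(-1, 10), pack(1, 10));
    assert_eq!(pack(1, 10), (1u64 << 48) | 10);
    assert_eq!(pack(0, 0), 0);
}
```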

@Oppen
Contributor

Oppen commented Dec 5, 2023

Just tried the suggestions from my previous comment on draft PR #1505. The speedup with ahash is quite similar. I can clean that one up so we don't add the extra feature flag; it should Just Work (TM) for both cases.

@Oppen Oppen added this pull request to the merge queue Dec 5, 2023
Merged via the queue into main with commit 8a2ef24 Dec 5, 2023
59 checks passed
@Oppen Oppen deleted the load_program_feature branch December 5, 2023 20:28