Eliminate hashing shims for hardcoded tags across the LEM codebase #810

Merged: 1 commit into master from ap/remove-hashing-shims on Oct 30, 2023

Conversation

@arthurpaulino (Contributor) commented Oct 27, 2023

Changes:

  • Expand LEM's expressiveness so the model itself can produce pointers for hardcoded tags (the kind of data the old shims hashed on the fly)
  • Adjust the Store accordingly, providing the appropriate API to create such pointers directly (see the sketch after this list)
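
A minimal sketch of the Store side of this, with illustrative names (zero_cont_ptr is a hypothetical helper, not necessarily the actual lurk-rs API): the Poseidon hashes of all-zero preimages are computed once at construction, so pointers for hardcoded tags can be produced without touching the hasher again.

// Sketch only; `hash8zeros` mirrors the field visible later in this PR,
// while `zero_cont_ptr` is a made-up name for the "create such pointers
// directly" API mentioned above.
pub struct Store<F: LurkField> {
    poseidon_cache: PoseidonCache<F>,
    hash8zeros: F, // poseidon_cache.hash8(&[F::ZERO; 8]), computed once
    // ...other fields elided
}

impl<F: LurkField> Store<F> {
    /// Build a continuation pointer for a hardcoded tag directly,
    /// reusing the precomputed zero hash instead of re-hashing.
    pub fn zero_cont_ptr(&self, tag: ContTag) -> ZPtr<F> {
        ZPtr::from_parts(Tag::Cont(tag), self.hash8zeros)
    }
}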

Performance:

  • The store benchmarks are a bit worse because initializing a default Store now has the extra cost of computing a few Poseidon hashes
  • But hydration is faster because the hardcoded-tag check shim is gone
  • No significant changes to evaluation or proving

Note: I tried an approach with OnceCells. It made the store interning and hydration benchmarks show no change w.r.t. master, but it backfired with a 2% slowdown in evaluation, which is worse than paying a small cost at store startup.
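
A minimal sketch of that lazy alternative, assuming a thread-safe cell such as once_cell::sync::OnceCell (the accessor name is hypothetical):

use once_cell::sync::OnceCell;

// Sketch only: defer the zero-hash computation to first use instead of
// paying for it when the Store is constructed.
pub struct Store<F: LurkField> {
    poseidon_cache: PoseidonCache<F>,
    hash8zeros: OnceCell<F>,
}

impl<F: LurkField> Store<F> {
    /// Compute (and memoize) the hash of eight zeros on first access.
    pub fn hash8zeros(&self) -> F {
        *self
            .hash8zeros
            .get_or_init(|| self.poseidon_cache.hash8(&[F::ZERO; 8]))
    }
}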

end2end_benchmark/end2end_go_base_nova/_10_0
                        time:   [1.1098 s 1.1130 s 1.1157 s]
                        change: [+0.1501% +0.4740% +0.7700%] (p = 0.01 < 0.05)
                        Change within noise threshold.

store_benchmark/store_go_base_bls12/_10_16
                        time:   [156.42 µs 156.48 µs 156.56 µs]
                        change: [+4.8783% +4.9844% +5.0762%] (p = 0.00 < 0.05)
                        Performance has regressed.
store_benchmark/store_go_base_pallas/_10_16
                        time:   [158.86 µs 158.93 µs 159.01 µs]
                        change: [+4.5726% +4.6446% +4.7182%] (p = 0.00 < 0.05)
                        Performance has regressed.
store_benchmark/store_go_base_bls12/_10_160
                        time:   [159.15 µs 159.23 µs 159.33 µs]
                        change: [+3.9374% +4.0413% +4.1424%] (p = 0.00 < 0.05)
                        Performance has regressed.
store_benchmark/store_go_base_pallas/_10_160
                        time:   [157.71 µs 157.79 µs 157.89 µs]
                        change: [+4.0536% +4.1132% +4.1800%] (p = 0.00 < 0.05)
                        Performance has regressed.

hydration_benchmark/hydration_go_base_bls12/_10_16
                        time:   [103.96 ns 103.97 ns 103.98 ns]
                        change: [-3.7004% -3.5877% -3.4693%] (p = 0.00 < 0.05)
                        Performance has improved.
hydration_benchmark/hydration_go_base_pallas/_10_16
                        time:   [103.97 ns 103.97 ns 103.98 ns]
                        change: [-3.6144% -3.5865% -3.5611%] (p = 0.00 < 0.05)
                        Performance has improved.
hydration_benchmark/hydration_go_base_bls12/_10_160
                        time:   [103.03 ns 103.08 ns 103.12 ns]
                        change: [-4.0741% -3.9706% -3.8657%] (p = 0.00 < 0.05)
                        Performance has improved.
hydration_benchmark/hydration_go_base_pallas/_10_160
                        time:   [103.66 ns 103.68 ns 103.70 ns]
                        change: [-3.5395% -3.5072% -3.4763%] (p = 0.00 < 0.05)
                        Performance has improved.

eval_benchmark/eval_go_base_bls12/_10_16
                        time:   [8.0917 ms 8.0943 ms 8.0970 ms]
                        change: [-0.6780% -0.6198% -0.5642%] (p = 0.00 < 0.05)
                        Change within noise threshold.
eval_benchmark/eval_go_base_pallas/_10_16
                        time:   [8.1685 ms 8.1724 ms 8.1765 ms]
                        change: [+0.1633% +0.2267% +0.2902%] (p = 0.00 < 0.05)
                        Change within noise threshold.
eval_benchmark/eval_go_base_bls12/_10_160
                        time:   [81.775 ms 81.827 ms 81.888 ms]
                        change: [-0.4352% -0.2755% -0.1315%] (p = 0.00 < 0.05)
                        Change within noise threshold.
eval_benchmark/eval_go_base_pallas/_10_160
                        time:   [82.448 ms 82.532 ms 82.611 ms]
                        change: [+0.0092% +0.1232% +0.2417%] (p = 0.04 < 0.05)
                        Change within noise threshold.

Closes #796

@arthurpaulino requested review from a team as code owners on October 27, 2023 18:55
@arthurpaulino force-pushed the ap/remove-hashing-shims branch 2 times, most recently from c9cf7d5 to 8bd8545 on October 27, 2023 21:46
@arthurpaulino force-pushed the ap/remove-hashing-shims branch from 8bd8545 to bedb380 on October 27, 2023 22:44
Comment on lines -72 to -75
Tag::Cont(Outermost | Error | Dummy | Terminal) => {
// temporary shim for compatibility with Lurk Alpha
ZPtr::from_parts(*tag, store.poseidon_cache.hash8(&[F::ZERO; 8]))
}

@arthurpaulino (Contributor, Author):
This is gone

Comment on lines -350 to -351
// the following is a temporary shim for compatibility with Lurk Alpha
// otherwise we could just have a pointer whose value is zero

@arthurpaulino (Contributor, Author):
This is gone

Comment on lines -374 to -377
Tag::Cont(Outermost | Error | Dummy | Terminal) => {
// temporary shim for compatibility with Lurk Alpha
g.new_const(cs, store.poseidon_cache.hash8(&[F::ZERO; 8]));
}

@arthurpaulino (Contributor, Author):
This is gone

Comment on lines -684 to -687
// temporary shim for compatibility with Lurk Alpha
ctx.global_allocator.get_allocated_const_cloned(
ctx.store.poseidon_cache.hash8(&[F::ZERO; 8]),
)?

@arthurpaulino (Contributor, Author):
This is gone

Comment on lines -1414 to -1417
Tag::Cont(Outermost | Error | Dummy | Terminal) => {
// temporary shim for compatibility with Lurk Alpha
globals.insert(FWrap(store.poseidon_cache.hash8(&[F::ZERO; 8])));
}

@arthurpaulino (Contributor, Author):
This is gone

Comment on lines -502 to -508
Tag::Cont(Outermost | ContTag::Error | Dummy | Terminal) => {
// temporary shim for compatibility with Lurk Alpha
Ok(ZPtr::from_parts(
*tag,
self.poseidon_cache.hash8(&[F::ZERO; 8]),
))
}

@arthurpaulino (Contributor, Author):
This is gone

Comment on lines +77 to +80
let hash3zeros = poseidon_cache.hash3(&[F::ZERO; 3]);
let hash4zeros = poseidon_cache.hash4(&[F::ZERO; 4]);
let hash6zeros = poseidon_cache.hash6(&[F::ZERO; 6]);
let hash8zeros = poseidon_cache.hash8(&[F::ZERO; 8]);

@arthurpaulino (Contributor, Author):
This is the extra cost of store startup

A reviewer (Contributor):
Could we parallelize this?

@arthurpaulino (Contributor, Author):
I think it can be parallelized, but I am not sure it's worth it since the PoseidonCache would be a locking bottleneck across threads

A reviewer (Contributor):
The cache has four different locations, one per arity; writing to these only requires a shared reference thanks to the FrozenMap, and the locking is internal to each one. What's the problem?
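
A minimal sketch of the property being described, assuming the elsa crate's sync FrozenMap (the key/value types here are illustrative, not the real PoseidonCache internals):

use elsa::sync::FrozenMap;
use rayon::prelude::*;

// Sketch only: FrozenMap::insert takes &self, so several rayon tasks can
// populate a shared cache concurrently without an outer lock; each entry's
// synchronization is internal to the map.
fn demo() {
    let cache: FrozenMap<u64, Box<u64>> = FrozenMap::new();
    (0u64..4).into_par_iter().for_each(|arity| {
        cache.insert(arity, Box::new(arity * arity));
    });
    assert_eq!(cache.get(&2), Some(&4));
}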

@arthurpaulino (Contributor, Author):
You're right. But I tried it and the difference in performance wasn't clear enough to justify the increase in complexity:

store_benchmark/store_go_base_bls12/_10_16                                                                           
                        time:   [152.22 µs 152.32 µs 152.40 µs]
                        change: [-2.7313% -2.6485% -2.5683%] (p = 0.00 < 0.05)
                        Performance has improved.
store_benchmark/store_go_base_pallas/_10_16                                                                           
                        time:   [154.93 µs 155.08 µs 155.24 µs]
                        change: [-1.0801% -0.7544% -0.2272%] (p = 0.00 < 0.05)
                        Change within noise threshold.
store_benchmark/store_go_base_bls12/_10_160                                                                           
                        time:   [158.65 µs 158.75 µs 158.86 µs]
                        change: [+1.2575% +1.3732% +1.4909%] (p = 0.00 < 0.05)
                        Performance has regressed.
store_benchmark/store_go_base_pallas/_10_160                                                                           
                        time:   [157.69 µs 158.22 µs 158.95 µs]
                        change: [+0.5752% +0.7856% +1.0481%] (p = 0.00 < 0.05)
                        Change within noise threshold.

This is how I did it:

// Hash the four all-zero preimages in parallel with rayon:
let hashes = (0..4)
    .into_par_iter()
    .map(|i| match i {
        0 => poseidon_cache.hash3(&[F::ZERO; 3]),
        1 => poseidon_cache.hash4(&[F::ZERO; 4]),
        2 => poseidon_cache.hash6(&[F::ZERO; 6]),
        3 => poseidon_cache.hash8(&[F::ZERO; 8]),
        _ => unreachable!(),
    })
    .collect::<Vec<_>>();
// ...then move the results into the Store being constructed:
Self {
    hash3zeros: hashes[0],
    hash4zeros: hashes[1],
    hash6zeros: hashes[2],
    hash8zeros: hashes[3],
    // ...remaining Store fields elided from this excerpt
}

@huitseeker (Contributor) left a comment:
Thanks for the thorough evaluation!

@arthurpaulino added this pull request to the merge queue on Oct 30, 2023
Merged via the queue into master with commit c93b70b on Oct 30, 2023
12 checks passed
@arthurpaulino deleted the ap/remove-hashing-shims branch on October 30, 2023 18:14
Successfully merging this pull request may close these issues.

Op::Null4, Op::Null6 and Op::Null8