Runtime backend autodetection #523

Merged
merged 11 commits into from
Jun 11, 2023

koute
Contributor

@koute koute commented Apr 11, 2023

This PR makes the following changes:

  • The backend is now autodetected at runtime by default, based on the features supported by the CPU.
  • If the appropriate target feature is available at compile time (through -C target_feature) then that particular feature is assumed to be always available (courtesy of the cpufeatures crate), and we won't try to detect whether it's available at runtime.
  • If the target doesn't support runtime autodetection (e.g. x86_64-fortanix-unknown-sgx) then no runtime autodetection will take place (courtesy of the cpufeatures crate).
  • Cargo features are now used to control whether a particular SIMD backend can be used. This is a necessary, but not sufficient, requirement for a backend to be used. (The idea here is to let the users disable those backends which they do not want.)
  • The --cfg curve25519_dalek_backend configuration knob doesn't accept simd anymore; now this can only be used to switch to the fiat backend.
  • The README previously implied that the avx512ifma feature alone is sufficient for the AVX512 backend. This is incorrect: the backend actually uses two features, avx512ifma and avx512vl, so I've made this explicit and corrected the README. In particular, if you previously compiled with only -C target_feature=+avx512ifma and without the avx512vl feature, the performance of the resulting code was utterly horrible, since Rust disables inlining if the appropriate target feature is not enabled. (The code still accidentally worked, since CPUs which support avx512ifma usually also support avx512vl.)
  • The code uses the unsafe_target_feature crate (which I wrote specifically for this PR) to make the whole SIMD-specific codepath easily callable with runtime autodetection, without rewriting everything or getting rid of any abstractions we have. See the unsafe_target_feature crate's description for why it's necessary.
  • The AVX512 backend will now be tested on the CI if the test runner supports those instructions. If the test runner supports both AVX2 and AVX512 then both will be tested.
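
For anyone who wants a concrete picture of the selection rules in the list above, here is a minimal sketch. It is not the code from this PR: it uses std::is_x86_feature_detected! as a stand-in for the cpufeatures crate, and the Backend enum, select_backend and detect_backend are hypothetical names made up for illustration. It also assumes that the widest available backend is preferred. The part worth noticing is that avx512ifma on its own does not select the AVX512 backend.

```rust
/// Hypothetical backend labels for this sketch (not the crate's real types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Serial,
    Avx2,
    Avx512,
}

/// Pure selection logic, kept separate from detection so it is easy to test.
/// Assumption: the widest usable backend wins.
pub fn select_backend(avx2: bool, avx512ifma: bool, avx512vl: bool) -> Backend {
    if avx512ifma && avx512vl {
        // Both features are required; avx512ifma alone is not enough.
        Backend::Avx512
    } else if avx2 {
        Backend::Avx2
    } else {
        Backend::Serial
    }
}

/// Detection on x86_64. The std macro resolves to `true` without a runtime
/// check when the feature is already enabled for the whole crate
/// (e.g. through -C target_feature).
#[cfg(target_arch = "x86_64")]
pub fn detect_backend() -> Backend {
    select_backend(
        std::is_x86_feature_detected!("avx2"),
        std::is_x86_feature_detected!("avx512ifma"),
        std::is_x86_feature_detected!("avx512vl"),
    )
}

/// On other architectures this sketch does no detection at all.
#[cfg(not(target_arch = "x86_64"))]
pub fn detect_backend() -> Backend {
    Backend::Serial
}
```

Calling detect_backend() once and caching the result would be the natural next step, but that part is left out of the sketch.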

Since the code in src/backend/vector/scalar_mul had to be nested in an extra mod the diff is a little messy; I suggest reviewing commit-by-commit and just ignoring the commit which rustfmts those files.

@koute koute force-pushed the main_runtime_simd branch from 51206bb to 738cfee Compare April 11, 2023 12:06
@tarcieri
Contributor

FWIW, in lieu of unsafe_target_feature, in the @RustCrypto projects we've used #[inline(always)] in the context of things like trait impls where the code is one or two lines long, though that can slow compile times.

In this PR though, it seems like you're applying it to larger code blocks.

@koute
Contributor Author

koute commented Apr 11, 2023

> FWIW, in lieu of unsafe_target_feature in @RustCrypto project we've used #[inline(always)] in the context of things like trait impls where the code is one or two lines long, though that can slow compile times.
>
> In this PR though, it seems like you're applying it to larger code blocks.

The #[inline(always)] is only used for the thin wrapper functions though, e.g. this:

#[unsafe_target_feature("sse2")]
fn function() {
    /* ... */
}

gets turned into this:

#[inline(always)]
fn function() {
    #[target_feature(enable = "sse2")]
    unsafe fn _impl_function() {
        /* ... */
    }
    unsafe { _impl_function() }
}

So the inner function which actually contains the body of the function is not marked with #[inline(always)].
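
To make the expanded shape above concrete, here is a small hand-written sketch of the same pattern with a real body. It is only an illustration under a few assumptions: add_lanes and _impl_add_lanes are hypothetical names, the wrapper is written by hand rather than generated by the macro, and the safe call is justified here only because SSE2 is part of the x86_64 baseline. For a feature like AVX2 the caller would first have to check that the CPU supports it.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128i, _mm_add_epi32, _mm_loadu_si128, _mm_storeu_si128};

/// Thin wrapper: this is the only part marked #[inline(always)].
#[cfg(target_arch = "x86_64")]
#[inline(always)]
pub fn add_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    /// Inner function: carries the target feature and the actual body,
    /// and is not marked #[inline(always)].
    #[target_feature(enable = "sse2")]
    unsafe fn _impl_add_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
        unsafe {
            // Unaligned loads, lane-wise wrapping add, unaligned store.
            let va = _mm_loadu_si128(a.as_ptr() as *const __m128i);
            let vb = _mm_loadu_si128(b.as_ptr() as *const __m128i);
            let sum = _mm_add_epi32(va, vb);
            let mut out = [0i32; 4];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, sum);
            out
        }
    }
    // SAFETY: SSE2 is always available on x86_64, so the target feature
    // requirement of the inner function is met.
    unsafe { _impl_add_lanes(a, b) }
}

/// Scalar fallback so the sketch also builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_lanes(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Code compiled with the same target feature can inline the inner function, while the wrapper stays cheap to call from code that was compiled without it.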

@koute
Contributor Author

koute commented Apr 12, 2023

I've also run the benchmarks for this PR; here are the results for anyone interested.

The benches were run on a Xeon Platinum 8481C 2.70GHz on Google Cloud.

Scalar vs AVX2
testcase scalar avx2 %
multiscalar benches/Constant-time variable-base multiscalar multiplication/384 5896.6 3886.29 -34.09%
multiscalar benches/Constant-time variable-base multiscalar multiplication/768 11792.0 7778.8 -34.03%
multiscalar benches/Constant-time variable-base multiscalar multiplication/512 7843.0 5194.8 -33.77%
multiscalar benches/Constant-time variable-base multiscalar multiplication/16 271.05 179.74 -33.69%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1024 15724.0 10432.0 -33.66%
multiscalar benches/Constant-time variable-base multiscalar multiplication/64 1005.10 669.14 -33.43%
multiscalar benches/Constant-time variable-base multiscalar multiplication/128 1973.4 1315.19 -33.35%
multiscalar benches/Constant-time variable-base multiscalar multiplication/256 3902.8 2601.79 -33.34%
multiscalar benches/Constant-time variable-base multiscalar multiplication/32 512.99 342.09 -33.31%
multiscalar benches/Constant-time variable-base multiscalar multiplication/8 151.03 102.31 -32.26%
multiscalar benches/Constant-time variable-base multiscalar multiplication/4 91.36 63.62 -30.36%
multiscalar benches/Constant-time variable-base multiscalar multiplication/2 61.51 44.24 -28.07%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1024 6677.2 4906.0 -26.53%
multiscalar benches/Variable-time variable-base multiscalar multiplication/512 3775.3 2794.1 -25.99%
multiscalar benches/Variable-time variable-base multiscalar multiplication/384 3034.5 2246.9 -25.95%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1 46.62 34.53 -25.92%
multiscalar benches/Variable-time variable-base multiscalar multiplication/256 2184.5 1621.3 -25.78%
multiscalar benches/Variable-time variable-base multiscalar multiplication/768 5258.20 3903.9 -25.76%
multiscalar benches/Variable-time mixed-base/(size: 8) (50pct dyn) 99.18 77.43 -21.93%
multiscalar benches/Variable-time variable-base multiscalar multiplication/64 616.72 481.67 -21.90%
multiscalar benches/Variable-time variable-base multiscalar multiplication/128 1201.6 938.67 -21.88%
multiscalar benches/Variable-time variable-base multiscalar multiplication/16 177.74 139.12 -21.73%
multiscalar benches/Variable-time variable-base multiscalar multiplication/32 324.31 253.95 -21.70%
multiscalar benches/Variable-time mixed-base/(size: 4) (50pct dyn) 65.02 50.92 -21.68%
multiscalar benches/Variable-time mixed-base/(size: 16) (50pct dyn) 166.7 130.68 -21.61%
multiscalar benches/Variable-time variable-base multiscalar multiplication/8 104.01 81.73 -21.42%
multiscalar benches/Variable-time mixed-base/(size: 256) (50pct dyn) 2172.5 1712.1 -21.19%
multiscalar benches/Variable-time mixed-base/(size: 64) (50pct dyn) 566.45 446.53 -21.17%
multiscalar benches/Variable-time mixed-base/(size: 32) (50pct dyn) 301.09 237.38 -21.16%
multiscalar benches/Variable-time mixed-base/(size: 384) (50pct dyn) 3249.5 2562.60 -21.14%
multiscalar benches/Variable-time mixed-base/(size: 512) (50pct dyn) 4320.2 3409.60 -21.08%
multiscalar benches/Variable-time mixed-base/(size: 128) (50pct dyn) 1097.6 866.8 -21.03%
multiscalar benches/Variable-time variable-base multiscalar multiplication/4 67.03 52.98 -20.96%
multiscalar benches/Variable-time mixed-base/(size: 2) (50pct dyn) 47.33 37.52 -20.73%
multiscalar benches/Variable-time variable-base multiscalar multiplication/2 48.53 38.59 -20.49%
multiscalar benches/Variable-time mixed-base/(size: 8) (20pct dyn) 92.14 73.28 -20.47%
edwards benches/Constant-time variable-base scalar mul 44.45 35.43 -20.27%
multiscalar benches/Variable-time mixed-base/(size: 16) (20pct dyn) 155.31 124.24 -20.01%
multiscalar benches/Variable-time mixed-base/(size: 4) (20pct dyn) 59.81 47.89 -19.92%
multiscalar benches/Variable-time mixed-base/(size: 4) (0pct dyn) 59.79 47.89 -19.89%
multiscalar benches/Variable-time mixed-base/(size: 8) (0pct dyn) 89.27 71.62 -19.77%
multiscalar benches/Variable-time mixed-base/(size: 2) (20pct dyn) 44.73 35.99 -19.54%
multiscalar benches/Variable-time mixed-base/(size: 2) (0pct dyn) 44.72 35.98 -19.52%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1 39.15 31.54 -19.44%
multiscalar benches/Variable-time mixed-base/(size: 512) (20pct dyn) 3986.4 3218.6 -19.26%
multiscalar benches/Variable-time mixed-base/(size: 256) (20pct dyn) 2007.3 1620.9 -19.25%
multiscalar benches/Variable-time mixed-base/(size: 384) (20pct dyn) 2995.6 2419.79 -19.22%
multiscalar benches/Variable-time mixed-base/(size: 64) (20pct dyn) 521.72 421.92 -19.13%
multiscalar benches/Variable-time mixed-base/(size: 128) (20pct dyn) 1015.09 821.04 -19.12%
multiscalar benches/Variable-time mixed-base/(size: 32) (20pct dyn) 277.22 224.61 -18.98%
multiscalar benches/Variable-time mixed-base/(size: 768) (50pct dyn) 6329.90 5141.4 -18.78%
multiscalar benches/Variable-time mixed-base/(size: 16) (0pct dyn) 146.78 119.67 -18.47%
multiscalar benches/Variable-time mixed-base/(size: 1) (20pct dyn) 37.08 30.24 -18.45%
multiscalar benches/Variable-time mixed-base/(size: 1) (50pct dyn) 37.08 30.24 -18.44%
multiscalar benches/Variable-time mixed-base/(size: 1) (0pct dyn) 37.07 30.24 -18.42%
multiscalar benches/Variable-time mixed-base/(size: 1024) (50pct dyn) 8438.0 6894.3 -18.29%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/2 43.61 35.93 -17.60%
multiscalar benches/Variable-time mixed-base/(size: 384) (0pct dyn) 2808.39 2320.3 -17.38%
multiscalar benches/Variable-time mixed-base/(size: 256) (0pct dyn) 1879.1 1553.39 -17.33%
multiscalar benches/Variable-time mixed-base/(size: 512) (0pct dyn) 3736.60 3090.6 -17.29%
multiscalar benches/Variable-time mixed-base/(size: 32) (0pct dyn) 260.52 215.51 -17.28%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/4 57.79 47.82 -17.25%
multiscalar benches/Variable-time mixed-base/(size: 128) (0pct dyn) 949.81 786.7 -17.17%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1 36.41 30.16 -17.16%
multiscalar benches/Variable-time mixed-base/(size: 64) (0pct dyn) 488.12 404.41 -17.15%
multiscalar benches/Variable-time mixed-base/(size: 768) (20pct dyn) 5834.1 4864.0 -16.63%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/8 85.91 71.62 -16.63%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/16 141.72 119.42 -15.74%
multiscalar benches/Variable-time mixed-base/(size: 1024) (20pct dyn) 7773.79 6565.1 -15.55%
edwards benches/Variable-time aA+bB A variable B fixed 40.69 34.62 -14.91%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/32 252.32 215.53 -14.58%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/64 473.06 405.01 -14.39%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/256 1819.19 1558.6 -14.32%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/384 2715.5 2328.6 -14.25%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/512 3614.5 3099.8 -14.24%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/128 918.51 788.72 -14.13%
multiscalar benches/Variable-time mixed-base/(size: 768) (0pct dyn) 5413.2 4667.6 -13.77%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/768 5424.29 4683.5 -13.66%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1024 7260.2 6339.5 -12.68%
multiscalar benches/Variable-time mixed-base/(size: 1024) (0pct dyn) 7231.8 6320.5 -12.60%
edwards benches/Constant-time fixed-base scalar mul 11.56 12.72 9.99%
montgomery benches/Montgomery pseudomultiplication 50.93 55.64 9.25%
montgomery benches/Constant-time fixed-base scalar mul 15.57 16.56 6.36%
scalar benches/Scalar addition 23511000.0 22964000.0 -2.33%
scalar benches/Batch scalar inversion/1 13.10 13.21 0.89%
scalar benches/Scalar inversion 12.81 12.92 0.88%
scalar benches/Batch scalar inversion/4 13.65 13.76 0.85%
scalar benches/Batch scalar inversion/8 14.36 14.48 0.81%
scalar benches/Batch scalar inversion/16 15.80 15.92 0.75%
scalar benches/Batch scalar inversion/2 13.28 13.38 0.74%
ristretto benches/Batch Ristretto double-and-encode/16 13.30 13.21 -0.68%
ristretto benches/Batch Ristretto double-and-encode/8 8.70 8.66 -0.41%
scalar benches/Scalar subtraction 21408000.0 21485000.0 0.36%
scalar benches/Scalar multiplication 91928000.0 91662000.0 -0.29%
ristretto benches/Batch Ristretto double-and-encode/4 6.35 6.34 -0.23%
edwards benches/EdwardsPoint compression 3.95 3.95 0.12%
ristretto benches/RistrettoPoint decompression 4.61 4.61 0.07%
ristretto benches/Batch Ristretto double-and-encode/1 4.58 4.58 -0.06%
edwards benches/EdwardsPoint decompression 4.31 4.32 0.04%
ristretto benches/Batch Ristretto double-and-encode/2 5.18 5.18 -0.01%
ristretto benches/RistrettoPoint compression 4.57 4.57 0.00%
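
The % column in these tables appears to be the relative change of the second value column against the first, so a negative number means the second column is lower. A minimal sketch of that arithmetic, assuming this is how the column was derived (percent_change is a hypothetical helper, not part of the benchmark tooling):

```rust
/// Relative change of `candidate` against `baseline`, in percent.
/// Negative means `candidate` is lower than `baseline`.
pub fn percent_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}
```

For example, the first row of the Scalar vs AVX2 table (5896.6 and 3886.29) comes out at about -34.09. Recomputing from the displayed values can differ slightly from the printed percentage on some rows, which suggests the percentages were computed before the values were rounded.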
Scalar vs AVX512
testcase scalar avx512 %
multiscalar benches/Constant-time variable-base multiscalar multiplication/768 11792.0 5123.8 -56.55%
multiscalar benches/Constant-time variable-base multiscalar multiplication/384 5896.6 2567.0 -56.47%
multiscalar benches/Constant-time variable-base multiscalar multiplication/512 7843.0 3427.9 -56.29%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1024 15724.0 6908.5 -56.06%
multiscalar benches/Constant-time variable-base multiscalar multiplication/256 3902.8 1718.6 -55.96%
multiscalar benches/Constant-time variable-base multiscalar multiplication/128 1973.4 872.49 -55.79%
multiscalar benches/Constant-time variable-base multiscalar multiplication/64 1005.10 449.67 -55.26%
multiscalar benches/Constant-time variable-base multiscalar multiplication/32 512.99 236.33 -53.93%
multiscalar benches/Constant-time variable-base multiscalar multiplication/16 271.05 128.61 -52.55%
multiscalar benches/Variable-time variable-base multiscalar multiplication/768 5258.20 2529.79 -51.89%
multiscalar benches/Variable-time variable-base multiscalar multiplication/384 3034.5 1475.7 -51.37%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1024 6677.2 3251.79 -51.30%
multiscalar benches/Variable-time variable-base multiscalar multiplication/512 3775.3 1854.60 -50.88%
multiscalar benches/Constant-time variable-base multiscalar multiplication/8 151.03 75.82 -49.79%
multiscalar benches/Variable-time variable-base multiscalar multiplication/256 2184.5 1104.0 -49.46%
multiscalar benches/Constant-time variable-base multiscalar multiplication/4 91.36 49.58 -45.72%
multiscalar benches/Constant-time variable-base multiscalar multiplication/2 61.51 35.94 -41.56%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1 46.62 29.46 -36.80%
edwards benches/Constant-time variable-base scalar mul 44.45 29.97 -32.56%
multiscalar benches/Variable-time mixed-base/(size: 512) (50pct dyn) 4320.2 2954.0 -31.62%
multiscalar benches/Variable-time mixed-base/(size: 384) (50pct dyn) 3249.5 2223.6 -31.57%
multiscalar benches/Variable-time mixed-base/(size: 256) (50pct dyn) 2172.5 1490.89 -31.37%
multiscalar benches/Variable-time mixed-base/(size: 128) (50pct dyn) 1097.6 763.06 -30.48%
multiscalar benches/Variable-time variable-base multiscalar multiplication/128 1201.6 835.7 -30.45%
multiscalar benches/Variable-time mixed-base/(size: 512) (20pct dyn) 3986.4 2785.29 -30.13%
multiscalar benches/Variable-time mixed-base/(size: 384) (20pct dyn) 2995.6 2095.1 -30.06%
multiscalar benches/Variable-time mixed-base/(size: 64) (50pct dyn) 566.45 396.4 -30.02%
multiscalar benches/Variable-time variable-base multiscalar multiplication/64 616.72 431.84 -29.98%
multiscalar benches/Variable-time mixed-base/(size: 768) (50pct dyn) 6329.90 4435.2 -29.93%
multiscalar benches/Variable-time mixed-base/(size: 256) (20pct dyn) 2007.3 1406.9 -29.91%
multiscalar benches/Variable-time mixed-base/(size: 1024) (50pct dyn) 8438.0 5921.59 -29.82%
multiscalar benches/Variable-time mixed-base/(size: 32) (50pct dyn) 301.09 212.54 -29.41%
multiscalar benches/Variable-time mixed-base/(size: 128) (20pct dyn) 1015.09 719.19 -29.15%
multiscalar benches/Variable-time variable-base multiscalar multiplication/32 324.31 230.12 -29.04%
multiscalar benches/Variable-time mixed-base/(size: 512) (0pct dyn) 3736.60 2666.6 -28.64%
multiscalar benches/Variable-time mixed-base/(size: 384) (0pct dyn) 2808.39 2007.1 -28.53%
multiscalar benches/Variable-time mixed-base/(size: 16) (50pct dyn) 166.7 119.31 -28.43%
multiscalar benches/Variable-time mixed-base/(size: 768) (20pct dyn) 5834.1 4180.5 -28.34%
multiscalar benches/Variable-time mixed-base/(size: 256) (0pct dyn) 1879.1 1347.39 -28.30%
multiscalar benches/Variable-time mixed-base/(size: 64) (20pct dyn) 521.72 374.48 -28.22%
multiscalar benches/Variable-time mixed-base/(size: 1024) (20pct dyn) 7773.79 5599.5 -27.97%
multiscalar benches/Variable-time variable-base multiscalar multiplication/16 177.74 129.05 -27.39%
multiscalar benches/Variable-time mixed-base/(size: 128) (0pct dyn) 949.81 689.65 -27.39%
multiscalar benches/Variable-time mixed-base/(size: 32) (20pct dyn) 277.22 201.88 -27.18%
multiscalar benches/Variable-time mixed-base/(size: 8) (50pct dyn) 99.18 72.67 -26.73%
multiscalar benches/Variable-time mixed-base/(size: 768) (0pct dyn) 5413.2 3998.3 -26.14%
multiscalar benches/Variable-time mixed-base/(size: 16) (20pct dyn) 155.31 114.76 -26.11%
multiscalar benches/Variable-time mixed-base/(size: 64) (0pct dyn) 488.12 360.78 -26.09%
multiscalar benches/Variable-time mixed-base/(size: 1024) (0pct dyn) 7231.8 5371.8 -25.72%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1024 7260.2 5398.5 -25.64%
multiscalar benches/Variable-time variable-base multiscalar multiplication/8 104.01 77.50 -25.48%
multiscalar benches/Variable-time mixed-base/(size: 32) (0pct dyn) 260.52 194.85 -25.21%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/768 5424.29 4059.99 -25.15%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/512 3614.5 2708.5 -25.07%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/384 2715.5 2039.8 -24.88%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/256 1819.19 1368.4 -24.78%
multiscalar benches/Variable-time mixed-base/(size: 4) (50pct dyn) 65.02 48.93 -24.75%
multiscalar benches/Variable-time mixed-base/(size: 8) (20pct dyn) 92.14 69.44 -24.64%
multiscalar benches/Variable-time mixed-base/(size: 16) (0pct dyn) 146.78 111.45 -24.07%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/128 918.51 699.73 -23.82%
multiscalar benches/Variable-time variable-base multiscalar multiplication/4 67.03 51.08 -23.79%
multiscalar benches/Variable-time mixed-base/(size: 8) (0pct dyn) 89.27 68.24 -23.56%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/64 473.06 365.51 -22.73%
multiscalar benches/Variable-time mixed-base/(size: 4) (20pct dyn) 59.81 46.43 -22.36%
multiscalar benches/Variable-time mixed-base/(size: 4) (0pct dyn) 59.79 46.43 -22.34%
multiscalar benches/Variable-time variable-base multiscalar multiplication/2 48.53 37.73 -22.25%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/32 252.32 196.43 -22.15%
multiscalar benches/Variable-time mixed-base/(size: 2) (50pct dyn) 47.33 36.85 -22.14%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1 39.15 30.78 -21.38%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/16 141.72 111.78 -21.13%
scalar benches/Scalar inversion 12.81 16.17 26.21%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/8 85.91 68.35 -20.44%
multiscalar benches/Variable-time mixed-base/(size: 2) (20pct dyn) 44.73 35.68 -20.23%
multiscalar benches/Variable-time mixed-base/(size: 2) (0pct dyn) 44.72 35.68 -20.20%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/4 57.79 46.42 -19.68%
multiscalar benches/Variable-time mixed-base/(size: 1) (0pct dyn) 37.07 29.93 -19.27%
multiscalar benches/Variable-time mixed-base/(size: 1) (20pct dyn) 37.08 29.94 -19.27%
multiscalar benches/Variable-time mixed-base/(size: 1) (50pct dyn) 37.08 29.95 -19.23%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/2 43.61 35.52 -18.54%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1 36.41 29.77 -18.25%
edwards benches/Variable-time aA+bB A variable B fixed 40.69 33.53 -17.60%
montgomery benches/Constant-time fixed-base scalar mul 15.57 17.22 10.62%
scalar benches/Scalar multiplication 91928000.0 99047000.0 7.74%
edwards benches/Constant-time fixed-base scalar mul 11.56 12.40 7.29%
ristretto benches/Batch Ristretto double-and-encode/16 13.30 14.25 7.14%
montgomery benches/Montgomery pseudomultiplication 50.93 54.34 6.68%
scalar benches/Scalar addition 23511000.0 22191000.0 -5.61%
ristretto benches/Batch Ristretto double-and-encode/8 8.70 9.21 5.90%
ristretto benches/Batch Ristretto double-and-encode/4 6.35 6.60 3.86%
scalar benches/Batch scalar inversion/16 15.80 16.32 3.29%
ristretto benches/RistrettoPoint compression 4.57 4.69 2.67%
scalar benches/Batch scalar inversion/8 14.36 14.72 2.50%
ristretto benches/Batch Ristretto double-and-encode/2 5.18 5.31 2.47%
ristretto benches/RistrettoPoint decompression 4.61 4.71 2.23%
scalar benches/Batch scalar inversion/4 13.65 13.95 2.18%
edwards benches/EdwardsPoint decompression 4.31 4.39 1.72%
scalar benches/Batch scalar inversion/1 13.10 13.31 1.61%
scalar benches/Batch scalar inversion/2 13.28 13.47 1.45%
ristretto benches/Batch Ristretto double-and-encode/1 4.58 4.64 1.35%
scalar benches/Scalar subtraction 21408000.0 21139000.0 -1.26%
edwards benches/EdwardsPoint compression 3.95 3.95 0.05%
AVX2 vs AVX512
testcase avx2 avx512 %
multiscalar benches/Variable-time variable-base multiscalar multiplication/768 3903.9 2529.79 -35.20%
multiscalar benches/Variable-time variable-base multiscalar multiplication/384 2246.9 1475.7 -34.32%
multiscalar benches/Constant-time variable-base multiscalar multiplication/768 7778.8 5123.8 -34.13%
multiscalar benches/Constant-time variable-base multiscalar multiplication/512 5194.8 3427.9 -34.01%
multiscalar benches/Constant-time variable-base multiscalar multiplication/384 3886.29 2567.0 -33.95%
multiscalar benches/Constant-time variable-base multiscalar multiplication/256 2601.79 1718.6 -33.95%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1024 10432.0 6908.5 -33.78%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1024 4906.0 3251.79 -33.72%
multiscalar benches/Constant-time variable-base multiscalar multiplication/128 1315.19 872.49 -33.66%
multiscalar benches/Variable-time variable-base multiscalar multiplication/512 2794.1 1854.60 -33.62%
multiscalar benches/Constant-time variable-base multiscalar multiplication/64 669.14 449.67 -32.80%
multiscalar benches/Variable-time variable-base multiscalar multiplication/256 1621.3 1104.0 -31.91%
multiscalar benches/Constant-time variable-base multiscalar multiplication/32 342.09 236.33 -30.92%
multiscalar benches/Constant-time variable-base multiscalar multiplication/16 179.74 128.61 -28.45%
multiscalar benches/Constant-time variable-base multiscalar multiplication/8 102.31 75.82 -25.88%
multiscalar benches/Constant-time variable-base multiscalar multiplication/4 63.62 49.58 -22.06%
scalar benches/Scalar inversion 12.92 16.17 25.11%
multiscalar benches/Constant-time variable-base multiscalar multiplication/2 44.24 35.94 -18.76%
edwards benches/Constant-time variable-base scalar mul 35.43 29.97 -15.41%
multiscalar benches/Variable-time mixed-base/(size: 1024) (0pct dyn) 6320.5 5371.8 -15.01%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1024 6339.5 5398.5 -14.84%
multiscalar benches/Variable-time mixed-base/(size: 1024) (20pct dyn) 6565.1 5599.5 -14.71%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1 34.53 29.46 -14.68%
multiscalar benches/Variable-time mixed-base/(size: 768) (0pct dyn) 4667.6 3998.3 -14.34%
multiscalar benches/Variable-time mixed-base/(size: 1024) (50pct dyn) 6894.3 5921.59 -14.11%
multiscalar benches/Variable-time mixed-base/(size: 768) (20pct dyn) 4864.0 4180.5 -14.05%
multiscalar benches/Variable-time mixed-base/(size: 768) (50pct dyn) 5141.4 4435.2 -13.74%
multiscalar benches/Variable-time mixed-base/(size: 512) (0pct dyn) 3090.6 2666.6 -13.72%
multiscalar benches/Variable-time mixed-base/(size: 384) (0pct dyn) 2320.3 2007.1 -13.50%
multiscalar benches/Variable-time mixed-base/(size: 512) (20pct dyn) 3218.6 2785.29 -13.46%
multiscalar benches/Variable-time mixed-base/(size: 384) (20pct dyn) 2419.79 2095.1 -13.42%
multiscalar benches/Variable-time mixed-base/(size: 512) (50pct dyn) 3409.60 2954.0 -13.36%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/768 4683.5 4059.99 -13.31%
multiscalar benches/Variable-time mixed-base/(size: 256) (0pct dyn) 1553.39 1347.39 -13.26%
multiscalar benches/Variable-time mixed-base/(size: 384) (50pct dyn) 2562.60 2223.6 -13.23%
multiscalar benches/Variable-time mixed-base/(size: 256) (20pct dyn) 1620.9 1406.9 -13.20%
multiscalar benches/Variable-time mixed-base/(size: 256) (50pct dyn) 1712.1 1490.89 -12.92%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/512 3099.8 2708.5 -12.62%
multiscalar benches/Variable-time mixed-base/(size: 128) (20pct dyn) 821.04 719.19 -12.40%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/384 2328.6 2039.8 -12.40%
multiscalar benches/Variable-time mixed-base/(size: 128) (0pct dyn) 786.7 689.65 -12.34%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/256 1558.6 1368.4 -12.20%
multiscalar benches/Variable-time mixed-base/(size: 128) (50pct dyn) 866.8 763.06 -11.97%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/128 788.72 699.73 -11.28%
multiscalar benches/Variable-time mixed-base/(size: 64) (20pct dyn) 421.92 374.48 -11.24%
multiscalar benches/Variable-time mixed-base/(size: 64) (50pct dyn) 446.53 396.4 -11.23%
multiscalar benches/Variable-time variable-base multiscalar multiplication/128 938.67 835.7 -10.97%
multiscalar benches/Variable-time mixed-base/(size: 64) (0pct dyn) 404.41 360.78 -10.79%
multiscalar benches/Variable-time mixed-base/(size: 32) (50pct dyn) 237.38 212.54 -10.46%
multiscalar benches/Variable-time variable-base multiscalar multiplication/64 481.67 431.84 -10.35%
multiscalar benches/Variable-time mixed-base/(size: 32) (20pct dyn) 224.61 201.88 -10.12%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/64 405.01 365.51 -9.75%
multiscalar benches/Variable-time mixed-base/(size: 32) (0pct dyn) 215.51 194.85 -9.59%
multiscalar benches/Variable-time variable-base multiscalar multiplication/32 253.95 230.12 -9.38%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/32 215.53 196.43 -8.86%
multiscalar benches/Variable-time mixed-base/(size: 16) (50pct dyn) 130.68 119.31 -8.70%
multiscalar benches/Variable-time mixed-base/(size: 16) (20pct dyn) 124.24 114.76 -7.63%
scalar benches/Scalar multiplication 91662000.0 99047000.0 8.06%
ristretto benches/Batch Ristretto double-and-encode/16 13.21 14.25 7.88%
multiscalar benches/Variable-time variable-base multiscalar multiplication/16 139.12 129.05 -7.24%
multiscalar benches/Variable-time mixed-base/(size: 16) (0pct dyn) 119.67 111.45 -6.87%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/16 119.42 111.78 -6.40%
multiscalar benches/Variable-time mixed-base/(size: 8) (50pct dyn) 77.43 72.67 -6.15%
ristretto benches/Batch Ristretto double-and-encode/8 8.66 9.21 6.33%
multiscalar benches/Variable-time mixed-base/(size: 8) (20pct dyn) 73.28 69.44 -5.24%
multiscalar benches/Variable-time variable-base multiscalar multiplication/8 81.73 77.50 -5.17%
multiscalar benches/Variable-time mixed-base/(size: 8) (0pct dyn) 71.62 68.24 -4.72%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/8 71.62 68.35 -4.58%
ristretto benches/Batch Ristretto double-and-encode/4 6.34 6.60 4.10%
multiscalar benches/Variable-time mixed-base/(size: 4) (50pct dyn) 50.92 48.93 -3.92%
montgomery benches/Constant-time fixed-base scalar mul 16.56 17.22 4.01%
multiscalar benches/Variable-time variable-base multiscalar multiplication/4 52.98 51.08 -3.58%
scalar benches/Scalar addition 22964000.0 22191000.0 -3.37%
edwards benches/Variable-time aA+bB A variable B fixed 34.62 33.53 -3.16%
multiscalar benches/Variable-time mixed-base/(size: 4) (0pct dyn) 47.89 46.43 -3.06%
multiscalar benches/Variable-time mixed-base/(size: 4) (20pct dyn) 47.89 46.43 -3.05%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/4 47.82 46.42 -2.93%
ristretto benches/RistrettoPoint compression 4.57 4.69 2.67%
scalar benches/Batch scalar inversion/16 15.92 16.32 2.52%
edwards benches/Constant-time fixed-base scalar mul 12.72 12.40 -2.45%
ristretto benches/Batch Ristretto double-and-encode/2 5.18 5.31 2.48%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1 31.54 30.78 -2.40%
montgomery benches/Montgomery pseudomultiplication 55.64 54.34 -2.35%
multiscalar benches/Variable-time variable-base multiscalar multiplication/2 38.59 37.73 -2.22%
ristretto benches/RistrettoPoint decompression 4.61 4.71 2.16%
multiscalar benches/Variable-time mixed-base/(size: 2) (50pct dyn) 37.52 36.85 -1.78%
edwards benches/EdwardsPoint decompression 4.32 4.39 1.69%
scalar benches/Batch scalar inversion/8 14.48 14.72 1.67%
scalar benches/Scalar subtraction 21485000.0 21139000.0 -1.61%
ristretto benches/Batch Ristretto double-and-encode/1 4.58 4.64 1.41%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1 30.16 29.77 -1.32%
scalar benches/Batch scalar inversion/4 13.76 13.95 1.32%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/2 35.93 35.52 -1.14%
multiscalar benches/Variable-time mixed-base/(size: 1) (0pct dyn) 30.24 29.93 -1.04%
multiscalar benches/Variable-time mixed-base/(size: 1) (20pct dyn) 30.24 29.94 -1.00%
multiscalar benches/Variable-time mixed-base/(size: 1) (50pct dyn) 30.24 29.95 -0.96%
multiscalar benches/Variable-time mixed-base/(size: 2) (20pct dyn) 35.99 35.68 -0.86%
multiscalar benches/Variable-time mixed-base/(size: 2) (0pct dyn) 35.98 35.68 -0.84%
scalar benches/Batch scalar inversion/1 13.21 13.31 0.72%
scalar benches/Batch scalar inversion/2 13.38 13.47 0.70%
edwards benches/EdwardsPoint compression 3.95 3.95 -0.07%
Before this PR vs after this PR (AVX2)
testcase before PR after PR %
multiscalar benches/Constant-time variable-base multiscalar multiplication/16 191.57 179.74 -6.18%
multiscalar benches/Constant-time variable-base multiscalar multiplication/32 363.33 342.09 -5.85%
multiscalar benches/Constant-time variable-base multiscalar multiplication/8 108.53 102.31 -5.73%
multiscalar benches/Constant-time variable-base multiscalar multiplication/256 2754.6 2601.79 -5.55%
multiscalar benches/Constant-time variable-base multiscalar multiplication/128 1392.4 1315.19 -5.54%
multiscalar benches/Constant-time variable-base multiscalar multiplication/64 708.22 669.14 -5.52%
multiscalar benches/Constant-time variable-base multiscalar multiplication/384 4111.90 3886.29 -5.49%
edwards benches/Constant-time fixed-base scalar mul 13.45 12.72 -5.45%
multiscalar benches/Constant-time variable-base multiscalar multiplication/768 8200.8 7778.8 -5.15%
multiscalar benches/Constant-time variable-base multiscalar multiplication/512 5475.79 5194.8 -5.13%
multiscalar benches/Constant-time variable-base multiscalar multiplication/4 67.04 63.62 -5.09%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1024 10969.0 10432.0 -4.90%
multiscalar benches/Constant-time variable-base multiscalar multiplication/2 46.28 44.24 -4.41%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1 35.90 34.53 -3.81%
edwards benches/Variable-time aA+bB A variable B fixed 35.93 34.62 -3.63%
scalar benches/Scalar addition 23745000.0 22964000.0 -3.29%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1024 5061.0 4906.0 -3.06%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1 32.53 31.54 -3.03%
multiscalar benches/Variable-time variable-base multiscalar multiplication/16 143.46 139.12 -3.03%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1 31.10 30.16 -3.01%
multiscalar benches/Variable-time variable-base multiscalar multiplication/2 39.78 38.59 -3.00%
multiscalar benches/Variable-time variable-base multiscalar multiplication/4 54.57 52.98 -2.92%
multiscalar benches/Variable-time variable-base multiscalar multiplication/8 84.14 81.73 -2.87%
multiscalar benches/Variable-time mixed-base/(size: 1) (20pct dyn) 31.12 30.24 -2.83%
multiscalar benches/Variable-time mixed-base/(size: 1) (0pct dyn) 31.12 30.24 -2.81%
multiscalar benches/Variable-time mixed-base/(size: 1) (50pct dyn) 31.11 30.24 -2.79%
multiscalar benches/Variable-time mixed-base/(size: 2) (20pct dyn) 37.01 35.99 -2.77%
multiscalar benches/Variable-time mixed-base/(size: 2) (0pct dyn) 37.01 35.98 -2.77%
multiscalar benches/Variable-time mixed-base/(size: 2) (50pct dyn) 38.58 37.52 -2.74%
multiscalar benches/Variable-time variable-base multiscalar multiplication/32 261.0 253.95 -2.70%
multiscalar benches/Variable-time variable-base multiscalar multiplication/64 494.95 481.67 -2.68%
multiscalar benches/Variable-time mixed-base/(size: 16) (50pct dyn) 134.23 130.68 -2.64%
multiscalar benches/Variable-time variable-base multiscalar multiplication/128 963.84 938.67 -2.61%
multiscalar benches/Variable-time mixed-base/(size: 8) (20pct dyn) 75.22 73.28 -2.58%
multiscalar benches/Variable-time mixed-base/(size: 384) (0pct dyn) 2381.5 2320.3 -2.57%
multiscalar benches/Variable-time mixed-base/(size: 512) (20pct dyn) 3303.4 3218.6 -2.57%
multiscalar benches/Variable-time mixed-base/(size: 512) (0pct dyn) 3171.7 3090.6 -2.56%
multiscalar benches/Variable-time mixed-base/(size: 8) (0pct dyn) 73.49 71.62 -2.55%
multiscalar benches/Variable-time mixed-base/(size: 64) (20pct dyn) 432.92 421.92 -2.54%
multiscalar benches/Variable-time mixed-base/(size: 768) (20pct dyn) 4990.70 4864.0 -2.54%
multiscalar benches/Variable-time mixed-base/(size: 64) (50pct dyn) 458.09 446.53 -2.52%
multiscalar benches/Variable-time mixed-base/(size: 4) (20pct dyn) 49.13 47.89 -2.52%
multiscalar benches/Variable-time mixed-base/(size: 16) (20pct dyn) 127.44 124.24 -2.51%
multiscalar benches/Variable-time mixed-base/(size: 256) (0pct dyn) 1593.39 1553.39 -2.51%
multiscalar benches/Variable-time mixed-base/(size: 128) (20pct dyn) 842.11 821.04 -2.50%
multiscalar benches/Variable-time mixed-base/(size: 64) (0pct dyn) 414.72 404.41 -2.49%
multiscalar benches/Variable-time mixed-base/(size: 4) (0pct dyn) 49.11 47.89 -2.49%
multiscalar benches/Variable-time mixed-base/(size: 4) (50pct dyn) 52.21 50.92 -2.47%
multiscalar benches/Variable-time mixed-base/(size: 768) (0pct dyn) 4785.70 4667.6 -2.47%
multiscalar benches/Variable-time mixed-base/(size: 256) (20pct dyn) 1661.7 1620.9 -2.46%
multiscalar benches/Variable-time mixed-base/(size: 32) (50pct dyn) 243.34 237.38 -2.45%
multiscalar benches/Variable-time mixed-base/(size: 128) (0pct dyn) 806.45 786.7 -2.45%
multiscalar benches/Variable-time mixed-base/(size: 384) (20pct dyn) 2480.10 2419.79 -2.43%
multiscalar benches/Variable-time mixed-base/(size: 1024) (0pct dyn) 6477.40 6320.5 -2.42%
multiscalar benches/Variable-time variable-base multiscalar multiplication/512 2863.4 2794.1 -2.42%
multiscalar benches/Variable-time mixed-base/(size: 512) (50pct dyn) 3493.9 3409.60 -2.41%
multiscalar benches/Variable-time variable-base multiscalar multiplication/384 2302.29 2246.9 -2.41%
multiscalar benches/Variable-time mixed-base/(size: 128) (50pct dyn) 888.06 866.8 -2.39%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/2 36.81 35.93 -2.39%
multiscalar benches/Variable-time mixed-base/(size: 1024) (20pct dyn) 6725.3 6565.1 -2.38%
multiscalar benches/Variable-time mixed-base/(size: 8) (50pct dyn) 79.32 77.43 -2.37%
multiscalar benches/Variable-time mixed-base/(size: 384) (50pct dyn) 2624.89 2562.60 -2.37%
multiscalar benches/Variable-time mixed-base/(size: 32) (0pct dyn) 220.72 215.51 -2.36%
multiscalar benches/Variable-time mixed-base/(size: 768) (50pct dyn) 5264.9 5141.4 -2.35%
multiscalar benches/Variable-time mixed-base/(size: 32) (20pct dyn) 229.98 224.61 -2.33%
multiscalar benches/Variable-time mixed-base/(size: 256) (50pct dyn) 1752.69 1712.1 -2.32%
multiscalar benches/Variable-time variable-base multiscalar multiplication/256 1659.7 1621.3 -2.31%
multiscalar benches/Variable-time mixed-base/(size: 16) (0pct dyn) 122.44 119.67 -2.26%
multiscalar benches/Variable-time mixed-base/(size: 1024) (50pct dyn) 7049.1 6894.3 -2.20%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/4 48.82 47.82 -2.03%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/64 413.36 405.01 -2.02%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/512 3163.7 3099.8 -2.02%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1024 6469.59 6339.5 -2.01%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/256 1589.9 1558.6 -1.97%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/128 804.5 788.72 -1.96%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/32 219.82 215.53 -1.95%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/384 2374.89 2328.6 -1.95%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/8 72.99 71.62 -1.87%
multiscalar benches/Variable-time variable-base multiscalar multiplication/768 3978.0 3903.9 -1.86%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/768 4771.7 4683.5 -1.85%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/16 121.62 119.42 -1.81%
edwards benches/Constant-time variable-base scalar mul 35.77 35.43 -0.95%
ristretto benches/RistrettoPoint compression 4.61 4.57 -0.90%
scalar benches/Scalar multiplication 92256000.0 91662000.0 -0.64%
edwards benches/EdwardsPoint decompression 4.34 4.32 -0.59%
scalar benches/Scalar subtraction 21571000.0 21485000.0 -0.40%
ristretto benches/Batch Ristretto double-and-encode/16 13.25 13.21 -0.29%
scalar benches/Batch scalar inversion/8 14.46 14.48 0.18%
scalar benches/Batch scalar inversion/16 15.89 15.92 0.18%
ristretto benches/Batch Ristretto double-and-encode/2 5.17 5.18 0.17%
montgomery benches/Constant-time fixed-base scalar mul 16.53 16.56 0.15%
scalar benches/Batch scalar inversion/4 13.75 13.76 0.10%
scalar benches/Scalar inversion 12.93 12.92 -0.07%
scalar benches/Batch scalar inversion/2 13.37 13.38 0.07%
scalar benches/Batch scalar inversion/1 13.21 13.21 0.06%
montgomery benches/Montgomery pseudomultiplication 55.62 55.64 0.04%
edwards benches/EdwardsPoint compression 3.95 3.95 -0.04%
ristretto benches/Batch Ristretto double-and-encode/4 6.34 6.34 -0.03%
ristretto benches/RistrettoPoint decompression 4.62 4.61 -0.03%
ristretto benches/Batch Ristretto double-and-encode/8 8.67 8.66 -0.01%
ristretto benches/Batch Ristretto double-and-encode/1 4.58 4.58 0.01%
Before this PR vs after this PR (AVX512)
testcase before PR after PR %
scalar benches/Scalar inversion 12.63 16.17 28.06%
scalar benches/Scalar addition 25469000.0 22191000.0 -12.87%
scalar benches/Scalar subtraction 24187000.0 21139000.0 -12.60%
montgomery benches/Constant-time fixed-base scalar mul 16.33 17.22 5.42%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1024 6698.0 6908.5 3.14%
multiscalar benches/Constant-time variable-base multiscalar multiplication/32 229.61 236.33 2.93%
multiscalar benches/Constant-time variable-base multiscalar multiplication/768 4990.2 5123.8 2.68%
multiscalar benches/Constant-time variable-base multiscalar multiplication/2 36.87 35.94 -2.50%
multiscalar benches/Constant-time variable-base multiscalar multiplication/8 74.11 75.82 2.32%
multiscalar benches/Constant-time variable-base multiscalar multiplication/1 30.14 29.46 -2.24%
multiscalar benches/Constant-time variable-base multiscalar multiplication/384 2511.7 2567.0 2.20%
multiscalar benches/Constant-time variable-base multiscalar multiplication/512 3354.9 3427.9 2.18%
multiscalar benches/Constant-time variable-base multiscalar multiplication/64 440.6 449.67 2.06%
multiscalar benches/Constant-time variable-base multiscalar multiplication/256 1684.0 1718.6 2.05%
multiscalar benches/Variable-time variable-base multiscalar multiplication/256 1082.0 1104.0 2.03%
multiscalar benches/Constant-time variable-base multiscalar multiplication/128 857.79 872.49 1.71%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/64 359.36 365.51 1.71%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/128 688.29 699.73 1.66%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/384 2006.80 2039.8 1.64%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/512 2667.60 2708.5 1.53%
multiscalar benches/Variable-time variable-base multiscalar multiplication/64 425.4 431.84 1.51%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/256 1348.0 1368.4 1.51%
multiscalar benches/Variable-time variable-base multiscalar multiplication/128 823.29 835.7 1.51%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/768 4004.2 4059.99 1.39%
multiscalar benches/Variable-time variable-base multiscalar multiplication/16 127.28 129.05 1.39%
multiscalar benches/Variable-time variable-base multiscalar multiplication/8 76.45 77.50 1.38%
multiscalar benches/Variable-time variable-base multiscalar multiplication/32 227.04 230.12 1.36%
multiscalar benches/Variable-time variable-base multiscalar multiplication/4 50.44 51.08 1.27%
multiscalar benches/Variable-time mixed-base/(size: 1) (0pct dyn) 30.26 29.93 -1.09%
multiscalar benches/Variable-time mixed-base/(size: 1) (20pct dyn) 30.27 29.94 -1.09%
multiscalar benches/Variable-time mixed-base/(size: 1) (50pct dyn) 30.27 29.95 -1.05%
multiscalar benches/Constant-time variable-base multiscalar multiplication/4 49.11 49.58 0.96%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1 29.49 29.77 0.95%
scalar benches/Scalar multiplication 98142000.0 99047000.0 0.92%
multiscalar benches/Constant-time variable-base multiscalar multiplication/16 127.44 128.61 0.92%
multiscalar benches/Variable-time mixed-base/(size: 2) (0pct dyn) 35.92 35.68 -0.68%
multiscalar benches/Variable-time mixed-base/(size: 2) (20pct dyn) 35.91 35.68 -0.66%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/32 195.15 196.43 0.66%
multiscalar benches/Variable-time variable-base multiscalar multiplication/2 37.49 37.73 0.65%
multiscalar benches/Variable-time mixed-base/(size: 128) (50pct dyn) 758.23 763.06 0.64%
multiscalar benches/Variable-time mixed-base/(size: 256) (0pct dyn) 1355.9 1347.39 -0.63%
multiscalar benches/Variable-time mixed-base/(size: 512) (0pct dyn) 2683.1 2666.6 -0.61%
multiscalar benches/Variable-time mixed-base/(size: 16) (50pct dyn) 118.58 119.31 0.62%
multiscalar benches/Variable-time mixed-base/(size: 128) (0pct dyn) 693.79 689.65 -0.60%
multiscalar benches/Variable-time mixed-base/(size: 32) (0pct dyn) 196.0 194.85 -0.59%
multiscalar benches/Variable-time mixed-base/(size: 768) (0pct dyn) 4021.70 3998.3 -0.58%
multiscalar benches/Variable-time mixed-base/(size: 1024) (0pct dyn) 5402.4 5371.8 -0.57%
multiscalar benches/Variable-time mixed-base/(size: 8) (0pct dyn) 68.63 68.24 -0.56%
multiscalar benches/Variable-time mixed-base/(size: 384) (0pct dyn) 2018.1 2007.1 -0.55%
multiscalar benches/Variable-time mixed-base/(size: 512) (50pct dyn) 2938.0 2954.0 0.54%
multiscalar benches/Variable-time variable-base multiscalar multiplication/384 1483.5 1475.7 -0.53%
multiscalar benches/Variable-time mixed-base/(size: 64) (50pct dyn) 394.37 396.4 0.51%
multiscalar benches/Variable-time mixed-base/(size: 2) (50pct dyn) 37.04 36.85 -0.51%
multiscalar benches/Variable-time mixed-base/(size: 384) (50pct dyn) 2212.5 2223.6 0.50%
multiscalar benches/Variable-time mixed-base/(size: 256) (50pct dyn) 1483.5 1490.89 0.50%
multiscalar benches/Variable-time mixed-base/(size: 768) (50pct dyn) 4413.2 4435.2 0.50%
multiscalar benches/Variable-time mixed-base/(size: 8) (50pct dyn) 72.31 72.67 0.50%
multiscalar benches/Variable-time mixed-base/(size: 4) (0pct dyn) 46.65 46.43 -0.48%
multiscalar benches/Variable-time mixed-base/(size: 16) (0pct dyn) 111.99 111.45 -0.48%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/8 68.66 68.35 -0.46%
multiscalar benches/Variable-time mixed-base/(size: 32) (50pct dyn) 211.61 212.54 0.44%
multiscalar benches/Variable-time mixed-base/(size: 4) (20pct dyn) 46.63 46.43 -0.42%
multiscalar benches/Variable-time mixed-base/(size: 1024) (50pct dyn) 5897.70 5921.59 0.41%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1024 3239.1 3251.79 0.39%
multiscalar benches/Variable-time mixed-base/(size: 64) (0pct dyn) 362.16 360.78 -0.38%
multiscalar benches/Variable-time variable-base multiscalar multiplication/1 30.69 30.78 0.32%
ristretto benches/Batch Ristretto double-and-encode/1 4.66 4.64 -0.30%
ristretto benches/Batch Ristretto double-and-encode/8 9.24 9.21 -0.27%
edwards benches/EdwardsPoint compression 3.96 3.95 -0.26%
edwards benches/Variable-time aA+bB A variable B fixed 33.44 33.53 0.26%
edwards benches/Constant-time variable-base scalar mul 29.90 29.97 0.25%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/16 111.51 111.78 0.24%
multiscalar benches/Variable-time mixed-base/(size: 4) (50pct dyn) 48.83 48.93 0.21%
multiscalar benches/Variable-time variable-base multiscalar multiplication/768 2525.1 2529.79 0.19%
scalar benches/Batch scalar inversion/1 13.33 13.31 -0.16%
multiscalar benches/Variable-time mixed-base/(size: 768) (20pct dyn) 4187.40 4180.5 -0.16%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/2 35.46 35.52 0.16%
multiscalar benches/Variable-time mixed-base/(size: 32) (20pct dyn) 202.18 201.88 -0.15%
multiscalar benches/Variable-time mixed-base/(size: 1024) (20pct dyn) 5607.59 5599.5 -0.14%
scalar benches/Batch scalar inversion/4 13.97 13.95 -0.14%
ristretto benches/RistrettoPoint compression 4.69 4.69 0.14%
multiscalar benches/Variable-time variable-base multiscalar multiplication/512 1852.10 1854.60 0.13%
multiscalar benches/Variable-time mixed-base/(size: 8) (20pct dyn) 69.53 69.44 -0.13%
multiscalar benches/Variable-time mixed-base/(size: 384) (20pct dyn) 2097.70 2095.1 -0.12%
scalar benches/Batch scalar inversion/8 14.74 14.72 -0.12%
ristretto benches/Batch Ristretto double-and-encode/16 14.27 14.25 -0.11%
ristretto benches/RistrettoPoint decompression 4.71 4.71 0.11%
multiscalar benches/Variable-time mixed-base/(size: 256) (20pct dyn) 1408.4 1406.9 -0.11%
edwards benches/EdwardsPoint decompression 4.39 4.39 -0.10%
multiscalar benches/Variable-time mixed-base/(size: 128) (20pct dyn) 719.89 719.19 -0.10%
montgomery benches/Montgomery pseudomultiplication 54.28 54.34 0.10%
multiscalar benches/Variable-time mixed-base/(size: 64) (20pct dyn) 374.8 374.48 -0.09%
scalar benches/Batch scalar inversion/16 16.33 16.32 -0.08%
ristretto benches/Batch Ristretto double-and-encode/4 6.60 6.60 -0.07%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/4 46.39 46.42 0.07%
edwards benches/Constant-time fixed-base scalar mul 12.41 12.40 -0.06%
scalar benches/Batch scalar inversion/2 13.48 13.47 -0.06%
multiscalar benches/Variable-time mixed-base/(size: 512) (20pct dyn) 2786.1 2785.29 -0.03%
ristretto benches/Batch Ristretto double-and-encode/2 5.31 5.31 -0.02%
multiscalar benches/Variable-time fixed-base multiscalar multiplication/1024 5397.5 5398.5 0.02%
multiscalar benches/Variable-time mixed-base/(size: 16) (20pct dyn) 114.78 114.76 -0.02%

@tarcieri
Contributor

@koute any chance you could push a commit that would show the difference of what this PR would look like without unsafe_target_feature?

I ask because I think we'd really like to keep 3rd party dependencies to a minimum. There are currently only two enabled by default: cfg-if (@rust-lang) and zeroize (@RustCrypto, i.e. me), with subtle being a @dalek-cryptography crate.

@koute
Contributor Author

koute commented May 17, 2023

> @koute any chance you could push a commit that would show the difference of what this PR would look like without unsafe_target_feature?

Done. Please take a look at the latest commit. I've only done it partially though, as doing it is really tedious and error-prone (it's easy to forget e.g. an #[inline(always)] annotation somewhere and completely crater the performance). Let me know if you want me to do this for the rest of the code too (which would need to be modified in a similar fashion).
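
For readers unfamiliar with the manual approach being discussed, here is a minimal sketch of what doing it by hand tends to look like. It is not code from this PR: `sum_generic`, `sum_avx2` and `sum` are hypothetical names, and the wrapping sum merely stands in for a real arithmetic kernel. The assumption illustrated is that every helper must be marked `#[inline(always)]` so it can be inlined into the `#[target_feature]` wrapper, which is the kind of annotation that is easy to forget.

```rust
/// Portable implementation. It is `#[inline(always)]` so that it can be
/// inlined into callers compiled with extra target features enabled.
#[inline(always)]
fn sum_generic(values: &[u32]) -> u32 {
    values.iter().fold(0u32, |acc, v| acc.wrapping_add(*v))
}

/// Same logic, but compiled with AVX2 enabled. Calling this on a CPU
/// without AVX2 is undefined behaviour, hence `unsafe`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[u32]) -> u32 {
    sum_generic(values)
}

/// Public entry point: picks an implementation at runtime.
pub fn sum(values: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_generic(values)
}
```

Every function on the hot path needs this treatment, which is why the by-hand variant adds noise compared to a procedural macro that generates the wrappers.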

@tarcieri
Contributor

@koute c67e430 looks good to me, thanks for attempting it!

@rozbb do you have an opinion on which way to proceed? I think it's worth removing the additional dependency.

@tarcieri tarcieri requested review from rozbb and tarcieri May 20, 2023 19:25
@tarcieri tarcieri mentioned this pull request May 29, 2023
@pinkforest
Contributor

pinkforest commented May 30, 2023

Can I suggest making these features optional by expressing them as negative features?

That way, negative "disallow" features explicitly rule a backend out, and once one is ruled out it applies to the whole build.

It would also make them non-default, so managing the feature set becomes easier when people use a one-size-fits-all feature set.

This also makes sense given that most people should not need to disallow the detection.

FWIW, I'm beginning to think that disallow / forbid should be cfg flags as well, given that it's niche, e.g.

cfg(curve25519_dalek_forbid = "simd" | "simd_avx512" | "simd_avx2")

Having niche configuration scattered between feature sets and cfg() would be confusing otherwise; having it all via cfg would make it easier to document and explain, while leaving the top-level binary in control as intended.

Depending on which one is chosen, I can send a documentation PR afterwards.

Comment on lines +56 to +58
| `simd_avx2` | ✓ | Allows the AVX2 SIMD backend to be used, if available. |
| `simd_avx512` | ✓ | Allows the AVX512 SIMD backend to be used, if available. |
| `simd` | ✓ | Allows every SIMD backend to be used, if available. |
Contributor

@pinkforest pinkforest May 30, 2023

Like this:

Suggested change
| `simd_avx2` | | Allows the AVX2 SIMD backend to be used, if available. |
| `simd_avx512` | | Allows the AVX512 SIMD backend to be used, if available. |
| `simd` | | Allows every SIMD backend to be used, if available. |
| `forbid_avx2` | | Disallows the AVX2 SIMD backend to be used, if available. |
| `forbid_avx512` | | Disallows the AVX512 SIMD backend to be used, if available. |
| `forbid_simd` | | Disallows every SIMD backend to be used, if available. |

Optionally, the forbid options could be cfg flags, as is done with the rest of the configuration.

@koute
Contributor Author

koute commented Jun 1, 2023

So do we have a consensus regarding removing the unsafe_target_feature dependency and doing everything by hand? (I'm happy to do it, but I don't really want to do all of this legwork and then end up having to revert it in the end.)

@pinkforest Usually cargo features should only be additive, but I guess in this case it doesn't really matter, as those features don't actually change any functionality per se, so sure, we could make those negative.

However, I'd really prefer to keep them as cargo features though; the issue with specifying anything through RUSTFLAGS is that it's really awkward to use, especially for bigger framework-like projects with hundreds/thousands of downstream users where telling everyone "hey, you now need to specify those RUSTFLAGS to compile your stuff because it depends on our stuff" is just not feasible. If we had to specify the flags we need through RUSTFLAGS we'd be most likely forced to fork the crate and change the configuration knob to be a cargo feature.

@tarcieri
Contributor

tarcieri commented Jun 1, 2023

> So do we have a consensus regarding removing the unsafe_target_feature dependency and doing everything by hand?

@koute that would be my preference out of paranoia regarding adding third-party dependencies

@rozbb do you agree?

@pinkforest
Contributor

pinkforest commented Jun 1, 2023

> However, I'd really prefer to keep them as cargo features though [ .. ]

Yeah, I hear you. We can have both via build.rs: just as cfg(curve25519_dalek_bits) acts as an override in build.rs, we can put a feature gate there that sets the cfg flags, which means both can be used. I've often seen feature chains that get broken, and then things like openssl get stuck in the dependency tree 10 layers down :)

No need to change anything on this PR - I can send a PR straight after this to address this as negative feature that works both ways.
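
As an illustration of the build.rs idea, here is a rough sketch of how a build script could translate cargo features into cfg flags. It relies on two documented Cargo behaviours: an enabled feature `foo_bar` is exposed to build scripts as the environment variable `CARGO_FEATURE_FOO_BAR`, and printing `cargo:rustc-cfg=KEY="VALUE"` sets a cfg for the crate. The feature names (`forbid_simd`, `forbid_avx2`, `forbid_avx512`) and the cfg key `curve25519_dalek_forbid` are taken from the suggestions in this thread and are hypothetical; nothing here is implemented by this PR.

```rust
/// Maps hypothetical `forbid_*` cargo features to `cargo:rustc-cfg` directives.
///
/// `lookup` abstracts over `std::env::var(..).ok()` so the mapping can be
/// exercised outside of a real build script.
pub fn forbid_directives(lookup: impl Fn(&str) -> Option<String>) -> Vec<String> {
    // (environment variable set by Cargo, value for the hypothetical cfg key)
    const MAPPING: [(&str, &str); 3] = [
        ("CARGO_FEATURE_FORBID_SIMD", "simd"),
        ("CARGO_FEATURE_FORBID_AVX2", "simd_avx2"),
        ("CARGO_FEATURE_FORBID_AVX512", "simd_avx512"),
    ];

    MAPPING
        .iter()
        .filter(|(var, _)| lookup(*var).is_some())
        .map(|(_, value)| format!("cargo:rustc-cfg=curve25519_dalek_forbid=\"{}\"", value))
        .collect()
}

// In an actual build.rs, `main` would print each directive, for example:
//
//     for line in forbid_directives(|name| std::env::var(name).ok()) {
//         println!("{}", line);
//     }
```

With this shape, a top-level binary could still pass `--cfg curve25519_dalek_forbid="simd"` through RUSTFLAGS directly, while projects that prefer cargo features would get the same cfg set by the build script.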

@koute
Contributor Author

koute commented Jun 1, 2023

> No need to change anything on this PR - I can send a PR straight after this to address this as negative feature that works both ways.

I'm happy to change it if you want, but that works for me too. Thank you!

@jrose-signal
Contributor

Just to offer an alternate perspective: I think the usual justification for "no negative features" is that they're not composable: if client crate A enables "no_apples" and client crate B enables "no_bananas", a project might not be able to use A and B together because A depends on bananas and B on apples. But in this case it's "avx2_backend" and "avx512_backend" that aren't composable; you cannot have more than one backend enabled at once. If you want to phrase these positively, you could call them "pre_avx2_compatibility" and "pre_avx512_compatibility" or something; then at least --all-features does something meaningful, if probably not desirable.

(The cfg vs. feature debate is not unrelated; I agree with @koute that cfg flags are much less discoverable and more awkward to work with, and I agree with @tarcieri that some controls really are "one choice for the entire build" and cannot be represented composably.)

@rozbb
Contributor

rozbb commented Jun 1, 2023

Sorry all, this thread flew under the radar for me. Getting up to date now.

@pinkforest
Contributor

pinkforest commented Jun 1, 2023

Yeah, that's a valid point re: --all-features. I suspect most users would not need these negative features in any case, which would veer towards using cfg() given how niche it would be. Is there a use case where people would more often really need to disable a backend?

Should there be a feature to disable detection and force a backend? In any case, I'll create a separate issue, as we've visited this question before and it needs to be revisited properly.

@koute
Contributor Author

koute commented Jun 1, 2023

> But in this case it's "avx2_backend" and "avx512_backend" that aren't composable

FYI, actually in this case they are composable. (: That's because they don't force a given backend, they just make it available for selection at runtime if the host on which the program's running supports it.
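
To make the composability point concrete, here is a small model of the selection logic as described in this PR: a cargo feature only makes a backend eligible, and the CPU check decides whether it is actually used. The types and the function are hypothetical and do not mirror the crate's internals. The AVX512 requirement of both `avx512ifma` and `avx512vl` comes from the PR description; preferring AVX512 over AVX2 when both qualify is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Serial,
    Avx2,
    Avx512,
}

/// Which SIMD backends the enabled cargo features allow (hypothetical type).
#[derive(Debug, Clone, Copy, Default)]
pub struct Allowed {
    pub avx2: bool,
    pub avx512: bool,
}

/// What the CPU reports at runtime (hypothetical type).
#[derive(Debug, Clone, Copy, Default)]
pub struct CpuFeatures {
    pub avx2: bool,
    pub avx512ifma: bool,
    pub avx512vl: bool,
}

/// A backend is used only if it is both allowed and supported by the CPU.
/// Enabling more features never removes an option, so they compose.
pub fn select_backend(allowed: Allowed, cpu: CpuFeatures) -> Backend {
    if allowed.avx512 && cpu.avx512ifma && cpu.avx512vl {
        Backend::Avx512
    } else if allowed.avx2 && cpu.avx2 {
        Backend::Avx2
    } else {
        Backend::Serial
    }
}
```

In this model, enabling both features on an AVX2-only machine still selects AVX2, and enabling neither always falls back to the serial backend.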

@rozbb
Contributor

rozbb commented Jun 1, 2023

> @rozbb do you have an opinion on which way to proceed? I think it's worth removing the additional dependency.

Just reviewed everything. My thought is: removing unsafe_target_feature consistently would 1) add a lot of noise to the code and 2) make it easier to mess up in the future if we make changes or additions. As an intermediate solution, I think we could vendor the dep, or (my preference) just pin to a specific version and call it a day. Thoughts?

@tarcieri
Contributor

tarcieri commented Jun 1, 2023

If we had an in-tree dependency like curve25519-dalek-derive, that'd be fine with me.

Note that it will make the custom derive stack at least a default dependency where it isn't right now (i.e. syn has long compile times, although it is commonly found in most projects)

@pinkforest
Contributor

pinkforest commented Jun 1, 2023

That would be a good reason to set up the monorepo base here (now) for this, and do the git combine (later).

May I suggest landing this PR with the version pinned first, and then I can just send another PR to rename some files and add it in-tree as a monorepo?

@rozbb
Contributor

rozbb commented Jun 1, 2023

I like that!

@koute
Contributor Author

koute commented Jun 1, 2023

That sounds good to me! So I'll revert the last commit and I'll pin the version of the unsafe_target_feature crate, and later we can just vendor it.

Or perhaps alternatively I could just transfer it to dalek-cryptography? Then it wouldn't be a third-party crate anymore. (:

My three cents regarding the syn dependency: I don't think it's that big of a deal, considering that most projects probably already have it in their dependency tree. I agree that we shouldn't add extra unnecessary dependencies, but all things considered I personally think the tradeoff here is worth it.

@pinkforest
Contributor

Yeah, better to just pin it for now. We were going to reorganise everything into a monorepo anyway, so this is a good reason to start.

@koute
Contributor Author

koute commented Jun 5, 2023

Sorry for the delay.

I have reverted the commit, pinned the version of the dependency, and fixed failing clippy. Should be ready to go!

environment variable:
```sh
RUSTFLAGS='--cfg curve25519_dalek_backend="BACKEND"'
```
- where `BACKEND` is `simd` or `fiat`. Equivalently, you can write to
+ where `BACKEND` is `fiat`. Equivalently, you can write to
Contributor

Hmm, if fiat is the only backend we have, perhaps we should reconsider how gating works. But I think that's something we can do in a followup PR.

Contributor

@tarcieri tarcieri left a comment

Approved with one caveat: we're back to an odd mixture of crate features and cfg attributes for backend selection. I think it kind of makes sense in this case since autodetection can map to multiple backends which work additively, but I worry this might persist the problem where curve25519-dalek is a transitive dependency with these features being enabled via intermediate dependencies.

Curious how this is all going to work with SIMD backends for non-x86 platforms, e.g. NEON (#457). It may make sense to try to land that PR before a final release, just to make sure we have the right abstractions for backend selection which can work correctly across CPU architectures.

@koute
Contributor Author

koute commented Jun 6, 2023

> Hmm, if fiat is the only backend we have, perhaps we should reconsider how gating works.

Indeed. Not sure if this is a good idea, and this would be a big change and require more refactoring, but maybe we could parametrize all of the public types by the backend? For example, instead of Scalar we could have Scalar<B>, and B would be a type through which the user could pick whatever backend they want, like e.g. Scalar<FiatU64> or Scalar<U32> (with an appropriate default). Then we wouldn't need any --cfg knobs. (Although the compile times would probably suffer.)
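
A toy sketch of what such a backend-parametrized type could look like is shown below. Everything in it is hypothetical: the `Backend` trait, the marker types `U64` and `FiatU64`, and the `u64` wrapping addition that stands in for real scalar arithmetic. It only illustrates the shape of the API (a type parameter with a default), not how the crate would implement it.

```rust
use core::marker::PhantomData;

/// Hypothetical trait implemented by each arithmetic backend.
pub trait Backend {
    const NAME: &'static str;
    /// Toy operation standing in for real scalar addition.
    fn add(a: u64, b: u64) -> u64;
}

/// Hypothetical marker types, one per backend.
pub struct U64;
pub struct FiatU64;

impl Backend for U64 {
    const NAME: &'static str = "u64";
    fn add(a: u64, b: u64) -> u64 {
        a.wrapping_add(b)
    }
}

impl Backend for FiatU64 {
    const NAME: &'static str = "fiat_u64";
    fn add(a: u64, b: u64) -> u64 {
        a.wrapping_add(b)
    }
}

/// Scalar parametrized by backend, with `U64` as the default.
pub struct Scalar<B: Backend = U64> {
    value: u64,
    _backend: PhantomData<B>,
}

impl<B: Backend> Scalar<B> {
    pub fn new(value: u64) -> Self {
        Scalar { value, _backend: PhantomData }
    }

    pub fn add(&self, other: &Self) -> Self {
        Scalar::new(B::add(self.value, other.value))
    }

    pub fn value(&self) -> u64 {
        self.value
    }

    pub fn backend_name(&self) -> &'static str {
        B::NAME
    }
}
```

A user would write `let s: Scalar = Scalar::new(2);` to get the default backend, or `Scalar::<FiatU64>::new(2)` to pick one explicitly. Since every generic function is instantiated once per backend used, this is also where the compile-time cost mentioned above would come from.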

> but I worry this might persist the problem where curve25519-dalek is a transitive dependency with these features being enabled via intermediate dependencies

Hm, this is a fair point, but I think doing what @pinkforest suggested would probably fix this -- make the features be negative and non-default, so that would make it very unlikely that an intermediate crate using curve25519-dalek would enable them (most likely it'd either use the default features, or no default features and pick a subset).

> Curious how this is all going to work with SIMD backends for non-x86 platforms

From what I can see it should work mostly fine, although it might still require some minor refactoring.

@tarcieri
Contributor

tarcieri commented Jun 6, 2023

> make the features be negative and non-default, so that would make it very unlikely that an intermediate crate using curve25519-dalek would enable them

FWIW we did this with @RustCrypto and I still consider it a mistake. We have/had force-soft feature(s) to disable all hardware optimizations, and intermediate dependencies would turn it on. All it takes is one misbehaving crate in your dependency tree to force a performance downgrade.

That was the major impetus for using cfg attributes: it removes the ability of intermediate dependencies to do this, and relegates all control to the toplevel binary.
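
The failure mode described here follows from Cargo's feature unification: a crate that appears several times in the dependency graph is built once, with the union of the features every dependent asked for. The sketch below is a simplified model of that rule, not Cargo's actual resolver; `unify_features` is a hypothetical helper and `force-soft` is the feature name mentioned above.

```rust
use std::collections::BTreeSet;

/// Simplified model of feature unification: the crate is compiled with
/// the union of the feature lists requested by all of its dependents.
pub fn unify_features(requests: &[&[&str]]) -> BTreeSet<String> {
    requests
        .iter()
        .flat_map(|features| features.iter())
        .map(|feature| feature.to_string())
        .collect()
}
```

If the application requests `["alloc"]` and a single intermediate crate requests `["alloc", "force-soft"]`, the unified set contains `force-soft`, so the whole build loses its hardware optimizations. A cfg flag cannot be set this way, since it is not part of any crate's dependency declaration.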

@daira

daira commented Jun 7, 2023

> That was the major impetus for using cfg attributes: it removes the ability of intermediate dependencies to do this, and relegates all control to the toplevel binary.

Agreed. Using either positive or negative features for selection of crypto implementations is also a nightmare for auditing, because it greatly expands the number of crates in which a backdoor could potentially force use of an insecure implementation.

(Example: suppose that you don't trust RDRAND. Now try to audit that the getrandom crate is not relying on it, given that it might if the rdrand feature is enabled. Ugh.)

@rozbb rozbb merged commit e111b5d into dalek-cryptography:main Jun 11, 2023
Comment on lines -158 to -161
| CPU feature | `RUSTFLAGS` | Requires nightly? |
| :--- | :--- | :--- |
| avx2 | `-C target_feature=+avx2` | no |
| avx512ifma | `-C target_feature=+avx512ifma` | yes |
Contributor

@tarcieri tarcieri Jun 12, 2023

I noticed all of this documentation is removed, but the newly introduced crate features aren't equivalent and perhaps this should still be documented.

Namely these flags bypass autodetection and force the use of SIMD features.

Contributor

@pinkforest pinkforest Jun 12, 2023

yeah I can take care of documenting that - just need to re-organise a bit to make room for the vendored dep first
Also need to do a lot more testing now
