
Try target-cpu=native #3

Open
wants to merge 2 commits into base: master

@Skgland Skgland commented Jul 20, 2023

Setting target-cpu to native might enable the compiler to apply more optimizations, because it may then use every instruction-set extension the build machine supports.
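To see what the flag changes on a given machine, a minimal sketch like the following may help. It compares whether AVX2 was assumed at compile time with whether the CPU reports it at run time. The helper name `avx2_status` and the choice of AVX2 are assumptions for illustration; neither comes from this PR.

```rust
/// Returns (assumed at compile time, detected at run time) for AVX2.
/// `avx2_status` is a hypothetical helper, not part of this PR.
fn avx2_status() -> (bool, bool) {
    // True only if the compiler was allowed to assume AVX2,
    // for example via `-C target-cpu=native` on a CPU that has it.
    let compile_time = cfg!(target_feature = "avx2");

    // Runtime detection only exists on x86 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let run_time = std::arch::is_x86_feature_detected!("avx2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let run_time = false;

    (compile_time, run_time)
}

fn main() {
    let (compile_time, run_time) = avx2_status();
    println!("avx2 assumed at compile time: {}", compile_time);
    println!("avx2 detected at run time:    {}", run_time);
}
```

Building once normally and once with `RUSTFLAGS="-C target-cpu=native"` should flip the first line on a CPU with AVX2. A binary built this way may fail on older CPUs that lack the assumed features.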

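On my machine, opt1_idiomatic, opt2_count_s, and opt3_count_s_branchless ran about 66% to 76% faster, while opt6_chunk_count and opt6_chunk_exact_count regressed by roughly 6.6% and 1.6%. The two baselines were effectively unchanged.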

My local bench results with this change, created using the compare target in the justfile:

compare.log

running 3 tests
test tests::test_all_s ... ignored
test tests::test_large ... ignored
test tests::test_simple ... ignored

test result: ok. 0 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out; finished in 0.00s

run_switches/baseline_unicode
                        time:   [3.5822 ms 3.5846 ms 3.5870 ms]
                        thrpt:  [265.87 MiB/s 266.05 MiB/s 266.23 MiB/s]
                 change:
                        time:   [-0.8116% -0.7452% -0.6735%] (p = 0.00 < 0.05)
                        thrpt:  [+0.6781% +0.7508% +0.8182%]
                        Change within noise threshold.
run_switches/baseline   time:   [3.1508 ms 3.1551 ms 3.1597 ms]
                        thrpt:  [301.82 MiB/s 302.26 MiB/s 302.68 MiB/s]
                 change:
                        time:   [-0.3682% -0.1833% +0.0162%] (p = 0.06 > 0.05)
                        thrpt:  [-0.0162% +0.1837% +0.3695%]
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
run_switches/opt1_idiomatic
                        time:   [152.82 µs 152.86 µs 152.91 µs]
                        thrpt:  [6.0908 GiB/s 6.0927 GiB/s 6.0944 GiB/s]
                 change:
                        time:   [-76.164% -76.123% -76.061%] (p = 0.00 < 0.05)
                        thrpt:  [+317.72% +318.81% +319.54%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
run_switches/opt2_count_s
                        time:   [93.749 µs 93.887 µs 94.042 µs]
                        thrpt:  [9.9032 GiB/s 9.9196 GiB/s 9.9342 GiB/s]
                 change:
                        time:   [-66.435% -66.356% -66.271%] (p = 0.00 < 0.05)
                        thrpt:  [+196.48% +197.23% +197.93%]
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  2 (2.00%) low severe
  13 (13.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe
run_switches/opt3_count_s_branchless
                        time:   [74.144 µs 74.154 µs 74.165 µs]
                        thrpt:  [12.557 GiB/s 12.559 GiB/s 12.561 GiB/s]
                 change:
                        time:   [-69.834% -69.725% -69.658%] (p = 0.00 < 0.05)
                        thrpt:  [+229.58% +230.30% +231.49%]
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  6 (6.00%) high mild
  10 (10.00%) high severe
run_switches/opt6_chunk_count
                        time:   [19.981 µs 19.993 µs 20.005 µs]
                        thrpt:  [46.554 GiB/s 46.583 GiB/s 46.610 GiB/s]
                 change:
                        time:   [+6.1832% +6.6117% +7.2029%] (p = 0.00 < 0.05)
                        thrpt:  [-6.7189% -6.2017% -5.8231%]
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
run_switches/opt6_chunk_exact_count
                        time:   [14.590 µs 14.596 µs 14.603 µs]
                        thrpt:  [63.777 GiB/s 63.807 GiB/s 63.832 GiB/s]
                 change:
                        time:   [+1.1436% +1.6182% +1.9169%] (p = 0.00 < 0.05)
                        thrpt:  [-1.8808% -1.5924% -1.1307%]
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  6 (6.00%) high mild
  7 (7.00%) high severe
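The time and thrpt change rows in the log describe the same measurement from two sides: throughput is inversely proportional to run time, so a time change of t corresponds to a throughput change of 1 / (1 + t) - 1. The sketch below checks that relation against a few median values copied from the log. The function name `throughput_change` is hypothetical.

```rust
/// Converts a relative change in run time (e.g. -0.76123 for -76.123%)
/// into the matching relative change in throughput.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Median time changes copied from the benchmark log.
    for (name, dt) in [
        ("opt1_idiomatic", -0.76123),
        ("opt2_count_s", -0.66356),
        ("opt6_chunk_count", 0.066117),
    ] {
        println!(
            "{}: time {:+.3}% -> thrpt {:+.2}%",
            name,
            dt * 100.0,
            throughput_change(dt) * 100.0
        );
    }
}
```

This is why a 76% drop in time shows up as a throughput gain of more than 300%, while a 6.6% slowdown costs only about 6.2% of throughput.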
