-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Unsafe Undefined Behavior - Zero Sized PlaneData
#23
Conversation
Calls to `alloc` with zero sized layouts is undefined behavior. This is particularly problematic when using non-default allocators. Jemalloc panics when a zero sized layout used. rav1e creates zero sized `Planes` in multiple places to avoid the cost of allocation when the field is not used internally.
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #23 +/- ##
==========================================
+ Coverage 65.58% 68.93% +3.35%
==========================================
Files 4 4
Lines 985 998 +13
==========================================
+ Hits 646 688 +42
+ Misses 339 310 -29 ☔ View full report in Codecov by Sentry. |
PlaneData
Come to think of it, was there a reason we needed to use unsafe allocation here? @lu-zero @barrbrain any chance you remember? To be honest, I'd prefer to rework this into safe code if possible, as long as it doesn't have significant performance implications. |
We need to use aligned memory |
It needs a benchmark to be sure that we aren't paying more with those branches than just doing a min-sized allocation and keep len 0 |
I don't think safe code can work here easily while having control over alignment. Initially, I thought that branches almost always be cheaper than doing min-sized allocations, but I found that What would a benchmark here look like? Would we just compare encode times, or would something that exclusively tested Planes be more suitable? |
There are some existing benches in this crate that cover Frame::new_with_padding and Plane::new_with_padding which should cover this change. Probably wouldn't hurt if we just ran the whole bench suite in v_frame, it's not that large and would make sure the changes to |
|
I'm pretty sure the approach here can be changed a bit to solve most of these issues. Can't we just use That way, we'd just need a special case during construction (and drop) and probably not need to worry about it in deref? Haven't tested it myself, but I think it's possible |
Benchmark ResultsImportant Tested on AMD EPYC 7452 (32 cores, AVX2) rav1eNote rav1e ( MainBenchmark 1: ./target/release/rav1e --benchmark --speed 10 --threads 512 -o park_joy_420_720p50_unpatched.ivf park_joy_420_720p50.y4m -y --tiles 512
Time (mean ± σ): 72.374 s ± 3.134 s [User: 3279.587 s, System: 46.993 s]
Range (min … max): 69.432 s … 79.481 s 10 runs DanglingBenchmark 1: ./target/release/rav1e --benchmark --speed 10 --threads 512 -o park_joy_420_720p50_dangling.ivf park_joy_420_720p50.y4m -y --tiles 512
Time (mean ± σ): 79.984 s ± 2.775 s [User: 3751.362 s, System: 46.819 s]
Range (min … max): 76.706 s … 83.858 s 10 runs OptionBenchmark 1: ./target/release/rav1e --benchmark --speed 10 --threads 512 -o park_joy_420_720p50_options.ivf park_joy_420_720p50.y4m -y --tiles 512
Time (mean ± σ): 87.499 s ± 1.769 s [User: 4185.019 s, System: 45.935 s]
Range (min … max): 83.604 s … 89.096 s 10 runs v_frameNote v_frame benchmark suite MainExpandframe new_with_padding padding=0
time: [5.5458 µs 5.5481 µs 5.5508 µs]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
frame new_with_padding padding!=0
time: [8.5383 µs 8.5482 µs 8.5597 µs]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
plane new padding=0 time: [3.4554 µs 3.4568 µs 3.4584 µs]
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
3 (3.00%) high mild
8 (8.00%) high severe
plane new padding!=0 time: [4.6187 µs 4.6210 µs 4.6237 µs]
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
plane clone time: [6.8910 µs 6.8973 µs 6.9044 µs]
Found 18 outliers among 100 measurements (18.00%)
15 (15.00%) high mild
3 (3.00%) high severe
plane pad time: [3.9786 µs 3.9822 µs 3.9861 µs]
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
plane copy_from_raw_u8 8-bit
time: [7.3728 µs 7.3771 µs 7.3817 µs]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
plane copy_from_raw_u8 10-bit
time: [8.3578 µs 8.3989 µs 8.4338 µs]
plane downsampled time: [13.487 µs 13.492 µs 13.498 µs]
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
2 (2.00%) high mild
6 (6.00%) high severe Option (vs Main)Expandframe new_with_padding padding=0
time: [5.6476 µs 5.6486 µs 5.6495 µs]
change: [+1.5166% +1.6587% +1.7744%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
frame new_with_padding padding!=0
time: [8.5360 µs 8.5387 µs 8.5422 µs]
change: [-0.4582% -0.2924% -0.1211%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
plane new padding=0 time: [3.2918 µs 3.2950 µs 3.2989 µs]
change: [-4.8978% -4.7580% -4.6412%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
plane new padding!=0 time: [4.6465 µs 4.6478 µs 4.6493 µs]
change: [+0.3847% +0.5299% +0.6492%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
plane clone time: [7.4861 µs 7.4906 µs 7.4959 µs]
change: [+7.9969% +8.2529% +8.4897%] (p = 0.00 < 0.05)
Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
7 (7.00%) high mild
12 (12.00%) high severe
plane pad time: [3.8507 µs 3.8541 µs 3.8581 µs]
change: [-3.6760% -3.4901% -3.3192%] (p = 0.00 < 0.05)
Performance has improved.
Found 33 outliers among 100 measurements (33.00%)
10 (10.00%) low severe
8 (8.00%) low mild
4 (4.00%) high mild
11 (11.00%) high severe
plane copy_from_raw_u8 8-bit
time: [6.7926 µs 6.8812 µs 6.9702 µs]
change: [-8.4359% -7.9193% -7.4218%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severe
plane copy_from_raw_u8 10-bit
time: [7.8379 µs 7.8743 µs 7.9137 µs]
change: [-5.8788% -5.5102% -5.1328%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) high mild
7 (7.00%) high severe
plane downsampled time: [69.591 µs 69.626 µs 69.664 µs]
change: [+415.02% +415.87% +416.77%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
plane downscale time: [71.090 µs 71.094 µs 71.098 µs]
change: [+500.50% +501.21% +501.68%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
plane rows_iter time: [14.619 µs 14.621 µs 14.624 µs]
change: [-0.3328% -0.2295% -0.1348%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe Dangling (vs Main)Expandframe new_with_padding padding=0
time: [5.5834 µs 5.5852 µs 5.5870 µs]
change: [+1.5181% +1.6370% +1.7391%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe
frame new_with_padding padding!=0
time: [8.3606 µs 8.3625 µs 8.3647 µs]
change: [-0.6897% -0.5402% -0.4254%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
plane new padding=0 time: [3.1663 µs 3.1672 µs 3.1684 µs]
change: [+0.1292% +0.2762% +0.3923%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
plane new padding!=0 time: [4.6428 µs 4.6446 µs 4.6463 µs]
change: [+2.0101% +2.1166% +2.2101%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
plane clone time: [6.4055 µs 6.4060 µs 6.4065 µs]
change: [-15.251% -15.148% -15.073%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high severe
plane pad time: [4.0703 µs 4.0708 µs 4.0713 µs]
change: [+0.9177% +1.0594% +1.1705%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
4 (4.00%) high mild
7 (7.00%) high severe
plane copy_from_raw_u8 8-bit
time: [6.6586 µs 6.6602 µs 6.6620 µs]
change: [-10.209% -9.8462% -9.5142%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
7 (7.00%) high mild
2 (2.00%) high severe
plane copy_from_raw_u8 10-bit
time: [8.1811 µs 8.1887 µs 8.1941 µs]
change: [+1.3277% +1.9080% +2.4623%] (p = 0.00 < 0.05)
Performance has regressed.
plane downsampled time: [13.135 µs 13.147 µs 13.160 µs]
change: [-0.2355% -0.0659% +0.0786%] (p = 0.43 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
plane downscale time: [11.551 µs 11.554 µs 11.557 µs]
change: [-0.5335% -0.3642% -0.2149%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
plane rows_iter time: [14.917 µs 14.920 µs 14.923 µs]
change: [+1.9106% +2.0829% +2.2188%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe Dangling (vs Option)Expandframe new_with_padding padding=0
time: [5.6026 µs 5.6037 µs 5.6048 µs]
change: [-0.8154% -0.7411% -0.6475%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
1 (1.00%) high mild
6 (6.00%) high severe
frame new_with_padding padding!=0
time: [8.6580 µs 8.6588 µs 8.6598 µs]
change: [+1.2420% +1.3651% +1.4547%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
plane new padding=0 time: [3.1576 µs 3.1586 µs 3.1596 µs]
change: [-4.2208% -4.1403% -4.0633%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
plane new padding!=0 time: [4.4064 µs 4.4083 µs 4.4104 µs]
change: [-5.3059% -5.2506% -5.1973%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
8 (8.00%) high mild
5 (5.00%) high severe
plane clone time: [6.4131 µs 6.4206 µs 6.4290 µs]
change: [-14.541% -14.427% -14.311%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
plane pad time: [3.9366 µs 3.9409 µs 3.9451 µs]
change: [+2.3583% +2.4852% +2.6144%] (p = 0.00 < 0.05)
Performance has regressed.
Found 20 outliers among 100 measurements (20.00%)
5 (5.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
9 (9.00%) high severe
plane copy_from_raw_u8 8-bit
time: [6.7173 µs 6.7181 µs 6.7190 µs]
change: [-1.5480% -0.9748% -0.4779%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
plane copy_from_raw_u8 10-bit
time: [8.1901 µs 8.1917 µs 8.1937 µs]
change: [+4.2488% +4.4765% +4.6728%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
4 (4.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
plane downsampled time: [13.179 µs 13.187 µs 13.197 µs]
change: [-81.114% -81.088% -81.065%] (p = 0.00 < 0.05)
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
6 (6.00%) low severe
4 (4.00%) high mild
7 (7.00%) high severe
plane downscale time: [11.412 µs 11.416 µs 11.421 µs]
change: [-83.949% -83.941% -83.930%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
plane rows_iter time: [14.566 µs 14.573 µs 14.582 µs]
change: [-0.4033% -0.3578% -0.3154%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severe |
I can't currently access my desktop machine, but I tested the branches dangling vs. main on my notebook and the results are noticeably different. This might need more investigation? Not sure how I feel about delaying a soundness fix, but the change in behavior doesn't seem 100% consistent. Dangling vs Main Running benches/bench.rs (target/release/deps/bench-77e6c2e86cb854dc)
frame new_with_padding padding=0
time: [4.8044 µs 4.8139 µs 4.8242 µs]
change: [-18.398% -17.548% -16.674%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking frame new_with_padding padding!=0: Collecting 100 samples in estimated 5.0397 s (626k iterations
frame new_with_padding padding!=0
time: [7.9278 µs 8.0211 µs 8.1120 µs]
change: [-2.5854% -1.7537% -0.8989%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) low severe
3 (3.00%) high mild
2 (2.00%) high severe
plane new padding=0 time: [2.8500 µs 2.8578 µs 2.8661 µs]
change: [+1.4054% +2.1370% +3.1526%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
plane new padding!=0 time: [4.4487 µs 4.4620 µs 4.4746 µs]
change: [-4.0047% -3.1560% -2.3405%] (p = 0.00 < 0.05)
Performance has improved.
plane clone time: [6.8967 µs 6.9171 µs 6.9412 µs]
change: [+1.0552% +1.4883% +1.9725%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe
plane pad time: [3.3016 µs 3.3085 µs 3.3166 µs]
change: [-7.4507% -6.8170% -6.1368%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild
plane copy_from_raw_u8 8-bit
time: [6.9379 µs 6.9516 µs 6.9657 µs]
change: [-8.2592% -7.6483% -7.0711%] (p = 0.00 < 0.05)
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
12 (12.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
plane copy_from_raw_u8 10-bit
time: [7.8394 µs 7.8589 µs 7.8804 µs]
change: [-1.1914% -0.4869% +0.1826%] (p = 0.19 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
plane downsampled time: [10.931 µs 11.006 µs 11.083 µs]
change: [-3.7809% -2.8298% -1.8916%] (p = 0.00 < 0.05)
Performance has improved.
plane downscale time: [9.7859 µs 9.8030 µs 9.8221 µs]
change: [-10.635% -9.8731% -9.1971%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
plane rows_iter time: [12.750 µs 12.852 µs 12.974 µs]
change: [-0.1309% +0.5361% +1.1598%] (p = 0.12 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe |
I just noticed that This might also explain the weird performance behavior. There might be value in using |
I opened #24 to see if using an aligned vec crate would be a viable option. It seems to have an acceptable performance impact, which is positive in some cases and negative in others, but below 10% in all cases. My thought process was that that would be the fastest way to get rid of all soundness bugs that might be present as a result of our manual initialization code, by just not doing it ourselves. |
Closed in favor of the 100% safe approach in #24 |
Calls to
alloc
with zero sized layouts is undefined behavior.This is particularly problematic when using non-default allocators.
Jemallocator (using dev profile only) panics when a zero sized layout is used.
rav1e creates zero sized
Planes
in multiple places to avoid the cost of allocation when the field is not used internally.