Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash join flatten fix #4668

Merged
merged 1 commit into from
Dec 29, 2024
Merged

Hash join flatten fix #4668

merged 1 commit into from
Dec 29, 2024

Conversation

andyfengHKU
Copy link
Contributor

Description

This PR fixes a bug in hash join planning exposed in PR #4625. Below is a description of the bug.

We append flatten to probe side if and only if join key is an internal ID and this ID is guaranteed to be unique on the build side. The check was done based on number of factorization group. This is not true if an aggregation happens, which will always reduce the number of factorization group to 1.

This PR checks by looking at operators along the build side subplan.

Testing

The bug cannot be revealed easily on master because we don't reorder multi-part queries. So aggregation does not happen on build side. I have tested by cherry-picking to #4625.

Fixes # (issue)

Contributor agreement

Copy link

Benchmark Result

Master commit hash: c14b6d45134f889ab44831a9a4531c0272e07a26
Branch commit hash: 7bef8293756a68c629e7640c180dc3c28ff1218b

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 653.91 664.61 -10.69 (-1.61%)
aggregation q28 12157.54 11590.99 566.54 (4.89%)
filter q14 133.24 134.15 -0.91 (-0.68%)
filter q15 137.65 134.36 3.28 (2.44%)
filter q16 319.96 313.93 6.03 (1.92%)
filter q17 454.16 462.71 -8.55 (-1.85%)
filter q18 1942.46 1944.74 -2.28 (-0.12%)
filter zonemap-node 96.33 96.79 -0.46 (-0.47%)
filter zonemap-node-lhs-cast 96.44 96.52 -0.08 (-0.08%)
filter zonemap-node-null 93.09 92.51 0.59 (0.63%)
filter zonemap-rel 5813.85 5649.63 164.22 (2.91%)
fixed_size_expr_evaluator q07 590.55 597.48 -6.93 (-1.16%)
fixed_size_expr_evaluator q08 820.04 823.67 -3.64 (-0.44%)
fixed_size_expr_evaluator q09 818.74 831.01 -12.26 (-1.48%)
fixed_size_expr_evaluator q10 257.34 257.09 0.25 (0.10%)
fixed_size_expr_evaluator q11 251.08 250.84 0.24 (0.09%)
fixed_size_expr_evaluator q12 244.72 245.42 -0.69 (-0.28%)
fixed_size_expr_evaluator q13 1470.16 1480.84 -10.68 (-0.72%)
fixed_size_seq_scan q23 127.74 127.99 -0.25 (-0.19%)
join q29 644.96 613.32 31.64 (5.16%)
join q30 1615.44 1497.83 117.61 (7.85%)
join q31 5.43 6.07 -0.64 (-10.48%)
join SelectiveTwoHopJoin 50.74 54.80 -4.06 (-7.41%)
ldbc_snb_ic q35 2658.65 2657.63 1.01 (0.04%)
ldbc_snb_ic q36 558.95 558.26 0.69 (0.12%)
ldbc_snb_is q32 6.70 7.49 -0.80 (-10.65%)
ldbc_snb_is q33 15.37 14.84 0.53 (3.57%)
ldbc_snb_is q34 1.32 1.15 0.17 (14.47%)
multi-rel multi-rel-large-scan 1314.56 1339.92 -25.36 (-1.89%)
multi-rel multi-rel-lookup 9.50 10.12 -0.62 (-6.13%)
multi-rel multi-rel-small-scan 93.82 64.23 29.59 (46.06%)
order_by q25 140.25 139.08 1.17 (0.84%)
order_by q26 458.88 461.11 -2.23 (-0.48%)
order_by q27 1471.90 1493.22 -21.32 (-1.43%)
recursive_join recursive-join-bidirection 285.28 242.56 42.72 (17.61%)
recursive_join recursive-join-dense 7420.85 4796.68 2624.16 (54.71%)
recursive_join recursive-join-path 23882.58 23154.06 728.52 (3.15%)
recursive_join recursive-join-sparse 14311.92 13989.04 322.88 (2.31%)
recursive_join recursive-join-trail 7344.41 5085.94 2258.46 (44.41%)
scan_after_filter q01 178.23 178.96 -0.73 (-0.41%)
scan_after_filter q02 164.72 166.16 -1.44 (-0.87%)
shortest_path_ldbc100 q37 86.98 92.16 -5.18 (-5.62%)
shortest_path_ldbc100 q38 272.99 363.74 -90.75 (-24.95%)
shortest_path_ldbc100 q39 65.07 63.73 1.34 (2.10%)
shortest_path_ldbc100 q40 346.34 402.46 -56.12 (-13.94%)
var_size_expr_evaluator q03 2108.42 2119.11 -10.69 (-0.50%)
var_size_expr_evaluator q04 2320.79 2211.57 109.22 (4.94%)
var_size_expr_evaluator q05 2703.90 2770.93 -67.03 (-2.42%)
var_size_expr_evaluator q06 1341.94 1352.98 -11.04 (-0.82%)
var_size_seq_scan q19 1477.15 1468.30 8.85 (0.60%)
var_size_seq_scan q20 2744.76 2644.71 100.04 (3.78%)
var_size_seq_scan q21 2317.77 2294.25 23.52 (1.03%)
var_size_seq_scan q22 132.09 129.66 2.43 (1.88%)

Copy link

codecov bot commented Dec 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.60%. Comparing base (c14b6d4) to head (7e84af4).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4668   +/-   ##
=======================================
  Coverage   86.60%   86.60%           
=======================================
  Files        1373     1373           
  Lines       58273    58267    -6     
  Branches     7231     7226    -5     
=======================================
- Hits        50465    50460    -5     
+ Misses       7642     7641    -1     
  Partials      166      166           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -165,6 +165,24 @@ binder::expression_vector LogicalHashJoin::getJoinNodeIDs(
return result;
}

class JoinNodeIDUniquenessAnalyzer {
public:
bool isUnique(LogicalOperator* op, const binder::Expression& joinNodeID) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this function static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@andyfengHKU andyfengHKU force-pushed the hash-join-flatten-fix branch from 41c6449 to 7e84af4 Compare December 29, 2024 04:10
Copy link

Benchmark Result

Master commit hash: c14b6d45134f889ab44831a9a4531c0272e07a26
Branch commit hash: 82b79d0d4b03b86fec5e2225cd5a2399ef9ab734

Query Group Query Name Mean Time - Commit (ms) Mean Time - Master (ms) Diff
aggregation q24 657.34 664.61 -7.27 (-1.09%)
aggregation q28 11595.42 11590.99 4.43 (0.04%)
filter q14 128.89 134.15 -5.26 (-3.92%)
filter q15 124.03 134.36 -10.33 (-7.69%)
filter q16 303.91 313.93 -10.02 (-3.19%)
filter q17 446.90 462.71 -15.81 (-3.42%)
filter q18 1883.36 1944.74 -61.38 (-3.16%)
filter zonemap-node 88.45 96.79 -8.34 (-8.61%)
filter zonemap-node-lhs-cast 91.79 96.52 -4.73 (-4.91%)
filter zonemap-node-null 88.63 92.51 -3.88 (-4.19%)
filter zonemap-rel 5771.03 5649.63 121.40 (2.15%)
fixed_size_expr_evaluator q07 579.41 597.48 -18.07 (-3.02%)
fixed_size_expr_evaluator q08 808.71 823.67 -14.96 (-1.82%)
fixed_size_expr_evaluator q09 813.55 831.01 -17.46 (-2.10%)
fixed_size_expr_evaluator q10 243.93 257.09 -13.15 (-5.12%)
fixed_size_expr_evaluator q11 238.24 250.84 -12.60 (-5.02%)
fixed_size_expr_evaluator q12 233.65 245.42 -11.77 (-4.80%)
fixed_size_expr_evaluator q13 1455.78 1480.84 -25.06 (-1.69%)
fixed_size_seq_scan q23 116.92 127.99 -11.07 (-8.65%)
join q29 658.15 613.32 44.84 (7.31%)
join q30 1633.41 1497.83 135.59 (9.05%)
join q31 5.39 6.07 -0.68 (-11.20%)
join SelectiveTwoHopJoin 55.09 54.80 0.29 (0.53%)
ldbc_snb_ic q35 2667.76 2657.63 10.13 (0.38%)
ldbc_snb_ic q36 541.66 558.26 -16.60 (-2.97%)
ldbc_snb_is q32 5.97 7.49 -1.53 (-20.36%)
ldbc_snb_is q33 13.20 14.84 -1.64 (-11.02%)
ldbc_snb_is q34 1.13 1.15 -0.02 (-1.62%)
multi-rel multi-rel-large-scan 1343.12 1339.92 3.20 (0.24%)
multi-rel multi-rel-lookup 21.28 10.12 11.15 (110.18%)
multi-rel multi-rel-small-scan 75.33 64.23 11.10 (17.28%)
order_by q25 133.05 139.08 -6.03 (-4.33%)
order_by q26 451.81 461.11 -9.29 (-2.02%)
order_by q27 1500.78 1493.22 7.56 (0.51%)
recursive_join recursive-join-bidirection 312.07 242.56 69.51 (28.66%)
recursive_join recursive-join-dense 7407.18 4796.68 2610.49 (54.42%)
recursive_join recursive-join-path 24050.33 23154.06 896.27 (3.87%)
recursive_join recursive-join-sparse 14542.25 13989.04 553.21 (3.95%)
recursive_join recursive-join-trail 7360.45 5085.94 2274.51 (44.72%)
scan_after_filter q01 170.66 178.96 -8.30 (-4.64%)
scan_after_filter q02 156.19 166.16 -9.98 (-6.00%)
shortest_path_ldbc100 q37 86.60 92.16 -5.56 (-6.03%)
shortest_path_ldbc100 q38 358.83 363.74 -4.91 (-1.35%)
shortest_path_ldbc100 q39 56.09 63.73 -7.65 (-12.00%)
shortest_path_ldbc100 q40 422.46 402.46 20.00 (4.97%)
var_size_expr_evaluator q03 2085.59 2119.11 -33.52 (-1.58%)
var_size_expr_evaluator q04 2179.43 2211.57 -32.13 (-1.45%)
var_size_expr_evaluator q05 2694.21 2770.93 -76.72 (-2.77%)
var_size_expr_evaluator q06 1346.05 1352.98 -6.93 (-0.51%)
var_size_seq_scan q19 1465.81 1468.30 -2.49 (-0.17%)
var_size_seq_scan q20 2722.51 2644.71 77.79 (2.94%)
var_size_seq_scan q21 2304.20 2294.25 9.95 (0.43%)
var_size_seq_scan q22 127.11 129.66 -2.55 (-1.97%)

@andyfengHKU andyfengHKU merged commit 17a4121 into master Dec 29, 2024
25 checks passed
@andyfengHKU andyfengHKU deleted the hash-join-flatten-fix branch December 29, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants