[GLUTEN-6768][CH] Try to use multi join on clauses instead of inequal join condition #6787

Merged (4 commits) on Aug 16, 2024

@lgbo-ustc lgbo-ustc commented Aug 12, 2024

What changes were proposed in this pull request?

Fixes: #6768

Where possible, transform a join with an inequal (non-equi) condition into a join with multiple ON clauses, which can be more efficient. For example, convert

on t1.key = t2.key and (t1.a1 = t2.a1 or t1.a2 = t2.a2 or t1.a3 = t2.a3)

to

on (t1.key = t2.key and t1.a1 = t2.a1) or (t1.key = t2.key and t1.a2 = t2.a2) or (t1.key = t2.key and t1.a3 = t2.a3)
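The rewrite distributes the shared equality over each branch of the disjunction. The Python sketch below illustrates only that logical step; it is not the code from this patch. The helper names (`distribute`, `render`, `matches_original`, `matches_rewritten`) are hypothetical, and it assumes the second alternative is meant to compare `t1.a2` with `t2.a2`.

```python
# Each equality is a (left_column, right_column) pair. All names are illustrative.

def distribute(common, alternatives):
    """Turn `common AND (alt1 OR alt2 OR ...)` into one ON clause per alternative.

    Each returned clause is a list of equality pairs that must all hold.
    """
    return [list(common) + [alt] for alt in alternatives]


def render(clauses):
    """Format the clauses as a disjunction of conjunctions."""
    return " or ".join(
        "(" + " and ".join(f"t1.{l} = t2.{r}" for l, r in clause) + ")"
        for clause in clauses
    )


def matches_original(left, right, common, alternatives):
    """Evaluate `common AND (alt1 OR alt2 OR ...)` for one pair of rows."""
    return all(left[l] == right[r] for l, r in common) and any(
        left[l] == right[r] for l, r in alternatives
    )


def matches_rewritten(left, right, clauses):
    """Evaluate the rewritten form: any clause whose equalities all hold."""
    return any(all(left[l] == right[r] for l, r in clause) for clause in clauses)


common = [("key", "key")]
alternatives = [("a1", "a1"), ("a2", "a2"), ("a3", "a3")]
clauses = distribute(common, alternatives)
print(render(clauses))
```

Both forms accept exactly the same row pairs, because AND distributes over OR. The difference is that every clause in the rewritten form is a pure conjunction of equalities, so each one can serve as a hash key. This sketch ignores SQL NULL semantics.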

The right table size must be limited to avoid OOM, because a join with multiple ON clauses can only use the hash join algorithm.
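To show why the right table size matters, here is a simplified Python sketch of a hash join over several ON clauses. It is an illustration under stated assumptions, not Gluten's or ClickHouse's actual implementation. The function name and the `max_right_rows` threshold are hypothetical, and NULL handling is ignored.

```python
from collections import defaultdict


def multi_clause_hash_join(left_rows, right_rows, clauses, max_right_rows=1000):
    """Inner join where a row pair matches if ANY clause holds.

    `clauses` is a list of ON clauses; each clause is a list of
    (left_column, right_column) equality pairs that must all hold.
    `max_right_rows` is a hypothetical guard standing in for a size limit.
    """
    if len(right_rows) > max_right_rows:
        # The whole right side is held in memory, once per clause.
        raise ValueError("right table too large for a multi-clause hash join")

    # Build one hash table per clause, keyed by that clause's right-side columns.
    tables = []
    for clause in clauses:
        table = defaultdict(list)
        for idx, row in enumerate(right_rows):
            key = tuple(row[rc] for _, rc in clause)
            table[key].append(idx)
        tables.append(table)

    result = []
    for lrow in left_rows:
        seen = set()  # right rows already emitted for this left row
        for clause, table in zip(clauses, tables):
            key = tuple(lrow[lc] for lc, _ in clause)
            for idx in table.get(key, ()):
                if idx not in seen:  # a pair matching several clauses is emitted once
                    seen.add(idx)
                    result.append((lrow, right_rows[idx]))
    return result


left = [{"key": 1, "a1": 10, "a2": 20}]
right = [
    {"key": 1, "a1": 10, "a2": 20},  # matches both clauses
    {"key": 1, "a1": 99, "a2": 20},  # matches the second clause only
    {"key": 2, "a1": 10, "a2": 20},  # key differs, no match
]
clauses = [[("key", "key"), ("a1", "a1")], [("key", "key"), ("a2", "a2")]]
joined = multi_clause_hash_join(left, right, clauses)
```

In this sketch the memory cost grows with both the right table size and the number of clauses, since each clause gets its own hash table. That is the reason for guarding on the right side before choosing this plan.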

How was this patch tested?

unit tests

Run Gluten Clickhouse CI

lgbo-ustc commented Aug 12, 2024

Before (the run timed out and could not finish):
[image]

After:
[image]

@lgbo-ustc lgbo-ustc marked this pull request as ready for review August 15, 2024 06:49

@github-actions github-actions bot removed the CORE and VELOX labels Aug 16, 2024

@liuneng1994 liuneng1994 left a comment

LGTM

@liuneng1994 liuneng1994 merged commit 5f4e558 into apache:main Aug 16, 2024
9 checks passed
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
… join condition (apache#6787)


Successfully merging this pull request may close these issues.

[CH] A bad case for joining with mixed join conditions