
[spark] Add EvalSubqueriesForDeleteTable for quick delete with subqueries #3464

Merged
4 commits merged into apache:master
Jun 4, 2024


Zouxxyy
Contributor

@Zouxxyy Zouxxyy commented Jun 3, 2024

Purpose

Add new rule EvalSubqueriesForDeleteTable:

For delete conditions whose subqueries reference only partition columns, we can evaluate the
subqueries in advance. Then, when DeleteFromPaimonTableCommand runs, it can call
dropPartitions directly to achieve fast deletion.
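
The idea can be sketched with a toy model. This is not the Paimon or Spark implementation: `Expr`, `InSubquery`, `InLiterals`, and `rewrite` are hypothetical names invented for illustration, and a subquery is modelled as a plain function that returns partition values. The sketch only shows the shape of the rewrite: if the condition contains a subquery and references only partition columns, run the subquery once and substitute its literal result.

```scala
// Toy model, not Paimon or Spark code: every name below is hypothetical.
object EvalSubquerySketch {
  sealed trait Expr
  // `column IN (<subquery>)`; the subquery is modelled as a function returning values.
  final case class InSubquery(column: String, run: () => Seq[String]) extends Expr
  // `column IN (v1, v2, ...)` with already-known literal values.
  final case class InLiterals(column: String, values: Seq[String]) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  // Columns referenced by a condition.
  def columns(e: Expr): Set[String] = e match {
    case InSubquery(c, _) => Set(c)
    case InLiterals(c, _) => Set(c)
    case And(l, r)        => columns(l) ++ columns(r)
  }

  def hasSubquery(e: Expr): Boolean = e match {
    case _: InSubquery => true
    case _: InLiterals => false
    case And(l, r)     => hasSubquery(l) || hasSubquery(r)
  }

  // Run every subquery once and replace it with its literal result.
  def evalSubqueries(e: Expr): Expr = e match {
    case InSubquery(c, run) => InLiterals(c, run())
    case And(l, r)          => And(evalSubqueries(l), evalSubqueries(r))
    case other              => other
  }

  // Rewrite only when the condition has a subquery and touches partition columns only.
  def rewrite(condition: Expr, partitionColumns: Set[String]): Expr =
    if (hasSubquery(condition) && columns(condition).subsetOf(partitionColumns))
      evalSubqueries(condition)
    else condition

  def main(args: Array[String]): Unit = {
    val cond = InSubquery("dt", () => Seq("2024-06-01", "2024-06-02"))
    println(rewrite(cond, Set("dt")))             // subquery replaced by literals
    println(rewrite(cond, Set("region")) == cond) // left untouched: `dt` is not a partition column here
  }
}
```

Once the condition contains only literals over partition columns, a delete command can recognise it as a whole-partition delete and drop those partitions instead of rewriting data files.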

Tests

Tested with DF_SS q2 on a 100T TPC-DS dataset: 100s+ before, 2s after.

API and Format

Documentation

override def apply(plan: LogicalPlan): LogicalPlan = {
  plan.transformDown {
    case d @ DeleteFromPaimonTableCommand(_, table, condition)
        if SubqueryExpression.hasSubquery(condition) &&

Should we also consider the case where the condition contains a partition predicate with a subquery together with other data predicates? I think handling this case may reduce the number of splits we have to scan.
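
One possible reading of this suggestion, sketched as a toy model: split the condition into its conjuncts, evaluate only the partition-only ones up front to narrow the partitions scanned, and keep the rest as ordinary data filters. `Conjunct` and `split` are hypothetical names, not part of Paimon or Spark, and each conjunct is reduced to the set of columns it references.

```scala
// Toy model of splitting a delete condition; every name below is hypothetical.
object SplitConditionSketch {
  // A conjunct of the delete condition, reduced to its SQL text and referenced columns.
  final case class Conjunct(sql: String, columns: Set[String])

  // Returns (partition-only conjuncts, remaining conjuncts).
  def split(
      conjuncts: Seq[Conjunct],
      partitionColumns: Set[String]): (Seq[Conjunct], Seq[Conjunct]) =
    conjuncts.partition(c => c.columns.nonEmpty && c.columns.subsetOf(partitionColumns))

  def main(args: Array[String]): Unit = {
    val conjuncts = Seq(
      Conjunct("dt IN (SELECT dt FROM expired)", Set("dt")),
      Conjunct("price > 100", Set("price")))
    val (partitionOnly, rest) = split(conjuncts, Set("dt"))
    println(s"partition-only: ${partitionOnly.map(_.sql)}")
    println(s"remaining: ${rest.map(_.sql)}")
  }
}
```

In this reading, the partition-only conjuncts could be pre-evaluated to prune partitions, while the remaining conjuncts would still require scanning the surviving splits.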

@YannByron YannByron merged commit 24f1624 into apache:master Jun 4, 2024
9 checks passed
@Zouxxyy Zouxxyy deleted the dev/delete-sub branch June 11, 2024 14:15
2 participants