[spark] Support report partitioning to eliminate shuffle exchange #3912
Conversation
cc @JingsongLi @YannByron do you have time to take a look? Thank you.
```
@@ -66,6 +67,10 @@ public class TableSchema implements Serializable {

    private final List<String> primaryKeys;

    private List<String> bucketKeys;
```
this field can be final
Wow, this is bucketed join; this optimization is on my list.
```scala
assert(bucketSpec.getBucketKeys.size() == 1)
val key = Expressions.bucket(bucketSpec.getNumBucket, bucketSpec.getBucketKeys.get(0))
new KeyGroupedPartitioning(Array(key), lazyInputPartitions.size)
```
If it is an equi-join on two keys, can it not be supported?
Ignore me, I see Spark does not support a bucket transform with several input attributes.
Yeah, it seems to be an issue in the Spark community. I did not find a strong reason why Spark forbids it.
```java
 * bucket(10, col)` would fail since we do not implement {@link
 * org.apache.spark.sql.connector.catalog.functions.ScalarFunction}
 */
public static class BucketFunction implements UnboundFunction {
```
Because Paimon's bucket calculation method and Spark's bucket function are completely different implementations, this function is an `UnboundFunction`? I'm not sure if my understanding is correct.
`UnboundFunction` is kind of an unresolved expression in Spark and will finally be resolved to a `BoundFunction`; see the `UnboundFunction#bind` method.
For Paimon, I think it is more like a placeholder, as it is not used for evaluation. It is only used to compare whether two partitionings are semantically equivalent.
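To illustrate the placeholder idea, here is a minimal Scala sketch of an `UnboundFunction` whose bound form only reports types and a canonical name and is never evaluated (the class name and canonical name are assumptions, not Paimon's actual code):

```scala
import org.apache.spark.sql.connector.catalog.functions.{BoundFunction, UnboundFunction}
import org.apache.spark.sql.types.{DataType, IntegerType, StructType}

class BucketPlaceholder extends UnboundFunction {
  override def name(): String = "bucket"
  override def description(): String = "placeholder bucket function"

  // bind() resolves the unbound function against the actual input schema.
  override def bind(inputType: StructType): BoundFunction = new BoundFunction {
    override def name(): String = "bucket"
    override def inputTypes(): Array[DataType] = inputType.fields.map(_.dataType)
    override def resultType(): DataType = IntegerType
    // Spark compares canonical names when deciding whether two reported
    // partitionings are compatible, so this is all the "evaluation" needed.
    override def canonicalName(): String = "paimon.bucket"
  }
}
```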
Another thing that may be related: it seems Paimon did not adopt the Spark DSv2 write feature, e.g. `RequiresDistributionAndOrdering`, so the bucket function is only used to report partitioning.
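For reference, a minimal sketch of what adopting `RequiresDistributionAndOrdering` on the write side could look like (the class and its parameters are hypothetical, not Paimon's actual write path):

```scala
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions, SortOrder}
import org.apache.spark.sql.connector.write.{RequiresDistributionAndOrdering, Write}

class BucketedWriteSketch(numBuckets: Int, bucketKey: String)
    extends Write with RequiresDistributionAndOrdering {

  // Ask Spark to cluster incoming rows by the bucket transform, so all
  // rows of one bucket are handled by one write task.
  override def requiredDistribution(): Distribution =
    Distributions.clustered(
      Array[Expression](Expressions.bucket(numBuckets, bucketKey)))

  // This sketch imposes no sort order within each partition.
  override def requiredOrdering(): Array[SortOrder] = Array.empty
}
```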
On using `RequiresDistributionAndOrdering`: the biggest problem before was that "Paimon's bucket calculation method and Spark's bucket function are completely different implementations". Implementing `org.apache.spark.sql.connector.catalog.functions.ScalarFunction` looks like it could solve this problem.
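A minimal sketch of that direction: a `ScalarFunction` whose `produceResult` would have to reproduce Paimon's own bucket hashing (the `paimonHash` helper below is a hypothetical stand-in; the real computation differs from Spark's built-in murmur3 bucket):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.functions.ScalarFunction
import org.apache.spark.sql.types.{DataType, IntegerType}

class EvaluableBucket(numBuckets: Int) extends ScalarFunction[Integer] {
  override def name(): String = "bucket"
  override def inputTypes(): Array[DataType] = Array(IntegerType)
  override def resultType(): DataType = IntegerType

  // Evaluated by Spark once the function is resolvable and bound.
  override def produceResult(input: InternalRow): Integer =
    Integer.valueOf(Math.floorMod(paimonHash(input.getInt(0)), numBuckets))

  // Hypothetical stand-in for Paimon's real bucket hash.
  private def paimonHash(value: Int): Int = value.hashCode()
}
```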
Thank you @JingsongLi for the context. I will work on it in follow-ups if I find time.
Thanks for this PR @ulysses-you. Can we modify this PR to make it available for Spark 3.3+ at the first shot?
@YannByron I made changes to support 3.3, 3.4, and 3.5, thank you.
Hi @ulysses-you, Paimon is different from Iceberg in terms of multi-Spark-version support. Iceberg does this by copying the code, while Paimon does it by extracting the common code and solving the compatibility issues. Maybe @JingsongLi can explain more.
This is not easy to explain; I will try to clarify the trade-off here. The approach is minimal change: avoid copying a large amount of code, but do copy some classes into version-specific modules to resolve incompatibilities between versions. From my personal perspective, having a lot of code redundancy is not right, and it is also not worth abstracting a lot just for the sake of compatibility in a small part.
Thank you @JingsongLi, got it.
Thanks @ulysses-you for the update! Looks good to me! And thanks @YannByron for the review.
Just to help users with troubleshooting: the Spark community has a bug with bucketed scans when there are multiple inner joins; see https://issues.apache.org/jira/browse/SPARK-49179. It has been fixed in 3.4.4, 3.5.3, and 4.0.0.
Link to #2404.
Purpose

This PR makes `PaimonScan` implement `SupportsReportPartitioning` for the Spark engine, so that we can eliminate the shuffle exchange when doing a join, aggregate, window, etc. For example, the following query with a join does not introduce a shuffle exchange:
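A minimal sketch of such a query, assuming two Paimon tables bucketed on the join key with the same bucket count (table names, schema, and bucket count are illustrative, not taken from the PR):

```scala
// Assumes a Spark session with a Paimon catalog configured.
// Both tables use fixed hash bucketing on the join key.
spark.sql("""CREATE TABLE t1 (id INT, v STRING)
             TBLPROPERTIES ('bucket' = '4', 'bucket-key' = 'id')""")
spark.sql("""CREATE TABLE t2 (id INT, v STRING)
             TBLPROPERTIES ('bucket' = '4', 'bucket-key' = 'id')""")

// Storage partition join is gated behind this flag (Spark 3.3+).
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")

// With the reported partitioning, the physical plan of this join
// should contain no Exchange on either side.
spark.sql("SELECT * FROM t1 JOIN t2 ON t1.id = t2.id").explain()
```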
This feature depends on Spark storage partition join, in particular the `KeyGroupedPartitioning` interface, which was added in Spark 3.3. To achieve this goal, the PR introduces a `BucketSpec` for Paimon to hold the bucket-related information (sketched below), and we report the partitioning only if the bucket mode is `HASH_FIXED`. It currently supports primary-key tables and bucketed tables that use a single bucket column. The PR also introduces a `FunctionCatalog` for `SparkBaseCatalog` to resolve the `bucket` transform expression.
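A rough sketch of the shape of `BucketSpec`, inferred from the accessors used in the diff above (`getNumBucket`, `getBucketKeys`); the exact field names and types are assumptions:

```scala
// Assumed shape only; the real class is Java-style with getters
// such as getBucketMode, getNumBucket, and getBucketKeys.
case class BucketSpec(
    bucketMode: String,      // e.g. "HASH_FIXED"; only this mode reports partitioning
    bucketKeys: Seq[String], // the columns the bucket hash is computed from
    numBucket: Int)          // total number of buckets of the table
```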
Tests

Adds new tests: `BucketedTableQueryTest`.
API and Format
no
Documentation