[spark] Support spark 4.0 #4325
Conversation
<scala.version>${scala212.version}</scala.version>
<paimon-spark-common.spark.version>3.5.3</paimon-spark-common.spark.version>
<paimon-spark-x.x.common>paimon-spark-3.x-common</paimon-spark-x.x.common>
<test.spark.main.version>3.3</test.spark.main.version>
test.spark.version? And why test only Spark 3.3?
Because Paimon uses https://github.com/big-data-europe/docker-spark to run e2e tests, and it currently only supports up to 3.3.0. Added a TODO here.
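For reference, the pinned version appears in the parent pom as the property shown in the diff above; a TODO comment of roughly this shape (exact wording is an assumption, not taken from the PR) would record the constraint:

```xml
<!-- TODO: raise this once the big-data-europe/docker-spark images
     used by the e2e tests support Spark versions beyond 3.3.0 -->
<test.spark.main.version>3.3</test.spark.main.version>
```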
paimon-spark/paimon-spark-4.x-common/src/main/scala/org/apache/spark/sql/shims.scala
paimon-spark/paimon-spark-common/src/main/java/org/apache/paimon/spark/SparkGenericCatalog.java
...imon-spark-common/src/main/scala/org/apache/paimon/spark/commands/MergeIntoPaimonTable.scala
@@ -122,7 +122,11 @@ class BucketedTableQueryTest extends PaimonSparkTestBase with AdaptiveSparkPlanH
    spark.sql(
      "CREATE TABLE t5 (id INT, c STRING) TBLPROPERTIES ('primary-key' = 'id', 'bucket'='10')")
    spark.sql("INSERT INTO t5 VALUES (1, 'x1')")
    checkAnswerAndShuffleSorts("SELECT * FROM t1 JOIN t5 on t1.id = t5.id", 2, 2)
    if (gteqSpark4_0) {
      checkAnswerAndShuffleSorts("SELECT * FROM t1 JOIN t5 on t1.id = t5.id", 0, 0)
put the JIRA here
This should be an optimization made in Spark 4.0. I looked for it, but couldn't find the specific PR.
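The version gate used in the test above (`gteqSpark4_0`) can be sketched as a plain version-string comparison. This is an illustrative reimplementation under assumed names, not Paimon's actual helper:

```scala
// Illustrative sketch of a gteqSpark4_0-style gate (names assumed,
// not Paimon's actual implementation).
object SparkVersionGate {
  // Numerically compare dotted version strings, padding missing parts
  // with 0, so "4.0.0" >= "4.0" and "3.10.0" >= "3.9".
  def gteq(current: String, target: String): Boolean = {
    val c = current.split("\\.").map(_.toInt)
    val t = target.split("\\.").map(_.toInt)
    c.zipAll(t, 0, 0)
      .collectFirst { case (a, b) if a != b => a > b }
      .getOrElse(true)
  }

  // In the real test suite this would read the running Spark's version.
  def gteqSpark4_0(sparkVersion: String): Boolean = gteq(sparkVersion, "4.0")
}
```

Gating expected shuffle/sort counts this way keeps a single test source tree passing against both Spark 3.x and 4.x builds.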
.../paimon-spark-common/src/test/scala/org/apache/paimon/spark/sql/BucketedTableQueryTest.scala
please address some comments.
<scala.binary.version>2.12</scala.binary.version>
<scala.version>${scala212.version}</scala.version>
<paimon-spark-common.spark.version>3.5.3</paimon-spark-common.spark.version>
<paimon-sparkx-common>paimon-spark3-common</paimon-sparkx-common>
I prefer paimon-spark-common. cc @JingsongLi
paimon-spark-common may cause ambiguity with the existing paimon-spark-common module; I prefer to distinguish them.
I am OK with paimon-sparkx-common.
@@ -152,10 +156,10 @@ class BucketedTableQueryTest extends PaimonSparkTestBase with AdaptiveSparkPlanH
    checkAnswerAndShuffleSorts("SELECT id, max(c) FROM t1 GROUP BY id", 0, 0)
    checkAnswerAndShuffleSorts("SELECT c, count(*) FROM t1 GROUP BY c", 1, 0)
    checkAnswerAndShuffleSorts("SELECT c, max(c) FROM t1 GROUP BY c", 1, 2)
    checkAnswerAndShuffleSorts("select sum(c) OVER (PARTITION BY id ORDER BY c) from t1", 0, 1)
why change this test?
Because Spark 4.0 no longer supports directly calculating the sum of strings, this was likely a typo; changed it to max like the case above.
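The behavior change can be modeled outside Spark. Before ANSI mode became the default (my understanding of the Spark 4.0 change, not confirmed in this thread), a string that fails to cast to a number becomes NULL and is skipped by the aggregate; under ANSI semantics the failed cast is an error. A plain-Scala model of that difference, illustrative only and not Spark code:

```scala
import scala.util.Try

// Models pre-ANSI (lenient) aggregation: failed casts become NULL
// and are skipped by the aggregate.
def lenientSum(values: Seq[String]): Option[Double] = {
  val nums = values.flatMap(v => Try(v.toDouble).toOption)
  if (nums.isEmpty) None else Some(nums.sum)
}

// Models ANSI semantics: a failed string-to-number cast is an error,
// which is why summing a column holding 'x1' now fails.
def ansiSum(values: Seq[String]): Double = values.map(_.toDouble).sum
```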
LGTM. left two comments. cc @JingsongLi
@@ -34,10 +34,22 @@ under the License.
<name>Paimon : Spark : Common</name>
We should add the Scala version here, because this paimon-spark-common will produce two versions, so we should distinguish them.
Addressed
+1
Purpose
Related to #3940: support Spark 4.0. The following are the module relationships.
Tests
API and Format
Documentation