Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[spark] adjust paimon spark structure #4573

Merged
merged 4 commits into from
Nov 26, 2024

Conversation

YannByron
Copy link
Contributor

@YannByron YannByron commented Nov 22, 2024

Purpose

To modify the spark structure as below (It is similar to Apache Gluten https://github.com/apache/incubator-gluten):
image

  1. adjust the spark-common module to the bottom layer, and make the spark3-common and spark4-common modules depend on it;
  2. extract the separate spark-ut module;
  3. the specific spark modules (like spark-3.5 or spark-4.0) can directly depend on the sparkx-common module with the corresponding major version (eg: spark3.5 -> spark3-common, spark4.0 -> spark4-common);
  4. the specific spark-test modules will depend on spark-ut and sparkx-common modules;
  5. after this adjustment, we use the SparkShim interface and the SPI mechanism to solve the incompatibility problem, instead of overwriting the classes with the same class name;
  6. fix some fragile UTs to make them robust in all spark versions.

Linked issue: close #xxx

Tests

API and Format

Documentation

@YannByron YannByron force-pushed the spark-structure branch 3 times, most recently from b6d0f2b to cfc8fbb Compare November 22, 2024 08:43
@YannByron YannByron changed the title [spark] modify paimon spark structure [WIP][spark] modify paimon spark structure Nov 22, 2024
@YannByron YannByron changed the title [WIP][spark] modify paimon spark structure [spark] adjust paimon spark structure Nov 25, 2024
@YannByron
Copy link
Contributor Author

@Zouxxyy @JingsongLi please take a look.

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! The difference between Spark4 and Spark3 is getting bigger and bigger, it is time to introduce Shim now.

private def loadSparkShim(): SparkShim = {
val shims = ServiceLoader.load(classOf[SparkShim]).asScala
if (shims.size != 1) {
throw new IllegalStateException("No available spark shim here.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle multiple versions in the classloader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

/** Load a [[SparkShim]]'s implementation. */
object SparkShimLoader {

private var sparkShim: SparkShim = _
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here can use lazy val

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.paimon.shims.SparkShimLoader

abstract class SparkInternalRow extends InternalRow {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this, the current inheritance structure is three layers.
Spark3InternalRow extends AbstractSparkInternalRow entends SparkInternalRow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this kind of interface, I prefer to keep the definition and implementation separate.


abstract class InternalRow extends SparkInternalRow {}
override def createSparkInternalRow(rowType: RowType): SparkInternalRow = {
new Spark3InternalRow(rowType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use a unified style, such as unifying PaimonSpark3SqlExtensionsParser Spark3InternalRow,
or PaimonSparkSqlExtensionsParser SparkInternalRow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right,I choose the first style.

Copy link
Contributor

@Zouxxyy Zouxxyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 and left some comments

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@YannByron YannByron merged commit 408b78d into apache:master Nov 26, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants