[spark] adjust paimon spark structure #4573
@Zouxxyy @JingsongLi please take a look.
Looks very good! The difference between Spark4 and Spark3 keeps growing, so it is time to introduce a Shim.
private def loadSparkShim(): SparkShim = {
  val shims = ServiceLoader.load(classOf[SparkShim]).asScala
  if (shims.size != 1) {
    throw new IllegalStateException("No available spark shim here.")
Handle multiple versions in the classloader.
OK
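One possible way to act on this comment is to report "no shim" and "several shims" as separate failures. The sketch below is not the code merged in the PR: `ShimSelectionSketch`, `selectSingle`, the nested stand-in `SparkShim` trait, and the second error message are all assumptions made for illustration. Only `ServiceLoader.load` and the first error message come from the hunk above.

```scala
import java.util.ServiceLoader
// JavaConverters compiles on Scala 2.12 and 2.13 (deprecated in 2.13,
// where scala.jdk.CollectionConverters is the replacement).
import scala.collection.JavaConverters._

object ShimSelectionSketch {

  // Hypothetical stand-in for the real SparkShim interface.
  trait SparkShim

  /** Returns the only candidate, or fails with a message naming the actual problem. */
  def selectSingle(shims: Seq[SparkShim]): SparkShim = shims match {
    case Seq(only) => only
    case Seq() =>
      throw new IllegalStateException("No available spark shim here.")
    case many =>
      // Listing the class names shows which jars put extra shims on the classpath.
      val names = many.map(_.getClass.getName).mkString(", ")
      throw new IllegalStateException(s"More than one spark shim found: $names")
  }

  /** Discovers implementations through the JDK ServiceLoader, then applies the rule above. */
  def loadSparkShim(): SparkShim =
    selectSingle(ServiceLoader.load(classOf[SparkShim]).asScala.toSeq)
}
```

Keeping the selection rule in a pure function makes it checkable without any provider files on the classpath.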
/** Load a [[SparkShim]]'s implementation. */
object SparkShimLoader {

  private var sparkShim: SparkShim = _
A lazy val can be used here.
OK
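For context, a minimal sketch of what the `lazy val` suggestion could look like. `LazyShimSketch`, the stand-in `SparkShim` trait, and the `loadCount` counter are hypothetical and exist only to make the behavior observable; the real loader would call the ServiceLoader-based lookup instead.

```scala
object LazyShimSketch {

  // Hypothetical stand-in for the real SparkShim interface.
  trait SparkShim

  // Illustration only: counts how many times the loading logic runs.
  var loadCount: Int = 0

  // Stand-in for the ServiceLoader-based lookup in the PR.
  private def loadSparkShim(): SparkShim = {
    loadCount += 1
    new SparkShim {}
  }

  // Evaluated on first access and cached afterwards; Scala synchronizes
  // the initialization, so no explicit null check or lock is needed.
  lazy val sparkShim: SparkShim = loadSparkShim()
}
```

Compared with a `private var` initialized to `_`, this removes the mutable field and the manual "load if null" branch.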
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.paimon.shims.SparkShimLoader

abstract class SparkInternalRow extends InternalRow {
Why do we need this? The current inheritance structure has three layers: Spark3InternalRow extends AbstractSparkInternalRow extends SparkInternalRow.
For this kind of interface, I prefer to keep the definition and implementation separate.
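A rough sketch of the three layers discussed in this thread, with the definition kept apart from the implementation. Everything here is simplified and hypothetical: `InternalRow` is a one-method stand-in for Spark's class, and `replace` with an `Array[Int]` payload is an invented member used only to give the layers something to do.

```scala
object RowHierarchySketch {

  // Stand-in for org.apache.spark.sql.catalyst.InternalRow, reduced to one method.
  abstract class InternalRow {
    def getInt(ordinal: Int): Int
  }

  // Layer 1: definition only. Shared code refers to this type.
  abstract class SparkInternalRow extends InternalRow {
    def replace(values: Array[Int]): SparkInternalRow
  }

  // Layer 2: implementation that is identical across Spark versions.
  abstract class AbstractSparkInternalRow extends SparkInternalRow {
    private var current: Array[Int] = Array.empty

    override def replace(values: Array[Int]): SparkInternalRow = {
      current = values
      this
    }

    override def getInt(ordinal: Int): Int = current(ordinal)
  }

  // Layer 3: version-specific class; it would override only what differs in Spark 3.
  class Spark3InternalRow extends AbstractSparkInternalRow
}
```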
abstract class InternalRow extends SparkInternalRow {}
override def createSparkInternalRow(rowType: RowType): SparkInternalRow = {
  new Spark3InternalRow(rowType)
Maybe we can use a unified naming style: either PaimonSpark3SqlExtensionsParser and Spark3InternalRow, or PaimonSparkSqlExtensionsParser and SparkInternalRow.
Right, I chose the first style.
+1 and left some comments
+1
Purpose
To restructure the Spark modules as follows (similar to Apache Gluten, https://github.com/apache/incubator-gluten):
- spark-common module to the bottom layer, with the spark3-common and spark4-common modules depending on it;
- spark-ut module;
- each Spark version module (spark-3.5 or spark-4.0) can directly depend on the sparkx-common module of the corresponding major version (e.g. spark3.5 -> spark3-common, spark4.0 -> spark4-common);
- spark-ut and sparkx-common modules;
- use the SparkShim interface and the SPI mechanism to solve the incompatibility problem, instead of overwriting classes that have the same class name.

Linked issue: close #xxx
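To illustrate the last point, here is a minimal sketch of shared code going through a shim instead of relying on same-named classes that shadow each other. `RowType` and the row classes are simplified stand-ins, and `Spark3Shim`, `Spark4Shim`, `Spark4InternalRow`, and `newRow` are hypothetical names; only `SparkShim`, `createSparkInternalRow`, and `Spark3InternalRow` appear in the review above.

```scala
object ShimUsageSketch {

  // Hypothetical stand-ins for Paimon's RowType and the row classes in the PR.
  final case class RowType(fieldNames: Seq[String])
  abstract class SparkInternalRow { def rowType: RowType }
  class Spark3InternalRow(val rowType: RowType) extends SparkInternalRow
  class Spark4InternalRow(val rowType: RowType) extends SparkInternalRow

  // Version-neutral definition; would live in the bottom (spark-common) layer.
  trait SparkShim {
    def createSparkInternalRow(rowType: RowType): SparkInternalRow
  }

  // One implementation per major version; each would live in its own
  // sparkx-common module and be registered there for SPI discovery.
  class Spark3Shim extends SparkShim {
    override def createSparkInternalRow(rowType: RowType): SparkInternalRow =
      new Spark3InternalRow(rowType)
  }

  class Spark4Shim extends SparkShim {
    override def createSparkInternalRow(rowType: RowType): SparkInternalRow =
      new Spark4InternalRow(rowType)
  }

  // Shared code depends only on the trait, never on a version-specific class.
  def newRow(shim: SparkShim, rowType: RowType): SparkInternalRow =
    shim.createSparkInternalRow(rowType)
}
```

With the JDK's `ServiceLoader`, each implementation is registered through a provider-configuration file under `META-INF/services/` named after the interface's fully qualified name and containing the implementation's class name, so exactly one shim is expected on the classpath for a given Spark version.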
Tests
API and Format
Documentation