
[spark] Support table options via SQL conf for Spark Engine #4393

Merged
merged 1 commit on Nov 5, 2024

Conversation

xiangyuf
Contributor

@xiangyuf xiangyuf commented Oct 28, 2024

Purpose

Linked issue: close #4371

In some cases, users may want to use Spark time travel by setting a property such as set spark.paimon.scan.tag-name=tag_3. However, this property takes effect globally if the Spark job reads multiple tables at the same time.

It would be better if we supported table options via SQL conf for the Spark engine, so users could specify different time-travel options for different tables.
(screenshot of example SQL omitted)
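The idea can be sketched in Spark SQL as follows (database, table, and tag names are hypothetical; per the proposal, a table-scoped key overrides the global one on conflict):

```sql
-- Global: every Paimon table read in this session uses tag_3
SET spark.paimon.scan.tag-name=tag_3;

-- Proposed table-scoped keys: each table reads its own tag
SET spark.paimon.db1.t1.scan.tag-name=tag_1;
SET spark.paimon.db1.t2.scan.tag-name=tag_2;

SELECT t1.id, t2.id FROM db1.t1 t1 JOIN db1.t2 t2 ON t1.id = t2.id;
```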

Tests

  • PaimonOptionsTest.scala

API and Format

Documentation

@xiangyuf
Contributor Author

@YannByron @Aitozi Hi, would you kindly review this?

@xiangyuf xiangyuf force-pushed the table_options_filter branch from b1a023a to 48864d7 Compare October 28, 2024 12:10

@Aitozi
Contributor

Aitozi commented Oct 28, 2024

Do we need to support tables with the same name in different dbs/catalogs, just like Flink's global option does? #2104

@JingsongLi
Contributor

Do we need to support tables with the same name in different dbs/catalogs, just like Flink's global option does? #2104

I think we should find a unified way for Flink and Spark.

@xiangyuf
Contributor Author

Do we need to support tables with the same name in different dbs/catalogs, just like Flink's global option does? #2104

I think we should find a unified way for Flink and Spark.

@Aitozi @JingsongLi Thanks for the reply. +1 for unifying this.

@xiangyuf xiangyuf force-pushed the table_options_filter branch from 48864d7 to fa99bea Compare October 31, 2024 15:48
@xiangyuf xiangyuf changed the title [spark] Support table options filter via SQL conf for Spark Engine [spark] Support table options via SQL conf for Spark Engine Oct 31, 2024
@xiangyuf xiangyuf force-pushed the table_options_filter branch from 744b03f to a565167 Compare October 31, 2024 17:57
@xiangyuf
Contributor Author

xiangyuf commented Nov 1, 2024

@JingsongLi @Aitozi
Hi, I've unified Flink and Spark to support both dynamic table options and global options.

Global options format:
Flink: ${config_key}
Spark: spark.paimon.${config_key}

Table options format:
Flink: paimon.${catalogName}.${dbName}.${tableName}.${config_key}
Spark: spark.paimon.${dbName}.${tableName}.${config_key}

Dynamic table options will override global options if there are conflicts.

WDYT?
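As a sketch, the two proposed formats would be used like this (catalog, db, table, and tag names are hypothetical; Flink SQL quotes keys and values, Spark SQL does not require quoting):

```sql
-- Flink: global option, then a table-scoped override
SET 'scan.tag-name' = 'tag_global';
SET 'paimon.my_catalog.db1.t1.scan.tag-name' = 'tag_1';

-- Spark: global option, then a table-scoped override
SET spark.paimon.scan.tag-name=tag_global;
SET spark.paimon.db1.t1.scan.tag-name=tag_1;
```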

docs/content/flink/quick-start.md (review comments resolved)
docs/content/spark/auxiliary.md (review comments resolved)
@xiangyuf xiangyuf force-pushed the table_options_filter branch 2 times, most recently from 4e4551e to 774f85c Compare November 2, 2024 13:51
@xiangyuf
Contributor Author

xiangyuf commented Nov 2, 2024

@Aitozi I’ve updated the dynamic global options format for Flink to ${config_key} instead of paimon.${config_key}.

@Aitozi
Contributor

Aitozi commented Nov 2, 2024

@Aitozi I’ve updated the dynamic global options format for Flink to ${config_key} instead of paimon.${config_key}.

Got it, LGTM.

@Zouxxyy
Contributor

Zouxxyy commented Nov 4, 2024


Why does Flink's table option format contain ${catalogName}, but Spark's does not?

@xiangyuf xiangyuf force-pushed the table_options_filter branch 3 times, most recently from 72fe7a2 to cbb9342 Compare November 4, 2024 09:55
@xiangyuf
Contributor Author

xiangyuf commented Nov 4, 2024


Why does Flink's table option format contain ${catalogName}, but Spark's does not?

@Zouxxyy Updated the Spark table option format to:
spark.paimon.${catalogName}.${dbName}.${tableName}.${config_key}
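With the catalog name included, a hypothetical Spark-side setting would look like this (catalog, db, table, and tag names are made up for illustration):

```sql
-- Scopes the tag to table t1 in database db1 of catalog my_catalog
SET spark.paimon.my_catalog.db1.t1.scan.tag-name=tag_1;
```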

@xiangyuf xiangyuf closed this Nov 4, 2024
@xiangyuf xiangyuf reopened this Nov 4, 2024
@xiangyuf xiangyuf force-pushed the table_options_filter branch from cbb9342 to d988039 Compare November 4, 2024 11:27
@xiangyuf
Contributor Author

xiangyuf commented Nov 4, 2024

@Zouxxyy @JingsongLi CI has passed, please take a look.

Contributor

@JingsongLi JingsongLi left a comment


Thanks @xiangyuf, looks good to me!

@JingsongLi JingsongLi merged commit 44ae502 into apache:master Nov 5, 2024
13 checks passed
val catalogContext = CatalogContext.create(
  Options.fromMap(mergeSQLConf(options)),
  SparkSession.active.sessionState.newHadoopConf())


For SparkSource loadTable, maybe we can just keep the original way, because it is simpler and more general to set table-level config through spark.read.format("paimon").options(...). Also, it is difficult to get the catalog name and db name here.

hang8929201 pushed a commit to hang8929201/paimon that referenced this pull request Nov 7, 2024
Successfully merging this pull request may close these issues.

[Feature] Support table options via sql conf for Spark Engine