
[spark] Support table options via SQL conf for Spark Engine #4393

Merged
merged 1 commit on Nov 5, 2024

Conversation

xiangyuf
Contributor

@xiangyuf xiangyuf commented Oct 28, 2024

Purpose

Linked issue: close #4371

In some cases, users may want to use Spark time travel by setting a property such as set spark.paimon.scan.tag-name=tag_3. However, this property takes effect globally if the Spark job reads multiple tables at the same time.

It would be better if we supported table options via SQL conf for the Spark engine, so users could specify different time-travel options for different tables.
(screenshot of example SQL omitted)
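The idea can be sketched in Spark SQL as follows (database, table, and tag names are hypothetical; per the proposal, a table-scoped key overrides the global one on conflict):

```sql
-- Global: every Paimon table read in this session uses tag_3
SET spark.paimon.scan.tag-name=tag_3;

-- Proposed table-scoped keys: each table reads its own tag
SET spark.paimon.db1.t1.scan.tag-name=tag_1;
SET spark.paimon.db1.t2.scan.tag-name=tag_2;

SELECT t1.id, t2.id FROM db1.t1 t1 JOIN db1.t2 t2 ON t1.id = t2.id;
```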

Tests

  • PaimonOptionsTest.scala

API and Format

Documentation

@xiangyuf
Contributor Author

@YannByron @Aitozi Hi, would you kindly review this?

@xiangyuf xiangyuf force-pushed the table_options_filter branch from b1a023a to 48864d7 Compare October 28, 2024 12:10

@Aitozi
Contributor

Aitozi commented Oct 28, 2024

Do we need to support tables with the same name in different dbs/catalogs, just like Flink's global option does? #2104

@JingsongLi
Contributor

Do we need to support tables with the same name in different dbs/catalogs, just like Flink's global option does? #2104

I think we should find a unified way for Flink and Spark.

@xiangyuf
Contributor Author

Do we need to support tables with the same name in different dbs/catalogs, just like Flink's global option does? #2104

I think we should find a unified way for Flink and Spark.

@Aitozi @JingsongLi Thanks for the reply. +1 for unifying this.

@xiangyuf xiangyuf force-pushed the table_options_filter branch from 48864d7 to fa99bea Compare October 31, 2024 15:48
@xiangyuf xiangyuf changed the title [spark] Support table options filter via SQL conf for Spark Engine [spark] Support table options via SQL conf for Spark Engine Oct 31, 2024
@xiangyuf xiangyuf force-pushed the table_options_filter branch from 744b03f to a565167 Compare October 31, 2024 17:57
@xiangyuf
Contributor Author

xiangyuf commented Nov 1, 2024

@JingsongLi @Aitozi
Hi, I've unified Flink and Spark to support both dynamic table options and global options.

Global options format:
Flink: ${config_key}
Spark: spark.paimon.${config_key}

Table options format:
Flink: paimon.${catalogName}.${dbName}.${tableName}.${config_key}
Spark: spark.paimon.${dbName}.${tableName}.${config_key}

Dynamic table options will override global options if there are conflicts.

WDYT?
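As a sketch, the two proposed formats would be used like this (catalog, db, table, and tag names are hypothetical; Flink SQL quotes keys and values, Spark SQL does not require quoting):

```sql
-- Flink: global option, then a table-scoped override
SET 'scan.tag-name' = 'tag_global';
SET 'paimon.my_catalog.db1.t1.scan.tag-name' = 'tag_1';

-- Spark: global option, then a table-scoped override
SET spark.paimon.scan.tag-name=tag_global;
SET spark.paimon.db1.t1.scan.tag-name=tag_1;
```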

docs/content/flink/quick-start.md (review comments resolved)
docs/content/spark/auxiliary.md (review comments resolved)
@xiangyuf xiangyuf force-pushed the table_options_filter branch 2 times, most recently from 4e4551e to 774f85c Compare November 2, 2024 13:51
@xiangyuf
Contributor Author

xiangyuf commented Nov 2, 2024

@Aitozi I’ve updated the dynamic global options format for Flink to ${config_key} instead of paimon.${config_key}.

@Aitozi
Contributor

Aitozi commented Nov 2, 2024

@Aitozi I’ve updated the dynamic global options format for Flink to ${config_key} instead of paimon.${config_key}.

Got it, LGTM.

@Zouxxyy
Contributor

Zouxxyy commented Nov 4, 2024


Why does Flink's table option format contain ${catalogName}, but Spark's does not?

@xiangyuf xiangyuf force-pushed the table_options_filter branch 3 times, most recently from 72fe7a2 to cbb9342 Compare November 4, 2024 09:55
@xiangyuf
Contributor Author

xiangyuf commented Nov 4, 2024


Why does Flink's table option format contain ${catalogName}, but Spark's does not?

@Zouxxyy Updated the Spark table option format to:
spark.paimon.${catalogName}.${dbName}.${tableName}.${config_key}
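With the catalog name included, a hypothetical Spark-side setting would look like this (catalog, db, table, and tag names are made up for illustration):

```sql
-- Scopes the tag to table t1 in database db1 of catalog my_catalog
SET spark.paimon.my_catalog.db1.t1.scan.tag-name=tag_1;
```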

@xiangyuf xiangyuf closed this Nov 4, 2024
@xiangyuf xiangyuf reopened this Nov 4, 2024
@xiangyuf xiangyuf force-pushed the table_options_filter branch from cbb9342 to d988039 Compare November 4, 2024 11:27
@xiangyuf
Contributor Author

xiangyuf commented Nov 4, 2024

@Zouxxyy @JingsongLi CI has passed, please take a look.

Contributor

@JingsongLi JingsongLi left a comment


Thanks @xiangyuf, looks good to me!

@JingsongLi JingsongLi merged commit 44ae502 into apache:master Nov 5, 2024
13 checks passed
val catalogContext = CatalogContext.create(
  Options.fromMap(mergeSQLConf(options)),
  SparkSession.active.sessionState.newHadoopConf())


For SparkSource loadTable, maybe we can just keep the original way, because it is simpler and more general to set table-level config through spark.read.format("paimon").options(...). Also, it is difficult to get the catalog name and db name here.

hang8929201 pushed a commit to hang8929201/paimon that referenced this pull request Nov 7, 2024
Successfully merging this pull request may close these issues.

[Feature] Support table options via sql conf for Spark Engine