
Tc paimon 0.9 compatible spark3.4 timestamp #16

Merged: 2 commits, merged Oct 21, 2024


MrTanZZ commented Oct 16, 2024

Purpose

This PR restores compatibility with the TimestampType conversion behavior of Spark versions prior to 3.4.
In earlier versions, both Paimon's TimestampType and LocalZonedTimestampType were converted to Spark's TimestampType, so many of our Paimon table fields were defined as TimestampType. After we upgraded to Spark 3.4, however, these fields are read as Spark's TimestampNTZType. This causes time-zone discrepancies in the timestamp values of our business data. We therefore address the issue with a configuration option that can be set when the Spark job is submitted.
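To make the mapping described above concrete, here is a minimal Python model of which Spark type a Paimon timestamp column is read as. It is a hypothetical sketch, not Paimon's actual conversion code: the function name `spark_type_for`, the type labels `"TIMESTAMP"` / `"TIMESTAMP_LTZ"`, and the assumption that `spark.sql.paimon.inferTimestampNTZ.enabled` defaults to true are all illustrative.

```python
# Hypothetical model of the read-side type mapping described in this PR;
# not Paimon's actual implementation.
def spark_type_for(paimon_type: str, spark_version: tuple, infer_ntz: bool = True) -> str:
    """Return the Spark SQL type name a Paimon timestamp column is read as.

    paimon_type: "TIMESTAMP" (Paimon TimestampType) or
                 "TIMESTAMP_LTZ" (Paimon LocalZonedTimestampType).
    infer_ntz:   models spark.sql.paimon.inferTimestampNTZ.enabled
                 (default assumed to be True).
    """
    if paimon_type == "TIMESTAMP_LTZ":
        # Local-zoned timestamps always map to Spark's instant-based TimestampType.
        return "TimestampType"
    if paimon_type == "TIMESTAMP":
        # Spark 3.4+ can represent timestamps without a time zone natively.
        if spark_version >= (3, 4) and infer_ntz:
            return "TimestampNTZType"
        # Pre-3.4 behavior, or the flag explicitly set to false.
        return "TimestampType"
    raise ValueError(f"not a timestamp type: {paimon_type}")


print(spark_type_for("TIMESTAMP", (3, 4)))                   # TimestampNTZType
print(spark_type_for("TIMESTAMP", (3, 4), infer_ntz=False))  # TimestampType
```

Setting the flag to false makes Spark 3.4 fall back to the same mapping Spark 3.3 and earlier used, which is the compatibility this PR is after.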

Tests

Test SQL (the gmt_canceled field of the Paimon table paimon.test_db.test_timestamp_001 has type timestamp):

```sql
SELECT gmt_canceled_time,
       to_timestamp(gmt_canceled_timestamp) AS actual_time
FROM (
    SELECT t1.gmt_canceled                 AS gmt_canceled_time,
           unix_timestamp(t1.gmt_canceled) AS gmt_canceled_timestamp
    FROM paimon.test_db.test_timestamp_001 AS t1
) tab1
LIMIT 10
```
Before the fix:
The timestamp read by Spark 3.4 differs from the actual time by 8 hours.
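An 8-hour shift is what you would expect if the same stored value is interpreted once as an instant and once as a wall-clock reading under a UTC+8 session time zone. The sketch below models that with plain Python `datetime` arithmetic. It assumes (not confirmed by this PR) that the session time zone is UTC+8 and that the stored value is a UTC instant in microseconds since the epoch; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Spark session time zone is UTC+8.
SESSION_TZ = timezone(timedelta(hours=8))
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(instant: datetime) -> int:
    """Encode a timezone-aware instant as microseconds since the Unix epoch."""
    return (instant - EPOCH_UTC) // timedelta(microseconds=1)


def render_as_ltz(micros: int) -> datetime:
    """TimestampType semantics: the value is an instant, shown in the session time zone."""
    instant = EPOCH_UTC + timedelta(microseconds=micros)
    return instant.astimezone(SESSION_TZ).replace(tzinfo=None)


def render_as_ntz(micros: int) -> datetime:
    """TimestampNTZType semantics: the raw value is taken as a wall-clock reading."""
    return datetime(1970, 1, 1) + timedelta(microseconds=micros)


stored = to_micros(datetime(2024, 10, 16, 2, 0, tzinfo=timezone.utc))
print(render_as_ltz(stored))  # 2024-10-16 10:00:00
print(render_as_ntz(stored))  # 2024-10-16 02:00:00
```

Under this model the gap always equals the session offset, so a different session time zone would produce a different shift.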

After the fix:
The timestamp read matches the actual time.

When submitting a spark job, you need to add: --conf spark.sql.paimon.inferTimestampNTZ.enabled=false
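For jobs launched from scripts, the flag can be added wherever the `spark-submit` command line is assembled. The sketch below is illustrative only: `spark_submit_args` and the application path `etl_job.py` are hypothetical, and only the `--conf key=value` form and the configuration key come from this PR.

```python
import shlex
from typing import Dict, List, Optional

PAIMON_NTZ_KEY = "spark.sql.paimon.inferTimestampNTZ.enabled"


def spark_submit_args(app: str, infer_ntz: bool = False,
                      extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit argument list that pins the Paimon NTZ inference flag."""
    conf = {PAIMON_NTZ_KEY: str(infer_ntz).lower()}
    conf.update(extra_conf or {})
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


print(shlex.join(spark_submit_args("etl_job.py")))
# spark-submit --conf spark.sql.paimon.inferTimestampNTZ.enabled=false etl_job.py
```

The returned list can be passed directly to `subprocess.run` without going through a shell.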

API and Format

no

Documentation

no

MrTanZZ closed this Oct 16, 2024
MrTanZZ changed the title from "Tc paimon 0.9 compatible timestamp" to "Tc paimon 0.9 compatible spark3.4 timestamp" Oct 16, 2024
MrTanZZ reopened this Oct 16, 2024
wxplovecc commented:

Please also test the ETL process, i.e. the write scenario.

MrTanZZ (Author) commented Oct 21, 2024

In my test cases, I copied part of the data from historical tables that contain a Timestamp field into a temporary table, then queried whether the timestamps of the same records are consistent.
With spark.sql.paimon.inferTimestampNTZ.enabled left at its default, the results were as follows:
[Screenshot: query results with the default configuration]
With spark.sql.paimon.inferTimestampNTZ.enabled set to false, the results were as follows:
[Screenshot: query results with the flag set to false]
Therefore, this PR does not change Paimon's write behavior for this field type: the results before and after the change are consistent.
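The consistency check described above (copy records, then compare the timestamp of each record against its copy) can be sketched as a small comparison over query results. This is a hypothetical helper, not the test code from this PR; the key column `id` and the sample rows are illustrative.

```python
from datetime import datetime


def find_mismatches(source_rows, copied_rows, key="id", field="gmt_canceled"):
    """Return (key, source_value, copied_value) for each record whose timestamp differs.

    Rows are dicts; `key` and `field` are hypothetical column names.
    A record missing from the copy is reported with copied_value=None.
    """
    copied = {row[key]: row[field] for row in copied_rows}
    mismatches = []
    for row in source_rows:
        k = row[key]
        if k not in copied or copied[k] != row[field]:
            mismatches.append((k, row[field], copied.get(k)))
    return mismatches


source = [{"id": 1, "gmt_canceled": datetime(2024, 10, 16, 10, 0)}]
same = [dict(row) for row in source]
shifted = [{"id": 1, "gmt_canceled": datetime(2024, 10, 16, 2, 0)}]
print(find_mismatches(source, same))          # []
print(len(find_mismatches(source, shifted)))  # 1
```

An empty result for both flag settings corresponds to the "consistent before and after" conclusion; a shifted copy would surface as a mismatch.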

wxplovecc merged commit c9a572f into tc-paimon-0.9 Oct 21, 2024
3 of 31 checks passed
2 participants