Tc paimon 0.9 compatible spark3.4 timestamp #16
Merged
Purpose
This PR keeps the conversion of Paimon timestamp types on Spark 3.4 compatible with the behavior of earlier Spark versions.
In earlier versions, both Paimon's TimestampType and LocalZonedTimestamp were converted to Spark's TimestampType, so many of our Paimon table fields were declared as TimestampType. When we attempted to upgrade to Spark 3.4, these fields were read as Spark's TimestampNTZType instead. This caused time zone discrepancies in the timestamp values of our business data. This PR therefore adds a configuration, set at Spark execution time, that restores the earlier conversion.
Tests
Test SQL:
The gmt_canceled field of the Paimon table paimon.test_db.test_timestamp_001 has type timestamp.
SELECT gmt_canceled_time,
       to_timestamp(gmt_canceled_timestamp) AS actual_time
FROM (
    SELECT t1.gmt_canceled AS gmt_canceled_time,
           unix_timestamp(t1.gmt_canceled) AS gmt_canceled_timestamp
    FROM paimon.test_db.test_timestamp_001 AS t1
) tab1
LIMIT 10
Before the fix:
The timestamp read by Spark 3.4 differs from the actual time by 8 hours.
After the fix:
The time read is consistent with the actual time.
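The 8-hour difference can be illustrated without Spark. The sketch below is only one plausible reading of the symptom: it assumes a session time zone of UTC+8 (this PR does not name the zone) and uses a made-up stored value. It renders the same stored instant once as a zone-less wall clock and once in the session time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the session time zone is UTC+8; the PR does not state it.
SESSION_TZ = timezone(timedelta(hours=8))

# Hypothetical stored value: microseconds since the Unix epoch.
stored_micros = 1_700_000_000_000_000

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
instant = EPOCH + timedelta(microseconds=stored_micros)

# Zone-less reading: the UTC wall-clock fields are used as-is, with no zone attached.
as_ntz = instant.replace(tzinfo=None)

# Zone-aware reading: the same instant rendered in the session time zone.
as_ltz = instant.astimezone(SESSION_TZ).replace(tzinfo=None)

# The two wall-clock values differ by exactly the session offset.
gap = as_ltz - as_ntz
print(as_ntz, as_ltz, gap)
```

With a different session offset the gap changes accordingly, which is why the symptom looks like a time zone shift rather than corrupted data.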
When submitting a Spark job, add: --conf spark.sql.paimon.inferTimestampNTZ.enabled=false
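For jobs launched from a script, the flag can be assembled programmatically. This is a minimal sketch: the helper name `spark_submit_command` and the application path `my_job.py` are hypothetical, and only the configuration key comes from this PR.

```python
import shlex

# Configuration key introduced by this PR.
CONF_KEY = "spark.sql.paimon.inferTimestampNTZ.enabled"


def spark_submit_command(app_path, infer_timestamp_ntz=False):
    """Build a spark-submit argument list (hypothetical helper).

    infer_timestamp_ntz=False requests the pre-3.4 TimestampType conversion.
    """
    value = "true" if infer_timestamp_ntz else "false"
    return ["spark-submit", "--conf", f"{CONF_KEY}={value}", app_path]


# "my_job.py" is a placeholder application path.
cmd = spark_submit_command("my_job.py")
print(shlex.join(cmd))
```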
API and Format
no
Documentation
no