[core] Support reading partition from fallback branch when not found in current branch #3816
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -159,3 +159,86 @@ Run the following command: | |
{{< /tab >}} | ||
|
||
{{< /tabs >}} | ||
|
||
### Batch Reading from Fallback Branch | ||
|
||
You can set the table option `scan.fallback-branch` | ||
so that when a batch job reads from the current branch and a partition does not exist there, | ||
the reader tries to read that partition from the fallback branch. | ||
Streaming read jobs do not currently support this feature and only produce results from the current branch. | ||
|
||
What's the use case of this feature? Say you have created a Paimon table partitioned by date. | ||
You have a long-running streaming job which inserts records into Paimon, so that today's data can be queried in a timely manner. | ||
You also have a batch job which runs every night to insert the corrected records of yesterday into Paimon, | ||
so that the accuracy of the data is guaranteed. | ||
|
||
When you query this Paimon table, you would like to read from the results of the batch job first. | ||
But if a partition (for example, today's partition) does not exist in its results, | ||
then you would like to read from the results of the streaming job. | ||
In this case, you can create a branch for the streaming job, and set `scan.fallback-branch` to this streaming branch. | ||
|
||
Let's look at an example. | ||
|
||
{{< tabs "read-fallback-branch" >}} | ||
|
||
{{< tab "Flink" >}} | ||
|
||
```sql | ||
-- create Paimon table | ||
CREATE TABLE T ( | ||
dt STRING NOT NULL, | ||
name STRING NOT NULL, | ||
amount BIGINT | ||
) PARTITIONED BY (dt); | ||
|
||
-- create a branch for streaming job | ||
CALL sys.create_branch('default.T', 'test'); | ||
|
||
-- set primary key, bucket number and changelog producer for the branch | ||
ALTER TABLE `T$branch_test` SET ( | ||
'primary-key' = 'dt,name', | ||
'bucket' = '2', | ||
'changelog-producer' = 'lookup' | ||
); | ||
|
||
-- set fallback branch | ||
ALTER TABLE T SET ( | ||
'scan.fallback-branch' = 'test' | ||
-- Review comment: How to reset it? 'scan.fallback-branch' = null ?
|
||
); | ||
|
||
-- write records into the streaming branch | ||
INSERT INTO `T$branch_test` VALUES ('20240725', 'apple', 4), ('20240725', 'peach', 10), ('20240726', 'cherry', 3), ('20240726', 'pear', 6); | ||
|
||
-- write records into the default branch | ||
INSERT INTO T VALUES ('20240725', 'apple', 5), ('20240725', 'banana', 7); | ||
|
||
SELECT * FROM T; | ||
/* | ||
+------------------+------------------+--------+ | ||
| dt | name | amount | | ||
+------------------+------------------+--------+ | ||
| 20240725 | apple | 5 | | ||
Review comment: better first run the query without setting |
||
| 20240725 | banana | 7 | | ||
| 20240726 | cherry | 3 | | ||
| 20240726 | pear | 6 | | ||
+------------------+------------------+--------+ | ||
*/ | ||
|
||
-- reset fallback branch | ||
ALTER TABLE T RESET ( 'scan.fallback-branch' ); | ||
|
||
-- now it only reads from default branch | ||
SELECT * FROM T; | ||
/* | ||
+------------------+------------------+--------+ | ||
| dt | name | amount | | ||
+------------------+------------------+--------+ | ||
| 20240725 | apple | 5 | | ||
| 20240725 | banana | 7 | | ||
+------------------+------------------+--------+ | ||
*/ | ||
``` | ||
|
||
{{< /tab >}} | ||
|
||
{{< /tabs >}} |
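
To make the result of the example above easier to check, the fallback read can be thought of as roughly equivalent to the query below. This is only a sketch of the documented semantics, not how the reader is implemented. It assumes the table `T` and the branch `test` from the example, that `scan.fallback-branch` is not set on `T` while the query runs, and it approximates "the partition exists" by "the partition has at least one row".

```sql
-- Conceptual sketch only: run in batch mode with 'scan.fallback-branch' NOT set on T,
-- otherwise the reads of T below would already include fallback data.

-- Rows from the current (default) branch are always returned.
SELECT dt, name, amount FROM T
UNION ALL
-- Rows from the fallback branch are returned only for partitions
-- that have no rows in the current branch (assumption: stands in for "partition does not exist").
SELECT dt, name, amount FROM `T$branch_test`
WHERE dt NOT IN (SELECT dt FROM T);
```

With the records inserted in the example, partition `20240725` is taken entirely from the default branch and partition `20240726` entirely from the `test` branch, which matches the four rows shown in the first query result.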
Review comment: What would be the expected behaviour if this feature were correctly implemented for streaming reads?
Reply: We haven't decided yet. We need to talk with the users about it.