[core] support decouple the delta files lifecycle #3178
Perhaps it would be best to have an abstract mechanism that ensures we never keep both the changelog and delta files at the same time, which would make this set of rules easier to understand.
I extract this logic to In this PR, I also refactor the
@@ -71,7 +71,7 @@ class ExpireSnapshotsProcedureTest extends PaimonSparkTestBase with StreamTest {

       // expire
       checkAnswer(
-        spark.sql("CALL paimon.sys.expire_snapshots(table => 'test.T', retain_max => 2)"),
+        spark.sql("CALL paimon.sys.expire_snapshots(table => 'test.T', retain_max => 2, retain_min => 1)"),
Previously, if the user did not specify retain_min, the default value was 1 in ExpireSnapshotsImpl. Now the default falls back to CoreOptions.SNAPSHOT_RETAIN_MIN = 10, so when retain_max is 2 we have to specify retain_min => 1 explicitly. I think the current behavior is more consistent, but I'm not sure whether it breaks compatibility. Please also help check this. cc @JingsongLi
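The fallback described in this comment can be sketched roughly as follows. This is only an illustration, not the code in ExpireSnapshotsImpl: the class name, the method names, and the check that retain_min must not exceed retain_max are assumptions made for the example. The value 10 is taken from the comment above.

```java
import java.util.OptionalInt;

// Hypothetical sketch; none of these names come from the Paimon code base.
class RetainMinFallback {
    // Table-level default quoted in the review comment (CoreOptions.SNAPSHOT_RETAIN_MIN = 10).
    static final int TABLE_DEFAULT_RETAIN_MIN = 10;

    /** Returns the explicit procedure argument if present, otherwise the table-level default. */
    static int resolveRetainMin(OptionalInt explicitRetainMin) {
        return explicitRetainMin.orElse(TABLE_DEFAULT_RETAIN_MIN);
    }

    /** Assumed sanity check: the minimum number of retained snapshots must not exceed the maximum. */
    static boolean isConsistent(int retainMin, int retainMax) {
        return retainMin <= retainMax;
    }

    public static void main(String[] args) {
        int retainMax = 2;
        // Without an explicit retain_min, the table default is larger than retain_max.
        int fallback = resolveRetainMin(OptionalInt.empty());
        // With retain_min => 1, the combination is consistent again.
        int explicit = resolveRetainMin(OptionalInt.of(1));
        System.out.println("fallback consistent: " + isConsistent(fallback, retainMax));
        System.out.println("explicit consistent: " + isConsistent(explicit, retainMax));
    }
}
```

Under these assumptions, a call that passes only retain_max => 2 ends up with a minimum of 10, which is why the test now passes retain_min => 1 as well.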
Already merged as three PRs.
Purpose
This PR adds support for decoupling the delta files lifecycle (#2899).
The basic idea behind this is:
- Introduce FileSource in DataFileMeta to indicate whether a file was generated as an APPEND or COMPACT file.
- Deletion of APPEND data files, and of the base and delta manifest files for the none changelog producer, is also postponed.

About why we need FileSource in DataFileMeta
For the none changelog producer, only APPEND commits are required for stream read. In a COMPACT commit, some files from either a compact or an append commit may be marked as deleted. We should delete the files that came from a compact commit and keep the files that came from an append commit for further stream reads. So we need a flag to distinguish the file source (compact or append).
Linked issue: close #xxx
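The FileSource rationale above can be illustrated with a minimal sketch. It assumes a simplified stand-in for DataFileMeta that carries only a file name and its source; the class FileSourceSketch, the FileEntry type, and both helper methods are hypothetical and are not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting files by their source flag.
class FileSourceSketch {
    // The two sources named in the PR description.
    enum FileSource { APPEND, COMPACT }

    // Simplified stand-in for DataFileMeta: only the fields this example needs.
    static final class FileEntry {
        final String fileName;
        final FileSource source;

        FileEntry(String fileName, FileSource source) {
            this.fileName = fileName;
            this.source = source;
        }
    }

    /** Files marked as deleted that were produced by compaction: safe to delete right away. */
    static List<String> deletableNow(List<FileEntry> markedDeleted) {
        List<String> result = new ArrayList<>();
        for (FileEntry entry : markedDeleted) {
            if (entry.source == FileSource.COMPACT) {
                result.add(entry.fileName);
            }
        }
        return result;
    }

    /** Files marked as deleted that came from an append commit: kept for further stream reads. */
    static List<String> postponed(List<FileEntry> markedDeleted) {
        List<String> result = new ArrayList<>();
        for (FileEntry entry : markedDeleted) {
            if (entry.source == FileSource.APPEND) {
                result.add(entry.fileName);
            }
        }
        return result;
    }
}
```

Without the source flag, the two groups would be indistinguishable once a COMPACT commit marks them as deleted, which is the motivation given for adding the field.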
Tests
API and Format
Introduce FileSource in DataFileMeta
Documentation