[Bug] The partition expire was not correctly triggered during the commit execution. #3434
Comments
If my logic is correct, then I believe this issue is caused by the PartitionExpire instance not being reused. It could be resolved by keeping the instance in the upper layer, AbstractFileStoreTable, and reusing it each time a TableCommitImpl instance is created. I look forward to your response and feedback.
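To illustrate the reuse idea, here is a minimal, self-contained sketch. It does not use Paimon's classes: `ExpireGate` and `Table` are hypothetical stand-ins that model only the `lastCheck` / `checkInterval` gate described in this issue, and the caching shown is an assumption about how a fix might look, not Paimon's actual code.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class ReusedExpireSketch {

    /** Hypothetical stand-in for PartitionExpire; only the time gate is modeled. */
    static class ExpireGate {
        private final Duration checkInterval;
        private LocalDateTime lastCheck;

        ExpireGate(Duration checkInterval, LocalDateTime createdAt) {
            this.checkInterval = checkInterval;
            this.lastCheck = createdAt; // lastCheck starts at construction time
        }

        /** Returns true if the expiration check would actually run at 'now'. */
        boolean expire(LocalDateTime now) {
            if (now.isAfter(lastCheck.plus(checkInterval))) {
                lastCheck = now;
                return true;
            }
            return false;
        }
    }

    /** Hypothetical table that caches one gate and shares it across all commits. */
    static class Table {
        private final Duration checkInterval;
        private ExpireGate cachedGate;

        Table(Duration checkInterval) {
            this.checkInterval = checkInterval;
        }

        /** Models one single-use commit; the gate outlives the commit object. */
        boolean commit(LocalDateTime now) {
            if (cachedGate == null) {
                cachedGate = new ExpireGate(checkInterval, now);
            }
            return cachedGate.expire(now);
        }
    }

    public static void main(String[] args) {
        Table table = new Table(Duration.ofHours(1));
        LocalDateTime t0 = LocalDateTime.of(2024, 1, 1, 0, 0);

        System.out.println(table.commit(t0));                 // false: gate just created
        System.out.println(table.commit(t0.plusMinutes(30))); // false: interval not elapsed
        System.out.println(table.commit(t0.plusMinutes(90))); // true: more than 1h since lastCheck
    }
}
```

In this sketch the check only fires if the same table object lives across commits within one process; a batch job that builds a new table object for every run would still start from a fresh `lastCheck`.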
In a streaming environment, StoreCommitter holds a TableCommitImpl, so partition expiration works as expected. But in a batch environment, if you create a TableCommitImpl every time, there will of course be a new PartitionExpire every time, and PartitionExpire#expire() will not be executed. In both streaming and batch environments, Paimon uses TableCommitImpl to commit changes, and there is no other way to tell from within TableCommitImpl whether it is a batch or a streaming environment. But it does have a
Maybe. Overall, Paimon is a stream-first computing engine, and some designs do not take batch processing into consideration.
Just need to add a parameter
My issue is not that the submitter cannot distinguish between batch and streaming modes, but rather that partition expiration is never triggered at all with the BatchWrite I am using. Your commit did not resolve my problem.
Search before asking
Paimon version
0.8
Compute Engine
JavaAPI
Minimal reproduce step
According to the examples in the documentation, this is how we perform the commit. Taking BatchWrite as an example, after writing the data, we need to create a new instance of the BatchTableCommit class each time to execute the commit.
Based on the following call chain, we can infer that each time a BatchTableCommit instance is created, a TableCommitImpl instance is ultimately created within it. In its constructor, a PartitionExpire instance is passed as a parameter.
Based on the following code, we can infer that each time a TableCommitImpl instance is created, the org.apache.paimon.AbstractFileStore#newPartitionExpire method is called to create a new PartitionExpire instance.
Based on the constructor of the PartitionExpire class, we can infer that when the instance is initialized, lastCheck is set to the current time.
After BatchTableCommit is created, based on the example, we immediately start the commit. When the commit is completed, the org.apache.paimon.operation.PartitionExpire#expire(long) method of the PartitionExpire instance is called, as shown in the following code, to check for partition expiration.
But at this point, lastCheck equals the construction time, because the instance was just initialized. Taking the default checkInterval=1h as an example, lastCheck.plus(checkInterval) is one hour in the future. Therefore, now.isAfter(lastCheck.plus(checkInterval)) always evaluates to false, and the partition expiration is skipped.
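The timing argument above can be reproduced with a small standalone sketch. It does not call Paimon: `ExpireGate` is a hypothetical class that models only the `lastCheck` initialization and the `now.isAfter(lastCheck.plus(checkInterval))` test described here, and the five-second gap between construction and commit is an assumed value.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class FreshInstanceSkipsCheck {

    /** Hypothetical stand-in for PartitionExpire; only the time gate is modeled. */
    static class ExpireGate {
        private final Duration checkInterval;
        private LocalDateTime lastCheck;

        ExpireGate(Duration checkInterval, LocalDateTime createdAt) {
            this.checkInterval = checkInterval;
            this.lastCheck = createdAt; // as described: set to "now" at construction
        }

        /** Returns true if the expiration check would actually run at 'now'. */
        boolean expire(LocalDateTime now) {
            if (now.isAfter(lastCheck.plus(checkInterval))) {
                lastCheck = now;
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Duration checkInterval = Duration.ofHours(1);
        LocalDateTime t0 = LocalDateTime.of(2024, 1, 1, 0, 0);

        // Batch pattern: every commit gets a brand new gate and commits
        // a few seconds after the gate was constructed.
        for (int i = 0; i < 3; i++) {
            LocalDateTime created = t0.plusHours(2L * i);
            ExpireGate gate = new ExpireGate(checkInterval, created);
            boolean ran = gate.expire(created.plusSeconds(5));
            System.out.println("commit " + i + " ran expiration check: " + ran);
        }
    }
}
```

Even though the three commits are two hours apart, each one prints `false`, because the gate being asked is always only seconds old.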
And because a BatchTableCommit can only perform a single commit, the next commit we execute will use a brand new PartitionExpire instance. As a result, our commits never trigger the partition expiration check.
Please help me check if my logic is correct or if there is an issue with my usage.
What doesn't meet your expectations?
The partition expiration parameters set on the table did not take effect because they were not correctly triggered during the commit.
Anything else?
No response
Are you willing to submit a PR?