Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Introduce ExpireChangelogImpl to decouple the changelog lifecycle #3110

Merged
merged 11 commits into from
Apr 1, 2024

Conversation

Aitozi
Copy link
Contributor

@Aitozi Aitozi commented Mar 28, 2024

Purpose

This PR is the first step for #2899. It's mainly do the following things.

  • Add changelog related option to define the changlog lifecycle.
  • Introduce ExpireChangelogImpl to handle the changelog expire

Tests

API and Format

Documentation

@Aitozi Aitozi force-pushed the changelog-lifecycle branch from e750a4c to 2992cd3 Compare March 28, 2024 05:20
@Aitozi
Copy link
Contributor Author

Aitozi commented Mar 29, 2024

After discuss with @JingsongLi . We decide not to support decouple the delta log in first version. The reason is that, it's not easy to determine whether a DELETE file is from an APPEND commit when expire a snapshot. One solution is to modify the ManifestEntry to mark the file source. We may implement this later, not in this PR.

@Aitozi Aitozi force-pushed the changelog-lifecycle branch from ab2ab88 to c64f5fb Compare March 29, 2024 10:25
*/
public class Changelog extends Snapshot {

private static final int CURRENT_VERSION = 3;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current version is set to 3, equal to the version of snapshot. Otherwise such as 1, when pass the Changelog as Snapshot to org.apache.paimon.operation.AbstractFileStoreScan#readManifests() it will be recognize as a table_store_02_version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set version to 1 now. Refactor logic in AbstractFileStoreScan#readManifests()

* changelog of the table can outlive the snapshot's lifecycle. A table's changelog can come from
* two source:
* <li>The changelog file. Eg: from the changelog-producer = 'input'
* <li>The delta files in the APPEND commits when the changelog-producer = 'none'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If currently this feature is not work. We can not document it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

*/
public class Changelog extends Snapshot {

private static final int CURRENT_VERSION = 1;
Copy link
Contributor

@JingsongLi JingsongLi Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why its version is 1? But the version of snapshot is 3?
We can set it to Snapshot.CURRENT_VERSION

@@ -386,6 +387,10 @@ private List<ManifestFileMeta> readManifests(Snapshot snapshot) {
case DELTA:
return snapshot.deltaManifests(manifestList);
case CHANGELOG:
if (snapshot instanceof Changelog) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just set version to 3.

snapshotDeletion.cleanUnusedManifests(snapshot, skippingSet);

// delete snapshot last
if (changelogDecoupled) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should write changelog first? Image there is a failover here.

Maybe you can add test in FileStoreExpireTestBase, to make sure changelog files generated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snapshot path is deleted after changelog files are generated. The test org.apache.paimon.operation.ExpireSnapshotsTest#testChangelogOutLivedSnapshot will verify the changelog metadata is generated.

if (changelog.changelogManifestList() != null) {
snapshotDeletion.deleteAddedDataFiles(changelog.changelogManifestList());
}
if (changelog.deltaManifestList() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not consider deltaManifest now.

@@ -33,6 +34,7 @@
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only Rollback, you need to consider orphan file clean too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this can be done in another pr ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the workload is not significant, you can consider completing it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@Aitozi Aitozi force-pushed the changelog-lifecycle branch from 69dae82 to 35f8b73 Compare April 1, 2024 06:15
Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@JingsongLi JingsongLi merged commit 46609f2 into apache:master Apr 1, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants