[iceberg] Introduce feature to migrate table from iceberg to paimon #4639

Open: wants to merge 18 commits into base: master

Conversation

LsomeYeah
Contributor

@LsomeYeah LsomeYeah commented Dec 4, 2024

Purpose

Linked issue: close #xxx

Paimon already supports generating Iceberg-compatible metadata, so Paimon tables can be consumed directly by Iceberg readers. Paimon now aims to provide an action or a procedure for migrating an Iceberg table to a Paimon table.

The implementation follows these general steps:

  1. Read the metadata of the latest Iceberg snapshot.
  2. Get all data files used by the latest Iceberg snapshot (if delete files are present, the migration either throws an exception or filters them out).
  3. Migrate these data files to Paimon.
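
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `FileEntry`, `Snapshot`, `Content`, and the three static methods are hypothetical stand-ins, "latest" is assumed to mean the highest sequence number, and "migrating" a file is reduced to mapping its path into a target directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Paimon's or Iceberg's real classes.
class IcebergMigrationSketch {

    enum Content { DATA, DELETES }

    static final class FileEntry {
        final String path;
        final Content content;

        FileEntry(String path, Content content) {
            this.path = path;
            this.content = content;
        }
    }

    static final class Snapshot {
        final long sequenceNumber;
        final List<FileEntry> files;

        Snapshot(long sequenceNumber, List<FileEntry> files) {
            this.sequenceNumber = sequenceNumber;
            this.files = files;
        }
    }

    // Step 1: pick the latest snapshot (assumed here: the highest sequence number).
    static Snapshot latestSnapshot(List<Snapshot> snapshots) {
        return snapshots.stream()
                .max(Comparator.comparingLong((Snapshot s) -> s.sequenceNumber))
                .orElseThrow(() -> new IllegalStateException("table has no snapshot"));
    }

    // Step 2: collect data file paths; delete files either fail the migration or are skipped.
    static List<String> collectDataFiles(Snapshot snapshot, boolean ignoreDelete) {
        List<String> paths = new ArrayList<>();
        for (FileEntry entry : snapshot.files) {
            if (entry.content == Content.DELETES) {
                if (!ignoreDelete) {
                    throw new IllegalStateException("delete file found: " + entry.path);
                }
                continue; // filtered out
            }
            paths.add(entry.path);
        }
        return paths;
    }

    // Step 3: "migrate" each file by mapping it to a path under the target table directory.
    static List<String> migrate(List<String> dataFiles, String targetDir) {
        List<String> migrated = new ArrayList<>();
        for (String path : dataFiles) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            migrated.add(targetDir + "/" + name);
        }
        return migrated;
    }

    public static void main(String[] args) {
        Snapshot older = new Snapshot(1L, Arrays.asList(
                new FileEntry("/ice/t/a.parquet", Content.DATA)));
        Snapshot newer = new Snapshot(2L, Arrays.asList(
                new FileEntry("/ice/t/a.parquet", Content.DATA),
                new FileEntry("/ice/t/b.parquet", Content.DATA)));
        Snapshot latest = latestSnapshot(Arrays.asList(older, newer));
        System.out.println(migrate(collectDataFiles(latest, false), "/paimon/t"));
    }
}
```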

This PR adds the basic ability to migrate an Iceberg table to Paimon, including:

  1. The core migration implementation; supported Iceberg catalog types: hadoop-catalog, hive-catalog
  2. Unit test cases

The procedure or action itself is not included in this PR.

Tests

org.apache.paimon.iceberg.IcebergMigrateTest

API and Format

Documentation

@JingsongLi
Contributor

Please add [WIP] to the PR title.

@JingsongLi changed the title from "[iceberg] Introduce procedure to migrate table from iceberg to paimon" to "[WIP][iceberg] Introduce procedure to migrate table from iceberg to paimon" on Dec 4, 2024
@LsomeYeah changed the title from "[WIP][iceberg] Introduce procedure to migrate table from iceberg to paimon" to "[iceberg] Introduce feature to migrate table from iceberg to paimon" on Dec 9, 2024
this.paimonCatalog = paimonCatalog;
this.paimonFileIO = paimonCatalog.fileIO();
this.paimonDatabaseName = paimonDatabaseName;
this.paimonTableNameame = paimonTableNameame;
Contributor

Suggested change
- this.paimonTableNameame = paimonTableNameame;
+ this.paimonTableName = paimonTableName;

Schema paimonSchema = icebergSchemaToPaimonSchema(icebergMetadata);
Identifier paimonIdentifier = Identifier.create(paimonDatabaseName, paimonTableNameame);

paimonCatalog.createDatabase(paimonDatabaseName, false);
Contributor

Why false? Does this mean the user must migrate into a database that does not exist yet?
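
To make the concern concrete, here is a small sketch of the behavior in question. It assumes the second argument of `createDatabase` is an "ignore if exists" flag, as the question implies; `InMemoryCatalog` and `prepareTargetDatabase` are hypothetical stand-ins, not Paimon classes. With `false`, a migration into an existing database fails at the first step; with `true`, it proceeds.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in used only to illustrate the flag's effect.
class CreateDatabaseSketch {

    static final class InMemoryCatalog {
        private final Set<String> databases = new HashSet<>();

        // Assumed semantics: throw when the database exists and ignoreIfExists is false.
        void createDatabase(String name, boolean ignoreIfExists) {
            boolean added = databases.add(name);
            if (!added && !ignoreIfExists) {
                throw new IllegalStateException("Database already exists: " + name);
            }
        }
    }

    // Returns true when the target database is ready for the migration to continue.
    static boolean prepareTargetDatabase(InMemoryCatalog catalog, String db, boolean ignoreIfExists) {
        try {
            catalog.createDatabase(db, ignoreIfExists);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.createDatabase("warehouse_db", false); // database created earlier by the user

        System.out.println("ignoreIfExists=false: " + prepareTargetDatabase(catalog, "warehouse_db", false));
        System.out.println("ignoreIfExists=true:  " + prepareTargetDatabase(catalog, "warehouse_db", true));
    }
}
```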


private List<IcebergManifestFileMeta> checkAndFilterManifestFiles(
List<IcebergManifestFileMeta> icebergManifestFileMetas) {
if (!ignoreDelete) {
Contributor

Why do we need such an option? If the Iceberg table has deletion vectors and the user enables this option, the resulting data will be incorrect. Incorrect data is useless to users.
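
A toy illustration of this point, with rows modeled as plain integers and deletes as a set of deleted row values. All names here are hypothetical and the model is deliberately simplified: copying data files while ignoring deletes brings back rows that an Iceberg reader would never return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model: a table is a list of rows plus a set of rows marked deleted.
class IgnoreDeleteSketch {

    // What a reader that applies deletes would return.
    static List<Integer> readWithDeletes(List<Integer> dataRows, Set<Integer> deletedRows) {
        List<Integer> visible = new ArrayList<>();
        for (Integer row : dataRows) {
            if (!deletedRows.contains(row)) {
                visible.add(row);
            }
        }
        return visible;
    }

    // A migration that only copies data files: it fails fast on deletes,
    // unless ignoreDelete is set, in which case the deletes are silently dropped.
    static List<Integer> migrate(List<Integer> dataRows, Set<Integer> deletedRows, boolean ignoreDelete) {
        if (!deletedRows.isEmpty() && !ignoreDelete) {
            throw new IllegalStateException("table contains deletes; refusing to migrate");
        }
        return new ArrayList<>(dataRows);
    }

    public static void main(String[] args) {
        List<Integer> dataRows = Arrays.asList(1, 2, 3);
        Set<Integer> deletedRows = new HashSet<>(Arrays.asList(2));

        System.out.println("reader sees:   " + readWithDeletes(dataRows, deletedRows));
        System.out.println("migrated rows: " + migrate(dataRows, deletedRows, true));
    }
}
```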

@@ -190,6 +199,70 @@ private static Object toTypeObject(DataType dataType, int fieldId, int depth) {
}
}

public DataType getDataType() {
Contributor

This class already has a dataType member. Check whether dataType is null: if it is not, just return that object; otherwise, compute the data type from the type string.
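
A minimal sketch of the suggested pattern, under stated assumptions: `Field` is a hypothetical stand-in for the class under review, a `String` stands in for the real `DataType`, and `parse` is a placeholder for the actual conversion from the type string. The `parseCount` field exists only to show that the conversion runs at most once.

```java
import java.util.Locale;

// Hypothetical stand-in for the reviewed class; String replaces the real DataType.
class LazyDataTypeSketch {

    static final class Field {
        private final String type;   // type string, always present
        private String dataType;     // cached parsed type, may be null
        int parseCount = 0;          // instrumentation for the example only

        Field(String type, String dataType) {
            this.type = type;
            this.dataType = dataType;
        }

        // Return the cached value when present; otherwise derive it once from the type string.
        String getDataType() {
            if (dataType == null) {
                dataType = parse(type);
            }
            return dataType;
        }

        // Placeholder conversion; the real code would build a DataType from the string.
        private String parse(String typeString) {
            parseCount++;
            return typeString.toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        Field field = new Field("int", null);
        field.getDataType();
        field.getDataType();
        System.out.println(field.getDataType() + " parsed " + field.parseCount + " time(s)");
    }
}
```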


3 participants