Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] support write to the external path #4770

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

neuyilan
Copy link
Member

@neuyilan neuyilan marked this pull request as draft December 24, 2024 12:26
@neuyilan neuyilan marked this pull request as ready for review December 25, 2024 02:26

// clone job will clone the source path to target warehouse path, so the target external
// path is null
String newExternalPath = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable 'newExternalPath' is never used

Copy link
Member Author

@neuyilan neuyilan Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in line 165, we first declare a variable of String type, and then call the public DataFileMeta copy(String newExternalPath) function. The reason why we declare variables of type String is because there are many other copy functions in DataMileMeta, if we directly call DataFileMeta.copy(null), the compiler doesn't know which function is being called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I means that you can write "manifestEntry.file().copy((String) null))" here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

private final Path root;
// this is the table data root path
private final Path dataRoot;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be it is better named externalRoot?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the current location where data is written, but it may not necessarily be the external path, it could also be the warehouse root, so it is called dataRoot here. It is the same as CoreOptions.TABLE_DATA_PATH, as it description "The data file path of this table in the filesystem. if data-file.external-path is not set, it will be same with PATH.key"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -139,6 +141,37 @@ private Path pathOfTable(Table table) {
return new Path(table.options().get(CoreOptions.PATH.key()));
}

private void copyManifestFile(Path sourcePath, Path targetPath, CloneFileInfo cloneFileInfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why copy ManifestFile by this new method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because source table's manifest's file entry have DataFileMeta field, in this commit[1], we have add externalPath in DataFileMeta, so when we clone source table's manifest, we should rewrite the manifest file, rewrite the externalPath in DataMileMeta to null, (being null means that the datafile exists in the warehouse path).
[1] 9ff5192

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in clone action, maybe it is better to don't change file's content.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, since we introduced the externalPath, which points to the data storage location of the source table, we need to modify this property to the warehouse path of the target table when cloning.
Do you have any good suggestions to solve this problem?

manifestEntry.partition(),
manifestEntry.bucket(),
manifestEntry.totalBuckets(),
manifestEntry.file().copy(newExternalPath));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value 'newExternalPath' is always 'null'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the same as above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -2364,6 +2390,10 @@ public boolean statsDenseStore() {
return options.get(METADATA_STATS_DENSE_STORE);
}

public String dataFileExternalPath() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where this method be called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@neuyilan neuyilan requested a review from wwj6591812 December 27, 2024 02:38
@JingsongLi
Copy link
Contributor

Hi Houliang,

How about adding an option to table: external-paths?

And we can introduce an option to table: external-paths.strategy, it
is an enum, can be ROUND_ROBIN (use all paths, pick one for a file),
NONE (not use external paths).

@neuyilan
Copy link
Member Author

neuyilan commented Jan 2, 2025

Hi Houliang,

How about adding an option to table: external-paths?

And we can introduce an option to table: external-paths.strategy, it is an enum, can be ROUND_ROBIN (use all paths, pick one for a file), NONE (not use external paths).

Good point!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants