-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] support write to the external path #4770
base: master
Are you sure you want to change the base?
[Core] support write to the external path #4770
Conversation
|
||
// clone job will clone the source path to target warehouse path, so the target external | ||
// path is null | ||
String newExternalPath = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable 'newExternalPath' is never used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in line 165, we first declare a variable of String
type, and then call the public DataFileMeta copy(String newExternalPath)
function. The reason why we declare variables of type String
is because there are many other copy
functions in DataMileMeta
, if we directly call DataFileMeta.copy(null), the compiler doesn't know which function is being called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I means that you can write "manifestEntry.file().copy((String) null))" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
private final Path root; | ||
// this is the table data root path | ||
private final Path dataRoot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be it is better named externalRoot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the current location where data is written, but it may not necessarily be the external path, it could also be the warehouse root, so it is called dataRoot
here. It is the same as CoreOptions.TABLE_DATA_PATH
, as it description "The data file path of this table in the filesystem. if data-file.external-path
is not set, it will be same with PATH.key
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -139,6 +141,37 @@ private Path pathOfTable(Table table) { | |||
return new Path(table.options().get(CoreOptions.PATH.key())); | |||
} | |||
|
|||
private void copyManifestFile(Path sourcePath, Path targetPath, CloneFileInfo cloneFileInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why copy ManifestFile by this new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because source table's manifest's file entry have DataFileMeta
field, in this commit[1], we have add externalPath
in DataFileMeta
, so when we clone source table's manifest, we should rewrite the manifest file, rewrite the externalPath
in DataMileMeta
to null, (being null means that the datafile exists in the warehouse path).
[1] 9ff5192
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in clone action, maybe it is better to don't change file's content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, since we introduced the externalPath
, which points to the data storage location of the source table, we need to modify this property to the warehouse path
of the target table when cloning.
Do you have any good suggestions to solve this problem?
manifestEntry.partition(), | ||
manifestEntry.bucket(), | ||
manifestEntry.totalBuckets(), | ||
manifestEntry.file().copy(newExternalPath)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value 'newExternalPath' is always 'null'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -2364,6 +2390,10 @@ public boolean statsDenseStore() { | |||
return options.get(METADATA_STATS_DENSE_STORE); | |||
} | |||
|
|||
public String dataFileExternalPath() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where this method be called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Hi Houliang, How about adding an option to table: external-paths? And we can introduce an option to table: external-paths.strategy, it |
Good point! |
@@ -62,6 +62,8 @@ public class Path implements Comparable<Path>, Serializable { | |||
/** A hierarchical URI. */ | |||
private URI uri; | |||
|
|||
private boolean isExternalPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid introducing field in Public API Path
.
externalPath -> | ||
new Path(new Path(externalPath, relativePath), newFileName(prefix))) | ||
.orElse(new Path(parent, newFileName(prefix))) | ||
.setExternalPath(externalPathProvider.externalPathExists()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe DataFilePathFactory
can provide a method isExternalPath
?
Purpose
https://cwiki.apache.org/confluence/display/PAIMON/PIP-29%3A+Introduce+Table+Multi-Location++Management
Tests
API and Format
Documentation