-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Introduce deletion files to DataSplit #2988
Conversation
b11e473
to
08b2114
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the modification! leave some questions
paimon-core/src/main/java/org/apache/paimon/table/source/DeletionFile.java
Show resolved
Hide resolved
file.schemaId(), fileName, file.fileSize(), file.level()); | ||
if (deletionFile != null) { | ||
DeletionVector deletionVector = | ||
DeletionVector.read(fileIO, deletionFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe can add this to IndexFileHandler, readDeletionVector contains version and crc verification
public DeletionVector readDeletionVector(DeletionFile deletionFile) {
return deletionVectorsIndex.readDeletionVector(
new Path(deletionFile.path()),
Pair.of(deletionFile.offset(), deletionFile.length()));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want introduce IndexFileHandler
here, imagine a C++Reader, which should not rely on any complex classes, the simpler the better.
if (deletionFile != null) { | ||
DeletionVector deletionVector = | ||
DeletionVector.read(fileIO, deletionFile); | ||
reader = ApplyDeletionVectorReader.create(reader, deletionVector); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worry about conflicts with readerFactoryBuilder.withDeletionVectorSupplier
in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unified here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
Deletion file should be in Split, and the reader should respect deletion files in Split, instead of reading deletion manifest again which is very costly.
Tests
API and Format
Documentation