-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Introduce DeletionVectorIndexFileWriter #3402
Conversation
private final boolean isWrittenToMulitFiles; | ||
private final long targetSizeInBytes; | ||
|
||
private boolean written = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class looks a bit strange, why does it need variables? It only provides one method, which appears to not require any state preservation.
written = true; | ||
writtenSizeInBytes += currentSize; | ||
} | ||
result.add(writer.closeWriter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to use try (resource)
to avoid resource leak.
return new SingleIndexFileWriter(fileIO, indexPathFactory.newPath()); | ||
} | ||
|
||
static class SingleIndexFileWriter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private, and maybe can be an inner class instead of static.
In this way, you can inline createWriter
to the SingleIndexFileWriter
constructor.
c682030
to
ca79eca
Compare
long currentSize = writer.write(entry.getKey(), entry.getValue()); | ||
|
||
if (isWrittenToMulitFiles | ||
&& !writer.hasWritten() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!writer.hasWritten()
always return true?
Maybe you should remove it.
|
||
private final PathFactory indexPathFactory; | ||
private final FileIO fileIO; | ||
private final boolean isWrittenToMulitFiles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: isWrittenToMultiFiles
} | ||
} | ||
|
||
class SingleIndexFileWriter implements Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private
|
||
private long writtenSizeInBytes = 0L; | ||
|
||
public SingleIndexFileWriter(FileIO fileIO, Path path) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public SingleIndexFileWriter() throws IOException {
this.path = indexPathFactory.newPath();
.....
}
public DeletionVectorIndexFileWriter( | ||
FileIO fileIO, | ||
PathFactory pathFactory, | ||
BucketMode bucketMode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't need bucketMode
, here we can just pass a long targetSizePerIndexFile
. If it is BUCKET_UNAWARE
, targetSizePerIndexFile
can be Long.MAX
.
Same to DeletionVectorsIndexFile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine. That's what I thought at first. But I don't think it's a little serious.
|
||
public List<IndexFileMeta> write(Map<String, DeletionVector> input) throws IOException { | ||
if (input.isEmpty()) { | ||
return emptyIndexFile(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is possible? Why we need to return a empty Index file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty index file is needed to overwrite the previous one when compact
.
I am going to think it again when to support compact
operation on deletion vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here just aligned to the original logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may only be to overwrite the old deletion file, as there is no proactive message to delete the deletion file for the primary key table.
But for append table, maybe here no need to generate empty deletion file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Purpose
to support merge and write multi deletion vector within same partition and bucket to one index file.
Linked issue: close #xxx
Tests
API and Format
Documentation