Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Introduce DeletionVectorIndexFileWriter #3402

Merged
merged 9 commits into from
May 30, 2024

Conversation

YannByron
Copy link
Contributor

Purpose

to support merge and write multi deletion vector within same partition and bucket to one index file.

Linked issue: close #xxx

Tests

API and Format

Documentation

private final boolean isWrittenToMulitFiles;
private final long targetSizeInBytes;

private boolean written = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class looks a bit strange, why does it need variables? It only provides one method, which appears to not require any state preservation.

written = true;
writtenSizeInBytes += currentSize;
}
result.add(writer.closeWriter());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to use try (resource) to avoid resource leak.

return new SingleIndexFileWriter(fileIO, indexPathFactory.newPath());
}

static class SingleIndexFileWriter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private, and maybe can be an inner class instead of static.
In this way, you can inline createWriter to the SingleIndexFileWriter constructor.

@YannByron YannByron force-pushed the core_dvindex_writer branch from c682030 to ca79eca Compare May 29, 2024 06:53
long currentSize = writer.write(entry.getKey(), entry.getValue());

if (isWrittenToMulitFiles
&& !writer.hasWritten()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!writer.hasWritten() always return true?

Maybe you should remove it.


private final PathFactory indexPathFactory;
private final FileIO fileIO;
private final boolean isWrittenToMulitFiles;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: isWrittenToMultiFiles

}
}

class SingleIndexFileWriter implements Closeable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private


private long writtenSizeInBytes = 0L;

public SingleIndexFileWriter(FileIO fileIO, Path path) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public SingleIndexFileWriter() throws IOException {
     this.path = indexPathFactory.newPath();
     .....
}

public DeletionVectorIndexFileWriter(
FileIO fileIO,
PathFactory pathFactory,
BucketMode bucketMode,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need bucketMode, here we can just pass a long targetSizePerIndexFile. If it is BUCKET_UNAWARE, targetSizePerIndexFile can be Long.MAX.

Same to DeletionVectorsIndexFile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine. That's what I thought at first. But I don't think it's a little serious.


public List<IndexFileMeta> write(Map<String, DeletionVector> input) throws IOException {
if (input.isEmpty()) {
return emptyIndexFile();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is possible? Why we need to return a empty Index file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty index file is needed to overwrite the previous one when compact.
I am going to think it again when to support compact operation on deletion vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here just aligned to the original logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may only be to overwrite the old deletion file, as there is no proactive message to delete the deletion file for the primary key table.

But for append table, maybe here no need to generate empty deletion file?

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit cec1709 into apache:master May 30, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants