[Feature] Introduce deletion vectors for primary key table #2898
Labels
enhancement
New feature or request
Comments
Zouxxyy
changed the title
[Feature] Introduce position delete mode
[Feature] Introduce deletion vectors mode
Feb 27, 2024
This was referenced Feb 28, 2024
This was referenced Mar 7, 2024
We have things remaining:
This was referenced Mar 27, 2024
JingsongLi
changed the title
[Feature] Introduce deletion vectors mode
[Feature] Introduce deletion vectors for primary key table
Mar 28, 2024
Thanks @Zouxxyy, all finished!
Motivation
Position deletion is one way to implement a Merge-On-Read (MOR) structure, and it has already been adopted by other formats such as Iceberg and Delta. By combining it with Paimon's LSM tree, we can create a new mode that uses a deletion vectors index file unique to Paimon, where each deletion vector is a bitmap identifying which row IDs have been deleted.
Under this mode, writing incurs extra overhead (performing lookups and writing the deletion vectors index file), but reading can retrieve data directly using "data + filter with deletion vector", avoiding the additional cost of merging different files. Furthermore, this mode can be easily integrated into native engine solutions such as Spark + Gluten in the future, which should significantly improve read performance.
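To make the "data + filter with deletion vector" read path concrete, here is a minimal sketch in Python. It is not Paimon's implementation or API: the function name, the file names, and the use of a plain `set` of row positions in place of a real bitmap are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical row shape: (primary key, value).
Row = Tuple[int, str]


def read_with_deletion_vectors(
    data_files: Dict[str, List[Row]],
    deletion_vectors: Dict[str, Set[int]],
) -> Iterator[Row]:
    """Yield live rows by filtering each file with its own deletion vector.

    Each file is scanned independently; no merge across files is needed,
    because rows superseded by newer writes were already marked as deleted
    at write time.
    """
    for file_name, rows in data_files.items():
        # A file without a deletion vector has no deleted rows.
        deleted_positions = deletion_vectors.get(file_name, set())
        for position, row in enumerate(rows):
            if position not in deleted_positions:
                yield row


# Illustrative data: key 2 was rewritten in "file-1", so the writer marked
# its old position (1) in "file-0" as deleted.
data_files = {
    "file-0": [(1, "a"), (2, "b"), (3, "c")],
    "file-1": [(2, "b2"), (4, "d")],
}
deletion_vectors = {"file-0": {1}}

live_rows = sorted(read_with_deletion_vectors(data_files, deletion_vectors))
```

In this sketch the cost of finding the old position of key 2 is paid by the writer, which corresponds to the lookup overhead mentioned above, while the reader only applies a per-file filter.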
PIP: https://cwiki.apache.org/confluence/x/Tws4EQ