[Feature] Introduce deletion vectors for primary key table #2898

Closed
Zouxxyy opened this issue Feb 23, 2024 · 2 comments · Fixed by #3001
Labels
enhancement New feature or request

Comments

@Zouxxyy
Contributor

Zouxxyy commented Feb 23, 2024

Motivation

Position deletion is one way to implement a Merge-On-Read (MOR) structure, and it has been adopted by other formats such as Iceberg and Delta. By combining it with Paimon's LSM tree, we can create a new mode, unique to Paimon, that uses a deletion vector index file (a bitmap identifying which row IDs have been deleted).

Under this mode, writes incur extra overhead (looking up existing rows and writing the deletion vector index file), but reads can retrieve data directly using "data + filter with deletion vector", avoiding the cost of merging across different files. Furthermore, this mode can be easily integrated into native engine solutions such as Spark + Gluten in the future, which should significantly improve read performance.
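To make the trade-off concrete, here is a minimal Python sketch of the idea, not Paimon's implementation. `ToyTable`, its in-memory `key_index`, and the use of plain `set`s in place of bitmaps are all assumptions made for illustration. Writing looks up the previous position of each key and marks it deleted; reading only filters each file by its deletion vector and never merges files.

```python
from typing import Dict, Iterator, List, Set, Tuple

Row = Tuple[int, str]  # (primary key, value); hypothetical row shape


class ToyTable:
    """Toy model of a primary key table with deletion vectors."""

    def __init__(self) -> None:
        self.files: Dict[str, List[Row]] = {}
        # file name -> positions of deleted rows (a real system would use a bitmap)
        self.deletion_vectors: Dict[str, Set[int]] = {}
        # primary key -> (file name, position) of the live row; stands in for the lookup
        self.key_index: Dict[int, Tuple[str, int]] = {}

    def write_file(self, name: str, rows: List[Row]) -> None:
        """Write path: pay for a lookup and mark the older row as deleted."""
        self.files[name] = rows
        for pos, (key, _value) in enumerate(rows):
            old = self.key_index.get(key)
            if old is not None:
                old_file, old_pos = old
                self.deletion_vectors.setdefault(old_file, set()).add(old_pos)
            self.key_index[key] = (name, pos)

    def read(self) -> Iterator[Row]:
        """Read path: data + filter with deletion vector, no cross-file merge."""
        for name, rows in self.files.items():
            deleted = self.deletion_vectors.get(name, set())
            for pos, row in enumerate(rows):
                if pos not in deleted:
                    yield row


table = ToyTable()
table.write_file("f1", [(1, "a"), (2, "b"), (3, "c")])
table.write_file("f2", [(2, "b2"), (4, "d")])  # key 2 is updated
print(list(table.read()))
print(table.deletion_vectors)
```

After the second write, position 1 of `f1` (the old version of key 2) is marked deleted, so each file can be scanned independently.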

PIP: https://cwiki.apache.org/confluence/x/Tws4EQ

@JingsongLi
Contributor

JingsongLi commented Mar 14, 2024

We still have the following remaining:

  • Since no merge is needed when reading in this mode, we can support filter pushdown on non-PK fields!
  • Support deletion vectors with partial-update and aggregate. The current implementation does not seem to work.
  • Support deletion vectors with first-row.
  • Documentation for using deletion vectors mode.
  • The Roaring bitmap dependency should be bundled into paimon-common.
  • AvroBulkFormat should return RecordWithPositionIterator.
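On the first item, a small Python sketch may help show why skipping the merge makes non-PK filter pushdown safe. This is a toy illustration under assumptions, not Paimon's reader: the function names, the `(key, value)` row shape, and the use of `set`s as deletion vectors are all hypothetical.

```python
def merge_read_with_early_filter(files, predicate):
    """Merge-on-read with the filter pushed below the merge (unsafe)."""
    latest = {}
    for rows in files:  # files ordered oldest to newest
        for key, value in rows:
            if predicate(value):
                latest[key] = value
    return sorted(latest.items())


def merge_read_with_late_filter(files, predicate):
    """Merge-on-read with the filter applied after the merge (correct)."""
    latest = {}
    for rows in files:
        for key, value in rows:
            latest[key] = value
    return sorted((k, v) for k, v in latest.items() if predicate(v))


def dv_read_with_early_filter(files, deletion_vectors, predicate):
    """Deletion vector read: each file is filtered on its own, no merge."""
    out = []
    for idx, rows in enumerate(files):
        deleted = deletion_vectors.get(idx, set())
        for pos, (key, value) in enumerate(rows):
            if pos not in deleted and predicate(value):
                out.append((key, value))
    return sorted(out)


old_file = [(1, 10), (2, 7)]
new_file = [(1, 1)]           # key 1 updated from 10 to 1
files = [old_file, new_file]
dvs = {0: {0}}                # old version of key 1 is marked deleted


def value_gt_5(value):
    return value > 5


unsafe = merge_read_with_early_filter(files, value_gt_5)
correct = merge_read_with_late_filter(files, value_gt_5)
with_dv = dv_read_with_early_filter(files, dvs, value_gt_5)
print(unsafe, correct, with_dv)
```

With a merge, pushing the filter down drops the new version of key 1 and lets the stale version through. With deletion vectors, the stale row is already masked out, so filtering each file early gives the same answer as filtering after a merge.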

JingsongLi changed the title from "[Feature] Introduce deletion vectors mode" to "[Feature] Introduce deletion vectors for primary key table" on Mar 28, 2024
@JingsongLi
Contributor

Thanks @Zouxxyy, all finished!
