Chunk and Embedding Management in LanceDB #1587

Closed
Pipboyguy opened this issue Jul 12, 2024 · 0 comments · Fixed by #1620
Labels
destination (Issue related to new destinations), enhancement (New feature or request)

Comments

@Pipboyguy
Collaborator

Pipboyguy commented Jul 12, 2024

Feature description

Prevent orphaned records during updates and deletions, especially in parent-child table relationships, so that referential integrity is maintained. This will let chunking mechanisms work correctly when source documents are updated or dropped.

Are you a dlt user?

Yes, I run dlt in production.

Use case

When working with hierarchical data structures in LanceDB, it's crucial to maintain referential integrity between parent and child records. This is particularly important for scenarios like document chunking, where a large document (parent) is split into smaller chunks (children) for embedding and similarity searches. The main challenge is to update or delete parent records without orphaning their associated child records.
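To make the parent-child relationship concrete, here is a minimal sketch, assuming each chunk row stores a foreign key back to its parent document. The names `Chunk`, `doc_id`, `chunk_id` and `chunk_document` are hypothetical illustrations, not part of the dlt or LanceDB APIs, and fixed-size character splitting stands in for whatever chunking strategy is actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str    # hypothetical foreign key pointing back to the parent document
    chunk_id: str  # hypothetical deterministic child id of the form "<doc_id>:<index>"
    text: str


def chunk_document(doc_id: str, text: str, size: int) -> list[Chunk]:
    """Split one parent document into fixed-size child chunks (illustrative only)."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Every chunk carries its parent's id, which is what later allows the
    # destination to find and remove all children of an updated or deleted parent.
    return [
        Chunk(doc_id=doc_id, chunk_id=f"{doc_id}:{i}", text=text[start:start + size])
        for i, start in enumerate(range(0, len(text), size))
    ]


chunks = chunk_document("doc-1", "abcdefghij", 4)
print([c.chunk_id for c in chunks])  # ['doc-1:0', 'doc-1:1', 'doc-1:2']
print([c.text for c in chunks])      # ['abcd', 'efgh', 'ij']
```

Note that the number of chunks depends on the document's length, so an edited document can produce fewer chunks than before. That is exactly the situation in which child rows risk being orphaned.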

Proposed solution

  • Make merge operations aware of parent-child relationships, so that updating a parent record also updates its associated child records.
  • Implement automatic removal of orphaned child records when parent records are updated or deleted.
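The two bullets above can be sketched as a "replace children by parent" merge, shown here on in-memory lists of rows. This is a minimal illustration under stated assumptions, not the approach implemented in the fixing pull request: `merge_chunks`, `doc_id`, `chunk_id` and `deleted_doc_ids` are hypothetical names, and a real destination would run the equivalent delete-then-insert against LanceDB tables.

```python
def merge_chunks(
    existing: list[dict], incoming: list[dict], deleted_doc_ids: set[str]
) -> list[dict]:
    """Replace all chunks of updated parents and drop chunks of deleted parents."""
    # Any parent that appears in this load is treated as fully replaced.
    updated = {row["doc_id"] for row in incoming}
    stale = updated | deleted_doc_ids
    # Keep only child rows whose parent was neither updated nor deleted,
    # then append the fresh chunks for the updated parents.
    kept = [row for row in existing if row["doc_id"] not in stale]
    return kept + incoming


existing = [
    {"doc_id": "a", "chunk_id": "a:0", "text": "old a0"},
    {"doc_id": "a", "chunk_id": "a:1", "text": "old a1"},
    {"doc_id": "b", "chunk_id": "b:0", "text": "b0"},
    {"doc_id": "c", "chunk_id": "c:0", "text": "c0"},
]
# Document "a" was edited and now yields a single chunk; document "c" was deleted.
incoming = [{"doc_id": "a", "chunk_id": "a:0", "text": "new a0"}]
result = merge_chunks(existing, incoming, deleted_doc_ids={"c"})
print(sorted(r["chunk_id"] for r in result))  # ['a:0', 'b:0']
```

The design choice matters: a plain upsert keyed on `chunk_id` would overwrite `a:0` but leave `a:1` behind as an orphan, because the shorter document no longer produces that chunk. Deleting by parent id before inserting avoids this.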

Related issues

@Pipboyguy self-assigned this Jul 12, 2024
@Pipboyguy changed the title Support Efficient Update Strategy for Chunked Documents LanceDB - Support Efficient Update Strategy for Chunked Documents Jul 12, 2024
@Pipboyguy added the enhancement and destination labels Jul 12, 2024
@Pipboyguy moved this from Todo to In Progress in dlt core library Jul 12, 2024
@Pipboyguy changed the title LanceDB - Support Efficient Update Strategy for Chunked Documents Efficient Merging and Removal of Orphaned Chunks in LanceDB Jul 19, 2024
@Pipboyguy changed the title Efficient Merging and Removal of Orphaned Chunks in LanceDB Chunk and Embedding Management in LanceDB Jul 19, 2024
@Pipboyguy linked a pull request Jul 21, 2024 that will close this issue
github-project-automation bot moved this from In Progress to Done in dlt core library Nov 6, 2024
Projects
Status: Done

1 participant