
Support for multiple indexes in a vector store #16972

Open

vaaale wants to merge 5 commits into main

Conversation

@vaaale commented Nov 15, 2024

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Nov 15, 2024
@@ -235,36 +240,46 @@ def from_params(

    def class_name(cls) -> str:
        return "ChromaVectorStore"

    def move_nodes(self, from_index_id: str, to_index_id: str):
        all_nodes = self._collection.get(where={"__index_id__": from_index_id})
Collaborator
this would bring every node into memory. Not ideal tbh for large datasets

@vaaale (Author) commented Dec 13, 2024
First of all, thank you for getting back to me on this.

I agree with your comment on the ChromaDB implementation. Not ideal!
Maybe something like this is better:

    all_metadatas = self._collection.get(where={"__index_id__": from_index_id}, include=[IncludeEnum.metadatas])
    ids = all_metadatas["ids"]
    updated_metadatas = [{**m, "__index_id__": to_index_id} for m in all_metadatas["metadatas"]]
    self._collection.update(ids, metadatas=updated_metadatas)

And maybe there are even better ways of doing it. I haven't looked too deeply into ChromaDB.
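To make the batching idea concrete, here is a rough sketch of a paged variant that never materializes the whole result set in memory at once. `FakeCollection` is a hypothetical in-memory stand-in (not chromadb's API) and the batch size is deliberately tiny for illustration; a real implementation would use the store's own paging mechanism.

```python
BATCH = 2  # tiny for illustration; a real value might be 1000+


class FakeCollection:
    """Hypothetical stand-in for a vector-store collection (not chromadb)."""

    def __init__(self, records):
        # records: id -> metadata dict
        self.records = dict(records)

    def get(self, where, limit=None, offset=0):
        key, value = next(iter(where.items()))
        ids = sorted(i for i, m in self.records.items() if m.get(key) == value)
        page = ids[offset:offset + limit] if limit is not None else ids
        return {"ids": page, "metadatas": [self.records[i] for i in page]}

    def update(self, ids, metadatas):
        for i, m in zip(ids, metadatas):
            self.records[i] = m


def move_nodes(collection, from_index_id, to_index_id):
    while True:
        # Always page from offset 0: once a batch is re-tagged, those
        # records no longer match the filter, so the next page is fresh.
        batch = collection.get(where={"__index_id__": from_index_id}, limit=BATCH)
        if not batch["ids"]:
            break
        retagged = [{**m, "__index_id__": to_index_id} for m in batch["metadatas"]]
        collection.update(batch["ids"], retagged)
```

The key design point is that only metadata pages of bounded size are ever in memory, instead of every node's content.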

Regarding your "potential" disagreement with the implementation. What exactly do you disagree with? Is it the implementation, or the concept?

If it's the actual implementation you disagree with, I would be more than happy to have a conversation about it. The provided implementation is just meant as a proof of concept to put forward an idea of how this problem can be fixed. As a reminder of what I mean by "problem":

    patient_index = VectorStore.from_documents(patient_records)
    hospital_procedures_index = VectorStore.from_documents(hospital_procedures_docs)
    ......
    ......
    hospital_procedures_index = load_index_from_storage(index_id="hospital_procedures_idx")
    procedures = hospital_procedures_index.as_retriever().retrieve("<query>")

The code above does not do what one intuitively would expect! :)
Let's say "another" developer does something like this:

    hospital_procedures_index = load_index_from_storage(index_id="hospital_procedures_idx")
    procedures = hospital_procedures_index.as_retriever().retrieve("<query>")

    # procedures  <----- May contain sensitive patient data!!

I have seen this happen, and it's unlikely that this would get caught during normal testing. (Why would you check if the hospital_procedures_idx contains patient data?)
This would of course be an absolute disaster!
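To make the leak scenario concrete, here is a toy illustration of both the failure and the proposed fix. `ToyStore` is hypothetical and not LlamaIndex or chromadb API; actual query matching is elided. The point is only that without a per-index filter a retriever sees every node in the shared collection, while scoping on an `__index_id__` tag restores isolation.

```python
class ToyStore:
    """Hypothetical shared store; all indexes write into one collection."""

    def __init__(self):
        self.nodes = []  # list of (index_id, text) pairs

    def add(self, index_id, texts):
        self.nodes.extend((index_id, t) for t in texts)

    def retrieve(self, query, index_id=None):
        # Real similarity matching elided: return everything visible
        # to this retriever, optionally scoped to one index id.
        return [t for iid, t in self.nodes if index_id is None or iid == index_id]


store = ToyStore()
store.add("patient_records_idx", ["Patient J. Doe: diagnosis ..."])
store.add("hospital_procedures_idx", ["Procedure: hand hygiene ..."])

# Unscoped retrieval: patient data leaks into the procedures results.
leaky = store.retrieve("<query>")

# Scoped retrieval: only nodes tagged with this retriever's index id.
scoped = store.retrieve("<query>", index_id="hospital_procedures_idx")
```

Here `leaky` contains the patient record while `scoped` does not, which is exactly the isolation the `__index_id__` metadata tag is meant to guarantee.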

I don't mean to create a big rant here, but in my opinion this is a HUGE design flaw in the whole storage/persistence layer of LlamaIndex that really should be addressed. The best solution would probably be to rewrite the whole subsystem, since several other somewhat related issues could be addressed at the same time. For example:

  • Recursive retrieval (which I think is one of the most exciting features of LlamaIndex core. Other frameworks don't have this at all!)
  • Proper persistence of the "object_map" in the Retrievers. (The object_map is currently lost after persistence.)

Implementing the changes I propose would be an easy and quick fix to at least make it robust and usable.

Fixing this subsystem would make LlamaIndex the most kick-ass framework in this space! (In my opinion :) )

@logan-markewich (Collaborator) commented:
Sorry for the delay on this! Being totally honest, I'm not sure this will get merged (or at least not anytime soon).

I really appreciate the effort to get this working. Not 100% sure I agree with the implementation just yet, but also, nothing wrong with installing/maintaining your fork if you need this functionality.

For transparency, we are currently focusing on some multimodal refactors that we really need to get in first.
