Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LanceDB: removed mode from the add statement since mode should be append #13892

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

hragbalian
Copy link
Contributor

Really small update to remove the mode option from the lancedb add statement since it was sharing the mode option with create_table, and I think the wrapper was intending it mainly for create_table. Two issues with sharing the mode with add:

  1. The options for mode are different for add than they are for create_table: for add, the valid values are "append" and "overwrite" whereas for create_table they are "create" and "overwrite".

  2. There is an unintended behavior when the value "overwrite" is specified for mode and is shared by the two methods: it will always overwrite the table thereby not allow you to build up the vector database with new embeddings.

Removing the mode from add reverts the value to it's default which is "append", which should be what most situations will warrant. Alternatively, there could be a separate parameter to control the mode for add, but it really doesn't seem necessary.

…(which is the default value) and create is not an option for add, but is only a mode option for create_table
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 2, 2024
@hragbalian hragbalian changed the title removed mode from the add statement since mode should be append … LanceDB: removed mode from the add statement since mode should be append Jun 2, 2024
Copy link
Contributor

@nerdai nerdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hragbalian -- makes sense to me. Added a small comment in my review.

@@ -348,7 +348,7 @@ def add(
self._table_name, data, mode=self.mode
)

self._table.add(data, mode=self.mode)
self._table.add(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can just add **add_kwargs here to handle the designated mode parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nerdai good point. I hadn't seen that there. I updated the PR with that change.

Copy link
Contributor

@nerdai nerdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this test is failing still @hragbalian:

llama-index-integrations/vector_stores/llama-index-vector-stores-lancedb/tests/test_vector_stores_lancedb.py . [ 20%]
...F                                                                     [100%]

=================================== FAILURES ===================================
_________________________________ test_delete __________________________________

index = <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f8f46a14b20>

    def test_delete(index: VectorStoreIndex) -> None:
        index.delete(doc_id="test-0")
>       assert index.vector_store._table.count_rows() == 2
E       assert 12 == 2
E        +  where 12 = <bound method LanceTable.count_rows of LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors")>()
E        +    where <bound method LanceTable.count_rows of LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors")> = LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors").count_rows
E        +      where LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors") = LanceDBVectorStore(stores_text=True, is_embedding_query=True, flat_metadata=True, uri='/tmp/lancedb', vector_column_na..._key='text', doc_id_key='doc_id', api_key=None, region=None, mode='overwrite', query_type='vector', overfetch_factor=1)._table
E        +        where LanceDBVectorStore(stores_text=True, is_embedding_query=True, flat_metadata=True, uri='/tmp/lancedb', vector_column_na..._key='text', doc_id_key='doc_id', api_key=None, region=None, mode='overwrite', query_type='vector', overfetch_factor=1) = <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f8f46a14b20>.vector_store

llama-index-integrations/vector_stores/llama-index-vector-stores-lancedb/tests/test_vector_stores_lancedb.py:42: AssertionError

Do you think you can take a look? You can try to test locally with pytest or if you have pants installed then you can use make test as well.

@hragbalian
Copy link
Contributor Author

@nerdai the delete appears to be failing because TextNode is not producing a ref_doc_id which LanceDBVectorStore is expecting to see on the nodes. I think that might be due to a deprecation somewhere that didn't get updated in this class. That seems unrelated to this PR and I could handle that separately.

@nerdai
Copy link
Contributor

nerdai commented Jun 3, 2024

@hragbalian I just ran the tests locally with and without this PR. Without this PR, all of the tests pass, but when I checkout the code for this PR and try to run tests, test_delete fails.

Looks like overwrite mode needs to be included when invoking add() still

@nerdai
Copy link
Contributor

nerdai commented Jun 3, 2024

@hragbalian if I modify tests/conftest.py L51 as follows, then the test passes.

    vector_store.add(nodes=nodes, mode="overwrite")

@hragbalian
Copy link
Contributor Author

@nerdai ok, I’ll take a closer look later this week to troubleshoot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants