LanceDB: removed mode from the add statement since mode should be append #13892

hragbalian · 2024-06-02T04:26:36Z

Really small update to remove the mode option from the lancedb add statement since it was sharing the mode option with create_table, and I think the wrapper was intending it mainly for create_table. Two issues with sharing the mode with add:

The options for mode are different for add than they are for create_table: for add, the valid values are "append" and "overwrite" whereas for create_table they are "create" and "overwrite".
There is an unintended behavior when the value "overwrite" is specified for mode and is shared by the two methods: it will always overwrite the table thereby not allow you to build up the vector database with new embeddings.

Removing the mode from add reverts the value to it's default which is "append", which should be what most situations will warrant. Alternatively, there could be a separate parameter to control the mode for add, but it really doesn't seem necessary.

…(which is the default value) and create is not an option for add, but is only a mode option for create_table

nerdai

Thanks @hragbalian -- makes sense to me. Added a small comment in my review.

nerdai · 2024-06-02T21:48:30Z

...ns/vector_stores/llama-index-vector-stores-lancedb/llama_index/vector_stores/lancedb/base.py

@@ -348,7 +348,7 @@ def add(
                    self._table_name, data, mode=self.mode
                )

-            self._table.add(data, mode=self.mode)
+            self._table.add(data)


perhaps we can just add **add_kwargs here to handle the designated mode parameter.

@nerdai good point. I hadn't seen that there. I updated the PR with that change.

nerdai

Looks like this test is failing still @hragbalian:

llama-index-integrations/vector_stores/llama-index-vector-stores-lancedb/tests/test_vector_stores_lancedb.py . [ 20%]
...F                                                                     [100%]

=================================== FAILURES ===================================
_________________________________ test_delete __________________________________

index = <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f8f46a14b20>

    def test_delete(index: VectorStoreIndex) -> None:
        index.delete(doc_id="test-0")
>       assert index.vector_store._table.count_rows() == 2
E       assert 12 == 2
E        +  where 12 = <bound method LanceTable.count_rows of LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors")>()
E        +    where <bound method LanceTable.count_rows of LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors")> = LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors").count_rows
E        +      where LanceTable(connection=LanceDBConnection(/tmp/lancedb), name="vectors") = LanceDBVectorStore(stores_text=True, is_embedding_query=True, flat_metadata=True, uri='/tmp/lancedb', vector_column_na..._key='text', doc_id_key='doc_id', api_key=None, region=None, mode='overwrite', query_type='vector', overfetch_factor=1)._table
E        +        where LanceDBVectorStore(stores_text=True, is_embedding_query=True, flat_metadata=True, uri='/tmp/lancedb', vector_column_na..._key='text', doc_id_key='doc_id', api_key=None, region=None, mode='overwrite', query_type='vector', overfetch_factor=1) = <llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7f8f46a14b20>.vector_store

llama-index-integrations/vector_stores/llama-index-vector-stores-lancedb/tests/test_vector_stores_lancedb.py:42: AssertionError

Do you think you can take a look? You can try to test locally with pytest or if you have pants installed then you can use make test as well.

hragbalian · 2024-06-03T06:24:28Z

@nerdai the delete appears to be failing because TextNode is not producing a ref_doc_id which LanceDBVectorStore is expecting to see on the nodes. I think that might be due to a deprecation somewhere that didn't get updated in this class. That seems unrelated to this PR and I could handle that separately.

nerdai · 2024-06-03T15:47:54Z

@hragbalian I just ran the tests locally with and without this PR. Without this PR, all of the tests pass, but when I checkout the code for this PR and try to run tests, test_delete fails.

Looks like overwrite mode needs to be included when invoking add() still

nerdai · 2024-06-03T15:52:32Z

@hragbalian if I modify tests/conftest.py L51 as follows, then the test passes.

    vector_store.add(nodes=nodes, mode="overwrite")

hragbalian · 2024-06-04T06:18:35Z

@nerdai ok, I’ll take a closer look later this week to troubleshoot

removed self.mode from the add statement since mode should be append …

6a1daf3

…(which is the default value) and create is not an option for add, but is only a mode option for create_table

dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Jun 2, 2024

hragbalian changed the title ~~removed mode from the add statement since mode should be append …~~ LanceDB: removed mode from the add statement since mode should be append Jun 2, 2024

nerdai reviewed Jun 2, 2024

View reviewed changes

added the **add_kwargs to handle options for add

814564e

nerdai reviewed Jun 3, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LanceDB: removed mode from the add statement since mode should be append #13892

LanceDB: removed mode from the add statement since mode should be append #13892

hragbalian commented Jun 2, 2024

nerdai left a comment

nerdai Jun 2, 2024

hragbalian Jun 2, 2024

nerdai left a comment

hragbalian commented Jun 3, 2024

nerdai commented Jun 3, 2024 •

edited

Loading

nerdai commented Jun 3, 2024

hragbalian commented Jun 4, 2024

LanceDB: removed mode from the add statement since mode should be append #13892

Are you sure you want to change the base?

LanceDB: removed mode from the add statement since mode should be append #13892

Conversation

hragbalian commented Jun 2, 2024

nerdai left a comment

Choose a reason for hiding this comment

nerdai Jun 2, 2024

Choose a reason for hiding this comment

hragbalian Jun 2, 2024

Choose a reason for hiding this comment

nerdai left a comment

Choose a reason for hiding this comment

hragbalian commented Jun 3, 2024

nerdai commented Jun 3, 2024 • edited Loading

nerdai commented Jun 3, 2024

hragbalian commented Jun 4, 2024

nerdai commented Jun 3, 2024 •

edited

Loading