Langchain Milvus - Handle switch between multiple databases #25277

Open
5 tasks done
tvvignesh opened this issue Aug 11, 2024 · 2 comments
Labels
  • 🤖:bug: Related to a bug, vulnerability, unexpected error with an existing feature
  • 🔌: milvus: Primarily related to Milvus vector store integration
  • stale: Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed
  • Ɑ: vector store: Related to vector store module

Comments

tvvignesh commented Aug 11, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

If I try to run operations against different databases in the Milvus vector store, it doesn't work unless I first disconnect all existing connections. Otherwise, every operation runs on the connection at index 0 of the global connections object.

Only this works:

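# Imports needed by this snippet
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain_milvus import Milvus
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pymilvus import connections

# This block runs inside a method and gets called multiple times for different databases

# Workaround: drop the existing connections before creating the vector store
connections.disconnect("default")
connections.remove_connection("default")

connections.disconnect("db1")
connections.remove_connection("db1")

vectorstore = Milvus(
    self.embeddings,
    collection_name=collection_name,
    auto_id=True,
    connection_args={
        "user": self.user,
        "password": self.password,
        "host": self.host,
        "port": self.port,
        "db_name": self.db_name,
    },
    index_params={},
)

# Parent documents are kept in a local key-value store, one directory per database and collection
fs = LocalFileStore("./documents/kvdata/" + self.db_name + "_" + collection_name)
store = create_kv_docstore(fs)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=2000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=400)

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

retriever.add_documents(documents)

# Do subsequent operations with vectorstore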

If I don't disconnect from all databases before each set of operations, every collection gets created in the default Milvus database and the db_name parameter in connection_args is ignored, even though I verified that the correct db_name was being passed for the collection.
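
The disconnect-everything workaround above can be wrapped in a small helper so that aliases do not have to be listed by hand. This is only a sketch: the helper name drop_all_connections is hypothetical, and it assumes the object passed in behaves like pymilvus.connections, i.e. it offers list_connections() returning (alias, handler) pairs, disconnect(alias) and remove_connection(alias).

```python
def drop_all_connections(conns):
    """Disconnect and unregister every alias known to `conns`.

    `conns` is assumed to behave like `pymilvus.connections`
    (list_connections / disconnect / remove_connection).
    Returns the aliases that were dropped, in order.
    """
    dropped = []
    # Copy the list first: removing aliases while iterating could skip entries.
    for alias, _handler in list(conns.list_connections()):
        conns.disconnect(alias)
        conns.remove_connection(alias)
        dropped.append(alias)
    return dropped


# Hypothetical usage before building a vector store for another database:
#   from pymilvus import connections
#   drop_all_connections(connections)
```

This keeps the workaround in one place, but it still tears down connections that other parts of the application may be using, which is the drawback described in this issue.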

Description

I have commented on the same issue here: milvus-io/pymilvus#2161 (comment). This needs an urgent look: because of it, we cannot use multiple databases with langchain_milvus without disconnecting all existing connections, which is not the right approach for us and is causing a lot of trouble.

I assume this code block is the cause:

if given_address is not None:
    for con in connections.list_connections():
        addr = connections.get_connection_addr(con[0])
        if (
            con[1]
            and ("address" in addr)
            and (addr["address"] == given_address)
            and ("user" in addr)
            and (addr["user"] == tmp_user)
        ):
            logger.debug("Using previous connection: %s", con[0])
            return con[0]

Although this reuses the previous connection, I am not sure the db_name parameter supplied in connection_args is respected when it reconnects.
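
The commit referenced later in this thread describes its change as adding the db name to the Milvus connection check. A minimal, library-free sketch of that idea follows. It is not the actual langchain_milvus code: the function name find_reusable_alias and the dictionary layout (connected, address, user, db_name keys) are assumptions made for illustration, including the assumption that a missing db_name means "default".

```python
from typing import Dict, Optional


def find_reusable_alias(
    existing: Dict[str, dict],
    address: str,
    user: str,
    db_name: str = "default",
) -> Optional[str]:
    """Return an alias that can be reused, or None if a new one is needed.

    A connection is reused only when address, user AND db_name all match.
    Matching on address and user alone would hand back a connection that
    is bound to a different database.
    """
    for alias, info in existing.items():
        if (
            info.get("connected")
            and info.get("address") == address
            and info.get("user") == user
            and info.get("db_name", "default") == db_name
        ):
            return alias
    return None


# Hypothetical registry: one live connection bound to the default database.
registry = {
    "conn_a": {
        "connected": True,
        "address": "localhost:19530",
        "user": "root",
        "db_name": "default",
    }
}
same_db = find_reusable_alias(registry, "localhost:19530", "root", "default")
other_db = find_reusable_alias(registry, "localhost:19530", "root", "db1")
```

Here same_db resolves to the existing alias, while other_db is None, signalling that a separate connection should be opened for db1 instead of silently reusing the one bound to the default database.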

System Info

langchain==0.2.10
langchain-community==0.2.9
langchain-core==0.2.22
langchain-experimental==0.0.62
langchain-huggingface==0.0.3
langchain-milvus==0.1.2
langchain-text-splitters==0.2.2
@langcarl langcarl bot added the investigate label Aug 11, 2024
@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🔌: milvus Primarily related to Milvus vector store integration 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Aug 11, 2024

tvvignesh (Author) commented

@ccurme I got this working by making the changes described in milvus-io/pymilvus#2161 (comment).

However, the langchain_milvus package also needs an update, as suggested there.

efriis added a commit that referenced this issue Aug 26, 2024
PR title: "libs: langchain_milvus: add db name to milvus connection check"

- **Description:** add db name to milvus connection check
- **Issue:** #25277

Co-authored-by: Erick Friis <[email protected]>

dosubot bot commented Nov 20, 2024

Hi, @tvvignesh. I'm Dosu, and I'm helping the LangChain team manage their backlog. I'm marking this issue as stale.

Issue Summary

  • The issue involves a bug in LangChain's integration with Milvus.
  • Operations default to the first connection, ignoring the db_name parameter.
  • This affects the use of multiple databases.
  • You found a workaround using suggestions from a related Milvus issue.
  • The LangChain Milvus package still needs an update to address the problem.

Next Steps

  • Please let me know if this issue is still relevant to the latest version of LangChain by commenting here.
  • If there is no further activity, this issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Nov 20, 2024