New examples added to docs #166

Open: hari0205 wants to merge 2 commits into base: release

@hari0205 hari0205 commented Apr 5, 2024

  • Added a new example showing how to do RAG with metadata filters on retrievers
  • Some linting done.

@hari0205 hari0205 marked this pull request as draft April 5, 2024 15:15
@hari0205 hari0205 marked this pull request as ready for review April 5, 2024 15:15
@hari0205
Author

hari0205 commented Apr 6, 2024

@hsm207 is this good enough for a PR? I noticed that the as_retriever docstring in the langchain_core vector stores is misleading and, for this store, incorrect.

@hsm207
Collaborator

hsm207 commented Apr 7, 2024

@hari0205 could you please clarify what part of the as_retriever is a bit misleading and incorrect? Since langchain-weaviate does not override that method, I'm inclined to think that the clarification on as_retriever should go to the main langchain repo

cc @efriis

@hari0205
Author

hari0205 commented Apr 7, 2024

> @hari0205 could you please clarify what part of the as_retriever is a bit misleading and incorrect? Since langchain-weaviate does not override that method, I'm inclined to think that the clarification on as_retriever should go to the main langchain repo
>
> cc @efriis

The as_retriever docstring mentions that it takes filter as one of the keyword arguments:

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}}
)

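So I tried this example (userid and query are defined elsewhere in my script):

import weaviate as wev
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_weaviate import WeaviateVectorStore

client = wev.connect_to_local(port=8030)
db = WeaviateVectorStore(
    client, "myindex", "mytext", embedding=OpenAIEmbeddings(), by_text=False
)

# Filter passed as a plain dict under the `filter` key, as the docstring suggests
retriever = db.as_retriever(search_kwargs={"filter": {"userid": userid}})
llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)

qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)
res = qa_chain.invoke({"query": query})
print(res)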

I got a type error:

 File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain\chains\base.py", line 162, in invoke
    raise e
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain\chains\base.py", line 156, in invoke
    self._call(inputs, run_manager=run_manager)
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain\chains\retrieval_qa\base.py", line 141, in _call
    docs = self._get_docs(question, run_manager=_run_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain\chains\retrieval_qa\base.py", line 221, in _get_docs
    return self.retriever.get_relevant_documents(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain_core\retrievers.py", line 245, in get_relevant_documents
    raise e
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain_core\retrievers.py", line 238, in get_relevant_documents
    result = self._get_relevant_documents(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain_core\vectorstores.py", line 696, in _get_relevant_documents
    docs = self.vectorstore.similarity_search(query, **self.search_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain_weaviate\vectorstores.py", line 288, in similarity_search
    result = self._perform_search(query, k, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\langchain_weaviate\vectorstores.py", line 241, in _perform_search
    result = collection.query.hybrid(
             ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: _HybridQuery.hybrid() got an unexpected keyword argument 'filter'

Upon closer inspection of class WeaviateVectorStore(VectorStore) in langchain_weaviate, I figured out that the hybrid method takes filters instead of filter.

I changed the code to

retriever = db.as_retriever(search_kwargs={"filters": {"userid": userid}})

Now, I am greeted with another error

  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\weaviate\collections\queries\hybrid\query.py", line 84, in hybrid
    res = self._query.hybrid(
          ^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\weaviate\collections\grpc\query.py", line 194, in hybrid
    request = self.__create_request(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\weaviate\collections\grpc\query.py", line 521, in __create_request
    _validate_input(
  File "E:\work\beyondcc\resume_parse\langchain\Lib\site-packages\weaviate\validator.py", line 24, in _validate_input
    raise WeaviateInvalidInputError(
weaviate.exceptions.WeaviateInvalidInputError: Invalid input provided: Argument 'filters' must be one of: [<class 'weaviate.collections.classes.filters._Filters'>, None], but got <class 'dict'>.

The only way I was able to get it working was to build a Weaviate Filter object and pass that under the filters key:

# Build a Weaviate v4 filter object instead of a plain dict
search_filter = wev.classes.query.Filter.by_property("userid").equal(userid)
retriever = db.as_retriever(search_kwargs={"filters": search_filter})
llm = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)

qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)
res = qa_chain.invoke({"query": query})
print(res)
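
For anyone who would rather keep writing plain dict filters as in the as_retriever docstring, a small conversion helper could look like the sketch below. It is not part of langchain-weaviate: `dict_to_filter` is a hypothetical name, it only handles equality conditions, and it assumes that filter clauses can be combined with `&` (my understanding of the Weaviate v4 client, worth verifying against the installed version). The `by_property` factory is passed in so the helper itself needs no Weaviate import.

```python
from functools import reduce
from operator import and_


def dict_to_filter(conditions, by_property):
    """Turn {"prop": value, ...} into one AND-combined equality filter.

    `by_property` is a factory such as weaviate.classes.query.Filter.by_property.
    Hypothetical helper, not part of langchain-weaviate.
    """
    if not conditions:
        return None  # nothing to filter on
    # One equality clause per dict entry.
    clauses = [by_property(name).equal(value) for name, value in conditions.items()]
    # Assumption: clause objects support `&` for logical AND.
    return reduce(and_, clauses)
```

With the client from my example, the call would then be db.as_retriever(search_kwargs={"filters": dict_to_filter({"userid": userid}, wev.classes.query.Filter.by_property)}).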

My assumption is that each vector store implements metadata filtering differently. Since as_retriever passes search_kwargs through to the store unchanged, the generic docstring example fails with this store. So I opened this PR to provide some examples and avoid that confusion.
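
To make that assumption concrete, here is a minimal pure-Python sketch of the pass-through. None of these classes are the real LangChain or Weaviate ones: `ToyStore`, `ToyRetriever` and `try_search` are made-up stand-ins that only mimic the `similarity_search(query, **self.search_kwargs)` call visible in the first traceback.

```python
class ToyStore:
    """Stand-in for a vector store whose search accepts `filters`, not `filter`."""

    def similarity_search(self, query, k=4, filters=None):
        return f"search({query!r}, k={k}, filters={filters!r})"


class ToyRetriever:
    """Stand-in retriever that forwards search_kwargs unchanged."""

    def __init__(self, vectorstore, search_kwargs):
        self.vectorstore = vectorstore
        self.search_kwargs = search_kwargs

    def get_relevant_documents(self, query):
        # Same shape as the call in the traceback: kwargs are not translated.
        return self.vectorstore.similarity_search(query, **self.search_kwargs)


def try_search(search_kwargs):
    """Return the search result, or the TypeError message if the kwargs are rejected."""
    try:
        return ToyRetriever(ToyStore(), search_kwargs).get_relevant_documents("q")
    except TypeError as exc:
        return f"TypeError: {exc}"
```

Here try_search({"filter": {"userid": 1}}) comes back as a TypeError message, while try_search({"filters": ...}) goes through, which mirrors the first error I hit.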

PS: package versions

langchain==0.1.6
langchain-community==0.0.19
langchain-core==0.1.38
langchain-openai==0.0.6
langchain-text-splitters==0.0.1
langchain-weaviate==0.0.1rc5
