forked from langchain-ai/langchain
docs: Add doc for hybrid search (langchain-ai#21245)
See the [preview](https://langchain-git-fork-cbornet-doc-hybrid-search-langchain.vercel.app/docs/use_cases/question_answering/hybrid/). Written on the model of the [per user retrieval](https://python.langchain.com/docs/use_cases/question_answering/per_user/) guide.
Showing 3 changed files with 394 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,392 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "14d3fd06", | ||
"metadata": { | ||
"id": "14d3fd06" | ||
}, | ||
"source": [ | ||
"# Hybrid Search\n", | ||
"\n", | ||
"The standard search in LangChain is done by vector similarity. However, a number of vectorstore implementations (Astra DB, Elasticsearch, Neo4j, AzureSearch, ...) also support more advanced search that combines vector similarity search with other search techniques (full-text, BM25, and so on). This is generally referred to as \"Hybrid\" search.\n", | ||
"\n", | ||
"**Step 1: Make sure the vectorstore you are using supports hybrid search**\n", | ||
"\n", | ||
"At the moment, there is no unified way to perform hybrid search in LangChain. Each vectorstore may have its own way of doing it, generally exposed as a keyword argument passed to `similarity_search`. Read the documentation or source code to find out whether the vectorstore you are using supports hybrid search and, if so, how to use it.\n", | ||
"\n", | ||
"**Step 2: Add that parameter as a configurable field for the chain**\n", | ||
"\n", | ||
"This will let you easily call the chain and configure any relevant flags at runtime. See [this documentation](/docs/expression_language/primitives/configure) for more information on configuration.\n", | ||
"\n", | ||
"**Step 3: Call the chain with that configurable field**\n", | ||
"\n", | ||
"Now, at runtime, you can call this chain with the configurable field set.\n", | ||
"\n", | ||
"## Code Example\n", | ||
"\n", | ||
"Let's see a concrete example of what this looks like in code. We will use the Cassandra/CQL interface of Astra DB for this example.\n", | ||
"\n", | ||
"Install the following Python package:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c2efe35eea197769", | ||
"metadata": { | ||
"id": "c2efe35eea197769", | ||
"outputId": "527275b4-076e-4b22-945c-e41a59188116" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install \"cassio>=0.1.7\"" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b4ef96d44341cd84", | ||
"metadata": { | ||
"collapsed": false, | ||
"id": "b4ef96d44341cd84" | ||
}, | ||
"source": [ | ||
"Get the [connection secrets](https://docs.datastax.com/en/astra/astra-db-vector/get-started/quickstart.html).\n", | ||
"\n", | ||
"Initialize cassio:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "cb2cef097277c32e", | ||
"metadata": { | ||
"id": "cb2cef097277c32e", | ||
"outputId": "4c3d05a0-319a-44a0-8ec3-0a9c78453132" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import cassio\n", | ||
"\n", | ||
"cassio.init(\n", | ||
"    database_id=\"Your database ID\",\n", | ||
"    token=\"Your application token\",\n", | ||
"    keyspace=\"Your key space\",\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e1e51444877f45eb", | ||
"metadata": { | ||
"collapsed": false, | ||
"id": "e1e51444877f45eb" | ||
}, | ||
"source": [ | ||
"Create the Cassandra VectorStore with a standard [index analyzer](https://docs.datastax.com/en/astra/astra-db-vector/cql/use-analyzers-with-cql.html). The index analyzer is needed to enable term matching." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7345de3c", | ||
"metadata": { | ||
"id": "7345de3c", | ||
"outputId": "d38bcee0-0134-4ac6-8d35-afcce282481b" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from cassio.table.cql import STANDARD_ANALYZER\n", | ||
"from langchain_community.vectorstores import Cassandra\n", | ||
"from langchain_openai import OpenAIEmbeddings\n", | ||
"\n", | ||
"embeddings = OpenAIEmbeddings()\n", | ||
"vectorstore = Cassandra(\n", | ||
"    embedding=embeddings,\n", | ||
"    table_name=\"test_hybrid\",\n", | ||
"    body_index_options=[STANDARD_ANALYZER],  # analyzer that enables term matching\n", | ||
"    session=None,  # None falls back to the session set by cassio.init()\n", | ||
"    keyspace=None,  # None falls back to the keyspace set by cassio.init()\n", | ||
")\n", | ||
"\n", | ||
"vectorstore.add_texts(\n", | ||
"    [\n", | ||
"        \"In 2023, I visited Paris\",\n", | ||
"        \"In 2022, I visited New York\",\n", | ||
"        \"In 2021, I visited New Orleans\",\n", | ||
"    ]\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "73887f23bbab978c", | ||
"metadata": { | ||
"collapsed": false, | ||
"id": "73887f23bbab978c" | ||
}, | ||
"source": [ | ||
"If we do a standard similarity search, we get all the documents:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "3c2a39fa", | ||
"metadata": { | ||
"id": "3c2a39fa", | ||
"outputId": "5290085b-896c-4c81-9b40-c315331b7009" | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='In 2022, I visited New York'),\n", | ||
"Document(page_content='In 2023, I visited Paris'),\n", | ||
"Document(page_content='In 2021, I visited New Orleans')]" | ||
] | ||
}, | ||
"execution_count": null, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"vectorstore.as_retriever().invoke(\"What city did I visit last?\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "78d4c3c79e67d8c3", | ||
"metadata": { | ||
"collapsed": false, | ||
"id": "78d4c3c79e67d8c3" | ||
}, | ||
"source": [ | ||
"The Astra DB vectorstore `body_search` argument can be used to filter the search on the term `new`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "56393baa", | ||
"metadata": { | ||
"id": "56393baa", | ||
"outputId": "d1c939f3-342f-4df4-94a3-d25429b5a25e" | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[Document(page_content='In 2022, I visited New York'),\n", | ||
"Document(page_content='In 2021, I visited New Orleans')]" | ||
] | ||
}, | ||
"execution_count": null, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"vectorstore.as_retriever(search_kwargs={\"body_search\": \"new\"}).invoke(\n", | ||
"    \"What city did I visit last?\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "88ae97ed", | ||
"metadata": { | ||
"id": "88ae97ed" | ||
}, | ||
"source": [ | ||
"We can now create the chain that we will use for question answering." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "62707b4f", | ||
"metadata": { | ||
"id": "62707b4f" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain_core.output_parsers import StrOutputParser\n", | ||
"from langchain_core.prompts import ChatPromptTemplate\n", | ||
"from langchain_core.runnables import (\n", | ||
"    ConfigurableField,\n", | ||
"    RunnablePassthrough,\n", | ||
")\n", | ||
"from langchain_openai import ChatOpenAI" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b6778ffa", | ||
"metadata": { | ||
"id": "b6778ffa" | ||
}, | ||
"source": [ | ||
"This is a basic question-answering chain setup." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "44a865f6", | ||
"metadata": { | ||
"id": "44a865f6" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"template = \"\"\"Answer the question based only on the following context:\n", | ||
"{context}\n", | ||
"Question: {question}\n", | ||
"\"\"\"\n", | ||
"prompt = ChatPromptTemplate.from_template(template)\n", | ||
"\n", | ||
"model = ChatOpenAI()\n", | ||
"\n", | ||
"retriever = vectorstore.as_retriever()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "72125166", | ||
"metadata": { | ||
"id": "72125166" | ||
}, | ||
"source": [ | ||
"Here we mark the retriever as having a configurable field. All vectorstore retrievers have `search_kwargs` as a field. This is just a dictionary with vectorstore-specific fields." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "babbadff", | ||
"metadata": { | ||
"id": "babbadff" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"configurable_retriever = retriever.configurable_fields(\n", | ||
"    search_kwargs=ConfigurableField(\n", | ||
"        id=\"search_kwargs\",\n", | ||
"        name=\"Search Kwargs\",\n", | ||
"        description=\"The search kwargs to use\",\n", | ||
"    )\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "2d481b70", | ||
"metadata": { | ||
"id": "2d481b70" | ||
}, | ||
"source": [ | ||
"We can now create the chain using our configurable retriever." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "210b0446", | ||
"metadata": { | ||
"id": "210b0446" | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"chain = (\n", | ||
"    {\"context\": configurable_retriever, \"question\": RunnablePassthrough()}\n", | ||
"    | prompt\n", | ||
"    | model\n", | ||
"    | StrOutputParser()\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "a38037b2", | ||
"metadata": { | ||
"id": "a38037b2", | ||
"outputId": "1ea14996-5965-4a5e-9678-b9c35ce5c6de" | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"Paris" | ||
] | ||
}, | ||
"execution_count": null, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"chain.invoke(\"What city did I visit last?\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "7f6458c3", | ||
"metadata": { | ||
"id": "7f6458c3" | ||
}, | ||
"source": [ | ||
"We can now invoke the chain with configurable options. `search_kwargs` is the id of the configurable field. The value is the search kwargs to use for Astra DB." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "9gYLqBTH8BFz", | ||
"metadata": { | ||
"id": "9gYLqBTH8BFz", | ||
"outputId": "4358a2e6-f306-48f1-dd5c-781ac8a33e89" | ||
}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"New York" | ||
] | ||
}, | ||
"execution_count": null, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"chain.invoke(\n", | ||
"    \"What city did I visit last?\",\n", | ||
"    config={\"configurable\": {\"search_kwargs\": {\"body_search\": \"new\"}}},\n", | ||
")" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.9.1" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
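
For readers who want to try the pattern from the notebook without an Astra DB instance or an OpenAI key, here is a minimal, dependency-free sketch. Everything in it is hypothetical and only illustrates the idea: `ToyHybridStore` and `ToyRetriever` are invented names, bag-of-words cosine similarity stands in for real embeddings, and the `body_search` term filter only imitates the behaviour the notebook shows for the Cassandra vectorstore. It is not how `langchain_community` or `cassio` implement it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric terms (a stand-in for an index analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyHybridStore:
    """Hypothetical in-memory store; NOT the Cassandra vectorstore."""

    def __init__(self, texts):
        self.texts = list(texts)

    def similarity_search(self, query, k=4, body_search=None):
        candidates = self.texts
        if body_search is not None:
            # Term filter: keep only documents that contain the given term.
            term = body_search.lower()
            candidates = [t for t in candidates if term in tokenize(t)]
        # Rank the remaining documents by similarity to the query.
        q = Counter(tokenize(query))
        ranked = sorted(
            candidates,
            key=lambda t: cosine(q, Counter(tokenize(t))),
            reverse=True,
        )
        return ranked[:k]


class ToyRetriever:
    """Hypothetical retriever whose search_kwargs can be replaced per call."""

    def __init__(self, store, search_kwargs=None):
        self.store = store
        self.search_kwargs = search_kwargs or {}

    def invoke(self, query, config=None):
        # Imitates config={"configurable": {"search_kwargs": {...}}}:
        # a runtime value replaces the default search_kwargs for this call only.
        override = (config or {}).get("configurable", {}).get("search_kwargs")
        kwargs = self.search_kwargs if override is None else override
        return self.store.similarity_search(query, **kwargs)


store = ToyHybridStore(
    [
        "In 2023, I visited Paris",
        "In 2022, I visited New York",
        "In 2021, I visited New Orleans",
    ]
)
retriever = ToyRetriever(store)

# Plain similarity search: every document is a candidate.
all_docs = retriever.invoke("What city did I visit last?")

# Same retriever, with the term filter supplied at call time.
new_docs = retriever.invoke(
    "What city did I visit last?",
    config={"configurable": {"search_kwargs": {"body_search": "new"}}},
)
print(all_docs)
print(new_docs)
```

The point of the sketch is the shape of the call: the retriever is built once with default `search_kwargs`, and the store-specific filter is supplied per invocation, which is the same split the notebook makes between `configurable_fields` and `chain.invoke(..., config=...)`.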