Hi @rlancemartin,

I have read the original HyDE paper and noticed (in Sections 3.2 and 4.1) that the authors use multiple document generations at temperature 0.7, together with the question itself, to compute the final query embedding that is then used to retrieve the real documents (by taking the mean of these embeddings).
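To make sure I read that part correctly, this is how I understand the final query vector (a minimal numpy sketch; embeddings, question and generated_docs are just the names used in my code below, not the paper's Contriever encoder):

import numpy as np

# Mean of the question embedding and the n hypothetical document embeddings,
# as I understand Sections 3.2 / 4.1 of the paper.
vectors = [embeddings.embed_query(question)] + embeddings.embed_documents(generated_docs)
hyde_query_vector = np.mean(np.array(vectors), axis=0)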
I also found that the implementation at the documentation link provided is probably outdated: it uses the legacy OpenAI model class and a deprecated chain, and it is not written with LCEL. It also doesn't include the query embedding in the final query embedding calculation.
Since the steps in Part 9 are also not combined into a single LCEL chain, I tried to implement it myself, taking all of the points above into account, and wrote the following code (assuming we already have a vectorstore populated with documents and a matching embeddings model):
from functools import partial
from operator import itemgetter
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
from langchain.prompts import ChatPromptTemplate
import numpy as np

# `vectorstore` and `embeddings` are assumed to be defined already (see the note above).

def generate_docs(arguments):
    # Generate n hypothetical documents for the question at temperature 0.7.
    question = arguments['question']
    generation_template = arguments['template']
    n = arguments['n']
    prompt_hyde = ChatPromptTemplate.from_template(generation_template)
    generate_docs_for_retrieval = (
        prompt_hyde
        | ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0.7)
        | StrOutputParser()
    )
    generated_docs = generate_docs_for_retrieval.batch([{'question': question}] * n)
    return generated_docs

def calculate_query_embeddings(query_components):
    # Mean of the question embedding and the n generated document embeddings.
    question = query_components['question']
    generated_docs = query_components['docs']
    question_embeddings = np.array(embeddings.embed_query(question))
    generated_docs_embeddings = np.array(embeddings.embed_documents(generated_docs))
    query_embeddings = np.vstack([question_embeddings, generated_docs_embeddings])
    calculated_query_embeddings = np.mean(query_embeddings, axis=0, keepdims=True)
    return calculated_query_embeddings

def get_relevant_documents(query_embeddings, vectorstore, search_kwargs):
    # Retrieve real documents by similarity to the averaged query embedding.
    return vectorstore.similarity_search_by_vector(query_embeddings, **search_kwargs)

search_kwargs = {'k': 4}
get_relevant_documents = partial(get_relevant_documents, vectorstore=vectorstore, search_kwargs=search_kwargs)

rag_template = """Answer the following question based on this context:

{context}

Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)
model = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0)

chain = (
    RunnableParallel(
        {
            'question': itemgetter('question'),
            'context': RunnableParallel({
                'question': itemgetter('question'),
                'docs': generate_docs,
            })
            | calculate_query_embeddings | get_relevant_documents,
        }
    )
    | rag_prompt | model | StrOutputParser()
)

generation_template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
question = "What is task decomposition for LLM agents?"
n = 4
response = chain.invoke({
    'question': question,
    'template': generation_template,
    'n': n,
})
print(response)
I decided to use the batch() method of the Runnable to generate multiple documents, because I found that the invoke() method always returns only the first generation regardless of the n argument of the ChatOpenAI model (even though all n generations are created and increase the cost of the invocation).
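For context, this is the kind of single-call alternative I was wondering about (just a sketch, assuming ChatOpenAI's n parameter and the generate() method work the way I understand them, i.e. all n choices come back in one LLMResult and the prompt tokens are only sent once):

from langchain_openai.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

def generate_docs_single_call(arguments):
    # Build the prompt once and request n completions in a single API call.
    prompt_hyde = ChatPromptTemplate.from_template(arguments['template'])
    messages = prompt_hyde.invoke({'question': arguments['question']}).to_messages()
    llm = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0.7, n=arguments['n'])
    # generate() returns an LLMResult; generations[0] should hold all n choices
    # produced for the single prompt we passed in.
    result = llm.generate([messages])
    return [generation.text for generation in result.generations[0]]

If that is the intended way to access all n generations, this function could replace generate_docs in the chain above, but I'm not sure it is the recommended approach, which is part of my question.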
It would be great to get your feedback on the implementation details from the paper (using multiple generated documents plus the query itself for the embedding calculation), on the implementation above (perhaps you can recommend a more efficient solution, since with batch() the prompt tokens have to be sent with every request), and on the invoke() behaviour (why it returns only the first generation, and whether there is a more cost-effective option than batch() if invoke() can't be used for multiple generations).
Thank you.