Part 9: Questions about implementation of HyDE #8

Open
labdmitriy opened this issue Feb 24, 2024 · 2 comments

Comments

@labdmitriy

labdmitriy commented Feb 24, 2024

Hi @rlancemartin,

I have read the original paper about HyDE and noticed (in sections 3.2 and 4.1) that the authors use multiple document generations (with temperature 0.7) together with the question itself to compute the final query embedding, taking the mean of these embeddings, which is then used to retrieve the real documents.
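If I read those sections correctly, with $f$ the embedding encoder, $q$ the question and $\hat{d}_1, \dots, \hat{d}_N$ the $N$ generated passages, the final query vector is simply the average (my paraphrase of the paper's formula):

$$\mathbf{v}_q = \frac{1}{N+1}\left[\sum_{k=1}^{N} f(\hat{d}_k) + f(q)\right]$$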

I also found that the implementation at the documentation link provided is probably outdated: it uses the legacy OpenAI model and a deprecated chain rather than LCEL, and it doesn't include the question embedding in the final query embedding calculation.

Since the steps in Part 9 are also not combined into a single LCEL chain, I tried to implement it myself, taking all of the comments above into account, and wrote the following code (assuming that the `vectorstore` with documents and the corresponding `embeddings` model already exist):

from functools import partial
from operator import itemgetter

from langchain_openai.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
from langchain.prompts import ChatPromptTemplate
import numpy as np


def generate_docs(arguments):
    # Generate n hypothetical documents for the question via batch(),
    # so that all generations are returned (invoke() only returns the first one)
    question = arguments['question']
    generation_template = arguments['template']
    n = arguments['n']
    prompt_hyde = ChatPromptTemplate.from_template(generation_template)
    generate_docs_for_retrieval = (
        prompt_hyde 
        | ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0.7) 
        | StrOutputParser()
    )
    generated_docs = generate_docs_for_retrieval.batch([{'question': question}] * n)
    return generated_docs

def calculate_query_embeddings(query_components):
    # Embed the question and the generated documents; `embeddings` is assumed
    # to be the same embedding model that was used to build the vectorstore
    question = query_components['question']
    generated_docs = query_components['docs']
    
    question_embeddings = np.array(embeddings.embed_query(question))
    generated_docs_embeddings = np.array(embeddings.embed_documents(generated_docs))
    
    # Average the question embedding and the generated document embeddings
    # into a single query embedding, as described in the paper
    query_embeddings = np.vstack([question_embeddings, generated_docs_embeddings])
    calculated_query_embeddings = np.mean(query_embeddings, axis=0, keepdims=True)
    return calculated_query_embeddings

def get_relevant_documents(query_embeddings, vectorstore, search_kwargs):
    # Retrieve real documents by similarity to the averaged query embedding
    return vectorstore.similarity_search_by_vector(query_embeddings, **search_kwargs)

search_kwargs = {'k': 4}
get_relevant_documents = partial(get_relevant_documents, vectorstore=vectorstore, search_kwargs=search_kwargs)

rag_template = """Answer the following question based on this context:
{context}

Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)

model = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0)

chain = (
    RunnableParallel(
        {
            'question': itemgetter('question'),
            'context':
                RunnableParallel({
                    'question': itemgetter('question'),
                    'docs': generate_docs
                })
                | calculate_query_embeddings
                | get_relevant_documents,
        }
    )
    | rag_prompt
    | model
    | StrOutputParser()
)

generation_template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
question = "What is task decomposition for LLM agents?"
n = 4

response = chain.invoke({
    'question': question,
    'template': generation_template,
    'n': n,
})
print(response)

I decided to use the batch() method of the Runnable to generate multiple documents, because I found that invoke() always returns only the first generation regardless of the n argument of the ChatOpenAI model (although all n generations are created and still increase the cost of the invocation).

It would be great to get your feedback on:

- the implementation details from the paper (using multiple generated documents and the query itself for the embedding calculation);
- the implementation I provided above (maybe you can recommend a more efficient solution, since with batch() the prompt tokens have to be re-sent with every request; I sketch one possible alternative below);
- the invoke() behavior (why it returns only the first generation, and whether there is a more cost-effective option than batch() if invoke() cannot be used for multiple generations).
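For reference, here is an untested sketch of the alternative I had in mind: calling generate() on the chat model directly, which (if I understand the API correctly) should expose all n completions from a single request, so the prompt tokens are sent only once:

from langchain_openai.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

generation_template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
question = "What is task decomposition for LLM agents?"

# Request n completions in a single API call
llm = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0.7, n=4)
prompt_hyde = ChatPromptTemplate.from_template(generation_template)
messages = prompt_hyde.format_messages(question=question)

# Unlike invoke(), generate() returns an LLMResult that contains all n generations
result = llm.generate([messages])
generated_docs = [generation.text for generation in result.generations[0]]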

Thank you.

@rlancemartin
Collaborator

Thanks for the detailed feedback! I'm going to review later this week (and continue making some new videos). I appreciate it!

@labdmitriy
Author

Hi @rlancemartin,
Could you please tell me whether you still plan to review it?
Thank you.
