Bruno is a Retrieval Augmented Generation (RAG) application that lets users chat with a PDF, built using `langchain` and `langgraph`. It uses the Contextual Retrieval approach suggested by Anthropic to answer questions with the most suitable context. Reference.
Every time you run out of the Claude or ChatGPT file upload quota, you know what to use now! Visit Bruno
- Memory of the current chat; it maintains context throughout a conversation.
- Handles memory separately for each session, so multiple users can chat simultaneously.
- Contextual chunking to give retrieved chunks more context about the document they come from.
- Ensemble retrieval that combines keyword matching and vector search.
- Streaming results for a smooth experience.
- Upload a `.pdf`, `.txt`, `.doc`, or `.docx` file and ask any questions about it.
- Delete and upload a new file in the middle of a chat whenever required.
- Chat with it even without any file uploaded.
A variant of Retrieval Augmented Generation that focuses on adding more context to the chunks and improving retrieval with ensemble methods.
A Jupyter notebook showing how to implement Contextual Retrieval RAG is at `backend\ContextualRetrievalRAGexample.ipynb`. It uses `langchain` and `langgraph` to implement it with memory (session state) and streaming, roughly along the lines of the sketch below.
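As a rough illustration of how memory and streaming fit together with `langgraph`, here is a minimal sketch; the model id, node name, and thread id are assumptions for illustration, not the notebook's exact code.

```python
# Minimal sketch: per-session memory (checkpointer) + token streaming with langgraph.
from langchain_groq import ChatGroq            # assumes the langchain-groq package
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

llm = ChatGroq(model="llama3-8b-8192")         # assumed Groq model id

def call_model(state: MessagesState):
    # The checkpointer replays this thread's earlier messages, giving the chat memory.
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("model", call_model)
graph.add_edge(START, "model")
app = graph.compile(checkpointer=MemorySaver())  # one thread_id per user session

# Stream the answer token by token for one session.
config = {"configurable": {"thread_id": "session-42"}}
for chunk, _meta in app.stream(
    {"messages": [("user", "Summarise the uploaded PDF")]},
    config,
    stream_mode="messages",
):
    print(chunk.content, end="", flush=True)
```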
- Improving retrieval by performing an ensemble of BM25 and vector retrieval (see the sketch after these steps):
  - Break down the knowledge base (the "corpus" of documents) into smaller chunks of text, usually no more than a few hundred tokens
  - Create TF-IDF encodings and semantic embeddings for these chunks
  - Use BM25 to find top chunks based on exact matches
  - Use embeddings to find top chunks based on semantic similarity
  - Combine and deduplicate the BM25 and embedding results using rank fusion techniques
  - Add the top-K chunks to the prompt to generate the response
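For illustration only, here is a minimal sketch of those steps using `rank_bm25`, `sentence-transformers`, and reciprocal rank fusion; the toy corpus and the `rrf_k` constant are assumptions, not Bruno's actual values.

```python
# Sketch: ensemble of BM25 (keyword) and embedding (semantic) retrieval with rank fusion.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

chunks = [
    "Bruno lets users chat with an uploaded PDF.",
    "Contextual chunking adds document-level context to each chunk.",
    "The backend is a FastAPI service deployed on Render.",
]  # toy corpus; real chunks come from the uploaded document

# TF-IDF-style keyword index (BM25) and semantic embeddings for the same chunks.
bm25 = BM25Okapi([c.lower().split() for c in chunks])
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
chunk_emb = model.encode(chunks, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2, rrf_k: int = 60) -> list[str]:
    # Rank chunks by BM25 score (exact matches) and by cosine similarity (semantics).
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_rank = sorted(range(len(chunks)), key=lambda i: -bm25_scores[i])
    sims = util.cos_sim(model.encode(query, convert_to_tensor=True), chunk_emb)[0]
    vec_rank = sorted(range(len(chunks)), key=lambda i: -float(sims[i]))

    # Reciprocal rank fusion: combine both rankings, deduplicate, rescore by 1/(k + rank).
    fused: dict[int, float] = {}
    for ranking in (bm25_rank, vec_rank):
        for rank, idx in enumerate(ranking):
            fused[idx] = fused.get(idx, 0.0) + 1.0 / (rrf_k + rank + 1)
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [chunks[i] for i in best]

print(retrieve("how are chunks given more context?"))
```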
- Modifying chunks to have context of the document they belong to (see the sketch after these steps):
  - Pass each chunk along with the entire document to an LLM to add context to the chunk
  - Use the prompt:
    ```python
    prompt = '''
    <document>
    {{WHOLE_DOCUMENT}}
    </document>
    Here is the chunk we want to situate within the whole document
    <chunk>
    {{CHUNK_CONTENT}}
    </chunk>
    Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
    '''
    ```
  - Use the new chunks for TF-IDF encoding and embeddings
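A minimal sketch of that contextualisation step, assuming a Groq-hosted Llama 3 model via `langchain_groq`; the `contextualise` helper and the model id are illustrative, not the repository's code.

```python
# Sketch: prepend an LLM-generated context line to each chunk before indexing.
from langchain_groq import ChatGroq  # any chat model works; Groq-hosted Llama 3 is assumed

CONTEXT_PROMPT = (
    "<document>\n{whole_document}\n</document>\n"
    "Here is the chunk we want to situate within the whole document\n"
    "<chunk>\n{chunk_content}\n</chunk>\n"
    "Please give a short succinct context to situate this chunk within the overall "
    "document for the purposes of improving search retrieval of the chunk. "
    "Answer only with the succinct context and nothing else."
)

llm = ChatGroq(model="llama3-8b-8192", temperature=0)

def contextualise(chunks: list[str], whole_document: str) -> list[str]:
    new_chunks = []
    for chunk in chunks:
        context = llm.invoke(
            CONTEXT_PROMPT.format(whole_document=whole_document, chunk_content=chunk)
        ).content
        # The generated context is prepended so both BM25 and the embeddings see it.
        new_chunks.append(f"{context}\n{chunk}")
    return new_chunks
```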
- Uses `PyPDFLoader` from `langchain` to extract text from each page of a provided PDF file, combining it into a single string.
- Uses `RecursiveCharacterTextSplitter` to split the text into chunks.
- Runs inference on the LLM for each chunk, asking it to rewrite the chunk so that it carries context from the entire document. This step is skipped if the document is too big.
- Creates TF-IDF encodings implicitly while initialising the BM25 retriever with the new chunks.
- Creates vector embeddings for the new chunks using an embedding model from `sentence-transformers` on HuggingFace.
- Stores the vector embeddings in a Pinecone vector database and initialises a vector retriever for it.
- Creates an ensemble retriever from the BM25 retriever and the Pinecone vector retriever, which implicitly performs reranking and deduplication (see the sketch after these steps).
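Roughly, those indexing steps map onto LangChain components as in the sketch below; the file name, chunk sizes, weights, and the Pinecone index name `bruno-index` are assumptions (a `PINECONE_API_KEY` is expected in the environment), not the repository's exact code.

```python
# Sketch of the indexing pipeline: parse -> chunk -> contextualise -> BM25 + Pinecone ensemble.
from langchain.retrievers import EnsembleRetriever
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.retrievers import BM25Retriever
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Extract text from every page and combine it into a single string.
pages = PyPDFLoader("uploaded.pdf").load()
whole_document = "\n".join(page.page_content for page in pages)

# 2. Split the text into chunks (sizes here are illustrative).
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(whole_document)

# 3. Contextualise each chunk with the LLM (see the contextualise() sketch above);
#    this step is skipped when the document is too large.
# chunks = contextualise(chunks, whole_document)

# 4. The BM25 retriever builds its keyword index from the (contextualised) chunks.
bm25_retriever = BM25Retriever.from_texts(chunks)
bm25_retriever.k = 4

# 5. Embed the chunks, store them in Pinecone, and wrap the index as a retriever.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = PineconeVectorStore.from_texts(chunks, embeddings, index_name="bruno-index")
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# 6. The ensemble retriever fuses both result lists, reranking and deduplicating implicitly.
retriever = EnsembleRetriever(retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5])
```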
- Modifies the prompt based on the previous chat messages so that it carries their context.
- Uses the new prompt to perform a similarity search on the vector database and retrieve the most relevant chunks; likewise, uses it with the BM25 retriever to get keyword-based results.
- The ensemble retriever implicitly handles reranking all the retrieved chunks and removing duplicates.
- Llama 3 8B is prompted with the new history-based prompt and the context from the PDF (the most relevant modified chunks).
- The resulting answer is streamed, and the original prompt and the answer are appended to the temporary chat history (see the sketch below).
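A rough sketch of that answer flow; the condensed-history format, system prompt, and the `answer` generator are illustrative assumptions, not the repository's exact code.

```python
# Sketch of the answer flow: history-aware query -> ensemble retrieval -> streamed answer.
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama3-8b-8192")          # assumed Groq model id
chat_history: list[tuple[str, str]] = []        # temporary (role, text) history per session

def answer(question: str, retriever):
    # 1. Fold previous turns into the query so retrieval sees their context.
    history_text = "\n".join(f"{role}: {text}" for role, text in chat_history)
    contextual_query = f"{history_text}\nuser: {question}" if history_text else question

    # 2. Ensemble retrieval: vector similarity search + BM25, with implicit rerank/dedup.
    docs = retriever.invoke(contextual_query)
    context = "\n\n".join(doc.page_content for doc in docs)

    # 3. Stream the answer from Llama 3 8B, grounded in the retrieved chunks.
    messages = [
        ("system", f"Answer the question using only this context:\n{context}"),
        *chat_history,
        ("user", question),
    ]
    answer_text = ""
    for chunk in llm.stream(messages):
        answer_text += chunk.content
        yield chunk.content                     # streamed to the client token by token

    # 4. Append the original question and the answer to the temporary chat history.
    chat_history.extend([("user", question), ("assistant", answer_text)])
```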
- LLM: Llama3-8b
- LLM Hosting: Groq
- Parser: PyPDFLoader from LangChain
- Chunking: Contextual chunking
- Embedding model: sentence-transformers/all-mpnet-base-v2 from HuggingFace
- Retriever: Ensemble retriever (BM25 retriever + vector DB retriever) with implicit reranking
- Vector database: Pinecone
- Frontend: Flutter
- Backend: FastAPI
- Frontend deployment: Firebase hosting
- Backend deployment: Docker image on Render
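For reference, a streaming chat endpoint on the FastAPI side could look roughly like the sketch below; the route, payload shape, and placeholder generator are assumptions, not the deployed API.

```python
# Sketch: FastAPI endpoint that streams generated tokens to the Flutter client.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    session_id: str   # lets the backend keep separate memory per user session
    question: str

@app.post("/chat")
def chat(req: ChatRequest):
    def token_stream():
        # Placeholder generator; in Bruno this would call the RAG pipeline for req.session_id.
        for token in ("Hello", " ", "from", " ", "Bruno"):
            yield token
    return StreamingResponse(token_stream(), media_type="text/plain")
```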
- Handle image and tabular data in PDFs
- Support multiple file uploads
- Include authentication and store chats in a database
- Add web search