Bruno_V2: A RAG application to chat with PDFs

Bruno is a Retrieval-Augmented Generation (RAG) application that lets users chat with a PDF, built using LangChain and LangGraph. It uses the Contextual Retrieval approach suggested by Anthropic to answer questions with the most suitable context (Reference).


Every time you run out of Claude's or ChatGPT's file-upload quota, you know what to use now! Visit Bruno

Features

  • Memory of the current chat: maintains context throughout a conversation.
  • Handles memory separately for each session, supporting multiple users simultaneously.
  • Contextual chunking to give retrieved chunks more context from their source document.
  • Ensemble retrieval combining keyword (BM25) matching and vector search.
  • Streaming responses for a smooth experience.

What can Bruno do?

  • Upload a .pdf, .txt, .doc, or .docx file and ask questions about it.
  • Delete the file and upload a new one mid-chat whenever required.
  • Chat with it even without any file uploaded.

Contextual Retrieval RAG

A variant of Retrieval Augmented Generation that focuses on adding more context to the chunks and improving retrieval with ensemble methods.
A Jupyter notebook showing how to implement Contextual Retrieval RAG is in backend/ContextualRetrievalRAGexample.ipynb. It uses LangChain and LangGraph to implement the pipeline with memory (session state) and streaming.

  1. Improving retrieval by performing an ensemble of BM25 and vector retrieval


    • Break down the knowledge base (the "corpus" of documents) into smaller chunks of text, usually no more than a few hundred tokens
    • Create TF-IDF encodings and semantic embeddings for these chunks
    • Use BM25 to find top chunks based on exact matches
    • Use embeddings to find top chunks based on semantic similarity
    • Combine and deduplicate the results of the BM25 and embedding searches using rank fusion techniques
    • Add the top-K chunks to the prompt to generate the response
  2. Modifying chunks to carry context from the document they belong to


    • Pass each chunk along with the entire document to an LLM to add context to the chunk
    • Use the prompt
      prompt = '''
      <document>
      {{WHOLE_DOCUMENT}}
      </document>
      Here is the chunk we want to situate within the whole document
      <chunk>
      {{CHUNK_CONTENT}}
      </chunk>
      Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
      '''
    • Use the contextualized chunks for TF-IDF encoding and embeddings (a sketch of this step follows)
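
Below is a minimal sketch of the contextualization step, assuming the langchain-groq package and the Groq-hosted Llama 3 8B model listed later in this README; the helper name situate_chunk is illustrative, not the repository's actual code:

  from langchain_groq import ChatGroq

  llm = ChatGroq(model="llama3-8b-8192")  # Llama 3 8B hosted on Groq

  # Same prompt as above; single braces here because it is filled via .format()
  CONTEXT_PROMPT = """<document>
  {whole_document}
  </document>
  Here is the chunk we want to situate within the whole document
  <chunk>
  {chunk_content}
  </chunk>
  Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else."""

  def situate_chunk(whole_document: str, chunk: str) -> str:
      """Return the chunk with an LLM-generated context line prepended."""
      context = llm.invoke(
          CONTEXT_PROMPT.format(whole_document=whole_document, chunk_content=chunk)
      ).content
      return f"{context}\n{chunk}"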


How Contextual Retrieval RAG works in Bruno

  • Upon file upload

    • Parsing text from PDF

      Uses PyPDFLoader from LangChain to extract text from each page of the provided PDF file, combining it into a single string.
    • Splitting the text into chunks

      Uses RecursiveCharacterTextSplitter to split the text into chunks.
    • Adding context to chunks

      Runs LLM inference for each chunk, asking the model to rewrite the chunk so that it carries context from the entire document. This step is skipped if the document is too big.
    • Creating TF-IDF encodings

      Creates TF-IDF encodings implicitly while initialising the BM25 retriever with the new chunks.
    • Creating vector embeddings

      Creates vector embeddings for the new chunks using a sentence-transformers embedding model from HuggingFace.
    • Store vector embeddings in Pinecone

      Stores the vector embeddings in a Pinecone vector database and initialises a vector retriever over it.
    • Define ensemble retriever

      Creates an ensemble retriever from the BM25 retriever and the Pinecone vector retriever, which implicitly performs reranking and deduplication. A sketch of this upload path follows.
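
    A minimal sketch of the upload path, assuming PINECONE_API_KEY is set in the environment, a Pinecone index named "bruno" already exists, and the situate_chunk helper from the earlier sketch; the file name, chunk sizes, and size threshold are illustrative:

      from langchain_community.document_loaders import PyPDFLoader
      from langchain_community.retrievers import BM25Retriever
      from langchain_huggingface import HuggingFaceEmbeddings
      from langchain_pinecone import PineconeVectorStore
      from langchain_text_splitters import RecursiveCharacterTextSplitter
      from langchain.retrievers import EnsembleRetriever

      # Parse text from every page of the PDF and join it into a single string
      pages = PyPDFLoader("uploaded.pdf").load()
      full_text = "\n".join(page.page_content for page in pages)

      # Split the text into chunks
      splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
      chunks = splitter.split_text(full_text)

      # Add document-level context to every chunk; skipped for very large documents
      if len(full_text) < 50_000:  # illustrative size threshold
          chunks = [situate_chunk(full_text, chunk) for chunk in chunks]

      # The BM25 retriever builds its TF-IDF encodings implicitly from the chunks
      bm25_retriever = BM25Retriever.from_texts(chunks)
      bm25_retriever.k = 4

      # Embed the chunks, store them in Pinecone, and expose a vector retriever
      embeddings = HuggingFaceEmbeddings(
          model_name="sentence-transformers/all-mpnet-base-v2"
      )
      store = PineconeVectorStore.from_texts(chunks, embeddings, index_name="bruno")
      vector_retriever = store.as_retriever(search_kwargs={"k": 4})

      # The ensemble retriever fuses both result lists with reciprocal rank
      # fusion and removes duplicate chunks
      ensemble_retriever = EnsembleRetriever(
          retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5]
      )
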
  • Inference from RAG pipeline

    • History-based prompt modification

      Modifies the prompt based on previous chat messages so that it carries their context.
    • Retrieving chunks using similarity and keyword search

      Uses the new prompt to perform similarity search on the vector database and retrieve the most relevant chunks. Likewise, uses the prompt with the BM25 retriever to get keyword-based results.
    • Rerank and deduplicate

      The ensemble retriever implicitly handles reranking all the retrieved chunks and removing duplicates.
    • LLM prompting after augmentation

      Llama 3 8B is prompted with the new history-based prompt and the context from the PDF (the most relevant modified chunks).
    • Streaming the result and storing it in chat history

      The resulting answer is streamed, and the original prompt and the answer are appended to the temporary chat history. A sketch of this inference path follows.
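
A minimal sketch of the inference path, reusing ensemble_retriever from the upload sketch above; the rewrite-prompt wording and the in-memory per-session history are illustrative rather than Bruno's actual code:

  from langchain_core.messages import AIMessage, HumanMessage
  from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
  from langchain_groq import ChatGroq

  llm = ChatGroq(model="llama3-8b-8192")
  chat_history = []  # kept per session so multiple users don't share context

  # History-based prompt modification: fold earlier turns into the question
  rewrite_prompt = ChatPromptTemplate.from_messages([
      ("system", "Rewrite the user's question as a standalone question, "
                 "using the chat history for any missing context."),
      MessagesPlaceholder("chat_history"),
      ("human", "{question}"),
  ])

  def answer(question: str):
      standalone = (rewrite_prompt | llm).invoke(
          {"chat_history": chat_history, "question": question}
      ).content

      # Retrieve with BM25 + vector search; the ensemble reranks and deduplicates
      docs = ensemble_retriever.invoke(standalone)
      context = "\n\n".join(doc.page_content for doc in docs)

      # Prompt the LLM with the augmented question and stream the answer
      full_answer = ""
      for part in llm.stream(f"Context:\n{context}\n\nQuestion: {standalone}"):
          full_answer += part.content
          yield part.content  # streamed token by token to the client

      # Append the original question and the final answer to the session history
      chat_history.append(HumanMessage(question))
      chat_history.append(AIMessage(full_answer))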

Specifications used

  1. LLM: Llama 3 8B
  2. LLM hosting: Groq
  3. Parser: PyPDFLoader from LangChain
  4. Chunking: contextual chunking
  5. Embedding model: sentence-transformers/all-mpnet-base-v2 from HuggingFace
  6. Retriever: ensemble retriever (BM25 retriever + vector DB retriever) with implicit reranking
  7. Vector database: Pinecone

Application

  • Frontend: Flutter
  • Backend: FastAPI
  • Frontend deployment: Firebase hosting
  • Backend deployment: as a Docker image on Render

Upcoming features in Bruno 3.0

  • Handling images and tabular data from PDFs
  • Supporting multiple file uploads
  • Including authentication and storing chats in a database
  • Including web search