Bruno_V2: A RAG application to chat with PDFs

Bruno is a Retrieval-Augmented Generation (RAG) application that lets users chat with a PDF, built using LangChain and LangGraph. It uses the Contextual Retrieval approach suggested by Anthropic to answer questions with the most suitable context (Reference).


Every time you run out of Claude's or ChatGPT's file-upload quota, you know what to use now! Visit Bruno

Features

  • Memory of the current chat: maintains context throughout a conversation.
  • Handles memory separately for each session, supporting multiple users simultaneously.
  • Contextual chunking to give retrieved chunks more context from their source document.
  • Ensemble retrieval combining keyword (BM25) matching and vector search.
  • Streaming responses for a smooth experience.

What can Bruno do?

  • Upload a .pdf, .txt, .doc, or .docx file and ask questions about it.
  • Delete the file and upload a new one mid-chat whenever required.
  • Chat with it even without any file uploaded.

Contextual Retrieval RAG

A variant of Retrieval Augmented Generation that focuses on adding more context to the chunks and improving retrieval with ensemble methods.
A Jupyter notebook showing how to implement Contextual Retrieval RAG is in backend/ContextualRetrievalRAGexample.ipynb. It uses LangChain and LangGraph to implement the pipeline with memory (session state) and streaming.

  1. Improving retrieval by performing an ensemble of BM25 and vector retrieval


    • Break down the knowledge base (the "corpus" of documents) into smaller chunks of text, usually no more than a few hundred tokens
    • Create TF-IDF encodings and semantic embeddings for these chunks
    • Use BM25 to find top chunks based on exact matches
    • Use embeddings to find top chunks based on semantic similarity
    • Combine and deduplicate the results of the BM25 and embedding searches using rank fusion techniques
    • Add the top-K chunks to the prompt to generate the response
  2. Modifying chunks to carry context from the document they belong to


    • Pass each chunk along with the entire document to an LLM to add context to the chunk
    • Use the prompt
      prompt = '''
      <document>
      {{WHOLE_DOCUMENT}}
      </document>
      Here is the chunk we want to situate within the whole document
      <chunk>
      {{CHUNK_CONTENT}}
      </chunk>
      Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
      '''
    • Use the contextualized chunks for TF-IDF encoding and embeddings (a sketch of this step follows)
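
Below is a minimal sketch of the contextualization step, assuming the langchain-groq package and the Groq-hosted Llama 3 8B model listed later in this README; the helper name situate_chunk is illustrative, not the repository's actual code:

  from langchain_groq import ChatGroq

  llm = ChatGroq(model="llama3-8b-8192")  # Llama 3 8B hosted on Groq

  # Same prompt as above; single braces here because it is filled via .format()
  CONTEXT_PROMPT = """<document>
  {whole_document}
  </document>
  Here is the chunk we want to situate within the whole document
  <chunk>
  {chunk_content}
  </chunk>
  Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else."""

  def situate_chunk(whole_document: str, chunk: str) -> str:
      """Return the chunk with an LLM-generated context line prepended."""
      context = llm.invoke(
          CONTEXT_PROMPT.format(whole_document=whole_document, chunk_content=chunk)
      ).content
      return f"{context}\n{chunk}"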


How Contextual Retrieval RAG works in Bruno

  • Upon file upload

    • Parsing text from PDF

      Uses PyPDFLoader from LangChain to extract text from each page of the provided PDF file, combining it into a single string.
    • Splitting the text into chunks

      Uses RecursiveCharacterTextSplitter to split the text into chunks.
    • Adding context to chunks

      Runs LLM inference for each chunk, asking the model to rewrite the chunk so that it carries context from the entire document. This step is skipped if the document is too big.
    • Creating TF-IDF encodings

      Creates TF-IDF encodings implicitly while initialising the BM25 retriever with the new chunks.
    • Creating vector embeddings

      Creates vector embeddings for the new chunks using a sentence-transformers embedding model from HuggingFace.
    • Store vector embeddings in Pinecone

      Stores the vector embeddings in a Pinecone vector database and initialises a vector retriever over it.
    • Define ensemble retriever

      Creates an ensemble retriever from the BM25 retriever and the Pinecone vector retriever, which implicitly performs reranking and deduplication. A sketch of this upload path follows.
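
    A minimal sketch of the upload path, assuming PINECONE_API_KEY is set in the environment, a Pinecone index named "bruno" already exists, and the situate_chunk helper from the earlier sketch; the file name, chunk sizes, and size threshold are illustrative:

      from langchain_community.document_loaders import PyPDFLoader
      from langchain_community.retrievers import BM25Retriever
      from langchain_huggingface import HuggingFaceEmbeddings
      from langchain_pinecone import PineconeVectorStore
      from langchain_text_splitters import RecursiveCharacterTextSplitter
      from langchain.retrievers import EnsembleRetriever

      # Parse text from every page of the PDF and join it into a single string
      pages = PyPDFLoader("uploaded.pdf").load()
      full_text = "\n".join(page.page_content for page in pages)

      # Split the text into chunks
      splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
      chunks = splitter.split_text(full_text)

      # Add document-level context to every chunk; skipped for very large documents
      if len(full_text) < 50_000:  # illustrative size threshold
          chunks = [situate_chunk(full_text, chunk) for chunk in chunks]

      # The BM25 retriever builds its TF-IDF encodings implicitly from the chunks
      bm25_retriever = BM25Retriever.from_texts(chunks)
      bm25_retriever.k = 4

      # Embed the chunks, store them in Pinecone, and expose a vector retriever
      embeddings = HuggingFaceEmbeddings(
          model_name="sentence-transformers/all-mpnet-base-v2"
      )
      store = PineconeVectorStore.from_texts(chunks, embeddings, index_name="bruno")
      vector_retriever = store.as_retriever(search_kwargs={"k": 4})

      # The ensemble retriever fuses both result lists with reciprocal rank
      # fusion and removes duplicate chunks
      ensemble_retriever = EnsembleRetriever(
          retrievers=[bm25_retriever, vector_retriever], weights=[0.5, 0.5]
      )
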
  • Inference from RAG pipeline

    • History-based prompt modification

      Modifies the prompt based on previous chat messages so that it carries their context.
    • Retrieving chunks using similarity and keyword search

      Uses the new prompt to perform similarity search on the vector database and retrieve the most relevant chunks. Likewise, uses the prompt with the BM25 retriever to get keyword-based results.
    • Rerank and deduplicate

      The ensemble retriever implicitly handles reranking all the retrieved chunks and removing duplicates.
    • LLM prompting after augmentation

      Llama 3 8B is prompted with the new history-based prompt and the context from the PDF (the most relevant modified chunks).
    • Streaming the result and storing it in chat history

      The resulting answer is streamed, and the original prompt and the answer are appended to the temporary chat history. A sketch of this inference path follows.
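
A minimal sketch of the inference path, reusing ensemble_retriever from the upload sketch above; the rewrite-prompt wording and the in-memory per-session history are illustrative rather than Bruno's actual code:

  from langchain_core.messages import AIMessage, HumanMessage
  from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
  from langchain_groq import ChatGroq

  llm = ChatGroq(model="llama3-8b-8192")
  chat_history = []  # kept per session so multiple users don't share context

  # History-based prompt modification: fold earlier turns into the question
  rewrite_prompt = ChatPromptTemplate.from_messages([
      ("system", "Rewrite the user's question as a standalone question, "
                 "using the chat history for any missing context."),
      MessagesPlaceholder("chat_history"),
      ("human", "{question}"),
  ])

  def answer(question: str):
      standalone = (rewrite_prompt | llm).invoke(
          {"chat_history": chat_history, "question": question}
      ).content

      # Retrieve with BM25 + vector search; the ensemble reranks and deduplicates
      docs = ensemble_retriever.invoke(standalone)
      context = "\n\n".join(doc.page_content for doc in docs)

      # Prompt the LLM with the augmented question and stream the answer
      full_answer = ""
      for part in llm.stream(f"Context:\n{context}\n\nQuestion: {standalone}"):
          full_answer += part.content
          yield part.content  # streamed token by token to the client

      # Append the original question and the final answer to the session history
      chat_history.append(HumanMessage(question))
      chat_history.append(AIMessage(full_answer))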

Specifications used

  1. LLM: Llama 3 8B
  2. LLM hosting: Groq
  3. Parser: PyPDFLoader from LangChain
  4. Chunking: contextual chunking
  5. Embedding model: sentence-transformers/all-mpnet-base-v2 from HuggingFace
  6. Retriever: ensemble retriever (BM25 retriever + vector DB retriever) with implicit reranking
  7. Vector database: Pinecone

Application

  • Frontend: Flutter
  • Backend: FastAPI
  • Frontend deployment: Firebase hosting
  • Backend deployment: as a Docker image on Render

Upcoming features in Bruno 3.0

  • Handling images and tabular data from PDFs
  • Supporting multiple file uploads
  • Including authentication and storing chats in a database
  • Including web search