RAG Step-by-Step with Open Source

This is a working example of Frank Denneman's article RAG Architecture Deep Dive, which defines the Load-Transform-Embed-Store workflow for building RAG applications.

The example consists of three scripts:

  • create_embeddings.py: splits the transcripts into chunks and creates embedding vectors from the data
  • insert_embeddings.py: inserts the embeddings into a PostgreSQL database with the pgvector extension
  • app.py: a Streamlit client for querying the database and prompting the llama2 model served by Ollama

Requirements

This example uses Ollama running the llama2 model and PostgreSQL with the pgvector extension. Both can be installed locally.

localstack_ai is a containerized environment that provides both Ollama and PostgreSQL with the pgvector extension. To use localstack_ai, Docker and Docker Compose are required.

To start localstack_ai:

git clone https://github.com/spara/localstack_ai.git
cd ./localstack_ai
docker compose -f ollama_stack.yml up
cd ./ollama
ollama pull llama2
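
After the model is pulled, a quick check that Ollama is up and serving embeddings can help before running the scripts. The snippet below is a minimal sketch, assuming Ollama is reachable on its default port 11434 on localhost and that the requests package is installed; the test prompt is arbitrary.

import requests

# Request an embedding for a short test string from the local Ollama server.
# Assumes Ollama is listening on its default port (11434) and llama2 has been pulled.
response = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "llama2", "prompt": "Hello, RAG!"},
    timeout=120,
)
response.raise_for_status()
print("embedding length:", len(response.json()["embedding"]))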

Running the examples

  1. Clone the repository and install the packages:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
  2. Get data. A script for downloading YouTube video transcripts is available, but any set of plain text documents will also work with this example.

  3. Run python3 create_embeddings.py to parse the text documents and create embeddings. Edit the script to point to the directory containing the text files and to set the name of the output file. A sketch of this chunk-and-embed step appears after this list.

  4. Create a PostgreSQL database. Use either psql or a client such as pgAdmin or DBeaver. The example uses a database called items. A sketch of the table setup and insert appears after this list.

  5. Run python3 insert_embeddings.py to insert the records into PostgreSQL.

  6. Run streamlit run app.py to start the client. Note that the database_search function in the application includes queries for cosine similarity, L2 distance, and inner product metrics. To experiment with similarity metrics, uncomment the chosen metric and comment out the others; the corresponding pgvector operators are shown in the last sketch after this list.
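
The chunk-and-embed step (step 3) follows roughly the pattern below. This is a minimal sketch rather than the repository's create_embeddings.py: it assumes the transcripts are plain .txt files in a transcripts/ directory, that fixed-size character chunks are acceptable, and that Ollama is serving llama2 on its default port. The output file name embeddings.json is also an assumption.

import json
import pathlib
import requests

CHUNK_SIZE = 1000  # characters per chunk; a simplifying assumption

def embed(text):
    # Ask the local Ollama server for an embedding of one chunk.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "llama2", "prompt": text},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["embedding"]

records = []
for path in pathlib.Path("transcripts").glob("*.txt"):
    text = path.read_text()
    # Split each transcript into fixed-size chunks and embed each chunk.
    for i in range(0, len(text), CHUNK_SIZE):
        chunk = text[i:i + CHUNK_SIZE]
        records.append({"source": path.name, "content": chunk, "embedding": embed(chunk)})

# Write the records to a file for the insert step.
with open("embeddings.json", "w") as f:
    json.dump(records, f)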
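
For steps 4 and 5, the table setup and insert can look like the following. This is a sketch under stated assumptions, not the repository's insert_embeddings.py: the table name items, the column names, the connection settings, and the 4096-dimension vector size (a common dimension for llama2 embeddings, but worth verifying against your output) are all assumptions.

import json
import psycopg2

# Connection settings are assumptions; adjust to your local PostgreSQL setup.
conn = psycopg2.connect("dbname=items user=postgres password=postgres host=localhost")
cur = conn.cursor()

# Enable pgvector and create a table for the chunks and their embeddings.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id bigserial PRIMARY KEY,
        source text,
        content text,
        embedding vector(4096)
    )
""")

# Insert the records produced by the embedding step.
with open("embeddings.json") as f:
    for record in json.load(f):
        # pgvector accepts a bracketed list literal such as "[0.1,0.2,...]".
        vector_literal = "[" + ",".join(str(x) for x in record["embedding"]) + "]"
        cur.execute(
            "INSERT INTO items (source, content, embedding) VALUES (%s, %s, %s)",
            (record["source"], record["content"], vector_literal),
        )

conn.commit()
cur.close()
conn.close()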
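
The similarity metrics mentioned in step 6 correspond to pgvector's distance operators: <-> for L2 distance, <=> for cosine distance, and <#> for negative inner product. The sketch below runs the cosine-distance variant against the assumed schema from the previous sketch; swap the operator in the query to try the other metrics.

import psycopg2
import requests

def embed(text):
    # Same assumed Ollama endpoint as in the earlier sketches.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "llama2", "prompt": text},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["embedding"]

# pgvector distance operators:
#   <->  L2 (Euclidean) distance
#   <=>  cosine distance
#   <#>  negative inner product
QUERY = "SELECT content FROM items ORDER BY embedding <=> %s LIMIT 5"

conn = psycopg2.connect("dbname=items user=postgres password=postgres host=localhost")
cur = conn.cursor()

question = "What is retrieval augmented generation?"
vector_literal = "[" + ",".join(str(x) for x in embed(question)) + "]"

cur.execute(QUERY, (vector_literal,))
for (content,) in cur.fetchall():
    print(content[:100])

cur.close()
conn.close()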