This is a working example of Frank Denneman's article *RAG Architecture Deep Dive*, which defines the Load-Transform-Embed-Store workflow for building RAG applications.
The RAG Step-by-Step example consists of three scripts:

- `create_embeddings.py`: splits the transcripts into chunks and creates embedding vectors from the data
- `insert_embeddings.py`: inserts the embeddings into the PostgreSQL vector database
- `app.py`: a Streamlit client for querying the database and prompting the LLM
This example uses Ollama running the llama2 model and PostgreSQL with the pgvector extension. These can be installed locally, or you can use [localstack_ai](https://github.com/spara/localstack_ai), a containerized environment that includes Ollama and PostgreSQL with pgvector. Docker and Docker Compose are required to run localstack_ai.
To start localstack_ai:

```shell
git clone https://github.com/spara/localstack_ai.git
cd ./localstack_ai
docker compose -f ollama_stack.yml up
cd ./ollama
ollama pull llama2
```
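Once the stack is up and the model has been pulled, you can optionally confirm that Ollama is reachable before running the scripts. The snippet below is a minimal sketch, not part of the repository; it assumes Ollama is listening on its default port, 11434.

```python
# sanity_check.py -- optional: verify that Ollama is running and llama2 is available.
# Assumes Ollama's default address http://localhost:11434.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

models = [m["name"] for m in resp.json().get("models", [])]
print("Models available to Ollama:", models)

if not any(name.startswith("llama2") for name in models):
    print("llama2 is not pulled yet -- run `ollama pull llama2` first.")
```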
- Clone the repository and install the packages:

  ```shell
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```
- Get data. A script for downloading YouTube video transcripts is available, but any set of plain text documents will also work with this example.
- Run `python3 create_embeddings.py` to parse the text documents and create embeddings. Change the directory to the location of the text files and the name of the output file. A sketch of this chunk-and-embed step follows the list.
- Create a PostgreSQL database. Use either psql or a client such as pgAdmin or DBeaver. The example uses a database called `items`.
- Run `python3 insert_embeddings.py` to insert the records into PostgreSQL. A sketch covering table creation and insertion with pgvector also follows the list.
- Run `streamlit run app.py` to start the client. Note that the `database_search` function in the application includes queries for cosine similarity, L2 distance, and inner product metrics. To experiment with similarity metrics, uncomment the chosen metric and comment out the others. The last sketch below shows the three pgvector operators side by side.
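The steps above can be pictured with a few minimal sketches. They are illustrations only, not the repository's code, and make some assumptions noted in the comments. First, the chunk-and-embed step (`create_embeddings.py`): a naive fixed-size character split and a call to Ollama's `/api/embeddings` endpoint on the default port. The real script's chunking strategy, directory layout, and output format may differ.

```python
# chunk_and_embed_sketch.py -- illustrative only; create_embeddings.py may differ.
import json
import pathlib
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # assumes the default Ollama port
CHUNK_SIZE = 1000  # characters; a real script might split on sentences or tokens instead

def chunk_text(text, size=CHUNK_SIZE):
    """Naive fixed-size character chunking."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk):
    """Ask Ollama (llama2) for an embedding of a single chunk."""
    resp = requests.post(OLLAMA_URL, json={"model": "llama2", "prompt": chunk})
    resp.raise_for_status()
    return resp.json()["embedding"]

records = []
for path in pathlib.Path("./transcripts").glob("*.txt"):  # assumed location of the text files
    text = path.read_text(encoding="utf-8")
    for chunk in chunk_text(text):
        records.append({"source": path.name, "content": chunk, "embedding": embed(chunk)})

# Save the records so the insert step can load them.
pathlib.Path("embeddings.json").write_text(json.dumps(records), encoding="utf-8")
```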
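Next, the database and insert steps (`insert_embeddings.py`), sketched with psycopg2 and pgvector. The connection settings, table definition, column names, and the 4096-dimension assumption (the size of llama2 embeddings returned by Ollama) are all assumptions to adjust for your setup.

```python
# insert_sketch.py -- illustrative only; insert_embeddings.py may differ.
import json
import psycopg2

# Connection settings are assumptions; adjust for your PostgreSQL instance.
conn = psycopg2.connect(host="localhost", dbname="items", user="postgres", password="postgres")
cur = conn.cursor()

# Enable pgvector and create a table for the chunks.
# llama2 embeddings from Ollama are 4096-dimensional; change the size if your model differs.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id bigserial PRIMARY KEY,
        source text,
        content text,
        embedding vector(4096)
    );
""")

# Load the records produced by the embedding step and insert them.
records = json.loads(open("embeddings.json", encoding="utf-8").read())
for rec in records:
    cur.execute(
        "INSERT INTO items (source, content, embedding) VALUES (%s, %s, %s)",
        # pgvector accepts the '[x, y, ...]' text form, so str() of the list is enough here
        (rec["source"], rec["content"], str(rec["embedding"])),
    )

conn.commit()
cur.close()
conn.close()
```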
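Finally, the retrieval side of `app.py`: a `database_search`-style query with the three pgvector metrics, plus a prompt to llama2 through Ollama's `/api/generate` endpoint. This sketch reuses the assumptions above (an `items` table, 4096-dimensional llama2 embeddings, Ollama and PostgreSQL on localhost) and is not the application's actual code; as with the real script, uncomment the metric you want to compare and comment out the others.

```python
# query_sketch.py -- illustrative only; app.py's database_search may differ.
import psycopg2
import requests

OLLAMA = "http://localhost:11434"  # assumes the default Ollama port

def embed(text):
    """Embed the question with the same model used for the documents."""
    resp = requests.post(f"{OLLAMA}/api/embeddings", json={"model": "llama2", "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def database_search(question, limit=5):
    """Return the stored chunks closest to the question; pick one pgvector metric."""
    conn = psycopg2.connect(host="localhost", dbname="items", user="postgres", password="postgres")
    cur = conn.cursor()
    vec = str(embed(question))
    # Cosine distance (<=>) is active; swap the comments to try another metric.
    cur.execute("SELECT content FROM items ORDER BY embedding <=> %s::vector LIMIT %s", (vec, limit))
    # cur.execute("SELECT content FROM items ORDER BY embedding <-> %s::vector LIMIT %s", (vec, limit))  # L2 distance
    # cur.execute("SELECT content FROM items ORDER BY embedding <#> %s::vector LIMIT %s", (vec, limit))  # negative inner product
    rows = [row[0] for row in cur.fetchall()]
    cur.close()
    conn.close()
    return rows

def answer(question):
    """Prompt llama2 with the retrieved chunks as context."""
    context = "\n\n".join(database_search(question))
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {question}"
    resp = requests.post(f"{OLLAMA}/api/generate",
                         json={"model": "llama2", "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

print(answer("What is retrieval-augmented generation?"))
```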