Demo video: `snowbrain_v2_wide.mp4`
snowBrain is an open-source prototype that serves as your personal data analyst. It converses in SQL, remembers previous conversations, and even draws data visualizations for you.

This project is a unique blend of Snowflake, LangChain, OpenAI, Pinecone, Next.js, and FastAPI, among other technologies. It's all about reimagining the simplicity of SQL querying. Dive in and discover a new way of interacting with your data. The stack:
- Snowflake - Data cloud
- Next.js - Frontend & backend
- Supabase - Database for persisting chat messages
- Tailwind CSS - Styling
- Pinecone - Vector database
- OpenAI - LLM
- LangChain - LLM wrapper
- Cloudinary - Image storage
- Clerk.dev - Authentication
- Upstash Redis - Rate limiting
- FastAPI - Python backend
- Modal Labs - FastAPI backend hosting
- Vercel - Hosting
- umami - Web analytics
Features:

- Snowflake to Vector Database: Automatically convert all Snowflake DDL into a vector database.
- Conversational Memory: Maintain context across the conversation to improve the quality of interactions.
- Snowflake Integration: Integrate with your Snowflake schema for automatic SQL generation and visualization.
- Pinecone Vector Database: Leverage Pinecone for efficient similarity search over the embedded DDL (see the sketch after this list).
- Secure Authentication: Employ Clerk.dev for secure, hassle-free user authentication.
- Rate Limit Handling: Use Upstash Redis to manage rate limits.
- FastAPI: A high-performance Python web framework powering the backend API.
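Under the hood, the query flow is roughly: embed the user's question, retrieve the most relevant DDL from Pinecone, and hand both to the LLM to generate SQL. Here is a minimal sketch of that flow, assuming the pre-1.0 `openai` and `pinecone-client` SDKs; the index name, prompt, and model choice are illustrative assumptions, not the project's exact implementation:

```python
# Hedged sketch of retrieval-augmented SQL generation; the index name,
# prompt, and model are assumptions, not snowBrain's exact code.
import os
import openai
import pinecone

openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment=os.environ["PINECONE_ENVIRONMENT"],
)
index = pinecone.Index("snowbrain-ddl")  # hypothetical index name

question = "Show me the total revenue for each product category."

# Embed the question and fetch the closest DDL documents.
q_vec = openai.Embedding.create(
    input=question, model="text-embedding-ada-002"
)["data"][0]["embedding"]
matches = index.query(vector=q_vec, top_k=3, include_metadata=True).matches
schema = "\n\n".join(m.metadata["text"] for m in matches)

# Ask the model for SQL grounded in the retrieved schema.
completion = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": f"You write Snowflake SQL. Schema:\n{schema}"},
        {"role": "user", "content": question},
    ],
)
print(completion["choices"][0]["message"]["content"])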
snowBrain is designed to make complex data querying simple. Here are some example queries you can try:
- Total revenue per product category: "Show me the total revenue for each product category."
- Top customers by sales: "Who are the top 10 customers by sales?"
- Average order value per region: "What is the average order value for each region?"
- Order volume: "How many orders were placed last week?"
- Product price listing: "Display the list of products with their prices."
Follow these steps to get snowBrain up and running in your local environment.
1. **Update environment variables**

   Make sure to update the environment variables as necessary, using `.env.example` as a reference.
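   For orientation, a typical `.env` for this stack might include keys like the following; the names besides `MODAL_API_ENDPOINT` and `MODAL_AUTH_TOKEN` are assumptions, so treat `.env.example` as authoritative:

   ```bash
   # Illustrative keys only; check .env.example for the real list.
   OPENAI_API_KEY=
   PINECONE_API_KEY=
   PINECONE_ENVIRONMENT=
   MODAL_API_ENDPOINT=
   MODAL_AUTH_TOKEN=
   ```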
2. **Auto-fetch all schema DDL**

   Install the requirements first:

   ```bash
   pip3 install -r embed/requirements.txt
   ```

   Then fetch the DDL by running:

   ```bash
   python3 embed/snowflake_ddl_fetcher.py
   ```
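   If you're curious what this step does, here's a minimal sketch of how a DDL fetcher like `embed/snowflake_ddl_fetcher.py` could work with `snowflake-connector-python`; the connection parameters and `ddl/` output directory are assumptions:

   ```python
   # Hedged sketch of a Snowflake DDL fetcher; connection parameters and
   # the ddl/ output directory are assumptions, not the script's real layout.
   import os
   import snowflake.connector

   conn = snowflake.connector.connect(
       account=os.environ["SNOWFLAKE_ACCOUNT"],
       user=os.environ["SNOWFLAKE_USER"],
       password=os.environ["SNOWFLAKE_PASSWORD"],
       warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
       database=os.environ["SNOWFLAKE_DATABASE"],
       schema=os.environ["SNOWFLAKE_SCHEMA"],
   )
   cur = conn.cursor()

   # SHOW TABLES lists every table in the current schema; column 1 is the name.
   cur.execute("SHOW TABLES")
   tables = [row[1] for row in cur.fetchall()]

   os.makedirs("ddl", exist_ok=True)
   for table in tables:
       # GET_DDL returns the CREATE statement for the object.
       cur.execute(f"SELECT GET_DDL('TABLE', '{table}')")
       with open(f"ddl/{table}.sql", "w") as f:
           f.write(cur.fetchone()[0])

   cur.close()
   conn.close()
   ```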
3. **Convert DDL documents to vectors & upload to Pinecone**

   Use the following command to do this:

   ```bash
   python3 embed/embed.py
   ```
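   A minimal sketch of what this step amounts to, again assuming the pre-1.0 `openai` and `pinecone-client` SDKs; the index name and `ddl/` input directory are assumptions:

   ```python
   # Hedged sketch of embedding DDL files and upserting to Pinecone; the
   # index name and ddl/ input directory are assumptions.
   import glob
   import os
   import openai
   import pinecone

   openai.api_key = os.environ["OPENAI_API_KEY"]
   pinecone.init(
       api_key=os.environ["PINECONE_API_KEY"],
       environment=os.environ["PINECONE_ENVIRONMENT"],
   )
   index = pinecone.Index("snowbrain-ddl")  # hypothetical index name

   for path in glob.glob("ddl/*.sql"):
       with open(path) as f:
           ddl = f.read()
       # One embedding per DDL document; ada-002 returns a 1536-dim vector.
       vector = openai.Embedding.create(
           input=ddl, model="text-embedding-ada-002"
       )["data"][0]["embedding"]
       index.upsert(vectors=[(os.path.basename(path), vector, {"text": ddl})])
   ```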
4. **Install dependencies for the code plugin**

   Navigate to the code plugin directory and install the necessary dependencies using Poetry:

   ```bash
   cd code-plugin && poetry install
   ```
5. **Deploy FastAPI to Modal Labs**

   Deploy your FastAPI app (make sure to add a secrets file in Modal Labs first):

   ```bash
   modal deploy main.py
   ```
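   For orientation, a Modal-hosted FastAPI app typically has this shape (a sketch using the 2023-era Modal `Stub` API; the app name and secret name are assumptions):

   ```python
   # Sketch of a Modal-deployed FastAPI app; names are assumptions.
   import modal

   stub = modal.Stub("snowbrain-api")  # hypothetical app name
   image = modal.Image.debian_slim().pip_install("fastapi")

   @stub.function(image=image, secrets=[modal.Secret.from_name("snowbrain-secrets")])
   @modal.asgi_app()
   def app():
       from fastapi import FastAPI

       api = FastAPI()

       @api.get("/health")
       def health():
           return {"ok": True}

       return api
   ```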
   After deploying, make sure to store the endpoint and auth token in your environment variables:

   ```bash
   MODAL_API_ENDPOINT=
   MODAL_AUTH_TOKEN=random_secret
   ```
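   To sanity-check the deployment, you could hit the endpoint directly; the route name, auth scheme, and payload shape below are assumptions, not snowBrain's documented API:

   ```python
   # Hypothetical smoke test; the /query route and payload are assumptions.
   import os
   import requests

   resp = requests.post(
       f"{os.environ['MODAL_API_ENDPOINT']}/query",  # hypothetical route
       headers={"Authorization": f"Bearer {os.environ['MODAL_AUTH_TOKEN']}"},
       json={"question": "Show me the total revenue for each product category."},
   )
   print(resp.status_code, resp.text)
   ```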
6. **Install packages**

   ```bash
   bun install
   ```
7. **Run locally**

   Test the setup locally:

   ```bash
   bun run dev
   ```

   Then test the build:

   ```bash
   bun run build
   ```
8. **Deploy to Vercel**

   Finally, when you're ready, deploy the project to Vercel.

   Note: the Vercel build is automatically skipped for changes that only touch the `code-plugin` and `embed` folders or `readme.md`. You can additionally add an Ignored Build Step command in Vercel's dashboard.
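   For reference, an Ignored Build Step for this layout might look like the following (a sketch of Vercel's standard `git diff` pattern; adjust the paths to your repo):

   ```bash
   # Exit 0 (skip the build) when nothing outside code-plugin, embed,
   # and readme.md changed since the previous commit.
   git diff --quiet HEAD^ HEAD -- . ':!code-plugin' ':!embed' ':!readme.md'
   ```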
Here's how you can contribute:

- Open an issue if you believe you've encountered a bug.
- Make a pull request to add new features, make improvements, or fix bugs.
Thanks to @jaredpalmer, @shuding_, @shadcn, @thorwebdev