Welcome to the source repository of quora_vector_search.

Get Quora question search results from a broad query using vector search. Powered by pgvector for Postgres and all-MiniLM-L6-v2 embeddings model.

Getting Started

Prerequisites

Python 3
PostgreSQL database
pgvector Postgres extension - follow the instructions at https://github.com/pgvector/pgvector to get up and running with the extension
Quora questions dataset — A plain text file with one question per line. Save it as dataset_quora_questions.txt. I used the question1 column of this Kaggle dataset: https://www.kaggle.com/datasets/sambit7/first-quora-dataset

Instructions

Database

Prepare your PostgreSQL database by running the commands in schema.sql in the database shell.

Make a copy of template.env, rename it to .env, and populate it with the info of your PostgreSQL database.

Workspace

Create a virtual environment: python3 -m venv .venv
Activate the virtual environment in the manner according to your OS/platform.

Install project dependencies: pip install -r requirements.txt

Embeddings

Run the script embeddings_generate.py. This will read the dataset in the file dataset_quora_questions.txt, generate embeddings for the questions, and store the questions along with their embeddings in the PostgreSQL database with the pgvector extension.

Important

By default, the script only considers the first 10,000 lines of the dataset for the sake of my own computer. If you want to use the full dataset, replace for line in lines[:10_000]: with for line in lines: in the embeddings_generate.py script.

Running the REST API

After performing the previous step, execute the shell script start_server.sh to start the question search REST API, then visit the URL printed in the console.

You can search for a question by making a GET request to: https://<URL>/search/Your+question+goes+here

The top 5 results from the database will be returned. Example response:

{"results":[["How can I speed up my Internet connectionn?"],["How can I increase the traffic to a website?"],["How can I increase the traffic to my website?"],["How can I increase the speed of my internet connection while using a VPN?"],["How do I  increase traffic on my site?"]]}

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
embeddings_generate.py		embeddings_generate.py
requirements.txt		requirements.txt
schema.sql		schema.sql
server.py		server.py
start_server.sh		start_server.sh
template.env		template.env

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Welcome to the source repository of quora_vector_search.

Getting Started

Prerequisites

Instructions

Database

Workspace

Embeddings

Running the REST API

License

About

Languages

License

xtt28/quora_vector_search

Folders and files

Latest commit

History

Repository files navigation

Welcome to the source repository of quora_vector_search.

Getting Started

Prerequisites

Instructions

Database

Workspace

Embeddings

Running the REST API

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages