persuaide

Main files are:

extract.py: Converts any PDFs stored in /data/raw to .txt files saved in /data/extracted
rag_for_hybrid_search.py: Initial implementation of chunking, embedding, and upserting to the Pinecone DB. Extracts the author and title from each .txt file, adds them to the vector metadata, and upserts the generated vectors (with IDs and metadata) to Pinecone.
query.py: Helper to query the DB and parse the response
api.py: Localhost API to query the DB
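
To make the chunking and metadata step described for rag_for_hybrid_search.py more concrete, here is a minimal sketch. It is not the code from this repository: the names `ChunkRecord`, `chunk_text`, and `build_records`, the character-based windows, and the default `chunk_size` and `overlap` values are all assumptions chosen for illustration. Embedding and the actual Pinecone upsert are left out because they need an API key and the Pinecone client.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """One upsert-ready record: an id, the chunk text, and its metadata."""
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`.

    Hypothetical helper: the real script may chunk by tokens or sentences.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def build_records(
    doc_id: str,
    text: str,
    title: str,
    author: str,
    chunk_size: int = 500,
    overlap: int = 50,
) -> list[ChunkRecord]:
    """Attach title and author metadata to every chunk of one document."""
    return [
        ChunkRecord(
            id=f"{doc_id}-{i}",
            text=chunk,
            metadata={"title": title, "author": author, "chunk_index": i},
        )
        for i, chunk in enumerate(chunk_text(text, chunk_size, overlap))
    ]
```

Each record would then be embedded and upserted with its id and metadata. Overlapping windows are one common way to keep a sentence that straddles a chunk boundary retrievable from either side.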

To use:

Install Python
Create a virtualenv, activate it, and install dependencies with pip install -r requirements.txt

The next two steps are only necessary if you are adding new content to the DB:

Run extract.py to convert files from .pdf to .txt
Run rag_for_hybrid_search.py to create vector embeddings and upsert them to the Pinecone DB
Run fastapi dev api.py to start the localhost API, then go to http://127.0.0.1:8000/docs to test the API routes
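
As an alternative to browsing the /docs page, the routes can also be listed from a script. The sketch below relies on FastAPI's default behaviour of serving the OpenAPI schema at /openapi.json; it assumes api.py has not changed that default. The helper names `fetch_schema` and `list_routes` are made up for this example and are not part of the repository.

```python
import json
from urllib.request import urlopen

# HTTP verbs that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the OpenAPI schema (FastAPI serves it at /openapi.json by default)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def list_routes(schema: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


# With the dev server running:
# for method, path in list_routes(fetch_schema()):
#     print(method, path)
```

Keeping the parsing separate from the network call means `list_routes` can be checked against a hand-written schema dict without starting the server.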
