GitHub - arndt-netapp/rag_llama2: Sample code to implement retrieval-augmented generation with a Llama2 model

Sample code to implement a Llama2 chatbot with Langchain and LlamaCpp. as well as a Llama2 chatbot with RAG using a Chroma vector database.

This is meant to be used with Python3 in a venv, with all required libraries installed via pip.

arndt@rag:~$ python3 -m venv rag
arndt@rag:~$ . rag/bin/activate
(rag) arndt@rag:~$ 
pip install llama-cpp-python
pip install langchain
pip install sentence-transformers
pip install chromadb
pip install chardet
pip install netapp-dataops-traditional

The llama-2-7b-chat.Q4_K_M.gguf model will run on CPU and 8GB of ram, but will run faster on GPU :). Note the following configuration settings and llama-cpp-python installation comamands to get GPU support. This is to be done after the CUDA libraries are installed, which match the driver details as found in the "nvidia-smi" command output.

export PATH="${PATH}:/usr/local/cuda/bin"
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md
chat.py		chat.py
rag_chat.py		rag_chat.py
rag_db_add.py		rag_db_add.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

arndt-netapp/rag_llama2

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages