From ce49ccd495e379518fb59f33cea252d903781733 Mon Sep 17 00:00:00 2001
From: Alonso Silva Allende
Date: Tue, 13 Aug 2024 23:03:01 +0200
Subject: [PATCH] Change cookbook examples: Download model weights in the hub cache folder (#1097)

Change cookbook examples: Download model weights in the hub cache folder
---
 docs/cookbook/chain_of_thought.md           | 56 +++++++++++++-------
 docs/cookbook/knowledge_graph_extraction.md | 56 +++++++++++++-------
 docs/cookbook/qa-with-citations.md          | 55 +++++++++++++-------
 docs/cookbook/react_agent.md                | 57 +++++++++++++--------
 4 files changed, 148 insertions(+), 76 deletions(-)

diff --git a/docs/cookbook/chain_of_thought.md b/docs/cookbook/chain_of_thought.md
index cc079a7f..17c36269 100644
--- a/docs/cookbook/chain_of_thought.md
+++ b/docs/cookbook/chain_of_thought.md
@@ -11,30 +11,48 @@ We use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp-
 pip install llama-cpp-python
 ```
 
-We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/):
+We download the model weights by passing the name of the repository on the HuggingFace Hub and the file name (or a glob pattern):
 
+```python
+import llama_cpp
+from outlines import generate, models
+
-```bash
-wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf
+model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",
+                        "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
+                        tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
+                            "NousResearch/Hermes-2-Pro-Llama-3-8B"
+                        ),
+                        n_gpu_layers=-1,
+                        flash_attn=True,
+                        n_ctx=8192,
+                        verbose=False)
 ```
 
-We initialize the model:
+??? note "(Optional) Store the model weights in a custom folder"
note "(Optional) Store the model weights in a custom folder" -```python -from llama_cpp import Llama -from outlines import generate, models + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -model = models.LlamaCpp(llm) -``` + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` ## Chain of thought diff --git a/docs/cookbook/knowledge_graph_extraction.md b/docs/cookbook/knowledge_graph_extraction.md index c4c1dc75..e25166bc 100644 --- a/docs/cookbook/knowledge_graph_extraction.md +++ b/docs/cookbook/knowledge_graph_extraction.md @@ -8,30 +8,48 @@ We will use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama pip install llama-cpp-python ``` -We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): +We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern): +```python +import llama_cpp +from outlines import generate, models -```bash -wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf +model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF", + "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False) ``` -We initialize the model: +??? 
note "(Optional) Store the model weights in a custom folder" -```python -from llama_cpp import Llama -from outlines import generate, models + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -model = models.LlamaCpp(llm) -``` + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` ## Knowledge Graph Extraction diff --git a/docs/cookbook/qa-with-citations.md b/docs/cookbook/qa-with-citations.md index c2111617..79a2214c 100644 --- a/docs/cookbook/qa-with-citations.md +++ b/docs/cookbook/qa-with-citations.md @@ -8,29 +8,48 @@ We will use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama pip install llama-cpp-python ``` -We pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): +We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern): +```python +import llama_cpp +from outlines import generate, models -```bash -wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf +model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF", + "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False) ``` -We initialize the model: +??? 
note "(Optional) Store the model weights in a custom folder" -```python -from llama_cpp import Llama -from outlines import generate, models + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -``` + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` ## Generate Synthetic Data diff --git a/docs/cookbook/react_agent.md b/docs/cookbook/react_agent.md index 15fb964a..ca4829d5 100644 --- a/docs/cookbook/react_agent.md +++ b/docs/cookbook/react_agent.md @@ -12,32 +12,49 @@ We use [llama.cpp](https://github.com/ggerganov/llama.cpp) using the [llama-cpp- pip install llama-cpp-python ``` -We pull a quantized GGUF model, in this guide we pull [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): - -```bash -wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf -``` - -We initialize the model: - +We download the model weights by passing the name of the repository on the HuggingFace Hub, and the filenames (or glob pattern): ```python import llama_cpp -from llama_cpp import Llama from outlines import generate, models -llm = Llama( - "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", - tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( - "NousResearch/Hermes-2-Pro-Llama-3-8B" - ), - n_gpu_layers=-1, - flash_attn=True, - n_ctx=8192, - verbose=False -) -model = models.LlamaCpp(llm) +model = models.llamacpp("NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF", + "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False) ``` +??? 
note "(Optional) Store the model weights in a custom folder" + + By default the model weights are downloaded to the hub cache but if we want so store the weights in a custom folder, we pull a quantized GGUF model [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B-GGUF) by [NousResearch](https://nousresearch.com/) from [HuggingFace](https://huggingface.co/): + + ```bash + wget https://hf.co/NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF/resolve/main/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf + ``` + + We initialize the model: + + ```python + import llama_cpp + from llama_cpp import Llama + from outlines import generate, models + + llm = Llama( + "/path/to/model/Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf", + tokenizer=llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained( + "NousResearch/Hermes-2-Pro-Llama-3-8B" + ), + n_gpu_layers=-1, + flash_attn=True, + n_ctx=8192, + verbose=False + ) + ``` + ## Build a ReAct agent In this example, we use two tools: