Merge pull request #241 from NexaAI/unseenmars-readme-3

Update README.md
NexaAI · Nov 13, 2024 · 519d699
2 parents e2023ba + 1fefdbb
commit 519d699
Showing 1 changed file with 17 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -254,6 +254,23 @@ Supported model examples (full list at [Model Hub](https://nexa.ai/models)):
 | [all-MiniLM-L12-v2](https://nexa.ai/sentence-transformers/all-MiniLM-L12-v2/gguf-fp16/readme) | Embedding | GGUF | `nexa embed all-MiniLM-L12-v2:fp16` |
 | [bark-small](https://nexa.ai/suno/bark-small/gguf-fp16/readme) | Text-to-Speech | GGUF | `nexa run bark-small:fp16` |
 
+## Run Models from 🤗 Hugging Face
+You can pull, convert (to .gguf), quantize, and run [llama.cpp-supported](https://github.com/ggerganov/llama.cpp#description) text generation models from Hugging Face with the Nexa SDK.
+### Run .gguf File
+Use `nexa run -hf <hf-model-id>` to run a model whose Hugging Face repository already provides .gguf files:
+```bash
+nexa run -hf Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
+```
+> **Note:** You will be prompted to select a single .gguf file. If the quantization you want is split across multiple files (for example, fp16-00001-of-00004), use Nexa's conversion tool (see below) to convert and quantize the model locally instead.
+### Convert .safetensors Files
+Install the [Nexa Python package](https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-2-python-package), install the Nexa conversion tool with `pip install "nexaai[convert]"`, then convert a model with `nexa convert <hf-model-id>`:
+```bash
+nexa convert HuggingFaceTB/SmolLM2-135M-Instruct
+```
+> **Note:** See our [leaderboard](https://nexa.ai/leaderboard) for performance benchmarks of different quantized versions of mainstream language models, and the [Hugging Face docs](https://huggingface.co/docs/optimum/en/concept_guides/quantization) to learn about quantization options.
+
+📋 You can view downloaded and converted models with `nexa list`.
+
 ## Documentation
 
 > [!NOTE]