From 1fefdbba3491a3bf08c12d1bdde787ac2bea2ddd Mon Sep 17 00:00:00 2001
From: Haoguang Cai
Date: Wed, 13 Nov 2024 14:20:14 -0800
Subject: [PATCH] Update README.md

Add nexa run -hf and nexa convert instructions
---
 README.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/README.md b/README.md
index 19e44b39..474e3958 100644
--- a/README.md
+++ b/README.md
@@ -254,6 +254,35 @@ Supported model examples (full list at [Model Hub](https://nexa.ai/models)):
 | [all-MiniLM-L12-v2](https://nexa.ai/sentence-transformers/all-MiniLM-L12-v2/gguf-fp16/readme) | Embedding | GGUF | `nexa embed all-MiniLM-L12-v2:fp16` |
 | [bark-small](https://nexa.ai/suno/bark-small/gguf-fp16/readme) | Text-to-Speech | GGUF | `nexa run bark-small:fp16` |
 
+## Run Models from 🤗 HuggingFace
+You can pull, convert (to `.gguf`), quantize, and run [llama.cpp-supported](https://github.com/ggerganov/llama.cpp#description) text-generation models from Hugging Face with the Nexa SDK.
+### Run .gguf File
+Use `nexa run -hf <hf_model_id>` to run models that provide ready-made `.gguf` files:
+```bash
+nexa run -hf Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
+```
+> **Note:** You will be prompted to select a single `.gguf` file. If your desired quantization version is split across multiple files (like `fp16-00001-of-00004`), please use Nexa's conversion tool (see below) to convert and quantize the model locally.
+### Convert .safetensors Files
+Install the [Nexa Python package](https://github.com/NexaAI/nexa-sdk?tab=readme-ov-file#install-option-2-python-package) and the Nexa conversion tool with `pip install "nexaai[convert]"`, then convert models with `nexa convert <hf_model_id>`:
+```bash
+nexa convert HuggingFaceTB/SmolLM2-135M-Instruct
+```
+> **Note:** Check our [leaderboard](https://nexa.ai/leaderboard) for performance benchmarks of different quantized versions of mainstream language models, and the [HuggingFace docs](https://huggingface.co/docs/optimum/en/concept_guides/quantization) to learn about quantization options.
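+
+Putting the steps together, here is a minimal end-to-end sketch of the conversion workflow. It simply chains the commands shown above; swap in your own model id:
+```bash
+# install the conversion extras on top of the Nexa Python package
+pip install "nexaai[convert]"
+
+# pull the .safetensors model from Hugging Face, then convert and quantize it locally
+nexa convert HuggingFaceTB/SmolLM2-135M-Instruct
+
+# list downloaded and converted models to confirm the result
+nexa list
+```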
+
+📋 You can view downloaded and converted models with `nexa list`
+
 ## Documentation
 
 > [!NOTE]