omnivision → omniVLM #303

Merged
merged 2 commits into from
Dec 6, 2024
4 changes: 2 additions & 2 deletions README.md
@@ -21,7 +21,7 @@

## Latest News 🔥

-- Support Nexa AI's own vision language model (0.9B parameters): `nexa run omnivision` and audio language model (2.9B parameters): `nexa run omniaudio`
+- Support Nexa AI's own vision language model (0.9B parameters): `nexa run omniVLM` and audio language model (2.9B parameters): `nexa run omniaudio`
- Support audio language model: `nexa run qwen2audio`, **we are the first open-source toolkit to support audio language model with GGML tensor library.**
- Support iOS Swift binding for local inference on **iOS mobile** devices.
- Support embedding model: `nexa embed <model_path> <prompt>`
@@ -228,7 +228,7 @@ Supported model examples (full list at [Model Hub](https://nexa.ai/models)):
| [qwen2audio](https://nexa.ai/Qwen/Qwen2-Audio-7.8B-Instruct/gguf-q4_K_M/readme) | AudioLM | GGUF | `nexa run qwen2audio` |
| [octopus-v2](https://www.nexaai.com/NexaAI/Octopus-v2/gguf-q4_0/readme) | Function Call | GGUF | `nexa run octopus-v2` |
| [octo-net](https://www.nexaai.com/NexaAI/Octo-net/gguf-q4_0/readme) | Text | GGUF | `nexa run octo-net` |
-| [omnivision](https://nexa.ai/NexaAI/omnivision/gguf-fp16/readme) | Multimodal | GGUF | `nexa run omnivision` |
+| [omniVLM](https://nexa.ai/NexaAI/omniVLM/gguf-fp16/readme) | Multimodal | GGUF | `nexa run omniVLM` |
| [nanollava](https://www.nexaai.com/qnguyen3/nanoLLaVA/gguf-fp16/readme) | Multimodal | GGUF | `nexa run nanollava` |
| [llava-phi3](https://www.nexaai.com/xtuner/llava-phi-3-mini/gguf-q4_0/readme) | Multimodal | GGUF | `nexa run llava-phi3` |
| [llava-llama3](https://www.nexaai.com/xtuner/llava-llama-3-8b-v1.1/gguf-q4_0/readme) | Multimodal | GGUF | `nexa run llava-llama3` |
10 changes: 8 additions & 2 deletions docs/README.md
@@ -28,12 +28,16 @@ pip install nexaai[onnx] # if you need ONNX support
```

### build from source

To build C++ only

```
cmake -B build -S .
cmake --build build --config Release -j32
```

To build C++ and install python package from source, run the following commands:

```bash
git clone --recursive https://github.com/NexaAI/nexa-sdk.git
cd nexa-sdk
@@ -75,7 +79,7 @@ python -m nexa.gguf.nexa_inference_text gemma
python -m nexa.gguf.nexa_inference_text octopusv2 --stop_words "<nexa_end>"
wget https://assets-c4akfrf5b4d3f4b7.z01.azurefd.net/assets/2024/04/BMDataViz_661fb89f3845e.png -O test.png
python -m nexa.gguf.nexa_inference_vlm nanollava
-python -m nexa.gguf.nexa_inference_vlm_omni omnivision
+python -m nexa.gguf.nexa_inference_vlm_omni omniVLM
python -m nexa.gguf.nexa_inference_image sd1-4
python -m nexa.gguf.nexa_inference_image sd1-4 --img2img
wget -O control_normal-fp16.safetensors https://huggingface.co/webui/ControlNet-modules-safetensors/resolve/main/control_normal-fp16.safetensors
@@ -235,7 +239,9 @@ dumpbin /dependents your_executable_or_dll.dll # in Developer PowerShell for Vi
```

### Debug dynamic lib

According to this [issue](https://github.com/abetlen/llama-cpp-python/issues/1346), the command below lists the exported symbols on Linux.

```
readelf -Ws --dyn-syms libllama.so
```
```
10 changes: 5 additions & 5 deletions nexa/constants.py
@@ -188,8 +188,8 @@ class ModelType(Enum):
"omnivision-preview": "omnivision-preview:projector-fp16",
"omnivision-preview:fp16": "omnivision-preview:projector-fp16",
"omnivision-preview:q4_0": "omnivision-preview:projector-q4_0",
-"omnivision": "omnivision:projector-fp16",
-"omnivision:fp16": "omnivision:projector-fp16",
+"omniVLM": "omniVLM:projector-fp16",
+"omniVLM:fp16": "omniVLM:projector-fp16",
"omnivision-ocr": "omnivision-ocr:projector-fp16",
"omnivision-ocr:fp16": "omnivision-ocr:projector-fp16",
}
@@ -198,8 +198,8 @@ class ModelType(Enum):
"omnivision-preview": "omnivision-preview:model-fp16",
"omnivision-preview:fp16": "omnivision-preview:model-fp16",
"omnivision-preview:q4_0": "omnivision-preview:model-q4_0",
-"omnivision": "omnivision:model-fp16",
-"omnivision:fp16": "omnivision:model-fp16",
+"omniVLM": "omniVLM:model-fp16",
+"omniVLM:fp16": "omniVLM:model-fp16",
"omnivision-ocr": "omnivision-ocr:model-fp16",
"omnivision-ocr:fp16": "omnivision-ocr:model-fp16",
}
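
A minimal sketch of how alias tables like the two above might be consulted, and what the rename implies for callers. The dictionary names `PROJECTOR_ALIASES` and `MODEL_ALIASES` and the helper `resolve` are hypothetical; the diff does not show the real variable names or the lookup code in `nexa/constants.py`.

```python
from typing import Optional, Tuple

# Hypothetical names: the diff shows only the entries, not the dict names.
PROJECTOR_ALIASES = {
    "omniVLM": "omniVLM:projector-fp16",
    "omniVLM:fp16": "omniVLM:projector-fp16",
}
MODEL_ALIASES = {
    "omniVLM": "omniVLM:model-fp16",
    "omniVLM:fp16": "omniVLM:model-fp16",
}


def resolve(name: str) -> Optional[Tuple[str, str]]:
    """Return (model_ref, projector_ref) for an alias, or None if unknown."""
    if name not in MODEL_ALIASES or name not in PROJECTOR_ALIASES:
        return None
    return MODEL_ALIASES[name], PROJECTOR_ALIASES[name]


print(resolve("omniVLM"))     # both references use the new name
print(resolve("omnivision"))  # the old key was removed, so no match
```

Under these assumptions, the old `omnivision` alias stops resolving once its keys are replaced, so scripts that still pass the old name would need updating.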
@@ -461,7 +461,7 @@ class ModelType(Enum):
"FLUX.1-schnell": ModelType.COMPUTER_VISION,
"Phi-3-vision-128k-instruct": ModelType.MULTIMODAL,
"omnivision-preview": ModelType.MULTIMODAL,
-"omnivision": ModelType.MULTIMODAL,
+"omniVLM": ModelType.MULTIMODAL,
"omnivision-ocr": ModelType.MULTIMODAL,
"nanoLLaVA": ModelType.MULTIMODAL,
"llava-v1.6-mistral-7b": ModelType.MULTIMODAL,
2 changes: 1 addition & 1 deletion nexa/gguf/nexa_inference_vlm_omni.py
@@ -40,7 +40,7 @@ def __init__(
        else:
            self.n_gpu_layers = 0

-        # Handle direct model file paths (e.g., omnivision:model-fp16)
+        # Handle direct model file paths (e.g., omniVLM:model-fp16)
        if model_path and ':model-' in model_path:
            base_name = model_path.split(':')[0]
            model_type = model_path.split('model-')[1]
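
The parsing step in this hunk can be tried in isolation. The sketch below wraps the same two `split` calls in a standalone function; the function name `split_model_path` is hypothetical and does not appear in the PR.

```python
from typing import Optional, Tuple


def split_model_path(model_path: Optional[str]) -> Optional[Tuple[str, str]]:
    """Split a reference such as 'omniVLM:model-fp16' into (base_name, model_type).

    Mirrors the two split calls from the hunk; returns None when the
    ':model-' marker is absent. The function name is hypothetical.
    """
    if model_path and ":model-" in model_path:
        base_name = model_path.split(":")[0]         # text before the first ':'
        model_type = model_path.split("model-")[1]   # text after the first 'model-'
        return base_name, model_type
    return None


print(split_model_path("omniVLM:model-fp16"))  # ('omniVLM', 'fp16')
print(split_model_path("omniVLM"))             # None
```

One caveat of splitting on `'model-'`: a base name that itself contains `model-` would yield the wrong segment, so splitting on `':model-'` may be more robust if such names ever appear.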