Release Notes
In addition to the features in the 0.5.0 release, this release includes:
- Ability to choose the provider and modify options at runtime
- A fix for a data leakage bug with KV caches
Features in 0.5.0:
- Support for MultiLoRA
- Multi-frame support for the Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for the NVIDIA Nemotron model
- Support for the Qwen model
- The Set Terminate feature, which allows users to cancel generation while it is in progress
- Soft capping support for Group Query Attention
- Extended quantization support for embedding and LM head layers
- Mac support in published packages
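
For readers unfamiliar with soft capping, the sketch below illustrates the commonly used tanh-based form, in which each attention score is squashed smoothly into the open interval (-cap, cap) before softmax. It is an illustration only and not this library's implementation; the function name `soft_cap` and the cap value of 50.0 are assumptions chosen for the example.

```python
import math


def soft_cap(scores, cap):
    """Illustrative soft capping: cap * tanh(score / cap).

    Hypothetical helper, not part of any library API. Small scores pass
    through almost unchanged; large scores saturate just below +/- cap.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    return [cap * math.tanh(s / cap) for s in scores]


# Example attention scores, including two extreme values.
capped = soft_cap([-200.0, -1.0, 0.0, 1.0, 200.0], cap=50.0)
print(capped)
```

Unlike hard clipping, this keeps the mapping smooth and monotonic, so the relative ordering of scores is preserved.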
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release