v0.7.9
The Local Upgrade
Warning: This update attempts to load files from app assets, which may fail, as I have not yet tested this on multiple devices. Please report if you get stuck in a boot crash! This update is generally very experimental, with a few changes to the core C++ code of llama.rn, so it may be unstable.
Features:
- Local generation has migrated to cui-llama.rn, a fork of the fantastic llama.rn project with custom features tailored for ChatterUI:
    - Added the ability to stop prompt processing between batches - most effective when used with a low batch size.
    - Added a vocab_only mode for tokenizer-only usage - this also removes the need for onnx-runtime and the old transformers.js adaptation for tokenizers, cutting down app size significantly! (See the first sketch after this list.)
    - Added synchronous tokenization for ease of development.
- Context Shifting, adapted from kobold.cpp (thanks @LostRuins) - this allows you to use high-context chats without needing to reprocess the entire context upon hitting the context limit. (See the second sketch after this list.)
- Added support for i8mm-compatible devices (Snapdragon 8 Gen 1 or newer / Exynos 2200 or newer) - this feature allows the use of the Q4_0_4_8 quantization level, optimized for ARM devices.
- It is recommended to requantize your models to this quantization level using the llama.cpp quantize tool:

```
.\llama-quantize.exe --allow-requantize model.gguf Q4_0_4_8
```
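Below is a minimal sketch of what tokenizer-only usage might look like, combining the vocab_only and synchronous tokenization features above. The import path, the vocab_only flag, the tokenizeSync method, and the shape of its result are assumptions inferred from this changelog, not the verified cui-llama.rn API:

```ts
import { initLlama, LlamaContext } from 'cui-llama.rn' // hypothetical import path

let ctx: LlamaContext | undefined

// Load only the vocab/tokenizer from the GGUF - no model weights - so the
// context is cheap enough to keep around purely for token counting.
// (vocab_only and tokenizeSync are assumed names based on this release.)
async function loadTokenizer(modelPath: string) {
    ctx = await initLlama({ model: modelPath, vocab_only: true })
}

// Synchronous tokenization: count tokens while assembling a prompt
// without awaiting a native-module round-trip per call.
function countTokens(text: string): number {
    if (!ctx) throw new Error('tokenizer not loaded')
    return ctx.tokenizeSync(text).tokens.length
}
```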
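And a conceptual sketch of what context shifting buys you. This is not the kobold.cpp or cui-llama.rn implementation, just an illustration of the idea: when the context fills, the oldest chat tokens are dropped while the system prompt is kept, and the runtime shifts the KV cache entries for the surviving tokens instead of re-evaluating the whole prompt:

```ts
// Conceptual only - real implementations shift the native KV cache.
// Given a fixed system prompt and a rolling conversation, decide which
// tokens survive once the context window is exceeded.
function planContextShift(
    systemTokens: number[], // always kept at the start of the context
    chatTokens: number[],   // rolling conversation tokens
    nCtx: number,           // context window size
    reserve: number,        // tokens reserved for the next generation
): number[] {
    const budget = Math.max(0, nCtx - reserve - systemTokens.length)
    if (chatTokens.length <= budget) {
        return [...systemTokens, ...chatTokens] // everything still fits
    }
    // Drop the oldest chat tokens. Because the kept tokens are contiguous,
    // their cached KV entries can be shifted into place in one cheap
    // operation rather than reprocessed from scratch.
    const kept = chatTokens.slice(chatTokens.length - budget)
    return [...systemTokens, ...kept]
}
```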
Changes:
- Local inferencing now runs as a background task! This means tabbing out of the app should no longer stop inference.
- Buttons in the Local API menu now properly disable based on model state
- The internal tokenizer now relies entirely on the much faster implementation in cui-llama.rn. As such, the previous JS tokenizer has been removed alongside onnx-runtime, leading to a much smaller APK.
Fixes:
- Continuing with the local API now properly respects the regenCache
- Removed the BOS token from the default Llama 3 instruct preset
Dev:
- Moved `constants` and `components` under `app/`, as this seems to affect react-native's Fast Refresh functionality significantly
- Moved local API state to zustand - this helps a lot with Fast Refresh bugginess in development and prevents the model state from being unloaded upon a refresh (a minimal sketch follows below)
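For the curious, a minimal sketch of holding model state in a zustand store. The store shape and field names here are hypothetical, not ChatterUI's actual store; the point is that zustand state lives outside the React component tree, so a Fast Refresh re-render recreates components without dropping the loaded model:

```ts
import { create } from 'zustand'

// Hypothetical store shape - ChatterUI's real store will differ.
interface LlamaState {
    context?: object // the loaded llama context, if any
    modelName?: string
    setContext: (ctx: object, name: string) => void
    unload: () => void
}

export const useLlamaStore = create<LlamaState>((set) => ({
    context: undefined,
    modelName: undefined,
    setContext: (ctx, name) => set({ context: ctx, modelName: name }),
    unload: () => set({ context: undefined, modelName: undefined }),
}))

// In a component: const modelName = useLlamaStore((s) => s.modelName)
```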