
@Vali-98 released this 30 Jul 05:15 · 836 commits to master since this release

v0.7.9

The Local Upgrade

Warning: This update attempts to load files from app assets, which may fail, as I have not yet tested it on multiple devices. Please report if you get stuck in a boot crash! This update is generally experimental, with a few changes to the core C++ code of llama.rn, so it may be unstable.

Features:

  • Local generation has migrated to cui-llama.rn, a fork of the fantastic llama.rn project with custom features tailored for ChatterUI:
    • Prompt processing can now be stopped between batches - most effective when used with a low batch size.
    • vocab_only mode allows tokenizer-only usage - this also removes the need for onnx-runtime and the old transformers.js tokenizer adaptation, cutting down app size significantly!
    • Synchronous tokenization for ease of development (see the sketch after this list)
    • Context Shifting adapted from kobold.cpp (Thanks @LostRuins) - this allows you to use high-context chats without needing to reprocess the entire context upon hitting the context limit
  • Added support for i8mm compatible devices (Snapdragon 8 Gen 1 or newer / Exynos 2200 or newer)
    • This feature allows the use of Q4_0_4_8 quantization levels optimized for ARM devices.
    • It is recommended to requantize your models to this quantization level using the llama.cpp quantize tool:
      .\llama-quantize.exe --allow-requantize model.gguf Q4_0_4_8
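
For illustration, a minimal sketch of tokenizer-only usage is below. It assumes cui-llama.rn keeps llama.rn's initLlama/release API and exposes the vocab_only flag and a synchronous tokenizer described above; the method name and model path are assumptions, not confirmed API.

```ts
// Sketch only: vocab_only and tokenizeSync are assumptions based on the notes above.
import { initLlama } from 'cui-llama.rn'

async function countTokens(text: string): Promise<number> {
  // vocab_only loads just the vocabulary, not the model weights,
  // which makes the context cheap enough to use purely as a tokenizer
  const context = await initLlama({
    model: '/data/local/models/model.gguf', // hypothetical path
    vocab_only: true,
  })
  const { tokens } = context.tokenizeSync(text) // synchronous tokenization
  await context.release()
  return tokens.length
}
```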

Changes:

  • Local inference now runs as a background task! This means tabbing out of the app should no longer stop generation.
  • Buttons in Local API menu now properly disable based on model state
  • The internal tokenizer now relies entirely on the much faster implementation in cui-llama.rn. As such, the previous JS tokenizer has been removed alongside onnx-runtime, leading to a much smaller APK size.

Fixes:

  • Continuing with the local API now properly respects the regenCache
  • Removed the BOS token from the default Llama 3 instruct preset

Dev:

  • Moved constants and components under app/, as this seems to significantly improve React Native's Fast Refresh behavior
  • Moved local API state to Zustand; this greatly reduces Fast Refresh bugginess in development and prevents the model state from being unloaded upon a refresh (see the sketch below)
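
As an illustration of the Zustand change, a minimal sketch is below. The store shape and names are hypothetical, and it assumes llama.rn's initLlama/release API carries over to cui-llama.rn; the point is that the loaded context lives in a module-level store rather than component state, so a refresh no longer unloads the model.

```ts
// Sketch only: store shape and names are hypothetical.
import { create } from 'zustand'
import { initLlama, LlamaContext } from 'cui-llama.rn'

interface LocalApiState {
  context: LlamaContext | undefined
  loadModel: (modelPath: string) => Promise<void>
  unloadModel: () => Promise<void>
}

export const useLocalApiStore = create<LocalApiState>()((set, get) => ({
  context: undefined,
  loadModel: async (modelPath) => {
    // release any previously loaded model before loading a new one
    await get().context?.release()
    set({ context: await initLlama({ model: modelPath, n_ctx: 4096 }) })
  },
  unloadModel: async () => {
    await get().context?.release()
    set({ context: undefined })
  },
}))
```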