0.0.12

github-actions released this 22 Jan 20:04
· 853 commits to master since this release

Lots of fixes and tweaks. Main feature updates:

Model support:

  • Basic LoRA support for MoE models
  • Support for Orion models (also groundwork for other layernorm models)
  • Support for loading/converting from Axolotl checkpoints

Generation/sampling:

  • Fused kernels enabled for num_experts = 4
  • Option to return probs from streaming generator
  • Top-A sampling
  • Frequency and presence penalties
  • CFG support in streaming generator
  • Disable flash-attn for non-causal attention (fixes left-padding until FA2 implements custom bias)
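
For readers unfamiliar with the sampling options listed above, here is a minimal pure-Python sketch of the general techniques. It is not this library's implementation. The function names and parameters (`apply_penalties`, `top_a_filter`, `cfg_mix`, `freq_penalty`, `pres_penalty`, `scale`) are hypothetical. It assumes the commonly used definitions: top-A drops tokens whose probability is below `a * max_prob**2`, the penalties subtract a count-scaled term plus a flat term from the logits of already generated tokens, and CFG extrapolates from the unconditional logits toward the conditional ones.

```python
import math
from collections import Counter


def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_penalties(logits, generated_ids, freq_penalty=0.0, pres_penalty=0.0):
    """Lower the logits of tokens that were already generated.

    Assumed formulation: logit -= count * freq_penalty + pres_penalty,
    applied only to tokens that appear at least once.
    """
    out = list(logits)
    for tok, n in Counter(generated_ids).items():
        out[tok] -= n * freq_penalty + pres_penalty
    return out


def top_a_filter(probs, a):
    """Zero out tokens with probability below a * max(probs)**2, then renormalize.

    The most likely token always survives for 0 <= a <= 1.
    """
    threshold = a * max(probs) ** 2
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def cfg_mix(cond_logits, uncond_logits, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


if __name__ == "__main__":
    logits = apply_penalties([1.0, 2.0, 3.0], [2, 2, 0], 0.5, 0.1)
    print(top_a_filter(softmax(logits), a=0.5))
```

With `scale = 1.0`, `cfg_mix` returns the conditional logits unchanged; larger values push further away from the unconditional distribution.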

Testing/evaluation:

  • HumanEval test
  • Script to compare two models layer by layer (e.g. quantized vs. original model)
  • "Standard" perplexity (ppl) test that attempts to mimic text-generation-webui
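
As a rough illustration of the two evaluation ideas above, the sketch below computes perplexity from per-token log-probabilities and a per-layer relative error between two sets of activations. It does not reproduce the scripts shipped with this release. The names `perplexity`, `rel_error` and `compare_layers` are hypothetical, and the activations are assumed to be plain lists of floats, one list per layer.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability of the target tokens)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def rel_error(ref, test):
    """Relative L2 error ||ref - test|| / ||ref|| between two activation vectors."""
    num = math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)))
    den = math.sqrt(sum(r * r for r in ref))
    if den == 0.0:
        return 0.0 if num == 0.0 else float("inf")
    return num / den


def compare_layers(ref_layers, test_layers):
    """Return one relative error per layer, e.g. original vs. quantized model."""
    return [rel_error(r, t) for r, t in zip(ref_layers, test_layers)]


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every target token
    print(perplexity([math.log(0.25)] * 4))
    print(compare_layers([[3.0, 4.0], [1.0, 0.0]], [[3.0, 4.0], [0.9, 0.1]]))
```

Watching where the per-layer error starts to grow is one way to see which layers suffer most from quantization.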

Conversion:

  • VRAM optimizations
  • Optimized quantization kernels

IO:

  • Cache safetensors context managers for faster loading
  • Optional direct IO loader (for very fast storage arrays)