0.1.9

github-actions released this 22 Aug 11:54 · 137 commits to master since this release
  • Add experimental tensor-parallel mode. Currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
  • Use CUDA Graphs to reduce kernel-launch overhead and CPU bottlenecking
  • Various other optimizations
  • Some bugfixes
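
For readers new to tensor parallelism, here is a minimal sketch of the general idea. It is not this project's implementation or API. It splits one linear layer's weight matrix by columns across several hypothetical devices and concatenates their partial outputs. The function names (`matmul`, `split_columns`, `tensor_parallel_matmul`) are made up for illustration, and the "devices" are simulated on the CPU with plain Python lists.

```python
# Illustrative only: column-wise tensor parallelism for one linear layer,
# simulated on the CPU with plain Python lists (no GPU, no real library API).

def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, num_shards):
    """Split w into at most num_shards column blocks, one per hypothetical device."""
    cols = len(w[0])
    step = (cols + num_shards - 1) // num_shards  # ceiling division
    return [[row[s:s + step] for row in w] for s in range(0, cols, step)]

def tensor_parallel_matmul(x, w, num_shards):
    """Each shard computes its slice of the output; the slices are concatenated."""
    out = []
    for shard in split_columns(w, num_shards):
        # On real hardware these partial products would run concurrently,
        # one per device; here they simply run one after another.
        out.extend(matmul(x, shard))
    return out

x = [1.0, 2.0, 3.0]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0],
     [2.0, 1.0, 0.0, 1.0]]

# The sharded result matches the unsharded one.
assert tensor_parallel_matmul(x, w, 2) == matmul(x, w)
```

The sketch only shows that the sharded computation matches the single-device one. A real implementation also has to move activations between devices, which this example leaves out.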

Full Changelog: v0.1.8...v0.1.9