Releases: VJHack/llama.cpp

b4476 (14 Jan 01:11, commit 504af20)
server : (UI) Improve messages bubble shape in RTL (#11220)

The message bubble's tail placement for RTL text had simply been overlooked: it isn't visible in dark mode, which I use. This fixes it.

b4457 (10 Jan 05:54, commit ee7136c)
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <[email protected]>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <[email protected]>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <[email protected]>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <[email protected]>

* Fix some typos

Signed-off-by: Molly Sophia <[email protected]>

* code format changes

Signed-off-by: Molly Sophia <[email protected]>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <[email protected]>

* Fix cuda warning

Signed-off-by: Molly Sophia <[email protected]>

* Update README.md

Signed-off-by: Molly Sophia <[email protected]>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <[email protected]>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <[email protected]>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: compilade <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>
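
As a rough illustration of what the new CPU and CUDA GLA kernels compute, below is a conceptual, per-head sketch of the gated-linear-attention recurrence: the state is decayed by a per-channel gate, updated with the key/value outer product, and then queried. This is not the ggml operator added in the PR; the real code handles heads, batching and scaling differently, and all names here are illustrative.

```cpp
#include <vector>

// One GLA recurrence step for a single head.
// S is a d x d state matrix (row i = key channel, column j = value channel),
// stored row-major. g holds the per-channel gate/decay factors.
void gla_step(std::vector<float> & S,
              const float * q, const float * k, const float * v,
              const float * g, float * out, int d) {
    for (int j = 0; j < d; ++j) {
        out[j] = 0.0f;
    }
    for (int i = 0; i < d; ++i) {
        for (int j = 0; j < d; ++j) {
            const float s = g[i] * S[i*d + j] + k[i] * v[j]; // decay state, add k v^T
            S[i*d + j] = s;
            out[j]    += q[i] * s;                           // o = q^T S
        }
    }
}
```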

b4447 (08 Jan 17:32, commit f7cd133)
ci : use actions from ggml-org (#11140)

b4444 (08 Jan 13:20, commit 99a3755)
sync : ggml

b4431 (07 Jan 00:32, commit dc7cef9)
llama-run : fix context size (#11094)

Set `n_ctx` equal to `n_batch` in the `Opt` class, so the context size is now a more reasonable 2048.

Signed-off-by: Eric Curtin <[email protected]>
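
A minimal sketch of the idea behind this fix, written against the public `llama.h` API rather than llama-run's actual `Opt` class (the helper name below is hypothetical): the context size is tied to the batch size when the context parameters are built.

```cpp
#include <cstdint>
#include "llama.h"

// Build context params whose context size follows the batch size,
// e.g. n_batch = 2048 gives n_ctx = 2048.
static llama_context_params make_ctx_params(uint32_t n_batch) {
    llama_context_params params = llama_context_default_params();
    params.n_batch = n_batch;
    params.n_ctx   = n_batch; // context size follows the batch size
    return params;
}
```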

b4311 (12 Dec 16:58, commit 9fdb124)
common : add missing env var for speculative (#10801)

b4306 (11 Dec 18:00, commit 1a31d0d)
Update README.md (#10772)

b4295 (10 Dec 02:57, commit 26a8406)
CUDA: fix shared memory access condition for mmv (#10740)

b4277 (06 Dec 21:22, commit c5ede38)
convert : add custom attention mapping

b4230 (30 Nov 21:17, commit 0c39f44)
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
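
For illustration only, here is the general shape of such a change: an fp32 dot-product inner loop written with NEON intrinsics from `arm_neon.h`, where the compiler allocates registers instead of hand-written assembly. The real `ggml_gemv_*` routines operate on quantized blocks and are considerably more involved; the function below is a hypothetical example, not code from the commit.

```cpp
#include <arm_neon.h>

// fp32 dot product using AArch64 NEON intrinsics.
static float dot_f32_neon(const float * a, const float * b, int n) {
    float32x4_t acc = vdupq_n_f32(0.0f);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        acc = vfmaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i)); // acc += a*b
    }
    float sum = vaddvq_f32(acc);   // horizontal sum of the 4 lanes
    for (; i < n; ++i) {
        sum += a[i] * b[i];        // scalar tail
    }
    return sum;
}
```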