Releases · VJHack/llama.cpp
b4476
b4457
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2
* RWKV: Some graph simplification
* Add support for RWKV6Qwen2 with cpu and cuda GLA
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
* Fix some typos
* code format changes
* Fix wkv test & add gla test
* Fix cuda warning
* Update README.md
* Update ggml/src/ggml-cuda/gla.cu
* Fix fused lerp weights loading with RWKV6
* better sanity check skipping for QRWKV6 in llama-quant (thanks @compilade)

Signed-off-by: Molly Sophia <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>
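The "concat lerp weights" item above targets the RWKV6 token-shift step, where several independent lerps over the same hidden state (one per projection) run every token. A minimal C++ sketch of the idea, using plain arrays and illustrative names (`n_mix`, `mu`) as assumptions rather than the actual ggml implementation:

```cpp
// Illustrative sketch only; names and layout are assumptions, not the
// ggml code. Token-shift lerp: out = x + mu * (x_prev - x).
#include <cstddef>
#include <vector>

int main() {
    const size_t n_embd = 4096; // hidden size (example value)
    const size_t n_mix  = 5;    // number of lerp weight vectors being fused

    std::vector<float> x(n_embd, 1.0f);      // current token's hidden state
    std::vector<float> x_prev(n_embd, 0.5f); // previous token's hidden state

    // Before: n_mix separate (n_embd)-sized weight tensors -> n_mix ops.
    // After:  one (n_mix * n_embd) tensor holding all weights back to back.
    std::vector<float> mu(n_mix * n_embd, 0.25f);
    std::vector<float> out(n_mix * n_embd);

    // One fused pass, broadcasting x / x_prev across the n_mix slices,
    // instead of n_mix separately scheduled lerp ops.
    for (size_t i = 0; i < n_mix * n_embd; ++i) {
        const size_t j = i % n_embd;
        out[i] = x[j] + mu[i] * (x_prev[j] - x[j]);
    }
    return 0;
}
```

Storing the weights back to back lets one broadcasted op over the concatenated buffer replace `n_mix` separate graph nodes, which is where the CPU overhead saving comes from.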
b4447
ci : use actions from ggml-org (#11140)
b4444
sync : ggml
b4431
llama-run : fix context size (#11094)

Set `n_ctx` equal to `n_batch` in the `Opt` class. The context size is now a more reasonable 2048.

Signed-off-by: Eric Curtin <[email protected]>
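A minimal sketch of the effect of this fix, expressed through the public llama.cpp C API rather than llama-run's `Opt` class (which is not reproduced here):

```cpp
// Sketch of the idea behind the fix, using the public llama.cpp C API.
// Previously the context size was left too small; tying n_ctx to n_batch
// yields a 2048-token context.
#include "llama.h"

llama_context_params make_params() {
    llama_context_params params = llama_context_default_params();
    params.n_batch = 2048;           // logical batch size used by llama-run
    params.n_ctx   = params.n_batch; // the fix: context size follows batch size
    return params;
}
```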
b4311
common : add missing env var for speculative (#10801)
b4306
Update README.md (#10772)
b4295
CUDA: fix shared memory access condition for mmv (#10740)
b4277
convert : add custom attention mapping
b4230
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…