Releases · ggerganov/llama.cpp
b4202
b4201
Add some minimal optimizations for CDNA (#10498)
* Add some minimal optimizations for CDNA
* ggml_cuda: set launch bounds also for GCN as it helps there too
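For context, a minimal sketch of the mechanism this entry refers to, assuming a trivial element-wise kernel (the kernel name and block size below are hypothetical, not the actual ggml-cuda code). `__launch_bounds__` promises the compiler the kernel is never launched with more than the given threads per block, so it can budget registers more tightly; under HIP the same attribute benefits AMD GCN/CDNA, per the note above.

```cpp
// Sketch only: illustrates __launch_bounds__, not the actual ggml-cuda kernel.
#define SKETCH_BLOCK_SIZE 256 // hypothetical block size for this example

// With the launch bound known, the compiler can allocate registers more
// aggressively, which improves occupancy on NVIDIA and AMD GCN/CDNA alike.
__global__ void __launch_bounds__(SKETCH_BLOCK_SIZE)
scale_f32(const float * x, float * dst, const float scale, const int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        dst[i] = x[i] * scale;
    }
}
```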
b4200
ci : faster CUDA toolkit installation method and use ccache (#10537)
* ci : faster CUDA toolkit installation method and use ccache
* remove fetch-depth
* only pack CUDA runtime on master
b4195
vulkan: Handle GPUs with less shared memory (#10468)
There have been reports of compilation failures on GPUs with <= 32KB of shared memory (e.g. #10037). This change makes the large tile size fall back to a smaller one when necessary, and makes mul_mat_id fall back to the CPU when only 16KB of shared memory is available.
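A sketch of the fallback logic this note describes, with hypothetical names and tile categories (the real selection lives in the ggml Vulkan backend and its thresholds and tile shapes differ); the shared memory limit would come from `VkPhysicalDeviceLimits::maxComputeSharedMemorySize`:

```cpp
#include <cstdint>

// Hypothetical tile configuration for illustration only.
enum class tile_size { LARGE, MEDIUM, SMALL };

// Pick a matmul tile size from the device's shared memory limit
// (VkPhysicalDeviceLimits::maxComputeSharedMemorySize). Sketch only.
static tile_size pick_tile_size(uint32_t max_shared_mem) {
    if (max_shared_mem >= 48u * 1024u) {
        return tile_size::LARGE;  // enough shared memory for the large tiles
    }
    if (max_shared_mem > 16u * 1024u) {
        return tile_size::MEDIUM; // e.g. 32 KB devices: use smaller tiles
    }
    // 16 KB or less: too small for the shared-memory path; per the note
    // above, mul_mat_id falls back to the CPU on such devices instead.
    return tile_size::SMALL;
}
```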
b4191
ci : fix cuda releases (#10532)
b4179
ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)
* ggml-cpu: cmake add arm64 cpu feature check for macos
* use vmmlaq_s32 for compile option i8mm check
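For reference, the kind of test program a `check_cxx_source_compiles()`-style CMake feature check could build to probe i8mm support; this is a sketch, not necessarily the exact test source used in the repo:

```cpp
// Minimal translation unit to compile with e.g. -march=armv8.2-a+i8mm: if
// vmmlaq_s32 (the i8mm int8 matrix multiply-accumulate intrinsic) compiles,
// the build can enable the corresponding compile option.
#include <arm_neon.h>

int main() {
    int32x4_t acc = vdupq_n_s32(0);
    int8x16_t a   = vdupq_n_s8(1);
    int8x16_t b   = vdupq_n_s8(2);
    // Each output lane is a dot product of eight int8 pairs: 8 * (1*2) = 16.
    acc = vmmlaq_s32(acc, a, b);
    return vgetq_lane_s32(acc, 0) == 16 ? 0 : 1;
}
```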
b4178
server : fix parallel speculative decoding (#10513)
ggml-ci
b4177
speculative : simplify the implementation (#10504)
ggml-ci
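As background, a self-contained sketch of the draft-then-verify loop that speculative decoding implements; plain vectors stand in for the model calls, and none of these names are the llama.cpp API:

```cpp
#include <cstdio>
#include <vector>

using token = int;

// One round of speculative decoding: the draft model cheaply proposes a few
// tokens, the target model scores them all in a single batch, and we keep
// the longest agreeing prefix plus the target's token at the first mismatch.
static std::vector<token> speculative_step(const std::vector<token> & drafted,
                                           const std::vector<token> & target) {
    // target has drafted.size() + 1 entries: one prediction per drafted
    // position, plus a "bonus" token used when every draft is accepted.
    std::vector<token> accepted;
    size_t i = 0;
    for (; i < drafted.size() && drafted[i] == target[i]; ++i) {
        accepted.push_back(drafted[i]); // both models agree: keep draft token
    }
    accepted.push_back(target[i]); // correction (or bonus) from the target
    return accepted;
}

int main() {
    const std::vector<token> drafted = { 10, 11, 12, 13 };     // draft proposals
    const std::vector<token> target  = { 10, 11, 99, 13, 14 }; // target predictions
    for (token t : speculative_step(drafted, target)) {
        printf("%d ", t); // prints: 10 11 99
    }
    printf("\n");
    return 0;
}
```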
b4176
CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454)
* improve inferencing performance for ascend npu
* some modifications after review
* restore some modifications
---------
Co-authored-by: shanshan shen <[email protected]>
Co-authored-by: Frank Mai <thxCode@[email protected]>
b4175
CANN: RoPE and CONCAT operator optimization (#10488)
Co-authored-by: noemotiovon <[email protected]>
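For context on the RoPE operator being optimized, a scalar reference sketch: each dimension pair (x[2i], x[2i+1]) is rotated by a position-dependent angle theta_i = pos * base^(-2i/d). The real backend kernels are vectorized and support more variants (scaling, NeoX-style layouts); this shows only the core rotation.

```cpp
#include <cmath>
#include <cstdio>

// Scalar RoPE sketch: rotate each dimension pair by
// theta = pos * base^(-i/n_dims), i stepping over even indices.
static void rope_ref(float * x, int n_dims, int pos, float base = 10000.0f) {
    for (int i = 0; i < n_dims; i += 2) {
        const float theta = pos * std::pow(base, -(float) i / (float) n_dims);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = x[i];
        const float x1 = x[i + 1];
        x[i]     = x0 * c - x1 * s;
        x[i + 1] = x0 * s + x1 * c;
    }
}

int main() {
    float v[4] = { 1.0f, 0.0f, 1.0f, 0.0f };
    rope_ref(v, 4, /*pos =*/ 2);
    printf("%f %f %f %f\n", v[0], v[1], v[2], v[3]);
    return 0;
}
```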