
Releases: ggerganov/llama.cpp

b4202

27 Nov 22:09
9f91251
common : fix duplicated file name with hf_repo and hf_file (#10550)

b4201

27 Nov 16:53
3ad5451
Add some minimal optimizations for CDNA (#10498)

* Add some minimal optimizations for CDNA

* ggml_cuda: set launch bounds also for GCN as it helps there too

b4200

27 Nov 10:59
46c69e0
ci : faster CUDA toolkit installation method and use ccache (#10537)

* ci : faster CUDA toolkit installation method and use ccache

* remove fetch-depth

* only pack CUDA runtime on master

b4195

27 Nov 08:24
5b3466b
vulkan: Handle GPUs with less shared memory (#10468)

There have been reports of failure to compile on systems with <= 32KB
of shared memory (e.g. #10037). This change makes the large tile size
fall back to a smaller size if necessary, and makes mul_mat_id fall
back to CPU if there's only 16KB of shared memory.
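The fallback described in this release note can be sketched roughly as follows. This is a minimal illustration, not the ggml-vulkan code. The tile names, their shared-memory footprints, and the functions `pick_tile` and `mul_mat_id_on_gpu` are all hypothetical. Only the rule "mul_mat_id falls back to CPU when there is just 16 KB of shared memory" comes from the note. On a real device, the limit would be read from `VkPhysicalDeviceLimits::maxComputeSharedMemorySize`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tile sizes, ordered from largest to smallest. */
typedef enum { TILE_LARGE, TILE_MEDIUM, TILE_SMALL, TILE_NONE } tile_size;

/* Hypothetical shared-memory footprint (bytes) of each tile size;
 * the real shader variants use different values. */
static const size_t tile_shmem_bytes[] = {
    [TILE_LARGE]  = 40 * 1024,
    [TILE_MEDIUM] = 24 * 1024,
    [TILE_SMALL]  = 12 * 1024,
};

/* Return the largest tile whose footprint fits within the device's
 * shared-memory limit, or TILE_NONE if none fits. */
static tile_size pick_tile(size_t max_shmem) {
    for (int t = TILE_LARGE; t <= TILE_SMALL; t++) {
        if (tile_shmem_bytes[t] <= max_shmem) {
            return (tile_size)t;
        }
    }
    return TILE_NONE;
}

/* mul_mat_id stays on the GPU only when more than 16 KB is available;
 * otherwise the operation is left to the CPU backend. */
static bool mul_mat_id_on_gpu(size_t max_shmem) {
    return max_shmem > 16 * 1024;
}
```

With these made-up footprints, a 32 KB device would get `TILE_MEDIUM` instead of the large tile, and a 16 KB device would get `TILE_SMALL` while routing mul_mat_id to the CPU. Searching from the largest tile down keeps the fastest variant on devices that can afford it.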

b4191

26 Nov 22:05
c9b00a7
ci : fix cuda releases (#10532)

b4179

26 Nov 14:35
25669aa
ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)

* ggml-cpu: cmake add arm64 cpu feature check for macos

* use vmmlaq_s32 for compile option i8mm check

b4178

26 Nov 14:30
84e1c33
server : fix parallel speculative decoding (#10513)

ggml-ci

b4177

26 Nov 14:11
811872a
speculative : simplify the implementation (#10504)

ggml-ci

b4176

26 Nov 13:27
9a4b79b
CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454)

* improve inferencing performance for ascend npu.

Co-authored-by: Frank Mai <thxCode@[email protected]>

* some modification after review

* some modifications after review

* restore some modifications

* restore some modifications

---------

Co-authored-by: shanshan shen <[email protected]>
Co-authored-by: Frank Mai <thxCode@[email protected]>

b4175

26 Nov 12:23
7066b4c
CANN: RoPE and CONCAT operator optimization (#10488)

Co-authored-by: noemotiovon <[email protected]>