Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[auto] Sync version 2406031220.0.0+llamacpp-release.b3075
== Relevant log messages from source repo: commit 3d7ebf63123b8652fb7bbecef7ba731202309901 Author: 0cc4m <[email protected]> Date: Mon Jun 3 10:59:14 2024 +0200 Vulkan Mixture of Experts (MoE) support (#7628) * Finish Vulkan mul_mat_id implementation * Add Vulkan sum_rows and div ops * Fix MUL_MAT_ID matrix matrix shader * Fix MUL_MAT_ID matrix vector shader dispatch size * Fix MUL_MAT_ID matrix vector shader and dispatch code * Update Vulkan CPU offload for MUL_MAT_ID * Fix crash when using split mode none and setting a main GPU commit a10cda58d3199cd85305e0f03a8c6056714ae2e8 Author: Andy Tai <[email protected]> Date: Mon Jun 3 01:06:24 2024 -0700 cmake : add pkg-config spec file for llama.cpp (#7702) commit 6f28a333c1e3fdfdc7b4f9d0367f2b41a9b7e9d4 Author: zhangkaihuo <[email protected]> Date: Mon Jun 3 15:49:30 2024 +0800 llama : MiniCPM support tied embeddings (#7664) * support lm_head * remove the code block --------- Co-authored-by: zhangkaihuo <[email protected]> commit 549279d8049d78620a2b081e26edb654f83c3bbd Author: Georgi Gerganov <[email protected]> Date: Mon Jun 3 08:34:43 2024 +0300 llama : avoid double token-to-piece cache (#7654) ggml-ci commit 9e405b6e2ecb888e860f7b92720b4809e21b3915 Author: woachk <[email protected]> Date: Mon Jun 3 07:32:16 2024 +0200 kompute : implement op_getrows_f32 (#6403) op_getrows_f32 is required since ggerganov/llama.cpp#6122 for the Vulkan w/ Kompute backend to be functional. As such, implement this op to make this backend functional again.
- Loading branch information