sync master #15
Commits on Jun 4, 2024
ggml : prevent builds with -ffinite-math-only (ggerganov#7726)
This enforces a check that -fno-finite-math-only was set, i.e. that the compiler is not operating in finite-math mode. During the rewrite of silu and softmax for CPU (ggerganov#7154), an issue emerged where the result observed with >1 slot was nondeterministic, as found by @JohannesGaessler. @LostRuins narrowed the problem down to -ffinite-math-only, which was theorised to cause SiLU to return NaN or other garbage instead of flushing small values to 0. @jart proposed a fix that @ggerganov then implemented; ref ggerganov#7154 (comment)
SHA: 6d16169
Per token attributes (ggerganov#7685)
* Add per token attributes enum * Using phi-3 for testing 'rstrip' * Using jina-v2 for testing 'lstrip' * Brute force test for 'lstrip' and 'rstrip' * Implement 'rstrip' and 'lstrip' * Update phi-3 GGUF file (obsolete since 917dc8c) * Replace llama_token_type with llama_token_attribs
SHA: 3b38d48
refine .gitignore (ggerganov#7688)
This adds tags and android ndk into the git ignore list
SHA: b226c12
Improve hipBLAS support in CMake (ggerganov#7696)
* Improve hipBLAS support in CMake This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK. * Set ROCM_PATH correctly
SHA: 987d743
llama-bench : allow using a different printer for stderr with -oe (ggerganov#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
SHA: adc9ff3
SHA: 5ca0944
SHA: 0cd6bd3
SHA: 554c247
common : refactor cli arg parsing (ggerganov#7675)
* common : gpt_params_parse do not print usage * common : rework usage print (wip) * common : valign * common : rework print_usage * infill : remove cfg support * common : reorder args * server : deduplicate parameters ggml-ci * common : add missing header ggml-ci * common : remote --random-prompt usages ggml-ci * examples : migrate to gpt_params ggml-ci * batched-bench : migrate to gpt_params * retrieval : migrate to gpt_params * common : change defaults for escape and n_ctx * common : remove chatml and instruct params ggml-ci * common : passkey use gpt_params
SHA: 1442677
Allow number of nodes in CUDA graph to change (ggerganov#7738)
Previously the code failed to cope when the number of nodes changed in an existing CUDA graph. This fixes the issue by removing an unnecessary conditional.
SHA: b90dc56
SHA: c90dbe0
Commits on Jun 5, 2024
readme : remove -ins (ggerganov#7759)
-ins and --instruct were removed in ggerganov#7675; I have adjusted the README accordingly. There was no trace of --chatml in the README.
SHA: 9973e81
ggml : refactor rope norm/neox (ggerganov#7634)
* ggml : unify rope norm/neox (CPU) * ggml : fix compile warning * ggml : remove GLM rope mode ggml-ci * metal : better rope implementation ggml-ci * cuda : better rope implementation ggml-ci * naming : n_orig_ctx -> n_ctx_orig ggml-ci * dev : add reminders to update backends ggml-ci * vulkan : fix ggml_rope_ext() usage * cuda : fix array size + indents ggml-ci
SHA: 2b33896
CUDA: refactor mmq, dmmv, mmvq (ggerganov#7716)
* CUDA: refactor mmq, dmmv, mmvq * fix out-of-bounds write * struct for qk, qr, qi * fix cmake build * mmq_type_traits
SHA: 7d1a378
SHA: 7672ade
Commits on Jun 6, 2024
SHA: d67caea
docker : build only main and server in their images (ggerganov#7782)
* add openmp lib to dockerfiles * build only main and server in their docker images
SHA: 2d08b7f
llama : add jina v2 base code (ggerganov#7596)
* feat: add changes to handle jina v2 base code * fix: do not complicate things * fix: fix the usage of the code model * fix: fix comments * fix: fix linting issues * fix: remove ollama patches * style : minor --------- Co-authored-by: Georgi Gerganov <[email protected]>
SHA: f5d7b26
grammars: x{min,max} repetition operator (ggerganov#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates * grammars: handle `x{n}` and fix `x{n,n}` * grammars: document new repetition operators * grammars: uniform use of int for min & max * grammars: refactor parser test * grammar: parsing tests w/ natural pretty print of updated expectations * grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all) * grammars: improve test pretty print again * grammars: pretty print rules and chars * grammars: fix copy rule skipping * grammars: disallow `a{,}` (not allowed in regexps) * Update common/grammar-parser.cpp Co-authored-by: Clint Herron <[email protected]> * grammars: fix copy rule skipping (again) & display of expectations * grammars: more test cases * grammars: update reps parsing to bring ? / * / + closer to before * json: use new GBNF repetitions{m,n} syntax * grammars: update performance gotchas w/ repetition advice * Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <[email protected]> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <[email protected]> * grammars: comment on rule repetitions * grammars: ensure unambiguous number alternatives * grammar: nit typo switched error msgs * grammar: nit numbering in comment * json: update numeric rule to be unambiguous * Apply suggestions from code review Co-authored-by: Clint Herron <[email protected]> * Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <[email protected]> * json: fix integral-part * grammar: add repetition tests --------- Co-authored-by: Clint Herron <[email protected]>
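A minimal sketch of how a bounded repetition `x{min,max}` can be lowered to plain GBNF concatenation plus nested optional groups (illustrative Python; `gbnf_repeat` is a hypothetical helper, not the project's actual converter code):

```python
def gbnf_repeat(item: str, min_n: int, max_n: int) -> str:
    """Expand item{min_n,max_n}: min_n required copies, then
    (max_n - min_n) extra copies as nested optional groups."""
    required = [item] * min_n
    optional = ""
    for _ in range(max_n - min_n):
        # wrap the previously built optional tail inside one more optional copy
        optional = f"({item} {optional})?" if optional else f"({item})?"
    return " ".join(required + ([optional] if optional else []))
```

For example, `gbnf_repeat("x", 2, 4)` yields `x x (x (x)?)?`: two mandatory `x` followed by up to two optional ones.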
SHA: 55b2d08
README minor fixes (ggerganov#7798) [no ci]
derievatives --> derivatives
SHA: a143c04
Added support for . (any character) token in grammar engine. (ggerganov#6467)
* Added support for . (any character) token in grammar engine. * Add integration tests for any-character symbol.
SHA: ad675e1
imatrix : migrate to gpt_params (ggerganov#7771)
* imatrix : migrate to gpt_params ggml-ci * imatrix : add --save-frequency cli arg * common : fix --no-ppl
SHA: f83351f
SHA: ee459f4
Commits on Jun 7, 2024
check for nans in imatrix and quantize (ggerganov#7807)
* imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values
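The kind of guard this adds can be sketched as follows (illustrative Python, not the actual C++ code; `validate_imatrix` is a hypothetical name):

```python
import math

def validate_imatrix(values):
    """Reject importance-matrix data containing nan/inf before it is
    used for quantization, instead of silently producing bad quants."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            raise ValueError(f"non-finite imatrix entry {v!r} at index {i}")
    return values
```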
SHA: c9ee711
SHA: d5c938c
server : do not get prompt in infill mode (ggerganov#7286)
* avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <[email protected]>
SHA: a5cabd7
SHA: 7027b27
cmake : fix BUILD_SHARED_LIBS=ON build (ggerganov#7784)
common depends on pthreads in Linux
SHA: 27615f5
SHA: c00fad7
vulkan : reuse parent extra for views (ggerganov#7806)
* vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <[email protected]>
SHA: da799b4
Commits on Jun 8, 2024
server : smart slot selection using Longest Common Prefix (ggerganov#7728)
* server : Smart selection of available slot using Longest Common Substring * add usage * remove trailing whitespaces * Use Longest Common Prefix (LCP) instead of LCS * Rename argument
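The selection strategy can be sketched like this (illustrative Python, not the server's actual C++; `pick_slot` is a hypothetical name): the slot whose cached tokens share the longest common prefix with the incoming prompt can reuse the most KV-cache entries.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_slot(slot_caches, prompt_tokens):
    """Pick the slot whose cached tokens share the longest common
    prefix with the new prompt, maximizing KV-cache reuse."""
    best, best_lcp = 0, -1
    for i, cache in enumerate(slot_caches):
        lcp = common_prefix_len(cache, prompt_tokens)
        if lcp > best_lcp:
            best, best_lcp = i, lcp
    return best
```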
SHA: 7a16ce7
url: save -mu downloads to new cache location (ggerganov#7826)
* url: save -mu download to new cache location * url: fs_get_cache_file_path util * url: tweak sig of fs_get_cache_file
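A sketch of the idea (hypothetical Python; the `LLAMA_CACHE` variable name and fallback directory here are assumptions for illustration, not a spec of the actual helper):

```python
import os

def fs_get_cache_file_path(filename, env="LLAMA_CACHE"):
    """Resolve a per-user cache path for downloaded model files:
    honor the cache env var if set, else fall back to ~/.cache."""
    cache_dir = os.environ.get(env) or os.path.join(
        os.path.expanduser("~"), ".cache", "llama.cpp"
    )
    return os.path.join(cache_dir, filename)
```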
SHA: d4d915d
Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (ggerganov#7682)" (ggerganov#7808)
This reverts commit 9422c5e.
SHA: fe1e391
Commits on Jun 9, 2024
gguf-py : decouple adding metadata from writing in GGUFWriter (ggerganov#7827)
The main change of this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False. GGUFWriter also no longer requires the output file name until it actually writes to it, and it no longer needs to eagerly prepare the data layout of the metadata.
SHA: ed9f252
convert-hf : match model part name prefix and suffix (ggerganov#7687)
In ggerganov#7075, to fix the conversion of (some) models that use model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply reused the part-count logic to get the part names. But this doesn't always work correctly, e.g. when unusual additional model files like consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3 are present. This commit matches both the prefix and the suffix of the model part names, which should fix the problem without breaking any previously-supported upstream models. According to a report by @teleprint-me some problem still persists, but this change should do in the meantime.
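The matching idea can be sketched as (illustrative Python with a hypothetical `get_part_names`; the real converter's logic may differ):

```python
import re

def get_part_names(filenames, prefix="model", suffix=".safetensors"):
    """Keep only real model shards by matching both the prefix and the
    suffix (e.g. model-00001-of-00002.safetensors or model.safetensors),
    so stray files like consolidated.safetensors are excluded."""
    shard = re.compile(re.escape(prefix) + r"(-\d{5}-of-\d{5})?" + re.escape(suffix))
    return sorted(f for f in filenames if shard.fullmatch(f))
```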
SHA: 5795b94
convert-hf : set the model name based on cli arg, if present (ggerganov#7693)
The `--model-name` argument was added a while ago but did not do anything. This commit fixes that and enables the feature.
SHA: 2decf57
SHA: 42b53d1
SHA: 3e2ee44
docs: Added initial PR template with directions for doc-only changes and squash merges [no ci] (ggerganov#7700)
This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate PR complexity level, when to add [no ci], and how to format PR titles and descriptions. Co-authored-by: Brian <[email protected]> Co-authored-by: compilade <[email protected]>
SHA: 57bf62c
SHA: e95beeb
flake.lock: Update (ggerganov#7838)
Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
SHA: 10ceba3
Commits on Jun 10, 2024
use the correct SYCL context for host USM allocations (ggerganov#7777)
Signed-off-by: Ben Ashbaugh <[email protected]>
SHA: af4ae50
CUDA: use tensor cores for MMQ (ggerganov#7676)
* CUDA: int8 tensor cores for MMQ (legacy quants) * fix out-of-bounds writes * __builtin_assume -> GGML_CUDA_ASSUME * fix writeback returning too early
SHA: 1f0dabd
SHA: d9da0e4
SHA: c28a839
SHA: fd5ea0f
SHA: 864a99e
Commits on Jun 11, 2024
json : document schema conversion in GBNF readme, align manual grammar examples & converters (ggerganov#7841)
* json: fix char pattern in grammar converters * json: prevent number precision & whitespace runaways in example grammars * json: add doc to grammar readme
SHA: 396b18d
SHA: b61eb96
fix CUDA CI by using a windows-2019 image (ggerganov#7861)
* try to fix CUDA ci with --allow-unsupported-compiler * trigger when build.yml changes * another test * try exllama/bdashore3 method * install vs build tools before cuda toolkit * try win-2019
SHA: c2ce6c4
SHA: bdcb8f4
SHA: 4bfe50f
SHA: 148995e
SHA: 6fe42d0
fix broken link in pr template (ggerganov#7880) [no ci]
* fix broken link in pr template * Update pull_request_template.md [no ci] --------- Co-authored-by: Brian <[email protected]>
SHA: 14f8352
Update Vulkan RoPE implementation (ggerganov#7818)
* Update Vulkan RoPE implementation * Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception Minor fixes * Fix segfault when running out of VRAM Co-authored-by: slaren <[email protected]> --------- Co-authored-by: slaren <[email protected]>
SHA: ef52d1d
SHA: 73bac2b
Commits on Jun 12, 2024
Fix a typo and add Fedora 40 packages to install for Vulkan (ggerganov#7794) [no ci]
Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support
SHA: f2b5764
update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (ggerganov#7894)
In addition this reverts a workaround we had to do to work around the upstream issue with expired Intel GPG package keys in 2024.0.1-devel-ubuntu22.04
SHA: dcf7527
SHA: 704a35b
ggml : improve ggml_is_contiguous logic (ggerganov#7856)
* ggml : improve ggml_is_contiguous logic ggml-ci * ggml : support more contiguous cases ggml-ci
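The stride-based contiguity rule can be sketched as (illustrative Python, not the actual ggml C code; `ne` are dimension sizes, `nb` per-dimension byte strides):

```python
def is_contiguous(ne, nb, type_size):
    """A tensor is contiguous if each stride equals the product of the
    preceding dimension sizes times the element size. Dimensions of
    size 1 may carry any stride without breaking contiguity."""
    expected = type_size
    for dim_size, stride in zip(ne, nb):
        if dim_size != 1 and stride != expected:
            return False
        expected *= dim_size
    return True
```

Note the size-1 exemption: that is the kind of "more contiguous cases" a refined check can accept, since a unit dimension contributes no actual memory layout constraint.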
SHA: bfaa676
tests : add non-cont unary tests (ggerganov#7857)
* tests : add non-cont unary tests * ggml : update unary asserts and "supports_op" ggml-ci
SHA: a9cae48
SHA: 9635529
build : rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (ggerganov#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew * server: update refs -> llama-server gitignore llama-server * server: simplify nix package * main: update refs -> llama fix examples/main ref * main/server: fix targets * update more names * Update build.yml * rm accidentally checked in bins * update straggling refs * Update .gitignore * Update server-llm.sh * main: target name -> llama-cli * Prefix all example bins w/ llama- * fix main refs * rename {main->llama}-cmake-pkg binary * prefix more cmake targets w/ llama- * add/fix gbnf-validator subfolder to cmake * sort cmake example subdirs * rm bin files * fix llama-lookup-* Makefile rules * gitignore /llama-* * rename Dockerfiles * rename llama|main -> llama-cli; consistent RPM bin prefixes * fix some missing -cli suffixes * rename dockerfile w/ llama-cli * rename(make): llama-baby-llama * update dockerfile refs * more llama-cli(.exe) * fix test-eval-callback * rename: llama-cli-cmake-pkg(.exe) * address gbnf-validator unused fread warning (switched to C++ / ifstream) * add two missing llama- prefixes * Updating docs for eval-callback binary to use new `llama-` prefix. * Updating a few lingering doc references for rename of main to llama-cli * Updating `run-with-preset.py` to use new binary names. Updating docs around `perplexity` binary rename. * Updating documentation references for lookup-merge and export-lora * Updating two small `main` references missed earlier in the finetune docs. * Update apps.nix * update grammar/README.md w/ new llama-* names * update llama-rpc-server bin name + doc * Revert "update llama-rpc-server bin name + doc" This reverts commit e474ef1. * add hot topic notice to README.md * Update README.md * Update README.md * rename gguf-split & quantize bins refs in **/tests.sh --------- Co-authored-by: HanClinto <[email protected]>
SHA: 1c641e6
Commits on Jun 13, 2024
move BLAS to a separate backend (ggerganov#6210)
* move BLAS to a separate backend * rename GGML_USE_OPENBLAS to GGML_USE_BLAS * alloc : reuse same buffer when the same buffer type if used multiple times * set number of threads automatically for openblas and blis * sched : print assignments when GGML_SCHED_DEBUG env variable is set * sched : allow ops with weights on an incompatible buffer type This will cause the weight to be copied to a backend that supports the op, which is very costly. The weight should have been stored in a buffer of a backend that can run the op, but llama.cpp cannot do this automatically at the moment. --------- Co-authored-by: Georgi Gerganov <[email protected]>
SHA: f578b86
SHA: a55eb1b
SHA: 172c825
Commits on Jun 14, 2024
convert : add Poro-34B-chat tokenizer support (ggerganov#7713)
* support for Poro chat pre-tokenizer * add support for Poro pre-tokenizer * Update convert-hf-to-gguf-update.py Co-authored-by: Georgi Gerganov <[email protected]> * Change Poro-34B-chat to poro-chat * Change Poro-34B-chat to poro-chat * Update convert-hf-to-gguf-update.py * Update llama.cpp --------- Co-authored-by: Georgi Gerganov <[email protected]>
SHA: 41b9260
llama : more checks before assuming FIM tokens (ggerganov#7644)
* More checks before assuming FIM tokens for Llama arch * extensive token check
SHA: 6fcd133
llama-bench : fix RPC indication (ggerganov#7936)
Show "<backend_name>+RPC" when RPC offloading is used
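The reported label change amounts to (illustrative sketch; `backend_label` is a hypothetical helper, not llama-bench's actual code):

```python
def backend_label(backend: str, rpc_offload: bool) -> str:
    """Append "+RPC" to the reported backend name when RPC offloading
    is active, so benchmark output distinguishes local vs RPC runs."""
    return f"{backend}+RPC" if rpc_offload else backend
```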
SHA: e65bbf6
SHA: 66ef1ce
CUDA: faster q2_K, q3_K MMQ + int8 tensor cores (ggerganov#7921)
* CUDA: faster q2_K, q3_K MMQ + int8 tensor cores * try CI fix * try CI fix * try CI fix * fix data race * rever q2_K precision related changes
SHA: 76d66ee
ci : fix macos x86 build (ggerganov#7940)
In order to use old `macos-latest` we should use `macos-12` Potentially will fix: ggerganov#6975
SHA: f8ec887
Commits on Jun 15, 2024
[SYCL] remove global variables (ggerganov#7710)
* separate DPCT helpers outside * replace global variables with context * remove useless extra * update mul_mat condition * remove duplicate buft initialization * remove duplicate extra and global work group size * remove useless backend check * remove duplicated extras * use macro for group_size and remove cuda-related
SHA: 7b2f4a7
Add cvector-generator example (ggerganov#7514)
* add control-vector-generator * calc diff * add comments * proof-of-concept stdlib implementation Implements PCA and file writing using mostly standard libraries. The output is recognized as a functional control vector, but outputs gibberish. * param parsing, refactor, comments Added basic command-line parameters for outfile and one each positive/negative prompt. Refactored some messy code in PCA computation and GGUF exporting. Left a bunch of comments regarding further work needed. * example template completions Implements an example template set built from the positive/negative prompts like the control vector Python implementation. * add multi prompts, multi-thread for PCA * fix mem error * add debugs * fix matrix transpose multiplication you have got to be kidding me * preliminary template/multiprompt support model is running out of context and that ought to be fixed (segfaulting) but other than that it looks goodish * fix zero output & param parsing, functional templating fixed a bug where the output file had no tensor data/was all zero fixed a bug where single hyphen flags were not being correctly parsed implements creation of templated prompts from input (still need to adapt based on model) * fix square_diff matmul index range and CRLF->LF line endings fixed a logic error where square_diff would not multiply all rows fixed a formatting error where the provided completions.txt had CRLF line endings * add command-line args for num threads, num completions file lines, always reload model refactored a few things and did what the commit message says on the tin * code aestheticization * fix compiler warnings * in-series multithreading for prompt embedding? added commented-out code to attempt to start implementing mutlithreading for embedding in main * remove unnecessary multithreading * interim fix memory leak * translated everything but PCA (I think) * tentatively translate the rest * fix ggml errors and make new ones at least it compiles and runs * fix cb_eval * temporary commit while I move dev environments it finally outputs a functioning control vector - "functioning" in the sense that it can be loaded and it clearly has the right idea, but makes the model incoherent * update debug statements * pre-tokenize so we can allocate correct memory to ctx_diffs_wrapped * update comments * (wip) refactor * clean up PCA ggml implementation * fix shape of v_diff_original * add n_batch for pca * working version * remember to copy back the last_eigenvector * fix n_completions * bring back n_completions * default n_pca_batch to 20 * fix macos build * add to makefile all targets * use ggml_format_name * add readme * fix .editorconfig * use ggml_backend_tensor_copy * attemp to fix compile problem on mac * fix compile warn * reuse allocr * move param parser to common * better error handling * clean up a bit * add print_usage * shorten help msg * beautify help msg * escape prompt by default * change compile target to llama-cvector-generator * typo * disable GPU for PCA * code style --------- Co-authored-by: Christian Zhou-Zheng <[email protected]>
SHA: 0c7b359
Commits on Jun 16, 2024
Vulkan Shader Refactor, Memory Debugging Option (ggerganov#7947)
* Refactor shaders, extract GLSL code from ggml_vk_generate_shaders.py into vulkan-shaders directory * Improve debug log code * Add memory debug output option * Fix flake8 * Fix unnecessary high llama-3 VRAM use
SHA: 7c7836d
SHA: c8a8219
SHA: cddaf02
SHA: 6fe1c62
SHA: 5239925
SHA: bc6c457
ggml : remove duplicate include of ggml-common.h (ggml/853)
Signed-off-by: Daniel Bevenius <[email protected]>
SHA: 398105f
ggml : fix and optimize ppc64le (ggml/849)
* fix compile issues introduced by loongarch_asx * restore quant changes to merge * fix compile issues introduced by loongarch_asx * further optimize by using vec_msum & vec_sum4s on ppc64le
Commit: b5fcf8e
-
cuda : fix bounds check for src0 rows in MMVQ kernel (whisper/2231)
* cuda : fix bounds check for src0 rows in MMVQ kernel * Update ggml-cuda/mmvq.cu Co-authored-by: Johannes Gäßler <[email protected]> --------- Co-authored-by: Johannes Gäßler <[email protected]>
Commit: 19b7a83
-
Add support for sqrt on CUDA (ggerganov#7953)
* cuda sqrt support * enable cuda in pca * fix comments in pca * add test * add sqrt to ggml_backend_cuda_supports_op * fix test * new line * Use F32 sqrtf instead of F64 sqrt Co-authored-by: Johannes Gäßler <[email protected]> --------- Co-authored-by: Johannes Gäßler <[email protected]>
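The review change from `sqrt` to `sqrtf` matters because calling double-precision `sqrt` on a `float` promotes to F64 and back, which is slower on GPUs and inconsistent with an F32 pipeline. A minimal host-side illustration of the element-wise op (plain C++, not the actual CUDA kernel):

```cpp
#include <cmath>

// Element-wise square root over an F32 buffer, staying in single
// precision throughout -- the same shape as a GGML_OP_SQRT kernel,
// though this scalar loop is only an illustration of sqrtf vs sqrt.
void sqrt_f32(const float *src, float *dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = sqrtf(src[i]);  // sqrtf keeps the op in F32; sqrt would round-trip through F64
}
```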
Commit: 43b35e3
Commits on Jun 17, 2024
-
[SYCL] Update README-sycl.md for Chapter "Recommended release" and "News" (ggerganov#7946)
* Update README-sycl.md * Update README-sycl.md * Update README-sycl.md * Update README-sycl.md
Commit: df68d4f
-
gguf-dump.py: add --markdown dump output (ggerganov#7853)
* gguf-dump.py: add --markdown dump output * gguf-dump.py: Add toc * gguf-dump.py: use standard tensor name lookup. Also add tensor ID field * gguf-dump.py: Add tensor overview count * gguf-dump.py: fix array preview * gguf-dump.py: markdownTableWithAlignmentSupport() added * Add type hints and spacing Co-authored-by: compilade <[email protected]> * gguf-dump.py: prettify dimension * gguf-dump: right align element count * gguf-dump.py: element count autosizing * Apply suggestions from code review Co-authored-by: compilade <[email protected]> --------- Co-authored-by: compilade <[email protected]>
Commit: 006167a
-
Commit: 21be9ca
-
Implement non-mapped async IO for CUDA on Windows. (ggerganov#7896)
* Implement non-mapped async IO for CUDA on Windows. On a fast Gen5 NVMe drive this change improves model load time by >3x while it should be the same (or slightly faster) on any other drive. * Free resources except for backend. * Change assertions to exceptions in llama_file, find correct cuda backend to create CUDA resources and respect the use_mmap flag again for CUDA. * Apply suggestions from code review Co-authored-by: slaren <[email protected]> * Fix editorconfig and unused variable * Fix issues with Windows build --------- Co-authored-by: slaren <[email protected]>
Commit: 6a2f0b3
-
fix: divide 0 exception in mamba (ggerganov#7932)
Signed-off-by: thxCode <[email protected]>
Commit: c637fcd
-
Commit: 99052cd
-
Commit: b473e95
-
Commit: 7c26775
-
Commit: 5b6da18
-
update: support Qwen2-57B-A14B (ggerganov#7835)
* update: convert-hf-to-gguf.py to support Qwen2-57B-A14B * fix: QWEN2MOE support for expert_feed_forward_length previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH n_ff_exp and n_ff_shared_exp are now properly calculated * update: convert-hf-to-gguf.py cleanup for Qwen2MoeForCausalLM * fix: QWEN2MOE support for expert_feed_forward_length previously, expert ff was taken from n_ff (intermediate size) but it is now properly taken from LLM_KV_EXPERT_FEED_FORWARD_LENGTH n_ff_exp and n_ff_shexp are now properly calculated
Commit: a94e6ff
Commits on Jun 18, 2024
-
whisper : use ggml_backend_sched (whisper/2239)
* whisper : use ggml_backend_sched (wip) * use sched in whisper_allocr * whisper : single backend in whisper_context * whisper : remove whisper_state->backends_used * whisper : remove whisper_context->backend * whisper : reset scheduler after init * whisper : fix external encoder (e.g. CoreML) * whisper : cleanup * whisper : handle null GPU buffer types + fix sycl --------- Co-authored-by: slaren <[email protected]>
Commit: e6ecc2b
-
Commit: 5326bcc
-
Commit: 1193778
-
chore: clean useless beam search param (ggerganov#7985)
Signed-off-by: thxCode <[email protected]>
Commit: b96f9af
-
Allow compiling with CUDA without CUDA runtime installed (ggerganov#7989)
On hosts which are not prepared/dedicated to execute code using CUDA it is still possible to compile llama.cpp with CUDA support by just installing the development packages. Missing are the runtime libraries like /usr/lib64/libcuda.so* and currently the link step will fail. The development environment is prepared for such situations. There are stub libraries for all the CUDA libraries available in the $(CUDA_PATH)/lib64/stubs directory. Adding this directory to the end of the search path will not change anything for environments which currently work fine but will enable compiling llama.cpp also in case the runtime code is not available.
Commit: 6166527
-
Commit: 84f6de1
-
Only use FIM middle token if it exists (ggerganov#7648)
* Only use FIM middle if it exists * Only use FIM middle if it exists
Commit: 91c188d
-
tokenizer : BPE fixes (ggerganov#7530)
* Random test: add_bos_token, add_eos_token * Random test: add BPE models for testing * Custom regex split fails with codepoint 0 * Fix falcon punctuation regex * Refactor llm_tokenizer_bpe: move code to constructor * Move 'add_special_bos/eos' logic to llm_tokenizer_bpe * Move tokenizer flags to vocab structure. * Default values for special_add_bos/eos * Build vocab.special_tokens_cache using vocab token types * Generalize 'jina-v2' per token attributes * Fix unicode whitespaces (deepseek-coder, deepseek-llm) * Skip missing byte tokens (falcon) * Better unicode data generation * Replace char32_t with uint32_t
Commit: 37bef89
Commits on Jun 19, 2024
-
[SYCL] refactor (ggerganov#6408)
* separate lower precision GEMM from the main files * fix workgroup size hardcode
Commit: 623494a
-
Commit: a04a953
-
Commit: 9c77ec1
-
un-ignore `build-info.cmake` and `build-info.sh` (ggerganov#7996)
* un-ignore `build-info.cmake` and `build-info.sh` I am assuming that ignoring them was unintentional. If they are ignored, some tools, like cargo, will consider the files nonexistent, even if they're committed, for the purpose of publishing. This leads to the build failing in such cases. * un-ignore `build-info.cpp.in` For the same reason as the previous two files. * Reorganize `.gitignore` * Add exceptions for files mentioned by @slaren I did leave .clang-tidy since it was explicitly ignored before. * Add comments for organization * Sort some lines for pretty * Test with `make` and `cmake` builds to ensure no build artifacts might be committed * Remove `.clang-tidy` from `.gitignore` Per comment by @ggerganov * Remove `IDEWorkspaceChecks.plist` from root-level `.gitignore`
Commit: a785474
-
Commit: ba58993
Commits on Jun 20, 2024
-
metal : fix `ggml_metal_supports_op` for BF16 (ggerganov#8021)
Currently the Metal backend does not support BF16. `ggml_metal_supports_op` was returning true in these cases, leading to a crash with models converted with `--leave-output-tensor`. This commit checks if the first few source types are BF16 and returns false if that's the case.
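The fix amounts to inspecting operand types before claiming support for an op. A hedged scalar sketch of that check, with the type enum and structs invented as stand-ins for ggml's (they are not the real definitions):

```cpp
#include <array>

// Minimal stand-ins for ggml's tensor/type machinery -- illustrative only.
enum ggml_type_sketch { TYPE_F32, TYPE_F16, TYPE_BF16 };

struct tensor_sketch {
    std::array<const tensor_sketch *, 2> src;  // up to two source operands
    ggml_type_sketch type;
};

// Refuse the op if any source operand is BF16, since this (sketched)
// backend has no BF16 kernels -- the pattern behind the crash fix.
bool supports_op_sketch(const tensor_sketch *op) {
    for (const tensor_sketch *s : op->src)
        if (s && s->type == TYPE_BF16)
            return false;
    return true;
}
```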
Commit: 2075a66
-
CUDA: stream-k decomposition for MMQ (ggerganov#8018)
* CUDA: stream-k decomposition for MMQ * fix undefined memory reads for small matrices
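Stream-k scheduling splits the combined (tile × K) iteration space evenly across a fixed number of workers, so a worker may start and stop mid-tile and partial sums get combined afterwards; this removes the tail-wave imbalance of one-tile-per-block scheduling. A scalar model of the decomposition, assuming simple per-tile dot products (the real MMQ kernel is far more involved):

```cpp
#include <cstddef>
#include <vector>

// Scalar model of stream-k: the flattened work space (n_tiles * K
// multiply-adds) is cut into equal slices, one per worker. A worker's
// partial sum is simply added into its tile's accumulator -- the
// "fix-up" step of the real kernel.
std::vector<int> stream_k_dot(const std::vector<std::vector<int>> &a,
                              const std::vector<std::vector<int>> &b,
                              int n_workers) {
    const int n_tiles = (int)a.size();
    const int K = (int)a[0].size();
    const long total = (long)n_tiles * K;
    std::vector<int> out(n_tiles, 0);
    for (int w = 0; w < n_workers; ++w) {
        long begin = total * w / n_workers;        // this worker's slice
        long end   = total * (w + 1) / n_workers;
        for (long it = begin; it < end; ++it) {
            int tile = (int)(it / K);
            int k    = (int)(it % K);
            out[tile] += a[tile][k] * b[tile][k];  // partial contribution
        }
    }
    return out;
}
```

The result is independent of `n_workers`, which is exactly why the decomposition is safe: only load balance changes, not the math.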
Commit: d50f889
-
[SYCL] Fix windows build and inference (ggerganov#8003)
* add sycl preset * fix debug link error. fix windows crash * update README
Commit: de391e4
-
common: fix warning (ggerganov#8036)
* common: fix warning * Update common/common.cpp Co-authored-by: slaren <[email protected]> --------- Co-authored-by: slaren <[email protected]>
Commit: abd894a
-
Commit: 17b291a
-
Commit: b1ef562
Commits on Jun 21, 2024
-
Commit: 0e64591
-
llama : allow pooled embeddings on any model (ggerganov#7477)
* create append_pooling operation; allow to specify attention_type; add last token pooling; update examples * find result_norm/result_embd tensors properly; update output allocation logic * only use embd output for pooling_type NONE * get rid of old causal_attn accessor * take out attention_type; add in llama_set_embeddings * bypass logits when doing non-NONE pooling
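Pooling reduces the per-token embedding matrix to one vector per sequence: mean pooling averages all token rows, while last-token pooling (added here) takes the final row, which suits causal models. A hedged sketch of both modes over a plain matrix (not the llama.cpp API):

```cpp
#include <cstddef>
#include <vector>

// Reduce a [n_tokens x n_embd] matrix of token embeddings to a single
// sequence embedding. Illustrative helper only.
std::vector<float> pool_embeddings(const std::vector<std::vector<float>> &tok_embd,
                                   bool use_last) {
    const size_t n_tokens = tok_embd.size();
    const size_t n_embd   = tok_embd[0].size();
    if (use_last)
        return tok_embd[n_tokens - 1];            // last-token pooling
    std::vector<float> out(n_embd, 0.0f);
    for (const auto &row : tok_embd)              // mean pooling
        for (size_t i = 0; i < n_embd; ++i)
            out[i] += row[i] / (float)n_tokens;
    return out;
}
```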
Commit: 80ea089
-
Commit: a927b0f
-
ggml : AVX IQ quants (ggerganov#7845)
* initial iq4_xs * fix ci * iq4_nl * iq1_m * iq1_s * iq2_xxs * iq3_xxs * iq2_s * iq2_xs * iq3_s before sllv * iq3_s * iq3_s small fix * iq3_s sllv can be safely replaced with sse multiply
Commit: 7d5e877
-
vulkan: detect multiple devices by deviceUUID instead of deviceID (ggerganov#8022)
* vulkan: detect multiple devices by deviceUUID instead of deviceID * vulkan: remove unneeded variables * vulkan: fix id query
Commit: 557b653
Commits on Jun 22, 2024
-
JSON Schema to GBNF integration tests (ggerganov#7790)
* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars. * Adding additional examples as documented in ggerganov#7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program. * Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs. * Merging improved schema test methods added by @ochafik in ggerganov#7797 * Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework. * Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings. * Fixing grammar indentation to be consistent throughout file.
Commit: c5a8d4b
-
Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (ggerganov#8058)
Uses the values computed by @JohannesGaessler in PR ggerganov#7413
Commit: 5b48cd5
-
Commit: 3aa184a
-
cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (ggerganov#8052)
* Update negative.txt * Update positive.txt * Update cvector-generator.cpp * Update cvector-generator.cpp
Commit: adf480c
-
cvector: fix CI + correct help message (ggerganov#8064)
* cvector: fix CI + correct help message * also correct --pca-iter
Commit: 3e58b0e
-
Commit: b5a5f34
Commits on Jun 23, 2024
-
Refactor Vulkan backend to allow multiple contexts (ggerganov#7961)
* Refactor Vulkan backend to allow multiple contexts * Fix too many shader groups called validation error in llama3 on AMD and Intel GPUs * Fix Vulkan debug build error
Commit: 45c0e2e
-
fix CI failures (ggerganov#8066)
* test-backend-ops : increase cpy max nmse * server ci : disable thread sanitizer
Commit: b6b9a8e
-
Commit: 11318d9
-
Commit: 6a2f298
-
llama : add support for BitnetForCausalLM (ggerganov#7931)
* hf bitnet v1 * hf bitnet e2e v2 * finish bitnet e2e * finish f16 hf bitnet e2e * remove unsed * finish bitnet i2 e2e * move i2s to quantize v1 * move i2 to quantize * clean code * clean code 2 * fix codestyle * fix code * fix * fix code * fix merge * remove unused * change table name * fix whitespace * delete redundant * i2_s to absmax * finish i2_s/i8_s vec_dot x86 simd * i2s->q22 * fix code * remove block scale * add dequantize * fix seq * update avx2 * remove q2_2 * remove q22_grid * fix whitespace * reuse llm_build_kv * fix bo --------- Co-authored-by: root <root@wangjinheng>
Commit: e112b61
Commits on Jun 24, 2024
-
ggml : remove ggml_task_type and GGML_PERF (ggerganov#8017)
* ggml : remove ggml_task_type and GGML_PERF * check abort_callback on main thread only * vulkan : remove usage of ggml_compute_params * remove LLAMA_PERF
Commit: 95f57bb
-
Commit: 77beb4d