sync master #15

tc-mb · 2024-06-24T03:22:32Z

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

This enforces a check that -fno-finite-math-only is set and that the compiler is not operating in finite-math mode. While SiLU and softmax were being rewritten for the CPU in ggerganov#7154, @JohannesGaessler found that results were nondeterministic when more than one slot was in use. @LostRuins narrowed the problem down to -ffinite-math-only; the theory was that SiLU, instead of flushing small values to 0, returns NaN or some other garbage. @jart proposed a fix, which @ggerganov then implemented (ref ggerganov#7154 (comment)).
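The theory above can be illustrated with a small sketch. This is not the ggml implementation; it only assumes the textbook definition SiLU(x) = x / (1 + exp(-x)) and shows that the "flush to 0" behaviour for very negative inputs depends on infinity arithmetic being honoured, which is exactly what -ffinite-math-only lets a C compiler assume away. The function name `silu` is chosen here for illustration.

```python
import math


def silu(x: float) -> float:
    """SiLU(x) = x / (1 + exp(-x)), evaluated with IEEE-754 semantics."""
    try:
        e = math.exp(-x)
    except OverflowError:
        # C's exp() would return +inf on overflow; Python raises instead,
        # so the infinity is substituted explicitly.
        e = math.inf
    # For very negative x: x / (1 + inf) == x / inf == -0.0, not NaN.
    return x / (1.0 + e)
```

With infinities handled as IEEE-754 specifies, `silu(-1000.0)` evaluates to a (negative) zero. A build that assumes no infinities ever occur gives no such guarantee for the intermediate `exp` overflow.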

* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs

* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters ggml-ci
* common : add missing header ggml-ci
* common : remote --random-prompt usages ggml-ci
* examples : migrate to gpt_params ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params ggml-ci
* common : passkey use gpt_params

* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode ggml-ci
* metal : better rope implementation ggml-ci
* cuda : better rope implementation ggml-ci
* naming : n_orig_ctx -> n_ctx_orig ggml-ci
* dev : add reminders to update backends ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents ggml-ci

* feat: add changes to handle jina v2 base code
* fix: do not complicate things
* fix: fix the usage of the code model
* fix: fix comments
* fix: fix linting issues
* fix: remove ollama patches
* style : minor

---------

Co-authored-by: Georgi Gerganov <[email protected]>

* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp Co-authored-by: Clint Herron <[email protected]>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py Co-authored-by: Clint Herron <[email protected]>
* Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <[email protected]>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review Co-authored-by: Clint Herron <[email protected]>
* Update examples/server/public/json-schema-to-grammar.mjs Co-authored-by: Clint Herron <[email protected]>
* json: fix integral-part
* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <[email protected]>
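To make the meaning of the `x{min,max}` repetition operator concrete, here is a minimal sketch that spells a bounded repetition out using only sequencing, `?` and `*`. It is an illustration of the operator's semantics, not the rewriting performed by the grammar parser in this commit; the helper name `expand_repetition` and its string-based output are assumptions made for the example.

```python
from typing import Optional


def expand_repetition(item: str, min_count: int, max_count: Optional[int]) -> str:
    """Spell out item{min_count,max_count} with plain sequencing, '?' and '*'.

    max_count=None means "no upper bound", i.e. item{min_count,}.
    """
    if min_count < 0 or (max_count is not None and max_count < min_count):
        raise ValueError("invalid repetition bounds")

    # The mandatory part: min_count copies in sequence.
    parts = [item] * min_count

    if max_count is None:
        # Unbounded tail.
        parts.append(f"{item}*")
    else:
        # Nested optionals, so each extra copy is only reachable after the
        # previous one: (x (x)?)? rather than (x)? (x)?.
        optional = ""
        for _ in range(max_count - min_count):
            optional = f"({item} {optional})?" if optional else f"({item})?"
        if optional:
            parts.append(optional)

    return " ".join(parts)
```

For example, `expand_repetition("x", 2, 4)` yields `x x (x (x)?)?`, and `expand_repetition("x", 1, None)` yields `x x*`. Nesting the optional copies keeps the expansion unambiguous, since there is only one way to match any given number of repetitions.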

* Adding simple bare-bones test for end-to-end integration test for json validation against auto-generated JSON-schema grammars.
* Adding additional examples as documented in ggerganov#7789 . Also adding the ability to automatically output improperly failing grammars to debug output files so they can more easily be examined in the gbnf-validator program.
* Uncommenting formerly commented tests so that they fail for others who are attempting to reproduce the bugs.
* Merging improved schema test methods added by @ochafik in ggerganov#7797
* Adding #define to temporarily remove failing tests so that this PR can pass CI, but still be useful for other PRs that want to leverage the framework.
* Fixing nits from ochafik. Removing escape slashes, adding additional failing cases, fixing some other strings.
* Fixing grammar indentation to be consistent throughout file.

* hf bitnet v1
* hf bitnet e2e v2
* finish bitnet e2e
* finish f16 hf bitnet e2e
* remove unsed
* finish bitnet i2 e2e
* move i2s to quantize v1
* move i2 to quantize
* clean code
* clean code 2
* fix codestyle
* fix code
* fix
* fix code
* fix merge
* remove unused
* change table name
* fix whitespace
* delete redundant
* i2_s to absmax
* finish i2_s/i8_s vec_dot x86 simd
* i2s->q22
* fix code
* remove block scale
* add dequantize
* fix seq
* update avx2
* remove q2_2
* remove q22_grid
* fix whitespace
* reuse llm_build_kv
* fix bo

---------

Co-authored-by: root <root@wangjinheng>

ggerganov and others added 30 commits June 4, 2024 17:01

refine .gitignore (ggerganov#7688)

b226c12

This adds tags and android ndk into the git ignore list

Improve hipBLAS support in CMake (ggerganov#7696)

987d743

* Improve hipBLAS support in CMake This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK. * Set ROCM_PATH correctly

llama-bench : allow using a different printer for stderr with -oe (ggerganov#7722)

adc9ff3

compare-commits.sh : hide stdout, use -oe to print markdown

readme : remove obsolete Zig instructions (ggerganov#7471)

5ca0944

llama : remove beam search (ggerganov#7736)

0cd6bd3

ggml : remove OpenCL (ggerganov#7735)

554c247

ggml-ci

Allow number of nodes in CUDA graph to change (ggerganov#7738)

b90dc56

Previously the code would have failed to cope in the case that the number of nodes changes in an existing CUDA graph. This fixes the issue by removing an unnecessary conditional.

Fix per token atrributes bits (ggerganov#7749)

c90dbe0

readme : remove -ins (ggerganov#7759)

9973e81

-ins and --instruct were moved in ggerganov#7675 I have adjusted the README accordingly. There was no trace of --chatml in the README.

CUDA: refactor mmq, dmmv, mmvq (ggerganov#7716)

7d1a378

* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits

Fix encoding in python scripts (ggerganov#7733)

7672ade

docker : add openmp lib (ggerganov#7780)

d67caea

docker : build only main and server in their images (ggerganov#7782)

2d08b7f

* add openmp lib to dockerfiles * build only main and server in their docker images

README minor fixes (ggerganov#7798) [no ci]

a143c04

derievatives --> derivatives

Added support for . (any character) token in grammar engine. (ggerganov#6467)

ad675e1

* Added support for . (any characer) token in grammar engine. * Add integration tests for any-character symbol.

imatrix : migrate to gpt_params (ggerganov#7771)

f83351f

* imatrix : migrate to gpt_params ggml-ci * imatrix : add --save-frequency cli arg * common : fix --no-ppl

server : fix --threads-http arg (ggerganov#7801)

ee459f4

check for nans in imatrix and quantize (ggerganov#7807)

c9ee711

* imatrix : detect nan/inf values * quantize : check imatrix for nan/inf values
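The kind of check described here can be sketched in a few lines. This is an illustrative stand-in, not the code added to imatrix or quantize; the function name `non_finite_indices` and the flat list of floats are assumptions made for the example.

```python
import math
from typing import Iterable, List


def non_finite_indices(values: Iterable[float]) -> List[int]:
    """Return the positions of NaN or +/-inf entries in a sequence of floats."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]
```

A caller could reject the data when the returned list is non-empty, for example `non_finite_indices([1.0, float("nan"), 2.0, float("inf")])` reports positions 1 and 3.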

[SYCL] fix softmax r2r result wrong issue (ggerganov#7811)

d5c938c

server : do not get prompt in infill mode (ggerganov#7286)

a5cabd7

* avoid to get prompt in infill mode and embedding mode * remove embedding mode * refactor format --------- Co-authored-by: wudexiang <[email protected]>

server: update cache_prompt documentation [no ci] (ggerganov#7745)

7027b27

cmake : fix BUILD_SHARED_LIBS=ON build (ggerganov#7784)

27615f5

common depends on pthreads in Linux

gguf-split : change binary multi-byte units to decimal (ggerganov#7803)

c00fad7

vulkan : reuse parent extra for views (ggerganov#7806)

da799b4

* vulkan : reuse parent extra for views * Fix validation error when multiple compute contexts are used in a graph --------- Co-authored-by: 0cc4m <[email protected]>

Adriankhl and others added 14 commits June 21, 2024 10:28

vulkan: detect multiple devices by deviceUUID instead of deviceID (ggerganov#8022)

557b653

* vulkan: detect multiple devices by deviceUUID instead of deviceID * vulkan: remove unneeded variables * vulkan: fix id query
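A rough sketch of why the key used for deduplication matters: assuming that deviceID identifies a hardware model while deviceUUID identifies one physical device, two identical cards share a deviceID but not a deviceUUID. The `Device` record and `unique_devices` helper below are hypothetical and only model that idea; they are not the Vulkan backend's data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    device_id: int     # hypothetical: same value for every card of one model
    device_uuid: str   # hypothetical: distinct for each physical card
    name: str


def unique_devices(devices: List[Device]) -> List[Device]:
    """Keep the first entry seen for each device_uuid, preserving order."""
    seen = set()
    result = []
    for dev in devices:
        if dev.device_uuid in seen:
            continue  # same physical device reported again
        seen.add(dev.device_uuid)
        result.append(dev)
    return result
```

Under these assumptions, two identical GPUs both survive the filter, whereas keying on `device_id` would collapse them into one entry.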

Update llama-quantize ppl/file size output from LLaMA-v1 to Llama-3 values (ggerganov#8058)

5b48cd5

Uses the values computed by @JohannesGaessler in PR ggerganov#7413

convert-hf : change assert to exception (ggerganov#8015)

3aa184a

cvector-generator: Moe Moe Fixie-Fixie for Lots of Formats~! ♡(ᐢ ᴥ ᐢ)♡ (ggerganov#8052)

adf480c

* Update negative.txt * Update positive.txt * Update cvector-generator.cpp * Update cvector-generator.cpp

cvector: fix CI + correct help message (ggerganov#8064)

3e58b0e

* cvector: fix CI + correct help message * also correct --pca-iter

Removing extra blank lines that were breaking Lint. (ggerganov#8067)

b5a5f34

Refactor Vulkan backend to allow multiple contexts (ggerganov#7961)

45c0e2e

* Refactor Vulkan backend to allow multiple contexts * Fix too many shader groups called validation error in llama3 on AMD and Intel GPUs * Fix Vulkan debug build error

fix CI failures (ggerganov#8066)

b6b9a8e

* test-backend-ops : increase cpy max nmse * server ci : disable thread sanitizer

Fix typo in llama_set_embeddings comment (ggerganov#8077)

11318d9

server : fix JSON-Scheme typo (ggerganov#7975)

6a2f298

ggml : remove ggml_task_type and GGML_PERF (ggerganov#8017)

95f57bb

* ggml : remove ggml_task_type and GGML_PERF * check abort_callback on main thread only * vulkan : remove usage of ggml_compute_params * remove LLAMA_PERF

Merge branch 'prepare-PR-of-minicpm-v2.5' into master

77beb4d

tc-mb merged commit cb8cfb9 into prepare-PR-of-minicpm-v2.5 Jun 24, 2024
9 of 43 checks passed

github-actions bot added the documentation, examples, SYCL, Nvidia GPU, Vulkan, testing, build, devops, python, server, ggml, Kompute, Apple Metal, script, and nix labels Jun 24, 2024
