Releases: Mozer/talk-llama-fast
0.2.0
- Added support for gemma-2 and mistral-nemo.
- Added multi-GPU support. Don't set these 3 params if you have just 1 GPU:
  --main-gpu 0 - set the main GPU id (the one that holds the KV cache): 0, 1, ...
  --split-mode none - either none or layer; split-mode tensor is not supported.
  --tensor-split 0.5,0.5 - how to split layers or tensors across GPUs, an array of floats.
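As a sketch, the three GPU flags above might be combined like this (hypothetical command line: only the three flags are from this release, and the rest of your usual .bat options should be kept as-is):

```shell
# Hypothetical two-GPU run: GPU 0 holds the KV cache, and layers are
# split 70/30 between GPU 0 and GPU 1. Keep your other flags unchanged.
talk-llama-fast.exe --main-gpu 0 --split-mode layer --tensor-split 0.7,0.3
```

With a single GPU, simply omit all three flags, as noted above.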
- Added instruct mode with presets. It is optional and experimental; there are still some bugs.
  --instruct-preset gemma, where gemma is the name of the file \instruct_presets\gemma.json.
  Instruct mode helps make responses longer and smarter. You can find the correct instruct preset for each model on its Hugging Face model card, or in SillyTavern under Formatting -> Instruct Mode Sequences.
  The example dialogue in assistant.txt should also be formatted using instruct-mode tags. I added gemma and mistral instruct presets, and some .bat files to run gemma and nemo in instruct mode.
- Added --debug to print the whole context dialogue after each LLM response. Useful to see if there's something wrong with formatting.
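Putting the new 0.2.0 options together, a launch line might look like the sketch below (hypothetical: the two flags are from this release, everything else stays as in the bundled .bat files):

```shell
# Hypothetical: use the bundled gemma instruct preset and print the full
# context after each LLM response to verify the instruct-tag formatting.
talk-llama-fast.exe --instruct-preset gemma --debug
```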
0.1.8
0.1.7
- Added a --push-to-talk option: hold the "Alt" key to speak (useful with loudspeakers, without headphones). Turned off by default.
- You can now use Cyrillic letters in .bat files. Save them using the Cyrillic "OEM 866" encoding; Notepad++ supports it (Encoding -> Character sets -> Cyrillic -> OEM 866).
- Added talk-llama-fast-v0.1.7_no_avx2.zip for old CPUs without AVX2 instructions (e.g. Intel i5-2500K). It may be a little slower. Use it if the main version crashes without an error.
0.1.6
- Bug fix with the start prompt: the start prompt was not written correctly into the context when running with the default --batch-size 64 parameter (or without it). Llama couldn't remember anything from the start prompt beyond the first 64 tokens. This bug first appeared in v0.1.4 and no one noticed.
0.1.5
New features:
- Keyboard input (finally you can type messages using keyboard now).
- You can copy and paste text into talk-llama-fast window.
- Hotkeys: Stop (Ctrl+Space), Regenerate (Ctrl+Right), Delete (Ctrl+Delete), Reset (Ctrl+R).
- Bug fix: the Reset command works fine now, no bugs with long context.
New bugs:
- Sometimes when you type fast, the first letter of your message is dropped. (Type a little slower; I have to investigate further what causes this bug.)
- When you paste more than one passage of text into llama, the last passage has to end with \n (a newline symbol); otherwise you have to hit Enter to send that last passage into the llama console window.
0.1.4
New params:
- --batch-size (default 64) - process the start prompt and user input in batches. With 64, llama takes 0.6 GB less VRAM than before with 1024. 64 is fine for small and fast models; for big LLMs try larger values, e.g. 512.
- --verbose - show speed in tokens per second for testing. Use it with --sleep-before-xtts 0, or it will show incorrect times that include the sleep time.
- Start prompt is now not limited in length. But keep the start prompt length below --ctx_size; otherwise llama will be several times slower (bug). Increase ctx_size if needed (takes more VRAM).
- Stop words now support the \n and \r symbols. E.g.: --stop-words "Alexej:;---;assistant;===;**;Note:;\n\n;\r\n\r\n;\begin;\end;###; (;["
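For testing throughput with the 0.1.4 params, a command line might look like this sketch (hypothetical: the flags are from this release, the batch-size value is just an example for a large model, and your other options stay as in your .bat file):

```shell
# Hypothetical: larger batch for a big LLM, plus tokens-per-second timing.
# --sleep-before-xtts 0 keeps the --verbose timings accurate, since sleep
# time would otherwise be counted in the reported speed.
talk-llama-fast.exe --batch-size 512 --verbose --sleep-before-xtts 0
```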
0.1.3
- Removed the --xtts-control-path param; it is not needed anymore. The xtts control file is now stored in the temp dir, so there is no need to set xtts_play_allowed_path in extras.
- added missing default.wav voice
No other changes.
To make this version work, please update xtts_api_server, tts, and wav2lip if you have previous versions installed (pip install for each, as in the readme).
0.1.2
0.1.1
Now with wav2lip!
Usage demonstration video (in Russian): https://youtu.be/ciyEsZpzbM8
An English video is coming soon.
Changes:
- path fixes in the xtts .bat files