0.0.4
New params:
--stop-words Stop1;stopword2
Stop words for llama, separated by semicolons (;).
--min-tokens N
Minimum number of tokens in the response. If the response is shorter, llama removes the stop word it found, raises the temperature for one token, and continues generating. Useful for Russian, where RP answers are usually very short.
--split-after N
Splits the first sentence after N tokens and sends that part to xtts immediately. Relevant for large, slow models such as mixtral.
--seqrep
Prevents loops. Searches for the latest 20 characters within the last 300 characters. If a match is found, the repeated text is deleted, the temperature is raised for one token, and generation continues.
--xtts-intro
Says a random Mmm/Well/... through xtts right after user input. Relevant for large, slow models such as mixtral.
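
The loop check described for --seqrep can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the function names find_repeat and strip_repeated_tail are hypothetical, the earlier copy is assumed not to overlap the trailing 20 characters, and "deletes it" is assumed to mean dropping the repeated tail. The one-token temperature bump is not modeled; the caller would apply it when the flag is True.

```python
from typing import Tuple


def find_repeat(text: str, needle_len: int = 20, window: int = 300) -> int:
    """Return the index of an earlier copy of the last `needle_len` characters
    inside the last `window` characters of `text`, or -1 if there is none."""
    if len(text) < 2 * needle_len:
        return -1  # too short to hold two non-overlapping copies
    needle = text[-needle_len:]
    window_start = max(0, len(text) - window)
    # Assumption: search only the part of the window before the tail itself.
    haystack = text[window_start:len(text) - needle_len]
    pos = haystack.find(needle)
    return -1 if pos < 0 else window_start + pos


def strip_repeated_tail(text: str, needle_len: int = 20,
                        window: int = 300) -> Tuple[str, bool]:
    """Drop the trailing `needle_len` characters if they repeat earlier text.
    The boolean tells the caller that a loop was detected."""
    if find_repeat(text, needle_len, window) < 0:
        return text, False
    return text[:-needle_len], True
```

A caller would run strip_repeated_tail on the generated text after each new token and, when the flag comes back True, sample the next token at a higher temperature.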