Releases: LLukas22/llm-rs-python
Custom RoPE support & small LangChain bugfixes
Better HuggingfaceHub Integration
Simplified interaction with other GGML-based repos, such as TheBloke/Llama-2-7B-GGML created by TheBloke.
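A minimal sketch of loading a GGML model straight from a Hugging Face Hub repo, assuming llm-rs-python's `AutoModel.from_pretrained` API; the `model_file` parameter name and the file name are illustrative assumptions, and the import is guarded so the pattern runs even where llm-rs is not installed.

```python
# Hedged sketch: pull GGML weights directly from a Hub repo like
# TheBloke/Llama-2-7B-GGML. The repo/file names are illustrative.
try:
    from llm_rs import AutoModel

    model = AutoModel.from_pretrained(
        "TheBloke/Llama-2-7B-GGML",               # GGML repo on the Hub
        model_file="llama-2-7b.ggmlv3.q4_0.bin",  # which quantized file to fetch
    )
    print(model.generate("The capital of France is"))
except ImportError:
    # llm-rs isn't installed in this environment; the call pattern is the point.
    model = None
```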
Stable GPU Support
Fixed many GPU acceleration bugs in rustformers/llm and improved performance to match native GGML.
Experimental GPU support
Adds support for Metal, CUDA and OpenCL acceleration for LLaMA-based models.
Adds CI for the different acceleration backends to create prebuilt binaries.
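A hedged sketch of opting in to the experimental acceleration: `SessionConfig` and its `use_gpu` flag are assumptions about the llm-rs-python API exposed in this release, and the model path is a placeholder.

```python
# Hedged sketch: routing inference through Metal/CUDA/OpenCL.
# `SessionConfig(use_gpu=...)` is an assumed parameter name.
try:
    from llm_rs import AutoModel, SessionConfig

    config = SessionConfig(use_gpu=True)  # assumed flag enabling GPU backends
    model = AutoModel.from_pretrained(
        "path/to/llama-model-q4_0.bin",   # placeholder local GGML file
        session_config=config,
    )
except ImportError:
    model = None
```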
Added 🌾🔱 Haystack Support + BigCode-Models
- Added support for the Haystack library
- Support for "BigCode"-like models (e.g. WizardCoder) via the `gpt2` architecture
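A hedged sketch of loading a "BigCode"-style model through the gpt2 architecture: the `Gpt2` class name follows llm-rs-python's per-architecture model classes, but both it and the model path are assumptions here.

```python
# Hedged sketch: load a WizardCoder-style GGML file via the gpt2 graph.
# The `Gpt2` class and file path are illustrative assumptions.
try:
    from llm_rs import Gpt2

    model = Gpt2("path/to/wizardcoder-q4_0.bin")  # GGML file using the gpt2 architecture
    print(model.generate("def fibonacci(n):"))
except ImportError:
    model = None
```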
Added 🦜️🔗 LangChain support
Merge pull request #21 from LLukas22/feat/langchain: Add LangChain support
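A hedged sketch of the LangChain integration: `RustformersLLM` and its `model_path_or_repo_id` parameter are assumed wrapper names, and the prompt/model paths are illustrative.

```python
# Hedged sketch: wiring an llm-rs model into a LangChain chain.
# `RustformersLLM` and its constructor argument are assumptions.
try:
    from llm_rs.langchain import RustformersLLM
    from langchain import PromptTemplate
    from langchain.chains import LLMChain

    template = PromptTemplate(
        input_variables=["question"],
        template="Q: {question}\nA:",
    )
    llm = RustformersLLM(model_path_or_repo_id="path/to/model-q4_0.bin")
    chain = LLMChain(llm=llm, prompt=template)
    print(chain.run("What is GGML?"))
except ImportError:
    chain = None
```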
Added Huggingface Tokenizer Support
`AutoModel`-compatible models will now use the official `tokenizers` library, which improves decoding accuracy, especially for non-LLaMA-based models. If you want to specify a tokenizer manually, it can be set via the `tokenizer_path_or_repo_id` parameter. If you want to use the default GGML tokenizer, the Hugging Face tokenizer support can be disabled via `use_hf_tokenizer`.
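A hedged sketch of the two tokenizer options: the parameter names come from the release notes above, but their exact placement on `from_pretrained` is an assumption, as are the model path and tokenizer repo id.

```python
# Hedged sketch: overriding tokenizer selection on load.
# Parameter placement and the example repo id are assumptions.
try:
    from llm_rs import AutoModel

    # Explicit Hugging Face tokenizer (local path or Hub repo id):
    model = AutoModel.from_pretrained(
        "path/to/model-q4_0.bin",
        tokenizer_path_or_repo_id="huggyllama/llama-7b",
    )

    # Or stick with the default GGML tokenizer:
    model = AutoModel.from_pretrained(
        "path/to/model-q4_0.bin",
        use_hf_tokenizer=False,
    )
except ImportError:
    model = None
```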
Fixed GPT-J quantization
0.2.8 GPT-J quantization bugfix
Added other quantization formats
Added support for the `q5_0`, `q5_1` and `q8_0` quantization formats.
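A heavily hedged sketch of producing one of the new formats: `AutoQuantizer`, `QuantizationType` and the return value are all assumptions about llm-rs-python's quantization helpers, and the paths are placeholders.

```python
# Hedged sketch: quantize an f16 GGML model into one of the new formats.
# `AutoQuantizer.quantize` and `QuantizationType` are assumed names.
try:
    from llm_rs import AutoQuantizer, QuantizationType

    quantized_path = AutoQuantizer.quantize(
        "path/to/model-f16.bin",             # placeholder source model
        quantization=QuantizationType.Q5_1,  # also assumed: Q5_0, Q8_0
    )
except ImportError:
    quantized_path = None
```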
Streaming support
Added the `stream` method to each model, which returns a generator that can be consumed to produce a response.
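The generator-based pattern can be sketched as follows; the model path is a placeholder, and a stand-in generator keeps the consumption loop runnable where llm-rs is not installed.

```python
# Hedged sketch: consume the `stream` generator token by token.
try:
    from llm_rs import AutoModel

    model = AutoModel.from_pretrained("path/to/model-q4_0.bin")  # placeholder path
    token_stream = model.stream("Write a haiku about autumn.")
except ImportError:
    def _stub_stream(prompt):
        # Stand-in for model.stream(): yields tokens one at a time.
        for token in ["Leaves ", "fall ", "softly."]:
            yield token

    token_stream = _stub_stream("Write a haiku about autumn.")

response = ""
for token in token_stream:
    print(token, end="", flush=True)  # render tokens as they arrive
    response += token
```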