Releases: zhudotexe/kani
v1.2.4
v1.2.3
v1.2.2
- fix(mistral): ensure prompt and completion tokens are passed through in the MistralFunctionCallingAdapter when streaming
- fix(streaming): don't emit text in DummyStream if it is None
- feat: add standalone width formatters
- docs: gpt-3.5-turbo -> gpt-4o-mini defaults
- fix(streaming): potential line len miscount in format_stream
v1.2.1
- Fixes various issues in the MistralFunctionCallingAdapter wrapper engine for Mistral-Large and Mistral-Small function calling models.
- Fixes an issue in PromptPipeline.explain() where manual examples would not be explained.
- Fixes an issue in PromptPipeline.ensure_bound_function_calls() where passing an ID translator would mutate the ID of the underlying messages.
v1.2.0
New Features
- Hugging Face: Models loaded through the HuggingEngine now use chat templates for conversational prompting and tool usage if available by default. This should make it much easier to get started with a Hugging Face model in Kani.
- Added the ability to supply a custom tokenizer to the OpenAIEngine (e.g., for using OpenAI-compatible APIs).
Fixes/Improvements
- Fixed a missing dependency in the llama extra.
- The HuggingEngine will now automatically set device_map="auto" if the accelerate library is installed.
v1.1.1
v1.1.0
- Added max_function_rounds to Kani.full_round, Kani.full_round_str, and Kani.full_round_stream:
  The maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default unlimited (continues until model's response does not contain a function call).
- Added __repr__ to engines.
- Fixed an issue where Kani could underestimate the token usage for certain OpenAI models using parallel function calling.
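The max_function_rounds behavior described above can be pictured with a small standalone sketch. This is not kani's implementation: full_round_sketch, eager_model, and the tuple-based transcript are hypothetical names made up for illustration, and the "model" is a plain Python callable rather than an engine.

```python
import itertools


def full_round_sketch(model, functions, max_function_rounds=None):
    """Toy function-calling loop (illustration only, not kani's code).

    `model(available_functions)` returns ("call", name) or ("text", reply).
    """
    transcript = []
    # None means unlimited: keep going until the model stops calling functions.
    rounds = itertools.count() if max_function_rounds is None else range(max_function_rounds)
    for _ in rounds:
        kind, value = model(functions)
        transcript.append((kind, value))
        if kind == "text":
            return transcript  # no function call, so the round is over
        transcript.append(("result", functions[value]()))
    # Cap reached: allow one final response with no functions defined.
    transcript.append(model({}))
    return transcript


def eager_model(available_functions):
    """Hypothetical model that calls a function whenever one is available."""
    if available_functions:
        return ("call", next(iter(available_functions)))
    return ("text", "final answer")


transcript = full_round_sketch(eager_model, {"get_time": lambda: "12:00"}, max_function_rounds=2)
for entry in transcript:
    print(entry)
```

With a cap of 2, the toy model makes two function calls and is then forced to answer in text, which mirrors the wording of the release note.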
v1.0.2
- Add Kani.add_completion_to_history (useful for token counting, see #29)
- Add an example of an AIFunction definition to PromptPipeline.explain() when a function-related step is included
- Add id_translator arg to PromptPipeline.ensure_bound_function_calls()
- Ensure that OpenAIEngine and HuggingEngine streams yield a completion including prompt and completion token usage
- Various Mistral-7B Instruct v0.3 prompt fixes
v1.0.1
v1.0.0
New Features
Streaming
kani now supports streaming to print tokens from the engine as they are received! Streaming is designed to be a drop-in superset of the chat_round and full_round methods, allowing you to gradually refactor your code without ever leaving it in a broken state.
To request a stream from the engine, use Kani.chat_round_stream() or Kani.full_round_stream(). These methods will return a StreamManager, which you can use in different ways to consume the stream.
The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.
# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()
After a stream finishes, its contents will be available as a ChatMessage. You can retrieve the final message or BaseCompletion with:
msg = await stream.message()
completion = await stream.completion()
The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.
Tip
For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:
msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
(note the await that is not present in the above examples). This allows you to refactor your code by changing chat_round to chat_round_stream without other changes.
- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
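To see how one object can be both iterated with async for and awaited directly, here is a minimal standalone sketch. ToyStreamManager is a hypothetical stand-in written for this illustration; it is not kani's StreamManager and makes no claim about how kani implements it.

```python
import asyncio


class ToyStreamManager:
    """Illustration only: an object that is both async-iterable and awaitable."""

    def __init__(self, tokens):
        self._source = iter(tokens)  # shared, so tokens are never yielded twice
        self._seen = []

    def __aiter__(self):
        return self._iterate()

    async def _iterate(self):
        for token in self._source:
            self._seen.append(token)
            await asyncio.sleep(0)  # give control back to the event loop
            yield token

    async def message(self):
        # If the stream has not been (fully) iterated yet, drain the rest first.
        async for _ in self._iterate():
            pass
        return "".join(self._seen)

    def __await__(self):
        # `await stream` behaves like `await stream.message()`.
        return self.message().__await__()


async def demo():
    streamed = []
    stream = ToyStreamManager(["Hello", ", ", "world"])
    async for token in stream:
        streamed.append(token)
    after_iteration = await stream.message()

    # Awaiting without iterating first consumes the entire stream.
    direct = await ToyStreamManager(["Hello", ", ", "world"])
    return streamed, after_iteration, direct


streamed, after_iteration, direct = asyncio.run(demo())
print(streamed, after_iteration, direct)
```

Defining __await__ in terms of message() is what lets a call site switch between the two styles without other changes, which is the refactoring property the tip describes.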
Issue: #30
New Models
kani now has bundled support for the following new models:
Hosted
- Claude 3 (including function calling)
Open Source
- Llama 3 (all sizes)
- Command R and Command R+ (including function calling)
- Mistral-7B and Mixtral-8x7B
- Gemma (all sizes)
Although these models have built-in support, kani supports every chat model available on Hugging Face through transformers or llama.cpp using the new Prompt Pipelines feature (see below)!
Issue: #34
llama.cpp
To use GGUF-quantized versions of models, kani now supports the LlamaCppEngine, which uses the llama-cpp-python library to interface with the llama.cpp library. Any model with a GGUF version is compatible with this engine!
Prompt Pipelines
A prompt pipeline creates a reproducible pipeline for translating a list of ChatMessage into an engine-specific format using fluent-style chaining.
To build a pipeline, create an instance of PromptPipeline() and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.
Pipelines come with a built-in explain() method to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).
Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:
from kani import PromptPipeline, ChatRole

LLAMA2_PIPELINE = (
    PromptPipeline()
    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)
    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace from the ends of
    # generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")
    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
LLAMA2_PIPELINE.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = LLAMA2_PIPELINE(ai.get_prompt())
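For readers who want to see the effect of those steps without installing anything, the following plain-Python sketch applies the same four ideas (wrap, translate role, merge consecutive, conversation format) by hand. It is only an approximation written for illustration: llama2_prompt_sketch and the (role, text) tuples are hypothetical, and the output is not guaranteed to match what the real PromptPipeline produces in every case.

```python
def llama2_prompt_sketch(messages):
    """messages: list of (role, text) with role in {"system", "user", "assistant"}."""
    # Steps 1 and 2: wrap system messages in the <<SYS>> tags and treat them as user messages.
    translated = []
    for role, text in messages:
        if role == "system":
            role, text = "user", f"<<SYS>>\n{text}\n<</SYS>>\n"
        translated.append((role, text))

    # Step 3: merge consecutive messages that share a role
    # (newline between user messages, space between assistant messages).
    separators = {"user": "\n", "assistant": " "}
    merged = []
    for role, text in translated:
        if merged and merged[-1][0] == role:
            merged[-1] = (role, merged[-1][1] + separators[role] + text)
        else:
            merged.append((role, text))

    # Step 4: wrap each message in the instruction tokens. A trailing assistant
    # message gets no EOS token so the model can continue the generation.
    parts = []
    for index, (role, text) in enumerate(merged):
        is_last = index == len(merged) - 1
        if role == "user":
            parts.append(f"<s>[INST] {text} [/INST]")
        else:
            parts.append(f" {text}" + ("" if is_last else " </s>"))
    return "".join(parts)


prompt_sketch = llama2_prompt_sketch(
    [
        ("system", "You are helpful."),
        ("user", "Hi!"),
        ("assistant", "Hello."),
        ("user", "Bye"),
    ]
)
print(repr(prompt_sketch))
```

The declarative pipeline expresses the same transformation as a chain of named steps, which is what allows explain() to describe each one.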
Integration with HuggingEngine and LlamaCppEngine
Previously, to use a model with a different prompt format than the ones bundled with the library, one had to create a subclass of the HuggingEngine to implement the prompting scheme. With the release of Prompt Pipelines, you can now supply a PromptPipeline in addition to the model ID to use the HuggingEngine directly!
For example, the LlamaEngine (huggingface) is now equivalent to the following:
engine = HuggingEngine(
    "meta-llama/Llama-2-7b-chat-hf",
    prompt_pipeline=LLAMA2_PIPELINE,
)
The engine will use the passed pipeline to automatically infer a model's token usage, making it easier than ever to implement new models.
Issue: #32
Improvements
- The OpenAIEngine now uses the official openai-python package. (#31)
  - This means that aiohttp is no longer a direct dependency, and the HTTPClient has been deprecated. For API-based models, we recommend using the httpx library.
- Added arguments to the chat_in_terminal helper to control maximum width, echo user inputs, show function call arguments and results, and other interactive utilities (#33)
- The HuggingEngine can now automatically determine a model's context length.
- Added a warning message if an @ai_function is missing a docstring. (#37)
- Added WrapperEngine to make writing wrapper extensions easier.
Breaking Changes
- All kani models (e.g. ChatMessage) are no longer immutable. This means that you can edit the chat history directly, and token counting will still work correctly.
- As the ctransformers library does not appear to be maintained, we have removed the CTransformersEngine and replaced it with the LlamaCppEngine.
- The arguments to chat_in_terminal (except the first) are now keyword-only.
- The arguments to HuggingEngine (except model_id, max_context_size, and prompt_pipeline) are now keyword-only.
- Generation arguments for OpenAI models now take dictionaries rather than kani.engines.openai.models.* models. (If you aren't sure if you're affected by this, you probably aren't.)
Bug Fixes
- Fixed an issue with Claude 3 and parallel function calling.
It should be a painless upgrade from kani v0.x to kani v1.0! We tried our best to ensure that we didn't break any existing code. If you encounter any issues, please reach out on our Discord.