-
-
Notifications
You must be signed in to change notification settings - Fork 2
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
78 additions
and
200 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,242 +1,120 @@ | ||
<p align="center"> | ||
<img width="256" height="256" alt="kani" src="docs/_static/[email protected]"> | ||
</p> | ||
|
||
<p align="center"> | ||
<a href="https://github.com/zhudotexe/kani/actions/workflows/pytest.yml"> | ||
<img alt="Test Package" src="https://github.com/zhudotexe/kani/actions/workflows/pytest.yml/badge.svg"> | ||
</a> | ||
<a href="https://kani.readthedocs.io/en/latest/?badge=latest"> | ||
<img alt="Documentation Status" src="https://readthedocs.org/projects/kani/badge/?version=latest"> | ||
</a> | ||
<a href="https://pypi.org/project/kani/"> | ||
<img alt="PyPI" src="https://img.shields.io/pypi/v/kani"> | ||
<a href="https://fanoutqa.readthedocs.io/en/latest/?badge=latest"> | ||
<img alt="Documentation Status" src="https://readthedocs.org/projects/fanoutqa/badge/?version=latest"> | ||
</a> | ||
<a href="https://colab.research.google.com/github/zhudotexe/kani/blob/main/examples/colab_examples.ipynb"> | ||
<img alt="Quickstart in Colab" src="https://colab.research.google.com/assets/colab-badge.svg"> | ||
</a> | ||
<a href="https://discord.gg/eTepTNDxYT"> | ||
<img alt="Discord" src="https://img.shields.io/discord/1150902904773935214?color=5865F2&label=discord&logo=discord&logoColor=white"> | ||
<a href="https://pypi.org/project/fanoutqa/"> | ||
<img alt="PyPI" src="https://img.shields.io/pypi/v/fanoutqa"> | ||
</a> | ||
</p> | ||
|
||
# kani (カニ) | ||
|
||
kani (カニ) is a lightweight and highly hackable framework for chat-based language models with tool usage/function | ||
calling. | ||
# FanOutQA | ||
|
||
Compared to other LM frameworks, kani is less opinionated and offers more fine-grained customizability | ||
over the parts of the control flow that matter, making it the perfect choice for NLP researchers, hobbyists, and | ||
developers alike. | ||
Read the paper! | [Download the dataset!](/data) | ||
|
||
kani comes with support for the following models out of the box, with a model-agnostic framework to add support for many | ||
more: | ||
FanOutQA is a high quality, multi-hop, multi-document benchmark for large language models using English Wikipedia as its | ||
knowledge base. Compared to other question-answering benchmarks, FanOutQA requires reasoning over a greater number of | ||
documents, with the benchmark's main focus being on the titular fan-out style of question. We present these questions | ||
in three tasks -- closed-book, open-book, and evidence-provided -- which | ||
measure different abilities of LLM systems. | ||
|
||
- OpenAI Models (GPT-3.5-turbo, GPT-4, GPT-4-turbo) | ||
- Anthropic Models (Claude, Claude Instant) | ||
- LLaMA v2 (via Hugging Face or ctransformers) & fine-tunes | ||
- Vicuna v1.3 (via Hugging Face) & fine-tunes | ||
This repository contains utilities to download and work with the dataset in Python, along with implementations of the | ||
evaluation metrics presented in our paper. Alternatively, you can download the dev and test sets in JSON format and | ||
generate completions to submit to us for evaluation. | ||
|
||
**Interested in contributing? Check out our | ||
[guide](https://kani.readthedocs.io/en/latest/community/contributing.html).** | ||
## Leaderboards | ||
|
||
[Read the docs on ReadTheDocs!](http://kani.readthedocs.io/) | ||
TODO: move to website | ||
|
||
[Read our paper on arXiv!](https://arxiv.org/abs/2309.05542) | ||
## Requirements and Installation | ||
|
||
## Features | ||
The `fanoutqa` package requires Python 3.8+. | ||
|
||
- **Lightweight and high-level** - kani implements common boilerplate to interface with language models without forcing | ||
you to use opinionated prompt frameworks or complex library-specific tooling. | ||
- **Model agnostic** - kani provides a simple interface to implement: token counting and completion generation. | ||
Implement these two, and kani can run with any language model. | ||
- **Automatic chat memory management** - Allow chat sessions to flow without worrying about managing the number of | ||
tokens in the history - kani takes care of it. | ||
- **Function calling with model feedback and retry** - Give models access to functions in just one line of code. | ||
kani elegantly provides feedback about hallucinated parameters and errors and allows the model to retry calls. | ||
- **You control the prompts** - There are no hidden prompt hacks. We will never decide for you how to format your own | ||
data, unlike other popular language model libraries. | ||
- **Fast to iterate and intuitive to learn** - With kani, you only write Python - we handle the rest. | ||
- **Asynchronous design from the start** - kani can scale to run multiple chat sessions in parallel easily, without | ||
having to manage multiple processes or programs. | ||
To work with just the data, use `pip install fanoutqa`. | ||
|
||
## Quickstart | ||
To run evaluations on the dev set, use `pip install "fanoutqa[eval]"`. | ||
|
||
<a href="https://colab.research.google.com/github/zhudotexe/kani/blob/main/examples/colab_examples.ipynb"> | ||
<img alt="Quickstart in Colab" src="https://colab.research.google.com/assets/colab-badge.svg"> | ||
</a> | ||
## Data Format | ||
|
||
kani requires Python 3.10 or above. | ||
To load the dev or test questions, simply use `fanoutqa.load_dev()` or `fanoutqa.load_test()`. This will return a list | ||
of `DevQuestion` or `TestQuestion`, as documented below. | ||
|
||
First, install the library. In this quickstart, we'll use the OpenAI engine, though kani | ||
is [model-agnostic](https://kani.readthedocs.io/en/latest/engines.html). | ||
|
||
```shell | ||
$ pip install "kani[openai]" | ||
``` | ||
|
||
Then, let's use kani to create a simple chatbot using ChatGPT as a backend. | ||
### Common Models | ||
|
||
```python | ||
# import the library | ||
from kani import Kani, chat_in_terminal | ||
from kani.engines.openai import OpenAIEngine | ||
|
||
# Replace this with your OpenAI API key: https://platform.openai.com/account/api-keys | ||
api_key = "sk-..." | ||
Primitive = bool | int | float | str | ||
|
||
# kani uses an Engine to interact with the language model. You can specify other model | ||
# parameters here, like temperature=0.7. | ||
engine = OpenAIEngine(api_key, model="gpt-3.5-turbo") | ||
|
||
# The kani manages the chat state, prompting, and function calling. Here, we only give | ||
# it the engine to call ChatGPT, but you can specify other parameters like | ||
# system_prompt="You are..." here. | ||
ai = Kani(engine) | ||
|
||
# kani comes with a utility to interact with a kani through your terminal! Check out | ||
# the docs for how to use kani programmatically. | ||
chat_in_terminal(ai) | ||
class Evidence: | ||
pageid: int # Wikipedia page ID | ||
revid: int # Wikipedia revision ID of page as of dataset epoch | ||
title: str # Title of page | ||
url: str # Link to page | ||
``` | ||
|
||
kani makes the time to set up a working chat model short, while offering the programmer deep customizability over | ||
every prompt, function call, and even the underlying language model. | ||
|
||
## Function Calling | ||
### Dev Set | ||
|
||
Function calling gives language models the ability to choose when to call a function you provide based off its | ||
documentation. | ||
|
||
With kani, you can write functions in Python and expose them to the model with just one line of code: the `@ai_function` | ||
decorator. | ||
The development set is a JSON file containing a list of DevQuestion objects: | ||
|
||
```python | ||
# import the library | ||
from typing import Annotated | ||
from kani import AIParam, Kani, ai_function, chat_in_terminal | ||
from kani.engines.openai import OpenAIEngine | ||
|
||
# set up the engine as above | ||
api_key = "sk-..." | ||
engine = OpenAIEngine(api_key, model="gpt-3.5-turbo") | ||
|
||
|
||
# subclass Kani to add AI functions | ||
class MyKani(Kani): | ||
# Adding the annotation to a method exposes it to the AI | ||
@ai_function() | ||
def get_weather( | ||
self, | ||
# and you can provide extra documentation about specific parameters | ||
location: Annotated[str, AIParam(desc="The city and state, e.g. San Francisco, CA")], | ||
): | ||
"""Get the current weather in a given location.""" | ||
# In this example, we mock the return, but you could call a real weather API | ||
return f"Weather in {location}: Sunny, 72 degrees fahrenheit." | ||
|
||
|
||
ai = MyKani(engine) | ||
chat_in_terminal(ai) | ||
class DevQuestion: | ||
id: str | ||
question: str # the top-level question to answer | ||
decomposition: list[DevSubquestion] # human-written decomposition of the question | ||
answer: dict[str, Primitive] | list[Primitive] | Primitive | ||
categories: list[str] | ||
|
||
|
||
class DevSubquestion: | ||
id: str | ||
question: str | ||
decomposition: list[DevSubquestion] | ||
answer: dict[str, Primitive] | list[Primitive] | Primitive # the answer to this subquestion | ||
depends_on: list[str] # the IDs of subquestions that this subquestion requires answering first | ||
evidence: Evidence | None # if this is None, the question will have a decomposition | ||
``` | ||
|
||
kani guarantees that function calls are valid by the time they reach your methods while allowing you to focus on | ||
writing code. For more information, check | ||
out [the function calling docs](https://kani.readthedocs.io/en/latest/function_calling.html). | ||
|
||
## Why kani? | ||
|
||
Existing frameworks for language models like LangChain and simpleaichat are opinionated and/or heavyweight - they edit | ||
developers' prompts under the hood, are challenging to learn, and are difficult to customize without adding a lot of | ||
high-maintenance bloat to your codebase. | ||
|
||
<p align="center"> | ||
<img style="max-width: 800px;" alt="kani" src="docs/_static/lib-comparison_white.png"> | ||
</p> | ||
|
||
We built kani as a more flexible, simple, and robust alternative. A good analogy between frameworks would be to say that | ||
kani is to LangChain as Flask (or FastAPI) is to Django. | ||
|
||
kani is appropriate for everyone from academic researchers to industry professionals to hobbyists to use without | ||
worrying about under-the-hood hacks. | ||
|
||
|
||
## Docs | ||
|
||
To learn more about how | ||
to [customize kani with your own prompt wrappers](https://kani.readthedocs.io/en/latest/customization.html), | ||
[function calling](https://kani.readthedocs.io/en/latest/function_calling.html), and | ||
more, [read the docs!](http://kani.readthedocs.io/) | ||
|
||
Or take a look at the hands-on examples [in this repo](https://github.com/zhudotexe/kani/tree/main/examples). | ||
|
||
## Demo | ||
### Test Set | ||
|
||
Want to see kani in action? Using 4-bit quantization to shrink the model, we run LLaMA v2 as part of our test suite | ||
right on GitHub Actions: | ||
The test set contains a slightly different format, as the answers are not provided. We include links to all the evidence | ||
used in the human-written decompositions for our Evidence Provided task. | ||
|
||
https://github.com/zhudotexe/kani/actions/workflows/pytest.yml?query=branch%3Amain+is%3Asuccess | ||
|
||
Simply click on the latest build to see LLaMA's output! | ||
|
||
## Kani in the News | ||
|
||
Kani will appear at the NLP Open Source Software workshop at EMNLP 2023! | ||
|
||
We are really excited and grateful to see people talking about Kani online. We are also trending on Papers With Code, | ||
GitHub, and OSS Insight. Check out some recent articles and videos below! | ||
```python | ||
class TestQuestion: | ||
id: str | ||
question: str | ||
necessary_evidence: list[FinalEvidence] | ||
categories: list[str] | ||
``` | ||
|
||
- [Researchers from the University of Pennsylvania Introduce Kani: A Lightweight, Flexible, and Model-Agnostic Open-Source AI Framework for Building Language Model Applications](https://www.marktechpost.com/2023/09/18/researchers-from-the-university-of-pennsylvania-introduce-kani-a-lightweight-flexible-and-model-agnostic-open-source-ai-framework-for-building-language-model-applications/) | ||
- [Unlocking AI Potential: Unveiling Kani, the Groundbreaking Open-Source Framework Revolutionizing Large Language Model Applications](https://www.cjco.com.au/article/news/unlocking-ai-potential-unveiling-kani-the-groundbreaking-open-source-framework-revolutionizing-large-language-model-applications/) | ||
- [Kani: A Lightweight and Customizable Framework for Language Model Applications](https://ts2.space/en/kani-a-lightweight-and-customizable-framework-for-language-model-applications/) | ||
- [Introducing Kani (Sanskrit Word): A Game-Changing Open-Source AI Framework for Language Models](https://www.linkedin.com/pulse/introducing-kani-sanskrit-word-game-changing/) | ||
- *Kani was originally named after the Japanese word for crab and coincidentally means "knowledge" in Sanskrit.* | ||
- [kani: lightweight LLM framework (Japanese)](https://note.com/hamachi_jp/n/n342becc4f345) | ||
- [Top Trending LLM Projects of the Week: Dive into the Future of Tech! 🚀](https://www.youtube.com/watch?v=qoGKzmnhAnA) | ||
## Wikipedia Retrieval | ||
|
||
## Who we are | ||
TODO | ||
|
||
<img alt="University of Pennsylvania Logo" src="docs/_static/penn-logo.jpg" width="300"> | ||
## Evaluation | ||
|
||
The core development team is made of three PhD students in the Department of Computer and Information Science at the | ||
University of Pennsylvania. We're all members of | ||
[Prof. Chris Callison-Burch's](https://www.cis.upenn.edu/~ccb/) lab, working towards advancing the future of NLP. | ||
To evaluate a model's generation, first ensure that you have installed all the evaluation dependencies (see above). | ||
|
||
- [**Andrew Zhu**](https://zhu.codes/) started in Fall 2022. His research interests include natural language processing, | ||
programming languages, distributed systems, and more. He's also a full-stack software engineer, proficient in all | ||
manner of backend, devops, database, and frontend engineering. Andrew strives to make idiomatic, clean, performant, | ||
and low-maintenance code — philosophies that are often rare in academia. | ||
- [**Liam Dugan**](https://liamdugan.com/) started in Fall 2021. His research focuses primarily on large language models | ||
and how humans interact with them. In particular, he is interested in human detection of generated text and whether we | ||
can apply those insights to automatic detection systems. He is also interested in the practical application of large | ||
language models to education. | ||
- [**Alyssa Hwang**](https://alyssahwang.com/) started in Fall 2020 and is advised by Chris Callison-Burch and Andrew | ||
Head. Her research focuses on AI assistants that effectively communicate complex information, like voice assistants | ||
guiding users through instructions or audiobooks allowing users to seamlessly navigate through spoken text. Beyond | ||
research, Alyssa chairs the Penn CIS Doctoral Association, founded the CIS PhD Mentorship Program, and was supported | ||
by the NSF Graduate Research Fellowship Program. | ||
TODO: what env vars? | ||
TODO: what to run? | ||
TODO: what does it return? | ||
|
||
## Citation | ||
### Test Set Evaluation | ||
|
||
If you use Kani, please cite us as: | ||
To evaluate your model on the hidden test set, please email your generations | ||
to [[email protected]](mailto:[email protected]) with the subject "FanOutQA Test Evaluation". Your generations | ||
should be in the form of a JSONL file, with each line being a JSON object with the following schema for each test | ||
question: | ||
|
||
``` | ||
@misc{zhu2023kani, | ||
title={Kani: A Lightweight and Highly Hackable Framework for Building Language Model Applications}, | ||
author={Andrew Zhu and Liam Dugan and Alyssa Hwang and Chris Callison-Burch}, | ||
year={2023}, | ||
eprint={2309.05542}, | ||
archivePrefix={arXiv}, | ||
primaryClass={cs.SE} | ||
```json | ||
{ | ||
"id": "The ID of the question (see test set schema) this is a generation for.", | ||
"answer": "The model's generation." | ||
} | ||
``` | ||
|
||
### Acknowledgements | ||
|
||
We would like to thank the members of the lab of Chris Callison-Burch for their testing and detailed feedback on the | ||
contents of both our paper and the Kani repository. In addition, we’d like to thank Henry Zhu (no relation to the first | ||
author) for his early and enthusiastic support of the project. | ||
In the email body, please include details about your system, including at least: | ||
- the name of your system | ||
- the list of authors | ||
- a link to your paper and recommended short citation, if applicable | ||
- whether it is a new foundation model, a fine-tune, a prompting approach, or other | ||
|
||
This research is based upon work supported in part by the Air Force Research Laboratory (contract FA8750-23-C-0507), the | ||
IARPA HIATUS Program (contract 2022-22072200005), and the NSF (Award 1928631). Approved for Public Release, Distribution | ||
Unlimited. The views and conclusions contained herein are those of the authors and should not be interpreted as | ||
necessarily representing the official policies, either expressed or implied, of IARPA, NSF, or the U.S. Government. |