Merge remote-tracking branch 'upstream/main' (2023-08-30)

Showing 65 changed files with 2,833 additions and 844 deletions.

# Chatbot Arena

Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://chat.lmsys.org.
We invite the entire community to join this benchmarking effort by contributing your votes and models.

## How to add a new model

If you want to see a specific model in the arena, you can follow the methods below.

- Method 1: Hosted by LMSYS.
  1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md#how-to-support-a-new-model).
  2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to our limited resources, we may not be able to serve every model. We will select models based on popularity, quality, diversity, and other factors.

- Method 2: Hosted by a 3rd party API provider or yourself.
  1. If you have a model hosted by a 3rd party API provider or by yourself, please give us an API endpoint. We prefer OpenAI-compatible APIs so that we can reuse our [code](https://github.com/lm-sys/FastChat/blob/33dca5cf12ee602455bfa9b5f4790a07829a2db7/fastchat/serve/gradio_web_server.py#L333-L358) for calling OpenAI models.
  2. You can use FastChat's OpenAI-compatible API [server](openai_api.md) to serve your model and provide us with the endpoint, as sketched below.
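
For reference, a minimal sketch of serving a model behind FastChat's OpenAI-compatible API; the model path and port here are placeholders, and the exact flags may differ slightly across FastChat versions:

```bash
# Start the controller that coordinates model workers
python3 -m fastchat.serve.controller

# In a second shell: launch a worker that loads your model
# (replace --model-path with your own model)
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3

# In a third shell: expose the OpenAI-compatible REST API
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000

# Sanity-check the endpoint with the standard OpenAI chat format
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vicuna-7b-v1.3", "messages": [{"role": "user", "content": "Hello!"}]}'
```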

# AWQ 4-bit Inference

We integrated [AWQ](https://github.com/mit-han-lab/llm-awq) into FastChat to provide **efficient and accurate** 4-bit LLM inference.

## Install AWQ

Set up the environment (please refer to [this link](https://github.com/mit-han-lab/llm-awq#install) for more details):
```bash
conda create -n fastchat-awq python=3.10 -y
conda activate fastchat-awq
# cd /path/to/FastChat
pip install --upgrade pip  # enable PEP 660 support
pip install -e .           # install fastchat

git clone https://github.com/mit-han-lab/llm-awq repositories/llm-awq
cd repositories/llm-awq
pip install -e .           # install the awq package

cd awq/kernels
python setup.py install    # build and install the awq CUDA kernels
```
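
If the kernel build succeeded, importing the compiled extension should work. A quick sanity check, assuming the kernel module is named `awq_inference_engine` (the name the llm-awq kernel build produces at the time of writing):

```bash
# Verify the AWQ CUDA kernels are importable
python -c "import awq_inference_engine; print('AWQ kernels OK')"
```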

## Chat with the CLI

```bash
# Download the quantized model from Hugging Face
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/mit-han-lab/vicuna-7b-v1.3-4bit-g128-awq

# You can specify which quantized model to use by setting --awq-ckpt
python3 -m fastchat.serve.cli \
    --model-path models/vicuna-7b-v1.3-4bit-g128-awq \
    --awq-wbits 4 \
    --awq-groupsize 128
```
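
The AWQ flags appear to be shared with FastChat's other entry points, so you can likely serve the quantized model behind the OpenAI-compatible API as well. A sketch, under the assumption that the model worker accepts the same `--awq-*` flags as the CLI:

```bash
# Serve the AWQ-quantized model as a worker behind the controller
# (assumes the worker accepts the same --awq-* flags as the CLI above)
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker \
    --model-path models/vicuna-7b-v1.3-4bit-g128-awq \
    --awq-wbits 4 \
    --awq-groupsize 128
```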

## Benchmark

* Through **4-bit weight quantization**, AWQ helps to run larger language models within device memory limits and markedly accelerates token generation. As a rough sanity check, the FP16 weights of a 7B model alone take about 14 GB (7B parameters × 2 bytes), while 4-bit weights take about 3.5 GB; the Max Memory column below adds activations, cache, and kernel workspace on top of that. All benchmarks are done with group_size 128.

* Benchmark on NVIDIA RTX A6000:

| Model           | Bits | Max Memory (MiB) | Speed (ms/token) | AWQ Speedup |
| --------------- | ---- | ---------------- | ---------------- | ----------- |
| vicuna-7b       | 16   | 13543            | 26.06            | /           |
| vicuna-7b       | 4    | 5547             | 12.43            | 2.1x        |
| llama2-7b-chat  | 16   | 13543            | 27.14            | /           |
| llama2-7b-chat  | 4    | 5547             | 12.44            | 2.2x        |
| vicuna-13b      | 16   | 25647            | 44.91            | /           |
| vicuna-13b      | 4    | 9355             | 17.30            | 2.6x        |
| llama2-13b-chat | 16   | 25647            | 47.28            | /           |
| llama2-13b-chat | 4    | 9355             | 20.28            | 2.3x        |

* NVIDIA RTX 4090:

| Model           | AWQ 4-bit Speed (ms/token) | FP16 Speed (ms/token) | AWQ Speedup |
| --------------- | -------------------------- | --------------------- | ----------- |
| vicuna-7b       | 8.61                       | 19.09                 | 2.2x        |
| llama2-7b-chat  | 8.66                       | 19.97                 | 2.3x        |
| vicuna-13b      | 12.17                      | OOM                   | /           |
| llama2-13b-chat | 13.54                      | OOM                   | /           |

* NVIDIA Jetson Orin:

| Model           | AWQ 4-bit Speed (ms/token) | FP16 Speed (ms/token) | AWQ Speedup |
| --------------- | -------------------------- | --------------------- | ----------- |
| vicuna-7b       | 65.34                      | 93.12                 | 1.4x        |
| llama2-7b-chat  | 75.11                      | 104.71                | 1.4x        |
| vicuna-13b      | 115.40                     | OOM                   | /           |
| llama2-13b-chat | 136.81                     | OOM                   | /           |

## Chatbot Arena Conversations

1. Gather battles
```
python3 clean_battle_data.py --max-num 10 --mode conv_release
```

2. Tag OpenAI moderation
```
python3 tag_openai_moderation.py --in clean_battle_conv_20230814.json
```

3. Clean PII

4. Filter additional blocked words
```
python3 filter_bad_conv.py --in clean_battle_conv_20230630_tagged_v1_pii.json
```

5. Add additional toxicity tag

## All Conversations

1. Gather chats
```
python3 clean_chat_data.py
```

2. Sample
```
python3 conv_release_scripts/sample.py
```

## Prompt distribution

From the training docs, the tail of the T5 LoRA fine-tuning command (the earlier flags are not shown in this diff):
```
deepspeed fastchat/train/train_lora_t5.py \
    ...
    --deepspeed playground/deepspeed_config_s2.json
```