Merge 0922 #6

Merged: 46 commits, Sep 23, 2023

Commits
- 2264580 Remove hardcode flash-attn disable setting (#2342) (Trangle, Sep 1, 2023)
- 24a8755 Document turning off proxy_buffering when api is streaming (#2337) (nathanstitt, Sep 1, 2023)
- b039a66 Simplify huggingface api example (#2355) (merrymercy, Sep 4, 2023)
- ea045e6 Update sponsor logos (#2367) (merrymercy, Sep 5, 2023)
- 85bec47 if LOGDIR is empty, then don't try output log to local file (#2357) (leiwen83, Sep 5, 2023)
- f99663c add best_of and use_beam_search for completions interface (#2348) (leiwen83, Sep 6, 2023)
- 3cf04c2 Extract upvote/downvote from log files (#2369) (merrymercy, Sep 6, 2023)
- 94f4dd6 Revert "add best_of and use_beam_search for completions interface" (#… (merrymercy, Sep 6, 2023)
- dc3dd12 Improve doc (#2371) (merrymercy, Sep 6, 2023)
- a5e6abf add best_of and use_beam_search for completions interface (#2372) (leiwen83, Sep 7, 2023)
- 1d703b2 update monkey patch for llama2 (#2379) (merrymercy, Sep 7, 2023)
- 56744d1 Make E5 adapter more restrict to reduce mismatch (#2381) (merrymercy, Sep 7, 2023)
- 6af0a7c Update UI and sponsers (#2387) (merrymercy, Sep 8, 2023)
- 9b3147e Use fsdp api for save save (#2390) (merrymercy, Sep 10, 2023)
- a6167db Release v0.2.27 (merrymercy, Sep 10, 2023)
- 7dcdafe Spicyboros + airoboros 2.2 template update. (#2392) (jondurbin, Sep 11, 2023)
- b921f16 bugfix of openai_api_server for fastchat.serve.vllm_worker (#2398) (Rayrtfr, Sep 11, 2023)
- 13f40b3 Revert "bugfix of openai_api_server for fastchat.serve.vllm_worker" (… (merrymercy, Sep 11, 2023)
- 77aa4df Revert "add best_of and use_beam_search for completions interface" (#… (merrymercy, Sep 11, 2023)
- 11b05bb Release a v0.2.28 with bug fixes and more test cases (merrymercy, Sep 11, 2023)
- a8088ba Fix model_worker error (#2404) (wangxiyuan, Sep 12, 2023)
- b49d789 Added google/flan models and fixed AutoModelForSeq2SeqLM when loading… (wangzhen263, Sep 12, 2023)
- 7dfcf1a Rename twitter to X (#2406) (karshPrime, Sep 12, 2023)
- aa153d5 Update huggingface_api.py (#2409) (merrymercy, Sep 12, 2023)
- 3149253 Add support for baichuan2 models (#2408) (obitoquilt, Sep 13, 2023)
- 2e0e60b Fixed character overlap issue when api streaming output (#2431) (Somezak1, Sep 18, 2023)
- c7e3e67 Support custom conversation template in multi_model_worker (#2434) (hi-jin, Sep 18, 2023)
- c685951 Add Ascend NPU support (#2422) (zhangsibo1129, Sep 18, 2023)
- 54a8353 Add raw conversation template (#2417) (#2418) (tobiabir, Sep 18, 2023)
- 1119c51 Improve docs & UI (#2436) (merrymercy, Sep 18, 2023)
- 658736f Fix Salesforce xgen inference (#2350) (jaywonchung, Sep 18, 2023)
- d26d9e7 Add support for Phind-CodeLlama models (#2415) (#2416) (tobiabir, Sep 18, 2023)
- 0a5f503 Add falcon 180B chat conversation template (#2384) (Btlmd, Sep 18, 2023)
- 318d070 Improve docs (#2438) (merrymercy, Sep 18, 2023)
- 9cf3c8b add dtype and seed (#2430) (Ying1123, Sep 18, 2023)
- 24acac1 Data cleaning scripts for dataset release (#2440) (merrymercy, Sep 18, 2023)
- 30a6ffc merge google/flan based adapters: T5Adapter, CodeT5pAdapter, FlanAdap… (wangzhen263, Sep 18, 2023)
- 16be5cf Fix docs (merrymercy, Sep 18, 2023)
- e4758da Update UI (#2446) (merrymercy, Sep 18, 2023)
- 68f1fac Add Optional SSL Support to controller.py (#2448) (brandonbiggs, Sep 19, 2023)
- db8e271 Format & Improve docs (merrymercy, Sep 19, 2023)
- c4c195c Release v0.2.29 (#2450) (merrymercy, Sep 20, 2023)
- a040cdc Show terms of use as an JS alert (#2461) (merrymercy, Sep 22, 2023)
- bcb8076 vllm worker awq quantization update (#2463) (dongxiaolong, Sep 22, 2023)
- 2855bf9 Fix falcon chat template (#2464) (merrymercy, Sep 22, 2023)
- 20cfb32 Merge commit '2855bf9' into merge_0922 (renning22, Sep 23, 2023)

5 changes: 4 additions & 1 deletion README.md
@@ -16,6 +16,10 @@ We are focused to support Llama2 at scale now. If you want any other models, ple

## Dev Log

### 2023-09

Sync upstream changes

### 2023-08

Support llama2 at scale.
@@ -37,4 +41,3 @@ Support "Llama-2-13b-chat-hf" and make it the default for API.

* API key database and rate limit enforcement
* Deployable on Kubernetes

13 changes: 12 additions & 1 deletion docs/commands/leaderboard.md
@@ -11,5 +11,16 @@ python3 clean_battle_data.py

### Run Elo analysis
```
-python3 elo_analysis.py --clean-battle-file clean_battle_20230523.json
+python3 elo_analysis.py --clean-battle-file clean_battle_20230905.json
```

### Copy files to HF space
1. update plots
```
scp atlas:/data/lmzheng/FastChat/fastchat/serve/monitor/elo_results_20230905.pkl .
```

2. update table
```
wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/raw/main/leaderboard_table_20230905.csv
```
3 changes: 3 additions & 0 deletions docs/commands/test_process.md
@@ -1,3 +1,6 @@
## Unit tests for FastChat
The scripts are under [FastChat/tests](../../tests).

### Test CLI Inference

```
…
```
2 changes: 1 addition & 1 deletion docs/commands/webserver.md
@@ -27,7 +27,7 @@ cd fastchat_logs/server0
export OPENAI_API_KEY=
export ANTHROPIC_API_KEY=

-python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 10 --add-chatgpt --add-claude --add-palm --anony-only --elo ~/elo_results/elo_results_20230802.pkl --leaderboard-table-file ~/elo_results/leaderboard_table_20230802.csv --register ~/elo_results/register_oai_models.json
+python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 10 --add-chatgpt --add-claude --add-palm --anony-only --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms

python3 backup_logs.py
```
4 changes: 3 additions & 1 deletion docs/model_support.md
@@ -31,13 +31,15 @@
- [openaccess-ai-collective/manticore-13b-chat-pyg](https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg)
- [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
- [VMware/open-llama-7b-v2-open-instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
- [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
- [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
- [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
- [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
- [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
- [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
- [tiiuae/falcon-180B-chat](https://huggingface.co/tiiuae/falcon-180B-chat)
- [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
- [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
- [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
@@ -71,7 +73,7 @@ You can add `--debug` to see the actual prompt sent to the model.

FastChat uses the `Conversation` class to handle prompt templates and `BaseModelAdapter` class to handle model loading.

-1. Implement a conversation template for the new model at [fastchat/conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py). You can follow existing examples and use `register_conv_template` to add a new one.
+1. Implement a conversation template for the new model at [fastchat/conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py). You can follow existing examples and use `register_conv_template` to add a new one. Please also add a link to the official reference code if possible. (A minimal sketch of steps 1 and 2 appears after this list.)
2. Implement a model adapter for the new model at [fastchat/model/model_adapter.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_adapter.py). You can follow existing examples and use `register_model_adapter` to add a new one.
3. (Optional) add the model name to the "Supported models" [section](#supported-models) above and add more information in [fastchat/model/model_registry.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_registry.py).

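Here is that sketch. The model name `my-model`, its roles, and its separators are illustrative assumptions, not a real registration; for an actual model, mirror the closest existing template and adapter instead.

```python
# Minimal sketch of steps 1 and 2 for a hypothetical model called "my-model"
# whose assumed prompt format is "USER: ...\nASSISTANT: ...</s>".
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)
from fastchat.model.model_adapter import BaseModelAdapter, register_model_adapter

# Step 1: the prompt template.
register_conv_template(
    Conversation(
        name="my-model",
        system_message="A helpful assistant.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep="\n",
        sep2="</s>",
    )
)


# Step 2: the adapter that ties checkpoints to the template.
class MyModelAdapter(BaseModelAdapter):
    """Adapter for the hypothetical my-model checkpoints."""

    def match(self, model_path: str):
        # Keep the match specific so it does not shadow other adapters.
        return "my-model" in model_path.lower()

    # The default load_model inherited from BaseModelAdapter works for
    # ordinary causal LM checkpoints; override it only if loading is special.

    def get_default_conv_template(self, model_path: str) -> Conversation:
        return get_conv_template("my-model")


register_model_adapter(MyModelAdapter)
```
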
2 changes: 1 addition & 1 deletion docs/openai_api.md
@@ -62,7 +62,7 @@ completion = openai.ChatCompletion.create(
print(completion.choices[0].message.content)
```

-Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py).
+Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py). If your API server sits behind a proxy, you will need to turn off buffering; in Nginx, you can do so by setting `proxy_buffering off;` in the `location` block for the proxy.
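
For reference, here is a minimal streaming client sketch in the same style as the example above. The base URL, the `EMPTY` key, and the model name `vicuna-7b-v1.5` are assumptions about a default local deployment; adjust them to yours.

```python
import openai

# Assumes a local FastChat OpenAI-compatible server on port 8000.
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

for chunk in openai.ChatCompletion.create(
    model="vicuna-7b-v1.5",  # hypothetical served model name
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
    stream=True,
):
    # Each streamed chunk carries an incremental "delta" with new text.
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
print()
```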

### cURL
cURL is another good tool for observing the output of the API.
29 changes: 29 additions & 0 deletions docs/training.md
@@ -87,3 +87,32 @@ deepspeed fastchat/train/train_lora_t5.py \
--deepspeed playground/deepspeed_config_s2.json

```

### Fine-tuning Vicuna-7B with Local NPUs

You can use the following command to train Vicuna-7B on 8 x Ascend 910B NPUs (60GB each). Use `--nproc_per_node` to specify the number of NPUs.
```bash
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train.py \
--model_name_or_path ~/vicuna-7b-v1.5-16k \
--data_path data/dummy_conversation.json \
--fp16 True \
--output_dir output_vicuna \
--num_train_epochs 3 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 10 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True
```
5 changes: 5 additions & 0 deletions docs/vllm_integration.md
@@ -18,3 +18,8 @@ See the supported models [here](https://vllm.readthedocs.io/en/latest/models/sup
```
python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.3 --tokenizer hf-internal-testing/llama-tokenizer
```

If you use an AWQ quantized model, try
```
python3 -m fastchat.serve.vllm_worker --model-path TheBloke/vicuna-7B-v1.5-AWQ --quantization awq
```
2 changes: 1 addition & 1 deletion fastchat/__init__.py
@@ -1 +1 @@
__version__ = "0.2.26"
__version__ = "0.2.29"
2 changes: 1 addition & 1 deletion fastchat/constants.py
@@ -15,7 +15,7 @@
CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION."
INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE."
# Maximum input length
-INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 2560))
+INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 3072))
# Maximum conversation turns
CONVERSATION_TURN_LIMIT = 50
# Session expiration time
84 changes: 80 additions & 4 deletions fastchat/conversation.py
@@ -27,6 +27,7 @@ class SeparatorStyle(IntEnum):
    RWKV = auto()
    PHOENIX = auto()
    ROBIN = auto()
    FALCON_CHAT = auto()


@dataclasses.dataclass
@@ -200,6 +201,17 @@ def get_prompt(self) -> str:
                else:
                    ret += role + ":\n"
            return ret
        elif self.sep_style == SeparatorStyle.FALCON_CHAT:
            ret = ""
            if self.system_message:
                ret += system_prompt + self.sep
            for role, message in self.messages:
                if message:
                    ret += role + ": " + message + self.sep
                else:
                    ret += role + ":"

            return ret
        else:
            raise ValueError(f"Invalid style: {self.sep_style}")

@@ -285,6 +297,17 @@ def get_conv_template(name: str) -> Conversation:
    return conv_templates[name].copy()


# An empty template for raw conversation.
register_conv_template(
    Conversation(
        name="raw",
        system_message="",
        roles=("", ""),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
    )
)
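# Rendered-prompt sketch for the raw template: with empty roles and separators,
# NO_COLON_SINGLE concatenates the message texts verbatim, so a single user
# message "Once upon a time" renders exactly as "Once upon a time"
# (plain completion, no chat markup).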

# A template with a one-shot conversation example
register_conv_template(
    Conversation(
@@ -357,6 +380,17 @@ def get_conv_template(name: str) -> Conversation:
    )
)

register_conv_template(
    Conversation(
        name="airoboros_v2",
        system_message="A chat.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_TWO,
        sep="\n",
        sep2="</s>",
    )
)

# Koala default template
register_conv_template(
    Conversation(
@@ -743,11 +777,10 @@ def get_conv_template(name: str) -> Conversation:
    Conversation(
        name="xgen",
        system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
-       roles=("### Human: ", "###"),
-       sep_style=SeparatorStyle.NO_COLON_SINGLE,
+       roles=("### Human", "### Assistant"),
+       sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
-       stop_token_ids=[50256, 0, 1, 2],
-       stop_str="<|endoftext|>",
+       stop_token_ids=[50256],
    )
)

@@ -793,6 +826,20 @@ def get_conv_template(name: str) -> Conversation:
    )
)

# Baichuan2-13B-Chat template
register_conv_template(
    # source: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/c6f8592a60b4ad73c210b28dd2ab3cca51abbf93/modeling_baichuan.py#L773
    # https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/generation_config.json
    # https://github.com/baichuan-inc/Baichuan2/issues/62
    Conversation(
        name="baichuan2-chat",
        roles=("<reserved_106>", "<reserved_107>"),
        sep_style=SeparatorStyle.NO_COLON_SINGLE,
        sep="",
        stop_token_ids=[],
    )
)
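# Rendered-prompt sketch for the template above: with NO_COLON_SINGLE and an
# empty sep, a single user turn "Hi!" renders as "<reserved_106>Hi!<reserved_107>",
# and generation continues right after the assistant role token.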

# llama2 template
# reference: https://huggingface.co/blog/codellama#conversational-instructions
# reference: https://github.com/facebookresearch/llama/blob/1a240688810f8036049e8da36b073f63d2ac552c/llama/generation.py#L212
@@ -905,6 +952,35 @@ def get_conv_template(name: str) -> Conversation:
    )
)

# Falcon 180B chat template
# source: https://huggingface.co/spaces/tiiuae/falcon-180b-demo/blob/d1590ee7fae9b6ce331ba7808e61a29dcce9239f/app.py#L28-L37
register_conv_template(
    Conversation(
        name="falcon-chat",
        roles=("User", "Falcon"),
        system_template="System: {system_message}",
        messages=[],
        sep_style=SeparatorStyle.FALCON_CHAT,
        sep="\n",
        sep2="<|endoftext|>",
        # stop_str stops generation (in addition to stop_token_ids) and is also
        # removed from the generated text.
        stop_str="\nUser:",
    )
)
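# Rendered-prompt sketch for the template above: a system message plus one user
# turn renders as "System: {system_message}\nUser: Hi!\nFalcon:", and the
# "\nUser:" stop_str cuts generation off before the model writes a fake next turn.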

# Phind template
# source: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
register_conv_template(
    Conversation(
        name="phind",
        system_message="### System Prompt\nYou are an intelligent programming assistant.",
        roles=("### User Message", "### Assistant"),
        messages=(),
        offset=0,
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n\n",
    )
)


if __name__ == "__main__":
print("Vicuna template:")
Expand Down
1 change: 0 additions & 1 deletion fastchat/data/merge.py
@@ -6,7 +6,6 @@

import argparse
import json
-from typing import Dict, Sequence, Optional


if __name__ == "__main__":
    ...
7 changes: 4 additions & 3 deletions fastchat/llm_judge/README.md
@@ -1,5 +1,5 @@
# LLM Judge
-| [Paper](https://arxiv.org/abs/2306.05685) | [Leaderboard](https://chat.lmsys.org/?leaderboard) |
+| [Paper](https://arxiv.org/abs/2306.05685) | [Leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) |

In this package, you can use MT-bench questions and prompts to evaluate your models with LLM-as-a-judge.
MT-bench is a set of challenging multi-turn open-ended questions for evaluating chat assistants.
@@ -10,7 +10,7 @@ To automate the evaluation process, we prompt strong LLMs like GPT-4 to act as j
- [Review Pre-Generated Model Answers and Judgments](#review-pre-generated-model-answers-and-judgments)
- [MT-Bench](#mt-bench)
- [Agreement Computation](#agreement-computation)
-- [Dataset](#dataset)
+- [Datasets](#datasets)
- [Citation](#citation)

## Install
@@ -64,6 +64,7 @@ This mode asks GPT-4 to grade and give a score to model's answer directly withou
For each turn, GPT-4 will give a score on a scale of 10. We then compute the average score on all turns.

```
export OPENAI_API_KEY=XXXXXX # set the OpenAI API key
python gen_judgment.py --model-list [LIST-OF-MODEL-ID] --parallel [num-concurrent-api-call]
```

@@ -133,7 +134,7 @@ We released 3.3K human annotations for model responses generated by 6 models in

This Colab [notebook](https://colab.research.google.com/drive/1ctgygDRJhVGUJTQy8-bRZCl1WNcT8De6?usp=sharing) shows how to compute the agreement between humans and GPT-4 judge with the dataset. Our results show that humans and GPT-4 judge achieve over 80\% agreement, the same level of agreement as between humans.

-## Dataset
+## Datasets
- [Chatbot Arena Conversation Dataset](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
- [MT-bench Human Annotation Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)

29 changes: 29 additions & 0 deletions fastchat/llm_judge/common.py
@@ -418,6 +418,35 @@ def chat_compeletion_openai(model, conv, temperature, max_tokens):
    return output


def chat_compeletion_openai_azure(model, conv, temperature, max_tokens):
    openai.api_type = "azure"
    openai.api_base = os.environ["AZURE_OPENAI_ENDPOINT"]
    openai.api_key = os.environ["AZURE_OPENAI_KEY"]
    openai.api_version = "2023-05-15"

    if "azure-" in model:
        # Strip the "azure-" prefix; the remainder is the Azure deployment name.
        model = model[6:]

    output = API_ERROR_OUTPUT
    for _ in range(API_MAX_RETRY):
        try:
            messages = conv.to_openai_api_messages()
            response = openai.ChatCompletion.create(
                engine=model,  # Azure uses "engine" (the deployment) instead of "model"
                messages=messages,
                n=1,
                temperature=temperature,
                max_tokens=max_tokens,
            )
            output = response["choices"][0]["message"]["content"]
            break
        except openai.error.OpenAIError as e:
            print(type(e), e)
            time.sleep(API_RETRY_SLEEP)

    return output

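# Usage sketch (assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY are set, and
# that "azure-gpt-4" corresponds to an Azure deployment literally named "gpt-4"):
#   judgment = chat_compeletion_openai_azure("azure-gpt-4", conv, 0.2, 2048)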

def chat_compeletion_anthropic(model, conv, temperature, max_tokens):
    output = API_ERROR_OUTPUT
    for _ in range(API_MAX_RETRY):
        ...