Merge 0223 #8

Merged 246 commits on Feb 24, 2024

Commits
2264580
Remove hardcode flash-attn disable setting (#2342)
Trangle Sep 1, 2023
24a8755
Document turning off proxy_buffering when api is streaming (#2337)
nathanstitt Sep 1, 2023
b039a66
Simplify huggingface api example (#2355)
merrymercy Sep 4, 2023
ea045e6
Update sponsor logos (#2367)
merrymercy Sep 5, 2023
85bec47
if LOGDIR is empty, then don't try output log to local file (#2357)
leiwen83 Sep 5, 2023
f99663c
add best_of and use_beam_search for completions interface (#2348)
leiwen83 Sep 6, 2023
3cf04c2
Extract upvote/downvote from log files (#2369)
merrymercy Sep 6, 2023
94f4dd6
Revert "add best_of and use_beam_search for completions interface" (#…
merrymercy Sep 6, 2023
dc3dd12
Improve doc (#2371)
merrymercy Sep 6, 2023
a5e6abf
add best_of and use_beam_search for completions interface (#2372)
leiwen83 Sep 7, 2023
1d703b2
update monkey patch for llama2 (#2379)
merrymercy Sep 7, 2023
56744d1
Make E5 adapter more restrict to reduce mismatch (#2381)
merrymercy Sep 7, 2023
6af0a7c
Update UI and sponsers (#2387)
merrymercy Sep 8, 2023
9b3147e
Use fsdp api for save save (#2390)
merrymercy Sep 10, 2023
a6167db
Release v0.2.27
merrymercy Sep 10, 2023
7dcdafe
Spicyboros + airoboros 2.2 template update. (#2392)
jondurbin Sep 11, 2023
b921f16
bugfix of openai_api_server for fastchat.serve.vllm_worker (#2398)
Rayrtfr Sep 11, 2023
13f40b3
Revert "bugfix of openai_api_server for fastchat.serve.vllm_worker" (…
merrymercy Sep 11, 2023
77aa4df
Revert "add best_of and use_beam_search for completions interface" (#…
merrymercy Sep 11, 2023
11b05bb
Release a v0.2.28 with bug fixes and more test cases
merrymercy Sep 11, 2023
a8088ba
Fix model_worker error (#2404)
wangxiyuan Sep 12, 2023
b49d789
Added google/flan models and fixed AutoModelForSeq2SeqLM when loading…
wangzhen263 Sep 12, 2023
7dfcf1a
Rename twitter to X (#2406)
karshPrime Sep 12, 2023
aa153d5
Update huggingface_api.py (#2409)
merrymercy Sep 12, 2023
3149253
Add support for baichuan2 models (#2408)
obitoquilt Sep 13, 2023
2e0e60b
Fixed character overlap issue when api streaming output (#2431)
Somezak1 Sep 18, 2023
c7e3e67
Support custom conversation template in multi_model_worker (#2434)
hi-jin Sep 18, 2023
c685951
Add Ascend NPU support (#2422)
zhangsibo1129 Sep 18, 2023
54a8353
Add raw conversation template (#2417) (#2418)
tobiabir Sep 18, 2023
1119c51
Improve docs & UI (#2436)
merrymercy Sep 18, 2023
658736f
Fix Salesforce xgen inference (#2350)
jaywonchung Sep 18, 2023
d26d9e7
Add support for Phind-CodeLlama models (#2415) (#2416)
tobiabir Sep 18, 2023
0a5f503
Add falcon 180B chat conversation template (#2384)
Btlmd Sep 18, 2023
318d070
Improve docs (#2438)
merrymercy Sep 18, 2023
9cf3c8b
add dtype and seed (#2430)
Ying1123 Sep 18, 2023
24acac1
Data cleaning scripts for dataset release (#2440)
merrymercy Sep 18, 2023
30a6ffc
merge google/flan based adapters: T5Adapter, CodeT5pAdapter, FlanAdap…
wangzhen263 Sep 18, 2023
16be5cf
Fix docs
merrymercy Sep 18, 2023
e4758da
Update UI (#2446)
merrymercy Sep 18, 2023
68f1fac
Add Optional SSL Support to controller.py (#2448)
brandonbiggs Sep 19, 2023
db8e271
Format & Improve docs
merrymercy Sep 19, 2023
c4c195c
Release v0.2.29 (#2450)
merrymercy Sep 20, 2023
a040cdc
Show terms of use as an JS alert (#2461)
merrymercy Sep 22, 2023
bcb8076
vllm worker awq quantization update (#2463)
dongxiaolong Sep 22, 2023
2855bf9
Fix falcon chat template (#2464)
merrymercy Sep 22, 2023
f8f302f
Fix chunk handling when partial chunks are returned (#2485)
siddartha-RE Sep 29, 2023
15a094e
Update openai_api_server.py to add an SSL option (#2484)
brandonbiggs Sep 29, 2023
7aace7d
Update vllm_worker.py (#2482)
shuishu Sep 29, 2023
faca3a3
fix typo quantization (#2469)
asaiacai Sep 29, 2023
8e8a604
fix vllm quanziation args
merrymercy Sep 29, 2023
77b3df1
Update README.md (#2492)
merrymercy Sep 29, 2023
f5c90f6
Huggingface api worker (#2456)
hnyls2002 Sep 29, 2023
f70de6b
Update links to lmsys-chat-1m (#2497)
merrymercy Sep 30, 2023
c478bbf
Update train code to support the new tokenizer (#2498)
Ying1123 Sep 30, 2023
bc22411
Third Party UI Example (#2499)
enochlev Sep 30, 2023
6b4fc64
Add metharme (pygmalion) conversation template (#2500)
AlpinDale Oct 1, 2023
46e5207
Optimize for proper flash attn causal handling (#2503)
siddartha-RE Oct 2, 2023
f5eee7d
Add Mistral AI instruction template (#2483)
lerela Oct 2, 2023
759dfbe
Update monitor & plots (#2506)
merrymercy Oct 2, 2023
f9fcc9d
Release v0.2.30 (#2507)
merrymercy Oct 2, 2023
e64ee0e
Fix for single turn dataset (#2509)
toslunar Oct 3, 2023
c3ad73a
replace os.getenv with os.path.expanduser because the first one doesn…
khalil-Hennara Oct 4, 2023
5573aae
Fix arena (#2522)
merrymercy Oct 6, 2023
dad34ea
Update Dockerfile (#2524)
Oct 9, 2023
9d27d68
add Llama2ChangAdapter (#2510)
lcw99 Oct 9, 2023
466da28
Add ExllamaV2 Inference Framework Support. (#2455)
leonxia1018 Oct 9, 2023
5dbc4f3
Improve docs (#2534)
merrymercy Oct 9, 2023
e448a0f
Fix warnings for new gradio versions (#2538)
merrymercy Oct 10, 2023
125f374
revert the gradio change; now works for 3.40
merrymercy Oct 10, 2023
0c37d98
Improve chat templates (#2539)
merrymercy Oct 10, 2023
cd7d048
Add Zephyr 7B Alpha (#2535)
lewtun Oct 11, 2023
f5a4911
Improve Support for Mistral-Instruct (#2547)
Steve-Tech Oct 12, 2023
f683fd1
correct max_tokens by context_length instead of raise exception (#2544)
liunux4odoo Oct 12, 2023
7b0ca39
Revert "Improve Support for Mistral-Instruct" (#2552)
merrymercy Oct 12, 2023
9f7afed
Fix Mistral template (#2529)
normster Oct 12, 2023
f19d449
Add additional Informations from the vllm worker (#2550)
SebastianBodza Oct 12, 2023
631d62f
Make FastChat work with LMSYS-Chat-1M Code (#2551)
CodingWithTim Oct 12, 2023
7ebc29c
Create `tags` attribute to fix `MarkupError` in rich CLI (#2553)
Steve-Tech Oct 13, 2023
8531cf6
move BaseModelWorker outside serve.model_worker to make it independen…
liunux4odoo Oct 13, 2023
ff3cb92
Misc style and bug fixes (#2559)
merrymercy Oct 13, 2023
e1a1f50
Fix README.md (#2561)
infwinston Oct 14, 2023
9db2143
release v0.2.31 (#2563)
merrymercy Oct 14, 2023
cb71875
resolves #2542 modify dockerfile to upgrade cuda to 12.2.0 and pydant…
alexdelapaz Oct 15, 2023
ee0d4d2
Add airoboros_v3 chat template (llama-2 format) (#2564)
jondurbin Oct 15, 2023
06092dd
Add Xwin-LM V0.1, V0.2 support (#2566)
REIGN12 Oct 15, 2023
ff66426
Fixed model_worker generate_gate may blocked main thread (#2540) (#2…
lvxuan263 Oct 16, 2023
7fbf5b1
feat: add claude-v2 (#2571)
congchan Oct 17, 2023
29de51f
Update vigogne template (#2580)
bofenghuang Oct 18, 2023
f79151b
Fix issue #2568: --device mps led to TypeError: forward() got an unex…
Phil-U-U Oct 18, 2023
f06b202
Add Mistral-7B-OpenOrca conversation_temmplate (#2585)
waynespa Oct 20, 2023
8e90d5c
docs: bit misspell comments model adapter default template name conve…
guspan-tanadi Oct 21, 2023
6a149bb
Update Mistral template (#2581)
Gk-rohan Oct 21, 2023
f752996
Fix <s> in mistral template
merrymercy Oct 21, 2023
d61d43e
Update README.md (vicuna-v1.3 -> vicuna-1.5) (#2592)
infwinston Oct 21, 2023
582f48b
Update README.md to highlight chatbot arena (#2596)
infwinston Oct 24, 2023
220257a
Add Lemur model (#2584)
ugolotti Oct 24, 2023
ab169f6
add trust_remote_code=True in BaseModelAdapter (#2583)
edisonwd Oct 24, 2023
cbf2853
Openai interface add use beam search and best of 2 (#2442)
leiwen83 Oct 24, 2023
09e4357
Update qwen and add pygmalion (#2607)
Trangle Oct 28, 2023
7a31d3b
feat: Support model AquilaChat2 (#2616)
fangyinc Nov 1, 2023
d5e4b27
Added settings vllm (#2599)
SebastianBodza Nov 1, 2023
af4dfe3
[Logprobs] Support logprobs=1 (#2612)
comaniac Nov 1, 2023
dd84d16
release v0.2.32
merrymercy Nov 1, 2023
40b235d
fix: Fix for OpenOrcaAdapter to return correct conversation template …
vjsrinath Nov 2, 2023
3d9430a
Make fastchat.serve.model_worker to take debug argument (#2628)
uinone Nov 2, 2023
fdefb5f
openchat 3.5 model support (#2638)
imoneoi Nov 3, 2023
d5a078b
xFastTransformer framework support (#2615)
a3213105 Nov 3, 2023
e8a839a
feat: support custom models vllm serving (#2635)
congchan Nov 5, 2023
86f044b
kill only fastchat process (#2641)
scenaristeur Nov 6, 2023
5d453e4
Update server_arch.png
merrymercy Nov 6, 2023
77932a1
Use conv.update_last_message api in mt-bench answer generation (#2647)
merrymercy Nov 7, 2023
32c41de
Improve Azure OpenAI interface (#2651)
infwinston Nov 7, 2023
f2810e5
Add required_temp support in jsonl format to support flexible tempera…
CodingWithTim Nov 8, 2023
ab01027
Pin openai version < 1 (#2658)
infwinston Nov 8, 2023
18f5692
Remove exclude_unset parameter (#2654)
snapshotpl Nov 9, 2023
2ab0026
Revert "Remove exclude_unset parameter" (#2666)
merrymercy Nov 9, 2023
09033af
added support for CodeGeex(2) (#2645)
peterwilli Nov 9, 2023
e46d97a
add chatglm3 conv template support in conversation.py (#2622)
ZeyuTeng96 Nov 10, 2023
e0b351a
UI and model change (#2672)
infwinston Nov 12, 2023
1901125
train_flant5: fix typo (#2673)
Force1ess Nov 12, 2023
a19866b
Fix gpt template (#2674)
infwinston Nov 12, 2023
a333a55
Update README.md (#2679)
merrymercy Nov 13, 2023
aeec0e0
feat: support template's stop_str as list (#2678)
congchan Nov 13, 2023
9cfeb15
Update exllama_v2.md (#2680)
jm23jeffmorgan Nov 15, 2023
a1324de
save model under deepspeed (#2689)
MrZhengXin Nov 18, 2023
fdf7b2c
Adding SSL support for model workers and huggingface worker (#2687)
lnguyen Nov 18, 2023
e53c73f
Check the max_new_tokens <= 0 in openai api server (#2688)
zeyugao Nov 19, 2023
8bd422b
Add Microsoft/Orca-2-7b and update model support docs (#2714)
BabyChouSr Nov 22, 2023
849a815
fix tokenizer of chatglm2 (#2711)
wangshuai09 Nov 22, 2023
af8d877
Template for using Deepseek code models (#2705)
AmaleshV Nov 22, 2023
85c797e
add support for Chinese-LLaMA-Alpaca (#2700)
zollty Nov 22, 2023
99d19ac
Make --load-8bit flag work with weights in safetensors format (#2698)
xuguodong1999 Nov 22, 2023
0bbeddc
Format code and minor bug fix (#2716)
merrymercy Nov 22, 2023
0a5ad3e
Bump version to v0.2.33 (#2717)
merrymercy Nov 22, 2023
3389cc3
fix tokenizer.pad_token attribute error (#2710)
wangshuai09 Nov 22, 2023
ff25295
support stable-vicuna model (#2696)
hi-jin Nov 23, 2023
6ac7d76
Exllama cache 8bit (#2719)
mjkaye Nov 23, 2023
1f21efb
Add Yi support (#2723)
infwinston Nov 23, 2023
a754c48
Add Hermes 2.5 [fixed] (#2725)
152334H Nov 23, 2023
c199c8d
Fix Hermes2Adapter (#2727)
lewtun Nov 26, 2023
cfba5f4
Fix YiAdapter (#2730)
Jingsong-Yan Nov 26, 2023
96aed4c
add trust_remote_code argument (#2715)
wangshuai09 Nov 26, 2023
3352306
Add revision arg to MT Bench answer generation (#2728)
lewtun Nov 26, 2023
76fbdef
Fix MPS backend 'index out of range' error (#2737)
suquark Nov 26, 2023
686ab04
add starling support (#2738)
infwinston Nov 27, 2023
decceed
Add deepseek chat (#2760)
BabyChouSr Dec 1, 2023
c842764
a convenient script for spinning up the API with Model Workers (#2790)
ckgresla Dec 9, 2023
173f4de
Prevent returning partial stop string in vllm worker (#2780)
pandada8 Dec 9, 2023
d82f7ec
Update UI and new models (#2762)
infwinston Dec 9, 2023
6ffc2ce
Support MetaMath (#2748)
iojw Dec 9, 2023
3abb7cb
Use common logging code in the OpenAI API server (#2758)
geekoftheweek Dec 9, 2023
ea90cf0
Show how to turn on experiment tracking for fine-tuning (#2742)
morganmcg1 Dec 9, 2023
2988943
Support xDAN-L1-Chat Model (#2732)
xiechengmude Dec 9, 2023
65741f0
Format code
merrymercy Dec 9, 2023
a7eb750
Update the version to 0.2.34 (#2793)
merrymercy Dec 9, 2023
5aff351
add dolphin (#2794)
infwinston Dec 9, 2023
ec9a07e
Fix tiny typo (#2805)
bofenghuang Dec 11, 2023
2829fd7
Add instructions for evaluating on MT bench using vLLM (#2770)
iojw Dec 16, 2023
048d813
Update README.md
infwinston Dec 16, 2023
979314b
Add SOLAR-10.7b Instruct Model (#2826)
BabyChouSr Dec 17, 2023
6ccefcf
Update README.md (#2852)
eltociear Dec 24, 2023
74cd881
fix: 'compeletion' typo (#2847)
congchan Dec 24, 2023
c946d0d
Add Tunnelmole as an open source alternative to ngrok and include usa…
robbie-cahill Dec 24, 2023
c7eaa4d
update readme
merrymercy Dec 24, 2023
34e6212
update mt-bench readme
merrymercy Dec 24, 2023
05cc60c
Add support for CatPPT (#2840)
rishiraj Dec 24, 2023
1ffdaee
Add functionality to ping AI2 InferD endpoints for tulu 2 (#2832)
natolambert Dec 24, 2023
c77b0a2
add download models from www.modelscope.cn (#2830)
liuyhwangyh Dec 24, 2023
c405ed6
Fix conv_template of chinese alpaca 2 (#2812)
zollty Dec 24, 2023
c214688
add bagel model adapter (#2814)
jondurbin Dec 24, 2023
43532db
add root_path argument to gradio web server. (#2807)
stephanbertl Dec 24, 2023
05755c2
Import `accelerate` locally to avoid it as a strong dependency (#2820)
chiragjn Dec 24, 2023
0cf2886
Replace dict merge with unpacking for compatibility of 3.8 in vLLM wo…
rudeigerc Dec 24, 2023
c70bb3d
Format code (#2854)
merrymercy Dec 24, 2023
82ef3a3
Openai API migrate (#2765)
andy-yang-1 Dec 24, 2023
093574d
fix openai api server docs
merrymercy Dec 24, 2023
e39ec88
Add a16z as a sponser
Ying1123 Dec 24, 2023
a28563b
Add new models (Perplexity, gemini) & Separate GPT versions (#2856)
merrymercy Dec 24, 2023
bab105a
Clean error messages (#2857)
merrymercy Dec 24, 2023
e67b21d
Update docs (#2858)
Ying1123 Dec 24, 2023
5f7211d
Modify doc description (#2859)
zhangsibo1129 Dec 28, 2023
1f5c5f3
Fix the problem of not using the decoding method corresponding to the…
Jingsong-Yan Dec 28, 2023
3623d9b
update a new sota model on MT-Bench which touch an 8.8 scores. (#2864)
xiechengmude Dec 28, 2023
1368f3f
NPU needs to be initialized when starting a new process (#2843)
jq460494839 Dec 28, 2023
719022f
Fix the problem with "vllm + chatglm3" (#2845) (#2876)
Dec 30, 2023
4735aa7
Update token spacing for mistral conversation.py (#2872)
thavens Dec 30, 2023
722ab02
check if hm in models before deleting to avoid errors (#2870)
joshua-ne Dec 30, 2023
01a007a
Add TinyLlama (#2889)
Gk-rohan Jan 7, 2024
2ed5c5e
Fix bug that model doesn't automatically switch peft adapter (#2884)
Jingsong-Yan Jan 7, 2024
0fa13a6
Update web server commands (#2869)
merrymercy Jan 7, 2024
cd59fd5
fix the tokenize process and prompt template of chatglm3 (#2883)
WHDY Jan 7, 2024
f591012
Add `Notus` support (#2813)
gabrielmbmb Jan 7, 2024
2dbb9f1
feat: support anthropic api with api_dict (#2879)
congchan Jan 8, 2024
06acba1
Update model_adapter.py (#2895)
thavens Jan 8, 2024
6ff8505
leaderboard code update (#2867)
infwinston Jan 10, 2024
06522c7
fix: change order of SEQUENCE_LENGTH_KEYS (#2925)
congchan Jan 17, 2024
13d673c
fix baichuan:apply_prompt_template call args error (#2921)
Force1ess Jan 17, 2024
36e962a
Fix a typo in openai_api_server.py (#2905)
jklj077 Jan 17, 2024
b896373
feat: use variables OPENAI_MODEL_LIST (#2907)
congchan Jan 17, 2024
4a96df4
Add TenyxChat-7B-v1 model (#2901)
sarath-shekkizhar Jan 17, 2024
9cf1d6f
add support for iei yuan2.0 (https://huggingface.co/IEITYuan) (#2919)
wangpengfei1013 Jan 17, 2024
d6ca36a
nous-hermes-2-mixtral-dpo (#2922)
152334H Jan 17, 2024
bb8aae5
Bump the version to 0.2.35 (#2927)
merrymercy Jan 17, 2024
e86e70d
fix specify local path issue use model from www.modelscope.cn (#2934)
liuyhwangyh Jan 18, 2024
8163cb2
support openai embedding for topic clustering (#2729)
CodingWithTim Jan 20, 2024
99f60bb
Remove duplicate API endpoint (#2949)
surak Jan 24, 2024
3eea41e
Update Hermes Mixtral (#2938)
teknium1 Jan 24, 2024
7ef41f3
Enablement of REST API Usage within Google Colab Free Tier (#2940)
ggcr Jan 24, 2024
7c1b7dd
Create a new worker implementation for Apple MLX (#2937)
aliasaria Jan 24, 2024
eca55f5
feat: support Model Yuan2.0, a new generation Fundamental Large Langu…
cauwulixuan Jan 24, 2024
9ba90ad
Fix the pooling method of BGE embedding model (#2926)
staoxiao Jan 24, 2024
7003bbd
format code
merrymercy Jan 24, 2024
df81798
SGLang Worker (#2928)
BabyChouSr Jan 24, 2024
5f46ff4
Fix sglang worker (#2953)
merrymercy Jan 24, 2024
c6d7acd
Update mlx_worker to be async (#2958)
aliasaria Jan 25, 2024
c19a953
Integrate LightLLM into serve worker (#2888)
zeyugao Jan 25, 2024
b837ddc
Copy button (#2963)
surak Jan 25, 2024
6d4ca23
feat: train with template (#2951)
congchan Jan 26, 2024
eeabf52
fix content maybe a str (#2968)
zhouzaida Jan 26, 2024
dc26514
Adding download folder information in README (#2972)
dheeraj-326 Jan 29, 2024
a3aa5cd
use cl100k_base as the default tiktoken encoding (#2974)
bjwswang Jan 29, 2024
81b80fd
Update README.md (#2975)
merrymercy Jan 29, 2024
29c2bd3
Fix tokenizer for vllm worker (#2984)
Michaelvll Jan 31, 2024
707d9ba
update yuan2.0 generation (#2989)
wangpengfei1013 Feb 1, 2024
9924687
fix: tokenization mismatch when training with different templates (#2…
congchan Feb 1, 2024
3bef934
fix: inconsistent tokenization by llama tokenizer (#3006)
congchan Feb 3, 2024
81785d7
Fix type hint for play_a_match_single (#3008)
MonkeyLeeT Feb 4, 2024
2264204
code update (#2997)
infwinston Feb 5, 2024
6a530e1
Update model_support.md (#3016)
infwinston Feb 5, 2024
1db84d0
Update lightllm_integration.md (#3014)
eltociear Feb 6, 2024
3f61c6e
Upgrade gradio to 4.17 (#3027)
infwinston Feb 9, 2024
ddb2cc9
Update MLX integration to use new generate_step function signature (#…
aliasaria Feb 9, 2024
8c6443e
Update readme (#3028)
merrymercy Feb 9, 2024
7c12409
Update gradio version in `pyproject.toml` and fix a bug (#3029)
merrymercy Feb 9, 2024
b9d4d15
Update gradio demo and API model providers (#3030)
merrymercy Feb 10, 2024
98b8f64
Gradio Web Server for Multimodal Models (#2960)
BabyChouSr Feb 10, 2024
e087d6e
Migrate the gradio server to openai v1 (#3032)
merrymercy Feb 10, 2024
b21d0f7
Update version to 0.2.36 (#3033)
merrymercy Feb 11, 2024
ac2a899
Add llava 34b template (#3034)
merrymercy Feb 11, 2024
26505e2
Update model support (#3040)
merrymercy Feb 13, 2024
324bcc6
Add psutil to pyproject.toml dependencies (#3039)
ShukantPal Feb 13, 2024
81225fc
Fix SGLang worker (#3045)
merrymercy Feb 14, 2024
4e734fe
Random VQA Sample button for VLM direct chat (#3041)
lisadunlap Feb 14, 2024
ed6735d
Update arena.md to fix link (#3051)
logankilpatrick Feb 15, 2024
68c542c
Merge commit 'ed6735d' into merge_0223
renning22 Feb 24, 2024
ec856f3
multi inference
renning22 Feb 24, 2024
4 changes: 4 additions & 0 deletions README.md
@@ -16,6 +16,10 @@ We are focused to support Llama2 at scale now. If you want any other models, ple

## Dev Log

### 2024-02

Sync upstream changes

### 2023-09

Sync upstream changes
13 changes: 7 additions & 6 deletions docs/arena.md
@@ -5,10 +5,11 @@ We invite the entire community to join this benchmarking effort by contributing
## How to add a new model
If you want to see a specific model in the arena, you can follow the methods below.

- Method 1: Hosted by LMSYS.
1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md#how-to-support-a-new-model).
2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.
### Method 1: Hosted by 3rd party API providers or yourself
If you have a model hosted by a 3rd party API provider or yourself, please give us access to an API endpoint.
- We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py) for calling OpenAI models.
- If you have your own API protocol, please follow the [instructions](model_support.md) to add them. Contribute your code by sending a pull request.

- Method 2: Hosted by 3rd party API providers or yourself.
1. If you have a model hosted by a 3rd party API provider or yourself, please give us an API endpoint. We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/33dca5cf12ee602455bfa9b5f4790a07829a2db7/fastchat/serve/gradio_web_server.py#L333-L358) for calling OpenAI models.
2. You can use FastChat's OpenAI API [server](openai_api.md) to serve your model with OpenAI-compatible APIs and provide us with the endpoint.
### Method 2: Hosted by LMSYS
1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md).
2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.
5 changes: 4 additions & 1 deletion docs/commands/webserver.md
@@ -24,10 +24,13 @@ python3 -m fastchat.serve.test_message --model vicuna-13b --controller http://lo

cd fastchat_logs/server0

python3 -m fastchat.serve.huggingface_api_worker --model-info-file ~/elo_results/register_hf_api_models.json

export OPENAI_API_KEY=
export ANTHROPIC_API_KEY=
export GCP_PROJECT_ID=

python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 10 --add-chatgpt --add-claude --add-palm --anony-only --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms
python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 50 --add-chatgpt --add-claude --add-palm --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms

python3 backup_logs.py
```
18 changes: 18 additions & 0 deletions docs/lightllm_integration.md
@@ -0,0 +1,18 @@
# LightLLM Integration
You can use [LightLLM](https://github.com/ModelTC/lightllm) as an optimized worker implementation in FastChat.
It offers advanced continuous batching and a much higher (~10x) throughput.
See the supported models [here](https://github.com/ModelTC/lightllm?tab=readme-ov-file#supported-model-list).

## Instructions
1. Please refer to the [Get started](https://github.com/ModelTC/lightllm?tab=readme-ov-file#get-started) guide to install LightLLM, or use a [pre-built image](https://github.com/ModelTC/lightllm?tab=readme-ov-file#container).

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the LightLLM worker (`fastchat.serve.lightllm_worker`). All other commands such as controller, gradio web server, and OpenAI API server are kept the same. Refer to [--max_total_token_num](https://github.com/ModelTC/lightllm/blob/4a9824b6b248f4561584b8a48ae126a0c8f5b000/docs/ApiServerArgs.md?plain=1#L23) to understand how to calculate the `--max_total_token_num` argument.
```
python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000
```

If you want to use quantized weights and kv cache for inference, try

```
python3 -m fastchat.serve.lightllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer_mode "auto" --max_total_token_num 154000 --mode triton_int8weight triton_int8kv
```
23 changes: 23 additions & 0 deletions docs/mlx_integration.md
@@ -0,0 +1,23 @@
# Apple MLX Integration

You can use [Apple MLX](https://github.com/ml-explore/mlx) as an optimized worker implementation in FastChat.

It runs models efficiently on Apple Silicon.

See the supported models [here](https://github.com/ml-explore/mlx-examples/tree/main/llms#supported-models).

Note that for Apple Silicon Macs with less memory, smaller models (or quantized models) are recommended.

## Instructions

1. Install MLX.

```
pip install "mlx-lm>=0.0.6"
```

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the MLX worker (`fastchat.serve.mlx_worker`). Remember to launch a model worker after you have launched the controller ([instructions](../README.md)).

```
python3 -m fastchat.serve.mlx_worker --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0
```
93 changes: 67 additions & 26 deletions docs/model_support.md
@@ -1,15 +1,48 @@
# Model Support
This document describes how to support a new model in FastChat.

## Supported models
## Content
- [Local Models](#local-models)
- [API-Based Models](#api-based-models)

## Local Models
To support a new local model in FastChat, you need to correctly handle its prompt template and model loading.
The goal is to make the following command run with the correct prompts.

```
python3 -m fastchat.serve.cli --model [YOUR_MODEL_PATH]
```

You can run this example command to learn the code logic.

```
python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5
```

You can add `--debug` to see the actual prompt sent to the model.

### Steps

FastChat uses the `Conversation` class to handle prompt templates and `BaseModelAdapter` class to handle model loading.

1. Implement a conversation template for the new model at [fastchat/conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py). You can follow existing examples and use `register_conv_template` to add a new one. Please also add a link to the official reference code if possible.
2. Implement a model adapter for the new model at [fastchat/model/model_adapter.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_adapter.py). You can follow existing examples and use `register_model_adapter` to add a new one.
3. (Optional) add the model name to the "Supported models" [section](#supported-models) above and add more information in [fastchat/model/model_registry.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_registry.py).

After these steps, the new model should be compatible with most FastChat features, such as CLI, web UI, model worker, and OpenAI-compatible API server. Please do some testing with these features as well.
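
For orientation, here is a minimal sketch of steps 1 and 2 for a hypothetical model named "mymodel". The template fields and the adapter's matching rule are illustrative assumptions; copy the patterns from the real examples in the two files:

```python
# Sketch only: "mymodel" is a hypothetical name; follow the real examples in
# fastchat/conversation.py and fastchat/model/model_adapter.py.
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)
from fastchat.model.model_adapter import BaseModelAdapter, register_model_adapter

# Step 1: register a conversation template (in fastchat/conversation.py).
register_conv_template(
    Conversation(
        name="mymodel",
        system_message="You are a helpful assistant.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
    )
)

# Step 2: register a model adapter (in fastchat/model/model_adapter.py).
class MyModelAdapter(BaseModelAdapter):
    """Selected when the model path contains 'mymodel'."""

    def match(self, model_path: str):
        return "mymodel" in model_path.lower()

    def get_default_conv_template(self, model_path: str):
        return get_conv_template("mymodel")

register_model_adapter(MyModelAdapter)
```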

### Supported models

- [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- example: `python3 -m fastchat.serve.cli --model-path meta-llama/Llama-2-7b-chat-hf`
- Vicuna, Alpaca, LLaMA, Koala
- example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`
- [allenai/tulu-2-dpo-7b](https://huggingface.co/allenai/tulu-2-dpo-7b)
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
- [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B)
- [BAAI/AquilaChat2-34B](https://huggingface.co/BAAI/AquilaChat2-34B)
- [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en#using-huggingface-transformers)
- [argilla/notus-7b-v1](https://huggingface.co/argilla/notus-7b-v1)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
- [BlinkDL/RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
- example: `python3 -m fastchat.serve.cli --model-path ~/model_weights/RWKV-4-Raven-7B-v11x-Eng99%-Other1%-20230429-ctx8192.pth`
@@ -18,13 +51,20 @@
- [camel-ai/CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
- [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)
- [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
- [deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)
- [deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
- [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)
- [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b)
- [FreedomIntelligence/ReaLM-7b-v1](https://huggingface.co/FreedomIntelligence/Realm-7b)
- [h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
- [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
- [IEITYuan/Yuan2-2B/51B/102B-hf](https://huggingface.co/IEITYuan)
- [lcw99/polyglot-ko-12.8b-chang-instruct-chat](https://huggingface.co/lcw99/polyglot-ko-12.8b-chang-instruct-chat)
- [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5)
- [meta-math/MetaMath-7B-V1.0](https://huggingface.co/meta-math/MetaMath-7B-V1.0)
- [Microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b)
- [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
- example: `python3 -m fastchat.serve.cli --model-path mosaicml/mpt-7b-chat`
- [Neutralzz/BiLLa-7B-SFT](https://huggingface.co/Neutralzz/BiLLa-7B-SFT)
@@ -34,56 +74,57 @@
- [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
- [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5)
- [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
- [OpenLemur/lemur-70b-chat-v1](https://huggingface.co/OpenLemur/lemur-70b-chat-v1)
- [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
- [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
- [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
- [rishiraj/CatPPT](https://huggingface.co/rishiraj/CatPPT)
- [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
- [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
- [tenyx/TenyxChat-7B-v1](https://huggingface.co/tenyx/TenyxChat-7B-v1)
- [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
- [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
- [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
- [tiiuae/falcon-180B-chat](https://huggingface.co/tiiuae/falcon-180B-chat)
- [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
- [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
- [VMware/open-llama-7b-v2-open-instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
- [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
- [Xwin-LM/Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1)
- Any [EleutherAI](https://huggingface.co/EleutherAI) pythia model such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)
- Any [Peft](https://github.com/huggingface/peft) adapter trained on top of a
model above. To activate, must have `peft` in the model path. Note: If
loading multiple peft models, you can have them share the base model weights by
setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model
worker.

## API-Based Models
To support an API-based model, consider learning from the existing OpenAI example.
If the model is compatible with OpenAI APIs, then a configuration file is all that's needed without any additional code.
For custom protocols, implementation of a streaming generator in [fastchat/serve/api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py) is required, following the provided examples. Currently, FastChat is compatible with OpenAI, Anthropic, Google Vertex AI, Mistral, and Nvidia NGC.
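
As a rough illustration of what such a streaming generator looks like, here is a sketch under stated assumptions: the function name, endpoint URL, and wire format are all hypothetical, and only the yielded `{"text": ..., "error_code": 0}` dict mirrors the shape used by the existing providers in api_provider.py:

```python
import json

import requests


def my_api_stream_iter(model_name, messages, temperature, max_new_tokens):
    """Hypothetical provider: streams completions from a custom endpoint."""
    resp = requests.post(
        "https://example.com/v1/chat",  # assumption: your provider's URL
        json={
            "model": model_name,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_new_tokens,
            "stream": True,
        },
        stream=True,
    )
    text = ""
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # assumption: one JSON object per line
        text += chunk.get("delta", "")
        # Yield the accumulated text in the dict shape the web server consumes.
        yield {"text": text, "error_code": 0}
```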

### Steps to Launch a WebUI with an API Model
1. Specify the endpoint information in a JSON configuration file. For instance, create a file named `api_endpoints.json`:
```json
{
"gpt-3.5-turbo": {
"model_name": "gpt-3.5-turbo",
"api_type": "openai",
"api_base": "https://api.openai.com/v1",
"api_key": "sk-******",
"anony_only": false
}
}
```
- "api_type" can be one of the following: openai, anthropic, gemini, or mistral. For custom APIs, add a new type and implement it accordingly.
- "anony_only" indicates whether to display this model in anonymous mode only.

2. Launch the Gradio web server with the argument `--register api_endpoints.json`:
```
python3 -m fastchat.serve.gradio_web_server --controller "" --share --register api_endpoints.json
```

Now, you can open a browser and interact with the model.
15 changes: 8 additions & 7 deletions docs/openai_api.md
@@ -8,6 +8,8 @@ The following OpenAI APIs are supported:
- Completions. (Reference: https://platform.openai.com/docs/api-reference/completions)
- Embeddings. (Reference: https://platform.openai.com/docs/api-reference/embeddings)

The REST API can also be operated from Google Colab, as demonstrated in the [FastChat_API_GoogleColab.ipynb](https://github.com/lm-sys/FastChat/blob/main/playground/FastChat_API_GoogleColab.ipynb) notebook in our repository, which provides a practical example of using the API within the Colab environment.

## RESTful API Server
First, launch the controller

@@ -32,29 +34,28 @@ Now, let us test the API server.
### OpenAI Official SDK
The goal of `openai_api_server.py` is to implement a fully OpenAI-compatible API server, so the models can be used directly with [openai-python](https://github.com/openai/openai-python) library.

First, install openai-python:
First, install OpenAI python package >= 1.0:
```bash
pip install --upgrade openai
```

Then, interact with model vicuna:
Then, interact with the Vicuna model:
```python
import openai
# to get proper authentication, make sure to use a valid key that's listed in
# the --api-keys flag. if no flag value is provided, the `api_key` will be ignored.

openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"
openai.base_url = "http://localhost:8000/v1/"

model = "vicuna-7b-v1.5"
prompt = "Once upon a time"

# create a completion
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
completion = openai.completions.create(model=model, prompt=prompt, max_tokens=64)
# print the completion
print(prompt + completion.choices[0].text)

# create a chat completion
completion = openai.ChatCompletion.create(
completion = openai.chat.completions.create(
model=model,
messages=[{"role": "user", "content": "Hello! What is your name?"}]
)
24 changes: 24 additions & 0 deletions docs/third_party_ui.md
@@ -0,0 +1,24 @@
# Third Party UI
If you want to use FastChat behind your own UI or a third-party UI, you can launch the [OpenAI compatible server](openai_api.md), expose it with a tunnelling service such as Tunnelmole or ngrok, and then enter the credentials appropriately in the UI.

You can find suitable UIs from third party repos:
- [WongSaang's ChatGPT UI](https://github.com/WongSaang/chatgpt-ui)
- [McKayWrigley's Chatbot UI](https://github.com/mckaywrigley/chatbot-ui)

- Please note that some third-party providers only offer the standard `gpt-3.5-turbo`, `gpt-4`, etc., so you will have to add your own custom model inside the code. [Here is an example of how to create a UI with any custom model name](https://github.com/ztjhz/BetterChatGPT/pull/461).

##### Using Tunnelmole
Tunnelmole is an open source tunnelling tool. You can find its source code on [Github](https://github.com/robbie-cahill/tunnelmole-client). Here's how you can use Tunnelmole:
1. Install Tunnelmole with `curl -O https://install.tunnelmole.com/9Wtxu/install && sudo bash install`. (On Windows, download [tmole.exe](https://tunnelmole.com/downloads/tmole.exe)). Head over to the [README](https://github.com/robbie-cahill/tunnelmole-client) for other methods such as `npm` or building from source.
2. Run `tmole 7860` (replace `7860` with your listening port if it is different from 7860). The output will display two URLs: one HTTP and one HTTPS. It's best to use the HTTPS URL for better privacy and security.
```
➜ ~ tmole 7860
http://bvdo5f-ip-49-183-170-144.tunnelmole.net is forwarding to localhost:7860
https://bvdo5f-ip-49-183-170-144.tunnelmole.net is forwarding to localhost:7860
```

##### Using ngrok
ngrok is a popular closed source tunnelling tool. First download and install it from [ngrok.com](https://ngrok.com/downloads). Here's how to use it to expose port 7860.
```
ngrok http 7860
```
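
With either tunnel running, point the third-party UI at the tunnel URL as its OpenAI-compatible endpoint. As a sketch (the variable names differ between UIs; these are assumptions modeled on a typical `.env` convention):

```
# hypothetical .env entries; check your chosen UI's docs for the exact names
OPENAI_API_HOST=https://bvdo5f-ip-49-183-170-144.tunnelmole.net
OPENAI_API_KEY=EMPTY
```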