chatglm: Add test and inference example
mengker33 committed Dec 2, 2024
1 parent e711dbf commit 1904b2b
Showing 2 changed files with 21 additions and 0 deletions.
20 changes: 20 additions & 0 deletions examples/text-generation/README.md
@@ -89,6 +89,26 @@ python run_generation.py \
--prompt "Hello world" "How are you?"
```

Here is an example for THUDM/glm-4-9b-chat:
```
python3 run_generation.py \
--model_name_or_path THUDM/glm-4-9b-chat \
--use_hpu_graphs \
--use_kv_cache \
--do_sample \
--bf16 \
--trim_logits \
--batch_size 1 \
--max_input_tokens 1024 \
--max_new_tokens 512 \
--reuse_cache \
--use_flash_attention
```
Note that for ChatGLM2/ChatGLM3, the `GLM` environment variable must be set so that the corresponding tokenizer is loaded:
```
export GLM=2  # for chatglm2; use GLM=3 for chatglm3
```
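For example, a ChatGLM2 run might look as follows (a minimal sketch; the `THUDM/chatglm2-6b` checkpoint and the exact flag set are illustrative assumptions, so reuse whichever options from the example above you need):
```
GLM=2 python3 run_generation.py \
--model_name_or_path THUDM/chatglm2-6b \
--use_hpu_graphs \
--use_kv_cache \
--bf16 \
--batch_size 1 \
--max_new_tokens 512
```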

> The batch size should be larger than or equal to the number of prompts. Otherwise, only the first N prompts are kept, where N is the batch size.
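For instance, passing two prompts requires a batch size of at least 2 (a sketch reusing flags from the example above):
```
python3 run_generation.py \
--model_name_or_path THUDM/glm-4-9b-chat \
--use_hpu_graphs \
--use_kv_cache \
--bf16 \
--batch_size 2 \
--prompt "Hello world" "How are you?"
```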
### Run Speculative Sampling on Gaudi
1 change: 1 addition & 0 deletions tests/test_text_generation_example.py
@@ -60,6 +60,7 @@
("openbmb/MiniCPM3-4B", 1, False, 65.116, False),
("baichuan-inc/Baichuan2-7B-Chat", 1, True, 108, False),
("baichuan-inc/Baichuan2-13B-Chat", 1, False, 66, False),
("THUDM/glm-4-9b-chat", 1, True, 105, False),
],
"fp8": [
("tiiuae/falcon-180B", 4, 950, True, 128, 128, 2506.68),
