chatglm: Add test and inference example
mengker33 committed Nov 13, 2024
1 parent ab7a247 commit 1b9e9bf
Showing 2 changed files with 21 additions and 0 deletions.
20 changes: 20 additions & 0 deletions examples/text-generation/README.md
@@ -89,6 +89,26 @@ python run_generation.py \
--prompt "Hello world" "How are you?"
```

Here is an example for THUDM/glm-4-9b-chat:
```
python3 run_generation.py \
--model_name_or_path THUDM/glm-4-9b-chat \
--use_hpu_graphs \
--use_kv_cache \
--do_sample \
--bf16 \
--trim_logits \
--batch_size 1 \
--max_input_tokens 1024 \
--max_new_tokens 512 \
--reuse_cache \
--use_flash_attention
```
Note that for chatglm2 and chatglm3, the `GLM` environment variable must be set so that the corresponding tokenizer is loaded:
```
GLM=2  # or GLM=3
```
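For instance, the variable can be prefixed to the same command so it applies only to that run. This is a sketch: `THUDM/chatglm3-6b` is an assumed checkpoint name, and the flags are copied from the glm-4 example above; adjust both to your setup.

```shell
# GLM=3 tells run_generation.py to load the chatglm3 tokenizer (see note above).
# The model name below is an assumption; substitute your actual chatglm3 checkpoint.
GLM=3 python3 run_generation.py \
  --model_name_or_path THUDM/chatglm3-6b \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --batch_size 1 \
  --max_input_tokens 1024 \
  --max_new_tokens 512
```

Prefixing `GLM=3` scopes the variable to this single command, so it does not leak into the rest of the shell session.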

> The batch size should be greater than or equal to the number of prompts. Otherwise, only the first N prompts are kept, where N is the batch size.
### Run Speculative Sampling on Gaudi
1 change: 1 addition & 0 deletions tests/test_text_generation_example.py
@@ -47,6 +47,7 @@
("EleutherAI/gpt-neo-2.7B", 1, False, 257.2476416844122),
("facebook/xglm-1.7B", 1, False, 357.46365062825083),
("CohereForAI/c4ai-command-r-v01", 1, False, 29.50315234651154),
("THUDM/glm-4-9b-chat", 1, True, 105),
],
"fp8": [
("tiiuae/falcon-180B", 4, 950, True, 128, 128, 2506.68),
