chatglm: Add test and inference example
mengker33 committed Nov 13, 2024
1 parent ab7a247 commit 1b9e9bf
Showing 2 changed files with 21 additions and 0 deletions.
20 changes: 20 additions & 0 deletions examples/text-generation/README.md
@@ -89,6 +89,26 @@ python run_generation.py \
--prompt "Hello world" "How are you?"
```

Here is an example for THUDM/glm-4-9b-chat:
```
python3 run_generation.py \
--model_name_or_path THUDM/glm-4-9b-chat \
--use_hpu_graphs \
--use_kv_cache \
--do_sample \
--bf16 \
--trim_logits \
--batch_size 1 \
--max_input_tokens 1024 \
--max_new_tokens 512 \
--reuse_cache \
--use_flash_attention
```
Note that for chatglm2 and chatglm3, the `GLM` environment variable must be set so that the corresponding tokenizer is loaded:
```
GLM=2  # or GLM=3
```
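For instance, the variable can be prefixed to the same command so it applies only to that run. This is a sketch: `THUDM/chatglm3-6b` is an assumed checkpoint name, and the flags are copied from the glm-4 example above; adjust both to your setup.

```shell
# GLM=3 tells run_generation.py to load the chatglm3 tokenizer (see note above).
# The model name below is an assumption; substitute your actual chatglm3 checkpoint.
GLM=3 python3 run_generation.py \
  --model_name_or_path THUDM/chatglm3-6b \
  --use_hpu_graphs \
  --use_kv_cache \
  --bf16 \
  --batch_size 1 \
  --max_input_tokens 1024 \
  --max_new_tokens 512
```

Prefixing `GLM=3` scopes the variable to this single command, so it does not leak into the rest of the shell session.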

> The batch size should be greater than or equal to the number of prompts. Otherwise, only the first N prompts are kept, where N is the batch size.
### Run Speculative Sampling on Gaudi
1 change: 1 addition & 0 deletions tests/test_text_generation_example.py
@@ -47,6 +47,7 @@
("EleutherAI/gpt-neo-2.7B", 1, False, 257.2476416844122),
("facebook/xglm-1.7B", 1, False, 357.46365062825083),
("CohereForAI/c4ai-command-r-v01", 1, False, 29.50315234651154),
("THUDM/glm-4-9b-chat", 1, True, 105),
],
"fp8": [
("tiiuae/falcon-180B", 4, 950, True, 128, 128, 2506.68),
