A generation issue: when ignore_eos=False and the model's pad_token == eos_token (like Llama3), generated results in the same batch are erased #1539
What does this PR do?
When generating text with Optimum Habana, if BS > 1, ignore_eos=False, and the model's pad_token == eos_token (like Llama3.1-8B), the generated results of other prompts in the same batch are erased.
Here is an example:
I submit 2 prompts ("Hello world,", "How are you?") with BS=2; the shorter one is padded on the left:
After generation, the first pad_token is recognized as eos_token, and the response is erased.
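A minimal sketch of how the problem arises, using the Hugging Face `transformers` tokenizer API; the model name and prompts are illustrative only, not the exact reproduction script from this PR:

```python
from transformers import AutoTokenizer

# Llama3 ships no dedicated pad token, so pad_token is set to eos_token.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

batch = tokenizer(["Hello world,", "How are you?"],
                  return_tensors="pt", padding=True)

# If the prompts tokenize to different lengths, the shorter row starts with
# pad_token_id == eos_token_id. Any post-processing that truncates at the
# first eos_token_id will then erase that row's whole generated response.
print(batch["input_ids"])
```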
Fixes
I modified the post-processing to ignore the left pad_tokens and only erase the tokens that follow the real eos_token.
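A minimal sketch of this idea, not the exact optimum-habana implementation; the function name `trim_after_eos` and the arguments `generated_ids` / `input_length` are illustrative, assuming left-padded prompts followed by the generated tokens:

```python
import torch

def trim_after_eos(generated_ids: torch.Tensor, input_length: int,
                   eos_token_id: int, pad_token_id: int) -> torch.Tensor:
    """Erase tokens only after a real eos in the generated part,
    ignoring the left pad tokens of the prompt."""
    output = generated_ids.clone()
    for row in output:
        # Search for eos only in the newly generated tokens, so the left
        # padding of the prompt (pad_token == eos_token) is skipped.
        gen_part = row[input_length:]
        eos_positions = (gen_part == eos_token_id).nonzero(as_tuple=True)[0]
        if eos_positions.numel() > 0:
            first_eos = input_length + eos_positions[0].item() + 1
            row[first_eos:] = pad_token_id
    return output
```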
Unit Tests
The changed code passes these unit tests:
eos_test_py.txt
Function Tests
It also passes the functional tests:
lm_eval mmlu_pro_business for Meta-Llama-3.1-8B-Instruct (pad_token=eos_token, bs=8):
lm_eval mmlu_pro_business for llama2-7b (pad_token=0, bs=8):