
A generation issue: when ignore_eos=False and the model's pad_token == eos_token (like Llama 3), generated results in the same batch are erased #1539

Open
Conversation

@YunLiu1 commented Dec 2, 2024

A generation issue: when ignore_eos=False and the model's pad_token == eos_token (like Llama 3), generated results in the same batch are erased.

What does this PR do?

When generating text with Optimum Habana, if batch_size > 1, ignore_eos=False, and the model's pad_token == eos_token (like Llama-3.1-8B), responses in the batch can be erased.
Here is an example: I submit 2 prompts ("Hello world,", "How are you?") with batch_size=2, and the shorter one is padded on the left:
(screenshot: tokenized batch showing the shorter prompt left-padded with pad/eos tokens)
After generation, the first pad_token is recognized as an eos_token, and the whole response is erased.
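
A minimal sketch of how the padded batch ends up with eos ids at the start of the shorter prompt (the model path and padding setup below are assumptions based on the example above, not the exact code in run_generation.py):

```python
from transformers import AutoTokenizer

# Sketch only: Llama 3 has no dedicated pad token, so the eos token is
# commonly reused as the pad token, and generation scripts pad on the left.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

batch = tokenizer(["Hello world,", "How are you?"], padding=True, return_tensors="pt")
print(batch["input_ids"])
# The shorter prompt is prefixed with pad ids equal to eos_token_id, so a
# post-process that truncates at the first eos it sees erases that row's output.
```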

Fixes

I modified the post-processing to ignore the left pad_tokens and only erase the tokens after the real eos_token; see the sketch below.
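
A minimal sketch of the idea behind the fix (the function name and signature are illustrative, not the actual patch): start the eos search after the prompt, so left pad tokens that share the eos id are never treated as the end of the answer.

```python
import torch

def erase_after_real_eos(output_ids: torch.Tensor, input_length: int,
                         eos_token_id: int, pad_token_id: int) -> torch.Tensor:
    """Sketch: truncate each sequence only after an eos that appears in the
    generated part, ignoring left padding that reuses the eos id."""
    for seq in output_ids:
        generated = seq[input_length:]              # skip prompt + left padding
        eos_positions = (generated == eos_token_id).nonzero(as_tuple=True)[0]
        if eos_positions.numel() > 0:
            first_eos = input_length + eos_positions[0].item()
            seq[first_eos + 1:] = pad_token_id      # keep the eos, blank the rest
    return output_ids
```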

Unit Tests

The changed code passes the following unit test:
eos_test_py.txt

Function Tests

It also passes the following functional tests.

lm_eval mmlu_pro_business for Meta-Llama-3.1-8B-Instruct (pad_token = eos_token, bs=8):

| Tasks    | Version | Filter         | n-shot | Metric      | Value  | Stderr   |
|----------|---------|----------------|--------|-------------|--------|----------|
| business | 1       | custom-extract | 5      | exact_match | 0.4791 | ± 0.0178 |

lm_eval mmlu_pro_business for llama2-7b (pad_token=0, bs=8):

| Tasks    | Version | Filter         | n-shot | Metric      | Value  | Stderr   |
|----------|---------|----------------|--------|-------------|--------|----------|
| business | 1       | custom-extract | 5      | exact_match | 0.1888 | ± 0.0139 |

@regisss (Collaborator) commented Dec 2, 2024

@YunLiu1 Can you provide an example command that reproduces this issue, please?

@YunLiu1 (Author) commented Dec 3, 2024

> @YunLiu1 Can you provide an example command that reproduces this issue, please?

Sure. Because ignore_eos is always True in run_generation.py, you need to change the code first:
edit examples/text-generation/run_generation.py at L512 and explicitly set ignore_eos=False.
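
A rough sketch of what that edit might look like (the exact code around L512 may differ; treat the attribute name as an assumption):

```python
# examples/text-generation/run_generation.py, around L512 (sketch only)
generation_config.ignore_eos = False  # the script otherwise forces this to True
```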
Then run:

```bash
python3 ~/optimum-habana/examples/text-generation/run_generation.py \
    --model_name_or_path /host/mnt/disk3/hf_models/Meta-Llama-3.1-8B \
    --use_hpu_graphs --use_kv_cache --bf16 --batch_size 2 \
    --warmup 0 --n_iterations 1 \
    --prompt "Hello world," "How are you?"
```

There is no output for the short prompt "Hello world,".
