A generation issue: when ignore_eos=False and the model's pad_token == eos_token (like Llama3), generated results in the same batch are erased #1539
What does this PR do?
When generating text with Optimum Habana, if BS > 1, ignore_eos=False, and the model's pad_token == eos_token (like Llama3.1-8B), the generated results of other prompts in the same batch are erased.
Here is an example:
I submit 2 prompts ("Hello world,", "How are you?") with BS=2; the shorter one is padded on the left:
After generation, the first pad_token is recognized as eos_token, and the response is erased.
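A minimal sketch of how the problem arises, using the Hugging Face `transformers` tokenizer API; the model name and prompts are illustrative only, not the exact reproduction script from this PR:

```python
from transformers import AutoTokenizer

# Llama3 ships no dedicated pad token, so pad_token is set to eos_token.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

batch = tokenizer(["Hello world,", "How are you?"],
                  return_tensors="pt", padding=True)

# If the prompts tokenize to different lengths, the shorter row starts with
# pad_token_id == eos_token_id. Any post-processing that truncates at the
# first eos_token_id will then erase that row's whole generated response.
print(batch["input_ids"])
```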
Fixes
I modified the post-processing to ignore the left pad_tokens and only erase the tokens that follow the real eos_token.
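A minimal sketch of this idea, not the exact optimum-habana implementation; the function name `trim_after_eos` and the arguments `generated_ids` / `input_length` are illustrative, assuming left-padded prompts followed by the generated tokens:

```python
import torch

def trim_after_eos(generated_ids: torch.Tensor, input_length: int,
                   eos_token_id: int, pad_token_id: int) -> torch.Tensor:
    """Erase tokens only after a real eos in the generated part,
    ignoring the left pad tokens of the prompt."""
    output = generated_ids.clone()
    for row in output:
        # Search for eos only in the newly generated tokens, so the left
        # padding of the prompt (pad_token == eos_token) is skipped.
        gen_part = row[input_length:]
        eos_positions = (gen_part == eos_token_id).nonzero(as_tuple=True)[0]
        if eos_positions.numel() > 0:
            first_eos = input_length + eos_positions[0].item() + 1
            row[first_eos:] = pad_token_id
    return output
```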
Unit Tests
The changed code passes these unit tests:
eos_test_py.txt
Function Tests
It also passes the functional tests:
lm_eval mmlu_pro_business for Meta-Llama-3.1-8B-Instruct (pad_token=eos_token, bs=8):
lm_eval mmlu_pro_business for llama2-7b (pad_token=0, bs=8):