
generation utils update (minor) #1468

Open · wants to merge 25 commits into base: main
Conversation


@yafshar yafshar commented Nov 1, 2024

What does this PR do?

  • Fix the import path for the streamers module: transformers.streamers -> transformers.generation.streamers
  • Fix the _prepare_decoder_attention_mask interface
    • return x.index_fill(1, torch.tensor(0), 1) created the index tensor on the wrong device; it is fixed to create the index on the correct device: index = torch.tensor(0, device=device)
  • Improve the _pad_past_key_values function

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

- Fix the type hint: dtype cannot be a str
- Fix the device hint
- Remove the pad token id argument; the decoder_attention_mask is binary (0s and 1s)
- Add an early return
- Extract is_mqa_model and lazy_mode to avoid repeated dictionary lookups
- Use more descriptive variable names and simplify the nested loops for better readability
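The padding cleanup described above can be sketched as below. This is a minimal illustration under assumed names and cache layout (tuple of per-layer key/value tensors padded along the sequence dimension), not the PR's actual implementation:

```python
import torch
import torch.nn.functional as F

def pad_past_key_values(past_key_values, pad_len: int):
    """Pad each cached key/value tensor with zeros along the sequence
    dimension (dim -2). Hypothetical sketch of the refactored helper."""
    if pad_len <= 0:
        # Early return: nothing to pad, avoid rebuilding the cache.
        return past_key_values
    # F.pad's last four arguments pad (last dim left, last dim right,
    # dim -2 left, dim -2 right), so only the sequence dim grows.
    return tuple(
        tuple(F.pad(kv, (0, 0, 0, pad_len)) for kv in layer)
        for layer in past_key_values
    )
```

Flags such as is_mqa_model and lazy_mode would similarly be read once before the loop rather than via repeated config lookups inside it.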
@yafshar yafshar marked this pull request as ready for review November 4, 2024 17:36
@yafshar yafshar changed the title generation utils update generation utils update (minor) Nov 5, 2024

yafshar commented Nov 8, 2024

The text-generation CI has been executed and will be compared with the main branch once the run is complete.


@emascarenhas emascarenhas left a comment


@yafshar, just a couple of comments below.
Please post the CI results before and after the change.

@emascarenhas

@yafshar, makes sense.
Please post the CI results when available and we can move this further along.

@emascarenhas

@yafshar, could you post the CI results? Thanks.


@regisss regisss left a comment


LGTM!

Aligned with @emascarenhas, let's make sure there is no regression in generation tests and then I'll merge it 🙂

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


yafshar commented Dec 2, 2024

I am running the slow CI tests with python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s. So far all results have matched exactly. I will finish all the tests by tomorrow.

@emascarenhas

@yafshar, can you post the results from the CI tests here? Thanks.

@yafshar

yafshar commented Dec 3, 2024

I just finished the CI tests on both main and this PR on the same machine:

>>> python -m pytest tests/test_text_generation_example.py tests/test_encoder_decoder.py -v -s
4 failed, 59 passed

I checked the failures

-> test_text_generation_bf16_1x[token0-EleutherAI/gpt-j-6b-1-False-160.5823842101192-False]
pr:   FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-google/gemma-7b-1-False-109.70751574382221-True] - AssertionError: assert 'DeepSpeed is...be efficient,' == 'DeepSpeed is... PyTorch, and'
main: FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-google/gemma-7b-1-False-109.70751574382221-True] - AssertionError: assert 'DeepSpeed is...be efficient,' == 'DeepSpeed is... PyTorch, and'

-> test_text_generation_bf16_1x[token0-state-spaces/mamba-130m-hf-1536-False-5385.511100161605-False]
pr:   FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-state-spaces/mamba-130m-hf-1536-False-5385.511100161605-False] - assert 4895.173518373703 >= ((2 - 1.05) * 5385.511100161605)
main: FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-state-spaces/mamba-130m-hf-1536-False-5385.511100161605-False] - assert 4895.212904578489 >= ((2 - 1.05) * 5385.511100161605)

-> test_text_generation_bf16_1x[token0-Deci/DeciLM-7B-1-False-120-False]
pr:   FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-Deci/DeciLM-7B-1-False-120-False] - assert 107.58924903315328 >= ((2 - 1.05) * 120)
main: FAILED tests/test_text_generation_example.py::test_text_generation_bf16_1x[token0-Deci/DeciLM-7B-1-False-120-False] - assert 107.56332773820075 >= ((2 - 1.05) * 120)

-> test_text_generation_fp8[token0-tiiuae/falcon-180B-4-950-True-128-128-2506.68]
pr:   FAILED tests/test_text_generation_example.py::test_text_generation_fp8[token0-tiiuae/falcon-180B-4-950-True-128-128-2506.68] - AssertionError: The following command failed:
main: FAILED tests/test_text_generation_example.py::test_text_generation_fp8[token0-tiiuae/falcon-180B-4-950-True-128-128-2506.68] - AssertionError: The following command failed:

The failures are exactly the same.

@yafshar
Copy link
Contributor Author

yafshar commented Dec 3, 2024

@regisss @emascarenhas I do not see any regression. The behavior is the same as far as I have tested.
