Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interleaving sliding window for Ministral-8B-Instruct-2410 #10591

Merged
merged 6 commits into from
Nov 30, 2024

Conversation

patrickvonplaten
Copy link
Contributor

@patrickvonplaten patrickvonplaten commented Nov 23, 2024

Same as #10584 but for: https://huggingface.co/mistralai/Ministral-8B-Instruct-2410

Test with:

vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral --revision ref/pr/18

from https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/discussions/18

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

Copy link
Member

@youkaichao youkaichao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does it change llama.py ? the file

https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/blob/main/config.json#L3

seems to indicate it uses MistralForCausalLM

@DarkLight1337
Copy link
Member

why does it change llama.py ? the file

https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/blob/main/config.json#L3

seems to indicate it uses MistralForCausalLM

We use the same LlamaForCausalLM implementation for several models, including MistralForCausalLM. You can check the registry for more details.

@@ -167,13 +167,24 @@ def __init__(
rope_scaling=rope_scaling,
is_neox_style=is_neox_style,
)

layer_idx: int = int(prefix.split(".")[0])
if isinstance(config.interleaved_sliding_window, int):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to check hasattr(config, "interleaved_sliding_window")

Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the model as a test? Possibly just to the existing test_mistral.py

vllm/model_executor/models/llama.py Outdated Show resolved Hide resolved
@patrickvonplaten
Copy link
Contributor Author

Could you add the model as a test? Possibly just to the existing test_mistral.py

The model is already used in the tests in tests/models/decoder_only/language/test_mistral.py

Changed it manually to:

with vllm_runner(model, dtype=dtype, tokenizer_mode="mistral", revision="refs/pr/18") as vllm_model:

and it everything passes. So as soon as https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/discussions/18 is merged the tests will use the new interleaved attn automatically

Copy link
Member

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sounds good, LGTM

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 29, 2024
@youkaichao youkaichao enabled auto-merge (squash) November 30, 2024 05:59
@youkaichao
Copy link
Member

@patrickvonplaten do we need to follow any merge order for this pr and https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/discussions/18 ?

@youkaichao youkaichao changed the title [Interleaved ATTN] Support for Mistral-8B Interleaving sliding window for Ministral-8B-Instruct-2410 Nov 30, 2024
@youkaichao youkaichao merged commit e7cfc4e into vllm-project:main Nov 30, 2024
47 of 48 checks passed
@patrickvonplaten
Copy link
Contributor Author

@patrickvonplaten do we need to follow any merge order for this pr and https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/discussions/18 ?

If ok, I'd maybe wait until the next public VLLM release and then merge: https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/discussions/18 as otherwise the default implementation will fail

@patrickvonplaten
Copy link
Contributor Author

Thanks for cleaning up the PR @youkaichao

afeldman-nm pushed a commit to neuralmagic/vllm that referenced this pull request Dec 2, 2024
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Signed-off-by: Andrew Feldman <[email protected]>
cedonley pushed a commit to cedonley/vllm that referenced this pull request Dec 7, 2024
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Signed-off-by: cedonley <[email protected]>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants