[Model] Support for fairseq2 Llama #11442
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Signed-off-by: Martin Gleize <[email protected]>
Force-pushed from aa73146 to 25ce34c
Could you add a simple example to the weight loader tests so we can confirm this won't break in the future?
Thank you for the advice, much appreciated. Just so that I'm clear, the best way to do what you said is to add an HF model here: https://github.com/vllm-project/vllm/blob/main/tests/weight_loading/models.txt?
Thanks for the contribution! As @robertgshaw2-neuralmagic mentioned, it would be great if you can add an example Hugging Face model repo in https://github.com/vllm-project/vllm/blob/main/tests/weight_loading/models.txt.
I'm also curious whether it's worth adding Fairseq2LlamaForCausalLM to https://github.com/vllm-project/vllm/blob/main/docs/source/models/supported_models.md#list-of-text-only-language-models, but I'll leave it to your judgement!
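For illustration, assuming that file keeps its comma-separated layout of quantization method, Hugging Face repo, and revision, an unquantized fairseq2 entry might look like the line below (the repo name is a placeholder, not a real checkpoint):

None, example-org/fairseq2-llama-3.2-1b, main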
def _init_model(self, vllm_config: VllmConfig, prefix: str = ""):
    return LlamaModel(vllm_config=vllm_config, prefix=prefix)
Is there a reason why you needed to redefine this?
vllm/vllm/model_executor/models/llama.py
Lines 553 to 554 in 8c3230d
def _init_model(self, vllm_config: VllmConfig, prefix: str = ""):
    return LlamaModel(vllm_config=vllm_config, prefix=prefix)
Good point; I had used this as a debugging hook but forgot to remove it.
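For context, a minimal sketch of what the class looks like once the override is dropped, assuming Fairseq2LlamaForCausalLM subclasses LlamaForCausalLM as the diff suggests:

from vllm.model_executor.models.llama import LlamaForCausalLM

class Fairseq2LlamaForCausalLM(LlamaForCausalLM):
    # No _init_model override: the inherited LlamaForCausalLM._init_model
    # already constructs LlamaModel(vllm_config=..., prefix=...), so only
    # the fairseq2-specific weight loading needs to live in this subclass.
    pass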
Signed-off-by: Martin Gleize <[email protected]>
This PR adds support for all Llama models produced with fairseq2. Its checkpoint format is mostly the same as the original Llama checkpoint format, up to layer renaming.
The PR provides a model with custom weight loading and adds Fairseq2LlamaForCausalLM to the model registry.
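As a rough illustration of what "mostly the same, up to layer renaming" implies for the loader, the sketch below remaps fairseq2-style parameter names onto the Llama naming scheme before handing the tensors to the stock Llama loader. The prefix table and helper names are assumptions made for illustration, not the PR's actual mapping.

from typing import Iterable, Tuple

import torch

# Illustrative fairseq2 -> Llama name prefixes; the real table may differ
# in both keys and coverage.
_FS2_TO_LLAMA = {
    "decoder.layers.": "model.layers.",
    "decoder_frontend.embed.": "model.embed_tokens.",
    "decoder.layer_norm.": "model.norm.",
    "final_proj.": "lm_head.",
}


def rename_fairseq2_key(name: str) -> str:
    """Map one fairseq2 parameter name onto the Llama naming scheme."""
    for src, dst in _FS2_TO_LLAMA.items():
        if name.startswith(src):
            return dst + name[len(src):]
    return name


def remap_weights(
    weights: Iterable[Tuple[str, torch.Tensor]],
) -> Iterable[Tuple[str, torch.Tensor]]:
    """Yield (renamed_key, tensor) pairs ready for the stock Llama loader."""
    for name, tensor in weights:
        yield rename_fairseq2_key(name), tensor

A Fairseq2LlamaForCausalLM.load_weights could then simply feed the remapped iterator to the parent class's loader.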