Ultravox Support for LoRA #11253
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Hi folks, I'm working with @petersalas on this. The PR is not complete, but I wanted to start the discussion since I have some open questions and need some help from the vLLM community.
supported_lora_modules = [
    'up_proj', 'down_proj', 'gate_proj', 'v_proj', 'o_proj', 'k_proj',
    'q_proj'
]
@jeejeelee since you have worked on LoRAs extensively in vLLM, I want to get your input. Ultravox uses Llama for its language model and loads vLLM's LlamaForCausalLM, which supports LoRA. Given that, what should happen in Ultravox to support LoRA?
As I was adding the Llama 3.1 supported modules here, I found that vLLM wasn't loading all of v_proj, k_proj, etc. Is that because LlamaForCausalLM supports a packed module for qkv_proj? Also, I didn't find packed_modules in the original Llama 3.1 model. Why does vLLM define those in LlamaForCausalLM?
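For reference, this is roughly the kind of class-level LoRA metadata vLLM's LlamaForCausalLM carries - a sketch only, not copied from the vLLM source, and the exact lists may differ by version:

```python
# Sketch of the LoRA-related class attributes on vLLM's LlamaForCausalLM.
# Not the actual source; contents may differ by vLLM version.
class LlamaLoRAMetadataSketch:
    # q/k/v and gate/up projections are fused into packed linear layers in
    # vLLM, so LoRA targets the packed names rather than the individual ones.
    packed_modules_mapping = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
        "gate_up_proj": ["gate_proj", "up_proj"],
    }
    supported_lora_modules = [
        "qkv_proj",
        "o_proj",
        "gate_up_proj",
        "down_proj",
        "embed_tokens",
        "lm_head",
    ]
```

The packed names explain why the individual v_proj/k_proj/q_proj entries are not loaded one by one.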
Also, do I need to use LLMWrapper when initializing the language model to align the prefix so that vLLM can load PEFT LoRAs? Like this here: https://github.com/vllm-project/vllm/pull/7199/files#diff-7b8a4e258637b7c94389c745c449c52137d33cf92957f3e5bcb18a0ee204b21bR807
> Given that, what should happen in Ultravox to support LoRA?

Treat it as an independent model - you need to add LoRA-related static variables to it.

> Why does vLLM define those in LlamaForCausalLM?

Because these are static variables in the model interface.

> Also, do I need to use LLMWrapper?

LLMWrapper has been removed, so this probably doesn't need to be added. I'll figure it out later.
Since Ultravox can be run with Mixtral, should I also add its supported LoRA modules?
How does one figure out what the supported LoRA modules are in LlamaForCausalLM, given that upstream Llama has no packed modules? Just trying to understand how it all works.
Ohh, I think I have all that code if you check my PR. But vLLM isn't activating the module because the fully qualified module names don't match, as per my previous comment. Does that make sense?
Treat Ultravox as an independent model: since its fully qualified module names differ from LLaMA's, you cannot use LLaMA's LoRA directly.
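To illustrate the mismatch, the same attention projection ends up with different fully qualified names in the two models - the layer index and the exact `language_model` prefix below are assumptions based on this thread:

```python
# Illustrative only: a Llama-keyed adapter entry never matches an Ultravox
# module because the language model is nested under a "language_model" prefix.
LLAMA_MODULE = "model.layers.0.self_attn.qkv_proj"
ULTRAVOX_MODULE = "language_model.model.layers.0.self_attn.qkv_proj"
assert LLAMA_MODULE != ULTRAVOX_MODULE  # so the LoRA weight is silently skipped
```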
Aren't the qualified module names specific to vLLM? AFAICT I have checked PEFT and I can't find this limitation. I'm writing up the code for PEFT to verify that. Maybe I'm missing something, but it does seem odd to me that I can't apply an existing Llama LoRA in Ultravox.
@jeejeelee you were right. After further troubleshooting, PEFT doesn't apply a Llama LoRA on Ultravox either - it ignores missing modules silently, except in the most recent version, which logs a warning when the LoRA references modules the model doesn't have. So I ended up creating a transformed LoRA for Ultravox for testing, and added code in vLLM to log a warning if a LoRA module is missing from a model.
I have cleaned up the PR and added a test case which should give enough coverage. I'm checking a few more things, but the PR should be ready shortly.
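A minimal sketch of the kind of missing-module warning described here; the function name and logging style are illustrative, not the actual vLLM or PEFT code:

```python
import logging

import torch.nn as nn

logger = logging.getLogger(__name__)


def warn_on_missing_lora_modules(model: nn.Module,
                                 adapter_module_names: set) -> None:
    """Warn about adapter modules that have no counterpart in the model."""
    model_module_names = {name for name, _ in model.named_modules()}
    missing = adapter_module_names - model_module_names
    if missing:
        logger.warning(
            "LoRA adapter targets %d module(s) not present in the model "
            "(e.g. %s); they will be ignored.",
            len(missing), sorted(missing)[:3])
```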
@jeejeelee AFAICT LlamaForCausalLM is used when loading Mistral models. Is that correct? If yes, this PR should also cover Mistral LoRA modules, since the PR includes the applicable LlamaForCausalLM LoRA modules... right?
This pull request has merge conflicts that must be resolved before it can be merged.
@pytest.fixture(scope="session")
def llama3_1_8b_ultravox_chess_lora():
    # ultravox chess lora is the result of transforming the chess llama lora above
    return snapshot_download(repo_id="thedebugger11/ultravox-chess-lora")
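A hedged sketch of how a test might exercise this fixture with vLLM's LoRA support; the model name, LoRA rank, and prompt below are assumptions for illustration, not taken from the PR:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest


def test_ultravox_chess_lora(llama3_1_8b_ultravox_chess_lora):
    # Assumed Ultravox checkpoint; max_lora_rank must be >= the adapter's rank.
    llm = LLM(
        model="fixie-ai/ultravox-v0_3",
        enable_lora=True,
        max_lora_rank=16,
    )
    outputs = llm.generate(
        ["What is a strong reply to 1. e4?"],
        SamplingParams(max_tokens=64),
        lora_request=LoRARequest("chess", 1, llama3_1_8b_ultravox_chess_lora),
    )
    assert outputs and outputs[0].outputs[0].text
```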
generated using this script: https://github.com/thedebugger/ultravox-lora-scipts/blob/main/peft_ultravox_transform.py
This is done to align the module names in the safetensors adapter file: base_model.model -> base_model.model.language_model
So if anyone wants to use a Llama LoRA with Ultravox, they can do so after applying this transformation.
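A minimal sketch of that key-renaming transformation (the linked peft_ultravox_transform.py is the actual script); file paths are illustrative, and the adapter_config.json target module names may also need the same prefixing:

```python
from safetensors.torch import load_file, save_file

OLD_PREFIX = "base_model.model."
NEW_PREFIX = "base_model.model.language_model."


def transform_adapter(src: str = "adapter_model.safetensors",
                      dst: str = "ultravox_adapter_model.safetensors") -> None:
    # Rewrite every Llama-prefixed weight key to the Ultravox prefix.
    tensors = load_file(src)
    renamed = {
        (NEW_PREFIX + k[len(OLD_PREFIX):]) if k.startswith(OLD_PREFIX) else k: v
        for k, v in tensors.items()
    }
    save_file(renamed, dst)
```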
Force-pushed from 771484d to 64a664f
Signed-off-by: Sumit Vij <[email protected]>
WIP: lora tests
Minor tweaks
Moar fixes
Temp changes
Cleanup
Add more debugging logs and packed modules
Signed-off-by: Sumit Vij <[email protected]>
Remove stale comment
Add llama lora modules
Add llama test case
Add test case and log warning on missing lora modules
Rollback unwanted changes and format fixes
Signed-off-by: Sumit Vij <[email protected]>
Force-pushed from 64a664f to 3f5996c
Can you refer to #10022 to minimize the changes?
The changes are in line with #10022, except for the test case and other minor logging changes. Do you have any concerns with any particular change?
It looks like there are issues with both the added tests and the logs. We should only modify the Ultravox script, following the changes made in #10022.
What is/are the issue(s)? Maybe I'm missing something, but the tests are passing.
@jeejeelee please let me know what your concerns are - happy to address them. Having a test case was super helpful to make sure LoRA works as expected with Llama and Ultravox.
TODOs: