
[Model] LoRA with lm_head and embed_tokens fully trained #8082

Open · wants to merge 139 commits into base: main

Conversation

@sergeykochetkov commented Sep 2, 2024

FIX #4186 #2816

Support fully trained lm_head and embed_tokens in LoRA adapters.

We found that the quality of our adapters drops significantly without a fully trained lm_head, or an lm_head trained LoRA-style. This is the functionality of PEFT's modules_to_save=["lm_head", "embed_tokens"]: https://huggingface.co/docs/peft/v0.12.0/en/package_reference/#peft.LoraConfig.modules_to_save

The idea is to replace the base model's VocabParallelEmbedding and ParallelLMHead with the layers loaded from modules_to_save when running inference with the LoRA, as in the sketch below.
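
For reference, a minimal PEFT-side sketch of what producing such an adapter looks like (the base model and LoRA hyperparameters here are illustrative assumptions; modules_to_save is the relevant part):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model; any causal LM works the same way.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B")

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    # These modules are trained in full and their complete weights are
    # saved with the adapter, instead of low-rank updates:
    modules_to_save=["lm_head", "embed_tokens"],
)
model = get_peft_model(base, config)
# The saved adapter then carries full lm_head / embed_tokens tensors,
# which vLLM would need to splice in place of ParallelLMHead and
# VocabParallelEmbedding at load time.
```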

  • dirty implementation
  • tests for the new functionality
  • check that existing functionality still works
  • measure inference performance with a fully trained lm_head
  • implement fully trained embed_tokens as well

PR moved to #11714


github-actions bot commented Sep 2, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default; they only run the fastcheck CI, which consists of a small, essential subset of CI tests to catch errors quickly. You can run the other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

@sergeykochetkov changed the title from "LoRA with lm_head fully trained" to "[Model] LoRA with lm_head fully trained" Sep 2, 2024
@sergeykochetkov sergeykochetkov marked this pull request as ready for review September 11, 2024 13:49
@sergeykochetkov (Author)

/ready

@sergeykochetkov sergeykochetkov marked this pull request as draft September 11, 2024 13:57
@AlongWY commented Sep 18, 2024

Should it be unmarked as draft?


mergify bot commented Oct 30, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @sergeykochetkov please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Oct 30, 2024
@Tostino (Contributor) commented Jan 2, 2025

Just wanted to say that this is something I am looking forward to.

I am attempting to use Qwen/Qwen2.5-14B as a base model and load two LoRAs through the OpenAI API. One of the LoRAs is just the Instruct model extracted as a LoRA from the base. The other is a fine-tune I did off the base; I used MergeKit to do a TIES merge of it with the base and Instruct models, and then extracted an adapter from that merge.

This worked great when I was testing with HF Transformers, but I was surprised to hit errors when trying to use these adapters with vLLM.
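
For context, this is roughly the multi-adapter setup being described, using vLLM's offline LLM API (the adapter paths and sampling parameters are hypothetical placeholders; adapters that carry a fully trained lm_head via modules_to_save are exactly what fails to load without this PR):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model with LoRA support enabled; max_loras > 1 lets several
# adapters be active concurrently.
llm = LLM(model="Qwen/Qwen2.5-14B", enable_lora=True, max_loras=2)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Each LoRARequest takes a name, a unique integer id, and a local path.
# Both paths below are hypothetical.
instruct_lora = LoRARequest("instruct", 1, "/adapters/qwen2.5-14b-instruct-lora")
merged_lora = LoRARequest("ties-merge", 2, "/adapters/qwen2.5-14b-ties-lora")

outputs = llm.generate(
    ["Explain TIES merging in one sentence."],
    sampling,
    lora_request=instruct_lora,
)
print(outputs[0].outputs[0].text)
```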

@sergeykochetkov (Author)

PR is recreated here.


Successfully merging this pull request may close these issues.

[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported