
[Model] LoRA with lm_head and embed_tokens fully trained #8082

Open · wants to merge 139 commits into base: main

Conversation

@sergeykochetkov commented Sep 2, 2024

FIX #4186 #2816

Support fully trained lm_head and embed_tokens in LoRA adapters.

We found that the quality of our adapters drops significantly without a fully trained lm_head, or an lm_head trained LoRA-style. This is the functionality of PEFT's modules_to_save=["lm_head", "embed_tokens"]: https://huggingface.co/docs/peft/v0.12.0/en/package_reference/#peft.LoraConfig.modules_to_save

The idea is to replace the base model's VocabParallelEmbedding and ParallelLMHead with the layers loaded from modules_to_save when running inference with the LoRA, as in the sketch below.
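
For reference, a minimal PEFT-side sketch of what producing such an adapter looks like (the base model and LoRA hyperparameters here are illustrative assumptions; modules_to_save is the relevant part):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative base model; any causal LM works the same way.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B")

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    # These modules are trained in full and their complete weights are
    # saved with the adapter, instead of low-rank updates:
    modules_to_save=["lm_head", "embed_tokens"],
)
model = get_peft_model(base, config)
# The saved adapter then carries full lm_head / embed_tokens tensors,
# which vLLM would need to splice in place of ParallelLMHead and
# VocabParallelEmbedding at load time.
```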

  • dirty implementation
  • tests for the new functionality
  • check that existing functionality still works
  • measure inference performance with a fully trained lm_head
  • implement fully trained embed_tokens as well

PR moved to #11714


github-actions bot commented Sep 2, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default; they only run the fastcheck CI, which consists of a small, essential subset of CI tests to catch errors quickly. You can run the other CI tests on top of the default ones by unblocking the steps in your fast-check build in the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add the ready label to the PR
  • Enable auto-merge

🚀

@sergeykochetkov changed the title from "LoRA with lm_head fully trained" to "[Model] LoRA with lm_head fully trained" Sep 2, 2024
@sergeykochetkov sergeykochetkov marked this pull request as ready for review September 11, 2024 13:49
@sergeykochetkov (Author)

/ready

@sergeykochetkov sergeykochetkov marked this pull request as draft September 11, 2024 13:57
@AlongWY commented Sep 18, 2024

Should it be unmarked as draft?


mergify bot commented Oct 30, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @sergeykochetkov please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label Oct 30, 2024
@Tostino (Contributor) commented Jan 2, 2025

Just wanted to say that this is something I am looking forward to.

I am attempting to use Qwen/Qwen2.5-14B as a base model and load two LoRAs through the OpenAI API. One of the LoRAs is just the Instruct model extracted as a LoRA from the base. The other is a fine-tune I did off the base; I used MergeKit to do a TIES merge of it with the base and Instruct models, and then extracted an adapter from that merge.

This worked great when I was testing with HF Transformers, but I was surprised to hit errors when trying to use these adapters with vLLM.
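
For context, this is roughly the multi-adapter setup being described, using vLLM's offline LLM API (the adapter paths and sampling parameters are hypothetical placeholders; adapters that carry a fully trained lm_head via modules_to_save are exactly what fails to load without this PR):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model with LoRA support enabled; max_loras > 1 lets several
# adapters be active concurrently.
llm = LLM(model="Qwen/Qwen2.5-14B", enable_lora=True, max_loras=2)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Each LoRARequest takes a name, a unique integer id, and a local path.
# Both paths below are hypothetical.
instruct_lora = LoRARequest("instruct", 1, "/adapters/qwen2.5-14b-instruct-lora")
merged_lora = LoRARequest("ties-merge", 2, "/adapters/qwen2.5-14b-ties-lora")

outputs = llm.generate(
    ["Explain TIES merging in one sentence."],
    sampling,
    lora_request=instruct_lora,
)
print(outputs[0].outputs[0].text)
```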

@sergeykochetkov (Author)

PR is recreated here.


Successfully merging this pull request may close these issues.

[Bug]: lora base_model.model.lm_head.base_layer.weight is not supported