Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoRA] Change lora_tokenizers capacity #10796

Merged
merged 3 commits into from
Dec 4, 2024
Merged

Conversation

xyang16
Copy link
Contributor

@xyang16 xyang16 commented Nov 30, 2024

Currently lora_tokenizers capacity is max_num_seqs. This is fine if max_loras is smaller than max_num_seqs.

But if max_loras is greater than max_num_seqs, lora_tokenizers should have capacity=max_loras, just like active_adapters LoRALRUCache have capacity=max_loras. If not, items in active_adapters are not evicted, but items in lora_tokenizers are evicted. I think this is unnecessarily evicted, causing performance degradation.

In our benchmark, we test max_loras=50, max_num_seqs=32, 50 adapters are invoked evenly distributed. We found this causing 50% performance degradation in P90 latency, comparing to tokenizers are not evicted.

So raising this PR to make lora_tokenizers capacity to be max(max_loras, max_num_seqs).

Another option is to add max_lora_tokenizers, just like max_loras and max_cpu_loras which controls GPU cache and CPU cache. But this might be an overkill.

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@jeejeelee
Copy link
Collaborator

Thanks for your contribution, I will look at this PR tomorrow or the day after tomorrow

Copy link
Collaborator

@jeejeelee jeejeelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your contribution. Could we pass max_loras as a keyword argument? This way the overall changes would be minimal.

@xyang16
Copy link
Contributor Author

xyang16 commented Dec 3, 2024

Thanks for your contribution, I will look at this PR tomorrow or the day after tomorrow

@jeejeelee Thanks for reviewing. There's already **tokenizer_config https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer_group/tokenizer_group.py#L18, you mean add to **tokenizer_config?

@jeejeelee
Copy link
Collaborator

Thanks for your contribution, I will look at this PR tomorrow or the day after tomorrow

@jeejeelee Thanks for reviewing. There's already **tokenizer_config https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer_group/tokenizer_group.py#L18, you mean add to **tokenizer_config?

Yeah

@xyang16
Copy link
Contributor Author

xyang16 commented Dec 3, 2024

Thanks for your contribution, I will look at this PR tomorrow or the day after tomorrow

@jeejeelee Thanks for reviewing. There's already **tokenizer_config https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer_group/tokenizer_group.py#L18, you mean add to **tokenizer_config?

Yeah

OK

@xyang16 xyang16 force-pushed the lora branch 3 times, most recently from 1053c54 to 2f362f9 Compare December 3, 2024 07:31
Signed-off-by: Xin Yang <[email protected]>
@xyang16
Copy link
Contributor Author

xyang16 commented Dec 3, 2024

@jeejeelee I have updated the code, please review.

@xyang16 xyang16 force-pushed the lora branch 2 times, most recently from 5defaac to d9994b0 Compare December 4, 2024 06:52
Signed-off-by: Xin Yang <[email protected]>
Copy link
Collaborator

@jeejeelee jeejeelee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM, thanks for your contribution and patience

@jeejeelee jeejeelee enabled auto-merge (squash) December 4, 2024 15:40
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 4, 2024
@jeejeelee jeejeelee merged commit 01d079f into vllm-project:main Dec 4, 2024
70 checks passed
@xyang16 xyang16 deleted the lora branch December 7, 2024 01:34
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
siddvenk pushed a commit to siddvenk/vllm that referenced this pull request Dec 20, 2024
BKitor pushed a commit to BKitor/vllm that referenced this pull request Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants