
[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) #6909

Merged: 10 commits into vllm-project:main on Dec 31, 2024

Conversation

@JohnGiorgi (Contributor) commented Jul 29, 2024

vLLM will produce slightly incorrect outputs if a user provides a LoRA adapter trained with Rank Stabilized LoRA (RSLoRA). I think the only change needed is to detect when a LoRA adapter sets use_rslora and then scale lora_alpha accordingly, i.e.

lora_alpha = config["lora_alpha"] * math.sqrt(rank) if config["use_rslora"] else config["lora_alpha"]

This one-line PR does exactly that!

I confirmed that, for my own problem, where adapters are trained with RSLoRA, this change leads to parity in model outputs between vanilla HuggingFace and vLLM; without it, the outputs from vLLM are significantly different (and, in my estimation, worse). Unfortunately I can't share this model, but I could find or create a public example if need be. However, given that all RSLoRA does is scale alpha by sqrt(r), I think this change is de-risked.
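
To make the scaling concrete, here is a minimal, self-contained sketch of the idea (a hypothetical helper over a PEFT-style config dict, not the actual vLLM code path):

import math

def effective_scaling(config: dict) -> float:
    """Compute the LoRA scaling factor from a PEFT-style adapter config.

    Standard LoRA scales the adapter delta by alpha / r; RSLoRA scales it by
    alpha / sqrt(r). Multiplying alpha by sqrt(r) up front lets the existing
    alpha / r code path reproduce the RSLoRA result, because
    (alpha * sqrt(r)) / r == alpha / sqrt(r).
    """
    rank = config["r"]
    lora_alpha = config["lora_alpha"]
    if config.get("use_rslora", False):
        lora_alpha = lora_alpha * math.sqrt(rank)
    return lora_alpha / rank

# Example with r=64, alpha=16: standard LoRA gives 0.25, RSLoRA gives 2.0.
print(effective_scaling({"r": 64, "lora_alpha": 16}))                      # 0.25
print(effective_scaling({"r": 64, "lora_alpha": 16, "use_rslora": True}))  # 2.0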

I'd be happy to add a unit test with a little guidance!

FIX: #10798

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI, as it is required for merging (or just use auto-merge).

To run full CI, you can do one of these:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 requested a review from Yard1 on July 30, 2024 at 03:45
@JohnGiorgi (Contributor Author) commented Jul 30, 2024

This fix works for my setup, but on second inspection it's a little awkward:

  • It will only work when LoRA checkpoints are loaded with LoRAModel.from_local_checkpoint (not LoRAModel.from_lora_tensors) --> Is this okay?
  • It is technically performing (alpha * sqrt(r)) / r, instead of simply alpha / sqrt(r), because it applies the * in LoRAModel.from_local_checkpoint here:

lora_alpha = config["lora_alpha"]

and the / in LoRALayerWeights here:

vllm/vllm/lora/lora.py, lines 30 to 33 (at c66c7f8):

if scaling is None:
    self.scaling = self.lora_alpha / self.rank
else:
    self.scaling = scaling

I think the solution to both problems is to apply the RSLoRA scaling once, in LoRALayerWeights.__init__. However, this class currently has no way of knowing whether use_rslora=True. It would be straightforward to provide that from LoRAModel.from_local_checkpoint, but I am not sure how to pass it from LoRAModel.from_lora_tensors.
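
For illustration only, a rough sketch (not this PR's actual change) of computing the scaling once at load time and handing it to LoRALayerWeights through its existing scaling argument, which would also avoid the (alpha * sqrt(r)) / r detour:

import math

def compute_lora_scaling(lora_alpha: float, rank: int, use_rslora: bool) -> float:
    # RSLoRA divides alpha by sqrt(r); standard LoRA divides alpha by r.
    return lora_alpha / math.sqrt(rank) if use_rslora else lora_alpha / rank

# Hypothetical call site in a checkpoint loader, relying on the `scaling`
# parameter shown in the LoRALayerWeights snippet above (the argument order
# here is illustrative, not the real signature):
# lora_weights = LoRALayerWeights(..., scaling=compute_lora_scaling(
#     lora_alpha, rank, use_rslora))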

github-actions bot commented Nov 1, 2024

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions bot added the stale label on Nov 1, 2024
@JohnGiorgi (Contributor Author) commented:

Ping

github-actions bot added the unstale label and removed the stale label on Nov 4, 2024
@JohnGiorgi (Contributor Author) commented:

Is there anyone who could take a look at this? AFAICT models fine-tuned with RSLoRA still aren't working correctly in vLLM due to this bug (cc @Yard1)

@jeejeelee (Collaborator) commented:

> Is there anyone who could take a look at this? AFAICT models fine-tuned with RSLoRA still aren't working correctly in vLLM due to this bug (cc @Yard1)

Thanks for your contribution.
Sorry for the delayed response. Would you still be interested in completing this feature? If so, please let me know.

@JohnGiorgi (Contributor Author) commented:

> Is there anyone who could take a look at this? AFAICT models fine-tuned with RSLoRA still aren't working correctly in vLLM due to this bug (cc @Yard1)

> Thanks for your contribution. Sorry for the delayed response. Would you still be interested in completing this feature? If so, please let me know.

Yes! Happy to help get this over the finish line, just had a couple of concerns I listed above, mainly:

  • Solution will only work when LoRA checkpoints are loaded with LoRAModel.from_local_checkpoint (not LoRAModel.from_lora_tensors) --> Is this okay?
  • It is technically performing (alpha * sqrt(r)) / r, instead of simply alpha / sqrt(r) (both give you the same answer)

@jeejeelee (Collaborator) commented:

I really appreciate your response.

For RSLoRA specifically, your concerns are all fine. However, considering our future plans to support DoRA (see #10849), I think we need to give this more thought. I suggest we implement a class that stores some LoRA information (e.g. rank, use_rslora, lora_alpha, and so on) and pass it to from_lora_tensors. This might be a more suitable approach.

@RonanKMcGovern (Contributor) commented Dec 6, 2024

Is it a valid fix to change the alpha value on the saved LoRA to

alpha_new = alpha * sqrt(r)

That should be mathematically equivalent when the LoRA is loaded, right?

[Perhaps I am wrong but it seems an odd convention for the alpha or r to be applied at inference time. They are really learning rate hyperparameters, so the convention of rescaling delta W instead of just rescaling the adapter learning rate seems cumbersome. Not that this library has any say on this.]

@jeejeelee (Collaborator) commented:

> Is it a valid fix to change the alpha value on the saved LoRA to
>
> alpha_new = alpha * sqrt(r)
>
> That should be mathematically equivalent when the LoRA is loaded, right?
>
> [Perhaps I am wrong but it seems an odd convention for the alpha or r to be applied at inference time. They are really learning rate hyperparameters, so the convention of rescaling delta W instead of just rescaling the adapter learning rate seems cumbersome. Not that this library has any say on this.]

TBH, I haven't looked into this implementation yet. We don't have control over how the LoRA weights are saved; what we can do is align with PEFT.

@RonanKMcGovern (Contributor) commented:

Confirming that a hacky fix is to update the alpha in the adapter config as I described above:
alpha_new = sqrt(rank) * alpha_old

This will result in the same generation as PEFT.

The better fix would indeed be to read in the rank and apply that scaling when use_rslora is true.
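
For anyone who needs that workaround today, a rough sketch of patching an adapter on disk (the path is a placeholder; note that the patched lora_alpha may become a non-integer, and you should back up the config first):

import json
import math
from pathlib import Path

adapter_dir = Path("path/to/adapter")          # placeholder path
config_file = adapter_dir / "adapter_config.json"

config = json.loads(config_file.read_text())
if config.get("use_rslora", False):
    # Fold the RSLoRA factor into alpha so that alpha / r equals alpha / sqrt(r).
    config["lora_alpha"] = config["lora_alpha"] * math.sqrt(config["r"])
    config["use_rslora"] = False
    config_file.write_text(json.dumps(config, indent=2))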

@jeejeelee (Collaborator) commented:

@JohnGiorgi FYI: #11003 has been merged. I think we can modify some code based on this to complete this PR

@JohnGiorgi (Contributor Author) commented:

Thanks @jeejeelee, hoping to dig into this tomorrow!

@nielsrolf commented:

Would be really cool if this gets merged soon! :)

@jeejeelee (Collaborator) commented:

@JohnGiorgi, if you don't have the bandwidth, I can complete this over the weekend.

@JohnGiorgi (Contributor Author) commented Dec 23, 2024

@jeejeelee Whoops, sorry, I got pulled away on something else. Will do this today!

Comment on lines 41 to 45:

if self.use_rslora:
    self.vllm_scaling_factor = self.lora_alpha / math.sqrt(self.r)
if self.context_length:
    if self.vllm_max_position_embeddings is None:
        self.vllm_max_position_embeddings = self.context_length
@JohnGiorgi (Contributor Author) commented Dec 23, 2024:

@jeejeelee I think this is the cleanest way to support the rsLoRA scaling logic using the new PEFTHelper! My main outstanding worry is that I am not sure how the rsLoRA scaling and the scaling applied when self.context_length is provided should interact.

The way this is written, if both self.use_rslora and self.context_length are set, the custom scaling-factor logic for self.context_length will take precedence.

@jeejeelee (Collaborator) replied:

> My main outstanding worry is that I am not sure how the rsLoRA scaling and the scaling applied when self.context_length is provided should interact.

IMHO, these two scalings are distinct and operate independently of each other.

We shouldn't reuse vllm_scaling_factor; instead, we can add a variable called scaling.
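
To make that concrete, a rough sketch of how the two factors could sit side by side in PEFTHelper (illustrative only; apart from the fields quoted in this thread, the names and defaults here are guesses, not the merged code):

import math
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PEFTHelperSketch:
    r: int
    lora_alpha: int
    use_rslora: bool = False
    context_length: int = 0
    vllm_max_position_embeddings: Optional[int] = None
    # LoRA weight scaling (rsLoRA-aware), separate from long-context scaling.
    scaling: float = field(init=False)
    # Long-context scaling factor stays on its own, independent path.
    vllm_scaling_factor: Optional[float] = field(init=False, default=None)

    def __post_init__(self) -> None:
        # rsLoRA divides alpha by sqrt(r); standard LoRA divides it by r.
        self.scaling = (self.lora_alpha / math.sqrt(self.r)
                        if self.use_rslora else self.lora_alpha / self.r)
        if self.context_length and self.vllm_max_position_embeddings is None:
            self.vllm_max_position_embeddings = self.context_length
        # vllm_scaling_factor would be set by the existing long-context logic,
        # untouched by the rsLoRA branch above.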

@JohnGiorgi (Contributor Author) replied on Dec 23, 2024:

Fine by me, but it gets a little confusing, as LoRAModel already defines scaling_factor and uses it for long-context support:

class LoRAModel(AdapterModel):
    """A LoRA fine-tuned model."""

    def __init__(
        self,
        lora_model_id: int,
        rank: int,
        loras: Dict[str, LoRALayerWeights],
        scaling_factor: Optional[float] = None,
    ) -> None:
        """
        Args:
            lora_model_id: The integer id for the lora model.
            rank: lora rank.
            loras: module name -> weights for lora-replaced layers.
            scaling_factor: Scaling factor to support long context lora model.
                None if the lora is not tuned for long context support.
         """

which is populated by PEFTHelper.vllm_scaling_factor,

return cls(lora_model_id,
           peft_helper.r,
           loras,
           scaling_factor=peft_helper.vllm_scaling_factor)

Is there an argument for syncing LoRAModel and PEFTHelper so each has a scaling_factor and vllm_scaling_factor argument? (Also, if vllm_scaling_factor is strictly used for long-context support, maybe a more explicit name is in order, e.g. long_context_scaling_factor or something.)
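
Purely to illustrate the naming suggestion (this is not the merged code), LoRAModel's constructor and the call site above might then read:

from typing import Dict, Optional

class LoRAModelSketch:
    """Illustrative only: LoRAModel with the long-context factor renamed."""

    def __init__(
        self,
        lora_model_id: int,
        rank: int,
        loras: Dict[str, "LoRALayerWeights"],
        long_context_scaling_factor: Optional[float] = None,
    ) -> None:
        self.id = lora_model_id
        self.rank = rank
        self.loras = loras
        # None unless the adapter is tuned for long-context support.
        self.long_context_scaling_factor = long_context_scaling_factor

# Hypothetical call site, mirroring the snippet above:
# return cls(lora_model_id,
#            peft_helper.r,
#            loras,
#            long_context_scaling_factor=peft_helper.vllm_scaling_factor)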

@jeejeelee (Collaborator) replied:

long_context_scaling_factor looks reasonable. Let me verify this tomorrow and confirm

@JohnGiorgi (Contributor Author) replied:

Sounds good! I'll hang tight until then

@jeejeelee (Collaborator) replied:

I have implemented the related logic; please check whether it's reasonable.

@JohnGiorgi (Contributor Author) replied:

This looks good to me @jeejeelee! Thanks for adding tests

@jeejeelee (Collaborator) replied:

Could you please sync with the main branch?

@JohnGiorgi (Contributor Author) replied:

Done. Only the DCO check is failing, and I am unsure how to fix it; it suggests a rebase, but I don't want to do that as there are multiple commit authors on this branch.

@jeejeelee (Collaborator) replied:

There is a known issue in the current LoRA test. Let's check whether the other LoRA tests pass first; if they do, we can consider force-merging.

@jeejeelee changed the title from "[Bugfix] Support Rank Stabilized LoRA (RSLoRA)" to "[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA)" on Dec 24, 2024
@jeejeelee (Collaborator) left a review comment:

@JohnGiorgi Thank you for your contribution and patience.

@jeejeelee enabled auto-merge (squash) on December 30, 2024 at 16:04
github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Dec 30, 2024
@JohnGiorgi (Contributor Author) commented:

> @JohnGiorgi Thank you for your contribution and patience.

Woohoo! Thanks for all the guidance

@simon-mo disabled auto-merge on December 31, 2024 at 06:15
@simon-mo merged commit 82c49d3 into vllm-project:main on Dec 31, 2024
66 of 69 checks passed
Labels: ready (ONLY add when PR is ready to merge/full CI is needed), unstale

Successfully merging this pull request may close these issues.

[Bug]: LoRa adapter responses not matching peft/transformers response
5 participants