
Unsloth Phi-3.5 LoRA: 3x the Number of Trainable Parameters with the Same Hyperparameters #1324

Open
KristianMoellmann opened this issue Nov 22, 2024 · 2 comments

KristianMoellmann commented Nov 22, 2024

Hi! I've observed the following when using Unsloth.

Summary

When fine-tuning the Unsloth Phi-3.5 model with LoRA, the number of trainable parameters is approximately 3x higher than with the Microsoft Phi-3.5 implementation, despite identical hyperparameters and target modules.

Details

Microsoft Phi-3.5 Model

Using the following configuration:

from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

lora_alpha = 64
lora_r = 32
lora_dropout = 0
lora_target_modules = "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
load_in_4bit = True

bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    bnb_4bit_use_double_quant=True,
)

model_name = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
)

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=lora_target_modules.split(","),
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name, trust_remote_code=True
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    peft_config=peft_config,
)

trainer.model.print_trainable_parameters()

Output:

trainable params: 17,825,792 || all params: 3,838,905,344 || trainable%: 0.4643
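
One quick way to see which of the requested target modules actually received LoRA adapters (a minimal sketch; it only assumes peft's standard LoRA layers, which expose a lora_A module dict):

adapted = sorted({
    name.split(".")[-1]
    for name, module in trainer.model.named_modules()
    if hasattr(module, "lora_A") and len(module.lora_A) > 0
})
print(adapted)  # leaf names of the projections that actually got adapters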

Unsloth-Based Setup

Configuration:

from unsloth import FastLanguageModel

model_name_unsloth = "unsloth/Phi-3.5-mini-instruct-bnb-4bit"

model_unsloth, _ = FastLanguageModel.from_pretrained(
    model_name=model_name_unsloth,
    load_in_4bit=load_in_4bit,
)

tokenizer_unsloth = AutoTokenizer.from_pretrained(
    model_name_unsloth, trust_remote_code=True
)

model_unsloth = FastLanguageModel.get_peft_model(
    model_unsloth,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=lora_target_modules.split(","),
)
peft_config_unsloth = None

trainer_unsloth = SFTTrainer(
    model=model_unsloth,
    tokenizer=tokenizer_unsloth,
    peft_config=peft_config_unsloth,
)

trainer_unsloth.accelerator.print(f"{trainer_unsloth.model}")
trainer_unsloth.model.print_trainable_parameters()

Output:

trainable params: 59,768,832 || all params: 3,880,848,384 || trainable%: 1.5401
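
To see where the trainable parameters sit, here is a small sketch that groups the LoRA parameters by the projection they are attached to (it only assumes peft's usual parameter naming, where adapter weights contain "lora_"):

from collections import Counter

per_name = Counter()
for name, param in trainer_unsloth.model.named_parameters():
    if param.requires_grad and "lora_" in name:
        # e.g. "...layers.0.self_attn.q_proj.lora_A.default.weight" -> "q_proj"
        leaf = name.split(".lora_")[0].split(".")[-1]
        per_name[leaf] += param.numel()
print(dict(per_name))  # LoRA parameter count per adapted projection name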

As you can see, there is a huge discrepancy in the number of trainable parameters.

Am I doing something wrong, or is this unintended behaviour?

Thank you for your ongoing work!

danielhanchen (Contributor) commented:

Yes, so we "mistral-fied" it, i.e. we split the fused QKV projection into three separate matrices. Microsoft's original implementation did not. We have shown this vastly increases accuracy.
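
For reference, a rough back-of-the-envelope check of both counts. This is only a sketch, assuming Phi-3.5-mini's published config (hidden size 3072, MLP intermediate size 8192, 32 layers, no grouped-query attention) and the usual LoRA cost of r * (in_features + out_features) trainable parameters per adapted linear layer:

r = 32
hidden, inter, layers = 3072, 8192, 32   # assumed Phi-3.5-mini dimensions

def lora_params(in_f, out_f):
    # A rank-r adapter on an (in_f -> out_f) linear layer adds the A matrix
    # (r x in_f) and the B matrix (out_f x r).
    return r * (in_f + out_f)

# Split ("mistral-fied") layout: q/k/v/o plus gate/up/down all match the targets.
split_per_layer = 4 * lora_params(hidden, hidden) + 3 * lora_params(hidden, inter)
print(layers * split_per_layer)   # 59,768,832 -- matches the Unsloth run

# Fused layout: qkv_proj and gate_up_proj keep their fused names, so only
# o_proj and down_proj appear to match the requested target modules.
fused_per_layer = lora_params(hidden, hidden) + lora_params(inter, hidden)
print(layers * fused_per_layer)   # 17,825,792 -- matches the Microsoft run

Under those assumptions, the split layout adapts all seven projections per layer while the fused one only adapts two, which lines up with the roughly 3x gap reported above.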

KristianMoellmann (Author) commented:

Hi Daniel, thank you for the response!

Can I read somewhere what exactly has been done to the model?
