
Unsloth Phi-3.5 LoRA: 3x the Number of Trainable Parameters with the Same Hyperparameters #1324

Open
KristianMoellmann opened this issue Nov 22, 2024 · 2 comments

KristianMoellmann commented Nov 22, 2024

Hi! I've observed the following when using Unsloth.

Summary

When fine-tuning the Unsloth Phi-3.5 model with LoRA, the number of trainable parameters is approximately 3x higher than with the Microsoft Phi-3.5 implementation, despite identical hyperparameters and target modules.

Details

Microsoft Phi-3.5 Model

Using the following configuration:

from peft import LoraConfig
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTTrainer

lora_alpha = 64
lora_r = 32
lora_dropout = 0
lora_target_modules = "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj"
load_in_4bit = True

bnb_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    bnb_4bit_use_double_quant=True,
)

model_name = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
)

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=lora_target_modules.split(","),
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name, trust_remote_code=True
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    peft_config=peft_config,
)

trainer.model.print_trainable_parameters()

Output:

trainable params: 17,825,792 || all params: 3,838,905,344 || trainable%: 0.4643
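
One quick way to see which of the requested target modules actually received LoRA adapters (a minimal sketch; it only assumes peft's standard LoRA layers, which expose a lora_A module dict):

adapted = sorted({
    name.split(".")[-1]
    for name, module in trainer.model.named_modules()
    if hasattr(module, "lora_A") and len(module.lora_A) > 0
})
print(adapted)  # leaf names of the projections that actually got adapters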

Unsloth-Based Setup

Configuration:

from unsloth import FastLanguageModel

model_name_unsloth = "unsloth/Phi-3.5-mini-instruct-bnb-4bit"

model_unsloth, _ = FastLanguageModel.from_pretrained(
    model_name=model_name_unsloth,
    load_in_4bit=load_in_4bit,
)

tokenizer_unsloth = AutoTokenizer.from_pretrained(
    model_name_unsloth, trust_remote_code=True
)

model_unsloth = FastLanguageModel.get_peft_model(
    model_unsloth,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=lora_target_modules.split(","),
)
peft_config_unsloth = None

trainer_unsloth = SFTTrainer(
    model=model_unsloth,
    tokenizer=tokenizer_unsloth,
    peft_config=peft_config_unsloth,
)

trainer_unsloth.accelerator.print(f"{trainer_unsloth.model}")
trainer_unsloth.model.print_trainable_parameters()

Output:

trainable params: 59,768,832 || all params: 3,880,848,384 || trainable%: 1.5401
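
To see where the trainable parameters sit, here is a small sketch that groups the LoRA parameters by the projection they are attached to (it only assumes peft's usual parameter naming, where adapter weights contain "lora_"):

from collections import Counter

per_name = Counter()
for name, param in trainer_unsloth.model.named_parameters():
    if param.requires_grad and "lora_" in name:
        # e.g. "...layers.0.self_attn.q_proj.lora_A.default.weight" -> "q_proj"
        leaf = name.split(".lora_")[0].split(".")[-1]
        per_name[leaf] += param.numel()
print(dict(per_name))  # LoRA parameter count per adapted projection name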

As you can see, there is a huge discrepancy in the number of trainable parameters.

Am I doing something wrong, or is this unintended behaviour?

Thank you for your ongoing work!

danielhanchen (Contributor) commented:

Yes, so we "mistral-fied" it, i.e. we split the fused QKV projection into three separate matrices. Microsoft's original implementation did not. We have shown this vastly increases accuracy.
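
For reference, a rough back-of-the-envelope check of both counts. This is only a sketch, assuming Phi-3.5-mini's published config (hidden size 3072, MLP intermediate size 8192, 32 layers, no grouped-query attention) and the usual LoRA cost of r * (in_features + out_features) trainable parameters per adapted linear layer:

r = 32
hidden, inter, layers = 3072, 8192, 32   # assumed Phi-3.5-mini dimensions

def lora_params(in_f, out_f):
    # A rank-r adapter on an (in_f -> out_f) linear layer adds the A matrix
    # (r x in_f) and the B matrix (out_f x r).
    return r * (in_f + out_f)

# Split ("mistral-fied") layout: q/k/v/o plus gate/up/down all match the targets.
split_per_layer = 4 * lora_params(hidden, hidden) + 3 * lora_params(hidden, inter)
print(layers * split_per_layer)   # 59,768,832 -- matches the Unsloth run

# Fused layout: qkv_proj and gate_up_proj keep their fused names, so only
# o_proj and down_proj appear to match the requested target modules.
fused_per_layer = lora_params(hidden, hidden) + lora_params(inter, hidden)
print(layers * fused_per_layer)   # 17,825,792 -- matches the Microsoft run

Under those assumptions, the split layout adapts all seven projections per layer while the fused one only adapts two, which lines up with the roughly 3x gap reported above.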

KristianMoellmann (Author) commented:

Hi Daniel, thank you for the response!

Can I read somewhere what exactly has been done to the model?
