Hi! I've observed the following when using Unsloth.
Summary
When fine-tuning the Unsloth Phi-3.5 model with LoRA, the number of trainable parameters is roughly 3x higher than with the Microsoft Phi-3.5 implementation, despite identical hyperparameters and target modules.
Details
Microsoft Phi-3.5 Model
Using the following configuration:
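Something along these lines; a minimal sketch, where `r=32` and the seven projection modules are assumptions consistent with the counts below, not the exact original snippet:

```python
# Minimal sketch: LoRA on the Microsoft checkpoint via PEFT.
# r=32 and the target-module list are assumptions, not the exact
# values from the original report.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```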
Output:
Unsloth-Based Setup
Configuration:
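Again a minimal sketch under the same assumptions (`r=32`, same target modules), this time through Unsloth's API:

```python
# Minimal sketch: the same adapter settings through Unsloth.
# As above, r=32 and the module list are assumptions.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3.5-mini-instruct",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
model.print_trainable_parameters()
```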
Output:
trainable params: 59,768,832 || all params: 3,880,848,384 || trainable%: 1.5401
As you can see, there is a huge discrepancy in the number of trainable parameters.
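For anyone reproducing this, here is a quick way to check both the count and which modules actually received adapters; this is a generic PEFT-level check (Unsloth builds on PEFT), and `summarize_lora` is just an illustrative helper:

```python
# Print the trainable/total parameter counts and list every module
# that carries a LoRA adapter. Works for both the PEFT-wrapped and
# the Unsloth model objects.
def summarize_lora(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / {total:,}")
    for name, module in model.named_modules():
        if hasattr(module, "lora_A"):  # PEFT's marker for LoRA layers
            print("  adapted:", name)

summarize_lora(model)
```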
Am I doing something wrong, or is this unintended behaviour?
Thank you for your ongoing work!