Code bugs #18
Comments
It seems they want the wrapped model to be exactly the same as the original one if `keep_original_weights` is set; otherwise `lora_A.weight` is initialized with Kaiming initialization in `ReLoRaLinear`. But even so, B times A is still zero. So it seems to me this is not a typo but a redundancy?
But if A and B are both initialized with zero weights, won't training get stuck? The gradient of A equals $B^\top \frac{\partial L}{\partial W}$ and the gradient of B equals $\frac{\partial L}{\partial W} A^\top$, so in this case the gradients for A and B would be zero all the time.
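For reference, a short derivation of the two gradients quoted above, assuming the adapted layer computes $y = (W + BA)x$ and writing $G$ for $\partial L / \partial W$ (this forward form and the symbol $G$ are assumptions for illustration, not taken from relora.py):

$$
\frac{\partial L}{\partial A} = B^\top G, \qquad
\frac{\partial L}{\partial B} = G A^\top, \qquad
A = 0,\ B = 0 \;\Rightarrow\; \frac{\partial L}{\partial A} = 0,\ \frac{\partial L}{\partial B} = 0 .
$$

Under that assumption, $W$ enters only through $G$, and $G$ is always multiplied by $B^\top$ or $A^\top$, so a nonzero $G$ alone does not make either gradient nonzero.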
Oh, even if both A and B are zero-initialized, as you mentioned, the updates will be slow at first because the gradients are small. However, the gradients are not zero because of the presence of the original W, so A and B can still be updated gradually. I think the authors might have intended this?
As I mentioned before, the gradient of A is $B^\top G$ and the gradient of B is $G A^\top$, where $G$ is the gradient with respect to W. So if you initialize both A and B to zero, the parameters of A and B would never be updated.
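The claim can be checked numerically with a minimal sketch in pure Python. It assumes the effective weight is `W + B @ A` and that `G` is the gradient of the loss with respect to that effective weight. The helper names (`matmul`, `transpose`, `lora_grads`, `is_all_zero`) and the example values are hypothetical and are not taken from relora.py.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]


def lora_grads(A, B, G):
    """Gradients w.r.t. A and B for W_eff = W + B @ A, given G = dL/dW_eff."""
    grad_A = matmul(transpose(B), G)  # (r x out) @ (out x in) -> r x in
    grad_B = matmul(G, transpose(A))  # (out x in) @ (in x r) -> out x r
    return grad_A, grad_B


def is_all_zero(X):
    """True if every entry of the matrix equals zero."""
    return all(v == 0 for row in X for v in row)


# Hypothetical shapes: out=2, in=3, rank r=1. G is an arbitrary nonzero gradient.
G = [[1.0, -2.0, 0.5],
     [0.3, 0.0, -1.0]]
B_zero = [[0.0], [0.0]]
A_zero = [[0.0, 0.0, 0.0]]
A_nonzero = [[0.1, -0.2, 0.3]]  # stands in for a Kaiming-style init

# Case 1: A and B both zero -> both gradients vanish, even though G is nonzero.
grad_A_both_zero, grad_B_both_zero = lora_grads(A_zero, B_zero, G)

# Case 2: only B zero -> grad_A is still zero, but grad_B is not,
# so B moves on the first step and A starts receiving gradient afterwards.
grad_A_b_zero, grad_B_b_zero = lora_grads(A_nonzero, B_zero, G)

print(is_all_zero(grad_A_both_zero), is_all_zero(grad_B_both_zero))
print(is_all_zero(grad_A_b_zero), is_all_zero(grad_B_b_zero))
```

In this sketch only the second case lets plain gradient descent make progress, which is why the standard LoRA setup zero-initializes just one of the two factors.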
In your relora.py I found that, for every ReLoRA layer, the B matrix is initialized as a zero matrix, which is the same as the standard setting.
However, I also found that when you wrap a model as a ReLoRA model, the matrix A is also initialized as a zero matrix. Is this a typo?