Can hydra load mamba's weight directly? #6
Comments
Yes, I believe copying the Mamba weights into both the forward and backward passes would give you a very good initialization to start from.
I think your architecture is similar to Mamba-2, but as tridao said, Mamba-1 weights can't be loaded into Mamba-2. Can you give me some guidance on how to copy Mamba's weights into your model?
I believe the weights of Mamba-2 are also public on HuggingFace.
Thanks for your reply!
Then you can slightly modify our code from using SSD to S6, with some corrections to the SSM parameters.
I tried to load Mamba's weights into Hydra, but found some differences:
1. Parameters in Mamba but not in Hydra:
2. Parameters in Hydra but not in Mamba:
3. Parameters with the same name but different shapes:
Below is my code:
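(As a general sketch, not the code referenced above, mismatches like these can be listed by diffing the two state dicts. The checkpoint path and `hydra_model` in the commented usage are hypothetical placeholders, not taken from this thread.)

```python
import torch

def diff_state_dicts(src: dict, dst: dict) -> None:
    """Print keys unique to each state dict and shared keys whose shapes differ."""
    src_keys, dst_keys = set(src), set(dst)
    print("In source but not in target:", sorted(src_keys - dst_keys))
    print("In target but not in source:", sorted(dst_keys - src_keys))
    for k in sorted(src_keys & dst_keys):
        if src[k].shape != dst[k].shape:
            print(f"Shape mismatch at {k}: {tuple(src[k].shape)} vs {tuple(dst[k].shape)}")

# Hypothetical usage (placeholder path and model object):
# mamba_sd = torch.load("mamba_checkpoint.pt", map_location="cpu")
# hydra_sd = hydra_model.state_dict()
# diff_state_dicts(mamba_sd, hydra_sd)
```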
You will not be able to load Mamba-1's parameters, from my understanding, as Hydra is a Mamba-2 variant. The reason is that the parameterization of the A matrix is completely different between the two models: in Mamba-1, A is a (D, N) parameter tensor, while in Mamba-2 it is a (D,) parameter tensor, where D is the dimension of a layer and N is the hidden state size.
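To make the shape difference concrete, here is a small sketch; the sizes are illustrative examples, not values taken from any particular checkpoint:

```python
import torch

D, N = 1536, 16  # D: layer (inner) dimension, N: SSM hidden state size; example values only

# Mamba-1: A is parameterized per channel and per state, i.e. a (D, N) tensor.
A_log_mamba1 = torch.randn(D, N)

# Mamba-2 (and Hydra, as a Mamba-2 variant): a single value per channel, i.e. (D,).
A_log_mamba2 = torch.randn(D)

print(A_log_mamba1.shape)  # torch.Size([1536, 16])
print(A_log_mamba2.shape)  # torch.Size([1536])
```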
It is not possible to directly reuse Mamba-1 weights in Mamba-2. Unless you import Mamba-2 weights, you will need to retrain from scratch. As @Hprairie mentions, the two differ in their parameters as well as in the layout of the model (e.g., the convolution). Please refer to the Mamba-2 paper for the specific differences between Mamba-1 and Mamba-2.
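If one does start from Mamba-2 weights instead, a common (though unofficial) pattern is a non-strict partial load that copies only the tensors whose names and shapes already match, leaving everything else at its random initialization. A minimal sketch, assuming both models are plain `torch.nn.Module`s:

```python
from torch import nn

def partial_load(target: nn.Module, source_sd: dict) -> None:
    """Copy every tensor whose name and shape match the target; skip the rest."""
    target_sd = target.state_dict()
    filtered = {
        k: v for k, v in source_sd.items()
        if k in target_sd and target_sd[k].shape == v.shape
    }
    result = target.load_state_dict(filtered, strict=False)
    print(f"Copied {len(filtered)} tensors; "
          f"{len(result.missing_keys)} missing, {len(result.unexpected_keys)} unexpected.")
```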
Hello!
Thanks for your amazing work!
If I have a pretrained Mamba model, can I load Mamba's weights into Hydra directly?