Layer Types Configuration Intuition #15

Open
SpoSer23 opened this issue Dec 24, 2024 · 1 comment
@SpoSer23

Hi!
I was inquiring about the layer_types config that you used for tinyllama_lckv.json. What was the intuition behind this config, given num_attention_heads, num_key_value_heads, and num_hidden_layers, for choosing layer_types, forward_passes, and backward_passes?

@why-in-Shanghaitech
Member

Hi! Thank you for the question.

  1. For forward_passes and backward_passes, there is no particular intuition -- these are empirical results. We found that regardless of the model size or structure, forward_passes=7 and backward_passes=2 are the most efficient settings (the lowest training cost while maintaining performance; see the LCKV paper, Section 4.3 and Appendices C.2 and C.4).
  2. For layer_types, the i-th integer specifies the index of the layer whose key-value pairs layer i will use as its KV cache (a sketch follows this list). This simply corresponds to the w=2 setting in the LCKV paper. There are many design choices for layer_types; see Sections 4.1 and 4.2 of the LCKV paper and our new paper.
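
To make the indexing concrete, here is a minimal Python sketch. The assignment below is purely illustrative (it is not the configuration in tinyllama_lckv.json, and the real config file may store layer_types in a different format); it only assumes that layer_types[i] = j means layer i uses the KV cache of layer j:

```python
# Purely illustrative sketch (not the repository's code and not the values
# from tinyllama_lckv.json): reading a hypothetical layer_types assignment,
# assuming layer_types[i] = j means "layer i uses the KV cache of layer j".

num_hidden_layers = 22  # TinyLlama-1.1B has 22 hidden layers

# Hypothetical assignment: the bottom and top layers keep their own KV cache
# (standard attention), while every layer in between reuses the top layer's.
top = num_hidden_layers - 1
layer_types = [0] + [top] * (num_hidden_layers - 2) + [top]

for i, j in enumerate(layer_types):
    note = "keeps its own KV cache" if i == j else f"reuses the KV cache of layer {j}"
    print(f"layer {i:2d}: {note}")

# Only the layers whose indices appear as values in layer_types need to store
# key-value pairs, so the number of distinct values in layer_types determines
# the KV-cache memory cost.
```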

More details about these configs can be found in the configuration file.

I hope this helps. If it does not answer your question, please let me know.
