Hi!
I was inquiring about the `layer_types` config you used in `tinyllama_lckv.json`. What was the intuition behind this config, given `num_attention_heads`, `num_key_value_heads`, and `num_hidden_layers`, when choosing `layer_types`, `forward_passes`, and `backward_passes`?
For `forward_passes` and `backward_passes`, there is no deeper intuition; the values come from empirical results. We find that regardless of model size and structure, `forward_passes=7` and `backward_passes=2` are the most efficient settings (the lowest training cost while maintaining performance; see the LCKV paper, Section 4.3 and Appendices C.2 and C.4).
For `layer_types`, the i-th integer specifies which layer's key-value pair the i-th layer uses as its KV cache. The config therefore corresponds to the w=2 setting in the LCKV paper. There are many possible design choices for `layer_types`; see Sections 4.1 and 4.2 of the LCKV paper and our new paper.
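To make the indexing concrete, here is a minimal Python sketch. Every value in it is a made-up assumption for a hypothetical 6-layer model (it is not the actual content or serialization format of `tinyllama_lckv.json`); only the semantics described above, that layer i reads the key-value pair produced by layer `layer_types[i]`, and the `forward_passes=7` / `backward_passes=2` values mentioned earlier are taken from this reply.

```python
# Hypothetical config fragment in the spirit of tinyllama_lckv.json.
# All numbers below are illustrative assumptions, not the real file contents;
# layer_types is shown as a Python list, the JSON serialization may differ.
config = {
    "num_hidden_layers": 6,
    "layer_types": [5, 5, 5, 5, 4, 5],  # made-up mapping, for illustration only
    "forward_passes": 7,                # empirically best setting per the reply above
    "backward_passes": 2,
}

# layer_types[i] = index of the layer whose key-value pair layer i attends to.
for i, src in enumerate(config["layer_types"]):
    if src == i:
        print(f"layer {i} keeps and attends to its own key-value pair")
    else:
        print(f"layer {i} attends to the key-value pair produced by layer {src}")
```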
More details about these configs can be found in the configuration file.
I hope this helps. If it does not answer your question, please let me know.