Update index.html
NamrataRShivagunde authored May 8, 2024
1 parent 76575ab commit 4109227
Showing 1 changed file with 1 addition and 1 deletion.
docs/2024/pept_relora_n_galore/index.html — 2 changes: 1 addition & 1 deletion
@@ -189,7 +189,7 @@ <h2 id="galore">GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
<li><b>Theorem: Gradient Form of reversible models</b><br />
A reversible network with an L2 objective has a gradient of the form G<sub>t</sub> = A - BW<sub>t</sub>C. The definition and proof for reversible networks are discussed in the paper. It is shown that feed-forward networks and the softmax loss function are reversible networks and thus have gradients of the given form. Attention may or may not be a reversible network. </li>
</ul>
- <p>As LLMs are made of feed-forward networks and activation functions, based on the above lemma and theorem and its proof, it is implied that LLMs have a gradient of form G<sub>t</sub> = A - BW<sub>t</sub>C and the gradient becomes low rank as training progresses.</p>
+ <p>As LLMs are made of feed-forward networks and activation functions, based on the above lemma and theorem and their proofs, it is implied that LLMs have a gradient of the form G<sub>t</sub> = A - BW<sub>t</sub>C. It is assumed that attention is also a reversible network. Since the gradient is of this form, it becomes low rank as training progresses.</p>

<figure>
<img src="/blog/assets/images/galore-decomposition.png" />
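To make the paragraph changed in this commit concrete: the claim that the gradient G_t = A - B W_t C becomes low rank is what lets GaLore do its optimizer work in a small rank-r subspace instead of the full weight space. Below is a minimal, hypothetical sketch of that idea for a single 2D weight with plain SGD; it is not code from this repository or the official GaLore release, and the function names, rank, and refresh interval T are illustrative assumptions.

```python
import torch

# Hypothetical sketch of GaLore-style gradient low-rank projection for one
# 2D weight matrix, using plain SGD. Not the official implementation.

def refresh_projection(grad: torch.Tensor, rank: int) -> torch.Tensor:
    """Rebuild the projection from the top-`rank` left singular vectors of
    the current gradient, reflecting the claim that G_t becomes low rank."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]                      # shape (m, rank)

def galore_sgd_step(W: torch.Tensor, grad: torch.Tensor,
                    P: torch.Tensor, lr: float = 1e-3) -> None:
    """Project the gradient into the rank-r subspace, take the SGD step
    there, and project the update back to the full parameter space."""
    low_rank_grad = P.T @ grad              # (rank, m) @ (m, n) -> (rank, n)
    W -= lr * (P @ low_rank_grad)           # project back and apply the step

# Usage sketch: refresh P every T steps (T and rank are illustrative choices).
W = torch.randn(512, 512)
P, T, rank = None, 200, 8
for step in range(1000):
    grad = torch.randn_like(W)              # placeholder for a real gradient
    if step % T == 0:
        P = refresh_projection(grad, rank)
    galore_sgd_step(W, grad, P)
```

In this sketch only the rank-r projected gradient (and, for Adam-style optimizers, its moments) would need to be stored, which is the memory saving the blog post attributes to GaLore.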
