Update index.html
NamrataRShivagunde authored May 8, 2024
1 parent bdf99fb commit bc681e7
Showing 1 changed file with 1 addition and 1 deletion.
docs/2024/pept_relora_n_galore/index.html (1 addition, 1 deletion)
@@ -147,7 +147,7 @@ <h1>Parameter Efficient Pre-Training: Comparing ReLoRA and GaLore</h1>



<h2 id="intro">Parameter Efficient Pre-training (PEPT)</h2>
<h2 id="intro">Parameter Efficient Pre-Training (PEPT)</h2>
<p>As the size and complexity of large language models (LLMs) continue to grow, so does the demand for computational resources to train them. With billions of parameters, training these models becomes increasingly challenging due to high costs and resource constraints. In response, parameter-efficient fine-tuning (PEFT) methods have emerged that can fine-tune billion-scale LLMs for specific tasks on a single GPU. This raises the question: can parameter-efficient training methods achieve similar efficiency gains during the pre-training stage too?</p>
<p>Parameter-efficient pre-training (PEPT) is an emerging area of research that explores techniques for pre-training LLMs with fewer parameters. Multiple studies suggest that neural network training is either low-rank throughout or proceeds in multiple phases, with initially high-rank and subsequently low-rank training (Aghajanyan et al., 2021; Arora et al., 2019; Frankle et al., 2019). This suggests that parameter-efficient training methods can also be used to pre-train LLMs.</p>
<p>ReLoRA (Lialin et al., 2023) is the first parameter-efficient training method used to pre-train large language models. ReLoRA uses a LoRA decomposition and repeatedly merges the low-rank matrices into the base weights and resets them during training, which increases the total rank of the accumulated update. Another recent advance in PEPT is GaLore (Zhao et al., 2024). In GaLore, the gradient is projected into a low-rank form, updated with the optimizer, and projected back to its original shape, reducing the memory required to pre-train LLMs.</p>
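The two mechanisms described above can be illustrated with a short sketch. The snippet below is a minimal illustration under stated assumptions: a single nn.Linear layer, rank-8 factors, and plain SGD in place of Adam. The names LoRALinear and galore_sgd_step are hypothetical and do not come from the authors' code.

```python
# Minimal sketch of the ReLoRA merge-and-reset idea and a GaLore-style
# projected gradient step. Assumptions (not the authors' implementations):
# one nn.Linear layer, rank-8 factors, plain SGD instead of Adam.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable rank-r update (ReLoRA-style)."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # train only the LoRA factors
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # base output plus the low-rank correction: x W^T + x A^T B^T
        return self.base(x) + x @ self.A.T @ self.B.T

    @torch.no_grad()
    def merge_and_reset(self):
        # Fold the current low-rank update into the frozen weight, then
        # reinitialize A and B. Repeating this during training lets the
        # accumulated update exceed the rank of any single LoRA pair.
        self.base.weight += self.B @ self.A
        self.A.normal_(std=0.01)
        self.B.zero_()


@torch.no_grad()
def galore_sgd_step(weight: torch.Tensor, grad: torch.Tensor,
                    rank: int = 8, lr: float = 1e-3):
    # GaLore-style step (sketch): project the gradient onto a low-rank
    # subspace obtained from its SVD, take the optimizer step there, and
    # project the result back to the full weight shape.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                  # (out_features, rank) projection
    low_rank_grad = P.T @ grad       # (rank, in_features)
    weight -= lr * (P @ low_rank_grad)
```

In practice, ReLoRA invokes the merge-and-reset step every few thousand iterations together with a learning-rate restart and a partial optimizer-state reset, while GaLore recomputes the projection only periodically and keeps the optimizer state in the low-rank space, which is where its memory savings come from.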
