From bc681e70d6d498d3b7854a11f02d36f246663b2a Mon Sep 17 00:00:00 2001
From: Namrata Shivagunde <51484711+NamrataRShivagunde@users.noreply.github.com>
Date: Wed, 8 May 2024 10:43:58 -0400
Subject: [PATCH] Update index.html

---
 docs/2024/pept_relora_n_galore/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/2024/pept_relora_n_galore/index.html b/docs/2024/pept_relora_n_galore/index.html
index cb8c0f5..27b8ef5 100644
--- a/docs/2024/pept_relora_n_galore/index.html
+++ b/docs/2024/pept_relora_n_galore/index.html
@@ -147,7 +147,7 @@

Parameter Efficient Pre-Training: Comparing ReLoRA and GaLore

-Parameter Efficient Pre-training (PEPT)
+Parameter Efficient Pre-Training (PEPT)

As the size and complexity of large language models (LLMs) continue to grow, so does the demand for computational resources to train them. With billions of parameters, training these models becomes increasingly challenging due to high costs and resource constraints. In response, parameter-efficient fine-tuning (PEFT) methods have emerged that can fine-tune billion-scale LLMs for specific tasks on a single GPU. This raises the question: can parameter-efficient training methods deliver similar efficiency gains during the pre-training stage as well?

Parameter-efficient pre-training (PEPT) is an emerging area of research that explores techniques for pre-training LLMs with far fewer trainable parameters. Multiple studies suggest that neural network training is either low-rank throughout or proceeds in multiple phases, with an initial high-rank phase followed by low-rank training (Aghajanyan et al., 2021; Arora et al., 2019; Frankle et al., 2019). This suggests that parameter-efficient training methods can also be used to pre-train LLMs.

ReLoRA (Lialin et al., 2023) is the first parameter-efficient training method used to pre-train large language models. ReLoRA uses a LoRA decomposition and repeatedly merges the LoRA matrices into the main weights and resets them during training, increasing the total rank of the accumulated update. Another recent advance in PEPT is GaLore (Zhao et al., 2024). In GaLore, the gradient is projected into a low-rank form, updated with the optimizer, and projected back to its original shape, reducing the memory required to pre-train LLMs.
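
To make these two updates concrete, below is a minimal PyTorch sketch of a ReLoRA-style merge-and-reset cycle on a single LoRA-wrapped linear layer. The class name LoRALinear, the method merge_and_reset, and the rank value are illustrative assumptions of this sketch rather than the authors' implementation, which additionally prunes most of the optimizer state and restarts the learning-rate warm-up at each reset.

```python
# Sketch of a ReLoRA-style merge-and-reset cycle (illustrative names and
# hyperparameters, not the authors' implementation).
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen full-rank weight plus a trainable low-rank (LoRA) update."""

    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        # LoRA factors: update = B @ A, initialised so the update starts at zero.
        self.lora_A = nn.Parameter(torch.empty(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def forward(self, x):
        # Frozen weight plus the trainable low-rank update.
        return x @ (self.weight + self.lora_B @ self.lora_A).T

    @torch.no_grad()
    def merge_and_reset(self):
        # Fold the learned low-rank update into the frozen weight ...
        self.weight += self.lora_B @ self.lora_A
        # ... then restart from a fresh, zero-valued low-rank update so the
        # next cycle can learn a new low-rank direction. Over many cycles the
        # accumulated update can exceed the rank of any single LoRA pair.
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.lora_B.zero_()
```

In a training loop, merge_and_reset() would be called every few thousand steps, so each cycle contributes a new low-rank direction to the full-rank weight.

GaLore changes the optimizer rather than the model. The sketch below shows a simplified Adam-like step for one weight matrix in which the moments are stored only in the projected, low-rank space; the function name galore_adam_step, the omission of bias correction, and the default rank and projection-refresh interval are assumptions of this sketch, not the released GaLore optimizer.

```python
# Sketch of a GaLore-style low-rank gradient update for one weight matrix
# (simplified: no bias correction, projection fixed to the left side;
# defaults below are illustrative assumptions).
import torch

def galore_adam_step(weight, grad, state, rank=128, lr=1e-3,
                     betas=(0.9, 0.999), eps=1e-8, update_proj_gap=200):
    if "step" not in state:  # lazy init of step counter and low-rank moments
        state["step"] = 0
        state["m"] = torch.zeros(rank, grad.shape[1])
        state["v"] = torch.zeros(rank, grad.shape[1])

    # Periodically refresh the projector from the top-r left singular
    # vectors of the current gradient.
    if state["step"] % update_proj_gap == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                      # (m, r) projector

    P = state["P"]
    low_rank_grad = P.T @ grad                        # (r, n) projected gradient

    # Adam moments live only in the low-rank space, which is where the
    # optimizer-state memory savings come from.
    beta1, beta2 = betas
    state["m"].mul_(beta1).add_(low_rank_grad, alpha=1 - beta1)
    state["v"].mul_(beta2).add_(low_rank_grad ** 2, alpha=1 - beta2)
    update = state["m"] / (state["v"].sqrt() + eps)

    # Project the low-rank update back to the full shape and apply it.
    with torch.no_grad():
        weight -= lr * (P @ update)
    state["step"] += 1
```

In a full run, a step like this would be applied to each 2-D weight matrix, while 1-D parameters such as biases and norms are updated by a regular optimizer.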