diff --git a/docs/2024/pept_relora_n_galore/index.html b/docs/2024/pept_relora_n_galore/index.html index 36d1b16..b6b906c 100644 --- a/docs/2024/pept_relora_n_galore/index.html +++ b/docs/2024/pept_relora_n_galore/index.html @@ -309,8 +309,7 @@
Both ReLoRA and GaLore offer advantages and disadvantages for pre-training LLMs. Overall, GaLore saves on memory whereas ReLoRA provides more throughput during pre-training LLMs.
+ReLoRA and GaLore represent distinct approaches to parameter-efficient pre-training of LLMs. ReLoRA combines LoRA decomposition with a full-rank warm-start phase, which yields higher training throughput but at the cost of greater memory utilization. Conversely, GaLore relies on Singular Value Decomposition (SVD) to project gradients into a low-rank subspace, offering reduced memory requirements and the potential for higher ranks, but with lower throughput. The methods also differ in the gradient forms they require, in how their subspaces change over training, and in the number of matrices trained, providing different trade-offs for LLM pre-training.