Most of the examples and configs use int8/int4 for LoRA training. When I tested LoRA training with fp16, the model is loaded in fp16, but `SupervisedFinetune` calls `prepare_model_for_kbit_training`, which casts every non-8bit parameter to fp32. As a result, even for a 7B model, LoRA (fp16) needs more than 40GB of GPU memory.
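A minimal sketch of the behavior described above, assuming the peft version current at the time of this issue (the checkpoint name is just a placeholder, and this is not xtuner's actual code path):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_kbit_training

# Load the base model in fp16 without int8/int4 quantization.
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-7b",        # placeholder 7B causal LM
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
print(next(model.parameters()).dtype)  # torch.float16

# In the peft version referenced here, prepare_model_for_kbit_training upcasts
# all non-quantized (fp16/bf16) parameters to fp32, roughly doubling weight memory.
model = prepare_model_for_kbit_training(model)
print(next(model.parameters()).dtype)  # torch.float32
```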
There has been some discussion about this in the peft community, see huggingface/peft#828. Overall, forcing the weights to fp16 can cause training instability (e.g. NaN losses, or dtype mismatches in Linear computations).
If you really want to train with fp16, we recommend using DeepSpeed, which saves as much GPU memory as possible:
xtuner train ${CONFIG} --deepspeed deepspeed_zero2
This reduces the GPU memory cost of 7B LoRA from ~80GB to ~48GB.
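For reference, a rough illustration (not xtuner's bundled config file) of the kind of DeepSpeed settings that `--deepspeed deepspeed_zero2` corresponds to; ZeRO stage 2 partitions optimizer states and gradients across GPUs, which is where the saving comes from:

```python
# Illustrative ZeRO-2 settings expressed as a Python dict; values are examples,
# not the exact contents of xtuner's deepspeed_zero2 config.
zero2_config = {
    "fp16": {"enabled": True},          # mixed-precision training in fp16
    "zero_optimization": {
        "stage": 2,                     # partition optimizer states + gradients
        "overlap_comm": True,           # overlap gradient reduction with backward
        "allgather_partitions": True,
    },
    "gradient_clipping": 1.0,
}
```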
@awmthink We found a new bug: activation_checkpointing was not being applied to LoRA.
After the fix, LoRA's GPU memory requirement will drop further:
#159
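For context, a hedged sketch (not the actual fix in #159) of how activation checkpointing is typically enabled for a LoRA-wrapped model, so that intermediate activations are recomputed during the backward pass instead of being stored:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-7b",            # placeholder 7B causal LM
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
# Checkpointing must be enabled on the base model before LoRA wrapping.
model.gradient_checkpointing_enable()
# Needed so checkpointed blocks receive gradients when embeddings are frozen.
model.enable_input_require_grads()

model = get_peft_model(
    model,
    LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),
)
```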