
Problem with LoRA training in fp16 #143

Closed
awmthink opened this issue Sep 25, 2023 · 2 comments

@awmthink

Most of the examples and configs use int8/int4 for LoRA training. When I tested LoRA training in fp16, I found that while loading the model in fp16, SupervisedFinetune calls prepare_model_for_kbit_training, and that function casts all non-8bit parameters to fp32. As a result, even for a 7B model, LoRA in fp16 needs more than 40GB of GPU memory.
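For reference, a minimal sketch of the behavior being reported (this is not the xtuner code path; the model id is a placeholder and the exact casting behavior depends on the PEFT version):

```python
# Minimal sketch. Assumptions: a placeholder model id and a PEFT version whose
# prepare_model_for_kbit_training upcasts fp16/bf16 weights to fp32.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-7b",          # placeholder model id
    torch_dtype=torch.float16,       # weights loaded in fp16
    trust_remote_code=True,
)

def dtype_histogram(m):
    """Count parameters per dtype to see what actually sits in GPU memory."""
    hist = {}
    for p in m.parameters():
        hist[p.dtype] = hist.get(p.dtype, 0) + p.numel()
    return hist

print("before:", dtype_histogram(model))   # mostly torch.float16

# Called by SupervisedFinetune even without quantization; it casts fp16/bf16
# parameters to fp32, roughly doubling the memory taken by the weights.
model = prepare_model_for_kbit_training(model)
print("after:", dtype_histogram(model))    # mostly torch.float32

model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))
```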

@LZHgrla LZHgrla self-assigned this Sep 25, 2023
@LZHgrla
Collaborator

LZHgrla commented Sep 25, 2023

There is some discussion of this in the PEFT community: huggingface/peft#828.
Overall, forcing the weights to fp16 can cause training instability (e.g., NaN losses, or dtype mismatches in Linear-layer computations).

If you really do want to train in fp16, we recommend doing it with DeepSpeed, which saves as much GPU memory as possible:

xtuner train ${CONFIG} --deepspeed deepspeed_zero2

This reduces the GPU memory footprint of 7B LoRA from ~80GB to ~48GB.
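For context, a minimal sketch of what a ZeRO-2 + fp16 DeepSpeed configuration typically contains (the deepspeed_zero2 config bundled with xtuner may differ; the values below are illustrative assumptions):

```python
# Illustrative sketch only; xtuner ships its own deepspeed_zero2 config and
# the concrete values there may differ from these assumptions.
import json

ds_config = {
    "fp16": {"enabled": True},        # mixed-precision training in fp16
    "zero_optimization": {
        "stage": 2,                   # ZeRO-2: shard optimizer state and gradients
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("deepspeed_zero2.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```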

@LZHgrla LZHgrla reopened this Oct 9, 2023
@LZHgrla
Collaborator

LZHgrla commented Oct 9, 2023

@awmthink We found a new bug: activation_checkpointing was not actually being applied to LoRA models.

After the fix, the GPU memory requirement of LoRA training will drop further:

#159
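For readers hitting the same symptom, a generic sketch of how activation (gradient) checkpointing is usually enabled on a LoRA-wrapped Hugging Face model; this is not the actual xtuner patch in #159, just the standard transformers/PEFT calls:

```python
# Generic sketch, not the fix in #159. Assumes a Hugging Face PreTrainedModel
# wrapped with PEFT LoRA; the model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-7b",            # placeholder model id
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Recompute activations in the backward pass instead of storing them,
# trading extra compute for a large reduction in activation memory.
model.gradient_checkpointing_enable()
# Needed so checkpointed inputs require grad when the base weights are frozen.
model.enable_input_require_grads()

model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))
```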

@LZHgrla LZHgrla closed this as completed Oct 10, 2023