Most of the examples and configs use int8/int4 for LoRA training. When I tested LoRA training with fp16, the model is loaded in fp16, but `SupervisedFinetune` calls `prepare_model_for_kbit_training`, which casts every non-8bit parameter to fp32. As a result, even for a 7B model, LoRA (fp16) needs more than 40GB of GPU memory.
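A minimal sketch of the behavior described above, assuming the peft version current at the time of this issue (the checkpoint name is just a placeholder, and this is not xtuner's actual code path):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_kbit_training

# Load the base model in fp16 without int8/int4 quantization.
model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-7b",        # placeholder 7B causal LM
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
print(next(model.parameters()).dtype)  # torch.float16

# In the peft version referenced here, prepare_model_for_kbit_training upcasts
# all non-quantized (fp16/bf16) parameters to fp32, roughly doubling weight memory.
model = prepare_model_for_kbit_training(model)
print(next(model.parameters()).dtype)  # torch.float32
```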
There has been some discussion about this in the peft community, see huggingface/peft#828. Overall, forcing the weights to fp16 can cause training instability (e.g. NaN losses, or dtype mismatches in Linear computations).
If you really want to train with fp16, we recommend using DeepSpeed, which saves as much GPU memory as possible:
xtuner train ${CONFIG} --deepspeed deepspeed_zero2
This reduces the GPU memory cost of 7B LoRA from ~80GB to ~48GB.
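For reference, a rough illustration (not xtuner's bundled config file) of the kind of DeepSpeed settings that `--deepspeed deepspeed_zero2` corresponds to; ZeRO stage 2 partitions optimizer states and gradients across GPUs, which is where the saving comes from:

```python
# Illustrative ZeRO-2 settings expressed as a Python dict; values are examples,
# not the exact contents of xtuner's deepspeed_zero2 config.
zero2_config = {
    "fp16": {"enabled": True},          # mixed-precision training in fp16
    "zero_optimization": {
        "stage": 2,                     # partition optimizer states + gradients
        "overlap_comm": True,           # overlap gradient reduction with backward
        "allgather_partitions": True,
    },
    "gradient_clipping": 1.0,
}
```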
@awmthink We found a new bug: activation_checkpointing was not being applied to LoRA.
After the fix, LoRA's GPU memory requirement will drop further:
#159
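For context, a hedged sketch (not the actual fix in #159) of how activation checkpointing is typically enabled for a LoRA-wrapped model, so that intermediate activations are recomputed during the backward pass instead of being stored:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "internlm/internlm-7b",            # placeholder 7B causal LM
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
# Checkpointing must be enabled on the base model before LoRA wrapping.
model.gradient_checkpointing_enable()
# Needed so checkpointed blocks receive gradients when embeddings are frozen.
model.enable_input_require_grads()

model = get_peft_model(
    model,
    LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32),
)
```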