The loss cannot converge when finetuning Llama2-7b-GPTQ on 4090 #16
Comments
I have the same problem. Did you solve it? |
Any update? |
I didn't fix it. |
I trained the model with W4G32, the same configuration as in the paper, but the loss still does not converge. |
If this is hard to solve, we could switch to another approach. Have you tried any 2-bit quantization scheme that works well? |
Please refer to PB-LLM. |
I have already solved this problem; you need to replace the peft_utils.py file in auto-gptq! Also, with the QAT method in PB-LLM, how do you avoid running out of memory during training? |
I solved this problem by replacing path/auto_gptq/utils/peft_utils.py with the peft_utils.py file in this project. |
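The replacement step above can be scripted. This is a minimal sketch, assuming auto_gptq keeps peft_utils.py under its utils/ directory as described in the thread; the helper function name is my own, not part of either project:

```python
import importlib.util
import pathlib
import shutil

def patch_peft_utils(patched_file, package="auto_gptq"):
    """Overwrite <package>/utils/peft_utils.py with a patched copy.

    Keeps a .bak backup next to the original and returns the path of
    the replaced file. The utils/peft_utils.py layout is an assumption
    based on this thread, not a documented auto_gptq API.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{package} is not installed")
    target = pathlib.Path(spec.origin).parent / "utils" / "peft_utils.py"
    shutil.copy(target, target.with_suffix(".bak"))  # back up the original
    shutil.copy(patched_file, target)                # drop in the patched file
    return target
```

After patching, re-import auto_gptq in a fresh Python process so the replaced module is actually picked up.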
Out of GPU memory — even my 80 GB of VRAM is not enough! |
Hi @duany049 @cyita @StiphyJay @yuhuixu1993, could you help take a look at this issue? Have you met a similar problem? Many thanks! |
I finetuned https://huggingface.co/TheBloke/Llama-2-7B-GPTQ on a 4090 using the code from this repo and modified the group_size in peft_utils.py, but the loss does not seem to converge.
I only passed learning rate = 3e-05 to qalora.py.
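For context, the run described above would look roughly like the following. This is a hypothetical invocation: the flag names are assumptions for illustration, not qalora.py's actual CLI, so check the script's argument parser before using them:

```shell
# Assumed flags -- verify against qalora.py's own argument definitions.
python qalora.py \
    --model_path TheBloke/Llama-2-7B-GPTQ \
    --learning_rate 3e-05
```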