The loss cannot converge when finetuning Llama2-7b-GPTQ on 4090 #16

Open
cyita opened this issue Nov 30, 2023 · 11 comments
cyita commented Nov 30, 2023

I fine-tuned https://huggingface.co/TheBloke/Llama-2-7B-GPTQ on a 4090 using the code from this repo and modified the group_size in peft_utils.py, but the loss does not seem to converge.

I only pass learning rate = 3e-05 to qalora.py.

[screenshot of the training loss]
@StiphyJay

Same problem here. Did you solve it?

@duany049

Same problem here. Did you solve it?

@duany049

Any update?

@StiphyJay

No, didn't fix it.

@duany049

I trained the model with W4G32, the same configuration as the paper, and the loss still does not converge.
Have you tried running evaluation with a model whose training loss was relatively high? If so, how did it perform?

@duany049

If this is hard to fix, we could switch to another approach. Have you come across any 2-bit quantization method that works well?

@StiphyJay

Please refer to PB-LLM.

@duany049

I have solved this problem; you need to replace the peft_utils.py file in auto-gptq!

Also, with the QAT method in PB-LLM, how do you avoid running out of GPU memory during training?

@duany049

I have solved this problem by replacing path/auto_gptq/utils/peft_utils.py with the peft_utils.py file in this project.
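The fix above amounts to overwriting the peft_utils.py bundled with the installed auto-gptq package with the patched copy from this repo. A minimal sketch of that swap, assuming you pass in the package directory yourself (the helper name and its signature are mine, not from the thread):

```python
import shutil
from pathlib import Path

def patch_peft_utils(repo_copy: Path, pkg_dir: Path) -> Path:
    """Replace <pkg_dir>/utils/peft_utils.py with the repo's patched copy,
    keeping a .bak backup of the original file."""
    target = pkg_dir / "utils" / "peft_utils.py"
    backup = target.parent / (target.name + ".bak")
    shutil.copy2(target, backup)      # keep the stock file around
    shutil.copy2(repo_copy, target)   # install the patched version
    return target
```

In practice the package directory can be located with `import auto_gptq; Path(auto_gptq.__file__).parent`; restart any running Python process afterwards so the patched module is re-imported.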

@duany049

GPU memory overflows; even my 80 GB of VRAM is not enough!

@wenjingk-xilinx

Hi @duany049 @cyita @StiphyJay @yuhuixu1993, could you help take a look at this issue? Have you met a similar problem? Many thanks!
