cannot reproduce the MMLU accuracy claimed in paper, could you release the script? #28
Comments
Hi, using the default parameter settings in qalora.py, you can replicate the results of the paper.
@xxw11 Hi. I checked the results without merging the LoRA adapters into the quantized parameters, using the 4-bit quantized Llama-7B model fine-tuned on the Alpaca dataset. For the MMLU evaluation, the results were as follows:
Regarding the default parameters in the code, I used a learning rate of 0.0002, which differs from the paper's setting of 0.00002. With the code's default parameters, I could not replicate the performance reported in the paper. As a test, I also tried training with the learning rate mentioned in the paper (0.00002), but the results got even worse. Do you have any tips on what might be causing this? Thanks.
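The two learning rates discussed above (0.0002 from the code defaults vs. 0.00002 from the paper) differ by a factor of ten. A minimal sketch of a linear decay schedule, as mentioned later in the thread, shows how either base rate would evolve over the 10000 training steps used here; the function name and signature are illustrative, not from qalora.py:

```python
def linear_lr(step, base_lr=2e-4, max_steps=10000):
    """Linearly decay the learning rate from base_lr to 0 over max_steps."""
    return base_lr * max(0.0, 1.0 - step / max_steps)

# At the midpoint of training, the rate has decayed to half its base value.
print(linear_lr(0))      # 0.0002
print(linear_lr(5000))   # 0.0001
```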
@freeSoul-SNU In my reproduction, I found that using a linear learning-rate schedule, as specified in the original paper, can lead to unstable results. However, the MMLU score should be around 38-40. Your parameters are basically the same as mine; I set the batch size to 1 and the gradient accumulation steps to 16, but in theory this shouldn't affect the final results. Did you use the MMLU evaluation repository mentioned in the QA-LoRA paper?
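The claim that batch size 1 with 16 accumulation steps shouldn't change the result rests on both setups producing the same effective batch size of 16. A self-contained sketch (illustrative values, not from qalora.py) of why the two are equivalent when each micro-step's gradient is normalized by the effective batch size:

```python
# Stand-in per-example "gradients" for an effective batch of 16 examples.
grads = [float(i) for i in range(16)]

# batch_size=16, gradient_accumulation=1: one averaged update.
full_batch_update = sum(grads) / 16

# batch_size=1, gradient_accumulation=16: accumulate 16 normalized micro-steps.
micro_batch_update = 0.0
for g in grads:
    micro_batch_update += g / 16

# Both paths compute the same mean gradient, so the optimizer step is identical.
print(full_batch_update == micro_batch_update)
```

In practice, small numerical differences and batch-dependent layers (e.g. anything that normalizes over the batch dimension) can still make the two runs diverge slightly, but the averaged gradient itself is the same.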
@xxw11 Thank you very much for your response. For the MMLU evaluation, I used the `mmlu_evaluation` function in qalora.py shared by the author. The dataset wasn't included in the qalora Git repository, but I found it in the following GPTQ LoRA repository and used it: https://github.com/qwopqwop200/gptqlora

I also believe that increasing the batch size to 16 with gradient_accumulation set to 1 should not make a significant difference, but the MMLU evaluation results are still very low. I used the following script for finetuning:

```shell
CUDA_VISIBLE_DEVICES=1 HF_DATASETS_OFFLINE=1 python qalora.py \
    --model_path "AutoGPTQ/examples/quantization/llama7b-quant4bit-g32/" \
    --output_dir output_alpaca \
    --dataset alpaca \
    --do_eval True \
    --do_mmlu_eval True \
    --do_train True \
    --mmlu_dataset 'mmlu-fs' \
    --save_strategy 'no' \
    --save_steps 1000 \
    --max_steps 10000 \
    --optim paged_adamw_32bit
```

If it's not too much trouble, could you share the MMLU evaluation script you used? Also, did you change the part in qalora.py where the model is loaded in float32 to float16 before training? Thank you so much.
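Since the thread turns on which MMLU evaluation script was used, it may help to pin down what the final scoring step should compute regardless of the harness. A minimal sketch, assuming the common convention of comparing the model's predicted choice letter against the gold letter per question (function and variable names are hypothetical, not from qalora.py):

```python
def mmlu_accuracy(predictions, answers):
    """Fraction of questions where the predicted choice letter (A-D)
    matches the gold answer letter."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have equal length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Toy example: 3 of 4 questions answered correctly.
print(mmlu_accuracy(["A", "C", "B", "D"], ["A", "C", "D", "D"]))  # 0.75
```

Differences in prompt formatting, the number of few-shot exemplars, and how the predicted letter is extracted from the model output are the usual sources of score gaps between harnesses, so confirming those details matters more than this final averaging step.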
Hi, I tried to reproduce the LLaMA-7B finetuning on MMLU. The best 5-shot eval accuracy I get is 36.5% with 4 bits and 32.8% with 3 bits. Could you please specify your training configuration or release the training script?