[Bug]: KeyError: 'eval_accuracy' raised when running SFT fine-tuning for llama3-8b #9386

Open
1 task done
hjx620 opened this issue Nov 7, 2024 · 1 comment

hjx620 commented Nov 7, 2024

Software environment

- paddlepaddle-gpu:  0.0.0.post120
- paddlenlp: 3.0.0b2

Duplicate issues

  • I have searched the existing issues

Error description

Running SFT fine-tuning for llama3-8b fails with the following error:
Traceback (most recent call last):
  File "/home/LAB/huangjx/new/PaddleNLP/llm/run_finetune.py", line 730, in <module>
    main()
  File "/home/LAB/huangjx/new/PaddleNLP/llm/run_finetune.py", line 570, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/LAB/huangjx/.local/lib/python3.10/site-packages/paddlenlp/trainer/trainer.py", line 829, in train
    return self._inner_training_loop(
  File "/home/LAB/huangjx/.local/lib/python3.10/site-packages/paddlenlp/trainer/trainer.py", line 1203, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, epoch, ignore_keys_for_eval, inputs=inputs)
  File "/home/LAB/huangjx/.local/lib/python3.10/site-packages/paddlenlp/trainer/trainer.py", line 1478, in _maybe_log_save_evaluate
    self._save_checkpoint(model, metrics=metrics)
  File "/home/LAB/huangjx/.local/lib/python3.10/site-packages/paddlenlp/trainer/trainer.py", line 2460, in _save_checkpoint
    metric_value = metrics[metric_to_check]
KeyError: 'eval_accuracy'

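However, if I delete "metric_for_best_model": "accuracy" from the config, the error no longer occurs, so it seems that "metric_for_best_model": "accuracy" is not supported. Note that pipeline parallelism (pp) and tensor parallelism (tp) were both enabled for this run.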
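
For readers following the traceback, here is a minimal sketch of the lookup that fails. It is not the PaddleNLP source: the `eval_` prefixing is an assumption inferred from the key name in the error (`accuracy` in the config, `eval_accuracy` in the KeyError), and `metric_key`, `best_metric_value`, and the sample `metrics` dict are hypothetical names used only for illustration.

```python
def metric_key(metric_for_best_model: str) -> str:
    """Assumed normalization: prefix with "eval_" unless already present."""
    if metric_for_best_model.startswith("eval_"):
        return metric_for_best_model
    return f"eval_{metric_for_best_model}"


def best_metric_value(metrics: dict, metric_for_best_model: str) -> float:
    """Mirror of the failing line in the traceback: metrics[metric_to_check]."""
    metric_to_check = metric_key(metric_for_best_model)
    return metrics[metric_to_check]


# Hypothetical evaluation output that reports a loss but no accuracy.
metrics = {"eval_loss": 1.23}

try:
    best_metric_value(metrics, "accuracy")
except KeyError as err:
    # The missing key is the prefixed name, as in the reported error.
    print(f"KeyError: {err}")

# A metric that is actually present in the dict resolves without error.
print(best_metric_value(metrics, "loss"))
```

If the evaluation run in this setup really does not emit an accuracy entry, the lookup can only succeed for a metric name that is present in the evaluation output.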

Steps to reproduce & code

  1. cd PaddleNLP/llm/config/llama
  2. cat sft_argument.json
    {
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B",
    "dataset_name_or_path": "./data",
    "output_dir": "./checkpoints/llama_sft_ckpts",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "per_device_eval_batch_size": 1,
    "eval_accumulation_steps": 1,
    "num_train_epochs": 3,
    "learning_rate": 3e-05,
    "warmup_steps": 30,
    "max_steps": 20,
    "max_evaluate_steps": 3,
    "logging_steps": 1,
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "src_length": 1024,
    "max_length": 200,
    "do_train": true,
    "do_eval": true,
    "disable_tqdm": true,
    "load_best_model_at_end": true,
    "eval_with_do_generation": false,
    "metric_for_best_model": "accuracy",
    "recompute": true,
    "save_total_limit": 1,
    "tensor_parallel_degree": 2,
    "pipeline_parallel_degree": 2,
    "pipeline_parallel_config": "disable_p2p_cache_shape",
    "sharding": "stage2",
    "zero_padding": false,
    "unified_checkpoint": false,
    "use_flash_attention": false
    }
  3. python3 -u -m paddle.distributed.launch --gpus "0,1,2,3" run_finetune.py ./config/llama/sft_argument.json
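
The workaround described in the error description (removing "metric_for_best_model" from the config) could be applied with a small script like the one below. This is a sketch only: `without_best_model_metric` is a hypothetical helper, the config shown is an abbreviated copy of the keys from sft_argument.json above, and how the Trainer chooses the best checkpoint once the key is absent is not stated in this issue.

```python
import json


def without_best_model_metric(config: dict) -> dict:
    """Return a copy of the config with "metric_for_best_model" removed."""
    patched = dict(config)
    patched.pop("metric_for_best_model", None)
    return patched


# Abbreviated copy of the relevant keys from sft_argument.json.
config = json.loads("""
{
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": true,
    "metric_for_best_model": "accuracy",
    "tensor_parallel_degree": 2,
    "pipeline_parallel_degree": 2
}
""")

patched = without_best_model_metric(config)
print(json.dumps(patched, indent=4))
```

The helper leaves the original dict untouched, so the patched config can be written to a separate file and compared against the failing one.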
@hjx620 hjx620 added the bug Something isn't working label Nov 7, 2024
@paddle-bot paddle-bot bot assigned ZHUI Nov 7, 2024
Collaborator

wawltor commented Nov 8, 2024

Received. We will look into the eval metric issue.
