[Core] Update extra validation feature #303

Merged 2 commits into FlagOpen:main on Jan 2, 2025

Conversation

@zhaoyinglia commented on Jan 2, 2025

1. Usage:
```yaml
extra_eval_interval: 5
extra_valid_data_path: [
    weight1, data_path1,
    weight2, data_path2,
  ]
```
- `weight` refers to the number of tokens used for extra validation from the corresponding `data_path`.
- **NOTE: The extra validation always starts from consumed_sample=0.**
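For illustration, a concrete configuration might look like the sketch below; the token counts and dataset paths are hypothetical, chosen only to show the weight/path pairing:
```yaml
extra_eval_interval: 5
# 10,000 tokens from the first dataset and 20,000 tokens from the second
# (both paths are made up) are evaluated every 5 training iterations.
extra_valid_data_path: [
    10000, /data/val/wiki_text_document,
    20000, /data/val/code_text_document,
  ]
```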
2. Output format:
```
(min, max) time across ranks (ms):
    evaluate .......................................: (xxx, xxx)
-------------------------------------------------------------------------------
extra validation iteration 5 loss at data_path1 | consumed samples: xxx | lm loss value: xxx | lm loss PPL: xxx |
-------------------------------------------------------------------------------
(min, max) time across ranks (ms):
    evaluate .......................................: (xxx, xxx)
-------------------------------------------------------------------------------
extra validation iteration 5 loss at data_path2 | consumed samples: xxx | lm loss value: xxx | lm loss PPL: xxx |
-------------------------------------------------------------------------------
```

@zhaoyinglia requested a review from a team as a code owner on January 2, 2025 06:39
@aoyulong left a comment:

LGTM

@aoyulong merged commit fbe8888 into FlagOpen:main on Jan 2, 2025
3 checks passed
@aoyulong changed the title from "update extra validation feature" to "[Core] Update extra validation feature" on Jan 2, 2025
heavyrain-lzy pushed a commit to heavyrain-lzy/FlagScale that referenced this pull request Jan 3, 2025
fix test_parallel_context.py

fix ut

[Fix] "auto_tuner" should be under the field config.experiment. (FlagOpen#301)

I want to change the default metric to TFLOPs and change the order to
descend, but it doesn't work. Because, the "auto_tuner" is under
config.experiment instead of config.

After making the following changes it worked.
Change
if (
    "auto_tuner" in self.config
    and "performance" in self.config.experiment.auto_tuner
):
to
if (
    "auto_tuner" in self.config.experiment
    and "performance" in self.config.experiment.auto_tuner
):
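As a minimal sketch of the nesting issue (plain dicts stand in for FlagScale's actual config object, and the values are illustrative):
```python
# Why the membership test must look under config["experiment"]:
# "auto_tuner" is nested one level down, not at the top level.
config = {
    "experiment": {
        "auto_tuner": {
            "performance": {"name": "tflops", "order": "descend"},
        },
    },
}

# Buggy check: "auto_tuner" is not a top-level key, so the condition is
# always False and the performance settings are silently ignored.
print("auto_tuner" in config)                # False

# Fixed check: look one level down, under "experiment".
print("auto_tuner" in config["experiment"])  # True
```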

add 'attention_backend: unfused' for functional tests

update extra validation feature (FlagOpen#303)


Fix extra validation corner case (FlagOpen#304)

polish train.py