[Core] Update extra validation feature #303

Merged 2 commits into FlagOpen:main on Jan 2, 2025

Conversation

@zhaoyinglia commented on Jan 2, 2025

1. Usage:
```yaml
extra_eval_interval: 5
extra_valid_data_path: [
    weight1, data_path1,
    weight2, data_path2,
  ]
```
- `weight` refers to the number of tokens used for extra validation from the corresponding `data_path`.
- **NOTE: The extra validation always starts from consumed_sample=0.**
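For illustration, a concrete configuration might look like the sketch below; the token counts and dataset paths are hypothetical, chosen only to show the weight/path pairing:
```yaml
extra_eval_interval: 5
# 10,000 tokens from the first dataset and 20,000 tokens from the second
# (both paths are made up) are evaluated every 5 training iterations.
extra_valid_data_path: [
    10000, /data/val/wiki_text_document,
    20000, /data/val/code_text_document,
  ]
```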
2. Output format:
```
(min, max) time across ranks (ms):
    evaluate .......................................: (xxx, xxx)
-------------------------------------------------------------------------------
extra validation iteration 5 loss at data_path1 | consumed samples: xxx | lm loss value: xxx | lm loss PPL: xxx |
-------------------------------------------------------------------------------
(min, max) time across ranks (ms):
    evaluate .......................................: (xxx, xxx)
-------------------------------------------------------------------------------
extra validation iteration 5 loss at data_path2 | consumed samples: xxx | lm loss value: xxx | lm loss PPL: xxx |
-------------------------------------------------------------------------------
```

@zhaoyinglia requested a review from a team as a code owner on January 2, 2025 06:39
@aoyulong left a comment:

LGTM

@aoyulong merged commit fbe8888 into FlagOpen:main on Jan 2, 2025
3 checks passed
@aoyulong changed the title from "update extra validation feature" to "[Core] Update extra validation feature" on Jan 2, 2025
heavyrain-lzy pushed a commit to heavyrain-lzy/FlagScale that referenced this pull request Jan 3, 2025
fix test_parallel_context.py

fix ut

[Fix] "auto_tuner" should be under the field config.experiment. (FlagOpen#301)

I want to change the default metric to TFLOPs and change the order to
descend, but it doesn't work. Because, the "auto_tuner" is under
config.experiment instead of config.

After making the following changes it worked.
Change
if (
    "auto_tuner" in self.config
    and "performance" in self.config.experiment.auto_tuner
):
to
if (
    "auto_tuner" in self.config.experiment
    and "performance" in self.config.experiment.auto_tuner
):
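As a minimal sketch of the nesting issue (plain dicts stand in for FlagScale's actual config object, and the values are illustrative):
```python
# Why the membership test must look under config["experiment"]:
# "auto_tuner" is nested one level down, not at the top level.
config = {
    "experiment": {
        "auto_tuner": {
            "performance": {"name": "tflops", "order": "descend"},
        },
    },
}

# Buggy check: "auto_tuner" is not a top-level key, so the condition is
# always False and the performance settings are silently ignored.
print("auto_tuner" in config)                # False

# Fixed check: look one level down, under "experiment".
print("auto_tuner" in config["experiment"])  # True
```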

add 'attention_backend: unfused' for functional tests

update extra validation feature (FlagOpen#303)


Fix extra validation corner case (FlagOpen#304)

polish train.py