[CodeCamp2023-470] Runner supports setting the number of iterations for each epoch #1292
Conversation
Signed-off-by: ShuRaymond <[email protected]>
Hi, we should also update a unit test to validate that this feature works as expected 😄
Hi @ShuRaymond, thanks for your contribution. Here are several comments:
Thanks for the reminder, I am working on it.
Thanks for the reminder and the guidance; it's done now.
Hi, we also need to add several unit tests (to check whether `num_batch_per_epoch` works as expected) in `mmengine/tests/test_runner/test_runner.py`:

- Line 1432 (commit bbd416a)
- Line 1841 (commit bbd416a)
- Line 1920 (commit bbd416a)
```python
def test_train(self):
    # 15. test num_batch_per_epoch
    cfg = copy.deepcopy(self.epoch_based_cfg)
    cfg.train_cfg = dict(
        by_epoch=True,
        max_epochs=3,
        num_batch_per_epoch=2,
    )
    runner = Runner.from_cfg(cfg)
    runner.train()
    self.assertEqual(runner.iter, 3 * 2)
```
mmengine/runner/_flexible_runner.py
Outdated
```diff
@@ -298,6 +298,7 @@ def __init__(
             f'train_dataloader={train_dataloader}, '
             f'train_cfg={train_cfg}, '
             f'optim_wrapper={optim_wrapper}.')
```
Also, the docstring in `mmengine/runner/loops.py` (line 33, commit bbd416a) needs to be updated to cover the new parameter:

`num_batch_per_epoch (int, optional):`
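For reference, a possible shape for that docstring entry (a sketch only; the exact wording and default value are assumptions based on this PR's discussion):

```text
num_batch_per_epoch (int, optional): The number of batches to run in each
    epoch. If None (the default), all batches in the dataloader are used.
    Defaults to None.
```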
mmengine/runner/loops.py
Outdated
```diff
@@ -40,6 +40,7 @@ def __init__(
         max_epochs: int,
         val_begin: int = 1,
         val_interval: int = 1,
+        num_batch_per_epoch: Optional[int] = None,
```
Adding a new parameter in the middle of the signature may cause a backward-compatibility (BC) issue for positional callers. Suggest moving it to the end.
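A minimal, self-contained sketch of the problem (the function names are hypothetical stand-ins for the loop constructor): a caller that passes arguments positionally gets silently re-bound when a parameter is inserted mid-signature, while appending it at the end keeps old calls working.

```python
# Old signature of the hypothetical constructor.
def old_init(max_epochs, val_begin=1, val_interval=1):
    return dict(max_epochs=max_epochs, val_begin=val_begin,
                val_interval=val_interval)

# New signature with num_batch_per_epoch inserted in the middle.
def new_init(max_epochs, val_begin=1, num_batch_per_epoch=None,
             val_interval=1):
    return dict(max_epochs=max_epochs, val_begin=val_begin,
                num_batch_per_epoch=num_batch_per_epoch,
                val_interval=val_interval)

print(old_init(3, 1, 5))  # val_interval == 5, as the caller intended
print(new_init(3, 1, 5))  # 5 now silently binds to num_batch_per_epoch;
                          # val_interval falls back to 1
```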
docs/en/common_usage/debug_tricks.md
Outdated
````diff
Example of a training script

```python
# Copyright (c) OpenMMLab. All rights reserved.
````
Suggested change:

```diff
- # Copyright (c) OpenMMLab. All rights reserved.
```
docs/en/common_usage/debug_tricks.md
Outdated
```diff
## Training for a fixed number of iterations (epoch-based training)

During the process of debugging code, sometimes it is necessary to train for several epochs, such as debugging the validation process or checking whether the checkpoint saving meets expectations. However, if the dataset is too large, it may take a long time to complete one epoch, in which case the cfg parameter can be added.
```
Suggested change:

```diff
- During the process of debugging code, sometimes it is necessary to train for several epochs, such as debugging the validation process or checking whether the checkpoint saving meets expectations. However, if the dataset is too large, it may take a long time to complete one epoch, in which case the cfg parameter can be added.
+ During the process of debugging code, sometimes it is necessary to train for several epochs, such as debugging the validation process or checking whether the checkpoint saving meets expectations. However, if the dataset is too large, it may take a long time to complete one epoch, in which case `num_batch_per_epoch` could be configured:
```
docs/en/common_usage/debug_tricks.md
Outdated
```diff
from mmengine.model import BaseModel
from mmengine.runner import Runner

os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
```
Why should we configure this env variable?
This was to work around a conflict between my conda environment and the torch package. I will delete it in the next commit.
docs/en/common_usage/debug_tricks.md
Outdated
```diff
Take `MMEngine` as an example(Refer to the [documentation](https://mmengine.readthedocs.io/zh_CN/latest/get_started/installation.html)for installing MMEngine)。

Example of a training script
```
Suggested change:

```diff
- Take `MMEngine` as an example(Refer to the [documentation](https://mmengine.readthedocs.io/zh_CN/latest/get_started/installation.html)for installing MMEngine)。
+ Example of a training script
```
docs/en/common_usage/debug_tricks.md
Outdated
````diff
```

Fast debugging is achieved by adding the `num_batch_per_epoch` parameter to `train_dataloader` and `val_dataloader`.
````
Suggested change:

```diff
- Fast debugging is achieved by adding the `num_batch_per_epoch` parameter to `train_dataloader` and `val_dataloader`.
+ Fast debugging is achieved by configuring the `num_batch_per_epoch` in `train_dataloader` and `val_dataloader`. You can quickly debug the code of the validation after just 5 training iterations,
```
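For illustration, a minimal sketch of what the suggested wording points at (the dataset entries and batch sizes are placeholders, not the PR's actual example):

```python
# Limit both training and validation to 5 batches per epoch while debugging.
train_dataloader = dict(
    batch_size=2,
    dataset=...,  # your training dataset goes here
    sampler=dict(type='DefaultSampler', shuffle=True),
    num_batch_per_epoch=5)

val_dataloader = dict(
    batch_size=2,
    dataset=...,  # your validation dataset goes here
    sampler=dict(type='DefaultSampler', shuffle=False),
    num_batch_per_epoch=5)
```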
docs/en/common_usage/debug_tricks.md
Outdated
```diff
Fast debugging is achieved by adding the `num_batch_per_epoch` parameter to `train_dataloader` and `val_dataloader`.

Run the training script. You can see that after running each epoch run 5 batch is over. Compared to the original, debugging is faster and more flexible.
```
Suggested change:

```diff
- Run the training script. You can see that after running each epoch run 5 batch is over. Compared to the original, debugging is faster and more flexible.
```
FlexibleRunner supports setting the number of iterations for each epoch
Thanks for your contribution, we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from the maintainers.
Motivation
One of the OpenMMLab CodeCamp tasks.
Modification
Modified `_flexible_runner.py` and `runner.py` so that `FlexibleRunner` supports setting the number of iterations per epoch, which saves debugging time.
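Based on the unit test discussed above, the intended usage looks roughly like this (a sketch, not the PR's exact API surface):

```python
# Each epoch runs only 2 batches, so 3 epochs finish after 3 * 2 = 6
# iterations instead of iterating over the full dataloader.
train_cfg = dict(
    by_epoch=True,
    max_epochs=3,
    num_batch_per_epoch=2)
```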
Checklist