All the training tricks can be configured in the model's YAML config file. After setting them, run the `tools/train.py` script to initiate training.
```yaml
train:
  gradient_accumulation_steps: 2
  clip_grad: True
  clip_norm: 5.0
  ema: True
  ema_decay: 0.9999
```
Gradient accumulation is an effective way to work around memory limitations and allows training with a large global batch size. To enable it, set `train.gradient_accumulation_steps` to a value larger than 1 in the YAML config. The equivalent global batch size is

`global_batch_size = batch_size * num_devices * gradient_accumulation_steps`

For example, with a per-device batch size of 8, 4 devices, and 2 accumulation steps, the global batch size is 8 × 4 × 2 = 64.
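For intuition, here is a minimal, self-contained NumPy sketch of the accumulation loop on a toy linear model. The names (`grad`, `accum_steps`) and the model are illustrative placeholders, not MindOCR's actual implementation; the framework performs the equivalent bookkeeping internally when `train.gradient_accumulation_steps` is larger than 1.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)        # toy linear-model weights
lr = 0.1
accum_steps = 2        # mirrors train.gradient_accumulation_steps

def grad(w, x, y):
    # Gradient of 0.5 * mean((x @ w - y) ** 2) with respect to w.
    return x.T @ (x @ w - y) / len(y)

accumulated = np.zeros_like(w)
for step in range(1, 9):                        # 8 micro-batches
    x = rng.normal(size=(8, 4))                 # micro-batch of size 8
    y = rng.normal(size=8)
    accumulated += grad(w, x, y) / accum_steps  # average across micro-batches
    if step % accum_steps == 0:
        w -= lr * accumulated                   # one optimizer step per cycle
        accumulated[:] = 0.0                    # reset for the next cycle
```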
Gradient clipping is a method to address the gradient explosion/overflow problem and stabilize model convergence. To enable it, set `train.clip_grad` to `True` and optionally adjust the norm threshold in `train.clip_norm`.
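Clipping by global norm rescales all gradients together whenever their joint L2 norm exceeds `clip_norm`. Below is a minimal NumPy sketch of the idea; the function name is hypothetical, and MindOCR applies the equivalent step inside its training wrapper.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm=5.0):
    # Scale every gradient down by the same factor when the combined
    # L2 norm of all gradients exceeds clip_norm.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, clip_norm / (global_norm + 1e-12))
    return [g * scale for g in grads]

grads = [np.full(3, 4.0), np.full(2, 3.0)]  # global norm ≈ 8.12
clipped = clip_by_global_norm(grads, clip_norm=5.0)
```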
Exponential Moving Average (EMA) can be viewed as a model ensemble method that smooths the model weights. It can help stabilize model convergence in training and usually leads to better model performance. To enable it, set `train.ema` to `True`. You may also adjust `train.ema_decay` to control the decay rate.
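After each optimizer step, EMA blends a shadow copy of every weight toward its current value at rate `1 - ema_decay`. A minimal sketch, assuming weights stored as a list of NumPy arrays (names are illustrative, not MindOCR's API):

```python
import numpy as np

def ema_update(ema_weights, weights, decay=0.9999):
    # Shadow update: ema = decay * ema + (1 - decay) * current.
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, weights)]

weights = [np.ones(3)]          # current model weights
ema = [np.zeros(3)]             # shadow (EMA) copy
for _ in range(10):             # call once after every optimizer step
    ema = ema_update(ema, weights, decay=0.9)
```

At evaluation or export time, the smoothed shadow weights are typically used in place of the raw weights.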
Resuming training is useful when the training was interrupted unexpectedly. To resume training, set `model.resume` to `True` in the YAML config as follows:
```yaml
model:
  resume: True
```
By default, it will resume from the `train_resume.ckpt` checkpoint file located in the directory specified in `train.ckpt_save_dir`.
If you want to resume from another checkpoint, specify its path in `model.resume` as follows:
```yaml
model:
  resume: /some/path/to/train_resume.ckpt
```
Please refer to the MindOCR OpenI Training Guideline for details on training on the OpenI platform.