cosine annealing lr scheduler #864
base: main
Conversation
Hi @CoinCheung, really nice extension!
@CoinCheung Hi, any update here? Just curious about the performance when using cosine annealing LR.
Hi, I did not test it on the COCO dataset since I do not have enough GPUs, but I have tested it on our own dataset, which consists of around 30k images. There I observed an improvement of around 0.05 in mAP50, which shows that the cosine annealing learning rate curve does no worse than its step-shaped counterpart. If you feel it is better to post benchmark results on COCO, I will try to train a model, but I am afraid it will take some days.
Thanks for your reply. I think benchmark results on COCO are needed before merging this PR, to confirm that it really improves performance.
I have tested this PR on the COCO dataset. Sadly, the cosine LR schedule is not better than the multi-step LR scheduler, with mAP of 40.7 (multi-step) and 39.5 (cosine). I used an FBNet-based Faster R-CNN following the default configuration, except that I doubled the number of images per GPU and trained on 4 GPUs in fp16 mode. Training logs can be found at: multi-step and cosine.

I think the reason behind this performance margin is that the milestones of the multi-step LR schedule are carefully picked, and many other hyper-parameters were probably tuned on the basis of that LR curve rather than the cosine-shaped one. The margin varies case by case: on our own dataset, where the default configurations tuned for COCO might not be optimal, cosine LR performs on par with its multi-step counterpart. I have also tested it on CIFAR-10, where, with a careful choice of the stopping LR, the cosine-shaped schedule can outperform the multi-step scheduler. So I think cosine LR still makes sense and can be a meaningful choice in general usage.
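For readers unfamiliar with the "stopping LR" mentioned above, here is a minimal sketch (not the code in this PR) of a cosine-annealed learning rate with a non-zero floor; the names `base_lr`, `eta_min`, and `max_iter` and the example values are illustrative, not the settings used in these experiments.

```python
import math

def cosine_lr(it, max_iter, base_lr, eta_min):
    """Learning rate at iteration `it`, annealed from base_lr down to eta_min."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * it / max_iter))

# Example: with base_lr=0.02 and eta_min=1e-4, the lr starts at 0.02,
# passes roughly (0.02 + 1e-4) / 2 at the midpoint, and ends at 1e-4
# instead of decaying all the way to zero.
```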
@CoinCheung Did you combine different models from the cosine annealing run (ensembling), or just evaluate the final model?
@gaussiangit No, I didn't; I simply tested with the final model. What is a good strategy for ensembling? Could you be more specific?
Personally, I prefer LR schedulers with smooth shapes like this cosine LR scheduler, because I no longer need to decide at which plateaus to drop the learning rate (or, put another way, this annealing method lets us drop the two milestone hyper-parameters and makes it simpler to decide the training configuration).
The effectiveness of the cosine LR scheduler has been verified both for classification (paper is here) and for object detection (paper is here). So I think it is reasonable to add this feature to this repository, and other users may need it as well.
If my implementation is not clean enough, please tell me and I will be happy to improve it :)
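For reference, here is a rough sketch of how a warmup + cosine annealing scheduler could look if written against the same interface as WarmupMultiStepLR in this repository. This is not the code of this PR; the class name `WarmupCosineAnnealingLR` and the parameters `max_iter`, `eta_min`, `warmup_factor`, and `warmup_iters` are illustrative.

```python
import math
from torch.optim.lr_scheduler import _LRScheduler

class WarmupCosineAnnealingLR(_LRScheduler):
    """Linear warmup followed by cosine decay from base_lr down to eta_min."""

    def __init__(self, optimizer, max_iter, eta_min=0.0,
                 warmup_factor=1.0 / 3, warmup_iters=500, last_epoch=-1):
        self.max_iter = max_iter
        self.eta_min = eta_min
        self.warmup_factor = warmup_factor
        self.warmup_iters = warmup_iters
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        if self.last_epoch < self.warmup_iters:
            # Linear warmup from warmup_factor * base_lr up to base_lr.
            alpha = self.last_epoch / self.warmup_iters
            factor = self.warmup_factor * (1 - alpha) + alpha
            return [base_lr * factor for base_lr in self.base_lrs]
        # Cosine decay over the remaining iterations, ending at eta_min.
        progress = (self.last_epoch - self.warmup_iters) / max(1, self.max_iter - self.warmup_iters)
        return [self.eta_min + 0.5 * (base_lr - self.eta_min) * (1.0 + math.cos(math.pi * progress))
                for base_lr in self.base_lrs]
```

Like WarmupMultiStepLR, this would be stepped once per iteration, with `max_iter` set to the total number of training iterations and `eta_min` playing the role of the stopping LR discussed above.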