cosine annealing lr scheduler #864
base: main
Conversation
Hi @CoinCheung, really nice extension!
@CoinCheung Hi, any update here? Just curious about the performance when using cosine annealing LR.
Hi, I did not test it on the COCO dataset since I do not have enough GPUs, but I have tested it on our own dataset, which consists of around 30k images. There I observed an improvement of around 0.05 in mAP50, which shows that the cosine annealing learning rate curve does no worse than its step-shaped counterpart. If you feel it is better to post benchmark results on COCO, I will try to train a model, but I am afraid it will take some days.
Thanks for your reply. I think benchmark results on COCO are needed before merging this PR, to confirm that it really improves performance.
I have tested this PR on the COCO dataset. Sadly, the cosine LR schedule is not better than the multi-step LR scheduler, with mAP of 40.7 (multi-step) and 39.5 (cosine). I used an FBNet-based Faster R-CNN following the default configuration, except that I doubled the number of images per GPU and trained on 4 GPUs in fp16 mode. Training logs can be found at: multi-step and cosine.

I think the reason behind this performance margin is that the milestones of the multi-step LR schedule are carefully picked, and many other hyper-parameters were probably tuned on the basis of that LR curve rather than the cosine-shaped one. The margin varies case by case: on our own dataset, where the default configurations tuned for COCO might not be optimal, cosine LR performs on par with its multi-step counterpart. I have also tested it on CIFAR-10, where, with a careful choice of the stopping LR, the cosine-shaped schedule can outperform the multi-step scheduler. So I think cosine LR still makes sense and can be a meaningful choice in general usage.
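For readers unfamiliar with the "stopping LR" mentioned above, here is a minimal sketch (not the code in this PR) of a cosine-annealed learning rate with a non-zero floor; the names `base_lr`, `eta_min`, and `max_iter` and the example values are illustrative, not the settings used in these experiments.

```python
import math

def cosine_lr(it, max_iter, base_lr, eta_min):
    """Learning rate at iteration `it`, annealed from base_lr down to eta_min."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * it / max_iter))

# Example: with base_lr=0.02 and eta_min=1e-4, the lr starts at 0.02,
# passes roughly (0.02 + 1e-4) / 2 at the midpoint, and ends at 1e-4
# instead of decaying all the way to zero.
```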
@CoinCheung Did you combine different models from the cosine annealing run (ensembling), or just evaluate the final model?
@gaussiangit No, I didn't; I simply tested with the final model. What is a good strategy for ensembling? Could you be more specific?
Personally, I prefer LR schedulers with smooth shapes like this cosine LR scheduler, because I no longer need to decide at which plateaus to drop the learning rate (or, put another way, this annealing method lets us drop the two milestone hyper-parameters and makes it simpler to decide the training configuration).
The effectiveness of the cosine LR scheduler has been verified both for classification (paper is here) and for object detection (paper is here). So I think it is reasonable to add this feature to this repository, and other users may need it as well.
If my implementation is not clean enough, please tell me and I will be happy to improve it :)
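For reference, here is a rough sketch of how a warmup + cosine annealing scheduler could look if written against the same interface as WarmupMultiStepLR in this repository. This is not the code of this PR; the class name `WarmupCosineAnnealingLR` and the parameters `max_iter`, `eta_min`, `warmup_factor`, and `warmup_iters` are illustrative.

```python
import math
from torch.optim.lr_scheduler import _LRScheduler

class WarmupCosineAnnealingLR(_LRScheduler):
    """Linear warmup followed by cosine decay from base_lr down to eta_min."""

    def __init__(self, optimizer, max_iter, eta_min=0.0,
                 warmup_factor=1.0 / 3, warmup_iters=500, last_epoch=-1):
        self.max_iter = max_iter
        self.eta_min = eta_min
        self.warmup_factor = warmup_factor
        self.warmup_iters = warmup_iters
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        if self.last_epoch < self.warmup_iters:
            # Linear warmup from warmup_factor * base_lr up to base_lr.
            alpha = self.last_epoch / self.warmup_iters
            factor = self.warmup_factor * (1 - alpha) + alpha
            return [base_lr * factor for base_lr in self.base_lrs]
        # Cosine decay over the remaining iterations, ending at eta_min.
        progress = (self.last_epoch - self.warmup_iters) / max(1, self.max_iter - self.warmup_iters)
        return [self.eta_min + 0.5 * (base_lr - self.eta_min) * (1.0 + math.cos(math.pi * progress))
                for base_lr in self.base_lrs]
```

Like WarmupMultiStepLR, this would be stepped once per iteration, with `max_iter` set to the total number of training iterations and `eta_min` playing the role of the stopping LR discussed above.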