This repository contains a Paddle implementation of Stochastic Weight Averaging (SWA), the training method proposed by Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov and Andrew Gordon Wilson.
- Original PyTorch implementation: https://github.com/timgaripov/swa
Deep neural networks are typically trained by optimizing a loss function with an SGD variant and a decaying learning rate until convergence. The SWA paper shows that simply averaging multiple points along the SGD trajectory, run with a cyclical or constant learning rate, leads to better generalization than conventional training. This procedure is called Stochastic Weight Averaging (SWA).
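The cyclical learning rate mentioned above can be sketched as follows. This is a minimal sketch of the formula given in the SWA paper, alpha(i) = (1 - t(i)) * alpha1 + t(i) * alpha2 with t(i) = (mod(i - 1, c) + 1) / c, where c is the cycle length; the helper names `cyclical_lr` and `is_capture_point` are hypothetical and are not taken from this repository's `train.py`, which may use a different schedule.

```python
def cyclical_lr(i: int, alpha1: float, alpha2: float, c: int) -> float:
    """Learning rate at iteration i (1-indexed) for a cycle of length c.

    Within each cycle the rate decreases linearly from just below alpha1
    down to alpha2, then jumps back up at the start of the next cycle.
    """
    t = ((i - 1) % c + 1) / c  # t runs through 1/c, 2/c, ..., 1 in each cycle
    return (1 - t) * alpha1 + t * alpha2


def is_capture_point(i: int, c: int) -> bool:
    """True at the last iteration of a cycle, where the rate equals alpha2.

    These are the iterations whose weights would be added to the average.
    """
    return i % c == 0


# Example: 2 cycles of length 4, alpha1 = 0.05, alpha2 = 0.01.
for i in range(1, 9):
    lr = cyclical_lr(i, alpha1=0.05, alpha2=0.01, c=4)
    marker = " <- capture weights" if is_capture_point(i, 4) else ""
    print(f"iter {i}: lr = {lr:.4f}{marker}")
```

With a constant learning rate the schedule is trivial, and snapshots are simply collected at regular intervals instead of at the end of each cycle.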
SWA is extremely easy to implement, improves generalization, and has almost no computational overhead. The experimental results reported in the paper are summarized below.
We implement SWA with Paddle and test it on a VGG-16 model. Our results on CIFAR-10 are close to those reported in the original paper.
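At its core, an SWA implementation keeps an equal-weight running average of the weight snapshots collected during training, so the main extra cost is one additional copy of the parameters. The sketch below illustrates that update in plain Python, assuming each model state is a dict mapping parameter names to floats as stand-ins for Paddle tensors; `update_swa` is a hypothetical helper, not a function from this repository. With real tensors, the first snapshot should be cloned rather than aliased, since training keeps modifying the live parameters.

```python
from typing import Dict, Optional, Tuple

# Hypothetical state type: parameter name -> value (floats stand in for tensors).
State = Dict[str, float]


def update_swa(swa_state: Optional[State], model_state: State,
               n_averaged: int) -> Tuple[State, int]:
    """Fold one weight snapshot into the running SWA average.

    Computes w_swa <- (n * w_swa + w) / (n + 1), written in the
    equivalent form w_swa + (w - w_swa) / (n + 1).
    Returns the updated average and the new snapshot count.
    """
    if swa_state is None or n_averaged == 0:
        # First snapshot: the average is just a copy of the current weights.
        return dict(model_state), 1
    new_state = {
        name: avg + (model_state[name] - avg) / (n_averaged + 1)
        for name, avg in swa_state.items()
    }
    return new_state, n_averaged + 1


# Usage: three captured snapshots of a toy "model" with two parameters.
snapshots = [{"w": 1.0, "b": 0.0}, {"w": 2.0, "b": 1.0}, {"w": 3.0, "b": 5.0}]
swa, n = None, 0
for snap in snapshots:
    swa, n = update_swa(swa, snap, n)
print(swa, n)  # {'w': 2.0, 'b': 2.0} 3
```

The incremental form avoids storing every snapshot: only the current average and a counter are kept, which is why the overhead stays small.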
```
swa-paddle
├── models
│   ├── vgg.py
│   ├── preresnet.py
│   └── wide_resnet.py
├── eval.py
├── train.py
└── utils.py
```
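Train with SWA, then evaluate the saved checkpoint (the leading `!` runs a shell command from an AI Studio notebook cell):

```
!python train.py --swa
!python eval.py --model_path="out/checkpoint.pdparams"
```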
Method | Dataset | Environment | Model | Epochs | Test Accuracy (%)
---|---|---|---|---|---
SWA | CIFAR-10 | Tesla V100 | VGG-16 | 200 | 93.68
- AI Studio project: https://aistudio.baidu.com/aistudio/projectdetail/2528609
- To train the model as a script task, use the following AI Studio project: https://aistudio.baidu.com/aistudio/clusterprojectdetail/2504009. Run the script with:
```
!python -m paddle.distributed.launch train.py --swa
```
The trained model is saved to: Baidu AI Studio SWA Paddle