
Training from scratch #1

Open
baiyancheng20 opened this issue Jun 11, 2016 · 8 comments

@baiyancheng20
Have you ever trained a model using your code? I tried to train a new model, but it did not reach the reported accuracy.

@antingshen
Owner

Yes, I've trained the ResNet-50 from scratch, and it does not achieve MSRA's accuracy due to a few differences. This was noted in the README. If you figure out changes to reproduce MSRA's accuracy, please let me know and we can fix it :)

@baiyancheng20
Author

baiyancheng20 commented Jun 13, 2016

@antingshen Hi, I have used this code to train ResNet-18 from scratch, and it did not achieve a good result either. I found that the training accuracy becomes higher than the test accuracy after about 6-10 epochs, which is abnormal. I also trained a model without BN layers, which achieved 65% top-1 accuracy, better than the models using BN layers. I changed your code to train models on CIFAR-10 and achieved 88.76% top-1 accuracy vs. 90.0% in He's paper, which seems correct. So I am very confused. Could you tell me what your training process is and what accuracy you get?

@antingshen
Owner

Here's my ResNet-50 top-1 validation error with respect to epochs:
[plot: ResNet-50 top-1 validation error vs. epochs]

As you can see, this includes BN and reaches ~68% accuracy.

I'm not quite sure what's wrong either at the moment, besides my version not having random reshape & crop. If you figure it out please let me know :)

@baiyancheng20
Author

[image: training/testing error curves from He et al.]
These are He's training and testing curves. We can see that the training errors are higher than the testing ones before epoch 60. As you say, there is no real-time data augmentation (random reshape, color jittering, etc.) in Caffe; this should be one reason. I have implemented a few data augmentation methods. If you are interested, we can cooperate and try to train the ResNet.
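Roughly, the scale + crop + flip part looks like this (a simplified sketch of the MSRA-style augmentation, not my exact code; the function and parameter names are just for illustration):

```python
# Simplified sketch of MSRA-style on-the-fly augmentation:
# resize the shorter side to a random scale in [256, 480],
# take a random 224x224 crop, and flip horizontally half the time.
import random
import numpy as np
from PIL import Image

def augment(path, crop=224, scale_range=(256, 480)):
    img = Image.open(path).convert('RGB')
    # Scale augmentation: shorter side -> random s in scale_range.
    s = random.randint(*scale_range)
    w, h = img.size
    if w < h:
        img = img.resize((s, int(round(h * s / float(w)))), Image.BILINEAR)
    else:
        img = img.resize((int(round(w * s / float(h))), s), Image.BILINEAR)
    # Random 224x224 crop (both sides are >= 256 after resizing).
    w, h = img.size
    x = random.randint(0, w - crop)
    y = random.randint(0, h - crop)
    img = img.crop((x, y, x + crop, y + crop))
    # Random horizontal flip.
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return np.asarray(img, dtype=np.float32)
```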

There may also be another reason. Kaiming He said:

In our BN layers, the provided mean and variance are strictly computed using average (not moving average) on a sufficiently large training batch after the training procedure. The numerical results are very stable (variation of val error < 0.1%). Using moving average might lead to different results.

Do you know how to compute the mean/variance as He said?

@antingshen
Owner

Maybe. I think we need a bit more detail or experimentation to figure out the exact BN implementation.

I'm happy to cooperate. Let me know if you have any ideas.
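One idea worth experimenting with: Caffe's BatchNorm layer accumulates batch statistics during TRAIN-phase forward passes, and if moving_average_fraction is set to 1.0 it sums them rather than decaying them, then normalizes by the accumulated count at test time, which amounts to a plain average like He describes. A rough, untested sketch with pycaffe (file names are placeholders):

```python
# Rough sketch: recompute BN mean/variance as a plain average over many
# training batches, exploiting Caffe's BatchNorm update rule. Assumes the
# train prototxt has use_global_stats: false and moving_average_fraction: 1.0
# on every BatchNorm layer. File names below are placeholders.
import caffe

caffe.set_mode_gpu()
net = caffe.Net('train_val.prototxt', 'resnet50_trained.caffemodel', caffe.TRAIN)

# Zero the accumulated mean (blob 0), variance (blob 1), and the
# normalization counter (blob 2) in every BatchNorm layer.
for name, blobs in net.params.items():
    if len(blobs) == 3:  # Caffe's BatchNorm layers carry exactly three blobs
        for blob in blobs:
            blob.data[...] = 0.0

# Each TRAIN-phase forward adds the current batch's statistics into
# blobs 0/1 and increments blob 2; at test time the layer divides by
# blob 2, so the stored statistics become a plain (not moving) average.
for _ in range(1000):  # a "sufficiently large" number of training batches
    net.forward()

net.save('resnet50_bn_avg.caffemodel')
```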

@baiyancheng20
Author

Could you use a modified data_reader.cpp (https://github.com/lim0606/caffe-googlenet-bn) to shuffle data during training? I found this can improve accuracy for GoogLeNet; I wonder whether it could also improve ResNet.
PS. Do you use any instant messaging software? I think you are Chinese; do you use QQ?

@antingshen
Owner

The link is broken, but I think we want shuffling + random resize + random crop, all on the fly during training. Or at least it seems like it from the MSRA paper. I'd say modifying data_reader.cpp is the right idea.
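The core change is just reshuffling the record list every epoch instead of reading it in a fixed order. In sketch form (illustrative Python, not the actual data_reader.cpp logic):

```python
# Minimal sketch of per-epoch shuffling for a data reader that walks a
# list of records (e.g., (image_path, label) pairs). Purely illustrative.
import random

class ShuffledReader:
    def __init__(self, records):
        self.records = list(records)
        self.cursor = 0
        random.shuffle(self.records)  # shuffle once before the first epoch

    def next(self):
        if self.cursor == len(self.records):
            random.shuffle(self.records)  # new order at every epoch boundary
            self.cursor = 0
        record = self.records[self.cursor]
        self.cursor += 1
        return record
```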

I have WeChat & Messenger.

@leonid-pishchulin

Could somebody share a ResNet-18 model pre-trained on ImageNet?
