
Training from scratch #1

Open
baiyancheng20 opened this issue Jun 11, 2016 · 8 comments

@baiyancheng20
Have you ever trained a model using your code? I tried to train a new model, but it did not reach the reported accuracy.

@antingshen
Owner

Yes, I've trained the ResNet-50 from scratch, and it does not achieve MSRA's accuracy due to a few differences. This was noted in the README. If you figure out changes to reproduce MSRA's accuracy, please let me know and we can fix it :)

@baiyancheng20
Author

baiyancheng20 commented Jun 13, 2016

@antingshen Hi, I have used this code to train ResNet-18 from scratch, and it did not achieve a good result either. I found that the training accuracy becomes higher than the test accuracy after about 6-10 epochs, which is abnormal. I also trained a model without BN layers, which achieved 65% top-1 accuracy, better than the models using BN layers. I changed your code to train models on CIFAR-10 and achieved 88.76% top-1 accuracy vs. 90.0% in He's paper, which seems correct. So I am very confused. Could you tell me what your training process is and what accuracy you get?

@antingshen
Owner

Here's my ResNet-50 top-1 validation error with respect to epochs:
[plot: ResNet-50 top-1 validation error vs. epochs]

As you can see, this includes BN and reaches ~68% accuracy.

I'm not quite sure what's wrong either at the moment, besides my version not having random reshape & crop. If you figure it out please let me know :)

@baiyancheng20
Author

[image: training/testing error curves from He et al.]
These are He's training and testing curves. We can see that the training errors are higher than the testing ones before epoch 60. As you say, there is no real-time data augmentation (random reshape, color jittering, etc.) in Caffe; this should be one reason. I have implemented a few data augmentation methods. If you are interested, we can cooperate and try to train the ResNet.
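Roughly, the scale + crop + flip part looks like this (a simplified sketch of the MSRA-style augmentation, not my exact code; the function and parameter names are just for illustration):

```python
# Simplified sketch of MSRA-style on-the-fly augmentation:
# resize the shorter side to a random scale in [256, 480],
# take a random 224x224 crop, and flip horizontally half the time.
import random
import numpy as np
from PIL import Image

def augment(path, crop=224, scale_range=(256, 480)):
    img = Image.open(path).convert('RGB')
    # Scale augmentation: shorter side -> random s in scale_range.
    s = random.randint(*scale_range)
    w, h = img.size
    if w < h:
        img = img.resize((s, int(round(h * s / float(w)))), Image.BILINEAR)
    else:
        img = img.resize((int(round(w * s / float(h))), s), Image.BILINEAR)
    # Random 224x224 crop (both sides are >= 256 after resizing).
    w, h = img.size
    x = random.randint(0, w - crop)
    y = random.randint(0, h - crop)
    img = img.crop((x, y, x + crop, y + crop))
    # Random horizontal flip.
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return np.asarray(img, dtype=np.float32)
```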

There may also be another reason. Kaiming He said:

In our BN layers, the provided mean and variance are strictly computed using average (not moving average) on a sufficiently large training batch after the training procedure. The numerical results are very stable (variation of val error < 0.1%). Using moving average might lead to different results.

Do you know how to compute the mean/variance as He said?

@antingshen
Owner

Maybe. I think we need a bit more detail or experimentation to figure out the exact BN implementation.

I'm happy to cooperate. Let me know if you have any ideas.
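One idea worth experimenting with: Caffe's BatchNorm layer accumulates batch statistics during TRAIN-phase forward passes, and if moving_average_fraction is set to 1.0 it sums them rather than decaying them, then normalizes by the accumulated count at test time, which amounts to a plain average like He describes. A rough, untested sketch with pycaffe (file names are placeholders):

```python
# Rough sketch: recompute BN mean/variance as a plain average over many
# training batches, exploiting Caffe's BatchNorm update rule. Assumes the
# train prototxt has use_global_stats: false and moving_average_fraction: 1.0
# on every BatchNorm layer. File names below are placeholders.
import caffe

caffe.set_mode_gpu()
net = caffe.Net('train_val.prototxt', 'resnet50_trained.caffemodel', caffe.TRAIN)

# Zero the accumulated mean (blob 0), variance (blob 1), and the
# normalization counter (blob 2) in every BatchNorm layer.
for name, blobs in net.params.items():
    if len(blobs) == 3:  # Caffe's BatchNorm layers carry exactly three blobs
        for blob in blobs:
            blob.data[...] = 0.0

# Each TRAIN-phase forward adds the current batch's statistics into
# blobs 0/1 and increments blob 2; at test time the layer divides by
# blob 2, so the stored statistics become a plain (not moving) average.
for _ in range(1000):  # a "sufficiently large" number of training batches
    net.forward()

net.save('resnet50_bn_avg.caffemodel')
```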

@baiyancheng20
Author

Could you use a modified data_reader.cpp (https://github.com/lim0606/caffe-googlenet-bn) to shuffle data during training? I found this can improve accuracy for GoogLeNet; I wonder whether it could also improve ResNet.
PS. Do you use any instant messaging software? I think you are Chinese; do you use QQ?

@antingshen
Owner

The link is broken, but I think we want shuffling + random resize + random crop, all on the fly during training. Or at least it seems like it from the MSRA paper. I'd say modifying data_reader.cpp is the right idea.
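The core change is just reshuffling the record list every epoch instead of reading it in a fixed order. In sketch form (illustrative Python, not the actual data_reader.cpp logic):

```python
# Minimal sketch of per-epoch shuffling for a data reader that walks a
# list of records (e.g., (image_path, label) pairs). Purely illustrative.
import random

class ShuffledReader:
    def __init__(self, records):
        self.records = list(records)
        self.cursor = 0
        random.shuffle(self.records)  # shuffle once before the first epoch

    def next(self):
        if self.cursor == len(self.records):
            random.shuffle(self.records)  # new order at every epoch boundary
            self.cursor = 0
        record = self.records[self.cursor]
        self.cursor += 1
        return record
```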

I have WeChat & Messenger.

@leonid-pishchulin

Could somebody share a ResNet-18 model pre-trained on ImageNet?
