Cannot converge when I training with cifar10 #27

Open
fanwenyuan opened this issue Mar 15, 2018 · 6 comments

@fanwenyuan

When I followed the instructions to train on the cifar10 dataset, the net didn't converge. Part of the log is:
I0314 17:24:00.262361 27053 solver.cpp:258] Train net output #2: loss: forcing-binary = -0.124964 (* 1 = -0.124964 loss)
I0314 17:24:00.262364 27053 solver.cpp:571] Iteration 17100, lr = 0.001
I0314 17:24:19.773170 27053 solver.cpp:242] Iteration 17200, loss = 2.21847
I0314 17:24:19.773236 27053 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.0034709 (* 1 = 0.0034709 loss)
I0314 17:24:19.773241 27053 solver.cpp:258] Train net output #1: loss: classfication-error = 2.33997 (* 1 = 2.33997 loss)
I0314 17:24:19.773246 27053 solver.cpp:258] Train net output #2: loss: forcing-binary = -0.124965 (* 1 = -0.124965 loss)
I0314 17:24:19.773259 27053 solver.cpp:571] Iteration 17200, lr = 0.001
I0314 17:24:39.260385 27053 solver.cpp:242] Iteration 17300, loss = 2.17777
I0314 17:24:39.260422 27053 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.00347071 (* 1 = 0.00347071 loss)
I0314 17:24:39.260427 27053 solver.cpp:258] Train net output #1: loss: classfication-error = 2.29926 (* 1 = 2.29926 loss)
I0314 17:24:39.260432 27053 solver.cpp:258] Train net output #2: loss: forcing-binary = -0.124964 (* 1 = -0.124964 loss)
I0314 17:24:39.260435 27053 solver.cpp:571] Iteration 17300, lr = 0.001
I0314 17:24:58.870244 27053 solver.cpp:242] Iteration 17400, loss = 2.16347
I0314 17:24:58.870337 27053 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.00347057 (* 1 = 0.00347057 loss)
I0314 17:24:58.870344 27053 solver.cpp:258] Train net output #1: loss: classfication-error = 2.28496 (* 1 = 2.28496 loss)
I0314 17:24:58.870350 27053 solver.cpp:258] Train net output #2: loss: forcing-binary = -0.124964 (* 1 = -0.124964 loss)
I0314 17:24:58.870357 27053 solver.cpp:571] Iteration 17400, lr = 0.001
The loss just stays near 2.2.
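
A note on reading these numbers: the total loss is the sum of the three outputs (0.0034709 + 2.33997 - 0.124965 ≈ 2.21847 at iteration 17200), and the classification term sits close to ln(10) ≈ 2.303. If that output is a softmax cross-entropy loss (an assumption, the layer type is not shown in this thread), a value near ln(10) is what a 10-class classifier produces when it predicts all classes with equal probability, i.e. it has not learned anything yet. A minimal sketch that pulls the values out of log lines like the ones above and checks for this; the helper names `extract_losses` and `looks_like_chance` and the 0.1 tolerance are hypothetical choices, not part of Caffe:

```python
import math
import re

# Matches the per-output lines printed by Caffe's solver, e.g.
# "... Train net output #1: loss: classfication-error = 2.33997 (* 1 = 2.33997 loss)"
OUTPUT_RE = re.compile(
    r"Train net output #\d+: (?P<name>.+?) = "
    r"(?P<value>-?\d+(?:\.\d+)?(?:e[+-]?\d+)?)"
)


def extract_losses(log_text, name):
    """Return the values logged for the output called `name`, in log order."""
    values = []
    for line in log_text.splitlines():
        match = OUTPUT_RE.search(line)
        if match and match.group("name") == name:
            values.append(float(match.group("value")))
    return values


def looks_like_chance(values, num_classes, tolerance=0.1):
    """True if every value is within `tolerance` of ln(num_classes).

    ln(num_classes) is the cross-entropy of a uniform prediction.
    The tolerance is an arbitrary threshold chosen for illustration.
    """
    chance = math.log(num_classes)
    return bool(values) and all(abs(v - chance) <= tolerance for v in values)


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = (
        "I0314 17:24:19.773241 27053 solver.cpp:258] Train net output #1: "
        "loss: classfication-error = 2.33997 (* 1 = 2.33997 loss)\n"
        "I0314 17:24:39.260427 27053 solver.cpp:258] Train net output #1: "
        "loss: classfication-error = 2.29926 (* 1 = 2.29926 loss)\n"
    )
    values = extract_losses(sample, "loss: classfication-error")
    print(values, looks_like_chance(values, num_classes=10))
```

The output name passed in must match the log text exactly, including the `loss: ` prefix and the original spelling `classfication-error`.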
I noticed that you use three kinds of losses to calculate the result, so I deleted one of the loss layers and trained with the remaining two, and it converged. The log is as follows:
267646 (* 1 = 0.267646 loss)
I0315 01:15:36.250314 501 solver.cpp:571] Iteration 37600, lr = 0.0001
I0315 01:15:55.113540 501 solver.cpp:242] Iteration 37700, loss = 0.305098
I0315 01:15:55.113579 501 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.000179385 (* 1 = 0.000179385 loss)
I0315 01:15:55.113590 501 solver.cpp:258] Train net output #1: loss: classfication-error = 0.304919 (* 1 = 0.304919 loss)
I0315 01:15:55.113598 501 solver.cpp:571] Iteration 37700, lr = 0.0001
I0315 01:16:14.149433 501 solver.cpp:242] Iteration 37800, loss = 0.283406
I0315 01:16:14.149519 501 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.000153867 (* 1 = 0.000153867 loss)
I0315 01:16:14.149533 501 solver.cpp:258] Train net output #1: loss: classfication-error = 0.283253 (* 1 = 0.283253 loss)
I0315 01:16:14.149544 501 solver.cpp:571] Iteration 37800, lr = 0.0001
I0315 01:16:33.261036 501 solver.cpp:242] Iteration 37900, loss = 0.147353
I0315 01:16:33.261082 501 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.000290852 (* 1 = 0.000290852 loss)
I0315 01:16:33.261096 501 solver.cpp:258] Train net output #1: loss: classfication-error = 0.147062 (* 1 = 0.147062 loss)
I0315 01:16:33.261106 501 solver.cpp:571] Iteration 37900, lr = 0.0001
I0315 01:16:51.728065 501 solver.cpp:346] Iteration 38000, Testing net (#0)
So what caused this, and how can I solve the problem? Thank you.

@kevinlin311tw
Owner

We didn't observe this problem. We used the same code and the model converged.

@fanwenyuan
Author

I re-downloaded and rebuilt this project and trained on cifar10 without changing any settings, but it still didn't converge. What is likely going wrong? I'm very confused.

@IvyYZ

IvyYZ commented Apr 15, 2018

I also have this problem on cifar10, and my total loss is the same as the classification loss. The code I used is SSDH-VGG16-48.

I0415 23:06:50.008275 114710 solver.cpp:239] Iteration 100 (1.581 iter/s, 63.251s/100 iters), loss = 2.48801
I0415 23:06:50.008622 114710 solver.cpp:258] Train net output #0: loss: 50%-fire-rate = 0.0266628 (* 1 = 0.0266628 loss)
I0415 23:06:50.008658 114710 solver.cpp:258] Train net output #1: loss: classfication-error = 2.48801 (* 1 = 2.48801 loss)
I0415 23:06:50.008678 114710 solver.cpp:258] Train net output #2: loss: forcing-binary = -0.0266628 (* 1 = -0.0266628 loss)
I0415 23:06:50.008693 114710 sgd_solver.cpp:112] Iteration 100, lr = 0.001

I only modified the batch size. I sincerely look forward to your reply.

@kevinlin311tw
Owner

Thanks for pointing out this problem. I am confused because I cannot reproduce this error.

@IvyYZ

IvyYZ commented Apr 18, 2018

Thanks for your reply.
I had forgotten to load the pre-trained model.
After loading it, I trained on cifar10 and the model started to converge.
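
For anyone hitting the same symptom: with the Caffe command-line tool, pre-trained weights are passed to `caffe train` through the `--weights` flag next to `--solver`; without it, the layers start from the initializers defined in the network prototxt. A minimal sketch that assembles such a command. The file names `solver.prototxt` and `pretrained.caffemodel` are placeholders rather than the actual paths in this repository, and `build_finetune_command` is a hypothetical helper:

```python
def build_finetune_command(solver_path, weights_path, caffe_bin="caffe"):
    """Build the argument list for `caffe train` with pre-trained weights.

    Refuses an empty weights path, since forgetting the weights is
    exactly the mistake described above.
    """
    if not weights_path:
        raise ValueError("weights_path is empty: training would start "
                         "without the pre-trained model")
    return [
        caffe_bin,
        "train",
        f"--solver={solver_path}",
        f"--weights={weights_path}",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute the solver and .caffemodel you actually use.
    cmd = build_finetune_command("solver.prototxt", "pretrained.caffemodel")
    print(" ".join(cmd))
    # To launch training, the list could be handed to subprocess.run(cmd, check=True).
```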

@yulijun1234

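Hello! Has the loss oscillation problem been solved? After loading the pre-trained model, I trained on my own dataset and the loss still oscillates. Is it because my training set is too small? I have a little over one thousand images. Thank you!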
