
MNIST experiments creating qpth issues #4

Open
guptakartik opened this issue Oct 23, 2017 · 5 comments


@guptakartik

Hi,

I was running the optnet code for MNIST classification with the default configuration for only 10 epochs. In the first couple of epochs I get the warning "qpth warning: Returning an inaccurate and potentially incorrect solution", and in subsequent iterations the loss becomes nan. Is there something obviously wrong with my configuration?
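For reference, a minimal sketch of a guard that aborts training as soon as the loss goes non-finite, rather than letting nan gradients propagate into the weights. `model`, `loader`, and `optimizer` are hypothetical placeholders (not the names used in the optnet MNIST script), and it assumes a reasonably recent PyTorch:

```python
import torch
import torch.nn.functional as F

def train_one_epoch(model, loader, optimizer):
    """One epoch of training that aborts on a non-finite loss."""
    model.train()
    for batch_idx, (x, y) in enumerate(loader):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        # Stop as soon as the loss goes nan/inf instead of letting
        # bad gradients corrupt the weights.
        if not torch.isfinite(loss):
            raise RuntimeError(
                "non-finite loss %.4g at batch %d" % (loss.item(), batch_idx))
        loss.backward()
        optimizer.step()
```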


bamos commented Oct 23, 2017

Hi, I just tried running the MNIST experiment and am hitting nans there too. It's been a while since I last ran that example, and I've changed the qpth library since the MNIST experiment was last working. It looks like the solver's hitting some nans internally, causing the precision issue and bad gradients. For now you can try reverting to an older commit of qpth, one from around the time I last updated the MNIST example. I'll try to look into the internal solver issues soon.

-Brandon.

@guptakartik

Thanks for the quick reply! I will try working with the older commit of qpth.

@Xingyu-Lin

Hi, I tried most of the early versions of qpth, but none of them work; they fail in various ways, mostly inside qpth. Could you check which version works?

@guptakartik

Hi Brandon,
It would be really helpful if you could point us to the right version of qpth, since we have been unable to get it to work.


bamos commented Nov 9, 2017

Hi, the nans were coming up in the backward pass in qpth, and I've pushed a fix here: locuslab/qpth@e2cac49

Here's the convergence of one of my new runs (I modified z0 and s0 to be fixed; pull this change from the latest version of this repo). Since the loss is so jumpy, the LR should probably be bumped down:

[image: training-loss convergence plot for the new MNIST run]
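If it helps, here is a hypothetical way to bump the LR down in a standard PyTorch setup; the optimizer class, the LR value, and the names `model`, `loader`, and `train_one_epoch` are placeholders and not necessarily what this repo's training script actually uses:

```python
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

def train(model, loader, num_epochs=10):
    # Smaller initial LR than before, plus a decaying schedule, to smooth
    # out the jumpy loss curve.  1e-4 is only an illustrative value.
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    scheduler = StepLR(optimizer, step_size=5, gamma=0.5)
    for epoch in range(num_epochs):
        train_one_epoch(model, loader, optimizer)  # e.g. the sketch above
        scheduler.step()  # halve the LR every 5 epochs
```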

Can you try running the training again with the latest versions of this repo and qpth?

-Brandon.
