
BacktrackingLineSearch is likely to select a sub-optimal step size #746

sub7erra opened this issue May 6, 2019 · 1 comment

sub7erra commented May 6, 2019

When running optimization tasks with the OWLQN class, one can often observe that the line search iterates through several points, evaluating the function at each of them, and then selects an "optimal" alpha value that is not the best of the values just evaluated. This is visible simply by looking at the cost function values; it usually looks like this:

...
19/04/30 11:49:54 INFO OWLQN: Iteration 0
19/04/30 11:50:18 INFO LogisticCostFun: total loss = 2.4630666139138985E9
19/04/30 11:50:40 INFO LogisticCostFun: CostFun total loss = 1.966782600846745E8
19/04/30 11:51:02 INFO LogisticCostFun: CostFun total loss = 5518530.279375923
19/04/30 11:51:24 INFO LogisticCostFun: CostFun total loss = 711099.7439354724
19/04/30 11:51:46 INFO LogisticCostFun: CostFun total loss = 714635.4663931302
19/04/30 11:52:10 INFO OWLQN: Step Size: 0.1315
19/04/30 11:52:14 INFO OWLQN: Val and Grad Norm: 0.0197023 (rel: 0.00102) 0.000178013
19/04/30 11:52:14 INFO Optimize: step 1: cost = 714635.4663931302
19/04/30 11:52:14 INFO OWLQN: Iteration 1
...

Note that the cost function value at Iteration 1 (714635.4663931302) is the last one evaluated by the line search during Iteration 0, even though the previous point (711099.7439354724) was better.

This happens because BacktrackingLineSearch, which OWLQN uses, relies on the implementation of the minimize method from the ApproximateLineSearch trait, which is as follows:

def minimize(f: DiffFunction[Double], init: Double = 1.0): Double = iterations(f, init).last.alpha

This is clearly incorrect, as nothing guarantees that the last iteration of the line search produces the best step size. In fact, in roughly half of the cases it doesn't, because the search may simply overshoot (or undershoot) on its last try.

An obvious solution would be to track the cost function values for the alphas already tried and, once the stopping conditions are met, select the best of them instead of the last, as in the sketch below.
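
A minimal sketch of that fix, assuming the State(alpha, fval, grad) iterate shape that iterations yields in Breeze's ApproximateLineSearch (the field names are from memory and may differ between versions):

// Sketch: return the alpha with the lowest observed function value,
// not whichever iterate happened to come last.
def minimize(f: DiffFunction[Double], init: Double = 1.0): Double =
  iterations(f, init).minBy(_.fval).alpha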


dlwh commented May 10, 2021

Sorry for not responding. The loss you're reporting is different from what OWLQN is trying to minimize:

  1. We're minimizing your objective plus an L1 term (a sketch of the difference is below).
  2. We're running a line search that seeks not just a steepest-descent step length, but one that enforces the strong Wolfe conditions, which put various constraints on the derivatives as well as on the objective values. I honestly don't remember that stuff well enough to know whether that's what's happening here, and it's unfortunate we don't log more.
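
To make point 1 concrete: OWLQN's line search compares the objective plus the L1 penalty, so the raw loss values in the log can rise while the regularized total falls. A minimal sketch of the quantity actually being compared, with f and l1reg as hypothetical stand-ins:

// Illustrative only: OWLQN minimizes this regularized total, not the bare
// objective that the LogisticCostFun log lines appear to report.
import breeze.linalg.{DenseVector, norm}

def owlqnObjective(f: DenseVector[Double] => Double,
                   l1reg: Double)(x: DenseVector[Double]): Double =
  f(x) + l1reg * norm(x, 1.0) // f(x) alone may increase while this decreases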

If you can replicate this with LBFGS or with L1 = 0, I'll bite; otherwise, convince me why I'm wrong (I easily could be!).

@dlwh dlwh closed this as completed May 10, 2021
@dlwh dlwh reopened this May 10, 2021