HZ: avoid convergence if bisected #174

Open · wants to merge 1 commit into base: master

Conversation

@timholy (Contributor) commented Jan 28, 2024

secant2 calls update, and update might switch to bisection in the U3 step (as named in the initial HZ publication). If we did bisect, the line-search "are we making enough progress?" check (step L2) is nonsensical: we might have shrunk the interval over multiple iterations of bisection, but that is not an indication that the secant model was "working".

Consequently, this PR reports back whether bisection was engaged in update, and if so skips any kind of convergence assessment and does another iteration.

Fixes #173. Sadly there isn't a very portable test.
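
For illustration, here is a minimal, self-contained toy sketch of the control flow this PR introduces. The function below is hypothetical illustration code, not the actual LineSearches.jl implementation: the interval update reports whether it fell back to bisection, and the caller skips the convergence assessment whenever it did, taking another iteration instead.

function toy_linesearch(dphi, a, b; tol=1e-8, maxiter=100)
    for _ in 1:maxiter
        # secant2-like step: propose c by a secant step on dphi
        c = a - dphi(a) * (b - a) / (dphi(b) - dphi(a))
        # U3-like fallback: bisect if the secant proposal leaves (a, b)
        bisected = !(a < c < b)
        bisected && (c = (a + b) / 2)
        # keep the sub-interval that still brackets a sign change of dphi
        (dphi(a) * dphi(c) <= 0) ? (b = c) : (a = c)
        # the idea of this PR: after a bisection, skip any convergence
        # assessment and do another iteration
        bisected && continue
        b - a < tol && break
    end
    return (a + b) / 2
end

# Example: the stationary point of phi(x) = -cos(x) on [2, 4] is x = pi.
toy_linesearch(sin, 2.0, 4.0)   # ≈ 3.14159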

@pkofod (Member) commented Jan 29, 2024

Thanks @timholy, really appreciate it. I'll try to find time to review it later today.

@mateuszbaran (Contributor)

Thank you! This really turned out to be more involved than I initially expected.

@mateuszbaran (Contributor)

Bump?

@pkofod (Member) commented Feb 12, 2024

Okay, this looks good to me. @mateuszbaran did you somehow check if this solved your original problem?

@mateuszbaran (Contributor)

I've just checked. We've solved the original problem by making L-BFGS in Manopt more robust to line searches returning stepsize 0, so it's harder to reproduce the true original problem, but I can still generate problems for which the sub-optimal alpha is selected in the "It's so flat, secant didn't do anything useful, time to quit" branch.

@mateuszbaran (Contributor)

This script can easily generate cases where such a choice is made:

using Revise
using Manopt
using Manifolds
using LineSearches
using LinearAlgebra

norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)

function f_rosenbrock(x)
    result = 0.0
    for i in 1:2:length(x)
        result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
    end
    return result
end
function f_rosenbrock(::AbstractManifold, x)
    return f_rosenbrock(x)
end

function g_rosenbrock!(storage, x)
    for i in 1:2:length(x)
        storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
        storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
    end
    return storage
end

function g_rosenbrock!(M::AbstractManifold, storage, x)
    g_rosenbrock!(storage, x)
    riemannian_gradient!(M, storage, x, storage)
    return storage
end

function test_case_manopt()
    for i in 2:2:1000
        N = 5000+i
        mem_len = rand(4:10)
        println("i = $i, mem_len=$mem_len")
        M = Manifolds.Sphere(N - 1; parameter=:field)
        ls_hz = LineSearches.HagerZhang()

        x0 = zeros(N)
        x0[1] = 1
        manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)

        quasi_Newton(
            M,
            f_rosenbrock,
            g_rosenbrock!,
            x0;
            stepsize=Manopt.LineSearchesStepsize(ls_hz),
            #stepsize=HagerZhangLinesearch(M),
            evaluation=InplaceEvaluation(),
            vector_transport_method=ParallelTransport(),
            return_state=true,
            memory_size=mem_len,
            stopping_criterion=manopt_sc,
        )
    end
end
test_case_manopt()

You can then add a check like if values[ib] < values[iA] || values[iB] < values[iA] in HZ to see whether the choice is indeed suboptimal.
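
A self-contained way to run essentially the same check from the outside, without editing HZ itself, is to record every evaluation and compare the returned alpha against the best point already seen. The helper below is hypothetical (it is not part of LineSearches.jl) and assumes the five-argument calling convention used later in this thread, ls(phi, phidphi, alpha0, phi_0, dphi_0), returning a tuple (alpha, phi(alpha)):

using LineSearches

# Hypothetical diagnostic wrapper: record every (alpha, phi(alpha)) the line
# search evaluates, then warn if the returned alpha is worse than the best
# already-evaluated point.
function check_suboptimal(ls, phi, phidphi, alpha0)
    evals = Tuple{Float64,Float64}[]
    rec_phi = a -> (v = phi(a); push!(evals, (a, v)); v)
    rec_phidphi = a -> begin
        v, d = phidphi(a)
        push!(evals, (a, v))
        return (v, d)
    end
    phi0, dphi0 = phidphi(0.0)
    alpha, val = ls(rec_phi, rec_phidphi, alpha0, phi0, dphi0)
    best_alpha, best_val = argmin(last, evals)
    if best_val < val
        @warn "line search returned a suboptimal alpha" alpha val best_alpha best_val
    end
    return alpha, val
end

For instance, with the self-contained test case posted further down in this thread, this could be called as check_suboptimal(HagerZhang(), tc_to_f(tc1), tc_to_fdf(tc1), 1.0).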

@pkofod (Member) commented Feb 12, 2024

> I've just checked. We've solved the original problem by making L-BFGS in Manopt more robust to line searches returning stepsize 0, so it's harder to reproduce the true original problem, but I can still generate problems for which the sub-optimal alpha is selected in the "It's so flat, secant didn't do anything useful, time to quit" branch.

Have you decided to reset the approximation in that case?

@mateuszbaran (Contributor)

Currently it just goes in the direction of negative gradient but the memory isn't purged. Maybe it would work better with purging memory, I just didn't check how to do that safely in Manopt and it works OK without it.

@pkofod (Member) commented Feb 12, 2024

> Currently it just goes in the direction of negative gradient but the memory isn't purged. Maybe it would work better with purging memory, I just didn't check how to do that safely in Manopt and it works OK without it.

Okay. I had thought to change Optim to reset in this case which is why I asked :)

@mateuszbaran (Contributor)

I had a chat about it with Roland Herzog some time ago and he suggested using truncated CG in such a case, but restarting is also a valid option.

@pkofod (Member) commented Feb 14, 2024

@timholy do you have any thoughts on the above comments by @mateuszbaran ?

@mateuszbaran (Contributor)

I've checked the paper and it doesn't have the termination condition corresponding to "It's so flat, secant didn't do anything useful, time to quit" present in the LineSearches.jl implementation. I don't see what the justification for it is. Maybe it should be modified?

@mateuszbaran (Contributor)

Bump?

@pkofod (Member) commented Mar 23, 2024

> I've checked the paper and it doesn't have the termination condition corresponding to "It's so flat, secant didn't do anything useful, time to quit" present in the LineSearches.jl implementation. I don't see what the justification for it is. Maybe it should be modified?

I think Tim may have added that. I suppose you're saying that maybe it's failing here partially because the branch doesn't belong?

@mateuszbaran (Contributor)

There is clearly a difference between how it's commented and what it actually does. It's possible that the termination condition actually catches some corner case and is in some cases a good idea, but currently it also catches things it shouldn't (including in this PR). I don't know which corner case it was supposed to handle, so I don't know how to properly fix it.

Anyway, there is always that simple change I've done that works well enough for me and shouldn't really harm any use case.

There is already at least one person (other than me and Ronny) who had issues because my fixed HZ linesearch is what works best for his problem, but he would like to use it in Pluto, which doesn't like non-registered packages. So it would be nice to have a good HZ linesearch in a registered package.

@timholy (Contributor, Author) commented Mar 25, 2024

Sorry I haven't had time for "general"/"charitable" development lately. My busy schedule is likely to persist another couple of weeks at least. If anyone wants to take this effort over, by all means go for it.

A couple of high-level thoughts:

  • The Julia implementation of the HZ linesearch should be in this package, and it should be good. It's important to fix this and release a new version.
  • This episode is a reminder that in some ways, the test suite is more valuable than the implementation. My extensions to the published algorithm were almost certainly motivated by real-world cases, but those are lost to the mists of time. My inability to reconstruct those motivations is at risk of paralyzing any decision here. If push comes to shove, it might be better to revert to the published algorithm and force me to deal with the consequences.
  • Likewise, the issues uncovered by @mateuszbaran depend on a complicated software stack that may be more than one wants to depend on for a test suite: updates to a large ecosystem have a way of making such things fragile. So we're at risk of losing yet more important test cases if we don't do something.
  • To me, it seems that the most important step is to develop a testing layer where we capture the outcome of the one-dimensional problem, as that removes dependence on the "real" problem and allows us to create small tests without a large dependency stack. The idea would be to grab the record of the phi and dphi values at the alpha values chosen by the algorithm, and use interpolation to match both the values and slopes at those alpha values. (IIRC Lagrange polynomial interpolation can be readily generalized to match both values and derivatives.) Then we could have a test suite that consists of vectors of alpha, phi, and dphi values for all the "bad" cases anyone reports.
  • Failing to do this now, while we have real-world failing test cases, is just a way of perpetuating the inability to keep this package at the forefront of Julia implementations of line search methods, which would be a giant waste.

Thus if someone wants to help move this forward, perhaps the best thing to do is focus on developing that test harness.

@mateuszbaran (Contributor)

Thank you, your observations are very on-point. To push this forward I can prepare a simplified test case, either this week or the next one. Anyway, I'm not a maintainer of this package so I can't decide how to fix this issue.

@mateuszbaran (Contributor) commented Apr 8, 2024

I have made a self-contained example based on cubic interpolation. It somehow doesn't reproduce the original problem -- I will look for the right test values but that's all I could do in the time I've found so far.

struct LineSearchTestCase
    alphas::Vector{Float64}
    values::Vector{Float64}
    slopes::Vector{Float64}
end


function prepare_test_case(; alphas, values, slopes)
    perm = sortperm(alphas)
    alphas = alphas[perm]
    push!(alphas, alphas[end]+1)
    values = values[perm]
    push!(values, values[end])
    slopes = slopes[perm]
    push!(slopes, 0.0)
    return LineSearchTestCase(alphas, values, slopes)
end

tc1 = prepare_test_case(;
    alphas = [0.0, 1.0, 5.0, 3.541670844449739],
    values = [3003.592409634743, 2962.0378569864743, 2891.4462095232184, 3000.9760725116876],
    slopes = [-22332.321416890798, -20423.214551925797, 11718.185026267562, -22286.821227217057]
)

function tc_to_f(tc)
    function f(x)
        i = findfirst(u -> u > x, tc.alphas) - 1
        xk = tc.alphas[i]
        xkp1 = tc.alphas[i + 1]
        dx = xkp1 - xk
        t = (x - xk) / dx
        h00t = 2t^3 - 3t^2 + 1
        h10t = t * (1 - t)^2
        h01t = t^2 * (3 - 2t)
        h11t = t^2 * (t - 1)
        val =
            h00t * tc.values[i] +
            h10t * dx * tc.slopes[i] +
            h01t * tc.values[i + 1] +
            h11t * dx * tc.slopes[i + 1]

        return val
    end
end

function tc_to_fdf(tc)
    function fdf(x)
        i = findfirst(u -> u > x, tc.alphas) - 1
        xk = tc.alphas[i]
        xkp1 = tc.alphas[i + 1]
        dx = xkp1 - xk
        t = (x - xk) / dx
        h00t = 2t^3 - 3t^2 + 1
        h10t = t * (1 - t)^2
        h01t = t^2 * (3 - 2t)
        h11t = t^2 * (t - 1)
        val =
            h00t * tc.values[i] +
            h10t * dx * tc.slopes[i] +
            h01t * tc.values[i + 1] +
            h11t * dx * tc.slopes[i + 1]

        h00tp = 6t^2 - 6t
        h10tp = 3t^2 - 4t + 1
        h01tp = -6t^2 + 6 * t
        h11tp = 3t^2 - 2t
        slope =
            (
                h00tp * tc.values[i] +
                h10tp * dx * tc.slopes[i] +
                h01tp * tc.values[i + 1] +
                h11tp * dx * tc.slopes[i + 1]
            ) / dx
        println(x, " ", val, " ", slope)
        return val, slope
    end
end

using LineSearches

function test_tc(tc)
    hz = HagerZhang()
    f = tc_to_f(tc)
    fdf = tc_to_fdf(tc)
    hz(f, fdf, 1.0, fdf(0.0)...)
end

test_tc(tc1)

EDIT: new testing code, where the best value found during the search is 2891.4462095232184 but HZ returns the point at which the objective equals 3000.9760725116876

@mateuszbaran (Contributor)

I've updated my example, now there is a clear, fully self-contained case of sub-optimal returned value.

@pkofod (Member) commented Apr 12, 2024

Thank you so much Mateusz! I hope to have a little more time next week than in the past months :)

@pkofod (Member) commented Apr 18, 2024

Not forgotten! I will use your test-case to look at this next week.

@pkofod (Member) commented Apr 30, 2024

> I've updated my example, now there is a clear, fully self-contained case of sub-optimal returned value.

You have been patient :) I had some deadlines that were quite important. I fixed some Optim.jl stuff yesterday evening, and I will try to look at this when I'm off work.

@kbarros commented May 14, 2024

As per #175, I encountered another issue where the flatness check causes early termination at a point that is not close to stationary, which does not seem to be resolved by this PR.

The problematic sequence of ϕ and dϕ data is:

cs = [0, 0.2, 0.1, 0.055223623837016025]
ϕs = [3.042968312396456, 3.117411287254072, -3.503584823341612, 0.5244246783084083]
dϕs = [-832.4270136930788, -505.33622492291033, 674.947830358913, 738.3388472434362]

This bug took a while for me to diagnose (typical users will not suspect a problem in LineSearches). Maybe as a band-aid we can add a check inside that flatness branch: if the norm of the gradient does not seem small, print a warning for the user that there is a known bug in this part of LineSearches?

My original function is oscillatory, but like @timholy says above, one can just as well create a polynomial giving the same data:

> The idea would be to grab the record of the phi and dphi values at the alpha values chosen by the algorithm, and use interpolation to match both the values and slopes at those alpha values. (IIRC Lagrange polynomial interpolation can be readily generalized to match both values and derivatives.)

I created the package HermiteInterpolation for exactly this purpose (submitted for registration), and it can be used like this:

using HermiteInterpolation: fit
f = fit(cs, ϕs, dϕs)
@assert f.(cs) ≈ ϕs

using DynamicPolynomials
@polyvar c

println(f(c))
# 3.042968312396456 - 832.4270136930788*c - 132807.15591801773*c^2 + 7.915421661743959e6*c^3 - 1.570284840040962e8*c^4 + 1.4221708747294645e9*c^5 - 5.970065591205604e9*c^6 + 9.405512899903618e9*c^7

println(differentiate(f(c), c))
# -832.4270136930788 - 265614.31183603546*c + 2.3746264985231876e7*c^2 - 6.281139360163848e8*c^3 + 7.110854373647323e9*c^4 - 3.582039354723362e10*c^5 + 6.5838590299325325e10*c^6

Incidentally, as a temporary workaround for my optimization problem, I came up with this sequence:

    method = Optim.LBFGS(; linesearch=Optim.LineSearches.BackTracking(order=2))
    res0 = Optim.optimize(f, g!, collect(reinterpret(Float64, params)), method, options)
    res = Optim.optimize(f, g!, Optim.minimizer(res0), Optim.ConjugateGradient(), options)

LBFGS with a BackTracking line search seems to converge robustly, and once it's near the local minimum, ConjugateGradient can effectively fine-tune the minimizer.

@kbarros commented May 18, 2024

Here is an optimization problem that highlights a fundamental problem with the current flatness check:

# The minimizer is x0=[0, 2πn/100], with f(x0) = 1. Any integer n is fine.
function f(x)
    return (x[1]^2 + 1) * (2 - cos(100*x[2]))
end

using Optim

function test_converges(method)
    for i in 1:100
        r = randn(2)
        res = optimize(f, r, method)
        if Optim.converged(res) && minimum(res) > f([0,0]) + 1e-8
            println("""
                Incorrectly reported convergence after $(res.iterations) iterations
                Reached x = $(Optim.minimizer(res)) with f(x) = $(minimum(res))
                """)
        end
    end
end

# Works successfully, no printed output
test_converges(LBFGS(; linesearch=Optim.LineSearches.BackTracking(order=2)))

# Prints ~10 cases (out of 100 tries) where convergence is incorrectly reported.
# Frequently fails after the first line search.
test_converges(ConjugateGradient())

@kbarros commented Aug 20, 2024

Gentle ping -- any hope for tightening up the HZ "flatness check"? The randomized optimization problem above should work pretty robustly as a unit test.

@timholy (Contributor, Author) commented Aug 29, 2024

I'll get to this in another week or two.

timholy added a commit that referenced this pull request Oct 22, 2024
This may cause problems of its own, but for the time being it's better
than the status quo.

Fixes #173
Closes #174
Fixes #175
Successfully merging this pull request may close these issues: Suboptimal choice of alpha in Hager-Zhang.