HZ: avoid convergence if bisected #174
base: master
Conversation
`secant2` calls `update`, and `update` might switch to bisection in the U3 step (as named in the initial HZ publication). If we did bisect, the line-search progress test in step L2, which checks whether the interval is shrinking fast enough, is nonsensical: the interval may have shrunk over multiple bisection iterations, but that is not an indication that the secant model was "working". Consequently, this PR reports back whether bisection was engaged in `update`, and if so skips any convergence assessment and does another iteration. Fixes JuliaNLSolvers#173. Sadly there isn't a very portable test.
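To make the control flow concrete, here is a self-contained toy of the idea described above. All names (`toy_update`, `toy_secant2`) and the simplified U1/U2/U3 and L2-style conditions are illustrative only and do not mirror the actual LineSearches.jl implementation or this PR's diff.

```julia
# Toy sketch only. `toy_update` shrinks the bracket [a, b] around a candidate
# step c; when c is unusable it falls back to a (much simplified) bisection,
# the analogue of the paper's U3 step, and reports that via a flag.
function toy_update(ϕ, dϕ, a, b, c; ϵ = 1e-6)
    if dϕ(c) >= 0                 # U1: c becomes the new right endpoint
        return a, c, false
    elseif ϕ(c) <= ϕ(0) + ϵ       # U2: c becomes the new left endpoint
        return c, b, false
    else                          # U3 (simplified): bisect instead of trusting c
        return a, (a + c) / 2, true
    end
end

# One "secant2-like" pass: take a secant step, call the update, and only run
# the interval-shrinkage progress test when no bisection happened.
function toy_secant2(ϕ, dϕ, a, b; γ = 0.66)
    old_width = b - a
    c = (a * dϕ(b) - b * dϕ(a)) / (dϕ(b) - dϕ(a))   # secant step
    a, b, bisected = toy_update(ϕ, dϕ, a, b, c)
    if bisected
        return a, b, false        # skip the convergence assessment; iterate again
    end
    return a, b, (b - a) <= γ * old_width
end
```

The point of the flag is simply that bisection-driven shrinkage says nothing about whether the secant model is making progress.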
Thanks @timholy, really appreciate it. I'll try to find time to review it later today.
Thank you! This really turned out to be more involved than I initially expected.
Bump?
Okay, this looks good to me. @mateuszbaran, did you check whether this solves your original problem?
I've just checked. We've solved the original problem by making L-BFGS in Manopt more robust to line searches returning stepsize 0, so it's harder to reproduce the true original problem, but I can still generate problems for which a sub-optimal alpha is selected in the "It's so flat, secant didn't do anything useful, time to quit" branch.
This script can easily generate cases where such a choice is made:

```julia
using Revise
using Manopt
using Manifolds
using LineSearches
using LinearAlgebra

norm_inf(M::AbstractManifold, p, X) = norm(X, Inf)

function f_rosenbrock(x)
    result = 0.0
    for i in 1:2:length(x)
        result += (1.0 - x[i])^2 + 100.0 * (x[i + 1] - x[i]^2)^2
    end
    return result
end
function f_rosenbrock(::AbstractManifold, x)
    return f_rosenbrock(x)
end

function g_rosenbrock!(storage, x)
    for i in 1:2:length(x)
        storage[i] = -2.0 * (1.0 - x[i]) - 400.0 * (x[i + 1] - x[i]^2) * x[i]
        storage[i + 1] = 200.0 * (x[i + 1] - x[i]^2)
    end
    return storage
end
function g_rosenbrock!(M::AbstractManifold, storage, x)
    g_rosenbrock!(storage, x)
    riemannian_gradient!(M, storage, x, storage)
    return storage
end

function test_case_manopt()
    for i in 2:2:1000
        N = 5000 + i
        mem_len = rand(4:10)
        println("i = $i, mem_len=$mem_len")
        M = Manifolds.Sphere(N - 1; parameter=:field)
        ls_hz = LineSearches.HagerZhang()
        x0 = zeros(N)
        x0[1] = 1
        manopt_sc = StopWhenGradientNormLess(1e-6; norm=norm_inf) | StopAfterIteration(1000)
        quasi_Newton(
            M,
            f_rosenbrock,
            g_rosenbrock!,
            x0;
            stepsize=Manopt.LineSearchesStepsize(ls_hz),
            #stepsize=HagerZhangLinesearch(M),
            evaluation=InplaceEvaluation(),
            vector_transport_method=ParallelTransport(),
            return_state=true,
            memory_size=mem_len,
            stopping_criterion=manopt_sc,
        )
    end
end

test_case_manopt()
```

You can then add
Have you decided to reset the approximation in that case?
Currently it just goes in the direction of the negative gradient, but the memory isn't purged. Maybe it would work better with purging the memory; I just didn't check how to do that safely in Manopt, and it works OK without it.
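For illustration only, a sketch of the kind of safeguard described here; `safeguarded_step` and its arguments are hypothetical and are not Manopt's API.

```julia
# Hypothetical sketch: fall back to the negative gradient (and optionally purge
# the quasi-Newton memory) when the line search returns a zero step.
function safeguarded_step(x, grad, direction, linesearch; purge_memory! = nothing)
    d = direction(grad)                     # quasi-Newton search direction
    α = linesearch(x, d)
    if α == 0.0                             # line search made no progress
        d = -grad                           # steepest-descent fallback
        purge_memory! !== nothing && purge_memory!()   # optional memory reset
        α = linesearch(x, d)
    end
    return x .+ α .* d
end

# Toy usage on f(x) = sum(x.^2) with a "line search" that sometimes fails:
g(x) = 2 .* x
flaky_ls(x, d) = rand() < 0.2 ? 0.0 : 0.1
xfinal = let x = [1.0, -2.0]
    for _ in 1:100
        x = safeguarded_step(x, g(x), Δ -> -Δ, flaky_ls)
    end
    x
end
@show sum(abs2, xfinal)
```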
Okay. I had thought to change Optim to reset in this case, which is why I asked :)
I had a chat about it with Roland Herzog some time ago and he suggested using truncated CG in such a case, but restarting is also a valid option.
@timholy do you have any thoughts on the above comments by @mateuszbaran?
I've checked the paper and it doesn't have a termination condition corresponding to the "It's so flat, secant didn't do anything useful, time to quit" branch present in the LineSearches.jl implementation. I don't see what the justification for it is. Maybe it should be modified?
Bump?
I think Tim may have added that. I suppose you're saying that maybe it's failing here partially because the branch doesn't belong?
There is clearly a difference between how it's commented and what it actually does. It's possible that this termination condition actually catches some corner case and is a good idea in some situations, but currently it also catches things it shouldn't (including in this PR). I don't know what corner case it was supposed to handle, so I don't know how to properly fix it. Anyway, there is always the simple change I've made, which works well enough for me and shouldn't really harm any use case. There is already at least one person (other than me and Ronny) who had issues because my fixed HZ linesearch is what works best for his problem, but he would like to use it in Pluto, which doesn't like non-registered packages. So it would be nice to have a good HZ linesearch in a registered package.
Sorry I haven't had time for "general"/"charitable" development lately. My busy schedule is likely to persist another couple of weeks at least. If anyone wants to take this effort over, by all means go for it. A couple of high-level thoughts:
Thus if someone wants to help move this forward, perhaps the best thing to do is focus on developing that test harness. |
Thank you, your observations are very on-point. To push this forward I can prepare a simplified test case, either this week or the next one. Anyway, I'm not a maintainer of this package, so I can't decide how to fix this issue.
I have made a self-contained example based on cubic interpolation. It somehow doesn't reproduce the original problem -- I will look for the right test values, but that's all I could do in the time I've found so far.

```julia
struct LineSearchTestCase
    alphas::Vector{Float64}
    values::Vector{Float64}
    slopes::Vector{Float64}
end

function prepare_test_case(; alphas, values, slopes)
    perm = sortperm(alphas)
    alphas = alphas[perm]
    push!(alphas, alphas[end] + 1)
    values = values[perm]
    push!(values, values[end])
    slopes = slopes[perm]
    push!(slopes, 0.0)
    return LineSearchTestCase(alphas, values, slopes)
end

tc1 = prepare_test_case(;
    alphas = [0.0, 1.0, 5.0, 3.541670844449739],
    values = [3003.592409634743, 2962.0378569864743, 2891.4462095232184, 3000.9760725116876],
    slopes = [-22332.321416890798, -20423.214551925797, 11718.185026267562, -22286.821227217057],
)

function tc_to_f(tc)
    function f(x)
        i = findfirst(u -> u > x, tc.alphas) - 1
        xk = tc.alphas[i]
        xkp1 = tc.alphas[i + 1]
        dx = xkp1 - xk
        t = (x - xk) / dx
        # cubic Hermite basis functions
        h00t = 2t^3 - 3t^2 + 1
        h10t = t * (1 - t)^2
        h01t = t^2 * (3 - 2t)
        h11t = t^2 * (t - 1)
        val =
            h00t * tc.values[i] +
            h10t * dx * tc.slopes[i] +
            h01t * tc.values[i + 1] +
            h11t * dx * tc.slopes[i + 1]
        return val
    end
end

function tc_to_fdf(tc)
    function fdf(x)
        i = findfirst(u -> u > x, tc.alphas) - 1
        xk = tc.alphas[i]
        xkp1 = tc.alphas[i + 1]
        dx = xkp1 - xk
        t = (x - xk) / dx
        h00t = 2t^3 - 3t^2 + 1
        h10t = t * (1 - t)^2
        h01t = t^2 * (3 - 2t)
        h11t = t^2 * (t - 1)
        val =
            h00t * tc.values[i] +
            h10t * dx * tc.slopes[i] +
            h01t * tc.values[i + 1] +
            h11t * dx * tc.slopes[i + 1]
        # derivatives of the Hermite basis functions
        h00tp = 6t^2 - 6t
        h10tp = 3t^2 - 4t + 1
        h01tp = -6t^2 + 6t
        h11tp = 3t^2 - 2t
        slope =
            (
                h00tp * tc.values[i] +
                h10tp * dx * tc.slopes[i] +
                h01tp * tc.values[i + 1] +
                h11tp * dx * tc.slopes[i + 1]
            ) / dx
        println(x, " ", val, " ", slope)
        return val, slope
    end
end

using LineSearches

function test_tc(tc)
    hz = HagerZhang()
    f = tc_to_f(tc)
    fdf = tc_to_fdf(tc)
    hz(f, fdf, 1.0, fdf(0.0)...)
end

test_tc(tc1)
```

EDIT: new testing code, where the best searched value is 2891.4462095232184 but HZ returns the point at which the objective is equal to 3000.9760725116876.
I've updated my example; now there is a clear, fully self-contained case of a sub-optimal returned value.
Thank you so much Mateusz! I hope to have a little more time next week than in the past months :)
Not forgotten! I will use your test case to look at this next week.
You have been patient :) I had some deadlines that were quite important. I fixed some Optim.jl stuff yesterday evening, and I will try to look at this when I'm off work.
As per #175, I encountered another issue where the flatness check causes early termination at a point that is not close to stationary, which does not seem to be resolved by this PR. The problematic sequence of evaluations is:

```julia
cs = [0, 0.2, 0.1, 0.055223623837016025]
ϕs = [3.042968312396456, 3.117411287254072, -3.503584823341612, 0.5244246783084083]
dϕs = [-832.4270136930788, -505.33622492291033, 674.947830358913, 738.3388472434362]
```

This bug took a while for me to diagnose (typical users will not suspect a problem in LineSearches). Maybe as a band-aid we can add a check inside that flatness check: if the norm of the gradient does not seem small, then print a warning for the user that there is a known bug with this part of LineSearches? (See the sketch after this comment.) My original function is oscillatory, but like @timholy says above, one can just as well create a polynomial giving the same data. I created the package HermiteInterpolation for exactly this purpose (submitted for registration), and it can be used like this:

```julia
using HermiteInterpolation: fit
f = fit(cs, ϕs, dϕs)
@assert f.(cs) ≈ ϕs

using DynamicPolynomials
@polyvar c
println(f(c))
# 3.042968312396456 - 832.4270136930788*c - 132807.15591801773*c^2 + 7.915421661743959e6*c^3 - 1.570284840040962e8*c^4 + 1.4221708747294645e9*c^5 - 5.970065591205604e9*c^6 + 9.405512899903618e9*c^7
println(differentiate(f(c), c))
# -832.4270136930788 - 265614.31183603546*c + 2.3746264985231876e7*c^2 - 6.281139360163848e8*c^3 + 7.110854373647323e9*c^4 - 3.582039354723362e10*c^5 + 6.5838590299325325e10*c^6
```

Incidentally, as a temporary workaround for my optimization problem, I came up with this sequence:

```julia
method = Optim.LBFGS(; linesearch=Optim.LineSearches.BackTracking(order=2))
res0 = Optim.optimize(f, g!, collect(reinterpret(Float64, params)), method, options)
res = Optim.optimize(f, g!, Optim.minimizer(res0), Optim.ConjugateGradient(), options)
```

LBFGS with BackTracking search seems to converge robustly, and once it's near the local minimum, ConjugateGradient can effectively fine-tune the minimizer.
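A minimal sketch of the band-aid warning suggested above, using hypothetical names (`dϕ_c` for the directional derivative at the candidate step, `flatness_triggered` for the condition under discussion); this is not the actual LineSearches.jl code.

```julia
# Hypothetical band-aid: if the "flat" early exit fires while the slope is still
# clearly nonzero, tell the user instead of silently returning the step.
function maybe_warn_flat_exit(dϕ_c, flatness_triggered; tol = 1e-4)
    if flatness_triggered && abs(dϕ_c) > tol
        @warn "Flatness-based exit triggered, but |dϕ| = $(abs(dϕ_c)) is not small; the returned step may be far from a stationary point."
    end
    return nothing
end

# With the slope data from the failing sequence above, this would fire:
maybe_warn_flat_exit(738.3388472434362, true)
```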
Here is an optimization problem that highlights a fundamental problem with the current flatness check:

```julia
# The minimizer is x0 = [0, 2πn/100], with f(x0) = 1. Any integer n is fine.
function f(x)
    return (x[1]^2 + 1) * (2 - cos(100 * x[2]))
end

using Optim

function test_converges(method)
    for i in 1:100
        r = randn(2)
        res = optimize(f, r, method)
        if Optim.converged(res) && minimum(res) > f([0, 0]) + 1e-8
            println("""
                Incorrectly reported convergence after $(res.iterations) iterations
                Reached x = $(Optim.minimizer(res)) with f(x) = $(minimum(res))
                """)
        end
    end
end

# Works successfully, no printed output
test_converges(LBFGS(; linesearch=Optim.LineSearches.BackTracking(order=2)))

# Prints ~10 failures to converge (in 100 tries). Frequently fails after the
# first line search.
test_converges(ConjugateGradient())
```
Gentle ping -- any hope for tightening up the HZ "flatness check"? The randomized optimization problem above should work pretty robustly as a unit test.
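Just a sketch of how the randomized problem above could be turned into a test; `count_false_convergences` is a hypothetical counting variant of `test_converges`, and the test is meant to pass once the flatness check is tightened.

```julia
using Optim, Test, Random

# Same objective as above: minimized at x = [0, 2πn/100] with objective(x) = 1.
objective(x) = (x[1]^2 + 1) * (2 - cos(100 * x[2]))

# Count runs that report convergence at a point whose value is clearly not minimal.
function count_false_convergences(method; trials = 100, rng = Xoshiro(0))
    failures = 0
    for _ in 1:trials
        res = optimize(objective, randn(rng, 2), method)
        if Optim.converged(res) && minimum(res) > 1 + 1e-8
            failures += 1
        end
    end
    return failures
end

@testset "no false convergence from the HZ flatness check" begin
    @test count_false_convergences(ConjugateGradient()) == 0
end
```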
I'll get to this in another week or two.