LBFGSB


A Python implementation of the famous L-BFGS-B quasi-Newton solver [1].

This code is a Python port of the famous implementation of the Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints (L-BFGS-B), algorithm 778, written in Fortran [2,3] (last updated in 2011). Note that this is not a wrapper like minimize in scipy but a complete reimplementation (pure Python). The original Fortran code can be found here: https://dl.acm.org/doi/10.1145/279232.279236

Motivations

Although there are many implementations or ports (wrappers) of the L-BFGS-B code, as evidenced by the list compiled by Jonathan Schilling, the vast majority are merely interfaces (wrappers) to highly optimized Fortran or C code. In Python, for example, this is the case for scipy. These codes mostly date back to the 1990s and are difficult to understand, and therefore difficult to maintain or modify. Incidentally, the only other Python implementation we know of to date, by Avieira, is not very optimized and is released under the GPL-3 license, which makes it tricky to use.

In this context, the objectives of this code are as follows:

  • learn the underlying mechanisms of the L-BFGS-B code;
  • provide understandable, modern code using the high-level language Python, with typing, explicit function names and standardized formatting thanks to Ruff;
  • provide detailed and explicit documentation;
  • offer totally free code, including for commercial use, thanks to the MIT license;
  • guarantee efficient code, with the number of calls to the objective function and gradient at least as low as in the reference implementation, and without drastically increasing memory consumption or computation time, thanks to the use of numpy and vectorization;
  • add relevant stopping criteria;
  • add the possibility to restart the solver from a checkpoint;
  • add the possibility of modifying, on the fly, the gradient sequences stored in memory, an essential mechanism for the automatic and adaptive weighting of a possible regularization term, see (TODO). This is one of the initial motivations;
  • use a logging system rather than prints, for better integration within complex apps (see the sketch after this list).
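
As an illustration of the last point, solver messages go through Python's standard logging module, so a host application decides where they end up. The following is only a minimal sketch of such a setup; in particular, the logger name "lbfgsb" is an assumption and should be checked against the documentation:

import logging

# Route solver messages to the console; any handler (file, GUI, ...) works.
logging.basicConfig(level=logging.INFO)
# Hypothetical logger name; verify the actual name in the lbfgsb documentation.
logging.getLogger("lbfgsb").setLevel(logging.DEBUG)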

References

[1] R. H. Byrd, P. Lu and J. Nocedal. A Limited Memory Algorithm for Bound Constrained Optimization, (1995), SIAM Journal on Scientific and Statistical Computing, 16, 5, pp. 1190-1208.

[2] C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization (1997), ACM Transactions on Mathematical Software, 23, 4, pp. 550-560.

[3] J.L. Morales and J. Nocedal. L-BFGS-B: Remark on Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization (2011), ACM Transactions on Mathematical Software, 38, 1.

Quick start

Given an optimization problem defined by an objective function and a feasible space:

import numpy as np
from lbfgsb.types import NDArrayFloat  # for type hints, numpy array of floats

def rosenbrock(x: NDArrayFloat) -> float:
    """
    The Rosenbrock function.

    Parameters
    ----------
    x : array_like
        1-D array of points at which the Rosenbrock function is to be computed.

    Returns
    -------
    float
        The value of the Rosenbrock function.

    """
    x = np.asarray(x)
    sum1 = ((x[1:] - x[:-1] ** 2.0) ** 2.0).sum()
    sum2 = np.square(1.0 - x[:-1]).sum()
    return 100.0 * sum1 + sum2


def rosenbrock_grad(x: NDArrayFloat) -> NDArrayFloat:
    """
    The gradient of the Rosenbrock function.

    Parameters
    ----------
    x : array_like
        1-D array of points at which the Rosenbrock function is to be differentiated.

    Returns
    -------
    NDArrayFloat
        The gradient of the Rosenbrock function.
    """
    x = np.asarray(x)
    g = np.zeros(x.size)
    # derivative of sum1
    g[1:] += 100.0 * (2.0 * x[1:] - 2.0 * x[:-1] ** 2.0)
    g[:-1] += 100.0 * (-4.0 * x[1:] * x[:-1] + 4.0 * x[:-1] ** 3.0)
    # derivative of sum2
    g[:-1] += 2.0 * (x[:-1] - 1.0)
    return g

lb = np.array([-2, -2])  # lower bounds
ub = np.array([2, 2])  # upper bounds
bounds = np.array((lb, ub)).T  # The number of variables to optimize is len(bounds)
x0 = np.array([-0.8, -1])  # The initial guess
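
Before running the solver, the analytical gradient can be sanity-checked against a finite-difference approximation. This step is optional and relies only on scipy, not on lbfgsb itself; a minimal sketch:

from scipy.optimize import check_grad

# check_grad returns the 2-norm of the difference between rosenbrock_grad(x0)
# and a finite-difference approximation of the gradient; it should be small.
err = check_grad(rosenbrock, rosenbrock_grad, x0)
print(f"gradient check error at x0: {err:.2e}")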

The optimal solution can be found as follows:

from lbfgsb import minimize_lbfgsb

res = minimize_lbfgsb(
    x0=x0, fun=rosenbrock, jac=rosenbrock_grad, bounds=bounds, ftol=1e-5, gtol=1e-5
)

minimize_lbfgsb returns an OptimizeResult instance (from scipy) that contains the results of the optimization:

 message: CONVERGENCE: REL_REDUCTION_OF_F_<=_FTOL
 success: True
  status: 0
     fun: 3.9912062309350614e-08
       x: [ 1.000e+00  1.000e+00]
     nit: 18
     jac: [-6.576e-02  3.220e-02]
    nfev: 23
    njev: 23
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
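
The returned object behaves like any scipy OptimizeResult, so the individual fields can be read as attributes of the res variable defined above:

print(res.x)        # solution vector, approximately [1.0, 1.0] here
print(res.fun)      # objective value at the solution
print(res.nit)      # number of iterations
print(res.success)  # True when a convergence criterion was met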

See all use cases in the tutorials section of the documentation.
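
Since the port aims to reproduce the behaviour of the reference Fortran routines, the result can also be cross-checked against scipy's wrapper of that code. This is only a verification sketch, not part of lbfgsb:

from scipy.optimize import minimize

res_scipy = minimize(
    rosenbrock,
    x0,
    jac=rosenbrock_grad,
    method="L-BFGS-B",
    bounds=bounds,
    options={"ftol": 1e-5, "gtol": 1e-5},
)
# Both solvers should converge to approximately the same minimizer, [1.0, 1.0].
print(res_scipy.x)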