
WIP: Derivatives interface #38

Merged: 19 commits into osqp:master on Aug 7, 2020
Conversation

@bamos (Collaborator) commented Nov 2, 2019

Hi all -- here's a first attempt at moving some of the derivative code from https://github.com/oxfordcontrol/osqpth/ into here. What do you think of this interface?

If it's good I can also start adding:

Two other things:

  1. Something that may be worth adding to diff_mode is the linear system solver type (direct/LSQR/sparse/dense)
  2. This follows the interface of https://github.com/cvxgrp/diffcp/blob/master/diffcp/cone_program.py -- do you also want to stub in a derivative function here?

cc @sbarratt @akshayka

@bamos requested a review from bstellato on November 2, 2019 18:39
@CLAassistant commented Nov 2, 2019

CLA assistant check: all committers have signed the CLA.

@bstellato (Contributor) commented:

This is great, thanks! Just a couple of comments/suggestions:

  • I would focus on the sparse version first since the solver is anyway sparse. Then we can work out the dense version after we have the sparse one running.

  • I totally agree with having a direct solver (QR decomposition) and indirect solver (LSQR). I am not sure LSQR always works for badly conditioned systems. By the way, if we end up using the functions from the internal C code we will benefit from the problem data scaling in OSQP. This should help the LSQR convergence.

  • What do you think about having the same solve function for osqp and then adding some boolean option to compute the derivatives? I mean something like derivative=True. This would be more coherent with the current osqp interface, where we pass polish=True to perform the final polish step at the end of the ADMM algorithm.

@bamos (Collaborator, Author) commented Nov 4, 2019

  • I totally agree with having a direct solver (QR decomposition) and indirect solver (LSQR). I am not sure LSQR always works for badly conditioned systems. By the way, if we end up using the functions from the internal C code we will benefit from the problem data scaling in OSQP. This should help the LSQR convergence.

How would you like to specify these modes? Maybe we can put the solver into diff_mode as full_qr, full_lsqr?

  • What do you think about having the same solve function for osqp and then adding some boolean option to compute the derivatives? I mean something like derivative=True. This would be more coherent with the current osqp interface, where we pass polish=True to perform the final polish step at the end of the ADMM algorithm.

Yes this sounds good too. How would you like the derivative functions to be returned? Maybe we could add derivative and adjoint_derivative fields to the OSQP_results struct that point to Python functions?

@bamos (Collaborator, Author) commented Nov 4, 2019

Maybe we could add derivative and adjoint_derivative fields to the OSQP_results struct that point to Python functions?

It may be easy to accidentally leak memory if we add pointers to Python functions to the struct, since those Python functions keep data around.

@bstellato (Contributor) commented:

How would you like to specify these modes? Maybe we can put the solver into diff_mode as full_qr, full_lsqr?

Happy to use the parameter diff_mode. We could also just set it to qr and lsqr. The active-set mode could be a special feature we run only in debug (not very robust yet) as active_qr and active_lsqr.

Yes this sounds good too. How would you like the derivative functions to be returned? Maybe we could add derivative and adjoint_derivative fields to the OSQP_results struct that point to Python functions?

It may be easy to accidentally leak memory if we add pointers to Python functions to the struct, since those Python functions keep data around.

I think we need to discuss this a bit more. I would really like to have a solution that does not necessarily depend on Python function pointers. This would enable the use of differentiable OSQP from other languages as well (e.g., Julia).

Here is an idea that goes against what I suggested before (no derivative=True option anymore) but that could work.

We could have lower level functions in C called osqp_derivative and osqp_adjoint_derivative that take the workspace structure as input, check that the problem status is solved and compute the derivative. To start, these functions could be prototyped in Python and then ported to C later.

However, this approach would need the workspace to be stored somewhere in Python. That does not happen with the one-shot solve function (which releases the GIL), but it does happen when we call OSQP in the standard way through the Python object. We can fix this by using the OSQP object methods instead of the one-shot solve and releasing the GIL there (making them thread-unsafe).

@bstellato (Contributor) commented:

I have started creating the differentiable interface from Python. It works from the object interface now. In particular, I have

  • Released GIL on OSQP object functions
  • Removed OSQP one-shot solve function (useless now and repeats code)
  • Added test for 'qr_active' differentiation mode
  • Dropped Python 2 support (next release will not support it)

The tests are now using numdifftools but I would prefer not to install additional packages. Let me know what you think.

results = self._model.solve()

# TODO(bart): this will be unnecessary when the derivative will be in C
self._derivative_cache['results'] = results
@bamos (Collaborator, Author) commented:

Should we also set this to None if update is called and then make sure it's populated when the derivative functions are called? This way somebody can't accidentally change the data and then try to differentiate through an old problem.

Also, it's OK not to support the case where the user wants to solve a batch of problems by calling update on a single OSQP instance and then differentiate through all the solutions, right?

@bstellato (Contributor) replied:

You are definitely right. I was thinking that we will not have this issue later anyway, once we port it to C. I have changed the code to set it to None when the update function gets called.

Review comment:

Can we remove the self._derivative_cache = None line and add the following code to the bottom of the update function? This way the _derivative_cache is updated properly and it removes results from it, meaning the problem hasn't been solved.

        # update problem data in self._derivative_cache
        if q is not None:
            self._derivative_cache["q"] = q

        if l is not None:
            self._derivative_cache["l"] = l

        if u is not None:
            self._derivative_cache["u"] = u

        if Px is not None:
            if Px_idx.size == 0:
                self._derivative_cache["P"].data = Px
            else:
                self._derivative_cache["P"].data[Px_idx] = Px
        
        if Ax is not None:
            if Ax_idx.size == 0:
                self._derivative_cache["A"].data = Ax
            else:
                self._derivative_cache["A"].data[Ax_idx] = Ax

        # delete results from self._derivative_cache to prohibit
        # taking the derivative of unsolved problems
        if "results" in self._derivative_cache.keys():
            del self._derivative_cache["results"]

@bstellato (Contributor) commented:

I have implemented the backward derivative also with the lsqr method. All the tests pass except dA.

It is weird: the derivative does not match the numerically estimated one, but we get the same output for both the lu_active and lsqr methods. There is probably an issue either in the math or in the numerical estimation.

@sbarratt commented Nov 8, 2019

The tests are now using numdifftools but I would prefer not to install additional packages. Let me know what you think.

Could we use check_grad? (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.check_grad.html)
This is part of scipy, so should already be included.
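As a sketch of the suggestion, here is a minimal use of check_grad on a toy quadratic (the function is an illustration, standing in for the actual OSQP solution map):

```python
# check_grad returns the 2-norm of the difference between an analytic gradient
# and a finite-difference estimate; a value near zero means they agree.
import numpy as np
from scipy.optimize import check_grad

def f(x):
    return 0.5 * x @ x          # toy objective

def grad_f(x):
    return x                    # its analytic gradient

err = check_grad(f, grad_f, np.array([1.0, -2.0, 3.0]))
print(err)                      # small, on the order of the finite-difference step
```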

Comment on lines 350 to 352
dA_vals = \
y_u[rows] * r_x[cols] + y_u[rows] * (r_yu[rows] * x[cols]) - \
(y_l[rows] * r_x[cols] + y_l[rows] * (r_yl[rows] * x[cols]))
Review comment:

Can this just be:

dA_vals = (y_u[rows] - y_l[rows]) * r_x[cols] + \
    (y_u[rows] * r_yu[rows] - y_l[rows] * r_yl[rows]) * x[cols]

@bstellato (Contributor) replied:

Good point. Fixed. Looks neater now.
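For the record, a quick numerical check with random stand-in vectors (the real ones come from the solver cache) confirms the factored expression is identical to the original three-term one:

```python
# Verify the simplified dA_vals expression matches the original on random data.
# rows/cols index the nonzero pattern of A.
import numpy as np

rng = np.random.default_rng(0)
m, n, nnz = 6, 4, 10
rows = rng.integers(0, m, nnz)
cols = rng.integers(0, n, nnz)
y_u, y_l, r_yu, r_yl = (rng.standard_normal(m) for _ in range(4))
x, r_x = (rng.standard_normal(n) for _ in range(2))

original = y_u[rows] * r_x[cols] + y_u[rows] * (r_yu[rows] * x[cols]) - \
    (y_l[rows] * r_x[cols] + y_l[rows] * (r_yl[rows] * x[cols]))
factored = (y_u[rows] - y_l[rows]) * r_x[cols] + \
    (y_u[rows] * r_yu[rows] - y_l[rows] * r_yl[rows]) * x[cols]
print(np.allclose(original, factored))  # True
```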


A = self._derivative_cache['A']
l, u = self._derivative_cache['l'], self._derivative_cache['u']

results = self._derivative_cache['results']
Review comment:

Can we do:

try:
    results = self._derivative_cache['results']
except KeyError:
    raise ValueError("Problem has not been solved. You cannot take derivatives. Please call the solve function.")

@bstellato (Contributor) replied:

Fixed both. I did not spend too much time fixing these issues since I think they will be much easier to deal with from the C code, where the cache will correspond to our workspace.

@sbarratt commented Nov 9, 2019

I double checked and the math seems right. Here's a write-up:
diff_osqp.pdf

@bstellato (Contributor) commented:

Tests should be working now (at least from my machine) for all the adjoint derivatives. I ended up using approx_fprime from scipy and tuning its precision accordingly.

I think we should add the forward derivative and the multithreaded code from this interface directly (just as it happens for diffcp).
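The approx_fprime pattern looks roughly like this (the toy function and tolerance are illustrative; the actual tests tune epsilon to the solver accuracy):

```python
# approx_fprime computes a forward-difference gradient with an explicit step
# size epsilon, which is what makes its precision tunable.
import numpy as np
from scipy.optimize import approx_fprime

def f(x):
    return float(np.sum(x ** 2))

x0 = np.array([1.0, 2.0, 3.0])
g_num = approx_fprime(x0, f, 1e-7)   # forward differences with step 1e-7
g_true = 2.0 * x0                    # analytic gradient for comparison
print(np.max(np.abs(g_num - g_true)))  # roughly the size of the step
```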

@bstellato changed the title from "WIP proposal for solve_and_derivative interface" to "WIP: Derivatives interface" on Nov 9, 2019
@bamos (Collaborator, Author) commented Nov 28, 2019

I just quickly added back the full sparse LU option too. I think we'll need to tune the LSQR settings a bit to make sure it converges for most problems; here's an ill-conditioned problem where LSQR returns early but the direct solver gives something reasonable: t.pkl.zip

import pickle as pkl
import torch
from osqpth.osqpth import OSQP

Q_idx, Q_shape, A_idx, A_shape, Q_data, p, A_data, l, u = pkl.load(open('/private/home/bda/repos/optnet/sudoku/t.pkl', 'rb'))
p.requires_grad_()

y_lsqr = OSQP(Q_idx, Q_shape, A_idx, A_shape, diff_mode='lsqr')(
    Q_data, p, A_data, l, u
)
print('==== lsqr')
print(torch.autograd.grad(y_lsqr[0,0], p)[0])

y_lu = OSQP(Q_idx, Q_shape, A_idx, A_shape, diff_mode='lu')(
    Q_data, p, A_data, l, u
)
print('===== lu')
print(torch.autograd.grad(y_lu[0,0], p)[0])

Output:

==== lsqr
 
LSQR            Least-squares solution of  Ax = b
The matrix A has      272 rows  and      272 cols
damp = 0.00000000000000e+00   calc_var =        0
atol = 1.00e-08                 conlim = 1.00e+08
btol = 1.00e-08               iter_lim =      544
 
   Itn      x[0]       r1norm     r2norm   Compatible    LS      Norm A   Cond A
     0  0.00000e+00   1.000e+00  1.000e+00    1.0e+00  5.8e-01
     1  4.70547e-03   9.920e-01  9.920e-01    9.9e-01  1.0e+05   4.6e+00  1.0e+00
     2  4.71200e-03   9.920e-01  9.920e-01    9.9e-01  7.9e-09   1.0e+08  2.2e+07
 
LSQR finished
The least-squares solution is good enough, given atol     
 
istop =       2   r1norm = 9.9e-01   anorm = 1.0e+08   arnorm = 7.9e-01
itn   =       2   r2norm = 9.9e-01   acond = 2.2e+07   xnorm  = 2.7e-02
 
tensor([[-4.7120e-03,  9.1007e-18,  7.7816e-18,  9.2699e-18,  9.8331e-18,
          1.1310e-17,  9.2798e-18,  9.8769e-18,  9.1475e-18,  8.3885e-18,
          7.9731e-18,  1.0012e-17,  9.4575e-18,  9.7441e-18,  8.5112e-18,
          9.1525e-18,  8.9611e-18,  1.0282e-17,  8.6824e-18,  8.2783e-18,
          7.9033e-18,  6.1820e-18,  8.8372e-18,  8.7393e-18,  8.5133e-18,
          8.4072e-18,  9.3348e-18,  6.6223e-18,  8.8388e-18,  7.8768e-18,
          7.1385e-18,  7.4183e-18,  8.0707e-18,  7.9934e-18,  7.0882e-18,
          8.7638e-18,  8.5200e-18,  9.0570e-18,  8.7410e-18,  8.4229e-18,
          8.2337e-18,  5.8012e-18,  8.3921e-18,  6.9027e-18,  9.5226e-18,
          9.1675e-18,  7.7723e-18,  8.3988e-18,  7.4415e-18,  7.5445e-18,
          8.8055e-18,  9.4669e-18,  8.3989e-18,  8.3193e-18,  9.0031e-18,
          7.4605e-18,  1.0165e-17,  7.0037e-18,  8.8694e-18,  9.6764e-18,
          9.5390e-18,  9.1936e-18,  9.7664e-18,  1.0186e-17]], device='cuda:0',
       dtype=torch.float64)
===== lu
tensor([[-1.2817e+00,  1.7240e-04, -7.5087e-05,  7.0880e-02, -2.5255e-02,
          2.9073e-04, -1.2829e+00,  8.6578e-04, -5.7333e-01, -7.3684e-01,
         -1.5084e-01,  3.2826e-01, -3.7487e-01,  6.1822e-01,  2.9244e-01,
         -1.5862e-01,  4.6635e-01,  2.0958e-01,  1.9206e-01,  1.5975e-01,
         -3.8013e-03, -8.8888e-02, -4.0045e-01,  1.8145e-01,  5.0644e-01,
         -4.2531e-01, -2.3564e-02,  4.9506e-01,  6.2890e-01, -5.0957e-01,
         -7.4680e-01, -5.2668e-02, -1.4794e-02,  3.4635e-01,  1.6762e-05,
          1.1221e-01,  4.5385e-01,  6.7591e-02, -5.5885e-01, -2.9073e-01,
         -5.2149e-01, -2.5710e-01, -2.6545e-01, -3.4104e-01, -2.4808e-01,
         -2.8137e-02,  6.1538e-01,  5.7590e-06,  2.3811e-04, -3.5384e-01,
         -7.7634e-02, -3.7694e-01,  6.0934e-01, -1.8100e-04,  4.5111e-01,
          4.4656e-01,  1.9813e-03, -2.0191e-02,  1.6115e-02,  1.6816e-01,
          1.3684e+00,  8.1238e-01,  1.0328e-01,  3.1379e-01]], device='cuda:0',
       dtype=torch.float64)

@bstellato (Contributor) commented:

Just to update on this: there has been much progress on the CVXPY QP interface https://github.com/cvxgrp/cvxpy/pull/960 thanks to @SteveDiamond.

On the OSQP side, I am working on a formulation that does not need LSQR. We now have qdldl-python, which can be quickly hooked in and supports factorization updates and multi-threading.

@bstellato (Contributor) commented:

I have just pushed the derivatives with the symmetrized M matrix, working with the qdldl solver. @bamos @akshayka would you have time to take a look at it? I think it should be more reliable than LSQR. It requires qdldl.
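For intuition, the symmetrize-then-factorize step can be sketched with scipy's dense ldl as a stand-in for qdldl (the matrix here is a random placeholder, not the actual KKT system):

```python
# Symmetrize a matrix and LDL^T-factorize it; qdldl performs the analogous
# factorization on the sparse quasi-definite system.
import numpy as np
from scipy.linalg import ldl

rng = np.random.default_rng(0)
k = 4
M = rng.standard_normal((k, k))
M_sym = 0.5 * (M + M.T)                 # symmetrized M

L, D, perm = ldl(M_sym)                 # factors satisfy M_sym == L @ D @ L.T
print(np.allclose(L @ D @ L.T, M_sym))  # True
```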

@bstellato mentioned this pull request Mar 25, 2020
@akshayka commented:

Awesome! The interface for adjoint_derivative looks good to me. It'd be great if a derivative method was also added at some point, but for CVXPY Layers, the adjoint is all we need.

Do the derivative tests all pass? (Just curious, I haven't run them myself.)

The dependency on QDLDL is fine by me.

Besides style things, such as commented-out dead code (which I'm sure you're already planning to fix), it looks good to me.

@bamos (Collaborator, Author) commented Mar 27, 2020

Awesome! I've pulled the latest code and updated this osqpth PR/branch to use it: osqp/osqpth#7

The OptNet sudoku experiment is still not converging with osqpth as it does with osqp, and I've been debugging it a bit over the past few days. It may be related to gradient scaling (summing vs. averaging across the broadcasted input tensors), the derivatives of equality constraints, or the -inf/large-negative inequality constraints, but I'm not quite sure. I've also bumped up the accuracy of OSQP, as these problems are pretty poorly scaled, which may make them a bad test suite for this...

I've been debugging by doing a single epoch of training with a minibatch size of 1:

(screenshot omitted)

If you think it should work with equality constraints and -inf/large negative inequality constraints, I can try to pull out a few concrete examples of where the mismatch is happening.

Also I ran into this error when factorizing M and patched it by adding a small diagonal term:

RuntimeError: Error in computing elimination tree. Matrix not properly upper-triangular, sum_Lnz = -1
> /private/home/bda/repos/osqp-python/osqp/interface.py(374)adjoint_derivative()
    372                                 spa.diags(y_l) @ dy_l])
    373 
--> 374         r_sol = - qdldl.Solver(self._derivative_cache['M']).solve(d_sol)
    375 
    376         r_x, r_yu, r_yl = np.split(r_sol, [n, n+m

@bstellato (Contributor) commented:

The OptNet sudoku experiment is still not converging with osqpth as it does with osqp, and I've been debugging it a bit over the past few days. It may be related to gradient scaling (summing vs. averaging across the broadcasted input tensors), the derivatives of equality constraints, or the -inf/large-negative inequality constraints, but I'm not quite sure. I've also bumped up the accuracy of OSQP, as these problems are pretty poorly scaled, which may make them a bad test suite for this...

It might be worth using this example to check the derivatives correctness since it is more challenging than our unittests.

Also I ran into this error when factorizing M and patched it by adding a small diagonal term:

RuntimeError: Error in computing elimination tree. Matrix not properly upper-triangular, sum_Lnz = -1
> /private/home/bda/repos/osqp-python/osqp/interface.py(374)adjoint_derivative()
    372                                 spa.diags(y_l) @ dy_l])
    373 
--> 374         r_sol = - qdldl.Solver(self._derivative_cache['M']).solve(d_sol)
    375 
    376         r_x, r_yu, r_yl = np.split(r_sol, [n, n+m

This error should not happen since we have a convex program. I wonder if this is related to scipy sparse matrices not necessarily having sorted indices. Maybe I should check has_sorted_indices before factoring. Do you have a MWE reproducing it?
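The sorted-indices check is cheap: in scipy, has_sorted_indices reports the state and sort_indices() repairs it in place (the tiny matrix here is illustrative):

```python
# scipy CSC/CSR matrices may carry unsorted row/column indices; a routine that
# assumes sorted (e.g. upper-triangular) storage can fail on them.
# sort_indices() fixes the ordering in place.
import numpy as np
import scipy.sparse as spa

rows = np.array([1, 0, 0, 1])
cols = np.array([0, 0, 1, 1])
vals = np.array([1.0, 3.0, 1.0, 4.0])
M = spa.csc_matrix((vals, (rows, cols)), shape=(2, 2))

if not M.has_sorted_indices:
    M.sort_indices()
print(bool(M.has_sorted_indices))  # True
```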

@bstellato (Contributor) commented:

Awesome! The interface for adjoint_derivative looks good to me. It'd be great if a derivative method was also added at some point, but for CVXPY Layers, the adjoint is all we need.

Do the derivative tests all pass? (Just curious, I haven't run them myself.)

The dependency on QDLDL is fine by me.

Besides style things, such as commented out dead code (which I'm sure you're already planning to fix), it looks good to me.

The tests pass on my computer but do not pass on CI because qdldl-python is not available on conda-forge yet. I just created an issue conda-forge/osqp-feedstock#23 hoping that they can help me create a feedstock. I will keep you posted and clean up the code once we have something finalized ;-)

@bamos (Collaborator, Author) commented Apr 4, 2020

Do you have a MWE reproducing it?

Yes, I just looked a bit closer and it looks like this is happening when the solution hits the boundary exactly and makes those residuals in M exactly zero. Here's a MWE:

import scipy.sparse as sp
import numpy as np
import numpy.random as npr

import osqp

n = 100

npr.seed(1)

P = 0.01*npr.randn(n,n)
P = P.T.dot(P)
P = sp.csr_matrix(P)
q = npr.randn(n)
A = sp.csr_matrix(npr.randn(n, n))
l = npr.randn(n)
u = l  # equality constraints: the solution hits the boundary exactly

s = osqp.OSQP()
s.setup(P, q, A, l, u, eps_abs=1e-10, eps_rel=1e-10, max_iter=int(1e8))
results = s.solve()

s.adjoint_derivative(dx=np.ones(n))
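The small-diagonal patch mentioned earlier can be sketched like this (delta is an assumed value, and scipy's splu stands in for the qdldl factorization just to show the regularized solve):

```python
# Regularize a symmetric indefinite matrix by adding delta*I before
# factorizing. qdldl's LDL^T without pivoting can fail on exactly-zero
# diagonal entries; the perturbation avoids that. splu (which pivots) is
# only a stand-in here.
import numpy as np
import scipy.sparse as spa
import scipy.sparse.linalg as spla

k = 4
# Quasi-definite-style stand-in with an exactly-zero lower-right block:
M = spa.bmat([[spa.eye(k), spa.eye(k)],
              [spa.eye(k), None]], format='csc')
delta = 1e-7                                   # assumed regularization size
M_reg = (M + delta * spa.eye(2 * k)).tocsc()

b = np.ones(2 * k)
r = spla.splu(M_reg).solve(b)
print(np.linalg.norm(M_reg @ r - b))  # near zero
```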

@bamos (Collaborator, Author) commented Apr 14, 2020

I just tried replacing the QP solver with cvxpylayers using the CP backend, and that is able to reproduce the qpth results on the sudoku experiment. From a quick glance it even seems to slightly iron out some instabilities with qpth. So this at least points to the convergence issues here being either in how I've hooked osqp up to the sudoku experiment, or in how the derivatives are being handled for these problems.

@bstellato merged commit 46dff11 into osqp:master on Aug 7, 2020