WIP: Derivatives interface #38
Conversation
This is great, thanks! Just a couple of comments/suggestions:
How would you like to specify these modes? Maybe we can put the solver into
Yes, this sounds good too. How would you like the derivative functions to be returned? Maybe we could add
It may be easy to accidentally leak memory if we add pointers to Python functions in the struct, in cases where the Python functions keep data around.
Happy to use the parameter
I think we need to discuss this a bit more. I would really like to have a solution that does not necessarily depend on Python function pointers. This would enable the use of differentiable OSQP from other languages as well (e.g., Julia?).

Here is an idea that goes against what I suggested before: we could have lower-level functions in C. However, this approach would need the workspace to be stored somewhere in Python. At the moment that does not happen with the one-shot solve function releasing the GIL, but it does happen in the standard way we call OSQP with the Python object. We can fix this by using the OSQP object methods instead of the one-shot solve and by releasing the GIL there too (making them thread-unsafe).
I have started creating the differentiable interface from Python. It works through the object interface now. In particular, I have:

- Released the GIL on the OSQP object functions
- Removed the OSQP one-shot solve function (useless now, and it duplicated code)
- Added a test for the 'qr_active' differentiation mode
- Dropped Python 2 support (the next release will not support it)

The tests are now using numdifftools, but I would prefer not to install additional packages. Let me know what you think.
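As a rough usage sketch of that object-interface flow (toy problem data; the `adjoint_derivative` call mirrors the one used later in this thread, so treat the exact signature as an assumption about this WIP API):

import numpy as np
import scipy.sparse as sp
import osqp

# A small QP, set up through the object interface.
P = sp.csc_matrix(np.array([[4.0, 1.0], [1.0, 2.0]]))
q = np.array([1.0, 1.0])
A = sp.csc_matrix(np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]))
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

m = osqp.OSQP()
m.setup(P, q, A, l, u)
results = m.solve()

# The object keeps the problem data (and eventually the C workspace)
# alive between calls, so the derivative can be taken afterwards.
m.adjoint_derivative(dx=np.ones(q.size))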
results = self._model.solve()

# TODO(bart): this will be unnecessary when the derivative will be in C
self._derivative_cache['results'] = results
Should we also set this to `None` if `update` is called, and then make sure it's populated when the derivative functions are called? This way somebody can't accidentally change the data and then try to differentiate through an old problem.

Also, it's ok to not support the setting where the user wants to solve a batch of problems by using `update` on a single OSQP instance and then differentiate through all the solutions, right?
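For concreteness, that batch pattern would look something like this (toy data; again assuming the WIP `adjoint_derivative` API). Each `update` replaces the cached problem data, so every solution has to be differentiated before the next update:

import numpy as np
import scipy.sparse as sp
import osqp

P = sp.csc_matrix(np.eye(2))
A = sp.csc_matrix(np.eye(2))
l, u = -np.ones(2), np.ones(2)

m = osqp.OSQP()
m.setup(P, np.zeros(2), A, l, u)

for q_i in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    m.update(q=q_i)
    m.solve()
    # Differentiate now; the next update() invalidates the cached results.
    m.adjoint_derivative(dx=np.ones(2))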
You are definitely right. I was thinking that we will not have this issue later anyway, when we port it to C. I have changed the code to set it to `None` when the `update` function gets called.
Can we remove the `self._derivative_cache = None` line and add the following code to the bottom of the `update` function? This way the `_derivative_cache` is updated properly, and `results` is removed from it, marking that the problem hasn't been solved since the last update.
# update problem data in self._derivative_cache
if q is not None:
    self._derivative_cache["q"] = q
if l is not None:
    self._derivative_cache["l"] = l
if u is not None:
    self._derivative_cache["u"] = u
if Px is not None:
    if Px_idx.size == 0:
        self._derivative_cache["P"].data = Px
    else:
        self._derivative_cache["P"].data[Px_idx] = Px
if Ax is not None:
    if Ax_idx.size == 0:
        self._derivative_cache["A"].data = Ax
    else:
        self._derivative_cache["A"].data[Ax_idx] = Ax

# delete results from self._derivative_cache to prohibit
# taking the derivative of unsolved problems
if "results" in self._derivative_cache:
    del self._derivative_cache["results"]
I have also implemented the backward derivative. It is weird because the derivative does not match the numerically estimated one, but we get the same output for both.
Could we use `check_grad`? (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.check_grad.html)
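For context, `check_grad` compares an analytic gradient against a finite-difference estimate and returns the norm of the difference; a toy (non-OSQP) example:

import numpy as np
from scipy.optimize import check_grad

def f(x):
    return np.sum(x ** 2)

def grad_f(x):
    return 2 * x

# check_grad returns ||grad_f(x0) - finite_difference_gradient(f, x0)||_2.
x0 = np.random.randn(5)
err = check_grad(f, grad_f, x0)
assert err < 1e-6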
module/interface.py
dA_vals = \
    y_u[rows] * r_x[cols] + y_u[rows] * (r_yu[rows] * x[cols]) - \
    (y_l[rows] * r_x[cols] + y_l[rows] * (r_yl[rows] * x[cols]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be:
dA_vals = (y_u[rows] - y_l[rows]) * r_x[cols] + \
    (y_u[rows] * r_yu[rows] - y_l[rows] * r_yl[rows]) * x[cols]
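The two forms are algebraically identical (distribute y_u[rows] and y_l[rows], then regroup the r_x[cols] and x[cols] terms); a quick random-data sanity check with made-up dimensions:

import numpy as np

rng = np.random.default_rng(0)
m, n, nnz = 7, 5, 12
y_u, y_l, r_yu, r_yl = (rng.standard_normal(m) for _ in range(4))
x, r_x = rng.standard_normal(n), rng.standard_normal(n)
rows, cols = rng.integers(0, m, nnz), rng.integers(0, n, nnz)

orig = y_u[rows] * r_x[cols] + y_u[rows] * (r_yu[rows] * x[cols]) - \
    (y_l[rows] * r_x[cols] + y_l[rows] * (r_yl[rows] * x[cols]))
simplified = (y_u[rows] - y_l[rows]) * r_x[cols] + \
    (y_u[rows] * r_yu[rows] - y_l[rows] * r_yl[rows]) * x[cols]

assert np.allclose(orig, simplified)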
Good point. Fixed. Looks neater now.
module/interface.py
A = self._derivative_cache['A']
l, u = self._derivative_cache['l'], self._derivative_cache['u']

results = self._derivative_cache['results']
Can we do:

try:
    results = self._derivative_cache['results']
except KeyError:
    raise ValueError("Problem has not been solved. You cannot take "
                     "derivatives. Please call the solve function.")
Fixed both. I did not spend too much time fixing these issues since I think they will be much easier to deal with from the C code, where the cache will correspond to our workspace.
I double-checked and the math seems right. Here's a write-up:
Tests should be working now (at least on my machine) for all the adjoint derivatives. I ended up using `approx_fprime` from scipy and tuning its precision accordingly. I think we should add the forward derivative and the multithreaded code from this interface directly (just as it happens for diffcp).
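For context, the `approx_fprime` pattern on a toy function (the real tests difference through the solver itself, where the step size matters much more):

import numpy as np
from scipy.optimize import approx_fprime

def f(x):
    return np.sum(np.sin(x))

x0 = np.random.randn(4)
# The step size needs tuning: too large biases the estimate, too small
# amplifies floating-point (and, for a solver, convergence) noise.
eps = 1e-6
g_est = approx_fprime(x0, f, eps)
assert np.allclose(g_est, np.cos(x0), atol=1e-4)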
Just quickly added back the full, sparse LU option too. I think we'll need to tune the LSQR settings a bit to make sure it converges for most problems. Here's an ill-conditioned problem that LSQR returns early from, but the direct solver gives something reasonable: t.pkl.zip

import pickle as pkl
import torch
from osqpth.osqpth import OSQP

Q_idx, Q_shape, A_idx, A_shape, Q_data, p, A_data, l, u = pkl.load(
    open('/private/home/bda/repos/optnet/sudoku/t.pkl', 'rb'))
p.requires_grad_()

y_lsqr = OSQP(Q_idx, Q_shape, A_idx, A_shape, diff_mode='lsqr')(
    Q_data, p, A_data, l, u
)
print('==== lsqr')
print(torch.autograd.grad(y_lsqr[0,0], p)[0])

y_lu = OSQP(Q_idx, Q_shape, A_idx, A_shape, diff_mode='lu')(
    Q_data, p, A_data, l, u
)
print('===== lu')
print(torch.autograd.grad(y_lu[0,0], p)[0])

Output:
Just to update on this: there has been much progress on the CVXPY QP interface (https://github.com/cvxgrp/cvxpy/pull/960) thanks to @SteveDiamond. On the OSQP side, I am working on a formulation that does not need LSQR. We now have qdldl-python, which can be quickly hooked in and supports factorization updates and multi-threading.
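If I'm reading the qdldl-python README right, the hook-in would look roughly like this (the exact `Solver`/`update` API here is my assumption, so double-check against the package):

import numpy as np
import scipy.sparse as sp
import qdldl

# A small quasi-definite (KKT-style) system: factor once, update later.
K = sp.csc_matrix(np.array([[4.0, 1.0, 1.0],
                            [1.0, 2.0, 0.0],
                            [1.0, 0.0, -1.0]]))
b = np.ones(3)

F = qdldl.Solver(K)   # LDL^T factorization
x = F.solve(b)

# Refactorization update for a matrix with the same sparsity pattern;
# this is what makes repeated derivative solves cheap.
F.update(K + 0.1 * sp.eye(3, format='csc'))
x_new = F.solve(b)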
Awesome! Do the derivative tests all pass? (Just curious, I haven't run them myself.) The dependency on QDLDL is fine by me. Besides style things, such as commented-out dead code (which I'm sure you're already planning to fix), it looks good to me.
Awesome! I've pulled the latest code and updated this osqpth PR/branch to use it: osqp/osqpth#7

The OptNet sudoku experiment is still not converging with osqpth as it does with osqp, and I've been debugging it a bit over the past few days. It may be related to gradient scaling (summing vs. averaging across the broadcasted input tensors), the derivatives of equality constraints, or the -inf/large-negative inequality constraints, but I'm not quite sure. I've also bumped up the accuracy of OSQP, as these problems are pretty poorly scaled, which may make them a bad test suite for this... I've been debugging by doing a single epoch of training with a minibatch size of 1:

If you think it should work with equality constraints and -inf/large-negative inequality constraints, I can try to pull out a few concrete examples of where the mismatch is happening.

Also, I ran into this error when factorizing
It might be worth using this example to check the derivatives' correctness, since it is more challenging than our unit tests.
This error should not happen, since we have a convex program. I wonder if this is related to scipy not necessarily keeping sparse matrix indices sorted. Maybe I should check `has_sorted_indices` before factoring. Do you have an MWE reproducing it?
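The check being suggested is cheap; something along these lines before factoring:

import numpy as np
import scipy.sparse as sp

A = sp.csc_matrix(np.random.randn(3, 3))
# scipy does not guarantee sorted column indices after every operation,
# while the factorization code assumes an ordered structure.
if not A.has_sorted_indices:
    A.sort_indices()  # sorts in place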
The tests pass on my computer but do not pass on CI because
Yes, I just looked a bit closer and it looks like this is happening when the solution hits the boundary exactly and makes those residuals zero:

import scipy.sparse as sp
import numpy as np
import numpy.random as npr
import osqp

n = 100
npr.seed(1)

P = 0.01 * npr.randn(n, n)
P = P.T.dot(P)
P = sp.csr_matrix(P)
q = npr.randn(n)
A = sp.csr_matrix(npr.randn(n, n))
l = npr.randn(n)
u = l

s = osqp.OSQP()
s.setup(P, q, A, l, u, eps_abs=1e-10, eps_rel=1e-10, max_iter=int(1e8))
results = s.solve()
s.adjoint_derivative(dx=np.ones(n))
I just tried replacing the QP solver with cvxpylayers using the CP backend, and that is able to reproduce the qpth results on the sudoku experiment. From a quick glance it even seems to slightly help iron out some instabilities with qpth. So this at least points to the convergence issues here being either from the way I've hooked osqp up to the sudoku experiment, or from the way the derivatives are being handled for these problems.
Hi all -- here's a first attempt at moving some of the derivative code from https://github.com/oxfordcontrol/osqpth/ into here. What do you think of this interface?

If it's good I can also start adding:

- modulepurepy

Two other things:

- `diff_mode` is the linear system solver type too (direct/LSQR/sparse/dense)
- the `derivative` function here too?

\cc @sbarratt @akshayka