Commit
one pass except proofs
anitescu committed Jun 25, 2024
1 parent df326c9 commit a16b29e
Showing 6 changed files with 32 additions and 28 deletions.
2 changes: 2 additions & 0 deletions tex/main.tex
@@ -34,6 +34,8 @@
\newcommand{\epstol}{\mathbf{u}}
\newcommand{\cactive}{\mathcal{B}}
\newcommand{\cinactive}{\mathcal{N}}
\newcommand{\ma}[1]{{\color{red} Mihai: #1}}


\title{Condensed-space methods for nonlinear programming on GPUs}
\author{François Pacaud \and
27 changes: 14 additions & 13 deletions tex/sections/conditioning.tex
@@ -31,28 +31,29 @@ \section{Conditioning of the condensed KKT system}
by exploiting the structured ill-conditioning of the condensed matrix $K$.
We base our analysis on \cite{wright1998ill}, where
the author has put a particular emphasis on the condensed KKT
system~\eqref{eq:kkt:condensed} without equality constraints. We generalize her results to the
system~\eqref{eq:kkt:condensed} without equality constraints. We generalize their results to the
matrix $K_\gamma$, which incorporates both equality and inequality
constraints. The results extend directly to $K_\tau$ (by letting the number of equalities be zero).
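% Editorial sketch (assumption): a plausible form of the condensed matrices discussed
% in this section, consistent with the $\gamma G^\top G$ term mentioned further below;
% the paper's actual definitions of $K$ and $K_\gamma$ may differ.
\begin{equation*}
  K = W + D_x + H^\top D_s H \; , \qquad
  K_\gamma = K + \gamma \, G^\top G \; ,
\end{equation*}
% where $W$ denotes the Hessian of the Lagrangian, $D_x$ and $D_s$ the diagonal
% barrier terms, and $G$ and $H$ the Jacobians of the equality and inequality
% constraints.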

To ease the notation, we suppose that the primal variable
$x$ is unconstrained, leaving only a bounded slack variable $s$ in \eqref{eq:slack_problem}.
To simplify the notation, we suppose that the primal variable
$x$ is unconstrained \ma{Not sure why we needed bound constraints on $x$ in the first place}, leaving only a bounded slack variable $s$ in \eqref{eq:slack_problem}.
This is equivalent to including the bounds on the variable $x$ in the inequality constraints,
as $\bar{h}(x) \leq 0$ with $\bar{h}(x) := (h(x), -x)$.
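% Editorial sketch (assumption): a slack formulation consistent with the simplification
% above; the paper's \eqref{eq:slack_problem} may be stated differently. With $x$
% unconstrained, the bound constraints appear only through the slack $s$:
\begin{equation*}
  \min_{x,\, s} \; f(x)
  \quad \text{s.t.} \quad g(x) = 0 \; , \quad h(x) + s = 0 \; , \quad s \geq 0 \; .
\end{equation*}
% Folding the bounds on $x$ into $\bar{h}(x) := (h(x), -x)$ recovers the general case.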

\subsection{Centrality conditions}
We start the discussion by recalling important results porting
on the iterates of the interior-point algorithm.
Let $p = (x, s, y, z)$ the current primal-dual iterate (the bound multiplier $v$ is considered apart),
and $p^\star$ a solution of the KKT conditions~\eqref{eq:kktconditions}. We suppose
We start the discussion by recalling important results about the iterates of the interior-point algorithm.
Let $p = (x, s, y, z)$ be the current primal-dual iterate (the bound multiplier $v$ is treated separately),
and $p^\star$ be a solution of the KKT conditions~\eqref{eq:kktconditions}. We suppose
that the solution $p^\star$ is regular and satisfies the following assumptions.
We denote $\cactive = \cactive(x^\star)$ the active-set at the optimal solution $x^\star$,
We denote $\cactive = \cactive(x^\star)$ the active-set at the optimal solution $x^\star$,
\ma{As I noted in the beginning; it does not make sense to denote the active set as you did unless you have no inequality constraints on $x$. Maybe you want to define the active set here since here is where you use it; after you make the assumption you have no ineq constraints on $x$}
and $\cinactive = \cinactive(x^\star)$ the inactive set.

% Sungho: not sure if this should be introduced here.
\begin{assumption}
\label{hyp:ipm}
Let $p^\star = (x^\star, s^\star, y^\star, z^\star)$ be a primal-dual solution
\ma{Most people would include the $v^\star$ in the primal dual solution}
satisfying the KKT conditions~\eqref{eq:kktconditions}. Let the following hold:
\begin{itemize}
\item Continuity: The Hessian $\nabla^2_{x x} L(\cdot)$ is Lipschitz continuous
@@ -67,7 +68,7 @@ \subsection{Centrality conditions}
We denote $\delta(p, v) = \| (p, v) - (p^\star, v^\star) \|$ the Euclidean distance to the
primal-dual stationary point $(p^\star, v^\star)$.
From \cite[Theorem 2.2]{wright2001effects}, if Assumption~\ref{hyp:ipm}
holds at $p^\star$ and $v > 0$,
holds at $p^\star$ and $v > 0$ \ma{Pretty sure that is true only locally},
\begin{equation}
\delta(p, v) = \Theta\left( \left\Vert \begin{bmatrix}
\nabla_p L(p, v) \\ \min(v, s)
@@ -93,9 +94,9 @@ \subsection{Centrality conditions}
\end{align}
\end{subequations}
for some constants $C > 0$ and $\alpha \in (0, 1)$.
Conditions~\eqref{eq:centralitycond:complement} ensures that the products
Conditions~\eqref{eq:centralitycond:complement} ensure that the products
$s_i v_i$ are not too disparate in the diagonal term $D_s$.
This condition is satisfied (despite rather loosely)
This condition is satisfied (even if rather loosely)
in the solver Ipopt (see \cite[Equation (16)]{wachter2006implementation}).
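% Editorial sketch (assumption): a representative form of the centrality conditions,
% following \cite{wright2001effects}; the exact statement of
% \eqref{eq:centralitycond:complement} in the paper may differ. With
% $\mu := s^\top v / m_i$ the average complementarity gap, the iterates satisfy
\begin{equation*}
  \| \nabla_p L(p, v) \| \leq C \, \mu \; , \qquad
  s_i v_i \geq \alpha \, \mu \quad \text{for all } i \; ,
\end{equation*}
% so that no individual product $s_i v_i$ becomes vanishingly small compared with the
% others, keeping the diagonal term $D_s$ structured as described above.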

\begin{proposition}[\cite{wright2001effects}, Lemma 3.2 and Theorem 3.3]
@@ -136,10 +137,10 @@ \subsubsection{Invariant subspaces in $K_\gamma$}
s_{min} = \min_{i \in \cactive} s_i \; , \quad
s_{max} = \max_{i \in \cactive} s_i \; .
\end{equation}
We remind that $m_e$ is the number of equality constraints,
We recall that $m_e$ is the number of equality constraints,
and define $\ell := m_e + m_a$.

We explicit the structured ill-conditioning of $K_\gamma$ by
We express the structured ill-conditioning of $K_\gamma$ by
modifying the approach outlined in \cite[Theorem 3.2]{wright1998ill} to account for the additional
term $\gamma G^\top G$ arising from the equality constraints.
We show that the matrix $K_\gamma$ has two invariant subspaces
10 changes: 5 additions & 5 deletions tex/sections/introduction.tex
@@ -3,7 +3,7 @@ \section{Introduction}
GPUs offer two practical advantages: (1) massive parallel computing capability for applications that can exploit coarse-grain parallelism and high memory bandwidth, and (2) power efficiency, since fewer transistors are required to process multiple tasks in parallel using ``single instruction, multiple data'' (SIMD) parallelism.

While GPUs have made significant strides in enhancing machine learning applications, their adoption in the mathematical programming community has been relatively limited.
This limitation stems primarily from the fact that most optimization solvers were developed in the 1990s and are heavily optimized for CPU architectures.
This limitation stems primarily from the fact that most second-order optimization methods for constrained optimization solve a form of Newton's method using direct linear algebra as finding good iterative solvers for the Newton direction has proved elusive.
Additionally, the utilization of GPUs has been impeded by the challenges associated with sparse matrix factorization routines, which are inherently difficult to parallelize on SIMD architectures. Nevertheless, recent years have witnessed notable advancements that are reshaping this landscape.
\begin{enumerate}
\item \textbf{Improved sparse matrix operations}: The performance of sparse matrix operations has seen substantial improvements in the CUDA library, largely attributed to the integration of novel tensor cores in recent GPUs~\cite{markidis2018nvidia}.
@@ -13,7 +13,7 @@ \end{enumerate}
\end{enumerate}

\subsection{Solving optimization problems on GPU: current state-of-the-art}
For all the reasons listed before, there is an increasing interest to solve optimization problems on GPUs.
For all the reasons listed before, there is an increased interest in solving optimization problems on GPUs.
We now summarize the previous work on solving classical---large-scale, sparse, constrained---mathematical programs on GPUs.

\paragraph{GPU for mathematical programming.}
@@ -49,7 +49,7 @@ \subsection{Solving optimization problems on GPU: current state-of-the-art}
\paragraph{GPU for nonlinear programming.}
The success of first-order algorithms in classical mathematical programming
relies on the convexity of the problem. Thus, this approach is nontrivial to replicate
in nonlinear programming: Most engineering problems encode complex
in general nonlinear programming: Most engineering problems encode complex
physical equations that are likely to break any convex structure in the problem.
Previous experiments on the alternating current (AC) optimal power flow (OPF) problem have shown that even an algorithm
as simple as the alternating direction method of multipliers (ADMM) has trouble converging as soon as the convergence
@@ -68,7 +68,7 @@ \subsection{Solving optimization problems on GPU: current state-of-the-art}
% SS: removed due to irrelevance.
% The new NVIDIA Grace CPU architecture could also be a game changer in the future of sparse linear solvers, thanks to fast communication between the CPU and GPU.
Furthermore, variants of interior point methods have been proposed
that does not depend on numerical pivoting in the linear solves,
that do not depend on numerical pivoting in the linear solves,
opening the door to parallelized sparse solvers.
Coupled with a GPU-accelerated automatic differentiation library and a
sparse Cholesky solver, these nonlinear programming solvers can solve
@@ -90,7 +90,7 @@ \subsection{Solving optimization problems on GPU: current state-of-the-art}
the interior-point methods, with promising results on the GPU.
On the other hand, null-space methods (also known as reduced Hessian methods)
reduce the KKT system down to a dense matrix, a setting also favorable for GPUs.
Our previous research has shown that the method plays nicely with the interior-point
Our previous research has shown that the approach is suitable for interior-point
methods if the number of degrees of freedom in the problem remains relatively small~\cite{pacaud2022condensed}.


15 changes: 8 additions & 7 deletions tex/sections/ipm.tex
@@ -76,7 +76,7 @@ \subsection{Problem formulation and KKT conditions}
The notation $x \perp u$ is a shorthand for the complementarity
condition $x_i u_i = 0$ (for all $i=1,\cdots, n$).

The set of active constraints at a point $x$ is denoted by
The set of active constraints at a point $x$ is denoted by \ma{This is odd. Don't you need to worry about the activity of $x$ itself?}
\begin{equation}
\mathcal{B}(x) := \{ i \in\{ 1, \cdots, m_i\} \; | \; h_i(x) = 0 \} \; .
\end{equation}
@@ -174,7 +174,7 @@ \subsection{Solving the KKT conditions with the interior-point method}
In addition, we define $X_k$, $S_k$, $U_k$ and $V_k$ as the diagonal matrices built respectively
from the vectors $x_k$, $s_k$, $u_k$ and $v_k$.
Note that~\eqref{eq:kkt:unreduced} can be symmetrized by performing simple block row and column operations.
In what follows, we will omit the index $k$ to alleviate the notations.
In what follows, we will omit the index $k$ to simplify the notations.

\paragraph{Augmented KKT system.}
It is usual to remove from \eqref{eq:kkt:unreduced} the blocks associated
@@ -210,7 +210,7 @@ \subsection{Solving the KKT conditions with the interior-point method}
$r_2 := z - \mu S^{-1} e$,
$r_3 := g(x)$,
$r_4 := h(x) + s$.
Once \eqref{eq:kkt:augmented} solved, we recover the updates on bound multipliers with
Once \eqref{eq:kkt:augmented} is solved, we recover the updates on bound multipliers with
$d_u = - X^{-1}(U d_x - \mu e) - u$ and
$d_v = - S^{-1}(V d_s - \mu e) - v$.
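% Editorial sketch (assumption): a standard augmented KKT system consistent with the
% residuals $r_2$, $r_3$, $r_4$ defined above, taking $D_x := X^{-1} U$ and
% $D_s := S^{-1} V$, and writing $r_1$ for the (not shown) dual residual in $x$;
% the ordering and sign conventions in the paper's \eqref{eq:kkt:augmented} may differ.
\begin{equation*}
  \begin{bmatrix}
    W + D_x & 0   & G^\top & H^\top \\
    0       & D_s & 0      & I      \\
    G       & 0   & 0      & 0      \\
    H       & I   & 0      & 0
  \end{bmatrix}
  \begin{bmatrix} d_x \\ d_s \\ d_y \\ d_z \end{bmatrix}
  = -
  \begin{bmatrix} r_1 \\ r_2 \\ r_3 \\ r_4 \end{bmatrix} \; ,
\end{equation*}
% after which the bound-multiplier directions $d_u$ and $d_v$ are recovered from
% $d_x$ and $d_s$ as stated above.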

@@ -221,6 +221,7 @@ \subsection{Solving the KKT conditions with the interior-point method}
if and only if the Jacobian $J = \begin{bmatrix} G \; &\; 0 \\ H \;&\; I \end{bmatrix}$
is full row-rank and the matrix $\begin{bmatrix} W + D_x & 0 \\ 0 & D_s \end{bmatrix}$
projected onto the null-space of the Jacobian $J$ is positive-definite~\cite{benzi2005numerical}.
\ma{I do not think that is an if and only if conditions. The Jacobian needs to be full rank. but the other matrix need not be positive definite in the projection for the system to be nonsingular}
The condition is satisfied if the inertia (defined as the respective numbers
of positive, negative and zero eigenvalues) of the matrix~\eqref{eq:kkt:augmented} is $(n + m_i, m_i + m_e, 0)$.
We use the inertia-controlling method introduced in \cite{wachter2006implementation}
@@ -231,7 +232,7 @@ \subsection{Solving the KKT conditions with the interior-point method}
an inertia-revealing \lblt factorization~\cite{duff1983multifrontal}.
Krylov methods are often not competitive when solving~\eqref{eq:kkt:augmented},
as the block diagonal terms $D_x$ and $D_s$ become increasingly
ill-conditioned near the solution. Their use in IPM has been limited in
ill-conditioned near the solution. Their use in IPM has been limited to
linear and convex quadratic programming \cite{gondzio-2012} (when paired
with a suitable preconditioner). We also refer to \cite{cao2016augmented}
for an efficient implementation of a preconditioned conjugate gradient
@@ -289,7 +290,7 @@ \subsection{Solving the KKT conditions with the interior-point method}
the elements in the diagonal tend to infinity if a variable converges to its bound,
and to $0$ if the variable is inactive.
To address the numerical error arising from such ill-conditioning, most of the
implementation of IPM employs Richardson iterations on the original system~\eqref{eq:kkt:unreduced} to refine the solution returned by the direct sparse linear solver (see \cite[Section 3.10]{wachter2006implementation}).
implementations of IPM employ Richardson iterations on the original system~\eqref{eq:kkt:unreduced} to refine the solution returned by the direct sparse linear solver (see \cite[Section 3.10]{wachter2006implementation}).
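% Editorial sketch (assumption): the Richardson (iterative refinement) loop alluded to
% above, written here for the unreduced system, denoted $K_u \, d = r$, where
% $\widehat{K}_u^{-1}$ stands for an approximate solve built from the factorization
% computed by the direct sparse linear solver.
\begin{equation*}
  d^{(0)} := \widehat{K}_u^{-1} r \; , \qquad
  d^{(j+1)} := d^{(j)} + \widehat{K}_u^{-1} \bigl( r - K_u \, d^{(j)} \bigr) \; ,
  \quad j = 0, 1, 2, \dots
\end{equation*}
% The refinement stops once the residual $r - K_u \, d^{(j)}$ falls below the working
% tolerance, mitigating the ill-conditioning described above.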

\subsection{Discussion}

@@ -303,10 +304,10 @@ \subsection{Discussion}
The system~\eqref{eq:kkt:augmented} is usually factorized using a \lblt factorization: for sparse matrices, the Duff and Reid
multifrontal algorithm~\cite{duff1983multifrontal} is the favored method (as implemented in the HSL linear solvers MA27 and MA57~\cite{duff2004ma57}).
The condensed KKT system~\eqref{eq:kkt:condensed} is often discarded,
as its conditioning is higher
as its conditioning is worse
than \eqref{eq:kkt:augmented} (implying less accurate solutions).
Additionally, condensation may result in increased fill-in within the condensed system \eqref{eq:kkt:condensed}~\cite[Section 19.3, p.571]{nocedal_numerical_2006}.
In the worst cases \eqref{eq:kkt:condensed} itself may become fully dense if an inequality row is completely dense (fortunately, a case rarer than in the normal equations used in linear programming).
In the worst cases \eqref{eq:kkt:condensed} itself may become fully dense if an inequality row is completely dense (fortunately, a case rarer than in the normal equations used in linear programming \ma{Do you really mean normal equations as in regression or do you mean equations commonly encountered in linear programming?}).
Consequently, condensation methods are not commonly utilized in practical optimization settings.
To the best of our knowledge, Artelys Knitro~\cite{waltz2006interior} is the only solver that supports computing the descent direction with \eqref{eq:kkt:condensed}.
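% Editorial sketch (assumption): one way to see the fill-in mentioned above. If the
% inequality contribution to the condensed matrix takes the usual form $H^\top D_s H$,
% then
\begin{equation*}
  H^\top D_s H = \sum_{i=1}^{m_i} (D_s)_{ii} \, \nabla h_i(x) \, \nabla h_i(x)^\top \; ,
\end{equation*}
% so a single inequality constraint $h_i$ with a dense gradient contributes a fully
% dense rank-one term, which can make \eqref{eq:kkt:condensed} dense even when
% \eqref{eq:kkt:augmented} is sparse.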

2 changes: 1 addition & 1 deletion tex/sections/kkt_systems.tex
@@ -106,7 +106,7 @@ \subsection{Equality relaxation strategy: LiftedKKT}

\subsection{Discussion}
We have introduced two algorithms to solve
KKT systems on the GPU. On the contrary to classical implementations,
KKT systems on the GPU. As opposed to classical implementations,
the two methods do not require computing a sparse \lblt factorization of the KKT
system and use instead alternate reformulations based on the condensed KKT
system~\eqref{eq:kkt:condensed}. Both strategies
4 changes: 2 additions & 2 deletions tex/sections/numerics.tex
@@ -17,7 +17,7 @@ \subsection{Implementation}
We have implemented the two condensed-space methods in our nonlinear IPM solver MadNLP~\cite{shin2021graph}.
This implementation utilizes the abstraction {\tt AbstractKKTSystem}
in MadNLP to represent the various formulations of the KKT linear systems.
MadNLP can deport most of the IPM algorithm to the GPU, except for basic IPM operations used for the coordination (e.g., the filter line-search algorithm).
MadNLP can push most of the IPM algorithm to the GPU, except for basic IPM operations used for the coordination (e.g., the filter line-search algorithm).
In particular, any operation that involves the manipulation of array entries is performed by GPU kernels without transferring data to host memory.
We refer to \cite{shin2023accelerating} for a detailed description of the GPU implementation in MadNLP.

@@ -383,7 +383,7 @@ \subsection{Benchmark on OPF instances}
\subsection{Benchmark on COPS instances}
\label{sec:num:cops}
We have observed in the previous section that both LiftedKKT
and HyKKT outperforms HSL MA27 when running on the GPU.
and HyKKT outperform HSL MA27 when running on the GPU.
However, the OPF instances form a specific class of nonlinear programs.
For that reason, we complement our analysis by looking
at the performance of LiftedKKT and HyKKT on large-scale COPS instances~\cite{dolan2004benchmarking}.
