diff --git a/.nojekyll b/.nojekyll
index 2dc4847..61b34ca 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-03b73c09
\ No newline at end of file
+1bfa8f0c
\ No newline at end of file
diff --git a/cont_dp_DDP 2.html b/cont_dp_DDP 2.html
new file mode 100644
index 0000000..e00c3dc
--- /dev/null
+++ b/cont_dp_DDP 2.html
@@ -0,0 +1,1174 @@
+
+
Dynamic programming for continuous-time optimal control
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
In the previous sections we investigated both direct and indirect approaches to the optimal control problem. As in the discrete-time case, the two approaches are complemented by dynamic programming. Indeed, Bellman’s key idea, which we previously formulated in discrete time, can be extended to continuous time as well.
+
We consider the continuous-time system
+\dot{\bm{x}} = \mathbf f(\bm{x},\bm{u},t)
+ with the cost function
+J(\bm x(t_\mathrm{i}), \bm u(\cdot), t_\mathrm{i}) = \phi(\bm x(t_\mathrm{f}),t_\mathrm{f}) + \int_{t_\mathrm{i}}^{t_\mathrm{f}}L(\bm x(t),\bm u(t),t)\, \mathrm d t.
+
+
Optionally we can also consider constraints on the state at the final time (be it a particular value or some set of values)
+\psi(\bm x(t_\mathrm{f}),t_\mathrm{f})=0.
+
+
+
Hamilton-Jacobi-Bellman (HJB) equation
+
We now consider an arbitrary time t and split the (remaining) time interval [t,t_\mathrm{f}] into two parts [t,t+\Delta t] and [t+\Delta t,t_\mathrm{f}] , and structure the cost function accordingly
+J(\bm x(t),\bm u(\cdot),t) = \int_{t}^{t+\Delta t} L(\bm x,\bm u,\tau)\,\mathrm{d}\tau + \underbrace{\int_{t+\Delta t}^{t_\mathrm{f}} L(\bm x,\bm u,\tau)\,\mathrm{d}\tau + \phi(\bm x(t_\mathrm{f}),t_\mathrm{f})}_{J(\bm x(t+\Delta t), \bm u(\cdot), t+\Delta t)}.
+
Invoking Bellman’s principle of optimality on the first subinterval, the optimal cost satisfies J^\star(\bm x,t) = \min_{\bm u(\tau),\;t\leq\tau\leq t+\Delta t}\left[\int_{t}^{t+\Delta t} L(\bm x,\bm u,\tau)\,\mathrm{d}\tau + J^\star(\bm x+\Delta \bm x, t+\Delta t)\right]. Approximating the integral by L\Delta t and performing a Taylor series expansion of J^\star(\bm x+\Delta \bm x, t+\Delta t) about (\bm x,t), we get
+J^\star(\bm x,t) = \min_{\bm u(\tau),\;t\leq\tau\leq t+\Delta t} \left[L\Delta t + J^\star(\bm x,t) + (\nabla_{\bm x} J^\star)^\top \Delta \bm x + \frac{\partial J^\star}{\partial t}\Delta t + \mathcal{O}((\Delta t)^2)\right].
+
+
Using
+\Delta \bm x = \bm f(\bm x,\bm u,t)\Delta t
+ and noting that J^\star and J_t^\star are independent of \bm u(\tau),\;t\leq\tau\leq t+\Delta t, we get
+\cancel{J^\star (\bm x,t)} = \cancel{J^\star (\bm x,t)} + \frac{\partial J^\star }{\partial t}\Delta t + \min_{\bm u(\tau),\;t\leq\tau\leq t+\Delta t}\left[L\Delta t + (\nabla_{\bm x} J^\star )^\top f\Delta t\right].
+
Dividing by \Delta t and letting \Delta t \to 0, we arrive at
-\frac{\partial J^\star(\bm x,t)}{\partial t} = \min_{\bm u(t)}\left[L(\bm x,\bm u,t) + (\nabla_{\bm x} J^\star(\bm x,t))^\top \mathbf f(\bm x,\bm u,t)\right].
This is obviously a partial differential equation (PDE) for the optimal cost function J^\star(\bm x,t).
+
And since this is a differential equation, boundary value(s) must be specified to determine a unique solution. In particular, since the equation is first-order with respect to both time and state, specifying the value of the optimal cost function at the final state and the final time is enough. With the general final-state constraints we have introduced above, the boundary value condition reads
+J^\star (\bm x(t_\mathrm{f}),t_\mathrm{f}) = \phi(\bm x(t_\mathrm{f}),t_\mathrm{f}),\qquad \text{on the hypersurface } \psi(\bm x(t_\mathrm{f}),t_\mathrm{f}) = 0.
+
+
Note that this includes as special cases the fixed-final-state and free-final-state cases.
+
+
+
HJB equation and Hamiltonian
+
Recall the definition of Hamiltonian H(\bm x,\bm u,\bm \lambda,t) = L(\bm x,\bm u,t) + \boldsymbol{\lambda}^\top \mathbf f(\bm x,\bm u,t). The HJB equation can also be written as \boxed
+{-\frac{\partial J^\star (\bm x(t),t)}{\partial t} = \min_{\bm u(t)}H(\bm x(t),\bm u(t),\nabla_{\bm x} J^\star (\bm x(t),t),t).}
+
+
What we have just derived is one of the most profound results in optimal control – Hamiltonian must be minimized by the optimal control. We will exploit it next for some derivations.
+
Recall also that we have already encountered a similar result that makes a statement about the necessary maximization (or minimization) of the Hamiltonian with respect to the control – the celebrated Pontryagin’s principle of maximum (or minimum).
Dynamic programming for continuous-time optimal control
with the cost function
J(\bm x(t_\mathrm{i}), \bm u(\cdot), t_\mathrm{i}) = \phi(\bm x(t_\mathrm{f}),t_\mathrm{f}) + \int_{t_\mathrm{i}}^{t_\mathrm{f}}L(\bm x(t),\bm u(t),t)\, \mathrm d t.
-
Optionally we can also consider constraints on the state at the final time (be it a particular value or some set of values)
-\psi(\bm x(t_\mathrm{f}),t_\mathrm{f})=0.
-
+
The final time can be fixed to a particular value t_\mathrm{f}, in which case the state at the final time \bm x(t_\mathrm{f}) is either free (unspecified but penalized through \phi(\bm x(t_\mathrm{f}))), or it is fixed (specified and not penalized, that is, \bm x(t_\mathrm{f}) = \mathbf x^\mathrm{ref}).
+
The final time can also be free (regarded as an optimization variable itself), in which case general constraints on the state at the final time can be expressed as
+\psi(\bm x(t_\mathrm{f}),t_\mathrm{f})=0
+ or possibly even using an inequality, which we will not consider here.
+
The final time can also be considered infinity, that is, t_\mathrm{f}=\infty, but we will handle this situation later separately.
Hamilton-Jacobi-Bellman (HJB) equation
We now consider an arbitrary time t and split the (remaining) time interval [t,t_\mathrm{f}] into two parts [t,t+\Delta t] and [t+\Delta t,t_\mathrm{f}] , and structure the cost function accordingly
@@ -795,18 +803,50 @@
This is obviously a partial differential equation (PDE) for the optimal cost function J^\star(\bm x,t).
-
And since this is a differential equation, boundary value(s) must be specified to determine a unique solution. In particular, since the equation is first-order with respect to both time and state, specifying the value of the optimal cost function at the final state and the final time is enough. With the general final-state constraints we have introduced above, the boundary value condition reads
+
+
Boundary conditions for the HJB equation
+
Since the HJB equation is a differential equation, initial/boundary value(s) must be specified to determine a unique solution. In particular, since the equation is first-order with respect to both time and state, specifying the value of the optimal cost function at the final state and the final time is enough.
+
For a fixed-final-time, free-final-state, the optimal cost at the final time is
+J^\star (\bm x(t_\mathrm{f}),t_\mathrm{f}) = \phi(\bm x(t_\mathrm{f}),t_\mathrm{f}).
+
+
For a fixed-final-time, fixed-final-state, since the component of the cost function corresponding to the terminal state is zero, the optimal cost at the final time is zero as well
+J^\star (\bm x(t_\mathrm{f}),t_\mathrm{f}) = 0.
+
+
With the general final-state constraints introduced above, the boundary value condition reads
J^\star (\bm x(t_\mathrm{f}),t_\mathrm{f}) = \phi(\bm x(t_\mathrm{f}),t_\mathrm{f}),\qquad \text{on the hypersurface } \psi(\bm x(t_\mathrm{f}),t_\mathrm{f}) = 0.
-
Note that this includes as special cases the fixed-final-state and free-final-state cases.
-
-
HJB equation and Hamiltonian
+
+
Optimal control using the optimal cost (-to-go) function
+
Assume now that the solution J^\star (\bm x(t),t) to the HJB equation is available. We can then find the optimal control by the minimization \boxed
+{\bm u^\star(t) = \arg\min_{\bm u(t)}\left[L(\bm x(t),\bm u(t),t)+(\nabla_{\bm x} J^\star (\bm x(t),t))^\top \bm f(\bm x(t),\bm u(t),t)\right].}
+
+
For convenience, the minimized function is often labelled as
+Q(\bm x(t),\bm u(t),t) = L(\bm x(t),\bm u(t),t)+(\nabla_{\bm x} J^\star (\bm x(t),t))^\top \bm f(\bm x(t),\bm u(t),t)
+ and called just Q-function. The optimal control is then
+\bm u^\star(t) = \arg\min_{\bm u(t)} Q(\bm x(t),\bm u(t),t).
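+As a toy, purely illustrative sketch (nothing in it comes from the text), suppose we have a stand-in scalar system, a running cost, and an assumed gradient of the optimal cost function; the optimal control at a given state and time is then obtained by minimizing the Q-function pointwise, here crudely over a grid of candidate controls:
+
+f(x, u) = -x^3 + u                       # hypothetical scalar dynamics
+L(x, u) = (x^2 + u^2)/2                  # hypothetical running cost
+∇J(x, t) = x                             # stand-in for the gradient of an (assumed known) optimal cost
+
+Q(x, u, t) = L(x, u) + ∇J(x, t)*f(x, u)  # the Q-function
+
+ucandidates = range(-5.0, 5.0, length=1001)
+ustar(x, t) = ucandidates[argmin([Q(x, u, t) for u in ucandidates])]
+
+ustar(1.0, 0.0)   # ≈ -1.0, since here ∂Q/∂u = u + ∇J(x,t) = 0 gives u = -∇J(x,t)
+
+In practice the pointwise minimization would of course be done with a proper (possibly constrained) optimization routine, or in closed form whenever the Q-function is quadratic in the control.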
+
+
+
+
+
HJB equation formulated using a Hamiltonian
Recall the definition of Hamiltonian H(\bm x,\bm u,\bm \lambda,t) = L(\bm x,\bm u,t) + \boldsymbol{\lambda}^\top \mathbf f(\bm x,\bm u,t). The HJB equation can also be written as \boxed
{-\frac{\partial J^\star (\bm x(t),t)}{\partial t} = \min_{\bm u(t)}H(\bm x(t),\bm u(t),\nabla_{\bm x} J^\star (\bm x(t),t),t).}
-
What we have just derived is one of the most profound results in optimal control – Hamiltonian must be minimized by the optimal control. We will exploit it next for some derivations.
-
Recall also that we have already encountered a similar results that made statements about the necessary maximization (or minimization) of the Hamiltonian with respect to the control – the celebrated Pontryagin’s principle of maximum (or minimum).
+
What we have just derived is one of the most profound results in optimal control – Hamiltonian must be minimized by the optimal control. We will exploit it next as a tool for deriving some theoretical results.
+
+
+
HJB equation vs Pontryagin’s principle of maximum (minimum)
+
Recall also that we have already encountered a similar result that makes a statement about the necessary maximization (or minimization) of the Hamiltonian with respect to the control – the celebrated Pontryagin’s principle of maximum (or minimum). Are these two related? Are they equivalent?
+
+
+
HJB equation for an infinite time horizon
+
When both the system and the cost function are time-invariant, and the final time is infinite, that is, t_\mathrm{f}=\infty, the optimal cost function J^\star(\cdot) must necessarily be independent of time, that is, its partial derivative with respect to time is zero, \frac{\partial J^\star (\bm x(t),t)}{\partial t} = 0. The HJB equation then simplifies to
0 = \min_{\bm u} H(\bm x,\bm u,\nabla_{\bm x} J^\star(\bm x)).
Using HJB equation to solve the continuous-time LQR problem
+
As we have already discussed a couple of times, in the LQR problem we consider a linear time invariant (LTI) system modelled by
+\dot{\bm x}(t) = \mathbf A\bm x(t) + \mathbf B\bm u(t),
+ and the quadratic cost function
+J(\bm x(t_\mathrm{i}),\bm u(\cdot), t_\mathrm{i}) = \frac{1}{2}\bm x^\top(t_\mathrm{f})\mathbf S_\mathrm{f}\bm x(t_\mathrm{f}) + \frac{1}{2}\int_{t_\mathrm{i}}^{t_\mathrm{f}}\left(\bm x^\top \mathbf Q\bm x + \bm u^\top \mathbf R \bm u\right)\mathrm{d}t.
+
+
The Hamiltonian is
+H(\bm x,\bm u,\bm \lambda) = \frac{1}{2}\left(\bm x^\top \mathbf Q\bm x + \bm u^\top \mathbf R \bm u\right) + \boldsymbol{\lambda}^\top \left(\mathbf A\bm x + \mathbf B\bm u\right).
+
+
According to the HJB equation our goal is to minimize H at a given time t, which enforces the condition on its gradient
+\mathbf 0 = \nabla_{\bm u} H = \mathbf R\bm u + \mathbf B^\top \boldsymbol\lambda,
+ from which it follows that the optimal control must necessarily satisfy
+\bm u^\star = -\mathbf R^{-1} \mathbf B^\top \boldsymbol\lambda.
+
+
Since the Hessian of the Hamiltonian with respect to \bm u is positive definite thanks to our assumption that \mathbf R is positive definite,
+\nabla_{\bm u \bm u}^2 H = \mathbf R > 0,
+ the Hamiltonian is indeed minimized by the above choice of \bm u^\star.
+
The minimized Hamiltonian is
+\min_{\bm u(t)}H(\bm x, \bm u, \bm \lambda) = \frac{1}{2}\bm x^\top \mathbf Q \bm x + \boldsymbol\lambda^\top \mathbf A \bm x - \frac{1}{2}\boldsymbol\lambda^\top \mathbf B\mathbf R^{-1}\mathbf B^\top \boldsymbol\lambda
+
+
Setting \boldsymbol\lambda = \nabla_{\bm x} J^\star, the HJB equation is \boxed
+{-\frac{\partial J^\star}{\partial t} = \frac{1}{2}\bm x^\top \mathbf Q \bm x + (\nabla_{\bm x} J^\star)^\top \mathbf A\bm x - \frac{1}{2}(\nabla_{\bm x} J^\star)^\top \mathbf B\mathbf R^{-1}\mathbf B^\top \nabla_{\bm x} J^\star,}
+ and the boundary condition is
+J^\star(\bm x(t_\mathrm{f}),t_\mathrm{f}) = \frac{1}{2}\bm x^\top (t_\mathrm{f})\mathbf S_\mathrm{f}\bm x(t_\mathrm{f}).
+
+
We can now proceed by assuming that the optimal cost function is quadratic in \bm x for all other times t, that is, there must exist a symmetric matrix function \mathbf S(t) such that
+J^\star(\bm x(t),t) = \frac{1}{2}\bm x^\top (t)\mathbf S(t)\bm x(t).
+
+
+
+
+
+
+
+Note
+
+
+
+
Recall that we did something similar when making a sweep assumption to derive a Riccati equation following the indirect approach – we just make an inspired guess and see if it works. Here the inspiration comes from the observation made elsewhere, that the optimal cost function in the LQR problem is quadratic in \bm x.
+
+
+
We now aim at substituting this into the HJB equation. Observe that \frac{\partial J^\star}{\partial t}=\frac{1}{2}\bm x^\top(t) \dot{\mathbf{S}}(t) \bm x(t) and \nabla_{\bm x} J^\star = \mathbf S \bm x. Upon substitution to the HJB equation, we get
+
+-\frac{1}{2}\bm x^\top \dot{\mathbf{S}} \bm x = \frac{1}{2}\bm x^\top \mathbf Q \bm x + \bm x^\top \mathbf S \mathbf A\bm x - \frac{1}{2}\bm x^\top \mathbf S \mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S \bm x.
+
+
This can be reformatted as
+-\frac{1}{2}\bm x^\top \dot{\mathbf{S}} \bm x = \frac{1}{2} \bm x^\top \left[\mathbf Q + 2 \mathbf S \mathbf A - \mathbf S \mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S \right ] \bm x.
+
+
Notice that the middle matrix in the square brackets is not symmetric. Symmetrizing it (with no effect on the resulting value of the quadratic form) we get
+
+-\frac{1}{2}\bm x^\top \dot{\mathbf{S}} \bm x = \frac{1}{2} \bm x^\top \left[\mathbf Q + \mathbf S \mathbf A + \mathbf A^\top \mathbf S - \mathbf S \mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S \right ] \bm x.
+
+
Finally, since the above single (scalar) equation should hold for all \bm x(t), the matrix equation must hold too, and we get the familiar differential Riccati equation for the matrix variable \mathbf S(t)\boxed
+{-\dot{\mathbf S}(t) = \mathbf A^\top \mathbf S(t) + \mathbf S(t)\mathbf A - \mathbf S(t)\mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S(t) + \mathbf Q}
+ initialized at the final time t_\mathrm{f} by \mathbf S(t_\mathrm{f}) = \mathbf S_\mathrm{f}.
+
Having obtained \mathbf S(t), we can get the optimal control by substituting it into \boxed
+{
+\begin{aligned}
+ \bm u^\star(t) &= - \mathbf R^{-1}\mathbf B^\top \nabla_{\bm x} J^\star(\bm x(t),t) \\
+ &= - \underbrace{\mathbf R^{-1}\mathbf B^\top \mathbf S(t)}_{\bm K(t)}\bm x(t).
+\end{aligned}
+}
+
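+As a small illustrative sketch (not part of the text), the differential Riccati equation can be integrated backward in time numerically, for instance with DifferentialEquations.jl and some assumed example data:
+
+using LinearAlgebra
+using DifferentialEquations
+
+# Assumed example data for a second-order system (purely illustrative).
+A  = [0.0 1.0; -1.0 -0.5]
+B  = reshape([0.0, 1.0], 2, 1)
+Q  = diagm(0 => [1.0, 1.0])
+R  = reshape([0.1], 1, 1)
+Sf = diagm(0 => [1.0, 1.0])
+tfinal = 5.0
+
+# Right-hand side of -Ṡ = A'S + SA - SBR⁻¹B'S + Q, rewritten as an ODE for S(t)
+# and integrated backward from t = tfinal, where S(tfinal) = Sf.
+function riccati!(dS, S, p, t)
+    dS .= -(A'*S + S*A - S*B*(R\(B'*S)) + Q)
+end
+
+prob = ODEProblem(riccati!, Sf, (tfinal, 0.0))   # note the reversed time span
+Ssol = solve(prob, Tsit5())
+
+K(t) = R\(B'*Ssol(t))                            # time-varying feedback gain K(t) = R⁻¹B'S(t)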
+
We have just rederived the solution to the continuous-time LQR problem using the HJB equation (previously we obtained it by massaging the two-point boundary value problem that followed as a necessary condition of optimality from the techniques of calculus of variations).
+
Note that we have also just seen the equivalence between a first-order nonlinear PDE and a first-order nonlinear (matrix) ODE.
+
+
+
+
+
+
\ No newline at end of file
diff --git a/cont_dp_LQR.html b/cont_dp_LQR.html
index 59d0664..9da3ddf 100644
--- a/cont_dp_LQR.html
+++ b/cont_dp_LQR.html
@@ -796,7 +796,7 @@
Using HJB equation to solve the continuous-time LQR problem
-
Recall that we did something similar when making a sweep assumption to derive a Riccati equation following the indirect approach – we just make an inspired guess and see if it works. Here the inspiration comes from the observation made elsewhere, that the optimal cost function in the LQR problem is quadratic in \bm x.
+
Recall that we did something similar when making a sweep assumption to derive a Riccati equation following the indirect approach – we just make an inspired guess and see if it solves the equation. Here the inspiration comes from the observation made elsewhere, that the optimal cost function in the LQR problem is quadratic in \bm x.
We now aim at substituting this into the HJB equation. Observe that \frac{\partial J^\star}{\partial t}=\frac{1}{2}\bm x^\top(t) \dot{\mathbf{S}}(t) \bm x(t) and \nabla_{\bm x} J^\star = \mathbf S \bm x. Upon substitution to the HJB equation, we get
Dynamic programming for continuous-time systems is typically not covered in standard texts on dynamic programming, because those mainly focus on discrete-time systems. But there is no shortage of discussions of the HJB equation in control theory texts. Our introductory treatment here is based on Section 6.3 in Lewis, Vrabie, and Syrmos (2012).
+
+The classical Anderson and Moore (2007) uses the HJB equation as the main tool for solving various versions of the LQR problem.
+
Liberzon (2011) discusses the HJB equation in Chapter 5. In Section 5.2 the connection with Pontryagin’s principle is also discussed.
Through this chapter we are entering into the realm of continuous-time optimal control – we are going to consider dynamical systems that evolve in continuous time, and we are going to search for control that also evolves in continuous time.
+
+
+
+
+
+
+Continuous-time or just continuous?
+
+
+
+
Sometimes (and actually very often) we can encounter the terms continuous systems and continuous control. We find the terminology rather unfortunate because it is not clear what the adjective continuous refers to. For example, it might (incorrectly) suggest that the control is a continuous function of time (or state), when we only mean that it is the time that evolves continuously; the control can be discontinuous as a function of time and/or state. That is why in our course we prefer the more explicit term continuous-time instead of just continuous.
+
+
+
+
Continuous-time optimal control problem
+
We start by considering a nonlinear continuous-time system modelled by the state equation
+\dot{\bm{x}}(t) = \mathbf f(\bm x(t),\bm u(t), t),
+ where
+
+
\bm x(t) \in \mathbb R^n is the state vector at the continuous time t\in \mathbb R,
+
\bm u(t) \in \mathbb R^m is the control vector at the continuous time t,
+
\mathbf f: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb R \to \mathbb{R}^n is the state transition function (in general not only nonlinear but also time-varying).
+
+
A general nonlinear continuous-time optimal control problem (OCP) is then formulated as
+\begin{aligned}
+\operatorname*{minimize}_{\bm u(\cdot), \bm x(\cdot)}&\quad \left(\phi(\bm x(t_\mathrm{f}),t_\mathrm{f}) + \int_{t_\mathrm{i}}^{t_\mathrm{f}} L(\bm x(t),\bm u(t),t) \; \mathrm{d}t \right)\\
+\text{subject to} &\quad \dot {\bm{x}}(t) = \mathbf f(\bm x(t),\bm u(t),t),\quad t \in [t_\mathrm{i},t_\mathrm{f}],\\
+ &\quad \bm u(t) \in \mathcal U(t),\\
+ &\quad \bm x(t) \in \mathcal X(t),
+\end{aligned}
+ where
+
+
t_\mathrm{i} is the initial continuous time,
+
t_\mathrm{f} is the final continuous time,
+
\phi() is a terminal cost function that penalizes the state at the final time (and possibly the final time too if it is regarded as an optimization variable),
+
L() is a running (also stage) cost function,
+
and \mathcal U(t) and \mathcal X(t) are (possibly time-dependent) sets of feasible controls and states – these sets are typically expressed using equations and inequalities. Should they be constant (not changing in time), the notation is just \mathcal U and \mathcal X.
+
+
Oftentimes it is convenient to handle the constraints of the initial and final states separately:
+\begin{aligned}
+\operatorname*{minimize}_{\bm u(\cdot), \bm x(\cdot)}&\quad \left(\phi(\bm x(t_\mathrm{f}),t_\mathrm{f}) + \int_{t_\mathrm{i}}^{t_\mathrm{f}} L(\bm x(t),\bm u(t),t) \; \mathrm{d}t \right)\\
+\text{subject to} &\quad \dot {\bm{x}}(t) = \mathbf f(\bm x(t),\bm u(t),t),\quad t \in [t_\mathrm{i},t_\mathrm{f}],\\
+ &\quad \bm u(t) \in \mathcal U(t),\\
+ &\quad \bm x(t) \in \mathcal X(t),\\
+ &\quad \bm x(t_\mathrm{i}) \in \mathcal X_\mathrm{init},\\
+ &\quad \bm x(t_\mathrm{f}) \in \mathcal X_\mathrm{final}.
+\end{aligned}
+
+
In particular, at the initial time just one particular state is often considered. At the final time, the state might be required to be equal to some given value, it might be required to be in some set defined through equations or inequalities, or it might be left unconstrained. Finally, the constraints on the control and states typically (but not always) come in the form of lower and upper bounds. The optimal control problem then specializes to
+\begin{aligned}
+\operatorname*{minimize}_{\bm u(\cdot), \bm x(\cdot)}&\quad \left(\phi(\bm x(t_\mathrm{f}),t_\mathrm{f}) + \int_{t_\mathrm{i}}^{t_\mathrm{f}} L(\bm x(t),\bm u(t),t) \; \mathrm{d}t \right)\\
+\text{subject to} &\quad \dot {\bm{x}}(t) = \mathbf f(\bm x(t),\bm u(t),t),\quad t \in [t_\mathrm{i},t_\mathrm{f}],\\
+ &\quad \bm u_{\min} \leq \bm u(t) \leq \bm u_{\max},\\
+ &\quad \bm x_{\min} \leq \bm x(t) \leq \bm x_{\max},\\
+ &\quad \bm x(t_\mathrm{i}) = \mathbf x^\text{init},\\
+ &\quad \left(\bm x(t_\mathrm{f}) = \mathbf x^\text{ref}, \; \text{or} \; \mathbf h_\text{final}(\bm x(t_\mathrm{f})) = \mathbf 0, \text{or} \; \mathbf g_\text{final}(\bm x(t_\mathrm{f})) \leq \mathbf 0\right),
+\end{aligned}
+ where
+
+
the inequalities should be interpreted componentwise,
+
\bm u_{\min} and \bm u_{\max} are lower and upper bounds on the control, respectively,
+
\bm x_{\min} and \bm x_{\max} are lower and upper bounds on the state, respectively,
+
\mathbf x^\text{init} is a fixed initial state,
+
\mathbf x^\text{ref} is a required (reference) final state,
+
and the functions \mathbf g_\text{final}() and \mathbf h_\text{final}() can be used to define the constraint set for the final state.
+
+
+
+
+
+
+
+Classification of optimal control problems: Bolza, Mayer, and Lagrange problems
+
+
+
+
The cost function in the above defined optimal control problem contains both the cost incurred at the final time and the cumulative cost (the integral of the running cost) incurred over the whole interval. An optimal control problem with this general cost function is called Bolza problem in the literature. If the cost function only penalizes the final state and the final time, the problem is called Mayer problem. If the cost function only penalizes the cumulative cost, the problem is called Lagrange problem.
+
+
+
+
+
Why continuous-time optimal control?
+
Why are we interested in continuous-time optimal control when at the end of the day most if not all controllers are implemented using computers and hence in discrete time? There are several reasons:
+
+
The theory for continuous-time optimal control is highly mature and represents a pinnacle of human ingenuity. It would be a pity to ignore it. It is also much richer than the theory for discrete-time optimal control. For example, when considering the time-optimal control, we can differentiate the cost function with respect to the final time because it is a continuous (real) variable.
+
Although the theoretical concepts needed for continuous-time optimal control are more advanced (integrals and derivatives instead of sums and differences, function spaces instead of spaces of sequences, calculus of variations instead of differential calculus), the results are often simpler than in the discrete-time case – the resulting formulas just look neater and more compact.
+
We will see later that methods for solving general continuous-time optimal control problems must use some kind of (temporal) discretization. Isn’t it then enough to study discretization and discrete-time optimal control separately? It will turn out that discretization can be regarded as a part of the solution process.
+
+
+
+
Approaches to continuous-time optimal control
+
There are essentially the same three approaches to continuous-time optimal control as we have seen when studying discrete-time optimal control:
+
+
indirect approaches,
+
direct approaches,
+
dynamic programming.
+
+
+
+
+
+
+
+
+
In this chapter we start with the indirect approach. But first we need to introduce the framework of calculus of variations, which is what we do in the next section.
The indirect approach to optimal control is based on the calculus of variations (and its later extension in the form of Pontryagin’s principle of maximum). Calculus of variations is an advanced mathematical discipline that requires non-trivial foundations and effort to master. In our course, however, we take the liberty of aiming for intuitive understanding rather than mathematical rigor. At roughly the same level, the calculus of variations is introduced in books on optimal control, such as the classic and affordable (Kirk 2004), the popular and freely available online (Lewis, Vrabie, and Syrmos 2012), or the very accessible and also freely available online (Liberzon 2011).
+
With anticipation, we provide here a reference to the paper (Sussmann and Willems 1997), which shows how the celebrated Pontryagin’s principle of maximum extends the calculus of variations significantly. But we will only discuss this in the next chapter.
+
For those interested in having a standard reference for the calculus of variations, the classic (Gelfand and Fomin 2020) is recommended, all the more so because it is fairly slim.
+Sussmann, H. J., and J. C. Willems. 1997. “300 Years of Optimal Control: From the Brachystochrone to the Maximum Principle.” IEEE Control Systems 17 (3): 32–44. https://doi.org/10.1109/37.588098.
+
Indirect methods for optimal control reformulate the optimal control problem into a set of equations – boundary value problems with differential and algebraic equations in the case of continuous-time systems – and by solving these (typically numerically) we obtain the optimal state and control trajectories. Practical usefulness of these is rather limited, as such an optimal control trajectory constitutes open-loop control – there is certainly no need to advocate the importance of feedback in this advanced control course.
+
One way to introduce feedback is to regard the computed optimal state trajectory \bm x^\star(t) as a reference trajectory and design a feedback controller to track this reference. To our advantage, we already have the corresponding control trajectory \bm u^\star(t) too, and therefore we can formulate such a reference tracking problem as a problem of regulating the deviation \delta \bm x(t) of the state from its reference by means of superposing a feedback control \delta \bm u(t) onto the (open-loop) optimal control.
+
While this problem – also known as the problem of stabilization of a (reference) trajectory – can be solved by basically any feedback control scheme, one elegant way is to linearize the system around the reference trajectory and formulate the problem as the LQR problem for a time-varying linear system.
+
+
+
+
+
+
+Linearization around a trajectory
+
+
+
+
Don’t forget that when linearizing a nonlinear system \dot{\bm x} = \mathbf f(\bm x,\bm u) around a point that is not an equilibrium – and this inevitably happens when linearizing along the state trajectory \bm x^\star(t) obtained from the indirect approach to optimal control – the linearized system \frac{\mathrm d}{\mathrm d t} \delta \bm x= \mathbf A(t) \delta \bm x + \mathbf B(t) \delta \bm u treats not only the state variables but also the control variables as increments \delta \bm x(t) and \delta \bm u(t) that must be added to the nominal values \bm x^\star(t) and \bm u^\star(t) determining the operating point. That is, \bm x(t) = \bm x^\star(t) + \delta \bm x(t) and \bm u(t) = \bm u^\star(t) + \delta \bm u(t).
+
+
+
Having decided on an LQR framework, we can now come up with the three matrices \mathbf Q, \mathbf R and \mathbf S that set the quadratic cost function. Once this choice is made, we can just invoke the solver for continuous-time Riccati equation with the ultimate goal of finding the time-varying state feedback gain \mathbf K(t).
+
+
+
+
+
+
+LQR for trajectory stabilization can be done in discrete time
+
+
+
+
If discrete-time feedback control is eventually desired, which it mostly is, the whole LQR design for a time-varying linear system will have to be done using just periodically sampled state and control trajectories and applying recursive formulas for the discrete-time Riccati equation and state feedback gain.
+
+
+
The three weighting matrices \mathbf Q, \mathbf R and \mathbf S, if chosen arbitrarily, are not related to the original cost function that is minimized by the optimal state and control trajectories. The matrices just parameterize a new optimal control problem. It turns out, however, that there is a clever (and insightful) way of choosing these matrices so that the trajectory stabilization problem inherits the original cost function. In other words, even when the system fails to stay on the optimal trajectory perfectly, the LQR state-feedback controller will keep minimizing the same cost function when regulating the deviation from the optimal trajectory.
+
Recall that using the conventional definition of Hamiltonian H(\bm x, \bm u, \bm \lambda) = L(\bm x, \bm u) + \bm \lambda^\top \mathbf f(\bm x, \bm u), in which we now assume time invariance of both the system and the cost function for notational simplicity, the necessary conditions of optimality are
+\begin{aligned}
+\dot{\bm x} &= \nabla_{\bm\lambda} H(\bm x,\bm u,\boldsymbol \lambda) = \mathbf f(\bm x, \bm u), \\
+\dot{\bm \lambda} &= -\nabla_{\bm x} H(\bm x,\bm u,\boldsymbol \lambda), \\
+\mathbf 0 &= \nabla_{\bm u} H(\bm x,\bm u,\boldsymbol \lambda),\\
+\bm x(t_\mathrm{i})&=\mathbf x_\mathrm{i},\\
+\bm x(t_\mathrm{f})&=\mathbf x_\mathrm{f} \quad \text{or}\quad \bm \lambda(t_\mathrm{f})=\nabla\phi(\bm{x}(t_\mathrm{f})),
+\end{aligned}
+ where the option on the last line is selected based on whether the state at the final time is fixed or free.
+
+
+
+
+
+
+The state at the final time can also be restricted by a linear equation
+
+
+
+
The conditions of optimality stated above correspond to one of the two standard situations, in which the state in the final time is either fixed to a single value or completely free. The conditions can also be modified to consider the more general situation, in which the state at the final time is restricted to lie on a manifold defined by an equality constraint \psi(\bm x(t_\mathrm{f})) = 0.
+
+
+
Let’s now consider some tiny perturbation to the initial state \bm x(t_\mathrm{i}) from its prescribed nominal value \mathbf x_\mathrm{i}. It will give rise to deviations in all the variables in the above equations from their nominal – optimal – trajectories. Assuming the deviations are small, a linear model suffices to describe them. In other words, what we are now after is linearization of the above equations
+
+\begin{aligned}
+\delta \dot{\bm x} &= (\nabla_{\bm x} \mathbf f)^\top \; \delta \bm x + (\nabla_{\bm u} \mathbf f)^\top \; \delta \bm u, \\
+\delta \dot{\bm \lambda} &= -\nabla^2_{\bm{xx}} H \; \delta \bm x -\nabla^2_{\bm{xu}} H \; \delta \bm u -\underbrace{\nabla^2_{\bm{x\lambda}} H}_{(\nabla_{\bm x} \mathbf f)^\top} \; \delta \bm \lambda, \\
+\mathbf 0 &= \nabla^2_{\bm{ux}} H \; \delta \bm x + \nabla^2_{\bm{uu}} H \; \delta \bm u + \underbrace{\nabla^2_{\bm{u\lambda}} H}_{(\nabla_{\bm u} \mathbf f)^\top} \; \delta \bm \lambda,\\
+\delta \bm x(t_\mathrm{i}) &= \text{specified},\\
+\delta \bm \lambda(t_\mathrm{f}) &= \nabla^2_{\bm{xx}}\phi(\mathbf{x}(t_\mathrm{f}))\; \delta \bm x(t_\mathrm{f}).
+\end{aligned}
+\tag{1}
+
+
+
+
+
+
+Note on notation
+
+
+
+
Let’s recall for convenience here that since \mathbf f(\bm x, \bm u) is a vector function of vector arguments, \nabla_{\bm x} \mathbf f is a matrix whose columns are gradients of the individual components of \mathbf f. Equivalently, (\nabla_{\bm x} \mathbf f)^\top stands for the Jacobian of the function \mathbf f with respect to \bm x. Similarly, \nabla^2_{\bm{xx}} H is the Hessian of H with respect to \bm x. That is, it is a matrix composed of second derivatives. It is a symmetric matrix, hence there is no need to transpose it. Finally, the terms \nabla^2_{\bm{ux}} H and \nabla^2_{\bm{xu}} H are matrices containing mixed second derivatives.
+
+
+
With hindsight we relabel the individual terms in Equation 1 as
+\begin{aligned}
+\mathbf A(t) &\coloneqq (\nabla_{\bm x} \mathbf f)^\top\\
+\mathbf B(t) &\coloneqq (\nabla_{\bm u} \mathbf f)^\top\\
+\mathbf Q(t) &\coloneqq \nabla^2_{\bm{xx}} H\\
+\mathbf R(t) &\coloneqq \nabla^2_{\bm{uu}} H\\
+\mathbf N(t) &\coloneqq \nabla^2_{\bm{xu}} H\\
+\mathbf S_\mathrm{f} &\coloneqq \nabla^2_{\bm{xx}}\phi(\mathbf{x}(t_\mathrm{f})).
+\end{aligned}
+
+
Let’s rewrite the perturbed necessary conditions of optimality using these new symbols:
\begin{aligned}
\delta \dot{\bm x} &= \mathbf A(t)\, \delta \bm x + \mathbf B(t)\, \delta \bm u, \\
\delta \dot{\bm \lambda} &= -\mathbf Q(t)\, \delta \bm x - \mathbf N(t)\, \delta \bm u - \mathbf A^\top(t)\, \delta \bm \lambda, \\
\mathbf 0 &= \mathbf N^\top(t)\, \delta \bm x + \mathbf R(t)\, \delta \bm u + \mathbf B^\top(t)\, \delta \bm \lambda, \\
\delta \bm x(t_\mathrm{i}) &= \text{specified},\\
\delta \bm \lambda(t_\mathrm{f}) &= \mathbf S_\mathrm{f}\, \delta \bm x(t_\mathrm{f}).
\end{aligned}
Assuming that \nabla^2_{\bm{uu}} H = \mathbf R(t) is nonsingular, we can solve the third equation for \delta \bm u
+\delta \bm u = -\mathbf R^{-1}(t) \left( \mathbf N^\top(t)\, \delta \bm x + \mathbf B^\top(t)\, \delta \bm \lambda\right ).
+
The indirect approach to optimal control reformulates the optimal control problem as a system of differential and algebraic equations (DAE) with the values of some variables specified at both ends of the time interval – the two-point boundary value problem (TP–BVP). It is only in special cases that we are able to reformulate the TP–BVP as an initial value problem (IVP), the prominent example of which is the LQR problem and the associated differential Riccati equation solved backwards in time. However, generally we need to solve the TP–BVP DAE, and the only way to do it is by numerical methods. Here we describe a few of them.
+
+
Gradient method for the TP-BVP DAE for free final state
+
Recall that with the Hamiltonian defined as H(\bm x, \bm u, \bm \lambda) = L(\bm x, \bm u) + \bm \lambda^\top \mathbf f(\bm x, \bm u), the necessary conditions of optimality for the fixed final time and free final state are given by the following system of differential and algebraic equations (DAE)
+\begin{aligned}
+\dot{\bm{x}} &= \nabla_{\bm\lambda} H(\bm x,\bm u,\bm \lambda) \\
+\dot{\bm{\lambda}} &= -\nabla_{\bm x} H(\bm x,\bm u,\bm \lambda) \\
+\mathbf 0 &= \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda)\qquad (\text{or} \qquad \bm u^\star = \text{argmax } H(\bm x^\star,\bm u, \bm\lambda^\star),\quad \bm u \in\mathcal{U})\\
+\bm x(t_\mathrm{i}) &=\mathbf x_\mathrm{i}\\
+\bm \lambda(t_\mathrm{f}) &= \nabla\phi(\bm{x}(t_\mathrm{f})).
+\end{aligned}
+
+
One idea to solve this is to guess at the trajectory \bm u(t) on a grid of the time interval, use it to solve the state and costate equations, and then with all three variables \bm x, \bm u, and \bm \lambda evaluate how far the stationarity equation is from being satisfied. Based on this, modify \bm u and go for another iteration. Formally this is expressed as the algorithm:
+
+
Set some initial trajectory \bm u(t),\; t\in[t_\mathrm{i},t_\mathrm{f}] on a grid of points in [t_\mathrm{i},t_\mathrm{f}].
+
With the chosen \bm u(\cdot) and the initial state \bm x(t_\mathrm{i}), solve the state equation
+\dot{\bm{x}} = \nabla_{\bm\lambda} H(\bm x,\bm u,\bm \lambda) = \mathbf f(\bm x, \bm u)
+ for \bm x(t) using a solver for initial value problem ODE, that is, on a grid of t\in[t_\mathrm{i},t_\mathrm{f}].
+
Having the control and state trajectories, \bm u(\cdot) and \bm x(\cdot), solve the costate equation
+\dot{\bm{\lambda}} = -\nabla_{\bm x} H(\bm x,\bm u,\bm \lambda)
+
+for the costates \bm \lambda(t), starting at the final time t_\mathrm{f}, invoking the boundary condition \bm \lambda(t_\mathrm{f}) = \nabla\phi(\bm{x}(t_\mathrm{f})).
+
Evaluate \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda) for all t\in[t_\mathrm{i}, t_\mathrm{f}].
+
If \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda) \approx \mathbf 0 for all t\in[t_\mathrm{i}, t_\mathrm{f}], quit, otherwise modify \bm u(\cdot) and go to the step 2.
+
+
The question is, of course, how to modify \bm u(t) for all t \in [t_\mathrm{i}, t_\mathrm{f}] in the last step. Recall that the variation of the (augmented) cost functional is
+\begin{aligned}
+\delta J^\text{aug} &= [\nabla \phi(\bm x(t_\mathrm{f})) - \bm\lambda(t_\mathrm{f})]^\top \delta \bm{x}(t_\mathrm{f})\\
+& \qquad + \int_{t_\mathrm{i}}^{t_\mathrm{f}} [\dot{\bm{\lambda}} +\nabla_{\bm x} H(\bm x,\bm u,\bm \lambda)]^\top \delta \bm x(t)\mathrm{d}t + \int_{t_\mathrm{i}}^{t_\mathrm{f}} [\dot{\bm{x}} - \nabla_{\bm\lambda} H(\bm x,\bm u,\bm \lambda)]^\top \delta \bm\lambda(t) \mathrm{d}t\\
+& \qquad + \int_{t_\mathrm{i}}^{t_\mathrm{f}} [\nabla_{\bm u} H(\bm x,\bm u,\bm \lambda)]^\top \delta \bm u(t) \mathrm{d}t
+\end{aligned},
+ and for state and costate variables satisfying the state and costate equations this variation simplifies to
+\delta J^\text{aug} = \int_{t_\mathrm{i}}^{t_\mathrm{f}} [\nabla_{\bm u} H(\bm x,\bm u,\bm \lambda)]^\top \delta \bm u(t) \mathrm{d}t.
+
+
Since our goal is to minimize J^\text{aug}, we need to make \Delta J^\text{aug}\leq0. Provided the increment
+\delta \mathbf u(t)=\bm u^{(i+1)}(t)-\bm u^{(i)}(t)
+ is small, we can consider the linear approximation \delta J^\text{aug} instead. We choose
+\delta \bm u(t) = -\alpha \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda)
+ for \alpha>0, which means that the control trajectory in the next iteration is \boxed
+{\bm u^{(i+1)}(t) = \bm u^{(i)}(t) -\alpha \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda),}
+ and the variation of the augmented cost function is
+
+\delta J^\text{aug} = -\alpha\int_{t_\mathrm{i}}^{t_\mathrm{f}} \|\nabla_{\bm u} H(\bm x,\bm u,\bm \lambda)\|^2 \mathrm{d}t \leq 0,
+ and it is zero only for \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda) = \mathbf 0 for all t\in[t_\mathrm{i}, t_\mathrm{f}].
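+
+Below is a minimal illustrative sketch (not from the text) of this gradient method, on a uniform time grid with simple Euler integration of the state and costate equations. The scalar system \dot x = -x + u, running cost L = (x^2+u^2)/2, and terminal cost \phi = 5x(t_\mathrm{f})^2/2 are assumed purely for illustration.
+
+function gradient_method_ocp(; xinit=1.0, tfinal=2.0, N=200, α=0.5, maxiter=500, tol=1e-5)
+    h = tfinal/N
+    f(x, u) = -x + u                 # right-hand side of the (assumed) state equation
+    dHdx(x, u, λ) = x - λ            # ∂H/∂x for H = (x²+u²)/2 + λ(-x+u)
+    dHdu(x, u, λ) = u + λ            # ∂H/∂u
+    dφ(x) = 5.0*x                    # gradient of the terminal cost
+    u = zeros(N+1)                   # initial guess of the control trajectory
+    x = zeros(N+1); λ = zeros(N+1)
+    for iter in 1:maxiter
+        x[1] = xinit                 # forward pass: state equation
+        for k in 1:N
+            x[k+1] = x[k] + h*f(x[k], u[k])
+        end
+        λ[N+1] = dφ(x[N+1])          # backward pass: costate equation λ̇ = -∂H/∂x
+        for k in N:-1:1
+            λ[k] = λ[k+1] + h*dHdx(x[k], u[k], λ[k+1])
+        end
+        g = dHdu.(x, u, λ)           # ∂H/∂u along the trajectory
+        maximum(abs, g) < tol && break
+        u .-= α .* g                 # steepest-descent update of the control trajectory
+    end
+    return x, u, λ
+end
+
+x, u, λ = gradient_method_ocp()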
+
+
+
Methods for solving TP-BVP ODE
+
Here we assume that from the stationarity equation
+\mathbf 0 = \nabla_{\bm u} H(\bm x,\bm u,\bm \lambda)
+ we can express \bm u(t) as a function of the state and costate variables, \bm x(t) and \bm \lambda(t), respectively. In fact, Pontryagin’s principle gives this expression as \bm u^\star(t) = \text{arg} \min_{\bm u(t) \in\mathcal{U}} H(\bm x^\star(t),\bm u(t), \bm\lambda^\star(t)). And we substitute for \bm u(t) into the state and costate equations. This way we eliminate \bm u(t) from the system of DAEs and we are left with a system of ODEs for \bm x(t) and \bm \lambda(t) only. Formally, the resulting Hamiltonian is a different function, as it is now a function of two variables only.
Although we now have an ODE system, it is still a BVP. Strictly speaking, from now on an arbitrary reference on the numerical solution of boundary value problems can be consulted to get some overview – we no longer need to restrict ourselves to the optimal control literature and software. On the other hand, the right-hand sides are not quite arbitrary – these are Hamiltonian equations – and this property could and perhaps even should be exploited by the solution methods.
+
The methods for solving general BVPs are generally divided into
+
+
shooting and multiple shooting methods,
+
discretization methods,
+
collocation methods.
+
+
+
Shooting methods
+
+
Shooting method outside optimal control
+
Having made the disclaimer that boundary value problems constitute a topic independent of the optimal control theory, we start their investigation within a control-unrelated setup. We consider a system of two ordinary differential equations in two variables with the value of the first variable specified at both ends while the value of the other variable is left unspecified
+\begin{aligned}
+\begin{bmatrix}
+ \dot y_1(t)\\
+ \dot y_2(t)
+\end{bmatrix}
+&=
+\begin{bmatrix}
+f_1(\bm y,t)\\
+f_2(\bm y,t)
+\end{bmatrix}\\
+y_1(t_\mathrm{i}) &= \mathrm y_{1\mathrm{i}},\\
+y_1(t_\mathrm{f}) &= \mathrm y_{1\mathrm{f}}.
+\end{aligned}
+
+
An idea for a solution method is this:
+
+
Guess at the missing (unspecified) value y_{2\mathrm{i}} of y_2 at the initial time t_\mathrm{i},
+
Use an IVP solver (for example ode45 in Matlab) to find the values of both variables over the whole interval [t_\mathrm{i},t_\mathrm{f}].
+
Compare the simulated value of the state variable y_1 at the final time t_\mathrm{f} with the prescribed boundary value \mathrm y_{1\mathrm{f}}.
+
Based on the error e = y_1(t_\mathrm{f})-\mathrm y_{1\mathrm{f}}, update y_{2\mathrm{i}} and go back to step 2.
+
+
How shall the update in the step 4 be realized? The value of y_1 at the final time t_\mathrm{f} and therefore the error e are functions of the value y_{2\mathrm{i}} of y_2 at the initial time t_\mathrm{i}. We can formally express this upon introducing a map F such that e = F(y_{2\mathrm{i}}). The problem now boils down to solving the nonlinear equation \boxed
+{F(y_{2\mathrm{i}}) = 0.}
+
+
If Newton’s method is to be used for solving this equation, the derivative of F is needed. More often than not, numerical solvers for IVP ODEs have to be called in order to evaluate the function F, in which case the derivative cannot be determined analytically. Finite difference (FD) and algorithmic/automatic differentiation (AD) methods are available.
+
In this example we only considered y_1 and y_2 as scalar variables, but in general these could be vector variables, in which case a system of equations in the vector variable has to be solved. Instead of a single scalar derivative, its matrix version – Jacobian matrix – must be determined.
+
By now the reason for calling this method shooting is perhaps obvious. Indeed, the analogy with aiming and shooting a cannon is illustrative.
+
As another example, we consider the BVP for a pendulum.
+
+
Example 1 (BVP for pendulum) For an ideal pendulum described by the second-order model \ddot \theta + \frac{b}{ml^2}\dot \theta + \frac{g}{l} \sin(\theta) = 0 and for a final time t_\mathrm{f}, at which some prescribed value of \theta(t_\mathrm{f}) must be achieved, compute by the shooting method the needed value of the initial angle \theta_\mathrm{i}, while assuming the initial angular rate \omega_\mathrm{i} is zero.
+
+
+Show the code
+
+using DifferentialEquations
+using Roots
+using Plots
+
+function demo_shoot_pendulum()
+    θfinal = -0.2;
+    tfinal = 3.5;
+    tspan = (0.0, tfinal)
+    tol = 1e-5
+    function pendulum!(dx, x, p, t)
+        g = 9.81
+        l = 1.0;
+        m = 1.0;
+        b = 0.1;
+        a₁ = g/l
+        a₂ = b/(m*l^2)
+        θ, ω = x
+        dx[1] = ω
+        dx[2] = -a₁*sin(θ) - a₂*ω
+    end
+    prob = ODEProblem(pendulum!, zeros(Float64, 2), tspan)
+    function F(θ₀::Float64)
+        xinitial = [θ₀, 0.0]
+        prob = remake(prob, u0=xinitial)
+        sol = solve(prob, Tsit5(), reltol=tol/10, abstol=tol/10)
+        return θfinal - sol[end][1]
+    end
+    θinitial = find_zero(F, (-pi, pi))  # Solving the equation F(θ)=0 using the Roots package. In general it can find more solutions.
+    xinitial = [θinitial, 0.0]
+    prob = remake(prob, u0=xinitial)    # Already solved in F(), but we solve it again for plotting.
+    sol = solve(prob, Tsit5())
+    p1 = plot(sol, lw=2, xlabel="Time", ylabel="Angle", label="θ", idxs=(1))
+    scatter!([tfinal], [θfinal], label="Required terminal θ")
+    p2 = plot(sol, lw=2, xlabel="Time", ylabel="Angular rate", label="ω", idxs=(2))
+    display(plot(p1, p2, layout=(2,1)))
+end
+
+demo_shoot_pendulum()
+
+
+
+
+
+
+
A few general comments to the above code:
+
+
The function F(\theta_\mathrm{i}) that defines the nonlinear equation F(\theta_\mathrm{i})=0 calls a numerical solver for an IVP ODE. The inner IVP solver should then have its numerical tolerances set more stringently than the outer root-finding solver.
+
The ODE problem should only be defined once and then in each iteration its parameters should be updated. In Julia this is done by the remake function; other languages and libraries offer similar mechanisms.
+
+
+
+
Shooting method for indirect approach to optimal control
+
We finally bring the method into the realm of the indirect approach to optimal control – it is the initial value \lambda_\mathrm{i} of the costate variable that serves as an optimization variable, while the initial value x_\mathrm{i} of the state variable is known and fixed. The final values of both the state and costate variables are the outcomes of a numerical simulation obtained using a numerical solver for an IVP ODE. Based on these, the residual is computed: either as e = x(t_\mathrm{f})-x_\mathrm{f} if the final state is fixed, or as e = \lambda(t_\mathrm{f}) - \nabla \phi(x(t_\mathrm{f})) if the final state is free. Based on this residual, the initial value of the costate is updated and another iteration of the algorithm is entered.
+
+
+
+
+
Example 2 (Shooting for indirect approach to LQR) Standard LQR optimal control for a second-order system on a fixed finite interval with a fixed final state.
+
+
+Show the code
+
+using LinearAlgebra
+using DifferentialEquations
+using NLsolve
+
+function shoot_lq_fixed(A, B, Q, R, xinitial, xfinal, tfinal)
+    n = size(A)[1]
+    function statecostateeq!(dw, w, p, t)
+        x = w[1:n]
+        λ = w[(n+1):end]
+        dw[1:n] = A*x - B*(R\B'*λ)
+        dw[(n+1):end] = -Q*x - A'*λ
+    end
+    λinitial = zeros(n)
+    tspan = (0.0, tfinal)
+    tol = 1e-5
+    function F(λinitial)
+        winitial = vcat(xinitial, λinitial)
+        prob = ODEProblem(statecostateeq!, winitial, tspan)
+        dsol = solve(prob, Tsit5(), abstol=tol/10, reltol=tol/10)
+        xfinalsolved = dsol[end][1:n]
+        return (xfinal - xfinalsolved)
+    end
+    nsol = nlsolve(F, λinitial, xtol=tol)   # Could add autodiff=:forward.
+    λinitial = nsol.zero                    # Solving once again for plotting.
+    winitial = vcat(xinitial, λinitial)
+    prob = ODEProblem(statecostateeq!, winitial, tspan)
+    dsol = solve(prob, Tsit5(), abstol=tol/10, reltol=tol/10)
+    return dsol
+end
+
+function demo_shoot_lq_fixed()
+    n = 2                     # Order of the system.
+    m = 1                     # Number of inputs.
+    A = rand(n, n)            # Matrices modeling the system.
+    B = rand(n, m)
+
+    Q = diagm(0 => rand(n))   # Weighting matrices for the quadratic cost function.
+    R = rand(1, 1)
+
+    xinitial = [1.0, 2.0]
+    xfinal = [3.0, 4.0]
+    tfinal = 5.0
+
+    dsol = shoot_lq_fixed(A, B, Q, R, xinitial, xfinal, tfinal)
+
+    p1 = plot(dsol, idxs=(1:2), lw=2, legend=false, xlabel="Time", ylabel="State")
+    p2 = plot(dsol, idxs=(3:4), lw=2, legend=false, xlabel="Time", ylabel="Costate")
+    display(plot(p1, p2, layout=(2,1)))
+end
+
+demo_shoot_lq_fixed()
+
+
+
+
+
+
+
+
+
+
Multiple shooting methods
+
The key deficiency of the shooting method is that the only source of the error is the error in the initial condition; this error then amplifies as it propagates over the whole time interval as the numerical integration proceeds, and consequently the residual is very sensitive to tiny changes in the initial value. The multiple shooting method is a remedy for this. The idea is to divide the interval [t_\mathrm{i},t_\mathrm{f}] into N subintervals [t_k,t_{k+1}] and to introduce the values of the state and costate variables at the beginning of each subinterval as additional variables. Additional equations are then introduced that enforce the continuity of the variables at the end of one subinterval and at the beginning of the next subinterval, as sketched below.
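+As a sketch (using notation not introduced in the text), denote by \bm w(t;\bm w_k) the solution of the combined state–costate ODE on the subinterval [t_k,t_{k+1}] started from the guessed value \bm w_k = (\bm x_k, \bm \lambda_k). The unknowns \bm w_0, \bm w_1, \ldots, \bm w_{N-1} must then satisfy the matching (continuity) conditions
+\bm w(t_{k+1};\bm w_k) - \bm w_{k+1} = \mathbf 0, \quad k = 0,\ldots,N-2,
+ together with the boundary conditions \bm x_0 = \mathbf x_\mathrm{i} and, say, \bm \lambda(t_\mathrm{f};\bm w_{N-1}) = \nabla\phi(\bm x(t_\mathrm{f};\bm w_{N-1})) for the free final state. This larger but much better conditioned system of nonlinear equations is again solved by a Newton-type method.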
+
+
+
+
+
+
Discretization methods
+
Shooting methods take advantage of the availability of solvers for IVP ODEs. These solvers produce discrete (discretized) trajectories, proceeding integration step by integration step, forward in time. But they do this in a way hidden from the user. We just have to set the initial conditions (possibly through numerical optimization) and the solver does the rest.
+
Alternatively, the formulas for the discrete-time updates are not evaluated one by one, step by step, running forward in time, but are assembled to form a system of equations, in general nonlinear ones. Appropriate boundary conditions are then added to these nonlinear equations and the whole system is then solved numerically, yielding a discrete approximation of the trajectories satisfying the BVP.
+
Since all those equations are solved simultaneously (as a system of equations), there is no advantage in using explicit methods for solving ODEs, and implicit methods are used instead.
+
It is now time to recall some crucial results from the numerical methods for solving ODEs. First, we start with the popular single-step methods known as the Runge-Kutta (RK) methods.
+
We consider the standard ODE \dot x(t) = f(x(t),t).
+
and we define the Butcher tableau as
+ \begin{array}{ l | c c c c }
+ c_1 & a_{11} & a_{12} & \ldots & a_{1s}\\
+ c_2 & a_{21} & a_{22} & \ldots & a_{2s}\\
+ \vdots & \vdots\\
+ c_s & a_{s1} & a_{s2} & \ldots & a_{ss}\\
+ \hline
+ & b_{1} & b_{2} & \ldots & b_{s}
+ \end{array}.
+ such that c_i = \sum_{j=1}^s a_{ij}, and 1 = \sum_{j=1}^s b_{j}.
+
Referring to a particular Butcher tableau, a single step of the method is
+ \begin{aligned}
+ f_{k1} &= f\left(x_k + h_k (a_{11}f_{k1}+a_{12}f_{k2} + \ldots + a_{1s}f_{ks}),\ t_k+c_1h_k\right)\\
+ f_{k2} &= f\left(x_k + h_k (a_{21}f_{k1}+a_{22}f_{k2} + \ldots + a_{2s}f_{ks}),\ t_k+c_2h_k\right)\\
+ \vdots\\
+ f_{ks} &= f\left(x_k + h_k (a_{s1}f_{k1}+a_{s2}f_{k2} + \ldots + a_{ss}f_{ks}),\ t_k+c_sh_k\right)\\
+ x_{k+1} &= x_k + h_k \left(b_1 f_{k1}+b_2f_{k2} + \ldots + b_sf_{ks}\right).
+ \end{aligned}
+
+
If the matrix \mathbf A of the coefficients a_{ij} is strictly lower triangular, that is, if a_{ij} = 0 for all j \geq i, the method belongs to explicit Runge-Kutta methods, otherwise it belongs to implicit Runge-Kutta methods.
+
A prominent example of explicit RK methods is the 4-stage RK method (oftentimes referred to as RK4).
+
+
Explicit RK4 method
+
The Butcher tableau for the method is
+ \begin{array}{ l | c c c c }
+ 0 & 0 & 0 & 0 & 0\\
+ 1/2 & 1/2 & 0 & 0 & 0\\
+ 1/2 & 0 & 1/2 & 0 & 0\\
+ 1 & 0 & 0 & 1 & 0\\
+ \hline
+ & 1/6 & 1/3 & 1/3 & 1/6
+ \end{array}.
+
+
Following the Butcher table, a single step of this method is
+ \begin{aligned}
+ f_{k1} &= f(x_k,t_k)\\
+ f_{k2} &= f\left(x_k + \frac{h_k}{2}f_{k1},t_k+\frac{h_k}{2}\right)\\
+ f_{k3} &= f\left(x_k + \frac{h_k}{2}f_{k2},t_k+\frac{h_k}{2}\right)\\
+ f_{k4} &= f\left(x_k + h_k f_{k3},t_k+h_k\right)\\
+ x_{k+1} &= x_k + h_k \left(\frac{1}{6} f_{k1}+\frac{1}{3}f_{k2} + \frac{1}{3}f_{k3} + \frac{1}{6}f_{k4}\right)
+ \end{aligned}.
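+
+Purely as an illustrative sketch (not part of the text), these update formulas translate directly into code; the following Julia function implements a single RK4 step for a (possibly vector-valued) right-hand side f(x,t):
+
+function rk4_step(f, xk, tk, hk)
+    fk1 = f(xk, tk)
+    fk2 = f(xk + hk/2*fk1, tk + hk/2)
+    fk3 = f(xk + hk/2*fk2, tk + hk/2)
+    fk4 = f(xk + hk*fk3, tk + hk)
+    return xk + hk*(fk1/6 + fk2/3 + fk3/3 + fk4/6)   # weighted combination from the last tableau row
+end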
+
+
But as we have just mentioned, explicit methods are not particularly useful for solving BVPs; we prefer implicit methods. One of the simplest is the implicit midpoint method.
+
+
+
Implicit midpoint method
+
The Butcher tableau is
+ \begin{array}{ l | c r }
+ 1/2 & 1/2 \\
+ \hline
+ & 1
+ \end{array}
+
+
A single step is then \begin{aligned}
+ f_{k1} &= f\left(x_k+\frac{1}{2}f_{k1} h_k, t_k+\frac{1}{2}h_k\right)\\
+ x_{k+1} &= x_k + h_k f_{k1}.
+ \end{aligned}
+
But adding to the last equation x_k we get x_{k+1} + x_k = 2x_k + h_k f_{k1}.
+
Dividing by two we get \frac{1}{2}(x_{k+1} + x_k) = x_k + \frac{1}{2}h_k f_{k1} and then it follows that \boxed{x_{k+1} = x_k + h_k f\left(\frac{1}{2}(x_k+x_{k+1}),t_k+\frac{1}{2}h_k\right).}
+
The right hand side of the last equation explains the “midpoint” in the name. It can be viewed as a rectangular approximation to the integral in x_{k+1} = x_k + \int_{t_k}^{t_{k+1}} f(x(t),t)\mathrm{d}t as the integral is computed as an area of a rectangle with the height determined by f() evaluated in the middle point.
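+
+As a small illustration (not from the text), a single implicit-midpoint step requires solving this implicit relation for x_{k+1}; the sketch below does so with a plain fixed-point iteration (a Newton solve would be the more robust choice):
+
+using LinearAlgebra   # for norm
+
+function implicit_midpoint_step(f, xk, tk, hk; iters=50, tol=1e-10)
+    xnext = xk + hk*f(xk, tk)                          # explicit Euler predictor
+    for _ in 1:iters
+        xnew = xk + hk*f((xk + xnext)/2, tk + hk/2)    # x_{k+1} = x_k + h_k f((x_k+x_{k+1})/2, t_k+h_k/2)
+        norm(xnew - xnext) < tol && return xnew
+        xnext = xnew
+    end
+    return xnext
+end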
+
Although we do not explain the details here, let’s just note that it is the simplest of the collocation methods. In particular, it belongs to the Gauss (also Gauss-Legendre) methods.
+
+
+
Implicit trapezoidal method
+
The method can be viewed both as a single-step (RK) method and a multi-step method. When viewed as an RK method, its Butcher table is
+ \begin{array}{ l | c r }
+ 0 & 0 & 0 \\
+ 1 & 1/2 & 1/2 \\
+ \hline
+ & 1/2 & 1/2 \\
+ \end{array}
+ Following the Butcher table, a single step of the method is then
+ \begin{aligned}
+ f_{k1} &= f(x_k,t_k)\\
+ f_{k2} &= f(x_k + h_k \frac{f_{k1}+f_{k2}}{2},t_k+h_k)\\
+ x_{k+1} &= x_k + h_k \left(\frac{1}{2} f_{k1}+\frac{1}{2} f_{k2}\right).
+ \end{aligned}
+
+
But since the collocation points are identical with the nodes (grid/mesh points), we can relabel to \begin{aligned}
+ f_{k} &= f(x_k,t_k)\\
+ f_{k+1} &= f(x_{k+1},t_{k+1})\\
+ x_{k+1} &= x_k + h_k \left(\frac{1}{2} f_{k}+\frac{1}{2} f_{k+1}\right).
+ \end{aligned}
+
+
This possibility is a particular advantage of Lobatto and Radau methods, which contain both end points of the interval or just one of them among the collocation points. The two symbols f_k and f_{k+1} are really just shorthands for values of the function f at the beginning and the end of the integration interval, so the first two equations of the triple above are not really equations to be solved but rather definitions. And we can assemble it all into just one equation \boxed{
+ x_{k+1} = x_k + h_k \frac{f(x_k,t_k)+f(x_{k+1},t_{k+1})}{2}.
+ }
+
+
The right hand side of the last equation explains the “trapezoidal” in the name. It can be viewed as a trapezoidal approximation to the integral in x_{k+1} = x_k + \int_{t_k}^{t_{k+1}} f(x(t),t)\mathrm{d}t as the integral is computed as an area of a trapezoid.
+
When it comes to building a system of equations within transcription methods, we typically move all the terms just on one side to obtain the defect equations
x_{k+1} - x_k - h_k \left(\frac{1}{2} f(x_k,t_k)+\frac{1}{2} f(x_{k+1},t_{k+1})\right) = 0.
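+
+As a sketch (not from the text), these defect equations can be used for a direct transcription of the earlier pendulum BVP: the unknowns are the state values at all grid points, and the equations are the trapezoidal defect equations plus the two boundary conditions.
+
+using NLsolve
+
+function trapezoidal_bvp_pendulum(; θfinal=-0.2, tfinal=3.5, N=100)
+    g, l, m, b = 9.81, 1.0, 1.0, 0.1
+    f(x) = [x[2], -(g/l)*sin(x[1]) - (b/(m*l^2))*x[2]]   # pendulum right-hand side
+    h = tfinal/N
+    function residual!(F, z)
+        X = reshape(z, 2, N+1)                # column k holds the state at grid point k
+        for k in 1:N                          # trapezoidal defect equations
+            F[2k-1:2k] = X[:, k+1] - X[:, k] - (h/2)*(f(X[:, k]) + f(X[:, k+1]))
+        end
+        F[2N+1] = X[2, 1]                     # boundary condition ω(0) = 0
+        F[2N+2] = X[1, N+1] - θfinal          # boundary condition θ(t_f) = θfinal
+    end
+    sol = nlsolve(residual!, zeros(2*(N+1)))
+    return reshape(sol.zero, 2, N+1)          # 2×(N+1) array of states on the grid
+end
+
+X = trapezoidal_bvp_pendulum()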
+
+
+
Hermite-Simpson method
+
It belongs to the family of Lobatto III methods, namely it is a 3-stage Lobatto IIIA method. Butcher tableau
+ \begin{array}{ l | c c c c }
+ 0 & 0 &0 & 0\\
+ 1/2 & 5/24 & 1/3 & -1/24\\
+ 1 & 1/6 & 2/3 & 1/6\\
+ \hline
+ & 1/6 & 2/3 & 1/6
+ \end{array}
+
+
Hermite-Simpson method can actually come in three forms (this is from Betts (2020)):
+
+
Primary form
+
There are two equations for the given integration interval [t_k,t_{k+1}]
x_{k+1} = x_k + h_k \left(\frac{1}{6}f_k + \frac{2}{3}f_{k2} + \frac{1}{6}f_{k+1}\right),
x_{k2} = x_k + h_k \left(\frac{5}{24}f_k + \frac{1}{3}f_{k2} - \frac{1}{24}f_{k+1}\right),
 where the f symbols are just shorthand notations for values of the function at a certain point
f_k = f(x_k,u(t_k),t_k),\quad f_{k2} = f(x_{k2},u(t_{k2}),t_{k2}),\quad f_{k+1} = f(x_{k+1},u(t_{k+1}),t_{k+1}),
 and the off-grid time t_{k2} is given by t_{k2} = t_k + \frac{1}{2}h_k.
+
The first of the two equations can be recognized as Simpson’s rule for computing a definite integral. Note that while considering the right hand sides as functions of the control inputs, we also correctly express at which time (the collocation time) we consider the value of the control variable.
+
Keeping the formulation this general allows considering general control inputs and not only piecewise constant control inputs. For example, if we consider piecewise linear control inputs, then u(t_{k2}) = \frac{u_k + u_{k+1}}{2}. But if we stick to the (more common) piecewise constant controls, not surprisingly u(t_{k2}) = u_k. Typically we format the equations as defect equations, that is, with zero on the right hand side
+\begin{aligned}
+x_{k+1} - x_k - h_k \left(\frac{1}{6}f_k + \frac{2}{3}f_{k2} + \frac{1}{6}f_{k+1}\right) &= 0,\\
+x_{k2} - x_k - h_k \left(\frac{5}{24}f_k + \frac{1}{3}f_{k2} - \frac{1}{24}f_{k+1}\right) &= 0.
+\end{aligned}
+
+
The optimization variables for every integration interval are x_k,u_k,x_{k2}, u_{k2}.
+
+
+
Hermite-Simpson Separated (HSS) method
+
Alternatively, we can express f_{k2} from the first equation as a function of the remaining terms and then substitute it into the second equation. This transforms the second equation so that the midpoint value f_{k2} no longer appears in it.
+\begin{aligned}
+x_{k+1} - x_k - h_k \left(\frac{1}{6}f_k + \frac{2}{3}f_{k2} + \frac{1}{6}f_{k+1}\right) &= 0,\\
+x_{k2} - \frac{x_k + x_{k+1}}{2} - \frac{h_k}{8} \left(f_k - f_{k+1}\right) &= 0.
+\end{aligned}
+
+
While we already know (from a paragraph above) that the first equation is Simpson’s rule, the second equation is an outcome of Hermite interpolation. Hence the name. The optimization variables for every integration interval are the same as before, that is, x_k,u_k,x_{k2}, u_{k2}.
+
+
+
Hermite-Simpson Condensed (HSC) method
+
Yet some more simplification can be obtained from HSS. Namely, the second equation can actually be used to directly prescribe
x_{k2} = \frac{x_k + x_{k+1}}{2} + \frac{h_k}{8} \left(f_k - f_{k+1}\right),
 which is used in the first equation as an argument for the f() function (represented by the f_{k2} symbol), by which the second equation and the term x_{k2} are eliminated from the set of defect equations. The optimization variables for every integration interval still need to contain u_{k2} even though x_{k2} was eliminated, because it is needed to parameterize f_{k2}. That is, the optimization variables then are x_k,u_k, u_{k2}. Reportedly (by Betts) this has been widely used and is historically one of the first such methods. When it comes to using it in optimal control, it turns out, however, that the sparsity pattern is better for the HSS.
+
+
+
+
+
Collocation methods
+
Yet another family of methods for solving the BVP ODE \dot x(t) = f(x(t),t) are collocation methods. They are also based on discretization of the independent variable – the time t. That is, on the interval [t_\mathrm{i}, t_\mathrm{f}], discretization points (or grid points or nodes or knots) are chosen, say, t_0, t_1, \ldots, t_N, where t_0 = t_\mathrm{i} and t_N = t_\mathrm{f}. The solution x(t) is then approximated by a polynomial p_k(t) of a certain degree s on each interval [t_k,t_{k+1}] of length h_k=t_{k+1}-t_k.
The degree of the polynomial is low, say s=3 or so, certainly well below 10. With N subintervals, the total number of coefficients to parameterize the (approximation of the) solution x(t) over the whole interval is then N(s+1). For example, for s=3 and N=10, we have 40 coefficients: p_{00}, p_{01}, p_{02}, p_{03}, p_{10}, p_{11}, p_{12}, p_{13},\ldots, p_{90}, p_{91}, p_{92}, p_{93}.
+
+
+
+
Finding a solution amounts to determining all those coefficients. Once we have them, the (approximate) solution is given by a piecewise polynomial.
+
How to determine the coefficients? By interpolation. But we will see in a moment that two types of interpolation are needed – interpolation of the value of the solution and interpolation of the derivative of the solution.
+
The former is only performed at the beginning of each interval, that is, at every discretization point (or grid point or node or knot). The condition reads that the polynomial p_{k-1}(t) approximating the solution x(t) on the (k-1)th interval should attain the same value at the end of that interval, that is, at t_{k-1} + h_{k-1}, as the polynomial p_k(t) approximating the solution x(t) on the kth interval attains at the same point, which from its perspective is the beginning of the kth interval, that is, t_k. We express this condition formally as \boxed{p_{k-1}(\underbrace{t_{k-1}+h_{k-1}}_{t_{k}}) = p_k(t_k).}
+
Expanding the two polynomials, we get p_{k-1,0} + p_{k-1,1}h_{k-1} + p_{k-1,2}h_{k-1}^2+\ldots + p_{k-1,s}h_{k-1}^s = p_{k0}.
+
+
+
+
+
+
+Subscripts in the coefficients
+
+
+
+
We adopt the notational convention that a coefficient of a polynomial is indexed by two indices, the first one indicating the interval and the second one indicating the power of the corresponding term. For example, p_{k-1,2} is the coefficient of the quadratic term in the polynomial approximating the solution on the (k-1)th interval. For the sake of brevity, we omit the comma between the two subscripts in cases such as p_{k1} (instead of writing p_{k,1}). But we do write p_{k-1,0}, because here omitting the comma would introduce ambiguity.
+
+
+
Good, we have one condition (one equation) for each subinterval. But we need more, if polynomials of degree at least one are considered (we then parameterize them by two parameters, in which case one more equation is needed for each subinterval). Here comes the opportunity for the other kind of interpolation – interpolation of the derivative of the solution. At a given point (or points) that we call collocation points, the polynomial p_k(t) approximating the solution x(t) on the kth interval should satisfy the same differential equation \dot x(t) = f(x(t),t) as the solution does. That is, we require that at
+
t_{kj} = t_k + h_k c_{j}, \quad j=1,\ldots, s, which we call collocation points, the polynomial satisfies \boxed{\dot p_k(t_{kj}) = f(p_k(t_{kj}),t_{kj}), \quad j=1,\ldots, s.}
+
Expressing the derivative of the polynomial on the left and expanding the polynomial itself on the right, we get
+\begin{aligned}
+p_{k1} + &2p_{k2}(t_{kj}-t_k)+\ldots + s p_{ks}(t_{kj}-t_k)^{s-1} = \\ &f(p_{k0} + p_{k1}(t_{kj}-t_k) + p_{k2}(t_{kj}-t_k)^2 + \ldots + p_{ks}(t_{kj}-t_k)^s), \quad j=1,\ldots, s.
+\end{aligned}
+
+
This gives us the complete set of equations for each interval. For the considered example of a cubic polynomial, we have one interpolation condition at the beginning of the interval and then three collocation conditions at the collocation points. In total, we have four equations for each interval. The number of equations is equal to the number of coefficients of the polynomial. Before the system of equations can be solved for the coefficients, we must specify the collocation points. Based on their placement, collocation methods split into three families:
+
+
Gauss or Gauss-Legendre methods – the collocation points are strictly inside each interval.
+
Lobatto methods – the collocation points include also both ends of each interval.
+
Radau methods – the collocation points include just one end of the interval.
+
+
+
+
+
+
+
+
+
+
+Important
+
+
+
+
Although in principle the collocation points could be arbitrary (but distinct), within a given family of methods, and for a given number of collocation points, some clever options are known that maximize accuracy.
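+
+As a small illustration of such a clever choice, the following sketch (assuming the FastGaussQuadrature.jl package is available) computes the Gauss-Legendre collocation points for s = 3 by mapping the standard nodes from [-1, 1] to relative positions c_j in [0, 1].
+
+using FastGaussQuadrature
+
+ξ, _ = gausslegendre(3)        # Gauss-Legendre nodes on [-1, 1] (weights not needed here)
+c = (ξ .+ 1) ./ 2              # relative positions c₁, c₂, c₃ on [0, 1]
+# c ≈ [0.1127, 0.5, 0.8873] – all strictly inside the interval, as required for Gauss methods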
+
+
+
+
Linear polynomials
+
Besides the piecewise constant approximation, which is too crude, not to speak of the discontinuity it introduces, the next simplest approximation of a solution x(t) on the interval [t_k,t_{k+1}] of length h_k=t_{k+1}-t_k is a linear (actually affine) polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k).
+
On the given kth interval it is parameterized by two parameters p_{k0} and p_{k1}, hence two equations are needed. The first equation enforces the continuity at the beginning of the interval \boxed
+{p_{k-1,0} + p_{k-1,1}h_{k-1} = p_{k0}.}
+
+
The remaining single equation is the collocation condition at a single collocation point t_{k1} = t_k + h_k c_1, which remains to be chosen. One possible choice is c_1 = 1/2, that is
+t_{k1} = t_k + \frac{h_k}{2}
+
+
In words, the collocation point is chosen in the middle of the interval. The collocation condition then reads \boxed
+{p_{k1} = f\left(p_{k0} + p_{k1}\frac{h_k}{2}\right).}
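+
+A minimal sketch of one such collocation step in Julia follows; the implicit condition is solved here by a plain fixed-point iteration, which is only meant as an illustration (a proper implementation would use a Newton-type solver), and the dynamics function is an arbitrary example.
+
+function midpoint_collocation_step(f, xₖ, hₖ; iters = 20)
+    pk0 = xₖ                          # value of the polynomial at the beginning of the interval
+    pk1 = f(pk0)                      # initial guess for the slope
+    for _ in 1:iters                  # fixed-point iteration on p_{k1} = f(p_{k0} + p_{k1}h_k/2)
+        pk1 = f(pk0 + pk1*hₖ/2)
+    end
+    return pk0 + pk1*hₖ               # p_k(t_k + h_k), that is, the approximation of x(t_{k+1})
+end
+
+f(x) = -x^2                           # example scalar dynamics ẋ = -x²
+x₁ = midpoint_collocation_step(f, 1.0, 0.1)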
+
+
+
+
Quadratic polynomials
+
If a quadratic polynomial is used to approximate the solution, the condition at the beginning of the interval is \boxed
+{p_{k-1,0} + p_{k-1,1}h_{k-1} + p_{k-1,2}h_{k-1}^2 = p_{k0}.}
+
+
Two more equations – collocation conditions – are needed to specify all the three coefficients that parameterize the approximating polynomial on a given interval [t_k,t_{k+1}]. One intuitive (and actually clever) choice is to place the collocation points at the beginning and the end of the interval, that is, at t_k and t_{k+1}. The coefficients that parameterize the relative positions of the collocation points with respect to the interval are c_1=0 and c_2=1. The collocation conditions then read \boxed
+{\begin{aligned}
+p_{k1} &= f(p_{k0}),\\
+p_{k1} + 2p_{k2}h_{k} &= f(p_{k0} + p_{k1}h_k + p_{k2}h_k^2).
+\end{aligned}}
+
+
+
+
Cubic polynomials
+
When a cubic polynomial is used, the condition at the beginning of the kth interval is \boxed
+{p_{k-1,0} + p_{k-1,1}h_{k-1} + p_{k-1,2}h_{k-1}^2+p_{k-1,3}h_{k-1}^3 = p_{k0}.}
+
+
Three more equations are needed to determine all the four coefficients of the polynomial. Where to place the collocations points? One intuitive (and clever too) option is to place them at the beginning, in the middle, and at the end of the interval. The relative positions of the collocation points are then given by c_1=0, c_2=1/2, and c_3=1. The collocation conditions then read \boxed
+{\begin{aligned}
+p_{k1} &= f\left(p_{k0}\right),\\
+p_{k1} + 2p_{k2}\frac{h_k}{2} + 3 p_{k3}\left(\frac{h_k}{2}\right)^{2} &= f\left(p_{k0} + p_{k1}\frac{h_k}{2} + p_{k2}\left(\frac{h_k}{2}\right)^2 + p_{k3}\left(\frac{h_k}{2} \right)^3\right),\\
+p_{k1} + 2p_{k2}h_k + 3 p_{k3}h_k^{2} &= f\left(p_{k0} + p_{k1}h_k + p_{k2}h_k^2 + p_{k3}h_k^3\right).
+\end{aligned}}
+
+
+
+
+
Collocation methods are implicit Runge-Kutta methods
+
An important observation that we are going to make is that collocation methods can be viewed as implicit Runge-Kutta methods. But not all IRK methods can be viewed as collocation methods. In this section we show that the three implicit RK methods that we covered above are indeed (equivalent to) collocation methods. By the equivalence we mean that there is a linear relationship between the coefficients of the polynomial that approximates the solution on a given (sub)interval and the approximations of the solution at the discretization point and of its derivative at the collocation points.
+
+
Implicit midpoint method as a Gauss collocation method
+
For the given integration interval [t_k,t_{k+1}], we write down two equations that relate the two coefficients of the linear polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k) and an approximation x_k of x(t) at the beginning of the interval t_k, as well as an approximation of \dot x(t) at the (single) collocation point t_{k1} = t_{k} + \frac{h_k}{2}.
+
In particular, the first interpolation condition is p_k(t_k) = \textcolor{red}{p_{k0} = x_k} \approx x(t_k).
+
The second interpolation condition, the one on the derivative in the middle of the interval is \dot p_k\left(t_k + \frac{h_k}{2}\right) = \textcolor{red}{p_{k1} = f(x_{k1},t_{k1})} \approx f(x(t_{k1}),t_{k1}).
+
Note that here we introduced yet another unknown – the approximation x_{k1} of x(t_{k1}) at the collocation point t_{k1}. We can write it using the polynomial p_k(t) as
+x_{k1} = p_k\left(t_k + \frac{h_k}{2}\right) = p_{k0} + p_{k1}\frac{h_k}{2}.
+
+
Substituting for p_{k0} and p_{k1}, we get
+x_{k1} = x_k + f(x_{k1},t_{k1})\frac{h_k}{2}.
+
+
We also introduce the notation f_{k1} for f(x_{k1},t_{k1}) and we can write an equation
+f_{k1} = f\left(x_k + f_{k1}\frac{h_k}{2}\right).
+
+
But we want to find x_{k+1}, which we can accomplish by evaluating the polynomial p_k(t) at t_{k+1} = t_k+h_k
+x_{k+1} = x_k + f_{k1}h_k.
+
+
Collecting the last two equations, we rederived the good old friend – the implicit midpoint method.
+
+
+
Implicit trapezoidal method as a Lobatto collocation method
+
For the given integration interval [t_k,t_{k+1}], we write down three equations that relate the three coefficients of the quadratic polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k) + p_{k2}(t-t_k)^2 and an approximation x_k of x(t) at the beginning of the interval t_k, as well as approximations to \dot x(t) at the two collocation points t_k and t_{k+1}.
+
In particular, the first interpolation condition is p_k(t_k) = \textcolor{red}{p_{k0} = x_k} \approx x(t_k).
+
The second interpolation condition, the one on the derivative at the beginning of the interval, the first collocation point, is \dot p_k(t_k) = \textcolor{red}{p_{k1} = f(x_k,t_k)} \approx f(x(t_k),t_k).
+
The third interpolation condition, the one on the derivative at the second collocation point \dot p_k(t_k+h_k) = \textcolor{red}{p_{k1} + 2p_{k2} h_k = f(x_{k+1},t_{k+1})} \approx f(x(t_{k+1}),t_{k+1}).
+
All the three conditions (emphasized in color above) can be written together as
+ \begin{bmatrix}
+ 1 & 0 & 0\\
+ 0 & 1 & 0\\
+ 0 & 1 & 2 h_k\\
+ \end{bmatrix}
+ \begin{bmatrix}
+ p_{k0} \\ p_{k1} \\ p_{k2}
+ \end{bmatrix}
+ =
+ \begin{bmatrix}
+ x_{k} \\ f(x_k,t_k) \\ f(x_{k+1},t_{k+1})
+ \end{bmatrix}.
+
+
The above system of linear equations can be solved by inverting the matrix
+ \begin{bmatrix}
+ p_{k0} \\ p_{k1} \\ p_{k2}
+ \end{bmatrix}
+ =
+ \begin{bmatrix}
+ 1 & 0 & 0\\
+ 0 & 1 & 0\\
+ 0 & -\frac{1}{2h_k} & \frac{1}{2h_k}\\
+ \end{bmatrix}
+ \begin{bmatrix}
+ x_{k} \\ f(x_k,t_k) \\ f(x_{k+1},t_{k+1})
+ \end{bmatrix}.
+
+
We can now write down the interpolating/approximating polynomial p_k(t) = x_{k} + f(x_{k},t_{k})(t-t_k) +\left[-\frac{1}{2h_k}f(x_{k},t_{k}) + \frac{1}{2h_k}f(x_{k+1},t_{k+1})\right](t-t_k)^2.
+
This polynomial can now be used to find an (approximation of the) value of the solution at the end of the interval x_{k+1} = p_k(t_k+h_k) = x_{k} + f(x_{k},t_{k})h_k +\left[-\frac{1}{2h_k}f(x_{k},t_{k}) + \frac{1}{2h_k}f(x_{k+1},t_{k+1})\right]h_k^2, which can be simplified nearly upon inspection to x_{k+1} = x_{k} + \frac{f(x_{k},t_{k}) + f(x_{k+1},t_{k+1})}{2} h_k, but this is our good old friend, isn’t it? We have shown that the collocation method with a quadratic polynomial with the collocation points chosen at the beginning and the end of the interval is (equivalent to) the implicit trapezoidal method. The method belongs to the family of Lobatto IIIA methods, which are all known to be collocation methods.
+
+
+
Hermite-Simpson method as a Lobatto collocation method
+
Here we show that Hermite-Simpson method also qualifies as a collocation method. In particular, it belongs to the family of Lobatto IIIA methods, similarly as implicit trapezoidal method. The first condition, the one on the value of the cubic polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k) + p_{k2}(t-t_k)^2+ p_{k3}(t-t_k)^3 at the beginning of the interval is p_k(t_k) = \textcolor{red}{p_{k0} = x_k} \approx x(t_k).
+
The three remaining conditions are imposed at the collocation points, which for the integration interval [t_k,t_{k+1}] are t_{k1} = t_k , t_{k2} = \frac{t_k+t_{k+1}}{2} , and t_{k3} = t_{k+1}. With the first derivative of the polynomial given by \dot p_k(t) = p_{k1} + 2p_{k2}(t-t_k) + 3p_{k3}(t-t_k)^2, the first collocation condition is \dot p_k(t_k) = \textcolor{red}{p_{k1} = f(x_k,t_k)} \approx f(x(t_k),t_k).
+
The second collocation condition – the one on the derivative in the middle of the interval – is \dot p_k\left(t_k+\frac{1}{2}h_k\right) = \textcolor{red}{p_{k1} + 2p_{k2} \frac{h_k}{2} + 3p_{k3} \left(\frac{h_k}{2}\right)^2 = f(x_{k2},t_{k2})} \approx f\left(x\left(t_{k}+\frac{h_k}{2}\right),t_{k}+\frac{h_k}{2}\right).
+
The color-emphasized part can be simplified to \textcolor{red}{p_{k1} + p_{k2} h_k + \frac{3}{4}p_{k3} h_k^2 = f(x_{k2},t_{k2})}.
+
Finally, the third collocation condition – the one imposed at the end of the interval – is \dot p_k(t_k+h_k) = \textcolor{red}{p_{k1} + 2p_{k2} h_k + 3p_{k3} h_k^2 = f(x_{k+1},t_{k+1})} \approx f(x(t_{k+1}),t_{k+1}).
+
All the four conditions (emphasized in color above) can be written together as
+ \begin{bmatrix}
+ 1 & 0 & 0 & 0\\
+ 0 & 1 & 0 & 0\\
+ 0 & 1 & h_k & \frac{3}{4} h_k^2\\
+ 0 & 1 & 2 h_k & 3h_k^2\\
+ \end{bmatrix}
+ \begin{bmatrix}
+ p_{k0} \\ p_{k1} \\ p_{k2} \\p_{k3}
+ \end{bmatrix}
+ =
+ \begin{bmatrix}
+ x_{k} \\ f(x_k,t_k) \\ f(x_{k2},t_{k2}) \\ f(x_{k+1},t_{k+1}).
+ \end{bmatrix}
+
We can now write down the interpolating/approximating polynomial
+ \begin{aligned}
+ p_k(t) &= x_{k} + f(x_{k},t_{k})(t-t_k) +\left[-\frac{3}{2h_k}f(x_{k},t_{k}) + \frac{2}{h_k}f(x_{k2},t_{k2}) -\frac{1}{2h_k}f(x_{k+1},t_{k+1}) \right](t-t_k)^2\\
+ & +\left[\frac{2}{3h_k^2}f(x_{k},t_{k}) - \frac{4}{3h_k^2}f(x_{k2},t_{k2}) +\frac{2}{3h_k^2}f(x_{k+1},t_{k+1}) \right](t-t_k)^3.
+ \end{aligned}
+
+
We can use this prescription of the polynomial p_k(t) to compute the (approximation of the) value of the solution at the end of the kth interval
+ \begin{aligned}
+ x_{k+1} = p_k(t_k+h_k) &= x_{k} + f(x_{k},t_{k})h_k +\left[-\frac{3}{2h_k}f(x_{k},t_{k}) + \frac{2}{h_k}f(x_{k2},t_{k2}) -\frac{1}{2h_k}f(x_{k+1},t_{k+1}) \right]h_k^2\\
+ & +\left[\frac{2}{3h_k^2}f(x_{k},t_{k}) - \frac{4}{3h_k^2}f(x_{k2},t_{k2}) +\frac{2}{3h_k^2}f(x_{k+1},t_{k+1}) \right]h_k^3,
+ \end{aligned}
+ which can be simplified to
+ \begin{aligned}
+ x_{k+1} &= x_{k} + f(x_{k},t_{k})h_k +\left[-\frac{3}{2}f(x_{k},t_{k}) + 2f(x_{k2},t_{k2}) -\frac{1}{2}f(x_{k+1},t_{k+1}) \right]h_k\\
+ & +\left[\frac{2}{3}f(x_{k},t_{k}) - \frac{4}{3}f(x_{k2},t_{k2}) +\frac{2}{3}f(x_{k+1},t_{k+1}) \right]h_k,
+ \end{aligned}
+ which further simplifies to
+ x_{k+1} = x_{k} + h_k\left[\frac{1}{6}f(x_{k},t_{k}) + \frac{2}{3}f(x_{k2},t_{k2}) + \frac{1}{6}f(x_{k+1},t_{k+1}) \right],
+ which can be recognized as the Simpson integration that we have already seen in implicit Runge-Kutta method described above.
+
Obviously f_{k2} needs to be further elaborated on, namely, x_{k2} needs some prescription too. We know that it was introduced as an approximation to the solution x in the middle of the interval. Since the value of the polynomial in the middle is such an approximation too, we can set x_{k2} equal to the value of the polynomial in the middle.
+ \begin{aligned}
+ x_{k2} = p_k\left(t_k+\frac{1}{2}h_k\right) &= x_{k} + f(x_{k},t_{k})\frac{h_k}{2} +\left[-\frac{3}{2h_k}f(x_{k},t_{k}) + \frac{2}{h_k}f(x_{k2},t_{k2}) -\frac{1}{2h_k}f(x_{k+1},t_{k+1}) \right]\left(\frac{h_k}{2}\right)^2\\
+ & +\left[\frac{2}{3h_k^2}f(x_{k},t_{k}) - \frac{4}{3h_k^2}f(x_{k2},t_{k2}) +\frac{2}{3h_k^2}f(x_{k+1},t_{k+1}) \right]\left(\frac{h_k}{2}\right)^3,
+ \end{aligned}
+ which without further ado simplifies to
+ x_{k2} = x_{k} + h_k\left( \frac{5}{24}f(x_{k},t_{k}) +\frac{1}{3}f(x_{k2},t_{k2}) -\frac{1}{24}f(x_{k+1},t_{k+1}) \right),
+ which can be recognized as the other equation in the primary (implicit Runge-Kutta) formulation of the Hermite-Simpson method described above.
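+
+The two simplifications above can also be verified numerically. The following sketch (with arbitrary numbers standing in for x_k, f_k, f_{k2}, f_{k+1}) solves the 4-by-4 linear system for the polynomial coefficients and checks that evaluating the cubic at t_k + h_k and at t_k + h_k/2 reproduces the Simpson step and the midpoint-state formula, respectively.
+
+h = 0.2
+xk, fk, fk2, fkp1 = 1.3, -0.7, 0.4, 0.9              # x_k, f_k, f_{k2}, f_{k+1} (arbitrary values)
+
+M = [1  0  0       0;
+     0  1  0       0;
+     0  1  h  3h^2/4;
+     0  1  2h   3h^2]
+p = M \ [xk, fk, fk2, fkp1]                          # coefficients p_{k0}, p_{k1}, p_{k2}, p_{k3}
+
+poly(τ) = p[1] + p[2]*τ + p[3]*τ^2 + p[4]*τ^3        # p_k(t_k + τ)
+
+poly(h)   ≈ xk + h*(fk/6 + 2fk2/3 + fkp1/6)          # Simpson step, evaluates to true
+poly(h/2) ≈ xk + h*(5fk/24 + fk2/3 - fkp1/24)        # midpoint state, evaluates to true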
+
+
+
+
Pseudospectral collocation methods
+
These methods consider only a single polynomial over the whole interval. The degree of such a polynomial is, in contrast with classical collocation methods, rather high, and therefore the number of collocation points is also high; their location is crucial.
+Betts, John T. 2020. Practical Methods for Optimal Control Using Nonlinear Programming. 3rd ed. Advances in Design and Control. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611976199.
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/cont_numerical_indirect.html b/cont_numerical_indirect.html
index cf5ad45..ccba861 100644
--- a/cont_numerical_indirect.html
+++ b/cont_numerical_indirect.html
@@ -979,94 +979,94 @@
Sh
+
Figure 1: State responses for a pendulum on a given time interval, with zero initial angular rate and the initial angle solved for numerically so that the final angle attains a given value
@@ -1157,82 +1157,82 @@
The indirect approach to the continuous-time optimal control problem (OCP) formulates the necessary conditions of optimality as a two-point boundary value problem (TP-BVP), which generally requires numerical methods. The direct approach to the continuous-time OCP relies heavily on numerical methods too, namely the methods for solving nonlinear programs (NLP) and methods for solving ordinary differential equations (ODE). Numerical methods for both approaches share a lot of common principles and tools, and these are collectively presented in the literature under the name numerical optimal control. A recommendable (and freely available online) introduction to these methods is (Gros and Diehl 2022). A shorter version is in chapter 8 of (Rawlings, Mayne, and Diehl 2017), which is also available online. A more comprehensive treatment is in (Betts 2020).
Another name under which the numerical methods for the direct approach are presented is trajectory optimization. There are quite a few tutorials and surveys such as (M. Kelly 2017) and (M. P. Kelly 2017).
+Betts, John T. 2020. Practical Methods for Optimal Control Using Nonlinear Programming. 3rd ed. Advances in Design and Control. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611976199.
+
+
+Bryson, Arthur E., Jr., and Yu-Chi Ho. 1975. Applied Optimal Control: Optimization, Estimation and Control. Revised edition. CRC Press.
+
+
+Gros, Sebastien, and Moritz Diehl. 2022. “Numerical Optimal Control (Draft).” Systems Control; Optimization Laboratory IMTEK, Faculty of Engineering, University of Freiburg. https://www.syscop.de/files/2020ss/NOC/book-NOCSE.pdf.
+
+
+Kelly, Matthew. 2017. “An Introduction to Trajectory Optimization: How to Do Your Own Direct Collocation.” SIAM Review 59 (4): 849–904. https://doi.org/10.1137/16M1062569.
+
+
+Kelly, Matthew P. 2017. “Transcription Methods for Trajectory Optimization: A Beginners Tutorial.” arXiv:1707.00284 [Math], July. http://arxiv.org/abs/1707.00284.
+
+
+Kirk, Donald E. 2004. Optimal Control Theory: An Introduction. Reprint of the 1970 edition. Dover Publications.
+
+von Stryk, O., and R. Bulirsch. 1992. “Direct and Indirect Methods for Trajectory Optimization.” Annals of Operations Research 37 (1): 357–73. https://doi.org/10.1007/BF02071065.
+
Here we specialize the general procedure from the previous section to the case of a linear system and a quadratic cost. We start by considering a simple problem of regulation, wherein the goal is to bring the system either exactly or approximately to the zero final state, that is, \mathbf x^\text{ref}=\mathbf 0 and we want \bm x_N=\mathbf x^\text{ref} or \bm x_N\approx\mathbf x^\text{ref}, respectively.
+\begin{aligned}
+\operatorname*{minimize}_{\mathbf u_0,\ldots, \mathbf u_{N-1}, \mathbf x_{0},\ldots, \mathbf x_N} &\quad \frac{1}{2} \bm x_N^\top \mathbf S \bm x_N + \frac{1}{2} \sum_{k=0}^{N-1} \left(\bm x_k^\top \mathbf Q \bm x_k + \bm u_k^\top \mathbf R \bm u_k \right)\\
+\text{subject to} &\quad \bm x_{k+1} = \mathbf A\bm x_k + \mathbf B\bm u_k,\quad k = 0, \ldots, N-1, \\
+ &\quad \bm x_0 = \mathbf x_0,\\
+ &\quad \bm x_N = \mathbf 0\; (\text{or}\, \bm x_N \approx \mathbf 0).
+\end{aligned}
+
+
Referring to the two options for the last constraint,
+
+
if the condition \bm x_N=\mathbf 0 on the final state is strictly enforced, the terminal state cost (the term \frac{1}{2} \bm x_N^\top \mathbf S \bm x_N in the cost function) is redundant and can be removed;
+
if the final state condition can be relaxed to \bm x_N\approx\mathbf 0, it is by increasing the weight \mathbf S in the terminal cost function \frac{1}{2} \bm x_N^\top \mathbf S \bm x_N that \bm x_N can be made arbitrarily close to \mathbf 0.
+
+
+
+
+
+
+
+Tip
+
+
+
+
It is a standard dilemma in optimization, not only in optimal control, that if we want to satisfy some requirement, we can either strictly enforce it through constraints or we can seemingly relax it and set a cost to be paid for not satisfying it.
+
+
+
+
Simultaneous (sparse) formulation
+
Below we rewrite the latter problem, that is, \bm x_N\approx\mathbf 0, in the “unrolled” form, where we stack the state and control variables into “long” vectors \bar{\bm x} and \bar{\bm u}. Doing the same for the former is straightforward.
+\begin{aligned}
+\operatorname*{minimize}_{\bar{\bm u},\bar{\bm x}} & \frac{1}{2}\left(\begin{bmatrix} \bm x_1^\top & \bm x_2^\top & \ldots & \bm x_N^\top \end{bmatrix}
+\underbrace{\begin{bmatrix}\mathbf Q & & & \\ & \mathbf Q & &\\ & &\ddots & \\ & & & \mathbf S \end{bmatrix}}_{\overline{\mathbf Q}}
+\underbrace{\begin{bmatrix} \bm x_1 \\ \bm x_2 \\ \vdots \\ \bm x_N \end{bmatrix}}_{\bar{\bm x}}\right.\\
+&\qquad +\left.
+\begin{bmatrix} \bm u_0^\top & \bm u_1^\top & \ldots & \bm u_{N-1}^\top \end{bmatrix}
+\underbrace{\begin{bmatrix}\mathbf R & & & \\ & \mathbf R & &\\ & &\ddots & \\ & & & \mathbf R \end{bmatrix}}_{\overline{\mathbf R}}
+\underbrace{\begin{bmatrix} \bm u_0 \\ \bm u_1 \\ \vdots \\ \bm u_{N-1} \end{bmatrix}}_{\bar{\bm u}}\right)
++ \underbrace{\frac{1}{2}\mathbf x_0^\top \mathbf Q \mathbf x_0}_{\mathrm{constant}}
+\end{aligned}
+ subject to
+\begin{bmatrix} \bm x_1 \\ \bm x_2 \\ \bm x_3\\ \vdots \\ \bm x_N \end{bmatrix} = \underbrace{\begin{bmatrix}\mathbf 0 & & & &\\\mathbf A & \mathbf 0 & & &\\ &\mathbf A &\mathbf 0 & & \\ & & &\ddots & \\& & &\mathbf A & \mathbf 0 \end{bmatrix}}_{\overline{\mathbf A}}
+\begin{bmatrix} \bm x_1 \\ \bm x_2 \\ \bm x_3\\ \vdots \\ \bm x_N \end{bmatrix} + \underbrace{\begin{bmatrix}\mathbf B & & & & \\ & \mathbf B & & & \\& &\mathbf B & \\ & & &\ddots \\ & & & & \mathbf B \end{bmatrix}}_{\overline{\mathbf B}}\begin{bmatrix} \bm u_0 \\ \bm u_1 \\ \bm u_2\\\vdots \\ \bm u_{N-1} \end{bmatrix} + \underbrace{\begin{bmatrix}\mathbf A\\\mathbf 0\\\mathbf 0\\\vdots\\\mathbf 0\end{bmatrix}}_{\overline{\mathbf A}_0}\mathbf x_0.
+
+
Note that the last term in the cost function can be discarded because it is constant.
+
The terms with the \bar{\bm x} vector can be combined and we get
+\begin{bmatrix} \mathbf 0 \\ \mathbf 0 \\ \mathbf 0\\ \vdots \\ \mathbf 0 \end{bmatrix} = \underbrace{\begin{bmatrix}-\mathbf I & & & &\\\mathbf A & -\mathbf I & & &\\ &\mathbf A &-\mathbf I & & \\ & & &\ddots & \\& & &\mathbf A & -\mathbf I \end{bmatrix}}_{\overline{\mathbf A} - \mathbf I}
+\begin{bmatrix} \mathbf x_1 \\ \mathbf x_2 \\ \mathbf x_3\\ \vdots \\ \mathbf x_N \end{bmatrix} + \underbrace{\begin{bmatrix}\mathbf B & & & & \\ & \mathbf B & & & \\& &\mathbf B & \\ & & &\ddots \\ & & & & \mathbf B \end{bmatrix}}_{\overline{\mathbf B}}\begin{bmatrix} \mathbf u_0 \\ \mathbf u_1 \\ \mathbf u_2\\\vdots \\ \mathbf u_{N-1} \end{bmatrix} + \underbrace{\begin{bmatrix}\mathbf A\\\mathbf 0\\\mathbf 0\\\vdots\\\mathbf 0\end{bmatrix}}_{\overline{\mathbf A}_0}\mathbf x_0.
+\tag{1}
+
Upon stacking the two “long” vectors into \bar{\bm z} we reformulate the optimization problem as
+\operatorname*{minimize}_{\bar{\bm z}\in\mathbb{R}^{(n+m)N}}\quad \frac{1}{2}\underbrace{\begin{bmatrix}\bar{\bm x}^\top &\bar{\bm u}^\top\end{bmatrix}}_{\bar{\bm z}^\top} \underbrace{\begin{bmatrix}\overline{\mathbf Q} & \\ & \overline{\mathbf R} \end{bmatrix}}_{\widetilde{\mathbf Q}}\underbrace{\begin{bmatrix}\bar{\bm x}\\\bar{\bm u}\end{bmatrix}}_{\bar{\bm z}}
+ subject to
+\mathbf 0 = \underbrace{\begin{bmatrix}(\overline{\mathbf A}-\mathbf I) & \overline{\mathbf B}\end{bmatrix}}_{\widetilde{\mathbf A}}\underbrace{\begin{bmatrix}\bar{\bm x}\\\bar{\bm u}\end{bmatrix}}_{\bar{\bm z}} + \underbrace{\overline{\mathbf A}_0 \mathbf x_0}_{\tilde{\mathbf b}}.
+
+
To summarize, we have reformulated the optimal control problem as a linearly constrained quadratic program
+\boxed{
+\begin{aligned}
+\underset{\bar{\bm z}\in\mathbb{R}^{(n+m)N}}{\text{minimize}} &\quad \frac{1}{2}\bar{\bm z}^\top \widetilde{\mathbf Q} \bar{\bm z}\\
+\text{subject to} &\quad \widetilde{\mathbf A} \bar{\bm z} + \tilde{\bm b} = \mathbf 0.
+\end{aligned}}
+
+
+
+Code
+
+using LinearAlgebra, SparseArrays, BlockArrays, QDLDL
+
+function direct_dlqr_simultaneous(A,B,x₀,Q,R,S,N)
+    n = size(A,1)                         # Number of state variables.
+    m = size(B,2)                         # Number of (control) input variables.
+    Qbar = BlockArray(spzeros(N*n,N*n),repeat([n],N),repeat([n],N))
+    for i=1:(N-1)
+        Qbar[Block(i,i)] = Q
+    end
+    Qbar[Block(N,N)] = S
+    Rbar = BlockArray(spzeros(N*m,N*m),repeat([m],N),repeat([m],N))
+    for i=1:N
+        Rbar[Block(i,i)] = R
+    end
+    Qtilde = blockdiag(sparse(Qbar),sparse(Rbar))    # The matrix defining the quadratic cost.
+    Bbar = BlockArray(spzeros(N*n,N*m),repeat([n],N),repeat([m],N))
+    for i=1:N
+        Bbar[Block(i,i)] = B
+    end
+    Abar = BlockArray(sparse(-1.0*I,n*N,n*N),repeat([n],N),repeat([n],N))
+    for i=2:N
+        Abar[Block(i,(i-1))] = A
+    end
+    Atilde = sparse([Abar Bbar])                     # The matrix defining the linear (affine) equation.
+    A0bar = spzeros(n*N,n)
+    A0bar[1:n,1:n] = A
+    btilde = A0bar*x₀                                # The constant offset for the linear (affine) equation.
+    K = [Qtilde Atilde'; Atilde spzeros(size(Atilde,1),size(Atilde,1))]   # Sparse KKT matrix.
+    F = qdldl(K)                                     # LDL factorization of the KKT matrix.
+    k = [zeros((n+m)*N); -btilde]                    # Right-hand side of the KKT system.
+    xtildeλ = solve(F,k)                             # Solving the KKT system using the factorization.
+    xopt = reshape(xtildeλ[1:(n*N)],(n,:))
+    uopt = reshape(xtildeλ[(n*N+1):((n+m)*N)],(m,:))
+    return xopt,uopt
+end
+
+n = 2                                    # Number of state variables.
+m = 1                                    # Number of (control) input variables.
+A = rand(n,n)                            # State matrix.
+B = rand(n,m)                            # Input coupling matrix.
+x₀ = [1.0, 3.0]                          # Initial state.
+
+N = 10                                   # Time horizon.
+
+s = [1.0, 2.0]
+q = [1.0, 2.0]
+r = [1.0]
+
+S = diagm(0=>s)                          # Matrix defining the terminal state cost.
+Q = diagm(0=>q)                          # Matrix defining the running state cost.
+R = diagm(0=>r)                          # Matrix defining the cost of control.
+
+xopts, uopts = direct_dlqr_simultaneous(A,B,x₀,Q,R,S,N)
+
+using Plots
+p1 = plot(0:(N-1),uopts',marker=:diamond,label="u",linetype=:steppost)
+xlabel!("k")
+ylabel!("u")
+
+p2 = plot(0:N,hcat(x₀,xopts)',marker=:diamond,label=["x1" "x2"],linetype=:steppost)
+xlabel!("k")
+ylabel!("x")
+
+plot(p1,p2,layout=(2,1))
+
+
+
This constrained optimization problem can still be solved without invoking a numerical solver for solving quadratic programs (QP). We do it by introducing a vector \boldsymbol\lambda of Lagrange multipliers to form the Lagrangian function
+\mathcal{L}(\bar{\bm z}, \boldsymbol \lambda) = \frac{1}{2}\bar{\bm z}^\top \widetilde{\mathbf Q} \bar{\bm z} + \boldsymbol\lambda^\top(\widetilde{\mathbf A} \bar{\bm z} + \tilde{\mathbf b}),
+ for which the gradients with respect to \bar{\bm z} and \boldsymbol\lambda are
+\begin{aligned}
+\nabla_{\bar{\bm{z}}} \mathcal{L}(\bar{\bm z}, \boldsymbol\lambda) &= \widetilde{\mathbf Q}\bar{\bm z} + \widetilde{\mathbf A}^\top\boldsymbol\lambda,\\
+\nabla_{\boldsymbol{\lambda}} \mathcal{L}(\bar{\bm z}, \boldsymbol\lambda) &=\widetilde{\mathbf A} \bar{\bm z} + \tilde{\mathbf b}.
+\end{aligned}
+
+
Requiring that the overall gradient vanishes leads to the following KKT set of linear equations
+\begin{bmatrix}
+ \widetilde{\mathbf Q} & \widetilde{\mathbf A}^\top\\ \widetilde{\mathbf A} & \mathbf 0
+\end{bmatrix}
+\begin{bmatrix}
+\bar{\bm z}\\\boldsymbol\lambda
+\end{bmatrix}
+=
+\begin{bmatrix}
+\mathbf 0\\ -\tilde{\mathbf b}
+\end{bmatrix}.
+
+
Solving this could be accomplished by using some general solver for linear systems or by using some more tailored solver for symmetric indefinite systems (based on LDL factorization, for example ldl in Matlab).
+
+
Adding constraints on controls and states
+
When solving a real optimal control problem, we may want to impose inequality constraints on \bm u_k due to saturation of actuators. We may also want to add constraints on \bm x_k, which may reflect some performance specifications. In both cases the KKT system above would have to be augmented, and we rather resort to some well-tuned numerical solver for quadratic programming (QP) instead.
+
+
+
+
Sequential (dense) formulation
+
We can express \bar{\bm x} as a function of \bar{\bm u} and \mathbf x_0. This can be done in a straightforward way using (Equation 1), namely,
+\bar{\bm x} = (\mathbf I-\overline{\mathbf A})^{-1}\overline{\mathbf B} \bar{\bm u} + (\mathbf I-\overline{\mathbf A})^{-1} \overline{\mathbf A}_0 \mathbf x_0.
+
+
However, instead of solving the sets of equations, we can do this substitution in a more insightful way. Write down the state equation for several discrete times
+\begin{aligned}
+\bm x_1 &= \mathbf A\mathbf x_0 + \mathbf B\bm u_0\\
+\bm x_2 &= \mathbf A\bm x_1 + \mathbf B\bm u_1\\
+ &= \mathbf A(\mathbf A\mathbf x_0 + \mathbf B\bm u_0)+ \mathbf B\bm u_1\\
+ &= \mathbf A^2\mathbf x_0 + \mathbf A\mathbf B\bm u_0 + \mathbf B\bm u_1\\
+ &\vdots\\
+\bm x_k &= \mathbf A^k\mathbf x_0 + \mathbf A^{k-1}\mathbf B\bm u_0 +\mathbf A^{k-2}\mathbf B\bm u_1 +\ldots + \mathbf B\bm u_{k-1}.
+\end{aligned}
+
+
Rewriting into matrix-vector form (and extending the time k up to the final time N)
+\begin{bmatrix}
+\bm x_1\\\bm x_2\\\vdots\\\bm x_N
+\end{bmatrix}
+=
+\underbrace{
+\begin{bmatrix}
+ \mathbf B & & & \\
+ \mathbf A\mathbf B & \mathbf B & & \\
+ \vdots & & \ddots &\\
+ \mathbf A^{N-1}\mathbf B & \mathbf A^{N-2}\mathbf B & & \mathbf B
+\end{bmatrix}}_{\widehat{\mathbf C}}
+ \begin{bmatrix}
+\bm u_0\\\bm u_1\\\vdots\\\bm u_{N-1}
+\end{bmatrix}
++
+\underbrace{
+ \begin{bmatrix}
+\mathbf A\\\mathbf A^2\\\vdots\\\mathbf A^N
+\end{bmatrix}}_{\widehat{\mathbf A}}\mathbf x_0
+
+
For convenience, let’s rewrite the compact relation between \bar{\bm x} and \bar{\bm u} and \mathbf x_0
+\bar{\bm x} = \widehat{\mathbf C} \bar{\bm u} + \widehat{\mathbf A} \mathbf x_0.
+\tag{2}
+
We can now substitute this into the original cost, which then becomes independent of \bar{\bm x}, which we reflect by using a new name \tilde J
+\begin{aligned}
+\tilde J(\bar{\bm u};\mathbf x_0) &= \frac{1}{2}(\widehat{\mathbf C} \bar{\bm u} + \widehat{\mathbf A} \mathbf x_0)^\top\overline{\mathbf Q} (\widehat{\mathbf C} \bar{\bm u} + \widehat{\mathbf A} \mathbf x_0) + \frac{1}{2}\bar{\bm u}^\top\overline{\mathbf R} \bar{\bm u} + \frac{1}{2}\mathbf x_0^\top\mathbf Q\mathbf x_0\\
+&= \frac{1}{2}\bar{\bm u}^\top\widehat{\mathbf C}^\top \overline{\mathbf Q} \widehat{\mathbf C} \bar{\bm u} + \mathbf x_0^\top\widehat{\mathbf A}^\top \overline{\mathbf Q} \widehat{\mathbf C} \bar{\bm u} + \frac{1}{2} \mathbf x_0^\top\widehat{\mathbf A}^\top \overline{\mathbf Q} \widehat{\mathbf A} \mathbf x_0 + \frac{1}{2}\bar{\bm u}^\top\overline{\mathbf R} \bar{\bm u} + \frac{1}{2}\mathbf x_0^\top\mathbf Q\mathbf x_0\\
+&= \frac{1}{2}\bar{\bm u}^\top(\widehat{\mathbf C}^\top \overline{\mathbf Q} \widehat{\mathbf C} + \overline{\mathbf R})\bar{\bm u} + \mathbf x_0^\top\widehat{\mathbf A}^\top \overline{\mathbf Q} \widehat{\mathbf C} \bar{\bm u} + \frac{1}{2} \mathbf x_0^\top(\widehat{\mathbf A}^\top \overline{\mathbf Q} \widehat{\mathbf A} + \mathbf Q)\mathbf x_0.
+\end{aligned}
+
+
The last term (the one independent of \bar{\bm u}) does not have an impact on the optimal \bar{\bm u} and therefore it can be discarded, but such a minor modification perhaps does not justify a new name for the cost function and we write it as
+\tilde J(\bar{\bm u};\mathbf x_0) = \frac{1}{2}\bar{\bm u}^\top\underbrace{(\widehat{\mathbf C}^\top \overline{\mathbf Q} \widehat{\mathbf C} + \overline{\mathbf R})}_{\mathbf H}\bar{\bm u} + \mathbf x_0^\top\underbrace{\widehat{\mathbf A}^\top \overline{\mathbf Q} \widehat{\mathbf C}}_{\mathbf F^\top} \bar{\bm u}.
+
+
This cost is a function of \bar{\bm u}, the initial state \mathbf x_0 is regarded as a fixed parameter. Its gradient is
+\nabla \tilde J = \mathbf H\bar{\bm u}+\mathbf F\mathbf x_0.
+
+
Setting it to zero leads to the following linear system of equations
+\mathbf H\bar{\bm u}=-\mathbf F\mathbf x_0
+ that needs to be solved for \bar{\bm u}. Formally, we write the solution as
+\bar{\bm u} = -\mathbf H^{-1} \mathbf F \mathbf x_0.
+
+
+
+
+
+
+
+Note
+
+
+
+
Solving linear equations by direct computation of the matrix inverse is not a recommended practice. Use dedicated solvers of linear equations instead. For example, in Matlab use the backslash operator, which invokes the most suitable solver.
+
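+
+The following sketch builds the dense matrices \widehat{\mathbf C}, \widehat{\mathbf A}, \mathbf H and \mathbf F and solves for the optimal control sequence with the backslash operator (in line with the note above); it reuses A, B, Q, R, S, N and x₀ from the simultaneous example, and the variable names are chosen just for this sketch.
+
+using LinearAlgebra
+
+n, m = size(B)
+Ĉ = zeros(n*N, m*N)
+Â = zeros(n*N, n)
+for k in 1:N
+    Â[(k-1)*n+1:k*n, :] = A^k
+    for j in 1:k
+        Ĉ[(k-1)*n+1:k*n, (j-1)*m+1:j*m] = A^(k-j) * B
+    end
+end
+Q̄ = kron(Matrix(1.0I, N, N), Q); Q̄[end-n+1:end, end-n+1:end] = S    # blkdiag(Q, …, Q, S)
+R̄ = kron(Matrix(1.0I, N, N), R)                                      # blkdiag(R, …, R)
+
+H = Ĉ'*Q̄*Ĉ + R̄
+F = Ĉ'*Q̄*Â                        # so that the gradient is H ū + F x₀
+ū = -(H \ (F*x₀))                 # optimal (stacked) control sequence
+x̄ = Ĉ*ū + Â*x₀                    # corresponding (stacked) state trajectory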
+
+
+
Adding the constraints on controls
+
Adding constraints on \bar{\bm u} is straightforward. Instead of an unconstrained quadratic program (equivalently, a linear system of equations) we now have a quadratic program with inequality constraints; a numerical sketch follows the explicit matrix form below. Let us write the problem down
+\begin{aligned}
+\operatorname*{minimize}_{\bar{\bm u}} & \quad \frac{1}{2}\bar{\bm u}^T \mathbf H \bar{\bm u} + \mathbf x_0^T\mathbf F^T \bar{\bm u}\\
+\text{subject to} &\quad \bar{\bm u} \leq \bar{\mathbf u}_\mathrm{max}\\
+ &\quad \bar{\bm u} \geq \bar{\mathbf u}_\mathrm{min},
+\end{aligned}
+ which we can rewrite more explicitly (in the matrix-vector format) as
+\begin{aligned}
+\operatorname*{minimize}_{\bar{\bm u}} & \quad \frac{1}{2}\bar{\bm u}^T \mathbf H \bar{\bm u} + \mathbf x_0^T\mathbf F^T \bar{\bm u}\\
+\text{subject to} & \begin{bmatrix}
+ \mathbf{I} & & & \\
+ & \mathbf{I} & & \\
+ & & \ddots & \\
+ & & & \mathbf{I} \\
+ -\mathbf{I} & & & \\
+ & -\mathbf{I} & & \\
+ & & \ddots & \\
+ & & & -\mathbf{I}
+ \end{bmatrix}
+ \begin{bmatrix}
+ \mathbf u_0 \\ \mathbf u_1 \\ \vdots \\ \mathbf u_{N-1}
+ \end{bmatrix}
+ \leq
+ \begin{bmatrix}
+ \mathbf u_\mathrm{max} \\ \mathbf u_\mathrm{max} \\ \vdots \\ \mathbf u_\mathrm{max}\\ -\mathbf u_\mathrm{min} \\ -\mathbf u_\mathrm{min} \\ \vdots \\ -\mathbf u_\mathrm{min}
+ \end{bmatrix}.
+\end{aligned}
+
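+
+A sketch of the box-constrained version, assuming the JuMP.jl modeling package with the OSQP solver (any QP solver would do); H, F and x₀ are taken from the dense-formulation sketch above, and the bounds are example values.
+
+using JuMP, OSQP
+
+u_min, u_max = -1.0, 1.0                                  # example bounds on each control entry
+
+model = Model(OSQP.Optimizer)
+set_silent(model)
+@variable(model, u_min <= ū[1:m*N] <= u_max)              # box constraints on the stacked controls
+@objective(model, Min, 0.5*ū'*H*ū + (F*x₀)'*ū)            # ½ūᵀHū + x₀ᵀFᵀū
+optimize!(model)
+ū_opt = value.(ū)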
+
+
+
Adding the constraints on states
+
We might feel a little bit uneasy about losing immediate access to \bar{\bm x}. But the game is not lost. We just need to express \bar{\bm x} as a function of \bar{\bm u} and \mathbf x_0 and impose the constraint on the result. Such an expression is already available, see (Equation 2). Therefore, we can formulate constraints, say, lower and upper bounds on the state vector
+\mathbf x_\mathrm{min} \leq \bm x_k \leq \mathbf x_\mathrm{max}
+ as
+\bar{\mathbf x}_\mathrm{min} \leq \widehat{\mathbf C} \bar{\bm u} + \widehat{\mathbf A} \mathbf x_0 \leq \bar{\mathbf x}_\mathrm{max},
+ where the bars in \bar{\mathbf x}_\mathrm{min} and \bar{\mathbf x}_\mathrm{max} indicate that these vectors were obtained by stacking the corresponding vectors for all times k=1,\ldots,N.
General finite-horizon nonlinear discrete-time optimal control as a nonlinear program
+
In this section we formulate a finite-horizon optimal control problem (OCP) for a discrete-time dynamical system as a mathematical optimization (also mathematical programming) problem, which can then be solved numerically by a suitable solvers for nonlinear programming (NLP), or possibly quadratic programming (QP). The outcome of such numerical optimization is an optimal control trajectory (a sequence of controls), which is why this approach is called direct – we optimize directly over the trajectories.
+
In the following chapter we then present an alternative – indirect – approach, wherein the conditions of optimality are formulated first. These come in the form of a set of equations, some of them recurrent/recursive, some just algebraic. The indirect approach amounts to solving such equations.
+
And then in another chapter we present the third approach – dynamic programming.
+
The three approaches form the backbone of the theory of optimal control for discrete-time systems, but later we are going to recognize the same triplet in the context of continuous-time systems.
+
+
+
+
+
+
+
+
But now back to the direct approaches. We will start with a general nonlinear discrete-time optimal control problem in this section and then specialize to the linear quadratic regulation (LQR) problem in the next section. Finally, since the computed control trajectory constitutes an open-loop control scheme, something must be done about it if a feedback scheme is preferred – we introduce the concept of receding horizon control (RHC), perhaps better known as model predictive control (MPC), which turns the direct approach into a feedback control scheme.
+
We start by considering a nonlinear discrete-time system modelled by the state equation
+\bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),
+ where
+
+
\bm x_k\in \mathbb R^n is the state at the discrete time k\in \mathbb Z,
+
\bm u_k\in \mathbb R^m is the control at the discrete time k,
+
\mathbf f_k: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb Z \to \mathbb{R}^n is a state transition function (in general not only nonlinear but also time-varying, with the convention that the dependence on k is expressed through the lower index).
+
+
A general nonlinear discrete-time optimal control problem (OCP) is then formulated as
+\begin{aligned}
+\operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\
+\text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\
+ &\quad \bm u_k \in \mathcal U_k,\quad k=i, \ldots, N-1,\\
+ &\quad \bm x_k \in \mathcal X_k,\quad k=i, \ldots, N,
+\end{aligned}
+ where
+
+
i is the initial discrete time,
+
N is the final discrete time,
+
\phi() is a terminal cost function that penalizes the state at the final time,
+
L_k() is a running (also stage) cost function,
+
and \mathcal U_k and \mathcal X_k are sets of feasible controls and states – these sets are typically expressed using equations and inequalities. Should they be constant, the notation is just \mathcal U and \mathcal X.
+
+
Oftentimes it is convenient to handle the constraints of the initial and final states separately:
+\begin{aligned}
+\operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\
+\text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\
+ &\quad \bm u_k \in \mathcal U_k,\quad k=i, \ldots, N-1,\\
+ &\quad \bm x_k \in \mathcal X_k,\quad k=i+1, \ldots, N-1,\\
+ &\quad \bm x_i \in \mathcal X_\mathrm{init},\\
+ &\quad \bm x_N \in \mathcal X_\mathrm{final}.
+\end{aligned}
+
+
In particular, at the initial time just one particular state is often considered. At the final time, the state might be required to be equal to some given value, it might be required to be in some set defined through equations or inequalities, or it might be left unconstrained. Finally, the constraints on the control and states typically (but not always) come in the form of lower and upper bounds. The optimal control problem then specializes to
+\begin{aligned}
+\operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\
+\text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\
+ &\quad \bm u_{\min} \leq \bm u_k \leq \bm u_{\max},\\
+ &\quad \bm x_{\min} \leq \bm x_k \leq \bm x_{\max},\\
+ &\quad\bm x_i = \mathbf x^\text{init},\\
+ &\quad \left(\bm x_N = \mathbf x^\text{ref}, \; \text{or} \; \mathbf h_\text{final}(\bm x_N) = \mathbf 0, \text{or} \; \mathbf g_\text{final}(\bm x_N) \leq \mathbf 0\right),
+\end{aligned}
+ where
+
+
the inequalities should be interpreted componentwise,
+
\bm u_{\min} and \bm u_{\max} are lower and upper bounds on the control, respectively,
+
\bm x_{\min} and \bm x_{\max} are lower and upper bounds on the state, respectively,
+
\mathbf x^\text{init} is a fixed initial state,
+
\mathbf x^\text{ref} is a required (reference) final state,
+
and the functions \mathbf g_\text{final}() and \mathbf h_\text{final}() can be used to define the constraint set for the final state.
+
+
This optimal control problem is an instance of a general nonlinear programming (NLP) problem
+\begin{aligned}
+\operatorname*{minimize}_{\bar{\bm x}\in\mathbb{R}^{n(N-i)},\bar{\bm u}\in\mathbb{R}^{m(N-i)}} &\quad J(\bar{\bm x},\bar{\bm u})\\
+\text{subject to} &\quad \mathbf h(\bar{\bm x},\bar{\bm u}) =0,\\
+&\quad \mathbf g(\bar{\bm x},\bar{\bm u}) \leq \mathbf 0,
+\end{aligned}
+ where \bar{\bm u} and \bar{\bm x} are vectors obtained by stacking control and state vectors for individual times
Although there may be applications where it is desirable to optimize over the initial state \bm x_i as well, mostly the initial state \bm x_i is fixed, and it does not have to be considered as an optimization variable. This can even be emphasized through the notation J(\bar{\bm x},\bar{\bm u}; \bm x_i), where the semicolon separates the variables from (fixed) parameters.
+
The last control that affects the state trajectory on the interval [i,N] is \bm u_{N-1}.
Deficiencies of precomputed (open-loop) optimal control
+
In the previous section we learnt how to compute an optimal control sequence on a finite time horizon using numerical methods for solving nonlinear programs (NLP), and quadratic programs (QP) in particular. There are two major deficiencies of such approach:
+
+
The control sequence was computed under the assumption that the mathematical model is perfectly accurate. As soon as the reality deviates from the model, either because of some unmodelled dynamics or because of the presence of (external) disturbances, the performance of the system will deteriorate. We need a way to turn the presented open-loop (also feedforward) control scheme into a feedback one.
+
The control sequence was computed for a finite time horizon. It is commonly required to consider an infinite time horizon, which is not possible with the presented approach based on solving finite-dimensional mathematical programs.
+
+
There are several ways to address these issues. Here we introduce one of them. It is known as Model Predictive Control (MPC), or also Receding Horizon Control (RHC). Some more are presented in the next two sections (one based on the indirect approach, another one based on dynamic programming).
+
+
+
Model predictive control (MPC) as a way to turn open-loop control into feedback control
+
The idea is to compute an optimal control sequence on a finite time horizon using the material presented in the previous section, apply only the first control action to the system, and then repeat the procedure upon shifting the time horizon by one time step.
+
Although this name “model predictive control” is commonly used in the control community, the other – perhaps a bit less popular – name “receding horizon control” is equally descriptive, if not even a bit more.
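+
+A conceptual sketch of the receding-horizon loop in Julia follows; it reuses the direct_dlqr_simultaneous function from the LQR section above, and for simplicity the "plant" is simulated with the nominal model itself (in reality this is exactly where model mismatch and disturbances would enter).
+
+function run_mpc(A, B, x₀, Q, R, S, N, T)
+    x = copy(x₀)                         # current (measured) state
+    x_cl = [copy(x)]                     # closed-loop state history
+    u_cl = Float64[]                     # closed-loop history of the (first) control component
+    for t in 1:T
+        xplan, uplan = direct_dlqr_simultaneous(A, B, x, Q, R, S, N)   # plan over the horizon
+        u = uplan[:, 1]                  # apply only the first control action from the plan
+        x = A*x + B*u                    # plant update (here just the nominal model)
+        push!(u_cl, u[1]); push!(x_cl, copy(x))
+    end
+    return x_cl, u_cl
+end
+
+x_cl, u_cl = run_mpc(A, B, x₀, Q, R, S, N, 50)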
+
+
+
+
+
+
+Note
+
+
+
+
It may take a few moments to digest the idea, but it is actually quite natural. As a matter of fact, this is the way most of us control our lives every day. We plan our actions on a finite time horizon, and while building this plan we use our understanding (model) of the world. We then perform the first action from our plan, observe the impact of our action and possibly a change in the environment, and update our plan accordingly on a new (shifted) time horizon. We repeat this procedure over and over again. It is crucial that the prediction horizon must be long enough so that the full impact of our actions can be observed.
+Gros, Sebastien, and Moritz Diehl. 2022. “Numerical Optimal Control (Draft).” Systems Control; Optimization Laboratory IMTEK, Faculty of Engineering, University of Freiburg. https://www.syscop.de/files/2020ss/NOC/book-NOCSE.pdf.
+
+Ellis, Matthew, Helen Durand, and Panagiotis D. Christofides. 2014. “A Tutorial Review of Economic Model Predictive Control Methods.” Journal of Process Control, Economic nonlinear model predictive control, 24 (8): 1156–78. https://doi.org/10.1016/j.jprocont.2014.03.010.
+
+
+Ellis, Matthew, Jinfeng Liu, and Panagiotis D. Christofides. 2017. Economic Model Predictive Control: Theory, Formulations and Chemical Process Applications. Advances in Industrial Control. Cham: Springer. https://doi.org/10.1007/978-3-319-41108-8.
+
+
+Faulwasser, Timm, Lars Grüne, and Matthias A. Müller. 2018. “Economic Nonlinear Model Predictive Control.” Foundations and Trends in Systems and Control 5 (1): 1–98. https://doi.org/10.1561/2600000014.
+
+
+Rawlings, James B., David Angeli, and Cuyler N. Bates. 2012. “Fundamentals of Economic Model Predictive Control.” In 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 3851–61. https://doi.org/10.1109/CDC.2012.6425822.
+
+Alessio, Alessandro, and Alberto Bemporad. 2009. “A Survey on Explicit Model Predictive Control.” In Nonlinear Model Predictive Control: Towards New Challenging Applications, edited by Lalo Magni, Davide Martino Raimondo, and Frank Allgöwer, 345–69. Lecture Notes in Control and Information Sciences. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-01094-1_29.
+
+
+Bemporad, A., F. Borrelli, and M. Morari. 2002. “Model Predictive Control Based on Linear Programming - the Explicit Solution.” IEEE Transactions on Automatic Control 47 (12): 1974–85. https://doi.org/10.1109/TAC.2002.805688.
+
Essentially QP solvers with some extra features for MPC:
+
+
warmstarting requires feasibility of the previous solution. If only a fixed number of iterations is allowed (in favor of predictable timing), for some methods the iterations may temporarily lose feasibility.
The crucial message of this chapter — the concept of model predictive control (MPC) — has been described in a number of dedicated monographs and textbooks. Particularly recommendable are (Rawlings, Mayne, and Diehl 2017) and (Borrelli, Bemporad, and Morari 2017). They are not only reasonably up-to-date, written by leaders in the field, but they are also available online.
Since MPC essentially boils down to solving optimization problems in real time on some industrial device, the topic of embedded optimization is important. Nice overview is given in (Ferreau et al. 2017).
+Ferreau, H. J., S. Almér, R. Verschueren, M. Diehl, D. Frick, A. Domahidi, J. L. Jerez, G. Stathopoulos, and C. Jones. 2017. “Embedded Optimization Methods for Industrial Automatic Control.” IFAC-PapersOnLine, 20th IFAC World Congress, 50 (1): 13194–209. https://doi.org/10.1016/j.ifacol.2017.08.1946.
+
+
+Grüne, Lars, and Jürgen Pannek. 2017. Nonlinear Model Predictive Control: Theory and Algorithms. 2nd ed. Communications and Control Engineering. Cham: Springer. https://doi.org/10.1007/978-3-319-46024-6.
+
We consider a linear time-invariant (LTI) system described by the state equation
+\bm x_{k+1} = \mathbf A \bm x_{k} + \mathbf B \bm u_k, \qquad \bm x_0 = \mathbf x_0,
+ and our goal is to find a (vector) control sequence \bm u_0, \bm u_{1},\ldots, \bm u_{N-1} that minimizes
+J_0^N = \frac{1}{2}\bm x_N^\top\mathbf S_N\bm x_N + \frac{1}{2}\sum_{k=0}^{N-1}\left[\bm x_k^\top \mathbf Q \bm x_k+\bm u_k^\top \mathbf R\bm u_k\right],
+ where the quadratic cost function is parameterized by matrices that must be symmetric and at least positive semidefinite, otherwise the corresponding quadratic terms would not properly penalize the (weighted) distance from zero.
+
+
+
+
+
+
+Regulation vs tracking
+
+
+
+
Indeed, with the current setup our goal is to bring the state to zero and keep the control effort as small as possible. This control problem is called regulation. Later we are going to extend this into the problem of tracking a nonzero reference state (or even output, after adding the output variables into the game) trajectory.
+
+
+
We will see in a moment that the matrix \mathbf R must comply with an even stricter condition – it must be positive definite. To summarize the assumptions about the matrices, we require
+\mathbf S_N\succeq 0, \mathbf Q\succeq 0, \mathbf R\succ 0.
+
+
+
+
+
+
+
+Time-invariant systems can start at time zero
+
+
+
+
For time-invariant systems we can set the initial time to zero, that is, i=0, without loss of generality.
+
+
+
The Hamiltonian for our problem is
+\boxed{
+H(\bm x_k, \bm u_k, \bm \lambda_{k+1}) = \frac{1}{2}\left(\bm x_k^\top \mathbf Q\bm x_k+\bm u_k^\top \mathbf R\bm u_k\right) + \boldsymbol \lambda_{k+1}^\top\left(\mathbf A\bm x_k+\mathbf B\bm u_k\right).
+}
+
+
In the following derivations we use the shorthand notation H_k for H(\bm x_k, \bm u_k, \bm \lambda_{k+1}).
The last two equations represent boundary conditions. Note that here we have already fixed the initial state. If this is not appropriate in a particular scenario, go back and adjust the boundary equation accordingly.
+
The third equation above – the stationarity equation – can be used to extract the optimal control
+\bm u_k = -\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1}.
+
+
The need for nonsingularity of \mathbf R is now obvious. Upon substituting the recipe for the optimal \bm u_k into the state and the co-state equations, two recursive (or recurrent or just discrete-time) equations result
+\begin{bmatrix}
+\mathbf x_{k+1}\\\boldsymbol\lambda_k
+\end{bmatrix}
+=
+\begin{bmatrix}
+\mathbf A & -\mathbf B\mathbf R^{-1}\mathbf B^\top\\\mathbf Q & \mathbf A^\top
+\end{bmatrix}
+\begin{bmatrix}
+\bm x_k \\ \boldsymbol\lambda_{k+1}
+\end{bmatrix}.
+
+
This is a two-point boundary value problem (TP-BVP). The problem is of order 2n, where n is the dimension of the state space. In order to solve it we need 2n boundary values: n boundary values are provided by \bm x_i = \mathbf x_0, and n boundary values are given by the other boundary condition, from which \boldsymbol\lambda_N must be extracted. Most of our subsequent discussion will revolve around this task.
+
An idea might come into our mind: provided \mathbf A is nonsingular, we can left-multiply the above equation by the inverse of \mathbf A to obtain
+\begin{bmatrix}
+\mathbf x_{k}\\\boldsymbol\lambda_k
+\end{bmatrix}
+=
+\begin{bmatrix}
+\mathbf A^{-1} & \mathbf A^{-1}\mathbf B\mathbf R^{-1}\mathbf B^\top\\\mathbf Q\mathbf A^{-1} & \mathbf A^\top+\mathbf Q\mathbf A^{-1}\mathbf B\mathbf R^{-1}\mathbf B^\top
+\end{bmatrix}
+\begin{bmatrix}
+\mathbf x_{k+1} \\ \boldsymbol\lambda_{k+1}
+\end{bmatrix}
+\tag{1}
+
This helped at least to have both variables evolving in the same direction in time (both backward), but we still do not know \boldsymbol\lambda_N. Nonetheless, do not forget this result. We are going to invoke it later.
+
+
Zero-input case and discrete-time (algebraic) Lyapunov equation
+
Before we delve into solution of the original problem, let us investigate a somewhat artificial problem when no control input is applied. We compute the cost of not controlling the system at all. This will give us some valuable insight.
+
We start by evaluating the cost of starting at the terminal time N and then proceed backwards in time, that is, decrease the initial time to N-1, N-2 and so on. For simplicity of notation we omit the upper index in the cost function, since the final time remains the same throughout the computation. But we do use the lower index here
+\begin{aligned}
+J_N &= \frac{1}{2}\bm x_N^\top \mathbf S_N\bm x_N\\
+J_{N-1} &= \frac{1}{2}\bm x_N^\top\mathbf S_N\bm x_N + \frac{1}{2}\bm x_{N-1}^\top \mathbf Q\,\bm x_{N-1}\\
+ &= \frac{1}{2}\bm x_{N-1}^\top\left(\mathbf A^\top \mathbf S_N\mathbf A+\mathbf Q\right)\bm x_{N-1}\\
+J_{N-2} &=\ldots
+\end{aligned}
+
+
Upon introducing a new name \mathbf S_{N-1} for term \mathbf A^\top \mathbf S_N \mathbf A+\mathbf Q and similarly for all preceding times, we arrive at a discrete-time Lyapunov equation
+\boxed{\mathbf S_{k} = \mathbf A^\top \mathbf S_{k+1}\mathbf A+\mathbf Q.}
+
+
This is a very famous and well-investigated equation in systems and control theory. Its solution is given by
+\mathbf S_k = (\mathbf A^\top)^{N-k}\mathbf S_N\mathbf A^{N-k} + \sum_{i=k}^{N-1}\left(\mathbf A^\top \right)^{N-i-1}\mathbf Q\mathbf A^{N-i-1}.
+
+
Having the sequence of \mathbf S_k at hand, the cost function when starting at time k (and finishing at time N) can be readily evaluated as
+J_k = \frac{1}{2}\bm x_k^\top \mathbf S_k\bm x_k.
+
+
We will come back to this observation in a few moments. Before we do that, note that if the plant is stable, the cost over [-\infty,N] is finite and is given by
+J_{-\infty}^N = \frac{1}{2}\bm x_{-\infty}^\top \mathbf S_{-\infty} \bm x_{-\infty},
+ or, equivalently (thanks to the fact that the system is time-invariant) – and perhaps even more conveniently – we can introduce a new time k' = N-k, which then ranges over [0,\infty] as k goes from N to -\infty
+J_0^\infty = \frac{1}{2}\bm x_0^\top \mathbf S_\infty \bm x_0.
+
+
Even though this result was derived for the no-control case, which is not what we are after in the course on control design, it is still useful. It gives some hint as for the structure of the cost function. Indeed, we will see later that even in the case of nonzero control, the cost will be a quadratic function of the initial state.
+
When it comes to the computation of \mathbf S_\infty, besides the implementation of the limiting iterative process, we may exploit the fact that in the steady state
+\mathbf S_k = \mathbf S_{k+1},
+ which turns the difference Lyapunov equation into the even more famous algebraic Lyapunov equation (ALE)
+\boxed{\mathbf S = \mathbf A^\top \mathbf S\mathbf A+\mathbf Q.}
+
+
Well-known facts about this equation (studied in introductory courses on linear systems) are
+
+
If \mathbf A is stable and \mathbf Q\succeq 0, then there is a solution to the ALE satisfying \mathbf S\succeq 0.
+
If \mathbf A is stable and (\mathbf A,\sqrt{\mathbf Q}) is observable, then there is a unique solution to the ALE satisfying \mathbf S\succ 0.
+
+
If the system is unstable, the cost can be finite or infinite, depending on \mathbf Q. As a trivial example, for \mathbf Q=\mathbf 0 the cost stays finite (in fact zero) regardless of the state blowing up.
+
Concerning methods for numerical solution, the ALE is just a linear equation and as such can be reformulated into the standard \mathbf A \bm x=\mathbf b form (using a trick based on the Kronecker product, see kron in Matlab). Specialized algorithms exist and some of them are implemented in the dlyap function in Matlab. State-of-the-art Julia implementations are in the MatrixEquations.jl package.
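+
+A small sketch of computing \mathbf S_\infty in Julia, assuming the MatrixEquations.jl package mentioned above (the assumption here is that lyapd(A, Q) solves AXAᵀ - X + Q = 0, hence the transposed argument); the matrices are example data, and the result is cross-checked against the backward recursion.
+
+using MatrixEquations, LinearAlgebra
+
+A = [0.8  0.1; 0.0  0.7]          # a stable example state matrix
+Q = [1.0  0.0; 0.0  2.0]
+
+Sinf = lyapd(Matrix(A'), Q)       # S satisfying S = AᵀSA + Q
+
+S = zeros(2, 2)                   # cross-check: iterate S ← AᵀSA + Q long enough
+for _ in 1:1000
+    global S = A'*S*A + Q
+end
+norm(S - Sinf)                    # should be negligibly small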
+
+
+
Fixed final state and finite time horizon
+
Back to the nonzero control case. First we are going to investigate the scenario when the final requested state is given by \mathbf x^\text{ref}. The optimal control problem turns into
+\begin{aligned}
+\operatorname*{minimize}_{\bm x_0, \bm{x}_{1},\ldots,\bm{x}_{N},\bm{u}_{0},\ldots,\bm{u}_{N-1}} &\; \frac{1}{2}\sum_{k=0}^{N-1}\left[\bm x_k^T \mathbf Q \bm x_k+\bm u_k^T \mathbf R\bm u_k\right]\\
+\text{s.t. } & \; \mathbf x_{k+1} = \mathbf A \mathbf x_{k} + \mathbf B \bm u_k,\\
+&\; \bm x_0 = \mathbf x_0,\\
+&\; \bm x_N = \mathbf x^\text{ref},\\
+&\; \mathbf Q\geq 0, \mathbf R>0.
+\end{aligned}
+
+
+
Note also that the term penalizing the final state is removed from the cost because the final state is fixed anyway. After eliminating the controls using the stationarity equation
+\bm u_k = -\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1},
+ and replacing the general boundary condition at the final time by \bm x_N = \mathbf x^\text{ref}, the two-point boundary value problem specializes to
+\begin{aligned}
+\mathbf x_{k+1} &=\mathbf A\bm x_k-\mathbf B\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1},\\
+\boldsymbol\lambda_k &= \mathbf Q\bm x_k+\mathbf A^\top\boldsymbol\lambda_{k+1},\\
+\bm x_0 &= \mathbf x_0,\\
+\bm x_N &= \mathbf x^\text{ref}.
+\end{aligned}
+
+
This problem is clearly an instance of a two-point boundary value problem (TP-BVP) as the state vector is specified at both ends of the time interval. The co-state is left unspecified, but that is fine because only 2n boundary conditions are needed. While BVPs are generally difficult to solve, our problem at hand adds one more layer of complexity. For the state variable its evolution forward in time is specified by the state equation, while for the co-state variable the evolution backward in time is prescribed by the co-state equation.
There is not much we can do with these equations in this form. However, in case of a nonsingular matrix \mathbf A, we can invoke the discrete-time Hamiltonian system (Equation 1), in which we reorganized the equations so that both state and co-state variables evolve backwards. For convenience we give it here again
+\begin{bmatrix}
+\mathbf x_{k}\\\boldsymbol\lambda_k
+\end{bmatrix}
+=\underbrace{
+\begin{bmatrix}
+\mathbf A^{-1} & \mathbf A^{-1}\mathbf B\mathbf R^{-1}\mathbf B^\top\\\mathbf Q\mathbf A^{-1} & \mathbf A^\top+\mathbf Q\mathbf A^{-1}\mathbf B\mathbf R^{-1}\mathbf B^\top
+\end{bmatrix}}_{\mathbf H}
+\begin{bmatrix}
+\mathbf x_{k+1} \\ \boldsymbol\lambda_{k+1}.
+\end{bmatrix}
+
+
This can be used to relate the state and costate at the initial and final times of the interval
+\begin{bmatrix}
+\mathbf x_{0}\\\boldsymbol\lambda_0
+\end{bmatrix}
+=\underbrace{
+\begin{bmatrix}
+\mathbf A^{-1} & \mathbf A^{-1}\mathbf B\mathbf R^{-1}\mathbf B^\top\\\mathbf Q\mathbf A^{-1} & \mathbf A^\top+\mathbf Q\mathbf A^{-1}\mathbf B\mathbf R^{-1}\mathbf B^\top
+\end{bmatrix}^N}_{\mathbf M\coloneqq \mathbf H^N}
+\begin{bmatrix}
+\mathbf x_{N} \\ \boldsymbol\lambda_{N}
+\end{bmatrix}.
+
+
From the first equation we can get \boldsymbol \lambda_N. First, let’s rewrite it here
+\mathbf M_{12}\boldsymbol \lambda_N = \bm x_0-\mathbf M_{11}\bm x_N,
+ from which (after substituting for the known initial and final states)
+\boldsymbol \lambda_N = \mathbf M_{12}^{-1}(\mathbf x_0-\mathbf M_{11}\mathbf x^\text{ref}).
+
+
Having the final state and the final co-state, \bm x_N and \boldsymbol \lambda_N, respectively, we can solve the Hamiltonian system backward to get the states and co-states on the whole time interval [0,N].
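A minimal Julia sketch of this backward sweep follows; the system, horizon, and weights are our own arbitrary choices (a discretized double integrator with nonsingular \mathbf A), not taken from the text.
+
+using LinearAlgebra
+A = [1.0 0.1; 0.0 1.0]; B = reshape([0.0, 0.1], 2, 1)   # assumed system, A nonsingular
+Q = [0.1 0.0; 0.0 0.1]; R = fill(1.0, 1, 1)
+N = 20; x0 = [1.0, 0.0]; xref = [0.0, 0.0]
+
+Ai = inv(A)
+H = [Ai     Ai*B*(R\B');
+     Q*Ai   (A' + Q*Ai*B*(R\B'))]            # backward Hamiltonian matrix
+M = H^N
+M11, M12 = M[1:2, 1:2], M[1:2, 3:4]
+λN = M12 \ (x0 - M11*xref)                   # final co-state from the boundary conditions
+
+X = zeros(2, N+1); Λ = zeros(2, N+1)         # roll the Hamiltonian system backwards
+X[:, N+1] = xref; Λ[:, N+1] = λN
+for k in N:-1:1
+    z = H * [X[:, k+1]; Λ[:, k+1]]
+    X[:, k] = z[1:2]; Λ[:, k] = z[3:4]
+end
+U = [-(R \ (B'*Λ[:, k+1])) for k in 1:N]     # U[k] is u_{k-1} = -R⁻¹Bᵀλ_k
+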
+
+
Special case: minimum-energy control (\mathbf Q = \mathbf 0)
+
We can get some more insight into the problem if we further restrict the class of problems we can treat. Namely, we will assume
+\mathbf Q = \mathbf 0.
+
+
This is a significant restriction, nonetheless the resulting problem is still practically reasonable. And we do not need to assume that \mathbf A is nonsingular. The cost function is then
+J = \frac{1}{2}\sum_{k=0}^{N-1} \bm u^\top_k \mathbf R\, \bm u_k, \quad\text{or, for } \mathbf R=\mathbf I,\quad J = \frac{1}{2}\sum_{k=0}^{N-1} \|\bm u_k\|_2^2,
+ which is why the problem is called the minimum-energy control problem. Rewriting the state and co-state equations with the new restriction \mathbf Q=\mathbf 0 we get
+\begin{aligned}
+\bm x_{k+1} &= \mathbf A\bm x_k - \mathbf B\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1}\\
+\boldsymbol \lambda_k &= \mathbf A^\top\boldsymbol\lambda_{k+1}.
+\end{aligned}
+
+
It is obvious why we wanted to enforce the \mathbf Q=\mathbf 0 restriction — the co-state equation is now completely decoupled from the state equation and can be solved independently
+\boldsymbol \lambda_k = (\mathbf A^\top)^{N-k}\boldsymbol \lambda_N.
+
+
Now substitute this solution of the co-state equation into the state equation
+\bm x_{k+1} = \mathbf A\bm x_k - \mathbf B\mathbf R^{-1}\mathbf B^\top(\mathbf A^\top)^{N-k-1}\boldsymbol \lambda_N.
+
+
Finding a solution to the state equation is now straightforward: the second summand on the right is regarded as an “input”. The solution is then
+\bm x_{k} = \mathbf A^k\bm x_0 - \sum_{i=0}^{k-1}\mathbf A^{k-1-i}\mathbf B\mathbf R^{-1}\mathbf B^\top(\mathbf A^\top)^{N-i-1}\boldsymbol \lambda_N.
+
+
The last step reveals the motivation for all the previous steps — we can now express the state at the final time, and by doing that we introduce some known quantity into the problem
+\bm x_{N} = \mathbf x^\text{ref}= \mathbf A^N\bm x_0 - \underbrace{\sum_{i=0}^{N-1}\mathbf A^{N-1-i}\mathbf B\mathbf R^{-1}\mathbf B^\top(\mathbf A^\top)^{N-i-1}}_{G_{0,N,R}}\boldsymbol \lambda_N.
+
+
This enables us to calculate \boldsymbol \lambda_N directly as a solution to a linear equation. To make the notation simpler, denote the sum in the expression above by \mathbf G_{0,N,R} (we will discuss this particular object in a while)
+\boldsymbol \lambda_N = -\mathbf G^{-1}_{0,N,R}\; (\mathbf x^\text{ref}-\mathbf A^N\bm x_0).
+
+
The rest is quite straightforward as the optimal control depends (through the stationarity equation) on the co-state
+\boxed{
+\bm u_k = \mathbf R^{-1}\mathbf B^\top(\mathbf A^\top)^{N-k-1}\mathbf G^{-1}_{0,N,R}\; (\mathbf x^\text{ref}-\mathbf A^N\bm x_0).
+}
+
+
This is the desired formula for the computation of the optimal control.
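A minimal Julia sketch of this formula, with our own arbitrarily chosen system and horizon (not from the text), including a simulation that checks that the final state indeed lands on \mathbf x^\text{ref}:
+
+using LinearAlgebra
+A = [1.0 0.1; 0.0 1.0]; B = reshape([0.0, 0.1], 2, 1)        # assumed reachable pair
+R = fill(1.0, 1, 1); N = 30
+x0 = [1.0, 0.0]; xref = [0.0, 1.0]
+
+At = Matrix(A')
+G = sum(A^(N-1-i) * B * (R\B') * At^(N-1-i) for i in 0:N-1)  # weighted reachability Gramian
+v = G \ (xref - A^N*x0)
+u = [R \ (B' * At^(N-1-k) * v) for k in 0:N-1]               # u[k+1] is the optimal u_k
+
+X = zeros(2, N+1); X[:, 1] = x0                              # simulate to verify
+for k in 1:N
+    X[:, k+1] = A*X[:, k] + B*u[k]
+end
+@show X[:, end]                                              # should be ≈ xref
+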
+
A few observations can be made
+
+
The control is proportional to the difference (\mathbf x^\text{ref}-\mathbf A^N\bm x_0). The intuitive interpretation is that the further the requested final state is from the state into which the system would finally evolve without any control, the higher the control is needed.
+
+
The control is proportional to the inverse of a matrix \mathbf G_{0,N,R} which is called weighted reachability Gramian. The standard result from the theory of linear dynamic systems is that nonsingularity of a reachability Gramian is equivalent to reachability of the system. More on this below.
+
+
+
Weighted reachability Gramian
+
Recall (perhaps from your linear systems course) that there is a matrix called discrete-time reachability Gramian defined as
+\mathbf G = \sum_{k=0}^{\infty} \mathbf A^{k}\mathbf B\mathbf B^\top(\mathbf A^\top)^k
+ and the nonsingularity of this matrix serves as a test of reachability for stable discrete-time linear systems.
+
How does this classical object relate to the object \mathbf G_{0,N,R} introduced in the previous paragraph? First consider the restriction of the summation from the infinite interval [0,\infty] to [0,N-1]. In other words, we analyze the matrix
+\mathbf G_{0,N} = \sum_{k=0}^{N-1} \mathbf A^{N-1-k}\mathbf B\mathbf B^\top(\mathbf A^\top)^{N-1-k}.
+
+
Recall that the Cayley–Hamilton theorem tells us that every power of an n\times n matrix can be expressed as a linear combination of its powers 0 through n-1. In other words, using powers of \mathbf A higher than n-1 cannot increase the rank of the sum, and hence (provided N\geq n) the finite sum has the same rank as the infinite-horizon Gramian.
+
Finally, provided \mathbf R \succ 0 (hence \mathbf R^{-1}\succ 0 as well), the rank of the Gramian is not changed after introducing the weight
+
+\mathbf G_{0,N,R} = \sum_{k=0}^{N-1} \mathbf A^{N-1-k}\mathbf B\mathbf R^{-1}\mathbf B^\top(\mathbf A^\top)^{N-1-k}.
+
+
The weighted Gramian defined on a finite discrete-time horizon (of length N at least equal to the number n of states) is invertible if and only if the system is reachable. This conclusion is quite natural: if an optimal control is to be found, it must first be guaranteed that some control exists that brings the system from an arbitrary initial state to an arbitrary final state on a finite time interval, which is the very definition of reachability.
+
To summarize the whole fixed-final-state case, the optimal control can be computed numerically by solving a TP-BVP. For the minimum-energy problem even a formula exists and there is no need for a numerical optimization solver. But the outcome is always just a sequence of controls. In this regard, the new (indirect) approach did not offer much more than what the direct approach did. Although the new insight is rewarding, it is paid for by the inability to handle constraints on the control or state variables.
+
+
+
+
+
Free final state and finite time horizon
+
The previous discussion revolved around the task of bringing the system to a given final state exactly. What if we relax this strict requirement and instead just request that the system be eventually brought into a close vicinity of the requested state? How close can be influenced by the terminal-state penalty in the cost function.
+
+
The only change with respect to the previous development is just in the boundary condition — the one at the final time. Now the final state \bm x_N can also be used as a parameter for our optimization. Hence \text{d}\bm x_N\neq 0 and the other term in the product must vanish. We write down again the full necessary conditions including the new boundary conditions
+\begin{aligned}
+\bm x_{k+1} &=\mathbf A\bm x_k-\mathbf B\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1},\\
+\boldsymbol\lambda_k &= \mathbf Q\bm x_k+\mathbf A^\top\boldsymbol\lambda_{k+1},\\
+\bm u_k &= -\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1},\\
+\mathbf S_N \bm x_N &= \boldsymbol \lambda_N,\\
+\bm x_0 &= \mathbf x_0.
+\end{aligned}
+
+
We find ourselves in pretty much the same trouble as before: the final-time boundary condition refers to variables whose values we do not know. The way out is provided by an insightful guess: why not try to extend the linear relationship between the state and the co-state at the final time to all preceding discrete times? That is, we assume
+\mathbf S_k \bm x_k = \boldsymbol \lambda_k.
+\tag{2}
+
At first, we can have no idea if it works. But let’s try it and see what happens. Substitute (Equation 2) into the state and co-state equations. We start with the state equation
+\bm x_{k+1} =\mathbf A\bm x_k-\mathbf B\mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1}\bm x_{k+1}.
+
Now perform the same substitution into the co-state equation
+\mathbf S_k \bm x_k = \mathbf Q\bm x_k+\mathbf A^\top\mathbf S_{k+1}\bm x_{k+1},
+ and substitute for \bm x_{k+1} from the state equation into the previous equation to get
+\mathbf S_k \bm x_k = \mathbf Q\bm x_k+\mathbf A^\top\mathbf S_{k+1}(\mathbf I+\mathbf B\mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1})^{-1}\mathbf A\bm x_k.
+
+
Since this equation must hold for an arbitrary \bm x_k, we get an equation in the matrices \mathbf S_k
+\boxed{
+\mathbf S_k = \mathbf Q+\mathbf A^\top\mathbf S_{k+1}(\mathbf I+\mathbf B\mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1})^{-1}\mathbf A.
+}
+
+
This is a superfamous equation: it is called the difference (or discrete-time) Riccati equation. When initialized with \mathbf S_N, it generates the sequence of matrices \mathbf S_{N-1}, \mathbf S_{N-2}, \mathbf S_{N-3},\ldots Indeed, a noteworthy feature of this sequence is that it is initialized at the final time and the equation prescribes how the sequence evolves backwards.
+
Once we have generated a sufficiently long sequence (down to \mathbf S_{1}), the optimal control sequence \bm u_0, \bm u_1, \ldots, \bm u_{N-1} is then computed using the stationarity equation
+\bm u_k = -\mathbf R^{-1}\mathbf B^\top\boldsymbol\lambda_{k+1}=-\mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1}\bm x_{k+1}.
+
+
This suggests that the optimal control is generated using the state but the current scheme is noncausal because the control at a given time depends on the state at the next time. But turning this into a causal one is easy — just substitute the state equation for \bm x_{k+1} and get
+\bm u_k =-\mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1}(\mathbf A\bm x_{k}+\mathbf B\bm u_{k}).
+
+
Solving this equation for \bm u_k gives
+\bm u_k = -\underbrace{(\mathbf I + \mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1}\mathbf B)^{-1}\mathbf R^{-1}\mathbf B^\top\mathbf S_{k+1}\mathbf A}_{\mathbf K_k}\mathbf x_{k}.
+
+
Mission accomplished. This is our desired control. A striking observation is that although we made no specification of the controller structure, the optimal control strategy turned out to be a feedback one! Let’s write it down explicitly
+\boxed{
+\bm u_k = -\mathbf K_k \bm x_{k}.
+}
+
+
+
+
+
+
+
+LQ-optimal control on a finite time horizon with a free final state is a feedback control
+
+
+
+
The importance of this result can hardly be overstated – the optimal control comes in the form of a proportional state-feedback control law.
+
+
+
The feedback gain is time-varying and deserves a name after its inventor — Kalman gain. Incorporating the knowledge that \mathbf R is nonsingular, a minor simplification of the lengthy expression can be made
+\mathbf K_k = (\mathbf R + \mathbf B^\top\mathbf S_{k+1}\mathbf B)^{-1}\mathbf B^\top\mathbf S_{k+1}\mathbf A.
+\tag{3}
+
Before we move on, let us elaborate a bit more on the difference Riccati equation. Invoking a popular (but hard to reliably memorize) rule for inversion of a sum of two matrices called matrix inversion lemma, which reads
+(\mathbf A_{11}^{-1}+\mathbf A_{12}\mathbf A_{22}\mathbf A_{21})^{-1} =\mathbf A_{11}-\mathbf A_{11}\mathbf A_{12}(\mathbf A_{21}\mathbf A_{11}\mathbf A_{12}+\mathbf A_{22}^{-1})^{-1}\mathbf A_{21}\mathbf A_{11},
+ the Riccati equation can be rewritten (after multiplying the brackets out) as
+\boxed{
+\mathbf S_k = \mathbf Q + \mathbf A^\top\mathbf S_{k+1}\mathbf A - \mathbf A^\top\mathbf S_{k+1}\mathbf B( \mathbf B^\top\mathbf S_{k+1}\mathbf B+\mathbf R)^{-1}\mathbf B^\top\mathbf S_{k+1}\mathbf A,
+}
+ which we will regard as an alternative form of difference Riccati equation.
+
Observing that the steps of the computation of the Kalman gain \mathbf K_k reappear in the computation of the solution of the Riccati equation, a more efficient arrangement of the computation in every iteration step is
+\boxed{
+\begin{aligned}
+\mathbf K_k &= \left(\mathbf B^\top \mathbf S_{k+1}\mathbf B+\mathbf R\right)^{-1}\mathbf B^\top \mathbf S_{k+1}\mathbf A\\
+\mathbf S_k &= \mathbf A^\top \mathbf S_{k+1}(\mathbf A-\mathbf B\mathbf K_k) + \mathbf Q.
+\end{aligned}
+}
+
+
Finally, yet another equivalent version of Riccati equation is known as Joseph stabilized form of Riccati equation
+\boxed{
+\mathbf S_k = (\mathbf A-\mathbf B\mathbf K_k)^\top \mathbf S_{k+1}(\mathbf A-\mathbf B\mathbf K_k) + \mathbf K_k^\top \mathbf R\mathbf K_k + \mathbf Q.
+}
+\tag{4}
+
Showing the equivalence is left as an exercise.
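Such an exercise can also be checked numerically; a quick Julia sketch with our own random data follows (all three forms of the update should produce the same \mathbf S_k).
+
+using LinearAlgebra
+A = randn(3,3); B = randn(3,2); Q = Matrix(1.0I, 3, 3); R = Matrix(0.5I, 2, 2)
+M = randn(3,3); S1 = M*M'                                  # a random S_{k+1} ⪰ 0
+K  = (B'*S1*B + R) \ (B'*S1*A)                             # Kalman gain
+Sa = Q + A'*S1*inv(I + B*inv(R)*B'*S1)*A                   # original form
+Sb = Q + A'*S1*A - A'*S1*B*inv(B'*S1*B + R)*B'*S1*A        # alternative form
+Sc = (A - B*K)'*S1*(A - B*K) + K'*R*K + Q                  # Joseph stabilized form
+@show norm(Sa - Sb) norm(Sb - Sc)                          # both ≈ 0 up to rounding
+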
+
+
Second order sufficient conditions
+
So far we have only found a solution that satisfies the first-order necessary conditions, but we were warned in the introductory lessons on optimization that such a solution need not constitute an optimum (a minimum in our case). In order to check this, the second derivative (Hessian, curvature matrix) must be found and checked for positive definiteness. Our strategy will be to find the value of the optimal cost first and then identify its second derivative with respect to \bm u_k.
+
The trick to find the value of the optimal cost is from (Lewis, Vrabie, and Syrmos 2012); it is rather technical and it may be hard to learn a general lesson from it. Nonetheless, we will need the result, and therefore we swiftly go through the procedure without pretending that we are building a general competence. The trick is based on the observation that
+\frac{1}{2}\sum_{k=0}^{N-1}(\mathbf x^\top _{k+1}\mathbf S_{k+1} \mathbf x_{k+1} - \mathbf x^\top _{k}\mathbf S_{k} \mathbf x_{k}) = \frac{1}{2}\mathbf x^\top _{N}\mathbf S_{N} \mathbf x_{N} - \frac{1}{2}\mathbf x^\top _{0}\mathbf S_{0} \mathbf x_{0}.
+
+
Now consider our optimization criterion and add zero to it. The value of the cost function does not change. Weird procedure, right? Observing that zero can also be expressed as the right hand side minus the left hand side in the above equation, we get
+J_0 = \frac{1}{2}\bm x_0^\top\mathbf S_0\bm x_0 + \frac{1}{2}\sum_{k=0}^{N-1}\left[\mathbf x^\top _{k+1}\mathbf S_{k+1} \mathbf x_{k+1}+\bm x_k^\top (\mathbf Q - \mathbf S_k) \bm x_k+\bm u_k^\top \mathbf R\bm u_k\right].
+
+
Substituting the state equation, the cost function transforms to
+\begin{aligned}
+J_0 &= \frac{1}{2}\bm x_0^\top\mathbf S_0\bm x_0 + \frac{1}{2}\sum_{k=0}^{N-1}[\mathbf x^\top _{k}(\mathbf A^\top \mathbf S_{k+1}\mathbf A + \mathbf Q - \mathbf S_k) \mathbf x_{k}+\bm x_k^\top \mathbf A^\top \mathbf S_{k+1}\mathbf B \bm u_k\\
+&\qquad\qquad\qquad\qquad+\bm u_k^\top \mathbf B^\top \mathbf S_{k+1}\mathbf A \bm x_k+\bm u_k^\top (\mathbf B^\top \mathbf S_{k+1}\mathbf B + \mathbf R)\bm u_k].
+\end{aligned}
+
+
Substituting for \mathbf S_k from the Riccati equation gives
+\begin{aligned}
+J_0 &= \frac{1}{2}\bm x_0^\top\mathbf S_0\bm x_0 + \frac{1}{2}\sum_{k=0}^{N-1}[\mathbf x^\top _{k}(\mathbf A^\top \mathbf S_{k+1}\mathbf B( \mathbf B^\top \mathbf S_{k+1}\mathbf B+\mathbf R)^{-1}\mathbf B^\top \mathbf S_{k+1}\mathbf A) \mathbf x_{k}+\bm x_k^\top \mathbf A^\top \mathbf S_{k+1}\mathbf B \bm u_k\\
+&\qquad\qquad\qquad\qquad+\bm u_k^\top \mathbf B^\top \mathbf S_{k+1}\mathbf A \bm x_k+\bm u_k^\top (\mathbf B^\top \mathbf S_{k+1}\mathbf B + \mathbf R)\bm u_k].
+\end{aligned}
+
+
The time-varying Hessian with respect to the control \bm u_k is
+\nabla_{\bm u_k}^2 J_0 = \mathbf B^\top \mathbf S_{k+1}\mathbf B + \mathbf R.
+
+
Provided that \mathbf R\succ 0, it can be seen that it is always guaranteed that \nabla_{\bm u_k}^2 J_0\succ 0. To prove this it must be shown that \mathbf B^\top \mathbf S_{k+1}\mathbf B\succeq 0. As usual, let us make things more intuitive by switching to the scalar case. The previous expression simplifies to b^2s_{k+1}. No matter what the value of b is, the square is always nonnegative. It remains to show that s_{k+1}\geq0 (and in the matrix case \mathbf S_{k+1}\succeq 0). This can be seen from the prescription for \mathbf S_{k} given by the Riccati equation using similar arguments for proving positive semidefiniteness of compound expressions.
+
To conclude, the solution to the first-order necessary conditions represented by the Riccati equation is always a minimizing solution.
+
We can work a bit more with the value of the optimal cost. Substituting the optimal control we can see (after some careful two-line work) that
+J_0 = \frac{1}{2}\bm x_0^\top \mathbf S_0 \bm x_0.
+
+
The same conclusion can be obtained for any time instant k inside the interval [0,N]
+\boxed{
+J_k = \frac{1}{2}\bm x_k^\top \mathbf S_k \bm x_k.
+}
+
+
This is a result that we have already seen in the no-control case: the optimal cost can be obtained as a quadratic function of the initial state using a matrix obtained as a solution to some iteration. We will use this result in the future derivations.
+
+
+
Numerical example with a scalar and first-order system
+
As usual, some practical insight can be developed by analyzing things restricted to the scalar case. For this, consider a first-order system described by the state equation
+x_{k+1} = ax_k + bu_k
+ and the optimization criterion in the form
+J_0 = \frac{1}{2}s_N x_N^2 + \frac{1}{2}\sum_{k=0}^{N-1}\left[ q x_k^2+r u_k^2\right ].
+
+
The scalar Riccati equation simplifies to
+s_k = a^2s_{k+1} - \frac{a^2b^2s_{k+1}^2}{b^2s_{k+1}+r} + q
+ or
+s_k = \frac{a^2rs_{k+1}}{b^2s_{k+1}+r} + q.
+
+
Julia code and its outputs follow.
+
+
+
function dre(a, b, q, r, sN, N)
+    s = Vector{Float64}(undef, N+1)   # s[i] stores S_{i-1}; s[1] (= S_0) is not needed for the gains, but the indices then fit
+    k = Vector{Float64}(undef, N)     # k[i] stores the gain K_{i-1}
+    s[end] = sN
+    for i = N:-1:1
+        k[i] = (a*b*s[i+1])/(r + s[i+1]*b^2);
+        s[i] = a*s[i+1]*(a - b*k[i]) + q;
+    end
+    return s, k
+end
+
+a = 1.05;
+b = 0.01;
+q = 100;
+r = 1;
+x0 = 10;
+sN = 100;
+N = 20;
+
+s, k = dre(a, b, q, r, sN, N);
+
+using Plots
+
+p1 = plot(0:1:N, s, xlabel="i", ylabel="RE solution", label="s", markershape=:circ, markersize=1, linetype=:steppost)
+p2 = plot(0:1:N-1, k, xlabel="i", ylabel="State-feedback gain", label="k", markershape=:circ, markersize=1, linetype=:steppost, xlims=xlims(p1))
+
+x = Vector{Float64}(undef, N+1)
+u = Vector{Float64}(undef, N)
+
+x[1] = x0;
+
+for i = 1:N
+    u[i] = -k[i]*x[i];      # time-varying state feedback
+    x[i+1] = a*x[i] + b*u[i];
+end
+
+p3 = plot(0:1:N, x, xlabel="i", ylabel="State", label="x", markershape=:circ, markersize=1, linetype=:steppost)
+plot(p1, p2, p3, layout=(3,1))
+
+
+
+
+
+
+
Obviously the final state is not particularly close to zero, which is the desired final value. However, by increasing the terminal penalty s_N we can bring it arbitrarily close, as the next simulation confirms.
The outputs also suggest that both s_k and the gain k_k stay essentially constant for most of the time interval and only change dramatically towards the end of the control interval.
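As a quick numerical check of the earlier result J_0=\frac{1}{2}\bm x_0^\top\mathbf S_0\bm x_0, we can compare (this is our own addition, reusing the variables from the code above) the realized cost of the simulated optimal trajectory with \frac{1}{2}s_1 x_0^2, recalling that s[1] stores S_0.
+
+J_realized  = 0.5*sN*x[end]^2 + 0.5*sum(q*x[i]^2 + r*u[i]^2 for i in 1:N)
+J_predicted = 0.5*s[1]*x0^2
+@show J_realized J_predicted                 # the two numbers should agree
+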
+
The observation in the example raises the question of how much is lost if the optimal control represented by the sequence \mathbf K_k is replaced by some constant value \mathbf K. A natural candidate is the steady-state value that \mathbf K_k attains at the beginning of the control interval, that is, at k=0 in our case.
+
Obviously, on a finite horizon there is not much to investigate: the constant feedback gain is simply suboptimal. Things become more interesting, however, as the control horizon stretches to infinity, that is, as N\rightarrow \infty.
Discrete-time LQR-optimal control on an infinite horizon
+
In this section we are going to solve the LQR problem on the time horizon extended to infinity, that is, our goal is to find an infinite (vector) control sequence \bm u_0, \bm u_{1},\ldots, \bm u_{\infty} that minimizes
+J_0^\infty = \frac{1}{2}\sum_{k=0}^{\infty}\left[\bm x_k^\top \mathbf Q \bm x_k+\bm u_k^\top \mathbf R\bm u_k\right],
+ where, as before \mathbf Q = \mathbf Q^\top \succeq 0 and \mathbf R = \mathbf R^\top \succ 0 and the system is modelled by
+\bm x_{k+1} = \mathbf A \bm x_{k} + \mathbf B \bm u_k, \qquad \bm x_0 = \mathbf x_0.
+
+
+
+
+
+
+
+Important
+
+
+
+
There is no penalty on the terminal state in the infinite time horizon LQR problem.
+
+
+
+
Why the infinite time horizon?
+
The first question that must inevitably pop up is the one about the motivation for introducing the infinite time horizon:
+
+
Does the introduction of an infinite time horizon reflect that we do not care about when the controller accomplishes the task?
+
+
No, certainly not. The infinite time horizon is introduced to model the case when the system is expected to operate indefinitely. This is a common scenario in practice, for example, in the case of temperature control in a building.
+
Similarly, the infinite time horizon can be used in the scenarios when the final time is not known and we leave it up to the controller to take as much time as it needs to reach the desired state. But even then we can still express our desire to reach the desired state as soon as possible by choosing the weights \mathbf Q and \mathbf R appropriately.
+
+
+
Steady-state solution to discrete-time Riccati equation
+
We have seen in the previous section that the solution to the LQR problem with a free final state and a finite time horizon is given by a time-varying state feedback control law \bm u_k = -\mathbf K_k \bm x_k. The sequence of gains \mathbf K_k for k=0,\ldots, N-1 is given by the sequence of matrices \mathbf S_k for k=0,\ldots, N, which in turn is given as the solution to the (discrete-time) Riccati equation initialized by the terminal penalty \mathbf S_N and solved backwards in time. But we have also seen that, at least in our example, provided the time interval was long enough, the sequences \mathbf K_k and \mathbf S_k both converged to some steady-state values as the time k proceeded backwards towards the beginning of the time interval.
+
While using these steady-state values instead of the full sequences leads to a suboptimal solution on a finite time horizon, it turns out that it actually gives the optimal solution on an infinite time horizon. Although our argument here may be viewed as rather hand-wavy, it is intuitive: there is no end to the time interval, hence the steady-state values are never given a chance to change “towards the end”, as we observed in the finite time horizon case.
+
+
+
+
+
+
+Note
+
+
+
+
Other approaches exist for solving the infinite time horizon LQR problem that do not make any reference to the finite time horizon problem, some of them are very elegant and concise, but here we intentionally stick to viewing it as the extension of the finite time horizon problem.
+
+
+
+
Notation
+
Before we proceed with the discussion of how to find the steady-state values (the limits) of \mathbf S_k and subsequently \mathbf K_k, we must discuss the notation first. When increasing the time horizon N, the solution to the Riccati equation settles towards the beginning of the time interval, and we can then pick the steady-state values right at the initial time k=0, that is, \mathbf S_0 and \mathbf K_0. But thanks to time invariance, we can also fix the final time to some (arbitrary) N and stretch the interval by moving its beginning toward -\infty. The limits of the sequences \mathbf S_k and \mathbf K_k can then be considered as k goes toward -\infty. It would seem appropriate to denote these limits \mathbf S_{-\infty} and \mathbf K_{-\infty}. The fact is, however, that the commonly accepted notation for the limits found in the literature is just \mathbf S_\infty and \mathbf K_\infty
+\mathbf S_\infty \triangleq \lim_{k\rightarrow -\infty} \mathbf S_k, \qquad \mathbf K_\infty \triangleq \lim_{k\rightarrow -\infty} \mathbf K_k.
+
+
+
+
How to compute the steady-state solution to Riccati equation?
+
Leaving aside for the moment the important question whether and under which conditions such a limit \mathbf S_\infty exists, the immediate question is how to compute such a limit. One straightforward strategy is to run the recurrent scheme (the Riccati equation) and generate the sequence \mathbf S_{N}, \mathbf S_{N-1}, \mathbf S_{N-2}, \ldots as long as there is a nonnegligible improvement; that is, once \mathbf S_{k}\approx\mathbf S_{k+1}, stop iterating. That is certainly doable.
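A minimal Julia sketch of this iterate-until-convergence strategy (our own, applied here to the scalar example used earlier) could look as follows.
+
+using LinearAlgebra
+function dare_by_iteration(A, B, Q, R; tol=1e-10, maxiter=100_000)
+    S = copy(Q)                              # any S_N ⪰ 0 will do as the initialization
+    for _ in 1:maxiter
+        K = (B'*S*B + R) \ (B'*S*A)
+        Snew = A'*S*(A - B*K) + Q
+        norm(Snew - S) < tol && return Snew, K
+        S = Snew
+    end
+    error("Riccati iteration did not converge")
+end
+
+a, b, q, r = 1.05, 0.01, 100.0, 1.0          # the numbers from the earlier example
+Sinf, Kinf = dare_by_iteration(fill(a,1,1), fill(b,1,1), fill(q,1,1), fill(r,1,1))
+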
+
There is, however, another idea. We apply the steady-state condition
+\mathbf S_{\infty} = \mathbf S_k=\mathbf S_{k+1}
+ to the Riccati equation. The resulting equation
+\mathbf S_{\infty}=\mathbf A^\text{T}\left[\mathbf S_{\infty}-\mathbf S_{\infty}\mathbf B(\mathbf B^\text{T}\mathbf S_{\infty}\mathbf B+\mathbf R)^{-1}\mathbf B^\text{T}\mathbf S_{\infty}\right]\mathbf A+\mathbf Q
+ is called discrete-time algebraic Riccati equation (DARE) and it is one of the most important equations in the field of computational control design.
+
The equation may look quite “messy” and offers hardly any insight. Remember the good advice to shrink the problem to scalar size while studying similar matrix-vector expressions and striving for some insight. Our DARE simplifies to
+s_\infty = a^2s_\infty - \frac{a^2b^2s_\infty^2}{b^2s_\infty+r} + q
+
+
Multiplying both sides by the denominator we get the equivalent quadratic (in s_\infty) equation
+b^2s_\infty^2 + (r - a^2r - b^2q)s_\infty - qr = 0.
+
+
Voilà! A scalar DARE is just a quadratic equation, for which the solutions can be found readily.
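For instance, with the numbers from the earlier example, a couple of Julia lines (our own quick check) give the root:
+
+a, b, q, r = 1.05, 0.01, 100.0, 1.0
+α, β, γ = b^2, r - a^2*r - b^2*q, -q*r       # b²s² + (r - a²r - b²q)s - qr = 0
+s_inf = (-β + sqrt(β^2 - 4α*γ)) / (2α)       # keep the nonnegative root (see below)
+@show s_inf                                  # ≈ 1.7e3 for these numbers
+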
+
There is a caveat here, though, reflected in the use of the plural “solutions” above. A quadratic equation can have two real solutions or none. But the sequence produced by the original recursive Riccati equation is determined uniquely! What is going on? How are the solutions of the ARE related to the limiting solution of the recursive Riccati equation?
+
Answering this question will keep us busy for most of this lecture. We will structure this broad question into several sub-questions
+
+
Under which conditions is it guaranteed that there exists a (bounded) limiting solution \mathbf S_\infty to the recursive Riccati equation for all initial (actually final) values \mathbf S_N?
+
Under which conditions is the limit solution unique for arbitrary \mathbf S_N?
+
Under which conditions is it guaranteed that the time-invariant feedback gain \mathbf K_\infty computed from \mathbf S_\infty stabilizes the system (on the infinite control interval)?
While in the previous chapter we formulated an optimal control problem (OCP) directly as a mathematical programming (general NLP or even QP) problem over the control (and possibly state) trajectories, in this chapter we introduce an alternative – indirect – approach. The essence of the approach is that we formulate first-order necessary conditions of optimality for the OCP in the form of equations, and then solve these. Although less straightforward to extend with additional constraints than the direct approach, the indirect approach also exhibits some advantages. In particular, in some cases (such as a quadratic cost and a linear system) it yields a feedback controller and not just a control trajetory.
+
+
+
Optimization constraints given only by the state equations
+
As in the chapter on the direct approach, here we also start by considering a general nonlinear and possibly time-varying discrete-time dynamical system characterized by the state vector \bm x_k\in\mathbb R^n whose evolution in discrete time k is uniquely determined by the state equation
+\bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),
+ accompanied by the initial state (vector) \bm x_i\in\mathbb R^n and a sequence of control inputs \bm u_i, \bm u_{i+1}, \ldots, \bm u_{k-1}, where the control variable can also be a vector, that is, \bm u_k \in \mathbb R^m.
+
These state equations will constitute the only constraints of the optimization problem. Unlike in the direct approach, here in our introductory treatment we do not impose any inequality constraints such as bounds on the control inputs, because the theory to be presented is not able to handle them.
+
+
+
General additive cost function
+
For the above described dynamical system we want to find a control sequence \bm u_k that minimizes a suitable optimization criterion over a finite horizon k\in[i,N]. Namely, we will look for a control that minimizes a criterion of the following kind
+J_i^N(\underbrace{\bm x_{i+1}, \bm x_{i+2}, \ldots, \bm x_{N}}_{\bar{\bm x}}, \underbrace{\bm u_{i}, \ldots, \bm u_{N-1}}_{\bar{\bm u}};\bm x_i) = \phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k).
+\tag{1}
+
+
+
+
+
+
+Note
+
+
+
+
Regarding the notation J_i^N(\cdot) for the cost, if the initial and final times are understood from the context, they do not have to be displayed. But we will soon need to indicate the initial time explicitly in our derivations.
+
+
+
The property of the presented cost function that will turn out crucial in our subsequent work is that it is additive over the time horizon. Although this restricts the class of cost functions a bit, it is still general enough to encompass a wide range of problems, such as minimizing the total (financial) cost to be paid, the total energy to be used, the total distance to be travelled, or the cumulative error to be minimized.
+
Here is a list of a few popular cost functions.
+
+
Minimum-time (or time-optimal) problem
+
+Setting \phi=0 and L_k=1 gives J=N-i, that is, the length of the time horizon, the duration of control. Although in this course we do not introduce concepts and tools for optimization over integer variables, in this simple case of just a single integer variable even a simple search over the length of the control interval will be computationally tractable. Furthermore, as we will see in one of the next chapters once we switch from discrete-time to continuous-time systems, this time-optimal control design problem will turn out tractable using the tools presented in this course.
+
+
Minimum-fuel problem
+
+Setting \phi=0 and L_k=|u_k|, which gives J=\sum_{k=i}^{N-1}|u_k|.
+
+
Minimum-energy problem
+
+Setting \phi=0 and L_k=\frac{1}{2} u_k^2, which gives J=\frac{1}{2} \sum_{k=i}^{N-1} u_k^2. It is fair to admit that this sum of squared inputs cannot always be interpreted as energy – for instance, what if the control input is the degree of opening of a valve? A sum of squared angles over time can hardly be interpreted as energy. Instead, it should be interpreted in the mathematical way as a (squared) norm, that is, a “size” of the input. Note that the same objection can be raised against the previous case of a minimum-fuel problem.
+
+
Mixed quadratic problem (also LQ-optimal control problem)
+
+Setting \phi=\frac{1}{2}s_N x_N^2 and L_k=\frac{1}{2} (qx_k^2+ru_k^2),\, q\geq 0,\, r> 0, which gives J=\frac{1}{2}s_Nx_N^2+\frac{1}{2} \sum_{k=i}^{N-1} (q x_k^2+r u_k^2). Or in the case of vector state and control variables J=\frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N+\frac{1}{2} \sum_{k=i}^{N-1} (\bm x_k^\top \mathbf Q \bm x_k + \bm u_k^\top \mathbf R \bm u_k), \, \mathbf S_N, \mathbf Q \succeq 0,\, \mathbf R \succ 0. This type of an optimization cost is particularly popular, both for mathematical reasons (by now we all appreciate the nice properties of quadratic functions) and for practical engineering reasons, as it allows us to capture a trade-off between the control performance (penalty on \bm x_k) and the control effort (penalty on \bm u_k). Note also that the state at the terminal time N is penalized separately just in order to allow another trade-off between the transient and terminal behavior. The cost function can also be modified to penalize the deviation of the state from some nonzero desired (aka reference) state trajectory, that is, J=\frac{1}{2}(\bm x_N - \bm x_N^\text{ref})^\top \mathbf S_N (\bm x_N - \bm x_N^\text{ref}) +\frac{1}{2} \sum_{k=i}^{N-1} \left((\bm x_k - \bm x_k^\text{ref})^\top \mathbf Q (\bm x_k - \bm x_k^\text{ref}) + \bm u_k^\top \mathbf R \bm u_k\right).
+
+
+
Note that in none of these cost functions did we include \bm u_{N} as an optimization variable, as it has no influence on the cost over the interval [i,N].
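For concreteness, here is a small Julia helper (our own, not from the text, and assuming the initial time i=0) that evaluates the vector LQ cost above for given state and control trajectories stored column-wise; consistently with the note above, only \bm u_0,\ldots,\bm u_{N-1} enter.
+
+using LinearAlgebra
+function lq_cost(X, U, Q, R, SN)
+    N = size(U, 2)                           # columns of U are u_0, …, u_{N-1}
+    0.5*X[:, N+1]'*SN*X[:, N+1] +
+        0.5*sum(X[:, k]'*Q*X[:, k] + U[:, k]'*R*U[:, k] for k in 1:N)
+end
+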
+
It is perhaps needless to emphasize that while in some other applications maximizing may seem more appropriate (such as maximizing the yield, bandwidth, or robustness), we can always reformulate the maximization into minimization. Therefore in our course we always formulate the optimal control problems as minimization problems.
+
+
+
Derivation of the first-order necessary conditions of optimality
+
Having formulated a finite-dimensional constrained nonlinear optimization problem, we avoid the temptation to call an NLP solver to solve it numerically and proceed instead with our own analysis of the problem. Let’s see how far we can get.
+
By introducing Lagrange multipliers {\color{blue}\bm\lambda_k} we turn the constrained problem into an unconstrained one. The new cost function (we use the prime to distinguish it from the original cost) is
+\begin{aligned}
+& {J'}_i^N(\bm x_i, \ldots, \bm x_N, \bm u_i, \ldots, \bm u_{N-1},{\color{blue}\bm \lambda_i, \ldots, \bm \lambda_{N-1}}) \\
+&\qquad\qquad\qquad = \phi(\bm x_N,N) + \sum_{k=i}^{N-1}\left[L_{k}(\bm x_k,\bm u_k)+\bm {\color{blue}\lambda^\top_{k}}\;\left[\mathbf f_k(\bm x_k,\bm u_k)-\bm x_{k+1}\right]\right].
+\end{aligned}
+
+
From now on, in principle, we do not need any guidance here, do we? We are given an unconstrained optimization problem and its solution is just a few steps away. In particular, stationary point(s) must be found (and then we are going to argue if these qualify as minimizers or not). This calls for differentiating the above expression with respect to all the variables and setting these derivatives equal to zeros.
+
Although the principles are clear, some advice might be shared here if compact formulas are to be found. The first such advice is to rename the variable(s) {\color{blue}\boldsymbol \lambda_k} to {\color{red}\boldsymbol \lambda_{k+1}}
+\begin{aligned}
+& {J'}_i^N(\bm x_i, \ldots, \bm x_N, \bm u_i, \ldots, \bm u_{N-1},{\color{red}\bm \lambda_{i+1}, \ldots, \bm \lambda_{N}}) \\
+& \qquad\qquad\qquad = \phi(N,\bm x_N) + \sum_{k=i}^{N-1}\left[L_{k}(\bm x_k,\bm u_k)+\boldsymbol {\color{red}\boldsymbol \lambda^\top_{k+1}}\; \left[\mathbf f_k(\bm x_k,\bm u_k)-\mathbf x_{k+1}\right]\right].
+\end{aligned}
+
+
This is really just a notational decision but thanks to it our resulting formulas will enjoy some symmetry.
+
+
+
+
+
+
+Note
+
+
+
+
Maybe it would be more didactic to let you go on without this advice and only then nudge you to figure out this remedy on your own. But admittedly this is not the kind of competence that we aim at in this course. Let’s spend time with more rewarding things.
+
+
+
Another notational advice – but this one is more systematic and fundamental — is to make the above expression a bit shorter by introducing a new variable defined as \boxed{H_k(\bm x_k,\bm u_k,\boldsymbol\lambda_{k+1}) = L_{k}(\bm x_k,\bm u_k)+\boldsymbol \lambda_{k+1}^\top \; \mathbf f_k(\bm x_k,\bm u_k).}
+
+
We will call this new function Hamiltonian. Indeed, the choice of this name is motivated by the analogy with the equally named concept used in physics and theoretical mechanics, but we will only make more references to this analogy later in the course once we transition to continuous-time systems modelled by differential equations.
+
Introducing the Hamiltonian reformulates the cost function (and we omit the explicit dependence on all its input arguments) as
+{J'}_i^N = \phi(N,\bm x_N) + \sum_{k=i}^{N-1}\left[H_{k}(\bm x_k,\bm u_k,\boldsymbol\lambda_{k+1})-\boldsymbol\lambda^\top_{k+1}\;\mathbf x_{k+1}\right].
+
+
The final polishing of the expression before starting to compute the derivatives consists in bringing together the terms that contain related variables: the state \bm x_N at the final time, the state \bm x_i at the initial time, and the states, controls and Lagrange multipliers in the transient period
+
+{J'}_i^N = \underbrace{\phi(N,\bm x_N) -\boldsymbol\lambda^\top_{N}\;\mathbf x_{N}}_\text{at terminal time} + \underbrace{H_i(\bm x_i,\mathbf u_i,\boldsymbol\lambda_{i+1})}_\text{at initial time} + \sum_{k=i+1}^{N-1}\left[H_{k}(\bm x_k,\bm u_k,\boldsymbol\lambda_{k+1})-\boldsymbol\lambda^\top_{k}\;\mathbf x_{k}\right].
+
+
Although this step was not necessary, it will make things a bit more convenient once we start looking for the derivatives. And the time for it has just come.
+
Recall now the recommended procedure for finding derivatives of functions of vectors – find the differential instead and identify the derivative in the result. The gradient is then (by convention) obtained as the transpose of the derivative. Following this derivative-identification procedure, we anticipate the differential of the augmented cost function in the following form
+\begin{split}
+\text{d}{J'}_i^N &= (\qquad)^\top \; \text{d}\bm x_N + (\qquad)^\top \; \text{d}\bm x_i \\&+ \sum_{k=i+1}^{N-1}(\qquad)^\top \; \text{d}\bm x_k + \sum_{k=i}^{N-1}(\qquad)^\top \; \text{d}\bm u_k + \sum_{k=i+1}^{N}(\qquad)^\top \; \text{d}\boldsymbol \lambda_k.
+\end{split}
+
+
Identifying the gradients amounts to filling in the empty brackets. It is straightforward if tedious (in particular, the lower and upper bounds on the summation indices must be carefully checked). The solution is
+\begin{split}
+\text{d}{J'}_i^N &= \left(\nabla_{\bm x_N}\phi-\lambda_N\right)^\top \; \text{d}\bm x_N + \left(\nabla_{\bm x_i}H_i\right)^\top \; \text{d}\bm x_i \\&+ \sum_{k=i+1}^{N-1}\left(\nabla_{\bm x_k}H_k-\boldsymbol\lambda_k\right)^\top \; \text{d}\bm x_k + \sum_{k=i}^{N-1}\left(\nabla_{\bm u_k}H_k\right)^\top \; \text{d}\bm u_k + \sum_{k=i+1}^{N}\left(\nabla_{\boldsymbol \lambda_k}H_{k-1}-\bm x_k\right)^\top \; \text{d}\boldsymbol \lambda_k.
+\end{split}
+
+
The ultimate goal of this derivation was to find stationary points of the augmented cost function, that is, to find conditions under which \text{d}{J'}_i^N=0. In typical optimization problems, the optimization is conducted with respect to all the participating variables, which means that the corresponding differentials may be arbitrary and the only way to guarantee that the total differential of J_i' is zero is to make the associated gradients (the contents of the brackets) equal to zero. There are two exceptions to this rule in our case, though:
+
+
The state at the initial time is typically fixed and not available for optimization. Then \text{d}\bm x_i=0 and the corresponding necessary condition is replaced by the statement that \bm x_i is equal to some particular value, say, \bm x_i = \mathbf x^\text{init}. We have already discussed this before. In fact, in these situations we might even prefer to reflect it by the notation J_i^N(\ldots;\bm x_i), which emphasizes that \bm x_i is a parameter and not a variable. But in the solution below we do allow for the possibility that \bm x_i is a variable too (hence \text{d}\bm x_i\neq 0) for completeness.
+
The state at the final time may also be given/fixed, in which case the corresponding condition is replaced by the statement that \bm x_N is equal to some particular value, say, \bm x_N = \mathbf x^\text{ref}. But if it is not the case, then the final state is also subject to optimization and the corresponding necessary condition of optimality is obtained by setting the content of the corresponding brackets to zero.
+
+
+
+
Necessary conditions of optimality as two-point boundary value problem (TP-BVP)
+
The ultimate form of the first-order necessary conditions of optimality, which incorporates the special cases discussed above, is given by these equations
+\boxed{
+\begin{aligned}
+\mathbf x_{k+1} &= \nabla_{\boldsymbol\lambda_{k+1}}H_k, \;\;\; \color{gray}{k=i,\ldots, N-1},\\
+\boldsymbol\lambda_k &= \nabla_{\bm x_k}H_k, \;\;\; \color{gray}{k=i+1,\ldots, N-1}\\
+0 &= \nabla_{\bm u_k}H_k, \;\;\; \color{gray}{k=i,\ldots, N-1}\\
+\color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_N}\phi-\lambda_N\right)^\top \mathrm{d}\bm x_N},\\
+\color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_i}H_i\right)^\top \mathrm{d}\bm x_i},
+\end{aligned}
+}
+ or more explicitly
+\boxed{
+\begin{aligned}
+\mathbf x_{k+1} &= \mathbf f_k(\bm x_k,\bm u_k), \;\;\; \color{gray}{k=i,\ldots, N-1},\\
+\boldsymbol\lambda_k &= \nabla_{\bm x_k}\mathbf f_k\;\; \boldsymbol\lambda_{k+1}+\nabla_{\bm x_k}L_k, \;\;\; \color{gray}{k=i+1,\ldots, N-1}\\
+0 &= \nabla_{\bm u_k}\mathbf f_k\;\; \boldsymbol\lambda_{k+1}+\nabla_{\bm u_k}L_k, \;\;\; \color{gray}{k=i,\ldots, N-1}\\
+\color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_N}\phi-\lambda_N\right)^\top \mathrm{d}\bm x_N},\\
+\color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_i}H_i\right)^\top \mathrm{d}\bm x_i}.
+\end{aligned}
+}
+
+
Recall that since \mathbf f is a vector function, \nabla \mathbf f is not just a gradient but rather a matrix whose columns are gradients of the individual components of the vector \mathbf f, that is, the transpose of the Jacobian.
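To make these conditions concrete, here is a minimal Julia sketch (our own toy system, using the ForwardDiff package for the gradients) that evaluates the co-state and stationarity conditions for some candidate values.
+
+using ForwardDiff, LinearAlgebra
+fk(x, u) = [x[1] + 0.1*x[2], x[2] + 0.1*(u[1] - sin(x[1]))]   # assumed dynamics f_k
+Lk(x, u) = 0.5*(x'*x + 0.1*u'*u)                              # assumed stage cost L_k
+H(x, u, λ) = Lk(x, u) + λ'*fk(x, u)                           # the Hamiltonian H_k
+
+x, u, λnext = [1.0, 0.0], [0.2], [0.5, -0.1]                  # some candidate values
+λ = ForwardDiff.gradient(x -> H(x, u, λnext), x)              # co-state equation: λ_k = ∇_{x_k} H_k
+stat = ForwardDiff.gradient(u -> H(x, u, λnext), u)           # stationarity: ≈ 0 at an optimum
+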
+
+
+
+
+
+
+Note
+
+
+
+
The first three necessary conditions above can be made completely “symmetric” by running the second one from k=i because the \boldsymbol\lambda_i introduced this way does not influence the rest of the problem and we could easily live with one useless variable.
+
+
+
We have just derived the (necessary) conditions of optimality in the form of five sets of (vector) equations:
+
+
The first two are recursive (or recurrent or also just discrete-time) equations, which means that they introduce coupling between the variables evaluated at consecutive times. In fact, the former is just the standard state equation that gives the state at one time as a function of the state (and the control) at the previous time. The latter gives a prescription for the variable \bm \lambda_k as a function of (among others) the same variable evaluated at the next (!) time, that is, \bm \lambda_{k+1}. Although from the optimization perspective these variables play the role of Lagrange multipliers, we call them co-state variables in optimal control theory because of the way they relate to the state equations. The corresponding vector equation is called a co-state equation.
+
+
+
+
+
+
+
+Important
+
+
+
+
It is a crucial property of the co-state equation that it dictates the evolution of the co-state variable backward in time.
+
+
+
+
The third set of equations are just algebraic equations that relate the control inputs to the state and co-state variables. Sometimes it is called a stationarity equation.
+
The last two are just single (vector) equations related to the end and the beginning of the time horizon. They are both stated in a general enough form that allows the corresponding states to be treated as either fixed or subject to optimization. In particular, if the final state is to be treated as free (subject to optimization), then \mathrm{d}\bm x_N can be arbitrary and the only way the corresponding equation can be satisfied is \nabla_{\bm x_N}\phi=\boldsymbol\lambda_N. If, on the other hand, the final state is to be treated as fixed, then the corresponding equation is just replaced by \bm x_N = \mathbf x^\text{ref}. Similarly for the initial state. But as we have hinted a few times, more often than not the initial state will be regarded as fixed and not subject to optimization, in which case the corresponding equation is replaced by \bm x_i = \mathbf x^\text{init}.
+
+
To summarize, the equations that give the necessary conditions of optimality for a general nonlinear discrete-time optimal control problem form a two-point boundary value problem (TP-BVP). Values of some variables are specified at the initial time, values of some (maybe the same or some other) variables are defined at the final time. The equations prescribe the evolution of some variables forward in time while for some other variables the evolution backward in time is dictated.
+
+
+
+
+
+
+Note
+
+
+
+
This is in contrast with the initial value problem (IVP) for state equations, for which we only specify the state at one end of the time horizon (the initial state), and then the state equation dictates the evolution of the (state) variable forward in time.
+
+
+
Boundary value problems are notoriously difficult to solve. Typically we can only solve them numerically, in which case it is appropriate to ask if anything has been gained by this indirect procedure compared with the direct one. After all, we did not even incorporate the inequality constraints in the problem, which was a piece of cake in the direct approach. But we will see that in some special cases the TP-BVP can be solved analytically, and the outcome is particularly useful and would never have been discovered if only the direct approach had been followed. We elaborate on this in the next section.
While the indirect approaches to optimal control constitute the classical core of the optimal control theory, most treatments of the subject consider continuous-time systems. Our treatment was based on Chapter 2 in (Lewis, Vrabie, and Syrmos 2012), which is one of a few resources that discuss discrete-time optimal control too.
Dynamic programming and discrete-time optimal control
+
In the previous two chapters we explained direct and indirect approaches to discrete-time optimal control. While the former conveniently allows incorporating almost arbitrary constraints, it only provides a control trajectory (a finite sequence of values of the control variable); if feedback is needed, the optimization must be performed in every sampling period (thus implementing the concept of receding horizon or model predictive control, MPC). The latter, in contrast, can lead to a (state) feedback control law, but this only happens in special cases such as regulation of a linear system minimizing a quadratic cost (LQR) while assuming no bound constraints on the control or state variables; in the general case it leads to a two-point boundary value problem, which can only be solved numerically for trajectories.
+
In this chapter we present yet another approach: dynamic programming (DP). It also allows imposing constraints (in fact, even constraints such as integrality of variables, which are not compatible with our derivative-based optimization toolset exploited so far), and yet it directly leads to feedback controllers.
+
+
While in the case of linear systems with a quadratic cost function dynamic programming provides another route to the theoretical results that we already know (the Riccati-equation-based solution to the LQR problem), in the case of general nonlinear dynamical systems with general cost functions the feedback controllers come in the form of look-up tables. This format of a feedback controller gives some hint about the disadvantages of DP: both the computation and the subsequent use of these look-up tables scale poorly with the dimension of the state space (aka the curse of dimensionality). Various approximation schemes exist; one promising branch is known as reinforcement learning.
+
+
Bellman’s principle of optimality and dynamic programming
+
We start by considering the following example.
+
+
Example 1 (Reusing the plan for a trip from Prague to Ostrava) We are planning a car trip from Prague to Ostrava and we are searching for a route that minimizes the total time. Using the online planner we learn that the fastest route from Prague to Ostrava is, as bizarre as it sounds, via (actually around) Brno.
+
+
Now, is it possible to reuse this plan for our friends from Brno who are also heading for Ostrava?
+
+
The answer is yes, as the planner confirms. Surely we did not even need the planner to answer such a trivial question. And yet it demonstrates the key wisdom of the whole chapter, Bellman’s principle of optimality, which we now state formally.
+
+
+
Theorem 1 (Bellman’s principle of optimality) An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
+
+
We now investigate this idea a bit more quantitatively using a simple computational example of finding a shortest path in a graph.
+
+
Example 2 (Shortest path in a graph) We consider a directed graph with nodes A through I and edges with the prescribed lengths as in the figure below.
+
+
+
+
+
+
+
+
The task is now to find the shortest path from A to I. What are possible solution strategies? We can start enumerating all the possible paths and calculate their costs (by summing the costs of the participating edges). Needless to say, this strategy based on enumeration scales very badly with the growing number of nodes.
+
Alternatively, we solve the problem using dynamic programming, relying on Bellman’s principle of optimality. Before we proceed, we need to define the concept of a stage. It is perhaps less common and natural when it comes to solving graph problems, but we introduce it in anticipation of discrete-time optimal control problems. By the kth stage we understand the node at which the kth decision needs to be made. In our case, starting at A, 4 decisions need to be made to reach the final node. But let’s agree that we also denote the final node as a stage, the 5th one, even if no decision is to be made there. The total number of stages is then N=5.
+
The crucial attribute of the strategy based on dynamic programming is that we proceed backwards. We start at the very final stage. At this stage there is just one node and there is nothing we can do, but note that it also makes sense to formulate problems with several possible nodes at the final stage, each with a different (terminal) cost; we will actually use this once we switch to the optimal control setting. Now we proceed backwards to the last but one, that is, the (N-1)th stage.
+
These are the F and H nodes at this 4th stage. In these two nodes there is again no freedom as for the actions, but for each of them we can record their respective cost to go: 4 for the F node and 2 for the H node. These costs reflect how costly it is to reach the terminal node from them.
+
Things are only getting interesting if we now proceed to the 3rd stage. We now have to consider three possible nodes: C, E and G. For the C and G nodes there is still just one action and we can only record their costs to go. The cost for the C node can be computed as the cost for the immediate transition from C to F plus the cost for the F node, which we recorded previously, that is, 3+4=7. We record the value of 7 with the C node. Similarly for the G node. For the E node there are two possible actions — two possible decisions to be made, two possible paths to choose from. Either to the left (or, actually, up in our orientation of the graph), which would bring us to the node F, or to the right (or down), which would bring us to the node H. We compute the costs to go for both decisions and choose the decision with a smaller cost. Here the cost of the decision to go to the left is composed of the cost of the transition to F plus the cost to go from F, that is, 3+4=7. The cost to go for the decision to go right is composed of the transition cost from E to H plus the cost to go from H, that is, 2+2=4. Obviously, the optimal decision is to go right, that is, to the node H. Here, on top of the value of the optimal (smallest) cost to go from the node we also record the optimal decision (go to the right/down). We do it by coloring the edge in blue.
+
Note that in principle we should have highlighted the edges from F to I, from C to F, and from G to H. It was unnecessary here since there were the only possible edges emanating from these nodes.
+
We proceed backwards to the 2nd stage, and we compute the costs to go for the nodes B and D. Again we record their optimal values and the actual optimal decisions.
+
One last shift backwards and we are at the initial node A, for which we can do the same computation of the costs to go. Note that here coincidentally both decisions have the same cost to go, hence both possible decisions/actions are optimal and we can just toss a coin.
+
+
+
+
+
+
+
+
Maybe it is not immediately clear from the graph, but when viewed as an itinerary for a trip, it provides a feedback controller. Even if for whatever reason we find ourselves off the optimal path, we can always have a look at the graph: it will guide us along the path that is optimal from that given node. For example, if we happen to be in node C, we do have a plan. Well, here it is misleadingly simple as there is no decision to be made, but you get the point.
+
+
+
+
Bellman’s principle of optimality applied to the discrete-time optimal control problem
+
Let’s recapitulate here the problem of optimal control for a discrete-time system. In particular, we consider the system modelled by
+\bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),
+ defined on the discrete time interval [i,N], with the initial state \bm x_i fixed (\bm x_i = \mathbf x_i). We aim at minimizing the cost function
+J_i^N\left(\bm x_i, \bm u_i, \bm u_{i+1}, \ldots, \bm u_{N-1}\right) = \phi(\bm x_N,N) + \sum_{k=i}^{N-1}L_k(\bm x_k,\bm u_k).
+
+
Before we proceed, some comments on the notation are in order. Indeed, a well tuned and systematically used notation is instrumental in dynamic programming.
+
+
+
+
+
+
+We omit the final time from the notation for the cost function
+
+
+
+
While the cost function does depend on the final time too, in most if not all our analyses we assume that it is fixed and understood from the context. Hence we will not explicitly indicate the dependence on the final time. We will write just J_i(\ldots). This may help reduce the notational clutter as we are going to need the upper index for something else soon.
+
+
+
+
+
+
+
+
+We omit the state trajectory from the notation for the cost function and leave just the initial state
+
+
+
+
The cost function is clearly a function of the full sequence \bm x_i, \bm x_{i+1},\ldots, \bm x_N of the state vectors too. In the previous chapters we handled it systematically (either by considering them as optimization variables in the simultaneous direct approach or by introducing Lagrange multipliers in the indirect approach). But here we want to emphasize the fact that starting with \bm x_{i+1}, the whole state trajectory is uniquely determined by the initial state \bm x_i and the corresponding control trajectory \bm u_i, \bm u_{i+1},\ldots, \bm u_{N-1}. Therefore, we write the cost function as a function of the initial state, the initial time (we already agreed above not to emphasize the final time), and the sequence of controls.
+
+
+
+
+
+
+
+
+We use the lower index to display dependence on time
+
+
+
+
The dependence on the discrete time is reflected by the lower indices: not only in \bm x_k and \bm u_k but also in \mathbf f_k(), L_k() and J_k(). We could perhaps write these as \mathbf f(\cdot,\cdot,k), L(\cdot,\cdot,k) and J(\cdot,\cdot,k) to better indicate that k is really an argument for these functions, but we prefer making it compatible with the way we indicate the time dependence of \bm x_k and \bm u_k.
+
+
+
Having introduced the cost function parameterized by the initial state, the initial time, and the full sequence of controls, we now introduce the optimal cost function
+J_i^\star(\bm x_i) = \min_{\bm u_i, \bm u_{i+1}, \ldots, \bm u_{N-1}} J_i\left(\bm x_i, \bm u_i, \bm u_{i+1}, \ldots, \bm u_{N-1}\right).
+\tag{1}
+
The sequence of controls in the above minimization may be subject to some constraints, but we do not indicate them here for the sake of notational simplicity.
+
+
+
+
+
+
+Difference between the J_i and J^\star_i functions
+
+
+
+
Understanding the difference is crucial. While the cost function J_i depends on the (initial) state, the (initial) time and the sequence of controls applied over the whole interval, the optimal cost function J^\star_i only depends on the (initial) state and the (initial) time.
+
+
+
Assume now that we can find an optimal control sequence from any given state \bm x_{k+1} at time k+1 on, i.e., we can find \bm u_{k+1}^\star,\bm u_{k+2}^\star,\ldots, \bm u_{N-1}^\star yielding the optimal cost J_{k+1}^\star(\bm x_{k+1}). We will soon show how to actually find it, but for the time being we just assume we can have it. We now show how it can be used to find the optimal cost J_k^\star(\bm x_k) at state \bm x_k and time k.
+
Let’s now consider the following strategy: with the system at state \bm x_k and time k we apply some control \bm u_k, not necessarily an optimal one, which brings the system to the state \bm x_{k+1} in the next time k+1. But from then on we use the control sequence \bm u_{k+1}^\star,\bm u_{k+2}^\star,\ldots, \bm u_{N-1}^\star that is optimal from \bm x_{k+1}. The corresponding cost is
+L_k(\bm x_k,\bm u_k) + J_{k+1}^\star(\bm x_{k+1}).
+\tag{2}
+
Bellman’s principle of optimality states that if we optimize the above expression over \bm u_k, we get the optimal cost J_k^\star(\bm x_k) at time k
+\boxed{J_k^\star(\bm x_k) = \min_{\bm u_k}\left(L_k(\bm x_k,\bm u_k) + J_{k+1}^\star(\bm x_{k+1})\right).}
+\tag{3}
+
Hence, at a given state \bm x_{k} and time k, the optimization is performed over only one (possibly vector) control \bm u_k and not the whole trajectory as the definition of the optimal cost in Equation 1 suggests! What a simplification!
+
+
+
+
+
+
+Important
+
+
+
+
The minimization needs to be performed over the whole sum L_k(\bm x_k,\bm u_k) + J_{k+1}^\star(\bm x_{k+1}), because \bm x_{k+1} is a function of \bm u_k (recall that \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k)). We can also write Equation 3 as
+J_k^\star(\bm x_k) = \min_{\bm u_k}\left(L_k(\bm x_k,\bm u_k) + J_{k+1}^\star(\mathbf f_k(\bm x_k,\bm u_k))\right),
+ which makes it more apparent.
+
+
+
Once we have the optimal cost function J^\star_{k}, the optimal control \bm u_k^\star(\bm x_k) at a given time k and state \bm x_k is obtained by
+\boxed{
+ \bm u_k^\star(\bm x_k) = \arg \min_{\bm u_k}\left(L_k(\bm x_k,\bm u_k) + J_{k+1}^\star(\mathbf f_k(\bm x_k,\bm u_k))\right).}
+
+
+
Alternative formulation of dynamic programming using Q-factors
+
The cost function in Equation 2 is sometimes called the Q-factor. When the optimal cost J^\star_{k+1} is used for the tail, we obtain the optimal Q-factor Q^\star_k(\bm x_k,\bm u_k), whose definition we write here for convenience
+Q^\star_k(\bm x_k,\bm u_k) = L_k(\bm x_k,\bm u_k) + J_{k+1}^\star(\bm x_{k+1}).
+
+
The optimal cost function J_k^\star(\bm x_k) can be recovered from the optimal Q-factor Q_k^\star(\bm x_k,\bm u_k) by taking the minimum over \bm u_k
+J_k^\star(\bm x_k) = \min_{\bm u_k} Q_k^\star(\bm x_k,\bm u_k).
+
+
Bellman’s principle of optimality can be then expressed using the optimal Q-factor as
+\boxed{Q_k^\star(\bm x_k,\bm u_k) = L_k(\bm x_k,\bm u_k) + \min_{\bm u_{k+1}} Q_{k+1}^\star(\bm x_{k+1},\bm u_{k+1})}.
+
+
Optimal control is then obtained from the optimal Q-factor as the minimizing control
+\boxed{\bm u_k^\star(\bm x_k) = \arg \min_{\bm u_k} Q_k^\star(\bm x_k,\bm u_k).}
+
In the previous section we have used dynamic programming as a numerical algorithm for solving a general discrete-time optimal control problem. We now show how to use dynamic programming to solve the discrete-time LQR problem. We consider a linear discrete-time system modelled by
+\bm x_{k+1} = \mathbf A\bm x_k + \mathbf B\bm u_k,
+ for which we want to minimize the quadratic cost given by
+J_0(\bm x_0, \bm u_0, \bm u_1, \ldots, \bm u_{N-1}) = \frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N + \frac{1}{2}\sum_{k=0}^{N-1}\left(\bm x_k^\top \mathbf Q \bm x_k + \bm u_k^\top \mathbf R \bm u_k\right),
+ with \mathbf S_N\succeq 0, \mathbf Q\succeq 0, \mathbf R\succ 0, as usual.
+
We now invoke the principle of optimality, that is, we start at the end of the time interval and evaluate the optimal cost
+J_N^\star(\bm x_N) = \frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N.
+
+
Obviously, here we did not even have to do any optimization since at the end of the interval the cost can no longer be influenced by any control. We just evaluated the cost.
+
We then proceed backwards in time, that is, we decrease the time to k=N-1. Here we do have to optimize:
+J^\star_{N-1}(\bm x_{N-1}) = \min_{\bm u_{N-1}\in\mathbb R^m} J_{N-1}(\bm x_{N-1},\bm u_{N-1}) = \min_{\bm u_{N-1}\in\mathbb R^m} \left[L(\bm x_{N-1},\bm u_{N-1}) + J^\star_{N}(\bm x_{N}) \right].
+
We assumed no constraint on \bm u_{N-1}, hence finding the minimum of J_{N-1} is as easy as setting its gradient to zero
+\mathbf 0 = \nabla_{\bm u_{N-1}} J_{N-1} = (\mathbf R + \mathbf B^\top \mathbf S_N \mathbf B)\bm u_{N-1} + \mathbf B^\top \mathbf S_N\mathbf A\bm x_{N-1},
+ which leads to
+\bm u_{N-1}^\star = -\underbrace{(\mathbf B^\top \mathbf S_N\mathbf B + \mathbf R)^{-1}\mathbf B^\top \mathbf S_N \mathbf A}_{\mathbf K_{N-1}} \bm x_{N-1},
+ which amounts to solving a system of linear equations. We can also recognize the Kalman gain matrix \mathbf K_{N-1}, which we derived using the indirect approach in the previous chapter.
+
The optimal cost J^\star_{N-1} can be obtained by substituting \bm u_{N-1}^\star into J_{N-1}
+J_{N-1}^\star = \frac{1}{2}\bm x_{N-1}^\top \underbrace{\left[(\mathbf A-\mathbf B\mathbf K_{N-1})^\top \mathbf S_N(\mathbf A-\mathbf B\mathbf K_{N-1}) + \mathbf K_{N-1}^\top \mathbf R \mathbf K_{N-1} + \mathbf Q\right]}_{\mathbf S_{N-1}} \bm x_{N-1}.
+
+
Note that the optimal cost J^\star_{N-1} is also a quadratic function of the state as is the cost J^\star_{N}. We denote the matrix that defines this quadratic function as \mathbf S_{N-1}. We do this in anticipation of continuation of this recursive procedure to k = N-2, N-3, \ldots, which will give \mathbf S_{N-2}, \mathbf S_{N-3}, \ldots. The rest of the story is quite predictable, isn’t it? Applying Bellman’s principle of optimality we (re)discovered the discrete-time Riccati equation in the Joseph stabilized form
+\mathbf S_k = (\mathbf A-\mathbf B\mathbf K_{k})^\top \mathbf S_{k+1}(\mathbf A-\mathbf B\mathbf K_{k}) + \mathbf K_{k}^\top \mathbf R \mathbf K_{k} + \mathbf Q,
+ together with the prescription for the state feedback (Kalman) gain
+\mathbf K_{k} = (\mathbf B^\top \mathbf S_{k+1}\mathbf B + \mathbf R)^{-1}\mathbf B^\top \mathbf S_{k+1} \mathbf A.
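To make the recursion concrete, here is a minimal Julia sketch of the backward sweep (our own illustration, not part of the original derivation; the matrices A, B, Q, R, the terminal weight SN and the horizon N are assumed to be supplied by the user, and the function name lqr_backward_sweep is ours):

function lqr_backward_sweep(A, B, Q, R, SN, N)
    S = Vector{Matrix{Float64}}(undef, N+1)   # S[k+1] holds the matrix S_k from the text
    K = Vector{Matrix{Float64}}(undef, N)     # K[k+1] holds the gain K_k from the text
    S[N+1] = SN                               # terminal condition S_N
    for k = N-1:-1:0
        Sn = S[k+2]                           # this is S_{k+1}
        Kk = (B'*Sn*B + R) \ (B'*Sn*A)        # Kalman gain K_k
        S[k+1] = (A - B*Kk)'*Sn*(A - B*Kk) + Kk'*R*Kk + Q   # Joseph stabilized recursion
        K[k+1] = Kk
    end
    return S, K
end

With the 1-based indexing used in the sketch, the optimal control at time k is then u_k = -K[k+1]*x_k.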
+
Dynamic programming (DP) is a fairly powerful and yet general framework that finds its use in many disciplines. Optimal control is not the only one. But in this overview of the literature we deliberately focus on the DP references with optimal control flavour.
+
Our introductory treatment was based almost exclusively on the (also just introductory) Chapter 6 in (Lewis, Vrabie, and Syrmos 2012). An electronic version of the book is freely available on the author’s webpage.
+
Comparable introduction is provided in (Kirk 2004). Although it does not appear to be legally available for free in an electronic form, its reprint by a low-cost publisher makes it an affordable (and recommendable) classic reference. Another classic (Anderson and Moore 2007) actually uses dynamic programming as the key technique to derive all those LQ-optimal regulation and tracking results. A few copies of this book are available in the faculty library at NTK. The authors also made an electronic version available for free on their website.
+
Fairly comprehensive treatment of control-oriented DP is in the two-volume monograph (Bertsekas 2017) and (Bertsekas 2012). It is not available online for free, but the book webpage contains links to other supporting materials including lecture notes. Furthermore, the latest book by the same author (Bertsekas 2023), which is available for free download, contains a decent introduction to dynamic programming.
+
Having just referenced a book on reinforcement learning (RL), indeed, this popular concept — or at least some of its flavours — is closely related to dynamic programming. In fact, it offers a way to overcome some of the limitations of dynamic programming. In our introductory lecture we are not covering RL, but an interested student can take advantage of the availability of high-quality resources such as the RL-related books and other materials by D. Bertsekas, and another recommendable introduction to RL from a control systems perspective (Meyn 2022), which is also available for free download.
+
The book (Sutton and Barto 2018), often regarded as the bible of RL, is nice (and freely available for download) but may be rather difficult to read for a control engineer because of major differences in terminology.
Based on what we have seen so far, it turns out that the key to solving the discrete-time optimal control problem is to find some… functions. Either the optimal cost function J_k^\star(\bm x_k) or the optimal Q-factor Q_k^\star(\bm x_k,\bm u_k). Once we have them, we can easily find the optimal control \bm u_k^\star(\bm x_k). The question however is how to find these functions. We have seen some recursions for both of them, but it is not clear how to turn these into practical algorithms. We do it here.
+
We are going to solve the Bellman equation (Equation 3) backwards in (discrete) time at a grid of states. Indeed, gridding the state space is the key technique in dynamic programming, because DP assumes a finite state space. If it is not finite, we must grid it.
+
We start with the final time N. We evaluate the terminal cost function \phi(\bm x_N) at a grid of states, which directly yields the optimal costs J_N^\star(\bm x_N).
+
We then proceed to the time N-1. Evaluating the optimal cost function J^\star_{N-1} at each grid point in the state space calls for some optimization, namely
+\min_{\bm u_{N-1}} \left(L_{N-1}(\bm x_{N-1},\bm u_{N-1}) + J_{N}^\star(\mathbf f_{N-1}(\bm x_{N-1}, \bm u_{N-1}))\right).
+
+
We save the optimal costs and the corresponding controls at the given grid points (giving two arrays of values), decrement the time to N-2, and repeat. All the way down to the initial time i.
+
Let’s summarize: the outcome of this whole procedure is two tables – one for the optimal cost, the other for the optimal control – indexed by the grid points in the state space and by the discrete time.
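A minimal Julia sketch of this tabular procedure for a scalar state and a scalar control follows (our own illustration; the dynamics f, the stage cost L, the terminal cost ϕ and the grids xgrid and ugrid are placeholders, and the nearest-grid-point lookup of the cost-to-go is a crude simplification of proper interpolation):

function dp_backward(f, L, ϕ, xgrid, ugrid, N)
    nx = length(xgrid)
    J = zeros(N+1, nx)              # J[k+1, i] ≈ J_k^⋆(xgrid[i])
    U = zeros(N, nx)                # U[k+1, i] ≈ u_k^⋆(xgrid[i])
    J[N+1, :] = ϕ.(xgrid)           # terminal cost evaluated at the grid points
    for k = N-1:-1:0
        for (i, x) in enumerate(xgrid)
            costs = map(ugrid) do u
                xnext = f(x, u, k)
                j = argmin(abs.(xgrid .- xnext))    # nearest grid point to the successor state
                L(x, u, k) + J[k+2, j]              # stage cost plus optimal cost-to-go
            end
            jmin = argmin(costs)
            J[k+1, i] = costs[jmin]
            U[k+1, i] = ugrid[jmin]
        end
    end
    return J, U
end

The two returned arrays are exactly the two tables mentioned above.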
Each chapter corresponds to a single weekly block (covered by a lecture, a seminar/exercise, and some homework), hence 14 chapters in total.
+
Organizational instructions, description of grading policy, assignments of homework problems and other course related material relevant for officially enrolled students are published on the course page within the FEL Moodle system.
Multiple-input-multiple-output (MIMO) systems are subject to limitations of the same origin as single-input-single-output (SISO) systems: unstable poles, “unstable” zeros, delays, disturbances, saturation, etc. However, the vector character of inputs and outputs introduces both opportunities to mitigate those limitations, and… new limitations.
+
+
Directions in MIMO systems
+
With vector inputs and vector outputs, the input-output model of an LTI MIMO system is a matrix (of transfer functions). As such, it can be characterized not only by various scalar quantities (like poles, zeros, etc.), but also by the associated directions in the input and output spaces.
+
+
Example 1 Consider the transfer function matrix (or matrix of transfer functions)
+G(s) = \frac{1}{(0.2s+1)(s+1)}\begin{bmatrix}1 & 1\\ 1+2s& 2\end{bmatrix}.
+
+
Recall that a complex number z\in\mathbb C is a zero of G if the rank of G(z) is less than the rank of G(s) for most s. While reliable numerical algorithms for computing zeros of MIMO systems work with state-space realizations, in this simple case we can easily verify that there is only one zero z=1/2.
+
Zeros in the RHP only exhibit themselves in some directions.
Therefore the minimized condition number \boxed{
+\gamma^\star(G) = \min_{D_1, D_2}\gamma(D_1GD_2)
+}
+ is more relevant, but it is difficult to compute (it provides an upper bound on \mu).
+
RGA can be used to give a reasonable estimate.
+
+
+
Relative gain array (RGA)
+
Relative Gain Array (RGA) as an indicator of difficulties with control (a short numerical illustration follows the list of properties below) \boxed{\Lambda(G) = G \circ (G^{-1})^T}
+
+
independent of scaling,
+
sum of elements in rows and columns is 1,
+
sum of absolute values of the elements of the RGA is very close to the minimized condition number \gamma^\star, hence a system with large RGA entries is always ill-conditioned (but a system with large \gamma can have small RGA entries),
+
RGA for a triangular system is an identity matrix,
+
relative uncertainty of an element of a transfer function matrix equal to (negative) inverse of the corresponding RGA entry makes the system singular.
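As a small numerical illustration (ours, not from the original text), the RGA of the steady-state gain of the plant from Example 1 can be computed in Julia directly from the definition:

# Relative gain array Λ(G) = G ∘ (G⁻¹)ᵀ, evaluated here at s = 0 for the plant of Example 1.
rga(G) = G .* transpose(inv(G))

G0 = [1.0 1.0; 1.0 2.0]   # steady-state gain G(0) of the plant from Example 1
Λ = rga(G0)               # returns [2.0 -1.0; -1.0 2.0]; rows and columns indeed sum to 1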
+
+
+
+
Functional controllability
+
+
+
Interpolation conditions for MIMO systems
+
+
+
Bandwidth limitations due to unstable poles and zeros
+
+
+
Limits given by presence of disturbance and/or reference
+
+
+
Disturbance rejection by a plant with RHP zero
+
+
+
Limits given by the input constraints (saturation)
+
+
+
Limits given by uncertainty in the model: in open loop
For a given system, there may be some inherent limitations of achievable performance. However hard we try to design/tune a feedback controller, certain closed-loop performance indicators such as bandwidth, steady-state accuracy, or resonant peaks may have inherent limits. We are going to explore these. The motivation is that once we know what is achievable, we do not have to waste time by trying to achieve the impossible.
+
At first it may look confusing that we are only formulating this problem of learning the limits towards the end of our course, since one view of the whole optimal control theory is that it provides a systematic methodology for learning what is possible to achieve. Should we need to know the shortest possible time in which the drone can be brought from one position and orientation to another, we just formulate the minimum-time optimal control problem and solve it. Even if at the end of the day we intend to use a different controller – perhaps one supplied commercially with a fixed structure like a PID controller – at least we can assess the suboptimality of such a controller by comparing its performance with the optimal one.
+
+
+
+
+
+
+Optimal control theory may help reveal the limits
+
+
+
+
Indeed, this is certainly a fair and practical motivation for studying even those parts of optimal control theory that do not provide very practical controllers, such as the \mu synthesis returning controllers of rather high order even for low-order plants.
+
+
+
In this section we are going to restrict ourselves to SISO systems. Then in the next section we will extend the results to MIMO systems.
+
+S+T = 1
+
+
+
Clarification of the definition of bandwidth
+
+
+
Interpolation conditions of internal stability
+
Consider that the plant modelled by the transfer function G(s) has a zero in the right half-plane (RHP), that is,
+
+G(z) = 0, \; z\in \text{RHP}.
+
+
It can be shown that the closed-loop transfer functions S(s) and T(s) satisfy the interpolation conditions \boxed{
+S(z)=1,\;\;\;T(z)=0
+}
+
+
+
Proof. Showing this is straightforward and insightful: since no unstable pole-zero cancellation is allowed if internal stability is to be guaranteed, the open-loop transfer function L=KG must inherit the RHP zero of G, that is,
+
+L(z) = K(z)G(z) = 0, \; z\in \text{RHP}.
+
+
But then the sensitivity function S=1/(1+L) must satisfy
+S(z) = \frac{1}{1+L(z)} = 1.
+
+
Consequently, the complementary sensitivity function T=1-S must satisfy the interpolation condition T(z)=0.
+
+
Similarly, assuming that the plant transfer function G(s) has a pole in the RHP, that is,
+
+G(p) = \infty, \; p\in \text{RHP},
+ which can also be formulated in a cleaner way (avoiding the infinity in the definition) as
+\frac{1}{G(p)} = 0, \; p\in \text{RHP},
+ the closed-loop transfer functions S(s) and T(s) satisfy the interpolation conditions \boxed
+{T(p) = 1,\;\;\;S(p) = 0.}
+
+
The interpolation conditions that we have just derived constitute the basis on which we are going to derive the limitations of achievable closed-loop magnitude frequency responses. But we need one more technical result before we can proceed. Most probably you have already encountered it in some course on complex analysis: the maximum modulus principle. We state this result in the jargon of control theory.
+
+
Theorem 1 (Maximum modulus principle) For a stable transfer function F(s), that is, for a function with no pole in the closed right half-plane (RHP) it holds that
+
+\sup_{\omega}|F(j\omega)|\geq |F(s_0)|\;\;\; \forall s_0\in \text{RHP}.
+
+
This can also be expressed compactly as
+\|F(s)\|_\infty \geq |F(s_0)|\;\;\; \forall s_0\in \text{RHP}.
+
+
+
Now instead of some general F(s) we consider the weighted sensitivity function W_\mathrm{p}(s)S(s). And the complex number s in the RHP equals a zero z of the plant transfer function G(s), that is, G(z)=0, \; z\in\mathbb C, \; \Re(z)\geq 0. Then the maximum modulus principle together with the interpolation condition S(z)=1 implies that
+\|W_\mathrm{p}S\|_{\infty}\geq |W_\mathrm{p}(z)S(z)| = |W_\mathrm{p}(z)|.
A similar result holds for the weighted complementary sensitivity function W(s)T(s) and an unstable pole p of the plant transfer function G(s), when combining the maximum modulus principle with the interpolation condition T(p)=1
+
+\|WT\|_{\infty}\geq |W(p)|.
+
+
These two simple results can be further generalized to the situation in which the plant transfer function G(s) has multiple zeros and poles in the RHP, namely N_p unstable poles p_i and N_z unstable zeros z_j; the resulting bounds then involve products over all combinations of the RHP poles and zeros (see Skogestad and Postlethwaite (2005) for the exact expressions).
As a special case, consider the no-weight cases W_\mathrm{p}(s)=1 and W(s)=1 with just a single unstable pole and zero. Then the limitations on the achievable closed-loop magnitude frequency responses can be formulated as
+\|S\|_{\infty} > c, \;\; \|T\|_{\infty} > c, \;\;\;c=\frac{|z+p|}{|z-p|}.
+
+
+
Example 1 For G(s) = \frac{s-4}{(s-1)(0.1s+1)}, the limitations are
+\|S\|_{\infty}>1.67, \quad \|T\|_{\infty}>1.67.
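Just to see where these numbers come from, a one-line Julia check (our own snippet):

# Lower bound c = |z + p| / |z - p| for G(s) = (s-4)/((s-1)(0.1s+1)):
z, p = 4.0, 1.0              # RHP zero and RHP pole of the plant
c = abs(z + p) / abs(z - p)  # ≈ 1.67, matching the bounds on ‖S‖∞ and ‖T‖∞ above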
+
+
+
+
+
Limitations of the achievable bandwidth due to zeros in the right half-plane
+
There are now two requirements on the weighted sensitivity function that must be reconciled. First, the performance requirements
+|S(j\omega)|<\frac{1}{|W_\mathrm{p}(j\omega)|}\;\;\forall\omega\;\;\;\Longleftrightarrow \|W_\mathrm{p}S\|_{\infty}<1
+ and second, the just derived consequence of the interpolation condition
+\|W_\mathrm{p}S\|_{\infty}\geq |W_\mathrm{p}(z)|.
+
+
The only way to satisfy both is to guarantee that
+|W_\mathrm{p}(z)|<1.
+
+
Now, consider the popular first-order weight
+W_\mathrm{p}(s)=\frac{s/M+\omega_\mathrm{B}}{s+\omega_\mathrm{B} A}.
+
+
For one real zero in the RHP, the inequality |W_\mathrm{p}(z)|<1 can be written as
+\omega_\mathrm{B}(1-A) < z\left(1-\frac{1}{M}\right).
+
+
Setting A=0 and M=2, the upper bound on the bandwidth follows
+
\boxed
+{\omega_\mathrm{B}<0.5z.}
+
+
For a complex conjugate pair of RHP zeros the condition becomes
+\omega_\mathrm{B}<|z|\sqrt{1-\frac{1}{M^2}},
+ which for M=2 gives \omega_\mathrm{B}<0.86|z|.
+
+
+
Limitation of the achievable bandwidth due to poles in the right half-plane
+
Using the performance requirement
+|T(j\omega)|<\frac{1}{|W(j\omega)|}\;\;\;\forall\omega\;\;\;\Longleftrightarrow \|WT\|_{\infty}<1
+ together with the interpolation-based bound \|WT\|_{\infty}\geq |W(p)|, we must guarantee
+|W(p)|<1.
+ With the weight
+W(s)= \frac{s}{\omega_{BT}^*}+\frac{1}{M_T}
+ we get a lower bound on the bandwidth
+\omega_{BT}^* > p\frac{M_T}{M_T-1}.
+ For M_T=2 this gives \omega_{BT}^*>2p, and for a complex conjugate pair of poles \omega_{BT}^*>1.15|p|.
+
+
+
Limitations due to time delay
+
Consider the problem of designing a feedback controller for reference tracking. An ideal closed-loop transfer function T(s) from the reference to the output satisfies T(s)=1. If the plant has a time delay, the best achievable closed-loop transfer function T(s) is given by
+T(s) = e^{-\theta s},
+ that is, the reference is perfectly tracked, albeit with some delay. The best achievable sensitivity function S(s) is then given by
+S(s) = 1-e^{-\theta s}.
+
+
In order to make the analysis simpler, we approximate the sensitivity function by the first-order Taylor expansion
+S(s) \approx \theta s,
+ from which we can see that the magnitude frequency response of the sensitivity function is approximated by a linear function of frequency. Unit gain is achieved at about
+
+\omega_{c}=1/\theta.
+ From this approximation, we can see that the bandwidth of the system is limited by the time delay \theta as
+\omega_\mathrm{B} < 1/\theta.
The material here is mostly based on the chapters 5 (SISO systems) and 6 (MIMO systems) of Skogestad and Postlethwaite (2005).
+
While the treatment of the book is more than sufficient for our course, we list here some other resources for further reading. The popular paper Stein (2003) provides a nice explanation of the waterbed effect(s).
+
Interested readers may want to consult the specialized monograph Seron, Braslavsky, and Goodwin (1997), but it is certainly not necessary for our course.
+
More recent results are surveyed in Chen, Fang, and Ishii (2019).
+Chen, Jie, Song Fang, and Hideaki Ishii. 2019. “Fundamental Limitations and Intrinsic Limits of Feedback: An Overview in an Information Age.” Annual Reviews in Control 47 (January): 155–77. https://doi.org/10.1016/j.arcontrol.2019.03.011.
+
+
+Seron, Maria M., Julio H. Braslavsky, and Graham C. Goodwin. 1997. Fundamental Limitations in Filtering and Control. Communications and Control Engineering. London: Springer. https://doi.org/10.1007/978-1-4471-0965-5.
+
+
+Skogestad, Sigurd, and Ian Postlethwaite. 2005. Multivariable Feedback Control: Analysis and Design. 2nd ed. Wiley. https://folk.ntnu.no/skoge/book/.
+
We keep adhering to our previous decision to focus on the algorithms that use derivatives. But even then the number of derivative-based algorithms for constrained optimization – and we consider both equality and inequality constraints – is large. They can be classified in many ways.
+
One way to classify the derivative-based algorithms for constrained optimization is based on the dimension of the space in which they work. For an optimization problem with n variables and m constraints, we have the following possibilities: n-m, n, m, and n+m.
+
+
primal methods
+
dual methods
+
primal-dual methods
+
+
+
Primal methods
+
+
With m equality constraints, they work in the space of dimension n-m.
+
Three advantages
+
+
each point generated by the iterative algorithm is feasible – if terminated early, such a point is feasible.
+
if they generate a converging sequence, it typically converges at least to a local constrained minimum.
+
they do not rely on a special structure of the problem; the problem can even be nonconvex.
+
+
but they need a feasible initial point.
+
They may fail for inequality constraints.
+
+
They are particularly useful for linear/affine constraints or simple nonlinear constraints (norm balls or ellipsoids).
+
+
Projected gradient method
+
+
+
Active set methods
+
+
+
Sequential quadratic programming (SQP)
+
KKT conditions for a nonlinear program with equality constraints solved by Newton’s method.
+
Interpretation: at each iteration, we solve a quadratic program (QP) with linear constraints.
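To make the Newton-on-KKT interpretation concrete, here is a sketch (ours, for the equality-constrained case only): with the Lagrangian \mathcal L(\bm x,\boldsymbol\lambda) = f(\bm x) + \boldsymbol\lambda^\top\mathbf h(\bm x), Newton's method applied to the optimality conditions \nabla f(\bm x) + \nabla\mathbf h(\bm x)\boldsymbol\lambda = \mathbf 0 and \mathbf h(\bm x) = \mathbf 0 solves at each iteration the linear system
\begin{bmatrix}
\nabla^2_{\bm x\bm x}\mathcal L(\bm x_k,\boldsymbol\lambda_k) & \nabla \mathbf h(\bm x_k)\\
(\nabla \mathbf h(\bm x_k))^\top & \mathbf 0
\end{bmatrix}
\begin{bmatrix}
\Delta\bm x\\ \Delta\boldsymbol\lambda
\end{bmatrix}
= -
\begin{bmatrix}
\nabla f(\bm x_k) + \nabla \mathbf h(\bm x_k)\boldsymbol\lambda_k\\
\mathbf h(\bm x_k)
\end{bmatrix},
 which is precisely the optimality condition of a quadratic program with a quadratic model of the Lagrangian and linearized constraints, hence the name sequential quadratic programming.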
These are essentially the methods that we have all learnt to apply using a pen and paper. A bunch of rules. The outcome is an expression.
+
+
+
Numerical finite-difference (FD) methods
+
These methods approximate the derivative by computing differences between the function values at different points, hence the name finite-difference (FD) methods. The simplest FD methods follow from the definition of the derivative after omitting the limit:
+
+\frac{\mathrm d f(x)}{\mathrm d x} \approx \frac{f(x+\alpha)-f(x)}{\alpha}\qquad\qquad \text{forward difference}
+ or
+\frac{\mathrm d f(x)}{\mathrm d x} \approx \frac{f(x)-f(x-\alpha)}{\alpha}\qquad\qquad \text{backward difference}
+ or
+\frac{\mathrm d f(x)}{\mathrm d x} \approx \frac{f(x+\frac{\alpha}{2})-f(x-\frac{\alpha}{2})}{\alpha}\qquad\qquad \text{central difference}
+
+
For functions of vector variables, the same idea applies, but now we have to compute the difference for each component of the vector.
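A minimal Julia sketch of a forward-difference approximation of the gradient (our own helper function fd_gradient; the step α is left to the user):

# Forward-difference approximation of the gradient of f: Rⁿ → R (a sketch).
function fd_gradient(f, x; α = 1e-6)
    n = length(x)
    g = zeros(n)
    fx = f(x)
    for i in 1:n
        e = zeros(n); e[i] = α        # perturb the i-th component only
        g[i] = (f(x + e) - fx) / α    # forward difference
    end
    return g
end

# Example: gradient of f(x) = x₁² + 3x₂² at [1, 1] is approximately [2, 6].
fd_gradient(x -> x[1]^2 + 3x[2]^2, [1.0, 1.0])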
The number of numerical solvers is huge. First, we give a short biased list of solvers which we may use within this course.
+
+
Optimization Toolbox for Matlab: fmincon, fminunc, linprog, quadprog, … Available within the all-university Matlab license for all students and employees at CTU.
Similarly, users of Julia and JuMP will find the list of solvers supported by JuMP useful. The list is worth consulting even if Julia is not the tool of choice, as many solvers are independent of Julia.
Our motivation for studying numerical algorithms for unconstrained optimization remains the same as when we studied the conditions of optimality for such unconstrained problems – such algorithms constitute building blocks for constrained optimization problems. Indeed, many algorithms for constrained problems are based on reformulating the constrained problem into an unconstrained one and then applying the algorithms studied in this section.
+
It may be useful to recapitulate our motivation for studying optimization algorithms in general – after all, there are dozens of commercial or free&open-source software tools for solving optimization problems. Why not just use them? There are two answers beyond the traditional “at a grad school we should understand what we are using”:
+
+
There is no single solver that works best for all problems. Therefore we must be aware of the principles, strengths and weaknesses of the algorithms in order to choose the right one for our problem.
+
This is a control engineering course and numerical optimization is becoming an integral part of control systems. While developing a control system, we may find ourselves in need of developing our own implementation of an optimization algorithm or adjusting an existing one. This requires deeper understanding of algorithms than just casual usage of high-level functions in Matlab or Python.
+
+
There is certainly no shortage of algorithms for unconstrained optimization. In this crash course we can cover only a few. But the few we cover here certainly form a solid theoretical basis and provide practically usable tools.
+
One possible way to classify the algorithms is based on whether they use derivatives of the objective functions or not. In this course, we only consider the former approaches as they lead to more efficient algorithms. For the latter methods, we can refer to the literature (the prominent example is the Nelder-Mead method).
+
All the relevant methods are iterative. Based on what happens within each iteration, we can classify them into two categories:
+
+
Descent methods
+
+In each iteration, fix the search direction d_k first, and then determine how far to go along that direction, that is, find the step length \alpha_k that minimizes (exactly or approximately) f(x_k + \alpha_k d_k). In the next iteration the search direction is updated.
+
+
Trust region methods
+
+In each iteration, fix the region (typically a ball) around the current solution, in which a simpler (typically a quadratic) function approximates the original cost function reasonably accurately, and then find the minimum of this simpler cost function.
+
+
+
+
Descent methods
+
The obvious quality that the search direction needs to satisfy is that the cost function decreases along it, at least locally (for a small step length).
+
+
Definition 1 (Descent direction) At the current iterate \bm x_k, the direction \bm d_k is called a descent direction if
+\nabla f(\bm x_k)^\top \bm d_k < 0,
+ that is, the directional derivative is negative along the direction \bm d_k.
+
+
The product above is an inner product of the two vectors \bm d_k and \nabla f(\bm x_k). Recall that it is defined as
+\nabla f(\bm x_k)^\top \bm d_k = \|\nabla f(\bm x_k)\| \|\bm d_k\| \cos \theta,
+ where \theta is the angle between the gradient and the search direction. This condition has a nice geometric interpretation in a contour plot for an optimization in \mathbb R^2. Consider the line tangent to the function contour at \bm x_k. A descent direction must lie in the half-plane generated by the tangent line opposite to the one into which the gradient \nabla f(\bm x_k) points.
+
Beware that the cost function is only guaranteed to be reduced if the length of the step is sufficiently small. For longer steps the higher-order terms in the Taylor series approximation of the cost function can dominate.
+
Before we proceed to the question of which descent direction to choose and how to find it, we address the question of how far to go along the chosen direction. This is the problem of line search.
+
+
Step length determination (aka line search)
+
Note that once the search direction has been fixed (whether we used the negative of the gradient or any other descent direction), the problem of finding the step length \alpha_k is just a scalar optimization problem. It turns out, however, that besides finding the true minimum along the search directions, it is often sufficient to find the minimum only approximately, or not aiming at minimization at all and work with a fixed step length instead.
+
+
Fixed length of the step
+
Here we give guidance on the choice of the length of the step. But we need to introduce a useful concept first.
+
+
Definition 2 (L-smoothness) A continuously differentiable function f is said to be L-smooth if there exists a constant L>0 such that its gradient satisfies
+\|\nabla f(x) - \nabla f(y)\| \leq L \|x-y\|.
+
+
+
Note that if the second derivatives exist, L is an upper bound on the norm of the Hessian
+\|\nabla^2 f\|\leq L.
+
+
For quadratic functions, L is the largest eigenvalue of the Hessian
+L = \max_i \lambda_i (\mathbf Q).
+
+
The usefulness of the concept of L-smoothness is that it provides a quadratic function that serves as an upper bound for the original function. This is formulated as the following lemma.
+
+
Lemma 1 (Descent lemma) Consider an L-smooth function f. Then for any \mathbf x_k and \mathbf x_{k+1}, the following inequality holds
+f(\mathbf x_{k+1}) \leq f(\mathbf x_{k}) + \nabla f(\mathbf x_k)^\top (\mathbf x_{k+1}-\mathbf x_{k}) + \frac{L}{2}\|\mathbf x_{k+1}-\mathbf x_{k}\|^2.
+
+
+
What implication does the result have on the determination of the step length? Substituting the gradient step \mathbf x_{k+1} = \mathbf x_k - \alpha \nabla f(\mathbf x_k) into the quadratic upper bound and minimizing it with respect to \alpha gives the step length
+
+\alpha = \frac{1}{L}.
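A minimal Julia sketch of gradient descent with this fixed step length for a quadratic function, with L taken as the largest eigenvalue of \mathbf Q (our own illustration with arbitrarily chosen data):

using LinearAlgebra

function gradient_descent_fixed_step(Q, c, x0; maxiter = 100, ϵ = 1e-5)
    L = maximum(eigvals(Q))      # Lipschitz constant of the gradient of a quadratic
    α = 1/L                      # fixed step length suggested by the descent lemma
    x = x0
    for k in 1:maxiter
        ∇f = Q*x + c
        norm(∇f) < ϵ && break    # stop when the gradient is (nearly) zero
        x = x - α*∇f             # gradient step with the fixed step length
    end
    return x
end

gradient_descent_fixed_step([1.0 0.0; 0.0 3.0], [1.0, 2.0], [2.0, 3.0])  # ≈ -Q\c = [-1.0, -0.667]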
+
+
+
+
Exact line search
+
A number of methods exist: bisection, golden section, Newton's, … As finding the true minimum in each iteration is often too computationally costly and hardly needed, we do not cover them here. The only exception is Newton's method, which for vector variables constitutes another descent method on its own and we cover it later.
+
+
Example 1 Here we develop a solution for an exact minimization of a quadratic function f(\bm x) = \frac{1}{2} \bm x^\top\mathbf Q \bm x + \mathbf c^\top \bm x along a given direction. We show that it leads to a closed-form formula. Although not particularly useful in practice, it is a good exercise in understanding the problem of line search. Furthermore, we will use it later to demonstrate the behaviour of the steepest descent method. The problem is to \operatorname*{minimize}_{\alpha_k} f(\bm x_k + \alpha_k \bm d_k). We express the cost as a function of the current iterate, the direction, and step length.
+\begin{aligned}
+f(\bm x_k + \alpha_k \bm d_k) &= \frac{1}{2}(\bm x_k + \alpha_k\bm d_k)^\top\mathbf Q (\bm x_k + \alpha_k\bm d_k) +\mathbf c^\top(\bm x_k + \alpha_k\bm d_k)\\
+&= \frac{1}{2} \bm x_k^\top\mathbf Q \bm x_k + \bm d_k^\top\mathbf Q\bm x_k \alpha_k + \frac{1}{2} \bm d_k^\top\mathbf Q\bm d_k \alpha_k^2+ \mathbf c^\top(\bm x_k + \alpha_k\bm d_k).
+\end{aligned}
+
+
Differentiating the function with respect to the length of the step, we get
+\frac{\mathrm{d}f(\bm x_k + \alpha_k\bm d_k)}{\mathrm{d}\alpha_k} = \bm d_k^\top \underbrace{(\mathbf Q\bm x_k + \mathbf c)}_{\nabla f(\bm x_k)} + \bm d_k^\top\mathbf Q\bm d_k \alpha_k.
+
+
And now setting the derivative to zero, we find the optimal step length
+\boxed{
+\alpha_k = -\frac{\bm d_k^\top \nabla f(\bm x_k)}{\bm d_k^\top\mathbf Q\bm d_k} = -\frac{\bm d_k^\top (\mathbf Q\bm x_k + \mathbf c)}{\bm d_k^\top\mathbf Q\bm d_k}.}
+
+
+
+
+
Approximate line search – backtracking
+
There are several methods for approximate line search. Here we describe the backtracking algorithm, which is based on the sufficient decrease condition (also known as Armijo condition), which reads
+f(\bm x_k+\alpha_k\bm d_k) - f(\bm x_k) \leq \gamma \alpha_k \bm d_k^\top \nabla f(\bm x_k),
+ where \gamma\in(0,1), typically \gamma is very small, say \gamma = 10^{-4}.
+
The term on the right can be viewed as a linear function of \alpha_k. Its negative slope is a bit less steep than the directional derivative of the function f at \bm x_k. The condition of sufficient decrease thus requires that the cost function (as a function of \alpha_k) is below the graph of this linear function.
+
Now, the backtracking algorithm is parameterized by three parameters: the initial step length \alpha_0>0, the typically very small \gamma\in(0,1) that parameterizes the Armijo condition, and yet another parameter \beta\in(0,1).
+
The k-th iteration of the algorithm goes like this: failure of the sufficient decrease condition for a given \alpha_k, or, equivalently satisfaction of the condition
+f(\bm x_k) - f(\bm x_k+\alpha_k\bm d_k) < -\gamma \alpha_k \bm d_k^\top \nabla f(\bm x_k)
+ sends the algorithm into another reduction of \alpha_k by \alpha_k = \beta\alpha_k. A reasonable choice for \beta is 0.5, which corresponds to halving the step length upon failure to decrease sufficiently.
+
The backtracking algorithm can be implemented as follows
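A minimal Julia sketch consistent with the description above (ours; the actual implementation used in the course may differ in details):

using LinearAlgebra   # for dot()

# Backtracking line search based on the Armijo sufficient decrease condition (a sketch).
function backtracking_line_search(f, ∇fx, x, d; α₀ = 1.0, γ = 1e-4, β = 0.5)
    α = α₀
    while f(x + α*d) - f(x) > γ * α * dot(d, ∇fx)   # sufficient decrease not yet achieved
        α = β*α                                     # shrink the step (e.g., halve it)
    end
    return α
end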
backtracking_line_search (generic function with 1 method)
+
+
+
Now we are ready to proceed to the question of choosing a descent direction.
+
+
+
+
Steepest descent (aka gradient descent) method
+
A natural candidate for a descent direction is the negative of the gradient
+\bm d_k = -\nabla f(\bm x_k).
+
+
In fact, among all descent directions, this is the one for which the descent is steepest (the gradient determines the direction of steepest ascent), though we will see later that this does not mean that the convergence of the method is the fastest.
+
In each iteration of the gradient method, this is how the solution is updated
+
+\boxed{
+\bm x_{k+1} = \bm x_{k} - \alpha_k \nabla f(\bm x_{k}),}
+ where the determination of the step length \alpha_k has already been discussed in the previous section.
+
Let’s now examine the behaviour of the method by applying it to minimization of a quadratic function. Well, for a quadratic function it is obviously an overkill, but we use it in the example because we can compute the step length exactly, which then helps the method show its best.
+
+
Example 2
+
+
+
using LinearAlgebra   # For the dot() function.
using Printf          # For formatted output.

x0 = [2, 3]     # Initial vector.
Q = [1 0; 0 3]  # Positive definite matrix defining the quadratic form.
c = [1, 2]      # Vector defining the linear part.

xs = -Q\c       # Stationary point, automatically the minimizer for posdef Q.

ϵ = 1e-5        # Threshold on the norm of the gradient.
N = 100;        # Maximum number of steps.

function gradient_descent_quadratic_exact(Q, c, x0, ϵ, N)
    x = x0
    iter = 0
    f = 1/2*dot(x, Q*x) + dot(x, c)
    ∇f = Q*x + c
    while (norm(∇f) > ϵ)
        α = dot(∇f, ∇f)/dot(∇f, Q*∇f)
        x = x - α*∇f
        iter = iter + 1
        f = 1/2*dot(x, Q*x) + dot(x, c)
        ∇f = Q*x + c
        @printf("i = %3d   ||∇f(x)|| = %6.4e   f(x) = %6.4e\n", iter, norm(∇f), f)
        if iter >= N
            return f, x
        end
    end
    return f, x
end

fopt, xopt = gradient_descent_quadratic_exact(Q, c, x0, ϵ, N)
We can also decorate the code a bit to visualize how the iterations proceeded.
+
+
+
function gradient_descent_quadratic_exact_decor(Q, c, x0, ϵ, N)
    x = x0
    X = x
    f = 1/2*dot(x, Q*x) + dot(x, c)
    F = [f,]
    ∇f = Q*x + c
    iter = 0
    while (norm(∇f) > ϵ)
        α = dot(∇f, ∇f)/dot(∇f, Q*∇f)
        x = x - α*∇f
        iter = iter + 1
        f = 1/2*dot(x, Q*x) + dot(x, c)
        ∇f = Q*x + c
        X = hcat(X, x)
        push!(F, f)
        if iter >= N
            return F, X
        end
    end
    return F, X
end

F, X = gradient_descent_quadratic_exact_decor(Q, c, x0, ϵ, N)

x1_data = x2_data = -4:0.01:4;
f(x) = 1/2*dot(x, Q*x) + dot(x, c)
z_data = [f([x1, x2]) for x2 = x2_data, x1 = x1_data];

using Plots
contour(x1_data, x2_data, z_data)
plot!(X[1,:], X[2,:], label="xk", marker=:diamond, aspect_ratio=1)
scatter!([x0[1],], [x0[2],], label="x0")
scatter!([xs[1],], [xs[2],], label="xopt")
xlabel!("x1"); ylabel!("x2");
xlims!(-4, 4); ylims!(-4, 4)
+
+
+
+
+
+
+
Although the number of iterations in the above example is acceptable, a major characteristic of the method is visible. Its convergence slows down as we approach a local minimum, which is visually recognizable as oscillations or zig-zagging. But it can be much worse for some data.
+
+
Gradient method converges slowly for ill-conditioned problems
+
+
Example 3 Consider minimization of the following cost function f(\bm x) = 1000x_1^2 + 40x_1x_2 + x_2^2.
While for the previous problem of the same kind and size the steepest descent method converged in just a few steps, for this particular data it takes many dozens of steps.
+
The culprit here are the bad properties of the Hessian matrix Q. By ``bad properties’’ we mean the so-called ill-conditioning, which is reflected in a very high condition number. Recall that the condition number \kappa for a given matrix \mathbf A is defined as
+\kappa(\mathbf A) = \|\mathbf A^{-1}\|\cdot \|\mathbf A\|
+ and it can be computed as the ratio of the largest and smallest singular values, that is,
+\kappa(\mathbf A) = \frac{\sigma_{\max}(\mathbf A)}{\sigma_{\min}(\mathbf A)}.
+
Ideally this number should be around 1. In the example above it is
+
+
+
cond(Q)
+
+
+
1668.0010671466664
+
+
+
which is well above 1000. Is there anything that we can do about it? The answer is yes. We can scale the original data to improve the conditioning.
+
+
+
+
Scaled gradient method for ill-conditioned problems
+
Upon introducing a matrix \mathbf S that relates the original vector variable \bm x with a new vector variable \bm y according to
+\bm x = \mathbf S \bm y,
+ the optimization cost function changes from f(\bm x) to f(\mathbf S \bm y). Let’s relabel the latter to g(\bm y). And we will now examine how the steepest descent iteration changes. Straightforward application of a chain rule for finding derivatives of composite functions yields
+g'(\bm y) = \frac{\mathrm{d}}{\mathrm{d}\bm y}f(\mathbf S\bm y) = f'(\mathbf S\bm y)\,\mathbf S.
+
+
Keeping in mind that gradients are transposes of derivatives, we can write
+\nabla g(\bm y) = \mathbf S^\top \nabla f(\mathbf S\bm y).
+
+
Steepest descent iterations in the new variable then change accordingly to
\bm y_{k+1} = \bm y_{k} - \alpha_k \mathbf S^\top \nabla f(\mathbf S\bm y_k),
 which, after multiplying by \mathbf S from the left, reads \bm x_{k+1} = \bm x_{k} - \alpha_k \mathbf S\mathbf S^\top \nabla f(\bm x_k) in the original variable.
Upon defining the scaling matrix \mathbf D as \mathbf S \mathbf S^\top, a single iteration changes to
+\boxed{\bm x_{k+1} = \bm x_{k} - \alpha_k \mathbf D_k\nabla f(\bm x_{k}).}
+
+
The question now is: how to choose the matrix \mathbf D? We would like to make the Hessian matrix \nabla^2 f(\mathbf S \bm y) (which in the case of a quadratic form is the matrix \mathbf Q as we used it above) better conditioned. Ideally, \nabla^2 f(\mathbf S \bm y)\approx \mathbf I.
+
A simple way for improving the conditioning is to define the scaling matrix \mathbf D as a diagonal matrix whose diagonal entries are given by
+\mathbf D_{ii} = [\nabla^2 f(\bm x_k)]^{-1}_{ii}.
+
+
In words, the diagonal entries of the Hessian matrix are inverted and they then form the diagonal of the scaling matrix.
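A minimal Julia sketch of a single diagonally scaled gradient step for an ill-conditioned quadratic (our own illustration; the data are arbitrary):

using LinearAlgebra

Q = [2000.0 40.0; 40.0 2.0]   # Hessian of f(x) = 1000x₁² + 40x₁x₂ + x₂²
c = [0.0, 0.0]
x = [1.0, 1.0]

D = Diagonal(1 ./ diag(Q))    # scaling matrix: inverted diagonal of the Hessian
∇f = Q*x + c
α = dot(∇f, D*∇f)/dot(D*∇f, Q*(D*∇f))   # exact line search along the scaled direction
x = x - α*D*∇f                # one scaled gradient step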
+
It is worth emphasizing how the algorithm changed: the direction of steepest descent (the negative of the gradient) is premultiplied by some (scaling) matrix. We will see in a few moments that another method – Newton’s method – has a perfectly identical structure.
+
+
+
+
Newton’s method
+
Newton’s method is one of the flagship algorithms in numerical computing. I am certainly not exaggerating if I include it in my personal Top 10 list of algorithms relevant for engineers. We may encounter the method in two settings: as a method for solving (sets of) nonlinear equations and as a method for optimization. The two are inherently related and it is useful to be able to see the connection.
+
+
Newton’s method for rootfinding
+
The problem to be solved is that of finding x for which a given function g() vanishes. In other words, we solve the following equation
+g(x) = 0.
+
+
The scalar version stated above also has its vector extension
+\mathbf g(\bm x) = \mathbf 0,
+ in which \bm x stands for an n-tuple of variables and \mathbf g() actually stands for an n-tuple of functions. Even more general version allows for different number of variables and equations.
+
We start with a scalar version. A single iteration of the method evaluates not only the value of the function g(x_k) at the given point but also its derivative g'(x_k). It then uses the two to approximate the function g() at x_k by a linear (actually affine) function and computes the intersection of this approximating function with the horizontal axis. This gives us x_{k+1}, that is, the (k+1)-th approximation to a solution (root). We can write this down as
+\begin{aligned}
+\underbrace{g(x_{k+1})}_{0} &= g(x_{k}) + g'(x_{k})(x_{k+1}-x_k)\\
+0 &= g(x_{k}) + g'(x_{k})x_{k+1}-g'(x_{k})x_k,
+\end{aligned}
+ from which the famous formula follows
+\boxed{x_{k+1} = x_{k} - \frac{g(x_k)}{g'(x_k)}.}
+
+
In the vector form, the formula is
+\boxed{\bm x_{k+1} = \bm x_{k} - [\nabla \mathbf g(\bm x_k)^\top]^{-1}\mathbf g(\bm x_k),}
+ where \nabla \mathbf g(\bm x_k)^\top is the (Jacobian) matrix of the first derivatives of \mathbf g at \bm x_k, that is, \nabla \mathbf g() is a matrix with the gradient of the g_i(\bm x) function in its i-th column.
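A minimal Julia sketch of the vector iteration (ours; the function g and its Jacobian Jg are supplied by the user):

using LinearAlgebra

# Newton's method for solving g(x) = 0, with a user-supplied Jacobian Jg (a sketch).
function newton_rootfinding(g, Jg, x0; ϵ = 1e-10, maxiter = 50)
    x = x0
    for k in 1:maxiter
        norm(g(x)) < ϵ && break
        x = x - Jg(x) \ g(x)      # solve a linear system instead of inverting the Jacobian
    end
    return x
end

# Example: intersection of a circle and a line, starting from [1, 1].
g(x)  = [x[1]^2 + x[2]^2 - 4, x[1] - x[2]]
Jg(x) = [2x[1] 2x[2]; 1.0 -1.0]
newton_rootfinding(g, Jg, [1.0, 1.0])    # ≈ [√2, √2]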
+
+
+
Newton’s method for optimization
+
Once again, restrict ourselves to a scalar case first. The problem is
+\operatorname*{minimize}_{x\in\mathbb{R}}\quad f(x).
+
+
At the k-th iteration of the algorithm, the solution is x_k. The function to be minimized is approximated by a quadratic function m_k() in x. In order to find a parameterization of this quadratic function, the function f() but also its first and second derivatives, f'() and f''(), respectively, need to be evaluated at x_k. Using these three, a function m_k(x) approximating f(x) at some x not too far from x_k can be defined
+m_k(x) = f(x_k) + f'(x_k)(x-x_k) + \frac{1}{2}f''(x_k)(x-x_k)^2.
+
+
The problem of minimizing this new function in the k-th iteration is then formulated, namely,
+
+\operatorname*{minimize}_{x_{k+1}\in\mathbb{R}}\quad m_k(x_{k+1})
+ and solved for some x_{k+1}. The way to find this solution is straightforward: find the derivative of m_k() and find the value of x_{k+1} for which this derivative vanishes. The result is
+\boxed{x_{k+1} = x_{k} - \frac{f'(x_k)}{f''(x_k)}.}
+
+
The vector version of the Newton’s step is
+\boxed{\bm x_{k+1} = \bm x_{k} - [\nabla^2 f(\bm x_k)]^{-1} \nabla f(\bm x_k).}
+
+
A few observations
+
+
If compared to the general prescription for descent direction methods above, Newton's method determines the direction and the step length at once (both \alpha_k and \mathbf d_k are hidden in - [\nabla^2 f(\mathbf x_k)]^{-1} \nabla f(\mathbf x_k)).
+
If compared with the steepest descent (gradient) method, especially with its scaled version above, Newton's method fits into the framework nicely because the inverse [\nabla^2 f(\mathbf x_k)]^{-1} of the Hessian can be regarded as a kind of a scaling matrix \mathbf D. Indeed, you can find arguments in some textbooks that Newton's method involves scaling that is optimal in some sense. We skip the details here because we only wanted to highlight the similarity in the structure of the two methods.
+
+
The great popularity of Newton's method is mainly due to its nice convergence – quadratic. Although we skip any discussion of convergence rates here, note that for all other methods this is an ideal they can only aim to approach.
+
The nice convergence rate of Newton’s method is compensated by a few disadvantages
+
+
The need to compute the Hessian. This is perhaps not quite clear with simple problems but can play some role with huge problems.
+
Once the Hessian is computed, it must be inverted (actually, a linear system must be solved). But this assumes that the Hessian is nonsingular. How can we guarantee this for a given problem?
+
It is not only that the Hessian must be nonsingular, it must also be positive (definite). Note that in the scalar case this corresponds to the situation when the second derivative is positive. Negativeness of the second derivative can send the algorithm in the opposite direction – away from the local minimum – which would ruin the convergence of the algorithm.
+
+
The last two issues are handled by modifications of the standard Newton's method
+
+
Damped Newton’s method
+
A parameter \alpha\in(0,1) is introduced that shortens the step as in
+ \bm x_{k+1} = \bm x_{k} - \alpha(\nabla^2 f(\bm x_k))^{-1} \nabla f(\bm x_k).
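A minimal Julia sketch of the damped Newton iteration for minimization (ours; the gradient ∇f and Hessian ∇²f are supplied as functions):

using LinearAlgebra

# Damped Newton's method for minimization, with user-supplied gradient and Hessian (a sketch).
function damped_newton(∇f, ∇²f, x0; α = 0.5, ϵ = 1e-8, maxiter = 100)
    x = x0
    for k in 1:maxiter
        g = ∇f(x)
        norm(g) < ϵ && break
        x = x - α * (∇²f(x) \ g)     # shortened (damped) Newton step
    end
    return x
end

# Example: minimize f(x) = x₁⁴ + x₂² starting from [2, 1].
damped_newton(x -> [4x[1]^3, 2x[2]], x -> [12x[1]^2 0.0; 0.0 2.0], [2.0, 1.0])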
+
+
+
+
Fixed constant positive definite matrix instead of the inverse of the Hessian
+
The step is determined as
+ \bm x_{k+1} = \bm x_{k} - \mathbf B \nabla f(\bm x_k).
+
+
Note that the interpretation of the constant \mathbf B in the position of the (inverse of the) Hessian in the rootfinding setting is that the slope of the approximating linear (affine) function is always constant.
+
Now that we admitted having something else than just the (inverse of the) Hessian in the formula for Newton's method, we can explore this new freedom further. This will bring us into a family of methods called quasi-Newton methods.
+
+
+
+
+
+
\ No newline at end of file
diff --git a/opt_algo_unconstrained.html b/opt_algo_unconstrained.html
index 076b5eb..be48290 100644
--- a/opt_algo_unconstrained.html
+++ b/opt_algo_unconstrained.html
@@ -900,7 +900,7 @@
Appro
f(\bm x_k) - f(\bm x_k+\alpha_k\bm d_k) < -\gamma \alpha_k \mathbf d^T \nabla f(\bm x_k)
sends the algorithm into another reduction of \alpha_k by \alpha_k = \beta\alpha_k. A reasonable choice for \beta is 0.5, which corresponds to halving the step length upon failure to decrease sufficiently.
The backtracking algorithm can be implemented as follows
We consider the following optimization problem with equality constraints
+\begin{aligned}
+\operatorname*{minimize}_{\bm x\in\mathbb{R}^n} &\quad f(\bm x)\\
+\text{subject to} &\quad \mathbf h(\bm x) = \mathbf 0,
+\end{aligned}
+ where \mathbf h(\bm x) \in \mathbb R^m defines a set of m equations
+\begin{aligned}
+h_1(\bm x) &= 0\\
+h_2(\bm x) &= 0\\
+\vdots\\
+h_m(\bm x) &= 0.
+\end{aligned}
+
+
Augmenting the original cost function f with the constraint functions h_i scaled by Lagrange variables \lambda_i gives the Lagrangian function
+\mathcal{L}(\bm x,\boldsymbol\lambda) \coloneqq f(\bm x) + \sum_{i=1}^m \lambda_i h_i(\bm x) = f(\bm x) + \boldsymbol \lambda^\top \mathbf h(\bm x).
+
+
+
+
First-order necessary condition of optimality
+
+
The first-order necessary condition of optimality is
+\nabla \mathcal{L}(\bm x,\boldsymbol\lambda) = \mathbf 0,
+ which amounts to two (generally vector) equations
+\boxed{
+\begin{aligned}
+\nabla f(\bm x) + \sum_{i=1}^m \lambda_i \nabla h_i(\bm x) &= \mathbf 0\\
+\mathbf{h}(\bm x) &= \mathbf 0.
+\end{aligned}}
+
+
Defining a matrix \nabla \mathbf h(\bm x) \in \mathbb R^{n\times m} as horizontally stacked gradients of the constraint functions
+\nabla \mathbf h(\bm x) \coloneqq \begin{bmatrix}
+ \nabla h_1(\bm x) && \nabla h_2(\bm x) && \ldots && \nabla h_m(\bm x)
+ \end{bmatrix},
+ in fact, a transpose of the Jacobian matrix, the necessary condition can be rewritten in a vector form as \boxed
+{\begin{aligned}
+\nabla f(\bm x) + \nabla \mathbf h(\bm x)\boldsymbol \lambda &= \mathbf 0\\
+\mathbf{h}(\bm x) &= \mathbf 0.
+\end{aligned}}
+
+
Beware of the nonregularity issue! A given \bm x is a regular point of the constraints if the gradients of the constraint functions are linearly independent there, that is, if \nabla \mathbf h(\bm x) has full column rank (equivalently, the Jacobian (\nabla \mathbf h(\bm x))^\top has full row rank). Rank deficiency reveals a defect in the formulation.
+
+
Example 1 (Equality-constrained quadratic program)
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} &\quad \frac{1}{2}\bm{x}^\top\mathbf{Q}\bm{x} + \mathbf{r}^\top\bm{x}\\
+\text{subject to} &\quad \mathbf A \bm x + \mathbf b = \mathbf 0.
+\end{aligned}
+
+
The first-order necessary condition of optimality is
+
+\begin{bmatrix}
+ \mathbf Q & \mathbf A^\top\\\mathbf A & \mathbf 0
+\end{bmatrix}
+\begin{bmatrix}
+ \bm x \\ \boldsymbol \lambda
+\end{bmatrix}
+=
+\begin{bmatrix}
+ -\mathbf r\\-\mathbf b
+\end{bmatrix}.
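A minimal Julia sketch that forms and solves this KKT system for a small, arbitrarily chosen instance (ours):

using LinearAlgebra

# Equality-constrained QP: minimize 1/2 x'Qx + r'x  subject to  Ax + b = 0.
Q = [3.0 1.0; 1.0 2.0]
r = [1.0, 1.0]
A = [1.0 1.0]          # a single equality constraint
b = [-1.0]             # encodes x₁ + x₂ = 1

KKT = [Q A'; A zeros(size(A,1), size(A,1))]
rhs = [-r; -b]
sol = KKT \ rhs
x, λ = sol[1:2], sol[3:end]    # primal minimizer and Lagrange multiplier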
+
+
+
+
+
Second-order sufficient conditions
+
+
Using the unconstrained Hessian \nabla^2_{\mathbf{x}\bm{x}} \mathcal{L}(\bm x,\boldsymbol \lambda) is too conservative. Instead, use projected Hessian
+
+\mathbf{Z}^\mathrm{T}\;\nabla^2_{\bm{x}\bm{x}} \mathcal{L}(\bm x,\boldsymbol \lambda)\;\mathbf Z > 0,
+ where \mathbf Z is an (orthonormal) basis of the nullspace of the Jacobian (\nabla \mathbf h(\bm x))^\top.
+
+
+
+
Inequality constraints
+
+
+\begin{aligned}
+\operatorname*{minimize}_{\bm x\in\mathbb{R}^n} &\quad f(\bm x)\\
+\text{subject to} &\quad \mathbf g(\bm x) \leq \mathbf 0,
+\end{aligned}
+ where \mathbf g(\bm x) \in \mathbb R^p defines a set of p inequalities.
+
+
First-order necessary condition of optimality
+
Karush-Kuhn-Tucker (KKT) conditions of optimality are then composed of these four (sets of) conditions
+\begin{aligned}
+\nabla f(\bm x) + \sum_{i=1}^p \mu_i \nabla g_i(\bm x) &= \mathbf 0,\\
+\mathbf{g}(\bm{x}) &\leq \mathbf 0,\\
+\mu_i g_i(\bm x) &= 0,\quad i = 1,2,\ldots, p,\\
+\mu_i &\geq 0,\quad i = 1,2,\ldots, p.
+\end{aligned}
+
Duality theory offers another view of the original optimization problem by bringing in another but related one.
+
Corresponding to the general optimization problem
+ \begin{aligned}
+ \operatorname*{minimize}\;&f(\bm x)\\
+ \text{subject to}\; & \mathbf g(\bm x)\leq \mathbf 0\\
+ & \mathbf h(\bm x) = \mathbf 0,
+ \end{aligned}
+
+
we form the Lagrangian function \mathcal L(\bm x,\bm \lambda,\bm \mu) = f(\bm x) + \bm \lambda^\top \mathbf h(\bm x) + \bm \mu^\top \mathbf g(\bm x)
+
For any (fixed) values of (\bm \lambda,\bm \mu) such that \bm \mu\geq 0, we define the Lagrange dual function through the following unconstrained optimization problem
+q(\bm\lambda,\bm\mu) = \inf_{\bm x}\mathcal L(\bm x,\bm \lambda,\bm \mu).
+
+
Obviously, for any feasible \bm x the value of the Lagrangian is no larger than f(\bm x), since \mathbf h(\bm x) = \mathbf 0 and \bm\mu^\top\mathbf g(\bm x)\leq 0. Taking the infimum can only decrease this further, and so the result of this minimization is no worse (larger) than the minimum of the original optimization problem. It can thus serve as a lower bound q(\bm \lambda,\bm \mu) \leq f(\bm x^\star).
+
This result is called weak duality. A natural idea is to find the values of \bm \lambda and \bm \mu such that this lower bound is tightest, that is,
+\begin{aligned}
+ \operatorname*{maximize}_{\bm\lambda, \bm\mu}\; & q(\bm\lambda,\bm\mu)\\
+ \text{subject to}\;& \bm\mu \geq \mathbf 0.
+\end{aligned}
+
+
Under some circumstances the result can be tight, which leads to strong duality, which means that the minimum of the original (primal) problem and the maximum of the dual problem coincide.
+q(\bm \lambda^\star,\bm \mu^\star) = f(\bm x^\star).
+
+
This related dual optimization problem can have some advantages for development of both theory and algorithms.
Realistically complex optimization problems cannot be solved with just a pen and a paper – computer programs (often called optimization solvers) are needed to solve them. And now comes the challenge: as various solvers for even the same class of problems differ in the algorithms they implement, so do their interfaces – every solver expects the inputs (the data defining the optimization problem) in a specific format. This makes it difficult to switch between solvers, as the problem data has to be reformatted every time.
+
+
Example 1 (Data formatting for different solvers) Consider the following optimization problem:
+\begin{aligned}
+ \operatorname*{minimize}_{\bm x \in \mathbb R^2} & \quad \frac{1}{2} \bm x^\top \begin{bmatrix}4 & 1\\ 1 & 2 \end{bmatrix} \bm x + \begin{bmatrix}1 \\ 1\end{bmatrix}^\top \bm x \\
+ \text{subject to} & \quad \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \leq \begin{bmatrix} 1 & 1\\ 1 & 0\\ 0 & 1\end{bmatrix} \bm x \leq \begin{bmatrix}1 \\ 0.7 \\ 0.7\end{bmatrix}
+\end{aligned}
+
+
There are dozens of solvers that can be used to solve this problem. Here we demonstrate the use of two of them: OSQP and COSMO.jl. We are going to call the solvers from Julia (using the wrapper OSQP.jl for the former). First, we start with OSQP (in fact, this is their example):
+
+
+
using OSQP
using SparseArrays

# Define the problem data and build the problem description
P = sparse([4.0 1.0; 1.0 2.0])
q = [1.0; 1.0]
A = sparse([1.0 1.0; 1.0 0.0; 0.0 1.0])
l = [1.0; 0.0; 0.0]
u = [1.0; 0.7; 0.7]

problem_OSQP = OSQP.Model()
OSQP.setup!(problem_OSQP; P=P, q=q, A=A, l=l, u=u, alpha=1, verbose=false)

# Solve the optimization problem and show the results
results_OSQP = OSQP.solve!(problem_OSQP)
results_OSQP.x
+
+
+
Now we do the same with COSMO. First, we must take into account that COSMO cannot accept two-sided inequalities, so we have to reformulate the problem so that the constraints are only in the form \mathbf A\bm x + \mathbf b \geq \mathbf 0:
+\begin{aligned}
+ \operatorname*{minimize}_{\bm x \in \mathbb R^2} & \quad \frac{1}{2} \bm x^\top \begin{bmatrix}4 & 1\\ 1 & 2 \end{bmatrix} \bm x + \begin{bmatrix}1 \\ 1\end{bmatrix}^\top \bm x \\
+ \text{subject to} & \quad \begin{bmatrix} -1 & -1\\ -1 & 0\\ 0 & -1\\ 1 & 1\\ 1 & 0\\ 0 & 1\end{bmatrix}\bm x + \begin{bmatrix}1 \\ 0.7 \\ 0.7 \\ -1 \\ 0 \\ 0\end{bmatrix} \geq \mathbf 0.
+\end{aligned}
+
+
+
+
using COSMO
using SparseArrays

# Define the problem data and build the problem description
P = sparse([4.0 1.0; 1.0 2.0])
q = [1.0; 1.0]
A = sparse([1.0 1.0; 1.0 0.0; 0.0 1.0])
l = [1.0; 0.0; 0.0]
u = [1.0; 0.7; 0.7]

Aa = [-A; A]
ba = [u; -l]

problem_COSMO = COSMO.Model()
constraint = COSMO.Constraint(Aa, ba, COSMO.Nonnegatives)
settings = COSMO.Settings(verbose=false)
assemble!(problem_COSMO, P, q, constraint, settings = settings)

# Solve the optimization problem and show the results
results_COSMO = COSMO.optimize!(problem_COSMO)
results_COSMO.x
+
+
+
Although the two solvers are solving the same problem, the data has to be formatted differently for each of them (and the difference in syntax is not negligible either).
+
What if we could formulate the same problem without considering the peculiarities of each solver? It turns out that it is possible. In Julia we can use JuMP.jl:
+
+
+
using JuMP
using SparseArrays
using OSQP, COSMO

# Define the problem data and build the problem description
P = sparse([4.0 1.0; 1.0 2.0])
q = [1.0; 1.0]
A = sparse([1.0 1.0; 1.0 0.0; 0.0 1.0])
l = [1.0; 0.0; 0.0]
u = [1.0; 0.7; 0.7]

model_JuMP = Model()
@variable(model_JuMP, x[1:2])
@objective(model_JuMP, Min, 0.5*x'*P*x + q'*x)
@constraint(model_JuMP, A*x .<= u)
@constraint(model_JuMP, A*x .>= l)

# Solve the optimization problem using OSQP and show the results
set_silent(model_JuMP)
set_optimizer(model_JuMP, OSQP.Optimizer)
optimize!(model_JuMP)
termination_status(model_JuMP)
x_OSQP = value.(x)

# Now solve the problem using COSMO and show the results
set_optimizer(model_JuMP, COSMO.Optimizer)
optimize!(model_JuMP)
termination_status(model_JuMP)
x_COSMO = value.(x)
+
+
+
+
Notice how the optimization problem is defined just once in the last code and then different solvers can be chosen to solve it. The code represents an instance of a so-called optimization modelling language (OML), or actually its major class called algebraic modelling language (AML).
+
The key motivation for using an OML/AML is to separate the process of formulating the problem from the process of solving it (using a particular solver). Furthermore, such solver-independent problem description (called optimization model) better mimics the way we formulate these problems using a pen and a paper, making it (perhaps) a bit more convenient to write our own and read someone else’s models.
+
+
+
Why not optimization modelling languages?
+
As a matter of fact, some optimization experts keep avoiding OML/AML altogether. For example, if a company pays for a (not really cheap) license of Gurobi Optimizer – a powerful optimization library for (MI)LP/QP/QCQP – it may be the case that for a particular very large-scale optimization problem their optimization specialist will have a hard time finding a third-party solver of comparable performance. If its Python API then makes the definition of optimization problems convenient too (see the code below), maybe there is little regret that such problem definitions cannot be reused with a third-party solver. All the more so since, being tailored to the Gurobi solver, it offers control over the finest details.
+
+
+
import gurobipy as gp
import numpy as np

# Define the data for the model
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

# Create a new model
m = gp.Model("qp")

# Create a vector variable
x = m.addMVar((2,))

# Set the objective 1/2 x'Px + q'x
obj = 0.5*(x @ P @ x) + q @ x
m.setObjective(obj)

# Add the constraints
m.addConstr(A @ x >= l, "c1")
m.addConstr(A @ x <= u, "c2")

# Run the solver
m.optimize()

# Print the results
for v in m.getVars():
    print(f"{v.VarName} {v.X:g}")

print(f"Obj: {m.ObjVal:g}")
+
+
+
Similar and yet different is the story of IBM ILOG CPLEX, another top-notch solver addressing the same problems as Gurobi. They do have their own modelling language called Optimization Programming Language (OPL), but it also only interfaces with their own solver(s). We can only guess that their motivation for developing their own optimization modelling language was that at the time of its development (in the 1990s) Python was still in a pre-2.0 stage and formulating optimization problems in programming languages like C/C++ or Fortran was nowhere close to being convenient. Gurobi, in turn, started in 2008, when Python was already a popular language.
+
+
+
Language-independent optimization modelling languages
+
Optimization/algebraic modelling languages were originally developed outside programming languages, essentially as standalone tools. Examples are AMPL, GAMS, and, say, GLPK/GMPL (MathProg). We list these main names here since one may still come across them (they are still actively developed), but we are not going to discuss them in our course any further. The reason is that there are now alternatives implemented as packages/toolboxes in programming languages such as Julia, Matlab, and Python, which offer a more fluent workflow – a user can use the same programming language to acquire the data, preprocess them, formulate the optimization problem, configure and call a solver, and finally do some postprocessing including visualization and whatever reporting, all without leaving the language of their choice.
+
+
+
Optimization modelling in Julia
+
My obvious (personal) bias towards Julia programming language is partly due to the terrific support for optimization modelling in Julia:
+
+
JuMP.jl not only constitutes one of the flagship packages of the Julia ecosystem but it is on par with the state of the art optimization modelling languages. Furthermore, being a free and open source software, it enjoys a vibrant community of developers and users. They even meet annually at JuMP-dev conference (in 2023 in Boston, MA).
+
Convex.jl is an implementation of the concept of Disciplined Convex Programming (DCP) in Julia (below we also list its implementations in Matlab and Python). Even though it is now registered as a part of the JuMP.jl project, it is still a separate concept. Interesting and convenient, but it seems to be in maintenance mode now.
+
+
+
+Show the code
+
using Convex, SCS, LinearAlgebra

# Define the problem data and build the problem description
P = [4.0 1.0; 1.0 2.0]
q = [1.0, 1.0]
A = [1.0 1.0; 1.0 0.0; 0.0 1.0]
l = [1.0, 0.0, 0.0]
u = [1.0, 0.7, 0.7]

# Create a vector variable of size n
x = Variable(2)

# Define the objective
objective = 1/2 * quadform(x, P) + dot(q, x)

# Define the constraints
constraints = [l <= A*x, A*x <= u]

# Define the overall description of the optimization problem
problem = minimize(objective, constraints)

# Solve the problem
solve!(problem, SCS.Optimizer; silent_solver = true)

# Check the status of the problem
problem.status  # :Optimal, :Infeasible, :Unbounded etc.

# Get the optimum value
problem.optval

# Get the optimal x
x.value
+
+
+
+
+
Optimization modelling in Matlab
+
Popularity of Matlab as a language and an ecosystem for control-related computations is undeniable. Therefore, let’s have a look at what is available for modelling optimization problems in Matlab:
+
+
Optimization Toolbox for Matlab is one of the commercial toolboxes produced by the creators of Matlab and Simulink. Since the R2017b release the toolbox supports the problem-based optimization workflow (besides the more traditional solver-based workflow supported from the beginning), which can be regarded as a kind of optimization/algebraic modelling language, albeit restricted to their own solvers.
+
Yalmip started as Yet Another LMI Parser quite some time ago (which reveals its control-theoretic roots), but these days it serves as a fairly complete algebraic modelling language (within Matlab), interfacing to perhaps any optimization solver, both commercial and free&open-source. It is free and open-source, still actively developed and maintained, and it abounds with tutorials and examples.
+
CVX is a Matlab counterpart of Convex.jl (or the other way around, if you like, since it has been around longer). The name indicates that it only allows convex optimization problems (unlike Yalmip) – it follows the Disciplined Convex Programming (DCP) paradigm. Unfortunately, the development seems to have stalled – the last update is from 2020.
+
+
+
+
Optimization modelling in Python
+
Python is a very popular language for scientific computing. Although it is arguable whether it is actually suitable for implementing numerical algorithms, when it comes to building optimization models it does the job fairly well (and the numerical solvers it calls can be developed in a different language). Several packages implementing an OML/AML are available:
+
+
cvxpy is yet another instantiation of Disciplined Convex Programming that we already mentioned when introducing Convex.jl and CVX. It turns out that this one exhibits the greatest momentum – the team of developers seems to have exceeded a critical mass, hence the tool seems like a safe bet.
+
Pyomo is a popular open-source optimization modelling language within Python.
+
APMonitor and GEKKO are relatively young projects, primarily motivated by applications of machine learning and optimization in chemical process engineering.
+
+
+
+
+
+
\ No newline at end of file
diff --git a/opt_theory_modellers.html b/opt_theory_modellers.html
index 1743138..f0b9977 100644
--- a/opt_theory_modellers.html
+++ b/opt_theory_modellers.html
@@ -815,7 +815,7 @@
Why o
\end{aligned}
There are dozens of solvers that can be used to solve this problem. Here we demonstrate the usage of these two: OSQP and COSMO.jl. And we are going to call the solvers from Julia (using the wrapper OSQP.jl for the former). First, we start with OSQP (in fact, this is their example):
Although the two solvers are solving the same problem, the data has to be formatted differently for each of them (and the difference in syntax is not negligible either).
What if we could formulate the same problem without considering the peculiarities of each solver? It turns out that it is possible. In Julia we can use JuMP.jl:
-
+
Show the code
usingJuMP
@@ -911,7 +911,7 @@
Why o
Why not optimization modelling languages?
As a matter of fact, some optimization experts even keep avoiding OML/AML altogether. For example, if a company pays for a (not really cheap) license of Gurobi Optimizer – a powerful optimization library for (MI)LP/QP/QCQP –, it may be the case that for a particular very large-scale optimization problem their optimization specialist will have hard time to find a third-party solver of comparable performance. If then its Python API makes definition of optimization problems convenient too (see the code below), maybe there is little regret that such problem definitions cannot be reused with a third-party solver. The more so that since it is tailored to Gurobi solver, it will offer control over the finest details.
-
+
Show the code
importgurobipy as gp
@@ -961,7 +961,7 @@
Optimizati
JuMP.jl not only constitutes one of the flagship packages of the Julia ecosystem but it is on par with the state of the art optimization modelling languages. Furthermore, being a free and open source software, it enjoys a vibrant community of developers and users. They even meet annually at JuMP-dev conference (in 2023 in Boston, MA).
Convex.jl is an implementation of the concept of Disciplined Convex Programming (DCP) in Julia (below we also list its implementations in Matlab and Python). Even though it is now registered as a part of the JuMP.jl project, it is still a separate concept. Interesting, convenient, but it seems to be in a maintanence mode now.
We formulate a general optimization problem (also a mathematical program) as
+\begin{aligned}
+\operatorname*{minimize} \quad & f(\bm x) \\
+\text{subject to} \quad & \bm x \in \mathcal{X},
+\end{aligned}
+ where f is a scalar function, \bm x can be a scalar, a vector, a matrix or perhaps even a variable of yet another type, and \mathcal{X} is a set of values that \bm x can take, also called the feasible set.
+
+
+
+
+
+
+Note
+
+
+
+
The term “program” here has nothing to do with a computer program. Instead, it was used by the US military during WWII to refer to plans or schedules in training and logistics.
+
+
+
If maximization of the objective function f() is desired, we can simply multiply the objective function by -1 and minimize the resulting function.
+
Typically there are two types of constraints that can be imposed on the optimization variable \bm x:
+
+
explicit characterization of the set such as \bm x \in \mathbb{R}^n or \bm x \in \mathbb{Z}^n, possibly even a direct enumeration such as \bm x \in \{0,1\}^n in the case of binary variables,
+
implicit characterization of the set using equations and inequalities such as g_i(\bm x) \leq 0 for i = 1, \ldots, m and h_j(\bm x) = 0 for j = 1, \ldots, p.
+
+
An example of a more structured and yet sufficiently general optimization problem over several real and integer variables is
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^{n_x}, \, \bm y \in \mathbb{Z}^{n_y}} \quad & f(\bm x, \bm y) \\
+\text{subject to} \quad & g_i(\bm x, \bm y) \leq 0, \quad i = 1, \ldots, m, \\
+& h_j(\bm x, \bm y) = 0, \quad j = 1, \ldots, p.
+\end{aligned}
+
+
Indeed, for named sets such as \mathbb R or \mathbb Z, it is common to place the set constraints directly underneath the word “minimize”. But it is just one convention, and these constraints could be listed into the “subject to” section as well.
+
+
+
+
+
+
+Integer optimization not in this course
+
+
+
+
In this course we are only going to consider optimization problems with real-valued variables. This decision does not suggest that optimization with integer variables is less relevant for optimal control, quite the opposite! It is just that the theory and algorithms for integer or mixed integer optimization are based on different principles than those for real variables. And they can hardly fit into a single course. Good news for the interested students is that a graduate course named Combinatorial algorithms (B3B35KOA) covering integer optimization in detail is offered by the Cybernetics and Robotics study program at CTU FEE. Applications of integer optimization to optimal control are part of the course on Hybrid systems (B3B39HYS).
+
+
+
The form of an optimization problem that we are going to use in our course more often than not is
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad & f(\bm x) \\
+\text{subject to} \quad & g_i(\bm x) \leq 0, \quad i = 1, \ldots, m, \\
+& h_i(\bm x) = 0, \quad i = 1, \ldots, p,
+\end{aligned}
+ which can also be written using vector-valued functions (reflected in the use of the bold face for their names)
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad & f(\bm x) \\
+\text{subject to} \quad & \mathbf g(\bm x) \leq 0,\\
+& \mathbf h(\bm x) = 0.
+\end{aligned}
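To make the notation concrete, here is a minimal sketch of how a tiny instance of this problem class could be written down in JuMP.jl; the particular functions and the choice of the Ipopt solver are illustrative assumptions only, not part of the formulation above.

using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
set_silent(model)

@variable(model, x[1:2])
@objective(model, Min, (x[1] - 1)^2 + (x[2] - 2)^2)   # f(x)
@constraint(model, x[1]^2 + x[2]^2 <= 4)              # g(x) ≤ 0
@constraint(model, x[1] + x[2] == 1)                  # h(x) = 0

optimize!(model)
value.(x)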
+
+
+
+
Properties of optimization problems
+
It is now the presence/absence and the properties of individual components in the optimization problem defined above that characterize classes of optimization problems. In particular, we can identify the following properties:
+
+
Unconstrained vs constrained
+
+Practically relevant problems are almost always constrained. But still there are good reasons to study unconstrained problems too, as many theoretical results and algorithms for constrained problems are based on transformations to unconstrained problems.
+
+
Linear vs nonlinear
+
+By linear problems we mean problems where the objective function and all the functions defining the constraints are linear (or actually affine) functions of the optimization variable \bm x. Such problems constitute the simplest class of optimization problems, are very well understood, and there are efficient algorithms for solving them. In contrast, nonlinear problems are typically more difficult to solve (but see the discussion of the role of convexity below).
+
+
Smooth vs nonsmooth
+
Efficient algorithms for optimization over real variables benefit heavily from knowledge of the derivatives of the objective and constraint functions. If the functions are differentiable (aka smooth), we say that the whole optimization problem is smooth. Nonsmooth problems are typically more difficult to analyze and solve (but again, see the discussion of the role of convexity below).
+
+
Convex vs nonconvex
+
+If the objective function and the feasible set are convex (the latter holds when the functions defining the inequality constraints are convex and the functions defining the equality constrains are affine), the whole optimization problem is convex. Convex optimization problems are very well understood and there are efficient algorithms for solving them. In contrast, nonconvex problems are typically more difficult to solve. It turns out that convexity is a lot more important property than linearity and smoothness when it comes to solving optimization problems efficiently.
+
+
+
+
+
Classes of optimization problems
+
Based on the properties discussed above, we can identify the following distinct classes of optimization problems:
+
+
Linear program (LP)
+
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad & \mathbf c^\top \bm x \\
+\text{subject to} \quad & \mathbf A_\mathrm{ineq}\bm x \leq \mathbf b_\mathrm{ineq},\\
+& \mathbf A_\mathrm{eq}\bm x = \mathbf b_\mathrm{eq}.
+\end{aligned}
+
+
An LP is obviously linear, hence it is also smooth and convex.
+
Some theoretical results and numerical algorithms require a linear program in a specific form, called the standard form:
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad & \mathbf c^\top \bm x \\
+\text{subject to} \quad & \mathbf A\bm x = \mathbf b,\\
+& \bm x \geq \mathbf 0,
+\end{aligned}
+ where the inequality \bm x \geq \mathbf 0 is understood componentwise, that is, x_i \geq 0 for all i = 1, \ldots, n.
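A minimal sketch of an LP in this standard form, written in JuMP.jl with made-up data and the HiGHS solver (both are assumptions of this sketch), might read:

using JuMP, HiGHS

c = [1.0, 2.0, 0.0]
A = [1.0 1.0 1.0; 1.0 0.0 2.0]
b = [1.0, 0.5]

model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, x[1:3] >= 0)          # x ≥ 0 componentwise
@constraint(model, A * x .== b)        # Ax = b
@objective(model, Min, c' * x)         # cᵀx
optimize!(model)
value.(x)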
+
+
+
Quadratic program (QP)
+
+\begin{aligned}
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad & \bm x^\top \mathbf Q \bm x + \mathbf c^\top \bm x \\
+\text{subject to} \quad & \mathbf A_\mathrm{ineq}\bm x \leq \mathbf b_\mathrm{ineq},\\
+& \mathbf A_\mathrm{eq}\bm x = \mathbf b_\mathrm{eq}.
+\end{aligned}
+
+
Even though the QP is nonlinear, it is smooth, and if the matrix \mathbf Q is positive semidefinite, it is also convex. Its analysis and numerical solution are not much more difficult than those of an LP problem.
+
+
Quadratically constrained quadratic program (QCQP)
+
It is worth emphasizing that for the standard QP the constraints are still given by a system of linear equations and inequalities. Sometimes we can encounter problems in which not only the cost function but also the functions defining the constraints are quadratic as in
\begin{aligned}
\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad & \bm x^\top \mathbf Q \bm x + \mathbf c^\top \bm x \\
\text{subject to} \quad & \bm x^\top \mathbf A_i\bm x + \mathbf b_i^\top \bm x + c_i \leq 0, \quad i=1, \ldots, m.
\end{aligned}
+
+
A QCQP problem is convex when the objective matrix \mathbf Q and all the constraint matrices \mathbf A_i are positive semidefinite; positive semidefiniteness of the \mathbf A_i guarantees that the constraints define a convex feasible set.
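A quick numerical way to check this convexity condition is to test positive semidefiniteness of the matrices via their eigenvalues; a small sketch (with made-up matrices) follows:

using LinearAlgebra

A1 = [2.0 0.0; 0.0 1.0]          # positive semidefinite
A2 = [1.0 2.0; 2.0 1.0]          # indefinite (eigenvalues -1 and 3)

is_psd(A) = all(eigvals(Symmetric(A)) .>= -1e-10)

is_psd(A1)          # true
is_psd(A2)          # false, hence a QCQP with this constraint matrix is not convex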
+
+
+
+
Conic program (CP)
+
First, what is a cone? It is a set such that if something is in the cone, then a multiple of it by a nonnegative number is still in the set. We are going to restrict ourselves to regular cones, which are pointed, closed and convex. An example of such a regular cone in a plane is in Figure 1 below.
+
+
+
+
Now, what is the point in using cones in optimization? Reformulation of nonlinear optimization problems using cones constitutes a systematic way to identify what these (conic) optimization problems have in common with linear programs, for which powerful theory and efficient algorithms exist.
+
Note that an LP in the standard form can be written as
+\begin{aligned}
+\operatorname*{minimize} &\quad \mathbf c^\top \bm x\\
+\text{subject to} &\quad \mathbf A\bm x = \mathbf b,\\
+&\quad \bm x\in \mathbb{R}_+^n,
+\end{aligned}
+ where \mathbb R_+^n is a positive orthant. Now, the positive orthant is a convex cone! We can then see the LP as an instance of a general conic optimization problem (conic program)
+
+\begin{aligned}
+\operatorname*{minimize} &\quad \mathbf c^\top \bm x\\
+\text{subject to} &\quad \mathbf A\bm x = \mathbf b,\\
+&\quad \bm x\in \mathcal{K},
+\end{aligned}
+ where \mathcal{K} is a cone in \mathbb R^n.
+
+
+
+
+
+
+Inequality as belonging to a cone
+
+
+
+
A fundamental idea unrolled here: the inequality \bm x\geq 0 can be interpreted as \bm x belonging to the componentwise nonnegative cone, that is, \bm x \in \mathbb R_+^n. What if some other cone \mathcal K is considered? What would be the interpretation of the inequality then?
+
+
+
Sometimes in order to emphasize that the inequality is induced by the cone \mathcal K, we write it as \geq_\mathcal{K}. Another convention – the one we actually adopt here – is to use another symbol for the inequality \succeq to distinguish it from the componentwise meaning, assuming that the cone is understood from the context. We then interpret conic inequalities such as
+\mathbf A_\mathrm{ineq}\bm x \succeq \mathbf b_\mathrm{ineq}
+ in the sense that
+\mathbf A_\mathrm{ineq}\bm x - \mathbf b_\mathrm{ineq} \in \mathcal{K}.
+
+
It is high time to explore some concrete cones (other than the positive orthant). We consider just two, but there are a few more, see the literature.
+
+
Second-order cone program (SOCP)
+
The most immediate cone in \mathbb R^n is the second-order cone, also called the Lorentz cone or even the ice cream cone. We explain it in \mathbb R^3 for the ease of visualization, but generalization to \mathbb R^n is straightforward. The second-order cone in \mathbb R^3 is defined as
+\mathcal{K}_\mathrm{SOC}^3 = \left\{ \bm x \in \mathbb R^3 \mid \sqrt{x_1^2 + x_2^2} \leq x_3 \right\}.
+
Which of the three axes plays the role of the axis of symmetry for the cone must be agreed beforehand. Singling this direction out, the SOC in \mathbb R^n can also be formulated as
+\mathcal{K}_\mathrm{SOC}^n = \left\{ (\bm x, t) \in \mathbb R^{n-1} \times \mathbb R \mid \|\bm x\|_2 \leq t \right\}.
+
+
A second-order conic program in standard form is then
+\begin{aligned}
+\operatorname*{minimize} &\quad \mathbf c^\top \bm x\\
+\text{subject to} &\quad \mathbf A\bm x = \mathbf b,\\
+&\quad \bm x\in \mathcal{K}_\mathrm{SOC}^n,
+\end{aligned}
+
+
which can be written explicitly as
\begin{aligned}
\operatorname*{minimize} &\quad \mathbf c^\top \bm x\\
\text{subject to} &\quad \mathbf A\bm x = \mathbf b,\\
&\quad x_1^2 + \cdots + x_{n-1}^2 \leq x_n^2, \quad x_n \geq 0.
\end{aligned}
+
+
A second-order conic program can also come in non-standard form such as
+\begin{aligned}
+\operatorname*{minimize} &\quad \mathbf c^\top \bm x\\
+\text{subject to} &\quad \mathbf A_\mathrm{ineq}\bm x \succeq \mathbf b_\mathrm{ineq}.
+\end{aligned}
+
+
Assuming the data is structured as
\begin{bmatrix}
\mathbf A\\
\mathbf c^\top
\end{bmatrix}
\bm x \succeq
\begin{bmatrix}
\mathbf b\\
-d
\end{bmatrix},
 the inequality can be rewritten as
\begin{bmatrix}
\mathbf A\\
\mathbf c^\top
\end{bmatrix}
\bm x -
\begin{bmatrix}
\mathbf b\\
-d
\end{bmatrix} \in \mathcal{K}_\mathrm{SOC}^n,
 which finally gives
\|\mathbf A \bm x - \mathbf b\|_2 \leq \mathbf c^\top \bm x + d.
+
+
To summarize, another form of a second-order cone program (SOCP) is
+
+\begin{aligned}
+\operatorname*{minimize} &\quad \mathbf c^\top \bm x\\
+\text{subject to} &\quad \mathbf A_\mathrm{eq}\bm x = \mathbf b_\mathrm{eq},\\
+&\quad \|\mathbf A \bm x - \mathbf b\|_2 \leq \mathbf c^\top \bm x + d.
+\end{aligned}
+
+
We can see that the SOCP contains both linear and quadratic constraints, hence it generalizes LP and QP, including convex QCQP. To see the latter, expand the square of \|\mathbf A \bm x - \mathbf b\|_2 into (\bm x^\top \mathbf A^\top - \mathbf b^\top)(\mathbf A \bm x - \mathbf b) = \bm x^\top \mathbf A^\top \mathbf A \bm x + \ldots
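The last form maps directly to modelling tools; a minimal JuMP.jl sketch (with made-up data and the SCS solver, both being assumptions of this sketch) could read:

using JuMP, SCS

A = [1.0 0.0; 0.0 1.0]
b = [1.0, 2.0]
c = [1.0, 1.0]
d = 0.5

model = Model(SCS.Optimizer)
set_silent(model)
@variable(model, x[1:2])
# ‖Ax − b‖₂ ≤ cᵀx + d expressed as membership in the second-order cone,
# whose first component plays the role of the bounding scalar
@constraint(model, [c' * x + d; A * x - b] in SecondOrderCone())
@objective(model, Min, sum(x))
optimize!(model)
value.(x)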
+
+
+
Semidefinite program (SDP)
+
Another cone of great importance in control theory is the cone of positive semidefinite matrices. It is commonly denoted as \mathcal S_+^n and is defined as
+\mathcal S_+^n = \left\{ \bm X \in \mathbb R^{n \times n} \mid \bm X = \bm X^\top, \, \bm z^\top \bm X \bm z \geq 0\; \forall \bm z\in \mathbb R^n \right\},
+ and with this cone the inequality \mathbf X \succeq 0 is a common way to express that \mathbf X is positive semidefinite.
+
Unlike the previous classes of optimization problems, this one is formulated with matrix variables instead of vector ones. But nothing prevents us from collecting the components of a symmetric matrix into a vector and proceeding with vectors as usual, if needed.
An optimization problem with matrix variables constrained to be in the cone of semidefinite matrices (or their vector representations) is called a semidefinite program (SDP). As usual, we start with the standard form, in which the cost function is linear and the optimization is subject to an affine constraint and a conic constraint. In the following, in place of the inner product of two vectors \mathbf c^\top \bm x we are going to use the inner product of matrices defined as
\langle \mathbf C, \bm X\rangle = \operatorname{Tr} \mathbf C \bm X,
 where \operatorname{Tr} denotes the trace of a matrix, defined as the sum of its diagonal elements.
+
The SDP program in the standard form is then
\begin{aligned}
\operatorname*{minimize}_{\bm X} &\quad \operatorname{Tr} \mathbf C \bm X\\
\text{subject to} &\quad \operatorname{Tr} \mathbf A_i \bm X = b_i, \quad i=1, \ldots, m,\\
&\quad \bm X \in \mathcal S_+^n,
\end{aligned}
 where the latter constraint is more often than not written as \bm X \succeq 0, understanding from the context that the cone of positive semidefinite matrices is assumed.
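A minimal JuMP.jl sketch of such a standard-form SDP (with made-up data, solved by SCS; both are assumptions of this sketch) could look like this:

using JuMP, SCS, LinearAlgebra

C  = [2.0 1.0; 1.0 2.0]
A1 = [1.0 0.0; 0.0 1.0]
b1 = 1.0

model = Model(SCS.Optimizer)
set_silent(model)
@variable(model, X[1:2, 1:2], PSD)        # X ⪰ 0, that is, X ∈ S²₊
@constraint(model, tr(A1 * X) == b1)      # Tr(A₁X) = b₁
@objective(model, Min, tr(C * X))         # Tr(CX)
optimize!(model)
value.(X)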
+
+
+
Other conic programs
+
We are not going to cover them here; we only enumerate a few other cones useful in optimization: the rotated second-order cone, the exponential cone, the power cone, … A concise overview is in (“MOSEK Modeling Cookbook” 2024).
+
+
+
+
Geometric program (GP)
+
+
+
Nonlinear program (NLP)
+
For completeness we include here once again the general nonlinear programming problem.
Smoothness of the problem can easily be determined based on the differentiability of the functions. Convexity can also be determined by inspecting the functions, but this is not necessarily easy. One way to check convexity of a function is to view it as a composition of simple functions and exploit the knowledge about convexity of these simple functions. See (Boyd and Vandenberghe 2004, sec. 3.2).
If a single reference book on nonlinear optimization is to be recommended, let it be (Nocedal and Wright 2006); it deserves a spot on your book shelf.
+
If one or two more can still fit, (Bertsekas 2016) and (Luenberger and Ye 2021) are classical comprehensive references on nonlinear programming (the latter covers linear programming too).
+
While all three books are only available for purchase, there is a wealth of resources freely available online, such as the notes (Gros and Diehl 2022) accompanying a course on optimal control, which do a decent job of introducing nonlinear programming, and the beautifully typeset modern textbooks (Kochenderfer and Wheeler 2019) and (Martins and Ning 2022), the former based on the Julia language. Yet another high-quality textbook that is freely available online is (Bierlaire 2018).
+
When restricting to convex optimization, the bible of this field (Boyd and Vandenberghe 2004) is also freely available online. It is a must-have for everyone interested in optimization. Another textbook biased towards convex optimization is (Calafiore 2014), which is freely accessible through its web version. Yet another advanced treatment of convex optimization is (Ben-Tal and Nemirovski 2023), which is also freely available online.
+
Perhaps somewhat unexpected resources on theory are the materials accompanying some optimization software. Particularly recommendable is (“MOSEK Modeling Cookbook” 2024); it is very useful even if you do not intend to use their software.
+Gros, Sebastien, and Moritz Diehl. 2022. “Numerical Optimal Control (Draft).” Systems Control; Optimization Laboratory IMTEK, Faculty of Engineering, University of Freiburg. https://www.syscop.de/files/2020ss/NOC/book-NOCSE.pdf.
+
+Luenberger, David G., and Yinyu Ye. 2021. Linear and Nonlinear Programming. 5th ed. Vol. 228. International Series in Operations Research & Management Science. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-030-85450-8.
+
+
+Martins, Joaquim R. R. A., and Andrew Ning. 2022. Engineering Design Optimization. Cambridge ; New York, NY: Cambridge University Press. https://mdobook.github.io/.
+
+Nocedal, Jorge, and Stephen Wright. 2006. Numerical Optimization. 2nd ed. Springer Series in Operations Research and Financial Engineering. New York: Springer. https://doi.org/10.1007/978-0-387-40065-5.
+
Given a function f(\bm x), we can maximize it by minimizing -f(\bm x).
+
+
+
Equality into inequality constraints
+
As a matter of fact, we could declare as the most general format of an NLP problem the one with only inequality constraints. This is because we can always transform an equality constraint into two inequality constraints: given an equality constraint h(\bm x) = 0, we can write it equivalently as h(\bm x) \leq 0 and -h(\bm x) \leq 0.
On the other hand, it may be useful to keep the equality constraints explicit in the problem formulation for the benefit of theoretical analysis, numerical methods and convenience of the user/modeller.
+
+
+
Inequality into “sort-of” equality constraints
+
Consider the inequality constraint g(\bm x) \leq 0. By introducing a slack variable s and imposing a nonnegativity condition on it, we can turn the inequality into the equality g(\bm x) + s = 0. Well, we have not completely discarded an inequality because now we have s \geq 0. But this new problem may be better suited for some theoretical analysis or numerical methods.
+
It is also possible to express the nonnegativity constraint implicitly by considering an unrestricted variable s and using it within the inequality through its square s^2:
+
+g(\bm x) + s^2 = 0.
+
+
+
+
Linear cost function always possible
+
Given a cost function f(\bm x) to be minimized, we can always upper-bound it by a new variable \gamma accompanied by a new constraint f(\bm x) \leq \gamma and then minimize just \gamma
+\begin{aligned}
+\operatorname*{minimize}_{\bm{x}\in\mathbb R^n, \gamma\in\mathbb R} & \quad \gamma \\
+\text{subject to} & \quad f(\bm x) \leq \gamma.
+\end{aligned}
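A tiny sketch of this trick in JuMP.jl (with a made-up quadratic cost and the Ipopt solver, both assumptions of this sketch) follows; the objective handed to the solver is linear, while the original cost is hidden in the constraint:

using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x)
@variable(model, γ)
@constraint(model, (x - 1)^2 + x^2 <= γ)   # f(x) ≤ γ
@objective(model, Min, γ)                   # minimize the upper bound only
optimize!(model)
value(x), value(γ)                          # x ≈ 0.5, γ ≈ 0.5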
+
+
+
+
Absolute value
+
Consider an optimization problem in which the cost function contains the absolute value of a variable
+\begin{aligned}
+\operatorname*{minimize} &\quad \sum_i c_i|x_i|\\
+\text{subject to} &\quad \mathbf A \bm x \geq \mathbf b.
+\end{aligned}
+
+
We also impose the restriction that all the coefficients c_i are nonnegative. The cost function is then a sum of piecewise linear convex functions, hence itself convex.
+
The trouble with the absolute value function is that it is not linear, it is not even smooth. And yet, as we will see below, this optimization with the absolute value can be reformulated as a linear program.
+
One possible reformulation introduces two new nonnegative (vector) variables \bm x^+\geq 0 and \bm x^-\geq 0, with which the original variables can be expressed as x_i = x_i^+ - x_i^-, \; i=1, \ldots, n. The cost function can then be written as \sum_i c_i|x_i| = \sum_i c_i (x_i^+ + x_i^-).
+
This may look surprising (and even incorrect) at first, but we argue that at an optimum, x_i^+ or x_i^- must be zero for each i. Otherwise we could subtract the same amount from both, which would not change the satisfaction of the constraints (the modification cancels out in x_i = x_i^+ - x_i^-), while the cost (with c_i>0) would be further reduced.
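A minimal JuMP.jl sketch of this reformulation (with made-up data and the HiGHS solver, both assumptions of this sketch) is below; note that the resulting problem is a plain LP:

using JuMP, HiGHS

c = [1.0, 2.0]                               # nonnegative cost coefficients
A = [1.0 -1.0]
b = [0.5]

model = Model(HiGHS.Optimizer)
set_silent(model)
@variable(model, xp[1:2] >= 0)               # x⁺ ≥ 0
@variable(model, xm[1:2] >= 0)               # x⁻ ≥ 0
@constraint(model, A * (xp - xm) .>= b)      # Ax ≥ b with x = x⁺ − x⁻
@objective(model, Min, c' * (xp + xm))       # Σᵢ cᵢ|xᵢ| becomes Σᵢ cᵢ(xᵢ⁺ + xᵢ⁻)
optimize!(model)
x = value.(xp) - value.(xm)                  # recover the original variable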
Here we are going to analyze the optimization problem with no constraints
+\operatorname*{minimize}_{\bm x \in \mathbb{R}^n} \quad f(\bm x).
+
+
Why are we considering such an unrealistic problem? After all, every engineering problem is subject to some constraints.
+
Besides the standard teacher’s answer that we should always start with easier problems, there is another answer: it is common for analysis and algorithms for constrained optimization problems to reformulate them as unconstrained ones and then apply tools for unconstrained problems.
+
+
Local vs global optimality
+
First, let’s define carefully what we mean by a minimum in the unconstrained problem.
+
+
+
+
+
+
+Caution
+
+
+
+
For those whose mother tongue does not use articles such as the and a/an in English, it is worth emphasizing that there is a difference between “the minimum” and “a minimum”. In the former we assume that there is just one minimum, in the latter we make no such assumption.
+
+
+
Consider a (scalar) function of a scalar variable for simplicity
+
+
We say that the function has a local minimum at x^\star if f(x)\geq f(x^\star) for all x in some \varepsilon-neighbourhood of x^\star. All the red dots in the above figure are local minima. Similarly, of course, the function has a local maximum at x^\star if f(x)\leq f(x^\star) for all x in some \varepsilon-neighbourhood. Such local maxima are the green dots in the figure. The smallest of the local minima and the largest of the local maxima are the global minimum and maximum, respectively.
+
+
+
Conditions of optimality
+
Here we consider two types of conditions of optimality for unconstrained minimization problems: necessary and sufficient conditions. Necessary conditions must be satisfied at the minimum, but even when they are, the optimality is not guaranteed. On the other hand, the sufficient conditions need not be satisfied at the minimum, but if they are, the optimality is guaranteed. We show the necessary conditions of the first and second order, while the sufficient condition only of the second order.
+
+
Scalar optimization variable
+
You may want to have a look at the video, but below we continue with the text that summarizes the video.
+
+
We recall here the fundamental assumption made at the beginning of our introduction to optimization – we only consider optimization variables that are real-valued (first, just scalar x \in \mathbb R, later vectors \bm x \in \mathbb R^n), and objective functions f() that are sufficiently smooth – all the derivatives exist. Then the conditions of optimality can be derived upon inspecting the Taylor series approximation of the cost function around the minimum.
+
+
Taylor series approximation around the optimum
+
Denote x^\star as the (local) minimum of the function f(x). The Taylor series expansion of f(x) around x^\star is
+
+f(x^\star+\alpha) = f(x^\star)+\left.\frac{\mathrm{d}f(x)}{\mathrm{d} x}\right|_{x=x^\star}\alpha + \frac{1}{2}\left.\frac{\mathrm{d^2}f(x)}{\mathrm{d} x^2}\right|_{x=x^\star}\alpha^2 + {\color{blue}\mathcal{O}(\alpha^3)},
+ where \mathcal{O}() is called Big O and has the property that
+\lim_{\alpha\rightarrow 0}\frac{\mathcal{O}(\alpha^3)}{\alpha^3} \leq M<\infty.
+
+
Alternatively, we can write the Taylor series expansion as
+f(x^\star+\alpha) = f(x^\star)+\left.\frac{\mathrm{d}f(x)}{\mathrm{d} x}\right|_{x=x^\star}\alpha + \frac{1}{2}\left.\frac{\mathrm{d^2}f(x)}{\mathrm{d} x^2}\right|_{x=x^\star}\alpha^2 + {\color{red}o(\alpha^2)},
+ using the little o with the property that
+\lim_{\alpha\rightarrow 0}\frac{o(\alpha^2)}{\alpha^2} = 0.
+
+
Whether the \mathcal{O}() or the o() concept is used is just a matter of personal preference. They both express that the higher-order terms in the expansion tend to be negligible compared to the first- and second-order terms as \alpha gets smaller. \mathcal O(\alpha^3) goes to zero at least as fast as a cubic function, while o(\alpha^2) goes to zero faster than a quadratic function.
+
It is indeed important to understand that this negligibility of the higher-order terms is only valid asymptotically – for a particular \alpha it may easily happen that, say, the third-order term is still dominating.
+
+
+
First-order necessary conditions of optimality
+
For \alpha sufficiently small, the first-order Taylor series expansion is a good approximation of the function f(x) around the minimum. Since \alpha enters this expansion linearly, the cost function can increase or decrease with \alpha, depending on the sign of the first derivative. The only way to ensure that the function has a (local) minimum at x^\star is to have the first derivative equal to zero, that is \boxed{
+\left.\frac{\mathrm{d}f(x)}{\mathrm{d} x}\right|_{x=x^\star} = 0.}
+
+
+
+
Second-order necessary conditions of optimality
+
Once the first-order necessary condition of optimality is satisfied, the dominating term (as \alpha gets smaller) is the second-order term \frac{1}{2}\left.\frac{\mathrm{d^2}f(x)}{\mathrm{d} x^2}\right|_{x=x^\star}\alpha^2. Since \alpha is squared, it is the sign of the second derivative that determines the contribution of the whole second-order term to the cost function value. For a minimum, the second derivative must be nonnegative, that is
\boxed{\left.\frac{\mathrm{d^2}f(x)}{\mathrm{d} x^2}\right|_{x=x^\star} \geq 0.}
For completeness we state that the sign must be nonpositive for the maximum.
+
+
+
Second-order sufficient condition of optimality
+
Following the same line of reasoning as above, if the second derivative is positive, a minimum is guaranteed; that is, the sufficient condition of optimality is
+\boxed{\left.\frac{\mathrm{d^2}f(x)}{\mathrm{d} x^2}\right|_{x=x^\star} > 0.}
+
+
If the second derivative fails to be positive and is just zero (thus still satisfying the necessary condition), does it mean that the point is not a minimum? No. We must examine higher order terms.
+
+
+
+
Vector optimization variable
+
Once again, should you prefer watching a video, here it is, but below we continue with the text that covers the content of the video.
+
+
+
First-order necessary conditions of optimality
+
One way to handle the vector variables is to convert the vector problem into a scalar one by fixing a direction to an arbitrary vector \bm d and then considering the scalar function of the form f(\bm x^\star + \alpha \bm d). For convenience we define a new function
+g(\alpha) \coloneqq f(\bm x^\star + \alpha \bm d)
+ and from now on we can invoke the results for scalar functions. Namely, we expand the g() function around zero as
+g(\alpha) = g(0) + \frac{\mathrm{d}g(\alpha)}{\mathrm{d}\alpha}\bigg|_{\alpha=0}\alpha + \frac{1}{2}\frac{\mathrm{d}^2 g(\alpha)}{\mathrm{d}\alpha^2}\bigg|_{\alpha=0}\alpha^2 + \mathcal{O}(\alpha^3),
+ and argue that the first-order necessary condition of optimality is
+\frac{\mathrm{d}g(\alpha)}{\mathrm{d}\alpha}\bigg|_{\alpha=0} = 0.
+
+
Now, invoking the chain rule, we go back from g() to f()
+\frac{\mathrm{d}g(\alpha)}{\mathrm{d}\alpha}\bigg|_{\alpha=0} = \frac{\partial f(\bm x)}{\partial\bm x}\bigg|_{\bm x=\bm x^\star} \frac{\partial(\bm x^\star + \alpha \bm d)}{\partial\alpha}\bigg|_{\alpha=0} = \frac{\partial f(\bm x)}{\partial\bm x}\bigg|_{\bm x=\bm x^\star}\,\bm d = 0,
where \frac{\partial f(\bm x)}{\partial\bm x}\bigg|_{\bm x=\bm x^\star} is a row vector of partial derivatives of f() evaluated at \bm x^\star. Since the vector \bm d is arbitrary, the necessary condition is that all the partial derivatives vanish at \bm x^\star.
More often than not we use a column vector to store the partial derivatives. We call it the gradient of the function f() and denote it as
\nabla f(\bm x) \coloneqq \begin{bmatrix}\frac{\partial f(\bm x)}{\partial x_1} \\ \frac{\partial f(\bm x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(\bm x)}{\partial x_n}\end{bmatrix}.
+
+
The first-order necessary condition of optimality using gradients is then
+\boxed{\left.\nabla f(\bm x)\right|_{x=x^\star} = \mathbf 0. }
+
+
+
+
+
+
+
+Gradient is a column vector
+
+
+
+
In some literature the gradient \nabla f(\bm x) is defined as a row vector. For the condition of optimality it does not matter, since all we require is that all partial derivatives vanish. But for other purposes in our text we regard the gradient as a vector living in the same vector space \mathbb R^n as the optimization variable. The row vector is sometimes denoted as \mathrm Df(\bm x).
+
+
+
+
Computing the gradient of a scalar function of a vector variable
+
A convenient way is to compute the differential first and then to identify the derivative in it. Recall that the differential is the first-order approximation to the increment of the function due to a change in the variable
+
+\Delta f \approx \mathrm{d}f = \nabla f(x)^\top \mathrm d \bm x.
+
+
Finding the differential of a function is conceptually easier than finding the derivative since it is a scalar quantity. When searching for the differential of a composed function, we follow the same rules as for the derivative (such as that the one for finding the differential of a product). Let’s illustrate it using an example.
+
+
Example 1 For the function
+f(\mathbf x) = \frac{1}{2}\bm{x}^\top\mathbf{Q}\bm{x} + \mathbf{r}^\top\bm{x},
where \mathbf Q is symmetric, the differential is
+\mathrm{d}f = \frac{1}{2}\mathrm d\bm{x}^\top\mathbf{Q}\bm{x} + \frac{1}{2}\bm{x}^\top\mathbf{Q}\mathrm d\bm{x} + \mathbf{r}^\top\mathrm{d}\bm{x},
+ in which the first two terms can be combined thanks to the fact that they are scalars
+\mathrm{d}f = \left(\bm{x}^\top\frac{\mathbf{Q} + \mathbf{Q}^\top}{2} + \mathbf{r}^\top\right)\mathrm{d}\bm{x},
+ and finally, since we assumed that \mathbf Q is a symmetric matrix, we get
+\mathrm{d}f = \left(\mathbf{Q}\bm{x} + \mathbf{r}\right)^\top\mathrm{d}\bm{x},
+ from which we can identify the gradient as
+\nabla f(\mathbf{x}) = \mathbf{Q}\mathbf{x} + \mathbf{r}.
+
+
The first-order condition of optimality is then
+\boxed{\mathbf{Q}\mathbf{x} = -\mathbf{r}.}
+
+
Although this was just an example, it is actually a very useful one. Keep this result in mind – necessary condition of optimality of a quadratic function comes in the form of a set of linear equations.
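The result can also be verified numerically, for instance with automatic differentiation. The following sketch (with made-up \mathbf Q and \mathbf r, using the ForwardDiff.jl package) checks that the gradient equals \mathbf{Q}\bm{x} + \mathbf{r} and that it vanishes at the solution of \mathbf{Q}\bm{x} = -\mathbf{r}:

using ForwardDiff, LinearAlgebra

Q = [3.0 1.0; 1.0 2.0]                     # symmetric (and positive definite)
r = [1.0, -1.0]

f(x) = 0.5 * dot(x, Q * x) + dot(r, x)

x_star = -(Q \ r)                          # stationary point solving Qx = −r

ForwardDiff.gradient(f, x_star)            # ≈ [0, 0]
x = [1.0, 2.0]
ForwardDiff.gradient(f, x) ≈ Q * x + r     # true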
+
+
+
+
+
Second-order necessary conditions of optimality
+
As before, we fix the direction \bm d and consider the function g(\alpha) = f(\bm x^\star + \alpha \bm d). We expand the expression for the first derivative as
+\frac{\mathrm d g(\alpha)}{\mathrm d \alpha} = \sum_{i=1}^{n}\frac{\partial f(\bm x)}{\partial x_i}\bigg|_{\bm x = \bm x^\star} d_i,
and differentiating this once again, we get the second derivative
\frac{\mathrm d^2 g(\alpha)}{\mathrm d \alpha^2} = \sum_{i,j=1}^{n}\frac{\partial^2 f(\bm x)}{\partial x_i\partial x_j}\bigg|_{\bm x = \bm x^\star}d_i d_j = \bm d^\top \left.\nabla^2 f(\bm x)\right|_{\bm x = \bm x^\star} \bm d,
 where \nabla^2 f(\bm x) denotes the matrix of second partial derivatives, the Hessian.

Since \bm d is arbitrary, this quadratic form must be nonnegative for every \bm d, and the second-order necessary condition of optimality is then
\boxed{\nabla^2 f(\mathbf x)\bigg|_{\bm x = \bm x^\star} \succeq 0,}
 where, once again, the inequality \succeq reads that the matrix is positive semidefinite.
+
+
+
Second-order sufficient condition of optimality
+
+\boxed{\nabla^2 f(\mathbf x)\bigg|_{\bm x = \bm x^\star} \succ 0,}
+ where, once again, the inequality \succ reads that the matrix is positive definite.
+
+
Example 2 For the quadratic function f(\mathbf x) = \frac{1}{2}\mathbf{x}^\mathrm{T}\mathbf{Q}\mathbf{x} + \mathbf{r}^\mathrm{T}\mathbf{x}, the Hessian is
+\nabla^2 f(\mathbf{x}) = \mathbf{Q}
+ and the second-order necessary condition of optimality is
+\boxed{\mathbf{Q} \succeq 0.}
+
+
Second-order sufficient condition of optimality is then
+\boxed{\mathbf{Q} \succ 0.}
+
+
Once again, this was more than just an example – quadratic functions are so important for us that it is worth remembering this result.
+
+
+
+
+
+
Classification of stationary points
+
For a stationary (also critical) point \bm x^\star, that is, one that satisfies the first-order necessary condition
+\nabla f(\bm x^\star) = 0,
+
+
we can classify it as
+
+
Minimum: \nabla^2 f(x^\star)\succ 0
+
Maximum: \nabla^2 f(x^\star)\prec 0
+
Saddle point: \nabla^2 f(x^\star) indefinite
+
Singular point (we cannot decide): \nabla^2 f(x^\star)=0
+
+
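The classification can also be carried out numerically; a small sketch (for a made-up function, using ForwardDiff.jl to obtain the Hessian) follows:

using ForwardDiff, LinearAlgebra

f(x) = x[1]^2 - x[2]^2 + 0.1 * x[1]^4       # an illustrative smooth function

x_star = [0.0, 0.0]                         # a stationary point: ∇f(x*) = 0
H = ForwardDiff.hessian(f, x_star)
λ = eigvals(Symmetric(H))

if all(λ .> 0)
    println("minimum")
elseif all(λ .< 0)
    println("maximum")
elseif any(λ .> 0) && any(λ .< 0)
    println("saddle point")                 # the case here: λ = (−2, 2)
else
    println("singular point, cannot decide")
end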
+
Example 3 (Minimum of a quadratic function) We consider a quadratic function f(\mathbf x) = \frac{1}{2}\mathbf{x}^\mathrm{T}\mathbf{Q}\mathbf{x} + \mathbf{r}^\mathrm{T}\mathbf{x} for a particular \mathbf{Q} and \mathbf{r}.
The matrix is positive definite, so the stationary point is a minimum. In fact, the minimum. Surface and contour plots of the function are shown below.
The matrix Q is singular, which has two consequences:
+
+
We cannot compute the stationary point since Q is not invertible. In fact, there is a whole line (a subspace) of stationary points.
+
The matrix Q is positive semidefinite, which generally means that optimality cannot be concluded. But in this particular case of a quadratic function, there are no higher-order terms in the Taylor series expansion, so the stationary point is a minimum.
+
+
Surface and contour plots of the function are shown below.
+
+Figure 3: Singular point of a quadratic function
+
+
+
+
+
+
Example 6 (Singular point of a non-quadratic function) Consider the function f(\bm x) = x_1^2 + x_2^4. Its gradient is \nabla f(\bm x) = \begin{bmatrix}2x_1\\ 4x_2^3\end{bmatrix} and it vanishes at \bm x^\star = \begin{bmatrix}0\\ 0\end{bmatrix}. The Hessian is \nabla^2 f(\bm x) = \begin{bmatrix}2 & 0\\ 0 & 12x_2^2\end{bmatrix}, which when evaluated at the stationary point is \nabla^2 f(\bm x)\bigg|_{\bm x=\mathbf 0} = \begin{bmatrix}2 & 0\\ 0 & 0\end{bmatrix}, which is positive semidefinite. We cannot conclude if the function attains a minimum at \bm x^\star.
+
We need to examine higher-order terms in the Taylor series expansion. The third derivatives are
+\frac{\partial^3 f}{\partial x_1^3} = 0, \quad \frac{\partial^3 f}{\partial x_1^2\partial x_2} = 0, \quad \frac{\partial^3 f}{\partial x_1\partial x_2^2} = 0, \quad \frac{\partial^3 f}{\partial x_2^3} = 24x_2,
+ and when evaluated at zero, they all vanish.
+
All but one of the fourth derivatives vanish as well. The one that does not is
\frac{\partial^4 f}{\partial x_2^4} = 24,
 which is positive, and since it is a derivative of even order, the corresponding term in the Taylor series expansion is an even power with a positive coefficient, hence the function attains a minimum at \bm x^\star = \begin{bmatrix}0\\ 0\end{bmatrix}.
+
This can also be visually confirmed by the surface and contour plots of the function.
+
+
+
+
+
+
\ No newline at end of file
diff --git a/opt_theory_unconstrained.html b/opt_theory_unconstrained.html
index 0d7cba6..d31f997 100644
--- a/opt_theory_unconstrained.html
+++ b/opt_theory_unconstrained.html
@@ -1000,7 +1000,7 @@
Classi
Example 3 (Minimum of a quadratic function) We consider a quadratic function f(\mathbf x) = \frac{1}{2}\mathbf{x}^\mathrm{T}\mathbf{Q}\mathbf{x} + \mathbf{r}^\mathrm{T}\mathbf{x} for a particular \mathbf{Q} and \mathbf{r}.
-
+
Code
Q = [11; 12]
@@ -1039,56 +1039,56 @@
Classi
@@ -6341,97 +6341,97 @@
Classi
(b) Contour plot
@@ -6449,7 +6449,7 @@
Classi
Example 4 (Saddle point of a quadratic function)
-
+
Code
Q = [-11; 12]
@@ -6489,54 +6489,54 @@
Classi
@@ -12770,97 +12770,97 @@
Classi
(b) Contour plot
@@ -12878,7 +12878,7 @@
Classi
Example 5 (Singular point of a quadratic function)
The primary reference for our overview of methods for model and controller order reduction is Chapter 11 in Skogestad and Postlethwaite (2005). For a more detailed introduction, there are a few dedicated monographs such as Obinata and Anderson (2000) and Athanasios C. Antoulas (2005). A short extract from the latter is in A. C. Antoulas and Sorensen (2001).

The latter also excels in acknowledging that the topic of reducing the order of mathematical models formatted as state equations is relevant not only for the control systems community but for a number of other engineering and scientific communities as well. After all, mathematical models are not only for model-based control design but also for simulation, optimization, and other purposes. For example, in Tan and He (2007) the motivation is fast simulation of VLSI circuits.
+Antoulas, A. C., and D. C. Sorensen. 2001. “Approximation of Large-Scale Dynamical Systems: An Overview.” Technical Report. Houston, Texas: Rice University. https://hdl.handle.net/1911/101964.
+
+
+Antoulas, Athanasios C. 2005. Approximation of Large-Scale Dynamical Systems. Philadelphia: Society for Industrial and Applied Mathematics.
+
+
+Obinata, Goro, and Brian D. O. Anderson. 2000. Model Reduction for Control System Design. New York: Springer.
+
+
+Skogestad, Sigurd, and Ian Postlethwaite. 2005. Multivariable Feedback Control: Analysis and Design. 2nd ed. Wiley. https://folk.ntnu.no/skoge/book/.
+
+
+Tan, Sheldon, and Lei He. 2007. Advanced Model Order Reduction Techniques in VLSI Design. Cambridge: Cambridge University Press.
+
Even when restricted to control systems, the concept of robustness is quite broad and can be approached from many different angles. In our course we are restricting the focus to the approaches formulated in frequency domain. The main reference for this part of the course is the book (Skogestad and Postlethwaite 2005). The concepts and techniques introduced in our lecture are covered in Chapters 7 and 8 (up to 8.5) of the book.
+
What we typically do not cover in the lecture, but only due to time constraints, is the topic of structured uncertainties and their analysis using the structured singular value (SSV, 𝜇). These are treated in sections 8.6 through 8.11 of the book. It is warmly recommended to have a look at them.
+
Although the book is not freely available online (only the first three pages are downloadable on the authors’ web page), it is available in a decent number of copies in the university library.
+
+
+
+
+
+
+Get the second edition of Skogestad’s book
+
+
+
+
In case you are interested in getting the book in one way or another (perhaps even by purchasing it), make sure you get the second edition published in 2005. The book contains some useful snippets of Matlab code, and the first edition relies on ancient versions of Matlab toolboxes, which makes it nearly useless these days.
+
+
+
The topic of modeling uncertainty in frequency domain using weighting filters plugged into additive or multiplicative structures is fairly classical now and as such can be found in numerous textbooks on robust control such as (Doyle, Francis, and Tannenbaum 2009), (Zhou, Doyle, and Glover 1995), (Dullerud and Paganini 2000), (Sánchez-Peña and Sznaier 1998). Although these are fine texts, frankly speaking they offer nearly no guidance for applying the highly advanced concepts to practical problems – they mostly focus on building up the theoretical framework. In this regard, Skogestad’s book is truly unique.
+Dullerud, Geir E., and Fernando Paganini. 2000. A Course in Robust Control Theory: A Convex Approach. Texts in Applied Mathematics. New York: Springer-Verlag. https://doi.org/10.1007/978-1-4757-3290-0.
+
+
+Gu, Da-Wei, Petko H. Petkov, and Mihail M. Konstantinov. 2013. Robust Control Design with MATLAB. 2nd ed. Advanced Textbooks in Control and Signal Processing. New York: Springer. https://doi.org/10.1007/978-1-4471-4682-7.
+
+
+Lavretsky, Eugene, and Kevin Wise. 2024. Robust and Adaptive Control: With Aerospace Applications. 2nd ed. Advanced Textbooks in Control and Signal Processing (C&SP). Cham: Springer. https://doi.org/10.1007/978-3-031-38314-4.
+
+
+Sánchez-Peña, Ricardo S., and Mario Sznaier. 1998. Robust Systems Theory and Applications. 1st ed. Wiley-Interscience.
+
+
+Skogestad, Sigurd, and Ian Postlethwaite. 2005. Multivariable Feedback Control: Analysis and Design. 2nd ed. Wiley. https://folk.ntnu.no/skoge/book/.
+
+
+Yedavalli, Rama K. 2014. Robust Control of Uncertain Dynamic Systems: A Linear State Space Approach. New York: Springer. https://doi.org/10.1007/978-1-4614-9132-3.
+
+
+Zhou, Kemin, John C. Doyle, and Keith Glover. 1995. Robust and Optimal Control. 1st ed. Prentice Hall.
+
Alternatives in other languages exist, but very often are less well developed and/or documented. A notable exception is RobustAndOptimalControl.jl for Julia.
Through this chapter we are stepping into the domain of robust control. We need to define a few keywords first.
+
+
Definition 1 (Uncertainty) Deviation of the mathematical model of the system from the real (mechanical, electrical, chemical, biological, …) system.
+
+
+
Definition 2 (Robustness) Insensitivity of specified properties of the system to the uncertainty.
+
+
While these two terms are used in many other fields, here we are tailoring them to the discipline of control systems, in particular their model-based design.
+
+
Definition 3 (Robust control) Not a single type of a control but rather a class of control design methods that aim to ensure robustness of the resulting control system. By convention, a robust controller is a fixed controller, typically designed for a nominal model. This is in contrast with an adaptive controller that adjusts itself in real time to the actual system.
+
+
+
Origins of uncertainty in models?
+
+
Physical parameters are not known exactly (say, they are known to be within ±10% or ±3σ interval around the nominal value).
+
Even if the physical parameters are initially known with a high accuracy, they can evolve in time, unmeasured.
+
There may be variations among the individual units of the same product.
+
If a nonlinear system is planned to be operated around a given operating point, it can be linearized around that operating point, which gives a nominal linear model. If the system is then operated at a significantly different operating point, the corresponding linear model differs from the nominal one.
+
Our understanding of the underlying physics (or chemistry or biology or …) is imperfect, hence our model is imperfect too. In fact, our understanding can even be incorrect, in which case the model contains some discrepancies too. The imperfections of the model are typically observed at higher frequencies (referring to frequency-domain modelling such as transfer functions).
+
Even if we are able to eventually capture full dynamics of the system in a model, we may opt not to do so. We may want to keep the model simple, even if less accurate, because time invested into modelling is not for free.
+
Even if we can get a high-fidelity model with a reasonable effort, we may still prefer using a simpler (and less accurate) model for a controller design. The reason is that very often the complexity of the model used for model-based control design is reflected by the complexity of the controller – and high-complexity controllers are not particularly appreciated in industry.
+
+
+
+
Models of uncertainty
+
There are several approaches to model the uncertainty (or, in other words, to characterize the uncertainty in the model). They all aim – in one way or another – to express that the controller has to deal not only with the single nominal system, for which it was designed, but with a whole family of systems. Depending on the mathematical frameworks used for characterization of such a family, there are two major classes of approaches.
+
+
Worst-case models of uncertainty
+
Probabilistic models of uncertainty
+
+
The former assumes sets of systems with no additional information about the structure of such sets. The latter imposes some probability structure on the set of systems – in other words, although in principle any member of the set is possible, some may be more probable than others. In this course we focus on the former, which is also the mainstream in the robust control literature, but note that we already encountered the latter while considering control of systems exposed to random disturbances, namely LQG control. A possible viewpoint is that, as a consequence of the random disturbance, the controller has to deal with a family of systems.
+
Another classification of models of uncertainty is according to the actual quantity that is uncertain. We distinguish these two
+
+
Parametric uncertainty
+
Frequency-dependent (aka dynamical) uncertainty
+
+
Unstructured uncertainty
+
Structured uncertainty
+
+
+
+
Parametric uncertainty
+
This is obviously straightforward to state: some (real/physical) parameters are uncertain. The conceptually simplest way to characterize such uncertain parameters is by considering intervals instead of just single (nominal) values.
+
+
Example 1 (A pendulum on a cart)
+\begin{aligned}
+{\color{red} m_\mathrm{l}} & \in [m_\mathrm{l}^{-},m_\mathrm{l}^{+}],\\
+{\color{red} l} & \in [l^{-}, l^{+}],
+\end{aligned}
+
Not only some enumerated physical parameters but even the order of the system can be uncertain. In other words, there may be some phenomena exhibited by the system that are not captured by the model at all. Possibly some lightly damped modes, possibly some time delay here and there. The system contains uncertain dynamics. In the linear case, all this can be expressed by regarding the magnitude and phase responses as uncertain, without mapping these to actual physical parameters.
+
+
+
+
+
+
+Figure 1: A whole subsystem is uncertain
+
+
+
+
A popular model for the uncertain subsystem is that of a transfer function \Delta(s), about which we know only that it is stable and that its magnitude is bounded by 1 \boxed
+{\sup_{\omega}|\Delta(j\omega)|\leq 1,\;\;\Delta \;\text{stable}. }
+
+
But typically the uncertainty is higher at higher frequencies. This can be expressed by using some weighting function w(\omega).
+
For later theoretical and computational purposes we approximate the real weighting function using a low-order rational stable transfer function W(s). That is, W(j\omega)\approx w(\omega) for \omega \in \mathbb R, that is for s=j\omega on the imaginary axis.
+
The ultimate transfer function model of the uncertainty is then
+\boxed{
+W(s)\;\Delta(s),\quad \max_{\omega}|\Delta(j\omega)|\leq 1,\;\;\Delta\; \text{stable}. }
+
+
+
\mathcal H_\infty norm of an LTI system
+
+
H-infinity norm of an LTI system interpreted in frequency domain
+
+
Definition 4 (\mathcal H_\infty norm of a SISO LTI system) For a stable LTI system G with a single input and single output, the \mathcal H_\infty norm is defined as
+\|G\|_{\infty} = \sup_{\omega\in\mathbb{R}}|G(j\omega)|.
+
+
+
+
+
+
+
+
+Why supremum and not maximum?
+
+
+
+
Supremum is used in the definition because it is not guaranteed that the peak value of the magnitude frequency response is attained at any finite frequency. Consider the example of a first-order system G(s) = \frac{s}{Ts+1}. Its peak gain of 1/T is only approached as \omega\to\infty and is not attained at any finite frequency.
+
+
+
Having just defined the \mathcal H_\infty norm, the uncertainty model can be expressed compactly as \boxed{
W(s)\;\Delta(s),\quad \|\Delta\|_{\infty}\leq 1. }
+
+
+
+
+
+
+
+\mathcal H_\infty as a space of functions
+
+
+
+
\mathcal H_\infty denotes a normed vector space of functions that are analytic in the closed extended right half plane (of the complex plane). In parlance of control systems, \mathcal H_\infty is the space of proper and stable transfer functions. Poles on the imaginary axis are not allowed. The functions do not need to be rational, but very often we do restrict ourselves to rational functions, in which case we typically write such space as \mathcal{RH}_\infty.
+
+
+
We now extend the concept of the \mathcal H_\infty norm to MIMO systems. The extension is perhaps not quite intuitive – certainly it is not computed as the maximum of the norms of individual transfer functions, which may be the first guess.
+
+
Definition 5 (\mathcal H_\infty norm of a MIMO LTI system) For a stable LTI system \mathbf G with multiple inputs and/or multiple outputs, the \mathcal H_\infty norm is defined as
+\|\mathbf G\|_{\infty} = \sup_{\omega\in\mathbb{R}}\bar{\sigma}(\mathbf{G}(j\omega))
+ where \bar\sigma is the largest singular value.
+
+
Here we include a short recap of singular values and singular value decomposition (SVD) of a matrix. Consider a matrix \mathbf M, possibly a rectangular one. It can be decomposed as a product of three matrices
\mathbf M = \mathbf U
\underbrace{
\begin{bmatrix}
\sigma_1 & & & \\
 & \sigma_2 & & \\
 & & \ddots & \\
 & & & \sigma_n
\end{bmatrix}
}_{\boldsymbol\Sigma}
\mathbf V^{*}.
+
+
The two square matrices \mathbf V and \mathbf U are unitary, that is,
+\mathbf V\mathbf V^*=\mathbf I=\mathbf V^*\mathbf V
+ and
+\mathbf U\mathbf U^*=\mathbf I=\mathbf U^*\mathbf U.
+
+
The nonnegative diagonal entries \sigma_i \in \mathbb R_+, \forall i of the (possibly rectangular) matrix \Sigma are called singular values. Commonly they are ordered in a nonincreasing order, that is
+\sigma_1\geq \sigma_2\geq \sigma_3\geq \ldots \geq \sigma_n.
+
+
It is also a common notation to denote the largest singular value as \bar \sigma, that is, \bar \sigma \coloneqq \sigma_1.
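Definition 5 can be explored numerically by sampling the frequency response and taking the largest singular value at each frequency; a sketch (with a made-up 2×2 transfer matrix, using ControlSystems.jl) follows:

using ControlSystems, LinearAlgebra

G = [tf(1, [1, 1])    tf(2, [1, 0.2, 1]);
     tf(0.5, [1, 2])  tf(1, [1, 1, 1])]            # an illustrative 2×2 system

ω = exp10.(range(-2, 2, length=400))
σmax = [maximum(svdvals(evalfr(G, im * w))) for w in ω]
maximum(σmax)                                       # ≈ ‖G‖∞, up to gridding error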
+
+
+
\mathcal{H}_{\infty} norm of an LTI system interpreted in time domain
+
We can also view the dynamical system G with inputs and outputs as an operator mapping from some chosen space of functions to another space of functions. A popular model for these spaces is the space of square-integrable functions, denoted \mathcal{L}_2 and sometimes interpreted as the space of bounded-energy signals
+G:\;\mathcal{L}_2\rightarrow \mathcal{L}_2.
+
+
It is a powerful fact that the \mathcal{H}_{\infty} norm of the system then coincides with the induced norm of the corresponding operator \boxed{
+\|G(s)\|_{\infty} = \sup_{u(t)\in\mathcal{L}_2\setminus 0}\frac{\|y(t)\|_2}{\|u(t)\|_2}}.
+
+
With the energy interpretation of the input and output variables, this system norm can also be interpreted as the worst-case energy gain of the system.
+
Scaling is necessary to get any useful information from MIMO models! See Skogestad’s book, section 1.4, pages 5–8.
+
+
+
+
How does the uncertainty enter the model of the system?
+
+
Additive uncertainty
+
The transfer function of an uncertain system can be written as a sum of a nominal system and an uncertainty
+G(s) = \underbrace{G_0(s)}_{\text{nominal model}}+\underbrace{W(s)\Delta(s)}_{\text{additive uncertainty}}.
+
+
The block diagram interpretation is in Figure 2 below.
+
+
+
+
+
+
+Figure 2: Additive uncertainty
+
+
+
+
The magnitude frequency response of the weighting filter W(s) then serves as an upper bound on the absolute error in the magnitude frequency responses
+|G(j\omega)-G_0(j\omega)|<|W(j\omega)|\quad \forall \omega\in\mathbb R.
+
+
+
+
Multiplicative uncertainty
+
+G(s) = (1+W(s)\Delta(s))\,G_0(s).
+
+
The block diagram interpretation is in Figure 3.
+
+
+
+
+
+
+Figure 3: Multiplicative uncertainty
+
+
+
+
+
+
+
+
+
+For SISO transfer functions no need to bother about the order of terms in the products
+
+
+
+
Since we are considering SISO transfer functions, the order of terms in the products is not important. We will have to be more alert to the order of terms when we move to MIMO systems.
+
+
+
The magnitude frequency response of the weighting filter W(s) then serves as an upper bound on the relative error in the magnitude frequency responses \boxed
+{\frac{|G(j\omega)-G_0(j\omega)|}{|G_0(j\omega)|}<|W(j\omega)|\quad \forall \omega\in\mathbb R.}
+
+
+
Example 2 (Uncertain first-order delayed system) We consider a first-order system with a delay described by
+G(s) = \frac{k}{\tau s+1}e^{-\theta s}, \qquad 2\leq k,\tau,\theta\leq 3.
+
+
We now need to choose the nominal model G_0(s) and then the uncertainty weighting filter W(s). The nominal model corresponds to the nominal values of the parameters, therefore we must choose these. There is no single correct way to do this. Perhaps the most intuitive way is to choose the nominal values as the averages of the bounds. But we can also choose the nominal values in a way that makes the nominal system simple. For example, for this system with a delay, we can even choose the nominal value of the delay as zero, which makes the nominal system a first-order system without delay, hence simple enough for application of some basic linear control system design methods. Of course, the price to pay is that the resulting model of an uncertain system, which is actually a set of systems, also contains plant models that were not originally prescribed.
Now we need to find some upper bound on the relative error. Simplicity is a virtue here too, hence we are looking for a rational filter of very low order, say 1 or 2. Speaking of a first-order filter, one useful way to parameterize it is
+\boxed{
+W(s) = \frac{\tau s+r_0}{(\tau/r_{\infty})s+1}}
+ where r_0 is the relative uncertainty at steady state, 1/\tau is the frequency at which the relative uncertainty reaches 100 %, and r_{\infty} is the relative uncertainty at high frequencies, often r_{\infty}\geq 2.
+
For our example, the parameters of the filter are in the code below and the frequency response follows.
Obviously the filter does not capture the family of systems perfectly. It is now up to the control engineer to decide if this is a problem. If it is, that is, if the control design should really be robust against all uncertainties in the considered set, then a more complex (higher-order) filter is needed to describe the uncertainty more accurately. The source code shows (in commented lines) one particular candidate, but in general the whole problem boils down to designing a stable filter with a prescribed magnitude frequency response.
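Since the referenced code is not reproduced here, the following is a minimal, self-contained sketch of the kind of check it performs: it samples the uncertain parameters k, \tau, \theta on a grid, evaluates the relative error of each sampled plant with respect to an assumed nominal model (average values of k and \tau, delay neglected), and tests whether the magnitude of an assumed first-order weight W(s) upper-bounds it; all numerical values are illustrative assumptions, not the author's choices.

```julia
# A sketch (not the author's original code): does |W(jω)| upper-bound the
# relative error |G(jω) - G₀(jω)| / |G₀(jω)| over the sampled parameter ranges?
G0(s) = 2.5/(2.5s + 1)             # assumed nominal model (k = τ = 2.5, delay neglected)
W(s)  = (2.6s + 0.25)/(s + 1)      # assumed first-order weight: r₀ = 0.25, τ = 2.6, r∞ = 2.6

ω = 10 .^ range(-2, 2, length=500)
worst = zeros(length(ω))
for k in 2.0:0.5:3.0, τ in 2.0:0.5:3.0, θ in 2.0:0.5:3.0
    G = s -> k/(τ*s + 1)*exp(-θ*s)             # one sampled member of the family
    relerr = abs.((G.(im*ω) .- G0.(im*ω)) ./ G0.(im*ω))
    worst .= max.(worst, relerr)               # pointwise worst case over the samples
end
println("weight covers the sampled family on the grid: ", all(worst .< abs.(W.(im*ω))))
```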
+
+
+
+
Inverse additive uncertainty
+
…
+
+
+
Inverse multiplicative uncertainty
+
…
+
+
+
Linear fractional transformation (LFT)
+
For a matrix \mathbf P sized (n_1+n_2)\times(m_1+m_2) and divided into blocks like
+\mathbf P=
+\begin{bmatrix}
+\mathbf P_{11} & \mathbf P_{12}\\
+\mathbf P_{21} & \mathbf P_{22}
+\end{bmatrix},
+ and a matrix \mathbf K sized m_2\times n_2, the lower LFT of \mathbf P with respect to \mathbf K is
+\boxed{
+\mathcal{F}_\mathbf{l}(\mathbf P,\mathbf K) = \mathbf P_{11}+\mathbf P_{12}\mathbf K(\mathbf I-\mathbf P_{22}\mathbf K)^{-1}\mathbf P_{21}}.
+
+
It can be viewed as a feedback interconnection of the plant \mathbf P and the controller \mathbf K, in which not all plant inputs are used as control inputs and not all plant outputs are measured, as depicted in Figure 4
+
+
+
+
+
+
+Figure 4: Lower LFT of \mathbf P with respect to \mathbf K
+
+
+
+
Similarly, for a matrix \mathbf N sized (n_1+n_2)\times(m_1+m_2) and a matrix \boldsymbol\Delta sized m_1\times n_1, the upper LFT of \mathbf N with respect to \boldsymbol\Delta is
+\boxed{
+\mathcal{F}_\mathbf{u}(\mathbf N,\boldsymbol\Delta) = \mathbf N_{22}+\mathbf N_{21}\boldsymbol\Delta(\mathbf I-\mathbf N_{11}\boldsymbol\Delta)^{-1}\mathbf N_{12}}.
+
+
It can be viewed as a feedback interconnection of the nominal plant \mathbf N and the uncertainty block \boldsymbol\Delta, as depicted in Figure 5
+
+
+
+
+
+
+Figure 5: Upper LFT of \mathbf N with respect to \boldsymbol \Delta
+
+
+
+
Here we have already anticipated MIMO uncertainty blocks. One motivation for them is explained in the very next section on structured uncertainties; another one appears once we start formulating robust performance within the same analytical framework as robust stability.
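As a quick numerical companion to the two definitions, the following Julia sketch implements the lower and upper LFTs for constant matrices (at a single frequency the same formulas apply to the values of the transfer matrices); the partitioning and the example data are arbitrary illustrative choices.

```julia
using LinearAlgebra

# Lower LFT: F_l(P,K) = P11 + P12*K*(I - P22*K)⁻¹*P21 (K closes the loop around the (2,2) block).
lower_lft(P11, P12, P21, P22, K) = P11 + P12*K*((I - P22*K) \ P21)

# Upper LFT: F_u(N,Δ) = N22 + N21*Δ*(I - N11*Δ)⁻¹*N12 (Δ closes the loop around the (1,1) block).
upper_lft(N11, N12, N21, N22, Δ) = N22 + N21*Δ*((I - N11*Δ) \ N12)

# Illustrative (made-up) data: P is 3×3 partitioned as (2+1)×(2+1), K is 1×1.
P11 = [1.0 0.0; 0.0 2.0];  P12 = reshape([1.0, 1.0], 2, 1)
P21 = [1.0 1.0];           P22 = fill(0.5, 1, 1)
K   = fill(0.2, 1, 1)
@show lower_lft(P11, P12, P21, P22, K)
```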
+
+
+
+
+
+
+Which is lower and which is upper is a matter of convention, but a useful one
+
+
+
+
Our usage of the lower LFT for a feedback interconnection of a (generalized) plant and a controller, and of the upper LFT for a feedback interconnection of a nominal system and an uncertainty, is completely arbitrary. We could equally well use the lower LFT for the uncertainty. But it is a convenient convention to adhere to, all the more so as it allows combining both, as in the diagram in Figure 6 below, which corresponds to the composition of the two LFTs.
+
+
+
+
+
+
+Figure 6: Combination of the lower and upper LFT
+
+
+
+
+
+
+
+
+
+
Structured frequency-domain uncertainty
+
Not just a single \Delta(s) but several \Delta_i(s), i=1,\ldots,n are considered. Some of them scalar-valued, some of them matrix-valued.
+
In the upper LFT, all the individual \Delta_is are collected into a single overall \boldsymbol \Delta, which then exhibits some structure. Typically it is block-diagonal as in
+\boldsymbol\Delta =
+\begin{bmatrix}
+\Delta_1& 0 & \ldots & 0\\
+0 & \Delta_2 & \ldots & 0\\
+\vdots & \vdots & \ddots & \vdots\\
+0 & 0 & \ldots & \Delta_n
+\end{bmatrix},
+ with each block (including the MIMO blocks) satisfying the usual condition
+\|\Delta_i\|_{\infty}\leq 1, \; i=1,\ldots, n.
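As a small illustration (at a single frequency, so matrix norms stand in for the \mathcal H_\infty norms of the blocks), the following sketch assembles such a block-diagonal \boldsymbol\Delta from one scalar and one 2×2 block with made-up values and checks that every block, and hence the whole matrix, has norm at most one.

```julia
using LinearAlgebra

# Assembling a structured Δ from individual blocks (illustrative values at one frequency).
Δ1 = fill(0.3 + 0.4im, 1, 1)              # a scalar (1×1) block, |Δ1| = 0.5
Δ2 = [0.2 0.1; 0.0 -0.3] .+ 0.0im         # a 2×2 MIMO block
Δ  = cat(Δ1, Δ2, dims=(1, 2))             # block-diagonal overall Δ
println("block norms: ", opnorm(Δ1), ", ", opnorm(Δ2), ";  overall: ", opnorm(Δ))
```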
+
+
+
Structured singular value (SSV, \mu, mu)
+
With this structured uncertainty, what does the small gain theorem look like?
When we introduced the concept of robustness, we only vaguely hinted that it is always related to some property of interest. Now comes the time to specify the two properties of interest:
+
+
Definition 1 (Robust stability) Guaranteed stability of the closed feedback loop with a given controller for all admissible (= considered a priori) deviations of the model from reality.
+
+
+
Definition 2 (Robust performance) Robustness of some performance characteristics such as steady-state regulation error, attenuation of some specified disturbance, insensitivity to measurement noise, fast response, ….
+
+
+
Internal stability
+
Before we start discussing robust stability, we need to discuss one fine issue related to stability of a nominal system. We do it through the following example.
+
+
Example 1 (Internal stability) Consider the following feedback system with a nominal plant G(s) and a nominal controller K(s).
+
+
+
+
+
+
+
+
+
+
The question is: is this closed-loop system stable? We determine stability by looking at the denominator of a closed-loop transfer function. But which one? There are several. Perhaps the most immediate one is the transfer function from the reference r to the plant output y. With the open-loop transfer function L(s) = G(s)K(s) = \frac{s-1}{s+1} \frac{k(s+1)}{s(s-1)} = \frac{k}{s}, the closed-loop transfer function is
+T(s) = \frac{\frac{k}{s}}{1+\frac{k}{s}} = \frac{k}{s+k},
+ which is perfectly stable. But note that for practical purposes, all possible closed-loop transfer functions must be stable. How about the one from the output disturbance d to the plant output y?
+S(s) = \frac{1}{1+\frac{k}{s}} = \frac{s}{s+k},
+ which is stable too. Isn’t this a signal that we can stop worrying? Not yet. Consider now the closed-loop transfer function from the reference r to the control u. The closed-loop transfer function is
+K(s)S(s) = \frac{\frac{k(s+1)}{s(s-1)}}{1+\frac{k}{s}} = \frac{k(s+1)}{{\color{red}(s-1)}(s+k)}.
+
+
Oops! This closed-loop transfer function is not stable. Obviously the culprit here is our cancelling of the RHP zero of the plant by an unstable pole of the controller. But let’s emphasize that the trouble is not in the imperfectness of this cancellation due to numerical errors. The trouble is in the very act of cancelling the RHP zero by the controller. An identical problem would arise if an unstable pole of the plant were cancelled by an RHP zero of the controller, as we can see by modifying the assignment accordingly.
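The poles of the closed-loop transfer functions derived above can be checked numerically; here is a minimal sketch using Polynomials.jl, with an illustrative controller gain k = 2 (any positive value tells the same story).

```julia
using Polynomials

k = 2.0                                      # illustrative controller gain
# Closed-loop transfer functions derived above:
#   T(s)  = k/(s + k)                 (r → y)
#   S(s)  = s/(s + k)                 (d → y)
#   KS(s) = k(s + 1)/((s - 1)(s + k)) (r → u)
den_T  = Polynomial([k, 1.0])                # s + k
den_KS = Polynomial([-k, k - 1.0, 1.0])      # (s - 1)(s + k) = s² + (k - 1)s - k
println("poles of T:  ", roots(den_T))       # -k, stable
println("poles of KS: ", roots(den_KS))      # contains +1, hence not internally stable
```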
+
+
The example taught (or perhaps reminded) us that in order to guarantee stability of all closed-loop transfer functions, no cancellation of poles and zeros in the right half plane is allowed. A closed-loop system in which all closed-loop transfer functions are stable is called internally stable. Provided no such cancellations occur, checking any single closed-loop transfer function for stability is enough to conclude that all of them are stable too.
+
+
+
Robust stability for a multiplicative uncertainty
+
We consider a feedback system with a plant G(s) and a controller K(s), where the uncertainty in the plant is modelled as multiplicative, that is, G(s) = (1+W(s)\Delta(s))\,G_0(s).
+
The technique for analyzing closed-loop stability is based on the Nyquist criterion. Instead of analyzing the Nyquist plot for the nominal plant G_0(s), we analyze the Nyquist plot for the uncertain plant G(s). The corresponding open-loop transfer function is
+L(s) = G(s)K(s) = (1+W(s)\Delta(s))\,G_0(s)K(s) = L_0(s) + W(s)L_0(s)\Delta(s).
+
+
When trying to figure out the conditions under which this family of Nyquist curves avoids the point -1, it is useful to interpret the last equation at a given frequency \omega as a disc with the center at L_0(j\omega) and the radius |W(j\omega)L_0(j\omega)|. To see this, note that \Delta(j\omega) represents a complex number with a magnitude of up to one and with an arbitrary angle.
The geometric formulation of the condition is then that the distance from the point -1 to the nominal Nyquist plot of L_0(j\omega) is greater than the radius |W(j\omega)L_0(j\omega)| of the disc centered at the nominal Nyquist curve. With the distance from the point -1 to the nominal Nyquist plot evaluated at a particular frequency \omega given by |-1-L_0(j\omega)| = |1+L_0(j\omega)|, the condition can be written as
|W(j\omega)L_0(j\omega)| < |1+L_0(j\omega)|, \;\forall \omega.
Dividing both sides by |1+L_0(j\omega)| we get
+\frac{|W(j\omega)L_0(j\omega)|}{|1+L_0(j\omega)|} < 1, \;\forall \omega.
+
+
But recalling the definition of the complementary sensitivity function and dividing both sides by |W(j\omega)|, we can rewrite the condition as \boxed
+{|T_0(j\omega)| < 1/|W(j\omega)|, \;\; \forall \omega.}
+
+
This condition has clear interpretation in terms of the magnitude of the complementary sensitivity function – it must be smaller than the reciprocal of the magnitude of the uncertainty weight at all frequencies.
+
Finally, we can also invoke the definition of the \mathcal H_\infty norm and reformulate the condition as \boxed
+{\|WT_0\|_{\infty}< 1.}
+
+
To appreciate the usefulness of this format of the robust stability condition beyond mere notational compactness, we mention that the \mathcal H_\infty norm of an LTI system can be computed reliably. Robust stability can then be checked by computing a single number.
+
In fact, it is even better than that – there are methods for computing a feedback controller that minimizes the \mathcal H_\infty norm of a specified closed-loop transfer function, which suggests an optimization-based approach to the design of robustly stabilizing controllers. We are going to build on this in the next chapter. But let’s stick to the analysis for now.
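As a small illustration of how cheaply the frequency-wise condition can be evaluated, here is a sketch that checks |W(j\omega)T_0(j\omega)| < 1 on a logarithmic frequency grid; the nominal plant, the controller and the weight are made-up first-order examples, not taken from the text.

```julia
# Checking robust stability, max over ω of |W T₀| < 1, on a grid (illustrative data only).
G0(s) = 2.5/(2.5s + 1)                 # assumed nominal plant
K(s)  = 0.1 + 0.1/s                    # assumed PI controller
W(s)  = (2.6s + 0.25)/(s + 1)          # assumed multiplicative-uncertainty weight

L0(s) = G0(s)*K(s)                     # nominal open-loop transfer function
T0(s) = L0(s)/(1 + L0(s))              # nominal complementary sensitivity

ω = 10 .^ range(-3, 3, length=2000)
peak = maximum(abs.(W.(im .* ω) .* T0.(im .* ω)))
println("max over the grid of |W T₀| ≈ ", round(peak, digits=3),
        peak < 1 ? "  → robust stability condition satisfied" : "  → condition violated")
```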
+
+Figure 3: Upper LFT with the \mathbf N term corresponding to the nominal closed-loop system structured into blocks
+
+
+
+
The term corresponding to the nominal closed-loop system is structured into blocks. It is only the N_{11} block that captures the interaction with the uncertainty in the model. For convenience we rename this block as
+M \coloneqq N_{11}.
+
+
The open-loop transfer function is then M \Delta. Following the same Nyquist-criterion-based reasoning as before, that is, asking for the conditions under which this open-loop transfer function never reaches the point -1 while the \Delta term can introduce an arbitrary phase, we arrive at the robust stability condition for the LFT as \boxed
+{|M(j\omega)|<1,\;\;\forall \omega.}
+
+
Once again, invoking the definition of the \mathcal H_\infty norm, we can rewrite the condition compactly as \boxed
+{\|M\|_{\infty}<1.}
+
+
Once again, the formulation as an inequality over all frequencies can be useful for visualization and interpretation, while the inequality with the \mathcal H_\infty norm can be used for computation and optimization.
+
This condition of robust stability belongs among the most fundamental results in control theory. It is known as the small gain theorem.
+
+
+
+
+
+
+Small gain theorem works for MIMO too
+
+
+
+
Small gain theorem works for a MIMO uncertainty \boldsymbol \Delta and a block \mathbf N_{11} (or \mathbf M) too
+\|\mathbf M\|_{\infty}<1.
+
+
But as we discuss in the next section, this is typically too conservative, because the \boldsymbol \Delta block typically has some structure (block diagonal) that should be exploited. More on this in the section dedicated to structured uncertainty.
+
+
+
+
+
Nominal performance
+
Having discussed stability (and its robustness), it is now time to turn to performance (and its robustness). Performance can mean different things to different people, and it can be specified in a number of ways, but we would like to formulate performance requirements in the same frequency-domain setting as we did for (robust) stability. Namely, we would like to specify the performance requirements in terms of the frequency response of some closed-loop transfer function. The sensitivity function is a natural choice for this purpose. It turns out that by imposing upper-bound constraints on |S(j\omega)| (actually |S_0(j\omega)| as we now focus on the nominal case with no uncertainty) we can specify a number of performance requirements:
+
+
Up to which frequency the feedback controller attenuates the disturbance, that is, the bandwidth \omega_\mathrm{BW} of the system.
+
How much the feedback controller attenuates the disturbances over the bandwidth.
+
How it behaves at very low frequencies, that is, how well it regulates the steady-state error.
+
What the maximum amplification of the disturbance is, that is, the resonance peak.
+
+
These four types of performance requirements are indicated in Figure 4 below.
+
+
+
+
+
+
+Figure 4: Performance specifications through the shape of the magnitude frequency response of the sensitivity function
+
+
+
+
But these requirements can also be compactly expressed through the performance weighting filter W_\mathrm{p}(s) as \boxed
+{|S_0(j\omega)| < 1/|W_\mathrm{p}(j\omega)|,\;\;\forall \omega,}
+\tag{1}
+
where S_0 = \frac{1}{1+L_0} is the sensitivity function of the nominal closed-loop system. This can again be compactly written as \boxed
+{\|W_\mathrm{p}S_0\|_{\infty}<1.}
+
+
It lends some insight if we visualize this condition in the complex plane. First, recall that S_0 = \frac{1}{1+L_0}. Equation 1 then translates to
+|W_\mathrm{p}(j\omega)|<|1+L_0(j\omega)|\;\;\forall \omega,
+ which can be visualized as in Figure 5.
+
+
+
+
+
+
+Figure 5: Nominal performance condition
+
+
+
+
+
+
Robust performance for a multiplicative uncertainty
+
So far we have the condition of robust stability and the condition of nominal performance. Simultaneous satisfaction of both gives… just robust stability and nominal performance. Robust performance obviously needs a stricter condition: the performance bound must hold for the perturbed (not just the nominal) sensitivity function, which for the multiplicative uncertainty considered here leads to the requirement
|W_\mathrm{p}(j\omega)S_0(j\omega)| + |W(j\omega)T_0(j\omega)| < 1 \;\;\forall \omega.
In the SISO case, this is guaranteed (with at most a factor of \sqrt{2} of conservatism) whenever \boxed
+{\left\|
+\begin{bmatrix}
+W_\mathrm{p}S_0\\
+WT_0
+\end{bmatrix}
+\right\|_{\infty}
+<\frac{1}{\sqrt{2}},}
+ where the stacked closed-loop system \begin{bmatrix} W_\mathrm{p}S_0\\ WT_0 \end{bmatrix} is called the mixed sensitivity function.
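For a SISO loop the stacked system is just a column vector at each frequency, so its largest singular value is \sqrt{|W_\mathrm{p}S_0|^2+|WT_0|^2}; the following sketch evaluates the mixed-sensitivity norm on a grid, reusing the illustrative plant, controller and uncertainty weight from the robust-stability sketch above and adding an assumed performance weight.

```julia
# Mixed-sensitivity check ‖[Wp S₀; W T₀]‖∞ < 1/√2 on a frequency grid (illustrative data only).
G0(s) = 2.5/(2.5s + 1)
K(s)  = 0.1 + 0.1/s
W(s)  = (2.6s + 0.25)/(s + 1)          # uncertainty weight
Wp(s) = (0.5s + 0.05)/(s + 5e-5)       # assumed performance weight (peak 2, bandwidth ≈ 0.05)

L0(s) = G0(s)*K(s)
S0(s) = 1/(1 + L0(s));  T0(s) = L0(s)/(1 + L0(s))

ω = 10 .^ range(-4, 3, length=2000)
sv = sqrt.(abs2.(Wp.(im .* ω) .* S0.(im .* ω)) .+ abs2.(W.(im .* ω) .* T0.(im .* ω)))
println("‖mixed sensitivity‖∞ ≈ ", round(maximum(sv), digits=3),
        "  (robust performance guaranteed if below ", round(1/sqrt(2), digits=3), ")")
```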
+
In the MIMO case we do not have a useful upper bound, but at least we have received a hint that it may be useful to minimize the \mathcal H_\infty norm of the mixed sensitivity function. This observation will directly lead to a control design method.
Here we formulate the general problem of \mathcal{H}_\infty-optimal control. There are two motivations for this. First, it gives the general framework within which we can formulate and solve the mixed-sensitivity problem defined in the frequency domain that we discussed previously. Second, it allows us to exploit the time-domain (or signal) interpretation of the \mathcal{H}_\infty norm of a system to formulate a new class of problems that can be solved with these optimization tools. For the latter, recall that
+\|\mathbf G\|_{\infty} = \sup_{\bm u\in\mathcal{L}_{2}\setminus 0}\frac{\|\bm y\|_2}{\|\bm u\|_2},
+ in which we allow for vector input and output signals, hence MIMO systems, from the very beginning.
+
Now, for particular control requirements, we build the generalized plant \mathbf P such that after forming the feedback interconnection with the controller \mathbf K as in Figure 1
+
+
+
+
+
+
+Figure 1: Lower LFT of the generalized plant and the controller
+
+
+
+
it makes sense to require the stabilizing controller to minimize the amplification of the exogenous inputs (disturbances, references, noises) into the regulated outputs. We want to make the regulated outputs as insensitive as possible to the exogenous inputs, and to quantify the sizes of the inputs and outputs we use the \mathcal L_2 norm.
+
But then what we have is really the standard \mathcal{H}_\infty optimization problem \boxed
+{\operatorname*{minimize}_{\mathbf K \text{ stabilizing}}\|\mathcal{F}_{\mathrm l}(\mathbf P,\mathbf K)\|_{\infty}.}
+
+
Numerical solvers exist in various software environments.
+
+
Mixed-sensitivity problem reformulated as the standard \mathcal{H}_\infty optimization problem
+
We now show how the mixed-sensitivity problem discussed previously can be reformulated as the standard \mathcal{H}_\infty optimization problem. We consider the full mixed-sensitivity problem for a SISO plant
+
+\operatorname*{minimize}_{K \text{ stabilizing}}
+\left\|
+\begin{bmatrix}
+W_1S\\W_2KS\\W_3T
+\end{bmatrix}
+\right\|_{\infty},
+ which obviously considers a closed-loop system with one input and three outputs. With only one exogenous input, we must choose its role; say, the only exogenous input is the reference signal. The closed-loop system for which the norm is minimized is shown in the block diagram in Figure 2.
+
+
+
+
+
+
+Figure 2: Mixed-sensitivity problem interpreted as the standard \mathcal{H}_\infty optimization problem
+
+
+
+
The matrix transfer function of the generalized plant \mathbf P has two inputs and four outputs and can then be written as
+\mathbf P = \left[\begin{array}{c|c}
+W_1 & -W_1G\\
+0 & W_2\\
+0 & W_3G\\
+\hline
+1 & -G
+\end{array}\right].
+
+
A state space realization of this plant \mathbf P is then used as the input argument to the solver for the \mathcal{H}_\infty optimization problem. In fact, we must also tell the solver how the inputs and outputs are structured. In this case, the solver must know that of the two inputs, only the second one can be used by the controller, and of the four outputs, only the fourth one is measured.
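The structure of \mathbf P can be checked numerically: evaluating the lower LFT of \mathbf P with K at any frequency should reproduce the stacked transfer function [W_1S;\;W_2KS;\;W_3T] at that frequency. The sketch below does exactly that for made-up first-order choices of G, K and the weights.

```julia
# Verify F_l(P,K) = [W1*S; W2*K*S; W3*T] at one test frequency (illustrative data only).
G(s)  = 1/(s + 1);   K(s)  = 2 + 1/s
W1(s) = 0.5/(s + 0.01);  W2(s) = 0.1;  W3(s) = 2s/(s + 10)

s = 0.3im                                   # an arbitrary test frequency
P11 = [W1(s); 0; 0]                         # from w = r to z
P12 = [-W1(s)*G(s); W2(s); W3(s)*G(s)]      # from u to z
p21 = 1.0;  p22 = -G(s)                     # from (w, u) to the measured output v = r - G u
Fl  = P11 .+ P12 .* (K(s)/(1 - p22*K(s))) .* p21

S = 1/(1 + G(s)*K(s));  T = 1 - S
direct = [W1(s)*S; W2(s)*K(s)*S; W3(s)*T]
println("max |F_l(P,K) - [W1*S; W2*K*S; W3*T]| = ", maximum(abs.(Fl .- direct)))
```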
+
+
+
Signal-based \mathcal{H}_\infty-optimal control problem
+
Being able to solve the \mathcal{H}_\infty optimization problem, we do not have to restrict ourselves to generalized plants \mathbf P that correspond to the mixed-sensitivity problem. We can consider any plant \mathbf P for which the problem makes sense. For example, if we want to consider not only references but also disturbances, and possibly even noises, there is no way to formulate this within the mixed-sensitivity framework. But we can still formulate it as the standard \mathcal{H}_\infty optimal control problem.
+
+
+
What is behind the \mathcal{H}_\infty solver?
+
+
+
Structure of the \mathcal{H}_\infty-optimal controller
In the previous chapter we mentioned that there are several ways to capture uncertainty in the model and analyze robustness with respect to the uncertainty. We have chosen the worst-case approach based on the small gain theorem for analysis of robust stability, which in the case of linear systems has an intuitive frequency-domain interpretation.
+
This chosen framework has two benefits. First, being formulated in the frequency domain, it allows us to take advantage of the insight developed in introductory courses on automatic control, which typically invest quite some effort into developing frequency-domain concepts such as magnitude and phase Bode plots, the Nyquist plot, and the sensitivity and complementary sensitivity functions. Generations of control engineers have contributed to the collective know-how carried by these classical concepts and techniques.
+
Second, by formulating the requirements of robust stability, nominal performance and robust performance as constraints on the \mathcal H_\infty norms of some closed-loop transfer functions, an immediate extension from analysis to automated synthesis (control design) is enabled by the availability of numerical methods for \mathcal H_\infty optimization. This enhances the classical frequency-domain control design techniques in that while the classical methods require that we know what we want and also how to achieve it, the \mathcal H_\infty optimization based methods require only that we know what we want (and express our requirements in the frequency domain). We don’t have to bother with how to achieve it because there are numerical solvers that will do the job for us.
+
Let’s introduce the first instance of such a methodology. We have learnt that the robust performance condition in the presence of multiplicative uncertainty is formulated as a bound on the \mathcal H_\infty norm of the mixed sensitivity function \begin{bmatrix}W_pS\\WT\end{bmatrix}, namely
+\left\|
+\begin{bmatrix}
+W_pS\\WT
+\end{bmatrix}
+\right\|_{\infty}
+< \frac{1}{\sqrt{2}}.
+
+
Evaluating this condition can be done in a straightforward way, either at a grid of frequencies (inefficient) or by invoking a method for computing the norm.
+
But the major good news of this chapter is that we can also turn this into an optimization problem
+\operatorname*{minimize}_{K \text{ stabilizing}}
+\left\|
+\begin{bmatrix}
+W_pS\\WT
+\end{bmatrix}
+\right\|_{\infty}.
+
+
In words, we are looking for a controller K that guarantees stability of the closed-loop system and also minimizes the \mathcal H_\infty norm of the mixed sensitivity function.
+Mixed sensitivity minimization as a special case of the general \mathcal H_\infty optimization
+
+
+
+
In anticipation of what is to come, we note here that the above minimization of the \mathcal H_\infty norm of the mixed sensitivity function is a special case of the more general \mathcal H_\infty optimization problem (minimization of the norm of a general closed-loop transfer function). Therefore, even if your software toolbox does not have a specific function for mixed-sensitivity optimization, chances are that a solver for the general \mathcal H_\infty optimization problem is available. And we will soon see how to reformulate the mixed-sensitivity minimization as the general \mathcal H_\infty optimization problem.
+
+
+
Having derived the bound on the norm of the mixed sensitivity function (equal to 1/\sqrt{2} in the SISO case), it may now be tempting to conclude that the only goal of the optimization is to find a controller that satisfies this bound. However, it turns out that the optimization has another useful property, called the self-equalizing property. We are not going to prove it, we will be happy just to interpret it: it means that with the optimal controller the frequency response of the considered (weighted and possibly mixed) sensitivity function is flat (constant over all frequencies).
+
In order to understand the impact of this property, let us consider the problem of minimizing just \|WT\|_\infty. We choose this problem even though practically it is not really useful to require just (robust) stability. For \gamma = \min_{K}\|WT\|_\infty, the flatness of the frequency response |W(j\omega)T(j\omega)| means that the magnitude frequency response |T(j\omega)| is proportional to 1/|W(j\omega)|, that is,
|T(j\omega)| = \frac{\gamma}{|W(j\omega)|}, \quad \forall \omega.
This gives another motivation for our \mathcal{H}_\infty optimization endeavor – through the minimization we shape the closed-loop magnitude frequency responses. This automatic/automated loopshaping is the second benefit promised at the beginning of this section. But we emphasize that for practical purposes it is only useful to minimize the norm of the mixed sensitivity function, in which case more than just a single closed-loop transfer function is shaped – W_\mathrm{p}S and WT are shaped simultaneously.
+
With this new interpretation, we can feel free to include other terms in the optimization criterion. In particular, the criterion can be extended to include the control effort as in (after reindexing the weighting filters) \boxed
+{\operatorname*{minimize}_{K \text{ stabilizing}}
+\left\|
+\begin{bmatrix}
+W_1S\\W_2KS\\W_3T
+\end{bmatrix}
+\right\|_{\infty}.}
+
+
The middle term penalizes the control effort (similarly to the \mathbf R term in the LQ optimality criterion \int(\bm x^\top\mathbf Q\bm x+\bm u^\top\mathbf R\bm u)\,\mathrm{d}t). Typically it is sufficient to set the corresponding weighting filter equal to a nonnegative constant.
+
An important property of this method is that it extends to the multiple-input multiple-output (MIMO) case. Nothing needs to be changed in the formal problem statement, as the \mathcal H_\infty norm is defined for MIMO systems as well.
The literature for \mathcal H_\infty control is essentially identical to the one we gave in the previous chapter on analysis of robustness. In particular, we stick to our primary textbook Skogestad and Postlethwaite (2005), in which the material is discussed in the forty-page Chapter 9.
+
While discussing the analysis of robustness in the previous chapter, we omitted the discussion of structured uncertainties using the structured singular value (SSV, mu, \mu). Similarly, here we did not delve into the extension of that framework towards control synthesis – purely because of time constraints. But should you find some time, have a look at Section 8.12, which discusses the methodology called \mu synthesis.
mixsyn: control design by minimizing the \mathcal{H}_\infty norm of the mixed-sensitivity function.
+
hinfsyn: control design by minimizing the \mathcal{H}_\infty norm of a closed-loop transfer function formulated using an LFT.
+
ncfsyn: another control design based on \mathcal{H}_\infty optimization, but this one considers a different uncertainty model not covered in our course. Although this uncertainty model does not have as intuitive an interpretation as the multiplicative uncertainty model used in mixed-sensitivity synthesis, it captures a broad class of uncertainties. Furthermore, the resulting controller enjoys the same decomposition into a state feedback and an observer as the popular LQG controller, which can be an advantage from an implementation viewpoint. Highly recommended method.
+
musyn: similar general setup as the hinfsyn method, but it considers a structure in the \Delta term. It is regarded by some as the culmination of the \mathcal{H}_\infty control design methods. The disadvantage is that it is the most computationally intensive of the methods we covered, and the resulting controller is typically of rather high order.
+
+
+
+
+
+
\ No newline at end of file
diff --git a/search.json b/search.json
index 3651ed8..ede5528 100644
--- a/search.json
+++ b/search.json
@@ -81,7 +81,7 @@
"href": "cont_dp_HJB.html",
"title": "Dynamic programming for continuous-time optimal control",
"section": "",
- "text": "In the previous sections we investigated both direct and indirect approaches to the optimal control problem. Similarly as in the discrete-time case, complementing the two approaches is the dynamic programming. Indeed, the key Bellmans’s idea, which we previously formulated in discrete time, can be extended to continuous time as well.\nWe consider the continuous-time system \n\\dot{\\bm{x}} = \\mathbf f(\\bm{x},\\bm{u},t)\n with the cost function \nJ(\\bm x(t_\\mathrm{i}), \\bm u(\\cdot), t_\\mathrm{i}) = \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) + \\int_{t_\\mathrm{i}}^{t_\\mathrm{f}}L(\\bm x(t),\\bm u(t),t)\\, \\mathrm d t.\nOptionally we can also consider constraints on the state at the final time (be it a particular value or some set of values) \n\\psi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f})=0.",
+ "text": "In the previous sections we investigated both direct and indirect approaches to the optimal control problem. Similarly as in the discrete-time case, complementing the two approaches is the dynamic programming. Indeed, the key Bellmans’s idea, which we previously formulated in discrete time, can be extended to continuous time as well.\nWe consider the continuous-time system \n\\dot{\\bm{x}} = \\mathbf f(\\bm{x},\\bm{u},t)\n with the cost function \nJ(\\bm x(t_\\mathrm{i}), \\bm u(\\cdot), t_\\mathrm{i}) = \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) + \\int_{t_\\mathrm{i}}^{t_\\mathrm{f}}L(\\bm x(t),\\bm u(t),t)\\, \\mathrm d t.\nThe final time can be fixed to a particular value t_\\mathrm{f}, in which case the state at the final time \\bm x(t_\\mathrm{f}) is either free (unspecified but penalized through \\phi(\\bm x(t_\\mathrm{f}))), or it is fixed (specified and not penalized, that is, \\bm x(t_\\mathrm{f}) = \\mathbf x^\\mathrm{ref}).\nThe final time can also be free (regarded as an optimization variable itself), in which case general constraints on the state at the final time can be expressed as \n\\psi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f})=0\n or possibly even using an inequality, which we will not consider here.\nThe final time can also be considered infinity, that is, t_\\mathrm{f}=\\infty, but we will handle this situation later separately.",
"crumbs": [
"9.X. Continuous-time optimal control – dynamic programming",
"Dynamic programming for continuous-time optimal control"
@@ -92,18 +92,40 @@
"href": "cont_dp_HJB.html#hamilton-jacobi-bellman-hjb-equation",
"title": "Dynamic programming for continuous-time optimal control",
"section": "Hamilton-Jacobi-Bellman (HJB) equation",
- "text": "Hamilton-Jacobi-Bellman (HJB) equation\nWe now consider an arbitrary time t and split the (remaining) time interval [t,t_\\mathrm{f}] into two parts [t,t+\\Delta t] and [t+\\Delta t,t_\\mathrm{f}] , and structure the cost function accordingly \nJ(\\bm x(t),\\bm u(\\cdot),t) = \\int_{t}^{t+\\Delta t} L(\\bm x,\\bm u,\\tau)\\,\\mathrm{d}\\tau + \\underbrace{\\int_{t+\\Delta t}^{t_\\mathrm{f}} L(\\bm x,\\bm u,\\tau)\\,\\mathrm{d}\\tau + \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f})}_{J(\\bm x(t+\\Delta t), \\bm u(t+\\Delta t), t+\\Delta t)}.\n\nBellman’s principle of optimality gives \nJ^\\star(\\bm x(t),t) = \\min_{\\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t} \\left[\\int_{t}^{t+\\Delta t} L(\\bm x,\\bm u,\\tau)\\,\\mathrm{d}\\tau + J^\\star(\\bm x+\\Delta \\bm x, t+\\Delta t)\\right].\n\nWe now perform Taylor series expansion of J^\\star(\\bm x+\\Delta \\bm x, t+\\Delta t) about (\\bm x,t) \nJ^\\star(\\bm x,t) = \\min_{\\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t} \\left[L\\Delta t + J^\\star(\\bm x,t) + (\\nabla_{\\bm x} J^\\star)^\\top \\Delta \\bm x + \\frac{\\partial J^\\star}{\\partial t}\\Delta t + \\mathcal{O}((\\Delta t)^2)\\right].\n\nUsing \n\\Delta \\bm x = \\bm f(\\bm x,\\bm u,t)\\Delta t\n and noting that J^\\star and J_t^\\star are independent of \\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t, we get \n\\cancel{J^\\star (\\bm x,t)} = \\cancel{J^\\star (\\bm x,t)} + \\frac{\\partial J^\\star }{\\partial t}\\Delta t + \\min_{\\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t}\\left[L\\Delta t + (\\nabla_{\\bm x} J^\\star )^\\top f\\Delta t\\right].\n\nAssuming \\Delta t\\rightarrow 0 leads to the celebrated Hamilton-Jacobi-Bellman (HJB) equation \\boxed{\n-\\frac{\\partial {\\color{blue}J^\\star (\\bm x(t),t)}}{\\partial t} = \\min_{\\bm u(t)}\\left[L(\\bm x(t),\\bm u(t),t)+(\\nabla_{\\bm x} {\\color{blue} J^\\star (\\bm x(t),t)})^\\top \\bm f(\\bm x(t),\\bm u(t),t)\\right].}\n\nThis is obviously a partial differential equation (PDE) for the optimal cost function J^\\star(\\bm x,t).\nAnd since this is a differential equation, boundary value(s) must be specified to determine a unique solution. In particular, since the equation is first-order with respect to both time and state, specifying the value of the optimal cost function at the final state and the final time is enough. With the general final-state constraints we have introduced above, the boundary value condition reads \nJ^\\star (\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}),\\qquad \\text{on the hypersurface } \\psi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = 0.\n\nNote that this includes as special cases the fixed-final-state and free-final-state cases.",
+ "text": "Hamilton-Jacobi-Bellman (HJB) equation\nWe now consider an arbitrary time t and split the (remaining) time interval [t,t_\\mathrm{f}] into two parts [t,t+\\Delta t] and [t+\\Delta t,t_\\mathrm{f}] , and structure the cost function accordingly \nJ(\\bm x(t),\\bm u(\\cdot),t) = \\int_{t}^{t+\\Delta t} L(\\bm x,\\bm u,\\tau)\\,\\mathrm{d}\\tau + \\underbrace{\\int_{t+\\Delta t}^{t_\\mathrm{f}} L(\\bm x,\\bm u,\\tau)\\,\\mathrm{d}\\tau + \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f})}_{J(\\bm x(t+\\Delta t), \\bm u(t+\\Delta t), t+\\Delta t)}.\n\nBellman’s principle of optimality gives \nJ^\\star(\\bm x(t),t) = \\min_{\\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t} \\left[\\int_{t}^{t+\\Delta t} L(\\bm x,\\bm u,\\tau)\\,\\mathrm{d}\\tau + J^\\star(\\bm x+\\Delta \\bm x, t+\\Delta t)\\right].\n\nWe now perform Taylor series expansion of J^\\star(\\bm x+\\Delta \\bm x, t+\\Delta t) about (\\bm x,t) \nJ^\\star(\\bm x,t) = \\min_{\\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t} \\left[L\\Delta t + J^\\star(\\bm x,t) + (\\nabla_{\\bm x} J^\\star)^\\top \\Delta \\bm x + \\frac{\\partial J^\\star}{\\partial t}\\Delta t + \\mathcal{O}((\\Delta t)^2)\\right].\n\nUsing \n\\Delta \\bm x = \\bm f(\\bm x,\\bm u,t)\\Delta t\n and noting that J^\\star and J_t^\\star are independent of \\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t, we get \n\\cancel{J^\\star (\\bm x,t)} = \\cancel{J^\\star (\\bm x,t)} + \\frac{\\partial J^\\star }{\\partial t}\\Delta t + \\min_{\\bm u(\\tau),\\;t\\leq\\tau\\leq t+\\Delta t}\\left[L\\Delta t + (\\nabla_{\\bm x} J^\\star )^\\top f\\Delta t\\right].\n\nAssuming \\Delta t\\rightarrow 0 leads to the celebrated Hamilton-Jacobi-Bellman (HJB) equation \\boxed{\n-\\frac{\\partial {\\color{blue}J^\\star (\\bm x(t),t)}}{\\partial t} = \\min_{\\bm u(t)}\\left[L(\\bm x(t),\\bm u(t),t)+(\\nabla_{\\bm x} {\\color{blue} J^\\star (\\bm x(t),t)})^\\top \\bm f(\\bm x(t),\\bm u(t),t)\\right].}\n\nThis is obviously a partial differential equation (PDE) for the optimal cost function J^\\star(\\bm x,t).\n\nBoundary conditions for the HJB equation\nSince the HJB equation is a differential equation, initial/boundary value(s) must be specified to determine a unique solution. In particular, since the equation is first-order with respect to both time and state, specifying the value of the optimal cost function at the final state and the final time is enough.\nFor a fixed-final-time, free-final-state, the optimal cost at the final time is \nJ^\\star (\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}).\n\nFor a fixed-final-time, fixed-final-state, since the component of the cost function corresponding to the terminal state is zero, the optimal cost at the final time is zero as well \nJ^\\star (\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = 0.\n\nWith the general final-state constraints introduced above, the boundary value condition reads \nJ^\\star (\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = \\phi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}),\\qquad \\text{on the hypersurface } \\psi(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = 0.\n\n\n\nOptimal control using the optimal cost (-to-go) function\nAssume now that the solution J^\\star (\\bm x(t),t) to the HJB equation is available. 
We can then find the optimal control by the minimization \\boxed\n{\\bm u^\\star(t) = \\arg\\min_{\\bm u(t)}\\left[L(\\bm x(t),\\bm u(t),t)+(\\nabla_{\\bm x} J^\\star (\\bm x(t),t))^\\top \\bm f(\\bm x(t),\\bm u(t),t)\\right].}\n\nFor convenience, the minimized function is often labelled as \nQ(\\bm x(t),\\bm u(t),t) = L(\\bm x(t),\\bm u(t),t)+(\\nabla_{\\bm x} J^\\star (\\bm x(t),t))^\\top \\bm f(\\bm x(t),\\bm u(t),t)\n and called just Q-function. The optimal control is then \n\\bm u^\\star(t) = \\arg\\min_{\\bm u(t)} Q(\\bm x(t),\\bm u(t),t).",
"crumbs": [
"9.X. Continuous-time optimal control – dynamic programming",
"Dynamic programming for continuous-time optimal control"
]
},
{
- "objectID": "cont_dp_HJB.html#hjb-equation-and-hamiltonian",
- "href": "cont_dp_HJB.html#hjb-equation-and-hamiltonian",
+ "objectID": "cont_dp_HJB.html#hjb-equation-formulated-using-a-hamiltonian",
+ "href": "cont_dp_HJB.html#hjb-equation-formulated-using-a-hamiltonian",
"title": "Dynamic programming for continuous-time optimal control",
- "section": "HJB equation and Hamiltonian",
- "text": "HJB equation and Hamiltonian\nRecall the definition of Hamiltonian H(\\bm x,\\bm u,\\bm \\lambda,t) = L(\\bm x,\\bm u,t) + \\boldsymbol{\\lambda}^\\top \\mathbf f(\\bm x,\\bm u,t). The HJB equation can also be written as \\boxed\n{-\\frac{\\partial J^\\star (\\bm x(t),t)}{\\partial t} = \\min_{\\bm u(t)}H(\\bm x(t),\\bm u(t),\\nabla_{\\bm x} J^\\star (\\bm x(t),t),t).}\n\nWhat we have just derived is one of the most profound results in optimal control – Hamiltonian must be minimized by the optimal control. We will exploit it next for some derivations.\nRecall also that we have already encountered a similar results that made statements about the necessary maximization (or minimization) of the Hamiltonian with respect to the control – the celebrated Pontryagin’s principle of maximum (or minimum).",
+ "section": "HJB equation formulated using a Hamiltonian",
+ "text": "HJB equation formulated using a Hamiltonian\nRecall the definition of Hamiltonian H(\\bm x,\\bm u,\\bm \\lambda,t) = L(\\bm x,\\bm u,t) + \\boldsymbol{\\lambda}^\\top \\mathbf f(\\bm x,\\bm u,t). The HJB equation can also be written as \\boxed\n{-\\frac{\\partial J^\\star (\\bm x(t),t)}{\\partial t} = \\min_{\\bm u(t)}H(\\bm x(t),\\bm u(t),\\nabla_{\\bm x} J^\\star (\\bm x(t),t),t).}\n\nWhat we have just derived is one of the most profound results in optimal control – Hamiltonian must be minimized by the optimal control. We will exploit it next as a tool for deriving some theoretical results.",
+ "crumbs": [
+ "9.X. Continuous-time optimal control – dynamic programming",
+ "Dynamic programming for continuous-time optimal control"
+ ]
+ },
+ {
+ "objectID": "cont_dp_HJB.html#hjb-equation-vs-pontryagins-principle-of-maximum-minimum",
+ "href": "cont_dp_HJB.html#hjb-equation-vs-pontryagins-principle-of-maximum-minimum",
+ "title": "Dynamic programming for continuous-time optimal control",
+ "section": "HJB equation vs Pontryagin’s principle of maximum (minimum)",
+ "text": "HJB equation vs Pontryagin’s principle of maximum (minimum)\nRecall also that we have already encountered a similar results that made statements about the necessary maximization (or minimization) of the Hamiltonian with respect to the control – the celebrated Pontryagin’s principle of maximum (or minimum). Are these two related? Equivalent?",
+ "crumbs": [
+ "9.X. Continuous-time optimal control – dynamic programming",
+ "Dynamic programming for continuous-time optimal control"
+ ]
+ },
+ {
+ "objectID": "cont_dp_HJB.html#hjb-equation-for-an-infinite-time-horizon",
+ "href": "cont_dp_HJB.html#hjb-equation-for-an-infinite-time-horizon",
+ "title": "Dynamic programming for continuous-time optimal control",
+ "section": "HJB equation for an infinite time horizon",
+ "text": "HJB equation for an infinite time horizon\nWhen both the system and the cost function are time-invariant, and the final time is infinite, that is, t_\\mathrm{f}=\\infty, the optimal cost function J^\\star() must necessarily be independent of time, that is, it’s partial derivative with respect to time is zero, that is, \\frac{\\partial J^\\star (\\bm x(t),t)}{\\partial t} = 0. The HJB equation then simplifies to\n\\boxed{\n0 = \\min_{\\bm u(t)}\\left[L(\\bm x(t),\\bm u(t))+(\\nabla_{\\bm x} {J^\\star (\\bm x(t),t)})^\\top \\bm f(\\bm x(t),\\bm u(t))\\right],}\n or, using a Hamiltonian \\boxed\n{0 = \\min_{\\bm u(t)}H(\\bm x(t),\\bm u(t),\\nabla_{\\bm x} J^\\star (\\bm x(t))).}",
"crumbs": [
"9.X. Continuous-time optimal control – dynamic programming",
"Dynamic programming for continuous-time optimal control"
@@ -455,7 +477,7 @@
"href": "cont_numerical_indirect.html#methods-for-solving-tp-bvp-ode",
"title": "Numerical methods for indirect approach",
"section": "Methods for solving TP-BVP ODE",
- "text": "Methods for solving TP-BVP ODE\nHere we assume that from the stationarity equation \n\\mathbf 0 = \\nabla_{\\bm u} H(\\bm x,\\bm u,\\bm \\lambda)\n we can express \\bm u(t) as a function of the the state and costate variables, \\bm x(t) and \\bm \\lambda(t), respectively. In fact, Pontryagin’s principles gives this expression as \\bm u^\\star(t) = \\text{arg} \\min_{\\bm u(t) \\in\\mathcal{U}} H(\\bm x^\\star(t),\\bm u(t), \\bm\\lambda^\\star(t)). And we substitute for \\bm u(t) into the state and costate equations. This way we eliminate \\bm u(t) from the system of DAEs and we are left with a system of ODEs for \\bm x(t) and \\bm \\lambda(t) only. Formally, the resulting Hamiltonian is a different function as it is now a functio of two variables only.\n\n\\begin{aligned}\n\\dot{\\bm{x}} &= \\nabla_{\\bm\\lambda} \\mathcal H(\\bm x,\\bm \\lambda) \\\\\n\\dot{\\bm{\\lambda}} &= -\\nabla_{\\bm x} \\mathcal H(\\bm x,\\bm \\lambda) \\\\\n\\bm x(t_\\mathrm{i}) &=\\mathbf x_\\mathrm{i}\\\\\n\\bm x(t_\\mathrm{f}) &= \\mathbf x_\\mathrm{f} \\qquad \\text{or} \\qquad \\bm \\lambda(t_\\mathrm{f}) = \\nabla\\phi(\\bm{x}(t_\\mathrm{f})).\n\\end{aligned}\n\nAlthough we now have an ODE system, it is still a BVP. Strictly speaking, from now on, arbitrary reference on numerical solution of boundary value problems can be consulted to get some overview – we no longer need to restrict ourselves to the optimal control literature and software. On the other hand, the right sides are not quite arbitrary – these are Hamiltonian equations – and this property could and perhaps even should be exploited by the solution methods.\nThe methods for solving general BVPs are generally divided into\n\nshooting and multiple shooting methods,\ndiscretization methods,\ncollocation methods.\n\n\nShooting methods\n\nShooting method outside optimal control\nHaving made the diclaimer that boundary value problems constitute a topic indenendent of the optimal control theory, we start their investigation within a control-unrelated setup. We consider a system of two ordinary differential equations in two variables with the value of the first variable specified at both ends while the value of the other variable is left unspecified \n\\begin{aligned}\n\\begin{bmatrix}\n \\dot y_1(t)\\\\\n \\dot y_2(t)\n\\end{bmatrix}\n&=\n\\begin{bmatrix}\nf_1(\\bm y,t)\\\\\nf_2(\\bm y,t)\n\\end{bmatrix}\\\\\ny_1(t_\\mathrm{i}) &= \\mathrm y_{1\\mathrm{i}},\\\\\ny_1(t_\\mathrm{f}) &= \\mathrm y_{1\\mathrm{f}}.\n\\end{aligned}\n\nAn idea for a solution method is this:\n\nGuess at the missing (unspecified) value y_{2\\mathrm{i}} of y_2 at the initial time t_\\mathrm{i},\nUse an IVP solver (for example ode45 in Matlab) to find the values of both variables over the whole interval [t_\\mathrm{i},t_\\mathrm{f}].\nCompare the simulated value of the state variable y_1 at the final time t_\\mathrm{f} and compare it with the boundary value .\nBased on the error e = y_1(t_\\mathrm{f})-\\mathrm y_{1\\mathrm{f}}, update y_{2\\mathrm{i}} and go back to step 2.\n\nHow shall the update in the step 4 be realized? The value of y_1 at the final time t_\\mathrm{f} and therefore the error e are functions of the value y_{2\\mathrm{i}} of y_2 at the initial time t_\\mathrm{i}. We can formally express this upon introducing a map F such that e = F(y_{2\\mathrm{i}}). The problem now boils down to solving the nonlinear equation \\boxed\n{F(y_{2\\mathrm{i}}) = 0.}\n\nIf Newton’s method is to be used for solving this equation, the derivative of F is needed. 
Most often than not, numerical solvers for IVP ODE have to be called in order to evaluate the function F, in which case the derivative cannot be determined analytically. Finite difference (FD) and algorithmic/automatic differentiation (AD) methods are available.\nIn this example we only considered y_1 and y_2 as scalar variables, but in general these could be vector variables, in which case a system of equations in the vector variable has to be solved. Instead of a single scalar derivative, its matrix version – Jacobian matrix – must be determined.\nBy now the reason for calling this method shooting is perhaps obvious. Indeed, the analogy with aiming and shooting a cannon is illustrative.\nAs another example, we consider the BVP for a pendulum.\n\nExample 1 (BVP for pendulum) For an ideal pendulum described by the second-order model \\ddot \\theta + \\frac{b}{ml^2}\\dot \\theta + \\frac{g}{l} \\sin(\\theta) = 0 and for a final time t_\\mathrm{f}, at which some prescribed value of \\theta(t_\\mathrm{f}) must be achieved, compute by the shooting method the needed value of the initial angle \\theta_\\mathrm{i}, while assuming the initial angular rate \\omega_\\mathrm{i} is zero.\n\n\nShow the code\nusing DifferentialEquations\nusing Roots\nusing Plots\n\nfunction demo_shoot_pendulum()\n θfinal = -0.2;\n tfinal = 3.5;\n tspan = (0.0,tfinal)\n tol = 1e-5\n function pendulum!(dx,x,p,t)\n g = 9.81\n l = 1.0;\n m = 1.0;\n b = 0.1;\n a₁ = g/l\n a₂ = b/(m*l^2)\n θ,ω = x\n dx[1] = ω\n dx[2] = -a₁*sin(θ) - a₂*ω\n end\n prob = ODEProblem(pendulum!,zeros(Float64,2),tspan)\n function F(θ₀::Float64)\n xinitial = [θ₀,0.0]\n prob = remake(prob,u0=xinitial)\n sol = solve(prob,Tsit5(),reltol=tol/10,abstol=tol/10)\n return θfinal-sol[end][1]\n end\n θinitial = find_zero(F,(-pi,pi)) # Solving the equation F(θ)=0 using Roots package. In general can find more solutions.\n xinitial = [θinitial,0.0]\n prob = remake(prob,u0=xinitial) # Already solved in F(), but we solve it again for plotting.\n sol = solve(prob,Tsit5())\n p1 = plot(sol,lw=2,xlabel=\"Time\",ylabel=\"Angle\",label=\"θ\",idxs=(1))\n scatter!([tfinal],[θfinal],label=\"Required terminal θ\")\n p2 = plot(sol,lw=2,xlabel=\"Time\",ylabel=\"Angular rate\",label=\"ω\",idxs=(2))\n display(plot(p1,p2,layout=(2,1)))\nend\n\ndemo_shoot_pendulum()\n\n\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 1: State responses for a pendulum on a given time interval, with zero initial angular rate and the initial angle solved for numerically so that the final angle attains a give value\n\n\n\n\n\nA few general comments to the above code:\n\nThe function F(\\theta_\\mathrm{i}) that defines the nonlinear equation F(\\theta_\\mathrm{i})=0 calles a numerical solver for an IVP ODE. The latter solver then should have the numerical tolerances set more stringent than the former.\nThe ODE problem should only be defined once and then in each iteration its parameters should be updated. In Julia, this is done by the remake function, but it may be similar for other languages.\n\n\n\nShooting method for indirect approach to optimal control\nWe finally bring the method into the realm of indirect approach to optimal control – it is the initial value \\lambda_\\mathrm{i} of the costate variable that serves as an optimization variable, while the initial value x_\\mathrm{i} of the state variable is known and fixed. 
The final values of both the state and costate variables are the outcomes of numerical simulation obtained using a numerical solver for an IVP ODE. Based on these, the residual is computed. Either as e = x(t_\\mathrm{f})-x_\\mathrm{f} if the final state is fixed, or as e = \\lambda(t_\\mathrm{f}) - \\nabla \\phi(x(t_\\mathrm{f})) if the final state is free. Based on this residual, the initial value of the costate is updated and another iteration of the algorithm is entered.\n\n\n\n\n\n\nFigure 2: Indirect shooting\n\n\n\n\nExample 2 (Shooting for indirect approach to LQR) Standard LQR optimal control for a second-order system on a fixed finite interval with a fixed final state.\n\n\nShow the code\nusing LinearAlgebra\nusing DifferentialEquations\nusing NLsolve\n\nfunction shoot_lq_fixed(A,B,Q,R,xinitial,xfinal,tfinal)\n n = size(A)[1]\n function statecostateeq!(dw,w,p,t)\n x = w[1:n]\n λ = w[(n+1):end]\n dw[1:n] = A*x - B*(R\\B'*λ)\n dw[(n+1):end] = -Q*x - A'*λ\n end\n λinitial = zeros(n)\n tspan = (0.0,tfinal)\n tol = 1e-5\n function F(λinitial)\n winitial = vcat(xinitial,λinitial)\n prob = ODEProblem(statecostateeq!,winitial,tspan)\n dsol = solve(prob,Tsit5(),abstol=tol/10,reltol=tol/10)\n xfinalsolved = dsol[end][1:n]\n return (xfinal-xfinalsolved)\n end\n nsol = nlsolve(F,λinitial,xtol=tol) # Could add autodiff=:forward.\n λinitial = nsol.zero # Solving once again for plotting.\n winitial = vcat(xinitial,λinitial)\n prob = ODEProblem(statecostateeq!,winitial,tspan)\n dsol = solve(prob,Tsit5(),abstol=tol/10,reltol=tol/10)\n return dsol\nend\n\nfunction demo_shoot_lq_fixed()\n n = 2 # Order of the system.\n m = 1 # Number of inputs.\n A = rand(n,n) # Matrices modeling the system.\n B = rand(n,m)\n \n Q = diagm(0=>rand(n)) # Weighting matrices for the quadratic cost function.\n R = rand(1,1)\n\n xinitial = [1.0, 2.0]\n xfinal = [3.0, 4.0]\n tfinal = 5.0 \n\n dsol = shoot_lq_fixed(A,B,Q,R,xinitial,xfinal,tfinal)\n\n p1 = plot(dsol,idxs=(1:2),lw=2,legend=false,xlabel=\"Time\",ylabel=\"State\")\n p2 = plot(dsol,idxs=(3:4),lw=2,legend=false,xlabel=\"Time\",ylabel=\"Costate\")\n display(plot(p1,p2,layout=(2,1)))\nend\n\ndemo_shoot_lq_fixed()\n\n\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nFigure 3: Shooting method applied to the indirect approach to LQR problem\n\n\n\n\n\n\n\n\nMultiple shooting methods\nThe key deficiency of the shooting method is that the only source of the error is the error in the initial condition, this error then amplifies as it propagates over the whole time interval as the numerical integration proceeds, and consequently the residual is very sensitive to tiny changes in the initial value. The multiple shooting method is a remedy for this. The idea is to divide the interval [t_\\mathrm{i},t_\\mathrm{f}] into N subintervals [t_k,t_{k+1}] and to introduce the values of the state and co-state variable at the beginning of each subinterval as additional variables. Additional equations are then introduced that enforce the continuity of the variable at the end of one subinterval and at the beginning of the next subinterval.\n\n\n\n\n\n\nFigure 4: Indirect multiple shooting\n\n\n\n\n\nDiscretization methods\nShooting methods take advantage of availability of solvers for IVP ODEs. These solvers produce discret(ized) trajectories, proceeding (integration) step by step, forward in time. But they do this in a way hidden from users. 
+ "text": "Methods for solving TP-BVP ODE\nHere we assume that from the stationarity equation \n\\mathbf 0 = \\nabla_{\\bm u} H(\\bm x,\\bm u,\\bm \\lambda)\n we can express \\bm u(t) as a function of the the state and costate variables, \\bm x(t) and \\bm \\lambda(t), respectively. In fact, Pontryagin’s principles gives this expression as \\bm u^\\star(t) = \\text{arg} \\min_{\\bm u(t) \\in\\mathcal{U}} H(\\bm x^\\star(t),\\bm u(t), \\bm\\lambda^\\star(t)). And we substitute for \\bm u(t) into the state and costate equations. This way we eliminate \\bm u(t) from the system of DAEs and we are left with a system of ODEs for \\bm x(t) and \\bm \\lambda(t) only. Formally, the resulting Hamiltonian is a different function as it is now a functio of two variables only.\n\n\\begin{aligned}\n\\dot{\\bm{x}} &= \\nabla_{\\bm\\lambda} \\mathcal H(\\bm x,\\bm \\lambda) \\\\\n\\dot{\\bm{\\lambda}} &= -\\nabla_{\\bm x} \\mathcal H(\\bm x,\\bm \\lambda) \\\\\n\\bm x(t_\\mathrm{i}) &=\\mathbf x_\\mathrm{i}\\\\\n\\bm x(t_\\mathrm{f}) &= \\mathbf x_\\mathrm{f} \\qquad \\text{or} \\qquad \\bm \\lambda(t_\\mathrm{f}) = \\nabla\\phi(\\bm{x}(t_\\mathrm{f})).\n\\end{aligned}\n\nAlthough we now have an ODE system, it is still a BVP. Strictly speaking, from now on, arbitrary reference on numerical solution of boundary value problems can be consulted to get some overview – we no longer need to restrict ourselves to the optimal control literature and software. On the other hand, the right sides are not quite arbitrary – these are Hamiltonian equations – and this property could and perhaps even should be exploited by the solution methods.\nThe methods for solving general BVPs are generally divided into\n\nshooting and multiple shooting methods,\ndiscretization methods,\ncollocation methods.\n\n\nShooting methods\n\nShooting method outside optimal control\nHaving made the diclaimer that boundary value problems constitute a topic indenendent of the optimal control theory, we start their investigation within a control-unrelated setup. We consider a system of two ordinary differential equations in two variables with the value of the first variable specified at both ends while the value of the other variable is left unspecified \n\\begin{aligned}\n\\begin{bmatrix}\n \\dot y_1(t)\\\\\n \\dot y_2(t)\n\\end{bmatrix}\n&=\n\\begin{bmatrix}\nf_1(\\bm y,t)\\\\\nf_2(\\bm y,t)\n\\end{bmatrix}\\\\\ny_1(t_\\mathrm{i}) &= \\mathrm y_{1\\mathrm{i}},\\\\\ny_1(t_\\mathrm{f}) &= \\mathrm y_{1\\mathrm{f}}.\n\\end{aligned}\n\nAn idea for a solution method is this:\n\nGuess at the missing (unspecified) value y_{2\\mathrm{i}} of y_2 at the initial time t_\\mathrm{i},\nUse an IVP solver (for example ode45 in Matlab) to find the values of both variables over the whole interval [t_\\mathrm{i},t_\\mathrm{f}].\nCompare the simulated value of the state variable y_1 at the final time t_\\mathrm{f} and compare it with the boundary value .\nBased on the error e = y_1(t_\\mathrm{f})-\\mathrm y_{1\\mathrm{f}}, update y_{2\\mathrm{i}} and go back to step 2.\n\nHow shall the update in the step 4 be realized? The value of y_1 at the final time t_\\mathrm{f} and therefore the error e are functions of the value y_{2\\mathrm{i}} of y_2 at the initial time t_\\mathrm{i}. We can formally express this upon introducing a map F such that e = F(y_{2\\mathrm{i}}). The problem now boils down to solving the nonlinear equation \\boxed\n{F(y_{2\\mathrm{i}}) = 0.}\n\nIf Newton’s method is to be used for solving this equation, the derivative of F is needed. 
More often than not, numerical solvers for IVP ODE have to be called in order to evaluate the function F, in which case the derivative cannot be determined analytically. Finite difference (FD) and algorithmic/automatic differentiation (AD) methods are available.\nIn this example we only considered y_1 and y_2 as scalar variables, but in general these could be vector variables, in which case a system of equations in the vector variable has to be solved. Instead of a single scalar derivative, its matrix version – the Jacobian matrix – must be determined.\nBy now the reason for calling this method shooting is perhaps obvious. Indeed, the analogy with aiming and shooting a cannon is illustrative.\nAs another example, we consider the BVP for a pendulum.\n\nExample 1 (BVP for pendulum) For an ideal pendulum described by the second-order model \\ddot \\theta + \\frac{b}{ml^2}\\dot \\theta + \\frac{g}{l} \\sin(\\theta) = 0 and for a final time t_\\mathrm{f}, at which some prescribed value of \\theta(t_\\mathrm{f}) must be achieved, compute by the shooting method the needed value of the initial angle \\theta_\\mathrm{i}, while assuming the initial angular rate \\omega_\\mathrm{i} is zero.\n\n\nShow the code\nusing DifferentialEquations\nusing Roots\nusing Plots\n\nfunction demo_shoot_pendulum()\n θfinal = -0.2;\n tfinal = 3.5;\n tspan = (0.0,tfinal)\n tol = 1e-5\n function pendulum!(dx,x,p,t)\n g = 9.81\n l = 1.0;\n m = 1.0;\n b = 0.1;\n a₁ = g/l\n a₂ = b/(m*l^2)\n θ,ω = x\n dx[1] = ω\n dx[2] = -a₁*sin(θ) - a₂*ω\n end\n prob = ODEProblem(pendulum!,zeros(Float64,2),tspan)\n function F(θ₀::Float64)\n xinitial = [θ₀,0.0]\n prob = remake(prob,u0=xinitial)\n sol = solve(prob,Tsit5(),reltol=tol/10,abstol=tol/10)\n return θfinal-sol[end][1]\n end\n θinitial = find_zero(F,(-pi,pi)) # Solving the equation F(θ)=0 using the Roots package. In general it can find more solutions.\n xinitial = [θinitial,0.0]\n prob = remake(prob,u0=xinitial) # Already solved in F(), but we solve it again for plotting.\n sol = solve(prob,Tsit5())\n p1 = plot(sol,lw=2,xlabel=\"Time\",ylabel=\"Angle\",label=\"θ\",idxs=(1))\n scatter!([tfinal],[θfinal],label=\"Required terminal θ\")\n p2 = plot(sol,lw=2,xlabel=\"Time\",ylabel=\"Angular rate\",label=\"ω\",idxs=(2))\n display(plot(p1,p2,layout=(2,1)))\nend\n\ndemo_shoot_pendulum()\n\n\nFigure 1: State responses for a pendulum on a given time interval, with zero initial angular rate and the initial angle solved for numerically so that the final angle attains a given value\n\n\nA few general comments on the above code:\n\nThe function F(\\theta_\\mathrm{i}) that defines the nonlinear equation F(\\theta_\\mathrm{i})=0 calls a numerical solver for an IVP ODE. This inner IVP solver should have its numerical tolerances set more stringently than those of the outer nonlinear-equation solver.\nThe ODE problem should only be defined once and then in each iteration its parameters should be updated. In Julia, this is done by the remake function; other languages and libraries offer similar facilities.\n\n\n\nShooting method for indirect approach to optimal control\nWe finally bring the method into the realm of the indirect approach to optimal control – it is the initial value \\lambda_\\mathrm{i} of the costate variable that serves as an optimization variable, while the initial value x_\\mathrm{i} of the state variable is known and fixed. 
The final values of both the state and costate variables are the outcomes of a numerical simulation obtained using a numerical solver for an IVP ODE. Based on these, the residual is computed: either as e = x(t_\\mathrm{f})-x_\\mathrm{f} if the final state is fixed, or as e = \\lambda(t_\\mathrm{f}) - \\nabla \\phi(x(t_\\mathrm{f})) if the final state is free. Based on this residual, the initial value of the costate is updated and another iteration of the algorithm is entered.\n\n\nFigure 2: Indirect shooting\n\n\nExample 2 (Shooting for indirect approach to LQR) Standard LQR optimal control for a second-order system on a fixed finite interval with a fixed final state.\n\n\nShow the code\nusing LinearAlgebra\nusing DifferentialEquations\nusing NLsolve\n\nfunction shoot_lq_fixed(A,B,Q,R,xinitial,xfinal,tfinal)\n n = size(A)[1]\n function statecostateeq!(dw,w,p,t)\n x = w[1:n]\n λ = w[(n+1):end]\n dw[1:n] = A*x - B*(R\\B'*λ)\n dw[(n+1):end] = -Q*x - A'*λ\n end\n λinitial = zeros(n)\n tspan = (0.0,tfinal)\n tol = 1e-5\n function F(λinitial)\n winitial = vcat(xinitial,λinitial)\n prob = ODEProblem(statecostateeq!,winitial,tspan)\n dsol = solve(prob,Tsit5(),abstol=tol/10,reltol=tol/10)\n xfinalsolved = dsol[end][1:n]\n return (xfinal-xfinalsolved)\n end\n nsol = nlsolve(F,λinitial,xtol=tol) # Could add autodiff=:forward.\n λinitial = nsol.zero # Solving once again for plotting.\n winitial = vcat(xinitial,λinitial)\n prob = ODEProblem(statecostateeq!,winitial,tspan)\n dsol = solve(prob,Tsit5(),abstol=tol/10,reltol=tol/10)\n return dsol\nend\n\nfunction demo_shoot_lq_fixed()\n n = 2 # Order of the system.\n m = 1 # Number of inputs.\n A = rand(n,n) # Matrices modeling the system.\n B = rand(n,m)\n \n Q = diagm(0=>rand(n)) # Weighting matrices for the quadratic cost function.\n R = rand(1,1)\n\n xinitial = [1.0, 2.0]\n xfinal = [3.0, 4.0]\n tfinal = 5.0 \n\n dsol = shoot_lq_fixed(A,B,Q,R,xinitial,xfinal,tfinal)\n\n p1 = plot(dsol,idxs=(1:2),lw=2,legend=false,xlabel=\"Time\",ylabel=\"State\")\n p2 = plot(dsol,idxs=(3:4),lw=2,legend=false,xlabel=\"Time\",ylabel=\"Costate\")\n display(plot(p1,p2,layout=(2,1)))\nend\n\ndemo_shoot_lq_fixed()\n\n\nFigure 3: Shooting method applied to the indirect approach to LQR problem\n\n\nMultiple shooting methods\nThe key deficiency of the shooting method is that the only source of the error is the error in the initial condition; this error then amplifies as it propagates over the whole time interval as the numerical integration proceeds, and consequently the residual is very sensitive to tiny changes in the initial value. The multiple shooting method is a remedy for this. The idea is to divide the interval [t_\\mathrm{i},t_\\mathrm{f}] into N subintervals [t_k,t_{k+1}] and to introduce the values of the state and costate variables at the beginning of each subinterval as additional variables. Additional equations are then introduced that enforce the continuity of these variables at the end of one subinterval and the beginning of the next subinterval.\n\n\nFigure 4: Indirect multiple shooting\n\n\nDiscretization methods\nShooting methods take advantage of the availability of solvers for IVP ODEs. These solvers produce discret(ized) trajectories, proceeding (integration) step by step, forward in time. But they do this in a way hidden from users. 
We just have to set the initial conditions (possibly through numerical optimization) and the solver does the rest.\nAlternatively, the formulas for the discrete-time updates are not evaluated one by one, step by step, running forward in time, but are assembled to form a system of equations, in general nonlinear ones. Appropriate boundary conditions are then added to these nonlinear equations and the whole system is then solved numerically, yielding a discrete approximation of the trajectories satisfying the BVP.\nSince all these equations are solved simultaneously (as a system of equations), there is no advantage in using explicit methods for solving ODEs, and implicit methods are used instead.\nIt is now time to recall some crucial results from the numerical methods for solving ODEs. First, we start with the popular single-step methods known as Runge-Kutta (RK) methods.\nWe consider the standard ODE \\dot x(t) = f(x(t),t),\nand we define the Butcher tableau as \n \\begin{array}{ l | c c c c }\n c_1 & a_{11} & a_{12} & \\ldots & a_{1s}\\\\\n c_2 & a_{21} & a_{22} & \\ldots & a_{2s}\\\\\n \\vdots & \\vdots\\\\\n c_s & a_{s1} & a_{s2} & \\ldots & a_{ss}\\\\\n \\hline\n & b_{1} & b_{2} & \\ldots & b_{s}\n \\end{array}.\n such that c_i = \\sum_{j=1}^s a_{ij}, and 1 = \\sum_{j=1}^s b_{j}.\nReferring to this Butcher tableau, a single step of the method is \n \\begin{aligned}\n f_{k1} &= f\\left(x_k + h_k \\left(a_{11}f_{k1}+a_{12}f_{k2} + \\ldots + a_{1s}f_{ks}\\right),t_k+c_1h_k\\right)\\\\\n f_{k2} &= f\\left(x_k + h_k \\left(a_{21}f_{k1}+a_{22}f_{k2} + \\ldots + a_{2s}f_{ks}\\right),t_k+c_2h_k\\right)\\\\\n \\vdots\\\\\n f_{ks} &= f\\left(x_k + h_k \\left(a_{s1}f_{k1}+a_{s2}f_{k2} + \\ldots + a_{ss}f_{ks}\\right),t_k+c_sh_k\\right)\\\\\n x_{k+1} &= x_k + h_k \\left(b_1 f_{k1}+b_2f_{k2} + \\ldots + b_sf_{ks}\\right).\n \\end{aligned}\n\nIf the matrix A of the coefficients a_{ij} is strictly lower triangular, that is, if a_{ij} = 0 for all j\\geq i, the method belongs to explicit Runge-Kutta methods, otherwise it belongs to implicit Runge-Kutta methods.\nA prominent example of explicit RK methods is the 4-stage RK method (oftentimes referred to as RK4).\n\nExplicit RK4 method\nThe Butcher tableau for the method is \n \\begin{array}{ l | c c c c }\n 0 & 0 & 0 & 0 & 0\\\\\n 1/2 & 1/2 & 0 & 0 & 0\\\\\n 1/2 & 0 & 1/2 & 0 & 0\\\\\n 1 & 0 & 0 & 1 & 0\\\\\n \\hline\n & 1/6 & 1/3 & 1/3 & 1/6\n \\end{array}.\n\nFollowing the Butcher tableau, a single step of this method is \n \\begin{aligned}\n f_{k1} &= f(x_k,t_k)\\\\\n f_{k2} &= f\\left(x_k + \\frac{h_k}{2}f_{k1},t_k+\\frac{h_k}{2}\\right)\\\\\n f_{k3} &= f\\left(x_k + \\frac{h_k}{2}f_{k2},t_k+\\frac{h_k}{2}\\right)\\\\\n f_{k4} &= f\\left(x_k + h_k f_{k3},t_k+h_k\\right)\\\\\n x_{k+1} &= x_k + h_k \\left(\\frac{1}{6} f_{k1}+\\frac{1}{3}f_{k2} + \\frac{1}{3}f_{k3} + \\frac{1}{6}f_{k4}\\right)\n \\end{aligned}.\n\nBut as we have just mentioned, explicit methods are not particularly useful for solving BVPs. We prefer implicit methods. 
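To make the Butcher-tableau notation concrete before turning to the implicit methods, here is a minimal sketch (illustrative only; the function names and the test ODE are not from the original text) of a single explicit RK4 step, together with a single step of an implicit method – the midpoint rule introduced next – whose stage value has to be solved for, here by a naive fixed-point iteration.

# One step of the explicit RK4 method, read off directly from its Butcher tableau.
function rk4_step(f, xk, tk, h)
    f1 = f(xk, tk)
    f2 = f(xk + h/2*f1, tk + h/2)
    f3 = f(xk + h/2*f2, tk + h/2)
    f4 = f(xk + h*f3, tk + h)
    return xk + h*(f1/6 + f2/3 + f3/3 + f4/6)
end

# One step of the implicit midpoint method: the stage equation
# f_k1 = f(x_k + (h/2) f_k1, t_k + h/2) is solved by fixed-point iteration.
function implicit_midpoint_step(f, xk, tk, h; iters = 50)
    f1 = f(xk, tk)                      # initial guess for the stage value
    for _ in 1:iters
        f1 = f(xk + h/2*f1, tk + h/2)
    end
    return xk + h*f1
end

f(x, t) = -x                            # simple test ODE  dx/dt = -x
rk4_step(f, 1.0, 0.0, 0.1)              # ≈ exp(-0.1) ≈ 0.9048
implicit_midpoint_step(f, 1.0, 0.0, 0.1)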
One of the simplest is the implicit midpoint method.\n\n\nImplicit midpoint method\nThe Butcher tableau is \n \\begin{array}{ l | c }\n 1/2 & 1/2 \\\\\n \\hline\n & 1\n \\end{array}\n\nA single step is then \\begin{aligned}\n f_{k1} &= f\\left(x_k+\\frac{1}{2}f_{k1} h_k, t_k+\\frac{1}{2}h_k\\right)\\\\\n x_{k+1} &= x_k + h_k f_{k1}.\n \\end{aligned}\nAdding x_k to both sides of the last equation, we get x_{k+1} + x_k = 2x_k + h_k f_{k1}.\nDividing by two, we get \\frac{1}{2}(x_{k+1} + x_k) = x_k + \\frac{1}{2}h_k f_{k1} and then it follows that \\boxed{x_{k+1} = x_k + h_k f\\left(\\frac{1}{2}(x_k+x_{k+1}),t_k+\\frac{1}{2}h_k\\right).}\nThe right hand side of the last equation explains the “midpoint” in the name. It can be viewed as a rectangular approximation to the integral in x_{k+1} = x_k + \\int_{t_k}^{t_{k+1}} f(x(t),t)\\mathrm{d}t as the integral is computed as an area of a rectangle with the height determined by f() evaluated at the midpoint.\nAlthough we do not explain the details here, let’s just note that it is the simplest of the collocation methods. In particular it belongs to Gauss (also Gauss-Legendre) methods.\n\n\nImplicit trapezoidal method\nThe method can be viewed both as a single-step (RK) method and a multi-step method. When viewed as an RK method, its Butcher tableau is \n \\begin{array}{ l | c r }\n 0 & 0 & 0 \\\\\n 1 & 1/2 & 1/2 \\\\\n \\hline\n & 1/2 & 1/2 \\\\\n \\end{array}\n Following the Butcher tableau, a single step of the method is then \n \\begin{aligned}\n f_{k1} &= f(x_k,t_k)\\\\\n f_{k2} &= f(x_k + h_k \\frac{f_{k1}+f_{k2}}{2},t_k+h_k)\\\\\n x_{k+1} &= x_k + h_k \\left(\\frac{1}{2} f_{k1}+\\frac{1}{2} f_{k2}\\right).\n \\end{aligned}\n\nBut since the collocation points are identical with the nodes (grid/mesh points), we can relabel to \\begin{aligned}\n f_{k} &= f(x_k,t_k)\\\\\n f_{k+1} &= f(x_{k+1},t_{k+1})\\\\\n x_{k+1} &= x_k + h_k \\left(\\frac{1}{2} f_{k}+\\frac{1}{2} f_{k+1}\\right).\n \\end{aligned}\n\nThis possibility is a particular advantage of Lobatto and Radau methods, which contain both end points of the interval or just one of them, respectively, among the collocation points. The two symbols f_k and f_{k+1} are really just shorthands for values of the function f at the beginning and the end of the integration interval, so the first two equations of the triple above are not really equations to be solved but rather definitions. And we can assemble it all into just one equation \\boxed{\n x_{k+1} = x_k + h_k \\frac{f(x_k,t_k)+f(x_{k+1},t_{k+1})}{2}.\n }\n\nThe right hand side of the last equation explains the “trapezoidal” in the name. It can be viewed as a trapezoidal approximation to the integral in x_{k+1} = x_k + \\int_{t_k}^{t_{k+1}} f(x(t),t)\\mathrm{d}t as the integral is computed as an area of a trapezoid.\nWhen it comes to building a system of equations within transcription methods, we typically move all the terms to one side to obtain the defect equations x_{k+1} - x_k - h_k \\left(\\frac{1}{2} f(x_k,t_k)+\\frac{1}{2} f(x_{k+1},t_{k+1})\\right) = 0.\n\n\nHermite-Simpson method\nIt belongs to the family of Lobatto III methods, namely it is a 3-stage Lobatto IIIA method. 
Butcher tableau \n \\begin{array}{ l | c c c }\n 0 & 0 & 0 & 0\\\\\n 1/2 & 5/24 & 1/3 & -1/24\\\\\n 1 & 1/6 & 2/3 & 1/6\\\\\n \\hline\n & 1/6 & 2/3 & 1/6\n \\end{array}\n\nThe Hermite-Simpson method can come in three forms (this classification is from Betts (2020)):\n\nPrimary form\nThere are two equations for the given integration interval [t_k,t_{k+1}]: x_{k+1} = x_k + h_k \\left(\\frac{1}{6}f_k + \\frac{2}{3}f_{k2} + \\frac{1}{6}f_{k+1}\\right), x_{k2} = x_k + h_k \\left(\\frac{5}{24}f_k + \\frac{1}{3}f_{k2} - \\frac{1}{24}f_{k+1}\\right), where the f symbols are just shorthand notations for values of the function at a certain point, f_k = f(x_k,u(t_k),t_k), f_{k2} = f(x_{k2},u(t_{k2}),t_{k2}), f_{k+1} = f(x_{k+1},u(t_{k+1}),t_{k+1}), and the off-grid time t_{k2} is given by t_{k2} = t_k + \\frac{1}{2}h_k.\nThe first of the two equations can be recognized as Simpson’s rule for computing a definite integral. Note that when considering the right hand sides as functions of the control inputs, we also state explicitly at which time (the collocation time) the value of the control variable is taken.\nThis generality allows considering general control inputs and not only piecewise constant control inputs. For example, if we consider piecewise linear control inputs, then u(t_{k2}) = \\frac{u_k + u_{k+1}}{2}. But if we stick to the (more common) piecewise constant controls, not surprisingly u(t_{k2}) = u_k. Typically we format the equations as defect equations, that is, with zero on the right hand side \n\\begin{aligned}\nx_{k+1} - x_k - h_k \\left(\\frac{1}{6}f_k + \\frac{2}{3}f_{k2} + \\frac{1}{6}f_{k+1}\\right) &= 0,\\\\\nx_{k2} - x_k - h_k \\left(\\frac{5}{24}f_k + \\frac{1}{3}f_{k2} - \\frac{1}{24}f_{k+1}\\right) &= 0.\n\\end{aligned}\n\nThe optimization variables for every integration interval are x_k, u_k, x_{k2}, u_{k2}.\n\n\nHermite-Simpson Separated (HSS) method\nAlternatively, we can express f_{k2} from the first equation as a function of the remaining terms and then substitute it into the second equation. This transforms the second equation such that only the terms indexed with k and k+1 are present. \n\\begin{aligned}\nx_{k+1} - x_k - h_k \\left(\\frac{1}{6}f_k + \\frac{2}{3}f_{k2} + \\frac{1}{6}f_{k+1}\\right) &= 0,\\\\\nx_{k2} - \\frac{x_k + x_{k+1}}{2} - \\frac{h_k}{8} \\left(f_k - f_{k+1}\\right) &= 0.\n\\end{aligned}\n\nWhile we already know (from a paragraph above) that the first equation is Simpson’s rule, the second equation is an outcome of Hermite interpolation. Hence the name. The optimization variables for every integration interval are the same as before, that is, x_k, u_k, x_{k2}, u_{k2}.\n\n\nHermite-Simpson Condensed (HSC) method\nSome further simplification can be obtained from HSS. Namely, the second equation can be used to directly prescribe x_{k2}, x_{k2} = \\frac{x_k + x_{k+1}}{2} + \\frac{h_k}{8} \\left(f_k - f_{k+1}\\right), which is then used in the first equation as an argument of the f() function (represented by the symbol f_{k2}); thereby the second equation and the variable x_{k2} are eliminated from the set of defect equations. The optimization variables for every integration interval still need to contain u_{k2} even though x_{k2} was eliminated, because it is needed to parameterize f_{k2}. That is, the optimization variables then are x_k, u_k, u_{k2}. According to Betts, this form has been widely used and was historically one of the first. 
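To make the primary form concrete, here is a minimal sketch of the two Hermite-Simpson defect equations evaluated on a single interval (the function name is ours, and piecewise constant control, u(t_k2) = u_k, is assumed).

# Hermite-Simpson defect equations (primary form) for one interval [t_k, t_k + h],
# assuming piecewise constant control on the interval, i.e. u(t_k2) = u_k.
function hermite_simpson_defects(f, xk, xk2, xkp1, uk, tk, h)
    tk2  = tk + h/2                       # the off-grid (collocation) time
    fk   = f(xk,   uk, tk)
    fk2  = f(xk2,  uk, tk2)
    fkp1 = f(xkp1, uk, tk + h)
    d1 = xkp1 - xk - h*(fk/6 + 2*fk2/3 + fkp1/6)     # Simpson-rule defect
    d2 = xk2  - xk - h*(5*fk/24 + fk2/3 - fkp1/24)   # midpoint-state defect
    return d1, d2
end

In a transcription, such defects are stacked over all intervals, appended with the boundary conditions, and handed over to the solver as equality constraints; both returned defects must vanish at a solution, and x_k, x_k2, x_{k+1}, and u_k are among the optimization variables.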
When it comes to using the HSC form in optimal control, however, it turns out that the sparsity pattern is better for the HSS.\n\n\n\n\nCollocation methods\nYet another family of methods for solving the BVP for the ODE \\dot x(t) = f(x(t),t) is the family of collocation methods. They are also based on discretization of the independent variable – the time t. That is, on the interval [t_\\mathrm{i}, t_\\mathrm{f}], discretization points (or grid points or nodes or knots) are chosen, say, t_0, t_1, \\ldots, t_N, where t_0 = t_\\mathrm{i} and t_N = t_\\mathrm{f}. The solution x(t) is then approximated by a polynomial p_k(t) of a certain degree s on each interval [t_k,t_{k+1}] of length h_k=t_{k+1}-t_k:\np_k(t) = p_{k0} + p_{k1}(t-t_k) + p_{k2}(t-t_k)^2+\\ldots + p_{ks}(t-t_k)^s.\nThe degree of the polynomial is low, say s=3 or so, certainly well below 10. With N subintervals, the total number of coefficients to parameterize the (approximation of the) solution x(t) over the whole interval is then N(s+1). For example, for s=3 and N=10, we have 40 coefficients: p_{00}, p_{01}, p_{02}, p_{03}, p_{10}, p_{11}, p_{12}, p_{13},\\ldots, p_{90}, p_{91}, p_{92}, p_{93}.\n\n\nFigure 5: Indirect collocation\n\n\nFinding a solution amounts to determining all those coefficients. Once we have them, the (approximate) solution is given by a piecewise polynomial.\nHow to determine the coefficients? By interpolation. But we will see in a moment that two types of interpolation are needed – interpolation of the value of the solution and interpolation of the derivative of the solution.\nThe former is only performed at the beginning of each interval, that is, at every discretization point (or grid point or node or knot). The condition requires that the polynomial p_{k-1}(t) approximating the solution x(t) on the (k-1)th interval should attain the same value at the end of that interval, that is, at t_{k-1} + h_{k-1}, as the polynomial p_k(t) approximating the solution x(t) on the kth interval attains at the same point, which from its perspective is the beginning of the kth interval, that is, t_k. We express this condition formally as \\boxed{p_{k-1}(\\underbrace{t_{k-1}+h_{k-1}}_{t_{k}}) = p_k(t_k).}\nExpanding the two polynomials, we get p_{k-1,0} + p_{k-1,1}h_{k-1} + p_{k-1,2}h_{k-1}^2+\\ldots + p_{k-1,s}h_{k-1}^s = p_{k0}.\n\n\nSubscripts in the coefficients\n\nWe adopt the notational convention that a coefficient of a polynomial is indexed by two indices, the first one indicating the interval and the second one indicating the power of the corresponding term. For example, p_{k-1,2} is the coefficient of the quadratic term in the polynomial approximating the solution on the (k-1)th interval. For the sake of brevity, we omit the comma between the two subscripts in cases such as p_{k1} (instead of writing p_{k,1}). But we do write p_{k-1,0}, because here omitting the comma would introduce ambiguity.\n\n\nGood, we have one condition (one equation) for each subinterval. But we need more if polynomials of degree at least one are considered (already a first-degree polynomial is parameterized by two parameters, in which case one more equation is needed for each subinterval). Here comes the opportunity for the other kind of interpolation – interpolation of the derivative of the solution. At a given point (or points) that we call collocation points, the polynomial p_k(t) approximating the solution x(t) on the kth interval should satisfy the same differential equation \\dot x(t) = f(x(t),t) as the solution does. 
That is, we require that at\nt_{kj} = t_k + h_k c_{j}, \\quad j=1,\\ldots, s, which we call collocation points, the polynomial satisfies \\boxed{\\dot p_k(t_{kj}) = f(p_k(t_{kj}),t_{kj}), \\quad j=1,\\ldots, s.}\nExpressing the derivative of the polynomial on the left and expanding the polynomial itself on the right, we get \n\\begin{aligned}\np_{k1} + &2p_{k2}(t_{kj}-t_k)+\\ldots + s p_{ks}(t_{kj}-t_k)^{s-1} = \\\\ &f(p_{k0} + p_{k1}(t_{kj}-t_k) + p_{k2}(t_{kj}-t_k)^2 + \\ldots + p_{ks}(t_{kj}-t_k)^s, t_{kj}), \\quad j=1,\\ldots, s.\n\\end{aligned}\n\nThis gives us the complete set of equations for each interval. For the considered example of a cubic polynomial, we have one interpolation condition at the beginning of the interval and then three collocation conditions at the collocation points. In total, we have four equations for each interval. The number of equations is equal to the number of coefficients of the polynomial. Before the system of equations can be solved for the coefficients, we must specify the collocation points. Based on these, the collocation methods split into three families:\n\nGauss or Gauss-Legendre methods – the collocation points are strictly inside each interval.\nLobatto methods – the collocation points also include both ends of each interval.\nRadau methods – the collocation points include just one end of the interval.\n\n\nFigure 6: Single (sub)interval in indirect collocation – a cubic polynomial calls for three collocation points, two of which coincide with the discretization points (discrete times); the continuity is enforced at the discretization point at the beginning of the interval\n\n\nImportant\n\nAlthough in principle the collocation points could be arbitrary (but distinct), within a given family of methods, and for a given number of collocation points, some clever options are known that maximize accuracy.\n\n\n\nLinear polynomials\nBesides the piecewise constant approximation, which is too crude, not to speak of the discontinuity it introduces, the next simplest approximation of a solution x(t) on the interval [t_k,t_{k+1}] of length h_k=t_{k+1}-t_k is a linear (actually affine) polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k).\nOn the given kth interval it is parameterized by two parameters p_{k0} and p_{k1}, hence two equations are needed. The first equation enforces the continuity at the beginning of the interval \\boxed\n{p_{k-1,0} + p_{k-1,1}h_{k-1} = p_{k0}.}\n\nThe remaining single equation is the collocation condition at a single collocation point t_{k1} = t_k + h_k c_1, which remains to be chosen. One possible choice is c_1 = 1/2, that is, \nt_{k1} = t_k + \\frac{h_k}{2}.\n\nIn words, the collocation point is chosen in the middle of the interval. The collocation condition then reads \\boxed\n{p_{k1} = f\\left(p_{k0} + p_{k1}\\frac{h_k}{2}\\right).}\n\n\n\nQuadratic polynomials\nIf a quadratic polynomial is used to approximate the solution, the condition at the beginning of the interval is \\boxed\n{p_{k-1,0} + p_{k-1,1}h_{k-1} + p_{k-1,2}h_{k-1}^2 = p_{k0}.}\n\nTwo more equations – collocation conditions – are needed to specify all the three coefficients that parameterize the approximating polynomial on a given interval [t_k,t_{k+1}]. One intuitive (and actually clever) choice is to place the collocation points at the beginning and the end of the interval, that is, at t_k and t_{k+1}. 
The coefficients that parameterize the relative positions of the collocation points with respect to the interval are c_1=0 and c_2=1. The collocation conditions then read \\boxed\n{\\begin{aligned}\np_{k1} &= f(p_{k0}),\\\\\np_{k1} + 2p_{k2}h_{k} &= f(p_{k0} + p_{k1}h_k + p_{k2}h_k^2).\n\\end{aligned}}\n\n\n\nCubic polynomials\nWhen a cubic polynomial is used, the condition at the beginning of the kth interval is \\boxed\n{p_{k-1,0} + p_{k-1,1}h_{k-1} + p_{k-1,2}h_{k-1}^2+p_{k-1,3}h_{k-1}^3 = p_{k0}.}\n\nThree more equations are needed to determine all the four coefficients of the polynomial. Where to place the collocation points? One intuitive (and clever too) option is to place them at the beginning, in the middle, and at the end of the interval. The relative positions of the collocation points are then given by c_1=0, c_2=1/2, and c_3=1. The collocation conditions then read \\boxed\n{\\begin{aligned}\np_{k1} &= f\\left(p_{k0} + p_{k1}(t_{k1}-t_k) + p_{k2}(t_{k1}-t_k)^2 + p_{k3}(t_{k1}-t_k)^3\\right),\\\\\np_{k1} + 2p_{k2}\\frac{h_k}{2} + 3 p_{k3}\\left(\\frac{h_k}{2}\\right)^{2} &= f\\left(p_{k0} + p_{k1}\\frac{h_k}{2} + p_{k2}\\left(\\frac{h_k}{2}\\right)^2 + p_{k3}\\left(\\frac{h_k}{2} \\right)^3\\right),\\\\\np_{k1} + 2p_{k2}h_k + 3 p_{k3}h_k^{2} &= f\\left(p_{k0} + p_{k1}h_k + p_{k2}h_k^2 + p_{k3}h_k^3\\right).\n\\end{aligned}}\n\n\n\n\nCollocation methods are implicit Runge-Kutta methods\nAn important observation that we are going to make is that collocation methods can be viewed as implicit Runge-Kutta methods. But not all IRK methods can be viewed as collocation methods. In this section we show that the three implicit RK methods that we covered above are indeed (equivalent to) collocation methods. By the equivalence we mean that there is a linear relationship between the coefficients of the polynomial that approximates the solution on a given (sub)interval and the value of the solution at the discretization point together with the derivatives of the solution at the collocation points.\n\nImplicit midpoint method as a Gauss collocation method\nFor the given integration interval [t_k,t_{k+1}], we write down two equations that relate the two coefficients of the linear polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k) and an approximation x_k of x(t) at the beginning of the interval t_k, as well as an approximation of \\dot x(t) at the (single) collocation point t_{k1} = t_{k} + \\frac{h_k}{2}.\nIn particular, the first interpolation condition is p_k(t_k) = \\textcolor{red}{p_{k0} = x_k} \\approx x(t_k).\nThe second interpolation condition, the one on the derivative in the middle of the interval, is \\dot p_k\\left(t_k + \\frac{h_k}{2}\\right) = \\textcolor{red}{p_{k1} = f(x_{k1},t_{k1})} \\approx f(x(t_{k1}),t_{k1}).\nNote that here we introduced yet another unknown – the approximation x_{k1} of x(t_{k1}) at the collocation point t_{k1}. 
We can write it using the polynomial p_k(t) as \nx_{k1} = p_k\\left(t_k + \\frac{h_k}{2}\\right) = p_{k0} + p_{k1}\\frac{h_k}{2}.\n\nSubstituting for p_{k0} and p_{k1}, we get \nx_{k1} = x_k + f(x_{k1},t_{k1})\\frac{h_k}{2}.\n\nWe also introduce the notation f_{k1} for f(x_{k1},t_{k1}) and we can write an equation \nf_{k1} = f\\left(x_k + f_{k1}\\frac{h_k}{2}\\right).\n\nBut we want to find x_{k+1}, which we can accomplish by evaluating the polynomial p_k(t) at t_{k+1} = t_k+h_k \nx_{k+1} = x_k + f_{k1}h_k.\n\nCollecting the last two equations, we rederived the good old friend – the implicit midpoint method.\n\n\nImplicit trapezoidal method as a Lobatto collocation method\nFor the given integration interval [t_k,t_{k+1}], we write down three equations that relate the three coefficients of the quadratic polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k) + p_{k2}(t-t_k)^2 and an approximation x_k of x(t) at the beginning of the interval t_k, as well as approximations to \\dot x(t) at the two collocations points t_k and t_{k+1}.\nIn particular, the first interpolation condition is p_k(t_k) = \\textcolor{red}{p_{k0} = x_k} \\approx x(t_k).\nThe second interpolation condition, the one on the derivative at the beginning of the interval, the first collocation point, is \\dot p_k(t_k) = \\textcolor{red}{p_{k1} = f(x_k,t_k)} \\approx f(x(t_k),t_k).\nThe third interpolation condition, the one on the derivative at the second collocation point \\dot p_k(t_k+h_k) = \\textcolor{red}{p_{k1} + 2p_{k2} h_k = f(x_{k+1},t_{k+1})} \\approx f(x(t_{k+1}),t_{k+1}).\nAll the three conditions (emphasized in color above) can be written together as \n \\begin{bmatrix}\n 1 & 0 & 0\\\\\n 0 & 1 & 0\\\\\n 0 & 1 & 2 h_k\\\\\n \\end{bmatrix}\n \\begin{bmatrix}\n p_{k0} \\\\ p_{k1} \\\\ p_{k2}\n \\end{bmatrix}\n =\n \\begin{bmatrix}\n x_{k} \\\\ f(x_k,t_k) \\\\ f(x_{k+1},t_{k+1})\n \\end{bmatrix}.\n\nThe above system of linear equations can be solved by inverting the matrix \n \\begin{bmatrix}\n p_{k0} \\\\ p_{k1} \\\\ p_{k2}\n \\end{bmatrix}\n =\n \\begin{bmatrix}\n 1 & 0 & 0\\\\\n 0 & 1 & 0\\\\\n 0 & -\\frac{1}{2h_k} & \\frac{1}{2h_k}\\\\\n \\end{bmatrix}\n \\begin{bmatrix}\n x_{k} \\\\ f(x_k,t_k) \\\\ f(x_{k+1},t_{k+1})\n \\end{bmatrix}.\n\nWe can now write down the interpolating/approximating polynomial p_k(t) = x_{k} + f(x_{k},t_{k})(t-t_k) +\\left[-\\frac{1}{2h_k}f(x_{k},t_{k}) + \\frac{1}{2h_k}f(x_{k+1},t_{k+1})\\right](t-t_k)^2.\nThis polynomial can now be used to find an (approximation of the) value of the solution at the end of the interval x_{k+1} = p_k(t_k+h_k) = x_{k} + f(x_{k},t_{k})h_k +\\left[-\\frac{1}{2h_k}f(x_{k},t_{k}) + \\frac{1}{2h_k}f(x_{k+1},t_{k+1})\\right]h_k^2, which can be simplified nearly upon inspection to x_{k+1} = x_{k} + \\frac{f(x_{k},t_{k}) + f(x_{k+1},t_{k+1})}{2} h_k, but this is our good old friend, isn’t it? We have shown that the collocation method with a quadratic polynomial with the collocation points chosen at the beginning and the end of the interval is (equivalent to) the implicit trapezoidal method. The method belongs to the family of Lobatto IIIA methods, which are all known to be collocation methods.\n\n\nHermite-Simpson method as a Lobatto collocation method\nHere we show that Hermite-Simpson method also qualifies as a collocation method. In particular, it belongs to the family of Lobatto IIIA methods, similarly as implicit trapezoidal method. 
The first condition, the one on the value of the cubic polynomial p_k(t) = p_{k0} + p_{k1}(t-t_k) + p_{k2}(t-t_k)^2+ p_{k3}(t-t_k)^3 at the beginning of the interval is p_k(t_k) = \\textcolor{red}{p_{k0} = x_k} \\approx x(t_k).\nThe three remaining conditions are imposed at the collocation points, which for the integration interval [t_k,t_{k+1}] are t_{k1} = t_k , t_{k2} = \\frac{t_k+t_{k+1}}{2} , and t_{k3} = t_{k+1}. With the first derivative of the polynomial given by \\dot p_k(t) = p_{k1} + 2p_{k2}(t-t_k) + 3p_{k3}(t-t_k)^2, the first collocation condition \\dot p_k(t_k) = \\textcolor{red}{p_{k1} = f(x_k,t_k)} \\approx f(x(t_k),t_k).\nThe second collocation condition – the one on the derivative in the middle of the interval – is \\dot p_k\\left(t_k+\\frac{1}{2}h_k\\right) = \\textcolor{red}{p_{k1} + 2p_{k2} \\frac{h_k}{2} + 3p_{k3} \\left(\\frac{h_k}{2}\\right)^2 = f(x_{k2},t_{k2})} \\approx f\\left(x\\left(t_{k}+\\frac{h_k}{2}\\right),t_{k}+\\frac{h_k}{2}\\right).\nThe color-emphasized part can be simplified to \\textcolor{red}{p_{k1} + p_{k2} h_k + \\frac{3}{4}p_{k3} h_k^2 = f(x_{k2},t_{k2})}.\nFinally, the third collocation condition – the one imposed at the end of the interval – is \\dot p_k(t_k+h_k) = \\textcolor{red}{p_{k1} + 2p_{k2} h_k + 3p_{k3} h_k^2 = f(x_{k+1},t_{k+1})} \\approx f(x(t_{k+1}),t_{k+1}).\nAll the four conditions (emphasized in color above) can be written together as \n \\begin{bmatrix}\n 1 & 0 & 0 & 0\\\\\n 0 & 1 & 0 & 0\\\\\n 0 & 1 & h_k & \\frac{3}{4} h_k^2\\\\\n 0 & 1 & 2 h_k & 3h_k^2\\\\\n \\end{bmatrix}\n \\begin{bmatrix}\n p_{k0} \\\\ p_{k1} \\\\ p_{k2} \\\\p_{k3}\n \\end{bmatrix}\n =\n \\begin{bmatrix}\n x_{k} \\\\ f(x_k,t_k) \\\\ f(x_{k2},t_{k2}) \\\\ f(x_{k+1},t_{k+1}).\n \\end{bmatrix}\n\nInverting the matrix analytically, we get \n \\begin{bmatrix}\n p_{k0} \\\\ p_{k1} \\\\ p_{k2}\\\\ p_{k3}\n \\end{bmatrix}\n =\n \\begin{bmatrix}\n 1 & 0 & 0 & 0\\\\\n 0 & 1 & 0 & 0\\\\\n 0 & -\\frac{3}{2h_k} & \\frac{2}{h_k} & -\\frac{1}{2h_k}\\\\\n 0 & \\frac{2}{3h_k^2} & -\\frac{4}{3h_k^2} & \\frac{2}{3h_k^2}\n \\end{bmatrix}\n \\begin{bmatrix}\n x_{k} \\\\ f(x_k,t_k) \\\\ f(x_{k2},t_{k2})\\\\ f(x_{k+1},t_{k+1}).\n \\end{bmatrix}.\n\nWe can now write down the interpolating/approximating polynomial \n \\begin{aligned}\n p_k(t) &= x_{k} + f(x_{k},t_{k})(t-t_k) +\\left[-\\frac{3}{2h_k}f(x_{k},t_{k}) + \\frac{2}{h_k}f(x_{k2},t_{k2}) -\\frac{1}{2h_k}f(x_{k+1},t_{k+1}) \\right](t-t_k)^2\\\\\n & +\\left[\\frac{2}{3h_k^2}f(x_{k},t_{k}) - \\frac{4}{3h_k^2}f(x_{k2},t_{k2}) +\\frac{2}{3h_k^2}f(x_{k+1},t_{k+1}) \\right](t-t_k)^3.\n \\end{aligned}\n\nWe can use this prescription of the polynomial p_k(t) to compute the (approximation of the) value of the solution at the end of the kth interval \n \\begin{aligned}\n x_{k+1} = p_k(t_k+h_k) &= x_{k} + f(x_{k},t_{k})h_k +\\left[-\\frac{3}{2h_k}f(x_{k},t_{k}) + \\frac{2}{h_k}f(x_{k2},t_{k2}) -\\frac{1}{2h_k}f(x_{k+1},t_{k+1}) \\right]h_k^2\\\\\n & +\\left[\\frac{2}{3h_k^2}f(x_{k},t_{k}) - \\frac{4}{3h_k^2}f(x_{k2},t_{k2}) +\\frac{2}{3h_k^2}f(x_{k+1},t_{k+1}) \\right]h_k^3,\n \\end{aligned}\n which can be simplified to \n \\begin{aligned}\n x_{k+1} &= x_{k} + f(x_{k},t_{k})h_k +\\left[-\\frac{3}{2}f(x_{k},t_{k}) + \\frac{2}{1}f(x_{k2},t_{k2}) -\\frac{1}{2}f(x_{k+1},t_{k+1}) \\right]h_k\\\\\n & +\\left[\\frac{2}{3}f(x_{k},t_{k}) - \\frac{4}{3}f(x_{k2},t_{k2}) +\\frac{2}{3}f(x_{k+1},t_{k+1}) \\right]h_k,\n \\end{aligned}\n which further simplifies to \n x_{k+1} = x_{k} + h_k\\left[\\frac{1}{6}f(x_{k},t_{k}) + 
\\frac{2}{3}f(x_{k2},t_{k2}) + \\frac{1}{6}f(x_{k+1},t_{k+1}) \\right],\n which can be recognized as the Simpson integration that we have already seen in the implicit Runge-Kutta method described above.\nObviously f_{k2} needs to be further elaborated on, namely, x_{k2} needs some prescription too. We know that it was introduced as an approximation to the solution x in the middle of the interval. Since the value of the polynomial in the middle is such an approximation too, we can set x_{k2} equal to the value of the polynomial in the middle. \n \\begin{aligned}\n x_{k2} = p_k\\left(t_k+\\frac{1}{2}h_k\\right) &= x_{k} + f(x_{k},t_{k})\\frac{h_k}{2} +\\left[-\\frac{3}{2h_k}f(x_{k},t_{k}) + \\frac{2}{h_k}f(x_{k2},t_{k2}) -\\frac{1}{2h_k}f(x_{k+1},t_{k+1}) \\right]\\left(\\frac{h_k}{2}\\right)^2\\\\\n & +\\left[\\frac{2}{3h_k^2}f(x_{k},t_{k}) - \\frac{4}{3h_k^2}f(x_{k2},t_{k2}) +\\frac{2}{3h_k^2}f(x_{k+1},t_{k+1}) \\right]\\left(\\frac{h_k}{2}\\right)^3,\n \\end{aligned}\n which without further ado simplifies to \n x_{k2} = x_{k} + h_k\\left( \\frac{5}{24}f(x_{k},t_{k}) +\\frac{1}{3}f(x_{k2},t_{k2}) -\\frac{1}{24}f(x_{k+1},t_{k+1}) \\right),\n which can be recognized as the other equation in the primary formulation of the Hermite-Simpson method described above.\n\n\n\nPseudospectral collocation methods\nThese methods consider only a single polynomial over the whole interval. The degree of such a polynomial is, in contrast with classical collocation methods, rather high, therefore the number of collocation points is also high, and their location is crucial.",
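As a complement to the derivations above, here is a minimal numerical sketch (not part of the original notes) of one step of the implicit midpoint method in exactly the form just derived: the implicit equation f_{k1} = f(x_k + f_{k1} h_k/2, t_{k1}) is solved by a simple fixed-point iteration and then x_{k+1} = x_k + f_{k1} h_k. The scalar right-hand side f and all numerical values are illustrative assumptions only; in practice a Newton-type solver would typically be used for the implicit equation.

# Julia sketch of one implicit midpoint step via fixed-point iteration (illustrative)
f(x, t) = -x + sin(t)                      # example right-hand side of the ODE ẋ = f(x,t)

function implicit_midpoint_step(f, xk, tk, h; iters = 50)
    fk1 = f(xk, tk + h/2)                  # initial guess for the stage value f_k1
    for _ in 1:iters                       # fixed-point iteration on f_k1 = f(x_k + f_k1*h/2, t_k + h/2)
        fk1 = f(xk + fk1*h/2, tk + h/2)
    end
    return xk + fk1*h                      # x_{k+1} = x_k + f_k1*h_k
end

implicit_midpoint_step(f, 1.0, 0.0, 0.1)   # approximation of x(t_k + h_k) for the example ODE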
"crumbs": [
"9. Numerical methods for continuous-time optimal control - both indirect and direct approaches",
"Numerical methods for indirect approach"
@@ -620,7 +642,7 @@
"href": "cont_dp_LQR.html",
"title": "Using HJB equation to solve the continuous-time LQR problem",
"section": "",
- "text": "As we have already discussed a couple of times, in the LQR problem we consider a linear time invariant (LTI) system modelled by \n\\dot{\\bm x}(t) = \\mathbf A\\bm x(t) + \\mathbf B\\bm u(t),\n and the quadratic cost function \nJ(\\bm x(t_\\mathrm{i}),\\bm u(\\cdot), t_\\mathrm{i}) = \\frac{1}{2}\\bm x^\\top(t_\\mathrm{f})\\mathbf S_\\mathrm{f}\\bm x(t_\\mathrm{f}) + \\frac{1}{2}\\int_{t_\\mathrm{i}}^{t_\\mathrm{f}}\\left(\\bm x^\\top \\mathbf Q\\bm x + \\bm u^\\top \\mathbf R \\bm u\\right)\\mathrm{d}t.\n\nThe Hamiltonian is \nH(\\bm x,\\bm u,\\bm \\lambda) = \\frac{1}{2}\\left(\\bm x^\\top \\mathbf Q\\bm x + \\bm u^\\top \\mathbf R \\bm u\\right) + \\boldsymbol{\\lambda}^\\top \\left(\\mathbf A\\bm x + \\mathbf B\\bm u\\right).\n\nAccording to the HJB equation our goal is to minimize H at a given time t, which enforces the condition on its gradient \n\\mathbf 0 = \\nabla_{\\bm u} H = \\mathbf R\\bm u + \\mathbf B^\\top \\boldsymbol\\lambda,\n from which it follows that the optimal control must necessarily satisfy \n\\bm u^\\star = -\\mathbf R^{-1} \\mathbf B^\\top \\boldsymbol\\lambda.\n\nSince the Hessian of the Hamiltonian is positive definite by our assumption on positive definiteness of \\mathbf R \n\\nabla_{\\bm u \\bm u}^2 \\mathbf H = \\mathbf R > 0,\n Hamiltonian is really minimized by the above choice of \\bm u^\\star.\nThe minimized Hamiltonian is \n\\min_{\\bm u(t)}H(\\bm x, \\bm u, \\bm \\lambda) = \\frac{1}{2}\\bm x^\\top \\mathbf Q \\bm x + \\boldsymbol\\lambda^\\top \\mathbf A \\bm x - \\frac{1}{2}\\boldsymbol\\lambda^\\top \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\boldsymbol\\lambda\n\nSetting \\boldsymbol\\lambda = \\nabla_{\\bm x} J^\\star, the HJB equation is \\boxed\n{-\\frac{\\partial J^\\star}{\\partial t} = \\frac{1}{2}\\bm x^\\top \\mathbf Q \\bm x + (\\nabla_{\\bm x} J^\\star)^\\top \\mathbf A\\bm x - \\frac{1}{2}(\\nabla_{\\bm x} J^\\star)^\\top \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\nabla_{\\bm x} J^\\star,}\n and the boundary condition is \nJ^\\star(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = \\frac{1}{2}\\bm x^\\top (t_\\mathrm{f})\\mathbf S_\\mathrm{f}\\bm x(t_\\mathrm{f}).\n\nWe can now proceed by assuming that the optimal cost function is quadratic in \\bm x for all other times t, that is, there must exist a symmetric matrix function \\mathbf S(t) such that \nJ^\\star(\\bm x(t),t) = \\frac{1}{2}\\bm x^\\top (t)\\mathbf S(t)\\bm x(t).\n\n\n\n\n\n\n\nNote\n\n\n\nRecall that we did something similar when making a sweep assumption to derive a Riccati equation following the indirect approach – we just make an inspired guess and see if it works. Here the inspiration comes from the observation made elsewhere, that the optimal cost function in the LQR problem is quadratic in \\bm x.\n\n\nWe now aim at substituting this into the HJB equation. Observe that \\frac{\\partial J^\\star}{\\partial t}=\\bm x^\\top(t) \\dot{\\mathbf{S}}(t) \\bm x(t) and \\nabla_{\\bm x} J^\\star = \\mathbf S \\bm x. 
Upon substitution to the HJB equation, we get\n\n-\\bm x^\\top \\dot{\\mathbf{S}} \\bm x = \\frac{1}{2}\\bm x^\\top \\mathbf Q \\bm x + \\bm x^\\top \\mathbf S \\mathbf A\\bm x - \\frac{1}{2}\\bm x^\\top \\mathbf S \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S \\bm x.\n\nThis can be reformatted as \n-\\bm x^\\top \\dot{\\mathbf{S}} \\bm x = \\frac{1}{2} \\bm x^\\top \\left[\\mathbf Q + 2 \\mathbf S \\mathbf A - \\mathbf S \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S \\right ] \\bm x.\n\nNotice that the middle matrix in the square brackets is not symmetric. Symmetrizing it (with no effect on the resulting value of the quadratic form) we get\n\n-\\bm x^\\top \\dot{\\mathbf{S}} \\bm x = \\frac{1}{2} \\bm x^\\top \\left[\\mathbf Q + \\mathbf S \\mathbf A + \\mathbf A^\\top \\mathbf S - \\mathbf S \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S \\right ] \\bm x.\n\nFinally, since the above single (scalar) equation should hold for all \\bm x(t), the matrix equation must hold too, and we get the familiar differential Riccati equation for the matrix variable \\mathbf S(t) \\boxed\n{-\\dot{\\mathbf S}(t) = \\mathbf A^\\top \\mathbf S(t) + \\mathbf S(t)\\mathbf A - \\mathbf S(t)\\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S(t) + \\mathbf Q}\n initialized at the final time t_\\mathrm{f} by \\mathbf S(t_\\mathrm{f}) = \\mathbf S_\\mathrm{f}.\nHaving obtained \\mathbf S(t), we can get the optimal control by substituting it into \\boxed\n{\n\\begin{aligned}\n \\bm u^\\star(t) &= - \\mathbf R^{-1}\\mathbf B^\\top \\nabla_{\\bm x} J^\\star(\\bm x(t),t) \\\\\n &= - \\underbrace{\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S(t)}_{\\bm K(t)}\\bm x(t).\n\\end{aligned}\n}\n\nWe have just rederived the continuous-time LQR problem using the HJB equation (previously we did it by massaging the two-point boundary value problem that followed as the necessary condition of optimality from the techniques of calculus of variations).\nNote that we have also just seen the equivalence between a first-order linear PDE and first-order nonlinear ODE.\n\n\n\n Back to top",
+ "text": "As we have already discussed a couple of times, in the LQR problem we consider a linear time invariant (LTI) system modelled by \n\\dot{\\bm x}(t) = \\mathbf A\\bm x(t) + \\mathbf B\\bm u(t),\n and the quadratic cost function \nJ(\\bm x(t_\\mathrm{i}),\\bm u(\\cdot), t_\\mathrm{i}) = \\frac{1}{2}\\bm x^\\top(t_\\mathrm{f})\\mathbf S_\\mathrm{f}\\bm x(t_\\mathrm{f}) + \\frac{1}{2}\\int_{t_\\mathrm{i}}^{t_\\mathrm{f}}\\left(\\bm x^\\top \\mathbf Q\\bm x + \\bm u^\\top \\mathbf R \\bm u\\right)\\mathrm{d}t.\n\nThe Hamiltonian is \nH(\\bm x,\\bm u,\\bm \\lambda) = \\frac{1}{2}\\left(\\bm x^\\top \\mathbf Q\\bm x + \\bm u^\\top \\mathbf R \\bm u\\right) + \\boldsymbol{\\lambda}^\\top \\left(\\mathbf A\\bm x + \\mathbf B\\bm u\\right).\n\nAccording to the HJB equation our goal is to minimize H at a given time t, which enforces the condition on its gradient \n\\mathbf 0 = \\nabla_{\\bm u} H = \\mathbf R\\bm u + \\mathbf B^\\top \\boldsymbol\\lambda,\n from which it follows that the optimal control must necessarily satisfy \n\\bm u^\\star = -\\mathbf R^{-1} \\mathbf B^\\top \\boldsymbol\\lambda.\n\nSince the Hessian of the Hamiltonian is positive definite by our assumption on positive definiteness of \\mathbf R \n\\nabla_{\\bm u \\bm u}^2 \\mathbf H = \\mathbf R > 0,\n Hamiltonian is really minimized by the above choice of \\bm u^\\star.\nThe minimized Hamiltonian is \n\\min_{\\bm u(t)}H(\\bm x, \\bm u, \\bm \\lambda) = \\frac{1}{2}\\bm x^\\top \\mathbf Q \\bm x + \\boldsymbol\\lambda^\\top \\mathbf A \\bm x - \\frac{1}{2}\\boldsymbol\\lambda^\\top \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\boldsymbol\\lambda\n\nSetting \\boldsymbol\\lambda = \\nabla_{\\bm x} J^\\star, the HJB equation is \\boxed\n{-\\frac{\\partial J^\\star}{\\partial t} = \\frac{1}{2}\\bm x^\\top \\mathbf Q \\bm x + (\\nabla_{\\bm x} J^\\star)^\\top \\mathbf A\\bm x - \\frac{1}{2}(\\nabla_{\\bm x} J^\\star)^\\top \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\nabla_{\\bm x} J^\\star,}\n and the boundary condition is \nJ^\\star(\\bm x(t_\\mathrm{f}),t_\\mathrm{f}) = \\frac{1}{2}\\bm x^\\top (t_\\mathrm{f})\\mathbf S_\\mathrm{f}\\bm x(t_\\mathrm{f}).\n\nWe can now proceed by assuming that the optimal cost function is quadratic in \\bm x for all other times t, that is, there must exist a symmetric matrix function \\mathbf S(t) such that \nJ^\\star(\\bm x(t),t) = \\frac{1}{2}\\bm x^\\top (t)\\mathbf S(t)\\bm x(t).\n\n\n\n\n\n\n\nNote\n\n\n\nRecall that we did something similar when making a sweep assumption to derive a Riccati equation following the indirect approach – we just make an inspired guess and see if it solves the equation. Here the inspiration comes from the observation made elsewhere, that the optimal cost function in the LQR problem is quadratic in \\bm x.\n\n\nWe now aim at substituting this into the HJB equation. Observe that \\frac{\\partial J^\\star}{\\partial t}=\\bm x^\\top(t) \\dot{\\mathbf{S}}(t) \\bm x(t) and \\nabla_{\\bm x} J^\\star = \\mathbf S \\bm x. 
Upon substitution to the HJB equation, we get\n\n-\\frac{1}{2}\\bm x^\\top \\dot{\\mathbf{S}} \\bm x = \\frac{1}{2}\\bm x^\\top \\mathbf Q \\bm x + \\bm x^\\top \\mathbf S \\mathbf A\\bm x - \\frac{1}{2}\\bm x^\\top \\mathbf S \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S \\bm x.\n\nThis can be reformatted as \n-\\frac{1}{2}\\bm x^\\top \\dot{\\mathbf{S}} \\bm x = \\frac{1}{2} \\bm x^\\top \\left[\\mathbf Q + 2 \\mathbf S \\mathbf A - \\mathbf S \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S \\right ] \\bm x.\n\nNotice that the middle matrix in the square brackets is not symmetric. Symmetrizing it (with no effect on the resulting value of the quadratic form) we get\n\n-\\frac{1}{2}\\bm x^\\top \\dot{\\mathbf{S}} \\bm x = \\frac{1}{2} \\bm x^\\top \\left[\\mathbf Q + \\mathbf S \\mathbf A + \\mathbf A^\\top \\mathbf S - \\mathbf S \\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S \\right ] \\bm x.\n\nFinally, since the above single (scalar) equation should hold for all \\bm x(t), the corresponding matrix equation (after cancelling the common factor \\frac{1}{2}) must hold too, and we get the familiar differential Riccati equation for the matrix variable \\mathbf S(t) \\boxed\n{-\\dot{\\mathbf S}(t) = \\mathbf A^\\top \\mathbf S(t) + \\mathbf S(t)\\mathbf A - \\mathbf S(t)\\mathbf B\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S(t) + \\mathbf Q}\n initialized at the final time t_\\mathrm{f} by \\mathbf S(t_\\mathrm{f}) = \\mathbf S_\\mathrm{f}.\nHaving obtained \\mathbf S(t), we can get the optimal control by substituting it into \\boxed\n{\n\\begin{aligned}\n \\bm u^\\star(t) &= - \\mathbf R^{-1}\\mathbf B^\\top \\nabla_{\\bm x} J^\\star(\\bm x(t),t) \\\\\n &= - \\underbrace{\\mathbf R^{-1}\\mathbf B^\\top \\mathbf S(t)}_{\\bm K(t)}\\bm x(t).\n\\end{aligned}\n}\n\nWe have just rederived the solution to the continuous-time LQR problem using the HJB equation (previously we did it by massaging the two-point boundary value problem that followed as the necessary condition of optimality from the techniques of calculus of variations).\nNote that we have also just seen the equivalence between a first-order nonlinear PDE and a first-order nonlinear (matrix) ODE.",
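The backward-in-time integration of the differential Riccati equation is straightforward to carry out numerically. The following minimal sketch (not part of the original notes) integrates -Ṡ = AᵀS + SA - SBR⁻¹BᵀS + Q from S(t_f) = S_f backwards, using a fixed-step RK4 scheme in the reversed time τ = t_f - t, and then forms the gain K = R⁻¹BᵀS; the double-integrator data and the horizon are illustrative assumptions only.

using LinearAlgebra

function backward_riccati(A, B, Q, R, Sf, T, N)
    h = T/N
    S = copy(Sf)
    rhs(S) = A'*S + S*A - S*B*inv(R)*B'*S + Q      # equals -Ṡ, i.e., dS/dτ with τ = t_f - t
    for _ in 1:N                                   # RK4 steps marching backwards from t_f over the horizon T
        k1 = rhs(S); k2 = rhs(S + (h/2)*k1); k3 = rhs(S + (h/2)*k2); k4 = rhs(S + h*k3)
        S += (h/6)*(k1 + 2*k2 + 2*k3 + k4)
    end
    return S                                       # S(t) evaluated at t = t_f - T
end

A = [0.0 1.0; 0.0 0.0]                             # double integrator (illustrative data only)
B = reshape([0.0, 1.0], 2, 1)
Q = [1.0 0.0; 0.0 1.0]
R = fill(1.0, 1, 1)
Sf = zeros(2, 2)                                   # no terminal penalty

S0 = backward_riccati(A, B, Q, R, Sf, 10.0, 1000)
K0 = inv(R)*B'*S0                                  # state-feedback gain K(t) = R⁻¹BᵀS(t) at t = t_f - 10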
"crumbs": [
"9.X. Continuous-time optimal control – dynamic programming",
"Using HJB equation to solve the continuous-time LQR problem"
@@ -957,7 +979,7 @@
"href": "roban_uncertainty.html#models-of-uncertainty",
"title": "Uncertainty (in) modelling",
"section": "Models of uncertainty",
- "text": "Models of uncertainty\nThere are several approaches to model the uncertainty (or, in other words, to characterize the uncertainty in the model). They all aim – in one way or another – to express that the controller has to deal no only with the single nominal system, for which it was designed, but a family of systems. Depending on the mathematical frameworks used for characterization of such a family, there are two major classes of approaches.\n\nWorst-case models of uncertainty\nProbabilistic models of uncertainty\n\nThe former assumes sets of systems with no additional information about the structure of such sets. The latter imposes some probability structure on the set of systems – in other words, although in principle any member of the set possible, some may be more probable than the others. In this course we are focusing on the former, which is also the mainstream in the robust control literature, but note that the latter we already encountered while considering control for systems exposed to random disturbances, namely the LQG control. A possible viewpoint is that as a consequence of the random disturbance, the controller has to deal with a family of systems.\nAnother classification of models of uncertainty is according to the actual quantity that is uncertain. We distinguish these two\n\nParametric uncertainty\nFrequency-dependent (aka dynamical) uncertainty\n\nUnstructured uncertainty\nStructured uncertainty\n\n\n\nParametric uncertainty\nThis is obviously straightforward to state: some (real/physical) parameters are uncertain. The conceptually simplest way to characterize such uncertain parameters is by considering intervals instead of just single (nominal) values.\n\nExample 1 (A pendulum on a cart) \n\\begin{aligned}\n{\\color{red} m_\\mathrm{l}} & \\in [m_\\mathrm{l}^{-},m_\\mathrm{l}^{+}],\\\\\n{\\color{red} l} & \\in [l^{-}, l^{+}],\n\\end{aligned}\n\n\n\\dot{\\bm x}(t) =\n\\begin{bmatrix}\n0 & 1 & 0 & 0\\\\\n0 & 0 & \\frac{\\textcolor{red}{m_\\mathrm{l}}}{m_\\mathrm{c}} g & 0\\\\\n0 & 0 & 0 & 1\\\\\n0 & 0 & -\\frac{(\\textcolor{red}{m_\\mathrm{l}}+m_\\mathrm{c})g}{m_\\mathrm{c}\\textcolor{red}{l}} & 0\n\\end{bmatrix}\n\\bm x(t)\n+\n\\begin{bmatrix}\n0\\\\\n\\frac{1}{m_\\mathrm{c}}\\\\\n0\\\\\n-\\frac{1}{m_\\mathrm{c}\\textcolor{red}{l}}\n\\end{bmatrix}\nu(t).\n\n\n\n\nUnstructured frequency-dependent uncertainty\nNot only some enumerated physical parameters but even the order of the system can be uncertain. In other words, there may be some phenomena exhibitted by the system that is not captured by the model at all. Possibly some lightly damped modes, possibly some time delay here and there. The system contains uncertain dynamics. In the linear case, all this can be expressed by regarding the magnitude and phase responses uncertain without mapping these to actual physical parameters.\n\n\n\n\n\n\nFigure 1: A whole subsystem is uncertain\n\n\n\nA popular model for the uncertain subsystem is that of a transfer function \\Delta(s), about which we know only that it is stable and that its magnitude is bounded by 1 \\boxed\n{\\sup_{\\omega}|\\Delta(j\\omega)|\\leq 1,\\;\\;\\Delta \\;\\text{stable}. }\n\nBut typically the uncertainty is higher at higher frequencies. This can be expressed by using some weighting function w(\\omega).\nFor later theoretical and computational purposes we approximate the real weighting function using a low-order rational stable transfer function W(s). 
That is, W(j\\omega)\\approx w(\\omega) for \\omega \\in \\mathbb R, that is for s=j\\omega on the imaginary axis.\nThe ultimate transfer function model of the uncertainty is then\n\\boxed{\nW(s)\\;\\Delta(s),\\quad \\max_{\\omega}|\\Delta(j\\omega)|\\leq 1,\\;\\;\\Delta\\; \\text{stable}. }\n\n\n\\mathcal H_\\infty norm of an LTI system\n\nH-infinity norm of an LTI system interpreted in frequency domain\n\nDefinition 4 (\\mathcal H_\\infty norm of a SISO LTI system) For a stable LTI system G with a single input and single output, the \\mathcal H_\\infty norm is defined as \n\\|G\\|_{\\infty} = \\sup_{\\omega\\in\\mathbb{R}}|G(j\\omega)|.\n\n\n\n\n\n\n\n\nWhy supremum and not maximum?\n\n\n\nSupremum is uses in the definition because it is not guaranteed that the peak value of the magnitude frequency response is attained at a single frequency. Consider an example of a first-order system G(s) = \\frac{s}{Ts+1}. The peak gain of 1/T is not attained at a single finite frequency.\n\n\nHaving just defined the \\mathcal H_\\infty norm, the uncertainty model can be expressed compactly as \\boxed{\nW(s)\\;\\Delta(s),\\quad \\|\\Delta(j\\omega)\\|\\leq 1. }\n\n\n\n\n\n\n\n\\mathcal H_\\infty as a space of functions\n\n\n\n\\mathcal H_\\infty denotes a normed vector space of functions that are analytic in the closed extended right half plane (of the complex plane). In parlance of control systems, \\mathcal H_\\infty is the space of proper and stable transfer functions. Poles on the imaginary axis are not allowed. The functions do not need to be rational, but very often we do restrict ourselves to rational functions, in which case we typically write such space as \\mathcal{RH}_\\infty.\n\n\nWe now extend the concept of the \\mathcal H_\\infty norm to MIMO systems. The extension is perhaps not quite intuitive – certainly it is not computed as the maximum of the norms of individual transfer functions, which may be the first guess.\n\nDefinition 5 (\\mathcal H_\\infty norm of a MIMO LTI system) For a stable LTI system \\mathbf G with multiple inputs and/or multiple outputs, the \\mathcal H_\\infty norm is defined as \n\\|\\mathbf G\\|_{\\infty} = \\sup_{\\omega\\in\\mathbb{R}}\\bar{\\sigma}(\\mathbf{G}(j\\omega))\n where \\bar\\sigma is the largest singular value.\n\nHere we include a short recap of singular values and singular value decomposition (SVD) of a matrix. Consider a matrix \\mathbf M, possibly a rectangular one. It can be decomposed as a product of three matrices \n\\mathbf M = \\mathbf U\n\\underbrace{\n\\begin{bmatrix}\n\\sigma_1 & & & &\\\\\n & \\sigma_2 & & &\\\\\n & &\\sigma_3 & &\\\\\n\\\\\n & & & & \\sigma_n\\\\\n\\end{bmatrix}\n}_{\\boldsymbol\\Sigma}\n\\mathbf V^{*}.\n\nThe two square matrices \\mathbf V and \\mathbf U are unitary, that is, \n\\mathbf V\\mathbf V^*=\\mathbf I=\\mathbf V^*\\mathbf V\n and \n\\mathbf U\\mathbf U^*=\\mathbf I=\\mathbf U^*\\mathbf U.\n\nThe nonnegative diagonal entries \\sigma_i \\in \\mathbb R_+, \\forall i of the (possibly rectangular) matrix \\Sigma are called singular values. 
Commonly they are ordered in a nonincreasing order, that is \n\\sigma_1\\geq \\sigma_2\\geq \\sigma_3\\geq \\ldots \\geq \\sigma_n.\n\nIt is also a common notation to denote the largest singular value as \\bar \\sigma, that is, \\bar \\sigma \\coloneqq \\sigma_1.\n\n\n\\mathcal{H}_{\\infty} norm of an LTI system interpreted in time domain\nWe can also view the dynamical system G with inputs and outputs as an operator mapping from some chosen space of functions to another space of functions. A popular model for these spaces are the spaces of square-integrable functions, denoted as \\mathcal{L}_2, and sometimes interpreted as bounded-energy signals \nG:\\;\\mathcal{L}_2\\rightarrow \\mathcal{L}_2.\n\nIt is a powerful fact that the \\mathcal{H}_{\\infty} norm of the system is then defined as the induced norm of the corresponding operator \\boxed{\n\\|G(s)\\|_{\\infty} = \\sup_{u(t)\\in\\mathcal{L}_2\\setminus 0}\\frac{\\|y(t)\\|_2}{\\|u(t)\\|_2}}.\n\nWith the energy interpretation of the input and output variables, this system norm can also be interpreted as the worst-case energy gain of the system.\nScaling necessary to get any useful info from MIMO models! See Skogestad’s book, section 1.4, pages 5–8. \n\n\n\nHow does the uncertainty enter the model of the system?\n\nAdditive uncertainty\nThe transfer function of an uncertain system can be written as a sum of a nominal system and an uncertainty \nG(s) = \\underbrace{G_0(s)}_{\\text{nominal model}}+\\underbrace{W(s)\\Delta(s)}_{\\text{additive uncertainty}}.\n\nThe block diagram interpretation is in\n\n\n\n\n\n\nFigure 2: Additive uncertainty\n\n\n\nThe magnitude frequency response of the weighting filter W(s) then serves as an upper bound on the absolute error in the magnitude frequency responses \n|G(j\\omega)-G_0(j\\omega)|<|W(j\\omega)|\\quad \\forall \\omega\\in\\mathbb R.\n\n\n\nMultiplicative uncertainty\n\nG(s) = (1+W(s)\\Delta(s))\\,G_0(s).\n\nThe block diagram interpretation is in\n\n\n\n\n\n\nFigure 3: Multiplicative uncertainty\n\n\n\n\n\n\n\n\n\nFor SISO transfer functions no need to bother about the order of terms in the products\n\n\n\nSice we are considering SISO transfer functions, the order of terms in the products is not important. We will have to be more alert to the order of terms when we move to MIMO systems.\n\n\nThe magnitude frequency response of the weighting filter W(s) then serves as an upper bound on the relative error in the magnitude frequency responses \\boxed\n{\\frac{|G(j\\omega)-G_0(j\\omega)|}{|G_0(j\\omega)|}<|W(j\\omega)|\\quad \\forall \\omega\\in\\mathbb R.}\n\n\nExample 2 (Uncertain first-order delayed system) We consider a first-order system with a delay described by \nG(s) = \\frac{k}{T s+1}e^{-\\theta s}, \\qquad 2\\leq k,\\tau,\\theta,\\leq 3.\n\nWe now need to choose the nominal model G_0(s) and then the uncertainty weighting filter W(s). The nominal model corresponds to the nominal values of the parameters, therefore we must choose these. There is no single correct way to do this. Perhaps the most intuitive way is to choose the nominal values as the average of the bounds. But we can also choose the nominal values in a way that makes the nominal system simple. For example, for this system with a delay, we can even choose the nominal value of the delay as zero, which makes the nominal system a first-order system without delay, hence simple enough for application of some basic linear control system design methods. 
Of course, the price to pay is that the resulting model of an uncertain system, which is actually a set of systems, contains even models of a plant that were not prescribed.\n\n\nShow the code\nusing ControlSystems\nusing Plots\n\nfunction uncertain_first_order_delayed()\n kmin = 2; kmax = 3; \n θmin = 2; θmax = 3; \n Tmin = 2; Tmax = 3;\n\n k₀ = (kmin+kmax)/2; \n θ₀ = (θmin+θmax)/2;\n θ₀ = 0 \n T₀ = (Tmin+Tmax)/2;\n\n G₀ = tf(k₀,[T₀, 1])*delay(θ₀) \n\n ω = exp10.(range(-2, 2, length=50))\n G₀ω = freqresp(G₀,ω)\n G₀ω = vec(G₀ω)\n\n EEω_db = [];\n for k in range(kmin,kmax,length=10)\n for θ in range(θmin,θmax,length=10)\n for T in range(Tmin,Tmax,length=10)\n G = tf(k,[T, 1])*delay(θ) \n Gω = freqresp(G,ω)\n Gω = vec(Gω) \n Eω = abs.(Gω-G₀ω)./abs.(G₀ω)\n Eω_db = 20 * log10.(Eω)\n push!(EEω_db,Eω_db)\n end\n end\n end\n\n plot(ω,EEω_db,xscale=:log10,label=\"\",ylims=(-40,20))\n xlabel!(\"Frequency [rad/s]\")\n ylabel!(\"Relative error [dB]\")\nend\nuncertain_first_order_delayed()\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nNow we need to find some upper bound on the relative error. Simplicity is a virtue here too, hence we are looking for a rational filter of very low order, say 1 or 2. 
Speaking of the first-order filter, one useful way to format it is \n\\boxed{\nW(s) = \\frac{\\tau s+r_0}{(\\tau/r_{\\infty})s+1}}\n where r_0 is uncertainty at steady state, 1/\\tau is the frequency, where the relative uncertainty reaches 100%, r_{\\infty} is relative uncertainty at high frequencies, often r_{\\infty}\\geq 2.\nFor our example, the parameters of the filter are in the code below and the frequency response follows.\n\n\nShow the code\nusing ControlSystems\nusing Plots\n\nfunction uncertain_first_order_delayed_with_weights()\n kmin = 2; kmax = 3; \n θmin = 2; θmax = 3; \n Tmin = 2; Tmax = 3;\n\n k₀ = (kmin+kmax)/2; \n θ₀ = (θmin+θmax)/2;\n θ₀ = 0 \n T₀ = (Tmin+Tmax)/2;\n\n G₀ = tf(k₀,[T₀, 1])*delay(θ₀) \n\n ω = exp10.(range(-2, 2, length=50))\n G₀ω = freqresp(G₀,ω)\n G₀ω = vec(G₀ω)\n\n EEω_db = [];\n for k in range(kmin,kmax,length=10)\n for θ in range(θmin,θmax,length=10)\n for T in range(Tmin,Tmax,length=10)\n G = tf(k,[T, 1])*delay(θ) \n Gω = freqresp(G,ω)\n Gω = vec(Gω) \n Eω = abs.(Gω-G₀ω)./abs.(G₀ω)\n Eω_db = 20 * log10.(Eω)\n push!(EEω_db,Eω_db)\n end\n end\n end\n\n plot(ω,EEω_db,xscale=:log10,label=\"\",ylims=(-40,20))\n xlabel!(\"Frequency [rad/s]\")\n ylabel!(\"Relative error [dB]\")\n\n τ = 1/0.25\n r₀ = 0.2\n r∞ = 10^(8/20)\n W = tf([τ, r₀],[τ/r∞, 1])\n magW, phaseW = bode(W,ω)\n plot!(ω,20*log10.(vec(magW)),xscale=:log10,lw=3,color=:red,label=\"W\")\n\n # W2 = W*tf([1, 1.6, 1],[1, 1.4, 1]); \n # magW2, phaseW2 = bode(W2,ω)\n # plot!(ω,20*log10.(vec(magW2)),xscale=:log10,lw=3,color=:blue,label=\"W₂\")\nend\nuncertain_first_order_delayed_with_weights()\n\n\n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n \n \n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nObviously the filter does not capture the family of systems perfectly. It is now up to the control engineer to decide if this is a problem. If yes, if the control design should be really robust against all uncertainties in the considered set, some more complex (higher-order) filter is needed to described the uncertainty more accurately. The source code shows (in commented lines) one particular candidate, but in general the whole problem boils down to designing a stable filter with a prescribed magnitude frequency response.\n\n\n\nInverse additive uncertainty\n…\n\n\nInverse multiplicative uncertainty\n…\n\n\nLinear fractional transformation (LFT)\nFor a matrix \\mathbf P sized (n_1+n_2)\\times(m_1+m_2) and divided into blocks like \n\\mathbf P=\n\\begin{bmatrix}\n\\mathbf P_{11} & \\mathbf P_{12}\\\\\n\\mathbf P_{21} & \\mathbf P_{22}\n\\end{bmatrix},\n and a matrix \\mathbf K sized m_2\\times n_2, the lower LFT of \\mathbf P with respect to \\mathbf K is \n\\boxed{\n\\mathcal{F}_\\mathbf{l}(\\mathbf P,\\mathbf K) = \\mathbf P_{11}+\\mathbf P_{12}\\mathbf K(\\mathbf I-\\mathbf P_{22}\\mathbf K)^{-1}\\mathbf P_{21}}.\n\nIt can be viewed as a feedback interconnection of the plant \\mathbf P and the controller \\mathbf K, in which not all plant inputs are used as control inputs and not all plant outputs are measured, as depicted in Figure 4\n\n\n\n\n\n\nFigure 4: Lower LFT of \\mathbf P with respect to \\mathbf K\n\n\n\nSimilarly, for a matrix \\mathbf N sized (n_1+n_2)\\times(m_1+m_2) and a matrix \\boldsymbol\\Delta sized m_1\\times n_1, the upper LFT of \\mathbf N with respect to \\mathbf K is \n\\boxed{\n\\mathcal{F}_\\mathbf{u}(\\mathbf N,\\boldsymbol\\Delta) = \\mathbf N_{22}+\\mathbf N_{21}\\boldsymbol\\Delta(\\mathbf I-\\mathbf N_{11}\\boldsymbol\\Delta)^{-1}\\mathbf N_{12}}.\n\nIt can be viewed as a feedback interconnection of the nominal plant \\mathbf N and the uncertainty block \\boldsymbol\\Delta, as depicted in Figure 5\n\n\n\n\n\n\nFigure 5: Upper LFT of \\mathbf N with respect to \\boldsymbol \\Delta\n\n\n\nHere we already anticipated MIMO uncertainty blocks. One motivation for them is explained in the very next section on structured uncertainties, another one is appearing once we start formulating robust performance within the same analytical framework as robust stability.\n\n\n\n\n\n\nWhich is lower and which is upper is a matter of convention, but a useful one\n\n\n\nOur usage of the lower LFT for a feedback interconnection of a (generalized) plant and a controller and the upper LFT for a feedback interconnection of a nominal system and and uncertainty is completely arbitrary. We could easily use the lower LFT for the uncertainty. But it is a convenient convention to adhere to. The more so that it allows for the combination of both as in the diagram Figure 6 below, which corresponds to composition of the two LFTs.\n\n\n\n\n\n\nFigure 6: Combination of the lower and upper LFT\n\n\n\n\n\n\n\n\n\nStructured frequency-domain uncertainty\nNot just a single \\Delta(s) but several \\Delta_i(s), i=1,\\ldots,n are considered. Some of them scalar-valued, some of them matrix-valued.\nIn the upper LFT, all the individual \\Delta_is are collected into a single overall \\boldsymbol \\Delta, which then exhibits some structure. 
Typically it is block-diagonal as in \n\\boldsymbol\\Delta =\n\\begin{bmatrix}\n\\Delta_1& 0 & \\ldots & 0\\\\\n0 & \\Delta_2 & \\ldots & 0\\\\\n\\vdots\\\\\n0 & 0 & \\ldots & \\boldsymbol\\Delta_n\n\\end{bmatrix},\n with each block (including the MIMO blocks) satisfying the usual condition \n\\|\\Delta_i\\|_{\\infty}\\leq 1, \\; i=1,\\ldots, n.\n\n\nStructured singular value (SSV, \\mu, mu)\nWith this structured uncertainty, how does the small gain theorem look like?",
+ "text": "Models of uncertainty\nThere are several approaches to model the uncertainty (or, in other words, to characterize the uncertainty in the model). They all aim – in one way or another – to express that the controller has to deal no only with the single nominal system, for which it was designed, but a family of systems. Depending on the mathematical frameworks used for characterization of such a family, there are two major classes of approaches.\n\nWorst-case models of uncertainty\nProbabilistic models of uncertainty\n\nThe former assumes sets of systems with no additional information about the structure of such sets. The latter imposes some probability structure on the set of systems – in other words, although in principle any member of the set possible, some may be more probable than the others. In this course we are focusing on the former, which is also the mainstream in the robust control literature, but note that the latter we already encountered while considering control for systems exposed to random disturbances, namely the LQG control. A possible viewpoint is that as a consequence of the random disturbance, the controller has to deal with a family of systems.\nAnother classification of models of uncertainty is according to the actual quantity that is uncertain. We distinguish these two\n\nParametric uncertainty\nFrequency-dependent (aka dynamical) uncertainty\n\nUnstructured uncertainty\nStructured uncertainty\n\n\n\nParametric uncertainty\nThis is obviously straightforward to state: some (real/physical) parameters are uncertain. The conceptually simplest way to characterize such uncertain parameters is by considering intervals instead of just single (nominal) values.\n\nExample 1 (A pendulum on a cart) \n\\begin{aligned}\n{\\color{red} m_\\mathrm{l}} & \\in [m_\\mathrm{l}^{-},m_\\mathrm{l}^{+}],\\\\\n{\\color{red} l} & \\in [l^{-}, l^{+}],\n\\end{aligned}\n\n\n\\dot{\\bm x}(t) =\n\\begin{bmatrix}\n0 & 1 & 0 & 0\\\\\n0 & 0 & \\frac{\\textcolor{red}{m_\\mathrm{l}}}{m_\\mathrm{c}} g & 0\\\\\n0 & 0 & 0 & 1\\\\\n0 & 0 & -\\frac{(\\textcolor{red}{m_\\mathrm{l}}+m_\\mathrm{c})g}{m_\\mathrm{c}\\textcolor{red}{l}} & 0\n\\end{bmatrix}\n\\bm x(t)\n+\n\\begin{bmatrix}\n0\\\\\n\\frac{1}{m_\\mathrm{c}}\\\\\n0\\\\\n-\\frac{1}{m_\\mathrm{c}\\textcolor{red}{l}}\n\\end{bmatrix}\nu(t).\n\n\n\n\nUnstructured frequency-dependent uncertainty\nNot only some enumerated physical parameters but even the order of the system can be uncertain. In other words, there may be some phenomena exhibitted by the system that is not captured by the model at all. Possibly some lightly damped modes, possibly some time delay here and there. The system contains uncertain dynamics. In the linear case, all this can be expressed by regarding the magnitude and phase responses uncertain without mapping these to actual physical parameters.\n\n\n\n\n\n\nFigure 1: A whole subsystem is uncertain\n\n\n\nA popular model for the uncertain subsystem is that of a transfer function \\Delta(s), about which we know only that it is stable and that its magnitude is bounded by 1 \\boxed\n{\\sup_{\\omega}|\\Delta(j\\omega)|\\leq 1,\\;\\;\\Delta \\;\\text{stable}. }\n\nBut typically the uncertainty is higher at higher frequencies. This can be expressed by using some weighting function w(\\omega).\nFor later theoretical and computational purposes we approximate the real weighting function using a low-order rational stable transfer function W(s). 
That is, W(j\\omega)\\approx w(\\omega) for \\omega \\in \\mathbb R, that is for s=j\\omega on the imaginary axis.\nThe ultimate transfer function model of the uncertainty is then\n\\boxed{\nW(s)\\;\\Delta(s),\\quad \\max_{\\omega}|\\Delta(j\\omega)|\\leq 1,\\;\\;\\Delta\\; \\text{stable}. }\n\n\n\\mathcal H_\\infty norm of an LTI system\n\nH-infinity norm of an LTI system interpreted in frequency domain\n\nDefinition 4 (\\mathcal H_\\infty norm of a SISO LTI system) For a stable LTI system G with a single input and single output, the \\mathcal H_\\infty norm is defined as \n\\|G\\|_{\\infty} = \\sup_{\\omega\\in\\mathbb{R}}|G(j\\omega)|.\n\n\n\n\n\n\n\n\nWhy supremum and not maximum?\n\n\n\nThe supremum is used in the definition because it is not guaranteed that the peak value of the magnitude frequency response is attained at a single frequency. Consider the example of the first-order system G(s) = \\frac{s}{Ts+1}. Its peak gain of 1/T is only approached as \\omega\\to\\infty and is not attained at any finite frequency.\n\n\nHaving just defined the \\mathcal H_\\infty norm, the uncertainty model can be expressed compactly as \\boxed{\nW(s)\\;\\Delta(s),\\quad \\|\\Delta\\|_{\\infty}\\leq 1. }\n\n\n\n\n\n\n\n\\mathcal H_\\infty as a space of functions\n\n\n\n\\mathcal H_\\infty denotes a normed vector space of functions that are analytic in the closed extended right half plane (of the complex plane). In the parlance of control systems, \\mathcal H_\\infty is the space of proper and stable transfer functions. Poles on the imaginary axis are not allowed. The functions do not need to be rational, but very often we do restrict ourselves to rational functions, in which case we typically write such a space as \\mathcal{RH}_\\infty.\n\n\nWe now extend the concept of the \\mathcal H_\\infty norm to MIMO systems. The extension is perhaps not quite intuitive – certainly it is not computed as the maximum of the norms of the individual transfer functions, which may be the first guess.\n\nDefinition 5 (\\mathcal H_\\infty norm of a MIMO LTI system) For a stable LTI system \\mathbf G with multiple inputs and/or multiple outputs, the \\mathcal H_\\infty norm is defined as \n\\|\\mathbf G\\|_{\\infty} = \\sup_{\\omega\\in\\mathbb{R}}\\bar{\\sigma}(\\mathbf{G}(j\\omega))\n where \\bar\\sigma is the largest singular value.\n\nHere we include a short recap of singular values and singular value decomposition (SVD) of a matrix. Consider a matrix \\mathbf M, possibly a rectangular one. It can be decomposed as a product of three matrices \n\\mathbf M = \\mathbf U\n\\underbrace{\n\\begin{bmatrix}\n\\sigma_1 & & & &\\\\\n & \\sigma_2 & & &\\\\\n & &\\sigma_3 & &\\\\\n & & & \\ddots &\\\\\n & & & & \\sigma_n\\\\\n\\end{bmatrix}\n}_{\\boldsymbol\\Sigma}\n\\mathbf V^{*}.\n\nThe two square matrices \\mathbf V and \\mathbf U are unitary, that is, \n\\mathbf V\\mathbf V^*=\\mathbf I=\\mathbf V^*\\mathbf V\n and \n\\mathbf U\\mathbf U^*=\\mathbf I=\\mathbf U^*\\mathbf U.\n\nThe nonnegative diagonal entries \\sigma_i \\in \\mathbb R_+, \\forall i of the (possibly rectangular) matrix \\Sigma are called singular values. 
Commonly they are ordered in nonincreasing order, that is \n\\sigma_1\\geq \\sigma_2\\geq \\sigma_3\\geq \\ldots \\geq \\sigma_n.\n\nIt is also a common notation to denote the largest singular value as \\bar \\sigma, that is, \\bar \\sigma \\coloneqq \\sigma_1.\n\n\n\\mathcal{H}_{\\infty} norm of an LTI system interpreted in time domain\nWe can also view the dynamical system G with inputs and outputs as an operator mapping from some chosen space of functions to another space of functions. A popular choice for these spaces is the space of square-integrable functions, denoted as \\mathcal{L}_2, and sometimes interpreted as bounded-energy signals \nG:\\;\\mathcal{L}_2\\rightarrow \\mathcal{L}_2.\n\nIt is a powerful fact that the \\mathcal{H}_{\\infty} norm of the system then equals the induced norm of the corresponding operator \\boxed{\n\\|G(s)\\|_{\\infty} = \\sup_{u(t)\\in\\mathcal{L}_2\\setminus 0}\\frac{\\|y(t)\\|_2}{\\|u(t)\\|_2}}.\n\nWith the energy interpretation of the input and output variables, this system norm can also be interpreted as the worst-case energy gain of the system.\nScaling is necessary to get any useful information from MIMO models; see Skogestad’s book, Section 1.4, pages 5–8. \n\n\n\nHow does the uncertainty enter the model of the system?\n\nAdditive uncertainty\nThe transfer function of an uncertain system can be written as a sum of a nominal system and an uncertainty \nG(s) = \\underbrace{G_0(s)}_{\\text{nominal model}}+\\underbrace{W(s)\\Delta(s)}_{\\text{additive uncertainty}}.\n\nThe block diagram interpretation is in\n\n\n\n\n\n\nFigure 2: Additive uncertainty\n\n\n\nThe magnitude frequency response of the weighting filter W(s) then serves as an upper bound on the absolute error in the magnitude frequency responses \n|G(j\\omega)-G_0(j\\omega)|<|W(j\\omega)|\\quad \\forall \\omega\\in\\mathbb R.\n\n\n\nMultiplicative uncertainty\n\nG(s) = (1+W(s)\\Delta(s))\\,G_0(s).\n\nThe block diagram interpretation is in\n\n\n\n\n\n\nFigure 3: Multiplicative uncertainty\n\n\n\n\n\n\n\n\n\nFor SISO transfer functions, no need to bother about the order of terms in the products\n\n\n\nSince we are considering SISO transfer functions, the order of terms in the products is not important. We will have to be more alert to the order of terms when we move to MIMO systems.\n\n\nThe magnitude frequency response of the weighting filter W(s) then serves as an upper bound on the relative error in the magnitude frequency responses \\boxed\n{\\frac{|G(j\\omega)-G_0(j\\omega)|}{|G_0(j\\omega)|}<|W(j\\omega)|\\quad \\forall \\omega\\in\\mathbb R.}\n\n\nExample 2 (Uncertain first-order delayed system) We consider a first-order system with a delay described by \nG(s) = \\frac{k}{T s+1}e^{-\\theta s}, \\qquad 2\\leq k, T, \\theta \\leq 3.\n\nWe now need to choose the nominal model G_0(s) and then the uncertainty weighting filter W(s). The nominal model corresponds to the nominal values of the parameters, therefore we must choose these. There is no single correct way to do this. Perhaps the most intuitive way is to choose the nominal values as the average of the bounds. But we can also choose the nominal values in a way that makes the nominal system simple. For example, for this system with a delay, we can even choose the nominal value of the delay as zero, which makes the nominal system a first-order system without delay, hence simple enough for application of some basic linear control system design methods. 
Of course, the price to pay is that the resulting model of an uncertain system, which is actually a set of systems, contains even models of plants that were not prescribed.\n\n\nShow the code\nusing ControlSystems\nusing Plots\n\nfunction uncertain_first_order_delayed()\n kmin = 2; kmax = 3; \n θmin = 2; θmax = 3; \n Tmin = 2; Tmax = 3;\n\n k₀ = (kmin+kmax)/2; \n θ₀ = (θmin+θmax)/2;\n θ₀ = 0 # deliberately overridden: zero nominal delay gives a simple delay-free nominal model\n T₀ = (Tmin+Tmax)/2;\n\n G₀ = tf(k₀,[T₀, 1])*delay(θ₀) \n\n ω = exp10.(range(-2, 2, length=50))\n G₀ω = freqresp(G₀,ω)\n G₀ω = vec(G₀ω)\n\n EEω_db = [];\n for k in range(kmin,kmax,length=10)\n for θ in range(θmin,θmax,length=10)\n for T in range(Tmin,Tmax,length=10)\n G = tf(k,[T, 1])*delay(θ) \n Gω = freqresp(G,ω)\n Gω = vec(Gω) \n Eω = abs.(Gω-G₀ω)./abs.(G₀ω)\n Eω_db = 20 * log10.(Eω)\n push!(EEω_db,Eω_db)\n end\n end\n end\n\n plot(ω,EEω_db,xscale=:log10,label=\"\",ylims=(-40,20))\n xlabel!(\"Frequency [rad/s]\")\n ylabel!(\"Relative error [dB]\")\nend\nuncertain_first_order_delayed()\n\n
[Output: plot of the relative errors (in dB) of the magnitude frequency responses of the sampled family with respect to the nominal model, over frequency in rad/s]\n\nNow we need to find some upper bound on the relative error. Simplicity is a virtue here too, hence we are looking for a rational filter of very low order, say 1 or 2. 
Speaking of the first-order filter, one useful way to format it is \n\\boxed{\nW(s) = \\frac{\\tau s+r_0}{(\\tau/r_{\\infty})s+1}}\n where r_0 is the relative uncertainty at steady state, 1/\\tau is the frequency at which the relative uncertainty reaches 100%, and r_{\\infty} is the relative uncertainty at high frequencies, often r_{\\infty}\\geq 2.\nFor our example, the parameters of the filter are in the code below and the frequency response follows.\n\n\nShow the code\nusing ControlSystems\nusing Plots\n\nfunction uncertain_first_order_delayed_with_weights()\n kmin = 2; kmax = 3; \n θmin = 2; θmax = 3; \n Tmin = 2; Tmax = 3;\n\n k₀ = (kmin+kmax)/2; \n θ₀ = (θmin+θmax)/2;\n θ₀ = 0 # deliberately overridden: zero nominal delay gives a simple delay-free nominal model\n T₀ = (Tmin+Tmax)/2;\n\n G₀ = tf(k₀,[T₀, 1])*delay(θ₀) \n\n ω = exp10.(range(-2, 2, length=50))\n G₀ω = freqresp(G₀,ω)\n G₀ω = vec(G₀ω)\n\n EEω_db = [];\n for k in range(kmin,kmax,length=10)\n for θ in range(θmin,θmax,length=10)\n for T in range(Tmin,Tmax,length=10)\n G = tf(k,[T, 1])*delay(θ) \n Gω = freqresp(G,ω)\n Gω = vec(Gω) \n Eω = abs.(Gω-G₀ω)./abs.(G₀ω)\n Eω_db = 20 * log10.(Eω)\n push!(EEω_db,Eω_db)\n end\n end\n end\n\n plot(ω,EEω_db,xscale=:log10,label=\"\",ylims=(-40,20))\n xlabel!(\"Frequency [rad/s]\")\n ylabel!(\"Relative error [dB]\")\n\n τ = 1/0.25\n r₀ = 0.2\n r∞ = 10^(8/20)\n W = tf([τ, r₀],[τ/r∞, 1])\n magW, phaseW = bode(W,ω)\n plot!(ω,20*log10.(vec(magW)),xscale=:log10,lw=3,color=:red,label=\"W\")\n\n # W2 = W*tf([1, 1.6, 1],[1, 1.4, 1]); \n # magW2, phaseW2 = bode(W2,ω)\n # plot!(ω,20*log10.(vec(magW2)),xscale=:log10,lw=3,color=:blue,label=\"W₂\")\nend\nuncertain_first_order_delayed_with_weights()\n\n[Output: the same relative-error plot with the magnitude frequency response of the first-order weight W overlaid in red]\n\nObviously the filter does not capture the family of systems perfectly. It is now up to the control engineer to decide if this is a problem. If yes, that is, if the control design should really be robust against all uncertainties in the considered set, some more complex (higher-order) filter is needed to describe the uncertainty more accurately. The source code shows (in commented lines) one particular candidate, but in general the whole problem boils down to designing a stable filter with a prescribed magnitude frequency response.\n\n\n\nInverse additive uncertainty\n…\n\n\nInverse multiplicative uncertainty\n…\n\n\nLinear fractional transformation (LFT)\nFor a matrix \\mathbf P sized (n_1+n_2)\\times(m_1+m_2) and divided into blocks like \n\\mathbf P=\n\\begin{bmatrix}\n\\mathbf P_{11} & \\mathbf P_{12}\\\\\n\\mathbf P_{21} & \\mathbf P_{22}\n\\end{bmatrix},\n and a matrix \\mathbf K sized m_2\\times n_2, the lower LFT of \\mathbf P with respect to \\mathbf K is \n\\boxed{\n\\mathcal{F}_\\mathbf{l}(\\mathbf P,\\mathbf K) = \\mathbf P_{11}+\\mathbf P_{12}\\mathbf K(\\mathbf I-\\mathbf P_{22}\\mathbf K)^{-1}\\mathbf P_{21}}.\n\nIt can be viewed as a feedback interconnection of the plant \\mathbf P and the controller \\mathbf K, in which not all plant inputs are used as control inputs and not all plant outputs are measured, as depicted in Figure 4\n\n\n\n\n\n\nFigure 4: Lower LFT of \\mathbf P with respect to \\mathbf K\n\n\n\nSimilarly, for a matrix \\mathbf N sized (n_1+n_2)\\times(m_1+m_2) and a matrix \\boldsymbol\\Delta sized m_1\\times n_1, the upper LFT of \\mathbf N with respect to \\boldsymbol\\Delta is \n\\boxed{\n\\mathcal{F}_\\mathbf{u}(\\mathbf N,\\boldsymbol\\Delta) = \\mathbf N_{22}+\\mathbf N_{21}\\boldsymbol\\Delta(\\mathbf I-\\mathbf N_{11}\\boldsymbol\\Delta)^{-1}\\mathbf N_{12}}.\n\nIt can be viewed as a feedback interconnection of the nominal plant \\mathbf N and the uncertainty block \\boldsymbol\\Delta, as depicted in Figure 5\n\n\n\n\n\n\nFigure 5: Upper LFT of \\mathbf N with respect to \\boldsymbol \\Delta\n\n\n\nHere we already anticipated MIMO uncertainty blocks. One motivation for them is explained in the very next section on structured uncertainties, another one appears once we start formulating robust performance within the same analytical framework as robust stability.\n\n\n\n\n\n\nWhich is lower and which is upper is a matter of convention, but a useful one\n\n\n\nOur usage of the lower LFT for a feedback interconnection of a (generalized) plant and a controller and the upper LFT for a feedback interconnection of a nominal system and an uncertainty is completely arbitrary. We could easily use the lower LFT for the uncertainty. But it is a convenient convention to adhere to. All the more so as it allows for the combination of both, as in the diagram in Figure 6 below, which corresponds to a composition of the two LFTs.\n\n\n\n\n\n\nFigure 6: Combination of the lower and upper LFT\n\n\n\n\n\n\n\n\n\nStructured frequency-domain uncertainty\nNot just a single \\Delta(s) but several \\Delta_i(s), i=1,\\ldots,n are considered. Some of them are scalar-valued, some of them matrix-valued.\nIn the upper LFT, all the individual \\Delta_i blocks are collected into a single overall \\boldsymbol \\Delta, which then exhibits some structure. 
Typically it is block-diagonal as in \n\\boldsymbol\\Delta =\n\\begin{bmatrix}\n\\Delta_1& 0 & \\ldots & 0\\\\\n0 & \\Delta_2 & \\ldots & 0\\\\\n\\vdots & \\vdots & \\ddots & \\vdots\\\\\n0 & 0 & \\ldots & \\boldsymbol\\Delta_n\n\\end{bmatrix},\n with each block (including the MIMO blocks) satisfying the usual condition \n\\|\\Delta_i\\|_{\\infty}\\leq 1, \\; i=1,\\ldots, n.\n\n\nStructured singular value (SSV, \\mu, mu)\nWith this structured uncertainty, what does the small gain theorem look like?",
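To make Definition 5 concrete, here is a minimal sketch (not part of the original notes) that approximates the H∞ norm of a MIMO LTI system as the largest value of σ̄(G(jω)) over a logarithmic frequency grid. The state-space data are an arbitrary stable example, and gridding only provides a lower bound on the true norm (dedicated bisection-type algorithms compute it reliably).

using LinearAlgebra

A = [-1.0 2.0; 0.0 -3.0]                       # arbitrary stable example (illustrative data only)
B = Matrix(1.0I, 2, 2)
C = [1.0 1.0; 0.0 1.0]
D = zeros(2, 2)

Gjw(w) = C*inv(im*w*I - A)*B + D               # G(jω) = C(jωI - A)⁻¹B + D

ω = exp10.(range(-2, 3, length=2000))          # logarithmic frequency grid
Hinf_lower_bound = maximum(opnorm(Gjw(w)) for w in ω)   # σ̄(G(jω)) is the spectral norm at each ω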
"crumbs": [
"11. Uncertainty modelling and robustness analysis",
"Uncertainty (in) modelling"
diff --git a/sitemap.xml b/sitemap.xml
index ed4aa91..bbe9133 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -10,7 +10,7 @@
https://hurak.github.io/orr/cont_dp_HJB.html
- 2024-07-06T09:44:49.845Z
+ 2024-07-07T16:24:57.023Zhttps://hurak.github.io/orr/opt_theory_references.html
@@ -106,7 +106,7 @@
https://hurak.github.io/orr/cont_dp_LQR.html
- 2024-07-04T15:34:28.912Z
+ 2024-07-06T11:08:59.518Zhttps://hurak.github.io/orr/dynamic_programming_references.html
diff --git a/uncertainty 26.html b/uncertainty 26.html
new file mode 100644
index 0000000..71a0a9e
--- /dev/null
+++ b/uncertainty 26.html
@@ -0,0 +1,1155 @@
+
+
+
+
+
+
+
+
+
+Uncertainty modelling – B(E)3M35ORR – Optimal and Robust Control
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+