Built site for gh-pages

hurak · Jul 6, 2024 · 74c1786
1 parent c4f1f94
commit 74c1786

Showing 75 changed files with 5,382 additions and 3,764 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-103e56eb
+03b73c09
diff --git a/cont_dp_DDP.html b/cont_dp_DDP.html
diff --git a/cont_dp_DPP.html b/cont_dp_DPP.html
@@ -7,7 +7,7 @@
 <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
 
 
-<title>cont_dp_dpp – B(E)3M35ORR – Optimal and Robust Control</title>
+<title>Differential dynamic programming (DDP) – B(E)3M35ORR – Optimal and Robust Control</title>
 <style>
 code{white-space: pre-wrap;}
 span.smallcaps{font-variant: small-caps;}
@@ -492,6 +492,9 @@
   </div>
 </li>
           <li class="sidebar-item">
+ <span class="menu-text">cont_dp_DDP.qmd</span>
+  </li>
+          <li class="sidebar-item">
   <div class="sidebar-item-container"> 
   <a href="./cont_dp_references.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">References</span></a>
@@ -691,9 +694,26 @@
 <!-- main -->
 <main class="content" id="quarto-document-content">
 
+<header id="title-block-header" class="quarto-title-block default">
+<div class="quarto-title">
+<h1 class="title">Differential dynamic programming (DDP)</h1>
+</div>
+
+
+
+<div class="quarto-title-meta">
 
+
+
+
+  </div>
+
+
+
+</header>
 
 
+<p>blabla</p>
 
 
 
@@ -790,7 +810,7 @@
   }
     var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//);
     var mailtoRegex = new RegExp(/^mailto:/);
-      var filterRegex = new RegExp("https:\/\/hurak\.github\.io\/orr\/");
+      var filterRegex = new RegExp('/' + window.location.host + '/');
     var isInternal = (href) => {
         return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href);
     }

diff --git a/cont_dp_HJB.html b/cont_dp_HJB.html
@@ -533,6 +533,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 
@@ -728,8 +734,15 @@
 </nav>
 <div id="quarto-sidebar-glass" class="quarto-sidebar-collapse-item" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item"></div>
 <!-- margin-sidebar -->
-    <div id="quarto-margin-sidebar" class="sidebar margin-sidebar zindex-bottom">
-
+    <div id="quarto-margin-sidebar" class="sidebar margin-sidebar">
+        <nav id="TOC" role="doc-toc" class="toc-active">
+    <h2 id="toc-title">On this page</h2>
+
+  <ul>
+  <li><a href="#hamilton-jacobi-bellman-hjb-equation" id="toc-hamilton-jacobi-bellman-hjb-equation" class="nav-link active" data-scroll-target="#hamilton-jacobi-bellman-hjb-equation">Hamilton-Jacobi-Bellman (HJB) equation</a></li>
+  <li><a href="#hjb-equation-and-hamiltonian" id="toc-hjb-equation-and-hamiltonian" class="nav-link" data-scroll-target="#hjb-equation-and-hamiltonian">HJB equation and Hamiltonian</a></li>
+  </ul>
+<div class="toc-actions"><ul><li><a href="https://github.com/hurak/orr/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div></nav>
     </div>
 <!-- main -->
 <main class="content" id="quarto-document-content">
@@ -762,6 +775,8 @@ <h1 class="title">Dynamic programming for continuous-time optimal control</h1>
 <p>Optionally we can also consider constraints on the state at the final time (be it a particular value or some set of values) <span class="math display">
 \psi(\bm x(t_\mathrm{f}),t_\mathrm{f})=0.
 </span></p>
+<section id="hamilton-jacobi-bellman-hjb-equation" class="level2">
+<h2 class="anchored" data-anchor-id="hamilton-jacobi-bellman-hjb-equation">Hamilton-Jacobi-Bellman (HJB) equation</h2>
 <p>We now consider an arbitrary time <span class="math inline">t</span> and split the (remaining) time interval <span class="math inline">[t,t_\mathrm{f}]</span> into two parts <span class="math inline">[t,t+\Delta t]</span> and <span class="math inline">[t+\Delta t,t_\mathrm{f}]</span> , and structure the cost function accordingly <span class="math display">
 J(\bm x(t),\bm u(\cdot),t) = \int_{t}^{t+\Delta t} L(\bm x,\bm u,\tau)\,\mathrm{d}\tau + \underbrace{\int_{t+\Delta t}^{t_\mathrm{f}} L(\bm x,\bm u,\tau)\,\mathrm{d}\tau + \phi(\bm x(t_\mathrm{f}),t_\mathrm{f})}_{J(\bm x(t+\Delta t), \bm u(\cdot), t+\Delta t)}.
 </span></p>
@@ -780,16 +795,21 @@ <h1 class="title">Dynamic programming for continuous-time optimal control</h1>
 -\frac{\partial {\color{blue}J^\star (\bm x(t),t)}}{\partial t} = \min_{\bm u(t)}\left[L(\bm x(t),\bm u(t),t)+(\nabla_{\bm x} {\color{blue} J^\star (\bm x(t),t)})^\top \bm f(\bm x(t),\bm u(t),t)\right].}
 </span></p>
 <p>This is obviously a partial differential equation (PDE) for the optimal cost function <span class="math inline">J^\star(\bm x,t)</span>.</p>
-<p>And since this is a differential equation, boundary value(s) must also be specified. In particular, the optimal cost function must be specified at the final state and the final time, i.e. <span class="math display">
+<p>And since this is a differential equation, boundary value(s) must be specified to determine a unique solution. In particular, since the equation is first-order with respect to both time and state, specifying the value of the optimal cost function at the final state and the final time is enough. With the general final-state constraints we have introduced above, the boundary value condition reads <span class="math display">
 J^\star (\bm x(t_\mathrm{f}),t_\mathrm{f}) = \phi(\bm x(t_\mathrm{f}),t_\mathrm{f}),\qquad \text{on the hypersurface } \psi(\bm x(t_\mathrm{f}),t_\mathrm{f}) = 0.
 </span></p>
-<p>By the way, recall the definition of Hamiltonian <span class="math inline">H(\bm x,\bm u,\bm \lambda,t) = L(\bm x,\bm u,t) + \boldsymbol{\lambda}^\top \mathbf f(\bm x,\bm u,t)</span>. The HJB equation can also be written as <span class="math display">\boxed
+<p>Note that this includes as special cases the fixed-final-state and free-final-state cases.</p>
+</section>
+<section id="hjb-equation-and-hamiltonian" class="level2">
+<h2 class="anchored" data-anchor-id="hjb-equation-and-hamiltonian">HJB equation and Hamiltonian</h2>
+<p>Recall the definition of the Hamiltonian <span class="math inline">H(\bm x,\bm u,\bm \lambda,t) = L(\bm x,\bm u,t) + \boldsymbol{\lambda}^\top \mathbf f(\bm x,\bm u,t)</span>. The HJB equation can also be written as <span class="math display">\boxed
 {-\frac{\partial J^\star (\bm x(t),t)}{\partial t} = \min_{\bm u(t)}H(\bm x(t),\bm u(t),\nabla_{\bm x} J^\star (\bm x(t),t),t).}
 </span></p>
 <p>What we have just derived is one of the most profound results in optimal control – the Hamiltonian must be minimized by the optimal control. We will exploit it next for some derivations.</p>
 <p>Recall also that we have already encountered similar results that made statements about the necessary maximization (or minimization) of the Hamiltonian with respect to the control – the celebrated Pontryagin’s principle of maximum (or minimum).</p>
 
 
+</section>
 
 <a onclick="window.scrollTo(0, 0); return false;" role="button" id="quarto-back-to-top"><i class="bi bi-arrow-up"></i> Back to top</a></main> <!-- /main -->
 <script id="quarto-html-after-body" type="application/javascript">
@@ -1232,7 +1252,7 @@ <h1 class="title">Dynamic programming for continuous-time optimal control</h1>
     </div>   
     <div class="nav-footer-center">
 <p>Copyright 2024, Zdeněk Hurák</p>
-<div class="toc-actions"><ul><li><a href="https://github.com/hurak/orr/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div></div>
+<div class="toc-actions d-sm-block d-md-none"><ul><li><a href="https://github.com/hurak/orr/issues/new" class="toc-action"><i class="bi bi-github"></i>Report an issue</a></li></ul></div></div>
     <div class="nav-footer-right">
       &nbsp;
     </div>

diff --git a/cont_dp_LQR.html b/cont_dp_LQR.html
@@ -30,7 +30,7 @@
 <script src="site_libs/quarto-search/fuse.min.js"></script>
 <script src="site_libs/quarto-search/quarto-search.js"></script>
 <meta name="quarto:offset" content="./">
-<link href="./cont_dp_references.html" rel="next">
+<link href="./cont_dp_DDP.html" rel="next">
 <link href="./cont_dp_HJB.html" rel="prev">
 <script src="site_libs/quarto-html/quarto.js"></script>
 <script src="site_libs/quarto-html/popper.min.js"></script>
@@ -533,6 +533,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link active">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 
@@ -753,14 +759,15 @@ <h1 class="title">Using HJB equation to solve the continuous-time LQR problem</h
 </header>
 
 
-<p>As we have already discussed a couple of times, in the LQR problem wwe consider a linear (and time invariant) system modelled by <span class="math display">
-\dot{\bm x}(t) = \mathbf A\bm x(t) + \mathbf B\bm u(t)
+<p>As we have already discussed a couple of times, in the LQR problem we consider a linear time-invariant (LTI) system modelled by <span class="math display">
+\dot{\bm x}(t) = \mathbf A\bm x(t) + \mathbf B\bm u(t),
 </span> and the quadratic cost function <span class="math display">
 J(\bm x(t_\mathrm{i}),\bm u(\cdot), t_\mathrm{i}) = \frac{1}{2}\bm x^\top(t_\mathrm{f})\mathbf S_\mathrm{f}\bm x(t_\mathrm{f}) + \frac{1}{2}\int_{t_\mathrm{i}}^{t_\mathrm{f}}\left(\bm x^\top \mathbf Q\bm x + \bm u^\top \mathbf R \bm u\right)\mathrm{d}t.
 </span></p>
 <p>The Hamiltonian is <span class="math display">
-H(\bm x,\bm u,\bm \lambda) = \frac{1}{2}\left(\bm x^\top \mathbf Q\bm x + \bm u^\top \mathbf R \bm u\right) + \boldsymbol{\lambda}^\top \left(\mathbf A\bm x + \mathbf B\bm u\right)
-</span> and according to the HJB equation our goal is to minimize <span class="math inline">H</span> at a given time <span class="math inline">t</span>, which enforces the condition on its gradient <span class="math display">
+H(\bm x,\bm u,\bm \lambda) = \frac{1}{2}\left(\bm x^\top \mathbf Q\bm x + \bm u^\top \mathbf R \bm u\right) + \boldsymbol{\lambda}^\top \left(\mathbf A\bm x + \mathbf B\bm u\right).
+</span></p>
+<p>According to the HJB equation our goal is to minimize <span class="math inline">H</span> at a given time <span class="math inline">t</span>, which enforces the condition on its gradient <span class="math display">
 \mathbf 0 = \nabla_{\bm u} H = \mathbf R\bm u + \mathbf B^\top \boldsymbol\lambda,
 </span> from which it follows that the optimal control must necessarily satisfy <span class="math display">
 \bm u^\star = -\mathbf R^{-1} \mathbf B^\top \boldsymbol\lambda.
@@ -771,7 +778,7 @@ <h1 class="title">Using HJB equation to solve the continuous-time LQR problem</h
 <p>The minimized Hamiltonian is <span class="math display">
 \min_{\bm u(t)}H(\bm x, \bm u, \bm \lambda) = \frac{1}{2}\bm x^\top \mathbf Q \bm x + \boldsymbol\lambda^\top \mathbf A \bm x - \frac{1}{2}\boldsymbol\lambda^\top \mathbf B\mathbf R^{-1}\mathbf B^\top \boldsymbol\lambda
 </span></p>
-<p>Setting <span class="math inline">\boldsymbol\lambda = (\nabla_{\bm x} J^\star)^\top</span>, the HJB equation is <span class="math display">\boxed
+<p>Setting <span class="math inline">\boldsymbol\lambda = \nabla_{\bm x} J^\star</span>, the HJB equation is <span class="math display">\boxed
 {-\frac{\partial J^\star}{\partial t} = \frac{1}{2}\bm x^\top \mathbf Q \bm x + (\nabla_{\bm x} J^\star)^\top \mathbf A\bm x - \frac{1}{2}(\nabla_{\bm x} J^\star)^\top \mathbf B\mathbf R^{-1}\mathbf B^\top \nabla_{\bm x} J^\star,}
 </span> and the boundary condition is <span class="math display">
 J^\star(\bm x(t_\mathrm{f}),t_\mathrm{f}) = \frac{1}{2}\bm x^\top (t_\mathrm{f})\mathbf S_\mathrm{f}\bm x(t_\mathrm{f}).
@@ -803,11 +810,16 @@ <h1 class="title">Using HJB equation to solve the continuous-time LQR problem</h
 <p><span class="math display">
 -\frac{1}{2}\bm x^\top \dot{\mathbf{S}} \bm x = \frac{1}{2} \bm x^\top \left[\mathbf Q + \mathbf S \mathbf A + \mathbf A^\top \mathbf S  - \mathbf S \mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S \right ] \bm x.
 </span></p>
-<p>Finally, since the above single (scalar) equation should hold for all <span class="math inline">\bm x(t)</span>, the matrix equation must hold too, and we get the familiar differential Riccati equation <span class="math display">\boxed
-{-\dot{\mathbf S}(t) = \mathbf A^\top \mathbf S(t) + \mathbf S(t)\mathbf A - \mathbf S(t)\mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S(t) + \mathbf Q.}
-</span></p>
-<p>We also get the optimal control <span class="math display">\boxed
-{\bm u^\star(t) = - \underbrace{\mathbf R^{-1}\mathbf B^\top \mathbf S(t)}_{\bm K(t)}\bm x(t).}
+<p>Finally, since the above single (scalar) equation should hold for all <span class="math inline">\bm x(t)</span>, the matrix equation must hold too, and we get the familiar differential Riccati equation for the matrix variable <span class="math inline">\mathbf S(t)</span> <span class="math display">\boxed
+{-\dot{\mathbf S}(t) = \mathbf A^\top \mathbf S(t) + \mathbf S(t)\mathbf A - \mathbf S(t)\mathbf B\mathbf R^{-1}\mathbf B^\top \mathbf S(t) + \mathbf Q}
+</span> initialized at the final time <span class="math inline">t_\mathrm{f}</span> by <span class="math inline">\mathbf S(t_\mathrm{f}) = \mathbf S_\mathrm{f}</span>.</p>
+<p>Having obtained <span class="math inline">\mathbf S(t)</span>, we can get the optimal control by substituting it into <span class="math display">\boxed
+{
+\begin{aligned}
+    \bm u^\star(t) &amp;= - \mathbf R^{-1}\mathbf B^\top \nabla_{\bm x} J^\star(\bm x(t),t) \\
+                   &amp;= - \underbrace{\mathbf R^{-1}\mathbf B^\top \mathbf S(t)}_{\bm K(t)}\bm x(t).
+\end{aligned}
+}
 </span></p>
 <p>We have just rederived the solution to the continuous-time LQR problem using the HJB equation (previously we did it by massaging the two-point boundary value problem that followed as the necessary condition of optimality from the techniques of calculus of variations).</p>
 <p>Note that we have also just seen how a quadratic form assumed for the optimal cost reduces a first-order nonlinear PDE (the HJB equation) to a first-order nonlinear ODE (the Riccati equation).</p>
@@ -1242,8 +1254,8 @@ <h1 class="title">Using HJB equation to solve the continuous-time LQR problem</h
       </a>          
   </div>
   <div class="nav-page nav-page-next">
-      <a href="./cont_dp_references.html" class="pagination-link" aria-label="References">
-        <span class="nav-page-text">References</span> <i class="bi bi-arrow-right-short"></i>
+      <a href="./cont_dp_DDP.html" class="pagination-link" aria-label="Differential dynamic programming (DDP)">
+        <span class="nav-page-text">Differential dynamic programming (DDP)</span> <i class="bi bi-arrow-right-short"></i>
       </a>
   </div>
 </nav>
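
Note on the cont_dp_LQR.html hunks above: the added text says the differential Riccati equation is initialized at the final time by S(t_f) = S_f and that the optimal control follows from K(t) = R^{-1} B^T S(t). A minimal numerical sketch of that backward integration is given below. It is not part of the commit. It assumes a scalar system (so A, B, Q, R, S become the numbers a, b, q, r, s), a fixed-step RK4 integrator, and hypothetical function names chosen only for illustration.

```python
import math


def solve_scalar_riccati(a, b, q, r, s_f, t_i, t_f, n_steps=10_000):
    """Integrate -ds/dt = 2*a*s - (b**2 / r)*s**2 + q backward from s(t_f) = s_f.

    Returns a list whose k-th entry approximates s(t_i + k*h),
    with h = (t_f - t_i) / n_steps.
    """

    def rhs(s):
        # In the time-to-go variable tau = t_f - t the equation is ds/dtau = rhs(s).
        return 2.0 * a * s - (b**2 / r) * s**2 + q

    h = (t_f - t_i) / n_steps
    s = s_f
    samples = [s]  # starts with the boundary value s(t_f)
    for _ in range(n_steps):
        # One classical Runge-Kutta (RK4) step in tau, i.e. backward in t.
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        samples.append(s)
    samples.reverse()  # reorder so that the index increases with t
    return samples


def feedback_gain(b, r, s):
    """Scalar version of K(t) = R^{-1} B^T S(t)."""
    return b * s / r


def steady_state_solution(a, b, q, r):
    """Positive root of 0 = 2*a*s - (b**2 / r)*s**2 + q (algebraic Riccati equation)."""
    return r * (a + math.sqrt(a**2 + b**2 * q / r)) / b**2


# Illustrative values only (assumed, not taken from the course text).
s_traj = solve_scalar_riccati(a=1.0, b=1.0, q=1.0, r=1.0, s_f=0.0, t_i=0.0, t_f=10.0)
print("s(t_f) =", s_traj[-1])
print("s(t_i) =", s_traj[0])
print("steady state =", steady_state_solution(1.0, 1.0, 1.0, 1.0))
print("K(t_i) =", feedback_gain(1.0, 1.0, s_traj[0]))
```

For a horizon that is long compared with the closed-loop time constant, s(t_i) should sit close to the steady-state root, which is the scalar counterpart of the infinite-horizon LQR solution. For matrix-valued problems the same backward sweep applies, but a library ODE solver would be the more natural choice.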

diff --git a/cont_dp_references.html b/cont_dp_references.html
@@ -51,7 +51,7 @@
 <script src="site_libs/quarto-search/quarto-search.js"></script>
 <meta name="quarto:offset" content="./">
 <link href="./ext_stochastic_LQR.html" rel="next">
-<link href="./cont_dp_LQR.html" rel="prev">
+<link href="./cont_dp_DDP.html" rel="prev">
 <script src="site_libs/quarto-html/quarto.js"></script>
 <script src="site_libs/quarto-html/popper.min.js"></script>
 <script src="site_libs/quarto-html/tippy.umd.min.js"></script>
@@ -510,6 +510,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 
@@ -1170,8 +1176,8 @@ <h1 class="title">References</h1>
 </script>
 <nav class="page-navigation">
   <div class="nav-page nav-page-previous">
-      <a href="./cont_dp_LQR.html" class="pagination-link" aria-label="Using HJB equation to solve the continuous-time LQR problem">
-        <i class="bi bi-arrow-left-short"></i> <span class="nav-page-text">Using HJB equation to solve the continuous-time LQR problem</span>
+      <a href="./cont_dp_DDP.html" class="pagination-link" aria-label="Differential dynamic programming (DDP)">
+        <i class="bi bi-arrow-left-short"></i> <span class="nav-page-text">Differential dynamic programming (DDP)</span>
       </a>          
   </div>
   <div class="nav-page nav-page-next">

diff --git a/cont_indir_CARE.html b/cont_indir_CARE.html
@@ -490,6 +490,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 

diff --git a/cont_indir_LQR_fin_horizon.html b/cont_indir_LQR_fin_horizon.html
@@ -490,6 +490,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 

diff --git a/cont_indir_LQR_inf_horizon.html b/cont_indir_LQR_inf_horizon.html
@@ -490,6 +490,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 

diff --git a/cont_indir_Pontryagin.html b/cont_indir_Pontryagin.html
@@ -490,6 +490,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 

diff --git a/cont_indir_calculus_of_variations.html b/cont_indir_calculus_of_variations.html
@@ -490,6 +490,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 

diff --git a/cont_indir_constrained.html b/cont_indir_constrained.html
@@ -490,6 +490,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container"> 

diff --git a/cont_indir_overview.html b/cont_indir_overview.html
@@ -533,6 +533,12 @@
   <a href="./cont_dp_LQR.html" class="sidebar-item-text sidebar-link">
  <span class="menu-text">Using HJB equation to solve the continuous-time LQR problem</span></a>
   </div>
+</li>
+          <li class="sidebar-item">
+  <div class="sidebar-item-container"> 
+  <a href="./cont_dp_DDP.html" class="sidebar-item-text sidebar-link">
+ <span class="menu-text">Differential dynamic programming (DDP)</span></a>
+  </div>
 </li>
           <li class="sidebar-item">
   <div class="sidebar-item-container">