nonlinear_solvers.tex


The Nonlinear Analysis product area provides the top level algorithms for a computational simulation or design study.
These include nonlinear solvers, time integration, bifurcation tracking, parameter continuation, optimization, and uncertainty quantification.
A common theme of this collection is the philosophy of ``analysis beyond simulation'', which aims to automate many computational tasks that are often performed by application code users by trial-and-error or repeated simulation.
Tasks that can be automated include performing parameter studies, sensitivity analysis, calibration, optimization, time step size control, and locating instabilities.
This capability area additionally includes utilities for the nonlinear analysis. These include automatic differentiation tools that can provide the derivatives critical to the analysis algorithms and the abstraction layers and interfaces for application callbacks.

\subsection{Thyra}
\todo{Bartlett}

\subsection{NOX}
NOX provides algortihms for solving large\-scale sets of nonlinear equations.
The library contains abstractions for solvers, directions and line searches that allow users to customize their code.
The stopping criteria has a set 
Methods include Newton-based algorithms such as inexact Newton, matrix-free Newton-Krylov, line-search methods, trust-region methods, tensor methods, and homotopy methods.
\todo{Pawlowski, finish this}

\subsection{LOCA}
\todo{Phipps}

\subsection{Tempus}

Tempus (Latin meaning time as in “tempus fugit” -> “time flies”)
is the Trilinos time-integration package for advanced transient
analysis.  It includes various time integrators and embedded
sensitivity analysis for next-generation code architectures.  Tempus
provides “out-of-the-box” time-integration capabilities, which
allows users to quickly and easily incorporate time-integration
capabilities to their applications and switch between various time
integrators depending on the simulation needs.  Additionally, Tempus
provides “build-your-own” capabilities, which allows applications
to incorporate various Tempus components to augment or replace
application transient capabilities. Other capabilities include
embedded error analysis, sensitivity analysis, transient optimization
with ROL.

Tempus provides a general infrastructure for the time evolution of
solutions through a variety of general integration schemes.  Tempus
provides time integrators for explicit and implicit methods and for
first- and second-order ODEs.  It can be used from small systems of
equations (e.g., single ODEs for the time evolution of plasticity
models, and multiple ODEs for coupled chemical reactions) to
large-scale transient simulations requiring exascale computing
(e.g., flow fields around reentry vehicles and magneto-hydrodynamics).

Tempus has several components that can be used in concert or
individually, depending on the needs of the application.
\begin{itemize}
  \item Integrators are the time-loop structure for time integration
  and provide several features, e.g., control the advancement of
  the solution, selection of the next timestep size and solution
  output.

  \item Time Steppers are individual methods that advance the
  solution from one step to the next.  A variety of time steppers
  are available:
  \begin{itemize}
    \item Classic one-step methods (e.g., Forward Euler and Trapezoidal
    Method)
    \item Explicit Runge-Kutta methods (e.g., RK Explicit 4 Stage)
    \item Diagonally Implicit Runge-Kutta (DIRK) Methods (e.g.,
    general tableau DIRK and many specific DIRK/SDIRK methods)
    \item Implicit-Explicit (IMEX) Runge-Kutta Methods (e.g., IMEX
    RK SSP2, IMEX RK SSP3, and general tableau IMEX RK methods)
    \item Multi-Step Methods (i.e., BDF2)
    \item Second-order ODE Methods (e.g., Leapfrog, Newmark methods
    and HHT-Alpha)
    \item Steppers with subSteppers (e.g., operator-split and
    subcycling methods)
  \end{itemize}

  \item Solution History is used to maintain the solution during
  time-step failure, solution restart/output, interpolation of
  solution between time steps, and to provide the solution for
  transient adjoint sensitivities.

  \item Timestep Control and Strategies provide methods to select
  the time-step size based on user input and/or temporal error
  control (e.g., bounding min/max time-step size, relative/absolute
  maximum error, and timestep adjustments for output and checkpointing)

\end{itemize}
Additionally, Tempus has several mechanisms which allow users to
insert application-specific algorithms into Tempus components (e.g.,
through observers and creating derived classes).

\subsection{Piro}

Piro~\cite{osti_1231283} is the top-level, unifying package of the embedded nonlinear analysis capability area. 
The main purpose of the package is to provide driver classes for the common uses of Trilinos nonlinear analysis tools. 
These drivers all can be constructed similarly, with a \lstinline{Thyra::ModelEvaluator} and a \lstinline{Teuchos::ParameterList}, 
to make it simple to switch between different types of analysis. 
They also all inherit from the same base classes (response-only model evaluators) so that the resulting analysis can 
in turn be driven by non-intrusive analysis routines.

\todo{Perego: To be continued/rewritten}

\subsection{ROL}
Rapid Optimization Library (ROL) \cite{rol,ROL2022ICCOPT} is the
Trilinos library for numerical optimization. ROL brings an extensive
collection of state-of-the-art optimization algorithms to virtually
any application. Its programming interface supports any computational
hardware, including heterogeneous many-core systems with digital and
analog accelerators.
ROL has been used with great success for optimal control, optimal design,
inverse problems, image processing and mesh optimization, in application
areas including geophysics, climate science \cite{Perego2022}, structural
dynamics, fluid dynamics \cite{Antil2023}, electromagnetics, quantum
computing, hypersonics and geospatial imaging \cite{Kouri2014}.

The key design, performance and algorithmic features of ROL can be
summarized as follows:

\begin{itemize}
\item
\emph{Vector abstractions and matrix-free interface for universal applicability}.
Similar to other Trilinos packages that implement abstract numerical
algorithms, ROL's design is centered around an abstract linear algebra
interface, through the ROL::Vector class, which enables the use of any
ROL~algorithm with any data type, such as the C++ std::vector, MPI-parallel
data structures (e.g., Epetra, Tpetra, PETSc, HYPRE vectors), GPU-supporting data
structures (e.g., Kokkos, ArrayFire), etc.  The abstract linear algebra
interface further enables the definition of custom vector dot products,
which are critical for performance in large-scale applications.
Finally, all ROL algorithms are matrix-free and rely on user-defined
applications of linear and nonlinear operators.  ROL's vector class design is
inspired by the Hilbert Class Library \cite{hcl}, the Rice Vector Library \cite{rvl}
and the RTOp framework \cite{rtop}, while ROL's operator interfaces for
simulation-based optimization, known as the SimOpt middleware,
are related to \cite{Heinkenschloss1999}.
\item
\emph{Modern algorithms for unconstrained and constrained optimization}.
ROL features a large collection of well-established algorithms for smooth
optimization as well as several novel algorithms developed by the ROL team.
ROL categorizes optimization problems into 4 basic types, based on the presence of
certain constraints: Type U (unconstrained), Type B (bound constraints),
Type E (equality constraints) and Type G (general constraints). Additionally,
each problem type can include linear constraints, which ROL can reduce or eliminate
algorithmically. The novel algorithms include augmented Lagrangian methods for
infinite-dimensional problems with general constraints \cite{ALESQP} and
matrix-free trust-region methods for convex-constrained
optimization \cite{Kouri2022}.
\item
\emph{Easy-to-use methods for stochastic and risk-aware optimization}.
\item
\emph{Fast and robust algorithms for nonsmooth optimization}.
\item
\emph{Trust-region methods for inexact and adaptive computations}.
ROL controls inexact function evaluations and exploits adaptive computations
and multi-fidelity models through its trust-region methods. Several of these methods
have been pioneered by the ROL team.  Their implementations typically require some
knowledge of error in the computations, i.e., an \emph{error estimate}.
For a comprehensive overview of problem formulations, algorithms with error control,
and their ROL implementations, see \cite{Kouri2018}.
\item
\emph{PDE-OPT application development kit for PDE-constrained optimization}.
To build and optimize models based on partial differential equations (PDEs),
ROL offers an application development kit, called PDE-OPT. PDE-OPT is based on a
modular design with three layers: local finite element computations,
which encode the physical equations governing the optimization problem;
global computations, which utilize Tpetra data structures;
and an interface to ROL, enabling efficient optimization.
To solve PDE-constrained optimization problems, the users need only implement the
local finite element layer of PDE-OPT, i.e., the evaluations of weak forms on
a mesh element; PDE-OPT automates all remaining steps.
\end{itemize}


\subsection{Sacado}

Sacado \cite{SacadoURL,phipps2012efficient,phipps2008large} provides forward and reverse-mode operator overloading-based automatic differentiation (AD) tools within Trilinos.
%The package provides both forward and reverse-mode AD data types. 
Sacado forward AD tools have been integrated into Kokkos and have demonstrated good performance on GPU architectures~\cite{phipps2022automatic}.
Sacado, along with its Kokkos integration, provides high-performance derivative capabilities to numerous Office of Science and NNSA extreme scale applications, including Albany for solid mechanics and land ice modeling~\cite{Salinger2016,MPASAlbany2018}, 
Charon for semiconductor device modeling~\cite{CharonUsersManual2020} and multiphase chemically reacting flows~\cite{Musson2009}, Drekar for computational fluid dynamics (CFD)~\cite{Sondak2021,Shadid2016}, magnetohydrodynamics~\cite{Shadid2016mhd} and 
plasma physics~\cite{Crockatt2022,Miller2019}, Xyce for electronic circuit simulation~\cite{xyceTrilinos,xycePCE}, and SPARC for hypersonic fluid flows~\cite{SparcValidation}. 

\subsection{Stokhos}

Stokhos~\cite{phipps2015stokhos,Phipps2016,phipps2014exploring} provides implementations of two intrusive uncertainty quantification strategies: 
the intrusive stochastic Galerkin uncertainty quantification method~\cite{ghanem1990polynomial,ghanem2003stochastic} and the embedded ensemble propagation~\cite{phipps2017embedded}.

For the first one, Stokhos provides methods for computing intrusive stochastic Galerkin projections such as Polynomial Chaos and Generalized Polynomial Chaos, 
interfaces for forming the resulting nonlinear systems, and linear solver methods for solving block stochastic Galerkin linear systems.
The implementation targets GPU performances using Kokkos and by commuting the layout of the Galerkin operator to be outer-spatial and inner-stochastic~\cite{phipps2014exploring}.
The stochastic Galerkin implementation of Stokhos has been used in~\cite{constantine2014efficient} to efficiently propagate uncertainty in multiphysics systems by reducing the full system with a nonlinear elimination method.

The embedded ensemble propagation consists in propagating a subset of samples gathered into a so-called ensemble through the forward simulation at once.
It builds on~\cite{pawlowski2012automating} for automating embedded analysis capabilities; Stokhos defines an ensemble type, a SIMD data type, that is able to store
the values of the input, output, and state variables for every sample of an ensemble. This type can then be used in the Tpetra solver stack as a template argument for the scalar type.
This approach allows to save computation time in four ways: the sample-independent data and computation can be reused for every sample of an ensemble, the memory access pattern is improved,
the operations on the ensemble type can be vectorized efficiently, and the message passing costs are reduced by sending fewer but larger messages.
However, the approach requires solvers and BLAS functions to be aware of the extra dimension associated to the ensemble; for example, a GMRES for ensemble types~\cite{liegeois2020gmres} needs to monitor 
the convergence of the individual sample in order to decide when to stop based on the union of the information.