From 2a4d266b3259a5534f8e66350b01c7325257c88d Mon Sep 17 00:00:00 2001
From: Warisa Roongaraya <81345089+warisa-r@users.noreply.github.com>
Date: Tue, 5 Nov 2024 13:57:30 +0100
Subject: [PATCH] New Paired Explicit Runge-Kutta Integrator: Third Order
 (#2008)

* make NLsolve a weapdep

* fmt

* add NEWS.md but TBD on the ref pull request

* add comments and adjustment on solve_a_unknown

* modular implementation with init, step!, and solve_step!

* fmt

* add test

* adda body of p3 constructor test

* changes according to test and correct variable names

* only check the values of a_matrix from second row to end

* adjust the the constructor of path coefficient and its test

* adjust the test and add a seed to the randomized initial guess for reproducibility

* add NLsolve as a dependency for testing

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* optimize the loop for step! by moving the condition outside

* fmt

* more type generic

* change some names

* update test

* Correcting steps!

* Apply suggestions from code review

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* add docstring about dt_opt

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* merge k_higher in the last stage to a bigger loop

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* change solve_step! to solve!

* Correct the logic of step!

* deprecation

* Optimize K_S1 away

* fmt

* remove dt_opt as an attribute of PERK3

* change the objective function to match the number of equations

* fmt

* minor comment fix

* delete some stuff left from random

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* minor adjustments

* minor change to the comment

* add proper comment and bring seed back

* update test values

* fmt

* change the keyword according to the error in the test pipeline and edit some values to match the test pipeline

* remove unused import

* fix test values in misc

* add max iteration

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Apply suggestions from code review

* remove the allocating part of is_sol_valid

* removing dt_opt and update test values

* Update NEWS.md

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* update cfl number for the simulation

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* change from a_stages_stages.txt to a_stages.txt

* fixed step size should work with save solution now

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* add save solution to the example

* update test to be compatible with save_solution

* move comment regarding seed upwards

* Revert "Correct the logic of step!" only the part that meddles with methods_PERK2

* correct methods_PERK3

* move is_sol_valid closer to the for loop

* fmt

* Revert some random changes in other test unit

* add tolerance to the test

* modify functions so that they are also compatible with PERK3

* change function's name to be more descriptive

* change function's name to be more descriptive in all files

* Revert irrelevent change in TrixiConvexECOSExt.jl

* add PR number to NEWS.md

* fmt

* change from using Random to StableRNGs

* fix the value in unit test

* remove prints

* minor comment correction

* attempt to fix the error at fixed time step

* add the missing clause to test set

* adjust allocation values in test of perk3

* update test value

* move objective function to the extension

* minor fix with compute_c_coeffs

* remove explicit import of solve_a_unknown from line 18

* Apply suggestions from code review

* document why additional packages are loaded

* correct docstring

* use Float32

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

* change some Flot32 back to the way they originally were

* add line that get the type that a_unknown should be

* due to some the change of type, print out some values of a_matrix that changes slighly from the original value (error value in other tests still remain the same)

* update test values

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* allocate c_eq once per solve_a_unknown is called

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* minor fix regarding recent changes witjh c_eq

* adjust a constructor to get num stages from reading the files directly

* Revert "adjust a constructor to get num stages from reading the files directly" since it is breaking

* Update TrixiNLsolveExt.jl to use forward autodiff in solve_a_butcher_coeffs_unknown! function

* Apply suggestions from code review

* Slight modifications a values

* add cfl number calculation for PERK3

* update CI values

* remove the example without cfl calculation

* Apply suggestions from code review

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Daniel Doehring <doehringd2@gmail.com>

* add DOI to a reference in TrixiNLsolveExt

* update the test of perk3

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Update examples/structured_1d_dgsem/elixir_burgers_perk3.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* add to docstring packages needed in order to use PERK3

* update NEWS.md

* changes according to #2026

* add comments to the test without stepsize callback

* Update ext/TrixiNLsolveExt.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* fmt

* slight modification according to #2123

* Update test/test_unit.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* Update src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* fmt + minor correction

* update NEWS.md

* make functions more general for PERK

* remove base.resize for perk3

* put all the general functions of PERK into paired_explicit_runge_kutta.jl

* Update src/time_integration/paired_explicit_runge_kutta/paired_explicit_runge_kutta.jl

Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>

* modify some functions to be useable for both multi and single scheme

* fmt

* ensure type consistency in compute_c_coeffs

* print out the full a_matrix

* print out the new a_matrix from the latest change

* remove the false line of PERK construction

* fix the test values

* move using statements to src/Trixi.jl

---------

Co-authored-by: Daniel Doehring <daniel.doehring@rwth-aachen.de>
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
Co-authored-by: Joshua Lampert <51029046+JoshuaLampert@users.noreply.github.com>
---
 NEWS.md                                       |   8 +
 Project.toml                                  |   6 +
 .../elixir_burgers_perk3.jl                   |  70 ++++
 .../tree_1d_dgsem/elixir_advection_perk2.jl   |   3 +
 ext/TrixiNLsolveExt.jl                        | 128 +++++++
 src/Trixi.jl                                  |  10 +-
 .../methods_PERK2.jl                          | 164 ---------
 .../methods_PERK3.jl                          | 336 ++++++++++++++++++
 .../paired_explicit_runge_kutta.jl            | 174 ++++++++-
 test/Project.toml                             |   4 +
 test/test_structured_1d.jl                    |  36 ++
 test/test_unit.jl                             |  40 +++
 12 files changed, 812 insertions(+), 167 deletions(-)
 create mode 100644 examples/structured_1d_dgsem/elixir_burgers_perk3.jl
 create mode 100644 ext/TrixiNLsolveExt.jl
 create mode 100644 src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl

diff --git a/NEWS.md b/NEWS.md
index cba0d3f86d8..2f71c45e57e 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -4,6 +4,14 @@ Trixi.jl follows the interpretation of [semantic versioning (semver)](https://ju
 used in the Julia ecosystem. Notable changes will be documented in this file
 for human readability.
 
+## Changes in the v0.9 lifecycle
+
+#### Added
+
+- New time integrator `PairedExplicitRK3`, implementing the third-order paired explicit Runge-Kutta
+  method with [Convex.jl](https://github.com/jump-dev/Convex.jl), [ECOS.jl](https://github.com/jump-dev/ECOS.jl),
+  and [NLsolve.jl](https://github.com/JuliaNLSolvers/NLsolve.jl) ([#2008])
+
 ## Changes when updating to v0.9 from v0.8.x
 
 #### Added
diff --git a/Project.toml b/Project.toml
index 98512989fb6..bb203d4ba1b 100644
--- a/Project.toml
+++ b/Project.toml
@@ -36,6 +36,7 @@ Requires = "ae029012-a4dd-5104-9daa-d747884805df"
 SciMLBase = "0bca4576-84f4-4d90-8ffe-ffa030f20462"
 SimpleUnPack = "ce78b400-467f-4804-87d8-8f486da07d0a"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
+StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
 StartUpDG = "472ebc20-7c99-4d4b-9470-8fde4e9faa0f"
 Static = "aedffcd0-7271-4cad-89d0-dc628f76c6d3"
 StaticArrayInterface = "0d7ed370-da01-4f52-bd93-41d350b8b718"
@@ -55,10 +56,12 @@ UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
 Convex = "f65535da-76fb-5f13-bab9-19810c17039a"
 ECOS = "e2685f51-7e38-5353-a97d-a921fd2c8199"
 Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
+NLsolve = "2774e3e8-f4cf-5e23-947b-6d7e65073b56"
 
 [extensions]
 TrixiConvexECOSExt = ["Convex", "ECOS"]
 TrixiMakieExt = "Makie"
+TrixiNLsolveExt = "NLsolve"
 
 [compat]
 Accessors = "0.1.12"
@@ -82,6 +85,7 @@ LoopVectorization = "0.12.151"
 MPI = "0.20"
 Makie = "0.19, 0.20"
 MuladdMacro = "0.2.2"
+NLsolve = "4.5.1"
 Octavian = "0.3.21"
 OffsetArrays = "1.12"
 P4est = "0.4.9"
@@ -96,6 +100,7 @@ Requires = "1.1"
 SciMLBase = "1.90, 2"
 SimpleUnPack = "1.1"
 SparseArrays = "1"
+StableRNGs = "1.0.2"
 StartUpDG = "0.17.7, 1.1.5"
 Static = "0.8.7"
 StaticArrayInterface = "1.4"
@@ -116,3 +121,4 @@ julia = "1.8"
 Convex = "f65535da-76fb-5f13-bab9-19810c17039a"
 ECOS = "e2685f51-7e38-5353-a97d-a921fd2c8199"
 Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
+NLsolve = "2774e3e8-f4cf-5e23-947b-6d7e65073b56"
diff --git a/examples/structured_1d_dgsem/elixir_burgers_perk3.jl b/examples/structured_1d_dgsem/elixir_burgers_perk3.jl
new file mode 100644
index 00000000000..bf91fde74ea
--- /dev/null
+++ b/examples/structured_1d_dgsem/elixir_burgers_perk3.jl
@@ -0,0 +1,70 @@
+# Convex and ECOS are imported because they are used for finding the optimal time step and optimal 
+# monomial coefficients in the stability polynomial of P-ERK time integrators.
+using Convex, ECOS
+
+# NLsolve is imported to solve the system of nonlinear equations to find the coefficients
+# in the Butcher tableau in the third order P-ERK time integrator.
+using NLsolve
+
+using OrdinaryDiffEq
+using Trixi
+
+###############################################################################
+# semidiscretization of the (inviscid) Burgers equation
+
+equations = InviscidBurgersEquation1D()
+
+initial_condition = initial_condition_convergence_test
+
+# Create DG solver with polynomial degree = 4 and (local) Lax-Friedrichs/Rusanov flux as surface flux
+solver = DGSEM(polydeg = 4, surface_flux = flux_lax_friedrichs)
+
+coordinates_min = (0.0,) # minimum coordinate
+coordinates_max = (1.0,) # maximum coordinate
+cells_per_dimension = (64,)
+
+mesh = StructuredMesh(cells_per_dimension, coordinates_min, coordinates_max)
+
+semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
+                                    source_terms = source_terms_convergence_test)
+
+###############################################################################
+# ODE solvers, callbacks etc.
+
+tspan = (0.0, 2.0)
+ode = semidiscretize(semi, tspan)
+
+summary_callback = SummaryCallback()
+
+analysis_interval = 200
+analysis_callback = AnalysisCallback(semi, interval = analysis_interval)
+
+alive_callback = AliveCallback(analysis_interval = analysis_interval)
+
+save_solution = SaveSolutionCallback(dt = 0.1,
+                                     save_initial_solution = true,
+                                     save_final_solution = true,
+                                     solution_variables = cons2prim)
+
+# Construct third order paired explicit Runge-Kutta method with 8 stages for given simulation setup.
+# Pass `tspan` to calculate maximum time step allowed for the bisection algorithm used 
+# in calculating the polynomial coefficients in the ODE algorithm.                                     
+ode_algorithm = Trixi.PairedExplicitRK3(8, tspan, semi)
+
+cfl_number = Trixi.calculate_cfl(ode_algorithm, ode)
+# For non-linear problems, the CFL number should be reduced by a safety factor
+# as the spectrum changes (in general) over the course of a simulation
+stepsize_callback = StepsizeCallback(cfl = 0.85 * cfl_number)
+
+callbacks = CallbackSet(summary_callback,
+                        analysis_callback, alive_callback, save_solution,
+                        stepsize_callback)
+
+###############################################################################
+# run the simulation
+sol = Trixi.solve(ode, ode_algorithm,
+                  dt = 1.0, # solve needs some value here but it will be overwritten by the stepsize_callback
+                  save_everystep = false, callback = callbacks);
+
+# Print the timer summary
+summary_callback()
diff --git a/examples/tree_1d_dgsem/elixir_advection_perk2.jl b/examples/tree_1d_dgsem/elixir_advection_perk2.jl
index 7db461a079e..68befbcfe82 100644
--- a/examples/tree_1d_dgsem/elixir_advection_perk2.jl
+++ b/examples/tree_1d_dgsem/elixir_advection_perk2.jl
@@ -1,5 +1,8 @@
 
+# Convex and ECOS are imported because they are used for finding the optimal time step and optimal 
+# monomial coefficients in the stability polynomial of P-ERK time integrators.
 using Convex, ECOS
+
 using OrdinaryDiffEq
 using Trixi
 
diff --git a/ext/TrixiNLsolveExt.jl b/ext/TrixiNLsolveExt.jl
new file mode 100644
index 00000000000..9247687fdc7
--- /dev/null
+++ b/ext/TrixiNLsolveExt.jl
@@ -0,0 +1,128 @@
+# Package extension for adding NLsolve-based features to Trixi.jl
+module TrixiNLsolveExt
+
+# Required for finding coefficients in Butcher tableau in the third order of 
+# P-ERK scheme integrators
+if isdefined(Base, :get_extension)
+    using NLsolve: nlsolve
+else
+    # Until Julia v1.9 is the minimum required version for Trixi.jl, we still support Requires.jl
+    using ..NLsolve: nlsolve
+end
+
+# We use a random initialization of the nonlinear solver.
+# To make the tests reproducible, we need to seed the RNG
+using StableRNGs: StableRNG, rand
+
+# Use functions that are to be extended and additional symbols that are not exported
+using Trixi: Trixi, compute_c_coeffs, @muladd
+
+# By default, Julia/LLVM does not use fused multiply-add operations (FMAs).
+# Since these FMAs can increase the performance of many numerical algorithms,
+# we need to opt-in explicitly.
+# See https://ranocha.de/blog/Optimizing_EC_Trixi for further details.
+@muladd begin
+#! format: noindent
+
+# Compute residuals for nonlinear equations to match a stability polynomial with given coefficients,
+# in order to find A-matrix in the Butcher-Tableau
+function PairedExplicitRK3_butcher_tableau_objective_function!(c_eq, a_unknown,
+                                                               num_stages,
+                                                               num_stage_evals,
+                                                               monomial_coeffs,
+                                                               cS2)
+    c_ts = compute_c_coeffs(num_stages, cS2) # ts = timestep
+    # For explicit methods, a_{1,1} = 0 and a_{2,1} = c_2 (Butcher's condition)
+    a_coeff = [0, c_ts[2], a_unknown...]
+    # Equality constraint array that ensures that the stability polynomial computed from 
+    # the to-be-constructed Butcher-Tableau matches the monomial coefficients of the 
+    # optimized stability polynomial.
+    # For details, see Chapter 4.3, Proposition 3.2, Equation (3.3) from 
+    # Hairer, Wanner: Solving Ordinary Differential Equations 2
+    # DOI: 10.1007/978-3-662-09947-6
+
+    # Lower-order terms: Two summands present
+    for i in 1:(num_stage_evals - 4)
+        term1 = a_coeff[num_stage_evals - 1]
+        term2 = a_coeff[num_stage_evals]
+        for j in 1:i
+            term1 *= a_coeff[num_stage_evals - 1 - j]
+            term2 *= a_coeff[num_stage_evals - j]
+        end
+        term1 *= c_ts[num_stages - 2 - i] * 1 / 6 # 1 / 6 = b_{S-1}
+        term2 *= c_ts[num_stages - 1 - i] * 2 / 3 # 2 / 3 = b_S
+
+        c_eq[i] = monomial_coeffs[i] - (term1 + term2)
+    end
+
+    # Highest coefficient: Only one term present
+    i = num_stage_evals - 3
+    term2 = a_coeff[num_stage_evals]
+    for j in 1:i
+        term2 *= a_coeff[num_stage_evals - j]
+    end
+    term2 *= c_ts[num_stages - 1 - i] * 2 / 3 # 2 / 3 = b_S
+
+    c_eq[i] = monomial_coeffs[i] - term2
+    # Third-order consistency condition (Cf. eq. (27) from https://doi.org/10.1016/j.jcp.2022.111470
+    c_eq[num_stage_evals - 2] = 1 - 4 * a_coeff[num_stage_evals] -
+                                a_coeff[num_stage_evals - 1]
+end
+
+# Find the values of the a_{i, i-1} in the Butcher tableau matrix A by solving a system of
+# non-linear equations that arise from the relation of the stability polynomial to the Butcher tableau.
+# For details, see Proposition 3.2, Equation (3.3) from 
+# Hairer, Wanner: Solving Ordinary Differential Equations 2
+function Trixi.solve_a_butcher_coeffs_unknown!(a_unknown, num_stages, monomial_coeffs,
+                                               c_s2, c;
+                                               verbose, max_iter = 100000)
+
+    # Define the objective_function
+    function objective_function!(c_eq, x)
+        return PairedExplicitRK3_butcher_tableau_objective_function!(c_eq, x,
+                                                                     num_stages,
+                                                                     num_stages,
+                                                                     monomial_coeffs,
+                                                                     c_s2)
+    end
+
+    # RealT is determined as the type of the first element in monomial_coeffs to ensure type consistency
+    RealT = typeof(monomial_coeffs[1])
+
+    # To ensure consistency and reproducibility of results across runs, we use 
+    # a seeded random initial guess.
+    rng = StableRNG(555)
+
+    for _ in 1:max_iter
+        # Due to the nature of the nonlinear solver, different initial guesses can lead to 
+        # small numerical differences in the solution.
+
+        x0 = convert(RealT, 0.1) .* rand(rng, RealT, num_stages - 2)
+
+        sol = nlsolve(objective_function!, x0, method = :trust_region,
+                      ftol = 4.0e-16, # Enforce objective up to machine precision
+                      iterations = 10^4, xtol = 1.0e-13, autodiff = :forward)
+
+        a_unknown = sol.zero # Retrieve solution (root = zero)
+
+        # Check if the values a[i, i-1] >= 0.0 (which stem from the nonlinear solver) 
+        # and also c[i] - a[i, i-1] >= 0.0 since all coefficients should be non-negative
+        # to avoid downwinding of numerical fluxes.
+        is_sol_valid = all(x -> !isnan(x) && x >= 0, a_unknown) &&
+                       all(!isnan(c[i] - a_unknown[i - 2]) &&
+                           c[i] - a_unknown[i - 2] >= 0 for i in eachindex(c) if i > 2)
+
+        if is_sol_valid
+            return a_unknown
+        else
+            if verbose
+                println("Solution invalid. Restart the process of solving non-linear system of equations again.")
+            end
+        end
+    end
+
+    error("Maximum number of iterations ($max_iter) reached. Cannot find valid sets of coefficients.")
+end
+end # @muladd
+
+end # module TrixiNLsolveExt
diff --git a/src/Trixi.jl b/src/Trixi.jl
index 0cedec782df..7d557ddde38 100644
--- a/src/Trixi.jl
+++ b/src/Trixi.jl
@@ -19,7 +19,8 @@ module Trixi
 # (standard library packages first, other packages next, all of them sorted alphabetically)
 
 using Accessors: @reset
-using LinearAlgebra: LinearAlgebra, Diagonal, diag, dot, mul!, norm, cross, normalize, I,
+using LinearAlgebra: LinearAlgebra, Diagonal, diag, dot, eigvals, mul!, norm, cross,
+                     normalize, I,
                      UniformScaling, det
 using Printf: @printf, @sprintf, println
 using SparseArrays: AbstractSparseMatrix, AbstractSparseMatrixCSC, sparse, droptol!,
@@ -40,6 +41,7 @@ import SciMLBase: get_du, get_tmp_cache, u_modified!,
                   get_proposed_dt, set_proposed_dt!,
                   terminate!, remake, add_tstop!, has_tstop, first_tstop
 
+using DelimitedFiles: readdlm
 using Downloads: Downloads
 using CodeTracking: CodeTracking
 using ConstructionBase: ConstructionBase
@@ -320,6 +322,12 @@ function __init__()
         end
     end
 
+    @static if !isdefined(Base, :get_extension)
+        @require NLsolve="2774e3e8-f4cf-5e23-947b-6d7e65073b56" begin
+            include("../ext/TrixiNLsolveExt.jl")
+        end
+    end
+
     # FIXME upstream. This is a hacky workaround for
     #       https://github.com/trixi-framework/Trixi.jl/issues/628
     #       https://github.com/trixi-framework/Trixi.jl/issues/1185
diff --git a/src/time_integration/paired_explicit_runge_kutta/methods_PERK2.jl b/src/time_integration/paired_explicit_runge_kutta/methods_PERK2.jl
index bad40c61565..0ff648556c4 100644
--- a/src/time_integration/paired_explicit_runge_kutta/methods_PERK2.jl
+++ b/src/time_integration/paired_explicit_runge_kutta/methods_PERK2.jl
@@ -2,18 +2,9 @@
 # Since these FMAs can increase the performance of many numerical algorithms,
 # we need to opt-in explicitly.
 # See https://ranocha.de/blog/Optimizing_EC_Trixi for further details.
-using DelimitedFiles: readdlm
-using LinearAlgebra: eigvals
-
 @muladd begin
 #! format: noindent
 
-# Abstract base type for both single/standalone and multi-level 
-# PERK (Paired-Explicit Runge-Kutta) time integration schemes
-abstract type AbstractPairedExplicitRK end
-# Abstract base type for single/standalone PERK time integration schemes
-abstract type AbstractPairedExplicitRKSingle <: AbstractPairedExplicitRK end
-
 # Compute the coefficients of the A matrix in the Butcher tableau using
 # stage scaling factors and monomial coefficients
 function compute_a_coeffs(num_stage_evals, stage_scaling_factors, monomial_coeffs)
@@ -185,31 +176,6 @@ function PairedExplicitRK2(num_stages, tspan, eig_vals::Vector{ComplexF64};
     return PairedExplicitRK2(num_stages, a_matrix, c, 1 - bS, bS, cS, dt_opt)
 end
 
-# This struct is needed to fake https://github.com/SciML/OrdinaryDiffEq.jl/blob/0c2048a502101647ac35faabd80da8a5645beac7/src/integrators/type.jl#L1
-mutable struct PairedExplicitRKOptions{Callback, TStops}
-    callback::Callback # callbacks; used in Trixi
-    adaptive::Bool # whether the algorithm is adaptive
-    dtmax::Float64 # ignored
-    maxiters::Int # maximal number of time steps
-    tstops::TStops # tstops from https://diffeq.sciml.ai/v6.8/basics/common_solver_opts/#Output-Control-1; ignored
-end
-
-function PairedExplicitRKOptions(callback, tspan; maxiters = typemax(Int), kwargs...)
-    tstops_internal = BinaryHeap{eltype(tspan)}(FasterForward())
-    # We add last(tspan) to make sure that the time integration stops at the end time
-    push!(tstops_internal, last(tspan))
-    # We add 2 * last(tspan) because add_tstop!(integrator, t) is only called by DiffEqCallbacks.jl if tstops contains a time that is larger than t
-    # (https://github.com/SciML/DiffEqCallbacks.jl/blob/025dfe99029bd0f30a2e027582744528eb92cd24/src/iterative_and_periodic.jl#L92)
-    push!(tstops_internal, 2 * last(tspan))
-    PairedExplicitRKOptions{typeof(callback), typeof(tstops_internal)}(callback,
-                                                                       false, Inf,
-                                                                       maxiters,
-                                                                       tstops_internal)
-end
-
-abstract type PairedExplicitRK end
-abstract type AbstractPairedExplicitRKSingleIntegrator <: PairedExplicitRK end
-
 # This struct is needed to fake https://github.com/SciML/OrdinaryDiffEq.jl/blob/0c2048a502101647ac35faabd80da8a5645beac7/src/integrators/type.jl#L77
 # This implements the interface components described at
 # https://diffeq.sciml.ai/v6.8/basics/integrator/#Handing-Integrators-1
@@ -238,58 +204,6 @@ mutable struct PairedExplicitRK2Integrator{RealT <: Real, uType, Params, Sol, F,
     k_higher::uType
 end
 
-"""
-    calculate_cfl(ode_algorithm::AbstractPairedExplicitRKSingle, ode)
-
-This function computes the CFL number once using the initial condition of the problem and the optimal timestep (`dt_opt`) from the ODE algorithm.
-"""
-function calculate_cfl(ode_algorithm::AbstractPairedExplicitRKSingle, ode)
-    t0 = first(ode.tspan)
-    u_ode = ode.u0
-    semi = ode.p
-    dt_opt = ode_algorithm.dt_opt
-
-    if isnothing(dt_opt)
-        error("The optimal time step `dt_opt` must be provided.")
-    end
-
-    mesh, equations, solver, cache = mesh_equations_solver_cache(semi)
-    u = wrap_array(u_ode, mesh, equations, solver, cache)
-
-    cfl_number = dt_opt / max_dt(u, t0, mesh,
-                        have_constant_speed(equations), equations,
-                        solver, cache)
-    return cfl_number
-end
-
-"""
-    add_tstop!(integrator::PairedExplicitRK2Integrator, t)
-Add a time stop during the time integration process.
-This function is called after the periodic SaveSolutionCallback to specify the next stop to save the solution.
-"""
-function add_tstop!(integrator::PairedExplicitRK2Integrator, t)
-    integrator.tdir * (t - integrator.t) < zero(integrator.t) &&
-        error("Tried to add a tstop that is behind the current time. This is strictly forbidden")
-    # We need to remove the first entry of tstops when a new entry is added.
-    # Otherwise, the simulation gets stuck at the previous tstop and dt is adjusted to zero.
-    if length(integrator.opts.tstops) > 1
-        pop!(integrator.opts.tstops)
-    end
-    push!(integrator.opts.tstops, integrator.tdir * t)
-end
-
-has_tstop(integrator::PairedExplicitRK2Integrator) = !isempty(integrator.opts.tstops)
-first_tstop(integrator::PairedExplicitRK2Integrator) = first(integrator.opts.tstops)
-
-# Forward integrator.stats.naccept to integrator.iter (see GitHub PR#771)
-function Base.getproperty(integrator::PairedExplicitRK, field::Symbol)
-    if field === :stats
-        return (naccept = getfield(integrator, :iter),)
-    end
-    # general fallback
-    return getfield(integrator, field)
-end
-
 function init(ode::ODEProblem, alg::PairedExplicitRK2;
               dt, callback::Union{CallbackSet, Nothing} = nothing, kwargs...)
     u0 = copy(ode.u0)
@@ -326,29 +240,6 @@ function init(ode::ODEProblem, alg::PairedExplicitRK2;
     return integrator
 end
 
-# Fakes `solve`: https://diffeq.sciml.ai/v6.8/basics/overview/#Solving-the-Problems-1
-function solve(ode::ODEProblem, alg::PairedExplicitRK2;
-               dt, callback = nothing, kwargs...)
-    integrator = init(ode, alg, dt = dt, callback = callback; kwargs...)
-
-    # Start actual solve
-    solve!(integrator)
-end
-
-function solve!(integrator::PairedExplicitRK2Integrator)
-    @unpack prob = integrator.sol
-
-    integrator.finalstep = false
-
-    @trixi_timeit timer() "main loop" while !integrator.finalstep
-        step!(integrator)
-    end # "main loop" timer
-
-    return TimeIntegratorSolution((first(prob.tspan), integrator.t),
-                                  (prob.u0, integrator.u),
-                                  integrator.sol.prob)
-end
-
 function step!(integrator::PairedExplicitRK2Integrator)
     @unpack prob = integrator.sol
     @unpack alg = integrator
@@ -434,59 +325,4 @@ function step!(integrator::PairedExplicitRK2Integrator)
         terminate!(integrator)
     end
 end
-
-# get a cache where the RHS can be stored
-get_du(integrator::PairedExplicitRK) = integrator.du
-get_tmp_cache(integrator::PairedExplicitRK) = (integrator.u_tmp,)
-
-# some algorithms from DiffEq like FSAL-ones need to be informed when a callback has modified u
-u_modified!(integrator::PairedExplicitRK, ::Bool) = false
-
-# used by adaptive timestepping algorithms in DiffEq
-function set_proposed_dt!(integrator::PairedExplicitRK, dt)
-    (integrator.dt = dt; integrator.dtcache = dt)
-end
-
-function get_proposed_dt(integrator::PairedExplicitRK)
-    return ifelse(integrator.opts.adaptive, integrator.dt, integrator.dtcache)
-end
-
-# stop the time integration
-function terminate!(integrator::PairedExplicitRK)
-    integrator.finalstep = true
-end
-
-"""
-    modify_dt_for_tstops!(integrator::PairedExplicitRK)
-Modify the time-step size to match the time stops specified in integrator.opts.tstops.
-To avoid adding OrdinaryDiffEq to Trixi's dependencies, this routine is a copy of
-https://github.com/SciML/OrdinaryDiffEq.jl/blob/d76335281c540ee5a6d1bd8bb634713e004f62ee/src/integrators/integrator_utils.jl#L38-L54
-"""
-function modify_dt_for_tstops!(integrator::PairedExplicitRK)
-    if has_tstop(integrator)
-        tdir_t = integrator.tdir * integrator.t
-        tdir_tstop = first_tstop(integrator)
-        if integrator.opts.adaptive
-            integrator.dt = integrator.tdir *
-                            min(abs(integrator.dt), abs(tdir_tstop - tdir_t)) # step! to the end
-        elseif iszero(integrator.dtcache) && integrator.dtchangeable
-            integrator.dt = integrator.tdir * abs(tdir_tstop - tdir_t)
-        elseif integrator.dtchangeable && !integrator.force_stepfail
-            # always try to step! with dtcache, but lower if a tstop
-            # however, if force_stepfail then don't set to dtcache, and no tstop worry
-            integrator.dt = integrator.tdir *
-                            min(abs(integrator.dtcache), abs(tdir_tstop - tdir_t)) # step! to the end
-        end
-    end
-end
-
-# used for AMR (Adaptive Mesh Refinement)
-function Base.resize!(integrator::PairedExplicitRK2Integrator, new_size)
-    resize!(integrator.u, new_size)
-    resize!(integrator.du, new_size)
-    resize!(integrator.u_tmp, new_size)
-
-    resize!(integrator.k1, new_size)
-    resize!(integrator.k_higher, new_size)
-end
 end # @muladd
diff --git a/src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl b/src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl
new file mode 100644
index 00000000000..384f4b408ca
--- /dev/null
+++ b/src/time_integration/paired_explicit_runge_kutta/methods_PERK3.jl
@@ -0,0 +1,336 @@
+# By default, Julia/LLVM does not use fused multiply-add operations (FMAs).
+# Since these FMAs can increase the performance of many numerical algorithms,
+# we need to opt-in explicitly.
+# See https://ranocha.de/blog/Optimizing_EC_Trixi for further details.
+@muladd begin
+#! format: noindent
+
+# Initialize Butcher array abscissae c for PairedExplicitRK3 based on SSPRK33 base method
+function compute_c_coeffs(num_stages, cS2)
+    c = zeros(eltype(cS2), num_stages)
+
+    # Last timesteps as for SSPRK33, see motivation in Section 3.3 of
+    # https://doi.org/10.1016/j.jcp.2024.113223
+    c[num_stages - 1] = one(cS2)
+    c[num_stages] = convert(eltype(cS2), 0.5)
+
+    # Linear increasing timestep for remainder
+    for i in 2:(num_stages - 2)
+        c[i] = cS2 * (i - 1) / (num_stages - 3)
+    end
+
+    return c
+end
+
+# Compute the Butcher tableau for a paired explicit Runge-Kutta method order 3
+# using a list of eigenvalues
+function compute_PairedExplicitRK3_butcher_tableau(num_stages, tspan,
+                                                   eig_vals::Vector{ComplexF64};
+                                                   verbose = false, cS2)
+    # Initialize array of c
+    c = compute_c_coeffs(num_stages, cS2)
+
+    # Initialize the array of our solution
+    a_unknown = zeros(num_stages - 2)
+
+    # Special case of e = 3
+    if num_stages == 3
+        a_unknown = [0.25] # Use classic SSPRK33 (Shu-Osher) Butcher Tableau
+    else
+        # Calculate coefficients of the stability polynomial in monomial form
+        consistency_order = 3
+        dtmax = tspan[2] - tspan[1]
+        dteps = 1.0f-9
+
+        num_eig_vals, eig_vals = filter_eig_vals(eig_vals; verbose)
+
+        monomial_coeffs, dt_opt = bisect_stability_polynomial(consistency_order,
+                                                              num_eig_vals, num_stages,
+                                                              dtmax, dteps,
+                                                              eig_vals; verbose)
+        monomial_coeffs = undo_normalization!(monomial_coeffs, consistency_order,
+                                              num_stages)
+
+        # Solve the nonlinear system of equations from monomial coefficient and
+        # Butcher array abscissae c to find Butcher matrix A
+        # This function is extended in TrixiNLsolveExt.jl
+        a_unknown = solve_a_butcher_coeffs_unknown!(a_unknown, num_stages,
+                                                    monomial_coeffs, cS2, c;
+                                                    verbose)
+    end
+    # Fill A-matrix in P-ERK style
+    a_matrix = zeros(num_stages - 2, 2)
+    a_matrix[:, 1] = c[3:end]
+    a_matrix[:, 1] -= a_unknown
+    a_matrix[:, 2] = a_unknown
+
+    return a_matrix, c, dt_opt
+end
+
+# Compute the Butcher tableau for a paired explicit Runge-Kutta method order 3
+# using provided values of coefficients a in A-matrix of Butcher tableau
+function compute_PairedExplicitRK3_butcher_tableau(num_stages,
+                                                   base_path_a_coeffs::AbstractString;
+                                                   cS2)
+
+    # Initialize array of c
+    c = compute_c_coeffs(num_stages, cS2)
+
+    # - 2 Since First entry of A is always zero (explicit method) and second is given by c_2 (consistency)
+    a_coeffs_max = num_stages - 2
+
+    a_matrix = zeros(a_coeffs_max, 2)
+    a_matrix[:, 1] = c[3:end]
+
+    path_a_coeffs = joinpath(base_path_a_coeffs,
+                             "a_" * string(num_stages) * ".txt")
+
+    @assert isfile(path_a_coeffs) "Couldn't find file $path_a_coeffs"
+    a_coeffs = readdlm(path_a_coeffs, Float64)
+    num_a_coeffs = size(a_coeffs, 1)
+
+    @assert num_a_coeffs == a_coeffs_max
+    # Fill A-matrix in P-ERK style
+    a_matrix[:, 1] -= a_coeffs
+    a_matrix[:, 2] = a_coeffs
+
+    return a_matrix, c
+end
+
+@doc raw"""
+    PairedExplicitRK3(num_stages, base_path_a_coeffs::AbstractString, dt_opt = nothing;
+                      cS2 = 1.0f0)
+    PairedExplicitRK3(num_stages, tspan, semi::AbstractSemidiscretization;
+                      verbose = false, cS2 = 1.0f0)
+    PairedExplicitRK3(num_stages, tspan, eig_vals::Vector{ComplexF64};
+                      verbose = false, cS2 = 1.0f0)
+
+    Parameters:
+    - `num_stages` (`Int`): Number of stages in the paired explicit Runge-Kutta (P-ERK) method.
+    - `base_path_a_coeffs` (`AbstractString`): Path to a file containing some coefficients in the A-matrix in 
+      the Butcher tableau of the Runge Kutta method.
+      The matrix should be stored in a text file at `joinpath(base_path_a_coeffs, "a_$(num_stages).txt")` and separated by line breaks.
+    - `dt_opt` (`Float64`, optional): Optimal time step size for the simulation setup. Can be `nothing` if it is unknown. 
+       In this case the optimal CFL number cannot be computed and the [`StepsizeCallback`](@ref) cannot be used.
+    - `tspan`: Time span of the simulation.
+    - `semi` (`AbstractSemidiscretization`): Semidiscretization setup.
+    -  `eig_vals` (`Vector{ComplexF64}`): Eigenvalues of the Jacobian of the right-hand side (rhs) of the ODEProblem after the
+      equation has been semidiscretized.
+    - `verbose` (`Bool`, optional): Verbosity flag, default is false.
+    - `cS2` (`Float64`, optional): Value of c in the Butcher tableau at c_{s-2}, when
+      s is the number of stages, default is 1.0f0.
+
+The following structures and methods provide an implementation of
+the third-order paired explicit Runge-Kutta (P-ERK) method
+optimized for a certain simulation setup (PDE, IC & BC, Riemann Solver, DG Solver).
+The original paper is
+- Nasab, Vermeire (2022)
+Third-order Paired Explicit Runge-Kutta schemes for stiff systems of equations
+[DOI: 10.1016/j.jcp.2022.111470](https://doi.org/10.1016/j.jcp.2022.111470)
+While the changes to SSPRK33 base-scheme are described in 
+- Doehring, Schlottke-Lakemper, Gassner, Torrilhon (2024)
+Multirate Time-Integration based on Dynamic ODE Partitioning through Adaptively Refined Meshes for Compressible Fluid Dynamics
+[DOI: 10.1016/j.jcp.2024.113223](https://doi.org/10.1016/j.jcp.2024.113223)
+
+Note: To use this integrator, the user must import the `Convex`, `ECOS`, and `NLsolve` packages.
+"""
+mutable struct PairedExplicitRK3 <: AbstractPairedExplicitRKSingle
+    const num_stages::Int # S
+
+    a_matrix::Matrix{Float64}
+    c::Vector{Float64}
+    dt_opt::Union{Float64, Nothing}
+end # struct PairedExplicitRK3
+
+# Constructor for previously computed A Coeffs
+function PairedExplicitRK3(num_stages, base_path_a_coeffs::AbstractString,
+                           dt_opt = nothing;
+                           cS2 = 1.0f0)
+    a_matrix, c = compute_PairedExplicitRK3_butcher_tableau(num_stages,
+                                                            base_path_a_coeffs;
+                                                            cS2)
+
+    return PairedExplicitRK3(num_stages, a_matrix, c, dt_opt)
+end
+
+# Constructor that computes Butcher matrix A coefficients from a semidiscretization
+function PairedExplicitRK3(num_stages, tspan, semi::AbstractSemidiscretization;
+                           verbose = false, cS2 = 1.0f0)
+    eig_vals = eigvals(jacobian_ad_forward(semi))
+
+    return PairedExplicitRK3(num_stages, tspan, eig_vals; verbose, cS2)
+end
+
+# Constructor that calculates the coefficients with polynomial optimizer from a list of eigenvalues
+function PairedExplicitRK3(num_stages, tspan, eig_vals::Vector{ComplexF64};
+                           verbose = false, cS2 = 1.0f0)
+    a_matrix, c, dt_opt = compute_PairedExplicitRK3_butcher_tableau(num_stages,
+                                                                    tspan,
+                                                                    eig_vals;
+                                                                    verbose, cS2)
+    return PairedExplicitRK3(num_stages, a_matrix, c, dt_opt)
+end
+
+# This struct is needed to fake https://github.com/SciML/OrdinaryDiffEq.jl/blob/0c2048a502101647ac35faabd80da8a5645beac7/src/integrators/type.jl#L77
+# This implements the interface components described at
+# https://diffeq.sciml.ai/v6.8/basics/integrator/#Handing-Integrators-1
+# which are used in Trixi.jl.
+mutable struct PairedExplicitRK3Integrator{RealT <: Real, uType, Params, Sol, F, Alg,
+                                           PairedExplicitRKOptions} <:
+               AbstractPairedExplicitRKSingleIntegrator
+    u::uType
+    du::uType
+    u_tmp::uType
+    t::RealT
+    tdir::RealT
+    dt::RealT # current time step
+    dtcache::RealT # manually set time step
+    iter::Int # current number of time steps (iteration)
+    p::Params # will be the semidiscretization from Trixi
+    sol::Sol # faked
+    f::F
+    alg::Alg # This is our own class written above; Abbreviation for ALGorithm
+    opts::PairedExplicitRKOptions
+    finalstep::Bool # added for convenience
+    dtchangeable::Bool
+    force_stepfail::Bool
+    # PairedExplicitRK stages:
+    k1::uType
+    k_higher::uType
+end
+
+function init(ode::ODEProblem, alg::PairedExplicitRK3;
+              dt, callback::Union{CallbackSet, Nothing} = nothing, kwargs...)
+    u0 = copy(ode.u0)
+    du = zero(u0)
+    u_tmp = zero(u0)
+
+    # PairedExplicitRK stages
+    k1 = zero(u0)
+    k_higher = zero(u0)
+
+    t0 = first(ode.tspan)
+    tdir = sign(ode.tspan[end] - ode.tspan[1])
+    iter = 0
+
+    integrator = PairedExplicitRK3Integrator(u0, du, u_tmp, t0, tdir, dt, dt, iter,
+                                             ode.p,
+                                             (prob = ode,), ode.f, alg,
+                                             PairedExplicitRKOptions(callback,
+                                                                     ode.tspan;
+                                                                     kwargs...),
+                                             false, true, false,
+                                             k1, k_higher)
+
+    # initialize callbacks
+    if callback isa CallbackSet
+        for cb in callback.continuous_callbacks
+            throw(ArgumentError("Continuous callbacks are unsupported with paired explicit Runge-
+            Kutta methods."))
+        end
+        for cb in callback.discrete_callbacks
+            cb.initialize(cb, integrator.u, integrator.t, integrator)
+        end
+    end
+
+    return integrator
+end
+
+function step!(integrator::PairedExplicitRK3Integrator)
+    @unpack prob = integrator.sol
+    @unpack alg = integrator
+    t_end = last(prob.tspan)
+    callbacks = integrator.opts.callback
+
+    @assert !integrator.finalstep
+    if isnan(integrator.dt)
+        error("time step size `dt` is NaN")
+    end
+
+    modify_dt_for_tstops!(integrator)
+
+    # if the next iteration would push the simulation beyond the end time, set dt accordingly
+    if integrator.t + integrator.dt > t_end ||
+       isapprox(integrator.t + integrator.dt, t_end)
+        integrator.dt = t_end - integrator.t
+        terminate!(integrator)
+    end
+
+    @trixi_timeit timer() "Paired Explicit Runge-Kutta ODE integration step" begin
+        # k1
+        integrator.f(integrator.du, integrator.u, prob.p, integrator.t)
+        @threaded for i in eachindex(integrator.du)
+            integrator.k1[i] = integrator.du[i] * integrator.dt
+        end
+
+        # Construct current state
+        @threaded for i in eachindex(integrator.du)
+            integrator.u_tmp[i] = integrator.u[i] + alg.c[2] * integrator.k1[i]
+        end
+        # k2
+        integrator.f(integrator.du, integrator.u_tmp, prob.p,
+                     integrator.t + alg.c[2] * integrator.dt)
+
+        @threaded for i in eachindex(integrator.du)
+            integrator.k_higher[i] = integrator.du[i] * integrator.dt
+        end
+
+        # Higher stages
+        for stage in 3:(alg.num_stages - 1)
+            # Construct current state
+            @threaded for i in eachindex(integrator.du)
+                integrator.u_tmp[i] = integrator.u[i] +
+                                      alg.a_matrix[stage - 2, 1] *
+                                      integrator.k1[i] +
+                                      alg.a_matrix[stage - 2, 2] *
+                                      integrator.k_higher[i]
+            end
+
+            integrator.f(integrator.du, integrator.u_tmp, prob.p,
+                         integrator.t + alg.c[stage] * integrator.dt)
+
+            @threaded for i in eachindex(integrator.du)
+                integrator.k_higher[i] = integrator.du[i] * integrator.dt
+            end
+        end
+
+        # Last stage
+        @threaded for i in eachindex(integrator.du)
+            integrator.u_tmp[i] = integrator.u[i] +
+                                  alg.a_matrix[alg.num_stages - 2, 1] *
+                                  integrator.k1[i] +
+                                  alg.a_matrix[alg.num_stages - 2, 2] *
+                                  integrator.k_higher[i]
+        end
+
+        integrator.f(integrator.du, integrator.u_tmp, prob.p,
+                     integrator.t + alg.c[alg.num_stages] * integrator.dt)
+
+        @threaded for i in eachindex(integrator.u)
+            # "Own" PairedExplicitRK based on SSPRK33.
+            # Note that 'k_higher' carries the values of K_{S-1}
+            # and that we construct 'K_S' "in-place" from 'integrator.du'
+            integrator.u[i] += (integrator.k1[i] + integrator.k_higher[i] +
+                                4.0 * integrator.du[i] * integrator.dt) / 6.0
+        end
+    end # PairedExplicitRK step timer
+
+    integrator.iter += 1
+    integrator.t += integrator.dt
+
+    # handle callbacks
+    if callbacks isa CallbackSet
+        for cb in callbacks.discrete_callbacks
+            if cb.condition(integrator.u, integrator.t, integrator)
+                cb.affect!(integrator)
+            end
+        end
+    end
+
+    # respect maximum number of iterations
+    if integrator.iter >= integrator.opts.maxiters && !integrator.finalstep
+        @warn "Interrupted. Larger maxiters is needed."
+        terminate!(integrator)
+    end
+end
+end # @muladd
diff --git a/src/time_integration/paired_explicit_runge_kutta/paired_explicit_runge_kutta.jl b/src/time_integration/paired_explicit_runge_kutta/paired_explicit_runge_kutta.jl
index b73ea758312..c606326738f 100644
--- a/src/time_integration/paired_explicit_runge_kutta/paired_explicit_runge_kutta.jl
+++ b/src/time_integration/paired_explicit_runge_kutta/paired_explicit_runge_kutta.jl
@@ -5,8 +5,178 @@
 @muladd begin
 #! format: noindent
 
-# Basic implementation of the second-order paired explicit Runge-Kutta (PERK) method
-include("methods_PERK2.jl")
 # Define all of the functions necessary for polynomial optimizations
 include("polynomial_optimizer.jl")
+
+# Abstract base type for both single/standalone and multi-level 
+# PERK (Paired-Explicit Runge-Kutta) time integration schemes
+abstract type AbstractPairedExplicitRK end
+# Abstract base type for single/standalone PERK time integration schemes
+abstract type AbstractPairedExplicitRKSingle <: AbstractPairedExplicitRK end
+
+# This struct is needed to fake https://github.com/SciML/OrdinaryDiffEq.jl/blob/0c2048a502101647ac35faabd80da8a5645beac7/src/integrators/type.jl#L1
+mutable struct PairedExplicitRKOptions{Callback, TStops}
+    callback::Callback # callbacks; used in Trixi
+    adaptive::Bool # whether the algorithm is adaptive
+    dtmax::Float64 # ignored
+    maxiters::Int # maximal number of time steps
+    tstops::TStops # tstops from https://diffeq.sciml.ai/v6.8/basics/common_solver_opts/#Output-Control-1; ignored
+end
+
+function PairedExplicitRKOptions(callback, tspan; maxiters = typemax(Int), kwargs...)
+    tstops_internal = BinaryHeap{eltype(tspan)}(FasterForward())
+    # We add last(tspan) to make sure that the time integration stops at the end time
+    push!(tstops_internal, last(tspan))
+    # We add 2 * last(tspan) because add_tstop!(integrator, t) is only called by DiffEqCallbacks.jl if tstops contains a time that is larger than t
+    # (https://github.com/SciML/DiffEqCallbacks.jl/blob/025dfe99029bd0f30a2e027582744528eb92cd24/src/iterative_and_periodic.jl#L92)
+    push!(tstops_internal, 2 * last(tspan))
+    PairedExplicitRKOptions{typeof(callback), typeof(tstops_internal)}(callback,
+                                                                       false, Inf,
+                                                                       maxiters,
+                                                                       tstops_internal)
+end
+
+abstract type AbstractPairedExplicitRKIntegrator end
+abstract type AbstractPairedExplicitRKSingleIntegrator <:
+              AbstractPairedExplicitRKIntegrator end
+
+"""
+    calculate_cfl(ode_algorithm::AbstractPairedExplicitRK, ode)
+
+This function computes the CFL number once using the initial condition of the problem and the optimal timestep (`dt_opt`) from the ODE algorithm.
+"""
+function calculate_cfl(ode_algorithm::AbstractPairedExplicitRK, ode)
+    t0 = first(ode.tspan)
+    u_ode = ode.u0
+    semi = ode.p
+    dt_opt = ode_algorithm.dt_opt
+
+    if isnothing(dt_opt)
+        error("The optimal time step `dt_opt` must be provided.")
+    end
+
+    mesh, equations, solver, cache = mesh_equations_solver_cache(semi)
+    u = wrap_array(u_ode, mesh, equations, solver, cache)
+
+    cfl_number = dt_opt / max_dt(u, t0, mesh,
+                        have_constant_speed(equations), equations,
+                        solver, cache)
+    return cfl_number
+end
+
+# Forward integrator.stats.naccept to integrator.iter (see GitHub PR#771)
+function Base.getproperty(integrator::AbstractPairedExplicitRKIntegrator, field::Symbol)
+    if field === :stats
+        return (naccept = getfield(integrator, :iter),)
+    end
+    # general fallback
+    return getfield(integrator, field)
+end
+
+"""
+    add_tstop!(integrator::AbstractPairedExplicitRKIntegrator, t)
+Add a time stop during the time integration process.
+This function is called after the periodic SaveSolutionCallback to specify the next stop to save the solution.
+"""
+function add_tstop!(integrator::AbstractPairedExplicitRKIntegrator, t)
+    integrator.tdir * (t - integrator.t) < zero(integrator.t) &&
+        error("Tried to add a tstop that is behind the current time. This is strictly forbidden")
+    # We need to remove the first entry of tstops when a new entry is added.
+    # Otherwise, the simulation gets stuck at the previous tstop and dt is adjusted to zero.
+    if length(integrator.opts.tstops) > 1
+        pop!(integrator.opts.tstops)
+    end
+    push!(integrator.opts.tstops, integrator.tdir * t)
+end
+
+has_tstop(integrator::AbstractPairedExplicitRKIntegrator) = !isempty(integrator.opts.tstops)
+first_tstop(integrator::AbstractPairedExplicitRKIntegrator) = first(integrator.opts.tstops)
+
+# Fakes `solve`: https://diffeq.sciml.ai/v6.8/basics/overview/#Solving-the-Problems-1
+function solve(ode::ODEProblem, alg::AbstractPairedExplicitRK;
+               dt, callback = nothing, kwargs...)
+    integrator = init(ode, alg, dt = dt, callback = callback; kwargs...)
+
+    # Start actual solve
+    solve!(integrator)
+end
+
+function solve!(integrator::AbstractPairedExplicitRKIntegrator)
+    @unpack prob = integrator.sol
+
+    integrator.finalstep = false
+
+    @trixi_timeit timer() "main loop" while !integrator.finalstep
+        step!(integrator)
+    end # "main loop" timer
+
+    return TimeIntegratorSolution((first(prob.tspan), integrator.t),
+                                  (prob.u0, integrator.u),
+                                  integrator.sol.prob)
+end
+
+# used for AMR (Adaptive Mesh Refinement)
+function Base.resize!(integrator::AbstractPairedExplicitRKIntegrator, new_size)
+    resize!(integrator.u, new_size)
+    resize!(integrator.du, new_size)
+    resize!(integrator.u_tmp, new_size)
+
+    resize!(integrator.k1, new_size)
+    resize!(integrator.k_higher, new_size)
+end
+
+# get a cache where the RHS can be stored
+get_du(integrator::AbstractPairedExplicitRKIntegrator) = integrator.du
+get_tmp_cache(integrator::AbstractPairedExplicitRKIntegrator) = (integrator.u_tmp,)
+
+# some algorithms from DiffEq like FSAL-ones need to be informed when a callback has modified u
+u_modified!(integrator::AbstractPairedExplicitRKIntegrator, ::Bool) = false
+
+# used by adaptive timestepping algorithms in DiffEq
+function set_proposed_dt!(integrator::AbstractPairedExplicitRKIntegrator, dt)
+    (integrator.dt = dt; integrator.dtcache = dt)
+end
+
+function get_proposed_dt(integrator::AbstractPairedExplicitRKIntegrator)
+    return ifelse(integrator.opts.adaptive, integrator.dt, integrator.dtcache)
+end
+
+# stop the time integration
+function terminate!(integrator::AbstractPairedExplicitRKIntegrator)
+    integrator.finalstep = true
+end
+
+"""
+    modify_dt_for_tstops!(integrator::PairedExplicitRK)
+ 
+Modify the time-step size to match the time stops specified in integrator.opts.tstops.
+To avoid adding OrdinaryDiffEq to Trixi's dependencies, this routine is a copy of
+https://github.com/SciML/OrdinaryDiffEq.jl/blob/d76335281c540ee5a6d1bd8bb634713e004f62ee/src/integrators/integrator_utils.jl#L38-L54
+"""
+function modify_dt_for_tstops!(integrator::AbstractPairedExplicitRKIntegrator)
+    if has_tstop(integrator)
+        tdir_t = integrator.tdir * integrator.t
+        tdir_tstop = first_tstop(integrator)
+        if integrator.opts.adaptive
+            integrator.dt = integrator.tdir *
+                            min(abs(integrator.dt), abs(tdir_tstop - tdir_t)) # step! to the end
+        elseif iszero(integrator.dtcache) && integrator.dtchangeable
+            integrator.dt = integrator.tdir * abs(tdir_tstop - tdir_t)
+        elseif integrator.dtchangeable && !integrator.force_stepfail
+            # always try to step! with dtcache, but lower if a tstop
+            # however, if force_stepfail then don't set to dtcache, and no tstop worry
+            integrator.dt = integrator.tdir *
+                            min(abs(integrator.dtcache), abs(tdir_tstop - tdir_t)) # step! to the end
+        end
+    end
+end
+
+# Add definitions of functions related to polynomial optimization by NLsolve here
+# such that hey can be exported from Trixi.jl and extended in the TrixiConvexECOSExt package
+# extension or by the NLsolve-specific code loaded by Requires.jl
+function solve_a_butcher_coeffs_unknown! end
+
+# Basic implementation of the second-order paired explicit Runge-Kutta (PERK) method
+include("methods_PERK2.jl")
+include("methods_PERK3.jl")
 end # @muladd
diff --git a/test/Project.toml b/test/Project.toml
index c8ae33a40ae..f8bcff947c0 100644
--- a/test/Project.toml
+++ b/test/Project.toml
@@ -10,10 +10,12 @@ FFMPEG = "c87230d0-a227-11e9-1b43-d7ebe4e7570a"
 ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"
+NLsolve = "2774e3e8-f4cf-5e23-947b-6d7e65073b56"
 OrdinaryDiffEq = "1dea7af3-3e70-54e6-95c3-0bf5283fa5ed"
 Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [compat]
@@ -28,10 +30,12 @@ FFMPEG = "0.4"
 ForwardDiff = "0.10.24"
 LinearAlgebra = "1"
 MPI = "0.20"
+NLsolve = "4.5.1"
 OrdinaryDiffEq = "6.49.1"
 Plots = "1.19"
 Printf = "1"
 Random = "1"
+StableRNGs = "1.0.2"
 Test = "1"
 
 [preferences.OrdinaryDiffEq]
diff --git a/test/test_structured_1d.jl b/test/test_structured_1d.jl
index f64b8c9c065..3061257d508 100644
--- a/test/test_structured_1d.jl
+++ b/test/test_structured_1d.jl
@@ -58,6 +58,42 @@ end
     end
 end
 
+# Testing the third-order paired explicit Runge-Kutta (PERK) method with its optimal CFL number
+@trixi_testset "elixir_burgers_perk3.jl" begin
+    @test_trixi_include(joinpath(EXAMPLES_DIR, "elixir_burgers_perk3.jl"),
+                        l2=[3.8156922097242205e-6],
+                        linf=[2.1962957979626552e-5],
+                        atol=1.0e-6)
+    # Ensure that we do not have excessive memory allocations
+    # (e.g., from type instabilities)
+    let
+        t = sol.t[end]
+        u_ode = sol.u[end]
+        du_ode = similar(u_ode)
+        @test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 8000
+    end
+end
+
+# Testing the third-order paired explicit Runge-Kutta (PERK) method without stepsize callback
+@trixi_testset "elixir_burgers_perk3.jl(fixed time step)" begin
+    @test_trixi_include(joinpath(EXAMPLES_DIR, "elixir_burgers_perk3.jl"),
+                        dt=2.0e-3,
+                        tspan=(0.0, 2.0),
+                        save_solution=SaveSolutionCallback(dt = 0.1 + 1.0e-8), # Adding a small epsilon to avoid floating-point precision issues
+                        callbacks=CallbackSet(summary_callback, save_solution,
+                                              analysis_callback, alive_callback),
+                        l2=[5.726144786001842e-7],
+                        linf=[3.430730019182704e-6])
+    # Ensure that we do not have excessive memory allocations
+    # (e.g., from type instabilities)
+    let
+        t = sol.t[end]
+        u_ode = sol.u[end]
+        du_ode = similar(u_ode)
+        @test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 8000
+    end
+end
+
 @trixi_testset "elixir_euler_sedov.jl" begin
     @test_trixi_include(joinpath(EXAMPLES_DIR, "elixir_euler_sedov.jl"),
                         l2=[3.67478226e-01, 3.49491179e-01, 8.08910759e-01],
diff --git a/test/test_unit.jl b/test/test_unit.jl
index 523ce7061ec..b0925f31cd8 100644
--- a/test/test_unit.jl
+++ b/test/test_unit.jl
@@ -10,6 +10,10 @@ using DelimitedFiles: readdlm
 using Convex: Convex
 using ECOS: Optimizer
 
+# Use NLsolve to load the extension that extends functions for testing
+# PERK Single p3 Constructors
+using NLsolve: nlsolve
+
 include("test_trixi.jl")
 
 # Start with a clean environment: remove Trixi.jl output directory if it exists
@@ -1709,6 +1713,42 @@ end
                     0.13942836392866081 0.3605716360713392], atol = 1e-13)
 end
 
+@testset "PERK Single p3 Constructors" begin
+    path_coeff_file = mktempdir()
+    Trixi.download("https://gist.githubusercontent.com/warisa-r/0796db36abcd5abe735ac7eebf41b973/raw/32889062fd5dcf7f450748f4f5f0797c8155a18d/a_8_8.txt",
+                   joinpath(path_coeff_file, "a_8.txt"))
+
+    ode_algorithm = Trixi.PairedExplicitRK3(8, path_coeff_file)
+
+    @test isapprox(ode_algorithm.a_matrix,
+                   [0.33551678438002486 0.06448322158043965
+                    0.49653494442225443 0.10346507941960345
+                    0.6496890912144586 0.15031092070647037
+                    0.789172498521197 0.21082750147880308
+                    0.7522972036571336 0.2477027963428664
+                    0.31192569908571666 0.18807430091428337], atol = 1e-13)
+
+    Trixi.download("https://gist.githubusercontent.com/warisa-r/8d93f6a3ae0635e13b9f51ee32ab7fff/raw/54dc5b14be9288e186b745facb5bbcb04d1476f8/EigenvalueList_Refined2.txt",
+                   joinpath(path_coeff_file, "spectrum.txt"))
+
+    eig_vals = readdlm(joinpath(path_coeff_file, "spectrum.txt"), ComplexF64)
+    tspan = (0.0, 1.0)
+    ode_algorithm = Trixi.PairedExplicitRK3(13, tspan, vec(eig_vals))
+
+    @test isapprox(ode_algorithm.a_matrix,
+                   [0.19121164778938382 0.008788355190848427
+                    0.28723462747227385 0.012765384448655121
+                    0.38017717196008227 0.019822834000382223
+                    0.4706748928843403 0.029325107115659724
+                    0.557574833668358 0.04242519017349991
+                    0.6390917512034328 0.06090823687563831
+                    0.7124876770174374 0.08751233490349149
+                    0.7736369992226316 0.12636297693551043
+                    0.8161315324169078 0.1838684675830921
+                    0.7532704453316061 0.2467295546683939
+                    0.31168238866709846 0.18831761133290154], atol = 1e-13)
+end
+
 @testset "Sutherlands Law" begin
     function mu(u, equations)
         T_ref = 291.15