[docs] clarify section on automatic differentiation in nonlinear.md #3683

Merged
merged 7 commits on Feb 29, 2024
Changes from 6 commits
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -9,6 +9,7 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterCitations = "daee34ce-89f3-4625-b898-19384cb65244"
Downloads = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
Dualization = "191a621a-6537-11e9-281d-650236a99e60"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GLPK = "60bf3e95-4087-53dc-ae20-288a0d20c6a6"
HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
HiGHS = "87dc4568-4c63-4d18-b0c0-bb2238e4078b"
224 changes: 177 additions & 47 deletions docs/src/manual/nonlinear.md
@@ -161,6 +161,14 @@ julia> sin(sin(1.0))
0.7456241416655579
```

## Automatic differentiation

Review comment (Contributor): There is a second section about AD at the bottom, I would move this down with the rest.

Review comment (Member Author): I kept the section. I wanted to distinguish general AD from the AD specific to operators. The bottom section was `##` instead of `###` by mistake, so it looked more prominent than I intended.


JuMP computes first- and second-order derivatives using sparse reverse-mode
automatic differentiation. For details, see [ReverseAD](@ref).

For a tutorial on how to construct and query the derivatives, see
[Computing Hessians](@ref).
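
To give a flavor of what this looks like, here is a minimal sketch that
differentiates `sin(x)^2` with sparse reverse-mode AD (assuming the
`MOI.Nonlinear` evaluator API; see the tutorial above for the full workflow):

```julia
import MathOptInterface as MOI

nlp = MOI.Nonlinear.Model()
x = MOI.VariableIndex(1)
MOI.Nonlinear.set_objective(nlp, :(sin($x)^2))
# Build an evaluator that uses sparse reverse-mode automatic differentiation:
evaluator = MOI.Nonlinear.Evaluator(nlp, MOI.Nonlinear.SparseReverseMode(), [x])
MOI.initialize(evaluator, [:Grad])
g = zeros(1)
MOI.eval_objective_gradient(evaluator, g, [1.0])
g  # ≈ [0.909...]; d/dx sin(x)^2 = sin(2x), and sin(2.0) ≈ 0.909
```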

## Nonlinear expressions in detail

Nonlinear expressions in JuMP are represented by a [`NonlinearExpr`](@ref)
Expand Down Expand Up @@ -407,18 +415,12 @@ ERROR: Cannot evaluate `>` between a variable and a number.
In these cases, you should define a [User-defined operator](@ref jump_user_defined_operators)
using the [`@operator`](@ref) macro.
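
For example, a sketch with a hypothetical function `f` that branches on the
value of its argument (and so cannot be traced):

```julia
using JuMP

# Tracing fails here, because `x > 1` cannot be evaluated for a JuMP variable:
f(x) = x > 1 ? x^2 : 2x - 1

model = Model()
@variable(model, x)
@operator(model, op_f, 1, f)     # 1 is the number of arguments
@objective(model, Min, op_f(x))
```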

!!! tip
    If it works, in most cases, you should prefer to use function tracing
    instead of defining a user-defined operator. One exception is when the
    function returns a very large expression (for example, it includes a
    summation over a million elements). In that case, the user-defined operator
    can be more efficient.

## [User-defined operators](@id jump_user_defined_operators)

In addition to a standard list of univariate and multivariate operators
recognized by the `MOI.Nonlinear` submodule, JuMP supports user-defined
operators, which let you represent nonlinear functions that cannot (or should
not) be traced.
odow marked this conversation as resolved.

!!! warning
    User-defined operators must return a scalar output. For a work-around, see
@@ -460,6 +462,29 @@ model = Model();
op_square_2 = model[:op_square]
```

## Automatic differentiation
odow marked this conversation as resolved.

JuMP computes first- and second-order derivatives of expressions using
[ReverseAD](@ref), which implements sparse reverse-mode automatic
differentiation. However, because [ReverseAD](@ref) requires the algebraic
expression as input, JuMP cannot use [ReverseAD](@ref) to differentiate
user-defined operators.

Instead, unless [Gradients and Hessians](@ref) are explicitly provided,
user-defined operators must support automatic differentiation by
[ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl).
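
Conceptually, this means the operator must be differentiable by a call like the
following (a sketch of what happens internally, not JuMP's exact code):

```julia
import ForwardDiff

f(x) = x^2 + exp(x)
# JuMP obtains the first derivative of a univariate operator roughly like so:
ForwardDiff.derivative(f, 1.0)  # 2.0 + exp(1.0) ≈ 4.72
```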

The use of ForwardDiff.jl has two important implications:

1. ForwardDiff.jl supports only a limited subset of Julia. If you encounter an
   error adding the operator, see [Common mistakes when writing a user-defined operator](@ref).
2. Differentiating operators with many arguments is slow. In general, you
   should try to keep the number of arguments to less than 100, and ideally, to
   less than 10.

Because of the use of ForwardDiff, in most cases, you should prefer to use
function tracing instead of defining a user-defined operator.
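
If ForwardDiff does not support your operator, or is too slow, one alternative
is to provide derivatives explicitly. As a sketch (with illustrative `f` and
`∇f`; see [Gradients and Hessians](@ref) for the full signatures):

```julia
using JuMP

f(x...) = x[1]^2 + x[2]^2

# In-place gradient callback: fills `g` with the partial derivatives of `f`.
function ∇f(g::AbstractVector, x...)
    g[1] = 2 * x[1]
    g[2] = 2 * x[2]
    return
end

model = Model()
@variable(model, x[1:2])
@operator(model, op_f, 2, f, ∇f)  # gradient supplied, so ForwardDiff is not used
@objective(model, Min, op_f(x[1], x[2]))
```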

### Add an operator without macros

The [`@operator`](@ref) macro is syntactic sugar for [`add_nonlinear_operator`](@ref).
@@ -622,53 +647,158 @@ f_splat(x..., y..., z)
@objective(model, Min, op_f(x..., y..., z))
```

### Common mistakes when writing a user-defined operator

JuMP uses [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) to
compute the first-order derivatives of user-defined operators. ForwardDiff has a
number of limitations that you should be aware of when writing user-defined
operators.

The rest of this section provides debugging advice and explains some common
mistakes.

!!! warning
    Get an error like `No method matching Float64(::ForwardDiff.Dual)`? Read
    this section.

#### Debugging

If you add an operator that does not support ForwardDiff, a long error message
will be printed. You can review the stacktrace for more information, but it can
often be hard to understand why and where your function is failing.

It may be helpful to debug the operator outside of JuMP as follows.

If the operator is univariate, do:
```jldoctest
julia> import ForwardDiff

julia> my_operator(a) = a^2
my_operator (generic function with 1 method)

julia> ForwardDiff.derivative(my_operator, 1.0)
2.0
```
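
If the solver also requires second derivatives, a similar sketch checks them by
nesting `ForwardDiff.derivative` (reusing `my_operator` from above):

```julia
import ForwardDiff  # as above

# d²/da² of a^2 is 2:
ForwardDiff.derivative(a -> ForwardDiff.derivative(my_operator, a), 1.0)  # 2.0
```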

If the operator is multivariate, do:
```jldoctest
julia> import ForwardDiff

julia> my_operator(a, b) = a^2 + b^2
my_operator (generic function with 1 method)

julia> ForwardDiff.gradient(x -> my_operator(x...), [1.0, 2.0])
2-element Vector{Float64}:
 2.0
 4.0
```
Note that even though the operator accepts splatted arguments,
`ForwardDiff.gradient` requires a vector as input.

#### Operator calls something unsupported by ForwardDiff

ForwardDiff works by overloading many Julia functions for a special type
`ForwardDiff.Dual <: Real`. If your operator attempts to call a function for
which an overload has not been defined, a `MethodError` will be thrown.

For example, your operator cannot call external C functions, and it cannot
solve an inner JuMP model and return its optimal objective value.

```jldoctest
julia> import ForwardDiff

julia> my_operator_bad(x) = @ccall sqrt(x::Cdouble)::Cdouble
my_operator_bad (generic function with 1 method)

julia> ForwardDiff.derivative(my_operator_bad, 1.0)
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(my_operator_bad), Float64}, Float64, 1})
[...]
```

Unfortunately, the list of functions that ForwardDiff supports is too large to
enumerate what is and isn't allowed, so the best advice is to try your operator
and see if it works.
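
As one illustration, the `@ccall` example above can be fixed by calling Julia's
built-in `sqrt`, which ForwardDiff does overload:

```julia
import ForwardDiff

my_operator_ok(x) = sqrt(x)  # pure Julia, so ForwardDiff.Dual passes through
ForwardDiff.derivative(my_operator_ok, 1.0)  # 0.5
```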

#### Operator does not accept splatted input

The operator takes `f(x::Vector)` as input, instead of the splatted `f(x...)`.

```jldoctest
julia> import ForwardDiff

julia> my_operator_bad(x::Vector) = sum(x[i]^2 for i in eachindex(x))
my_operator_bad (generic function with 1 method)

julia> my_operator_good(x...) = sum(x[i]^2 for i in eachindex(x))
my_operator_good (generic function with 1 method)

julia> ForwardDiff.gradient(x -> my_operator_bad(x...), [1.0, 2.0])
ERROR: MethodError: no method matching my_operator_bad(::ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6", Float64}, Float64, 2}, ::ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6", Float64}, Float64, 2})
[...]

julia> ForwardDiff.gradient(x -> my_operator_good(x...), [1.0, 2.0])
2-element Vector{Float64}:
 2.0
 4.0
```

#### Operator assumes `Float64` as input

The operator assumes `Float64` will be passed as input, but it must work for any
generic `Real` type.

```jldoctest
julia> import ForwardDiff

julia> my_operator_bad(x::Float64...) = sum(x[i]^2 for i in eachindex(x))
my_operator_bad (generic function with 1 method)

julia> my_operator_good(x::Real...) = sum(x[i]^2 for i in eachindex(x))
my_operator_good (generic function with 1 method)

julia> ForwardDiff.gradient(x -> my_operator_bad(x...), [1.0, 2.0])
ERROR: MethodError: no method matching my_operator_bad(::ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6", Float64}, Float64, 2}, ::ForwardDiff.Dual{ForwardDiff.Tag{var"#5#6", Float64}, Float64, 2})
[...]

julia> ForwardDiff.gradient(x -> my_operator_good(x...), [1.0, 2.0])
2-element Vector{Float64}:
 2.0
 4.0
```

#### Operator allocates `Float64` storage

The operator allocates temporary storage using `zeros(3)` or similar. This
defaults to `Float64`, so use `zeros(T, 3)` instead.

```jldoctest
julia> import ForwardDiff

julia> function my_operator_bad(x::Real...)
           # This line is problematic. zeros(n) is short for zeros(Float64, n)
           y = zeros(length(x))
           for i in eachindex(x)
               y[i] = x[i]^2
           end
           return sum(y)
       end
my_operator_bad (generic function with 1 method)

julia> function my_operator_good(x::T...) where {T<:Real}
           y = zeros(T, length(x))
           for i in eachindex(x)
               y[i] = x[i]^2
           end
           return sum(y)
       end
my_operator_good (generic function with 1 method)

julia> ForwardDiff.gradient(x -> my_operator_bad(x...), [1.0, 2.0])
ERROR: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{var"#1#2", Float64}, Float64, 2})
[...]

julia> ForwardDiff.gradient(x -> my_operator_good(x...), [1.0, 2.0])
2-element Vector{Float64}:
 2.0
 4.0
```
2 changes: 1 addition & 1 deletion docs/styles/Vocab/JuMP-Vocab/accept.txt
@@ -39,7 +39,7 @@ preprint
README
recurse
reimplemented
[Ss]tacktrace
subexpression(?s)
subgraph(?s)
subtyping