
Review feedback #82

Closed
odow opened this issue Aug 26, 2024 · 16 comments

odow commented Aug 26, 2024

The purpose of this issue is to collate and discuss user feedback.

Layout

Predictors all live in
https://github.com/lanl-ansi/MathOptAI.jl/tree/main/src/predictors

Extensions live in
https://github.com/lanl-ansi/MathOptAI.jl/tree/main/ext

Documentation

The docs can be difficult to build because they require a PyTorch installation via Conda.

You might need to uncomment:

```julia
# julia> ENV["JULIA_CONDAPKG_BACKEND"] = "Current"
```

Otherwise, you'll need to make do with reading the source until I can set up CI (we need the repo to be public first).

Here's a good tutorial intro:
https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/tutorials/mnist.jl

The predictors all have docstrings and examples:

"""
Affine(
A::Matrix{Float64},
b::Vector{Float64} = zeros(size(A, 1)),
) <: AbstractPredictor
An [`AbstractPredictor`](@ref) that represents the affine relationship:
```math
f(x) = A x + b
```
## Example
```jldoctest
julia> using JuMP, MathOptAI
julia> model = Model();
julia> @variable(model, x[1:2]);
julia> f = MathOptAI.Affine([2.0, 3.0])
Affine(A, b) [input: 2, output: 1]
julia> y = MathOptAI.add_predictor(model, f, x)
1-element Vector{VariableRef}:
moai_Affine[1]
julia> print(model)
Feasibility
Subject to
2 x[1] + 3 x[2] - moai_Affine[1] = 0
julia> y = MathOptAI.add_predictor(model, MathOptAI.ReducedSpace(f), x)
1-element Vector{AffExpr}:
2 x[1] + 3 x[2]
```
"""

Return structs

Read #67 and #80. Thoughts, comments, and ideas?

Comparison to existing projects

Read https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md. Have I misrepresented anything, or left anything out?


pulsipher commented Aug 27, 2024

This is awesome work; I'm excited for this to be released and grow. After a high-level pass, here are my initial thoughts.

Documentation

Some thoughts for improvement:

  • Provide some more info on the different ReLU formulations on https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/manual/predictors.md (a sketch of the kind of definition that would help is given after this list)
  • If possible, provide a mathematical definition of the formulation form of BinaryDecisionTree and Quantile on the above page
  • More info/discussion on each of the manual pages would be nice
  • PyTorch models should have a manual page
  • Would be good to provide context on why `vec(x)` is used:

    ```julia
    function find_adversarial_image(test_case; adversary_label, δ = 0.05)
        model = Model(Ipopt.Optimizer)
        set_silent(model)
        @variable(model, 0 <= x[1:28, 1:28] <= 1)
        @constraint(model, -δ .<= x .- test_case.features .<= δ)
        y = MathOptAI.add_predictor(model, ml_model, vec(x))
        @objective(model, Max, y[adversary_label+1] - y[test_case.targets+1])
        optimize!(model)
        @assert is_solved_and_feasible(model)
        return value.(x)
    end
    ```
  • Provide more info on what this model is trying to accomplish: https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/tutorials/pytorch.jl. What is the optimization formulation doing?
  • I like how this tutorial goes through the model step by step: https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/tutorials/student_enrollment.jl
  • The mathematical definition of each reformulation should be provided for each add_predictor method, and some summarizing documentation would be nice to anticipate what will be added in terms of constraints, variables, and new nonlinear operators.
  • The accepted format of the config dictionary should be more precisely described
  • The accepted forms of the argument x::Vector should be more precisely described/checked
  • The accepted file extension types for PyTorchModels should be documented.
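
On the first bullet, a sketch of the kind of definition that would help: the standard big-M formulation of the ReLU constraint y = max(0, x) with finite bounds L ≤ x ≤ U (an illustration of the general technique, not necessarily MathOptAI's exact formulation) is:

```math
y \ge 0, \quad y \ge x, \quad y \le x - L(1 - z), \quad y \le U z, \quad z \in \{0, 1\}
```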

Return structs

Read #67 and #80. Thoughts, comments, and ideas?

I definitely think it is a good idea to have access to the variables and constraints created for the predictor. This helps to demystify the transformations and I believe provides a way to delete a predictor (i.e., manually deleting the variables and constraints).

Here are some of my thoughts on syntax to accomplish this. I prefer options 2 or 3.

1. Using the approach proposed in #80

I don't have any major issues with this approach, except that I think `add_predictor` should return `y, formulation` instead of `formulation, y`, since I imagine the formulation will often not be used and the user would just want `y, _ = add_predictor`. Though, admittedly, it might be a little annoying to have to deal with the 2nd return value when it is often not needed.

One side question: why is Vector{Any} required for the variables field?

2. Tweaking #80 to return only one object

Instead of returning y and formulation, we could return a reformulation block of the form:

```julia
struct SimpleFormulation{P<:AbstractPredictor} <: AbstractFormulation
    predictor::P
    outputs::Array{Any} # new field for `y`
    variables::Vector{Any}
    constraints::Vector{Any}
end
```

Then the user can just extract the outputs y from the formulation object as wanted. Going one step further, one could even overload getindex such that:

```julia
Base.getindex(f::SimpleFormulation, inds...) = getindex(f.outputs, inds...)
```
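
Hypothetical usage under this option (illustrative names, not an existing API):

```julia
# `add_predictor` returns a single formulation object; the outputs are
# recovered from it rather than being returned separately.
formulation = add_predictor(model, predictor, x)
y = formulation.outputs   # explicit field access
y1 = formulation[1]       # or via the `getindex` overload above
```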

3. Store the formulation info in the model and use references

Adding a little more complexity, we could store the formulation objects in model.ext and have add_predictor return a predictor reference object that points to it. Then the reference can be used to simplify the user experience, in a manner similar to VariableRefs and ConstraintRefs (and similar to what DisjunctiveProgramming does with disjunctions). A rough API might look like:

```julia
predictor = add_predictor(model, nn, x)
y = outputs(predictor)
cons = transformation_constraints(predictor)
vars = transformation_variables(predictor)
predictor_obj = predictor_object(predictor)
set_transformation(predictor, TransformType()) # change the transformation used in-place
delete(model, predictor) # removes the predictor and anything it added to the model
```

Most of the above API could also be added with option 2. Moreover, we could also overload getindex to index the output variables.

See #67

Comparison to existing projects

Read https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md. Have I misrepresented anything, or left anything out?

The comparison seems fair, but it could use some improvements.

This part should add more discussion on what you would like this syntax to accomplish in the ideal case; it is not immediately obvious as is:

A second downside is that the predictor must describe the input and output
dimension; these cannot be inferred automatically. As one example, this means
that it cannot do the following:
```python
# Not possible because dimension not given
model.pred_constr.build_formulation(ReLU())
```

What about reduced-space formulations?

First, the user probably already has the input `x` as decision variables or an
expression in the model, so we do not need the `connect_input` constraint, and
because we use a full-space formulation, the output `y` will always be a vector
of decision variables, which avoids the need for a `connect_output` constraint.

Typo?

The main downsides are that we do not return a `pred_constr` equivalent
contains statistics on the reformulation, and that the user cannot delete a

Some additional points that might be worth mentioning:

  • How do the different packages handle transformations? Which ones are supported? How does the user control it?
  • How do the packages differ in terms of handling variables that are scaled when the predictor is trained?
  • What are key terminology differences? For instance, OMLT doesn't use the term predictor.
  • Notably, MathOptAI is in Julia and not Python, which has implications for which predictors are supported

Also, there is a recent review article that goes over a lot of the current software tools for embedding surrogate predictors in optimization models that is worth a look: https://doi.org/10.1021/acs.iecr.4c00632

Other thoughts

Finally, here are some other ideas/thoughts that came to mind.

Reduced-space formulations

I think the method `ReducedSpace(predictor::ReducedSpace) = predictor` should be added to avoid:

```julia
if reduced_space
    # If config maps to a ReducedSpace predictor, we'll get a MethodError
    # when trying to add the nested reduced-space predictors.
    # TODO: raise a nicer error or try to handle this gracefully.
    inner_predictor = MathOptAI.ReducedSpace(inner_predictor)
end
```

Also, I would generally expect specifying `reduced_space = true` to work on NNs and to simply ignore this option for layers where it is not applicable, perhaps throwing a warning instead of an error.

Scaling

Automating the scaling for the inputs/outputs would be a handy capability that I use all the time in OMLT. This really just amounts to an affine transformation, which might have some overlap with JuMP's support for units.
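
For example, input standardization can already be written as an `Affine` layer in front of the predictor; a minimal sketch, assuming hypothetical training statistics `μ` and `σ`:

```julia
using JuMP, MathOptAI, LinearAlgebra

# Hypothetical training-time statistics for two inputs.
μ, σ = [1.0, 2.0], [0.5, 4.0]

# (x .- μ) ./ σ as an affine map: A = Diagonal(1 ./ σ), b = -μ ./ σ.
scaler = MathOptAI.Affine(Matrix(Diagonal(1 ./ σ)), -μ ./ σ)

model = Model()
@variable(model, x[1:2])
x_scaled = MathOptAI.add_predictor(model, scaler, x)
```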

See #87

Black-box transformations

As discussed in #70, it would be very nice to automate the embedding of NNs (and other predictors) as black-box functions that are treated as nonlinear operators. In my research focusing on using smooth NNs in nonlinear optimal control formulations, we have found that treating the NN as an operator that gets its derivatives from the ML environment (e.g., using torch.func from PyTorch) significantly outperforms embedding the NN as algebraic constraints (benchmarking OMLT against using PyNumero's greybox API).

Naturally, JuMP's nonlinear operator API is scalarized, so I am not sure how well it will work for predictors with many inputs and outputs. This definitely motivates the milestone to add vectorized operator support in JuMP.
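
For concreteness, a minimal sketch of the scalarized operator approach, where `nn_value` and `nn_gradient` are stand-ins for framework-provided callbacks (e.g., via torch.func):

```julia
using JuMP, Ipopt

# Stand-ins for the NN forward pass and its framework-provided gradient.
nn_value(x...) = sum(xi^2 for xi in x)
function nn_gradient(g::AbstractVector, x...)
    g .= 2.0 .* collect(x)
    return
end

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2])
# A scalar-output nonlinear operator with two inputs; JuMP calls the
# supplied gradient instead of differentiating through the NN itself.
@operator(model, op_nn, 2, nn_value, nn_gradient)
@objective(model, Min, op_nn(x[1], x[2]))
```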

See #90

Other models

Adding support for the following would be useful:


pulsipher commented Aug 27, 2024

Also, I would like to look into generalizing MathOptAI to work with JuMP.AbstractModels such that it is compatible with InfiniteOpt. It would likely need to be something similar to hdavid16/DisjunctiveProgramming.jl#114.

On a similar note, wouldn't it be prudent to generalize MathOptAI to support GenericModel instead of Model?

Fixed by #83


odow commented Aug 27, 2024

Niiiice @pulsipher. This is super helpful.

How common are:

```julia
set_transformation(predictor, TransformType())
delete(model, predictor)
```

Delete seems easy. set_transformation seems hard (to do efficiently).


odow commented Aug 27, 2024

On scaling: I really didn't see the point of adding this to MathOptAI. Isn't it just a transformation you can trivially do in user-code?


odow commented Aug 28, 2024

Also, there is a recent review article that goes over a lot of the current software tools for embedding surrogate predictors in optimization models that is worth a look: https://doi.org/10.1021/acs.iecr.4c00632

This is a pretty nice overview article. I think we're in a good position to play to JuMP's strengths with the wide variety of base AML features and multiple dispatch to provide a really great library.

@pulsipher

How common are:

```julia
set_transformation(predictor, TransformType())
delete(model, predictor)
```

Delete seems easy. set_transformation seems hard (to do efficiently).

Deletion is a capability provided by alternative tools and seems easy to implement in MathOptAI. Admittedly, this is not a feature I really use, but I also don't tend to delete variables or constraints in JuMP models either. One possible scenario might be wanting to replace a predictor with one that has updated weights.

Setting the transformation is not a critical feature, but we have used such workflows when comparing the performance of different transformation approaches without having to rebuild the model each time. It has been helpful on large models where the build time is considerable. In terms of performance, simply deleting the old constraints and replacing these with the new transformation would be sufficient. I would think this is straightforward with full-space formulations since you can just reuse the previous output variables, but reduced-space formulations would definitely be more tricky. Perhaps such a feature would be limited to full-space if it were to be added. Alternatively, the user could just manually delete the predictor (assuming this capability is added) along with the constraints/objective it was used in and then add it again using the new transformation method.


odow commented Aug 28, 2024

Deletion is a capability provided by alternative tools

This isn't an argument that we should also implement it 😄 I'd much prefer we went for simplicity over a bag-of-features that are not used.


odow commented Aug 28, 2024

For black-box outputs, we could automate wrapping @operator and building the appropriate derivatives. And for vector-valued, we could also automate the memoization stuff.
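
For reference, a minimal sketch of that memoization pattern (following the approach in the JuMP documentation; `nn_forward` is a hypothetical vector-valued forward pass):

```julia
# Evaluate the vector-valued function once per input point, cache the
# result, and expose one scalar operator per output.
function memoize(nn_forward::Function, n_outputs::Int)
    last_x, last_f = nothing, nothing
    function nn_i(i, x::T...) where {T<:Real}
        if x !== last_x
            last_x, last_f = x, nn_forward(x...)
        end
        return last_f[i]::T
    end
    return [(x...) -> nn_i(i, x...) for i in 1:n_outputs]
end
```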

It seems like a reasonable request. I'll open a separate issue.


odow commented Aug 28, 2024

Thanks for the review @pulsipher. I think we've made a bunch of nice changes, and there are some more in the pipeline.

@pulsipher

Thanks for the review @pulsipher. I think we've made a bunch of nice changes, and there are some more in the pipeline.

Happy to help; this will integrate quite nicely into my research group's work. Thanks for all your hard work and the quick turnaround on making changes.


pulsipher commented Aug 28, 2024

Taking a closer look at the extensions for Lux and Flux, I believe error checking could be improved a bit.

With Flux, attempting to add a Chain with an unsupported layer type (e.g., Flux.Conv) would lead to a MethodError with `_add_predictor`, which seems less than ideal.

Attempting the same with Lux would erroneously throw an error like `Unsupported activation function: Lux.Conv`, since the activation is not typed in `_add_predictor`.

Fixed by #95

@pulsipher

Also, this function:

```julia
function add_predictor(model::JuMP.AbstractModel, predictor, x::Matrix)
    y = map(j -> add_predictor(model, predictor, x[:, j]), 1:size(x, 2))
    return reduce(hcat, y)
end
```

seems to assume that predictors will always take a vector of inputs, but I can envision wanting to support predictors later on that take in array inputs (e.g., CNNs, neural operators).


odow commented Aug 28, 2024

I believe error checking could be improved a bit

Most definitely. Will be easier to catch and improve all these once we have CI and coverage up and running, etc.

seems to assume that predictors will always take a vector of inputs

Yes!!! I should discuss this as a design principle. I think Julia libraries too quickly lean to "anything goes". I want to keep inputs as a Base.Vector, and multiple inputs as a Base.Matrix. The matrix method is so that we can do this:

```julia
evaluate_df.enroll = MathOptAI.add_predictor(model, model_glm, evaluate_df);
```

Otherwise, it would need

```julia
evaluate_df.enroll = map(eachrow(evaluate_df)) do row
    return only(MathOptAI.add_predictor(model, model_glm, row))
end
```

Although, on reflection, perhaps that's not too bad.

@pulsipher

Yes!!! I should discuss this as a design principle. I think Julia libraries too quickly lean to "anything goes". I want to keep inputs as a Base.Vector, and multiple inputs as a Base.Matrix.

This philosophy makes sense, thanks for adding the clarification to the docs.

In the near future, however, I would very much like to add support for CNNs, which are not readily compatible with vector inputs. To deal with this, I see two main options:

  1. Allow predictors in MathOptAI to accept Base.Arrays of any dimension
  2. Make the MathOptAI version of CNN layers reshape the inputs and outputs internally

My intuition is that option 1 would be simpler to implement and simpler for the user.


odow commented Aug 29, 2024

So one thing that would be super helpful for this is examples/tutorials.

I tend to lean towards (2), provided that the reshaping is exactly vec(input) and reshape(output, size).
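
A hypothetical sketch of what (2) could look like, where `VecConv2D` and `formulate_conv` are illustrative names rather than existing API:

```julia
# Keep the MathOptAI interface vector-in, vector-out by reshaping
# internally; `formulate_conv` is a placeholder for a real convolution
# formulation.
struct VecConv2D{P} <: MathOptAI.AbstractPredictor
    layer::P
    input_size::NTuple{2,Int}
end

function MathOptAI.add_predictor(model::JuMP.AbstractModel, p::VecConv2D, x::Vector)
    X = reshape(x, p.input_size)       # undo vec(input)
    Y = formulate_conv(model, p.layer, X)
    return vec(Y)                      # flatten the output back to a vector
end
```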

My intuition is that option 1 would be simpler to implement and simpler for the user.

I don't disagree. But if they're using PyTorch/(F)lux, then they don't care how the layer is implemented internally. It would only matter if they are manually using the MathOptAI layers directly.

I'm a little concerned about scope explosion for this library. There are just so many things we could do, and it isn't obvious which ones are must-haves and which ones are niche features that a single user needs.

As one example, I used to have a LogisticRegression layer, but now it is Pipeline(Affine, Sigmoid). I really really want to minimize the number of unique concepts, and suddenly having to start worrying about the shape of inputs and outputs could be tricky.

Before we do anything like this, we need many more examples.


odow commented Oct 24, 2024

Closing this issue because I think it has run its course. We can open more focused issues to discuss input types etc if/when we decide to start supporting CNNs etc.

odow closed this as completed Oct 24, 2024