
Review feedback #82

Closed
odow opened this issue Aug 26, 2024 · 16 comments

odow commented Aug 26, 2024

The purpose of this issue is to collate and discuss user feedback.

Layout

Predictors all live in
https://github.com/lanl-ansi/MathOptAI.jl/tree/main/src/predictors

Extensions live in
https://github.com/lanl-ansi/MathOptAI.jl/tree/main/ext

Documentation

The docs can be difficult to build because they require a PyTorch installation via Conda.

You might need to uncomment:

```julia
# julia> ENV["JULIA_CONDAPKG_BACKEND"] = "Current"
```

Otherwise, you'll need to make do with reading the source until I can set up CI (we need the repo to be public first).

Here's a good tutorial intro:
https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/tutorials/mnist.jl

The predictors all have docstrings and examples:

"""
Affine(
A::Matrix{Float64},
b::Vector{Float64} = zeros(size(A, 1)),
) <: AbstractPredictor
An [`AbstractPredictor`](@ref) that represents the affine relationship:
```math
f(x) = A x + b
```
## Example
```jldoctest
julia> using JuMP, MathOptAI
julia> model = Model();
julia> @variable(model, x[1:2]);
julia> f = MathOptAI.Affine([2.0, 3.0])
Affine(A, b) [input: 2, output: 1]
julia> y = MathOptAI.add_predictor(model, f, x)
1-element Vector{VariableRef}:
moai_Affine[1]
julia> print(model)
Feasibility
Subject to
2 x[1] + 3 x[2] - moai_Affine[1] = 0
julia> y = MathOptAI.add_predictor(model, MathOptAI.ReducedSpace(f), x)
1-element Vector{AffExpr}:
2 x[1] + 3 x[2]
```
"""

Return structs

Read #67 and #80. Thoughts, comments, and ideas?

Comparison to existing projects

Read https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md. Have I misrepresented anything, or left anything out?


pulsipher commented Aug 27, 2024

This is awesome work; I'm excited for this to be released and grow. After a high-level pass, here are my initial thoughts.

Documentation

Some thoughts for improvement:

  • Provide some more info on the different ReLU formulations on https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/manual/predictors.md (a sketch of the kind of definition that would help is given after this list)
  • If possible, provide a mathematical definition of the formulation form of BinaryDecisionTree and Quantile on the above page
  • More info/discussion on each of the manual pages would be nice
  • PyTorch models should have a manual page
  • Would be good to provide context on why `vec(x)` is used:

    ```julia
    function find_adversarial_image(test_case; adversary_label, δ = 0.05)
        model = Model(Ipopt.Optimizer)
        set_silent(model)
        @variable(model, 0 <= x[1:28, 1:28] <= 1)
        @constraint(model, -δ .<= x .- test_case.features .<= δ)
        y = MathOptAI.add_predictor(model, ml_model, vec(x))
        @objective(model, Max, y[adversary_label+1] - y[test_case.targets+1])
        optimize!(model)
        @assert is_solved_and_feasible(model)
        return value.(x)
    end
    ```
  • Provide more info on what this model is trying to accomplish: https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/tutorials/pytorch.jl. What is the optimization formulation doing?
  • I like how this tutorial goes through the model step by step: https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/tutorials/student_enrollment.jl
  • The mathematical definition of each reformulation should be provided for each add_predictor method, and some summarizing documentation would be nice to anticipate what will be added in terms of constraints, variables, and new nonlinear operators.
  • The accepted format of the config dictionary should be more precisely described
  • The accepted forms of the argument x::Vector should be more precisely described/checked
  • The accepted file extension types for PyTorchModels should be documented.
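
On the first bullet, a sketch of the kind of definition that would help: the standard big-M formulation of the ReLU constraint y = max(0, x) with finite bounds L ≤ x ≤ U (an illustration of the general technique, not necessarily MathOptAI's exact formulation) is:

```math
y \ge 0, \quad y \ge x, \quad y \le x - L(1 - z), \quad y \le U z, \quad z \in \{0, 1\}
```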

Return structs

Read #67 and #80. Thoughts, comments, and ideas?

I definitely think it is a good idea to have access to the variables and constraints created for the predictor. This helps to demystify the transformations and I believe provides a way to delete a predictor (i.e., manually deleting the variables and constraints).

Here are some of my thoughts on syntax to accomplish this. I prefer options 2 or 3.

1. Using the approach proposed in #80

I don't have any major issues with this approach, except that I think `add_predictor` should return `y, formulation` instead of `formulation, y`, since I imagine the formulation will often not be used and the user would just want `y, _ = add_predictor`. Though, admittedly, it might be a little annoying to have to deal with the 2nd return value when it is often not needed.

One side question: why is Vector{Any} required for the variables field?

2. Tweaking #80 to return only one object

Instead of returning y and formulation, we could return a reformulation block of the form:

```julia
struct SimpleFormulation{P<:AbstractPredictor} <: AbstractFormulation
    predictor::P
    outputs::Array{Any} # new field for `y`
    variables::Vector{Any}
    constraints::Vector{Any}
end
```

Then the user can just extract the outputs y from the formulation object as wanted. Going one step further, one could even overload getindex such that:

```julia
Base.getindex(f::SimpleFormulation, inds...) = getindex(f.outputs, inds...)
```
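
Hypothetical usage under this option (illustrative names, not an existing API):

```julia
# `add_predictor` returns a single formulation object; the outputs are
# recovered from it rather than being returned separately.
formulation = add_predictor(model, predictor, x)
y = formulation.outputs   # explicit field access
y1 = formulation[1]       # or via the `getindex` overload above
```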

3. Store the formulation info in the model and use references

Adding a little more complexity, we could store the formulation objects in model.ext and have add_predictor return a predictor reference object that points to it. Then the reference can be used to simplify the user experience, in a manner similar to VariableRefs and ConstraintRefs (and similar to what DisjunctiveProgramming does with disjunctions). A rough API might look like:

```julia
predictor = add_predictor(model, nn, x)
y = outputs(predictor)
cons = transformation_constraints(predictor)
vars = transformation_variables(predictor)
predictor_obj = predictor_object(predictor)
set_transformation(predictor, TransformType()) # change the transformation used in-place
delete(model, predictor) # removes the predictor and anything it added to the model
```

Most of the above API could also be added with option 2. Moreover, we could also overload getindex to index the output variables.

See #67

Comparison to existing projects

Read https://github.com/lanl-ansi/MathOptAI.jl/blob/main/docs/src/developers/design_principles.md. Have I misrepresented anything, or left anything out?

The comparison seems fair, but it could use some improvements.

This part should add more discussion on what you would like this syntax to accomplish in the ideal case; it is not immediately obvious as is:

A second downside is that the predictor must describe the input and output
dimension; these cannot be inferred automatically. As one example, this means
that it cannot do the following:
```python
# Not possible because dimension not given
model.pred_constr.build_formulation(ReLU())
```

What about reduced-space formulations?

First, the user probably already has the input `x` as decision variables or an
expression in the model, so we do not need the `connect_input` constraint, and
because we use a full-space formulation, the output `y` will always be a vector
of decision variables, which avoids the need for a `connect_output` constraint.

Typo?

The main downsides are that we do not return a `pred_constr` equivalent
contains statistics on the reformulation, and that the user cannot delete a

Some additional points that might be worth mentioning:

  • How do the different packages handle transformations? Which ones are supported? How does the user control it?
  • How do the packages differ in terms of handling variables that are scaled when the predictor is trained?
  • What are key terminology differences? For instance, OMLT doesn't use the term predictor.
  • Notably, MathOptAI is in Julia and not Python, which has implications for which predictors are supported

Also, there is a recent review article that goes over a lot of the current software tools for embedding surrogate predictors in optimization models that is worth a look: https://doi.org/10.1021/acs.iecr.4c00632

Other thoughts

Finally, here are some other ideas/thoughts that came to mind.

Reduced-space formulations

I think the method `ReducedSpace(predictor::ReducedSpace) = predictor` should be added to avoid:

```julia
if reduced_space
    # If config maps to a ReducedSpace predictor, we'll get a MethodError
    # when trying to add the nested reduced-space predictors.
    # TODO: raise a nicer error or try to handle this gracefully.
    inner_predictor = MathOptAI.ReducedSpace(inner_predictor)
end
```

Also, I would generally expect specifying `reduced_space = true` to work on NNs and to simply ignore this option for layers where it is not applicable, perhaps throwing a warning instead of an error.

Scaling

Automating the scaling for the inputs/outputs would be a handy capability that I use all the time in OMLT. This really just amounts to an affine transformation, which might have some overlap with JuMP's support for units.
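
For example, input standardization can already be written as an `Affine` layer in front of the predictor; a minimal sketch, assuming hypothetical training statistics `μ` and `σ`:

```julia
using JuMP, MathOptAI, LinearAlgebra

# Hypothetical training-time statistics for two inputs.
μ, σ = [1.0, 2.0], [0.5, 4.0]

# (x .- μ) ./ σ as an affine map: A = Diagonal(1 ./ σ), b = -μ ./ σ.
scaler = MathOptAI.Affine(Matrix(Diagonal(1 ./ σ)), -μ ./ σ)

model = Model()
@variable(model, x[1:2])
x_scaled = MathOptAI.add_predictor(model, scaler, x)
```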

See #87

Black-box transformations

As discussed in #70, it would be very nice to automate the embedding of NNs (and other predictors) as black-box functions that are treated as nonlinear operators. In my research focusing on using smooth NNs in nonlinear optimal control formulations, we have found that treating the NN as an operator that gets its derivatives from the ML environment (e.g., using torch.func from PyTorch) significantly outperforms embedding the NN as algebraic constraints (benchmarking OMLT against using PyNumero's greybox API).

Naturally, JuMP's nonlinear operator API is scalarized, so I am not sure how well it will work for predictors with many inputs and outputs. This definitely motivates the milestone to add vectorized operator support in JuMP.
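
For concreteness, a minimal sketch of the scalarized operator approach, where `nn_value` and `nn_gradient` are stand-ins for framework-provided callbacks (e.g., via torch.func):

```julia
using JuMP, Ipopt

# Stand-ins for the NN forward pass and its framework-provided gradient.
nn_value(x...) = sum(xi^2 for xi in x)
function nn_gradient(g::AbstractVector, x...)
    g .= 2.0 .* collect(x)
    return
end

model = Model(Ipopt.Optimizer)
@variable(model, x[1:2])
# A scalar-output nonlinear operator with two inputs; JuMP calls the
# supplied gradient instead of differentiating through the NN itself.
@operator(model, op_nn, 2, nn_value, nn_gradient)
@objective(model, Min, op_nn(x[1], x[2]))
```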

See #90

Other models

Adding support for the following would be useful:


pulsipher commented Aug 27, 2024

Also, I would like to look into generalizing MathOptAI to work with JuMP.AbstractModels such that it is compatible with InfiniteOpt. It would likely need to be something similar to hdavid16/DisjunctiveProgramming.jl#114.

On a similar note, wouldn't it be prudent to generalize MathOptAI to support GenericModel instead of Model?

Fixed by #83


odow commented Aug 27, 2024

Niiiice @pulsipher. This is super helpful.

How common are:

```julia
set_transformation(predictor, TransformType())
delete(model, predictor)
```

Delete seems easy. set_transformation seems hard (to do efficiently).


odow commented Aug 27, 2024

On scaling: I really didn't see the point of adding this to MathOptAI. Isn't it just a transformation you can trivially do in user-code?


odow commented Aug 28, 2024

Also, there is a recent review article that goes over a lot of the current software tools for embedding surrogate predictors in optimization models that is worth a look: https://doi.org/10.1021/acs.iecr.4c00632

This is a pretty nice overview article. I think we're in a good position to play to JuMP's strengths with the wide variety of base AML features and multiple dispatch to provide a really great library.

@pulsipher

How common are:

```julia
set_transformation(predictor, TransformType())
delete(model, predictor)
```

Delete seems easy. set_transformation seems hard (to do efficiently).

Deletion is a capability provided by alternative tools and seems easy to implement in MathOptAI. Admittedly, this is not a feature I really use, but I also don't tend to delete variables or constraints in JuMP models either. One possible scenario might be wanting to replace a predictor with one that has updated weights.

Setting the transformation is not a critical feature, but we have used such workflows when comparing the performance of different transformation approaches without having to rebuild the model each time. It has been helpful on large models where the build time is considerable. In terms of performance, simply deleting the old constraints and replacing these with the new transformation would be sufficient. I would think this is straightforward with full-space formulations since you can just reuse the previous output variables, but reduced-space formulations would definitely be more tricky. Perhaps such a feature would be limited to full-space if it were to be added. Alternatively, the user could just manually delete the predictor (assuming this capability is added) along with the constraints/objective it was used in and then add it again using the new transformation method.


odow commented Aug 28, 2024

Deletion is a capability provided by alternative tools

This isn't an argument that we should also implement it 😄 I'd much prefer we went for simplicity over a bag-of-features that are not used.


odow commented Aug 28, 2024

For black-box outputs, we could automate wrapping @operator and building the appropriate derivatives. And for vector-valued, we could also automate the memoization stuff.
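
For reference, a minimal sketch of that memoization pattern (following the approach in the JuMP documentation; `nn_forward` is a hypothetical vector-valued forward pass):

```julia
# Evaluate the vector-valued function once per input point, cache the
# result, and expose one scalar operator per output.
function memoize(nn_forward::Function, n_outputs::Int)
    last_x, last_f = nothing, nothing
    function nn_i(i, x::T...) where {T<:Real}
        if x !== last_x
            last_x, last_f = x, nn_forward(x...)
        end
        return last_f[i]::T
    end
    return [(x...) -> nn_i(i, x...) for i in 1:n_outputs]
end
```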

It seems like a reasonable request. I'll open a separate issue.


odow commented Aug 28, 2024

Thanks for the review @pulsipher. I think we've made a bunch of nice changes, and there are some more in the pipeline.

@pulsipher

Thanks for the review @pulsipher. I think we've made a bunch of nice changes, and there are some more in the pipeline.

Happy to help; this will integrate quite nicely into my research group's work. Thanks for all your hard work and the quick turnaround on making changes.


pulsipher commented Aug 28, 2024

Taking a closer look at the extensions for Lux and Flux, I believe error checking could be improved a bit.

With Flux, attempting to add a Chain with an unsupported layer type (e.g., Flux.Conv) would lead to a MethodError with `_add_predictor`, which seems less than ideal.

Attempting the same with Lux would erroneously throw an error like `Unsupported activation function: Lux.Conv`, since the activation is not typed in `_add_predictor`.

Fixed by #95

@pulsipher

Also, this function:

```julia
function add_predictor(model::JuMP.AbstractModel, predictor, x::Matrix)
    y = map(j -> add_predictor(model, predictor, x[:, j]), 1:size(x, 2))
    return reduce(hcat, y)
end
```

seems to assume that predictors will always take a vector of inputs, but I can envision wanting to support predictors later on that take in array inputs (e.g., CNNs, neural operators).


odow commented Aug 28, 2024

I believe error checking could be improved a bit

Most definitely. Will be easier to catch and improve all these once we have CI and coverage up and running, etc.

seems to assume that predictors will always take a vector of inputs

Yes!!! I should discuss this as a design principle. I think Julia libraries too quickly lean to "anything goes". I want to keep inputs as a Base.Vector, and multiple inputs as a Base.Matrix. The matrix method is so that we can do this:

```julia
evaluate_df.enroll = MathOptAI.add_predictor(model, model_glm, evaluate_df);
```

Otherwise, it would need

```julia
evaluate_df.enroll = map(eachrow(evaluate_df)) do row
    return only(MathOptAI.add_predictor(model, model_glm, row))
end
```

Although, on reflection, perhaps that's not too bad.

@pulsipher

Yes!!! I should discuss this as a design principle. I think Julia libraries too quickly lean to "anything goes". I want to keep inputs as a Base.Vector, and multiple inputs as a Base.Matrix.

This philosophy makes sense, thanks for adding the clarification to the docs.

In the near future, however, I would very much like to add support for CNNs, which are not readily compatible with vector inputs. To deal with this, I see two main options:

  1. Allow predictors in MathOptAI to accept Base.Arrays of any dimension
  2. Make the MathOptAI version of CNN layers reshape the inputs and outputs internally

My intuition is that option 1 would be simpler to implement and simpler for the user.


odow commented Aug 29, 2024

So one thing that would be super helpful for this is examples/tutorials.

I tend to lean towards (2), provided that the reshaping is exactly vec(input) and reshape(output, size).
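
A hypothetical sketch of what (2) could look like, where `VecConv2D` and `formulate_conv` are illustrative names rather than existing API:

```julia
# Keep the MathOptAI interface vector-in, vector-out by reshaping
# internally; `formulate_conv` is a placeholder for a real convolution
# formulation.
struct VecConv2D{P} <: MathOptAI.AbstractPredictor
    layer::P
    input_size::NTuple{2,Int}
end

function MathOptAI.add_predictor(model::JuMP.AbstractModel, p::VecConv2D, x::Vector)
    X = reshape(x, p.input_size)       # undo vec(input)
    Y = formulate_conv(model, p.layer, X)
    return vec(Y)                      # flatten the output back to a vector
end
```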

My intuition is that option 1 would be simpler to implement and simpler for the user.

I don't disagree. But if they're using PyTorch/(F)lux, then they don't care how the layer is implemented internally. It would only matter if they are manually using the MathOptAI layers directly.

I'm a little concerned about scope explosion for this library. There are just so many things we could do, and it isn't obvious which ones are must-haves and which ones are niche features that a single user needs.

As one example, I used to have a LogisticRegression layer, but now it is Pipeline(Affine, Sigmoid). I really really want to minimize the number of unique concepts, and suddenly having to start worrying about the shape of inputs and outputs could be tricky.

Before we do anything like this, we need many more examples.


odow commented Oct 24, 2024

Closing this issue because I think it has run its course. We can open more focused issues to discuss input types etc if/when we decide to start supporting CNNs etc.

odow closed this as completed Oct 24, 2024