This repository has been archived by the owner on Apr 24, 2024. It is now read-only.

Ridge regressor #4

Merged
merged 4 commits into from
Feb 9, 2023

Conversation

PicoCentauri
Collaborator

@PicoCentauri PicoCentauri commented Jan 19, 2023

Adding a Ridge class for linear models.

Idea

The core idea of the Ridge class is that all inputs are TensorMap objects. This holds for the training data X and the target values y, but also for the regularization strength alpha and the sample_weights. With this, the Ridge class looks fairly simple and clean in my view.

For the construction of y we already have a helper function, introduced in #3. For alpha I will write a helper function in this PR.

TODO

  • Finish validation functions _validate_data and _validate_params
  • Add tests
  • Add example
  • Add function for creating the regularization TensorMap.

Acknowledgement

Thanks to @rosecers for the inspiration in #2. The problem with #2 is that we cannot use sklearn in equisolve: we have to use different alphas for values, gradients, etc.

Thanks also to @Luthaf for the linear_model in equistore-examples!

Comment on lines 6 to 7
# Released under the GNU Public Licence, v3 or any higher version
# SPDX-License-Identifier: GPL-3.0-or-later
Collaborator

What? The repo says MIT, and the rest of the ecosystem is BSD =)

Collaborator Author

Let me change this to BSD if this is our license

Collaborator

I don't think we need to include this boilerplate in every file, but I would be in favor of changing the licence of this repo to BSD. This should be done in a separate PR though!

@PicoCentauri PicoCentauri force-pushed the ridge branch 10 times, most recently from e7a2d03 to 0c2c25b Compare January 20, 2023 18:39
Comment on lines 12 to 13
def rmse(y_true: TensorMap, y_pred: TensorMap) -> float:
"""TODO: Needs to be a tensormap implementation."""
Contributor

Is this function ever used? Because right now it does not work if y_true and y_pred are TensorMaps.

Contributor

@DavideTisi DavideTisi Feb 3, 2023

For now, I think we can just:

  • check that the metadata are the same
  • return np.sqrt(np.mean((y_true.values - y_pred.values) ** 2))
  • do the same for the gradients

This works only if the metadata are in the same order, but it is simple.

Am I not getting something?
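The steps above can be sketched with plain NumPy arrays standing in for the TensorMap values (the function name here is hypothetical, not the PR's final implementation):

```python
import numpy as np

def rmse_values(y_true_values: np.ndarray, y_pred_values: np.ndarray) -> float:
    # Plain-array sketch of the proposed steps: in the real function the
    # inputs would be TensorMaps whose metadata is checked for equality,
    # and the same computation would be repeated for each gradient.
    if y_true_values.shape != y_pred_values.shape:
        raise ValueError("shape mismatch between y_true and y_pred values")
    return np.sqrt(np.mean((y_true_values - y_pred_values) ** 2))
```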

Collaborator Author

Yes, more or less. I will write the function now...

@PicoCentauri PicoCentauri force-pushed the ridge branch 3 times, most recently from c7e613e to 66e3052 Compare January 24, 2023 16:04
@agoscinski
Collaborator

agoscinski commented Jan 25, 2023

Just some mental notes on what the current Ridge class can cover in principle. This does not mean that we already support these cases with this PR, but we could extend support to them.

  • Learning with gradients: all gradient components are put into the samples dimension and then used for computing weights.
  • Learning environmental properties: should be possible.
  • Learning with multiple body orders: possible if one moves the body orders to the properties beforehand.
  • Learning with long and short range: possible if one moves the short- and long-range features to the properties beforehand. Joining the gradients is actually problematic, since the pair gradients have different shapes (different cutoffs), but that is part of preprocessing outside of Ridge and can be solved with the equistore join operation.
  • Learning Hamiltonians: for each key in the TensorMap a model is created; the $\mu$ components in each tensor block have to be moved to the samples before using the model, but it should work in principle.
  • Learning only per-central-species variances: I don't think we can represent this, but I am not super sure how this would work in the linear model case with structure properties. Need to think about this with @Luthaf. Wrote an issue: Add support in Ridge for a model with different weights per central-species #6

Comment on lines 132 to 139
samples = Labels(
    names=["structure"],
    values=np.array([(0,)]),
)
alpha = slice(X, samples=samples)
n_features = len(alpha.block().values[:])

alpha.block().values[:] = 1e-5
Collaborator Author

I am not happy at all about the usability of setting alpha here. It is not really clear what is happening. However, I see no easy way around it at the moment if we want to keep alpha as a TensorMap. We could also set the input type of alpha to a numpy array, but I like the idea that all inputs are TensorMaps.

We have a similar problem below for the sample_weights. We really have to work on helper functions for creating simple TensorMaps. One possible function would use the metadata of a reference TensorMap but fill in different numbers, i.e. here we need a TensorMap that has the same number of features as a reference TensorMap but only one sample.
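With plain arrays, the helper described above could look like the following sketch (the function name and signature are invented for illustration, not part of equisolve):

```python
import numpy as np

def constant_like_properties(reference_values: np.ndarray, fill: float = 1e-5) -> np.ndarray:
    # Hypothetical helper: build an array with the same number of features
    # as a reference block but only one sample, filled with a constant
    # value, e.g. a uniform regularization strength alpha.
    n_features = reference_values.shape[-1]
    return np.full((1, n_features), fill)
```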

examples/linear-model.py Outdated Show resolved Hide resolved
Comment on lines 120 to 122
a = X_mat.T @ X_mat + np.diag(alpha)
b = X_mat.T @ y_mat
w = solve(a, b, assume_a="pos", overwrite_a=True)
Collaborator Author

Pinging @kvhuguenin because he had an idea to improve the actual solver.
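One possible direction for a numerically stabler solver (a sketch under my own assumptions, not necessarily the improvement alluded to here) is to rewrite the ridge problem as an augmented least-squares system, which avoids forming the explicit X^T X product and its squared condition number:

```python
import numpy as np

def ridge_lstsq(X: np.ndarray, y: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    # Solve min_w ||X w - y||^2 + sum_i alpha_i w_i^2 by stacking
    # diag(sqrt(alpha)) below X and zeros below y, then using lstsq.
    n_features = X.shape[1]
    X_aug = np.vstack([X, np.diag(np.sqrt(alpha))])
    y_aug = np.concatenate([y, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return w
```

For well-conditioned problems this agrees with the normal-equations solve above, but it behaves better when X is ill-conditioned.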

src/equisolve/numpy/models/linear_model.py Show resolved Hide resolved
@PicoCentauri PicoCentauri marked this pull request as ready for review January 31, 2023 21:47
Contributor

@DavideTisi DavideTisi left a comment

I still have to check the tests

examples/README.rst Outdated Show resolved Hide resolved
src/equisolve/numpy/models/linear_model.py Outdated Show resolved Hide resolved
src/equisolve/numpy/models/linear_model.py Outdated Show resolved Hide resolved
src/equisolve/numpy/models/linear_model.py Outdated Show resolved Hide resolved
src/equisolve/numpy/models/linear_model.py Outdated Show resolved Hide resolved
src/equisolve/numpy/models/linear_model.py Outdated Show resolved Hide resolved
src/equisolve/numpy/models/linear_model.py Outdated Show resolved Hide resolved
@PicoCentauri PicoCentauri force-pushed the ridge branch 2 times, most recently from 48f0790 to 71d73a1 Compare February 7, 2023 14:46
PicoCentauri added a commit that referenced this pull request Feb 8, 2023
# This is the 1st commit message:

Add Ridge class for linear models.

# This is the commit message #2:

Reduce the number of combinations tested on CI (#5)

# This is the commit message #3:

add standard scaler with tests

# This is the commit message #4:

small fixes, added TODOs to linear model

# This is the commit message #5:

format example and other files

# This is the commit message #6:

add tensor map to pickable dictionary object

# This is the commit message #7:

format example and other files

# This is the commit message #8:

Added tests

# This is the commit message #9:

skeleton for ridge test

# This is the commit message #10:

Added a shape test for Ridge + more utils

# This is the commit message #11:

fix type to int

# This is the commit message #12:

Added numerically stable solver

# This is the commit message #13:

Add tests for ridge solver: vs exact results

# This is the commit message #14:

Add test: Infinite regularization + predict

# This is the commit message #15:

changing temporary TensorMap member variables to dicts to allow saving of model as torchscript

# This is the commit message #16:

Add test: consistent scaling of weights
@PicoCentauri PicoCentauri force-pushed the ridge branch 2 times, most recently from 8a1bc45 to e1c8c7c Compare February 8, 2023 16:20
@PicoCentauri
Collaborator Author

The RMSE is now in, and all dependencies have been updated. If there are no further objections, we can merge this from my side.

components=[],
properties=Labels(["property"], np.array([(0,)])),
)

if positions_gradients is not None:
if n_properties != len(positions_gradients):
if n_samples != len(positions_gradients):
Contributor

This function is in principle fine for me, just to build a TensorMap quickly, but it is not completely correct for derivatives with respect to positions.
Each sample in the values array can have derivatives with respect to different atoms, and each of these derivatives results in a different row in the gradient array.
In general, the number of samples in the block.values array (the first axis of the values array) is not the same as the number of samples in the block.gradient.data array (the first axis of the data array in the gradient).

Contributor

Maybe I do not understand what M_i is, or how the positions_gradients matrix is structured.

Collaborator Author

Maybe this is a bit unclear here indeed. The function takes a list of values (List[float]) and gradients as a list of numpy arrays (List[np.ndarray]). Therefore, the check we perform here is correct: we check that the number of provided list entries for positions_gradients and values is the same. This has nothing to do with how samples are stored in equistore.

I will adjust the code to make this clear.
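As a minimal sketch of the check being discussed (the function name here is hypothetical): `values` is a List[float] and `positions_gradients` a List[np.ndarray], one gradient array per value entry, so the two lists must have the same length:

```python
import numpy as np

def check_lengths(values, positions_gradients):
    # One gradient array is expected per value entry, so a length
    # mismatch between the two lists indicates inconsistent input.
    if positions_gradients is not None and len(values) != len(positions_gradients):
        raise ValueError(
            f"got {len(values)} values but {len(positions_gradients)} gradient entries"
        )
```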


X = TensorMap(Labels.single(), [X_block])

assert_equal(rmse(X, X, parameter_key="positions"), [0.0])
Contributor

Add a test to check what happens when you use a parameter_key not equal to sample, positions or cell (or, in general, equal to something not allowed), since this is not explicitly checked in the rmse function but left to the derivative parameter.

Collaborator Author

We have no explicit checks of this kind anywhere. The reason is that we might introduce new gradients in the future.


# Test prediction
X_pred = clf.predict(X)

Contributor

To check whether two TensorMaps are equal, including their metadata, you can use the operations.all_close(X_pred, y) function.

Now it is only one line of code plus a check:
- constructed coef_tensor, which is the TensorMap of the coefficients
- I left the old implementation commented out, in case you do not like the new one and want to compare
So far both coef_ and coef_tensor exist, so the checks are still on coef_; maybe we want to change that.
Contributor

@DavideTisi DavideTisi left a comment

I think predict is better done with the dot product between TensorMaps. I did a commit to do that; do you like the new version?
If so, maybe we want to keep coef_ only in its TensorMap form, and then all the tests must be changed accordingly.
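With plain arrays standing in for TensorMap blocks (a sketch, not the committed implementation), the predict-as-dot idea reduces per block to a matrix product of features and coefficients:

```python
import numpy as np

def predict_block(X_values: np.ndarray, coef_values: np.ndarray) -> np.ndarray:
    # Once the weights live in a TensorMap, prediction for each block is
    # just the dot of that block's values with the matching coefficients.
    return X_values @ coef_values
```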

4 participants