
Add sensitivity analysis #108

Merged: 15 commits merged into main on Dec 22, 2023

Conversation

@luigibonati (Owner) commented Dec 18, 2023

Description

Add a sensitivity analysis as done in the DeepLDA/DeepTICA papers, as requested in #105.

Todos

  • sensitivity_analysis in utils.explain
  • regtests
  • plot_sensitivity_analysis in utils.plot
  • add it to the DeepLDA example notebook
  • create a tutorial to explain it
  • automatically retrieve feature names from the dataset (Add feature_names attribute to DictDataset #106)

For now I created a new module inside utils called explain, although we might also move it outside utils and create an explain module that contains this together with other things, such as the sparse classification with Lasso (@pietronvll). A minimal usage sketch follows below.
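
For reference, a minimal sketch of the intended usage, based on the function names in the todo list above and the keyword arguments that appear later in this thread; model and dataset stand for an already trained CV model and a DictDataset, and the defaults shown here are illustrative rather than definitive:

from mlcolvar.utils.explain import sensitivity_analysis
from mlcolvar.utils.plot import plot_sensitivity

# compute a per-feature sensitivity score for a trained CV model on a dataset
results = sensitivity_analysis(model,
                               dataset,
                               metric="mean_abs_val",  # statistic used to aggregate the sensitivity per feature
                               per_class=False,        # whether to compute per-class statistics
                               plot_mode=None)         # skip the automatic plot, use plot_sensitivity below

# visualize the ranked sensitivities as violin plots, limited to the most relevant features
plot_sensitivity(results, mode="violin", max_features=50)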


codecov bot commented Dec 18, 2023

Codecov Report

Merging #108 (d0399d3) into main (4fc7301) will increase coverage by 0.36%.
Report is 2 commits behind head on main.
The diff coverage is 92.35%.

Additional details and impacted files: code scanning alerts
  • mlcolvar/tests/test_utils_explain.py: 1 alert dismissed
  • mlcolvar/utils/plot.py: 2 alerts fixed
  • mlcolvar/utils/explain.py: 3 alerts fixed
@luigibonati (Owner, Author) commented:

In order to automatically retrieve the names of the input descriptors, I added a feature_names attribute to DictDataset (both in the constructor and as a property) and changed the function create_dataset_from_files to set it automatically when creating a dataset from a dataframe. Let me know if you have comments.
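
For illustration, a short sketch of how the new attribute could be set, assuming the dictionary-based DictDataset constructor and that the constructor keyword matches the attribute name; the column names are made up:

import torch
from mlcolvar.data import DictDataset

X = torch.rand(100, 3)

# set the names explicitly in the constructor ...
dataset = DictDataset({"data": X}, feature_names=["d_1", "d_2", "d_3"])
print(dataset.feature_names)  # the names are now attached to the dataset

# ... or rely on create_dataset_from_files, which now fills feature_names
# automatically from the columns of the dataframe it reads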

Review annotation on mlcolvar/utils/plot.py:

return_ax = False

# define utils functions
def _set_violin_attributes(violin_parts, color, alpha=0.5, label=None, zorder=None):

Code scanning / CodeQL notice: Explicit returns mixed with implicit (fall-through) returns. Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
(A similar alert on mlcolvar/utils/explain.py was dismissed.)
@EnricoTrizio (Collaborator) commented:

I would prefer to have the plots horizontally rather than vertically, please fix this @luigibonati 🤗

@andrrizzi (Collaborator) left a comment:

Looks good to me!

@jinmei-1 commented:

Hello! Thank you for providing the sensitivity analysis code. We have tested it and found its performance to be excellent. However, we encountered an issue while simplifying our system's 4536 descriptors: the resulting image is too large. In particular, we received the following error message:
ValueError: Image size of 500x113400 pixels is too large. It must be less than 2^16 in each direction.
We would like to seek your guidance on how to modify the code to address this issue. Additionally, while reading your article "Deep learning the slow modes for rare events sampling," we became interested in the method you mention for reducing the descriptor set in the Chignolin folding case. In your research, you reduced the number of descriptors from 4278 to 210 by selecting the most relevant ones through sensitivity analysis of the primary CVs. We would like to learn more about the criteria you used to select these 210 descriptors. Why 210 instead of 220 or 200?
We appreciate your assistance and look forward to your guidance. Meanwhile, we are sharing our test code with you for a better understanding of our issues.
Thank you for your time and support. We eagerly await your response.
Attachment: 12-21-code and COVARL.zip

@luigibonati (Owner, Author) replied:


Hi, thanks for the feedback. I added a tutorial to the documentation showing how to customize the analysis. It should work now, let me know!

  • By default it now plots only the first 50 features, but this can be changed through the plot_sensitivity options.
  • I have also added an example of how to create a new dataset using only the first N features (see the sketch below). How to choose N depends on (1) the distribution of the feature sensitivities and (2) the cost of evaluating the resulting CV. I remember that for the DeepTICA paper we were getting qualitatively similar results using e.g. 100, 200, or 300 features.
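
As a concrete illustration, a minimal sketch of this kind of feature selection, assuming results comes from sensitivity_analysis, that its feature names are sorted in ascending order of sensitivity, and that the original descriptors live in a pandas dataframe called colvar (both variable names are illustrative):

import torch
from mlcolvar.data import DictDataset

n_features = 200  # chosen by inspecting the sensitivity distribution and the cost of the resulting CV

# the feature names are sorted by increasing sensitivity, so the most relevant ones are at the end
relevant_features = list(results["feature_names"][-n_features:])

# build a reduced dataset containing only the selected descriptors
X = torch.tensor(colvar[relevant_features].values, dtype=torch.float32)
reduced_dataset = DictDataset({"data": X})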

Review annotation on the diff (@@ -256,6 +256,160 @@):

    else:
        return None


def plot_sensitivity(results, mode="violin", per_class=None, max_features=100, ax=None):

Code scanning / CodeQL notice: Explicit returns mixed with implicit (fall-through) returns. Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
@luigibonati merged commit 6f7a6df into main on Dec 22, 2023 (12 checks passed)
@luigibonati (Owner, Author) commented:

This also closes #105 and #106.

@jinmei-1 commented:

Hello! Thank you for providing your tutorial and the new code. We've adapted and run DeepTICA (for our protein-ligand system) based on your DeepLDA tutorial. We successfully plotted the sensitivity, but encountered an issue when attempting to create a new dataset using only the first N features, as indicated below:
KeyError: "None of [Index(['2003', '2094', '2007', '2099', '2005', '2008', '2046', '2004', '2100', '2006'], dtype='object')] are in the [columns]"


Could you please guide us on how to modify it to suit the DeepTICA data?
Additionally, we have a minor question regarding this line of code: relevant_features = results['feature_names'][-n_features:]. Why are the last n_features elements selected from the end of the feature_names list, rather than the first n_features in the order of the sensitivity ranking?
We've attached our test code for your review to provide better insight into our challenges. Your time and support are greatly appreciated. We look forward to your response.
12-27-code and COVARL.zip

@luigibonati (Owner, Author) replied:

The features are sorted in ascending order; this is why we need to slice the last N features instead of the first ones.

Regarding the error in creating the new dataset: in the case of a time-lagged dataset (unlike those created with create_dataset_from_files) there is no information about the feature names, so they have to be specified by hand. Indeed, in the sensitivity plot the features are displayed as numbers rather than with the correct names.

Let me know if this works:


feature_names = colvar.filter(regex='d_').columns   # take the descriptor columns (here those matching 'd_') from the COLVAR dataframe

results = sensitivity_analysis(model,               
                               dataset, 
                               metric="mean_abs_val",       # metric to use to compute the sensitivity per feature (e.g. mean absolute value or root mean square)
                               feature_names=feature_names, # by default, they will be taken from `dataset.feature_names` 
                               per_class=False,             # whether to do per-class statistics
                               plot_mode=None)              # plot mode (see below) 

I will add a warning when the feature names cannot be retrieved from the dataset.

Luigi

@jinmei-1 commented:

Thank you for providing the new code; it works and I get the new Deep-TICA (.tpc) model.
I have successfully set up GROMACS_2022.5-plumed_2.9.0 on our server. Additionally, I have configured LibTorch and the PyTorch module, with both "plumed config has libtorch" and "plumed config module pytorch" returning on.
However, when attempting enhanced sampling simulations, I encountered an error: GROMACS displayed "free(): invalid pointer". This issue persisted when I tried to run the Deep-TICA example for alanine, as outlined in your paper "Deep learning the slow modes for rare events sampling" (Luigi Bonati, GiovanniMaria Piccini, and Michele Parrinello, Proceedings of the National Academy of Sciences, 118(44), 2021).

I find this error perplexing, and I was hoping you could assist me in troubleshooting or provide guidance on resolving this matter. I have attached the relevant files (plumed.dat, .tpr, .tpc, descriptors.dat, and a log) for your reference.
Thank you very much for your time and consideration. I appreciate any insights or assistance you can provide.

Best regards,
jinmei
ref.zip
