Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solvation Shell #196

Merged
merged 14 commits into from
Sep 21, 2021
Merged

Solvation Shell #196

merged 14 commits into from
Sep 21, 2021

Conversation

ALescoulie
Copy link
Contributor

My PR for issue #195

@ALescoulie ALescoulie added this to the 0.8.0 milestone Sep 13, 2021
@ALescoulie ALescoulie self-assigned this Sep 13, 2021
@ALescoulie ALescoulie linked an issue Sep 13, 2021 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Sep 13, 2021

Codecov Report

Merging #196 (e7e7696) into develop (9810b74) will increase coverage by 0.40%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #196      +/-   ##
===========================================
+ Coverage    78.53%   78.93%   +0.40%     
===========================================
  Files           11       12       +1     
  Lines         1677     1709      +32     
  Branches       250      254       +4     
===========================================
+ Hits          1317     1349      +32     
  Misses         276      276              
  Partials        84       84              
Impacted Files Coverage Δ
mdpow/analysis/solvation.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9810b74...e7e7696. Read the comment docs.

@ALescoulie ALescoulie requested a review from orbeckst September 15, 2021 01:39
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments and also update CHANGES.

Comment on lines 23 to 24
*ensemble*
The :class:`~mdpow.analysis.ensemble.Ensemble` used in the analysis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be updated for the actual call signature

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 33 to 38

Typical Workflow::

ens = Ensemble(dirname='Mol')

solv = SolvationAnalysis(ens, [1.2, 2.4]).run(start=0, stop=10, step=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update; say in words what the example accomplishes so that one can make the connection between intent/science and code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

for n in self._solu[keys[0]].names:
self._sel += f' {n}'
self._col = ['distance', 'solvent', 'interaction',
'lambda', 'time', 'quantity']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the last column always called "quantity"?

  • If the setup is always the same, put it into the base class.
  • If it's customized, use a more descriptive column name because it then makes for clearer seaborn plots and generally it's better to have self-documenting data. E.g., here it it could be N_solvent or number or count or something like that.

Comment on lines 51 to 52
for n in self._solu[keys[0]].names:
self._sel += f' {n}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd add a comment here, explaining that you're building a selection string to get solvent only... and you'd need to document because this can bite users:

I don't like it because you have no idea if some of the names are not also part of the solute. This is not robust.

Instead you should be directly using the user's groups (see below). I think you can nuke this part of the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way to make the selection string robust is to extract the atom indices and select those, i.e., build a selection string index 121 122 123 .... 10456 or index 121 - 10456.

But there's a better way...

Comment on lines 44 to 45
self._solu = solute
self._solv = solvent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please write out solv to solvent and solu to solute ... bits aren't THAT expensive anymore, your IDE autocompletes anyway, and readability counts.

... especially as they currently only differ by the last letter — that's just asking for typos

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


def _single_frame(self):
for d in self._dists:
solvs = len(self._temp_sys.select_atoms(f'around {d} ({self._sel})').residues)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace the selection with using selection groups, along the lines of

solute = self._solu[self._key]    # you can also set these up in _single_universe()
solvent = self._solv[self._key]   # but I defined them here for readability
solventshell = solvent.select_atoms(f'around {d} solute', solute=solute)
counts = len(solventshell.residues)  # make it a separate line, more readable, and no performance penalty

Isn't this nicer?

(Check that it works in a simple sample Universe... I don't think it needs the global selection keyword.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reading the docs, you almost certainly need global:

solventshell = solvent.select_atoms(f'around {d} global solute', solute=solute)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really test the selection and check that solventshell does not contain any solute atoms. I am not 100% sure how the interaction with global will play out. Perhaps you'll need

solventshell = solvent.select_atoms(f'solvent and around {d} global solute', solvent=solvent, solute=solute)

Copy link
Member

@orbeckst orbeckst Sep 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fyi, the way to quickly process all _dists is to use MDAnalysis.lib.distances.capped_distance.

  1. capped distance array out to max(_dists):
    pairs, distances = capped_distance(solute.positions, solvent.positions, max(_dists), return_distances=True, box=ts.dimensions)
    solute_i, solvent_j = np.transpose(pairs)    # make the indices a np array
  2. for each cutoff _dists[i], find solvent inside cutoff :
    close_solvent_atoms = solvent[solvent_j[distances < _dists[i]]]
  3. Get the residue count
    n[i] = len(close_solvent_atoms.residues)

Should be equal in speed to just a single distance, but then almost the same speed for N distances.

Nevertheless, there's something to be said for first implementing it with the around selection as this is very clear and allows one to grasp the overall structure.

p.s.: The code above would need to be checked; I just wrote it from memory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last method with capped_distances appears to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It produces results that make sense with the number of solvents in the gro file

MANIFEST = RESOURCES.join("manifest.yml")


class TestDihedral(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy and paste fail ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, can't believe I missed that.

import pytest

from numpy.testing import assert_almost_equal
from scipy.stats import variation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view is that this is too elaborate, keep it simple and just use numpy. I had to look up what scipy.stats.variation does, which was a sign that too much is going on than necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

def test_selection(self):
solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1)
mean = np.mean(solv.results['quantity'])
var = variation(solv.results['quantity'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use np.std; variation does not give any new information because you already tested mean and you can avoid

  • using something that I had to first look up, which really annoyed me more than it should have: First "Is variation() a function that you defined?" — No. "Where does it come from?" It's imported from scipy.stats (sidenote: don't do direct imports, keep it in it's namespace, especially if only used once... stats.variance or even clearer scipy.stats.variance is roughly 1000 times more obvious), finally "What does it do?" mean/std. "Is it actually important for the test?" NO!
  • importing an additional package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ALescoulie ALescoulie requested a review from orbeckst September 16, 2021 20:48
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please see comments
  • add entry to CHANGES

The data is returned in a :class:`pandas.DataFrame` with observations sorted by
distance, solvent, interaction, lambda, time.

.. ruberic:: Example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rubric

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"""Measures the number of solvent molecules withing the given distances
in an :class:`~mdpow.analysis.ensemble.Ensemble` .

:keyword:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters

(keywords are optional)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


ens = Ensemble(dirname='Mol')
solvent = ens.select_atoms('resname SOL and name OW')
solute = ens.select_atoms('not resname SOL')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it more specific, using the resname of the solute. Otherwise people might copy and paste and get wrong answers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

solvent = ens.select_atoms('resname SOL and name OW')
solute = ens.select_atoms('not resname SOL')

solv_dist = SolvationAnalysis(solute, solvent, [1.2, 2.4]).run(start=0, stop=10, step=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove start and step from the example as they are just using the defaults

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

mdpow/analysis/solvation.py Show resolved Hide resolved
import pytest

from numpy.testing import assert_almost_equal
from scipy.stats import variation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

mdpow/tests/test_solv_shell.py Show resolved Hide resolved

def test_selection(self):
solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1)
mean = np.mean(solv.results['N_solvent'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the asserts for the two distances separately? The one for 2 should be very small, the one for 10 large. Having actual numbers can help with validation because then we can also apply our knowledge about the system.

for i in solv.results['interaction'][:12]:
assert i == 'Coulomb'

def test_selection(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this a parametrized test that checks for each distance separately. We also create a fixture for running the analysis. We make it class-scoped so that it only runs once.

@pytest.fixture(scope="class")
def solvation_analysis_list_results(self):
    return SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1).results

@pytest.mark.parametrize("d,ref_mean,ref_std", [(2, ..., ...), (10, ..., ...)])
def test_selection(self, solvation_analysis_list_results, d, ref_mean, ref_std):
     results = solvation_analysis_list_result  # the fixture provides the results, aliased to a shorter variable for convenience
     mean = ... # calculate the mean for distance d from solv.re
     std = ... # calculate std dev for distance d
     assert mean = pytest.approx(ref_mean)   # or use assert_almost_equal
     assert  std = ...

You can either use pytest.approx or the numpy assertion; I normally use the latter ones for arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks this is a way more robust method.



class TestSolvShell(object):
mean = 2654.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove and replace with a parametrized test (see below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ALescoulie ALescoulie requested a review from orbeckst September 20, 2021 15:28
@ALescoulie
Copy link
Contributor Author

Sorry it took a while to get through the requested changes, but I think I hit all of them.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there, just few minor issues

def _single_frame(self):
solute = self._solute[self._key]
solvent = self._solvent[self._key]
pairs, distaces = capped_distance(solute.positions, solvent.positions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix spelling of "distaces"

mdpow/tests/test_solv_shell.py Show resolved Hide resolved
mdpow/tests/test_solv_shell.py Outdated Show resolved Hide resolved
CHANGES Outdated Show resolved Hide resolved
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all good, will just add one update changes and then merge

@orbeckst
Copy link
Member

@ALescoulie if I don't get to squash-merging in the next five minutes while waiting for CI, please go ahead and squash and merge yourself (just edit the commit message into something nicely formatted and readable and remove the "Co-authored by Oliver Beckstein" line — for my little amendment I don't want co-authorship).

@ALescoulie ALescoulie merged commit 2088ae4 into Becksteinlab:develop Sep 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Solvation Shell Tool
2 participants