Solvation Shell #196

ALescoulie · 2021-09-13T01:33:05Z

My PR for issue #195

codecov · 2021-09-13T01:37:01Z

Codecov Report

Merging #196 (e7e7696) into develop (9810b74) will increase coverage by 0.40%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #196      +/-   ##
===========================================
+ Coverage    78.53%   78.93%   +0.40%     
===========================================
  Files           11       12       +1     
  Lines         1677     1709      +32     
  Branches       250      254       +4     
===========================================
+ Hits          1317     1349      +32     
  Misses         276      276              
  Partials        84       84

Impacted Files	Coverage Δ
mdpow/analysis/solvation.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

orbeckst

Please see comments and also update CHANGES.

orbeckst · 2021-09-15T22:35:11Z

mdpow/analysis/solvation.py

+    *ensemble*
+        The :class:`~mdpow.analysis.ensemble.Ensemble` used in the analysis.


Needs to be updated for the actual call signature

orbeckst · 2021-09-15T22:35:30Z

mdpow/analysis/solvation.py

+
+    Typical Workflow::
+
+        ens = Ensemble(dirname='Mol')
+
+        solv = SolvationAnalysis(ens, [1.2, 2.4]).run(start=0, stop=10, step=1)


update; say in words what the example accomplishes so that one can make the connection between intent/science and code

orbeckst · 2021-09-15T22:37:48Z

mdpow/analysis/solvation.py

+        for n in self._solu[keys[0]].names:
+            self._sel += f' {n}'
+        self._col = ['distance', 'solvent', 'interaction',
+                     'lambda', 'time', 'quantity']


Is the last column always called "quantity"?

If the setup is always the same, put it into the base class.

If it's customized, use a more descriptive column name because it then makes for clearer seaborn plots and generally it's better to have self-documenting data. E.g., here it could be N_solvent or number or count or something like that.

orbeckst · 2021-09-15T22:43:19Z

mdpow/analysis/solvation.py

+        for n in self._solu[keys[0]].names:
+            self._sel += f' {n}'


You'd add a comment here, explaining that you're building a selection string to get solvent only... and you'd need to document because this can bite users:

I don't like it because you have no idea if some of the names are not also part of the solute. This is not robust.

Instead you should be directly using the user's groups (see below). I think you can nuke this part of the code.

The only way to make the selection string robust is to extract the atom indices and select those, i.e., build a selection string index 121 122 123 .... 10456 or index 121 - 10456.
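For illustration only, the index-based selection string described above could be assembled roughly like this. The helper name `index_selection` and the sample indices are hypothetical; in the PR the indices would presumably come from the user's solvent AtomGroup (for example its `indices` attribute).

```python
def index_selection(indices):
    """Return a selection string such as 'index 121 122 123'."""
    # Deduplicate and sort so the string is stable and easy to read.
    unique_sorted = sorted(set(int(i) for i in indices))
    if not unique_sorted:
        raise ValueError("need at least one atom index")
    return "index " + " ".join(str(i) for i in unique_sorted)


# Hypothetical atom indices, not taken from the PR.
selection = index_selection([123, 121, 122])
print(selection)
```

Because the string names atoms by index rather than by atom name, it cannot accidentally match solute atoms that happen to share a name with solvent atoms.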

But there's a better way...

orbeckst · 2021-09-15T22:48:43Z

mdpow/analysis/solvation.py

+        self._solu = solute
+        self._solv = solvent


please write out solv to solvent and solu to solute ... bits aren't THAT expensive anymore, your IDE autocompletes anyway, and readability counts.

... especially as they currently only differ by the last letter — that's just asking for typos

orbeckst · 2021-09-15T22:54:09Z

mdpow/analysis/solvation.py

+
+    def _single_frame(self):
+        for d in self._dists:
+            solvs = len(self._temp_sys.select_atoms(f'around {d} ({self._sel})').residues)


Replace the selection with selection groups, along the lines of

    solute = self._solu[self._key]    # you can also set these up in _single_universe()
    solvent = self._solv[self._key]   # but I defined them here for readability
    solventshell = solvent.select_atoms(f'around {d} group solute', solute=solute)
    counts = len(solventshell.residues)  # make it a separate line, more readable, and no performance penalty

Isn't this nicer?

(Check that it works in a simple sample Universe... I don't think it needs the global selection keyword.)

Re-reading the docs, you almost certainly need global:

    solventshell = solvent.select_atoms(f'around {d} global group solute', solute=solute)

Really test the selection and check that solventshell does not contain any solute atoms. I am not 100% sure how the interaction with global will play out. Perhaps you'll need

    solventshell = solvent.select_atoms(f'group solvent and around {d} global group solute', solvent=solvent, solute=solute)

Just fyi, the way to quickly process all _dists is to use MDAnalysis.lib.distances.capped_distance.

    # capped distance array out to max(_dists):
    pairs, distances = capped_distance(solute.positions, solvent.positions, max(_dists),
                                       return_distances=True, box=ts.dimensions)
    solute_i, solvent_j = np.transpose(pairs)  # make the indices a np array

    # for each cutoff _dists[i], find solvent inside cutoff:
    close_solvent_atoms = solvent[solvent_j[distances < _dists[i]]]

    # Get the residue count
    n[i] = len(close_solvent_atoms.residues)

Should be equal in speed to just a single distance, but then almost the same speed for N distances.

Nevertheless, there's something to be said for first implementing it with the around selection as this is very clear and allows one to grasp the overall structure.

p.s.: The code above would need to be checked; I just wrote it from memory.

The last method with capped_distances appears to work.

It produces results that make sense with the number of solvents in the gro file
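To make the idea behind the multi-cutoff approach concrete, here is a minimal plain-Python sketch. It is not the PR's implementation: it uses a brute-force pair loop instead of MDAnalysis.lib.distances.capped_distance, ignores periodic boundaries, and the function name, coordinates, and residue ids are all made up for illustration.

```python
import math


def count_residues_within(solute_xyz, solvent_xyz, solvent_resids, cutoffs):
    """Count distinct solvent residues with an atom inside each cutoff."""
    max_cutoff = max(cutoffs)
    # One pass over all pairs, keeping only those inside the largest cutoff.
    # This is the role capped_distance plays in the comment above.
    close_pairs = []
    for p in solute_xyz:
        for j, q in enumerate(solvent_xyz):
            d = math.dist(p, q)
            if d <= max_cutoff:
                close_pairs.append((j, d))
    # Every individual cutoff is then a cheap filter over the stored pairs.
    counts = {}
    for cutoff in cutoffs:
        resids = {solvent_resids[j] for j, d in close_pairs if d < cutoff}
        counts[cutoff] = len(resids)
    return counts


# Hypothetical toy system: one solute atom, four solvent atoms in three residues.
solute_xyz = [(0.0, 0.0, 0.0)]
solvent_xyz = [(1.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
solvent_resids = [1, 1, 2, 3]

counts = count_residues_within(solute_xyz, solvent_xyz, solvent_resids, [2, 10])
print(counts)
```

The expensive step (finding pairs) runs once, so adding more cutoffs only adds cheap filtering, which matches the reasoning about speed for N distances.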

orbeckst · 2021-09-15T23:02:00Z

mdpow/tests/test_solv_shell.py

+MANIFEST = RESOURCES.join("manifest.yml")
+
+
+class TestDihedral(object):


copy and paste fail ;-)

fixed, can't believe I missed that.

orbeckst · 2021-09-15T23:04:36Z

mdpow/tests/test_solv_shell.py

+import pytest
+
+from numpy.testing import assert_almost_equal
+from scipy.stats import variation


My view is that this is too elaborate; keep it simple and just use numpy. I had to look up what scipy.stats.variation does, which was a sign that more is going on than necessary.

orbeckst · 2021-09-15T23:07:33Z

mdpow/tests/test_solv_shell.py

+    def test_selection(self):
+        solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1)
+        mean = np.mean(solv.results['quantity'])
+        var = variation(solv.results['quantity'])


just use np.std; variation does not give any new information because you already tested the mean, and you can avoid

- using something that I had to first look up, which really annoyed me more than it should have: First, "Is variation() a function that you defined?" No. "Where does it come from?" It's imported from scipy.stats (sidenote: don't do direct imports, keep it in its namespace, especially if only used once... stats.variation or even clearer scipy.stats.variation is roughly 1000 times more obvious). Finally, "What does it do?" std/mean. "Is it actually important for the test?" NO!
- importing an additional package
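As a small illustration of why the coefficient of variation adds nothing once mean and standard deviation are checked, here is a sketch using only the standard library. The per-frame counts are hypothetical, and `statistics.pstdev` is used as a stand-in for np.std with its default settings (population standard deviation).

```python
import statistics

# Hypothetical per-frame solvent counts, not taken from the PR.
frame_counts = [10, 12, 11, 13, 14]

mean = statistics.fmean(frame_counts)
std = statistics.pstdev(frame_counts)  # population std dev (divides by N)

# The coefficient of variation is just the ratio of the two numbers above,
# so a test that already checks mean and std gains nothing from it.
coefficient_of_variation = std / mean

print(mean, std, coefficient_of_variation)
```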

orbeckst

please see comments
add entry to CHANGES

orbeckst · 2021-09-16T21:49:44Z

mdpow/analysis/solvation.py

+    The data is returned in a :class:`pandas.DataFrame` with observations sorted by
+    distance, solvent, interaction, lambda, time.
+
+    .. ruberic:: Example


orbeckst · 2021-09-16T21:50:06Z

mdpow/analysis/solvation.py

+    """Measures the number of solvent molecules withing the given distances
+    in an :class:`~mdpow.analysis.ensemble.Ensemble` .
+
+    :keyword:


Parameters

(keywords are optional)

orbeckst · 2021-09-16T21:51:10Z

mdpow/analysis/solvation.py

+
+        ens = Ensemble(dirname='Mol')
+        solvent = ens.select_atoms('resname SOL and name OW')
+        solute = ens.select_atoms('not resname SOL')


Make it more specific, using the resname of the solute. Otherwise people might copy and paste and get wrong answers.

orbeckst · 2021-09-16T21:51:47Z

mdpow/analysis/solvation.py

+        solvent = ens.select_atoms('resname SOL and name OW')
+        solute = ens.select_atoms('not resname SOL')
+
+        solv_dist = SolvationAnalysis(solute, solvent, [1.2, 2.4]).run(start=0, stop=10, step=1)


remove start and step from the example as they are just using the defaults

mdpow/analysis/solvation.py

orbeckst · 2021-09-16T22:00:45Z

mdpow/tests/test_solv_shell.py

+import pytest
+
+from numpy.testing import assert_almost_equal
+from scipy.stats import variation


mdpow/tests/test_solv_shell.py

orbeckst · 2021-09-16T22:07:50Z

mdpow/tests/test_solv_shell.py

+
+    def test_selection(self):
+        solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1)
+        mean = np.mean(solv.results['N_solvent'])


Can you do the asserts for the two distances separately? The one for 2 should be very small, the one for 10 large. Having actual numbers can help with validation because then we can also apply our knowledge about the system.

orbeckst · 2021-09-16T22:16:44Z

mdpow/tests/test_solv_shell.py

+        for i in solv.results['interaction'][:12]:
+            assert i == 'Coulomb'
+
+    def test_selection(self):


Make this a parametrized test that checks for each distance separately. We also create a fixture for running the analysis. We make it class-scoped so that it only runs once.

    @pytest.fixture(scope="class")
    def solvation_analysis_list_results(self):
        return SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1).results

    @pytest.mark.parametrize("d,ref_mean,ref_std", [(2, ..., ...), (10, ..., ...)])
    def test_selection(self, solvation_analysis_list_results, d, ref_mean, ref_std):
        results = solvation_analysis_list_results  # the fixture provides the results, aliased to a shorter variable for convenience
        mean = ...  # calculate the mean for distance d from solv.re
        std = ...   # calculate std dev for distance d
        assert mean == pytest.approx(ref_mean)  # or use assert_almost_equal
        assert std == ...

You can either use pytest.approx or the numpy assertion; I normally use the latter ones for arrays.

thanks this is a way more robust method.
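The per-distance checks can be sketched without pytest or MDPOW as follows. All numbers are invented for illustration, the rows stand in for the `distance` and `N_solvent` columns of the results DataFrame, and the helper names are hypothetical.

```python
import statistics

# Hypothetical (distance, N_solvent) rows; real values would come from the
# results of SolvationAnalysis in the PR.
rows = [(2, 1), (2, 3), (10, 40), (10, 44)]

# Hypothetical reference (mean, std) per distance, not taken from the PR.
references = {2: (2.0, 1.0), 10: (42.0, 2.0)}


def stats_for_distance(rows, distance):
    """Return (mean, population std dev) of the counts at one distance."""
    values = [n for d, n in rows if d == distance]
    return statistics.fmean(values), statistics.pstdev(values)


def check_distance(rows, distance, ref_mean, ref_std, tol=1e-6):
    """Check one distance on its own so a failure names the distance."""
    mean, std = stats_for_distance(rows, distance)
    assert abs(mean - ref_mean) < tol, f"mean mismatch at distance {distance}"
    assert abs(std - ref_std) < tol, f"std mismatch at distance {distance}"


for distance, (ref_mean, ref_std) in references.items():
    check_distance(rows, distance, ref_mean, ref_std)
```

In a pytest suite each `(distance, ref_mean, ref_std)` tuple would become one parametrized case, so a wrong count at the small cutoff is reported separately from one at the large cutoff.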

orbeckst · 2021-09-16T22:22:58Z

mdpow/tests/test_solv_shell.py

+
+
+class TestSolvShell(object):
+    mean = 2654.0


remove and replace with a parametrized test (see below)

ALescoulie · 2021-09-20T15:28:38Z

Sorry it took a while to get through the requested changes, but I think I hit all of them.

orbeckst

almost there, just a few minor issues

orbeckst · 2021-09-20T21:59:22Z

mdpow/analysis/solvation.py

+    def _single_frame(self):
+        solute = self._solute[self._key]
+        solvent = self._solvent[self._key]
+        pairs, distaces = capped_distance(solute.positions, solvent.positions,


fix spelling of "distaces"

mdpow/tests/test_solv_shell.py

CHANGES

orbeckst

all good, will just add one update to CHANGES and then merge

orbeckst · 2021-09-21T01:37:49Z

@ALescoulie if I don't get to squash-merging in the next five minutes while waiting for CI, please go ahead and squash and merge yourself (just edit the commit message into something nicely formatted and readable and remove the "Co-authored by Oliver Beckstein" line — for my little amendment I don't want co-authorship).

commit solvation.py

41d9f36

ALescoulie added the analysis label Sep 13, 2021

ALescoulie added this to the 0.8.0 milestone Sep 13, 2021

ALescoulie self-assigned this Sep 13, 2021

Merge branch 'develop' into solv_shell

0eb48ff

ALescoulie linked an issue Sep 13, 2021 that may be closed by this pull request

Solvation Shell Tool #195

Closed

ALescoulie and others added 5 commits September 12, 2021 18:52

add docs

2f7f12d

Merge remote-tracking branch 'origin/solv_shell' into solv_shell

d669eff

Merge branch 'develop' into solv_shell

54f865c

fix selection

5feeb9e

fix selection, add test

9ee27cb

ALescoulie requested a review from orbeckst September 15, 2021 01:39

Merge branch 'develop' into solv_shell

c4cff76

orbeckst requested changes Sep 15, 2021

ALescoulie added 2 commits September 16, 2021 13:46

overhaul calculation, update docs

25d125f

Merge remote-tracking branch 'origin/solv_shell' into solv_shell

89a5da2

ALescoulie requested a review from orbeckst September 16, 2021 20:48

orbeckst requested changes Sep 16, 2021

fixes

b62ca3f

ALescoulie requested a review from orbeckst September 20, 2021 15:28

orbeckst requested changes Sep 20, 2021

ALescoulie added 2 commits September 20, 2021 15:23

update CHANGES

437d707

fix varriable name

64a1455

orbeckst reviewed Sep 21, 2021

orbeckst approved these changes Sep 21, 2021

Update CHANGES

e7e7696

ALescoulie merged commit 2088ae4 into Becksteinlab:develop Sep 21, 2021
