-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solvation Shell #196
Solvation Shell #196
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #196 +/- ##
===========================================
+ Coverage 78.53% 78.93% +0.40%
===========================================
Files 11 12 +1
Lines 1677 1709 +32
Branches 250 254 +4
===========================================
+ Hits 1317 1349 +32
Misses 276 276
Partials 84 84
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comments and also update CHANGES.
mdpow/analysis/solvation.py
Outdated
*ensemble* | ||
The :class:`~mdpow.analysis.ensemble.Ensemble` used in the analysis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated for the actual call signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/solvation.py
Outdated
|
||
Typical Workflow:: | ||
|
||
ens = Ensemble(dirname='Mol') | ||
|
||
solv = SolvationAnalysis(ens, [1.2, 2.4]).run(start=0, stop=10, step=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update; say in words what the example accomplishes so that one can make the connection between intent/science and code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/solvation.py
Outdated
for n in self._solu[keys[0]].names: | ||
self._sel += f' {n}' | ||
self._col = ['distance', 'solvent', 'interaction', | ||
'lambda', 'time', 'quantity'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the last column always called "quantity"?
- If the setup is always the same, put it into the base class.
- If it's customized, use a more descriptive column name because it then makes for clearer seaborn plots and generally it's better to have self-documenting data. E.g., here it it could be N_solvent or number or count or something like that.
mdpow/analysis/solvation.py
Outdated
for n in self._solu[keys[0]].names: | ||
self._sel += f' {n}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd add a comment here, explaining that you're building a selection string to get solvent only... and you'd need to document because this can bite users:
I don't like it because you have no idea if some of the names are not also part of the solute. This is not robust.
Instead you should be directly using the user's groups (see below). I think you can nuke this part of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way to make the selection string robust is to extract the atom indices and select those, i.e., build a selection string index 121 122 123 .... 10456
or index 121 - 10456
.
But there's a better way...
mdpow/analysis/solvation.py
Outdated
self._solu = solute | ||
self._solv = solvent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please write out solv to solvent and solu to solute ... bits aren't THAT expensive anymore, your IDE autocompletes anyway, and readability counts.
... especially as they currently only differ by the last letter — that's just asking for typos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/solvation.py
Outdated
|
||
def _single_frame(self): | ||
for d in self._dists: | ||
solvs = len(self._temp_sys.select_atoms(f'around {d} ({self._sel})').residues) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace the selection with using selection groups, along the lines of
solute = self._solu[self._key] # you can also set these up in _single_universe()
solvent = self._solv[self._key] # but I defined them here for readability
solventshell = solvent.select_atoms(f'around {d} solute', solute=solute)
counts = len(solventshell.residues) # make it a separate line, more readable, and no performance penalty
Isn't this nicer?
(Check that it works in a simple sample Universe
... I don't think it needs the global selection keyword.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reading the docs, you almost certainly need global
:
solventshell = solvent.select_atoms(f'around {d} global solute', solute=solute)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really test the selection and check that solventshell
does not contain any solute
atoms. I am not 100% sure how the interaction with global
will play out. Perhaps you'll need
solventshell = solvent.select_atoms(f'solvent and around {d} global solute', solvent=solvent, solute=solute)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just fyi, the way to quickly process all _dists
is to use MDAnalysis.lib.distances.capped_distance.
- capped distance array out to
max(_dists)
:pairs, distances = capped_distance(solute.positions, solvent.positions, max(_dists), return_distances=True, box=ts.dimensions) solute_i, solvent_j = np.transpose(pairs) # make the indices a np array
- for each cutoff
_dists[i]
, find solvent inside cutoff :close_solvent_atoms = solvent[solvent_j[distances < _dists[i]]]
- Get the residue count
n[i] = len(close_solvent_atoms.residues)
Should be equal in speed to just a single distance, but then almost the same speed for N distances.
Nevertheless, there's something to be said for first implementing it with the around
selection as this is very clear and allows one to grasp the overall structure.
p.s.: The code above would need to be checked; I just wrote it from memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last method with capped_distances
appears to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It produces results that make sense with the number of solvents in the gro file
mdpow/tests/test_solv_shell.py
Outdated
MANIFEST = RESOURCES.join("manifest.yml") | ||
|
||
|
||
class TestDihedral(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy and paste fail ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed, can't believe I missed that.
mdpow/tests/test_solv_shell.py
Outdated
import pytest | ||
|
||
from numpy.testing import assert_almost_equal | ||
from scipy.stats import variation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My view is that this is too elaborate, keep it simple and just use numpy. I had to look up what scipy.stats.variation
does, which was a sign that too much is going on than necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
mdpow/tests/test_solv_shell.py
Outdated
def test_selection(self): | ||
solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1) | ||
mean = np.mean(solv.results['quantity']) | ||
var = variation(solv.results['quantity']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use np.std
; variation
does not give any new information because you already tested mean and you can avoid
- using something that I had to first look up, which really annoyed me more than it should have: First "Is
variation()
a function that you defined?" — No. "Where does it come from?" It's imported fromscipy.stats
(sidenote: don't do direct imports, keep it in it's namespace, especially if only used once...stats.variance
or even clearerscipy.stats.variance
is roughly 1000 times more obvious), finally "What does it do?" mean/std. "Is it actually important for the test?" NO! - importing an additional package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- please see comments
- add entry to CHANGES
mdpow/analysis/solvation.py
Outdated
The data is returned in a :class:`pandas.DataFrame` with observations sorted by | ||
distance, solvent, interaction, lambda, time. | ||
|
||
.. ruberic:: Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rubric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/solvation.py
Outdated
"""Measures the number of solvent molecules withing the given distances | ||
in an :class:`~mdpow.analysis.ensemble.Ensemble` . | ||
|
||
:keyword: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters
(keywords are optional)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/solvation.py
Outdated
|
||
ens = Ensemble(dirname='Mol') | ||
solvent = ens.select_atoms('resname SOL and name OW') | ||
solute = ens.select_atoms('not resname SOL') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it more specific, using the resname of the solute. Otherwise people might copy and paste and get wrong answers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/solvation.py
Outdated
solvent = ens.select_atoms('resname SOL and name OW') | ||
solute = ens.select_atoms('not resname SOL') | ||
|
||
solv_dist = SolvationAnalysis(solute, solvent, [1.2, 2.4]).run(start=0, stop=10, step=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove start and step from the example as they are just using the defaults
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/tests/test_solv_shell.py
Outdated
import pytest | ||
|
||
from numpy.testing import assert_almost_equal | ||
from scipy.stats import variation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
mdpow/tests/test_solv_shell.py
Outdated
|
||
def test_selection(self): | ||
solv = SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1) | ||
mean = np.mean(solv.results['N_solvent']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do the asserts for the two distances separately? The one for 2 should be very small, the one for 10 large. Having actual numbers can help with validation because then we can also apply our knowledge about the system.
mdpow/tests/test_solv_shell.py
Outdated
for i in solv.results['interaction'][:12]: | ||
assert i == 'Coulomb' | ||
|
||
def test_selection(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a parametrized test that checks for each distance separately. We also create a fixture for running the analysis. We make it class-scoped so that it only runs once.
@pytest.fixture(scope="class")
def solvation_analysis_list_results(self):
return SolvationAnalysis(self.solute, self.solvent, [2, 10]).run(start=0, stop=4, step=1).results
@pytest.mark.parametrize("d,ref_mean,ref_std", [(2, ..., ...), (10, ..., ...)])
def test_selection(self, solvation_analysis_list_results, d, ref_mean, ref_std):
results = solvation_analysis_list_result # the fixture provides the results, aliased to a shorter variable for convenience
mean = ... # calculate the mean for distance d from solv.re
std = ... # calculate std dev for distance d
assert mean = pytest.approx(ref_mean) # or use assert_almost_equal
assert std = ...
You can either use pytest.approx or the numpy assertion; I normally use the latter ones for arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks this is a way more robust method.
mdpow/tests/test_solv_shell.py
Outdated
|
||
|
||
class TestSolvShell(object): | ||
mean = 2654.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove and replace with a parametrized test (see below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Sorry it took a while to get through the requested changes, but I think I hit all of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
almost there, just few minor issues
mdpow/analysis/solvation.py
Outdated
def _single_frame(self): | ||
solute = self._solute[self._key] | ||
solvent = self._solvent[self._key] | ||
pairs, distaces = capped_distance(solute.positions, solvent.positions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix spelling of "distaces"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all good, will just add one update changes and then merge
@ALescoulie if I don't get to squash-merging in the next five minutes while waiting for CI, please go ahead and squash and merge yourself (just edit the commit message into something nicely formatted and readable and remove the "Co-authored by Oliver Beckstein" line — for my little amendment I don't want co-authorship). |
My PR for issue #195