Automated Solvation Shell Analysis #227

Open
wants to merge 115 commits into
base: develop
115 commits
e71456d
developing back end run methods
cadeduckworth Sep 15, 2022
6c67ec1
condensed backend run methods
cadeduckworth Sep 15, 2022
9914675
Initial adding of automation functionality to dihedral.py
cadeduckworth Oct 20, 2022
f9b6646
moved directory_paths function to reside within the DihedralAnalysis …
cadeduckworth Oct 20, 2022
7bcdf49
Addressed change requests by Oliver
cadeduckworth Oct 22, 2022
a87cb11
fixed indentation error for _single_frame() block
cadeduckworth Oct 22, 2022
7edde0d
reverting previous tab issue, necessary for normal function
cadeduckworth Oct 22, 2022
48016c5
fixed errors previously reverted for testing
cadeduckworth Oct 23, 2022
59c819c
Merge branch 'develop' into ensemble_run_update
cadeduckworth Oct 23, 2022
6b1be1b
restored original dihedral.py file
cadeduckworth Oct 28, 2022
2ddea0e
created automation/dihedral subdirectory under mdpow/analysis/ to hou…
cadeduckworth Oct 28, 2022
ad6db6e
updated docstrings and added examples for directory_paths.py
cadeduckworth Oct 28, 2022
848dd87
added docstrings and examples for directory_iteration.py
cadeduckworth Oct 28, 2022
6394d7a
added initial draft of docstrings and examples for all functions cont…
cadeduckworth Oct 28, 2022
d8f3ca3
added the option to save the DihedralAnalysis results DataFrame as cs…
cadeduckworth Nov 12, 2022
d8137cb
removed redundant DataFrame saving pattern from automated_dihedral_an…
cadeduckworth Nov 17, 2022
29477ab
Merge remote-tracking branch 'origin/ensemble_run_update' into automa…
cadeduckworth Nov 17, 2022
642e87e
merged updates from PR #216 and PR #218
cadeduckworth Nov 17, 2022
4814376
Merge branch 'develop' into automated-dihedral-analysis
orbeckst Nov 17, 2022
f333ee8
Merge branch 'develop' into automated-dihedral-analysis
orbeckst Nov 18, 2022
6f16b84
resolved conflicts to merge updates in from develop from PR#216
cadeduckworth Dec 14, 2022
ddffbdf
adding data for testing automated dihedral analysis
cadeduckworth Dec 14, 2022
c850448
added starting framework for testing automated dihedral analysis
cadeduckworth Dec 15, 2022
69b0351
reorganized automation directory into workflows directory
cadeduckworth Dec 15, 2022
4cf176d
added preliminary simple core tests for automated dihedral analysis
cadeduckworth Dec 15, 2022
97c4a73
initial reformatting of existing docs for sphinx markup compatibility
cadeduckworth Dec 15, 2022
930d54e
fix errors in docs for workflows
cadeduckworth Dec 15, 2022
84536ff
added imports for automated dihedral analysis tests
cadeduckworth Dec 15, 2022
f01b2a4
added init file for workflows module
cadeduckworth Dec 15, 2022
61ff494
added logging functionality for workflows modules
cadeduckworth Dec 15, 2022
f7b3f53
sphinx markup corrections for workflow modules
cadeduckworth Dec 15, 2022
d16dc7d
redo of reduced testing data for GAFF SM25 from original full dataset…
cadeduckworth Dec 16, 2022
bcfa10f
sphinx docs automodule test
cadeduckworth Dec 16, 2022
b8c309d
sphinx docs test base.txt
cadeduckworth Dec 16, 2022
837f53e
docs
cadeduckworth Dec 16, 2022
44db567
docs
cadeduckworth Dec 16, 2022
e53dfb6
docs
cadeduckworth Dec 16, 2022
5f28b46
edit docs
cadeduckworth Dec 17, 2022
e58703a
edit docs
cadeduckworth Dec 17, 2022
bbcb5de
edit docs
cadeduckworth Dec 17, 2022
45892b6
edit docs
cadeduckworth Dec 17, 2022
eaf767a
edit docs
cadeduckworth Dec 17, 2022
f19f32e
update py.path to pathlib in test_automated_dihedral_analysis
cadeduckworth Dec 17, 2022
41166a4
update reference testing values for new reduced dataset, test_automat…
cadeduckworth Dec 17, 2022
1de662e
change syntax in test_automated_dihedral_analysis to compare objects …
cadeduckworth Dec 17, 2022
4eafa23
update assert_almost_equal to pytest.approx
cadeduckworth Dec 17, 2022
4833a45
bounce redundant function calls to a pytest.fixture for use with mult…
cadeduckworth Dec 17, 2022
5be6684
add bz2 compression for dataframe storage of automated dihedral analy…
cadeduckworth Dec 17, 2022
ef750f1
Update ci.yaml
cadeduckworth Dec 17, 2022
4c9251b
added functionality for user input of alternative SMARTS string selec…
cadeduckworth Jan 5, 2023
187632b
Merge branch 'automated-dihedral-analysis' of github.com:Becksteinlab…
cadeduckworth Jan 5, 2023
7abb0bc
add tests for automated directory iteration of dihedral analysis, wor…
cadeduckworth Jan 5, 2023
372a205
add tests for saving dataframes and saving figures, which covers the …
cadeduckworth Jan 5, 2023
39518c2
add test for directory iteration of fully automated dihedral analysis
cadeduckworth Jan 5, 2023
37eb085
moved location of rdkit install used for testing/pytest
cadeduckworth Jan 7, 2023
85b54f3
remove misplaced hbonds documentation
cadeduckworth Jan 7, 2023
1e5aadc
relocate test data for workflows and change test scripts accordingly
cadeduckworth Jan 7, 2023
edc0a9a
removed unnecessary .ipynb, .lock, and .npz files
cadeduckworth Jan 7, 2023
eb41100
simplified keyword specification for default and user input of SMARTS…
cadeduckworth Jan 7, 2023
b91240e
fixed plotting issue for solvents (name and order)
cadeduckworth Jan 7, 2023
fbca70d
variable name issue in testing module, user_SMARTS -> SMARTS
cadeduckworth Jan 7, 2023
b0081ef
Delete dir.csv
cadeduckworth Jan 8, 2023
0520ffa
import syntax
cadeduckworth Jan 8, 2023
77df0ee
sphinx docs
cadeduckworth Jan 8, 2023
17a0b4a
Merge branch 'automated-dihedral-analysis' of github.com:Becksteinlab…
cadeduckworth Jan 8, 2023
7359319
docs
cadeduckworth Jan 8, 2023
8f8f26c
docs
cadeduckworth Jan 8, 2023
c452475
docs
cadeduckworth Jan 8, 2023
3dd1b31
docs
cadeduckworth Jan 8, 2023
cecd928
docs
cadeduckworth Jan 8, 2023
5b87e83
docs
cadeduckworth Jan 8, 2023
86804d8
docs
cadeduckworth Jan 8, 2023
0ab0299
docs
cadeduckworth Jan 8, 2023
3964a44
docs
cadeduckworth Jan 8, 2023
dfb7e8e
docs
cadeduckworth Jan 8, 2023
be86754
docs
cadeduckworth Jan 8, 2023
f6330cb
docs
cadeduckworth Jan 8, 2023
ab6800e
docs
cadeduckworth Jan 8, 2023
6bdc521
docs
cadeduckworth Jan 8, 2023
0a7963d
docs
cadeduckworth Jan 8, 2023
1866d43
docs
cadeduckworth Jan 8, 2023
495546c
docs
cadeduckworth Jan 8, 2023
956b359
docs
cadeduckworth Jan 8, 2023
0ad73db
docs
cadeduckworth Jan 8, 2023
f6beebb
docs
cadeduckworth Jan 8, 2023
036d939
docs
cadeduckworth Jan 8, 2023
58f33c3
docs
cadeduckworth Jan 8, 2023
d704f95
moving workflows location, subsequent doc changes
cadeduckworth Jan 8, 2023
02e10e5
import changed to reflect moving workflows module
cadeduckworth Jan 8, 2023
891782e
docs
cadeduckworth Jan 8, 2023
148e56f
add requirements for sphinx build
cadeduckworth Jan 8, 2023
a03951c
add requirements for sphinx build
cadeduckworth Jan 8, 2023
31291f1
add requirements for sphinx build
cadeduckworth Jan 8, 2023
652e788
add requirements for sphinx build
cadeduckworth Jan 8, 2023
48c7e88
add requirements for sphinx build
cadeduckworth Jan 8, 2023
ba17ceb
docs
cadeduckworth Jan 8, 2023
68cbc9c
docs
cadeduckworth Jan 8, 2023
8c5c2ec
docs structure
cadeduckworth Jan 8, 2023
8d29f91
consistent keyword names in tests
cadeduckworth Jan 8, 2023
24f6c00
imports and docs
cadeduckworth Jan 8, 2023
63e3de4
imports
cadeduckworth Jan 8, 2023
104bf93
imports
cadeduckworth Jan 8, 2023
1ad6b8c
generalizing automation base module for use with other analyses
cadeduckworth Jan 8, 2023
3f65e28
initialize PR for automated-solvation-shell module
cadeduckworth Jan 8, 2023
80c641c
initial script for automated solvation shell analysis
cadeduckworth Jan 9, 2023
09721c7
updated automated solvation module, incomplete, issue with compression
cadeduckworth Jan 9, 2023
68c0b1e
fixed automated solvation shell analysis module
cadeduckworth Jan 9, 2023
7181017
docs and func names
cadeduckworth Jan 9, 2023
e3ef88f
cleanup and remove workflows base module/move to new PR
cadeduckworth Jan 15, 2023
36be611
resolving conflicts
cadeduckworth Apr 5, 2023
b876ad1
add new version of testing resources
cadeduckworth Apr 5, 2023
1f62fa6
correct module name
cadeduckworth Apr 5, 2023
77f7054
start test module for solvations
cadeduckworth Apr 5, 2023
041c6ed
change data obtained by SolvationAnalysis
cadeduckworth Apr 5, 2023
61f53b9
name top-level automated solvation function and complete addition to …
cadeduckworth Apr 6, 2023
2 changes: 1 addition & 1 deletion devtools/conda-envs/test_env.yaml
@@ -22,4 +22,4 @@ dependencies:
   - pytest
   - pytest-pep8
   - pytest-cov
-  - codecov
+  - codecov
1 change: 1 addition & 0 deletions doc/sphinx/source/workflows.txt
@@ -17,3 +17,4 @@ for use with :class:`~mdpow.analysis.dihedral.DihedralAnalysis`.
     workflows/base
     workflows/registry
     workflows/dihedrals
+    workflows/solvations
7 changes: 7 additions & 0 deletions doc/sphinx/source/workflows/solvations.txt
@@ -0,0 +1,7 @@
==================================
Automated Solvation Shell Analysis
==================================

.. versionadded:: 0.9.0

.. automodule:: mdpow.workflows.solvations
35 changes: 26 additions & 9 deletions mdpow/analysis/solvation.py
@@ -56,21 +56,38 @@ def __init__(self, solute: EnsembleAtomGroup, solvent: EnsembleAtomGroup, distan
         self._dists = distances

     def _prepare_ensemble(self):
-        self._col = ['distance', 'solvent', 'interaction',
-                     'lambda', 'time', 'N_solvent']
Review comment (Member):

Are you now returning distances? If so, better write a new class and leave the old one as-is.

Reply (Contributor Author):

@orbeckst
I agree, because they are doing different things. Can they both exist as separate classes within the same module?

+        self._col = ['solute_ix', 'solvent_ix', 'distance',
+                     'solvent', 'interaction',
+                     'lambda', 'time']
         self.results = pd.DataFrame(columns=self._col)
         self._res_dict = {key: [] for key in self._col}

     def _single_frame(self):
         solute = self._solute[self._key]
         solvent = self._solvent[self._key]
-        pairs, distances = capped_distance(solute.positions, solvent.positions,
-                                           max(self._dists), box=self._ts.dimensions)
-        solute_i, solvent_j = np.transpose(pairs)
-        for d in self._dists:
-            close_solv_atoms = solvent[solvent_j[distances < d]]
-            result = [d, self._key[0], self._key[1],self._key[2],
-                      self._ts.time, close_solv_atoms.n_atoms]
+        #pairs, distances = capped_distance(solute.positions, solvent.positions,
+        pairs, distances = capped_distance(solute, solvent,
+                                           self._dists[0], box=self._ts.dimensions,
+                                           return_distances=True)
+        #solute_i, solvent_j = np.transpose(pairs)
+        #for d in self._dists:
+        #    close_solv_atoms = solvent[solvent_j[distances < d]]
+        #    result = [d, self._key[0], self._key[1],self._key[2],
+        #              self._ts.time, close_solv_atoms.n_atoms]
+        #    for i in range(len(self._col)):
+        #        self._res_dict[self._col[i]].append(result[i])
+
+        for k, [i, j] in enumerate(pairs):
+            #su = solute.positions[i]
+            su = solute[i]
+            su_info = [su.ix, su.name, su.type, su.resname, su.resid, su.segid]
+            #sv = solvent.positions[j]
+            sv = solvent[j]
+            sv_info = [sv.ix, sv.name, sv.type, sv.resname, sv.resid, sv.segid]
+            d = distances[k]
+            result = [su_info, sv_info, d,
+                      self._key[0], self._key[1],
+                      self._key[2], self._ts.time]
             for i in range(len(self._col)):
                 self._res_dict[self._col[i]].append(result[i])
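
The review exchange on this hunk turns on what one row of the results means: the old columns end in `N_solvent` (one count per cutoff), while the new columns start with `solute_ix`, `solvent_ix`, `distance` (one row per solute/solvent pair). A minimal, library-free sketch of the two shapes follows. It assumes made-up coordinates and ignores periodic boundaries, which the real `capped_distance` call handles through `box`. The names `pairs_within`, `count_per_cutoff` and `rows_per_pair` are hypothetical and are not part of MDPOW or MDAnalysis.

```python
import math


def pairs_within(solute, solvent, cutoff):
    """Brute-force stand-in for a capped distance search (no periodic box)."""
    pairs, distances = [], []
    for i, a in enumerate(solute):
        for j, b in enumerate(solvent):
            d = math.dist(a, b)
            if d <= cutoff:
                pairs.append((i, j))
                distances.append(d)
    return pairs, distances


def count_per_cutoff(solute, solvent, cutoffs):
    """Count-style rows: one (cutoff, N_solvent) row per cutoff."""
    pairs, distances = pairs_within(solute, solvent, max(cutoffs))
    rows = []
    for c in cutoffs:
        # Assumption of this sketch: each solvent point is counted once.
        close = {j for (_, j), d in zip(pairs, distances) if d < c}
        rows.append((c, len(close)))
    return rows


def rows_per_pair(solute, solvent, cutoff):
    """Distance-style rows: one (solute_ix, solvent_ix, distance) row per pair."""
    pairs, distances = pairs_within(solute, solvent, cutoff)
    return [(i, j, d) for (i, j), d in zip(pairs, distances)]


# Made-up coordinates, purely for illustration.
solute = [(0.0, 0.0, 0.0)]
solvent = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 5.0)]

counts = count_per_cutoff(solute, solvent, cutoffs=[1.2, 2.4])
per_pair = rows_per_pair(solute, solvent, cutoff=2.4)
```

The per-pair rows can always be reduced to counts afterwards, but not the other way round, which is one argument for keeping the two behaviours in separate classes as the reviewer suggests.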

44 changes: 44 additions & 0 deletions mdpow/tests/test_automated_solvation_shell_analysis.py
@@ -0,0 +1,44 @@
import os
import sys
import yaml
import pybol
import pytest
import pathlib
import logging

import scipy
import numpy as np
import pandas as pd

import rdkit
from rdkit import Chem

import seaborn

from numpy.testing import assert_almost_equal

from . import RESOURCES

import py.path

from ..workflows import solvations

from pkg_resources import resource_filename

# ^review and update these as necessary, currently copied from test_ada

RESOURCES = pathlib.PurePath(resource_filename(__name__, 'testing_resources'))
MANIFEST = RESOURCES / "manifest.yml"

@pytest.fixture(scope="function")
def molname_workflows_directory(tmp_path, molname='SM25'):
    m = pybol.Manifest(str(MANIFEST))
    m.assemble('workflows', tmp_path)
    return tmp_path / molname

class TestAutomatedSolvationShellAnalysis(object):

    @pytest.fixture(scope="function")
    def SM25_tmp_dir(self, molname_workflows_directory):
        dirname = molname_workflows_directory
        return dirname
16 changes: 10 additions & 6 deletions mdpow/workflows/registry.py
@@ -12,11 +12,13 @@
     :widths: auto
     :name: workflows_registry

-    +-------------------------------+------------------------------------------------------------------------------------------------------+
-    | key/keyword: EnsembleAnalysis | value: <workflow module>.<top-level automated analysis function> |
-    +===============================+======================================================================================================+
-    | DihedralAnalysis | :any:`dihedrals.automated_dihedral_analysis <mdpow.workflows.dihedrals.automated_dihedral_analysis>` |
-    +-------------------------------+------------------------------------------------------------------------------------------------------+
+    +-------------------------------+----------------------------------------------------------------------------------------------------------------------+
+    | key/keyword: EnsembleAnalysis | value: <workflow module>.<top-level automated analysis function> |
+    +===============================+======================================================================================================================+
+    | DihedralAnalysis | :any:`dihedrals.automated_dihedral_analysis <mdpow.workflows.dihedrals.automated_dihedral_analysis>` |
+    +-------------------------------+----------------------------------------------------------------------------------------------------------------------+
+    | SolvationAnalysis | :any:`solvations.automated_solvation_shell_analysis <mdpow.workflows.solvations.automated_solvation_shell_analysis>` |
+    +-------------------------------+----------------------------------------------------------------------------------------------------------------------+

 .. autodata:: registry

@@ -26,10 +28,12 @@

 # import analysis
 from mdpow.workflows import dihedrals
+from mdpow.workflows import solvations

 registry = {

-    'DihedralAnalysis' : dihedrals.automated_dihedral_analysis
+    'DihedralAnalysis' : dihedrals.automated_dihedral_analysis,
+    'SolvationAnalysis' : solvations.automated_solvation_shell_analysis

 }
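
The registry maps the name of an analysis to the top-level function of its workflow module, so a caller can pick a workflow by name. The `DihedralAnalysis` line changes in this hunk only because a second entry needs a separating comma. A minimal sketch of the lookup pattern is below. The two workflow functions are placeholders standing in for the real ones, and `run_analysis` is a hypothetical helper, not a function taken from this PR.

```python
def automated_dihedral_analysis(dirname, **kwargs):
    """Placeholder for mdpow.workflows.dihedrals.automated_dihedral_analysis."""
    return f"dihedrals:{dirname}"


def automated_solvation_shell_analysis(dirname, **kwargs):
    """Placeholder for mdpow.workflows.solvations.automated_solvation_shell_analysis."""
    return f"solvations:{dirname}"


# Same shape as the registry in the diff: analysis name -> workflow function.
registry = {
    'DihedralAnalysis': automated_dihedral_analysis,
    'SolvationAnalysis': automated_solvation_shell_analysis,
}


def run_analysis(analysis, dirname, **kwargs):
    """Look up a workflow by analysis name and call it (hypothetical helper)."""
    try:
        workflow = registry[analysis]
    except KeyError:
        raise KeyError(f"unknown analysis {analysis!r}; "
                       f"choose from {sorted(registry)}") from None
    return workflow(dirname, **kwargs)


result = run_analysis('SolvationAnalysis', 'SM25')
```

Keeping the dictionary as the single place where workflows are listed means that adding a new analysis touches only its own module and one registry line.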

115 changes: 115 additions & 0 deletions mdpow/workflows/solvations.py
@@ -0,0 +1,115 @@
# MDPOW: solvations.py
# 2022 Cade Duckworth

"""
:mod:`mdpow.workflows.solvations` --- Automation for :class:`SolvationAnalysis`
==============================================================================
:mod:`~mdpow.workflows.solvations` module with functions
useful for automated use of
:class:`~mdpow.analysis.solvation.SolvationAnalysis`.
See each function for usage, output, and examples.
Most functions can be used as standalone or in combination
depending on the desired results. Complete automation encompassed in
:func:`~mdpow.workflows.solvations.automated_solvation_shell_analysis`.

.. autofunction:: solvation_ensemble
.. autofunction:: solvation_analysis
.. autofunction:: asa_save_df
.. autofunction:: automated_solvation_shell_analysis
"""

import os
import numpy as np
import pandas as pd

import mdpow
from mdpow.analysis.solvation import SolvationAnalysis

import logging

logger = logging.getLogger('mdpow.workflows.solvations')

def solvation_ensemble(dirname, resname, solvents=('water', 'octanol'),
                       interactions=('Coulomb', 'VDW')):

    ens = mdpow.analysis.ensemble.Ensemble(dirname=dirname,
                                           interactions=interactions,
                                           solvents=solvents)
    solute = ens.select_atoms(f'resname {resname}')
    solvent = ens.select_atoms(f'not resname {resname}')
    return solute, solvent

def solvation_analysis(solute=None, solvent=None, distances=None,
                       start=None, stop=None, step=None):

    solv = SolvationAnalysis(solute=solute, solvent=solvent, distances=distances)
    ds = solv.run(start=start, stop=stop, step=step)
    df = solv.results
    return df

def asa_save_df(df, df_save_dir=None, resname=None, molname=None):
    '''Takes a :class:`pandas.DataFrame` of results from
    :class:`~mdpow.analysis.solvation.SolvationAnalysis`
    as input to optionally save the data.
    Given a parent directory, creates subdirectory
    for molecule, saves fully sampled csv.
    :keywords:
    *df*
        results :class:`pandas.DataFrame` from
        :class:`~mdpow.analysis.solvation.SolvationAnalysis`
    *df_save_dir*
        path to parent directory to create
        subdirectory for saving the .csv files
    '''

    if molname is None:
        molname = resname

    if df_save_dir is not None:
        subdir = molname
        newdir = os.path.join(df_save_dir, subdir)
        os.mkdir(newdir)

    df = df.sort_values(by=["solvent",
                            "interaction",
                            "lambda"]).reset_index(drop=True)

    if df_save_dir is not None:
        df.to_csv(f'{newdir}/{molname}_full_df.csv', index=False, compression='bz2')
    # this part might need some work
    return

def automated_solvation_shell_analysis(dirname, df_save_dir=None, resname=None, molname=None,
                                       solvents=('water', 'octanol'), interactions=('Coulomb', 'VDW'),
                                       distances=[1.2, 2.4], figdir=None,
                                       start=None, stop=None, step=None,
                                       SMARTS=None, padding=None, width=None):
    # figure out kwargs for each analysis type
    """Measures the number of solvent molecules within the given distances
    in an :class:`~mdpow.analysis.ensemble.Ensemble`.

    :Parameters:

    *solute*
        An :class:`~mdpow.analysis.ensemble.EnsembleAtomGroup` containing the solute
        used to measure distance.

    *solvent*
        An :class:`~mdpow.analysis.ensemble.EnsembleAtomGroup` containing the solvents
        counted by the distance measurement. Each solvent atom is counted by the
        distance calculation.

    *distances*
        Array-like of the cutoff distances around the solute measured in Angstroms.

    The data is returned in a :class:`pandas.DataFrame` with observations sorted by
    distance, solvent, interaction, lambda, time.
    """

    components = solvation_ensemble(dirname=dirname, resname=resname, solvents=solvents)
    df = solvation_analysis(solute=components[0], solvent=components[1],
                            distances=distances, start=start, stop=stop, step=step)
    if df_save_dir is not None:
        asa_save_df(df, df_save_dir=df_save_dir, resname=resname, molname=molname)

    return df
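
On the `# this part might need some work` note in `asa_save_df`, two details of the saving step stand out: `os.mkdir` raises `FileExistsError` when the molecule subdirectory already exists, for example on a second run, and the file is written with `compression='bz2'` but named `.csv`. The sketch below shows one possible shape for that step using only the standard library (`csv` and `bz2` in place of pandas, which is a substitution made for this example). `save_rows` and the sample rows are hypothetical.

```python
import bz2
import csv
import os
import tempfile


def save_rows(rows, header, save_dir, molname):
    """Write rows to <save_dir>/<molname>/<molname>_full_df.csv.bz2."""
    newdir = os.path.join(save_dir, molname)
    # exist_ok=True lets a repeated call reuse the existing subdirectory.
    os.makedirs(newdir, exist_ok=True)
    # The suffix states the compression, so readers can infer it.
    path = os.path.join(newdir, f"{molname}_full_df.csv.bz2")
    with bz2.open(path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Made-up rows, purely for illustration.
header = ["solvent", "interaction", "lambda", "distance"]
rows = [["water", "VDW", 0, 1.0], ["octanol", "Coulomb", 250, 2.0]]

with tempfile.TemporaryDirectory() as tmp:
    first = save_rows(rows, header, tmp, "SM25")
    second = save_rows(rows, header, tmp, "SM25")  # second call does not raise
    with bz2.open(second, "rt", newline="") as fh:
        loaded = list(csv.reader(fh))
```

With a `.csv.bz2` name, `pandas.read_csv(path)` can read the file back without an explicit `compression` argument, since its default is to infer the compression from the suffix.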