-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automated Solvation Shell Analysis #227
base: develop
Are you sure you want to change the base?
Conversation
* separating functionality for use with _single_universe without iterating frames * retaining functionality for use with _single_frame
* try/except pattern condensed to one run method * single run method implemented for _single_universe and _single_frame * functionality of original run method retained, while removing frame iteration for _single_universe usage * progress bar still active when using _single_universe, but no frame iteration occurs
directory_paths: finds MDPOW project data and returns pandas DataFrame directory_iteration: takes pandas DataFrame or csv and iterates automated_dihedral_analysis over MDPOW project directories automated_dihedral_analysis: group of functions that includes automation scripts for DihedralAnalysis, automatic dihedral atom group selection, seaborn violin plots function, and other required functions Future changes will eliminate redundancies following initial testing
…class to test possible ways to call automation-related functions
removed raise added docstrings to describe use of NotImplementedError responded with reason for removing _conclude_universe() next step: updating html documentation following successful testing
fixed expected indent block error removed raise removed _conclude_universe()
…se automation for DihedralAnalysis automated_dihedral_analysis.py directory_iteration.py directory_paths.py
…ained within automated_dihedral_analysis.py
…v files in provided parent directory
…ted-dihedral-analysis updating with changes for backend and testing from PR #216 ensemble_run_update
*incomplete, pushing to test *source and html doc additions *changed all instances of datadir to dirname for consistency with DihedralAnalysis *all changes pertain to new workflows modules
Codecov Report
@@ Coverage Diff @@
## develop #227 +/- ##
===========================================
- Coverage 80.15% 78.70% -1.45%
===========================================
Files 15 16 +1
Lines 1900 1935 +35
Branches 291 295 +4
===========================================
Hits 1523 1523
- Misses 286 321 +35
Partials 91 91
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
ASA works as is workflows base module needs adjustment for proper use
Lots of conflicts to resolve, @cadeduckworth ! |
@orbeckst I made changes that allow a user to input a max distance, and obtain all of the solute-solvent interactions within that distance (distance and atom/atomgroup info included), per molecule, per solvent, per interaction, per lambda. This is a very rough draft, especially for the plot, but the workflow, results DataFrame, and FacetGrid get the main idea across for what I am going for. SAMPL9 Data, FF = GAFF, Molecule = S08 Results DF: (it might be better to break the Figure (FacetGrid): (please excuse the quality, I am trying to work out the kinks in the |
The manual workflow as shown is a good mini tutorial and looks already very good. Presenting the data will be more of a challenge—the distance KDEs are naturally cut off at the cutoff, which makes it look a bit weird/incomplete. And once the solute disappears, the N becomes quite big and swamps the small numbers. Perhaps try a log plot?? Or normalize??? Also, are the large numbers compatible with what you get for a sphere of radius with the cutoff at the density of water or octanol/other solvent? |
The distance histograms are interesting. You can see how molecules start to overlap with the disappearing solute. |
…workflows registry/registry docs
@orbeckst On that note, I think seeing that there is not much change when turning of Coulombic interactions for neutral solute/solvent is a good indication, and then seeing the gradual increase of number of solvent molecules at a decreased distance when turning off VDW in that sense is also a good indication of sufficient sampling, IFF, this holds up for other cases with more frames included. Perhaps I should run a simulation with one of the charged molecules that was excluded for logP calculations from one of the SAMPL sets and see what happens if I can get GROMACS to comply? EDIT: paragraph spacing, wording After thought: a nearest neighbors calculation or comparison of individual lambdas to a population distribution could help later in identifying when some systems are not behaving? |
I will look into approximately homogenous equilibration of solvents to try to confirm the data for more completely sampled datasets when I get the chance (those lambdas for VDW closer to 1, when the solute is essentially removed). I apologize if I did not use the correct terms there, it has been a while since I covered equilibrium chemistry, but I think I understand what you are getting at. |
I ran a more complete analysis of the same SAMPL9 GAFF molecule (S08) with a step size of 500 and cleaned up the plots a bit. So, this is the SVG converted to pdf. The width ratios are slightly off, which affects the placement of the Mol object, but I cleaned up the scale on the y-axes as well as the height/width aspect ratio. Fixing the axes/scales and including much more data seems to give a better look at what is going on. If we do eventually go with a stacked plot scheme, then I will reserve less of the right side and reduce it to a single Mol object and legend to make it look better. |
From earlier this morning before making changes to Using the original |
For the S08 data, the number of solvent atoms is initially 60,000 — there aren't that many waters in the box, are they? Can you do the following calculation: Given the standard density of water at the simulation conditions, how many water molecules would you expect to find in a volume roughly equivalent to (1) the solute, (2) the solute with its surface blown up by 4 Å? A rough physicist estimate will suffice as a start, e.g., approximating the molecule as a cylinder or sphere... Then compare the estimate to the numbers from solvation analysis. |
@@ -56,21 +56,38 @@ def __init__(self, solute: EnsembleAtomGroup, solvent: EnsembleAtomGroup, distan | |||
self._dists = distances | |||
|
|||
def _prepare_ensemble(self): | |||
self._col = ['distance', 'solvent', 'interaction', | |||
'lambda', 'time', 'N_solvent'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you now returning distances? If so, better write a new class and leave the old one as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orbeckst
I agree, because they are doing different things. Can they both exist as separate classes within the same module?
I am not sure if the time series contain more interesting information than the averages or distributions — at least in equilibrium. Time series in equilibrium are good as a diagnostic to check if the system is likely sampling equilibrium: if the fluctuations are around a stable mean then it might be equilibrium... |
I don't think that we need to add the molecular structure next to the solvation plots unless you want to highlight specific atoms. Otherwise I'd expect a molecular structure to be included in an overview panel and the reader can use the ID to cross-reference. |
@orbeckst |
Each atom should only be counted once. There's a single closest interaction between a solvent atom and the solute (barring coincidental equidistant triangles). |
@cadeduckworth is this something you'd want to complete for a MDPOW paper? Can you resolve the conflicts so that one can look at the changes in the context of the latest version of the code? |
No description provided.