-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Ensemble Objects #179
Add Ensemble Objects #179
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #179 +/- ##
===========================================
+ Coverage 74.78% 77.64% +2.86%
===========================================
Files 9 10 +1
Lines 1400 1606 +206
Branches 189 227 +38
===========================================
+ Hits 1047 1247 +200
Misses 276 276
- Partials 77 83 +6
Continue to review full report at Codecov.
|
Since we're planning on dropping python 2 support #177 I did not go out of my way to make the python 2 cross compatible. |
It says that you changed 583 files. Is this true? |
Have you integrated the docs already, i.e., added an entry to doc/sphinx/source/index.txt and an a txt file that autodocs the module? I also recommend you make an I am commenting here because the diff is currently to noisy to review. |
@orbeckst I will defiantly start working on documentation, I understand that this is a large amount of new code, I'm also working on building more robust testing. Writing documentation will also coincide well with my report. |
Good that the docs builds but check https://mdpow--179.org.readthedocs.build/en/179/analysis.html
see https://www.sphinx-doc.org/en/master/usage/quickstart.html and of course look at existing code and docs. I also build the docs locally with conda install sphinx sphinx_rtd_theme # only needed once
python setup.py build_sphinx
open build/sphinx/html/index.html # works on macOS |
Oh-oh... ;-) |
@orbeckst just got the documentation pushed. Tried to base it off of other documentation I had seen on the site. |
Also the number of files changed if from the sloppy testing setup, I added a complete benzene simulation and have not gone through and set up more refined tests yet so there are uncompressed xtc files in the testing dir. |
Reduce the size of the test datasets
|
I merged current develop, so this is now definitely Python 3 only. |
Thanks, This makes the process of merging a bit simpler. I've spent today running new tests on Tubiwan. |
I merged develop into your branch, don't forget to |
@orbeckst I have files for water, but for octanol I'm getting some failures on the NPT. I know we wanted to have two solvents for testing, so that right now is the biggest delay. Aside from that I have two other issues to fix.
|
What's the NPT failure? |
I posted it in free energy methods in slack. It was due to pressure scaling greater than 1% |
@orbeckst and @VOD555 I think this request is done. I still would like to get the octanol test files added, but at this point am having gromacs issues running it, so will add them later. I also added a kwarg for specifying a topology, and tested it. The one big issue still remaining is the lack of a test on the exception handling on the _load_dir_unv function in Ensemble, but I don't anticipate multiple topologies per lambda directory to be a issue for most use cases. |
@orbeckst I made a few quick changes to the docs fixing the links and a typo, tell me if they need anything else. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I managed to review ~3/4 but still have some to do.
Please see inline comments and additional comments here:
- Re-organize by creating a
analysis/
subpackage, initially with__init__.py
andensemble.py
but later with dihedral, first solvation shell, etc code. - update CHANGELOG
- bzip2 all gro files to reduce the file size (MDAnalysis can read compressed files) — you also need to adapt your code to also be able to find them
@VOD555 can you please also start reviewing the PR? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of my previous comments are not visible on mdpow/analysis/ensemble.py
but you can see them in the PR above on mdpow/ensemble.py
. Please address those in addition to the new one. Thanks.
mdpow/tests/test_ensemble.py
Outdated
self.old_path = os.getcwd() | ||
self.resources = os.path.join( | ||
self.old_path, 'mdpow', 'tests', 'testing_resources') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still todo
@ALescoulie please add a comment to each of my comments when you fix it — either "done" or a reason why you don't want to or can't address it. This will make it easier for me when reviewing. Don't resolve them, I'll do this when I re-review. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my previous comments.
My main issue is with _load_universe_from_dirs()
and add_system()
. As you see in the comments, I'd like _load_universe_from_dirs()
to be a pure function (method) without side effects that simply returns a Universe. Then add_system()
can be simplified to only add a Universe. This will make the code clearer and more robust.
It will also hopefully create less necessity for checking for NoDataError etc. In particular, get rid of the exception NoDataWarning
. If we need an exception for this case, use NoDataError
. See the comments — apologies, the comments are a bit sprawling as I was learning how the code worked.
Compress almost all gro files as bz2, and leave one as gro and one as gz.
select_systems()
looks good for MDPOW's purposes: it can select for what we care and does not do anything that's too fancy. I like how you can create new Ensembles and EnsembleAtomGroups. This looks like the right approach.
Regarding rewriting the history of this PR: Let's finish it here where we have all the comments. Once the PR is approved, you can make a new, clean one if you like. But I can also just squash this one, which will get rid of intermediate commits, too.
mdpow/tests/test_ensemble.py
Outdated
def test_build_exception(self): | ||
ens = Ensemble() | ||
with in_dir(os.path.join(self.tmpdir.name, 'FEP', 'test_solv'), create=False): | ||
with pytest.raises(NoDataWarning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not be calling an exception SomethingWarning. Exceptions are SomethingError, so the code that raises this exception should raise NoDataError
. Once that is done, this test needs to be adapted.
mdpow/analysis/ensemble.py
Outdated
|
||
if not os.path.exists(dirname): | ||
logger.error(f"Directory {dirname} does not exist") | ||
raise FileNotFoundError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give more information in the exception
raise FileNotFoundError | |
raise FileNotFoundError(errno.ENOENT, "Directory does not exist", dirname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
edit: responded to wrong exception
mdpow/analysis/ensemble.py
Outdated
for k in self.keys(): | ||
if self[k] != other[k]: | ||
return False | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewrite as
for k in self.keys(): | |
if self[k] != other[k]: | |
return False | |
return True | |
return all(self[k] == other[k] for k in self.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mdpow/analysis/ensemble.py
Outdated
else: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be shortened
else: | |
return False | |
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@orbeckst I finished addressing all your comments and got it near complete coverage on ensemble with the only thing being some partials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good changes, code is much cleaner and excellent test coverage improvement.
- A few minor comments (see inline), mostly docs.
- Compress almost all of the gro files with bz2, one with gz, and leave one uncompressed.
mdpow/analysis/ensemble.py
Outdated
int_dir = os.path.join(fep_dir, solvent, dirs) | ||
with in_dir(int_dir, create=False): # Entering attribute folders | ||
logger.info("Searching %s directory for systems", os.curdir) | ||
files: list = os.listdir(os.curdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the type annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really important, can be removed.
mdpow/analysis/ensemble.py
Outdated
|
||
logs warning if more than one topology is in directory. If | ||
more than one trajectory attempts to load both of them | ||
in a universe if that fail will try to load each individually""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needs to be added to doc string.
mdpow/analysis/ensemble.py
Outdated
Takes specified key and either existing mda.Universe object or | ||
trajectory and topology path. Ensure that paths are set to absolute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update doc string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@VOD555 can you please check if @ALescoulie satisfied your review requirements. If you're happy please approve, otherwise please ask for any other necessary changes. |
@orbeckst I got the files compressed and uploaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the updates.
I am working on the squash merge... will be merged in <5 mins. |
Hooray, it's in, congratulations @ALescoulie 🥳 !!! |
* new Ensemble framework for aggregate analysis in mdpow.analysis * part of Becksteinlab#168 (adding analysis to MDPOW) * new submodule mdpow.analysis * new Ensemble, EnsembleAtomGroup, and EnsembleAnalysis classes * add docs (new section on Analysis) * add tests including ensemble test data (water simulations, octanol to be added later) * update CHANGES
Adding Ensemble and Ensemble Analysis
Created a new pull request just for the ensemble object so that the history would be less chaotic. I will build and commit the analysis classes in another fork and pull request. My previous draft request #169 was branched from my python 3 support branch making the commit history a nightmare, so I just created a new one.
I also decided to separate the ensemble objects into their own file.
Summary of New Objects
Ensemble
The Ensemble object is a collection of MDAnalyis Universe objects. It is intended to store the set of systems generated by running mdpow-fep.
The Ensemble object works by storing the systems in a dictionary and extending the functionality of a Universe object to a collection of universes. It when given a directory finds the simulation files, reads then loads them into a dictionary. The object can be indexed the same as a dictionary, and has methods analogous the the Universe object. The main one being
select_atoms
which returns a EnsembleAtomGroup.An Ensemble in its current form can also be built by manually adding and popping universes into an empty instance.
EnsembleAtomGroup
The EnsembleAtomGroup is created by running the
select_atoms
method on an Ensemble. It stores AtomGroup selections of the groups generated by running select atom on individual universes in a dictionary with the same key structure as the parent Ensemble class.It returns a copy of the parent Ensemble object when the
ensemble
method is run.EnsembleAnalysis
The EnsebmleAnalysis is a class inspired by the AnalysisBase in MDAnalyis which iterates over the systems in the ensemble and the frames in the systems. It sets up both iterations between universes and universe frames allowing for analysis to be run on both whole systems and the frames of those systems. This allows for users to easily run analyses on MDPOW simulations.
Example workflow