Add Ensemble Objects #179

ALescoulie · 2021-08-03T09:30:19Z

Adding Ensemble and Ensemble Analysis

Created a new pull request just for the ensemble object so that the history would be less chaotic. I will build and commit the analysis classes in another fork and pull request. My previous draft request #169 was branched from my Python 3 support branch, which made the commit history a nightmare, so I just created a new one.

I also decided to separate the ensemble objects into their own file.

Summary of New Objects

Ensemble

The Ensemble object is a collection of MDAnalysis Universe objects. It is intended to store the set of systems generated by running mdpow-fep.

The Ensemble object stores the systems in a dictionary and extends the functionality of a Universe object to a collection of universes. When given a directory, it finds the simulation files, reads them, and loads them into the dictionary. The object can be indexed like a dictionary and has methods analogous to those of the Universe object, the main one being select_atoms, which returns an EnsembleAtomGroup.

An Ensemble in its current form can also be built manually by adding universes to an empty instance and popping them off again.

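    import os

    import mdpow.ensemble

    # directory holding the benzene FEP test simulations
    benzene_dir = os.path.join('mdpow', 'tests', 'testing_resources', 'states', 'benzene')

    Benzene = mdpow.ensemble.Ensemble(dirname=benzene_dir)

    # four carbon atoms that define a dihedral
    DihedralGroup = Benzene.select_atoms('name C1 or name C2 or name C3 or name C4')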
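To make the "dictionary of universes" idea concrete, here is a minimal sketch of the container pattern described above. It is not the actual mdpow code: `EnsembleSketch` is a hypothetical name, it subclasses `dict` for brevity, and the key layout `(solvent, interaction, lambda)` is assumed for illustration. Only `add_system` is a name taken from this PR.

```python
class EnsembleSketch(dict):
    """Hypothetical, simplified stand-in for mdpow's Ensemble.

    Keys identify a system, e.g. (solvent, interaction, lambda);
    values are Universe-like objects that provide select_atoms().
    """

    def add_system(self, key, universe):
        # store one system under its key (dict.pop() removes it again)
        self[key] = universe

    def select_atoms(self, *selection):
        # apply the same selection to every member; the real Ensemble wraps
        # the result in an EnsembleAtomGroup, here it is a plain dict
        return {key: u.select_atoms(*selection) for key, u in self.items()}
```

With real MDAnalysis Universes as values, `ens.select_atoms('name C1 or name C2')` would yield one AtomGroup per system, keyed exactly like the ensemble itself.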

EnsembleAtomGroup

The EnsembleAtomGroup is created by running the select_atoms method on an Ensemble. It stores the AtomGroups generated by running select_atoms on the individual universes in a dictionary with the same key structure as the parent Ensemble.

Calling its ensemble method returns a copy of the parent Ensemble object.
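A minimal sketch of that relationship, assuming the group keeps a reference to its parent and that "copy" means a shallow copy (new container, same Universe objects). `EnsembleAtomGroupSketch` and the key check in `__init__` are illustrative assumptions, not the mdpow implementation.

```python
import copy


class EnsembleAtomGroupSketch(dict):
    """Hypothetical stand-in: maps the parent Ensemble's keys to AtomGroups."""

    def __init__(self, groups, parent):
        if set(groups) != set(parent):
            # every selection must belong to exactly one system of the parent
            raise KeyError("selection keys must match the parent Ensemble's keys")
        super().__init__(groups)
        self._parent = parent  # the Ensemble the selection was made from

    def ensemble(self):
        # shallow copy: a new container that holds the same Universe objects
        return copy.copy(self._parent)
```

Returning a copy rather than the parent itself means an analysis can add or pop systems on its own ensemble without changing the one the user selected from.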

EnsembleAnalysis

The EnsembleAnalysis is a class inspired by AnalysisBase in MDAnalysis that iterates over the systems in the ensemble and over the frames of each system. Because it sets up iteration both over universes and over the frames within each universe, analyses can be run on whole systems as well as on individual frames. This makes it easy for users to run analyses on MDPOW simulations.

Example workflow

    import pandas as pd
    from MDAnalysis.lib.distances import calc_dihedrals

    import mdpow.ensemble


    class DihedralAnalysis(mdpow.ensemble.EnsembleAnalysis):
        def __init__(self, DihedralEnsembleGroup):
            # run over the Ensemble the selection was made from
            super().__init__(DihedralEnsembleGroup.ensemble())

            self._sel = DihedralEnsembleGroup

        def _prepare_ensemble(self):
            # nested results: solvent -> interaction -> lambda window
            self.result_dict = {}
            for s in ['water', 'octanol']:
                self.result_dict[s] = {'Coulomb': {},
                                       'VDW': {}}
            for key in self._sel.group_keys():
                self.result_dict[key[0]][key[1]][key[2]] = None

        def _prepare_universe(self):
            # reset per-system storage
            self.angle_dict = {'angle': None,
                               'time': None}
            self.angles = []

        def _single_frame(self):
            # dihedral (in radians) of the four selected atoms in the current frame
            pos = self._sel[self._key].positions
            angle = calc_dihedrals(pos[0], pos[1], pos[2], pos[3])
            self.angles.append(angle)

        def _conclude_universe(self):
            # store the time series of this system under its key
            self.angle_dict['time'] = self.times
            self.angle_dict['angle'] = self.angles
            self.result_dict[self._key[0]][self._key[1]][self._key[2]] = self.angle_dict

        def _conclude_ensemble(self):
            self.results = pd.DataFrame(data=self.result_dict)
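The hook order that the example relies on can be sketched as a plain loop. This is a simplified, hypothetical reconstruction (`EnsembleAnalysisSketch`), assuming the ensemble behaves like a mapping from keys to Universe-like objects whose `trajectory` yields timesteps with a `time` attribute; the real base class also handles start/stop/step slicing, which is omitted here.

```python
class EnsembleAnalysisSketch:
    """Hypothetical, simplified version of the EnsembleAnalysis hook loop."""

    def __init__(self, ensemble):
        self._ensemble = ensemble  # mapping: key -> Universe-like object

    # hooks that subclasses override; the defaults do nothing
    def _prepare_ensemble(self):
        pass

    def _prepare_universe(self):
        pass

    def _single_frame(self):
        pass

    def _conclude_universe(self):
        pass

    def _conclude_ensemble(self):
        pass

    def run(self):
        self._prepare_ensemble()                  # once, before any system
        for key, universe in self._ensemble.items():
            self._key = key                       # lets hooks index selections
            self.times = []
            self._prepare_universe()              # once per system
            for ts in universe.trajectory:        # every frame of this system
                self._ts = ts
                self.times.append(ts.time)
                self._single_frame()
            self._conclude_universe()             # once per system
        self._conclude_ensemble()                 # once, after all systems
        return self
```

Per-system state is reset in `_prepare_universe` and harvested in `_conclude_universe`, which is why the example stores `self.angles` there rather than in `__init__`.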

codecov · 2021-08-03T20:43:01Z

Codecov Report

Merging #179 (6b19d49) into develop (8c891be) will increase coverage by 2.86%.
The diff coverage is 97.08%.

@@             Coverage Diff             @@
##           develop     #179      +/-   ##
===========================================
+ Coverage    74.78%   77.64%   +2.86%     
===========================================
  Files            9       10       +1     
  Lines         1400     1606     +206     
  Branches       189      227      +38     
===========================================
+ Hits          1047     1247     +200     
  Misses         276      276              
- Partials        77       83       +6

Impacted Files	Coverage Δ
mdpow/analysis/ensemble.py	`97.08% <97.08%> (ø)`


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c891be...6b19d49.

ALescoulie · 2021-08-03T21:17:15Z

Since we're planning on dropping Python 2 support (#177), I did not go out of my way to make the code Python 2 compatible.

orbeckst · 2021-08-05T04:50:55Z

It says that you changed 583 files. Is this true?

orbeckst · 2021-08-05T04:54:31Z

Have you integrated the docs already, i.e., added an entry to doc/sphinx/source/index.txt and a txt file that autodocs the module? I also recommend you make an analysis subdirectory in the docs where you can start adding documentation for the analysis itself.

I am commenting here because the diff is currently too noisy to review.

ALescoulie · 2021-08-05T05:43:23Z

@orbeckst I will defiantly start working on documentation. I understand that this is a large amount of new code; I'm also working on building more robust tests. Writing documentation will also coincide well with my report.

orbeckst · 2021-08-05T18:54:44Z

Good that the docs build, but check https://mdpow--179.org.readthedocs.build/en/179/analysis.html

reST is not markdown ;-)
code documentation is not included yet

see https://www.sphinx-doc.org/en/master/usage/quickstart.html and of course look at existing code and docs.

I also build the docs locally with

    conda install sphinx sphinx_rtd_theme    # only needed once
    python setup.py build_sphinx
    open build/sphinx/html/index.html    # works on macOS

orbeckst · 2021-08-05T18:56:29Z

I will defiantly start working on documentation

Oh-oh... ;-)

ALescoulie · 2021-08-06T01:13:42Z

@orbeckst just got the documentation pushed. Tried to base it off of other documentation I had seen on the site.

ALescoulie · 2021-08-06T01:15:11Z

Also, the number of files changed comes from the sloppy testing setup: I added a complete benzene simulation and have not yet gone through and set up more refined tests, so there are uncompressed xtc files in the testing dir.

orbeckst · 2021-08-09T23:52:17Z

Reduce the size of the test datasets

use something with at least 1 hydrogen bond instead of benzene
3 Coul, 4 VDW windows
trajectories only need to contain 4 frames
try to keep size < 2 MB
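One way to keep an eye on the size budget while trimming the data is a small helper that sums file sizes below a directory. This is a hypothetical sketch, not part of MDPOW; it counts MB as 10**6 bytes and the directory name in the usage line is only an example.

```python
import os


def directory_size_mb(path):
    """Total size of all files below *path*, in MB (10**6 bytes)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e6
```

For example, `directory_size_mb(os.path.join('mdpow', 'tests', 'testing_resources', 'states'))` could be compared against the 2 MB target after each round of trimming.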

orbeckst · 2021-08-17T21:02:33Z

I merged current develop, so this is now definitely Python 3 only.

ALescoulie · 2021-08-18T00:03:19Z

I merged current develop, so this is now definitely Python 3 only.

Thanks, this makes the process of merging a bit simpler. I've spent today running new tests on Tubiwan.

orbeckst · 2021-08-18T18:31:14Z

I merged develop into your branch; don't forget to git pull so that the history remains clean.

ALescoulie · 2021-08-22T02:09:41Z

@orbeckst I have files for water, but for octanol I'm getting some failures on the NPT. I know we wanted to have two solvents for testing, so that right now is the biggest delay. Aside from that I have two other issues to fix.

Add way to specify topology with kwarg
Increase coverage especially for universe building

orbeckst · 2021-08-22T06:16:11Z

What's the NPT failure?

ALescoulie · 2021-08-22T06:24:01Z

I posted it in the free energy methods channel on Slack. It was due to pressure scaling greater than 1%.

ALescoulie · 2021-08-24T01:00:45Z

@orbeckst and @VOD555 I think this pull request is done. I would still like to add the octanol test files, but at this point I am having GROMACS issues running it, so I will add them later. I also added a kwarg for specifying a topology and tested it. The one big issue remaining is the lack of a test for the exception handling in the _load_dir_unv function in Ensemble, but I don't anticipate multiple topologies per lambda directory to be an issue for most use cases.

ALescoulie · 2021-08-25T21:18:18Z

@orbeckst I made a few quick changes to the docs fixing the links and a typo, tell me if they need anything else.

orbeckst

I managed to review ~3/4 but still have some to do.

Please see inline comments and additional comments here:

Re-organize by creating an analysis/ subpackage, initially with __init__.py and ensemble.py but later with dihedral, first solvation shell, etc. code.
update CHANGELOG
bzip2 all gro files to reduce the file size (MDAnalysis can read compressed files) — you also need to adapt your code to also be able to find them
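Adapting the file search to compressed files can be as simple as matching a few extra suffixes. A minimal sketch, assuming the loader lists a directory and filters by name; `find_gro_files` and the pattern tuple are hypothetical, and only the fact that MDAnalysis can read compressed files comes from the comment above.

```python
import fnmatch
import os

# accepted GRO variants (assumed list): plain, bzip2- and gzip-compressed
GRO_PATTERNS = ("*.gro", "*.gro.bz2", "*.gro.gz")


def find_gro_files(dirname):
    """Return sorted names of (possibly compressed) GRO files in *dirname*."""
    return sorted(
        name for name in os.listdir(dirname)
        if any(fnmatch.fnmatch(name, pat) for pat in GRO_PATTERNS)
    )
```

Because the matched names are handed to MDAnalysis unchanged, no decompression step is needed in the loader itself.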

doc/sphinx/source/analysis.txt

mdpow/ensemble.py

orbeckst · 2021-08-27T01:23:26Z

@VOD555 can you please also start reviewing the PR? Thanks.

mdpow/analysis/ensemble.py

ALescoulie · 2021-08-28T00:12:15Z

@orbeckst @VOD555 I think I addressed the changes both of you requested.

orbeckst

Many of my previous comments are not visible on mdpow/analysis/ensemble.py but you can see them in the PR above on mdpow/ensemble.py. Please address those in addition to the new one. Thanks.

CHANGES

orbeckst · 2021-08-28T01:46:00Z

mdpow/tests/test_ensemble.py

+        self.old_path = os.getcwd()
+        self.resources = os.path.join(
+            self.old_path, 'mdpow', 'tests', 'testing_resources')


mdpow/tests/test_ensemble.py

doc/sphinx/source/index.txt

doc/sphinx/source/analysis.txt

doc/sphinx/source/ensemble.txt

doc/sphinx/source/analysis.txt

mdpow/tests/test_ensemble.py

orbeckst · 2021-08-28T02:00:12Z

@ALescoulie please add a comment to each of my comments when you fix it — either "done" or a reason why you don't want to or can't address it. This will make it easier for me when reviewing. Don't resolve them, I'll do this when I re-review. Thanks.

orbeckst

Thank you for addressing my previous comments.

My main issue is with _load_universe_from_dirs() and add_system(). As you see in the comments, I'd like _load_universe_from_dirs() to be a pure function (method) without side effects that simply returns a Universe. Then add_system() can be simplified to only add a Universe. This will make the code clearer and more robust.

It will also hopefully create less necessity for checking for NoDataError etc. In particular, get rid of the exception NoDataWarning. If we need an exception for this case, use NoDataError. See the comments — apologies, the comments are a bit sprawling as I was learning how the code worked.
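The requested split can be sketched as a side-effect-free loader that either returns a Universe or raises `NoDataError`, plus an `add_system` that only stores what it is given. Assumptions: the function name, the `make_universe` hook (for example `MDAnalysis.Universe`, injected so the sketch can be tested without real files), and the default `.tpr`/`.xtc` extensions are all hypothetical; unlike the PR's version, this sketch silently takes the first topology instead of logging a warning.

```python
import glob
import os

try:  # the exception class MDAnalysis itself uses for missing data
    from MDAnalysis.exceptions import NoDataError
except ImportError:  # fallback so this sketch also runs without MDAnalysis
    class NoDataError(ValueError):
        pass


def load_universe_from_dir(dirname, make_universe,
                           topology_ext=".tpr", trajectory_ext=".xtc"):
    """Build and return a Universe from the files in *dirname*.

    Pure function: it reads the directory but does not modify any
    Ensemble. *make_universe* is a hypothetical hook, for example
    ``MDAnalysis.Universe``, called as ``make_universe(top, *trajs)``.
    """
    tops = sorted(glob.glob(os.path.join(dirname, "*" + topology_ext)))
    trajs = sorted(glob.glob(os.path.join(dirname, "*" + trajectory_ext)))
    if not tops or not trajs:
        raise NoDataError(f"no topology or trajectory found in {dirname}")
    # several trajectories are passed together (MDAnalysis chains them)
    return make_universe(tops[0], *trajs)


def add_system(ensemble, key, universe):
    """Only store a ready-made Universe; all loading happens elsewhere."""
    ensemble[key] = universe
```

Since the loader never touches the Ensemble, a failure leaves the Ensemble unchanged, and the caller decides whether a missing directory is fatal.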

Compress almost all gro files as bz2, and leave one as gro and one as gz.
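The compression itself needs nothing beyond the standard library. A hypothetical helper (`compress` is not part of MDPOW; the `bzip2`/`gzip` command-line tools do the same job) that writes `path.bz2` or `path.gz` and removes the original:

```python
import bz2
import gzip
import os
import shutil


def compress(path, fmt="bz2"):
    """Compress *path* to path.bz2 or path.gz and delete the original."""
    opener = {"bz2": bz2.open, "gz": gzip.open}[fmt]
    target = f"{path}.{fmt}"
    with open(path, "rb") as src, opener(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so large files are fine too
    os.remove(path)
    return target
```

Calling it with `fmt="bz2"` on most files, `fmt="gz"` on one, and skipping one leaves all three input variants covered by the tests.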

select_systems() looks good for MDPOW's purposes: it can select for what we care and does not do anything that's too fancy. I like how you can create new Ensembles and EnsembleAtomGroups. This looks like the right approach.

Regarding rewriting the history of this PR: Let's finish it here where we have all the comments. Once the PR is approved, you can make a new, clean one if you like. But I can also just squash this one, which will get rid of intermediate commits, too.

orbeckst · 2021-09-06T19:57:34Z

mdpow/tests/test_ensemble.py

+    def test_build_exception(self):
+        ens = Ensemble()
+        with in_dir(os.path.join(self.tmpdir.name, 'FEP', 'test_solv'), create=False):
+            with pytest.raises(NoDataWarning):


We should not be calling an exception SomethingWarning. Exceptions are SomethingError, so the code that raises this exception should raise NoDataError. Once that is done, this test needs to be adapted.

mdpow/tests/test_ensemble.py

orbeckst · 2021-09-06T20:08:18Z

mdpow/analysis/ensemble.py

+
+        if not os.path.exists(dirname):
+            logger.error(f"Directory {dirname} does not exist")
+            raise FileNotFoundError


Give more information in the exception

Suggested change:

-            raise FileNotFoundError
+            raise FileNotFoundError(errno.ENOENT, "Directory does not exist", dirname)

done

edit: responded to wrong exception
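Passing `errno`, a message, and the path is what makes the exception informative: Python fills in `err.errno`, `err.strerror`, and `err.filename`, and the message shows the offending path. A small illustration, wrapped in a hypothetical `check_dir` helper:

```python
import errno
import os


def check_dir(dirname):
    """Raise an informative FileNotFoundError if *dirname* does not exist."""
    if not os.path.exists(dirname):
        raise FileNotFoundError(errno.ENOENT, "Directory does not exist", dirname)
    return dirname


try:
    check_dir("no/such/dir")
except FileNotFoundError as err:
    # prints: [Errno 2] Directory does not exist: 'no/such/dir'
    print(err)
```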

mdpow/analysis/ensemble.py

orbeckst · 2021-09-06T21:25:39Z

mdpow/analysis/ensemble.py

+            for k in self.keys():
+                if self[k] != other[k]:
+                    return False
+            return True


rewrite as

Suggested change:

-            for k in self.keys():
-                if self[k] != other[k]:
-                    return False
-            return True
+            return all(self[k] == other[k] for k in self.keys())

orbeckst · 2021-09-06T21:27:08Z

mdpow/analysis/ensemble.py

+        else:
+            return False


can be shortened

Suggested change:

-        else:
-            return False
+        return False
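Put together, the two suggestions give a compact `__eq__`. In this sketch `ComparableEnsemble` is hypothetical and the key-set comparison before `all()` is an assumption about the surrounding code; it matters because otherwise a missing key in `other` would raise `KeyError`, and extra keys in `other` would go unnoticed.

```python
class ComparableEnsemble:
    """Hypothetical container illustrating the simplified __eq__."""

    def __init__(self, systems=None):
        self._ensemble = dict(systems or {})

    def keys(self):
        return self._ensemble.keys()

    def __getitem__(self, key):
        return self._ensemble[key]

    def __eq__(self, other):
        # same type and same keys first, then compare members pairwise
        if isinstance(other, ComparableEnsemble) and self.keys() == other.keys():
            return all(self[k] == other[k] for k in self.keys())
        return False
```

`all()` short-circuits on the first mismatch, so it behaves exactly like the original loop with its early `return False`.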

ALescoulie · 2021-09-07T18:37:08Z

@orbeckst I finished addressing all your comments and got near-complete coverage on ensemble; the only gaps left are some partials.

orbeckst

Very good changes, code is much cleaner and excellent test coverage improvement.

A few minor comments (see inline), mostly docs.
Compress almost all of the gro files with bz2, one with gz, and leave one uncompressed.

orbeckst · 2021-09-09T01:05:44Z

mdpow/analysis/ensemble.py

+                int_dir = os.path.join(fep_dir, solvent, dirs)
+                with in_dir(int_dir, create=False):  # Entering attribute folders
+                    logger.info("Searching %s directory for systems", os.curdir)
+                    files: list = os.listdir(os.curdir)


Why the type annotation?

not really important, can be removed.

orbeckst · 2021-09-09T01:06:01Z

mdpow/analysis/ensemble.py

+
+        logs warning if more than one topology is in directory. If
+        more than one trajectory attempts to load both of them
+        in a universe if that fail will try to load each individually"""


Still needs to be added to doc string.

mdpow/analysis/ensemble.py

orbeckst · 2021-09-09T01:07:38Z

mdpow/analysis/ensemble.py

+        Takes specified key and either existing mda.Universe object or
+        trajectory and topology path. Ensure that paths are set to absolute


update doc string

mdpow/analysis/ensemble.py

orbeckst · 2021-09-09T01:17:00Z

@VOD555 can you please check if @ALescoulie satisfied your review requirements. If you're happy please approve, otherwise please ask for any other necessary changes.

ALescoulie · 2021-09-09T15:36:38Z

@orbeckst I got the files compressed and uploaded.

VOD555

I'm ok with the updates.

orbeckst · 2021-09-09T17:20:21Z

I am working on the squash merge... will be merged in <5 mins.

orbeckst · 2021-09-09T17:24:07Z

Hooray, it's in, congratulations @ALescoulie 🥳 !!!

* new Ensemble framework for aggregate analysis in mdpow.analysis
* part of Becksteinlab#168 (adding analysis to MDPOW)
* new submodule mdpow.analysis
* new Ensemble, EnsembleAtomGroup, and EnsembleAnalysis classes
* add docs (new section on Analysis)
* add tests including ensemble test data (water simulations, octanol to be added later)
* update CHANGES

ALescoulie mentioned this pull request Aug 3, 2021

Analysis Draft #169

Closed

ALescoulie linked an issue Aug 3, 2021 that may be closed by this pull request

Adding Analysis Module #168

Closed

3 tasks

ALescoulie added the enhancement label Aug 3, 2021

orbeckst added this to the 0.8.0 milestone Aug 4, 2021

ALescoulie requested review from orbeckst and VOD555 August 24, 2021 00:58

orbeckst requested changes Aug 27, 2021


VOD555 requested changes Aug 27, 2021


mdpow/analysis/ensemble.py (outdated)

mdpow/analysis/ensemble.py

orbeckst requested changes Aug 28, 2021


ALescoulie added 7 commits September 2, 2021 17:56

improve test_ensemble.py and ensemble.py exceptions

58739d2

Reorganize docs

f896243

update ensemble.py methods

54291b1

add test tpr

455b4c3

add test tpr

7b259fd

fix docs links

348ba2d

update manifest.yml

f609cd3

ALescoulie force-pushed the ensemble branch from 0443228 to f609cd3 September 3, 2021 01:00

Merge branch 'develop' into ensemble

6c4374b

orbeckst requested changes Sep 6, 2021


ALescoulie added 5 commits September 6, 2021 18:06

Restructure _load_universe_from_dir and add_universe methods

76cfca0

Merge remote-tracking branch 'origin/ensemble' into ensemble

406c713

fix _load_universe_from_dir and add kwargs option

fef50e7

improve select_systems

d7a9db2

simplify __eq__

dfe214e

orbeckst requested changes Sep 9, 2021


Compress test files and fix _load_universe_from_dir

1c03210

Update docs

6b19d49

VOD555 approved these changes Sep 9, 2021


ALescoulie mentioned this pull request Sep 9, 2021

Adding Analysis Module #168

Closed

3 tasks

orbeckst approved these changes Sep 9, 2021


orbeckst self-assigned this Sep 9, 2021

orbeckst merged commit 0b969cf into Becksteinlab:develop Sep 9, 2021

ALescoulie mentioned this pull request Sep 10, 2021

Dihedral Analysis #193

Merged

2 tasks

