Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ensemble Objects #179

Merged
merged 39 commits into from
Sep 9, 2021
Merged

Add Ensemble Objects #179

merged 39 commits into from
Sep 9, 2021

Conversation

ALescoulie
Copy link
Contributor

@ALescoulie ALescoulie commented Aug 3, 2021

Adding Ensemble and Ensemble Analysis

Created a new pull request just for the ensemble object so that the history would be less chaotic. I will build and commit the analysis classes in another fork and pull request. My previous draft request #169 was branched from my python 3 support branch making the commit history a nightmare, so I just created a new one.

I also decided to separate the ensemble objects into their own file.

Summary of New Objects

Ensemble

The Ensemble object is a collection of MDAnalyis Universe objects. It is intended to store the set of systems generated by running mdpow-fep.

The Ensemble object works by storing the systems in a dictionary and extending the functionality of a Universe object to a collection of universes. It when given a directory finds the simulation files, reads then loads them into a dictionary. The object can be indexed the same as a dictionary, and has methods analogous the the Universe object. The main one being select_atoms which returns a EnsembleAtomGroup.

An Ensemble in its current form can also be built by manually adding and popping universes into an empty instance.

benzene_dir = os.path.join('mdpow', 'tests', 'testing_resources', 'states', 'benzene')

Benzene = mdpow.ensemble.Ensemble(dirname=benzene_dir)

DihedralGroup = Benzene.select_atoms('name C1 or name C2 or name C3 or name C4')

EnsembleAtomGroup

The EnsembleAtomGroup is created by running the select_atoms method on an Ensemble. It stores AtomGroup selections of the groups generated by running select atom on individual universes in a dictionary with the same key structure as the parent Ensemble class.

It returns a copy of the parent Ensemble object when the ensemble method is run.

EnsembleAnalysis

The EnsebmleAnalysis is a class inspired by the AnalysisBase in MDAnalyis which iterates over the systems in the ensemble and the frames in the systems. It sets up both iterations between universes and universe frames allowing for analysis to be run on both whole systems and the frames of those systems. This allows for users to easily run analyses on MDPOW simulations.

Example workflow

    class DihedralAnalysis(mdpow.ensemble.EnsembleAnalysis):
        def __init__(self, DihedralEnsembleGroup):
            super(DihedralAnalysis, self).__init__(DihedralEnsembleGroup.ensemble())

            self._sel = DihedralEnsembleGroup

        def _prepare_ensemble(self):
            self.result_dict = {}
            for s in ['water', 'octanol']:
                self.result_dict[s] = {'Coulomb': {},
                                       'VDW': {}}
            for key in self._sel.group_keys():
                self.result_dict[key[0]][key[1]][key[2]] = None

        def _prepare_universe(self):
            self.angle_dict = {'angle': None,
                               'time': None}
            self.angles = []

        def _single_frame(self):
            angle = calc_dihedrals(self._sel[self._key].positions[0], self._sel[self._key].positions[1],
                                   self._sel[self._key].positions[2], self._sel[self._key].positions[3])
            self.angles.append(angle)

        def _conclude_universe(self):
            self.angle_dict['time'] = self.times
            self.angle_dict['angle'] = self.angles
            self.result_dict[self._key[0]][self._key[1]][self._key[2]] = self.angle_dict

        def _conclude_ensemble(self):
            self.results = pd.DataFrame(data=self.result_dict)

@ALescoulie ALescoulie mentioned this pull request Aug 3, 2021
@ALescoulie ALescoulie linked an issue Aug 3, 2021 that may be closed by this pull request
3 tasks
@codecov
Copy link

codecov bot commented Aug 3, 2021

Codecov Report

Merging #179 (6b19d49) into develop (8c891be) will increase coverage by 2.86%.
The diff coverage is 97.08%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #179      +/-   ##
===========================================
+ Coverage    74.78%   77.64%   +2.86%     
===========================================
  Files            9       10       +1     
  Lines         1400     1606     +206     
  Branches       189      227      +38     
===========================================
+ Hits          1047     1247     +200     
  Misses         276      276              
- Partials        77       83       +6     
Impacted Files Coverage Δ
mdpow/analysis/ensemble.py 97.08% <97.08%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c891be...6b19d49. Read the comment docs.

@ALescoulie
Copy link
Contributor Author

Since we're planning on dropping python 2 support #177 I did not go out of my way to make the python 2 cross compatible.

@orbeckst orbeckst added this to the 0.8.0 milestone Aug 4, 2021
@orbeckst
Copy link
Member

orbeckst commented Aug 5, 2021

It says that you changed 583 files. Is this true?

@orbeckst
Copy link
Member

orbeckst commented Aug 5, 2021

Have you integrated the docs already, i.e., added an entry to doc/sphinx/source/index.txt and an a txt file that autodocs the module? I also recommend you make an analysis subdirectory in the docs where you can start adding documentation for the analysis itself.

I am commenting here because the diff is currently to noisy to review.

@ALescoulie
Copy link
Contributor Author

@orbeckst I will defiantly start working on documentation, I understand that this is a large amount of new code, I'm also working on building more robust testing. Writing documentation will also coincide well with my report.

@orbeckst
Copy link
Member

orbeckst commented Aug 5, 2021

Good that the docs builds but check https://mdpow--179.org.readthedocs.build/en/179/analysis.html

  • reST is not markdown ;-)
  • code documentation is not included yet

see https://www.sphinx-doc.org/en/master/usage/quickstart.html and of course look at existing code and docs.

I also build the docs locally with

conda install sphinx sphinx_rtd_theme    # only needed once
python setup.py build_sphinx
open build/sphinx/html/index.html    # works on macOS

@orbeckst
Copy link
Member

orbeckst commented Aug 5, 2021

I will defiantly start working on documentation

Oh-oh... ;-)

@ALescoulie
Copy link
Contributor Author

@orbeckst just got the documentation pushed. Tried to base it off of other documentation I had seen on the site.

@ALescoulie
Copy link
Contributor Author

Also the number of files changed if from the sloppy testing setup, I added a complete benzene simulation and have not gone through and set up more refined tests yet so there are uncompressed xtc files in the testing dir.

@orbeckst
Copy link
Member

orbeckst commented Aug 9, 2021

Reduce the size of the test datasets

  • use something with at least 1 hydrogen bond instead of benzene
  • 3 Coul, 4 VDW windows
  • trajectories only need to contain 4 frames
  • try to keep size < 2 MB

@orbeckst
Copy link
Member

I merged current develop, so this is now definitely Python 3 only.

@ALescoulie
Copy link
Contributor Author

I merged current develop, so this is now definitely Python 3 only.

Thanks, This makes the process of merging a bit simpler. I've spent today running new tests on Tubiwan.

@orbeckst
Copy link
Member

I merged develop into your branch, don't forget to git pull so that the history remains clean.

@ALescoulie
Copy link
Contributor Author

ALescoulie commented Aug 22, 2021

@orbeckst I have files for water, but for octanol I'm getting some failures on the NPT. I know we wanted to have two solvents for testing, so that right now is the biggest delay. Aside from that I have two other issues to fix.

  • Add way to specify topology with kwarg
  • Increase coverage especially for universe building

@orbeckst
Copy link
Member

What's the NPT failure?

@ALescoulie
Copy link
Contributor Author

I posted it in free energy methods in slack. It was due to pressure scaling greater than 1%

@ALescoulie ALescoulie requested review from orbeckst and VOD555 August 24, 2021 00:58
@ALescoulie
Copy link
Contributor Author

@orbeckst and @VOD555 I think this request is done. I still would like to get the octanol test files added, but at this point am having gromacs issues running it, so will add them later. I also added a kwarg for specifying a topology, and tested it. The one big issue still remaining is the lack of a test on the exception handling on the _load_dir_unv function in Ensemble, but I don't anticipate multiple topologies per lambda directory to be a issue for most use cases.

@ALescoulie
Copy link
Contributor Author

@orbeckst I made a few quick changes to the docs fixing the links and a typo, tell me if they need anything else.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to review ~3/4 but still have some to do.

Please see inline comments and additional comments here:

  1. Re-organize by creating a analysis/ subpackage, initially with __init__.py and ensemble.py but later with dihedral, first solvation shell, etc code.
  2. update CHANGELOG
  3. bzip2 all gro files to reduce the file size (MDAnalysis can read compressed files) — you also need to adapt your code to also be able to find them

doc/sphinx/source/analysis.txt Outdated Show resolved Hide resolved
doc/sphinx/source/analysis.txt Show resolved Hide resolved
doc/sphinx/source/analysis.txt Outdated Show resolved Hide resolved
doc/sphinx/source/analysis.txt Outdated Show resolved Hide resolved
doc/sphinx/source/analysis.txt Show resolved Hide resolved
mdpow/ensemble.py Outdated Show resolved Hide resolved
mdpow/ensemble.py Outdated Show resolved Hide resolved
mdpow/ensemble.py Outdated Show resolved Hide resolved
mdpow/ensemble.py Outdated Show resolved Hide resolved
mdpow/ensemble.py Outdated Show resolved Hide resolved
@orbeckst
Copy link
Member

@VOD555 can you please also start reviewing the PR? Thanks.

mdpow/analysis/ensemble.py Outdated Show resolved Hide resolved
mdpow/analysis/ensemble.py Show resolved Hide resolved
@ALescoulie
Copy link
Contributor Author

@orbeckst @VOD555 I think I addressed the changes both of you requested.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of my previous comments are not visible on mdpow/analysis/ensemble.py but you can see them in the PR above on mdpow/ensemble.py. Please address those in addition to the new one. Thanks.

CHANGES Outdated Show resolved Hide resolved
Comment on lines 36 to 38
self.old_path = os.getcwd()
self.resources = os.path.join(
self.old_path, 'mdpow', 'tests', 'testing_resources')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still todo

mdpow/tests/test_ensemble.py Outdated Show resolved Hide resolved
doc/sphinx/source/index.txt Outdated Show resolved Hide resolved
doc/sphinx/source/analysis.txt Outdated Show resolved Hide resolved
doc/sphinx/source/ensemble.txt Outdated Show resolved Hide resolved
doc/sphinx/source/analysis.txt Show resolved Hide resolved
mdpow/tests/test_ensemble.py Show resolved Hide resolved
@orbeckst
Copy link
Member

orbeckst commented Aug 28, 2021

@ALescoulie please add a comment to each of my comments when you fix it — either "done" or a reason why you don't want to or can't address it. This will make it easier for me when reviewing. Don't resolve them, I'll do this when I re-review. Thanks.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my previous comments.

My main issue is with _load_universe_from_dirs() and add_system(). As you see in the comments, I'd like _load_universe_from_dirs() to be a pure function (method) without side effects that simply returns a Universe. Then add_system() can be simplified to only add a Universe. This will make the code clearer and more robust.

It will also hopefully create less necessity for checking for NoDataError etc. In particular, get rid of the exception NoDataWarning. If we need an exception for this case, use NoDataError. See the comments — apologies, the comments are a bit sprawling as I was learning how the code worked.

Compress almost all gro files as bz2, and leave one as gro and one as gz.

select_systems() looks good for MDPOW's purposes: it can select for what we care and does not do anything that's too fancy. I like how you can create new Ensembles and EnsembleAtomGroups. This looks like the right approach.

Regarding rewriting the history of this PR: Let's finish it here where we have all the comments. Once the PR is approved, you can make a new, clean one if you like. But I can also just squash this one, which will get rid of intermediate commits, too.

def test_build_exception(self):
ens = Ensemble()
with in_dir(os.path.join(self.tmpdir.name, 'FEP', 'test_solv'), create=False):
with pytest.raises(NoDataWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be calling an exception SomethingWarning. Exceptions are SomethingError, so the code that raises this exception should raise NoDataError. Once that is done, this test needs to be adapted.

mdpow/tests/test_ensemble.py Show resolved Hide resolved

if not os.path.exists(dirname):
logger.error(f"Directory {dirname} does not exist")
raise FileNotFoundError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give more information in the exception

Suggested change
raise FileNotFoundError
raise FileNotFoundError(errno.ENOENT, "Directory does not exist", dirname)

Copy link
Contributor Author

@ALescoulie ALescoulie Sep 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

edit: responded to wrong exception

mdpow/analysis/ensemble.py Show resolved Hide resolved
mdpow/analysis/ensemble.py Show resolved Hide resolved
mdpow/analysis/ensemble.py Show resolved Hide resolved
mdpow/analysis/ensemble.py Show resolved Hide resolved
mdpow/analysis/ensemble.py Show resolved Hide resolved
Comment on lines 309 to 312
for k in self.keys():
if self[k] != other[k]:
return False
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite as

Suggested change
for k in self.keys():
if self[k] != other[k]:
return False
return True
return all(self[k] == other[k] for k in self.keys())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 313 to 314
else:
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be shortened

Suggested change
else:
return False
return False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@ALescoulie
Copy link
Contributor Author

@orbeckst I finished addressing all your comments and got it near complete coverage on ensemble with the only thing being some partials.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good changes, code is much cleaner and excellent test coverage improvement.

  • A few minor comments (see inline), mostly docs.
  • Compress almost all of the gro files with bz2, one with gz, and leave one uncompressed.

int_dir = os.path.join(fep_dir, solvent, dirs)
with in_dir(int_dir, create=False): # Entering attribute folders
logger.info("Searching %s directory for systems", os.curdir)
files: list = os.listdir(os.curdir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the type annotation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really important, can be removed.


logs warning if more than one topology is in directory. If
more than one trajectory attempts to load both of them
in a universe if that fail will try to load each individually"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needs to be added to doc string.

mdpow/analysis/ensemble.py Show resolved Hide resolved
Comment on lines 208 to 209
Takes specified key and either existing mda.Universe object or
trajectory and topology path. Ensure that paths are set to absolute
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update doc string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

mdpow/analysis/ensemble.py Show resolved Hide resolved
@orbeckst
Copy link
Member

orbeckst commented Sep 9, 2021

@VOD555 can you please check if @ALescoulie satisfied your review requirements. If you're happy please approve, otherwise please ask for any other necessary changes.

@ALescoulie
Copy link
Contributor Author

@orbeckst I got the files compressed and uploaded.

Copy link
Contributor

@VOD555 VOD555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the updates.

@ALescoulie ALescoulie mentioned this pull request Sep 9, 2021
3 tasks
@orbeckst orbeckst self-assigned this Sep 9, 2021
@orbeckst
Copy link
Member

orbeckst commented Sep 9, 2021

I am working on the squash merge... will be merged in <5 mins.

@orbeckst orbeckst merged commit 0b969cf into Becksteinlab:develop Sep 9, 2021
@orbeckst
Copy link
Member

orbeckst commented Sep 9, 2021

Hooray, it's in, congratulations @ALescoulie 🥳 !!!

ALescoulie added a commit to ALescoulie/MDPOW that referenced this pull request Sep 9, 2021
* new Ensemble framework for aggregate analysis in mdpow.analysis
* part of Becksteinlab#168 (adding analysis to MDPOW)
* new submodule mdpow.analysis
* new Ensemble, EnsembleAtomGroup, and EnsembleAnalysis classes
* add docs (new section on Analysis)
* add tests including ensemble test data (water simulations, octanol to be added later)
* update CHANGES
@ALescoulie ALescoulie mentioned this pull request Sep 10, 2021
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding Analysis Module
3 participants