Merge pull request #112 from valence-labs/3bpa
3bpa dataset
FNTwin authored Aug 30, 2024
2 parents dab04ef + 3aa2796 commit 5015a2e
Showing 21 changed files with 379 additions and 134 deletions.
91 changes: 61 additions & 30 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,25 @@
# openQDC

Open Quantum Data Commons
<div align="center">
<img src="docs/assets/logo-title.png" width="100%">
</div>

<p align="center">
<b>openQDC - Open Quantum Data Commons </b> <br />
</p>
<p align="center">
<a href="https://docs.openqdc.io/" target="_blank">
Docs
</a> |
<a href="https://openqdc.io/" target="_blank">
Homepage
</a>
</p>

---

[![license](https://licensebuttons.net/l/by-nc/4.0/80x15.png)](https://github.com/valence-labs/openQDC/blob/main/LICENSE)

### Installing openQDC

```bash
git clone git@github.com:OpenDrugDiscovery/openQDC.git
cd openQDC
@@ -57,41 +74,55 @@ We provide support for the following publicly available QM Potential Energy Data

# Potential Energy

| Dataset | # Molecules | # Conformers | Average Conformers per Molecule | Force Labels | Atom Types | QM Level of Theory | Off-Equilibrium Conformations|
| --- | --- | --- | --- | --- | --- | --- | --- |
| [ANI](https://pubs.rsc.org/en/content/articlelanding/2017/SC/C6SC05720A) | 57,462 | 20,000,000 | 348 | No | 4 | ωB97x:6-31G(d) | Yes |
| [GEOM](https://www.nature.com/articles/s41597-022-01288-4) | 450,000 | 37,000,000 | 82 | No | 18 | GFN2-xTB | No |
| [Molecule3D](https://arxiv.org/abs/2110.01717) | 3,899,647 | 3,899,647 | 1 | No | 5 | B3LYP/6-31G* | No |
| [NablaDFT](https://pubs.rsc.org/en/content/articlelanding/2022/CP/D2CP03966D) | 1,000,000 | 5,000,000 | 5 | No | 6 | ωB97X-D/def2-SVP | |
| [OrbNet Denali](https://arxiv.org/abs/2107.00299) | 212,905 | 2,300,000 | 11 | No | 16 | GFN1-xTB | Yes |
| [PCQM_PM6](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00740) | | | 1| No| | PM6 | No
| [PCQM_B3LYP](https://arxiv.org/abs/2305.18454) | 85,938,443|85,938,443 | 1| No| | B3LYP/6-31G* | No
| [QMugs](https://www.nature.com/articles/s41597-022-01390-7) | 665,000 | 2,000,000 | 3 | No | 10 | GFN2-xTB, ωB97X-D/def2-SVP | No |
| [QM7X](https://www.nature.com/articles/s41597-021-00812-2) | 6,950 | 4,195,237 | 603 | Yes | 7 | PBE0+MBD | Yes |
| [SN2RXN](https://pubs.acs.org/doi/10.1021/acs.jctc.9b00181) | 39 | 452709 | 11,600 | Yes | 6 | DSD-BLYP-D3(BJ)/def2-TZVP | |
| [SolvatedPeptides](https://doi.org/10.1021/acs.jctc.9b00181) | | 2,731,180 | | Yes | | revPBE-D3(BJ)/def2-TZVP | |
| [Spice](https://arxiv.org/abs/2209.10702) | 19,238 | 1,132,808 | 59 | Yes | 15 | ωB97M-D3(BJ)/def2-TZVPPD | Yes |
| [tmQM](https://pubs.acs.org/doi/10.1021/acs.jcim.0c01041) | 86,665 | 86,665| 1| No | | TPSSh-D3BJ/def2-SVP | |
| [Transition1X](https://www.nature.com/articles/s41597-022-01870-w) | | 9,654,813| | Yes | | ωB97x/6–31 G(d) | Yes |
| [WaterClusters](https://doi.org/10.1063/1.5128378) | 1 | 4,464,740| | No | 2 | TTM2.1-F | Yes|

| Dataset | # Molecules | # Conformers | Average Conformers per Molecule | Force Labels | Atom Types | QM Level of Theory | Off-Equilibrium Conformations |
| ----------------------------------------------------------------------------- | ----------- | ------------ | ------------------------------- | ------------ | ---------- | -------------------------- | ----------------------------- |
| [ANI](https://pubs.rsc.org/en/content/articlelanding/2017/SC/C6SC05720A) | 57,462 | 20,000,000 | 348 | No | 4 | ωB97x:6-31G(d) | Yes |
| [GEOM](https://www.nature.com/articles/s41597-022-01288-4) | 450,000 | 37,000,000 | 82 | No | 18 | GFN2-xTB | No |
| [Molecule3D](https://arxiv.org/abs/2110.01717) | 3,899,647 | 3,899,647 | 1 | No | 5 | B3LYP/6-31G\* | No |
| [NablaDFT](https://pubs.rsc.org/en/content/articlelanding/2022/CP/D2CP03966D) | 1,000,000 | 5,000,000 | 5 | No | 6 | ωB97X-D/def2-SVP | |
| [OrbNet Denali](https://arxiv.org/abs/2107.00299) | 212,905 | 2,300,000 | 11 | No | 16 | GFN1-xTB | Yes |
| [PCQM_PM6](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c00740) | | | 1 | No | | PM6 | No |
| [PCQM_B3LYP](https://arxiv.org/abs/2305.18454) | 85,938,443 | 85,938,443 | 1 | No | | B3LYP/6-31G\* | No |
| [QMugs](https://www.nature.com/articles/s41597-022-01390-7) | 665,000 | 2,000,000 | 3 | No | 10 | GFN2-xTB, ωB97X-D/def2-SVP | No |
| [QM7X](https://www.nature.com/articles/s41597-021-00812-2) | 6,950 | 4,195,237 | 603 | Yes | 7 | PBE0+MBD | Yes |
| [SN2RXN](https://pubs.acs.org/doi/10.1021/acs.jctc.9b00181) | 39 | 452709 | 11,600 | Yes | 6 | DSD-BLYP-D3(BJ)/def2-TZVP | |
| [SolvatedPeptides](https://doi.org/10.1021/acs.jctc.9b00181) | | 2,731,180 | | Yes | | revPBE-D3(BJ)/def2-TZVP | |
| [Spice](https://arxiv.org/abs/2209.10702) | 19,238 | 1,132,808 | 59 | Yes | 15 | ωB97M-D3(BJ)/def2-TZVPPD | Yes |
| [tmQM](https://pubs.acs.org/doi/10.1021/acs.jcim.0c01041) | 86,665 | 86,665 | 1 | No | | TPSSh-D3BJ/def2-SVP | |
| [Transition1X](https://www.nature.com/articles/s41597-022-01870-w) | | 9,654,813 | | Yes | | ωB97x/6–31 G(d) | Yes |
| [WaterClusters](https://doi.org/10.1063/1.5128378) | 1 | 4,464,740 | | No | 2 | TTM2.1-F | Yes |

# Interaction energy

We also provide support for the following publicly available QM Noncovalent Interaction Energy Datasets.

| Dataset |
| --- |
| [DES370K](https://www.nature.com/articles/s41597-021-00833-x) |
| [DES5M](https://www.nature.com/articles/s41597-021-00833-x) |
| Dataset |
| ------------------------------------------------------------------------------------------------------------------- |
| [DES370K](https://www.nature.com/articles/s41597-021-00833-x) |
| [DES5M](https://www.nature.com/articles/s41597-021-00833-x) |
| [Metcalf](https://pubs.aip.org/aip/jcp/article/152/7/074103/1059677/Approaches-for-machine-learning-intermolecular) |
| [DESS66](https://www.nature.com/articles/s41597-021-00833-x) |
| [DESS66x8](https://www.nature.com/articles/s41597-021-00833-x) |
| [Splinter](https://www.nature.com/articles/s41597-023-02443-1) |
| [X40](https://pubs.acs.org/doi/10.1021/ct300647k) |
| [L7](https://pubs.acs.org/doi/10.1021/ct400036b) |
| [DESS66](https://www.nature.com/articles/s41597-021-00833-x) |
| [DESS66x8](https://www.nature.com/articles/s41597-021-00833-x) |
| [Splinter](https://www.nature.com/articles/s41597-023-02443-1) |
| [X40](https://pubs.acs.org/doi/10.1021/ct300647k) |
| [L7](https://pubs.acs.org/doi/10.1021/ct400036b) |

# CI Status

The CI runs tests and performs code quality checks for the following combinations:

- The three major platforms: Windows, OSX and Linux.
- The four latest Python versions.

| | `main` |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Lib build & Testing | [![test](https://github.com/valence-labs/openQDC/actions/workflows/test.yml/badge.svg)](https://github.com/valence-labs/openQDC/actions/workflows/test.yml) |
| Code Sanity (linting and type analysis) | [![code-check](https://github.com/valence-labs/openQDC/actions/workflows/code-check.yml/badge.svg)](https://github.com/valence-labs/openQDC//actions/workflows/code-check.yml) |
| Documentation Build | [![doc](https://github.com/valence-labs/openQDC/actions/workflows/doc.yml/badge.svg)](https://github.com/valence-labs/openQDC/actions/workflows/doc.yml) |
| Pre-Commit | [![pre-commit](https://github.com/valence-labs/openQDC/actions/workflows/pre-commit-ci.yml/badge.svg)](https://github.com/valence-labs/openQDC/actions/workflows/pre-commit-ci.yml) |

# How to cite

All data presented in the OpenQDC are already published in scientific journals, full reference to the respective paper is attached to each dataset class. When citing data obtained from OpenQDC, you should cite both the original paper(s) the data come from and our paper on OpenQDC itself. The reference is:

ADD REF HERE LATER
1 change: 1 addition & 0 deletions docs/API/e0_dispatcher.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
::: openqdc.datasets.energies
3 changes: 3 additions & 0 deletions docs/API/properties.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Defined properties for datasets

:::openqdc.datasets.properties
1 change: 1 addition & 0 deletions docs/API/statistics.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
::: openqdc.datasets.statistics
Binary file added docs/assets/logo-title.png
8 changes: 4 additions & 4 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -9,17 +9,17 @@ OpenQDC is a python library to work with quantum datasets. It's a package aimed
- 🧠 Performance matters: read and write multiple formats (memmap, zarr, xyz, etc).
- 📈 Data: have access to 1.5+ billion datapoints

Visit our website at TOFILL <IDK>.
Visit our website at https://openqdc.io .

## Installation

Use mamba:

```bash
mamba install -c conda-forge openqdc
conda install -c conda-forge openqdc
```

_**Tips:** You can replace `mamba` by `conda`._
_**Tips:** You can replace `conda` by `mamba`._

_**Note:** We highly recommend using a [Conda Python distribution](https://github.com/conda-forge/miniforge) to install OpenQDC. The package is also pip installable if you need it: `pip install openqdc`._

@@ -58,7 +58,7 @@ dataset.calculate_descriptors(

## How to cite

Please cite OpenQDC if you use it in your research: [![DOI](zenodo_badge)](zenodo_link).
Please cite OpenQDC if you use it in your research: [![Pending Publication](Pending Publication)](Pending Publication).

## Compatibilities

8 changes: 8 additions & 0 deletions docs/usage.md
Original file line number Diff line number Diff line change
@@ -37,6 +37,14 @@ for data in dataset.as_iter(atoms=True):
break
```

or if you want to just iterate over the data:

```python
for data in dataset:
print(data) # dict of arrays
break
```

## Lazy loading

OpenQDC uses lazy loading to dynamically expose all its API without imposing a long import time during `import openqdc as qdc`. In case of trouble you can always disable lazy loading by setting the environment variable `OPENQDC_DISABLE_LAZY_LOADING` to `1`.
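
As a rough illustration of the lazy-loading idea described above, the sketch below builds a throwaway module whose attributes are imported only on first access, using a module-level `__getattr__` (PEP 562). This is not OpenQDC's actual implementation: `make_lazy_module`, the `demo` module, the name-to-module mapping, and the `DEMO_DISABLE_LAZY_LOADING` variable are all hypothetical stand-ins, and standard-library modules play the role of dataset modules.

```python
import importlib
import os
import types

# Hypothetical mapping: public name -> module that really defines it.
# Standard-library modules stand in for dataset modules here.
LAZY_OBJECTS = {"sqrt": "math", "dumps": "json"}


def make_lazy_module(name, mapping, disable_env="DEMO_DISABLE_LAZY_LOADING"):
    """Return a module object that imports its attributes on first access."""
    module = types.ModuleType(name)

    def _load(attr):
        # Import the defining module and cache the object on our module.
        value = getattr(importlib.import_module(mapping[attr]), attr)
        setattr(module, attr, value)
        return value

    def __getattr__(attr):
        # Called only when normal attribute lookup fails (PEP 562).
        if attr not in mapping:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        return _load(attr)

    module.__getattr__ = __getattr__
    module.__dir__ = lambda: sorted(mapping)

    # Assumed opt-out switch: resolve everything eagerly when set to "1".
    if os.environ.get(disable_env) == "1":
        for attr in mapping:
            _load(attr)
    return module


demo = make_lazy_module("demo", LAZY_OBJECTS)
print("sqrt" in vars(demo))  # is the name resolved before first access?
print(demo.sqrt(9.0))        # first access triggers the import
print("sqrt" in vars(demo))  # is it cached on the module afterwards?
```

The design choice is that importing the package stays cheap because nothing heavy is loaded until a name is actually used, while an environment switch keeps an eager fallback for debugging.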
1 change: 1 addition & 0 deletions env.yml
Original file line number Diff line number Diff line change
@@ -14,6 +14,7 @@ dependencies:
- s3fs
- pydantic
- python-dotenv
- httpx


# Scientific
10 changes: 7 additions & 3 deletions mkdocs.yml
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
site_name: "OpenQDC"
site_description: "I don't know... Something about data and Quantum stuff I guess :D"
site_description: "Harness the power of quantum chemistry in one line of code."
repo_url: "https://github.com/valence-labs/openQDC"
repo_name: "openQDC"
copyright: Copyright 2023 Valence Labs
copyright: Copyright 2024 Valence Labs

site_url: "https://github.com/valence-labs/openQDC"
remote_branch: "gh-pages"
@@ -25,7 +25,11 @@ nav:
- API:
- QM methods: API/methods.md
- Normalization regressor: API/regressor.md
- Main class: API/basedataset.md
- Main classes:
- BaseDataset: API/basedataset.md
- Available Properties: API/properties.md
- e0 Dispatcher: API/e0_dispatcher.md
- Statistics: API/statistics.md
- Format loading: API/formats.md
- Datasets:
- Potential Energy:
2 changes: 2 additions & 0 deletions openqdc/__init__.py
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@ def get_project_root():
"ANI1CCX_V2": "openqdc.datasets.potential.ani",
"ANI1X": "openqdc.datasets.potential.ani",
"ANI2X": "openqdc.datasets.potential.ani",
"BPA": "openqdc.datasets.potential.bpa",
"Spice": "openqdc.datasets.potential.spice",
"SpiceV2": "openqdc.datasets.potential.spice",
"SpiceVL2": "openqdc.datasets.potential.spice",
@@ -118,6 +119,7 @@ def __dir__():
# POTENTIAL
from .datasets.potential.alchemy import Alchemy
from .datasets.potential.ani import ANI1, ANI1CCX, ANI1CCX_V2, ANI1X, ANI2X
from .datasets.potential.bpa import BPA
from .datasets.potential.comp6 import COMP6
from .datasets.potential.dummy import Dummy, PredefinedDataset
from .datasets.potential.gdml import GDML
8 changes: 7 additions & 1 deletion openqdc/datasets/base.py
Original file line number Diff line number Diff line change
@@ -237,7 +237,13 @@ def force_methods(self):
return list(compress(self.energy_methods, self.force_mask))

@property
def e0s_dispatcher(self):
def e0s_dispatcher(self) -> AtomEnergies:
"""
Property to get the object that dispatched the isolated atom energies of the QM methods.
Returns:
Object wrapping the isolated atom energies of the QM methods.
"""
if not hasattr(self, "_e0s_dispatcher"):
# Automatically fetch/compute formation or regression energies
self._e0s_dispatcher = AtomEnergies(self, **self.regressor_kwargs)
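
The `hasattr` check in this hunk is a lazy-initialisation pattern: the dispatcher is built on first access and stored on the instance. A minimal stand-alone sketch of that pattern follows, assuming the stored object is returned on every call (the rest of the property is not shown in this hunk). `FakeEnergies` and `FakeDataset` are hypothetical stand-ins for `AtomEnergies` and the dataset class, and the `regressor_kwargs` content is purely illustrative.

```python
class FakeEnergies:
    """Hypothetical stand-in for AtomEnergies; counts how often it is built."""

    created = 0

    def __init__(self, dataset, **kwargs):
        FakeEnergies.created += 1
        self.dataset = dataset
        self.kwargs = kwargs


class FakeDataset:
    """Hypothetical dataset exposing a lazily built dispatcher."""

    regressor_kwargs = {"example_option": True}  # illustrative only

    @property
    def e0s_dispatcher(self) -> FakeEnergies:
        if not hasattr(self, "_e0s_dispatcher"):
            # Built once, on first access, then cached on the instance.
            self._e0s_dispatcher = FakeEnergies(self, **self.regressor_kwargs)
        return self._e0s_dispatcher


ds = FakeDataset()
first = ds.e0s_dispatcher
second = ds.e0s_dispatcher
print(first is second, FakeEnergies.created)
```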