Icml push #102

Closed
wants to merge 64 commits into from
64 commits
eed4278
Updated gitignore
Jun 26, 2024
c6df5c1
1St implementation zarr reading
Jun 26, 2024
c80f431
working zarr download
Jun 26, 2024
b0a1833
VQM24
Jun 26, 2024
8e88e13
protein fragments + alchemy
Jun 26, 2024
238342f
bunch of datasets
Jun 26, 2024
1ab09ea
QM8
Jun 26, 2024
ea59515
QM7 QM7b QM9
Jun 26, 2024
0c60be6
WIP
Jun 30, 2024
b221f72
Add Memmap and Zarr dataset types
Jul 8, 2024
d031b13
Dataset structure factory
Jul 9, 2024
d2ecb2b
QM7X patch, fixes + donwload cli WIP for gs
FNTwin Jul 9, 2024
31c0b1c
Fixes
FNTwin Jul 9, 2024
dd7ae10
if mistake
FNTwin Jul 9, 2024
dba3607
Waterclusters preprocessing
FNTwin Jul 10, 2024
e44281f
Docstrings+type hinting of various methods
FNTwin Jul 10, 2024
77fa786
Mkdocs docstrings google + import dataset
FNTwin Jul 10, 2024
2a186fc
black .
FNTwin Jul 10, 2024
e5d5e9b
WIP
FNTwin Jul 10, 2024
f0141ed
WIP docs
FNTwin Jul 11, 2024
844e48c
Docs versioning and mike
FNTwin Jul 11, 2024
8abb1b7
CLI docs
FNTwin Jul 11, 2024
bbe9f3a
CLI formatting
FNTwin Jul 11, 2024
48effde
Updated docstrings
shenoynikhil Jul 11, 2024
802881e
added more docstrings
shenoynikhil Jul 12, 2024
4bdab0c
added docstrings for ani
shenoynikhil Jul 12, 2024
8a15ab4
Added des dataset docstrings
shenoynikhil Jul 12, 2024
d3d6eb3
Updated l7 and splinter
shenoynikhil Jul 12, 2024
dac9308
License + docs + docstrings for ProteinFr, WtrCls, ScanWtr
FNTwin Jul 12, 2024
7d74de1
Merge branch 'tags' into docstring-addn
shenoynikhil Jul 12, 2024
2fa43b2
added metcalf
shenoynikhil Jul 12, 2024
ddd0509
Added qm9 and x40
shenoynikhil Jul 12, 2024
3fc5c7d
Normalization entry doc, regressor API
FNTwin Jul 12, 2024
e59192f
pre-commit change
shenoynikhil Jul 12, 2024
44fcf16
circular import removal for tests
FNTwin Jul 12, 2024
a86b070
fix pre-commit
shenoynikhil Jul 12, 2024
98ebb91
Merge remote-tracking branch 'origin/tags' into docstring-addn
shenoynikhil Jul 12, 2024
0e95e17
Iterator, better docs, storage view
FNTwin Jul 15, 2024
d34f08a
pre commit
FNTwin Jul 15, 2024
9845a1a
New license format, better naming, moved CLI into usage
FNTwin Jul 15, 2024
f4d13f8
Macos + windows matrix
FNTwin Jul 15, 2024
44efba3
multiple PR templates, add dataset init
FNTwin Jul 15, 2024
fb10590
File name changed, dataset add md
FNTwin Jul 15, 2024
7ec4164
Docstrings for methods
FNTwin Jul 15, 2024
150842a
Strict docs generation, completed type hinting
FNTwin Jul 16, 2024
65f971c
py3.8 type hinting, removed preprocessing.py
FNTwin Jul 16, 2024
bf18538
Merge pull request #104 from valence-labs/tags
FNTwin Jul 16, 2024
69de4da
tuple->Tuple
FNTwin Jul 16, 2024
e36351c
Solver selection abstraction
FNTwin Jul 16, 2024
8824e1e
Merge branch 'docstring-addn' into icml_push
FNTwin Jul 16, 2024
6acf7fc
Docstrings formatting
FNTwin Jul 16, 2024
a89a672
Change in API from s3 to https
Jul 16, 2024
d291515
Removed polyfill again ? How?
FNTwin Jul 16, 2024
863914a
Merge remote-tracking branch 'origin/icml_push' into docstring-addn
shenoynikhil Jul 17, 2024
d3a3978
Merge pull request #103 from valence-labs/docstring-addn
shenoynikhil Jul 17, 2024
4ee1cc1
S3 API
Jul 18, 2024
4da4736
Download test
Jul 18, 2024
f67c079
Merge branch 'icml_push' of https://github.com/OpenDrugDiscovery/open…
Jul 18, 2024
1dc4b51
small markdown add
Jul 18, 2024
c51d040
windows fix
Jul 18, 2024
8eb1ec5
Disabled windows test for now
Jul 18, 2024
3bca7b6
Skip stats option + np.float32 for NablaDFT+PCQM
Jul 22, 2024
ccfaa28
XYZDataset fix
Jul 22, 2024
41c679a
Merge pull request #105 from valence-labs/skip_stats
FNTwin Jul 22, 2024
8 changes: 4 additions & 4 deletions .github/workflows/test.yml
@@ -16,8 +16,8 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12"]
-        os: ["ubuntu-latest"]
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
+        os: ["ubuntu-latest", "macos-latest"] #,"windows-latest"

     runs-on: ${{ matrix.os }}
     timeout-minutes: 30
@@ -53,5 +53,5 @@ jobs:
       - name: Run tests
         run: python -m pytest

-      #- name: Test building the doc
-      #  run: mkdocs build
+      - name: Test building the doc
+        run: mkdocs build
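GitHub Actions runs one job per combination of the matrix values, so the updated matrix yields 5 Python versions times 2 operating systems. The sketch below only illustrates that expansion; `expand_matrix` is a hypothetical helper and not part of the workflow or of openQDC.

```python
from itertools import product
from typing import Dict, List

# Values copied from the updated workflow; "windows-latest" is commented out there.
MATRIX: Dict[str, List[str]] = {
    "python-version": ["3.8", "3.9", "3.10", "3.11", "3.12"],
    "os": ["ubuntu-latest", "macos-latest"],
}


def expand_matrix(matrix: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return one dict per job, i.e. the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


jobs = expand_matrix(MATRIX)
print(len(jobs))  # 10
```

Re-enabling `windows-latest` would raise the job count to 15.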
3 changes: 3 additions & 0 deletions .gitignore
@@ -149,3 +149,6 @@ cookie.txt
 *.txt
 *.sh
 .DS_Store
+*.zarr/
+scripts/
+notebooks/
352 changes: 352 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

3 changes: 0 additions & 3 deletions docs/API/available_datasets.md

This file was deleted.

1 change: 1 addition & 0 deletions docs/API/basedataset.md
@@ -0,0 +1 @@
::: openqdc.datasets.base
1 change: 1 addition & 0 deletions docs/API/datasets/alchemy.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.alchemy
1 change: 1 addition & 0 deletions docs/API/datasets/ani.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.ani
1 change: 1 addition & 0 deletions docs/API/datasets/comp6.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.comp6
1 change: 1 addition & 0 deletions docs/API/datasets/des.md
@@ -0,0 +1 @@
::: openqdc.datasets.interaction.des
1 change: 1 addition & 0 deletions docs/API/datasets/gdml.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.gdml
1 change: 1 addition & 0 deletions docs/API/datasets/geom.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.geom.GEOM
1 change: 1 addition & 0 deletions docs/API/datasets/iso_17.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.iso_17
1 change: 1 addition & 0 deletions docs/API/datasets/l7.md
@@ -0,0 +1 @@
::: openqdc.datasets.interaction.l7
1 change: 1 addition & 0 deletions docs/API/datasets/md22.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.md22
1 change: 1 addition & 0 deletions docs/API/datasets/metcalf.md
@@ -0,0 +1 @@
::: openqdc.datasets.interaction.metcalf
1 change: 1 addition & 0 deletions docs/API/datasets/molecule3d.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.molecule3d
1 change: 1 addition & 0 deletions docs/API/datasets/multixcqm9.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.multixcqm9
1 change: 1 addition & 0 deletions docs/API/datasets/nabladft.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.nabladft
1 change: 1 addition & 0 deletions docs/API/datasets/orbnet_denali.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.orbnet_denali
1 change: 1 addition & 0 deletions docs/API/datasets/pcqm.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.pcqm
1 change: 1 addition & 0 deletions docs/API/datasets/proteinfragments.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.proteinfragments
1 change: 1 addition & 0 deletions docs/API/datasets/qm1b.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.qm1b
1 change: 1 addition & 0 deletions docs/API/datasets/qm7x.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.qm7x
1 change: 1 addition & 0 deletions docs/API/datasets/qmugs.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.qmugs
1 change: 1 addition & 0 deletions docs/API/datasets/qmx.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.qmx
1 change: 1 addition & 0 deletions docs/API/datasets/revmd17.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.revmd17
1 change: 1 addition & 0 deletions docs/API/datasets/sn2_rxn.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.sn2_rxn
1 change: 1 addition & 0 deletions docs/API/datasets/solvated_peptides.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.solvated_peptides
2 changes: 2 additions & 0 deletions docs/API/datasets/spice.md
@@ -0,0 +1,2 @@

::: openqdc.datasets.potential.spice
1 change: 1 addition & 0 deletions docs/API/datasets/splinter.md
@@ -0,0 +1 @@
::: openqdc.datasets.interaction.splinter
1 change: 1 addition & 0 deletions docs/API/datasets/tmqm.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.tmqm
1 change: 1 addition & 0 deletions docs/API/datasets/transition1x.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.transition1x
1 change: 1 addition & 0 deletions docs/API/datasets/vqm24.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.vqm24
1 change: 1 addition & 0 deletions docs/API/datasets/waterclusters.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.waterclusters
1 change: 1 addition & 0 deletions docs/API/datasets/waterclusters3_30.md
@@ -0,0 +1 @@
::: openqdc.datasets.potential.waterclusters3_30
1 change: 1 addition & 0 deletions docs/API/datasets/x40.md
@@ -0,0 +1 @@
::: openqdc.datasets.interaction.x40
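Each stub above is a single mkdocstrings directive (`::: <module path>`) that pulls the docstrings of the named module into the page. As a rough sketch of how such one-line pages could be generated rather than written by hand, the snippet below writes a few of them into a temporary directory. The `write_stubs` helper and the `STUBS` mapping are hypothetical and are not part of this pull request.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical subset of the pages listed above: page name -> module path.
STUBS: Dict[str, str] = {
    "alchemy": "openqdc.datasets.potential.alchemy",
    "des": "openqdc.datasets.interaction.des",
    "x40": "openqdc.datasets.interaction.x40",
}


def write_stubs(docs_dir: Path, stubs: Dict[str, str]) -> List[Path]:
    """Write one `<name>.md` per entry, each holding a single `::: module` line."""
    docs_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, module in sorted(stubs.items()):
        path = docs_dir / f"{name}.md"
        path.write_text(f"::: {module}\n")
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    for stub in write_stubs(Path(tmp) / "docs" / "API" / "datasets", STUBS):
        print(stub.name, "->", stub.read_text().strip())
```

Pages that point at a class instead of a module, such as the GEOM entry, would need the full dotted path in the mapping.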
1 change: 1 addition & 0 deletions docs/API/formats.md
@@ -0,0 +1 @@
::: openqdc.datasets.structure
6 changes: 5 additions & 1 deletion docs/API/methods.md
@@ -1,3 +1,7 @@
 # QM Methods

-::: openqdc.methods
+::: openqdc.methods.enums
+
+# Isolated Atom Energies
+
+::: openqdc.methods.atom_energies
1 change: 1 addition & 0 deletions docs/API/regressor.md
@@ -0,0 +1 @@
::: openqdc.utils.regressor
3 changes: 3 additions & 0 deletions docs/API/units.md
@@ -0,0 +1,3 @@
# UNITS

::: openqdc.utils.units
1 change: 1 addition & 0 deletions docs/API/utils.md
@@ -0,0 +1 @@
::: openqdc.utils
46 changes: 0 additions & 46 deletions docs/_overrides/main.html

This file was deleted.

Binary file added docs/assets/StorageView.png
Binary file added docs/assets/qdc_logo.png
113 changes: 113 additions & 0 deletions docs/cli.md
@@ -0,0 +1,113 @@
# CLI for dataset downloading and uploading
You can quickly download, fetch, preprocess and upload openQDC datasets using the command line interface (CLI).

## Datasets
Print a formatted table of the available openQDC datasets with some information about each.

Usage:

openqdc datasets [OPTIONS]

Options:

--help Show this message and exit.

## Cache
Get the current local cache path of openQDC.

Usage:

openqdc cache [OPTIONS]

Options:

--help Show this message and exit.


## Download
Download preprocessed, ML-ready datasets from the main openQDC hub.

Usage:

openqdc download DATASETS... [OPTIONS]

Options:

--help Show this message and exit.
--overwrite Whether to force the re-download of the datasets and overwrite the current cached dataset. [default: no-overwrite]
--cache-dir Path to the cache. If not provided, the default cache directory (.cache/openqdc/) will be used. [default: None]
--as-zarr Whether to use a zarr format for the datasets instead of memmap. [default: no-as-zarr]
--gs Which source to use for downloading. If set, Google Storage is used; otherwise, AWS S3 is used. [default: no-gs]

Example:

openqdc download Spice
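When the download is driven from a script, the options listed above can be assembled programmatically. The following is a minimal sketch: `build_download_command` is a hypothetical helper, not an openQDC function, and it only uses the flags documented in this section. Running the resulting command (for example with `subprocess.run(cmd, check=True)`) assumes the `openqdc` CLI is installed.

```python
import shlex
from typing import List, Optional, Sequence


def build_download_command(
    datasets: Sequence[str],
    overwrite: bool = False,
    cache_dir: Optional[str] = None,
    as_zarr: bool = False,
    gs: bool = False,
) -> List[str]:
    """Assemble the argv for `openqdc download` (hypothetical helper)."""
    if not datasets:
        raise ValueError("at least one dataset name is required")
    cmd = ["openqdc", "download", *datasets]
    if overwrite:
        cmd.append("--overwrite")  # force re-download of cached datasets
    if cache_dir is not None:
        cmd += ["--cache-dir", cache_dir]  # custom cache location
    if as_zarr:
        cmd.append("--as-zarr")  # zarr format instead of memmap
    if gs:
        cmd.append("--gs")  # Google Storage instead of AWS S3
    return cmd


cmd = build_download_command(["Spice", "QMugs"], cache_dir="/tmp/openqdc", as_zarr=True)
print(shlex.join(cmd))
# openqdc download Spice QMugs --cache-dir /tmp/openqdc --as-zarr
```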

## Fetch
Download the raw dataset files from the main openQDC hub.

Note:

Special case: the dataset name can also be "all", "potential", or "interaction".

Usage:

openqdc fetch DATASETS... [OPTIONS]

Options:

--help Show this message and exit.
--overwrite Whether to overwrite or force the re-download of the raw files. [default: no-overwrite]
--cache-dir Path to the cache. If not provided, the default cache directory (.cache/openqdc/) will be used. [default: None]

Example:

openqdc fetch Spice

## Preprocess
Preprocess a raw dataset (previously fetched) into an openQDC dataset and optionally push it to the remote storage.

Usage:

openqdc preprocess DATASETS... [OPTIONS]

Options:

--help Show this message and exit.
--overwrite Whether to overwrite the current cached datasets. [default: overwrite]
--upload Whether to attempt the upload to the remote storage. Must have write permissions. [default: no-upload]
--as-zarr Whether to preprocess as a zarr format or a memmap format. [default: no-as-zarr]

Example:

openqdc preprocess Spice QMugs

## Upload
Upload a preprocessed dataset to the remote storage.

Usage:

openqdc upload DATASETS... [OPTIONS]

Options:

--help Show this message and exit.
--overwrite Whether to overwrite the remote files if they are present. [default: overwrite]
--as-zarr Whether to upload the zarr files if available. [default: no-as-zarr]

Example:

openqdc upload Spice --overwrite
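The Fetch, Preprocess and Upload commands above form a natural sequence: raw files are fetched first, then preprocessed, then uploaded. The sketch below builds that sequence for a list of datasets using only the flags documented above. `raw_to_remote_commands` is a hypothetical helper, and actually running the commands assumes the `openqdc` CLI is installed and that you have write permissions on the remote storage.

```python
from typing import List, Sequence


def raw_to_remote_commands(datasets: Sequence[str], as_zarr: bool = False) -> List[List[str]]:
    """Return the fetch, preprocess and upload commands in execution order."""
    names = list(datasets)
    if not names:
        raise ValueError("at least one dataset name is required")
    # `fetch` has no --as-zarr option in the documentation above, so the flag
    # is only passed to `preprocess` and `upload`.
    zarr_flag = ["--as-zarr"] if as_zarr else []
    return [
        ["openqdc", "fetch", *names],
        ["openqdc", "preprocess", *names, *zarr_flag],
        ["openqdc", "upload", *names, *zarr_flag],
    ]


for step in raw_to_remote_commands(["Spice", "QMugs"], as_zarr=True):
    print(" ".join(step))
```

Each inner list can be passed to `subprocess.run(step, check=True)` so that a failing step stops the sequence.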

## Convert
Convert a preprocessed dataset from a memmap dataset to a zarr dataset.

Usage:

openqdc convert DATASETS... [OPTIONS]

Options:

--help Show this message and exit.
--overwrite Whether to overwrite the current zarr cached datasets. [default: no-overwrite]
--download Whether to force the re-download of the memmap datasets. [default: no-download]
59 changes: 59 additions & 0 deletions docs/contribute.md
@@ -0,0 +1,59 @@
# Contribute

This page documents the development lifecycle of OpenQDC.

## Set up a dev environment

```bash
mamba env create -n openqdc -f env.yml
mamba activate openqdc
pip install -e .
```

## Pre-commit installation

```bash
pre-commit install
pre-commit run --all-files
```

## Continuous Integration

OpenQDC uses GitHub Actions to:

- **Build and test** `openQDC`.
    - Multiple combinations of OS and Python versions are tested.
- **Check** the code:
    - Formatting with `black`.
    - Static type check with `mypy`.
    - Modules import formatting with `isort`.
    - Pre-commit hooks.
- **Documentation**:
    - Google docstring format.
    - Build and deploy the documentation on `main` and for every new git tag.


## Run tests

```bash
pytest
```

## Build the documentation

You can build and serve the documentation locally with:

```bash
# Build and serve the doc
mike serve
```

or with

```bash
mkdocs serve
```

### Multi-versioning

The doc is built for each push on `main` and for every git tag using [mike](https://github.com/jimporter/mike). Everything is automated using GitHub Actions. Please refer to the official mike documentation for details.