AtomGen provides a framework for handling atomistic graph datasets, with a focus on transformer-based implementations. It provides utilities for training various models and experimenting with different pre-training tasks, and it makes pre-trained models and datasets available on the Hugging Face Hub.
It streamlines the aggregation, standardization, and use of datasets from diverse sources, enabling large-scale pre-training and generative modeling on atomistic graphs.
The package can be installed using Poetry:
```bash
python3 -m poetry install
source $(poetry env info --path)/bin/activate
```
AtomGen facilitates the aggregation and standardization of datasets, including but not limited to:
- S2EF (Structure to Energy & Forces) datasets: Aggregated from multiple sources such as OC20, OC22, ODAC23, MPtrj, and SPICE, with structures and energies/forces for pre-training.
- Misc. atomistic graph datasets: Including Molecule3D, the Protein Data Bank (PDB), and the Open Quantum Materials Database (OQMD).
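As a minimal sketch of what standardization involves, the Python below maps records from differently formatted sources onto one common schema. The `AtomRecord` schema, the raw field names (`symbols`, `numbers`, `coords`, `energy`, `forces`), the source names, and the helper functions are hypothetical illustrations, not AtomGen's actual data format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Hypothetical symbol -> atomic number lookup, limited to the elements used below.
SYMBOL_TO_Z = {"H": 1, "C": 6, "N": 7, "O": 8}


@dataclass
class AtomRecord:
    """Hypothetical common schema for one structure (not AtomGen's real schema)."""
    atomic_numbers: List[int]
    positions: List[List[float]]                  # one [x, y, z] per atom
    energy: Optional[float] = None                # total energy, if the source has one
    forces: Optional[List[List[float]]] = None    # per-atom forces, if available
    source: str = ""


def standardize(raw: Dict, source: str) -> AtomRecord:
    """Map one source-specific dict onto AtomRecord.

    Assumes elements are stored either as symbols ("symbols") or as atomic
    numbers ("numbers"), and coordinates under "coords".
    """
    if "numbers" in raw:
        numbers = [int(z) for z in raw["numbers"]]
    else:
        numbers = [SYMBOL_TO_Z[s] for s in raw["symbols"]]
    positions = [[float(c) for c in xyz] for xyz in raw["coords"]]
    if len(numbers) != len(positions):
        raise ValueError(f"{source}: {len(numbers)} atoms but {len(positions)} positions")
    forces = raw.get("forces")
    return AtomRecord(
        atomic_numbers=numbers,
        positions=positions,
        energy=raw.get("energy"),
        forces=[[float(c) for c in f] for f in forces] if forces is not None else None,
        source=source,
    )


def aggregate(sources: Dict[str, Sequence[Dict]]) -> List[AtomRecord]:
    """Standardize and concatenate records from several named sources."""
    return [standardize(r, name) for name, recs in sources.items() for r in recs]


# Toy usage with two hypothetical sources that store elements differently.
records = aggregate({
    "source_a": [{"symbols": ["O", "H", "H"],
                  "coords": [[0, 0, 0], [0.96, 0, 0], [-0.24, 0.93, 0]],
                  "energy": -1.0}],
    "source_b": [{"numbers": [6, 8], "coords": [[0, 0, 0], [1.13, 0, 0]]}],
})
```

Keeping energies and forces optional lets S2EF-style sources and structure-only sources share one schema.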
Currently, AtomGen provides pre-processed datasets for the S2EF pre-training task for OC20 and for a mixed dataset of OC20, OC22, ODAC23, MPtrj, and SPICE. They have been uploaded to the Hugging Face Hub and can be accessed using the `datasets` library.
AtomGen supports a variety of models for training on atomistic graph datasets, including:
- AtomFormer: Custom architecture that uses Gaussian pairwise positional embeddings and self-attention to model atomistic graphs.
- SchNet: A continuous-filter convolutional neural network for modeling quantum interactions.
- TokenGT: Tokenized graph transformer that treats all nodes and edges as independent tokens.
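AtomFormer's exact embedding is not specified here, but the general idea of Gaussian pairwise features, which SchNet also uses as its radial basis expansion of interatomic distances, can be sketched in plain Python: compute every interatomic distance and expand it over evenly spaced Gaussian kernels. The kernel count, cutoff, width rule, and the `attention_bias` projection are illustrative assumptions, not AtomGen's implementation.

```python
import math
from typing import List, Sequence

Vec3 = Sequence[float]


def pairwise_distances(positions: Sequence[Vec3]) -> List[List[float]]:
    """N x N matrix of Euclidean distances between atoms (zero on the diagonal)."""
    return [[math.dist(p, q) for q in positions] for p in positions]


def gaussian_expand(d: float, num_kernels: int = 8, cutoff: float = 5.0) -> List[float]:
    """Expand one distance into features exp(-gamma * (d - mu_k)^2).

    Centres mu_k are evenly spaced on [0, cutoff]; tying the width to the
    spacing is an assumption made for this sketch.
    """
    spacing = cutoff / (num_kernels - 1)
    gamma = 1.0 / (2.0 * spacing ** 2)
    return [math.exp(-gamma * (d - k * spacing) ** 2) for k in range(num_kernels)]


def pair_features(positions: Sequence[Vec3], num_kernels: int = 8,
                  cutoff: float = 5.0) -> List[List[List[float]]]:
    """N x N x num_kernels nested list of Gaussian distance features."""
    return [[gaussian_expand(d, num_kernels, cutoff) for d in row]
            for row in pairwise_distances(positions)]


def attention_bias(features: List[List[List[float]]],
                   weights: Sequence[float]) -> List[List[float]]:
    """Project each pair's features to a scalar that could be added to attention logits.

    `weights` stands in for a learned linear layer (hypothetical).
    """
    return [[sum(w * f for w, f in zip(weights, feat)) for feat in row]
            for row in features]


# Toy usage: two atoms 5.0 apart (a 3-4-5 triangle in the xy-plane).
positions = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
feats = pair_features(positions)
bias = attention_bias(feats, weights=[0.1] * 8)
```

Because the features depend only on distances, a bias built this way lets attention between two atoms reflect their separation while staying invariant to rotations and translations of the whole structure.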
AtomGen also facilitates experimentation with pre-training tasks, including:
- Structure to Energy & Forces: Predicting energies and forces for atomistic graphs.
- Masked Atom Modeling: Masking atoms and predicting their properties.
- Coordinate Denoising: Recovering the original atom coordinates from noise-perturbed inputs.
These tasks are all facilitated through the `DataCollatorForAtomModeling` class and can be used simultaneously or individually.
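The sketch below shows, in plain Python, what the three pre-training objectives involve for a single structure: masking atomic numbers, perturbing coordinates with Gaussian noise (the noise becomes the denoising target), and a combined energy/force regression loss. It does not reproduce the actual interface of `DataCollatorForAtomModeling`; the function names, the mask id `0`, the `-100` ignore label, the masking probability, the noise scale, and the force weight are all assumptions.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

MASK_ID = 0          # hypothetical id for masked atoms (no element has Z = 0)
IGNORE_INDEX = -100  # label meaning "no loss at this position" (a common convention)


def mask_atoms(atomic_numbers: Sequence[int], mask_prob: float = 0.15,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Masked atom modeling: hide a random subset of atoms; labels keep the true Z."""
    rng = rng if rng is not None else random.Random()
    inputs, labels = [], []
    for z in atomic_numbers:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(z)
        else:
            inputs.append(z)
            labels.append(IGNORE_INDEX)
    return inputs, labels


def noise_coordinates(positions: Sequence[Sequence[float]], sigma: float = 0.1,
                      rng: Optional[random.Random] = None
                      ) -> Tuple[List[List[float]], List[List[float]]]:
    """Coordinate denoising: add Gaussian noise and return it as the target."""
    rng = rng if rng is not None else random.Random()
    noisy, noise = [], []
    for xyz in positions:
        eps = [rng.gauss(0.0, sigma) for _ in xyz]
        noisy.append([c + e for c, e in zip(xyz, eps)])
        noise.append(eps)
    return noisy, noise


def s2ef_loss(pred_energy: float, true_energy: float,
              pred_forces: Sequence[Sequence[float]],
              true_forces: Sequence[Sequence[float]],
              force_weight: float = 1.0) -> float:
    """Structure to energy & forces: squared energy error plus weighted force MSE."""
    sq = [(p - t) ** 2 for pf, tf in zip(pred_forces, true_forces)
          for p, t in zip(pf, tf)]
    return (pred_energy - true_energy) ** 2 + force_weight * sum(sq) / len(sq)


def collate_example(atomic_numbers: Sequence[int],
                    positions: Sequence[Sequence[float]],
                    do_mask: bool = True, do_denoise: bool = True,
                    seed: Optional[int] = None) -> Dict:
    """Apply masking and/or denoising to one structure, individually or together."""
    rng = random.Random(seed)
    out = {"input_ids": list(atomic_numbers), "coords": [list(p) for p in positions]}
    if do_mask:
        out["input_ids"], out["labels"] = mask_atoms(atomic_numbers, rng=rng)
    if do_denoise:
        out["coords"], out["noise"] = noise_coordinates(positions, rng=rng)
    return out


# Toy usage on a hypothetical three-atom structure.
batch = collate_example([8, 1, 1],
                        [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
                        seed=0)
```

Drawing every random choice from a single seeded `random.Random` keeps the corruption reproducible, which helps when debugging a pre-training run.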
The development environment can also be set up using Poetry. Make sure Poetry is installed, then run:
```bash
python3 -m poetry install
source $(poetry env info --path)/bin/activate
```
To install the dependencies for testing (code style checks, unit tests, integration tests), run:
```bash
python3 -m poetry install --with test
```