M2Hub aims to build the machine learning foundations for materials discovery, which follows a standard workflow from virtual screening/inverse design to simulation to experiment. M2Hub provides data downloading, data processing, implementations of baseline and state-of-the-art machine learning methods, an evaluation pipeline, and benchmark results.
For machine learning researchers: M2Hub provides dataset collections, problem formulations, and a machine learning workflow into which any newly developed model can be plugged for benchmarking.
For materials scientists: M2Hub implements the entire machine learning workflow, so any materials dataset can be plugged in and used.
M2Hub provides several key functions: data downloading, data processing, training machine learning models, and benchmarking results.
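To set up M2Hub, first clone the repository (the URL below is inferred from the project and is an assumption), then create the conda environment:

git clone https://github.com/yuanqidu/M2Hub.git
cd M2Hub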
conda env create -f environment.yml
We also support installation through pip:
pip install -e .
Please refer to INSTALL.md for details.
python download_data.py --task TASK --property PROPERTY --split SPLIT --get-edges
Please check DATASETS.md for details; available splits are [random|composition|system|time].
For more details about each dataset, please check DOCUMENTS.md.
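For example, to download the Matbench formation energy data with the random split (matching the config used in the training example below):

python download_data.py --task matbench --property e_form --split random --get-edges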
python -u main.py --mode train --config-yml configs/matbench/e_form/random/cgcnn.yml
Please check MODELS.md for details.
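Since M2Hub builds on the Open Catalyst Project codebase, plugging in a newly developed model for benchmarking presumably follows the OCP-style model registry. The minimal sketch below illustrates that pattern; the import path, constructor signature, and data attributes are assumptions modeled on OCP, not confirmed M2Hub API:

import torch.nn as nn
from torch_geometric.nn import global_mean_pool

# Assumed import path, modeled on OCP's ocpmodels.common.registry
from m2models.common.registry import registry

@registry.register_model("my_gnn")  # name then referenced in the config YAML
class MyGNN(nn.Module):
    def __init__(self, num_atoms, bond_feat_dim, num_targets, hidden_dim=64, **kwargs):
        super().__init__()
        self.embed = nn.Embedding(100, hidden_dim)   # atomic-number embedding
        self.head = nn.Linear(hidden_dim, num_targets)

    def forward(self, data):
        # data is assumed to be a PyG batch of crystal graphs
        h = self.embed(data.atomic_numbers.long())
        h = global_mean_pool(h, data.batch)          # one vector per crystal
        return self.head(h)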
To facilitate the development of generative materials design, we provide oracle functions and evaluation metrics for generative modeling on materials.
Oracle functions
python run.py --Task steels --Data test_data.cif --Oracle rf_scm_magpie
We provide two oracle functions; use "--Oracle" to select which one to use and "--Task" to set the task to run. Please see our paper for more details.
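As a rough illustration of what an rf_scm_magpie-style oracle computes, the sketch below featurizes a structure with the sine Coulomb matrix and Magpie composition statistics (via matminer) and scores it with a random forest. This mirrors only the technique the oracle's name suggests, not M2Hub's actual implementation, and a real oracle would use a forest pre-trained on labeled data for the chosen task:

import numpy as np
from matminer.featurizers.composition import ElementProperty
from matminer.featurizers.structure import SineCoulombMatrix
from pymatgen.core import Structure
from sklearn.ensemble import RandomForestRegressor

structure = Structure.from_file("test_data.cif")

scm = SineCoulombMatrix()
scm.fit([structure])  # sets the eigenvalue vector length
magpie = ElementProperty.from_preset("magpie")

# Concatenate structure (SCM eigenvalues) and composition (Magpie) features
x = np.concatenate([scm.featurize(structure),
                    magpie.featurize(structure.composition)])

# A real oracle loads a forest pre-trained on task data (e.g. steels);
# here we fit on the single example only to keep the sketch self-contained.
rf = RandomForestRegressor().fit(x.reshape(1, -1), [0.0])
print(rf.predict(x.reshape(1, -1)))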
Evaluation Metrics
python compute_metrics.py --root_path my_data --eval_model_name my_model --tasks recon gen opt
We provide evaluation metrics for reconstruction, generation, and optimization tasks; please check our paper for more details. The dataset to be evaluated should be placed under "--root_path" with filenames like "eval_recon.pt", and the folder containing the pre-trained property prediction model checkpoint should be under "./prop_models".
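Since M2Hub's generative evaluation builds on CDVAE, the expected layout is presumably along these lines (only "eval_recon.pt" is named above; the gen/opt filenames are assumptions by analogy):

my_data/
    eval_recon.pt   # outputs for the reconstruction task
    eval_gen.pt     # outputs for the generation task (assumed name)
    eval_opt.pt     # outputs for the optimization task (assumed name)
prop_models/
    <task>/         # pre-trained property prediction model checkpoint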
We provide initial benchmark results over 13 tasks, covering representative methods developed in the past. We will continue to incorporate new methods and welcome contributions from the community.
We welcome contributions in any format, from new datasets, materials discovery tasks, to machine learning models, evaluation methods, and benchmark results.
Reach us at [email protected] and [email protected] or open a GitHub issue.
M2Hub is released under the MIT license.
If you use our code in your work, please consider citing:
@inproceedings{du2023m,
title={M$^2$Hub: Unlocking the Potential of Machine Learning for Materials Discovery},
author={Du, Yuanqi and Wang, Yingheng and Huang, Yining and Li, Jianan Canal and Zhu, Yanqiao and Xie, Tian and Duan, Chenru and Gregoire, John and Gomes, Carla P},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023}
}
M2Hub builds on the following open-source codebases:
Open Catalyst Project (https://github.com/Open-Catalyst-Project/ocp)
Crystal Diffusion Variational Autoencoder (CDVAE) (https://github.com/txie-93/cdvae)