Project 39: Divide and Conquer: Local Gaussian Processes to Design Covalent Organic Frameworks for Methane Deliverable Capacity
- Nikhil Kumar Thota (Johns Hopkins University) (https://github.com/T-NIKHIL)
- Maitreyee Sharma Priyadarshini (Johns Hopkins University) (https://github.com/msharmap)
- Yiran (Gigi) Wang (Johns Hopkins University) (https://github.com/gigiwang08)
- Jarett Ren (Johns Hopkins University) (https://github.com/jren0)
In this project, we explore the use of local Gaussian process (GP) models to accelerate materials discovery when the search space is very large. We evaluate the performance of the framework on a covalent organic framework (COF) dataset consisting of 69,840 2D and 3D COFs [1]. This dataset reflects a common real-world scenario in which the search space to explore is very large. In this test, we used an initial training set comprising 5% of the total search space. These COF structures are designed for methane storage, and our optimization target is the deliverable capacity (v STP/v) of the COF structure. We employ Gaussian process surrogates with a zero prior mean function and a Matérn kernel as the covariance function.
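The snippet below is a minimal sketch of this setup using scikit-learn's GaussianProcessRegressor (which has a zero prior mean) with a Matérn kernel. The random stand-in descriptors, the Matérn smoothness (nu=2.5), the expected-improvement acquisition, and the loop length are illustrative assumptions, not the project's exact implementation; the real code reads data/properties.csv.

```python
# Hedged sketch of the Bayesian optimization loop described above.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Stand-in for the 69,840-COF search space: rows are candidate COFs, columns are descriptors.
X_pool = rng.random((2000, 6))                                          # hypothetical descriptors
y_pool = np.sin(X_pool.sum(axis=1)) + 0.1 * rng.standard_normal(2000)   # hypothetical deliverable capacity

# Seed the surrogate with 5% of the search space, as in the project setup.
n_init = int(0.05 * len(X_pool))
observed = set(rng.choice(len(X_pool), size=n_init, replace=False).tolist())

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=False)

for _ in range(20):
    X_train, y_train = X_pool[list(observed)], y_pool[list(observed)]
    gp.fit(X_train, y_train)

    # Expected improvement over the unobserved candidates (maximization).
    candidates = np.array([i for i in range(len(X_pool)) if i not in observed])
    mu, sigma = gp.predict(X_pool[candidates], return_std=True)
    best = y_train.max()
    z = (mu - best) / np.clip(sigma, 1e-9, None)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    # "Evaluate" the selected COF and add it to the training set.
    observed.add(int(candidates[np.argmax(ei)]))

print(f"Best deliverable-capacity value found: {y_pool[list(observed)].max():.3f}")
```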
A Gaussian process (GP) has been a popular choice of surrogate model in Bayesian optimization because of its flexibility and built-in uncertainty quantification. However, training a GP requires inverting the covariance matrix, which has a runtime complexity of O(n³) in the number of training points n, so the computational cost grows dramatically as more data are acquired during Bayesian optimization. Local GP models address this by partitioning the training data and fitting a separate, smaller GP to each partition, so every matrix inversion involves only a fraction of the data.
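One common divide-and-conquer realization of this idea is sketched below, assuming a k-means partition of the training data and routing each prediction to the GP of its nearest cluster; the partitioning scheme, number of partitions, and kernel settings used in this project may differ.

```python
# Hedged sketch of a local (divide-and-conquer) GP: k small GPs instead of one large one,
# so each covariance inversion involves roughly n/k points rather than n.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def fit_local_gps(X, y, n_partitions=4):
    """Split the data into n_partitions clusters and fit one small GP per cluster."""
    km = KMeans(n_clusters=n_partitions, n_init=10, random_state=0).fit(X)
    gps = []
    for k in range(n_partitions):
        mask = km.labels_ == k
        gps.append(GaussianProcessRegressor(kernel=Matern(nu=2.5)).fit(X[mask], y[mask]))
    return km, gps


def predict_local(km, gps, X_new):
    """Route each query point to the GP whose cluster centroid is nearest."""
    labels = km.predict(X_new)
    mu, std = np.empty(len(X_new)), np.empty(len(X_new))
    for k, gp in enumerate(gps):
        mask = labels == k
        if mask.any():
            mu[mask], std[mask] = gp.predict(X_new[mask], return_std=True)
    return mu, std


# Toy usage with random stand-in data.
rng = np.random.default_rng(1)
X = rng.random((1000, 6))
y = np.sin(X.sum(axis=1))
km, gps = fit_local_gps(X, y, n_partitions=4)
mu, std = predict_local(km, gps, rng.random((5, 6)))
```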
Want to know more? See the reference below.
References
[1] Mercado, R.; Fu, R.-S.; Yakutovich, A. V.; Talirz, L.; Haranczyk, M.; Smit, B. In Silico Design of 2D and 3D Covalent Organic Frameworks for Methane Storage Applications. Chem. Mater. 2018, 30 (15), 5069–5086. https://doi.org/10.1021/acs.chemmater.8b01425.
- Install the conda environment bo-hackathon from the bo-hackathon.yml file by typing the following command in the terminal: conda env create -f bo-hackathon.yml
- The inputs to the code must be provided in code_inputs.py
- The Jupyter notebook for running the code is located under src/BO.ipynb
- The dataset for training the model is located under data/properties.csv (see the loading sketch after this list)
- The results from training the model are written to bo_output
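A minimal, hypothetical snippet for inspecting the training data is shown below; the actual column names in data/properties.csv, including the deliverable-capacity target, may differ from what your inputs in code_inputs.py expect.

```python
# Quick look at the training data used by the BO notebook.
import pandas as pd

df = pd.read_csv("data/properties.csv")
print(df.columns.tolist())   # inspect available descriptors and the target column
print(df.describe())         # sanity check of the property distributions
```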