While inorganic retrosynthesis planning is essential in the field of chemical science, the application of machine learning in this area has been explored far less than in organic retrosynthesis planning. In this paper, we propose Retrieval-Retro for inorganic retrosynthesis planning, which implicitly extracts the precursor information of reference materials retrieved from a knowledge base using domain expertise. Specifically, instead of directly employing the precursor information of reference materials, we extract it implicitly with various attention layers, which enables the model to learn novel synthesis recipes more effectively. Moreover, during retrieval, we consider the thermodynamic relationship between the target material and its precursors, which is essential domain expertise for identifying the most probable precursor set among various options. Extensive experiments demonstrate the superiority of Retrieval-Retro in retrosynthesis planning, especially in discovering novel synthesis recipes, which is crucial for materials discovery.
You can download the year-split dataset from this drive link.
After downloading the dataset, place it in the dataset folder.
(The dataset is hosted on the drive link because of its file size.)
Run main_Retrieval_Retro.py to train Retrieval-Retro (after downloading the dataset).
Retrieval_Retro.py: Our proposed model, Retrieval-Retro
MPC.py: MPC Retriever
train_mpc.py: Trains the MPC Retriever
retrieval_mpc.py: Calculates the cosine similarity and saves the top-k retrieved materials from the MPC Retriever
utils_mpc.py: Utilities for MPC Retriever training
GraphNetwork.py: Formation energy predictor GraphNetwork and GraphNetwork backbone
pretrain_nre.py: Trains the NRE Retriever (formation energy predictor)
calculate_gibbs.py: Calculates the Gibbs free energy between the target material and a precursor set of materials in the knowledge base
retrieval_nre.py: Saves the top-k retrieved materials from the NRE Retriever
utils_nre.py: Utilities for NRE Retriever (formation energy predictor) training
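The cosine-similarity retrieval step can be illustrated with a minimal, dependency-free sketch. This is not the code in retrieval_mpc.py: the function names, the dictionary-based knowledge base, and the toy embeddings are all hypothetical, and the real script presumably operates on learned material embeddings in batches.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query, knowledge_base, k):
    """Return the k knowledge-base keys most similar to `query`, best first."""
    scored = [
        (cosine_similarity(query, embedding), key)
        for key, embedding in knowledge_base.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical toy embeddings standing in for learned material representations.
knowledge_base = {
    "ref_A": [1.0, 0.0, 0.0],
    "ref_B": [0.9, 0.1, 0.0],
    "ref_C": [0.0, 1.0, 0.0],
}
retrieved = top_k_similar([1.0, 0.05, 0.0], knowledge_base, k=2)
print(retrieved)
```

Here `k` plays the role of the number of retrieved materials; the two references pointing in nearly the same direction as the query are returned ahead of the orthogonal one.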
You can use collate.py to collate the retrieved data from both the MPC and NRE retrievers and construct the final dataset.
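As a rough illustration of what collating the two retrievers' outputs could look like, the sketch below pairs each target with the references returned by each retriever. The record layout, the function name, and the material identifiers are assumptions made for this example; the actual format produced by collate.py may differ.

```python
def collate_retrievals(targets, mpc_retrieved, nre_retrieved):
    """Build one record per target holding both retrievers' references.

    `mpc_retrieved` and `nre_retrieved` map a target id to a list of
    retrieved reference ids (hypothetical layout, for illustration only).
    """
    collated = []
    for target in targets:
        collated.append(
            {
                "target": target,
                "mpc_refs": list(mpc_retrieved.get(target, [])),
                "nre_refs": list(nre_retrieved.get(target, [])),
            }
        )
    return collated


# Hypothetical retrieval results for two targets.
mpc_retrieved = {"target_1": ["ref_A", "ref_B"], "target_2": ["ref_C"]}
nre_retrieved = {"target_1": ["ref_B", "ref_D"]}

final_dataset = collate_retrievals(
    ["target_1", "target_2"], mpc_retrieved, nre_retrieved
)
for record in final_dataset:
    print(record)
```

Keeping the two reference lists separate, rather than merging them, leaves it to the model to treat each retriever's references differently; that is a design choice of this sketch, not a statement about the repository.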
--layers: Number of GNN layers in Retrieval-Retro
--t_layers: Number of cross-attention layers in Retrieval-Retro
--t_layers: Number of self-attention layers in Retrieval-Retro
--embedder: Embedder to use
--hidden: Size of the hidden dimension
--epochs: Number of epochs for training the model
--lr: Learning rate for training the model
--es: Early stopping criteria
--eval: Evaluation step
--K: Number of retrieved materials
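The interplay between the number of epochs, the evaluation step, and early stopping can be sketched as follows. This is a generic pattern, not the training loop in main_Retrieval_Retro.py: the function and parameter names are hypothetical, and it assumes that the early stopping criterion counts consecutive evaluations without improvement, which the repository may define differently.

```python
def train_with_early_stopping(run_epoch, evaluate, epochs, eval_step, patience):
    """Train for up to `epochs`, evaluating every `eval_step` epochs.

    Stops once `patience` consecutive evaluations fail to improve the
    best validation score. Returns (best_epoch, best_score).
    """
    best_score = float("-inf")
    best_epoch = 0
    stale_evals = 0
    for epoch in range(1, epochs + 1):
        run_epoch(epoch)
        if epoch % eval_step != 0:
            continue  # only evaluate on multiples of eval_step
        score = evaluate(epoch)
        if score > best_score:
            best_score, best_epoch, stale_evals = score, epoch, 0
        else:
            stale_evals += 1
            if stale_evals >= patience:
                break
    return best_epoch, best_score


# Toy stand-ins for a real training step and validation metric.
epochs_run = []
toy_scores = {5: 0.50, 10: 0.60, 15: 0.58, 20: 0.59, 25: 0.70}

best = train_with_early_stopping(
    run_epoch=epochs_run.append,
    evaluate=lambda epoch: toy_scores[epoch],
    epochs=30,
    eval_step=5,
    patience=2,
)
print(best, len(epochs_run))
```

In the toy run, training halts at epoch 20 after two evaluations without improvement, so the later, higher score at epoch 25 is never reached. That trade-off is worth keeping in mind when choosing the early stopping and evaluation settings.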