A simple package to perform QTAIM on molecules, reactions, and (soon) periodic systems. It uses QTAIM to define the bonds in a system and to compute a rich set of descriptors for machine learning. With a few scripts you can start generating QTAIM-informatics for analysis and machine learning tasks. Currently, this package supports BondNet (BonDNet) for reaction-property prediction, and QTAIM-Embed and ChemProp QTAIM-Embed for molecular machine learning tasks. Note that the Chemprop implementation currently only supports atom-level QTAIM descriptors.
To get started you will need to decide a few things:
- DFT software: We currently have input file writers for ORCA, though custom writers for other software should be easy to integrate. For ORCA, we add a few options such as relativistic corrections and atom-specific basis sets. See the example JSON for more options.
- QTAIM software: The implementation with Critic2 works but is relatively experimental. We suggest you use Multiwfn, as it yields a richer set of QTAIM features.
- Level of theory: QTAIM is fairly robust to low levels of theory. Take care, however, when your dataset contains metals, especially heavy metals, where this assertion is less tested.
Simply install this package by cloning the repo and running:
pip install -e .
Three scripts will be needed to generate QTAIM features readily formatted for your dataset. These scripts generate job files, run jobs, and parse outputs to a single json, respectively. For the following we will assume you have a properly formatted json/pickle/bson and will return to this later.
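As a rough sketch of how the three steps chain together, the snippet below assembles one command line per script. The flag names are the ones documented in this README. The file names, the job folder, the thread count, and the assumption that each flag takes a value in the form shown are illustrative only, so check each script's argument parser before running anything.

```python
import shlex

# Hypothetical paths; replace with your own dataset and job folder.
DATASET = "dataset.pkl"
JOB_ROOT = "qtaim_jobs/"


def build_pipeline(dataset, root, options_file="options_qm.json"):
    """Return the three commands (as argv lists) in the order they are run."""
    # Step 1: write DFT and QTAIM input files for every entry in the dataset.
    create = ["python", "create_files.py", "-file", dataset, "-root", root,
              "-parser", "Multiwfn", "-options_qm_file", options_file]
    # Step 2: run the DFT and QTAIM jobs found under the job folder.
    run = ["python", "run.py", "-dir_active", root, "-num_threads", "4"]
    # Step 3: merge the QTAIM outputs back into the dataset.
    parse = ["python", "parse_data.py", "--root", root, "--file_in", dataset,
             "--file_out", "dataset_qtaim.pkl"]
    return [create, run, parse]


for cmd in build_pipeline(DATASET, JOB_ROOT):
    print(shlex.join(cmd))
```

Printing the commands first, rather than executing them, makes it easy to check the paths before launching long DFT jobs.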
create_files.py
- generates input files for DFT and QTAIM jobs and takes several arguments:
  - -reaction: specifies whether the dataframe contains reactions
  - -parser: Multiwfn or Critic2
  - -file: specifies the dataset file
  - -root: specifies where to write job files
  - -options_qm_file: options for your electronic structure job
  - --molden_sub: whether to use orca_2mkl to convert the gbw to a .molden.input file prior to Multiwfn. Use this if you intend to use ECPs.
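For reference, the gbw-to-molden conversion mentioned above corresponds to calling ORCA's orca_2mkl utility with the job basename and the -molden switch, which writes a file named basename.molden.input. The helper below is only a sketch of that call; the function name and the example path are hypothetical, and how this package invokes the tool internally may differ.

```python
from pathlib import Path


def molden_command(gbw_path):
    """Build the orca_2mkl call and the path of the file it should produce."""
    gbw = Path(gbw_path)
    # orca_2mkl expects the basename, i.e. the path without the ".gbw" suffix.
    base = gbw.with_suffix("")
    cmd = ["orca_2mkl", str(base), "-molden"]
    # The converted file is written next to the gbw as <basename>.molden.input.
    expected_output = base.parent / (base.name + ".molden.input")
    return cmd, expected_output


# Hypothetical job path, for illustration only.
cmd, out = molden_command("jobs/mol_0/input.gbw")
print(cmd)
print(out)
```

The returned argv list can be handed to subprocess.run once ORCA's utilities are on your PATH.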
run.py
- runs DFT and QTAIM jobs in the selected folder:
  - -redo_qtaim: whether to clear the QTAIM results file and redo
  - -just_dft: whether to run only the DFT jobs
  - --reactions: specifies whether the root folder contains reaction or molecule jobs
  - -dir_active: root folder of QTAIM/DFT jobs
  - -orca_path: path to ORCA executable
  - -num_threads: number of threads for DFT jobs
  - -folders_to_crawl: how many folders to check for complete jobs
parse_data.py
- takes DFT/QTAIM output files and merges QTAIM data into the original data structure:
  - --root: root folder of QTAIM/DFT jobs
  - --file_in: input dataframe used to construct QTAIM/DFT jobs
  - --impute: whether or not to fill in missing values with mean values from computed statistics
  - --file_out: where to write to
  - --reaction: whether your data is a reaction dataset
  - --update_bonds_w_qtaim: whether to overwrite existing bond definitions
  - -define_bonds: method ("distances" or "qtaim") of determining bonds
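To illustrate what mean imputation does, here is a minimal, standalone sketch. It is not the package's implementation: the function name and the descriptor names ("rho", "laplacian") are hypothetical, and the means are computed from the records themselves rather than read from precomputed statistics.

```python
from statistics import mean


def impute_with_means(records, keys):
    """Return copies of the records with missing values replaced by per-key means."""
    # Compute the mean of each descriptor over the records where it is present.
    means = {}
    for key in keys:
        observed = [r[key] for r in records if r.get(key) is not None]
        if observed:
            means[key] = mean(observed)
    # Fill gaps (None or absent keys) without modifying the input records.
    imputed = []
    for r in records:
        filled = dict(r)
        for key, value in means.items():
            if filled.get(key) is None:
                filled[key] = value
        imputed.append(filled)
    return imputed


# Hypothetical atom-level descriptors with two gaps.
records = [
    {"rho": 0.25, "laplacian": -0.5},
    {"rho": None, "laplacian": -1.5},
    {"rho": 0.75},
]
for row in impute_with_means(records, ["rho", "laplacian"]):
    print(row)
```

Mean imputation keeps every entry in the dataset at the cost of flattening the variance of the filled descriptors, so it is worth checking how many values were imputed before training a model.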
parse_stop.py
- computes and prints statistics of QTAIM values in the selected folder
check_res_rxn_json.py
- checks the number of complete jobs for a reaction QTAIM run
check_res_wfn.py
- checks the number of complete jobs for a molecular QTAIM run
folder_xyz_molecules_to_pkl.py
- converts a folder of xyz files into a single dataset for subsequent QTAIM generation.
JSON, pickle (pkl), and BSON files can all be parsed.
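The idea behind folder_xyz_molecules_to_pkl.py, collecting a folder of xyz files into one dataset file, can be sketched as follows. This is an illustration under stated assumptions, not the script itself: the function names and the record layout ("name", "elements", "coords") are hypothetical, and the real script may store additional fields such as charge or spin.

```python
import pickle
from pathlib import Path


def read_xyz(path):
    """Parse one xyz file: atom count, comment line, then 'element x y z' rows."""
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].split()[0])
    elements, coords = [], []
    # Atom rows start on the third line; the second line is a free-form comment.
    for row in lines[2:2 + n_atoms]:
        symbol, x, y, z = row.split()[:4]
        elements.append(symbol)
        coords.append((float(x), float(y), float(z)))
    return {"name": Path(path).stem, "elements": elements, "coords": coords}


def folder_to_pickle(folder, out_file):
    """Collect every *.xyz file in `folder` (sorted by name) into one pickle."""
    dataset = [read_xyz(p) for p in sorted(Path(folder).glob("*.xyz"))]
    with open(out_file, "wb") as handle:
        pickle.dump(dataset, handle)
    return dataset
```

Calling folder_to_pickle("xyz_folder", "molecules.pkl") (both names are placeholders) would write a list with one record per molecule.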