(QTAIM) Generator

A simple package for performing QTAIM (quantum theory of atoms in molecules) analysis on molecules, reactions, and (soon) periodic systems. It uses QTAIM to define the bonds in a system and to compute a rich set of descriptors for machine learning. With a few scripts you can generate QTAIM features for analysis and machine learning tasks. The package currently supports BondNet (BonDNet) for reaction-property prediction, and QTAIM-Embed and ChemProp QTAIM-Embed for molecular machine learning tasks. Note that the ChemProp implementation currently supports only atom-level QTAIM descriptors.

Overview / Installation

To get started you will need to decide a few things:

  1. DFT software: We currently have input file writers for ORCA, though custom writers for other software should be easy to integrate. For ORCA, we add a few options such as relativistic corrections and atom-specific basis sets. See the example JSON for more options.
  2. QTAIM software: The Critic2 implementation works but is still experimental. We suggest Multiwfn, as it yields a richer set of QTAIM features.
  3. Level of theory: QTAIM descriptors are fairly robust to low levels of theory. Take care, however, when your dataset contains metals, especially heavy metals, where this assertion is less tested.

Simply install this package by cloning the repo and running:

pip install -e .

Usage

Three scripts are needed to generate QTAIM features formatted for your dataset. They generate job files, run the jobs, and parse the outputs into a single JSON file, respectively. The following assumes you already have a properly formatted JSON/pickle/BSON dataset; the supported formats are covered under Data Structure.

  1. create_files.py - generates input files for DFT and QTAIM jobs and has several arguments:
    • -reaction : specifies whether the dataframe contains reactions (rather than molecules)
    • -parser Multiwfn or Critic2
    • -file specifies the dataset file
    • -root specifies where to write job files
    • -options_qm_file options for your electronic structure job
    • --molden_sub whether to use orca_2mkl to convert the .gbw file to a .molden.input file prior to running Multiwfn. Use this if you intend to use ECPs.
  2. run.py - runs DFT and QTAIM jobs in selected folder
    • -redo_qtaim - whether to clear QTAIM results file and redo
    • -just_dft - whether to run only the DFT jobs
    • --reactions : specifies whether the root folder contains reaction or molecule jobs
    • -dir_active - root folder of QTAIM/DFT jobs
    • -orca_path - path to ORCA executable
    • -num_threads - number of threads for DFT jobs
    • -folders_to_crawl - how many folders to check for complete jobs
  3. parse_data.py - takes DFT/QTAIM output files and merges the QTAIM data into the original data structure:
    • --root root folder of QTAIM/DFT jobs
    • --file_in - input dataframe used to construct QTAIM/DFT jobs
    • --impute - whether or not to fill in missing values with mean values from computed statistics
    • --file_out - where to write to
    • --reaction - whether your data is a reaction dataset
    • --update_bonds_w_qtaim - whether to overwrite existing bond definitions
    • -define_bonds - method ("distances" or "qtaim") of determining bonds
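
The three steps above can be chained from a small driver. The sketch below only assembles the command lines from flags listed in this README and prints them; it does not run anything. All paths and values (`dataset.pkl`, `jobs/`, `options_qm.json`, `/opt/orca/orca`, `dataset_qtaim.pkl`) are hypothetical placeholders, and it assumes the scripts are called with `python` from the repository root and that each flag accepts the value shown.

```python
import shlex


def build_pipeline(dataset, root, options_qm, orca_path, out_file, threads=4):
    """Return the three command lines (as argument lists) for a molecule run.

    Flag names are copied from the README; the values are placeholders.
    """
    create = [
        "python", "create_files.py",
        "-parser", "Multiwfn",
        "-file", dataset,
        "-root", root,
        "-options_qm_file", options_qm,
    ]
    run = [
        "python", "run.py",
        "-dir_active", root,
        "-orca_path", orca_path,
        "-num_threads", str(threads),
    ]
    parse = [
        "python", "parse_data.py",
        "--root", root,
        "--file_in", dataset,
        "--file_out", out_file,
        "-define_bonds", "qtaim",
    ]
    return [create, run, parse]


if __name__ == "__main__":
    commands = build_pipeline(
        dataset="dataset.pkl",           # hypothetical input dataset
        root="jobs/",                    # hypothetical job folder
        options_qm="options_qm.json",    # hypothetical ORCA options file
        orca_path="/opt/orca/orca",      # hypothetical ORCA location
        out_file="dataset_qtaim.pkl",    # hypothetical output file
    )
    for cmd in commands:
        # Print a shell-safe version of each command for inspection.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check paths before passing each list to `subprocess.run`. For reaction datasets, the reaction flags listed above would need to be added to each step.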

Extra Scripts

  1. parse_stop.py computes and prints statistics of QTAIM values in selected folder
  2. check_res_rxn_json.py checks the number of complete jobs for reaction QTAIM run
  3. check_res_wfn.py checks the number of complete jobs for molecular QTAIM run
  4. folder_xyz_molecules_to_pkl.py converts a folder of xyz files into a single dataset for subsequent QTAIM generation.
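
To illustrate what converting a folder of xyz files into a single dataset involves, here is a minimal sketch using only the standard library. It relies on the standard xyz layout (atom count, comment line, then one `element x y z` line per atom). The record keys (`name`, `comment`, `species`, `coords`) and the function names are assumptions made for illustration; the schema written by folder_xyz_molecules_to_pkl.py may differ.

```python
import pickle
from pathlib import Path


def read_xyz(path):
    """Parse one xyz file into a plain dict (hypothetical record layout)."""
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].split()[0])          # line 1: number of atoms
    comment = lines[1] if len(lines) > 1 else ""  # line 2: free-form comment
    species, coords = [], []
    for line in lines[2:2 + n_atoms]:
        parts = line.split()
        species.append(parts[0])
        coords.append([float(v) for v in parts[1:4]])
    if len(species) != n_atoms:
        raise ValueError(f"{path}: expected {n_atoms} atoms, found {len(species)}")
    return {
        "name": Path(path).stem,
        "comment": comment,
        "species": species,
        "coords": coords,
    }


def folder_to_pickle(folder, out_file):
    """Collect every *.xyz file in `folder` into one pickled list of records."""
    records = [read_xyz(p) for p in sorted(Path(folder).glob("*.xyz"))]
    with open(out_file, "wb") as fh:
        pickle.dump(records, fh)
    return records
```

Sorting the file names keeps the record order reproducible between runs, which helps when matching job folders back to dataset rows.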

Data Structure

JSON, pickle, and BSON files can all be parsed.
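
As a rough illustration of handling several input formats, the sketch below dispatches on the file extension. The function name `load_dataset` and the extension mapping are assumptions for this example, not the package's own loader. BSON is left unimplemented here because decoding it requires a third-party library.

```python
import json
import pickle
from pathlib import Path


def load_dataset(path):
    """Load a dataset file based on its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        with open(path, "r") as fh:
            return json.load(fh)
    if suffix in (".pkl", ".pickle"):
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with open(path, "rb") as fh:
            return pickle.load(fh)
    if suffix == ".bson":
        # Assumption: a third-party BSON decoder would be plugged in here.
        raise NotImplementedError("BSON decoding needs a third-party library")
    raise ValueError(f"Unsupported dataset format: {suffix!r}")
```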
