This document lists every descriptor used in the water solubility prediction study. Each descriptor is briefly described to give context on its relevance and role in the model. The descriptors fall into four groups:
- Basic Descriptors
- Functional Group Descriptors
- Feature-Engineered Descriptors
- Descriptors Used by Sorkun
Basic Descriptors
- MaxAbsEStateIndex: The maximum absolute value of the E-state index, a measure of electron distribution.
- MaxEStateIndex: The maximum E-state index in the molecule.
- MinAbsEStateIndex: The minimum absolute value of the E-state index.
- MinEStateIndex: The minimum E-state index in the molecule.
- qed: Quantitative estimate of drug-likeness, a metric used to assess the drug-likeness of a molecule.
- SPS: Spacial score, a molecular complexity measure built from per-atom hybridization, stereochemistry, ring membership, and heavy-atom neighbour counts.
- MolWt: Molecular Weight, the total mass of the molecule.
- HeavyAtomMolWt: The molecular weight of the molecule with hydrogen atoms ignored.
- ExactMolWt: The exact (monoisotopic) mass, computed from the most abundant isotope of each element.
- NumValenceElectrons: Total number of valence electrons in the molecule.
- NumRadicalElectrons: Number of radical electrons in the molecule.
- MaxPartialCharge: Maximum partial charge on any atom in the molecule.
- MinPartialCharge: Minimum partial charge on any atom in the molecule.
- MaxAbsPartialCharge: Maximum absolute value of the partial charge on any atom.
- MinAbsPartialCharge: Minimum absolute value of the partial charge on any atom.
- FpDensityMorgan1: Morgan fingerprint density with radius 1: the number of distinct circular (Morgan) atom environments up to radius 1, divided by the number of heavy atoms.
- FpDensityMorgan2: The same fingerprint density computed with radius 2.
- FpDensityMorgan3: The same fingerprint density computed with radius 3.
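- BCUT2D_MWHI: Highest eigenvalue of the Burden (BCUT) matrix weighted by atomic mass.
- BCUT2D_MWLOW: Lowest eigenvalue of the Burden matrix weighted by atomic mass.
- BCUT2D_CHGHI: Highest eigenvalue of the Burden matrix weighted by atomic partial charge.
- BCUT2D_CHGLO: Lowest eigenvalue of the Burden matrix weighted by atomic partial charge.
- BCUT2D_LOGPHI: Highest eigenvalue of the Burden matrix weighted by atomic LogP contribution.
- BCUT2D_LOGPLOW: Lowest eigenvalue of the Burden matrix weighted by atomic LogP contribution.
- BCUT2D_MRHI: Highest eigenvalue of the Burden matrix weighted by atomic molar refractivity contribution.
- BCUT2D_MRLOW: Lowest eigenvalue of the Burden matrix weighted by atomic molar refractivity contribution.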
- AvgIpc: Average Information Content (Ipc) of the molecule.
- BalabanJ: Balaban’s J index, a topological descriptor.
- BertzCT: Bertz complexity index, a measure of molecular complexity.
- Chi0: The zeroth-order connectivity index (a sum over atoms).
- Chi0n: The zeroth-order index computed like Chi0v, but with a simplified valence-electron count that omits the Hall-Kier correction for heavier elements.
- Chi0v: The zeroth-order valence connectivity index.
- Chi1: The first-order connectivity index (a sum over bonds).
- Chi1n: The first-order counterpart of Chi0n.
- Chi1v: The first-order valence connectivity index.
- Chi2n: The second-order counterpart of Chi0n.
- Chi2v: The second-order valence connectivity index.
- Chi3n: The third-order counterpart of Chi0n.
- Chi3v: The third-order valence connectivity index.
- Chi4n: The fourth-order counterpart of Chi0n.
- Chi4v: The fourth-order valence connectivity index.
- HallKierAlpha: Hall-Kier alpha modification of the shape index.
- Ipc: Information content of the coefficients of the characteristic polynomial of the molecule's hydrogen-suppressed adjacency matrix.
- Kappa1: First Kappa shape index.
- Kappa2: Second Kappa shape index.
- Kappa3: Third Kappa shape index.
- LabuteASA: Labute’s approximation to molecular surface area.
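- PEOE_VSA1: Summed van der Waals surface area (VSA) of the atoms whose PEOE (Gasteiger) partial charge falls in bin 1; PEOE_VSA2 to PEOE_VSA14 cover the remaining charge bins.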
- PEOE_VSA10: PEOE charge on molecular surface area, bin 10.
- PEOE_VSA11: PEOE charge on molecular surface area, bin 11.
- PEOE_VSA12: PEOE charge on molecular surface area, bin 12.
- PEOE_VSA13: PEOE charge on molecular surface area, bin 13.
- PEOE_VSA14: PEOE charge on molecular surface area, bin 14.
- PEOE_VSA2: PEOE charge on molecular surface area, bin 2.
- PEOE_VSA3: PEOE charge on molecular surface area, bin 3.
- PEOE_VSA4: PEOE charge on molecular surface area, bin 4.
- PEOE_VSA5: PEOE charge on molecular surface area, bin 5.
- PEOE_VSA6: PEOE charge on molecular surface area, bin 6.
- PEOE_VSA7: PEOE charge on molecular surface area, bin 7.
- PEOE_VSA8: PEOE charge on molecular surface area, bin 8.
- PEOE_VSA9: PEOE charge on molecular surface area, bin 9.
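- SMR_VSA1: Summed van der Waals surface area of the atoms whose molar refractivity contribution falls in bin 1; SMR_VSA2 to SMR_VSA10 cover the remaining bins.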
- SMR_VSA10: SMR surface area, bin 10.
- SMR_VSA2: SMR surface area, bin 2.
- SMR_VSA3: SMR surface area, bin 3.
- SMR_VSA4: SMR surface area, bin 4.
- SMR_VSA5: SMR surface area, bin 5.
- SMR_VSA6: SMR surface area, bin 6.
- SMR_VSA7: SMR surface area, bin 7.
- SMR_VSA8: SMR surface area, bin 8.
- SMR_VSA9: SMR surface area, bin 9.
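- SlogP_VSA1: Summed van der Waals surface area of the atoms whose LogP contribution falls in bin 1; SlogP_VSA2 to SlogP_VSA12 cover the remaining bins.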
- SlogP_VSA10: SlogP surface area, bin 10.
- SlogP_VSA11: SlogP surface area, bin 11.
- SlogP_VSA12: SlogP surface area, bin 12.
- SlogP_VSA2: SlogP surface area, bin 2.
- SlogP_VSA3: SlogP surface area, bin 3.
- SlogP_VSA4: SlogP surface area, bin 4.
- SlogP_VSA5: SlogP surface area, bin 5.
- SlogP_VSA6: SlogP surface area, bin 6.
- SlogP_VSA7: SlogP surface area, bin 7.
- SlogP_VSA8: SlogP surface area, bin 8.
- SlogP_VSA9: SlogP surface area, bin 9.
- TPSA: Topological polar surface area, a predictor of drug absorption.
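- EState_VSA1: Summed van der Waals surface area of the atoms whose E-state index falls in bin 1; EState_VSA2 to EState_VSA11 cover the remaining bins.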
- EState_VSA10: E-state surface area, bin 10.
- EState_VSA11: E-state surface area, bin 11.
- EState_VSA2: E-state surface area, bin 2.
- EState_VSA3: E-state surface area, bin 3.
- EState_VSA4: E-state surface area, bin 4.
- EState_VSA5: E-state surface area, bin 5.
- EState_VSA6: E-state surface area, bin 6.
- EState_VSA7: E-state surface area, bin 7.
- EState_VSA8: E-state surface area, bin 8.
- EState_VSA9: E-state surface area, bin 9.
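- VSA_EState1: Summed E-state indices of the atoms whose van der Waals surface area contribution falls in bin 1; VSA_EState2 to VSA_EState10 cover the remaining bins.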
- VSA_EState10: VSA_E-state, bin 10.
- VSA_EState2: VSA_E-state, bin 2.
- VSA_EState3: VSA_E-state, bin 3.
- VSA_EState4: VSA_E-state, bin 4.
- VSA_EState5: VSA_E-state, bin 5.
- VSA_EState6: VSA_E-state, bin 6.
- VSA_EState7: VSA_E-state, bin 7.
- VSA_EState8: VSA_E-state, bin 8.
- VSA_EState9: VSA_E-state, bin 9.
- FractionCSP3: Fraction of carbon atoms that are sp3 hybridized.
- HeavyAtomCount: The number of heavy atoms (non-hydrogen atoms).
- NHOHCount: The number of -OH and -NH groups.
- NOCount: The number of nitrogen and oxygen atoms.
- NumAliphaticCarbocycles: The number of aliphatic carbocycles.
- NumAliphaticHeterocycles: The number of aliphatic heterocycles.
- NumAliphaticRings: The number of aliphatic rings.
- NumAromaticCarbocycles: The number of aromatic carbocycles.
- NumAromaticHeterocycles: The number of aromatic heterocycles.
- NumAromaticRings: The number of aromatic rings.
- NumHAcceptors: The number of hydrogen bond acceptors.
- NumHDonors: The number of hydrogen bond donors.
- NumHeteroatoms: The number of heteroatoms (non-carbon and non-hydrogen atoms).
- NumRotatableBonds: The number of rotatable bonds.
- NumSaturatedCarbocycles: The number of saturated carbocycles.
- NumSaturatedHeterocycles: The number of saturated heterocycles.
- NumSaturatedRings: The number of saturated rings.
- RingCount: The total number of rings in the molecule.
- MolLogP: The Wildman-Crippen estimate of LogP, the logarithm of the octanol-water partition coefficient.
- MolMR: The Wildman-Crippen estimate of the molar refractivity of the molecule.
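To make the order of the connectivity indices concrete, here is a minimal sketch of the textbook definitions of Chi0 (one term per heavy atom) and Chi1 (one term per bond) on a hydrogen-suppressed graph. The function names and the n-butane example are illustrative assumptions, not code from the study, and a library implementation may treat edge cases differently.

```python
from math import sqrt


def chi0(degrees):
    """Zeroth-order connectivity index: sum of 1/sqrt(degree) over heavy atoms."""
    return sum(1.0 / sqrt(d) for d in degrees if d > 0)


def chi1(bonds, degrees):
    """First-order connectivity index: sum of 1/sqrt(d_i * d_j) over bonds."""
    return sum(1.0 / sqrt(degrees[i] * degrees[j]) for i, j in bonds)


# Hydrogen-suppressed graph of n-butane: C0-C1-C2-C3 (illustrative input).
bonds = [(0, 1), (1, 2), (2, 3)]
degrees = [0] * 4
for i, j in bonds:
    degrees[i] += 1
    degrees[j] += 1

butane_chi0 = chi0(degrees)         # two terminal atoms plus two inner atoms
butane_chi1 = chi1(bonds, degrees)  # two terminal bonds plus one inner bond
print(round(butane_chi0, 4), round(butane_chi1, 4))
```

The valence variants (Chi0v, Chi1v, and so on) keep the same sums but replace the plain degree with a valence-based atom value.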
Functional Group Descriptors
- Hydroxyl Group ('[OH]'): Indicates the presence of a hydroxyl group (-OH) in the molecule.
- Carbonyl Group ('C=O'): Indicates the presence of a carbonyl group (C=O) in the molecule.
- Amide Group ('C(=O)N'): Indicates the presence of an amide group (-C(=O)N-) in the molecule.
- Carboxyl Group ('C(=O)[OH]'): Indicates the presence of a carboxyl group (-COOH) in the molecule.
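- Alkyl ('[R]'): Intended to indicate the presence of an alkyl group (R-) in the molecule; read as a SMARTS pattern, '[R]' matches any atom that is part of a ring.
- Aromatic Rings ('c'): Indicates the presence of aromatic carbon atoms, and therefore aromatic rings, in the molecule.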
- Alkene ('C=C'): Indicates the presence of an alkene group (C=C) in the molecule.
Feature-Engineered Descriptors
- charge: The total charge of the molecule.
- many_double_bonds: The count of double bonds in the molecule.
- atoms_degree_0: Number of atoms with zero degree (not connected to other atoms).
- atoms_degree_1: Number of atoms with one connection.
- atoms_degree_2: Number of atoms with two connections.
- atoms_degree_3: Number of atoms with three connections.
- atoms_degree_4: Number of atoms with four connections.
- atoms_degree_5: Number of atoms with five connections.
- atoms_degree_6: Number of atoms with six connections.
- atoms_valence_0: Number of atoms with a valence of zero.
- atoms_valence_1: Number of atoms with a valence of one.
- atoms_valence_2: Number of atoms with a valence of two.
- atoms_valence_3: Number of atoms with a valence of three.
- atoms_valence_4: Number of atoms with a valence of four.
- atoms_valence_5: Number of atoms with a valence of five.
- atoms_valence_6: Number of atoms with a valence of six.
- atom_hybridization_S: Number of atoms with S orbital hybridization.
- atom_hybridization_SP: Number of atoms with SP orbital hybridization.
- atom_hybridization_SP2: Number of atoms with SP2 orbital hybridization.
- atom_hybridization_SP3: Number of atoms with SP3 orbital hybridization.
- atom_hybridization_SP3D: Number of atoms with SP3D orbital hybridization.
- atom_hybridization_SP3D2: Number of atoms with SP3D2 orbital hybridization.
- atom_hybridization_UNSPECIFIED: Number of atoms with unspecified hybridization.
- aromatic_atoms: Number of aromatic atoms in the molecule.
- single_bonds: Number of single bonds in the molecule.
- double_bonds: Number of double bonds in the molecule.
- triple_bonds: Number of triple bonds in the molecule.
- aromatic_bonds: Number of aromatic bonds in the molecule.
- zero_bonds: Number of atoms with zero bonds.
- conjugated_bonds: Number of conjugated bonds in the molecule.
- bonds_in_ring: Number of bonds present in a ring structure.
- chirality_none: Number of atoms without chirality.
- chirality_any: Number of atoms with any chirality.
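- chirality_z: Number of double bonds with Z stereochemistry (higher-priority substituents on the same side).
- chirality_e: Number of double bonds with E stereochemistry (higher-priority substituents on opposite sides).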
- n_atoms: Total number of atoms in the molecule.
- n_bonds: Total number of bonds in the molecule.
- n_rings: Total number of rings in the molecule.
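The counting features above can be read as simple histograms over a molecular graph. The sketch below is a minimal illustration under stated assumptions: the `atoms` and `bonds` inputs, the `count_features` helper, and the choice to count heavy atoms only are hypothetical and not taken from the study's code.

```python
from collections import Counter

# Hypothetical heavy-atom graph for acetic acid, SMILES CC(=O)O.
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, "single"), (1, 2, "double"), (1, 3, "single")]


def count_features(atoms, bonds):
    """Count atoms by degree and bonds by type (heavy atoms only, by assumption)."""
    degree = Counter()
    for i, j, _ in bonds:
        degree[i] += 1
        degree[j] += 1
    # Histogram of how many atoms have each degree; isolated atoms get degree 0.
    degree_hist = Counter(degree[i] for i in range(len(atoms)))
    bond_hist = Counter(kind for _, _, kind in bonds)

    features = {"n_atoms": len(atoms), "n_bonds": len(bonds)}
    for k in range(7):
        features[f"atoms_degree_{k}"] = degree_hist.get(k, 0)
    for kind in ("single", "double", "triple", "aromatic"):
        features[f"{kind}_bonds"] = bond_hist.get(kind, 0)
    return features


features = count_features(atoms, bonds)
print(features)
```

For this input the carboxyl carbon is the only atom of degree 3, and the other three atoms have degree 1.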
Descriptors Used by Sorkun
- nX: Number of halogen atoms in the molecule.
- nHeavyAtom: Number of non-hydrogen atoms in the molecule.
- nAromAtom: Number of aromatic atoms in the molecule.
- nHBAcc: Number of hydrogen bond acceptors.
- nHBDon: Number of hydrogen bond donors.
- nRot: Number of rotatable bonds.
- nBonds: Total number of bonds in the molecule.
- nAromBond: Number of aromatic bonds in the molecule.
- nBondsO: Number of bonds between heavy atoms (bonds to hydrogen excluded).
- nBondsS: Number of single bonds in the molecule.
- nBondsD: Number of double bonds in the molecule.
- nBondsT: Number of triple bonds in the molecule.
- VMcGowan: McGowan volume, a descriptor related to molecular size.
- TopoPSA(NO): Topological polar surface area, calculated considering only oxygen and nitrogen atoms.
- TopoPSA: Topological polar surface area, considering all polar atoms.
- LabuteASA: Labute’s Approximation to Molecular Surface Area.
- apol: Sum of atomic polarizabilities (including implicit hydrogens).
- bpol: Bond polarizability, the sum over all bonds of the absolute difference between the polarizabilities of the two bonded atoms.
- nAcid: Number of acidic groups in the molecule.
- nBase: Number of basic groups in the molecule.
- ECIndex: Eccentric connectivity index, a topological descriptor.
- GGI1: Raw topological charge index of the first order.
- JGI1: Mean topological charge index of the first order.
- SLogP: Logarithm of the partition coefficient (SLogP).
- SMR: Sum of atomic molar refractivity values.
- BertzCT: Bertz complexity index, related to the molecular complexity.
- BalabanJ: Balaban’s J index, a topological descriptor.
- WPol: Wiener polarity number, the number of atom pairs separated by exactly three bonds.
- Zagreb1: First Zagreb index, a topological descriptor.
- ABC: Atom-bond connectivity index, a topological descriptor.
- ABCGG: Graovac-Ghorbani variant of the atom-bond connectivity index.
- nRing: Number of rings in the molecule.
- nHRing: Number of rings containing at least one heteroatom.
- naRing: Number of aromatic rings in the molecule.
- naHRing: Number of aromatic rings containing at least one heteroatom.
- nARing: Number of aliphatic rings in the molecule.
- nFRing: Number of fused rings in the molecule.
- NsCH3: Number of CH3 carbons with one single bond (methyl groups). In the names NsCH3 to SsI, the lowercase letters list the atom's bonds to heavy atoms: s = single, d = double, t = triple, a = aromatic.
- NdCH2: Number of CH2 carbons with one double bond (=CH2).
- NssCH2: Number of CH2 carbons with two single bonds (-CH2-).
- NtCH: Number of CH carbons with one triple bond.
- NdsCH: Number of CH carbons with one double and one single bond (=CH-).
- NaaCH: Number of aromatic CH carbons (two aromatic bonds).
- NsssCH: Number of CH carbons with three single bonds.
- NddC: Number of carbons with two double bonds (=C=).
- NtsC: Number of carbons with one triple and one single bond.
- NdssC: Number of carbons with one double and two single bonds.
- NaasC: Number of aromatic carbons bearing one single-bonded substituent.
- NaaaC: Number of carbons with three aromatic bonds (ring-fusion carbons).
- NssssC: Number of carbons with four single bonds.
- NsNH2: Number of NH2 nitrogens with one single bond.
- NssNH: Number of NH nitrogens with two single bonds.
- NaaNH: Number of aromatic NH nitrogens (pyrrole type).
- NtN: Number of nitrogens with one triple bond (nitrile type).
- NdsN: Number of nitrogens with one double and one single bond.
- NaaN: Number of aromatic nitrogens with two aromatic bonds (pyridine type).
- NsssN: Number of nitrogens with three single bonds.
- NaasN: Number of aromatic nitrogens bearing one single-bonded substituent.
- NsOH: Number of OH oxygens with one single bond (hydroxyl groups).
- NdO: Number of oxygens with one double bond.
- NssO: Number of oxygens with two single bonds (ether type).
- NaaO: Number of aromatic oxygens (furan type).
- NsF: Number of fluorine atoms with one single bond.
- NdsssP: Number of phosphorus atoms with one double and three single bonds.
- NdS: Number of sulfur atoms with one double bond.
- NssS: Number of sulfur atoms with two single bonds.
- NaaS: Number of aromatic sulfur atoms (thiophene type).
- NdssS: Number of sulfur atoms with one double and two single bonds (sulfoxide type).
- NddssS: Number of sulfur atoms with two double and two single bonds (sulfone type).
- NsCl: Number of chlorine atoms with one single bond.
- NsBr: Number of bromine atoms with one single bond.
- NsI: Number of iodine atoms with one single bond.
- SsCH3: Sum of the E-state indices of sCH3 atoms (CH3 carbons with one single bond). Each S-prefixed descriptor sums the E-state indices over the atom type that the matching N-prefixed descriptor counts.
- SdCH2: Sum of the E-state indices of dCH2 atoms (CH2 carbons with one double bond).
- SssCH2: Sum of the E-state indices of ssCH2 atoms (CH2 carbons with two single bonds).
- StCH: Sum of the E-state indices of tCH atoms (CH carbons with one triple bond).
- SdsCH: Sum of the E-state indices of dsCH atoms (CH carbons with one double and one single bond).
- SaaCH: Sum of the E-state indices of aaCH atoms (aromatic CH carbons).
- SsssCH: Sum of the E-state indices of sssCH atoms (CH carbons with three single bonds).
- SddC: Sum of the E-state indices of ddC atoms (carbons with two double bonds).
- StsC: Sum of the E-state indices of tsC atoms (carbons with one triple and one single bond).
- SdssC: Sum of the E-state indices of dssC atoms (carbons with one double and two single bonds).
- SaasC: Sum of the E-state indices of aasC atoms (aromatic carbons bearing one single-bonded substituent).
- SaaaC: Sum of the E-state indices of aaaC atoms (carbons with three aromatic bonds).
- SssssC: Sum of the E-state indices of ssssC atoms (carbons with four single bonds).
- SsNH2: Sum of the E-state indices of sNH2 atoms (NH2 nitrogens with one single bond).
- SssNH: Sum of the E-state indices of ssNH atoms (NH nitrogens with two single bonds).
- SaaNH: Sum of the E-state indices of aaNH atoms (aromatic NH nitrogens).
- StN: Sum of the E-state indices of tN atoms (nitrogens with one triple bond).
- SdsN: Sum of the E-state indices of dsN atoms (nitrogens with one double and one single bond).
- SaaN: Sum of the E-state indices of aaN atoms (aromatic nitrogens with two aromatic bonds).
- SsssN: Sum of the E-state indices of sssN atoms (nitrogens with three single bonds).
- SaasN: Sum of the E-state indices of aasN atoms (aromatic nitrogens bearing one single-bonded substituent).
- SsOH: Sum of the E-state indices of sOH atoms (hydroxyl oxygens).
- SdO: Sum of the E-state indices of dO atoms (oxygens with one double bond).
- SssO: Sum of the E-state indices of ssO atoms (oxygens with two single bonds).
- SaaO: Sum of the E-state indices of aaO atoms (aromatic oxygens).
- SsF: Sum of the E-state indices of sF atoms (fluorine atoms with one single bond).
- SdsssP: Sum of the E-state indices of dsssP atoms (phosphorus atoms with one double and three single bonds).
- SdS: Sum of the E-state indices of dS atoms (sulfur atoms with one double bond).
- SssS: Sum of the E-state indices of ssS atoms (sulfur atoms with two single bonds).
- SaaS: Sum of the E-state indices of aaS atoms (aromatic sulfur atoms).
- SdssS: Sum of the E-state indices of dssS atoms (sulfur atoms with one double and two single bonds).
- SddssS: Sum of the E-state indices of ddssS atoms (sulfur atoms with two double and two single bonds).
- SsCl: Sum of the E-state indices of sCl atoms (chlorine atoms with one single bond).
- SsBr: Sum of the E-state indices of sBr atoms (bromine atoms with one single bond).
- SsI: Sum of the E-state indices of sI atoms (iodine atoms with one single bond).
- C: Total number of carbon atoms.
- H: Total number of hydrogen atoms.
- Br: Total number of bromine atoms.
- N: Total number of nitrogen atoms.
- O: Total number of oxygen atoms.
- I: Total number of iodine atoms.
- Cl: Total number of chlorine atoms.
- S: Total number of sulfur atoms.
- F: Total number of fluorine atoms.
- P: Total number of phosphorus atoms.
- As: Total number of arsenic atoms.
- Si: Total number of silicon atoms.
- Se: Total number of selenium atoms.
- Sn: Total number of tin atoms.
- Hg: Total number of mercury atoms.
- Ge: Total number of germanium atoms.
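The atom-type names behind the N-prefixed and S-prefixed entries above (for example NssCH2 or SddssS) follow a compact convention: one lowercase letter per bond to a heavy atom, then the element symbol, then the attached hydrogens. The sketch below shows how such a label can be assembled. The `estate_label` helper and its inputs are hypothetical, and the letter order (triple, double, aromatic, single) is an assumption that holds for the types listed in this document but not necessarily for every E-state atom type.

```python
# One letter per bond to a heavy atom.
BOND_LETTERS = {"triple": "t", "double": "d", "aromatic": "a", "single": "s"}
# Assumed ordering of the letters inside a label.
LETTER_ORDER = "tdas"


def estate_label(element, heavy_bonds, n_hydrogens=0):
    """Build an E-state style atom-type label such as 'ssCH2' or 'ddssS'."""
    letters = sorted(
        (BOND_LETTERS[b] for b in heavy_bonds), key=LETTER_ORDER.index
    )
    if n_hydrogens == 0:
        h_part = ""
    elif n_hydrogens == 1:
        h_part = "H"
    else:
        h_part = f"H{n_hydrogens}"
    return "".join(letters) + element + h_part


# A few of the atom types listed in this document.
methyl = estate_label("C", ["single"], 3)
methylene = estate_label("C", ["single", "single"], 2)
aromatic_ch = estate_label("C", ["aromatic", "aromatic"], 1)
sulfone_s = estate_label("S", ["single", "double", "single", "double"])
print(methyl, methylene, aromatic_ch, sulfone_s)
```

Prefixing a label with N gives the name of the count descriptor and prefixing it with S gives the name of the E-state sum, so `"N" + methylene` is NssCH2 and `"S" + sulfone_s` is SddssS.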