Bill Tatsis, Matt Seddon, Dan Mason, Dan O'Donovan, Gio Cincilla, Azedine Zoufir and Nath Brown
Lig3DLens performs the following tasks:
- Prepares a commercial compound library for a virtual screening (VS) campaign. This involves: i) compound standardisation and ii) filtering out compounds that fall outside a predefined range of physicochemical properties.
- Generates conformers for all of the compounds in the commercial library and calculates their 3D similarity (shape & electrostatics) to a reference compound.
- Finally, it can cluster the highest scoring hits and select a set number of representative compounds that can be ordered and tested.
python -m pip install -r requirements.txt .
- Prepare a chemical library for a 3D VS campaign
Note: To keep track of the library compounds, the input file should have a column containing the text "ID".
lig3dlens-prepare --in input_SD_file --filter physchem_yaml_file --out output_SD_file
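The filtering step can be pictured as a range check per property. The sketch below is not the Lig3DLens implementation: the property names (`MW`, `LogP`, `HBD`), the ranges, and the helper `passes_filter` are hypothetical, and the actual layout of the YAML filter file may differ.

```python
# Hypothetical property ranges; the real YAML filter file may use
# different property names and a different layout.
PHYSCHEM_RANGES = {
    "MW": (200.0, 500.0),
    "LogP": (-1.0, 5.0),
    "HBD": (0, 5),
}


def passes_filter(props, ranges=PHYSCHEM_RANGES):
    """Return True if every ranged property is present and within [low, high]."""
    for name, (low, high) in ranges.items():
        value = props.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


# Toy records standing in for precomputed compound properties.
library = [
    {"ID": "cmpd-1", "MW": 342.4, "LogP": 2.1, "HBD": 2},
    {"ID": "cmpd-2", "MW": 612.7, "LogP": 4.8, "HBD": 1},  # MW above range
    {"ID": "cmpd-3", "MW": 280.3, "LogP": 6.2, "HBD": 0},  # LogP above range
]

kept = [c["ID"] for c in library if passes_filter(c)]
print(kept)
```

Compounds with a missing property are rejected here; whether the tool treats missing values the same way is not stated in this README.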
- Generates 3D conformers for both the library and reference compounds and scores the library compounds using a 3D shape & electrostatics similarity function to the reference molecule
Note: To keep track of the library compounds, the input file should have a column containing the text "ID".
lig3dlens-align --ref input_reference_molecule_file --lib input_library_file_name --conf num_conformers --out output_SD_file
New feature: Lig3DLens can work with input files that already contain 3D conformations for the reference compound and/or the compound library.
lig3dlens-align --ref input_3D-reference_molecule_file --lib input_3D-library_file_name --out output_SD_file
- Clusters the highest scoring molecules and selects a representative (diverse) set of compounds. The user can input the number of clusters (num_clusters), the fingerprint type (fingerprint_type) and its dimension (fingerprint_dimension) used for the clustering.
lig3dlens-cluster --in input_SD_file --clusters num_clusters --out output_file --dim fingerprint_dimension --fp_type fingerprint_type
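One simple way to pick a representative from each cluster is to keep its highest-scoring member. The sketch below assumes cluster labels and 3D scores are already available; the field names (`cluster`, `score`) and the helper `pick_representatives` are hypothetical, and Lig3DLens may select representatives differently.

```python
def pick_representatives(hits):
    """Keep the highest-scoring hit in each cluster."""
    best = {}
    for hit in hits:
        current = best.get(hit["cluster"])
        if current is None or hit["score"] > current["score"]:
            best[hit["cluster"]] = hit
    # Order by cluster label so the output is reproducible.
    return [best[label] for label in sorted(best)]


# Toy hits: hypothetical IDs, cluster labels and 3D similarity scores.
hits = [
    {"ID": "cmpd-1", "cluster": 0, "score": 1.42},
    {"ID": "cmpd-2", "cluster": 0, "score": 1.57},
    {"ID": "cmpd-3", "cluster": 1, "score": 1.10},
    {"ID": "cmpd-4", "cluster": 2, "score": 1.33},
    {"ID": "cmpd-5", "cluster": 2, "score": 1.21},
]

representatives = [h["ID"] for h in pick_representatives(hits)]
print(representatives)
```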
To run lig3dlens with the input files (reference molecule and compound library) included in this repository:
... preparing the compound library for virtual screening
lig3dlens-prepare --in tests/test_data/Enamine_hts_collection_202303_first500_VS_results.sdf \
--filter lig3dlens/physchem_properties.yaml \
--out curated_compounds.sd
... generate the 3D conformers for both reference and library compounds, and score the compounds in the library set using a 3D shape and electrostatics similarity
lig3dlens-align --ref input_reference_molecule_file --lib input_library_file_name --conf num_conformers --out output_SD_file
... cluster the results from the 3D virtual screening campaign
lig3dlens-cluster --in input_SD_file --clusters num_clusters --out output_file --dim fingerprint_dimension --fp_type fingerprint_type
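The three steps can also be chained from Python. The sketch below only builds the command lines and prints them; it assumes the output of each step is the input of the next. The flags are the ones shown above, while the helper `build_commands`, the intermediate file names, the reference path and the numeric and fingerprint values are hypothetical placeholders.

```python
import shlex


def build_commands(library, filter_yaml, reference, n_conf, n_clusters, fp_dim, fp_type):
    """Return the three CLI invocations as argument lists (nothing is executed)."""
    curated = "curated_compounds.sd"
    scored = "scored_compounds.sd"      # hypothetical intermediate file name
    selected = "selected_compounds.sd"  # hypothetical output file name
    return [
        ["lig3dlens-prepare", "--in", library, "--filter", filter_yaml, "--out", curated],
        ["lig3dlens-align", "--ref", reference, "--lib", curated,
         "--conf", str(n_conf), "--out", scored],
        ["lig3dlens-cluster", "--in", scored, "--clusters", str(n_clusters),
         "--out", selected, "--dim", str(fp_dim), "--fp_type", fp_type],
    ]


commands = build_commands(
    library="tests/test_data/Enamine_hts_collection_202303_first500_VS_results.sdf",
    filter_yaml="lig3dlens/physchem_properties.yaml",
    reference="reference.sdf",  # hypothetical path
    n_conf=10,                  # hypothetical value
    n_clusters=5,               # hypothetical value
    fp_dim=2048,                # hypothetical value
    fp_type="morgan",           # hypothetical value; accepted names are not listed here
)

for cmd in commands:
    # To execute a step, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing argument lists rather than a single shell string avoids quoting problems with file names that contain spaces.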
Use the Makefile commands to help tidy the codebase periodically. The following will reformat the code according to PEP8, and logically sort the imported modules:
make tidy
Run pytest from the lig3dlens directory:
pytest tests
Note: The whole VS workflow was tested in a Linux (Ubuntu) environment, where the following environment variable had to be set to tell MKL (used by NumPy) to use the GNU OpenMP runtime instead of the Intel OpenMP runtime:
export MKL_THREADING_LAYER=GNU
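When the workflow is driven from a Python script rather than a shell, the same variable can be set from within the process. This is a minimal sketch that assumes the assignment runs before NumPy (or any other library that loads MKL) is imported.

```python
import os

# Set the variable before NumPy, or anything else that loads MKL,
# is imported in this process.
os.environ["MKL_THREADING_LAYER"] = "GNU"

print(os.environ["MKL_THREADING_LAYER"])
```

Child processes started afterwards (for example with `subprocess`) inherit the value.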
The open source quantum chemistry package Psi4 is required for the QM calculation of the partial charges. More information about installing Psi4 on different CPU architectures (arm64 included) is provided on Psi4's website.
- Compound library preparation:
- Compound selection:
  - Multi-parameter selection of compounds using a scoring function that includes the 3D score, the 2D similarity to the reference compound, and the physchem properties. The aim is to get an even distribution between highly scored compounds and the other properties.
  - Select an optimal number of clusters instead of a predefined one (e.g. using silhouette analysis or affinity propagation). Alternatively, use another method for the maximum score-diversity selection problem (e.g. the Score Erosion algorithm).
  - Provide tools to analyse the chemical diversity of the final selected compound set.
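As an illustration of the multi-parameter selection idea, one possible form is a weighted sum of the three terms. This is only a sketch of a planned feature, not existing Lig3DLens code: the function `combined_score`, the weights, and the assumption that every term is scaled to [0, 1] with higher being better are all hypothetical.

```python
def combined_score(score_3d, sim_2d, physchem, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of three terms, each assumed to be scaled to [0, 1]."""
    w_3d, w_2d, w_pc = weights
    return w_3d * score_3d + w_2d * sim_2d + w_pc * physchem


# Toy values: (3D score, 2D similarity, physchem desirability) per compound.
candidates = {
    "cmpd-1": (0.90, 0.80, 0.40),
    "cmpd-2": (0.85, 0.30, 0.90),
    "cmpd-3": (0.60, 0.20, 0.95),
}

ranked = sorted(
    candidates,
    key=lambda cid: combined_score(*candidates[cid]),
    reverse=True,
)
print(ranked)
```

Whether high 2D similarity should be rewarded or penalised (for example, to favour new scaffolds) is a design choice that changes the sign of that term.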