CovidGenetic

This code has been merged into a more comprehensive collection of ML code for the COVID Moonshot, so this repository will be archived.


Genetic algorithm search for molecules with high similarity to known COVID-19 protease inhibitors. A recap on the protease can be found here.

Visualise the protease, as well as the ligands that inhibit specific sites, here.

3D coordinates of all the ligands can be found here, in .pdb files formatted like this (you need to filter them to get the relevant ones).
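Because the .pdb files contain every ligand, some filtering is needed before use. The following is a minimal standard-library sketch. It assumes that the ligands of interest are HETATM records identified by their residue name. The default name "LIG" is a hypothetical placeholder, so check the residue names actually used in the files. Fields are read at the fixed column positions defined by the PDB format: residue name in columns 18-20, x/y/z in columns 31-54, and element symbol in columns 77-78. The function name ligand_atoms is illustrative and not part of the repository.

```python
from typing import List, Tuple

# (element, x, y, z)
Atom = Tuple[str, float, float, float]


def ligand_atoms(pdb_text: str, resname: str = "LIG") -> List[Atom]:
    """Return (element, x, y, z) for every HETATM record whose residue name
    equals `resname`. "LIG" is an assumed placeholder residue name."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        # Ligands (and waters, ions, ...) are stored as HETATM records.
        if not line.startswith("HETATM"):
            continue
        # Residue name occupies columns 18-20 (0-based slice 17:20).
        if line[17:20].strip() != resname:
            continue
        # x, y, z are 8-character fields starting at columns 31, 39 and 47.
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
        # Element symbol is in columns 77-78; if absent, crudely fall back
        # to the first character of the atom name (columns 13-16).
        element = line[76:78].strip() or line[12:16].strip()[:1]
        atoms.append((element, x, y, z))
    return atoms
```

Filtering by residue name matters because the HETATM records also include waters (HOH) and other non-ligand species.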

Background on the graph-based genetic algorithm (GB-GA) that I used can be found in this paper; I adapted the GB-GA code from this GitHub repository.

Dependencies (for running the GA)

dscribe (make sure you get the latest version, which is much quicker at calculating SOAP descriptors)
RDKit
pandas

Data

Fragment data was preprocessed using transform.sh, which uses openbabel to convert the .mol files into .xyz files. These are then fed into concat_ligand.py, which concatenates the atom coordinates into one file.
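transform.sh and concat_ligand.py are not reproduced here. The following is a rough sketch of the concatenation step only. It assumes standard XYZ files: an atom count on line 1, a comment on line 2, then one "element x y z" line per atom. It merges several fragment files into a single coordinate block. The names read_xyz and concat_xyz are hypothetical, and the repository script may differ in its output layout.

```python
from typing import Iterable, List, Tuple

# (element, x, y, z)
Atom = Tuple[str, float, float, float]


def read_xyz(text: str) -> List[Atom]:
    """Parse one XYZ block: atom count, comment line, then atom lines."""
    lines = text.splitlines()
    n_atoms = int(lines[0])
    atoms: List[Atom] = []
    # Atom records start after the count and comment lines.
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


def concat_xyz(texts: Iterable[str], comment: str = "concatenated ligands") -> str:
    """Merge the atoms of several XYZ blocks into a single XYZ block."""
    atoms = [atom for text in texts for atom in read_xyz(text)]
    out = [str(len(atoms)), comment]
    out += [f"{el} {x:.6f} {y:.6f} {z:.6f}" for el, x, y, z in atoms]
    return "\n".join(out) + "\n"
```

Writing the result in XYZ format keeps the merged file readable by the same tools (openbabel, ASE) as the per-fragment files.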

Candidates were taken from this Google Sheet, and its 'SMILES' column is saved in data/covid_submissions.csv.
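The following is a minimal sketch of reading that column back using only the standard csv module. It assumes the file has a header row containing 'SMILES'. load_smiles is a hypothetical helper, not part of the repository.

```python
import csv
from typing import Iterable, List


def load_smiles(lines: Iterable[str], column: str = "SMILES") -> List[str]:
    """Return the non-empty values of `column` from CSV text.

    `lines` can be an open file handle or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines)
    smiles: List[str] = []
    for row in reader:
        # Missing cells come back as None, empty cells as "".
        value = (row.get(column) or "").strip()
        if value:
            smiles.append(value)
    return smiles
```

Usage would be along the lines of: with open("data/covid_submissions.csv", newline="") as fh: smiles = load_smiles(fh). Since pandas is already a dependency, pd.read_csv("data/covid_submissions.csv")["SMILES"].dropna().tolist() is an equivalent one-liner.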

How to use

Running python GA-soap.py starts the genetic algorithm. It uses crossover.py and mutate.py from Jensen's GitHub repository. The docstrings and comments in GA-soap.py should be enough to help you understand what's going on.
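To show the overall shape of a generational GA loop with pluggable crossover and mutation operators, here is a toy sketch that evolves strings rather than molecules. Everything in it is an assumption for illustration. The selection scheme (fitness-proportional parent selection plus elitist survival of the best individuals) is not necessarily what GA-soap.py does. The toy target, fitness, crossover and mutate functions are hypothetical stand-ins for the molecular versions.

```python
import random
from typing import Callable, List

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TARGET = "MOONSHOT"  # toy target, stands in for "similar to known inhibitors"


def run_ga(
    initial: List[str],
    fitness: Callable[[str], float],
    crossover: Callable[[str, str, random.Random], str],
    mutate: Callable[[str, random.Random], str],
    generations: int = 20,
    population_size: int = 20,
    mutation_rate: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Generic GA loop; assumes fitness values are non-negative."""
    rng = random.Random(seed)
    population = list(initial)
    for _ in range(generations):
        # Fitness-proportional parent selection; epsilon avoids all-zero weights.
        weights = [fitness(ind) + 1e-6 for ind in population]
        children: List[str] = []
        while len(children) < population_size:
            a, b = rng.choices(population, weights=weights, k=2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        # Elitism: keep the best unique individuals from parents + children.
        pool = list(dict.fromkeys(population + children))
        population = sorted(pool, key=fitness, reverse=True)[:population_size]
    return population


def toy_fitness(s: str) -> float:
    """Fraction of positions matching TARGET (0.0 to 1.0)."""
    return sum(c == t for c, t in zip(s, TARGET)) / len(TARGET)


def toy_crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover of two equal-length strings."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def toy_mutate(s: str, rng: random.Random) -> str:
    """Replace one random position with a random letter."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]
```

In GA-soap.py the individuals are presumably RDKit molecules, with crossover and mutation taken from Jensen's crossover.py and mutate.py and a SOAP-based similarity as the fitness. The sketch only mirrors the loop structure. Because of the elitist survival step, the best score in the population can never decrease from one generation to the next.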

To-try

Parallelize conformer generation / similarity calculation with MPI? Each generation currently takes ~1 minute, which isn't terrible but could be better
Tune the GA and SOAP parameters to find the optimal candidate(s); set -tgt_size to the average size of the molecules in the submissions?
I think we should only use submission candidates that include the fragments from sites 2 and 11 (the only ones I selected); too tired to do it now
Possibly also include the fragments in the initial population
Find a better way of writing the best candidates from each generation to a file for visualisation
Some form of synthesizability scoring?
May need a better objective function, as the target ligand field has a LOT of atoms (892)
More conformers?
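The README does not spell out how GA-soap.py turns SOAP descriptors into a score. The following is therefore only a minimal sketch, assuming an average-kernel similarity. The per-atom descriptor vectors of each structure are averaged, and the two averages are compared by cosine similarity raised to a power zeta. With 892 atoms in the target ligand field, such an average blends all sites together, which is one plausible reason a single averaged score could be a weak objective. The function names and the zeta parameter are hypothetical, and plain lists stand in for dscribe's array output. dscribe itself provides AverageKernel and REMatchKernel in dscribe.kernels. REMatch is one alternative that matches individual local environments rather than comparing a single mean vector.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_descriptor(per_atom: Sequence[Vector]) -> List[float]:
    """Average the per-atom descriptor vectors of one structure."""
    n = len(per_atom)
    return [sum(col) / n for col in zip(*per_atom)]


def average_kernel(a: Sequence[Vector], b: Sequence[Vector], zeta: int = 1) -> float:
    """Cosine similarity of the mean descriptors, raised to the power zeta.

    Returns 0.0 if either mean descriptor has zero norm.
    """
    pa, pb = mean_descriptor(a), mean_descriptor(b)
    dot = sum(x * y for x, y in zip(pa, pb))
    norm = math.sqrt(sum(x * x for x in pa)) * math.sqrt(sum(y * y for y in pb))
    return (dot / norm) ** zeta if norm else 0.0
```

Raising zeta above 1 sharpens the kernel, so only candidates whose averaged environments are very close to the target score highly.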

wjm41/CovidGenetic

Folders and files

Latest commit

History

Repository files navigation

CovidGenetic

Dependencies (for running the GA)

Data

How to use

To-try

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages