Useful Resources for Machine Learning (ML) in Drug Discovery

Books

Data Science from Scratch: First Principles with Python - To apply ML in drug discovery, one has to understand data science. This book provides a practical introduction to data science with lots of great code examples. It covers Python, statistics, probability, clustering, databases, and some basic machine learning.

Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python - This book is probably the best reference for someone who wants to get started with machine learning. It covers a wide range of topics from conventional machine learning approaches to recent deep learning advances.

Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python - To effectively use ML in drug discovery or anywhere else, one needs to understand statistics. This book takes a pragmatic approach with lots of code examples to help readers explore concepts.

Tutorials

TeachOpenCADD - This is a great set of tutorials from Andrea Volkamer's group that use Open Source software to teach Computer-Aided Drug Design concepts including molecular similarity, applications of machine learning, and pharmacophore analysis.

Practical Cheminformatics Tutorials - A set of tutorials for learning Cheminformatics. The tutorials begin with basics like SMILES and SMARTS, as well as reaction enumeration. The tutorials also include a range of SAR analysis and machine learning tasks.

Blogs

Practical Cheminformatics - This is a blog where I post once a month or so. These posts typically contain code that demonstrates various aspects of cheminformatics: clustering, machine learning, data visualization, etc. I occasionally throw in posts containing opinions on things like AI and getting a job.

Is Life Worth Living - A great blog from Iwatobipen (aka pen), whose posts are chock full of great code examples. Pen always seems to be up on the latest methods and posts interesting examples on a variety of topics ranging from quantum chemistry to machine learning.

The RDKit Blog - Greg Landrum is the primary contributor to, and BDFL of, the RDKit. In addition to the latest and greatest features in the RDKit, Greg's posts also touch on a number of key issues in Cheminformatics and ML, such as dealing with unbalanced datasets and the impact of fingerprint folding on similarity searching.

Cheminformania - A set of very practical posts by Esben Jannik Bjerrum and friends that primarily focus on the applications of deep learning in drug discovery. These posts provide several useful code examples.
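For readers unfamiliar with the fingerprint folding mentioned in the RDKit Blog entry above, here is a minimal pure-Python sketch of the idea. It is not taken from any of these blogs and does not use the RDKit: the bit indices are made-up values chosen to force collisions, and `tanimoto` and `fold` are hypothetical helper names. Folding maps a sparse fingerprint into a shorter bit vector by taking each on-bit index modulo the vector length, so distinct features can collide and change the computed similarity.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fold(bits: Set[int], n_bits: int) -> Set[int]:
    """Fold a sparse fingerprint to n_bits by taking each index modulo n_bits."""
    return {i % n_bits for i in bits}


# Contrived on-bit indices (assumption: not derived from real molecules).
fp_a = {3, 17, 1027, 2051, 4099}
fp_b = {3, 17, 5000, 6000, 7000}

unfolded = tanimoto(fp_a, fp_b)
folded = tanimoto(fold(fp_a, 1024), fold(fp_b, 1024))

print(f"unfolded similarity: {unfolded:.2f}")
print(f"folded to 1024 bits: {folded:.2f}")
```

In this toy case three of the bits unique to `fp_a` collide with bit 3 after folding, so the two fingerprints look more similar than they did before folding. Real fingerprints and bit lengths will behave differently; the posts on the RDKit Blog examine the effect on actual data.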

Videos

3Blue1Brown has an excellent introduction to neural networks. Each of these videos is about 20 minutes.

Generative Molecular Design

Datasets

The Polaris Benchmarking Platform

Papers

A few references on key topics

Combining Datasets

Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise

Imbalanced Datasets

GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning
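As a rough illustration of the general idea named in the title above, the sketch below scans candidate decision thresholds on a tiny imbalanced toy dataset and keeps the one with the highest Cohen's kappa. This is a simplified illustration, not the procedure from the GHOST paper: the labels, probabilities, threshold grid, and the helper names `cohen_kappa` and `best_threshold` are all assumptions made up for this example.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels (1 = active, 0 = inactive)."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


def best_threshold(y_true, probs, thresholds):
    """Return the threshold with the highest kappa (first one wins ties)."""
    def score(t):
        return cohen_kappa(y_true, [int(p >= t) for p in probs])
    return max(thresholds, key=score)


# Toy data (assumption): 2 actives out of 10, with low predicted probabilities.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.47, 0.33, 0.02, 0.08, 0.11, 0.13, 0.18, 0.22, 0.24, 0.28]
thresholds = [i / 20 for i in range(1, 11)]  # 0.05, 0.10, ..., 0.50

chosen = best_threshold(y_true, probs, thresholds)
default_kappa = cohen_kappa(y_true, [int(p >= 0.5) for p in probs])
chosen_kappa = cohen_kappa(y_true, [int(p >= chosen) for p in probs])

print(f"kappa at 0.5: {default_kappa:.2f}")
print(f"chosen threshold: {chosen:.2f}, kappa: {chosen_kappa:.2f}")
```

With the default threshold of 0.5 this toy model predicts no actives at all, while a lower threshold separates the two classes. In practice the threshold should be chosen on training or validation data rather than on the test set; see the paper for the actual method.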

Model Interpretability

Model agnostic generation of counterfactual explanations for molecules