Data Science from Scratch: First Principles with Python - To apply ML in drug discovery, one has to understand data science. This book provides a practical introduction to data science with lots of great code examples. It covers Python, statistics, probability, clustering, databases, and some basic machine learning.
Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python - This book is probably the best reference for someone who wants to get started with machine learning. It covers a wide range of topics from conventional machine learning approaches to recent deep learning advances.
Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python - To effectively use ML in drug discovery or anywhere else, one needs to understand statistics. This book takes a pragmatic approach with lots of code examples to help readers explore concepts.
TeachOpenCADD - This is a great set of tutorials from Andrea Volkamer's group that use Open Source software to teach Computer-Aided Drug Design concepts including molecular similarity, applications of machine learning, and pharmacophore analysis.
Practical Cheminformatics Tutorials - A set of tutorials for learning Cheminformatics. The tutorials begin with basics like SMILES and SMARTS, as well as reaction enumeration. The tutorials also include a range of SAR analysis and machine learning tasks.
Practical Cheminformatics - This is a blog where I post once a month or so. These posts typically contain code that demonstrates various aspects of cheminformatics: clustering, machine learning, data visualization, etc. I occasionally throw in posts containing opinions on things like AI and getting a job.
Is Life Worth Living - A great blog from Iwatobipen (aka pen), whose posts are chock full of great code examples. Pen always seems to be up on the latest methods and posts interesting examples on a variety of topics ranging from quantum chemistry to machine learning.
The RDKit Blog - Greg Landrum is the primary contributor to, and BDFL of, the RDKit. In addition to the latest and greatest features in the RDKit, Greg's posts also touch on a number of key issues in Cheminformatics and ML, such as dealing with unbalanced datasets and the impact of fingerprint folding on similarity searching.
Cheminformania - A set of very practical posts by Esben Jannik Bjerrum and friends that primarily focus on the applications of deep learning in drug discovery. These posts provide several useful code examples.
3Blue1Brown has an excellent introduction to neural networks. Each of these videos is about 20 minutes long.
- But what is a neural network? | Chapter 1, Deep learning
- Gradient descent, how neural networks learn | Chapter 2, Deep learning
- What is backpropagation really doing? | Chapter 3, Deep learning
- SMILES-RNN from Morgan Thomas (code)
- REINVENT from AstraZeneca (paper, code)
- MolMIM from NVIDIA (paper, demo_notebook)
The Polaris Benchmarking Platform
A few references on key topics
Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise
GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning
Model agnostic generation of counterfactual explanations for molecules