In this project, we aim to use images of human U2OS cells (GigaScience dataset) to predict the activity of many compounds against different protein targets.
Investigations and key findings:

- Batch effects exist in the cell image dataset. We used several methods to detect them:
  - Visualizing cell image features against experiment ID
  - An interactive visualization tool for detecting batch effects
  - Feature correlation heatmaps
- Removing these batch effects is challenging. We tried:
  - ComBat normalization
  - Z-score normalization
- Cell image data is promising for predicting compound assay activities. We:
  - Used compound fingerprint features as a baseline
  - Trained random forest and logistic regression models on features extracted from a pre-trained CNN
  - Trained a LeNet CNN end to end
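
A minimal sketch of per-batch z-score normalization, assuming each experiment is standardized using the mean and standard deviation of all of its own cells (the notebooks may standardize differently, for example against control wells). The `zscore_by_batch` helper and the column names `experiment_id` and `area` are hypothetical.

```python
import numpy as np
import pandas as pd


def zscore_by_batch(df: pd.DataFrame, batch_col: str, feature_cols: list[str]) -> pd.DataFrame:
    """Standardize each feature within each batch (e.g. experiment ID).

    Hypothetical helper: every batch is centered on its own mean and scaled
    by its own sample standard deviation.
    """
    out = df.copy()
    grouped = df.groupby(batch_col)[feature_cols]
    batch_mean = grouped.transform("mean")
    # Sample std (ddof=1); a zero std is turned into NaN to avoid dividing by zero.
    batch_std = grouped.transform("std").replace(0, np.nan)
    # Features that are constant within a batch (and any missing values) become 0.
    out[feature_cols] = ((df[feature_cols] - batch_mean) / batch_std).fillna(0.0)
    return out


# Toy data: the same feature pattern, shifted by +10 in experiment B.
df = pd.DataFrame({
    "experiment_id": ["A", "A", "A", "B", "B", "B"],
    "area": [10.0, 12.0, 14.0, 20.0, 22.0, 24.0],
})
norm = zscore_by_batch(df, "experiment_id", ["area"])
print(norm["area"].round(6).tolist())  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Both experiments land on the same scale, so the constant offset between them disappears. The same step would also erase any genuine biological difference between batches, which is one reason batch effects are hard to remove cleanly.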
To learn more, check out the Jupyter notebooks below and the Python scripts in `./scripts`.

Notebook | Description |
---|---|
image_processing.ipynb | Visualize the raw images and their features |
meta_data.ipynb | Explore the metadata that comes with the image dataset, such as compound chemical annotations |
feature_visualization.ipynb | Visualize the single-cell images, CNN-extracted features, and clusterings of the extracted features |
normalization.ipynb | Experiment with batch-effect normalization methods such as ComBat and z-score normalization |
explore_excape_db.ipynb | Align U2OS image data with ExCAPE-DB assay data using chemical annotations |
positive_control.ipynb | Find compounds in the CCLE database that have been tested on the U2OS cell line |
assay_selection.ipynb | Aggregate cell-level CellProfiler features to the assay level |
assay_prediction.ipynb | Predict assay activity from U2OS images with random forest and logistic regression models |
simple_cnn.ipynb | Predict assay activity from U2OS images by training a LeNet CNN model |
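
The aggregation of cell-level CellProfiler features can be sketched as a pandas group-by. This is a minimal sketch assuming one row per cell, a hypothetical `compound_id` grouping key, and median aggregation; the notebook's actual keys and statistic may differ, and `aggregate_cells` and the feature column names are hypothetical.

```python
import pandas as pd


def aggregate_cells(cells: pd.DataFrame, key_cols: list[str], feature_cols: list[str],
                    how: str = "median") -> pd.DataFrame:
    """Collapse per-cell feature rows into one profile per key (hypothetical helper)."""
    # as_index=False keeps the key columns as ordinary columns in the result.
    return cells.groupby(key_cols, as_index=False)[feature_cols].agg(how)


# Toy per-cell table; column names are hypothetical stand-ins for CellProfiler output.
cells = pd.DataFrame({
    "compound_id": ["c1", "c1", "c1", "c2", "c2"],
    "cell_area": [1.0, 2.0, 10.0, 4.0, 6.0],
    "cell_intensity": [0.5, 0.7, 0.6, 0.2, 0.4],
})
profiles = aggregate_cells(cells, ["compound_id"], ["cell_area", "cell_intensity"])
print(profiles)
```

The median is one reasonable choice because a few mis-segmented cells (like the outlier area of 10.0 above) cannot drag the profile far; passing `how="mean"` gives the mean instead.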