Tackling the Kaggle Histopathologic Cancer Detection Challenge to evaluate different machine-learning algorithms for identifying metastatic cancer in small image patches taken from larger digital pathology scans.
For this project, to understand how far the field has come from traditional Computer-Vision techniques to modern Deep Learning, I start with the most basic algorithms and iteratively improve each model's performance one small step at a time. Not every step is guaranteed to improve performance, but trying them is necessary to build a working intuition of what might work.
I start off with hand-engineered CV features (Color-Space Transforms, LBP, Gabor, Scharr, Laplacian, Harris, etc.) that work well with Shallow-ML models, and compare their performance against the automatic feature-extraction of large DL models.
Validation accuracy of the baseline model started at 53.2%. The best Shallow-ML model topped out at 87.2% using 60 hand-engineered features; the best CNN model topped out at 97.6%.
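To give a feel for these hand-engineered features, here is a minimal sketch of a rotation-invariant LBP histogram using scikit-image; the parameters (P=8, R=1, 'uniform' coding) are illustrative rather than the exact notebook settings. A few more sketches (HED deconvolution, landmark distances, Gabor/Scharr filters, the GBT evaluation pattern, and the baseline CNN) follow the step table.

```python
# Minimal sketch: rotation-invariant 'uniform' LBP histogram for one grayscale patch.
# P=8 neighbours at radius R=1 are illustrative choices, not the notebook's exact settings.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_patch, P=8, R=1):
    """Return a normalized histogram of 'uniform' LBP codes for a 2-D grayscale patch."""
    codes = local_binary_pattern(gray_patch, P, R, method="uniform")  # code values in [0, P+1]
    hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)  # P+2 bins
    return hist.astype(np.float32)

# A random uint8 patch stands in for one 96x96 histopathology tile.
patch = (np.random.rand(96, 96) * 255).astype(np.uint8)
print(lbp_histogram(patch))  # 10-dimensional texture descriptor
```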
Step | Notebook | Description |
---|---|---|
1 | Data_Exploration | Exploratory Data Analysis |
2 | Data_HDF5 | Generate Grayscale+HED HDF5 dataset volume (sketch below) |
3 | Data_1D | Generate Naïve-1D flattened .npz from HDF5 for Shallow-ML |
4 | LogReg | Baseline Naïve-1D with Logistic Regression |
5 | Create_LBP_Feat | Generate LBP features and Evaluate on GBT classifier |
6 | LBP_Euclidean_vs_KLD | LBP histogram Dissimilarity metrics: Euclidean vs KL-Divergence |
7 | LogReg | Baseline LBP features with Logistic Regression |
8 | Find_Landmarks | Develop/Test algorithm for finding a set of 'Landmarks' |
9 | Generate_Landmarks | Generate Landmarks on Histopathology dataset LBP features |
10 | Create_LDist_Feat | Generate Distance-to-Landmarks (identified above) features (sketch below) |
11 | LogReg | Baseline Landmark features with Logistic Regression |
12 | PCA | Evaluate effect of PCA transformation on i. LBP and ii. Landmark features |
13 | SVM | Evaluate SVM model with i. LBP and ii. Landmark features |
14 | Create_5LBP_Feat | Generate 5-cell overlapping LBPs: 64x64px centered and 32x32px on four corners |
15 | GBT | Evaluate GBT model with 'Double-LBP' (scaling-pyramid: full-size 96x96px, half-size 48x48px) features |
16 | Create_COPOD_Feat | Classification using COPOD scores on LBP features |
17 | Create_2x2LBP_Feat | Add 2nd set of Rotation-Invariant LBP texture features |
18 | Create_Gabor_Feat | Add Gabor Filters (16x 2-D kernels) features |
19 | Create_Gabor_Scharr_Feat | Add Gabor+Scharr Gradient Filter features (sketch below) |
20 | Create_Laplacian_Feat | Add Laplacian Edge-Detection Filter features |
21 | Create_Harris_Feat | Add Harris Corner-Detection Filter features |
22 | GBT | Re-evaluate GBT model on aggregation of best Shallow-ML features (sketch below) |
23 | TPOT | Evaluate TPOT Auto-ML on Shallow-ML features (Last of the Shallow Models) |
24 | NN | Evaluate Neural Network with Shallow-ML (LBP, Gabor, Scharr) features |
25 | CNN_ModelA | Sequential CNN with Increasing # Conv2D filters (sketch below) |
26 | CNN_ModelB | Sequential CNN with Decreasing # Conv2D filters |
27 | CNN_ModelA-BD | CNN_ModelA on full 200k Train set |
28 | CNN_ModelD1-BD-AUG-N | Added Augmentations, Gaussian Noise, more Dropout |
29 | CNN_ModelF | Replace last AvgPooling2D with Conv2D, Reduce Learning-Rate |
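Below are a few more minimal sketches of the techniques referenced in the table; they are illustrative stand-ins with placeholder parameters and file names, not the exact notebook code. First, the Grayscale+HED volume of Step 2 (Data_HDF5), assuming scikit-image's rgb2gray/rgb2hed colour deconvolution:

```python
# Sketch of the Grayscale + HED channel stack written to HDF5 in Step 2 (Data_HDF5).
# The HDF5 file and dataset names are placeholders; rgb2gray/rgb2hed are real scikit-image calls.
import h5py
import numpy as np
from skimage.color import rgb2gray, rgb2hed

def gray_hed_stack(rgb_patch):
    """Stack grayscale + Haematoxylin/Eosin/DAB channels into an (H, W, 4) float array."""
    gray = rgb2gray(rgb_patch)   # (H, W)
    hed = rgb2hed(rgb_patch)     # (H, W, 3) colour-deconvolved stain channels
    return np.dstack([gray, hed]).astype(np.float32)

patch = np.random.rand(96, 96, 3)            # stand-in for one 96x96 RGB tile
volume = np.stack([gray_hed_stack(patch)])   # (N, 96, 96, 4) for N patches

with h5py.File("patches_gray_hed.h5", "w") as f:  # placeholder file name
    f.create_dataset("images", data=volume, compression="gzip")
```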
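The Distance-to-Landmarks features of Steps 8-10 boil down to: select a small set of representative LBP histograms ('Landmarks'), then describe every patch by its distance to each landmark. In this sketch, KMeans centroids stand in for whatever selection rule Find_Landmarks actually uses, and plain Euclidean distance stands in for the Euclidean-vs-KL-Divergence comparison of Step 6.

```python
# Sketch of Distance-to-Landmarks features: distances from each patch's LBP histogram
# to a small set of representative "landmark" histograms.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
lbp_features = rng.random((1000, 10))  # stand-in for per-patch LBP histograms

# KMeans centroids as illustrative landmarks (the notebooks may select them differently).
landmarks = KMeans(n_clusters=16, n_init=10, random_state=0).fit(lbp_features).cluster_centers_

# One distance per landmark -> a 16-dimensional feature vector per patch.
landmark_dist_features = pairwise_distances(lbp_features, landmarks, metric="euclidean")
print(landmark_dist_features.shape)  # (1000, 16)
```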
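A sketch of the Gabor-bank and Scharr features from Steps 18-19: summary statistics of 16 Gabor responses (4 frequencies x 4 orientations) plus the Scharr gradient magnitude. The frequencies and the mean/std summaries are illustrative choices rather than the notebooks' exact parameters.

```python
# Sketch of Gabor-bank + Scharr gradient features for one grayscale patch.
import numpy as np
from skimage.filters import gabor, scharr

def gabor_scharr_features(gray_patch):
    feats = []
    # 4 frequencies x 4 orientations = 16 Gabor kernels (illustrative bank).
    for frequency in (0.1, 0.2, 0.3, 0.4):
        for theta in np.arange(4) * np.pi / 4:
            real, _ = gabor(gray_patch, frequency=frequency, theta=theta)
            feats += [real.mean(), real.std()]
    grad = scharr(gray_patch)  # Scharr gradient magnitude
    feats += [grad.mean(), grad.std()]
    return np.asarray(feats, dtype=np.float32)  # 16*2 + 2 = 34 values per patch

patch = np.random.rand(96, 96)
print(gabor_scharr_features(patch).shape)
```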
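All of the shallow models are evaluated the same way: fit a classifier on the concatenated hand-engineered features and report hold-out validation accuracy, as in the GBT re-run of Step 22. scikit-learn's GradientBoostingClassifier and random arrays stand in here for the actual GBT implementation and the real 60-feature matrix.

```python
# Sketch of the shallow-model evaluation pattern: GBT on aggregated hand-engineered features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((2000, 60))             # stand-in for the 60 aggregated features
y = rng.integers(0, 2, size=2000)      # stand-in for tumour / no-tumour labels

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, random_state=0)
gbt.fit(X_tr, y_tr)
print("validation accuracy:", accuracy_score(y_va, gbt.predict(X_va)))
```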
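Finally, a sketch of the 'increasing-filters' Sequential CNN shape from Step 25 (CNN_ModelA), taking the 96x96 RGB competition patch as input; filter counts, dropout rate, and optimizer settings are illustrative.

```python
# Sketch of a Sequential CNN with increasing Conv2D filter counts (CNN_ModelA style).
from tensorflow import keras
from tensorflow.keras import layers

def build_model_a(input_shape=(96, 96, 3)):
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # binary: metastatic vs. not
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

build_model_a().summary()
```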