Semi/Self-Supervised Learning on a Pediatric Pneumonia Dataset

About

Fully supervised approaches need large, densely annotated datasets, so only hospitals that can afford to collect and annotate such datasets can use them to aid their physicians. The goal of this project is to use self-supervised and semi-supervised learning to significantly reduce the need for fully labelled data. In this repo, you will find the project source code, the training notebooks, and the final TensorFlow 2 saved model used to develop the web application for detecting pediatric pneumonia from chest X-rays.

The semi/self-supervised learning framework used in the project comprises three stages:

Self-supervised pretraining
Supervised fine-tuning with active learning
Knowledge distillation using unlabeled data

Refer to the Google Research team's paper (SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners) for more details on the framework.
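To make Stage 3 more concrete, here is a minimal sketch of the distillation objective described in the SimCLRv2 paper: the student is trained to match the teacher's temperature-softened class probabilities on unlabeled images. This is plain Python for illustration only, not the TensorFlow code in this repo; the function names `softmax` and `distillation_loss` and the example logits are assumptions made for this sketch.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean cross-entropy between teacher soft labels and student predictions.

    Both arguments are lists of per-image logit lists (hypothetical inputs).
    No ground-truth labels are involved, which is why unlabeled data suffices.
    """
    losses = []
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_logits, temperature)
        p_student = softmax(s_logits, temperature)
        losses.append(-sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student)))
    return sum(losses) / len(losses)

# A student that agrees with the teacher gets a lower loss than one that disagrees.
teacher = [[2.0, 0.0]]
agreeing_student = [[2.0, 0.0]]
disagreeing_student = [[0.0, 2.0]]
loss_agree = distillation_loss(teacher, agreeing_student)
loss_disagree = distillation_loss(teacher, disagreeing_student)
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative confidence in the non-predicted class.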

The training notebooks for Stages 1, 2, and 3 can be found in the notebooks folder. Notebooks for Selective Labeling (active learning) using the Entropy or Augmentations policies can be found in the Active_Learn folder. We also evaluated another semi-supervised learning approach called FixMatch; see the FixMatch folder. Benchmarks for Fully-Supervised Learning can be found in the FSL_Benchmarks folder. The code for data preprocessing can be found in Data_Preparation.
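As a rough illustration of what an entropy-based selective labeling policy can look like, the sketch below ranks unlabeled images by the entropy of the model's predicted class probabilities and picks the most uncertain ones for annotation. It is an assumption about the general technique, not a copy of the notebooks in Active_Learn; the names `predictive_entropy` and `select_for_labeling` and the sample ids are hypothetical.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of one predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` most uncertain samples.

    `predictions` maps a sample id to its predicted class probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda sample_id: predictive_entropy(predictions[sample_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]

# Hypothetical two-class predictions (normal vs. pneumonia) for three X-rays.
predictions = {
    "xray_a": [0.99, 0.01],  # confident, low entropy
    "xray_b": [0.50, 0.50],  # maximally uncertain
    "xray_c": [0.80, 0.20],
}
to_label = select_for_labeling(predictions, budget=2)
```

Spending the labeling budget on the most uncertain images is one common way to get more value out of each annotation than random sampling would.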

Results

Stage 1 - Contrastive Accuracy

Labels	Stage 1 (Self-Supervised)
No labels used	99.99%

Contrastive Accuracy is a measure of how invariant the model's predictions are when tested against image augmentations.
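One common way to compute such a metric, sketched here under simplifying assumptions, is to embed two augmented views of every image and count how often an image's first view is most similar to the second view of the same image rather than to a view of a different image. This is not necessarily the exact computation in this repo's metrics.py: the SimCLR code also considers other candidates within the same view, and the names `cosine` and `contrastive_accuracy` and the toy embeddings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_accuracy(view_a, view_b):
    """Fraction of images whose view_a embedding is closest to its own view_b embedding.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    """
    correct = 0
    for i, anchor in enumerate(view_a):
        sims = [cosine(anchor, candidate) for candidate in view_b]
        best = max(range(len(sims)), key=sims.__getitem__)
        if best == i:
            correct += 1
    return correct / len(view_a)

# Toy 2-D embeddings for two images; each pair of views points the same way.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.2, 0.8]]
accuracy = contrastive_accuracy(view_a, view_b)
```

If the encoder is invariant to the augmentations, the two views of an image land close together and the score approaches 100%.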

Stage 2 and 3 - Test Accuracy Comparison

Labels	FSL (Benchmark)	Stage 2 (Finetuning)	Stage 3 (Distillation)
1%	85.2%	94.5%	96.3%
2%	85.1%	96.8%	97.6%
5%	86.0%	97.1%	98.1%
100%	98.9%	N/A	N/A

Despite needing only a small fraction of the labels, our Stage 2 and Stage 3 models achieved test accuracies comparable to that of a Fully-Supervised (FSL) model trained with 100% of the labels. Refer to the Project Report and the Final Presentation for a more detailed discussion of the findings.

ML Workflow

Web App Demo

Installation

You can run the app locally if you have Docker installed. First, clone this repo:

git clone https://github.com/TeamSemiSuperCV/semi-super

Navigate to the webapp directory of the repo:

cd semi-super/webapp

Build the container image using the docker build command (this will take a few minutes):

docker build -t semi-super .

Start the container using the docker run command, specifying the name of the image we just created:

docker run -dp 8080:8080 semi-super

After a few seconds, open your web browser to http://localhost:8080. You should see the app.

Acknowledgements

We took the SimCLR framework code from Google Research and heavily modified it for the purposes of this project. We enhanced the knowledge distillation feature along with several other changes to make it perform better with our dataset. With these changes and improvements, knowledge distillation can be performed on the Google Cloud TPU infrastructure, which reduces training time significantly.

