This project serves as an introduction to reading ML literature and applying that knowledge to deep learning and differential privacy. The goal is to understand deep learning models and how to protect the privacy of an individual's data. Different learning techniques were implemented on medical image datasets, and their privacy costs were analyzed within the framework of differential privacy to evaluate the merits and room for improvement of each technique. The project's deliverable was a research paper summarizing the results found throughout the fall and winter semesters. The paper can be found here.
- Nicole Streltsov (@NicoleStrel)
- Ritvik Jayanthi (@RitvikJayanthi)
- Alec Dong (@AlecDong)
- Ria Upreti (@ria-upreti)
- Akriti Sharma (@Akriti-Sharma1)
- Bolade Amoussou (@cdw18)
- Mikhael Orteza (@xPreliator)
- Divya Gupta (@gdivyagupta)
- Datasets:
  - `/chest-data/`: gzip NumPy array files, from the Chest Pneumonia X-ray images dataset
  - `/knee-data/`: gzip NumPy array files, from the Knee Osteoarthritis X-ray images dataset
- Techniques:
  - `/DP-SGD/` (TensorFlow Objax): Differential Privacy with Stochastic Gradient Descent, from the paper Abadi et al.
  - `/DP-SGD-JL/` (TensorFlow Keras): Differential Privacy with Stochastic Gradient Descent and JL Projections, from the paper Bu et al.
  - `/DP-SGD-FL/` (PyTorch): Differential Privacy with Stochastic Gradient Descent and Federated Learning, referencing the paper Wei et al.
  - `/PATE/` (PyTorch): Private Aggregation of Teacher Ensembles (PATE) algorithm, from the paper Uniyal et al.
- Python Scripts:
  - `load_dataset_into_pickle.py`: reads a directory of images, transforms the data into NumPy arrays, applies data segmentation, and saves the result into gzip pickle files.
  - `visualize_dataset.py`: reads a directory of images to create a scatter plot of image size and label distribution.
  - `metrics_calc_helper_functions.py`: helper functions to calculate metrics for comparison and to dump the data into text files.
  - `runtime_and_memory_graphs.py`: generates graphs to compare the memory and runtime of all techniques for the chest and knee datasets.
- Metrics:
  - `/metrics/`: stores text files of the metrics from our techniques for both the chest and knee datasets
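
For readers new to the DP-SGD technique listed under Techniques, the core update described by Abadi et al. can be sketched roughly as follows: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, then average and take a gradient step. This is a minimal standard-library illustration, not the code in `/DP-SGD/`; the function names `clip_gradient` and `dp_sgd_step` and the parameters `lr`, `clip_norm`, and `noise_multiplier` are hypothetical names chosen for this sketch.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update (hypothetical helper, not the repo's API).

    per_example_grads holds one gradient vector per example in the batch.
    """
    batch_size = len(per_example_grads)
    # 1. Clip every per-example gradient independently.
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients coordinate by coordinate.
    summed = [sum(column) for column in zip(*clipped)]
    # 3. Add Gaussian noise with standard deviation noise_multiplier * clip_norm,
    #    then divide by the batch size to get a noisy mean gradient.
    sigma = noise_multiplier * clip_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    # 4. Plain gradient-descent step using the noisy mean gradient.
    return [p - lr * g for p, g in zip(params, noisy_mean)]


# Usage with made-up gradients; the first one exceeds the clipping norm.
rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]
new_params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.1, rng=rng)
print(new_params)
```

Clipping bounds how much any single example can move the model, and the noise scale is tied to that bound; the privacy cost reported for a full training run additionally depends on the sampling rate and the number of steps, which this sketch does not track.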