- This code is the official implementation of the paper "Mitigating Carbon Footprint for Knowledge Distillation Based Deep Learning Model Compression", published in PLOS ONE on May 15, 2023. DOI: https://doi.org/10.1371/journal.pone.0285668
- Congratulations and great work to Kazi Rafat, Sadia Islam, Abdullah Al Mahfug and Ismail Hossain.
- Special thanks to the great supervisors Dr. Shafin Rahman and Dr. Nabeel Mohammed.
Knowledge distillation (KD) generates lighter models that typically perform with slightly lower accuracy than the heavier teacher model. Although the distillation process makes models deployable on low-resource devices, it was found to consume an exorbitant amount of energy and leave a substantial carbon footprint (15 times more carbon compared to the corresponding teacher model). This enormous environmental cost is primarily attributable to tuning the temperature hyperparameter (τ). In this article, we propose measuring the environmental costs of deep learning work (in terms of GFLOPS in millions, energy consumption in kWh, and CO2 equivalent in grams). To create lightweight models with low environmental costs, we propose a straightforward yet effective method that selects the hyperparameter τ stochastically for each training batch fed into the models.
Figure 1: Illustration of the carbon footprint of different deep models while (a) training on CIFAR 100 (in log scale) and (b-c) inferring on the evaluation set. ResNet18 is a deeper model with 11.2M parameters, resulting in higher inference time (4.7 sec.) and CO2 emission (0.087 g). To minimize this, using ResNet18 as a teacher, we train two student models, MobileNetV2 (student 1) and ShuffleNetV2 (student 2), following the traditional KD process. This training incurs a significant carbon footprint (red and green dashed curves in (a)) while gaining accuracy by learning from the teacher model (black dotted curve in (a)). However, as expected, both students consume less time and CO2 during inference (red and green shaded bars in (b) and (c)). We aim to reduce the training cost and CO2 production of the KD process while using the same students (red and green solid curves in (a)) and maintaining similar accuracy and inference costs (solid red and green bars in (b) and (c)) compared with the costly KD training.
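For intuition, here is a minimal PyTorch sketch of the idea described above: Hinton-style KD in which the temperature τ is drawn at random for every training batch instead of being tuned via repeated, carbon-expensive runs. This is not the code from STKD.py; the τ range, the loss weight α, and the toy networks/data are illustrative placeholders, not values from the paper.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, tau, alpha=0.9):
    # Hinton-style KD loss: softened KL term (scaled by tau^2) plus hard cross-entropy.
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

# Toy stand-ins for the teacher/student networks and the data loader.
teacher = nn.Linear(32, 10)
student = nn.Linear(32, 10)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)
loader = [(torch.randn(8, 32), torch.randint(0, 10, (8,))) for _ in range(5)]

tau_range = (1.0, 10.0)  # illustrative range, not taken from the paper
for inputs, targets in loader:
    tau = random.uniform(*tau_range)      # sample tau per batch instead of grid-searching it
    with torch.no_grad():
        teacher_logits = teacher(inputs)  # teacher runs in inference mode
    loss = kd_loss(student(inputs), teacher_logits, targets, tau)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```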
- python>=3.7
- pytorch==1.12.1, torchvision==0.13.1, torchaudio==0.12.1, cudatoolkit=11.3
- carbontracker
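The carbontracker package listed above is the tool for obtaining energy (kWh) and CO2-equivalent estimates during training. Below is a minimal sketch of its standard epoch-level API; the epoch count and the training placeholder are illustrative, not the repository's actual loop.

```python
from carbontracker.tracker import CarbonTracker

max_epochs = 3  # illustrative value only
tracker = CarbonTracker(epochs=max_epochs)

for epoch in range(max_epochs):
    tracker.epoch_start()
    # ... one epoch of base / KD / stochastic-temperature training goes here ...
    tracker.epoch_end()  # logs measured energy (kWh) and estimated CO2eq

tracker.stop()  # make sure the monitoring thread is shut down
```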
- CIFAR 10:
Paper link: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
Website: https://www.cs.toronto.edu/~kriz/cifar.html
File: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
- CIFAR 100:
Paper link: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
Website: https://www.cs.toronto.edu/~kriz/cifar.html
File: https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
- Tiny ImageNet:
Paper link: http://cs231n.stanford.edu/reports/2015/pdfs/yle_project.pdf
File: http://cs231n.stanford.edu/tiny-imagenet-200.zip
- Pascal VOC 2012:
Paper link: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/devkit_doc.pdf
Website: http://host.robots.ox.ac.uk/pascal/VOC/
File: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
- Pascal VOC 2007:
Paper link: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/devkit_doc_07-Jun-2007.pdf
Website: http://host.robots.ox.ac.uk/pascal/VOC/
Files: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar and http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
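As a convenience, the CIFAR archives linked above can also be downloaded automatically through torchvision (the version pinned in the requirements). This is only a sketch with assumed paths and transforms; Tiny ImageNet and Pascal VOC still come from the archives linked above.

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.ToTensor()])

# download=True fetches the same cifar-10/100-python archives linked above.
cifar10_train = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
cifar100_train = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transform
)
```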
- STKD.py: performs KD training, base training, regularization training, and stochastic training (Tiny ImageNet custom dataset added).
- Experiments: consists of JSON files for carrying out experiments.
- model: consists of the model architectures that can be used.
- Change the dataset under "Dataset Assignment" to use a different dataset. Currently available for CIFAR 10, CIFAR 100, and Tiny ImageNet; new custom datasets can be added for new data.
- Change the KD train model directory and choose teacher-student combinations according to your requirements, and comment out the base training.
- To run base training, comment out the KD training and choose the appropriate model to train.
- Change the argument passing for different teacher-free training.
- GitHub repository for Data-Free Knowledge Distillation: https://github.com/zju-vipa/CMI
- GitHub repository for Object Detection Knowledge Distillation: https://github.com/SsisyphusTao/Object-Detection-Knowledge-Distillation/tree/mbv2-lite
Sadia Islam: [email protected]
Abdullah Al Mahfug: [email protected]
Md. Ismail Hossain: [email protected]
Dr. Nabeel Mohammed: [email protected]
If you use this code, the models, or the dataset splits in your research, please consider citing:
@article{rafat2023mitigating,
title={Mitigating carbon footprint for knowledge distillation based deep learning model compression},
author={Rafat, Kazi and Islam, Sadia and Mahfug, Abdullah Al and Hossain, Md Ismail and Rahman, Fuad and Momen, Sifat and Rahman, Shafin and Mohammed, Nabeel},
journal={Plos one},
volume={18},
number={5},
pages={e0285668},
year={2023},
publisher={Public Library of Science San Francisco, CA USA}
}