- MobileNetV1 [2017] (built on depthwise-separable convolutions; a sketch follows this list of architectures)
- MobileNetV2 [2018]
- MobileNetV3 [2019]
- SqueezeNet [2016]
- SqueezeNext [2018]
- Tiny Darknet [?]
- CondenseNet [2017]
- NASNet [2017]
- ShuffleNet [2017]
- FD-MobileNet [2018]
- ProxylessNAS [2019]
- MnasNet [2018]
- ESPNetv2 [2018]
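
The MobileNet family above is built around the depthwise-separable convolution: a per-channel 3x3 spatial filter followed by a 1x1 pointwise mix, cutting a standard 3x3 conv's cost from roughly 9·Cin·Cout to 9·Cin + Cin·Cout multiply-accumulates per pixel. A minimal PyTorch sketch (class and variable names are illustrative, not from any paper above):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style block: per-channel 3x3 conv, then a 1x1 pointwise mix."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: groups=in_ch gives one 3x3 filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise: the 1x1 conv mixes channels; most parameters live here.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

block = DepthwiseSeparableConv(32, 64)
y = block(torch.randn(1, 32, 56, 56))  # -> (1, 64, 56, 56)
```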
- Learning both Weights and Connections for Efficient Neural Network (magnitude pruning; a sketch follows this group of papers)
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
- 8-Bit Approximations for Parallelism in Deep Learning
- Neural Networks with Few Multiplications
- Hardware-oriented Approximation of Convolutional Neural Networks
- Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets
- Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Deep Learning with Limited Numerical Precision
- Dynamic Network Surgery for Efficient DNNs
- Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks
- Variational Dropout Sparsifies Deep Neural Networks (https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn)
- Soft Weight-Sharing for Neural Network Compression
- LCNN: Lookup-based Convolutional Neural Network
- Bayesian Compression for Deep Learning
- ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression
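
The pruning papers above (Han et al.'s weight pruning, Dynamic Network Surgery, ThiNet's filter pruning) share one primitive: rank weights by a saliency measure and zero out the weakest. A minimal NumPy sketch of plain magnitude pruning, the criterion from "Learning both Weights and Connections"; the retraining the paper interleaves is omitted, and `magnitude_prune` is an illustrative name:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) fraction."""
    flat = np.abs(weights).ravel()
    # Threshold chosen so that `sparsity` fraction of weights fall below it.
    threshold = np.quantile(flat, sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask  # reuse mask during retraining to keep zeros fixed

w = np.random.randn(64, 128).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(1.0 - mask.mean())  # ~0.9 of the weights are now zero
```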
- How to Quantize Neural Networks with TensorFlow (https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/)
- pytorch-playground quantization examples (https://github.com/aaron-xichen/pytorch-playground#quantization)
- Quantizing deep convolutional networks for efficient inference: A whitepaper (the affine scheme it describes is sketched right after these links)
- Distiller quantization documentation (https://nervanasystems.github.io/distiller/quantization/)
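
The quantization whitepaper above describes the standard affine (asymmetric) scheme: floats map to 8-bit integers through a scale and a zero point. A minimal NumPy sketch of quantize/dequantize for one tensor (the function name and the per-tensor min/max calibration are illustrative simplifications):

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Map a float tensor to uint8 with a scale/zero-point, then map it back."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    x_hat = (q.astype(np.float32) - zero_point) * scale  # dequantized values
    return q, x_hat

x = np.random.randn(1000).astype(np.float32)
q, x_hat = quantize_affine(x)
print(np.abs(x - x_hat).max())  # worst-case rounding error, about scale / 2
```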
- XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks (its weight-binarization step is sketched after this group)
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
- BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet
- Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy
- Trained Ternary Quantization [2017]
- Training wide residual networks for deployment using a single bit for each weight [2018]
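
XNOR-Net binarizes each weight filter as W ≈ α·sign(W), with α the mean absolute value of the filter, so convolutions reduce to sign operations plus one scaling. A sketch of just the weight-binarization step (activations, which XNOR-Net also binarizes, are left out):

```python
import numpy as np

def binarize_weights(w):
    """XNOR-Net style: approximate W by alpha * sign(W), alpha = mean(|W|)."""
    alpha = np.abs(w).mean()          # per-filter scaling factor
    b = np.where(w >= 0, 1.0, -1.0)   # binary weights in {-1, +1}
    return alpha, b

w = np.random.randn(3, 3, 64).astype(np.float32)  # one 3x3x64 filter
alpha, b = binarize_weights(w)
print(np.abs(w - alpha * b).mean())  # approximation error of the binary filter
```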
- Flattened Convolutional Neural Networks for Feedforward Acceleration [2015]
- Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition [2015] (a low-rank factorization sketch follows this group)
- Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [2016]
- Pruning Convolutional Neural Networks for Resource Efficient Inference [2017]
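
The CP and Tucker papers above compress layers by low-rank tensor decomposition. The same idea in its simplest matrix form is a truncated SVD of a fully-connected layer, sketched below; SVD stands in here for the tensor decompositions the papers actually use:

```python
import numpy as np

def low_rank_factor(w, rank):
    """Replace an (m, n) weight matrix by two rank-r factors:
    w ~ a @ b with a (m, r) and b (r, n), costing r*(m+n) instead of m*n."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]        # absorb singular values into one factor
    b = vt[:rank, :]
    return a, b

w = np.random.randn(512, 1024).astype(np.float32)
a, b = low_rank_factor(w, rank=64)
print(np.linalg.norm(w - a @ b) / np.linalg.norm(w))  # relative reconstruction error
print(a.size + b.size, "params vs", w.size)           # ~5x fewer parameters
```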
- Pruning in TensorFlow
- PocketFlow, Tencent's automatic model-compression framework (https://github.com/Tencent/PocketFlow)
- QNNPACK, Facebook's quantized kernels for mobile inference (https://code.fb.com/ml-applications/qnnpack/)
- NVIDIA TensorRT inference optimizer (https://developer.nvidia.com/tensorrt)
- Distiller, Intel's PyTorch compression library (https://github.com/NervanaSystems/distiller); a PyTorch quantization one-liner follows these links
- TensorFlow's quantization-aware training rewriter (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize)
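
As a concrete counterpart to the toolkits above, PyTorch (from 1.3 on, so newer than several of these links) ships post-training dynamic quantization as a one-liner; the toy model here is illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
# Linear weights are stored as int8; activations are quantized on the fly.
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel)
```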
- Data-Free Knowledge Distillation for Deep Neural Networks (https://arxiv.org/pdf/1710.07535.pdf) [2017] (the underlying distillation loss is sketched below)
- Stealing Machine Learning Models via Prediction APIs (https://arxiv.org/pdf/1609.02943.pdf) [2016]
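
Data-free or not, distillation entries like the one above build on Hinton-style knowledge distillation: the student is trained to match the teacher's temperature-softened outputs. A minimal PyTorch sketch of the usual loss (`T` and `alpha` are illustrative hyperparameters):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target KL term (scaled by T^2) plus ordinary cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 10)              # student logits
t = torch.randn(8, 10)              # teacher logits (frozen in practice)
y = torch.randint(0, 10, (8,))      # ground-truth labels
print(distillation_loss(s, t, y))
```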
- MobiFace [2018]
- BlazeFace [2019]
- https://machinethink.net/blog/mobile-architectures/
To look at:
https://github.com/csyhhu/Awesome-Deep-Neural-Network-Compression
https://github.com/wpf535236337/real-time-network
https://github.com/dkozlov/awesome-knowledge-distillation
https://github.com/memoiry/Awesome-model-compression-and-acceleration
https://github.com/ljk628/ML-Systems/blob/master/dl_cnn.md
https://github.com/songhan/SqueezeNet-Deep-Compression
https://github.com/jiaxiang-wu/quantized-cnn
https://github.com/andyhahaha/Convolutional-Neural-Network-Compression-Survey
https://github.com/Zhouaojun/Efficient-Deep-Learning
https://github.com/ZFTurbo/Keras-inference-time-optimizer
https://github.com/becauseofAI/MobileFace
https://github.com/MingSun-Tse/EfficientDNNs