- Overview
- Performance
- Examples
- Dependencies
- Repository Content
- Download the convolutionalized MobileNet-V1 weights
- Download the original trained model weights
- Terminology
This is a Keras port of the MobileNet SSD model architecture introduced by Wei Liu et al. in the paper SSD: Single Shot MultiBox Detector.
The weights are ported from the Caffe implementation of MobileNet SSD. A model trained from scratch with this implementation reaches the same mAP as the ported weights, which indicates that the implementation is correct.
The repository currently provides the following network architectures:
- SSD300_mobilenet:
ssd_mobilenet.py
MobileNet-V1 is used as the backbone for feature extraction. This network trains faster and yields a higher frame rate at deployment.
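For orientation, here is a minimal sketch of how the network might be instantiated from ssd_mobilenet.py. The builder name `ssd_300_mobilenet` and its arguments are assumptions for illustration only; check ssd_mobilenet.py for the actual interface.

```python
# Hypothetical usage sketch -- the builder name and arguments are assumptions,
# see ssd_mobilenet.py for the actual signature used in this repository.
from keras import backend as K
from ssd_mobilenet import ssd_300_mobilenet  # assumed builder function

K.clear_session()

# Build an SSD300 with a MobileNet-V1 backbone for the 20 Pascal VOC classes.
model = ssd_300_mobilenet(image_size=(300, 300, 3),
                          n_classes=20,
                          mode='inference')
model.summary()
```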
Here are the mAP evaluation results of the ported weights and, below that, the evaluation results of a model trained from scratch using this implementation. All models were evaluated using the official Pascal VOC test server (for the 2007 test set). In all cases the results match those of the original Caffe models. Download links to all ported weights are available further below.
Mean Average Precision, evaluated on Pascal VOC2007 test (IoU rule: 0.5):

| | trained on 07+12 | trained on 07+12+COCO |
|:---|:---:|:---:|
| MobileNet-SSD300 | 68.5 | 72.7 |
Training an SSD300 from scratch on MS-COCO and then fine-tuning it on Pascal VOC 2007 trainval and 2012 trainval produces the same mAP on Pascal VOC 2007 test as the original Caffe MobileNet-SSD300 "07+12+COCO" model.
Mean Average Precision:

| | Original Caffe Model | Ported Weights | Trained from Scratch |
|:---|:---:|:---:|:---:|
| MobileNet-SSD300 "07+12" | 72.5 | 72.7 | 72.2 |
The models achieve the following average number of frames per second (FPS) on Pascal VOC, measured on an NVIDIA GeForce GTX 1080 Ti with cuDNN v6 and on an NVIDIA Jetson TX1. The batch size is kept at 1 so that the measured prediction time is meaningful for per-frame deployment (see the timing sketch below the table).
Frames per Second (batch size 1):

| | Nvidia 1080 Ti | Nvidia Jetson TX1 |
|:---|:---:|:---:|
| MobileNet-SSD300 | 170 | 36 |
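A minimal sketch of how such a single-image timing can be reproduced with plain Keras; `model` is assumed to be a built MobileNet-SSD300 with loaded weights (see the sketches above and below).

```python
import time
import numpy as np

# Time repeated single-image forward passes; batch size 1 mirrors the
# per-frame latency that matters for deployment.
dummy_batch = np.random.rand(1, 300, 300, 3).astype(np.float32)

model.predict(dummy_batch)  # warm-up pass (graph / cuDNN initialization)

n_runs = 100
start = time.time()
for _ in range(n_runs):
    model.predict(dummy_batch)
elapsed = time.time() - start
print('Average FPS: {:.1f}'.format(n_runs / elapsed))
```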
Below are some prediction examples of the fully trained original MobileNet-SSD300 "07+12" model (i.e. trained on Pascal VOC2007 trainval and VOC2012 trainval). The predictions were made on Pascal VOC2007 test.
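A rough sketch of generating such a prediction on a single image; the image path is a placeholder, and how the raw output is decoded into boxes depends on this repository's own decoding utilities.

```python
import numpy as np
from keras.preprocessing import image

# Load and preprocess a single test image (the path is a placeholder).
img = image.load_img('path/to/voc2007_test_image.jpg', target_size=(300, 300))
input_batch = np.expand_dims(image.img_to_array(img), axis=0)

# Raw network output; how these values are decoded into
# (class, confidence, xmin, ymin, xmax, ymax) tuples depends on the
# decoding utilities provided in this repository.
y_pred = model.predict(input_batch)
print(y_pred.shape)
```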
- Python 2.x or 3.x
- Numpy
- TensorFlow 1.x
- Keras 2.x
- OpenCV
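A quick way to check that the expected major versions are installed:

```python
import numpy as np
import tensorflow as tf
import keras
import cv2

# Sanity check of the installed dependency versions.
print('NumPy:', np.__version__)
print('TensorFlow:', tf.__version__)  # expected 1.x
print('Keras:', keras.__version__)    # expected 2.x
print('OpenCV:', cv2.__version__)
```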
This repository provides Python files that explain training, inference and evaluation:
- How to use a trained model for inference
- How to train a model
- How to evaluate a trained model
- How to use the data generator
The general training setup is laid out and explained in train_mobilenet_ssd.
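For orientation, here is a condensed sketch of what such a training setup typically looks like in Keras. The objects `ssd_loss`, `train_generator` and `val_generator` are assumed to come from this repository's own code, and all hyperparameter values are illustrative placeholders; train_mobilenet_ssd is the authoritative reference.

```python
from keras.optimizers import SGD

# Sketch only: `model`, `ssd_loss`, `train_generator` and `val_generator`
# are assumed to be provided by this repository's own code.
sgd = SGD(lr=0.001, momentum=0.9, decay=0.0, nesterov=False)
model.compile(optimizer=sgd, loss=ssd_loss.compute_loss)

# Illustrative values only -- adjust to your dataset and schedule.
model.fit_generator(generator=train_generator,
                    steps_per_epoch=1000,
                    epochs=120,
                    validation_data=val_generator,
                    validation_steps=100)
```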
To train the original MobileNet-SSD300 model on Pascal VOC:
- Download the datasets (an extraction sketch follows this list):
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
- Download the weights for the convolutionalized MobileNet-V1 or for one of the trained original models provided below.
- Set the file paths for the datasets and model weights accordingly in train_mobilenet_ssd and execute it.
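After downloading, the archives need to be unpacked. A small sketch using the Python standard library; the target directory is a placeholder:

```python
import tarfile

# Unpack the three Pascal VOC archives downloaded above.
archives = ['VOCtrainval_11-May-2012.tar',
            'VOCtrainval_06-Nov-2007.tar',
            'VOCtest_06-Nov-2007.tar']

for archive in archives:
    with tarfile.open(archive) as tar:
        tar.extractall(path='datasets/')  # placeholder target directory
```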
In order to train a MobileNet-SSD300 from scratch, download the weights of the fully convolutionalized MobileNet-V1 model trained to convergence on ImageNet classification here:
As with all other weights files below, this is a direct port of the corresponding .caffemodel file that is provided in the repository of the original Caffe implementation.
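Once downloaded, the weights can be loaded into the Keras model in the usual way; the filename below is a placeholder for whichever weights file you downloaded, and `model` is the SSD300 built earlier.

```python
# Load the downloaded weights into the model.
# The filename is a placeholder -- use the actual .h5 file you downloaded.
weights_path = 'mobilenet_v1_convolutionalized_weights.h5'
model.load_weights(weights_path, by_name=True)
```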
Here are the ported weights for all the original trained models. The filenames correspond to their respective .caffemodel counterparts. The asterisks and footnotes refer to those in the README of the original Caffe implementation.
- PASCAL VOC models:
  - 07+12: MobileNet-SSD300
  - 07++12+COCO: MobileNet-SSD300
The following things are on the to-do list, ranked by priority. Contributions are welcome, but please read the contributing guidelines.
- Add model definitions and trained weights for SSDs based on other base networks such as MobileNet, InceptionResNetV2, or DenseNet.
- Add support for the Theano and CNTK backends. Requires porting the custom layers and the loss function from TensorFlow to the abstract Keras backend.