Single Stage Detector

Description

This model is a real-time neural network for object detection that detects 80 different classes.

Model

Model	Download	Download (with sample test data)	ONNX version	Opset version	Accuracy
SSD	80.4 MB	78.5 MB	1.5	10	mAP of 0.195

Inference

Input to model

Image shape (1x3x1200x1200)

Preprocessing steps

Images must be loaded and scaled to the range [0, 1], resized to (1200, 1200) with bilinear interpolation, and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The transformation should preferably happen at preprocessing.

The following code shows how to preprocess an image into an NCHW tensor:

import numpy as np
from PIL import Image

def preprocess(img_path):
    # Load as RGB so the array always has 3 channels (grayscale or RGBA files would break the transpose)
    img = Image.open(img_path).convert('RGB')
    img = img.resize((1200, 1200), Image.BILINEAR)
    # HWC uint8 array -> CHW, then add the batch dimension: (1, 3, 1200, 1200)
    img_data = np.array(img)
    img_data = np.transpose(img_data, [2, 0, 1])
    img_data = np.expand_dims(img_data, 0)
    mean_vec = np.array([0.485, 0.456, 0.406])
    stddev_vec = np.array([0.229, 0.224, 0.225])
    norm_img_data = np.zeros(img_data.shape).astype('float32')
    # Scale each channel to [0, 1], then normalize with its mean and std
    for i in range(img_data.shape[1]):
        norm_img_data[:,i,:,:] = (img_data[:,i,:,:]/255 - mean_vec[i]) / stddev_vec[i]
    return norm_img_data

Output of model

The model has 3 outputs: boxes (1x'nbox'x4), labels (1x'nbox'), and scores (1x'nbox').
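
The three outputs share the same 'nbox' axis, so they can be filtered together. The following is a minimal sketch of one way to drop the batch dimension and keep only confident detections. The function name `filter_detections`, the threshold of 0.5, and the synthetic arrays are assumptions for illustration and are not part of the model or its repository.

```python
import numpy as np

def filter_detections(boxes, labels, scores, score_threshold=0.5):
    """Drop the batch axis and keep detections scoring at or above the threshold.

    The threshold value is a hypothetical default, not one prescribed by the model.
    """
    boxes = np.asarray(boxes)[0]    # (nbox, 4)
    labels = np.asarray(labels)[0]  # (nbox,)
    scores = np.asarray(scores)[0]  # (nbox,)
    keep = scores >= score_threshold
    return boxes[keep], labels[keep], scores[keep]

# Synthetic stand-ins for the model outputs, with nbox = 3 (made-up values)
boxes = np.array([[[0.1, 0.1, 0.5, 0.5],
                   [0.2, 0.2, 0.6, 0.6],
                   [0.0, 0.0, 1.0, 1.0]]], dtype=np.float32)
labels = np.array([[1, 18, 3]], dtype=np.int64)
scores = np.array([[0.9, 0.3, 0.75]], dtype=np.float32)

kept_boxes, kept_labels, kept_scores = filter_detections(boxes, labels, scores)
print(kept_boxes.shape, kept_labels, kept_scores)
```

With real model outputs, the same call applies to the three arrays returned by the inference runtime, in the order boxes, labels, scores.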

Dataset (Train and validation)

The SSD model was trained on the COCO 2017 train data set using the mlperf/training/single_stage_detector repo, and mAP was computed on the COCO 2017 val data set.

Validation accuracy

The metric is COCO box mAP (averaged over IoU thresholds of 0.5:0.95), computed over the COCO 2017 val data. The model achieves an mAP of 0.195.
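
To make the IoU range concrete: COCO box mAP averages precision over ten IoU thresholds, from 0.5 to 0.95 in steps of 0.05. The sketch below computes the IoU of two boxes and counts how many of those thresholds a prediction would satisfy. It assumes boxes in (x1, y1, x2, y2) corner format, which is an assumption for illustration, and it is not the COCO evaluation code itself.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption made for this example.
    """
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # Width and height of the overlap, clamped to zero when the boxes are disjoint
    inter_w = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    inter_h = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = inter_w * inter_h
    area_a = (xa2 - xa1) * (ya2 - ya1)
    area_b = (xb2 - xb1) * (yb2 - yb1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds 0.50, 0.55, ..., 0.95
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Made-up ground-truth and predicted boxes
ground_truth = (0.0, 0.0, 3.0, 3.0)
prediction = (0.0, 0.0, 3.0, 2.0)

iou = box_iou(ground_truth, prediction)
passed = sum(iou >= t for t in iou_thresholds)
print(f"IoU = {iou:.3f}, counts as a match at {passed} of {len(iou_thresholds)} thresholds")
```

Here the intersection is 6 and the union is 9, so the IoU is 2/3 and the prediction counts as a match only at the four thresholds up to 0.65.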

Publication/Attribution

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg. SSD: Single Shot MultiBox Detector. In the Proceedings of the European Conference on Computer Vision (ECCV), 2016.

The backbone is ResNet34 pretrained on ILSVRC 2012 (from torchvision). Modifications to the backbone network: remove the conv_5x residual blocks, change the first 3x3 convolution of the conv_4x block from stride 2 to stride 1 (this increases the resolution of the feature map to which detector heads are attached), and attach all 6 detector heads to the output of the last conv_4x residual block. Thus detections are attached to 38x38, 19x19, 10x10, 5x5, 3x3, and 1x1 feature maps. Convolutions in the detector layers are followed by batch normalization layers.

References

This model is converted from the mlperf/inference repository with modifications in repository.

License

Apache License 2.0
