An ICNet model for fast semantic segmentation, trained from scratch on the CamVid* dataset using the TensorFlow* framework. The trained model has 30% sparsity (the ratio of zeros among all convolution kernel weights). For more details about the original floating-point model, see the paper.
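The sparsity figure above is a plain ratio, which the following minimal sketch illustrates with NumPy. The helper `kernel_sparsity` and the two small arrays are hypothetical stand-ins for real convolution kernels; they are not taken from the model or from any framework API.

```python
import numpy as np

def kernel_sparsity(kernels):
    """Return the fraction of exactly-zero weights across all given kernels."""
    total = sum(k.size for k in kernels)
    zeros = sum(int(np.count_nonzero(k == 0)) for k in kernels)
    return zeros / total

# Hypothetical kernels standing in for convolution weights (10 values, 3 zeros).
k1 = np.array([[0.0, 0.5], [0.0, -1.2]])
k2 = np.array([0.0, 0.3, 0.7, 0.1, 0.9, 0.4])

sparsity = kernel_sparsity([k1, k2])
print(f"sparsity = {sparsity:.0%}")
```

For a real model, the list would hold the weight tensors of every convolution layer, extracted with whatever tooling the framework provides.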
The model input is a blob holding a single image of shape "1x3x720x960" in BGR channel order. The pixel values are integers in the [0, 255] range.
The model output for icnet-camvid-ava-sparse-30-0001 is the predicted class index of each input pixel belonging to one of the 12 classes of the CamVid dataset.
Metric | Value |
---|---|
GFlops | 151.82 |
MParams | 25.45 |
Source framework | TensorFlow* |
The quality metrics were calculated on the CamVid* validation dataset. The 'unlabeled' class was ignored during metrics calculation.
Metric | Value |
---|---|
mIoU | 69.99% |
`IoU = TP / (TP + FN + FP)`, where:
- `TP` - number of true positive pixels for given class
- `FN` - number of false negative pixels for given class
- `FP` - number of false positive pixels for given class
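The per-class IoU and its mean can be sketched with NumPy through a confusion matrix. This is an illustrative sketch only, not the evaluation code used to produce the reported score: the function name `per_class_iou`, the toy label maps, and the choice of index 2 for the ignored class are all assumptions made for the example.

```python
import numpy as np

def per_class_iou(pred, target, num_classes, ignore_index=None):
    """Per-class IoU = TP / (TP + FN + FP); NaN for classes that are excluded."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    if ignore_index is not None:
        # Drop pixels whose ground-truth label is the ignored class.
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    cm = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    denom = tp + fn + fp
    iou = np.full(num_classes, np.nan)
    np.divide(tp, denom, out=iou, where=denom > 0)
    if ignore_index is not None and 0 <= ignore_index < num_classes:
        iou[ignore_index] = np.nan  # keep the ignored class out of the mean
    return iou

# Toy 2x3 label maps with 3 classes; class 2 plays the role of 'unlabeled' (assumption).
target = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 0, 2]]

iou = per_class_iou(pred, target, num_classes=3, ignore_index=2)
miou = np.nanmean(iou)  # mean over the classes that were evaluated
print(iou, miou)
```

In this toy case class 0 has TP=1, FN=1, FP=0 and class 1 has TP=2, FN=0, FP=1, so the mean is taken over 1/2 and 2/3.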
Image, shape - `1,3,720,960`, format is `B,C,H,W`, where:
- `B` - batch size
- `C` - channel
- `H` - height
- `W` - width

Channel order is `BGR`.
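As a rough illustration of the layout described above, the sketch below rearranges an image stored as height x width x channel (the layout most image decoders return) into a `B,C,H,W` blob. It assumes the image is already 720x960 and already in BGR order; the helper `to_input_blob` is hypothetical, the random array only stands in for a decoded image, and keeping the `uint8` data type is an assumption about what the inference runtime accepts.

```python
import numpy as np

def to_input_blob(image_bgr):
    """Convert an H,W,C BGR image of size 720x960 into a 1,3,720,960 blob."""
    if image_bgr.shape != (720, 960, 3):
        raise ValueError(f"expected shape (720, 960, 3), got {image_bgr.shape}")
    chw = image_bgr.transpose(2, 0, 1)  # H,W,C -> C,H,W
    # Add the batch dimension and make the memory layout contiguous.
    return np.ascontiguousarray(chw[np.newaxis, ...])

# Stand-in for a decoded BGR image with integer pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(720, 960, 3), dtype=np.uint8)

blob = to_input_blob(image)
print(blob.shape)
```

An image of a different size would need to be resized to 960x720 (width x height) first, and an RGB image would need its channels reversed before this step.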
Semantic segmentation class prediction map, shape - `1,720,960`, output data format is `B,H,W`, where:
- `B` - batch size
- `H` - vertical coordinate of the input pixel
- `W` - horizontal coordinate of the input pixel

The map contains the class prediction result for each pixel.
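One way to consume a prediction map of this shape is sketched below: drop the batch dimension, count pixels per class, and map class indices to colors for display. The `output` array is synthetic, the helper names are hypothetical, and the palette is random rather than the official CamVid color scheme.

```python
import numpy as np

NUM_CLASSES = 12  # number of CamVid classes stated for this model

def class_pixel_counts(class_map):
    """Count how many pixels were assigned to each class index."""
    return np.bincount(class_map.ravel(), minlength=NUM_CLASSES)

def colorize(class_map, palette):
    """Map each class index to a color; palette has shape (NUM_CLASSES, 3)."""
    return palette[class_map]  # fancy indexing: (H, W) -> (H, W, 3)

# Synthetic stand-in for a model output of shape 1,720,960 (B,H,W).
output = np.zeros((1, 720, 960), dtype=np.int64)
output[0, :360, :] = 3  # top half of the image (first 360 rows) is class 3

class_map = output[0]  # remove the batch dimension -> (720, 960)
counts = class_pixel_counts(class_map)

# Random illustrative palette, not the official CamVid colors.
palette = np.random.default_rng(0).integers(0, 256, size=(NUM_CLASSES, 3), dtype=np.uint8)
color = colorize(class_map, palette)
print(counts, color.shape)
```

Indexing `class_map[y, x]` with a row `y` and a column `x` returns the class predicted for the input pixel at that position.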
[*] Other names and brands may be claimed as the property of others.