CPU: ncnn, ONNX Runtime, OpenVINO
GPU: TensorRT, PPLNN
- Ubuntu 18.04
- CUDA 11.3
- TensorRT 7.2.3.4
- Docker 20.10.8
- NVIDIA Tesla T4 GPU
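- Static graph export
- Batch size of 1
- Synchronization after every inference
- For the latency benchmark, we report the average latency over 100 images from each dataset.
- Warm-up: for classification tasks we warm up for 1010 iterations; for all other tasks we warm up for 10 iterations.
- The input resolution depends on the dataset used by each codebase. All codebases except MMEditing use real images as input.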
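The settings above can be sketched as a small timing loop. This is only an illustration, not the MMDeploy profiling code: `benchmark_latency`, `infer`, and `sync` are hypothetical names, the model call is a stand-in, and FPS is assumed here to be 1000 divided by the mean latency in milliseconds at batch size 1.

```python
import time
from statistics import mean


def benchmark_latency(infer, inputs, warmup=10, sync=None):
    """Return (mean latency in ms, FPS) for `infer` over `inputs`.

    `infer` is any callable that runs one inference; `sync` is an optional
    callable that blocks until the device has finished (hypothetical hooks).
    """
    sync = sync or (lambda: None)

    # Warm-up runs are executed but not timed.
    for i in range(warmup):
        infer(inputs[i % len(inputs)])
    sync()

    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        sync()  # synchronize after every inference so the timer covers it fully
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latency = mean(latencies_ms)
    return latency, 1000.0 / latency  # assumption: FPS = 1000 / latency at batch size 1


if __name__ == "__main__":
    fake_images = list(range(100))             # stand-in for 100 dataset images
    fake_model = lambda x: sum(range(10_000))  # stand-in for a backend call
    latency, fps = benchmark_latency(fake_model, fake_images, warmup=10)
    print(f"latency: {latency:.2f} ms, FPS: {fps:.2f}")
```

For a GPU backend, `sync` would be whatever call blocks until the device is idle; without it, asynchronous launches would make the measured latency too small.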
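Users can obtain their own speed results by following the guide on how to test latency. The results measured in our environment are listed below: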
MMCls

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet | ImageNet | 1x3x224x224 | 2.97 | 336.90 | 1.26 | 791.89 | 1.21 | 829.66 | 1.30 | 768.28 | 33.91 | 29.49 | 25.93 | 38.57 | $MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py |
| ResNeXt | ImageNet | 1x3x224x224 | 4.31 | 231.93 | 1.42 | 703.42 | 1.37 | 727.42 | 1.36 | 737.67 | 133.44 | 7.49 | 69.38 | 14.41 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| SE-ResNet | ImageNet | 1x3x224x224 | 3.41 | 293.64 | 1.66 | 600.73 | 1.51 | 662.90 | 1.91 | 524.07 | 107.84 | 9.27 | 80.85 | 12.37 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| ShuffleNetV2 | ImageNet | 1x3x224x224 | 1.37 | 727.94 | 1.19 | 841.36 | 1.13 | 883.47 | 4.69 | 213.33 | 9.55 | 104.71 | 10.66 | 93.81 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |
MMDet

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3 | COCO | 1x3x320x320 | 14.76 | 67.76 | 24.92 | 40.13 | 24.92 | 40.13 | 18.07 | 55.35 | $MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py |
| SSD-Lite | COCO | 1x3x320x320 | 8.84 | 113.12 | 9.21 | 108.56 | 8.04 | 124.38 | 19.72 | 50.71 | $MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py |
| RetinaNet | COCO | 1x3x800x1344 | 97.09 | 10.30 | 25.79 | 38.78 | 16.88 | 59.23 | 38.34 | 26.08 | $MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py |
| FCOS | COCO | 1x3x800x1344 | 84.06 | 11.90 | 23.15 | 43.20 | 17.68 | 56.57 | - | - | $MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py |
| FSAF | COCO | 1x3x800x1344 | 82.96 | 12.05 | 21.02 | 47.58 | 13.50 | 74.08 | 30.41 | 32.89 | $MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py |
| Faster-RCNN | COCO | 1x3x800x1344 | 88.08 | 11.35 | 26.52 | 37.70 | 19.14 | 52.23 | 65.40 | 15.29 | $MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py |
| Mask-RCNN | COCO | 1x3x800x1344 | 320.86 | 3.12 | 241.32 | 4.14 | - | - | 86.80 | 11.52 | $MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py |

| Model | Dataset | Input | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MobileNetv2-YOLOv3 | COCO | 1x3x320x320 | 48.57 | 20.59 | 66.55 | 15.03 | $MMDET_DIR/configs/yolo/yolov3_mobilenetv2_mstrain-416_300e_coco.py |
| SSD-Lite | COCO | 1x3x320x320 | 44.91 | 22.27 | 66.19 | 15.11 | $MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py |
MMEdit

| Model | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ESRGAN | 1x3x32x32 | 12.64 | 79.14 | 12.42 | 80.50 | 12.45 | 80.35 | 7.67 | 130.39 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| SRCNN | 1x3x32x32 | 0.70 | 1436.47 | 0.35 | 2836.62 | 0.26 | 3850.45 | 0.56 | 1775.11 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |
MMOCR

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet | ICDAR2015 | 1x3x640x640 | 10.70 | 93.43 | 5.62 | 177.78 | 5.00 | 199.85 | 34.84 | 28.70 | - | - | - | - | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| CRNN | IIIT5K | 1x1x32x32 | 1.93 | 518.28 | 1.40 | 713.88 | 1.36 | 736.79 | - | - | 10.57 | 94.64 | 20.00 | 50.00 | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
MMSeg

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | 1x3x512x1024 | 128.42 | 7.79 | 23.97 | 41.72 | 18.13 | 55.15 | 27.00 | 37.04 | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | Cityscapes | 1x3x512x1024 | 119.77 | 8.35 | 24.10 | 41.49 | 16.33 | 61.23 | 27.26 | 36.69 | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | Cityscapes | 1x3x512x1024 | 226.75 | 4.41 | 31.80 | 31.45 | 19.85 | 50.38 | 36.01 | 27.77 | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3+ | Cityscapes | 1x3x512x1024 | 151.25 | 6.61 | 47.03 | 21.26 | 50.38 | 19.85 | 34.80 | 28.74 | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py |
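Users can obtain their own performance results by following the guide on how to evaluate model performance. The results measured in our environment are listed below: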
MMCls

| Model | Task | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-18 | Classification | top-1 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | $MMCLS_DIR/configs/resnet/resnet18_b32x8_imagenet.py |
| | | top-5 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | |
| ResNeXt-50 | Classification | top-1 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | $MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py |
| | | top-5 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | |
| SE-ResNet-50 | Classification | top-1 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | $MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py |
| | | top-5 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | |
| ShuffleNetV1 1.0x | Classification | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | $MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py |
| | | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | |
| ShuffleNetV2 1.0x | Classification | top-1 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | $MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py |
| | | top-5 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | |
| MobileNet V2 | Classification | top-1 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | $MMCLS_DIR/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py |
| | | top-5 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | |
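For readers unfamiliar with the top-1 and top-5 metrics reported for MMCls, the following is a minimal sketch of top-k accuracy. The function name `topk_accuracy` and the toy scores are illustrative assumptions; this is not the MMClassification evaluation code.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Toy example: 4 samples, 3 classes.
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 0, 1, 2]
print(topk_accuracy(scores, labels, k=1))  # 50.0
print(topk_accuracy(scores, labels, k=2))  # 75.0
```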
MMDet

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOV3 | Object Detection | COCO2017 | box AP | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - | $MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py |
| SSD | Object Detection | COCO2017 | box AP | 25.5 | - | 25.5 | 25.5 | - | - | - | $MMDET_DIR/configs/ssd/ssd300_coco.py |
| RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | - | 36.4 | 36.4 | 36.3 | 36.5 | - | $MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py |
| FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | 36.6 | 36.5 | - | - | - | $MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py |
| FSAF | Object Detection | COCO2017 | box AP | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - | $MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py |
| YOLOX | Object Detection | COCO2017 | box AP | 40.5 | - | 40.3 | 40.3 | 29.3 | - | - | $MMDET_DIR/configs/yolox/yolox_s_8x8_300e_coco.py |
| Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | - | 37.3 | 37.3 | 37.1 | 37.3 | - | $MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py |
| ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | 39.4 | 39.4 | - | - | - | $MMDET_DIR/configs/atss/atss_r50_fpn_1x_coco.py |
| Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | 40.4 | 40.4 | - | 40.4 | - | $MMDET_DIR/configs/cascade_rcnn/cascade_rcnn_r50_caffe_fpn_1x_coco.py |
| Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | - | 38.1 | 38.1 | - | 38.0 | - | $MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py |
| | | | mask AP | 34.7 | - | 33.7 | 33.7 | - | - | - | |
MMEdit

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 | $MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py |
| | | | SSIM | 0.8099 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 | |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2592 | 28.2592 | - | - | 28.2624 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_x4c64b23g32_g1_400k_div2k.py |
| | | | SSIM | 0.7778 | 0.7764 | 0.7774 | - | - | 0.7765 | |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6444 | 30.6430 | - | - | 27.0426 | $MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py |
| | | | SSIM | 0.8559 | 0.8558 | 0.8558 | - | - | 0.8557 | |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9408 | 27.9408 | - | - | 27.9388 | $MMEDIT_DIR/configs/restorers/srresnet_srgan/srgan_x4c64b16_g1_1000k_div2k.py |
| | | | SSIM | 0.7846 | 0.7839 | 0.7839 | - | - | 0.7839 | |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2300 | 30.2300 | - | - | 30.2294 | $MMEDIT_DIR/configs/restorers/srresnet_srgan/msrresnet_x4c64b16_g1_1000k_div2k.py |
| | | | SSIM | 0.8491 | 0.8488 | 0.8488 | - | - | 0.8488 | |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | 27.7016 | 27.7016 | - | - | 27.7049 | $MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py |
| | | | SSIM | 0.8236 | 0.8122 | 0.8122 | - | - | 0.8123 | |
| EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - | $MMEDIT_DIR/configs/restorers/edsr/edsr_x4c64b16_g1_300k_div2k.py |
| | | | SSIM | 0.8500 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - | |
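As a reminder of what the PSNR values for MMEdit measure, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr` and the pixel values are illustrative assumptions. The actual evaluation pipeline may additionally crop borders or convert color channels, which this sketch does not do.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 8-pixel example with small reconstruction errors.
reference = [52, 55, 61, 59, 79, 61, 76, 61]
restored = [54, 55, 60, 59, 77, 62, 76, 60]
print(f"{psnr(reference, restored):.2f} dB")
```

Because PSNR is logarithmic in the error, the small differences between backends in the table correspond to very small changes in mean squared error.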
MMOCR

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 | $MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py |
| | | | precision | 0.8714 | 0.8718 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 | |
| | | | hmean | 0.7950 | 0.7949 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 | |
| CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - | $MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py |
| SAR | TextRecognition | IIIT5K | acc | 0.9517 | 0.9287 | - | - | - | - | - | $MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py |
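The hmean metric reported for DBNet is the harmonic mean of precision and recall. A minimal sketch, using the PyTorch fp32 precision and recall reported above (the function name `hmean` is only illustrative):

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# PyTorch fp32 values for DBNet: precision 0.8714, recall 0.7310.
print(f"{hmean(0.8714, 0.7310):.4f}")  # 0.7950
```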
MMSeg

| Model | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | model config file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FCN | Cityscapes | mIoU | 72.25 | - | 72.36 | 72.35 | 74.19 | - | $MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py |
| PSPNet | Cityscapes | mIoU | 78.55 | - | 78.26 | 78.24 | 77.97 | - | $MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py |
| DeepLabV3 | Cityscapes | mIoU | 79.09 | - | 79.12 | 79.12 | 78.96 | - | $MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py |
| DeepLabV3+ | Cityscapes | mIoU | 79.61 | - | 79.6 | 79.6 | 79.43 | - | $MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | - | 70.93 | 70.92 | 66.0 | - | $MMSEG_DIR/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py |
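The mIoU metric used for MMSeg averages the per-class intersection over union. The sketch below computes it from a confusion matrix; the function name `mean_iou` and the toy matrix are illustrative assumptions, not the MMSegmentation implementation.

```python
def mean_iou(confusion):
    """mIoU in percent from a square confusion matrix (rows: ground truth)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # class c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy 3-class confusion matrix (pixel counts).
confusion = [
    [50, 5, 0],
    [10, 30, 5],
    [0, 0, 20],
]
print(f"{mean_iou(confusion):.2f}")
```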
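- Some codebases, such as MMDet, use datasets that contain images of various resolutions. For this reason, the speed benchmark is obtained with the static configs in MMDeploy, while the performance benchmark is obtained with the dynamic configs.
- Some of the TensorRT int8 performance benchmarks require an NVIDIA card with Tensor Cores; without them, performance drops significantly.
- DBNet uses the nearest interpolation mode in the neck of the model, for which TensorRT-7 applies a strategy quite different from PyTorch. To make the model compatible with TensorRT-7, we rewrote the neck to use the bilinear interpolation mode, which improves the final detection performance. To obtain performance that matches PyTorch, we recommend TensorRT-8+, whose interpolation methods are the same as those of PyTorch.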