Benchmark

Backends

CPU: ncnn, ONNXRuntime, OpenVINO

GPU: TensorRT, PPLNN

Latency benchmark

Platform

  • Ubuntu 18.04
  • CUDA 11.3
  • TensorRT 7.2.3.4
  • Docker 20.10.8
  • NVIDIA Tesla T4 GPU

Other settings

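  • Static graph export
  • Batch size of 1
  • Synchronization after every inference
  • For the latency benchmark, we report the average latency over 100 images from each dataset.
  • Warm-up: for classification we warm up for 1010 iterations; for all other tasks we warm up for 10 iterations.
  • Input resolution varies with the dataset of each codebase. All codebases except MMEditing use real images as input.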
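The settings above amount to a simple timing loop. The following is a minimal sketch of such a loop, not MMDeploy's own profiling code: `measure_latency`, `infer`, and `sync` are hypothetical names, where `infer` stands in for a backend's forward pass on one batch-size-1 input and `sync` for a device synchronization call (a no-op on CPU).

```python
import time
from statistics import mean


def measure_latency(infer, inputs, warmup=10, num_timed=100, sync=lambda: None):
    """Return (mean latency in ms, FPS) for batch-size-1 inference.

    `infer` and `sync` are hypothetical callables supplied by the caller;
    they are assumptions of this sketch, not part of any library API.
    """
    if len(inputs) == 0:
        raise ValueError("inputs must not be empty")

    # Warm-up runs are executed but not timed.
    for i in range(warmup):
        infer(inputs[i % len(inputs)])
        sync()

    timings_ms = []
    for i in range(num_timed):
        sample = inputs[i % len(inputs)]
        start = time.perf_counter()
        infer(sample)
        sync()  # wait for the device to finish before stopping the clock
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    latency_ms = mean(timings_ms)
    # FPS is the reciprocal of the mean per-image latency.
    fps = 1000.0 / latency_ms if latency_ms > 0 else float("inf")
    return latency_ms, fps
```

Under the settings listed above, a classification run would correspond to `warmup=1010` and other tasks to `warmup=10`, with `num_timed=100` in both cases.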

Users can obtain their own speed results by following the "How to test latency" guide. The results below were measured in our environment:

MMCls

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet | ImageNet | 1x3x224x224 | 2.97 | 336.90 | 1.26 | 791.89 | 1.21 | 829.66 | 1.30 | 768.28 | 33.91 | 29.49 | 25.93 | 38.57 | `$MMCLS_DIR/configs/resnet/resnet50_b32x8_imagenet.py` |
| ResNeXt | ImageNet | 1x3x224x224 | 4.31 | 231.93 | 1.42 | 703.42 | 1.37 | 727.42 | 1.36 | 737.67 | 133.44 | 7.49 | 69.38 | 14.41 | `$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py` |
| SE-ResNet | ImageNet | 1x3x224x224 | 3.41 | 293.64 | 1.66 | 600.73 | 1.51 | 662.90 | 1.91 | 524.07 | 107.84 | 9.27 | 80.85 | 12.37 | `$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py` |
| ShuffleNetV2 | ImageNet | 1x3x224x224 | 1.37 | 727.94 | 1.19 | 841.36 | 1.13 | 883.47 | 4.69 | 213.33 | 9.55 | 104.71 | 10.66 | 93.81 | `$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py` |

MMDet

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv3 | COCO | 1x3x320x320 | 14.76 | 67.76 | 24.92 | 40.13 | 24.92 | 40.13 | 18.07 | 55.35 | `$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py` |
| SSD-Lite | COCO | 1x3x320x320 | 8.84 | 113.12 | 9.21 | 108.56 | 8.04 | 124.38 | 19.72 | 50.71 | `$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py` |
| RetinaNet | COCO | 1x3x800x1344 | 97.09 | 10.30 | 25.79 | 38.78 | 16.88 | 59.23 | 38.34 | 26.08 | `$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py` |
| FCOS | COCO | 1x3x800x1344 | 84.06 | 11.90 | 23.15 | 43.20 | 17.68 | 56.57 | - | - | `$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py` |
| FSAF | COCO | 1x3x800x1344 | 82.96 | 12.05 | 21.02 | 47.58 | 13.50 | 74.08 | 30.41 | 32.89 | `$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py` |
| Faster-RCNN | COCO | 1x3x800x1344 | 88.08 | 11.35 | 26.52 | 37.70 | 19.14 | 52.23 | 65.40 | 15.29 | `$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py` |
| Mask-RCNN | COCO | 1x3x800x1344 | 320.86 | 3.12 | 241.32 | 4.14 | - | - | 86.80 | 11.52 | `$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py` |


| Model | Dataset | Input | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | Model config file |
|---|---|---|---|---|---|---|---|
| MobileNetv2-YOLOv3 | COCO | 1x3x320x320 | 48.57 | 20.59 | 66.55 | 15.03 | `$MMDET_DIR/configs/yolo/yolov3_mobilenetv2_mstrain-416_300e_coco.py` |
| SSD-Lite | COCO | 1x3x320x320 | 44.91 | 22.27 | 66.19 | 15.11 | `$MMDET_DIR/configs/ssd/ssdlite_mobilenetv2_scratch_600e_coco.py` |

MMEdit

| Model | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|
| ESRGAN | 1x3x32x32 | 12.64 | 79.14 | 12.42 | 80.50 | 12.45 | 80.35 | 7.67 | 130.39 | `$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py` |
| SRCNN | 1x3x32x32 | 0.70 | 1436.47 | 0.35 | 2836.62 | 0.26 | 3850.45 | 0.56 | 1775.11 | `$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py` |

MMOCR
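
| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | NCNN SnapDragon888-fp32 latency (ms) | NCNN SnapDragon888-fp32 FPS | NCNN Adreno660-fp32 latency (ms) | NCNN Adreno660-fp32 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DBNet | ICDAR2015 | 1x3x640x640 | 10.70 | 93.43 | 5.62 | 177.78 | 5.00 | 199.85 | 34.84 | 28.70 | - | - | - | - | `$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py` |
| CRNN | IIIT5K | 1x1x32x32 | 1.93 | 518.28 | 1.40 | 713.88 | 1.36 | 736.79 | - | - | 10.57 | 94.64 | 20.00 | 50.00 | `$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py` |
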
MMSeg

| Model | Dataset | Input | TensorRT fp32 latency (ms) | TensorRT fp32 FPS | TensorRT fp16 latency (ms) | TensorRT fp16 FPS | TensorRT int8 latency (ms) | TensorRT int8 FPS | PPLNN fp16 latency (ms) | PPLNN fp16 FPS | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| FCN | Cityscapes | 1x3x512x1024 | 128.42 | 7.79 | 23.97 | 41.72 | 18.13 | 55.15 | 27.00 | 37.04 | `$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py` |
| PSPNet | Cityscapes | 1x3x512x1024 | 119.77 | 8.35 | 24.10 | 41.49 | 16.33 | 61.23 | 27.26 | 36.69 | `$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py` |
| DeepLabV3 | Cityscapes | 1x3x512x1024 | 226.75 | 4.41 | 31.80 | 31.45 | 19.85 | 50.38 | 36.01 | 27.77 | `$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py` |
| DeepLabV3+ | Cityscapes | 1x3x512x1024 | 151.25 | 6.61 | 47.03 | 21.26 | 50.38 | 26.67 | 34.80 | 28.74 | `$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py` |

Performance benchmark

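Users can obtain their own accuracy results by following the "How to test performance" guide. The results below were measured in our environment: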

MMCls

| Model | Task | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Model config file |
|---|---|---|---|---|---|---|---|---|---|
| ResNet-18 | Classification | top-1 | 69.90 | 69.88 | 69.88 | 69.86 | 69.86 | 69.86 | `$MMCLS_DIR/configs/resnet/resnet18_b32x8_imagenet.py` |
| | | top-5 | 89.43 | 89.34 | 89.34 | 89.33 | 89.38 | 89.34 | |
| ResNeXt-50 | Classification | top-1 | 77.90 | 77.90 | 77.90 | - | 77.78 | 77.89 | `$MMCLS_DIR/configs/resnext/resnext50_32x4d_b32x8_imagenet.py` |
| | | top-5 | 93.66 | 93.66 | 93.66 | - | 93.64 | 93.65 | |
| SE-ResNet-50 | Classification | top-1 | 77.74 | 77.74 | 77.74 | 77.75 | 77.63 | 77.73 | `$MMCLS_DIR/configs/seresnet/seresnet50_b32x8_imagenet.py` |
| | | top-5 | 93.84 | 93.84 | 93.84 | 93.83 | 93.72 | 93.84 | |
| ShuffleNetV1 1.0x | Classification | top-1 | 68.13 | 68.13 | 68.13 | 68.13 | 67.71 | 68.11 | `$MMCLS_DIR/configs/shufflenet_v1/shufflenet_v1_1x_b64x16_linearlr_bn_nowd_imagenet.py` |
| | | top-5 | 87.81 | 87.81 | 87.81 | 87.81 | 87.58 | 87.80 | |
| ShuffleNetV2 1.0x | Classification | top-1 | 69.55 | 69.55 | 69.55 | 69.54 | 69.10 | 69.54 | `$MMCLS_DIR/configs/shufflenet_v2/shufflenet_v2_1x_b64x16_linearlr_bn_nowd_imagenet.py` |
| | | top-5 | 88.92 | 88.92 | 88.92 | 88.91 | 88.58 | 88.92 | |
| MobileNet V2 | Classification | top-1 | 71.86 | 71.86 | 71.86 | 71.87 | 70.91 | 71.84 | `$MMCLS_DIR/configs/mobilenet_v2/mobilenet_v2_b32x8_imagenet.py` |
| | | top-5 | 90.42 | 90.42 | 90.42 | 90.40 | 89.85 | 90.41 | |

MMDet

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOV3 | Object Detection | COCO2017 | box AP | 33.7 | - | 33.5 | 33.5 | 33.5 | - | - | `$MMDET_DIR/configs/yolo/yolov3_d53_320_273e_coco.py` |
| SSD | Object Detection | COCO2017 | box AP | 25.5 | - | 25.5 | 25.5 | - | - | - | `$MMDET_DIR/configs/ssd/ssd300_coco.py` |
| RetinaNet | Object Detection | COCO2017 | box AP | 36.5 | - | 36.4 | 36.4 | 36.3 | 36.5 | - | `$MMDET_DIR/configs/retinanet/retinanet_r50_fpn_1x_coco.py` |
| FCOS | Object Detection | COCO2017 | box AP | 36.6 | - | 36.6 | 36.5 | - | - | - | `$MMDET_DIR/configs/fcos/fcos_r50_caffe_fpn_gn-head_1x_coco.py` |
| FSAF | Object Detection | COCO2017 | box AP | 37.4 | - | 37.4 | 37.4 | 37.2 | 37.4 | - | `$MMDET_DIR/configs/fsaf/fsaf_r50_fpn_1x_coco.py` |
| YOLOX | Object Detection | COCO2017 | box AP | 40.5 | - | 40.3 | 40.3 | 29.3 | - | - | `$MMDET_DIR/configs/yolox/yolox_s_8x8_300e_coco.py` |
| Faster R-CNN | Object Detection | COCO2017 | box AP | 37.4 | - | 37.3 | 37.3 | 37.1 | 37.3 | - | `$MMDET_DIR/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py` |
| ATSS | Object Detection | COCO2017 | box AP | 39.4 | - | 39.4 | 39.4 | - | - | - | `$MMDET_DIR/configs/atss/atss_r50_fpn_1x_coco.py` |
| Cascade R-CNN | Object Detection | COCO2017 | box AP | 40.4 | - | 40.4 | 40.4 | - | 40.4 | - | `$MMDET_DIR/configs/cascade_rcnn/cascade_rcnn_r50_caffe_fpn_1x_coco.py` |
| Mask R-CNN | Instance Segmentation | COCO2017 | box AP | 38.2 | - | 38.1 | 38.1 | - | 38.0 | - | `$MMDET_DIR/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py` |
| | | | mask AP | 34.7 | - | 33.7 | 33.7 | - | - | - | |

MMEdit

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNX Runtime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|
| SRCNN | Super Resolution | Set5 | PSNR | 28.4316 | 28.4323 | 28.4323 | 28.4286 | 28.1995 | 28.4311 | `$MMEDIT_DIR/configs/restorers/srcnn/srcnn_x4k915_g1_1000k_div2k.py` |
| | | | SSIM | 0.8099 | 0.8097 | 0.8097 | 0.8096 | 0.7934 | 0.8096 | |
| ESRGAN | Super Resolution | Set5 | PSNR | 28.2700 | 28.2592 | 28.2592 | - | - | 28.2624 | `$MMEDIT_DIR/configs/restorers/esrgan/esrgan_x4c64b23g32_g1_400k_div2k.py` |
| | | | SSIM | 0.7778 | 0.7764 | 0.7774 | - | - | 0.7765 | |
| ESRGAN-PSNR | Super Resolution | Set5 | PSNR | 30.6428 | 30.6444 | 30.6430 | - | - | 27.0426 | `$MMEDIT_DIR/configs/restorers/esrgan/esrgan_psnr_x4c64b23g32_g1_1000k_div2k.py` |
| | | | SSIM | 0.8559 | 0.8558 | 0.8558 | - | - | 0.8557 | |
| SRGAN | Super Resolution | Set5 | PSNR | 27.9499 | 27.9408 | 27.9408 | - | - | 27.9388 | `$MMEDIT_DIR/configs/restorers/srresnet_srgan/srgan_x4c64b16_g1_1000k_div2k.py` |
| | | | SSIM | 0.7846 | 0.7839 | 0.7839 | - | - | 0.7839 | |
| SRResNet | Super Resolution | Set5 | PSNR | 30.2252 | 30.2300 | 30.2300 | - | - | 30.2294 | `$MMEDIT_DIR/configs/restorers/srresnet_srgan/msrresnet_x4c64b16_g1_1000k_div2k.py` |
| | | | SSIM | 0.8491 | 0.8488 | 0.8488 | - | - | 0.8488 | |
| Real-ESRNet | Super Resolution | Set5 | PSNR | 28.0297 | 27.7016 | 27.7016 | - | - | 27.7049 | `$MMEDIT_DIR/configs/restorers/real_esrgan/realesrnet_c64b23g32_12x4_lr2e-4_1000k_df2k_ost.py` |
| | | | SSIM | 0.8236 | 0.8122 | 0.8122 | - | - | 0.8123 | |
| EDSR | Super Resolution | Set5 | PSNR | 30.2223 | 30.2214 | 30.2214 | 30.2211 | 30.1383 | - | `$MMEDIT_DIR/configs/restorers/edsr/edsr_x4c64b16_g1_300k_div2k.py` |
| | | | SSIM | 0.8500 | 0.8497 | 0.8497 | 0.8497 | 0.8469 | - | |

MMOCR

| Model | Task | Dataset | Metrics | PyTorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | OpenVINO fp32 | Model config file |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DBNet* | TextDetection | ICDAR2015 | recall | 0.7310 | 0.7304 | 0.7198 | 0.7179 | 0.7111 | 0.7304 | 0.7309 | `$MMOCR_DIR/configs/textdet/dbnet/dbnet_r18_fpnc_1200e_icdar2015.py` |
| | | | precision | 0.8714 | 0.8718 | 0.8677 | 0.8674 | 0.8688 | 0.8718 | 0.8714 | |
| | | | hmean | 0.7950 | 0.7949 | 0.7868 | 0.7856 | 0.7821 | 0.7949 | 0.7950 | |
| CRNN | TextRecognition | IIIT5K | acc | 0.8067 | 0.8067 | 0.8067 | 0.8063 | 0.8067 | 0.8067 | - | `$MMOCR_DIR/configs/textrecog/crnn/crnn_academic_dataset.py` |
| SAR | TextRecognition | IIIT5K | acc | 0.9517 | 0.9287 | - | - | - | - | - | `$MMOCR_DIR/configs/textrecog/sar/sar_r31_parallel_decoder_academic.py` |

MMSeg

| Model | Dataset | Metrics | PyTorch fp32 | ONNXRuntime fp32 | TensorRT fp32 | TensorRT fp16 | TensorRT int8 | PPLNN fp16 | Model config file |
|---|---|---|---|---|---|---|---|---|---|
| FCN | Cityscapes | mIoU | 72.25 | - | 72.36 | 72.35 | 74.19 | - | `$MMSEG_DIR/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py` |
| PSPNet | Cityscapes | mIoU | 78.55 | - | 78.26 | 78.24 | 77.97 | - | `$MMSEG_DIR/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py` |
| DeepLabV3 | Cityscapes | mIoU | 79.09 | - | 79.12 | 79.12 | 78.96 | - | `$MMSEG_DIR/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py` |
| DeepLabV3+ | Cityscapes | mIoU | 79.61 | - | 79.6 | 79.6 | 79.43 | - | `$MMSEG_DIR/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py` |
| Fast-SCNN | Cityscapes | mIoU | 70.96 | - | 70.93 | 70.92 | 66.0 | - | `$MMSEG_DIR/configs/fastscnn/fast_scnn_lr0.12_8x4_160k_cityscapes.py` |

Notes

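  • Because some datasets contain images of various resolutions within a codebase, as in MMDet, the speed benchmark is obtained with static configs in MMDeploy, while the performance benchmark is obtained with dynamic configs.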

  • Some TensorRT int8 performance benchmarks require an NVIDIA GPU with tensor cores; without them, performance drops sharply.

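  • DBNet uses the nearest interpolation mode in the neck of the model, for which TensorRT-7 applies a strategy quite different from PyTorch's. To keep the model compatible with TensorRT-7, we rewrote the neck to use the bilinear interpolation mode, which improves the final detection performance. To obtain performance that matches PyTorch, we recommend TensorRT-8+, whose interpolation methods are the same as PyTorch's.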
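
The first note distinguishes static from dynamic configs. As a rough illustration only, the two fragments below contrast a fixed input shape with a shape range. The key names (`onnx_config`, `backend_config`, `model_inputs`, `min_shape`, `opt_shape`, `max_shape`) are assumptions modeled on MMDeploy-style deploy configs, the dynamic range values are hypothetical, and `is_static` is a helper invented for this sketch; check the config files shipped with the MMDeploy version in use.

```python
# Illustrative fragments only: key names and the dynamic shape range are
# assumptions of this sketch, not copied from the MMDeploy repository.

# Static: the exported engine accepts exactly one input shape.
static_cfg = dict(
    onnx_config=dict(input_shape=[1344, 800]),  # fixed (width, height)
    backend_config=dict(model_inputs=[dict(input_shapes=dict(input=dict(
        min_shape=[1, 3, 800, 1344],
        opt_shape=[1, 3, 800, 1344],
        max_shape=[1, 3, 800, 1344])))]),
)

# Dynamic: the engine accepts any shape between min_shape and max_shape.
dynamic_cfg = dict(
    onnx_config=dict(
        input_shape=None,
        dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"}}),
    backend_config=dict(model_inputs=[dict(input_shapes=dict(input=dict(
        min_shape=[1, 3, 320, 320],      # hypothetical lower bound
        opt_shape=[1, 3, 800, 1344],     # shape the engine is tuned for
        max_shape=[1, 3, 1344, 1344])))]),  # hypothetical upper bound
)


def is_static(cfg):
    """Return True when every input's min and max shapes coincide."""
    for model_input in cfg["backend_config"]["model_inputs"]:
        for shapes in model_input["input_shapes"].values():
            if shapes["min_shape"] != shapes["max_shape"]:
                return False
    return True
```

A static engine can be tuned for a single shape, which suits latency measurement; a dynamic one handles the mixed image sizes found in evaluation sets, which suits accuracy measurement.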