Test - Inference Dashboard #2

Open
anijain2305 opened this issue Dec 21, 2022 · 2 comments

Comments

@anijain2305
Owner

Testing the inference numbers

@anijain2305
Owner Author

anijain2305 commented Dec 21, 2022

Inference Performance Dashboard for float32 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. All experiments are inference runs on A100 GPUs. For accuracy, we check the numerical correctness of the forward-pass outputs by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report the mean compilation latency and the peak memory footprint compression ratio.

Models that fail the accuracy checks are excluded from the performance, compilation latency, and memory footprint measurements.
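The filtering step can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the function name `drop_accuracy_failures` and the `passing` parameter are hypothetical, and whether `pass_due_to_skip` counts as passing is left to the caller because the issue does not say. The sample values are taken from the inductor column of the torchbench tables.

```python
def drop_accuracy_failures(metrics, accuracy, passing=("pass",)):
    """Keep only the models whose accuracy status is listed in `passing`.

    metrics  : dict mapping model name -> metric value (e.g. speedup)
    accuracy : dict mapping model name -> accuracy status string
    passing  : statuses treated as a pass (an assumption, adjust as needed)
    """
    return {
        name: value
        for name, value in metrics.items()
        if accuracy.get(name) in passing
    }


# Inductor speedups and accuracy statuses for three torchbench models.
speedups = {"resnet50": 1.459, "hf_BigBird": 0.0, "hf_Bert": 1.3142}
accuracy = {"resnet50": "pass", "hf_BigBird": "fail_to_run", "hf_Bert": "pass"}

kept = drop_accuracy_failures(speedups, accuracy)
print(sorted(kept))
```

Filtering first keeps placeholder values such as the `0.0` speedup of a model that failed to run out of the summary statistics.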

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 90%, 55/61 | 100%, 46/46 | 98%, 60/61  |
|       aot_eager        | 87%, 53/61 | 100%, 46/46 | 98%, 60/61  |
|        inductor        | 84%, 51/61 | 100%, 46/46 | 97%, 59/61  |
| inductor_no_cudagraphs | 85%, 52/61 | 100%, 46/46 | 97%, 59/61  |
+------------------------+------------+-------------+-------------+
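Each cell above pairs a rounded percentage with the raw passed/total counts. A small sketch of how such a cell could be formatted, assuming the percentage is simply the ratio rounded to the nearest whole percent (the helper name `format_passrate` is hypothetical):

```python
def format_passrate(passed, total):
    """Format a pass rate as '<percent>%, <passed>/<total>'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # The '.0%' format multiplies by 100 and rounds to a whole percent.
    return f"{passed / total:.0%}, {passed}/{total}"


# Counts for inductor on torchbench, taken from the table above.
print(format_passrate(51, 61))
```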

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.41x    |    1.34x    |    1.35x    |
| inductor_no_cudagraphs |   1.32x    |    1.33x    |    1.34x    |
+------------------------+------------+-------------+-------------+
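Speedups are ratios, so they are summarized with a geometric mean rather than an arithmetic mean: a 2x speedup and a 0.5x slowdown average out to 1x. A minimal sketch of that computation follows; the function here is illustrative, not the dashboard's own implementation, and it assumes failing models (reported as `0.0`) have already been removed.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive ratios, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive; drop failed models first")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A 2x speedup and a 0.5x slowdown cancel out.
print(geometric_mean([2.0, 0.5]))
```

On Python 3.8 and later, `statistics.geometric_mean` from the standard library does the same job.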

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.02    |    2.53     |    1.79     |
|       aot_eager        |    2.89    |    4.62     |    3.86     |
|        inductor        |    7.37    |    14.30    |    12.08    |
| inductor_no_cudagraphs |    7.15    |    12.33    |    11.95    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.05x    |    1.03x    |    1.18x    |
|       aot_eager        |   1.05x    |    1.03x    |    1.16x    |
|        inductor        |   1.05x    |    1.25x    |    1.12x    |
| inductor_no_cudagraphs |   1.11x    |    1.31x    |    1.18x    |
+------------------------+------------+-------------+-------------+
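The issue does not spell out how the compression ratio is defined. A reading consistent with "higher is better" is the baseline peak memory divided by the compiled peak memory, so that values above 1x mean the backend uses less memory than native PyTorch. The sketch below encodes that assumption; the function name and the byte counts are hypothetical.

```python
def compression_ratio(baseline_peak_bytes, compiled_peak_bytes):
    """Baseline peak memory divided by compiled peak memory.

    Assumed definition: a result above 1.0 means the compiled run
    needed less peak memory than the baseline run.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return baseline_peak_bytes / compiled_peak_bytes


# Hypothetical example: 10 GiB baseline peak versus 8 GiB compiled peak.
gib = 1024 ** 3
print(f"{compression_ratio(10 * gib, 8 * gib):.2f}x")
```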

torchbench suite with float32 precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|                drq                |  1   | 1.0248 |  1.0061   |  3.1668  |         1.2794         |
|            hf_T5_large            |  1   | 1.092  |  0.8994   |  2.4551  |         2.1993         |
|         soft_actor_critic         | 256  | 0.998  |  0.9378   |  2.2791  |         1.177          |
|            hf_T5_base             |  1   | 1.0021 |  0.9981   |  2.1787  |         2.1532         |
|               dlrm                |  1   | 0.9848 |  1.0381   |  1.9782  |         1.1422         |
|           lennard_jones           | 1000 | 0.8456 |  0.8697   |  1.9276  |         0.8307         |
|             hf_Albert             |  16  | 1.001  |  1.0011   |  1.8985  |         1.872          |
|               hf_T5               |  4   | 1.0009 |  1.0005   |  1.8525  |         1.8416         |
|         phlippe_densenet          | 128  | 1.0064 |  1.1276   |  1.7133  |         1.3969         |
|            hf_Reformer            |  8   | 0.9982 |  1.0029   |  1.7116  |         1.7139         |
|              hf_GPT2              |  16  | 0.9999 |  0.9996   |  1.711   |         1.7076         |
|            timm_nfnet             | 128  | 0.9992 |  0.9997   |  1.7102  |         1.6956         |
|           hf_GPT2_large           |  1   | 1.0014 |  0.9993   |  1.6171  |         1.5953         |
|           hf_Longformer           |  4   | 0.9999 |  0.9995   |  1.5795  |         1.5745         |
|           squeezenet1_1           | 256  | 0.9993 |  0.9987   |  1.5639  |         1.5717         |
|        shufflenet_v2_x1_0         | 128  | 0.997  |  0.9974   |   1.5    |         1.5035         |
|            densenet121            |  64  |  1.0   |    1.0    |  1.4986  |         1.4819         |
|           timm_resnest            | 256  | 0.9986 |  1.0002   |  1.484   |         1.4853         |
|             resnet50              |  64  | 0.9989 |  0.9988   |  1.459   |         1.4515         |
|           pytorch_unet            |  4   | 0.9996 |  0.9998   |  1.4324  |         1.4317         |
|           fastNLP_Bert            |  16  | 0.9974 |  0.9947   |  1.4274  |         1.4248         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9939 |  0.9955   |  1.4214  |         1.3167         |
|             resnet152             |  64  | 0.9995 |  0.9994   |  1.4182  |         1.4057         |
|           mobilenet_v2            | 128  | 0.9994 |   0.999   |  1.4158  |         1.4159         |
|          resnext50_32x4d          |  64  | 0.9992 |  0.9991   |  1.4127  |          1.4           |
|            timm_regnet            |  32  | 0.9991 |  0.9972   |  1.4116  |         1.3319         |
|        Background_Matting         |  1   | 0.9985 |  0.9983   |  1.3682  |         1.3576         |
|        mobilenet_v3_large         | 128  | 0.9987 |  0.9993   |  1.3506  |         1.3526         |
|       functorch_dp_cifar10        | 512  | 0.9903 |  0.9807   |  1.3401  |         1.2551         |
|           BERT_pytorch            |  32  | 0.9956 |  0.9881   |  1.3312  |         1.3096         |
|            mnasnet1_0             | 128  | 0.9988 |  0.9994   |  1.3236  |         1.3244         |
|          phlippe_resnet           | 256  | 0.9931 |  0.9932   |  1.3221  |         1.2526         |
|        speech_transformer         |  1   | 0.9847 |  0.8724   |  1.3211  |         1.3313         |
|           hf_Bert_large           |  4   | 1.002  |  0.9983   |  1.3169  |         1.2937         |
|         timm_efficientnet         | 128  | 0.9989 |  0.9993   |  1.3158  |         1.3137         |
|              hf_Bert              |  8   | 1.0025 |  0.9989   |  1.3142  |         1.2965         |
|        doctr_det_predictor        |  4   | 1.0016 |  0.9909   |  1.3026  |         1.3014         |
|          LearningToPaint          | 256  | 0.9973 |  0.9983   |  1.2932  |         1.3189         |
|              yolov3               |  8   | 0.9977 |  0.9981   |  1.2837  |         1.2649         |
|           hf_DistilBert           |  16  | 1.0001 |  0.9994   |  1.2765  |         1.2712         |
|             resnet18              | 256  | 0.9994 |  0.9991   |  1.2672  |         1.2763         |
|            timm_vovnet            | 128  | 0.9993 |  0.9996   |  1.2624  |         1.2581         |
|               vgg16               |  8   | 0.9958 |  0.9947   |  1.1891  |         1.1677         |
|            Super_SloMo            |  8   | 0.9998 |  0.9995   |  1.182   |         1.175          |
|              alexnet              | 1024 | 0.9994 |  0.9993   |  1.1534  |         1.1938         |
|          vision_maskrcnn          |  4   | 0.9426 |  0.9268   |  1.1437  |         1.1797         |
|      timm_vision_transformer      | 128  | 0.9974 |   0.998   |  1.1168  |         1.1111         |
| attention_is_all_you_need_pytorch | 256  | 0.9986 |  0.9973   |  1.1138  |         1.0949         |
|          pytorch_stargan          |  16  | 0.9969 |  0.9961   |  1.1131  |         1.1136         |
|              hf_Bart              |  8   | 1.0029 |  0.9958   |  1.1012  |         1.1011         |
|               dcgan               | 1024 | 0.9981 |   0.998   |  1.0577  |         1.0596         |
|   timm_vision_transformer_large   |  8   | 1.0005 |  1.0008   |  1.0494  |         1.0384         |
|              demucs               |  32  | 1.0001 |  0.9996   |  0.9998  |         0.9997         |
|       doctr_reco_predictor        |  64  | 0.9938 |  0.9975   |  0.9946  |         0.9935         |
|            tts_angular            | 512  | 0.9963 |  0.9954   |  0.9924  |         0.9982         |
|      nvidia_deeprecommender       | 512  | 0.9963 |  0.9946   |  0.8845  |         0.9918         |
|            hf_BigBird             |  4   | 0.9967 |   0.993   |   0.0    |         1.2566         |
|             tacotron2             | 128  | 1.0966 |    0.0    |   0.0    |          0.0           |
|               moco                |  64  | 0.9958 |    0.0    |   0.0    |          0.0           |
|          DALLE2_pytorch           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|     detectron2_fcos_r_50_fpn      |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
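Each speedup above is a latency ratio against native PyTorch. A rough sketch of how one such number could be derived from repeated timings is shown below. Using the median of the repeats is an assumption made here for robustness against outliers, and the timings are hypothetical.

```python
from statistics import median


def speedup(baseline_ms, compiled_ms):
    """Ratio of median baseline latency to median compiled latency.

    Values above 1.0 mean the compiled model is faster than the baseline.
    """
    compiled = median(compiled_ms)
    if compiled <= 0:
        raise ValueError("compiled latency must be positive")
    return median(baseline_ms) / compiled


# Hypothetical repeated timings in milliseconds.
baseline = [10.0, 12.0, 11.0]
compiled = [5.0, 5.5, 6.0]
print(f"{speedup(baseline, compiled):.4f}")
```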

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientdet         |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  1  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|       doctr_reco_predictor        |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |   fail_to_run    |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          DALLE2_pytorch           |  4  |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          vision_maskrcnn          |  4  |       pass       |       pass       |      0.0000      |         0.0000         |
|             tacotron2             |  4  |       pass       |   fail_to_run    |      0.0000      |         0.0000         |
|              demucs               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|     detectron2_fcos_r_50_fpn      |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|        doctr_det_predictor        |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|          vision_maskrcnn          |  4   |  8.241   |  15.7649  | 66.0949  |        42.4555         |
|           hf_Longformer           |  4   |  4.8308  |  9.1081   | 33.8975  |        32.2896         |
|            hf_T5_large            |  1   | 11.2894  |  17.4488  | 30.9691  |        30.4491         |
|              yolov3               |  8   |  1.6802  |  3.6443   | 17.8457  |        17.3166         |
|           hf_GPT2_large           |  1   |  4.7435  |   8.421   | 17.3706  |        17.0395         |
| attention_is_all_you_need_pytorch | 256  |  1.2609  |  2.7682   | 17.1504  |        15.7586         |
|   timm_vision_transformer_large   |  8   |  2.6453  |  6.1895   |  16.314  |        15.8107         |
|        speech_transformer         |  1   |  1.6536  |  3.5834   | 16.2521  |        15.0244         |
|            hf_T5_base             |  1   |  5.0749  |  8.1023   | 16.0494  |        15.3537         |
|            densenet121            |  64  |  2.1336  |  5.2594   | 15.2494  |        15.0479         |
|            hf_Reformer            |  8   |  1.4987  |  2.6034   | 12.8986  |        12.2529         |
|              hf_Bart              |  8   |  1.7111  |  3.3115   | 12.6101  |        12.5953         |
|             resnet152             |  64  |  2.4461  |  6.2734   | 12.5083  |        12.0015         |
|               hf_T5               |  4   |  2.4876  |  4.0756   | 12.1149  |        11.1377         |
|           hf_Bert_large           |  4   |  3.0993  |  5.7654   | 10.8633  |        10.6527         |
|            Super_SloMo            |  8   |  1.204   |   2.925   | 10.0844  |         9.8696         |
|            timm_nfnet             | 128  |  2.1054  |  3.8291   |  8.8616  |         8.7343         |
|           fastNLP_Bert            |  16  |  1.5065  |  2.9006   |  8.2846  |         7.6391         |
|              hf_GPT2              |  16  |   1.55   |  2.7929   |  7.7324  |         7.5908         |
|           BERT_pytorch            |  32  |  1.4507  |  2.9121   |  7.4947  |         7.2631         |
|            timm_regnet            |  32  |  1.9834  |   3.722   |  7.1953  |         7.0248         |
|        doctr_det_predictor        |  4   |  1.261   |  3.0938   |  7.0458  |         6.6775         |
|         timm_efficientnet         | 128  |  1.5313  |   3.036   |  6.6266  |         6.4773         |
|      timm_vision_transformer      | 128  |  0.8927  |  1.9043   |  6.3691  |         6.0912         |
|           timm_resnest            | 256  |  0.6306  |  1.2882   |  6.2985  |         6.0496         |
|        shufflenet_v2_x1_0         | 128  |  1.0066  |  2.4716   |  5.9515  |         5.7353         |
|              hf_Bert              |  8   |  1.5695  |   2.895   |  5.8942  |         5.8154         |
|         phlippe_densenet          | 128  |  0.8806  |  2.1915   |  5.7873  |         5.5692         |
|             hf_Albert             |  16  |  1.4134  |  2.7094   |  5.7543  |         5.524          |
|        mobilenet_v3_large         | 128  |  0.9113  |  2.2511   |  5.6266  |         5.5884         |
|           mobilenet_v2            | 128  |  0.8741  |  2.2077   |  5.2335  |         5.2215         |
|          resnext50_32x4d          |  64  |  0.9003  |  2.2032   |  5.2287  |         4.8345         |
|            timm_vovnet            | 128  |  1.1426  |  2.1075   |  5.1691  |         4.9489         |
|        Background_Matting         |  1   |  0.9713  |  2.2791   |  5.1691  |         4.9209         |
|             resnet50              |  64  |  0.9459  |  2.3204   |  5.0908  |         4.8862         |
|           hf_DistilBert           |  16  |  0.7992  |  1.4186   |  5.0349  |         4.7274         |
|            mnasnet1_0             | 128  |  0.837   |  2.0486   |  4.7904  |         4.5988         |
|       functorch_dp_cifar10        | 512  |  0.3088  |   0.531   |  3.3274  |         3.2167         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.4308  |  1.0066   |  3.2657  |         3.0655         |
|           pytorch_unet            |  4   |  0.4827  |  1.1158   |  3.1365  |         2.8746         |
|          pytorch_stargan          |  16  |  0.4283  |  1.1197   |  2.813   |         2.7817         |
|             resnet18              | 256  |  0.4141  |  0.9234   |  2.6977  |         2.4564         |
|          LearningToPaint          | 256  |  0.4363  |  0.9683   |  2.6331  |         2.4596         |
|          phlippe_resnet           | 256  |  0.3997  |  0.9013   |  2.3902  |         2.2379         |
|           squeezenet1_1           | 256  |  0.2371  |  0.4091   |  1.9137  |         1.7443         |
|               vgg16               |  8   |  0.1809  |   0.302   |  1.456   |         1.4129         |
|              alexnet              | 1024 |  0.1525  |  0.2304   |  1.4328  |         1.2835         |
|                drq                |  1   |  0.2935  |   0.385   |  1.3473  |         1.2152         |
|               dlrm                |  1   |  0.2508  |  0.3632   |  1.2426  |         1.0564         |
|               dcgan               | 1024 |  0.1624  |   0.261   |  1.2185  |         1.2193         |
|      nvidia_deeprecommender       | 512  |  0.1823  |  0.2667   |  1.1612  |         1.0288         |
|         soft_actor_critic         | 256  |  0.2076  |  0.2584   |  1.0431  |         0.9338         |
|           lennard_jones           | 1000 |  0.1298  |  0.1847   |  1.0128  |         0.8478         |
|            tts_angular            | 512  |  0.1648  |   0.195   |  0.9414  |         0.8122         |
|       doctr_reco_predictor        |  64  |  0.7994  |  0.7884   |  0.6222  |         0.6206         |
|              demucs               |  32  |  0.2781  |  0.2782   |  0.1943  |         0.1864         |
|            hf_BigBird             |  4   |  3.4026  |  4.7473   |   nan    |        11.2452         |
|             tacotron2             | 128  | 119.4557 |    nan    |   nan    |          nan           |
|               moco                |  64  | 22.7028  |    nan    |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |   nan    |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|         timm_efficientnet         | 128  | 1.2148 |  1.2148   |  1.6531  |         1.792          |
|            timm_vovnet            | 128  | 1.2926 |  1.2926   |  1.5959  |         1.6554         |
|           pytorch_unet            |  4   | 1.5928 |  1.3548   |  1.5789  |         1.5928         |
|               hf_T5               |  4   | 1.0782 |  1.0782   |  1.5282  |         1.5313         |
|           mobilenet_v2            | 128  | 1.0723 |  1.0723   |  1.4645  |         1.5894         |
|            timm_nfnet             | 128  | 1.1588 |  1.6471   |  1.457   |         1.5071         |
|        mobilenet_v3_large         | 128  | 1.1002 |  1.1002   |  1.4117  |         1.5726         |
|              hf_Bart              |  8   | 1.004  |  1.0037   |  1.3553  |         1.3554         |
|            mnasnet1_0             | 128  | 1.1266 |  1.1266   |  1.3292  |         1.5086         |
|             resnet18              | 256  |  1.0   |    1.0    |  1.286   |         1.4285         |
| attention_is_all_you_need_pytorch | 256  | 1.0312 |  1.0292   |  1.2286  |         1.2344         |
|        Background_Matting         |  1   | 1.311  |   1.311   |  1.2134  |         1.2299         |
|              yolov3               |  8   | 1.2265 |  1.2264   |  1.2049  |         1.2265         |
|           squeezenet1_1           | 256  |  1.0   |    1.0    |  1.1629  |         1.299          |
|            hf_T5_base             |  1   | 1.0276 |  1.0276   |  1.1572  |         1.1587         |
|              demucs               |  32  | 1.1385 |  1.1385   |  1.1385  |         1.1385         |
|          phlippe_resnet           | 256  | 1.1717 |  1.1717   |  1.1008  |         1.1717         |
|        shufflenet_v2_x1_0         | 128  |  1.0   |    1.0    |  1.0768  |         1.3133         |
|         phlippe_densenet          | 128  | 1.2259 |  1.2259   |  1.0508  |         1.0796         |
|          pytorch_stargan          |  16  | 1.0494 |  1.0494   |  1.0494  |         1.0494         |
|        doctr_det_predictor        |  4   | 0.4934 |  0.4934   |  1.021   |         0.4912         |
|             hf_Albert             |  16  | 1.0232 |  1.0216   |  1.0192  |         1.0232         |
|           hf_DistilBert           |  16  | 1.016  |  1.0154   |  1.0133  |         1.016          |
|          resnext50_32x4d          |  64  |  1.0   |  0.9484   |  1.0086  |         1.0564         |
|             resnet50              |  64  | 1.056  |  0.9486   |  1.0085  |         1.0561         |
|             resnet152             |  64  | 1.0428 |  0.9597   |  1.0066  |         1.0429         |
|              hf_Bert              |  8   | 1.0088 |  1.0082   |  1.0058  |         1.0088         |
|           hf_Bert_large           |  4   | 1.0033 |   1.003   |  1.0016  |         1.0033         |
|              hf_GPT2              |  16  |  1.0   |  0.9995   |  0.9993  |          1.0           |
|               dlrm                |  1   |  1.0   |    1.0    |  0.999   |          1.0           |
|      nvidia_deeprecommender       | 512  | 1.001  |   1.001   |  0.999   |         1.1422         |
|       doctr_reco_predictor        |  64  | 0.997  |   0.997   |  0.997   |         0.997          |
|            hf_T5_large            |  1   | 1.0024 |  1.0024   |  0.9958  |         0.9964         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.9997   |  0.9947  |          1.0           |
|           hf_GPT2_large           |  1   |  1.0   |  0.9999   |  0.9945  |         0.9956         |
|               vgg16               |  8   |  1.0   |    1.0    |  0.9937  |          1.0           |
|   timm_vision_transformer_large   |  8   | 1.0039 |  1.0037   |  0.9839  |         0.9846         |
|            timm_regnet            |  32  |  1.0   |    1.0    |  0.9829  |         0.9998         |
|               dcgan               | 1024 |  1.0   |    1.0    |  0.9783  |          1.0           |
|           hf_Longformer           |  4   | 0.5916 |  0.5953   |  0.9664  |         0.9892         |
|           fastNLP_Bert            |  16  | 1.0619 |  1.0608   |  0.9541  |         0.9573         |
|            tts_angular            | 512  | 0.9982 |  0.9982   |  0.9537  |         0.9982         |
|       functorch_dp_cifar10        | 512  |  1.0   |    1.0    |  0.9463  |          1.0           |
|           timm_resnest            | 256  |  1.0   |  0.8998   |  0.9102  |         0.9472         |
|          LearningToPaint          | 256  |  1.0   |    1.0    |  0.8734  |          1.0           |
|              alexnet              | 1024 |  1.0   |  0.9167   |  0.8714  |          1.0           |
|            Super_SloMo            |  8   | 1.0841 |  0.9258   |  0.845   |         0.8639         |
|                drq                |  1   | 0.9627 |  0.9627   |  0.8437  |         0.9627         |
|           BERT_pytorch            |  32  | 1.0265 |  1.0265   |  0.8058  |         0.8087         |
|         soft_actor_critic         | 256  |  1.0   |    1.0    |   0.79   |          1.0           |
|            hf_Reformer            |  8   | 1.384  |  1.5123   |  0.7044  |         0.7573         |
|      timm_vision_transformer      | 128  | 1.1056 |  1.0986   |  0.6961  |         0.7486         |
|        speech_transformer         |  1   | 1.0655 |  1.0651   |  0.6679  |         0.6703         |
|            densenet121            |  64  | 1.1503 |  1.0007   |  0.5977  |         0.6177         |
|          vision_maskrcnn          |  4   | 0.7923 |  0.7922   |  0.5905  |         0.795          |
|           lennard_jones           | 1000 |  1.0   |    1.0    |  0.5622  |          1.0           |
|            hf_BigBird             |  4   | 0.8781 |   0.878   |   nan    |         0.8781         |
|               moco                |  64  | 1.0357 |    nan    |   nan    |          nan           |
|             tacotron2             | 128  | 0.7663 |    nan    |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |  nan   |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|          vision_maskrcnn          |  4   | 165.3751 | 168.3843  | 138.1865 |        133.836         |
|           hf_Longformer           |  4   | 177.7305 | 177.6922  | 112.3879 |        112.8816        |
|              demucs               |  32  | 96.4082  |  96.436   | 96.3731  |        96.4266         |
|              hf_GPT2              |  16  | 140.7779 | 140.7904  | 82.2663  |        82.4248         |
|   timm_vision_transformer_large   |  8   | 63.8748  |  63.8446  | 60.8244  |        61.6358         |
|               hf_T5               |  4   | 107.7393 | 107.7478  | 58.1867  |        58.5388         |
|            Super_SloMo            |  8   | 63.8387  |  63.8149  | 54.0027  |        54.3421         |
|           pytorch_unet            |  4   | 67.9805  |  67.9503  | 47.3894  |        47.4318         |
|            hf_T5_base             |  1   |  95.661  |  96.0823  | 43.8782  |        44.5879         |
|           timm_resnest            | 256  | 61.8914  |  61.8598  | 41.6678  |         41.623         |
|            timm_nfnet             | 128  | 70.4865  |  68.7478  | 40.1819  |         40.555         |
|           fastNLP_Bert            |  16  | 52.2602  |  52.6789  |  36.704  |        36.6037         |
|        doctr_det_predictor        |  4   | 48.2132  |  49.2009  | 36.6718  |        36.6766         |
|             resnet152             |  64  | 50.7557  |  50.8251  |  35.82   |        36.1333         |
|      timm_vision_transformer      | 128  | 33.9717  |  33.9482  | 30.3213  |        30.4728         |
|            timm_vovnet            | 128  | 37.8464  |  37.8799  | 30.0141  |        30.0896         |
|           hf_GPT2_large           |  1   | 46.6565  |  46.6913  | 28.8728  |        29.3264         |
|             hf_Albert             |  16  | 50.5565  |  50.5794  | 26.6975  |        27.0312         |
|              alexnet              | 1024 | 29.1828  |  29.1832  | 25.3136  |        24.4332         |
|              hf_Bart              |  8   | 26.2046  |  26.0725  | 23.8817  |        23.8524         |
|            hf_Reformer            |  8   | 39.5501  |  39.3432  | 23.0629  |        23.0115         |
|           hf_Bert_large           |  4   | 28.8009  |  28.8379  | 21.8645  |         22.314         |
|          resnext50_32x4d          |  64  | 29.7848  |  29.8986  | 21.1063  |         21.359         |
| attention_is_all_you_need_pytorch | 256  | 23.4359  |  23.4836  | 21.0349  |        21.3839         |
|         timm_efficientnet         | 128  | 27.3529  |  27.4112  | 20.7859  |        20.8285         |
|             resnet18              | 256  | 25.7838  |  25.7921  | 20.3283  |        20.1841         |
|            timm_regnet            |  32  | 27.7896  |  27.7935  | 19.7673  |        20.8208         |
|            densenet121            |  64  | 29.2286  |  29.2807  | 19.5376  |        19.7158         |
|            hf_T5_large            |  1   | 53.5439  |  65.0104  | 19.3227  |        21.4085         |
|           hf_DistilBert           |  16  | 23.0383  |  23.0379  | 18.0298  |        18.1205         |
|        Background_Matting         |  1   |  22.485  |  22.5089  | 16.4111  |        16.5229         |
|              hf_Bert              |  8   | 20.5463  |  20.6005  | 15.6761  |        15.9638         |
|             resnet50              |  64  | 22.5742  |  22.6059  | 15.4677  |        15.5654         |
|            mnasnet1_0             | 128  | 20.0301  |  19.9967  | 15.1213  |        15.0924         |
|           mobilenet_v2            | 128  | 19.2942  |  19.333   | 13.6182  |        13.6357         |
|               dcgan               | 1024 | 13.7557  |  13.7382  |  13.001  |        12.9693         |
|           squeezenet1_1           | 256  | 19.5143  |  19.5144  |  12.482  |        12.4382         |
|        mobilenet_v3_large         | 128  | 15.9305  |  15.9604  | 11.7933  |        11.7814         |
|              yolov3               |  8   | 14.7084  |  15.0619  | 11.4584  |        11.6107         |
|        speech_transformer         |  1   |  14.921  |  16.5492  | 11.0369  |        10.7523         |
|           BERT_pytorch            |  32  | 14.5162  |  14.5846  |  10.81   |        11.0379         |
|            tts_angular            | 512  |  8.8181  |  8.8531   |  8.9145  |         8.9159         |
|          LearningToPaint          | 256  |  9.4494  |  9.5336   |  7.2565  |         7.1171         |
|       doctr_reco_predictor        |  64  |  7.6247  |  7.3903   |  7.0454  |         6.9874         |
|          pytorch_stargan          |  16  |  7.0188  |  7.0007   |  6.2586  |         6.2707         |
|        shufflenet_v2_x1_0         | 128  |  7.8553  |  7.8608   |  5.2269  |         5.1969         |
|      nvidia_deeprecommender       | 512  |  4.6047  |  3.9878   |  5.1839  |         4.6375         |
|               vgg16               |  8   |  4.9927  |  4.9944   |  4.1875  |         4.2564         |
|         phlippe_densenet          | 128  |  5.3834  |  4.7674   |  3.3165  |         4.0308         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  3.9529  |  3.7929   |  3.0924  |         3.3416         |
|       functorch_dp_cifar10        | 512  |  3.0496  |  2.6854   |  2.2523  |         2.4017         |
|          phlippe_resnet           | 256  |  2.527   |  2.5101   |  1.885   |         2.0002         |
|               dlrm                |  1   |  0.8499  |  0.6733   |  0.3609  |         0.616          |
|                drq                |  1   |  0.6735  |  0.6316   |  0.2202  |         0.6486         |
|         soft_actor_critic         | 256  |  0.2723  |  0.3037   |  0.1285  |         0.243          |
|           lennard_jones           | 1000 |  0.2419  |  0.2356   |  0.115   |         0.3122         |
|            hf_BigBird             |  4   | 187.143  |  187.632  |   nan    |        148.2089        |
|             tacotron2             | 128  | 584.2111 |    nan    |   nan    |          nan           |
|               moco                |  64  | 47.8611  |    nan    |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |   nan    |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+
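The absolute latencies above can be turned into a rough per-model ratio against eager. The snippet below is a minimal sketch, not the dashboard's own script: the `latency_ratio` helper and the `latency_ms` dictionary are hypothetical names, the rows are copied by hand from the table, and the result is only an approximation of the reported speedup because the dashboard may measure its baseline in a separate run.

```python
import math

# Rows copied from the table above: name -> (eager_ms, inductor_ms).
latency_ms = {
    "hf_GPT2": (140.7779, 82.2663),
    "hf_T5": (107.7393, 58.1867),
    "tts_angular": (8.8181, 8.9145),
    "hf_BigBird": (187.143, float("nan")),  # inductor entry is nan in the table
}


def latency_ratio(eager_ms, backend_ms):
    """Return eager_ms / backend_ms, or nan if either latency is missing."""
    if math.isnan(eager_ms) or math.isnan(backend_ms) or backend_ms == 0:
        return float("nan")
    return eager_ms / backend_ms


# Hypothetical helper output: ratio > 1 means the backend was faster than eager.
ratios = {name: latency_ratio(e, b) for name, (e, b) in latency_ms.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.4f}")
```

A ratio below 1 (as for `tts_angular`) means the compiled run was slower than eager in this measurement; `nan` entries are propagated rather than treated as zero.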

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       MT5ForConditionalGeneration       | 16  | 1.073  |  0.9662   |  2.4505  |         1.9327         |
|            XLNetLMHeadModel             |  8  | 1.0019 |  1.0013   |  2.3162  |         2.3123         |
|          MobileBertForMaskedLM          | 64  | 1.0858 |   0.93    |  1.8978  |         1.5189         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |  0.9997   |  1.877   |         1.8589         |
|                 T5Small                 |  4  | 1.0048 |  0.9943   |  1.7975  |         1.7636         |
|       T5ForConditionalGeneration        |  4  | 1.0063 |  0.9958   |  1.7959  |         1.7666         |
|               DistillGPT2               | 16  | 0.9994 |  0.9993   |  1.7122  |         1.704          |
|           ElectraForCausalLM            | 32  | 1.0012 |  0.9993   |  1.6914  |         1.6705         |
|          AllenaiLongformerBase          |  4  | 0.9999 |  0.9902   |  1.5825  |         1.551          |
|               GoogleFnet                | 16  | 0.9994 |  0.9993   |  1.5738  |         1.7554         |
|       ElectraForQuestionAnswering       | 64  | 1.0005 |  0.9999   |  1.5086  |         1.4934         |
|             XGLMForCausalLM             |  8  | 1.0393 |  0.9278   |  1.4273  |         1.3205         |
|     MobileBertForQuestionAnswering      | 128 | 1.0042 |  1.0035   |  1.4074  |         1.3641         |
|           RobertaForCausalLM            | 16  | 1.0006 |  0.9994   |  1.4027  |         1.3916         |
|    LayoutLMForSequenceClassification    | 16  | 1.0001 |  0.9992   |  1.3995  |         1.3877         |
|            YituTechConvBert             | 16  | 0.9999 |  0.9997   |  1.3787  |         1.3667         |
|           LayoutLMForMaskedLM           | 16  |  1.0   |  0.9993   |  1.375   |         1.3666         |
|       RobertaForQuestionAnswering       | 16  | 1.0005 |   0.999   |  1.3735  |         1.3622         |
|        BertForQuestionAnswering         | 16  | 1.0003 |  0.9993   |  1.3696  |         1.3588         |
|             BertForMaskedLM             | 16  | 1.0005 |  1.0002   |  1.3514  |         1.3424         |
|                CamemBert                | 16  | 1.0003 |  0.9987   |  1.3513  |         1.3422         |
|            AlbertForMaskedLM            |  4  | 1.0009 |   1.001   |  1.327   |         1.3195         |
|       AlbertForQuestionAnswering        |  4  | 1.0003 |   1.001   |  1.3238  |         1.3195         |
|         MegatronBertForCausalLM         |  4  | 1.0028 |   1.002   |  1.2788  |         1.2596         |
|             OPTForCausalLM              |  2  | 0.9983 |  0.9976   |  1.2731  |         1.2907         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |  0.9994   |  1.2677  |         1.2547         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0014 |  0.9955   |  1.2589  |         1.2595         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.9995   |  1.2344  |         1.2303         |
|         Speech2Text2ForCausalLM         | 256 | 0.9967 |  0.9906   |  1.2227  |         1.2268         |
|             BartForCausalLM             |  4  | 1.0012 |   0.999   |  1.2184  |         1.2176         |
|            PLBartForCausalLM            |  8  | 1.0002 |  0.9998   |  1.2064  |         1.2054         |
|     DistilBertForQuestionAnswering      | 256 | 0.999  |  0.9992   |  1.2052  |         1.2024         |
|            MBartForCausalLM             |  4  | 1.0014 |  0.9992   |  1.1976  |         1.1975         |
|            TrOCRForCausalLM             | 32  | 0.9993 |   0.999   |  1.1622  |         1.1636         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0014 |  0.9978   |  1.1497  |         1.1473         |
|     PLBartForConditionalGeneration      |  4  | 1.0008 |  0.9957   |  1.1378  |          1.14          |
|     M2M100ForConditionalGeneration      | 16  | 0.997  |  0.9778   |  1.1244  |         1.1031         |
|           PegasusForCausalLM            | 32  | 0.9972 |  0.9945   |  1.1211  |         1.1119         |
|      BartForConditionalGeneration       |  2  | 1.0012 |  0.9983   |  1.1067  |         1.1067         |
|      MBartForConditionalGeneration      |  2  | 1.002  |  0.9984   |  1.0873  |         1.0867         |
|          DebertaV2ForMaskedLM           |  1  | 0.7431 |  0.6574   |  1.0803  |          0.77          |
|     PegasusForConditionalGeneration     | 32  | 0.9971 |  0.9967   |  1.0664  |         1.0566         |
|           DebertaForMaskedLM            |  4  | 0.8421 |   0.753   |  1.0602  |         1.0631         |
|          BlenderbotForCausalLM          |  4  | 1.002  |  0.9952   |  1.0515  |         1.0498         |
|       DebertaForQuestionAnswering       |  8  | 0.9946 |  0.9905   |  1.0361  |         1.1819         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9947 |  0.9676   |  0.9162  |         1.0231         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
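For reference, a geometric mean over a speedup column can be computed as below. This is a sketch under stated assumptions: the `geomean` helper is a hypothetical name, only three inductor values from the table above are used for illustration, and the dashboard's own aggregation may filter models differently (for example, by dropping models that fail accuracy checks).

```python
import math


def geomean(values):
    """Geometric mean of the positive, non-nan entries; nan if none remain."""
    usable = [v for v in values if not math.isnan(v) and v > 0]
    if not usable:
        return float("nan")
    return math.exp(sum(math.log(v) for v in usable) / len(usable))


# Three inductor speedups copied from the table above (illustrative subset):
# MT5ForConditionalGeneration, XLNetLMHeadModel, DebertaV2ForQuestionAnswering.
inductor_speedups = [2.4505, 2.3162, 0.9162]

subset_geomean = geomean(inductor_speedups)
print(f"geomean over subset: {subset_geomean:.4f}")
```

The geometric mean is used rather than the arithmetic mean because speedups are ratios: a 2x gain and a 0.5x regression cancel out to 1.0 instead of averaging to 1.25.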

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|               GoogleFnet                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|      DebertaV2ForQuestionAnswering      |  2  | 6.6259 |  10.3101  | 43.0451  |         21.019         |
|          DebertaV2ForMaskedLM           |  1  | 6.6731 |  10.4388  | 41.5626  |        19.9304         |
|          AllenaiLongformerBase          |  4  | 4.6241 |  9.5261   | 39.5228  |        37.0787         |
|           DebertaForMaskedLM            |  4  | 4.0661 |  6.3203   | 30.8606  |        16.1217         |
|       DebertaForQuestionAnswering       |  8  | 4.0516 |  6.4168   | 30.7544  |        15.3368         |
|     M2M100ForConditionalGeneration      | 16  | 3.646  |  6.8526   | 22.2439  |        21.2446         |
|     PegasusForConditionalGeneration     | 32  | 3.2597 |  6.7268   | 21.6447  |        20.7627         |
|      MBartForConditionalGeneration      |  2  | 3.3639 |  6.8885   | 20.1097  |        20.1785         |
|            XLNetLMHeadModel             |  8  | 4.1313 |  8.6555   | 19.7031  |        19.8304         |
|      BartForConditionalGeneration       |  2  | 3.3165 |  6.8814   | 19.3623  |        19.0427         |
|             XGLMForCausalLM             |  8  | 2.6261 |  5.2783   | 18.9194  |        18.4895         |
|          MobileBertForMaskedLM          | 64  | 7.0304 |  12.4774  | 18.3681  |        18.1248         |
|     MobileBertForQuestionAnswering      | 128 | 7.0857 |  12.3998  | 18.2001  |         18.185         |
|          BlenderbotForCausalLM          |  4  | 2.6878 |   5.143   | 18.1617  |        17.9775         |
| BlenderbotSmallForConditionalGeneration | 64  |  2.26  |  4.6878   | 14.8718  |        15.1227         |
|       MT5ForConditionalGeneration       | 16  | 3.4989 |  5.8067   | 14.8089  |        14.0701         |
|            YituTechConvBert             | 16  | 2.3358 |  4.6666   | 13.0127  |        12.8534         |
|           PegasusForCausalLM            | 32  | 1.3912 |   2.689   | 13.0048  |        12.4951         |
|     PLBartForConditionalGeneration      |  4  | 1.7426 |  3.4186   | 12.9575  |        13.1115         |
|             OPTForCausalLM              |  2  | 1.417  |  2.5998   | 11.5374  |        11.3977         |
|            MBartForCausalLM             |  4  | 1.2915 |  2.6257   | 11.3537  |        11.3893         |
|         MegatronBertForCausalLM         |  4  | 3.3362 |  5.9982   | 11.2485  |        10.9017         |
|                 T5Small                 |  4  | 2.5088 |  4.1588   |  10.982  |        10.3458         |
|    MegatronBertForQuestionAnswering     |  8  | 3.2037 |  6.0425   | 10.9613  |        10.8094         |
|             BartForCausalLM             |  4  | 1.3601 |   2.637   | 10.9474  |        10.8065         |
|            TrOCRForCausalLM             | 32  | 1.3512 |   2.604   | 10.8594  |        10.8888         |
|       T5ForConditionalGeneration        |  4  | 2.5392 |  4.1129   | 10.7617  |        10.3277         |
|               GoogleFnet                | 16  | 1.003  |  1.6715   | 10.5064  |         7.0694         |
|         Speech2Text2ForCausalLM         | 256 | 0.8117 |   1.502   |  10.36   |        10.0164         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9359 |  1.8596   | 10.1243  |        10.1521         |
|            PLBartForCausalLM            |  8  | 0.7022 |  1.3863   |  8.801   |         8.7473         |
|    LayoutLMForSequenceClassification    | 16  | 1.643  |  3.0454   |  8.0692  |         7.7074         |
|           LayoutLMForMaskedLM           | 16  | 1.6683 |  3.0666   |  8.0107  |         7.6432         |
|      GPT2ForSequenceClassification      |  4  | 1.5743 |  2.8658   |  7.8992  |         7.5908         |
|           ElectraForCausalLM            | 32  | 1.5976 |  2.9503   |  7.5459  |         7.1048         |
|           RobertaForCausalLM            | 16  | 1.6141 |  3.0263   |  7.0771  |         6.9572         |
|               DistillGPT2               | 16  | 0.8741 |  1.5713   |  6.9232  |         6.454          |
|       ElectraForQuestionAnswering       | 64  | 1.5801 |  3.0094   |  6.2399  |         6.0255         |
|        BertForQuestionAnswering         | 16  | 1.5959 |  2.9881   |  6.2223  |         5.7377         |
|                CamemBert                | 16  | 1.5999 |  3.0262   |  6.181   |         5.8014         |
|             BertForMaskedLM             | 16  | 1.5897 |  2.9618   |  6.1413  |         5.9362         |
|       RobertaForQuestionAnswering       | 16  | 1.6304 |  2.9387   |  6.0278  |         5.8623         |
|       AlbertForQuestionAnswering        |  4  | 1.4092 |  2.7625   |  5.7335  |         5.5299         |
|            AlbertForMaskedLM            |  4  | 1.419  |  2.6802   |  5.6957  |         5.3283         |
|     DistilBertForQuestionAnswering      | 256 | 0.7839 |  1.4999   |  5.3479  |         4.9536         |
|          DistilBertForMaskedLM          | 128 | 0.7729 |  1.4689   |  5.2032  |         4.919          |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|           ElectraForCausalLM            | 32  | 1.0027 |  1.0022   |  2.5791  |         2.5839         |
|               DistillGPT2               | 16  | 1.0042 |   1.004   |  2.0062  |         2.009          |
|           RobertaForCausalLM            | 16  | 1.0065 |   1.006   |  1.8216  |         1.8253         |
|          DistilBertForMaskedLM          | 128 | 1.0111 |  1.0107   |  1.7675  |         1.7705         |
|     MobileBertForQuestionAnswering      | 128 | 1.922  |   1.922   |  1.7542  |         1.7762         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0042 |  1.0035   |  1.696   |         1.696          |
|             OPTForCausalLM              |  2  | 1.0033 |  1.0031   |  1.6856  |         1.6856         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0041 |  1.0039   |  1.6824  |         1.6824         |
|          MobileBertForMaskedLM          | 64  | 1.0073 |  1.0073   |  1.6754  |         1.6806         |
|            PLBartForCausalLM            |  8  | 1.0058 |  1.0056   |  1.5986  |         1.5986         |
|         Speech2Text2ForCausalLM         | 256 | 0.8788 |  0.8788   |  1.5339  |         1.3822         |
|                CamemBert                | 16  | 1.0084 |  1.0078   |  1.5294  |         1.5328         |
|             BertForMaskedLM             | 16  | 1.0087 |  1.0081   |  1.5198  |         1.5233         |
|            YituTechConvBert             | 16  | 1.0087 |  1.0081   |  1.5185  |         1.5219         |
|           LayoutLMForMaskedLM           | 16  | 1.0086 |   1.008   |  1.512   |         1.5154         |
|       MT5ForConditionalGeneration       | 16  | 0.9997 |  0.9997   |  1.4498  |         1.4512         |
|            TrOCRForCausalLM             | 32  | 1.0063 |   1.006   |  1.4435  |         1.4435         |
|       T5ForConditionalGeneration        |  4  | 1.0096 |  1.0096   |  1.4288  |         1.4337         |
|                 T5Small                 |  4  | 1.0096 |  1.0096   |  1.4288  |         1.4337         |
|          AllenaiLongformerBase          |  4  | 0.9913 |  0.9913   |  1.364   |         1.3779         |
|     PLBartForConditionalGeneration      |  4  | 1.0041 |  1.0037   |  1.3604  |         1.3604         |
|             BartForCausalLM             |  4  | 1.0041 |  1.0039   |  1.2512  |         1.2512         |
|            MBartForCausalLM             |  4  | 1.0041 |  1.0039   |  1.2512  |         1.2512         |
|           PegasusForCausalLM            | 32  | 0.9073 |  0.9073   |  1.2273  |         1.1094         |
|             XGLMForCausalLM             |  8  | 0.9702 |  0.9702   |  1.1741  |         1.1398         |
|     M2M100ForConditionalGeneration      | 16  | 0.9363 |  0.9363   |  1.1724  |         1.1058         |
|     PegasusForConditionalGeneration     | 32  | 0.9932 |  0.9932   |  1.1665  |         1.1907         |
|         MegatronBertForCausalLM         |  4  | 1.0025 |  1.0022   |  1.1605  |         1.1622         |
|            XLNetLMHeadModel             |  8  | 1.0039 |  1.0035   |  1.0798  |         1.0798         |
|      BartForConditionalGeneration       |  2  | 1.0021 |  1.0018   |  1.0578  |         1.0578         |
|      MBartForConditionalGeneration      |  2  | 1.0021 |  1.0018   |  1.0532  |         1.0532         |
|          BlenderbotForCausalLM          |  4  | 1.0008 |  1.0009   |  1.0002  |         1.0002         |
|       AlbertForQuestionAnswering        |  4  | 1.0898 |  1.0896   |  0.9849  |         0.9865         |
|            AlbertForMaskedLM            |  4  | 1.0896 |  1.0894   |  0.9841  |         0.9857         |
|    MegatronBertForQuestionAnswering     |  8  | 1.0339 |  1.0334   |  0.9822  |         0.9836         |
|      GPT2ForSequenceClassification      |  4  | 1.0149 |  1.0145   |  0.9679  |         0.9699         |
|    LayoutLMForSequenceClassification    | 16  | 1.0927 |  1.0915   |  0.9642  |         0.9668         |
|        BertForQuestionAnswering         | 16  | 1.0946 |  1.0933   |  0.9632  |         0.966          |
|       RobertaForQuestionAnswering       | 16  | 1.0946 |  1.0933   |  0.9632  |         0.966          |
|       ElectraForQuestionAnswering       | 64  | 1.2343 |   1.224   |  0.9244  |         0.9287         |
|     DistilBertForQuestionAnswering      | 256 | 1.1401 |  1.1378   |  0.889   |         0.8911         |
|          DebertaV2ForMaskedLM           |  1  |  1.0   |    1.0    |  0.6022  |          1.0           |
|               GoogleFnet                | 16  |  1.0   |    1.0    |  0.596   |          1.0           |
|      DebertaV2ForQuestionAnswering      |  2  | 1.0016 |  1.0016   |  0.4218  |         0.9935         |
|           DebertaForMaskedLM            |  4  |  0.96  |   0.96    |  0.1842  |         0.9599         |
|       DebertaForQuestionAnswering       |  8  | 0.937  |   0.937   |  0.0892  |         0.9837         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
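Several rows at the bottom of this table show a low inductor ratio while inductor_no_cudagraphs stays close to 1. The sketch below picks out such rows. It assumes the ratio is eager peak memory divided by backend peak memory (so values below 1 mean a larger footprint than eager); the `THRESHOLD` value and the `cudagraph_only_regressions` name are hypothetical, and the rows are copied by hand from the table.

```python
# (name, inductor ratio, inductor_no_cudagraphs ratio), copied from the table above.
rows = [
    ("ElectraForCausalLM", 2.5791, 2.5839),
    ("GoogleFnet", 0.596, 1.0),
    ("DebertaV2ForQuestionAnswering", 0.4218, 0.9935),
    ("DebertaForQuestionAnswering", 0.0892, 0.9837),
    ("DistilBertForQuestionAnswering", 0.889, 0.8911),
]

# Hypothetical cutoff: ratios below this are treated as a memory regression.
THRESHOLD = 0.9

# Models whose ratio drops below the cutoff only when cudagraphs are enabled.
cudagraph_only_regressions = [
    name
    for name, inductor, no_cudagraphs in rows
    if inductor < THRESHOLD and no_cudagraphs >= THRESHOLD
]

print(cudagraph_only_regressions)
```

Rows where both columns fall below the cutoff (such as `DistilBertForQuestionAnswering` here) are deliberately excluded, since the drop is then not specific to the cudagraphs configuration.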

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 126.2986 |  126.324  | 96.3088  |        96.9241         |
|       AlbertForQuestionAnswering        |  4  | 125.4353 | 125.3501  | 96.0152  |        96.0583         |
|            XLNetLMHeadModel             |  8  | 155.7818 | 155.9737  | 67.2505  |        67.2324         |
|     PegasusForConditionalGeneration     | 32  | 54.6872  |  54.7408  | 51.1818  |        51.6328         |
|            TrOCRForCausalLM             | 32  | 55.7681  |  56.6637  | 48.9287  |        48.8672         |
|      MBartForConditionalGeneration      |  2  | 47.8816  |  47.9705  |  44.017  |        44.0139         |
|      BartForConditionalGeneration       |  2  | 47.8024  |  47.9847  | 43.1582  |        43.1273         |
|             OPTForCausalLM              |  2  |  53.648  |  53.3112  | 42.0745  |        41.2446         |
|    MegatronBertForQuestionAnswering     |  8  | 53.3403  |  53.3344  | 42.0692  |        42.4696         |
|            YituTechConvBert             | 16  | 55.6232  |  55.5669  | 40.2893  |        40.6707         |
|     DistilBertForQuestionAnswering      | 256 | 46.5554  |  46.594   | 38.8514  |         38.907         |
|            PLBartForCausalLM            |  8  | 45.0857  |  44.697   |  37.325  |        37.3495         |
|            MBartForCausalLM             |  4  | 43.6099  |  43.7791  | 36.5042  |        36.4795         |
|             BartForCausalLM             |  4  | 43.5584  |  42.0178  | 35.7077  |        35.8188         |
| BlenderbotSmallForConditionalGeneration | 64  | 40.2691  |  40.4938  | 35.1526  |        35.1403         |
|          DistilBertForMaskedLM          | 128 | 40.6536  |  40.6795  | 32.9732  |        33.0902         |
|     PLBartForConditionalGeneration      |  4  | 37.1676  |  37.2915  | 32.7308  |        31.9921         |
|                CamemBert                | 16  | 44.0406  |  44.1191  | 32.6231  |        32.8209         |
|           LayoutLMForMaskedLM           | 16  | 44.2901  |  44.3064  | 32.2304  |        32.4122         |
|             BertForMaskedLM             | 16  |  43.466  |  43.5017  | 32.2059  |        32.4179         |
|           RobertaForCausalLM            | 16  | 45.0913  |  45.1708  | 32.1996  |        32.4264         |
|      DebertaV2ForQuestionAnswering      |  2  | 28.1659  |  28.9917  | 30.5518  |        27.4803         |
|          AllenaiLongformerBase          |  4  | 47.8858  |  48.2592  | 30.2533  |        30.8194         |
|     M2M100ForConditionalGeneration      | 16  | 33.1684  |  34.1592  |  29.41   |        30.3396         |
|     MobileBertForQuestionAnswering      | 128 | 38.6966  |  38.8171  | 27.7223  |        28.5703         |
|       RobertaForQuestionAnswering       | 16  | 35.3476  |  35.4089  | 25.7754  |        25.9691         |
|        BertForQuestionAnswering         | 16  | 35.2181  |  35.2513  | 25.7682  |        25.9445         |
|    LayoutLMForSequenceClassification    | 16  | 35.9689  |  35.9852  | 25.7673  |        25.9488         |
|       ElectraForQuestionAnswering       | 64  | 38.4819  |  38.5101  | 25.5223  |        25.7736         |
|         MegatronBertForCausalLM         |  4  |  31.73   |  31.7292  | 24.8236  |        25.2269         |
|           PegasusForCausalLM            | 32  | 27.7385  |  27.642   | 24.5256  |        24.6589         |
|          BlenderbotForCausalLM          |  4  |  24.755  |  24.8309  | 23.6011  |        23.5165         |
|               GoogleFnet                | 16  | 37.0613  |  37.0595  |  23.593  |        21.0983         |
|               DistillGPT2               | 16  | 40.1614  |  40.1497  | 23.4443  |        23.5533         |
|       DebertaForQuestionAnswering       |  8  | 23.2316  |  23.3065  | 22.3748  |        19.5651         |
|          MobileBertForMaskedLM          | 64  | 35.5562  |  42.1256  | 19.9822  |        24.7356         |
|          DebertaV2ForMaskedLM           |  1  | 28.1455  |  31.2516  | 19.6317  |        27.0823         |
|         Speech2Text2ForCausalLM         | 256 | 24.1515  |  24.2044  | 19.5831  |        19.5111         |
|           ElectraForCausalLM            | 32  | 32.8913  |  32.9534  | 19.4924  |        19.7179         |
|       T5ForConditionalGeneration        |  4  |  34.403  |  34.8905  | 19.2715  |         19.658         |
|                 T5Small                 |  4  | 34.4086  |  34.9084  | 19.2599  |        19.6315         |
|       BlenderbotSmallForCausalLM        | 64  | 23.0937  |  23.217   | 18.3115  |         18.277         |
|      GPT2ForSequenceClassification      |  4  | 32.3442  |  32.4044  | 17.2526  |        17.4334         |
|             XGLMForCausalLM             |  8  | 23.0339  |  24.8851  | 16.8214  |        18.4984         |
|           DebertaForMaskedLM            |  4  |  19.344  |   21.69   | 15.5261  |        15.3074         |
|       MT5ForConditionalGeneration       | 16  | 27.8984  |  31.2039  | 12.3767  |        15.7708         |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|           dm_nfnet_f0           | 128 | 1.0004 |  0.9995   |  1.7446  |         1.7122         |
|        sebotnet33ts_256         | 64  | 0.9995 |  0.9994   |  1.6967  |         1.6855         |
|            nfnet_l0             | 128 | 0.9994 |  0.9998   |  1.658   |         1.6423         |
|           volo_d1_224           | 64  | 0.9995 |  0.9932   |  1.5704  |         1.5529         |
|       eca_botnext26ts_256       | 128 | 0.9994 |  1.0003   |  1.5412  |         1.5275         |
|           resnest101e           | 64  | 0.9987 |  1.0018   |  1.5374  |         1.4997         |
|      xcit_large_24_p8_224       |  5  | 1.0059 |  1.0041   |  1.5263  |         1.4942         |
|          gmlp_s16_224           | 128 | 0.9998 |  0.9992   |  1.5239  |         1.5154         |
|          botnet26t_256          | 128 | 0.9994 |  0.9998   |  1.5164  |         1.5143         |
|        eca_halonext26ts         | 128 | 0.9995 |  0.9999   |  1.5141  |         1.5003         |
|         poolformer_m36          | 64  | 1.0001 |  1.0002   |  1.4992  |         1.4788         |
|         coat_lite_mini          | 128 | 0.9998 |  0.9993   |  1.4682  |         1.4546         |
|           res2next50            | 128 | 0.9996 |  0.9995   |  1.4516  |         1.4274         |
|        res2net50_14w_8s         | 128 | 0.9996 |  0.9997   |  1.4468  |         1.4309         |
|            repvgg_a2            | 128 | 0.9997 |  0.9999   |  1.4285  |         1.4251         |
|           regnety_002           | 128 | 0.998  |  0.9987   |  1.4251  |         1.3912         |
|        res2net101_26w_4s        | 64  | 0.9994 |  0.9998   |  1.4243  |         1.406          |
|        tnt_s_patch16_224        | 128 | 1.0002 |  0.9999   |  1.423   |         1.4171         |
|         mobilenetv2_100         | 128 | 0.998  |  0.9988   |  1.4161  |         1.418          |
|          cait_m36_384           |  4  | 1.0001 |  1.0085   |  1.4105  |         1.3899         |
|          jx_nest_base           | 32  | 0.9994 |  0.9923   |  1.3962  |         1.3732         |
|          convnext_base          | 64  | 0.9989 |  0.9969   |  1.3945  |         1.3825         |
|           rexnet_100            | 128 | 0.999  |  0.9989   |  1.3907  |         1.3873         |
|          ghostnet_100           | 128 | 0.9984 |   0.999   |  1.3852  |         1.3863         |
|            tinynet_a            | 128 | 0.999  |   0.999   |  1.3786  |         1.3702         |
|          cspdarknet53           | 64  | 0.9999 |  1.0003   |  1.3753  |         1.3632         |
|            hrnet_w18            | 128 | 0.9994 |    1.0    |  1.3729  |         1.3542         |
|           convit_base           | 64  | 0.9998 |  0.9997   |  1.3698  |         1.3608         |
|            gernet_l             | 128 | 0.9996 |  0.9996   |  1.3612  |         1.3597         |
|        ese_vovnet19b_dw         | 128 | 0.9992 |  0.9996   |  1.3607  |          1.36          |
|             dla102              | 128 | 0.9996 |  0.9994   |  1.3535  |         1.3491         |
|      mobilenetv3_large_100      | 128 | 0.9982 |  0.9985   |  1.341   |         1.3411         |
|          gmixer_24_224          | 128 | 0.9993 |  0.9999   |  1.3364  |         1.3316         |
|          spnasnet_100           | 128 | 0.9985 |  0.9989   |  1.3357  |         1.3355         |
|       tf_efficientnet_b0        | 128 | 0.9996 |  0.9993   |  1.3354  |         1.3342         |
|           mnasnet_100           | 128 | 0.9986 |  0.9989   |  1.3177  |         1.3178         |
|             dpn107              | 32  | 0.9998 |  0.9995   |  1.3153  |         1.2975         |
|           tf_mixnet_l           | 128 | 0.9999 |  0.9999   |  1.3119  |         1.3004         |
|          resmlp_12_224          | 128 | 0.9999 |  0.9991   |  1.3075  |         1.3092         |
|           fbnetc_100            | 128 | 0.9991 |  0.9994   |  1.3058  |         1.3038         |
|           mobilevit_s           | 64  | 0.9997 |  0.9997   |  1.302   |         1.2906         |
|        adv_inception_v3         | 128 | 0.9996 |  0.9999   |  1.2934  |         1.2884         |
|       gluon_inception_v3        | 128 | 0.9998 |  0.9998   |  1.2927  |         1.2886         |
|          inception_v3           | 128 | 0.9997 |  0.9998   |  1.2904  |         1.288          |
|            fbnetv3_b            | 128 | 0.999  |  0.9992   |  1.2884  |         1.2847         |
|          pnasnet5large          | 16  | 1.0008 |   1.001   |  1.2848  |         1.2771         |
|            lcnet_050            | 128 | 0.9938 |  0.9945   |  1.2759  |         1.293          |
|           selecsls42b           | 128 | 0.9988 |   0.999   |  1.2748  |         1.2726         |
|            mixnet_l             | 128 | 0.9997 |  0.9994   |  1.2746  |         1.2637         |
|         crossvit_9_240          | 128 | 1.0002 |    1.0    |  1.2723  |         1.2578         |
|     swsl_resnext101_32x16d      | 32  | 0.9992 |   1.001   |  1.2598  |         1.2344         |
|        convmixer_768_32         | 32  | 0.9988 |  0.9989   |  1.239   |         1.2383         |
|        gluon_xception65         | 32  | 0.9998 |  0.9999   |  1.188   |         1.1858         |
|            pit_b_224            | 64  | 0.9995 |  0.9995   |  1.1538  |         1.1471         |
|          mixer_b16_224          | 128 | 1.0021 |  1.0001   |  1.1531  |         1.1529         |
|        twins_pcpvt_base         | 64  | 0.9998 |  1.0006   |  1.1448  |         1.1246         |
|      beit_base_patch16_224      | 64  | 0.9997 |  0.9999   |  1.1172  |         1.1113         |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9997   |  1.1123  |         1.1083         |
| deit_base_distilled_patch16_224 | 64  | 0.9995 |   0.999   |  1.0932  |         1.0911         |
|      vit_base_patch16_224       | 64  | 1.0002 |  0.9993   |  1.0883  |         1.0835         |
|         visformer_small         | 128 | 0.9989 |  0.9998   |  1.0722  |         1.0635         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
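The dashboard summarizes each suite with a geometric mean speedup. As a minimal sketch of how such a figure could be derived from a per-model column like the one above: the helper name `geomean` is hypothetical, the three sample values are the top inductor rows of this table, and the dashboard's actual aggregation script is not shown in this issue.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed through logs."""
    if not values:
        raise ValueError("need at least one value")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Illustrative sample only: inductor speedups for dm_nfnet_f0,
# sebotnet33ts_256 and nfnet_l0, copied from the table above.
sample = [1.7446, 1.6967, 1.658]
print(f"{geomean(sample):.4f}")
```

A geometric mean is the usual choice for ratios such as speedups, because a 2x gain and a 0.5x loss then cancel to 1.0 instead of averaging to 1.25.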

Accuracy

+---------------------------------+----+--------+-----------+---------------+------------------------+
|              name               | bs | eager  | aot_eager |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+--------+-----------+---------------+------------------------+
|        adv_inception_v3         | 8  |  pass  |   pass    |     pass      |          pass          |
|      beit_base_patch16_224      | 8  |  pass  |   pass    |     pass      |          pass          |
|           mobilevit_s           | 8  |  pass  |   pass    |     pass      |          pass          |
|            nfnet_l0             | 8  |  pass  |   pass    |     pass      |          pass          |
|            pit_b_224            | 8  |  pass  |   pass    |     pass      |          pass          |
|          pnasnet5large          | 8  |  pass  |   pass    |     pass      |          pass          |
|         poolformer_m36          | 8  |  pass  |   pass    |     pass      |          pass          |
|           regnety_002           | 8  |  pass  |   pass    |     pass      |          pass          |
|            repvgg_a2            | 8  |  pass  |   pass    |     pass      |          pass          |
|        res2net101_26w_4s        | 8  |  pass  |   pass    |     pass      |          pass          |
|        res2net50_14w_8s         | 8  |  pass  |   pass    |     pass      |          pass          |
|           res2next50            | 8  |  pass  |   pass    |     pass      |          pass          |
|          resmlp_12_224          | 8  |  pass  |   pass    |     pass      |          pass          |
|           resnest101e           | 8  |  pass  |   pass    |     pass      |          pass          |
|           rexnet_100            | 8  |  pass  |   pass    |     pass      |          pass          |
|        sebotnet33ts_256         | 8  |  pass  |   pass    |     pass      |          pass          |
|           selecsls42b           | 8  |  pass  |   pass    |     pass      |          pass          |
|          spnasnet_100           | 8  |  pass  |   pass    |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 8  |  pass  |   pass    |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 8  |  pass  |   pass    |     pass      |          pass          |
|       tf_efficientnet_b0        | 8  |  pass  |   pass    |     pass      |          pass          |
|           tf_mixnet_l           | 8  |  pass  |   pass    |     pass      |          pass          |
|            tinynet_a            | 8  |  pass  |   pass    |     pass      |          pass          |
|        tnt_s_patch16_224        | 8  |  pass  |   pass    |     pass      |          pass          |
|        twins_pcpvt_base         | 8  |  pass  |   pass    |     pass      |          pass          |
|         visformer_small         | 8  |  pass  |   pass    |     pass      |          pass          |
|      vit_base_patch16_224       | 8  |  pass  |   pass    |     pass      |          pass          |
|           volo_d1_224           | 8  |  pass  |   pass    |     pass      |          pass          |
|      xcit_large_24_p8_224       | 8  |  pass  |   pass    |     pass      |          pass          |
|      mobilenetv3_large_100      | 8  |  pass  |   pass    |     pass      |          pass          |
|         mobilenetv2_100         | 8  |  pass  |   pass    |     pass      |          pass          |
|           mnasnet_100           | 8  |  pass  |   pass    |     pass      |          pass          |
|            mixnet_l             | 8  |  pass  |   pass    |     pass      |          pass          |
|          botnet26t_256          | 8  |  pass  |   pass    |     pass      |          pass          |
|         coat_lite_mini          | 8  |  pass  |   pass    |     pass      |          pass          |
|           convit_base           | 8  |  pass  |   pass    |     pass      |          pass          |
|        convmixer_768_32         | 8  |  pass  |   pass    |     pass      |          pass          |
|          convnext_base          | 8  |  pass  |   pass    |     pass      |          pass          |
|         crossvit_9_240          | 8  |  pass  |   pass    |     pass      |          pass          |
|          cspdarknet53           | 8  |  pass  |   pass    |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 8  |  pass  |   pass    |     pass      |          pass          |
|             dla102              | 8  |  pass  |   pass    |     pass      |          pass          |
|           dm_nfnet_f0           | 8  |  pass  |   pass    |     pass      |          pass          |
|             dpn107              | 8  |  pass  |   pass    |     pass      |          pass          |
|       eca_botnext26ts_256       | 8  |  pass  |   pass    |     pass      |          pass          |
|        eca_halonext26ts         | 8  |  pass  |   pass    |     pass      |          pass          |
|        ese_vovnet19b_dw         | 8  |  pass  |   pass    |     pass      |          pass          |
|           fbnetc_100            | 8  |  pass  |   pass    |     pass      |          pass          |
|            fbnetv3_b            | 8  |  pass  |   pass    |     pass      |          pass          |
|            gernet_l             | 8  |  pass  |   pass    |     pass      |          pass          |
|       gluon_inception_v3        | 8  |  pass  |   pass    |     pass      |          pass          |
|        gluon_xception65         | 8  |  pass  |   pass    |     pass      |          pass          |
|          gmixer_24_224          | 8  |  pass  |   pass    |     pass      |          pass          |
|          gmlp_s16_224           | 8  |  pass  |   pass    |     pass      |          pass          |
|            hrnet_w18            | 8  |  pass  |   pass    |     pass      |          pass          |
|          inception_v3           | 8  |  pass  |   pass    |     pass      |          pass          |
|          jx_nest_base           | 8  |  pass  |   pass    |     pass      |          pass          |
|            lcnet_050            | 8  |  pass  |   pass    |     pass      |          pass          |
|          mixer_b16_224          | 8  |  pass  |   pass    |     pass      |          pass          |
|          ghostnet_100           | 8  |  pass  |   pass    | fail_accuracy |     fail_accuracy      |
|          cait_m36_384           | 0  | 0.0000 |  0.0000   |    0.0000     |         0.0000         |
+---------------------------------+----+--------+-----------+---------------+------------------------+
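The pass rates in the summary can be cross-checked against this table. The sketch below assumes that only the literal status `pass` counts as a pass, so both `fail_accuracy` and the `0.0000` placeholder in the cait_m36_384 row count against the backend; the helper name `pass_rate` is hypothetical and is not taken from the benchmark scripts.

```python
def pass_rate(statuses):
    """Return (passed, total, percent); only the literal 'pass' counts."""
    total = len(statuses)
    passed = sum(1 for s in statuses if s == "pass")
    percent = round(100 * passed / total) if total else 0
    return passed, total, percent


# inductor column of the table above: 59 passes, ghostnet_100 fails the
# accuracy check, and cait_m36_384 carries a 0.0000 placeholder (assumed
# here to mean "no result", which is treated as not passing).
inductor = ["pass"] * 59 + ["fail_accuracy", "0.0000"]
passed, total, percent = pass_rate(inductor)
print(f"{percent}%, {passed}/{total}")  # prints "97%, 59/61"
```

Under that assumption the result agrees with the inductor entry for timm_models in the Passrate table, and the same counting gives 60/61 for eager and aot_eager.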

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|           mobilevit_s           | 64  | 1.6139 |  3.2333   | 49.7143  |        49.8655         |
|        twins_pcpvt_base         | 64  | 2.5291 |  5.7023   | 38.8259  |        38.7766         |
|         coat_lite_mini          | 128 | 1.0834 |  2.2498   | 29.6093  |        29.7204         |
|            hrnet_w18            | 128 | 5.9686 |  14.2881  | 27.7506  |        27.2722         |
|  swin_base_patch4_window7_224   | 64  | 2.6748 |  5.5886   | 27.6489  |        27.8899         |
|          pnasnet5large          | 16  | 5.0359 |  10.3861  | 24.0488  |        24.0914         |
|           resnest101e           | 64  | 3.2415 |  7.5263   | 21.1781  |        20.7087         |
|          cait_m36_384           |  4  | 3.1865 |  7.6199   |  21.153  |        20.9181         |
|          jx_nest_base           | 32  | 1.6404 |  3.5174   | 19.5164  |        19.6318         |
|      xcit_large_24_p8_224       |  5  | 2.9865 |  7.3893   | 18.9831  |        18.9317         |
|          convnext_base          | 64  | 1.4387 |  2.6806   | 17.6813  |        17.6188         |
|        eca_halonext26ts         | 128 | 1.2738 |  2.4199   | 16.4065  |        16.2804         |
|         poolformer_m36          | 64  | 1.6856 |  3.1149   | 15.8347  |        15.8709         |
|        res2net101_26w_4s        | 64  | 3.1804 |  7.9261   | 15.4316  |        15.5139         |
|        res2net50_14w_8s         | 128 | 2.8033 |   7.261   |  14.023  |        14.0996         |
|        sebotnet33ts_256         | 64  | 1.4937 |  2.9401   | 13.7598  |        13.5922         |
|           volo_d1_224           | 64  | 1.4303 |  3.1886   | 13.6178  |        13.4904         |
|          botnet26t_256          | 128 | 1.1163 |  2.1258   | 13.4993  |        13.3582         |
|        tnt_s_patch16_224        | 128 | 1.8458 |  4.3454   |  13.488  |        13.1775         |
|             dpn107              | 32  | 3.2437 |  6.4918   | 13.0539  |        12.9616         |
|         crossvit_9_240          | 128 | 1.5821 |  3.7067   | 11.4174  |        10.8666         |
|          gmlp_s16_224           | 128 | 1.2904 |  2.7068   | 11.1773  |        10.9321         |
|            fbnetv3_b            | 128 | 2.6799 |  5.6824   | 11.0869  |        10.8418         |
|       eca_botnext26ts_256       | 128 | 1.212  |  2.3684   | 10.8006  |        10.5527         |
|          gmixer_24_224          | 128 | 1.3715 |   3.04    | 10.7386  |        10.5671         |
|           tf_mixnet_l           | 128 | 2.9268 |   5.209   | 10.5769  |        10.5804         |
|        gluon_xception65         | 32  | 2.0418 |  5.2545   | 10.2944  |        10.0279         |
|            mixnet_l             | 128 | 2.6305 |   4.882   |  9.7863  |         9.7688         |
|           convit_base           | 64  | 1.1702 |  2.5556   |  9.422   |         9.3346         |
|             dla102              | 128 | 1.8288 |  4.5522   |  9.2009  |         9.1213         |
|          inception_v3           | 128 | 1.5867 |  4.0164   |  9.0338  |         8.7804         |
|       gluon_inception_v3        | 128 | 1.6059 |  3.9134   |  8.9316  |         8.8053         |
|        adv_inception_v3         | 128 | 1.553  |  3.8752   |  8.8676  |         8.8267         |
|           dm_nfnet_f0           | 128 | 2.1399 |   3.878   |  8.8535  |         8.6243         |
|     swsl_resnext101_32x16d      | 32  | 1.8248 |  4.4364   |  8.7958  |         8.6511         |
|           res2next50            | 128 | 1.5706 |  4.0113   |  8.6387  |         8.5934         |
|          ghostnet_100           | 128 | 1.5369 |  3.7467   |  8.6026  |         8.1494         |
|      beit_base_patch16_224      | 64  | 1.1641 |  2.4449   |  8.3774  |         8.1952         |
|          cspdarknet53           | 64  | 1.9024 |  3.7138   |  7.9474  |         7.6717         |
|           rexnet_100            | 128 | 1.7184 |  3.4299   |  7.9134  |         7.7538         |
|            nfnet_l0             | 128 | 1.8644 |  3.4937   |  7.899   |         7.8174         |
|            tinynet_a            | 128 | 1.7622 |  3.5576   |  7.6692  |         7.5266         |
|          resmlp_12_224          | 128 | 0.6248 |  1.1993   |  7.2887  |         7.2549         |
|          mixer_b16_224          | 128 | 0.664  |  1.3817   |  7.247   |         7.1023         |
|            pit_b_224            | 64  | 0.9875 |  2.2269   |  7.2143  |         7.0427         |
|       tf_efficientnet_b0        | 128 | 1.5763 |  3.0272   |  7.0067  |         6.8312         |
| deit_base_distilled_patch16_224 | 64  | 0.8979 |  1.9326   |  6.4769  |         6.2163         |
|           fbnetc_100            | 128 | 1.6791 |  3.2861   |  6.3888  |         6.152          |
|      vit_base_patch16_224       | 64  | 0.8601 |  1.8736   |  6.3494  |         6.1961         |
|          spnasnet_100           | 128 | 1.6534 |  3.1682   |  6.1396  |         6.1671         |
|      mobilenetv3_large_100      | 128 | 1.3788 |  2.6467   |  6.0345  |         5.874          |
|            repvgg_a2            | 128 | 1.6135 |  3.0337   |  5.7661  |         5.6333         |
|            gernet_l             | 128 | 1.7388 |  3.0408   |  5.7659  |         5.5789         |
|        convmixer_768_32         | 32  | 1.1829 |  2.8901   |  5.7082  |         5.4054         |
|         mobilenetv2_100         | 128 | 1.4172 |  2.6936   |  5.6773  |         5.5803         |
|           regnety_002           | 128 | 1.3786 |  2.6961   |  5.4571  |         5.3051         |
|           mnasnet_100           | 128 | 1.3785 |  2.5897   |  5.2115  |         5.139          |
|         visformer_small         | 128 | 0.9126 |  1.9966   |  5.1558  |         5.0566         |
|           selecsls42b           | 128 | 0.7467 |   1.737   |  4.2837  |         4.1223         |
|        ese_vovnet19b_dw         | 128 | 0.8169 |  1.4457   |  4.0973  |         4.061          |
|            lcnet_050            | 128 | 0.8309 |  1.6434   |  3.9526  |         3.903          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|         mobilenetv2_100         | 128 | 1.2169 |  1.2169   |  1.6626  |         1.8034         |
|           rexnet_100            | 128 | 1.2152 |  1.2152   |  1.6556  |         1.7942         |
|            tinynet_a            | 128 | 1.211  |   1.211   |  1.6361  |         1.7717         |
|           mnasnet_100           | 128 | 1.3795 |  1.3795   |  1.6285  |         1.8465         |
|          spnasnet_100           | 128 | 1.3794 |  1.3794   |  1.6281  |         1.846          |
|        ese_vovnet19b_dw         | 128 | 1.4974 |  1.4974   |  1.6192  |         1.7101         |
|           fbnetc_100            | 128 | 1.1429 |  1.1429   |  1.5534  |         1.6826         |
|      mobilenetv3_large_100      | 128 | 1.2001 |  1.2001   |  1.5403  |         1.7144         |
|            fbnetv3_b            | 128 | 1.199  |   1.199   |  1.5385  |         1.709          |
|           dm_nfnet_f0           | 128 | 1.1746 |  1.7712   |  1.5353  |         1.595          |
|           selecsls42b           | 128 | 1.5922 |  1.5922   |  1.4768  |         1.592          |
|        gluon_xception65         | 32  | 1.4509 |  1.4509   |  1.4067  |         1.4509         |
|        sebotnet33ts_256         | 64  | 1.1867 |  1.1867   |  1.3764  |         1.401          |
|          cspdarknet53           | 64  | 1.6419 |  1.6419   |  1.3762  |         1.4148         |
|          pnasnet5large          | 16  | 1.4331 |  0.9719   |  1.3588  |         1.3817         |
|        convmixer_768_32         | 32  | 1.3892 |  1.3892   |  1.329   |         1.3892         |
|            nfnet_l0             | 128 | 1.3949 |  1.3949   |  1.3271  |         1.3947         |
|       tf_efficientnet_b0        | 128 | 1.3195 |  1.3195   |  1.2494  |         1.3195         |
|            hrnet_w18            | 128 | 1.0656 |  1.0656   |  1.2462  |         1.3268         |
|         poolformer_m36          | 64  | 1.1898 |  1.1898   |  1.2225  |         1.2679         |
|            lcnet_050            | 128 | 1.2755 |  1.2755   |  1.1792  |         1.4693         |
|        res2net50_14w_8s         | 128 | 1.2892 |  1.1422   |  1.1426  |         1.1966         |
|           res2next50            | 128 | 1.3228 |  1.1715   |  1.1364  |         1.1885         |
|            mixnet_l             | 128 | 1.153  |   1.153   |  1.1192  |         1.153          |
|           tf_mixnet_l           | 128 | 1.1531 |  1.1531   |  1.1192  |         1.1531         |
|        res2net101_26w_4s        | 64  | 1.204  |  1.0983   |  1.0862  |         1.1267         |
|       eca_botnext26ts_256       | 128 | 1.1408 |  0.9998   |  1.0812  |         1.1408         |
|        eca_halonext26ts         | 128 | 1.1407 |  0.9998   |  1.0812  |         1.1407         |
|          botnet26t_256          | 128 | 1.1397 |  0.9998   |  1.0806  |         1.1397         |
|         coat_lite_mini          | 128 | 1.1029 |  1.0932   |  1.0794  |         1.1245         |
|            repvgg_a2            | 128 | 1.0636 |  1.0636   |  1.0778  |         1.1354         |
|          convnext_base          | 64  | 1.1198 |  1.1156   |  1.0621  |         1.0878         |
|           regnety_002           | 128 |  1.0   |    1.0    |  1.0514  |         1.1972         |
|           mobilevit_s           | 64  | 1.1646 |  1.1646   |  1.0267  |         1.0687         |
|          ghostnet_100           | 128 | 1.1112 |  1.1112   |  1.0214  |         1.1112         |
|     swsl_resnext101_32x16d      | 32  |  1.0   |  0.9816   |  0.9915  |          1.0           |
|             dla102              | 128 |  1.0   |    1.0    |  0.9641  |          1.0           |
|        twins_pcpvt_base         | 64  | 1.0801 |  1.0709   |  0.9525  |         0.9806         |
|           convit_base           | 64  | 1.1582 |  1.1567   |  0.9486  |         0.9714         |
|       gluon_inception_v3        | 128 | 1.0003 |  1.0003   |  0.9485  |         1.0001         |
|          inception_v3           | 128 | 1.0003 |  1.0003   |  0.9485  |         1.0001         |
|        adv_inception_v3         | 128 | 1.0003 |  1.0003   |  0.9485  |         1.0001         |
|            gernet_l             | 128 |  1.0   |    1.0    |  0.9359  |          1.0           |
|          cait_m36_384           |  4  | 1.0086 |   1.008   |  0.9354  |         0.9394         |
|           resnest101e           | 64  |  1.0   |  0.8541   |  0.9281  |         0.959          |
|           volo_d1_224           | 64  |  1.0   |    1.0    |  0.9139  |         0.9537         |
|             dpn107              | 32  | 1.2072 |  1.1164   |  0.9081  |         0.9187         |
|      xcit_large_24_p8_224       |  5  | 1.0129 |  1.0129   |  0.876   |         0.8794         |
|          jx_nest_base           | 32  | 1.1102 |  1.1084   |  0.8737  |         0.8862         |
|            pit_b_224            | 64  | 1.0669 |  1.0659   |  0.8617  |         0.8724         |
|          mixer_b16_224          | 128 | 1.1738 |  1.1696   |  0.8587  |         0.899          |
|         visformer_small         | 128 | 1.1201 |  1.1201   |  0.8585  |         0.9032         |
|  swin_base_patch4_window7_224   | 64  | 1.3578 |  1.3515   |  0.835   |         0.8479         |
|      beit_base_patch16_224      | 64  | 1.0658 |  1.0637   |  0.8089  |         0.8318         |
| deit_base_distilled_patch16_224 | 64  | 1.0676 |  1.0663   |  0.7983  |         0.8219         |
|      vit_base_patch16_224       | 64  | 1.0663 |  1.0642   |  0.7981  |         0.8206         |
|          resmlp_12_224          | 128 | 1.1837 |  1.1837   |  0.7744  |         0.8456         |
|         crossvit_9_240          | 128 | 1.0494 |  1.0445   |  0.6775  |         0.7192         |
|          gmixer_24_224          | 128 | 1.1635 |  1.1479   |  0.6682  |         0.7147         |
|          gmlp_s16_224           | 128 | 1.0787 |  1.0592   |  0.6609  |         0.7151         |
|        tnt_s_patch16_224        | 128 | 1.2117 |  1.0496   |  0.5094  |         0.5345         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 148.9654 | 149.0719  | 104.6927 |        105.1293        |
|        convmixer_768_32         | 32  | 109.0607 | 109.1017  | 87.9242  |        87.9553         |
|           dm_nfnet_f0           | 128 | 119.1667 | 119.2934  | 68.3774  |         69.63          |
|  swin_base_patch4_window7_224   | 64  | 74.2176  |  74.1953  | 66.6548  |        66.9677         |
|          pnasnet5large          | 16  |  80.328  |  80.3555  | 62.6038  |        62.9718         |
|             dla102              | 128 | 84.2243  |  84.2376  | 62.1823  |        62.4255         |
|            hrnet_w18            | 128 | 82.9734  |  83.1324  | 60.4921  |        61.3852         |
|          cait_m36_384           |  4  | 83.1308  |  82.5154  | 58.9528  |        59.8852         |
|            nfnet_l0             | 128 | 94.9852  |  94.9558  | 57.3149  |         57.841         |
|     swsl_resnext101_32x16d      | 32  | 69.6202  |  69.5645  | 55.1758  |        56.4839         |
|          mixer_b16_224          | 128 | 55.0737  |  55.147   | 48.2593  |        48.2582         |
|           convit_base           | 64  | 65.7007  |  65.6777  | 47.9436  |        48.2889         |
|           tf_mixnet_l           | 128 | 62.5082  |  62.5203  | 47.6899  |         48.061         |
|            mixnet_l             | 128 | 60.2366  |  60.2873  | 47.2427  |        47.7046         |
|         poolformer_m36          | 64  | 70.4273  |  70.4285  |  47.001  |         47.663         |
|          inception_v3           | 128 | 60.3174  |  60.3455  | 46.7317  |        46.8366         |
|       gluon_inception_v3        | 128 | 60.3486  |  60.3433  | 46.7135  |        46.8547         |
|        adv_inception_v3         | 128 | 60.3702  |  60.3922  |  46.711  |        46.8479         |
|            pit_b_224            | 64  | 53.8501  |  53.8641  | 46.6385  |        46.9295         |
|           resnest101e           | 64  | 70.3262  |  70.2225  |  45.651  |        46.8674         |
|           res2next50            | 128 | 65.1489  |  65.0755  | 44.7873  |        45.5519         |
|             dpn107              | 32  | 55.9111  |  55.9176  | 42.5219  |        43.0376         |
|        res2net50_14w_8s         | 128 | 60.6821  |  60.6256  | 41.9294  |        42.3783         |
|        gluon_xception65         | 32  | 48.8572  |  49.0012  | 41.1466  |        41.3234         |
|          convnext_base          | 64  | 56.9532  |  57.075   | 40.8314  |        41.1649         |
|         visformer_small         | 128 | 43.5331  |  43.4914  | 40.6089  |        40.8692         |
|      beit_base_patch16_224      | 64  | 43.3181  |  43.3553  | 38.7694  |        38.9668         |
| deit_base_distilled_patch16_224 | 64  | 41.2645  |  41.3351  | 37.8458  |        37.9115         |
|      vit_base_patch16_224       | 64  | 41.0123  |  40.9416  | 37.8111  |        37.9837         |
|        twins_pcpvt_base         | 64  | 39.5165  |  39.5124  | 34.5647  |        35.2026         |
|          gmixer_24_224          | 128 | 45.9274  |  45.9138  | 34.3347  |        34.4751         |
|          gmlp_s16_224           | 128 | 49.3377  |  49.3872  | 32.3656  |        32.5592         |
|           volo_d1_224           | 64  | 50.4443  |  50.7984  | 32.1044  |        32.4684         |
|        res2net101_26w_4s        | 64  | 45.3913  |  45.4029  |  31.876  |        32.2707         |
|            fbnetv3_b            | 128 | 40.8737  |  40.8793  | 31.7002  |        31.8064         |
|          jx_nest_base           | 32  |  41.393  |  41.6816  | 29.5727  |        30.1181         |
|        eca_halonext26ts         | 128 | 43.7129  |  43.7007  | 28.8765  |        29.1417         |
|          botnet26t_256          | 128 | 42.6847  |  42.6623  | 28.1344  |         28.161         |
|         coat_lite_mini          | 128 |  41.185  |  41.2007  | 28.0194  |        28.3056         |
|            gernet_l             | 128 | 37.9461  |  37.9274  | 27.8385  |        27.8736         |
|       eca_botnext26ts_256       | 128 | 42.3338  |  42.3053  | 27.4495  |        27.6741         |
|          cspdarknet53           | 64  | 34.1121  |  34.1163  | 24.8253  |         25.037         |
|            repvgg_a2            | 128 | 35.2978  |  35.3267  | 24.7215  |        24.7609         |
|         crossvit_9_240          | 128 | 30.1934  |  30.2016  | 23.7378  |        23.9801         |
|      xcit_large_24_p8_224       |  5  | 34.6663  |  34.9365  | 22.7257  |        23.2079         |
|       tf_efficientnet_b0        | 128 | 30.0324  |  30.0328  | 22.4684  |        22.4928         |
|           mobilevit_s           | 64  |  28.573  |  28.5851  | 21.9561  |        22.1538         |
|        sebotnet33ts_256         | 64  | 35.1965  |  35.1909  | 20.7381  |        20.8677         |
|           fbnetc_100            | 128 | 25.4068  |  25.4258  | 19.4487  |        19.4747         |
|           rexnet_100            | 128 | 26.2233  |  26.262   | 18.8352  |        18.9135         |
|           selecsls42b           | 128 | 23.4263  |  23.4275  | 18.3751  |        18.4075         |
|        ese_vovnet19b_dw         | 128 | 24.9002  |  24.9009  | 18.2897  |        18.2909         |
|            tinynet_a            | 128 | 24.2155  |  24.2104  | 17.5386  |        17.6373         |
|          resmlp_12_224          | 128 | 21.8646  |  21.8764  | 16.7226  |         16.71          |
|          spnasnet_100           | 128 | 21.5151  |  21.5202  | 16.0869  |        16.1043         |
|           mnasnet_100           | 128 | 20.3695  |  20.3718  | 15.4352  |        15.4274         |
|         mobilenetv2_100         | 128 | 19.2904  |  19.3011  | 13.5938  |        13.5819         |
|      mobilenetv3_large_100      | 128 | 16.4404  |  16.4508  | 12.2472  |         12.255         |
|          ghostnet_100           | 128 | 16.5424  |  16.5664  | 11.9457  |        11.9185         |
|           regnety_002           | 128 |  9.673   |   9.701   |  6.7712  |         6.9554         |
|            lcnet_050            | 128 |  5.1395  |  5.1785   |  3.6353  |         3.7228         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Build Summary


Run name

day_354_20_12_22_performance_float32_370

Commit hashes

pytorch commit: 88c581be87ac59ea1251f35a57b610ae81b9362d
pytorch commit date: 2022-12-21 04:51:51+00:00
functorch: Absent
torchbench commit: 43ca0857e9c7b9d90f647d1befbaee1dfe446d7e
torchbench commit date: 2022-12-16 10:47:24-08:00

TorchDynamo config flags

torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
torch._dynamo.config.HAS_REFS_PRIMS = True
torch._dynamo.config.capture_scalar_outputs = False
torch._dynamo.config.dead_code_elimination = True
torch._dynamo.config.disable = False
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.enforce_cond_guards_match = True
torch._dynamo.config.error_on_nested_fx_trace = True
torch._dynamo.config.guard_nn_modules = False
torch._dynamo.config.normalize_ir = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.output_code = False
torch._dynamo.config.output_graph_code = False
torch._dynamo.config.print_graph_breaks = False
torch._dynamo.config.raise_on_ctx_manager_usage = True
torch._dynamo.config.raise_on_unsafe_aot_autograd = False
torch._dynamo.config.replay_record_enabled = False
torch._dynamo.config.rewrite_assert_with_torch_assert = True
torch._dynamo.config.specialize_int_float = True
torch._dynamo.config.suppress_errors = False
torch._dynamo.config.verbose = False
torch._dynamo.config.verify_correctness = False
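The flag dump above is plain `name = value` text, so it can be loaded into a dictionary and diffed between runs. The following is a minimal sketch, assuming one flag per line in the format shown. `parse_flags` is a hypothetical helper written for illustration and is not part of `torch._dynamo`.

```python
import ast

def parse_flags(text):
    """Parse 'torch._dynamo.config.<name> = <value>' lines into a dict."""
    prefix = "torch._dynamo.config."
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix) or " = " not in line:
            continue  # skip blank lines and anything that is not a flag line
        name, value = line[len(prefix):].split(" = ", 1)
        flags[name] = ast.literal_eval(value)  # 'True' -> True, 'False' -> False
    return flags

# Three lines copied from the dump; a real run would pass the whole block.
dump = """
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.verify_correctness = False
"""
flags = parse_flags(dump)
enabled = sorted(name for name, value in flags.items() if value is True)
print(enabled)
```

Comparing the dictionaries from two runs (for example with a set difference on `flags.items()`) shows quickly which settings changed.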

Torch version

torch: 2.0.0a0+git88c581b

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8302
Number CUDA Devices: 8
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.314694656
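The reported device memory (42.3 GB) is larger than the "40GB" in the device name. One plausible reading, which the dashboard itself does not state, is that the figure is a byte count divided by 10^9. The sketch below converts it to binary gibibytes under that assumption.

```python
GB = 10**9    # decimal gigabyte (assumed unit of the reported figure)
GIB = 2**30   # binary gibibyte

reported_gb = 42.314694656              # "Device Memory [GB]" from the GPU details
total_bytes = round(reported_gb * GB)   # recover the integer byte count
total_gib = total_bytes / GIB

print(f"{total_bytes} bytes = {total_gib:.2f} GiB")
```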

anijain2305 commented Dec 22, 2022

Inference Performance Dashboard for float16 precision

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. These are inference runs. For accuracy, we check the numerical correctness of forward-pass outputs. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 95%, 57/60 | 100%, 45/45 | 100%, 59/59 |
|       aot_eager        | 92%, 55/60 | 100%, 45/45 | 100%, 59/59 |
|        inductor        | 90%, 54/60 | 100%, 45/45 | 95%, 56/59  |
| inductor_no_cudagraphs | 92%, 55/60 | 100%, 45/45 | 95%, 56/59  |
+------------------------+------------+-------------+-------------+
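Each cell in the pass rate table pairs a percentage with the raw counts. The sketch below reproduces that formatting, assuming the percentage is rounded to the nearest whole number (the table rows are consistent with this, but the rounding rule is not stated). `pass_rate` is a hypothetical helper name.

```python
def pass_rate(passed, total):
    """Format a pass rate in the table's style, e.g. '90%, 54/60'."""
    if total <= 0:
        raise ValueError("total must be positive")
    return f"{100 * passed / total:.0f}%, {passed}/{total}"

# Counts taken from the inductor row of the table.
print(pass_rate(54, 60))  # torchbench
print(pass_rate(45, 45))  # huggingface
print(pass_rate(56, 59))  # timm_models
```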

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.01x    |    1.01x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.50x    |    1.48x    |    1.41x    |
| inductor_no_cudagraphs |   1.37x    |    1.37x    |    1.38x    |
+------------------------+------------+-------------+-------------+
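The geometric mean is the usual way to aggregate ratios such as speedups, since it treats a 2x gain and a 0.5x loss symmetrically. Here is a minimal sketch, assuming that failed models (reported as 0.0 in the per-model table) are dropped before averaging, in line with the note that models failing accuracy checks are removed. `geomean_speedup` is a hypothetical helper, not the dashboard's actual code.

```python
import math

def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, skipping failed (0.0 or NaN) entries."""
    valid = [s for s in speedups if s > 0 and not math.isnan(s)]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) avoids overflow from multiplying many ratios together
    return math.exp(sum(math.log(s) for s in valid) / len(valid))

# A few inductor entries from the per-model torchbench table; 0.0 marks a failed model.
sample = [3.3913, 2.8713, 1.5116, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```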

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.07    |    2.98     |    1.84     |
|       aot_eager        |    3.11    |    5.66     |    4.07     |
|        inductor        |    8.70    |    16.24    |    12.75    |
| inductor_no_cudagraphs |    8.06    |    14.24    |    12.53    |
+------------------------+------------+-------------+-------------+
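The per-model compilation latencies contain `nan` for models that failed to run, so a plain average would itself be `nan`. Below is a small sketch of an arithmetic mean that skips those entries. It is an assumption about how the summary is aggregated, and `mean_compile_time` is a hypothetical name.

```python
import math

def mean_compile_time(latencies):
    """Arithmetic mean of compilation latencies in seconds, ignoring NaN entries."""
    valid = [t for t in latencies if not math.isnan(t)]
    return sum(valid) / len(valid) if valid else float("nan")

# Inductor latencies for a few torchbench models; nan marks a model that did not run.
sample = [50.4579, 33.8898, 27.2742, float("nan")]
print(f"{mean_compile_time(sample):.2f} s")
```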

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.03x    |    1.02x    |    1.16x    |
|       aot_eager        |   1.02x    |    1.02x    |    1.12x    |
|        inductor        |   0.98x    |    1.14x    |    1.06x    |
| inductor_no_cudagraphs |   1.05x    |    1.22x    |    1.13x    |
+------------------------+------------+-------------+-------------+
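"Higher is better" suggests the ratio is baseline peak memory divided by the backend's peak memory, so values above 1.0 mean the backend used less memory than native PyTorch. The sketch below encodes that reading. It is an assumption rather than a confirmed definition, and the byte counts are invented purely for illustration.

```python
def compression_ratio(baseline_peak_bytes, backend_peak_bytes):
    """Peak-memory compression ratio: baseline peak / backend peak (assumed definition)."""
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return baseline_peak_bytes / backend_peak_bytes

# Hypothetical peak-memory readings in bytes (not taken from the dashboard).
baseline = 6_000_000_000
backend = 5_000_000_000
print(f"{compression_ratio(baseline, backend):.2f}x")
```

Under this definition, a value such as 0.98x would mean the backend's peak memory was slightly higher than the baseline's.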

torchbench suite with float16 precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|                drq                |  1   | 1.0231 |  0.9921   |  3.3913  |         1.2582         |
|         soft_actor_critic         | 256  | 1.0442 |  0.9511   |  2.8713  |         1.1857         |
|            hf_T5_base             |  1   | 0.9635 |  0.9343   |  2.5945  |          2.47          |
|         phlippe_densenet          | 128  | 1.0165 |  0.9733   |  2.4943  |         1.6525         |
|               hf_T5               |  4   | 0.9892 |  0.9733   |  2.3937  |         2.6655         |
|            hf_T5_large            |  1   | 0.8395 |   0.737   |  2.3308  |         1.3055         |
|            hf_Reformer            |  8   | 0.9976 |  1.0031   |  2.1214  |         2.0966         |
|             hf_Albert             |  16  | 1.0015 |   1.002   |  1.987   |         1.9339         |
|           lennard_jones           | 1000 | 0.8397 |  0.8577   |  1.9786  |         0.7803         |
|            timm_nfnet             | 128  | 0.9985 |  0.9988   |  1.8871  |         1.7725         |
|               dlrm                |  1   | 0.9811 |  1.0549   |  1.818   |         1.0991         |
|           hf_GPT2_large           |  1   | 1.005  |  0.9992   |  1.8035  |         1.7802         |
|            densenet121            |  64  | 0.9986 |  1.0006   |  1.6773  |         1.6246         |
|           squeezenet1_1           | 256  | 0.9987 |  0.9982   |  1.6314  |          1.62          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9924 |  0.9623   |  1.6101  |         1.4443         |
|           BERT_pytorch            |  32  | 1.0359 |  0.8963   |  1.6076  |         1.5589         |
|              hf_GPT2              |  16  | 1.0001 |  0.9998   |  1.5983  |         1.5928         |
|          vision_maskrcnn          |  4   | 0.9674 |  0.9444   |  1.5583  |         1.6272         |
|           timm_resnest            | 256  | 0.9994 |  0.9996   |  1.5561  |         1.5555         |
|        shufflenet_v2_x1_0         | 128  | 0.9993 |  0.9995   |  1.5426  |         1.4606         |
|           hf_Longformer           |  4   | 0.9999 |  1.0009   |  1.5424  |         1.5347         |
|          phlippe_resnet           | 256  | 1.022  |  0.9479   |  1.5302  |         1.4261         |
|           pytorch_unet            |  4   | 0.9987 |  0.9996   |  1.5269  |         1.5227         |
|             resnet50              |  64  | 0.9979 |  0.9983   |  1.5116  |         1.4846         |
|        Background_Matting         |  1   | 0.9969 |   0.998   |  1.5069  |         1.4845         |
|          resnext50_32x4d          |  64  | 0.9985 |  0.9991   |  1.4894  |         1.4685         |
|             resnet152             |  64  | 0.9993 |  0.9992   |  1.4667  |         1.4365         |
|           mobilenet_v2            | 128  | 0.9981 |  0.9978   |  1.4615  |         1.4519         |
|         timm_efficientnet         | 128  | 0.9986 |  0.9993   |  1.4371  |         1.4219         |
|              hf_Bert              |  8   | 1.0081 |  1.0013   |  1.4283  |         1.3829         |
|            mnasnet1_0             | 128  | 0.998  |   0.998   |  1.4229  |         1.4096         |
|           hf_Bert_large           |  4   | 1.0089 |  0.9981   |  1.4199  |         1.3841         |
|            timm_regnet            |  32  | 0.9994 |  0.9988   |  1.4188  |         1.3686         |
|        mobilenet_v3_large         | 128  | 0.9984 |  0.9989   |  1.3772  |         1.3674         |
|           hf_DistilBert           |  16  | 1.0004 |  0.9983   |  1.3671  |         1.3511         |
|            timm_vovnet            | 128  | 0.9989 |  0.9992   |  1.3529  |         1.3434         |
|       functorch_dp_cifar10        | 512  | 0.9917 |  0.9945   |  1.3476  |         1.2649         |
|        speech_transformer         |  1   | 0.9848 |   0.874   |  1.3413  |         1.344          |
|              yolov3               |  8   | 1.0164 |  0.9992   |  1.3396  |         1.3006         |
| attention_is_all_you_need_pytorch | 256  | 0.9995 |  0.9154   |  1.3255  |         1.2849         |
|               vgg16               |  8   | 0.993  |  0.9947   |  1.3175  |         1.2545         |
|             resnet18              | 256  | 0.9983 |  0.9989   |  1.2885  |         1.2899         |
|          LearningToPaint          | 256  | 0.9966 |  0.9974   |  1.2844  |         1.2739         |
|            Super_SloMo            |  8   | 0.9994 |  0.9993   |  1.2544  |         1.2371         |
|              alexnet              | 1024 | 0.9991 |  0.9991   |  1.2458  |         1.2742         |
|           fastNLP_Bert            |  16  | 0.9967 |  0.9948   |   1.22   |         1.219          |
|               dcgan               | 1024 | 0.9947 |  0.9947   |  1.1862  |         1.1835         |
|              hf_Bart              |  8   | 0.9804 |  0.9313   |  1.1664  |         1.1133         |
|          pytorch_stargan          |  16  | 1.0385 |  0.9059   |  1.1603  |         1.1612         |
|        doctr_det_predictor        |  4   | 1.0001 |  1.0059   |  1.1239  |         1.1252         |
|      timm_vision_transformer      | 128  | 0.9993 |  0.9997   |  1.1072  |         1.1008         |
|   timm_vision_transformer_large   |  8   | 1.0003 |  1.0008   |  1.0519  |         1.0423         |
|              demucs               |  32  | 0.9995 |  0.9997   |  0.9997  |         0.9994         |
|       doctr_reco_predictor        |  64  | 1.0554 |  0.9928   |   0.99   |         0.997          |
|            tts_angular            | 512  | 1.0072 |  1.1018   |  0.9884  |         0.9962         |
|      nvidia_deeprecommender       | 512  | 0.9976 |  0.9967   |  0.9445  |         0.9959         |
|            hf_BigBird             |  4   | 0.9953 |  0.9829   |   0.0    |         1.2566         |
|             tacotron2             | 128  | 1.0925 |    0.0    |   0.0    |          0.0           |
|               moco                |  64  | 0.9928 |    0.0    |   0.0    |          0.0           |
|     detectron2_fcos_r_50_fpn      |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientdet         |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  1  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               | 32  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|       doctr_reco_predictor        |  4  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |   fail_to_run    |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|             tacotron2             |  4  |       pass       |   fail_to_run    |      0.0000      |         0.0000         |
|     detectron2_fcos_r_50_fpn      |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|        doctr_det_predictor        |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
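An accuracy check of this kind compares the backend's forward-pass outputs against reference outputs within a tolerance. Here is a simplified sketch on flat lists of floats. The tolerances and the `outputs_match` helper are hypothetical, and the real harness operates on tensors with its own thresholds.

```python
import math

def outputs_match(reference, candidate, rel_tol=1e-2, abs_tol=1e-3):
    """Return True if two flat lists of floats agree element-wise within tolerance."""
    if len(reference) != len(candidate):
        return False  # a shape mismatch can never count as a pass
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )

reference = [0.1250, -3.5000, 42.0]
candidate = [0.1251, -3.5010, 42.01]   # small float16-style rounding differences
print("pass" if outputs_match(reference, candidate) else "fail")
```

Looser tolerances are typical for float16 than for float32, because half precision carries only about three significant decimal digits.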

Compilation latency (sec)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|            hf_T5_large            |  1   | 12.2447  |  19.3454  | 50.4579  |        31.4887         |
|           hf_Longformer           |  4   |  4.7509  |  9.6289   | 33.8898  |        33.9265         |
|            hf_T5_base             |  1   |  6.672   |  10.2255  | 27.2742  |         18.041         |
|          vision_maskrcnn          |  4   |  4.1237  |  7.0312   | 27.1427  |        22.9349         |
|           hf_GPT2_large           |  1   |  5.1424  |  9.6722   | 19.7855  |        19.3883         |
| attention_is_all_you_need_pytorch | 256  |  1.3425  |  3.0705   |  19.06   |         17.538         |
|   timm_vision_transformer_large   |  8   |  3.2973  |  7.4977   | 18.5304  |        18.5436         |
|              yolov3               |  8   |  1.756   |  3.6947   | 18.1289  |        18.2339         |
|        speech_transformer         |  1   |  1.884   |  4.3889   | 17.3188  |        16.2409         |
|              hf_Bart              |  8   |  2.8131  |   4.979   | 16.8544  |         15.953         |
|               hf_T5               |  4   |  3.8294  |  5.7653   | 16.3204  |        11.7282         |
|            densenet121            |  64  |  2.0815  |  5.2542   | 15.5421  |        15.2114         |
|            hf_Reformer            |  8   |  1.584   |  2.7683   | 14.9387  |        14.1688         |
|             resnet152             |  64  |  2.3688  |  6.3168   | 13.4284  |        12.7992         |
|           hf_Bert_large           |  4   |  3.4467  |  6.9766   | 12.3296  |        12.2077         |
|            Super_SloMo            |  8   |  1.209   |  3.1097   | 10.4505  |         10.213         |
|            timm_nfnet             | 128  |  2.2102  |  4.1048   |  9.7131  |         9.5693         |
|           fastNLP_Bert            |  16  |  1.7063  |   3.753   |  9.4791  |         8.6099         |
|              hf_GPT2              |  16  |  1.7526  |  3.3334   |  8.3712  |         8.4044         |
|           BERT_pytorch            |  32  |  1.6422  |   3.612   |  8.1593  |         7.9738         |
|            timm_regnet            |  32  |  1.9398  |  3.7542   |  7.3351  |         7.1522         |
|        doctr_det_predictor        |  4   |  1.2721  |   3.171   |  7.1826  |         7.0412         |
|      timm_vision_transformer      | 128  |  1.021   |  2.3209   |  7.1031  |         7.0092         |
|         timm_efficientnet         | 128  |  1.5071  |  3.0849   |  6.9534  |         6.791          |
|              hf_Bert              |  8   |  1.7702  |  3.6606   |  6.6344  |         6.5585         |
|           timm_resnest            | 256  |  0.5911  |   1.31    |  6.6269  |         6.5849         |
|             hf_Albert             |  16  |  1.5999  |  3.3117   |  6.4241  |         6.393          |
|        shufflenet_v2_x1_0         | 128  |  1.0024  |  2.5031   |  6.0376  |         5.7576         |
|         phlippe_densenet          | 128  |  0.8583  |  2.2377   |  5.8409  |         5.7502         |
|        mobilenet_v3_large         | 128  |  0.9156  |  2.3675   |  5.8266  |         5.6523         |
|           hf_DistilBert           |  16  |  0.841   |  1.7546   |  5.5053  |         5.2614         |
|           mobilenet_v2            | 128  |  0.8734  |  2.2504   |  5.4822  |         5.2708         |
|        Background_Matting         |  1   |  0.9136  |  2.3267   |  5.3075  |         5.0218         |
|          resnext50_32x4d          |  64  |  0.8811  |  2.2156   |  5.127   |         4.9836         |
|             resnet50              |  64  |  0.8824  |  2.2278   |  5.1196  |         4.9724         |
|            timm_vovnet            | 128  |  1.1179  |  2.1782   |  5.1106  |         4.9868         |
|            mnasnet1_0             | 128  |  0.8087  |  2.0815   |  4.8432  |         4.829          |
|       functorch_dp_cifar10        | 512  |  0.295   |  0.5761   |  3.4323  |         3.3616         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.4629  |  1.0965   |  3.4315  |         3.3117         |
|           pytorch_unet            |  4   |  0.4831  |  1.1262   |  3.0902  |         2.8645         |
|          pytorch_stargan          |  16  |  0.4267  |  1.1355   |  2.8676  |         2.8132         |
|          LearningToPaint          | 256  |  0.4272  |  0.9849   |  2.7735  |         2.529          |
|             resnet18              | 256  |  0.4089  |  0.9327   |  2.6659  |         2.5204         |
|          phlippe_resnet           | 256  |  0.3992  |  0.9224   |  2.4604  |         2.3152         |
|           squeezenet1_1           | 256  |  0.2381  |  0.3943   |  1.9004  |         1.7338         |
|               vgg16               |  8   |  0.1875  |   0.315   |  1.5229  |         1.4794         |
|              alexnet              | 1024 |  0.163   |  0.2448   |  1.4844  |         1.3551         |
|                drq                |  1   |  0.3087  |  0.4127   |  1.4449  |         1.2545         |
|               dcgan               | 1024 |  0.1562  |  0.2679   |  1.2373  |         1.099          |
|      nvidia_deeprecommender       | 512  |  0.1923  |  0.2913   |  1.2284  |         1.0979         |
|               dlrm                |  1   |  0.2556  |  0.4147   |  1.2182  |         1.105          |
|         soft_actor_critic         | 256  |  0.2148  |  0.2796   |  1.0579  |         0.9408         |
|           lennard_jones           | 1000 |  0.1422  |  0.2229   |  0.9876  |         0.8925         |
|            tts_angular            | 512  |  0.1734  |  0.1932   |  0.9415  |         0.8246         |
|       doctr_reco_predictor        |  64  |   0.75   |  0.7487   |  0.621   |         0.6318         |
|              demucs               |  32  |  0.2829  |  0.2844   |  0.1939  |         0.1918         |
|            hf_BigBird             |  4   |  4.0509  |  6.1417   |   nan    |        13.1385         |
|             tacotron2             | 128  | 122.5979 |    nan    |   nan    |          nan           |
|               moco                |  64  |  22.626  |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|         timm_efficientnet         | 128  | 1.214  |   1.214   |  1.6445  |         1.7882         |
|           pytorch_unet            |  4   | 1.4991 |  1.1565   |  1.488   |         1.4991         |
|            timm_vovnet            | 128  | 1.2922 |  1.2922   |  1.4609  |         1.5129         |
|           mobilenet_v2            | 128  | 1.072  |   1.072   |  1.4569  |         1.5861         |
|            timm_nfnet             | 128  | 1.1392 |  1.6403   |  1.4486  |         1.503          |
|           squeezenet1_1           | 256  |  1.0   |  0.9658   |  1.4303  |         1.5976         |
|            mnasnet1_0             | 128  | 1.1252 |  1.1252   |  1.322   |         1.5064         |
|        Background_Matting         |  1   | 1.2867 |  1.2864   |  1.1987  |         1.2138         |
|              demucs               |  32  | 1.1134 |  1.1134   |  1.1134  |         1.1134         |
|              yolov3               |  8   | 1.1163 |  1.1163   |  1.0986  |         1.1181         |
|        doctr_det_predictor        |  4   | 0.512  |   0.512   |  1.0577  |         0.5098         |
|        shufflenet_v2_x1_0         | 128  |  1.0   |  0.9706   |  1.0543  |         1.2701         |
|          pytorch_stargan          |  16  | 1.0488 |  1.0453   |  1.0488  |         1.0488         |
|          phlippe_resnet           | 256  | 1.1662 |  1.1661   |  1.0466  |         1.1662         |
|         phlippe_densenet          | 128  | 1.2207 |  1.2205   |  1.0303  |         1.0779         |
|             hf_Albert             |  16  | 1.0231 |  1.0199   |  1.0156  |         1.0231         |
|           hf_DistilBert           |  16  | 1.0157 |  1.0146   |  1.0103  |         1.0157         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.9994   |  1.007   |         0.9999         |
|              hf_Bert              |  8   | 1.0088 |  1.0076   |  1.0029  |         1.0088         |
|           hf_Bert_large           |  4   | 1.0033 |  1.0026   |   1.0    |         1.0033         |
|              hf_GPT2              |  16  |  1.0   |   0.999   |  0.9986  |          1.0           |
|           hf_GPT2_large           |  1   | 0.9997 |  0.9995   |  0.9986  |         0.9996         |
|               dlrm                |  1   |  1.0   |    1.0    |  0.998   |          1.0           |
|       doctr_reco_predictor        |  64  | 0.9976 |  0.9976   |  0.9976  |         0.9976         |
|      nvidia_deeprecommender       | 512  | 1.001  |   1.001   |  0.9974  |         1.142          |
|               vgg16               |  8   |  1.0   |    1.0    |  0.9876  |          1.0           |
|            timm_regnet            |  32  |  1.0   |    1.0    |  0.9803  |         0.9993         |
|             resnet152             |  64  |  1.0   |  0.8642   |  0.9654  |          1.0           |
|           hf_Longformer           |  4   | 0.5586 |  0.5582   |  0.9649  |         0.9892         |
| attention_is_all_you_need_pytorch | 256  | 1.0316 |  1.0276   |  0.9589  |         0.9658         |
|             resnet50              |  64  |  1.0   |  0.8326   |  0.9561  |          1.0           |
|          resnext50_32x4d          |  64  |  1.0   |  0.8308   |  0.9556  |          1.0           |
|   timm_vision_transformer_large   |  8   | 1.004  |  1.0037   |  0.9548  |         0.9554         |
|            tts_angular            | 512  | 0.9982 |  0.9982   |  0.953   |         0.9982         |
|               dcgan               | 1024 |  1.0   |    1.0    |  0.9486  |          1.0           |
|              hf_Bart              |  8   |  1.0   |    1.0    |  0.9393  |          1.0           |
|        mobilenet_v3_large         | 128  |  1.0   |    1.0    |  0.9354  |         0.9999         |
|             resnet18              | 256  |  1.0   |    1.0    |  0.9327  |          1.0           |
|            Super_SloMo            |  8   | 1.1498 |  0.8973   |  0.9319  |         0.9493         |
|       functorch_dp_cifar10        | 512  |  1.0   |  0.9914   |  0.9299  |          1.0           |
|           timm_resnest            | 256  |  1.0   |  0.8179   |  0.9091  |         0.9473         |
|              alexnet              | 1024 |  1.0   |   0.864   |  0.8945  |         1.0664         |
|          LearningToPaint          | 256  |  1.0   |    1.0    |  0.8842  |          1.0           |
|           fastNLP_Bert            |  16  | 1.0612 |  1.0592   |  0.8671  |         0.8746         |
|          vision_maskrcnn          |  4   | 0.8477 |  0.8474   |  0.8293  |         0.8474         |
|                drq                |  1   | 0.9626 |  0.9626   |  0.7848  |         0.9626         |
|            hf_T5_large            |  1   | 0.9528 |  0.9558   |  0.7593  |         0.9611         |
|         soft_actor_critic         | 256  |  1.0   |    1.0    |  0.7148  |          1.0           |
|      timm_vision_transformer      | 128  | 1.1044 |  1.0931   |  0.6964  |         0.7507         |
|           BERT_pytorch            |  32  | 1.0257 |  1.0257   |  0.6706  |         0.674          |
|            hf_Reformer            |  8   | 1.3472 |  1.4449   |  0.6307  |         0.6736         |
|            densenet121            |  64  | 1.1487 |  0.8823   |  0.5842  |         0.6043         |
|           lennard_jones           | 1000 |  1.0   |    1.0    |  0.5591  |          1.0           |
|        speech_transformer         |  1   | 1.0602 |  1.0594   |  0.4985  |         0.4998         |
|               hf_T5               |  4   | 0.6854 |  0.7344   |  0.3659  |         0.8779         |
|            hf_T5_base             |  1   | 0.7851 |  0.8082   |  0.3305  |         0.902          |
|            hf_BigBird             |  4   | 0.8569 |  0.8569   |   nan    |         0.8569         |
|               moco                |  64  |  1.0   |    nan    |   nan    |          nan           |
|             tacotron2             | 128  | 0.7969 |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_Longformer           |  4   | 167.5839 | 167.5337  | 108.6833 |        109.2011        |
|        doctr_det_predictor        |  4   | 66.2587  |  66.8761  | 61.4194  |        61.4289         |
|              demucs               |  32  | 61.1252  |  61.0633  | 60.9794  |        61.0849         |
|              hf_GPT2              |  16  | 88.6055  |  88.586   | 55.4147  |        55.6209         |
|   timm_vision_transformer_large   |  8   | 48.8916  |  48.9394  | 46.4665  |        46.9163         |
|           fastNLP_Bert            |  16  | 49.6382  |  50.3561  | 41.1403  |        40.3824         |
|               hf_T5               |  4   | 98.1321  |  99.8324  | 40.8111  |        36.4306         |
|            hf_T5_base             |  1   | 97.6342  | 100.8787  | 36.3609  |        38.1456         |
|            Super_SloMo            |  8   | 41.6154  |  41.6842  | 33.1723  |        33.6203         |
|          vision_maskrcnn          |  4   | 48.7532  |  52.2999  |  30.392  |        29.1956         |
|      timm_vision_transformer      | 128  |  32.665  |  32.6367  | 29.4762  |        29.6575         |
|           pytorch_unet            |  4   | 44.6134  |  44.6071  |  29.198  |        29.2839         |
|            hf_T5_large            |  1   |  74.36   |  84.8579  |  27.084  |        48.8089         |
|           timm_resnest            | 256  | 39.3627  |  39.3615  | 25.2915  |        25.3065         |
|            timm_nfnet             | 128  | 43.2852  |  43.4349  |  23.102  |        24.4854         |
|             resnet152             |  64  |  31.923  |  32.2496  | 21.7956  |         22.151         |
|           hf_GPT2_large           |  1   | 27.5822  |  27.7416  | 18.0711  |        18.4133         |
|              hf_Bart              |  8   | 21.0853  |  22.2353  |  17.794  |        18.3976         |
|              alexnet              | 1024 | 23.8231  |  22.0259  |  17.658  |        17.2719         |
|            timm_vovnet            | 128  | 23.3373  |  23.3876  |  17.222  |        17.3488         |
|             hf_Albert             |  16  | 28.6986  |  28.6794  | 14.4724  |        14.8775         |
|             resnet18              | 256  | 16.3551  |  16.3722  | 12.6782  |        12.6559         |
|            hf_Reformer            |  8   | 27.1484  |  26.7609  | 12.6749  |        12.8141         |
|            timm_regnet            |  32  |  18.43   |  17.7236  | 12.5566  |        13.6002         |
|           hf_Bert_large           |  4   | 17.3728  |  17.6152  | 12.3874  |        12.6737         |
|            densenet121            |  64  | 19.6994  |  19.6848  | 11.6748  |        12.0642         |
|          resnext50_32x4d          |  64  | 17.2657  |  17.2804  |  11.601  |        11.7697         |
|         timm_efficientnet         | 128  | 16.5735  |  16.5574  | 11.4987  |         11.633         |
|        speech_transformer         |  1   | 14.0249  |  16.5825  | 10.6054  |        10.5122         |
|        Background_Matting         |  1   | 15.4071  |  15.3701  | 10.1761  |        10.3708         |
|           hf_DistilBert           |  16  | 13.5553  |  13.5681  |  9.912   |        10.0293         |
|             resnet50              |  64  | 13.8593  |  13.8984  |  9.1983  |         9.3518         |
| attention_is_all_you_need_pytorch | 256  | 12.1419  |  13.5304  |  9.1797  |         9.4978         |
|           squeezenet1_1           | 256  | 14.6455  |  14.693   |  8.9845  |         9.0459         |
|              hf_Bert              |  8   | 12.4528  |  12.5026  |  8.7468  |         9.027          |
|            mnasnet1_0             | 128  | 12.1252  |  12.1025  |  8.5072  |         8.5835         |
|           mobilenet_v2            | 128  | 12.2137  |  12.2566  |  8.343   |         8.3764         |
|        mobilenet_v3_large         | 128  |  10.926  |  10.9051  |  7.9383  |         8.0026         |
|              yolov3               |  8   |  10.314  |  10.3036  |  7.6725  |         7.8641         |
|           BERT_pytorch            |  32  |  9.5261  |  11.3595  |  6.1571  |         6.3861         |
|            tts_angular            | 512  |  7.1233  |  5.8138   |  5.9249  |         5.821          |
|        shufflenet_v2_x1_0         | 128  |  7.3649  |  7.4131   |  5.854   |         4.9993         |
|       doctr_reco_predictor        |  64  |  5.5366  |  5.1483   |  5.1269  |         6.0113         |
|          LearningToPaint          | 256  |  6.4989  |  6.5063   |  5.0653  |         5.9589         |
|      nvidia_deeprecommender       | 512  |  4.7455  |  4.7493   |  5.0089  |         4.7556         |
|          pytorch_stargan          |  16  |  4.3417  |  5.3585   |  4.6248  |         4.6311         |
|               dcgan               | 1024 |  3.8799  |  3.8774   |  3.242   |         3.2609         |
|               vgg16               |  8   |  3.4816  |  3.9838   |  3.0407  |         2.7381         |
|         phlippe_densenet          | 128  |  5.6171  |   6.038   |  2.4102  |         3.579          |
|       functorch_dp_cifar10        | 512  |  2.8429  |  2.8502   |  2.0912  |         2.2357         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  3.2591  |  3.4697   |  2.0854  |         2.3121         |
|          phlippe_resnet           | 256  |  2.1593  |  2.3208   |  1.5064  |         1.6054         |
|               dlrm                |  1   |  0.6855  |  0.6658   |  0.388   |         0.6445         |
|                drq                |  1   |  0.6266  |  0.6671   |  0.2122  |         0.5915         |
|           lennard_jones           | 1000 |  0.2345  |  0.2477   |  0.1129  |         0.2724         |
|         soft_actor_critic         | 256  |  0.3069  |   0.327   |  0.1116  |         0.3263         |
|            hf_BigBird             |  4   | 121.7511 | 125.0117  |   nan    |        96.4091         |
|             tacotron2             | 128  | 542.0161 |    nan    |   nan    |          nan           |
|               moco                |  64  | 30.1084  |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with float16 precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 1.0948 |  0.9093   |  2.8298  |         1.5823         |
|     MobileBertForQuestionAnswering      | 128 | 1.1262 |  0.9015   |  2.6079  |         1.6467         |
|       MT5ForConditionalGeneration       | 16  | 0.9328 |  0.8313   |  2.4934  |         1.4029         |
|            XLNetLMHeadModel             |  8  | 1.0001 |  0.9975   |  2.3547  |         2.3542         |
|       T5ForConditionalGeneration        |  4  | 0.9591 |   0.901   |  2.0885  |         1.8403         |
|                 T5Small                 |  4  | 0.9659 |  0.9058   |  2.081   |         1.8248         |
|      GPT2ForSequenceClassification      |  4  | 1.0017 |  0.9986   |  1.9098  |         1.8736         |
|             XGLMForCausalLM             |  8  | 1.0555 |  0.9258   |  1.7803  |         1.3653         |
|           ElectraForCausalLM            | 32  | 1.006  |  1.0031   |  1.7753  |         1.7259         |
|       ElectraForQuestionAnswering       | 64  | 1.004  |  0.9995   |  1.7058  |         1.6696         |
|          AllenaiLongformerBase          |  4  | 1.0004 |  0.9864   |  1.5803  |         1.5308         |
|               DistillGPT2               | 16  | 0.9995 |  0.9991   |  1.5552  |         1.5479         |
|    LayoutLMForSequenceClassification    | 16  | 1.0028 |  0.9998   |  1.5278  |         1.5068         |
|            YituTechConvBert             | 16  | 1.0025 |  0.9986   |  1.5086  |         1.4756         |
|       RobertaForQuestionAnswering       | 16  | 1.0029 |  1.0006   |  1.5004  |         1.4693         |
|     M2M100ForConditionalGeneration      | 16  | 0.9991 |  0.8859   |  1.5003  |         1.2569         |
|           DebertaForMaskedLM            |  4  | 0.792  |  0.6888   |  1.4929  |         1.0498         |
|        BertForQuestionAnswering         | 16  | 1.0024 |  1.0001   |  1.4925  |         1.4668         |
|           RobertaForCausalLM            | 16  | 1.0024 |  0.9999   |  1.4806  |         1.4575         |
|             OPTForCausalLM              |  2  | 0.9992 |   0.993   |  1.4629  |         1.4843         |
|           LayoutLMForMaskedLM           | 16  | 1.0026 |   1.002   |  1.4487  |         1.4302         |
|         Speech2Text2ForCausalLM         | 256 | 0.9922 |  0.9822   |  1.4212  |         1.4065         |
|          DebertaV2ForMaskedLM           |  1  | 0.7365 |  0.6416   |  1.4185  |         0.7621         |
|             BertForMaskedLM             | 16  | 1.0044 |  1.0016   |  1.4169  |         1.3981         |
|       AlbertForQuestionAnswering        |  4  | 1.0008 |  1.0009   |  1.3807  |         1.3732         |
|            AlbertForMaskedLM            |  4  | 1.0006 |  1.0006   |  1.3794  |         1.3725         |
|         MegatronBertForCausalLM         |  4  | 1.0086 |  0.9058   |  1.3758  |         1.3594         |
|    MegatronBertForQuestionAnswering     |  8  | 1.0008 |  1.0081   |  1.3747  |         1.3662         |
|                CamemBert                | 16  | 1.0024 |  0.9998   |  1.3547  |         1.3383         |
|     DistilBertForQuestionAnswering      | 256 | 1.0003 |  0.9998   |  1.3375  |         1.3335         |
|             BartForCausalLM             |  4  | 1.0025 |  0.9987   |  1.2778  |         1.2721         |
|          DistilBertForMaskedLM          | 128 | 0.9997 |   0.999   |  1.257   |         1.2502         |
|            MBartForCausalLM             |  4  | 1.0043 |  0.9942   |  1.252   |         1.2539         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0046 |   0.977   |  1.2429  |         1.2457         |
|            PLBartForCausalLM            |  8  | 1.0019 |  0.9951   |  1.2335  |         1.2315         |
|          BlenderbotForCausalLM          |  4  | 1.0361 |  0.9169   |  1.2177  |         1.2261         |
|      MBartForConditionalGeneration      |  2  | 0.9779 |  0.9442   |  1.2007  |         1.1414         |
|       DebertaForQuestionAnswering       |  8  | 0.9836 |  0.8492   |  1.1969  |         1.3328         |
|      BartForConditionalGeneration       |  2  | 0.9743 |  0.9347   |  1.1968  |         1.141          |
|            TrOCRForCausalLM             | 32  | 1.0011 |  0.9994   |  1.1755  |         1.1728         |
|     PLBartForConditionalGeneration      |  4  | 0.9801 |  0.9573   |  1.1717  |         1.1281         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9919 |  0.9437   |  1.1677  |         1.1157         |
|           PegasusForCausalLM            | 32  | 0.9972 |  0.9876   |  1.1411  |         1.1331         |
|     PegasusForConditionalGeneration     | 32  | 0.982  |  0.8866   |  1.1239  |         1.0803         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7328 |  0.6343   |  0.958   |         0.7435         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
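
To summarize one column of a table like the one above into a single number, a geometric mean over the per-model speedups is a natural choice. The following is a minimal sketch, not the benchmark harness's own code: the helper names `parse_ascii_table`, `geomean` and `column_geomean` are hypothetical, and skipping `nan` entries is an assumption about how failed runs should be treated.

```python
import math


def parse_ascii_table(text):
    """Parse a '|'-delimited ASCII table into a list of row dicts.

    Border lines (starting with '+') are ignored; the first '|' line
    is treated as the header.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.strip().splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def geomean(values):
    """Geometric mean of the finite, positive values (nan is skipped)."""
    kept = [v for v in values if math.isfinite(v) and v > 0]
    if not kept:
        return float("nan")
    return math.exp(sum(math.log(v) for v in kept) / len(kept))


def column_geomean(text, column):
    """Geometric mean of one named column of a pasted ASCII table."""
    return geomean([float(row[column]) for row in parse_ascii_table(text)])


# Two rows copied from the table above, used only as sample input.
sample = """
+--------------------------------+-----+--------+-----------+----------+------------------------+
|              name              | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+--------------------------------+-----+--------+-----------+----------+------------------------+
|     MobileBertForMaskedLM      | 64  | 1.0948 |  0.9093   |  2.8298  |         1.5823         |
| MobileBertForQuestionAnswering | 128 | 1.1262 |  0.9015   |  2.6079  |         1.6467         |
+--------------------------------+-----+--------+-----------+----------+------------------------+
"""
print(column_geomean(sample, "inductor"))
```

Pasting a full table into `sample` and changing the column name gives the per-backend summary for that table. Rows whose value is `nan` drop out of the mean rather than turning the whole result into `nan`.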

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|      DebertaV2ForQuestionAnswering      |  2  | 7.355  |  11.8604  | 45.0586  |        23.7609         |
|          DebertaV2ForMaskedLM           |  1  | 7.3746 |  12.1264  | 43.6145  |        22.3755         |
|          AllenaiLongformerBase          |  4  | 4.7357 |  9.5649   | 38.3529  |        38.0592         |
|           DebertaForMaskedLM            |  4  | 4.1852 |  6.9707   | 31.0316  |        17.2934         |
|       DebertaForQuestionAnswering       |  8  | 4.2085 |  6.8956   | 30.0144  |        16.6051         |
|     M2M100ForConditionalGeneration      | 16  | 4.2349 |  8.6677   | 26.8993  |        24.8252         |
|     PegasusForConditionalGeneration     | 32  | 3.9796 |  8.8897   | 26.2081  |        23.8287         |
|      MBartForConditionalGeneration      |  2  | 4.5279 |   8.964   | 25.7123  |        23.6768         |
|      BartForConditionalGeneration       |  2  | 4.2739 |   8.741   | 24.8725  |        23.1577         |
|     MobileBertForQuestionAnswering      | 128 | 8.1733 |  15.4154  | 22.2011  |        21.1531         |
|          MobileBertForMaskedLM          | 64  | 8.0651 |  15.3842  | 21.5022  |        20.9862         |
|            XLNetLMHeadModel             |  8  | 4.2748 |  8.9965   | 20.9974  |        21.0643         |
|             XGLMForCausalLM             |  8  | 3.0455 |  6.4681   | 20.9292  |        20.1259         |
|       MT5ForConditionalGeneration       | 16  | 4.8859 |  7.7797   |  20.912  |        15.2587         |
|          BlenderbotForCausalLM          |  4  | 2.9578 |  6.2878   | 19.8217  |        23.9034         |
| BlenderbotSmallForConditionalGeneration | 64  | 3.0504 |  6.0726   | 19.1035  |        21.8864         |
|     PLBartForConditionalGeneration      |  4  | 2.6957 |  5.0868   | 17.1676  |        16.5181         |
|       T5ForConditionalGeneration        |  4  | 3.733  |  5.6724   | 16.3474  |        12.0891         |
|                 T5Small                 |  4  | 3.7383 |  5.6538   | 16.3331  |        12.1191         |
|            YituTechConvBert             | 16  | 2.5729 |  5.3726   | 14.6142  |        14.2015         |
|           PegasusForCausalLM            | 32  | 1.6281 |  3.2652   | 14.1206  |        13.8806         |
|         MegatronBertForCausalLM         |  4  | 3.6963 |  7.3304   | 13.1434  |        12.8686         |
|    MegatronBertForQuestionAnswering     |  8  | 3.7195 |  7.0187   | 12.9535  |        12.7135         |
|            MBartForCausalLM             |  4  | 1.5336 |  3.3842   | 12.4774  |        12.3669         |
|             BartForCausalLM             |  4  | 1.4961 |   3.221   | 12.4042  |        12.0877         |
|             OPTForCausalLM              |  2  | 1.5464 |  3.2089   |  12.368  |        12.4574         |
|            TrOCRForCausalLM             | 32  | 1.5652 |  3.1668   | 11.9011  |        11.7614         |
|         Speech2Text2ForCausalLM         | 256 | 0.9235 |  1.7998   | 10.8299  |        10.5092         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0679 |  2.1858   | 10.6104  |        13.1825         |
|            PLBartForCausalLM            |  8  | 0.8847 |  1.7676   |  9.3117  |         9.3755         |
|           LayoutLMForMaskedLM           | 16  | 1.8137 |  3.6135   |  8.947   |         8.8567         |
|    LayoutLMForSequenceClassification    | 16  | 1.8341 |  3.5781   |  8.821   |         8.8302         |
|      GPT2ForSequenceClassification      |  4  | 1.6912 |  3.2496   |  8.698   |         8.5016         |
|           ElectraForCausalLM            | 32  | 1.7943 |  3.5652   |  8.2358  |         7.8918         |
|           RobertaForCausalLM            | 16  | 1.8181 |  3.5764   |  8.087   |         7.8481         |
|               DistillGPT2               | 16  | 0.9294 |  1.7986   |  7.2733  |         6.8724         |
|       ElectraForQuestionAnswering       | 64  | 1.7844 |   3.554   |  7.0683  |         6.7905         |
|                CamemBert                | 16  | 1.8154 |  3.6533   |  6.993   |         6.8104         |
|       RobertaForQuestionAnswering       | 16  | 1.8198 |  3.5831   |  6.9726  |         6.889          |
|        BertForQuestionAnswering         | 16  | 1.7842 |  3.4884   |  6.9617  |         6.9993         |
|             BertForMaskedLM             | 16  | 1.8297 |  3.4718   |  6.8697  |         6.9908         |
|       AlbertForQuestionAnswering        |  4  | 1.6415 |  3.2972   |  6.4707  |         6.2667         |
|            AlbertForMaskedLM            |  4  | 1.616  |  3.2746   |  6.4673  |         6.2456         |
|          DistilBertForMaskedLM          | 128 | 0.8569 |  1.7727   |  5.6062  |         5.5547         |
|     DistilBertForQuestionAnswering      | 256 | 0.8599 |  1.8054   |  5.5172  |         5.5353         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|           ElectraForCausalLM            | 32  | 1.0027 |  1.0016   |  2.3211  |         2.329          |
|               DistillGPT2               | 16  | 1.0041 |  1.0038   |  1.8394  |         1.8434         |
|          DistilBertForMaskedLM          | 128 | 1.011  |  1.0102   |  1.758   |         1.764          |
|     MobileBertForQuestionAnswering      | 128 | 1.9018 |  1.9086   |  1.7243  |         1.7553         |
|       BlenderbotSmallForCausalLM        | 64  | 1.0041 |  1.0036   |  1.6807  |         1.6807         |
|          MobileBertForMaskedLM          | 64  | 1.0073 |  1.0072   |  1.6677  |         1.6781         |
|           RobertaForCausalLM            | 16  | 1.0064 |  1.0055   |  1.6078  |         1.6135         |
|            PLBartForCausalLM            |  8  | 1.0057 |  1.0053   |  1.5924  |         1.5919         |
|             OPTForCausalLM              |  2  | 1.0031 |  1.0027   |  1.5661  |         1.5661         |
|         Speech2Text2ForCausalLM         | 256 | 0.8791 |  0.8791   |  1.5265  |         1.3818         |
|            YituTechConvBert             | 16  | 1.0087 |  1.0075   |  1.5154  |         1.5222         |
|                CamemBert                | 16  | 1.0082 |  1.0071   |  1.5089  |         1.5153         |
|             BertForMaskedLM             | 16  | 1.0085 |  1.0074   |  1.4987  |         1.5054         |
|           LayoutLMForMaskedLM           | 16  | 1.0084 |  1.0073   |  1.4908  |         1.4982         |
|            TrOCRForCausalLM             | 32  | 1.0063 |  1.0057   |  1.4427  |         1.4427         |
|          AllenaiLongformerBase          |  4  | 0.9606 |  0.9606   |  1.3423  |         1.356          |
|             BartForCausalLM             |  4  | 1.0041 |  1.0037   |  1.2508  |         1.2508         |
|            MBartForCausalLM             |  4  | 1.0041 |  1.0037   |  1.2507  |         1.2507         |
|           PegasusForCausalLM            | 32  | 0.9074 |  0.9074   |  1.2238  |         1.1091         |
|             XGLMForCausalLM             |  8  | 0.9703 |  0.9703   |  1.1722  |         1.1397         |
|         MegatronBertForCausalLM         |  4  | 1.0025 |   1.002   |  1.1593  |         1.1626         |
|     M2M100ForConditionalGeneration      | 16  | 0.9363 |  0.9363   |  1.1101  |         1.1057         |
|     PegasusForConditionalGeneration     | 32  | 0.9933 |  0.9933   |  1.0378  |         1.1903         |
|          BlenderbotForCausalLM          |  4  | 1.0008 |  1.0008   |  0.9892  |         0.9892         |
|       AlbertForQuestionAnswering        |  4  | 1.0896 |  1.0892   |  0.9828  |         0.9862         |
|            AlbertForMaskedLM            |  4  | 1.0894 |  1.0891   |  0.982   |         0.9854         |
|    MegatronBertForQuestionAnswering     |  8  | 1.0339 |  1.0328   |  0.9806  |         0.9833         |
|      GPT2ForSequenceClassification      |  4  | 1.0145 |  1.0145   |  0.9675  |         0.9707         |
|    LayoutLMForSequenceClassification    | 16  | 1.0889 |  1.0866   |  0.9626  |         0.9677         |
|        BertForQuestionAnswering         | 16  | 1.0904 |   1.089   |  0.9625  |         0.9668         |
|       RobertaForQuestionAnswering       | 16  | 1.0914 |   1.089   |  0.9625  |         0.9668         |
|     PLBartForConditionalGeneration      |  4  | 0.9995 |    1.0    |  0.9382  |         0.9995         |
|      BartForConditionalGeneration       |  2  |  1.0   |    1.0    |  0.9305  |          1.0           |
|      MBartForConditionalGeneration      |  2  |  1.0   |    1.0    |  0.9287  |          1.0           |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |    1.0    |  0.9229  |          1.0           |
|       ElectraForQuestionAnswering       | 64  | 1.2329 |  1.2126   |  0.9196  |         0.9282         |
|            XLNetLMHeadModel             |  8  | 1.0039 |  1.0031   |  0.8995  |         0.8995         |
|       MT5ForConditionalGeneration       | 16  | 0.9684 |  0.9724   |  0.8984  |         0.9713         |
|     DistilBertForQuestionAnswering      | 256 | 1.1392 |  1.1345   |  0.888   |         0.8922         |
|          DebertaV2ForMaskedLM           |  1  | 0.9999 |  0.9999   |  0.5904  |         0.9999         |
|      DebertaV2ForQuestionAnswering      |  2  | 1.0016 |  1.0016   |  0.4181  |         0.9604         |
|       T5ForConditionalGeneration        |  4  | 0.7115 |  0.7067   |  0.3528  |         0.7236         |
|                 T5Small                 |  4  | 0.7091 |  0.7067   |  0.3528  |         0.7236         |
|           DebertaForMaskedLM            |  4  | 0.9598 |  0.9595   |  0.1833  |         0.9597         |
|       DebertaForQuestionAnswering       |  8  | 0.9364 |  0.9364   |  0.0903  |         0.8793         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 66.6866  |  66.6853  | 48.9019  |        49.1294         |
|       AlbertForQuestionAnswering        |  4  | 66.1753  |  66.1346  | 48.5057  |        48.7772         |
|            XLNetLMHeadModel             |  8  | 100.6799 | 101.3405  |  42.849  |        42.7894         |
|     PegasusForConditionalGeneration     | 32  | 38.8852  |  45.9199  | 34.1454  |        35.4769         |
|            TrOCRForCausalLM             | 32  | 38.3302  |  38.9068  | 33.0452  |        33.0718         |
|          AllenaiLongformerBase          |  4  |  45.429  |  45.8325  | 28.9223  |        29.6078         |
|      MBartForConditionalGeneration      |  2  | 35.3323  |  36.6296  | 28.8116  |        30.1772         |
|      BartForConditionalGeneration       |  2  | 34.6497  |  36.0841  | 28.3006  |        29.8082         |
| BlenderbotSmallForConditionalGeneration | 64  | 29.5603  |  31.4023  |  25.229  |        26.3977         |
|            YituTechConvBert             | 16  | 36.7578  |  36.8352  | 24.3742  |        24.9204         |
|            PLBartForCausalLM            |  8  | 29.2232  |  29.6329  | 23.7903  |        23.6263         |
|            MBartForCausalLM             |  4  | 28.8614  |  29.4201  | 23.2017  |        23.0512         |
|     PLBartForConditionalGeneration      |  4  | 27.0912  |  28.1038  | 22.9043  |        23.7805         |
|    MegatronBertForQuestionAnswering     |  8  | 31.2524  |  31.1211  | 22.8066  |        22.9694         |
|             BartForCausalLM             |  4  | 28.8338  |  28.9693  | 22.0296  |        22.8171         |
|                CamemBert                | 16  | 28.5455  |  28.6161  |  21.112  |        21.3656         |
|             OPTForCausalLM              |  2  |  30.239  |  30.5512  | 20.8431  |         20.424         |
|     DistilBertForQuestionAnswering      | 256 | 26.8451  |  26.7697  | 20.2794  |        20.3715         |
|      DebertaV2ForQuestionAnswering      |  2  | 24.5212  |  28.2817  |  19.05   |        24.5913         |
|     M2M100ForConditionalGeneration      | 16  | 27.8727  |  31.5438  | 18.9641  |        21.6981         |
|          DistilBertForMaskedLM          | 128 | 22.0993  |  22.1228  | 17.7157  |        17.7825         |
|               DistillGPT2               | 16  | 27.4605  |  27.5017  | 17.6576  |        17.7394         |
|           LayoutLMForMaskedLM           | 16  |  25.357  |  25.3838  | 17.5678  |        17.7942         |
|           RobertaForCausalLM            | 16  | 25.9143  |  25.9875  | 17.5618  |        17.8027         |
|             BertForMaskedLM             | 16  | 24.7618  |  24.8049  | 17.5285  |        17.7667         |
|           PegasusForCausalLM            | 32  | 19.3553  |  19.6126  | 16.9605  |        17.0453         |
|       T5ForConditionalGeneration        |  4  | 34.6505  |  36.2254  | 15.6464  |        17.7424         |
|                 T5Small                 |  4  | 33.5697  |  36.1143  | 15.6103  |        17.9002         |
|     MobileBertForQuestionAnswering      | 128 | 26.6153  |  32.4148  |  14.043  |        17.7554         |
|       ElectraForQuestionAnswering       | 64  | 23.7991  |  23.9076  | 13.9942  |        14.2675         |
|    LayoutLMForSequenceClassification    | 16  |  21.259  |  21.3363  | 13.9464  |        14.1574         |
|       RobertaForQuestionAnswering       | 16  | 20.8533  |  20.9167  | 13.9423  |        14.2251         |
|        BertForQuestionAnswering         | 16  | 20.7119  |  20.7791  | 13.9331  |         14.174         |
|         MegatronBertForCausalLM         |  4  | 18.9575  |  23.261   | 13.8565  |        14.0195         |
|       DebertaForQuestionAnswering       |  8  | 16.7141  |  19.3215  |  13.754  |        12.3008         |
|          DebertaV2ForMaskedLM           |  1  | 25.5792  |  29.4933  | 13.7348  |        25.7068         |
|       BlenderbotSmallForCausalLM        | 64  | 16.6079  |  17.0667  | 13.4246  |        13.3189         |
|          BlenderbotForCausalLM          |  4  |  15.607  |  17.7333  |  13.151  |         13.265         |
|       MT5ForConditionalGeneration       | 16  |  34.196  |  38.694   | 12.6952  |         23.118         |
|          MobileBertForMaskedLM          | 64  | 32.1774  |  37.8092  | 12.1581  |        21.6459         |
|             XGLMForCausalLM             |  8  | 19.3337  |  21.9978  | 11.0211  |        14.8551         |
|           ElectraForCausalLM            | 32  |  19.065  |  19.1811  | 10.8167  |        11.1041         |
|           DebertaForMaskedLM            |  4  | 19.6657  |  22.1604  | 10.6533  |         14.823         |
|         Speech2Text2ForCausalLM         | 256 | 14.9367  |  15.2599  |  10.525  |         10.637         |
|      GPT2ForSequenceClassification      |  4  | 19.0671  |  19.1576  | 10.0361  |         10.204         |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with float16 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          resmlp_12_224          | 128 | 0.9997 |  1.0108   |  1.9766  |         1.969          |
|        tnt_s_patch16_224        | 128 | 0.9999 |    1.0    |   1.89   |         1.8676         |
|           dm_nfnet_f0           | 128 | 0.9997 |  0.9995   |  1.8494  |         1.7848         |
|           regnety_002           | 128 | 0.9895 |  0.9944   |  1.756   |         1.3655         |
|         coat_lite_mini          | 128 | 0.9996 |  0.9995   |  1.7503  |         1.7197         |
|            nfnet_l0             | 128 | 0.9995 |  0.9996   |  1.7382  |         1.7212         |
|          convnext_base          | 64  | 0.9991 |  1.0115   |  1.6174  |         1.5931         |
|        sebotnet33ts_256         | 64  | 0.999  |  0.9995   |  1.6119  |         1.5883         |
|          cait_m36_384           |  4  | 0.9971 |  1.0071   |  1.5977  |         1.552          |
|           volo_d1_224           | 64  | 0.9996 |  1.0027   |  1.5974  |         1.5801         |
|           resnest101e           | 64  | 1.0004 |    1.0    |  1.5963  |         1.5001         |
|         poolformer_m36          | 64  | 1.0014 |   1.002   |  1.5791  |         1.5413         |
|        res2net50_14w_8s         | 128 | 0.9989 |  1.0003   |  1.5064  |         1.4746         |
|            tinynet_a            | 128 | 0.9982 |  0.9989   |  1.4939  |         1.4701         |
|          botnet26t_256          | 128 | 0.9992 |  0.9998   |  1.4884  |         1.4794         |
|          gmixer_24_224          | 128 | 0.9999 |  1.0058   |  1.482   |         1.4758         |
|           res2next50            | 128 | 0.9988 |  0.9973   |  1.481   |         1.3969         |
|        res2net101_26w_4s        | 64  | 0.9991 |  0.9998   |  1.4793  |         1.4394         |
|       tf_efficientnet_b0        | 128 | 0.999  |  0.9986   |  1.4783  |         1.4648         |
|          gmlp_s16_224           | 128 | 0.9994 |  1.0057   |  1.4733  |         1.4668         |
|            gernet_l             | 128 | 0.9994 |  1.0002   |  1.4691  |         1.4533         |
|         mobilenetv2_100         | 128 | 0.9977 |  0.9991   |  1.4641  |         1.4507         |
|        ese_vovnet19b_dw         | 128 | 0.9988 |  0.9994   |  1.4524  |         1.4468         |
|       eca_botnext26ts_256       | 128 | 0.9993 |  0.9995   |  1.4504  |         1.4266         |
|             dla102              | 128 | 0.9999 |  1.0004   |  1.4504  |         1.4379         |
|             dpn107              | 32  | 0.9991 |    1.0    |  1.4455  |         1.411          |
|          ghostnet_100           | 128 | 0.9977 |  0.9993   |  1.4408  |         1.4229         |
|        eca_halonext26ts         | 128 | 0.9993 |  0.9996   |  1.4369  |         1.4133         |
|     swsl_resnext101_32x16d      | 32  | 0.9993 |  1.0001   |  1.4332  |         1.3239         |
|          cspdarknet53           | 64  |  1.0   |    1.0    |  1.4275  |         1.4066         |
|          spnasnet_100           | 128 | 0.9981 |  0.9993   |  1.4122  |         1.3982         |
|           mnasnet_100           | 128 | 0.9978 |  0.9996   |  1.4112  |         1.3988         |
|            fbnetv3_b            | 128 | 0.9981 |  0.9996   |  1.4112  |         1.3954         |
|           rexnet_100            | 128 | 0.9982 |  0.9988   |  1.395   |         1.3692         |
|           fbnetc_100            | 128 | 0.9981 |  0.9994   |  1.3883  |         1.3755         |
|          inception_v3           | 128 | 0.9991 |  0.9995   |  1.3859  |         1.3712         |
|       gluon_inception_v3        | 128 | 0.9987 |  0.9996   |  1.3848  |         1.3713         |
|        adv_inception_v3         | 128 | 0.999  |  0.9996   |  1.3844  |         1.3724         |
|      mobilenetv3_large_100      | 128 | 0.9973 |  0.9982   |  1.372   |         1.3547         |
|  swin_base_patch4_window7_224   | 64  | 0.9998 |  1.0001   |  1.3473  |         1.3291         |
|           tf_mixnet_l           | 128 | 0.9994 |  0.9995   |  1.3462  |         1.3219         |
|          pnasnet5large          | 16  | 1.0003 |  1.0016   |  1.3322  |         1.3096         |
|          jx_nest_base           | 32  | 0.9999 |  1.0048   |  1.3313  |         1.3082         |
|            repvgg_a2            | 128 | 0.9997 |  1.0006   |  1.3294  |         1.3199         |
|           selecsls42b           | 128 | 0.9982 |  0.9989   |  1.3226  |         1.3092         |
|           mobilevit_s           | 64  | 0.9994 |  1.0005   |  1.3142  |         1.2952         |
|            hrnet_w18            | 128 | 1.0009 |  1.0011   |  1.3124  |         1.2677         |
|            mixnet_l             | 128 | 0.9988 |  0.9996   |  1.2962  |         1.2714         |
|        gluon_xception65         | 32  | 0.9989 |  0.9994   |  1.2945  |         1.2779         |
|            lcnet_050            | 128 | 0.9927 |  0.9945   |  1.281   |         1.2667         |
|        twins_pcpvt_base         | 64  | 1.0054 |  1.0057   |  1.2503  |         1.2232         |
|         crossvit_9_240          | 128 | 1.0016 |  1.0018   |  1.2388  |         1.2059         |
|        convmixer_768_32         | 32  | 0.9996 |  0.9999   |  1.1892  |         1.1879         |
|          mixer_b16_224          | 128 | 0.9999 |  1.0055   |  1.1817  |         1.1776         |
|            pit_b_224            | 64  | 0.9997 |  0.9999   |  1.1274  |         1.1198         |
| deit_base_distilled_patch16_224 | 64  | 0.9997 |  1.0001   |  1.1241  |         1.1143         |
|         visformer_small         | 128 | 0.999  |  0.9992   |  1.1077  |         1.0905         |
|      beit_base_patch16_224      | 64  | 0.9995 |  0.9999   |  1.097   |         1.1039         |
|      vit_base_patch16_224       | 64  | 0.9998 |  0.9999   |  1.0813  |         1.0747         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
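A suite-level figure such as the geometric mean speedup can be derived from the per-model ratios in the table above. The following is a minimal sketch, assuming the table is available as plain text. `parse_rows` and `geomean` are hypothetical helpers written for this illustration, not part of the benchmark scripts, and only three rows are copied in for brevity.

```python
import math

# Three rows copied from the timm_models float16 speedup table above.
TABLE = """
+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          resmlp_12_224          | 128 | 0.9997 |  1.0108   |  1.9766  |         1.969          |
|        tnt_s_patch16_224        | 128 | 0.9999 |    1.0    |   1.89   |         1.8676         |
|      vit_base_patch16_224       | 64  | 0.9998 |  0.9999   |  1.0813  |         1.0747         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
"""

COLUMNS = ["name", "bs", "eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]


def parse_rows(text):
    """Return one dict per data row, skipping borders and the header row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with "+", blank lines are empty
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != len(COLUMNS) or cells[0] == "name":
            continue  # malformed line or header row
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def geomean(values):
    """Geometric mean, computed in log space for numerical stability."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


rows = parse_rows(TABLE)
inductor_geomean = geomean(float(r["inductor"]) for r in rows)
print(f"inductor geomean over {len(rows)} rows: {inductor_geomean:.4f}")
```

Pasting the full table into `TABLE` gives the geometric mean over every listed model. Whether that matches the dashboard summary depends on which models the summary excludes, which this sketch does not attempt to reproduce.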

Accuracy

+---------------------------------+----+-------+-----------+---------------+------------------------+
|              name               | bs | eager | aot_eager |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+-----------+---------------+------------------------+
|        adv_inception_v3         | 8  | pass  |   pass    |     pass      |          pass          |
|      beit_base_patch16_224      | 8  | pass  |   pass    |     pass      |          pass          |
|            nfnet_l0             | 8  | pass  |   pass    |     pass      |          pass          |
|            pit_b_224            | 8  | pass  |   pass    |     pass      |          pass          |
|          pnasnet5large          | 8  | pass  |   pass    |     pass      |          pass          |
|         poolformer_m36          | 8  | pass  |   pass    |     pass      |          pass          |
|           regnety_002           | 8  | pass  |   pass    |     pass      |          pass          |
|            repvgg_a2            | 8  | pass  |   pass    |     pass      |          pass          |
|        res2net101_26w_4s        | 8  | pass  |   pass    |     pass      |          pass          |
|        res2net50_14w_8s         | 8  | pass  |   pass    |     pass      |          pass          |
|           res2next50            | 8  | pass  |   pass    |     pass      |          pass          |
|          resmlp_12_224          | 8  | pass  |   pass    |     pass      |          pass          |
|           resnest101e           | 8  | pass  |   pass    |     pass      |          pass          |
|           rexnet_100            | 8  | pass  |   pass    |     pass      |          pass          |
|        sebotnet33ts_256         | 8  | pass  |   pass    |     pass      |          pass          |
|           selecsls42b           | 8  | pass  |   pass    |     pass      |          pass          |
|          spnasnet_100           | 8  | pass  |   pass    |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |   pass    |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |   pass    |     pass      |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |   pass    |     pass      |          pass          |
|           tf_mixnet_l           | 8  | pass  |   pass    |     pass      |          pass          |
|            tinynet_a            | 8  | pass  |   pass    |     pass      |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |   pass    |     pass      |          pass          |
|        twins_pcpvt_base         | 8  | pass  |   pass    |     pass      |          pass          |
|         visformer_small         | 8  | pass  |   pass    |     pass      |          pass          |
|      vit_base_patch16_224       | 8  | pass  |   pass    |     pass      |          pass          |
|           volo_d1_224           | 8  | pass  |   pass    |     pass      |          pass          |
|           mobilevit_s           | 8  | pass  |   pass    |     pass      |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |   pass    |     pass      |          pass          |
|         mobilenetv2_100         | 8  | pass  |   pass    |     pass      |          pass          |
|           fbnetc_100            | 8  | pass  |   pass    |     pass      |          pass          |
|          botnet26t_256          | 8  | pass  |   pass    |     pass      |          pass          |
|         coat_lite_mini          | 8  | pass  |   pass    |     pass      |          pass          |
|        convmixer_768_32         | 8  | pass  |   pass    |     pass      |          pass          |
|         crossvit_9_240          | 8  | pass  |   pass    |     pass      |          pass          |
|          cspdarknet53           | 8  | pass  |   pass    |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |   pass    |     pass      |          pass          |
|             dla102              | 8  | pass  |   pass    |     pass      |          pass          |
|           dm_nfnet_f0           | 8  | pass  |   pass    |     pass      |          pass          |
|             dpn107              | 8  | pass  |   pass    |     pass      |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |   pass    |     pass      |          pass          |
|        eca_halonext26ts         | 8  | pass  |   pass    |     pass      |          pass          |
|           mnasnet_100           | 8  | pass  |   pass    |     pass      |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |   pass    |     pass      |          pass          |
|            fbnetv3_b            | 8  | pass  |   pass    |     pass      |          pass          |
|            gernet_l             | 8  | pass  |   pass    |     pass      |          pass          |
|       gluon_inception_v3        | 8  | pass  |   pass    |     pass      |          pass          |
|        gluon_xception65         | 8  | pass  |   pass    |     pass      |          pass          |
|          gmixer_24_224          | 8  | pass  |   pass    |     pass      |          pass          |
|          gmlp_s16_224           | 8  | pass  |   pass    |     pass      |          pass          |
|            hrnet_w18            | 8  | pass  |   pass    |     pass      |          pass          |
|          inception_v3           | 8  | pass  |   pass    |     pass      |          pass          |
|          jx_nest_base           | 8  | pass  |   pass    |     pass      |          pass          |
|            lcnet_050            | 8  | pass  |   pass    |     pass      |          pass          |
|          mixer_b16_224          | 8  | pass  |   pass    |     pass      |          pass          |
|            mixnet_l             | 8  | pass  |   pass    |     pass      |          pass          |
|          cait_m36_384           | 4  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
|          convnext_base          | 8  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
|          ghostnet_100           | 8  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+-----------+---------------+------------------------+
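A pass rate per compiler, in the same "percent, passed/total" form as the summary at the top of the dashboard, can be tallied from an accuracy table like the one above. This is a minimal sketch under the assumption that any status other than `pass` counts as a failure. `pass_counts` is a hypothetical helper, and only four rows are copied in, so the printed numbers describe this excerpt rather than the whole suite.

```python
# Four rows copied from the timm_models float16 accuracy table above.
ROWS = """
|            mixnet_l             | 8  | pass  |   pass    |     pass      |          pass          |
|          cait_m36_384           | 4  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
|          convnext_base          | 8  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
|          ghostnet_100           | 8  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
"""

COMPILERS = ["eager", "aot_eager", "inductor", "inductor_no_cudagraphs"]


def pass_counts(text):
    """Return {compiler: (passed, total)} for the data rows in `text`."""
    counts = {c: [0, 0] for c in COMPILERS}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 2 + len(COMPILERS) or cells[0] == "name":
            continue  # malformed line or header row
        # cells[0] is the model name, cells[1] the batch size.
        for compiler, status in zip(COMPILERS, cells[2:]):
            counts[compiler][1] += 1
            if status == "pass":
                counts[compiler][0] += 1
    return {c: tuple(v) for c, v in counts.items()}


counts = pass_counts(ROWS)
for compiler, (passed, total) in counts.items():
    print(f"{compiler}: {100 * passed / total:.0f}%, {passed}/{total}")
```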

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|           mobilevit_s           | 64  | 1.6799 |  3.6168   | 52.8307  |        52.3107         |
|        twins_pcpvt_base         | 64  | 2.9139 |  6.7336   | 44.1319  |        44.2555         |
|         coat_lite_mini          | 128 | 1.1726 |  2.5216   | 32.2498  |        32.6697         |
|  swin_base_patch4_window7_224   | 64  | 2.8827 |  6.2327   | 30.9471  |        30.7917         |
|            hrnet_w18            | 128 | 5.814  |  14.3401  | 28.3875  |        27.8485         |
|          pnasnet5large          | 16  | 4.662  |  10.6008  | 24.9895  |        24.2476         |
|           resnest101e           | 64  | 3.3394 |  7.7367   | 24.6733  |         22.766         |
|          cait_m36_384           |  4  | 3.6466 |  8.9265   | 23.2248  |        22.8576         |
|          convnext_base          | 64  | 1.4013 |  2.7002   | 22.6749  |         22.414         |
|          jx_nest_base           | 32  | 1.7164 |  3.7374   | 21.3238  |        21.4028         |
|        eca_halonext26ts         | 128 | 1.2546 |  2.4489   | 17.6639  |        17.3687         |
|         poolformer_m36          | 64  | 1.6736 |  3.0995   | 16.9444  |        16.4802         |
|        res2net101_26w_4s        | 64  | 3.0594 |  8.0003   | 15.9682  |         15.837         |
|        tnt_s_patch16_224        | 128 | 2.0956 |  5.1344   | 15.1374  |        15.1584         |
|           volo_d1_224           | 64  | 1.4551 |  3.5476   | 14.7528  |        14.5393         |
|        res2net50_14w_8s         | 128 | 2.7393 |  7.2575   |  14.63   |        14.4442         |
|          botnet26t_256          | 128 | 1.1127 |   2.148   | 14.5352  |        14.2127         |
|        sebotnet33ts_256         | 64  | 1.5066 |  2.9793   | 14.2524  |        14.0722         |
|             dpn107              | 32  | 3.2041 |  6.4577   | 13.1818  |        13.1523         |
|          gmlp_s16_224           | 128 | 1.3661 |  3.2155   | 12.6288  |        12.6542         |
|         crossvit_9_240          | 128 | 1.947  |  4.3391   | 12.6238  |        12.6771         |
|          gmixer_24_224          | 128 | 1.5998 |  3.7675   |  12.068  |        12.2187         |
|            fbnetv3_b            | 128 | 2.6724 |  5.7195   |  11.727  |        11.5713         |
|       eca_botnext26ts_256       | 128 | 1.1818 |  2.3511   | 11.2148  |         10.821         |
|           tf_mixnet_l           | 128 | 2.9022 |  5.3671   | 11.0378  |        10.8641         |
|        gluon_xception65         | 32  | 2.0014 |  5.3019   | 10.3698  |        10.0787         |
|            mixnet_l             | 128 | 2.6798 |  4.9943   | 10.3078  |        10.0614         |
|           dm_nfnet_f0           | 128 | 2.1918 |  4.0296   |  9.5938  |         9.4696         |
|      beit_base_patch16_224      | 64  | 1.2945 |   2.791   |  9.4545  |         9.1465         |
|             dla102              | 128 | 1.7585 |  4.5966   |  9.4477  |         9.3291         |
|          inception_v3           | 128 | 1.5588 |  4.0012   |  9.2181  |         8.9852         |
|     swsl_resnext101_32x16d      | 32  | 1.7086 |  4.4762   |   9.15   |         8.9181         |
|       gluon_inception_v3        | 128 | 1.5651 |  3.9212   |  9.0423  |         8.9607         |
|        adv_inception_v3         | 128 | 1.5442 |  4.0018   |  9.0358  |         9.0158         |
|           res2next50            | 128 | 1.5438 |  3.9815   |  8.9507  |         8.6775         |
|            nfnet_l0             | 128 | 1.9949 |  3.8685   |  8.823   |         8.6087         |
|          ghostnet_100           | 128 | 1.5237 |  3.8603   |  8.5616  |         8.4536         |
|          resmlp_12_224          | 128 | 0.6902 |  1.3944   |  8.3372  |         8.1746         |
|           rexnet_100            | 128 | 1.6772 |  3.4909   |  8.2909  |         7.9989         |
|            tinynet_a            | 128 | 1.8515 |  3.6163   |  8.1476  |         7.8964         |
|            pit_b_224            | 64  | 1.1889 |   2.574   |  8.043   |         7.8682         |
|          cspdarknet53           | 64  | 1.8665 |  3.6881   |  8.0148  |         7.8448         |
|          mixer_b16_224          | 128 | 0.7728 |  1.6757   |  7.8537  |         7.581          |
|       tf_efficientnet_b0        | 128 | 1.579  |  3.1523   |  7.4242  |         7.2379         |
|      vit_base_patch16_224       | 64  | 1.0025 |  2.3278   |  7.1656  |         6.9113         |
| deit_base_distilled_patch16_224 | 64  | 1.0353 |  2.2952   |  7.1409  |         6.9869         |
|          spnasnet_100           | 128 | 1.6291 |  3.1654   |  6.4189  |         6.2008         |
|           fbnetc_100            | 128 | 1.6485 |  3.2314   |  6.4166  |         6.2539         |
|      mobilenetv3_large_100      | 128 | 1.4201 |   2.777   |  6.3721  |         6.1577         |
|         mobilenetv2_100         | 128 | 1.3782 |  2.7123   |  6.0471  |         5.8028         |
|            repvgg_a2            | 128 | 1.6064 |  3.0998   |  6.0361  |         5.769          |
|            gernet_l             | 128 | 1.5886 |  3.0408   |  5.829   |         5.7272         |
|        convmixer_768_32         | 32  | 1.1418 |  2.8932   |  5.6791  |         5.5341         |
|           regnety_002           | 128 | 1.3504 |  2.6303   |  5.6094  |         5.3225         |
|           mnasnet_100           | 128 | 1.3254 |  2.6231   |  5.4702  |         5.2641         |
|         visformer_small         | 128 | 0.9127 |  2.0465   |  5.3921  |         5.1751         |
|           selecsls42b           | 128 | 0.7206 |  1.7567   |  4.3714  |         4.2355         |
|            lcnet_050            | 128 | 0.8226 |  1.7444   |  4.3004  |         4.0886         |
|        ese_vovnet19b_dw         | 128 | 0.7879 |  1.4679   |  4.1267  |         4.0908         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|         mobilenetv2_100         | 128 | 1.2161 |  1.2161   |  1.6527  |         1.7992         |
|           mnasnet_100           | 128 | 1.3757 |  1.3757   |  1.6163  |         1.8416         |
|          spnasnet_100           | 128 | 1.3755 |  1.3755   |  1.6159  |         1.8411         |
|            tinynet_a            | 128 | 1.2089 |  1.2089   |  1.6151  |         1.7608         |
|           fbnetc_100            | 128 | 1.1454 |  1.1454   |  1.5494  |         1.6844         |
|      mobilenetv3_large_100      | 128 | 1.2038 |  1.2038   |  1.5323  |         1.7135         |
|            fbnetv3_b            | 128 | 1.1981 |   1.198   |  1.5288  |         1.7037         |
|           rexnet_100            | 128 | 1.2146 |  1.2146   |  1.5136  |         1.6349         |
|        convmixer_768_32         | 32  | 1.1906 |  1.1889   |  1.3816  |         1.471          |
|           selecsls42b           | 128 | 1.5939 |  1.5938   |  1.3475  |         1.4479         |
|          pnasnet5large          | 16  | 1.4292 |  0.6091   |  1.3105  |         1.3352         |
|        sebotnet33ts_256         | 64  | 1.1862 |  1.1862   |  1.2693  |         1.2914         |
|            lcnet_050            | 128 | 1.5357 |  1.5357   |  1.2635  |         1.5436         |
|        gluon_xception65         | 32  | 1.2784 |  1.2784   |  1.2515  |         1.2784         |
|       tf_efficientnet_b0        | 128 | 1.3183 |  1.3183   |  1.2451  |         1.3183         |
|          cspdarknet53           | 64  | 1.4123 |  1.4124   |  1.2097  |         1.2417         |
|           dm_nfnet_f0           | 128 | 1.1526 |   1.345   |  1.2035  |         1.2411         |
|        ese_vovnet19b_dw         | 128 | 1.2901 |  1.2901   |  1.1991  |         1.2388         |
|            hrnet_w18            | 128 | 1.0653 |  1.0652   |  1.1499  |         1.2233         |
|           res2next50            | 128 | 1.2855 |  1.0464   |  1.1203  |         1.1687         |
|            mixnet_l             | 128 | 1.1529 |  1.1529   |  1.1174  |         1.1529         |
|           tf_mixnet_l           | 128 | 1.1529 |  1.1529   |  1.1174  |         1.1529         |
|        res2net50_14w_8s         | 128 | 1.181  |  0.9718   |  1.1167  |         1.166          |
|         coat_lite_mini          | 128 | 1.1047 |  1.0864   |  1.0739  |         1.1263         |
|        res2net101_26w_4s        | 64  | 1.178  |  1.0025   |  1.0688  |         1.1088         |
|            nfnet_l0             | 128 | 1.3463 |  1.3463   |  1.0537  |         1.0933         |
|           regnety_002           | 128 |  1.0   |    1.0    |  1.0448  |         1.1967         |
|          ghostnet_100           | 128 | 1.1127 |  1.1127   |  1.0191  |         1.1127         |
|         poolformer_m36          | 64  | 1.164  |  1.1639   |  1.0187  |         1.0493         |
|        eca_halonext26ts         | 128 | 1.0616 |  0.8517   |  1.011   |         1.0616         |
|       eca_botnext26ts_256       | 128 | 1.0617 |  0.8515   |  1.011   |         1.0617         |
|          botnet26t_256          | 128 | 1.0611 |  0.8526   |  1.0109  |         1.0611         |
|     swsl_resnext101_32x16d      | 32  |  1.0   |  0.9642   |  0.9895  |         0.9994         |
|            repvgg_a2            | 128 | 1.0306 |  1.0306   |  0.9803  |         1.0301         |
|           resnest101e           | 64  |  1.0   |  0.9998   |  0.973   |          1.0           |
|             dla102              | 128 |  1.0   |  0.8898   |  0.9642  |          1.0           |
|        adv_inception_v3         | 128 | 1.0003 |  0.9994   |  0.9469  |         1.0003         |
|       gluon_inception_v3        | 128 | 1.0003 |  0.9994   |  0.9469  |         1.0003         |
|          inception_v3           | 128 | 1.0003 |  0.9994   |  0.9469  |         1.0003         |
|            gernet_l             | 128 |  1.0   |    1.0    |  0.943   |          1.0           |
|          convnext_base          | 64  | 1.1206 |  1.1123   |  0.9288  |         0.9503         |
|             dpn107              | 32  | 1.1468 |  0.9941   |  0.8963  |         0.9046         |
|          jx_nest_base           | 32  | 1.1072 |  1.1034   |  0.8724  |         0.8863         |
|           mobilevit_s           | 64  | 1.1638 |  1.1638   |  0.8648  |         0.8966         |
|            pit_b_224            | 64  | 1.0666 |  1.0648   |  0.8611  |         0.8727         |
|         visformer_small         | 128 | 1.1189 |  1.1188   |  0.8583  |         0.904          |
|           volo_d1_224           | 64  |  1.0   |    1.0    |  0.8566  |         0.8885         |
|          cait_m36_384           |  4  | 1.0083 |  1.0072   |  0.8561  |         0.8605         |
|        twins_pcpvt_base         | 64  | 1.0783 |   1.06    |  0.8096  |         0.832          |
| deit_base_distilled_patch16_224 | 64  | 1.064  |  1.0616   |  0.801   |         0.8265         |
|      vit_base_patch16_224       | 64  | 1.0644 |  1.0603   |  0.7999  |         0.8244         |
|          mixer_b16_224          | 128 | 1.1719 |  1.1635   |  0.7545  |         0.7866         |
|  swin_base_patch4_window7_224   | 64  | 1.3608 |  1.3483   |  0.7307  |         0.741          |
|      beit_base_patch16_224      | 64  | 1.0635 |  1.0595   |  0.7001  |         0.7177         |
|         crossvit_9_240          | 128 | 1.0505 |  1.0392   |  0.6791  |         0.7225         |
|          resmlp_12_224          | 128 | 1.1803 |  1.1803   |  0.6126  |         0.6569         |
|          gmixer_24_224          | 128 | 1.1623 |  1.1317   |  0.553   |         0.5864         |
|          gmlp_s16_224           | 128 | 1.0786 |  1.0409   |  0.5313  |         0.5679         |
|        tnt_s_patch16_224        | 128 | 1.2112 |  0.9202   |  0.5077  |         0.5345         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 88.3165 |  88.3376  | 74.2573  |        74.3042         |
|            hrnet_w18            | 128 | 65.9985 |  66.0677  | 50.4852  |        52.1408         |
|        tnt_s_patch16_224        | 128 | 85.455  |  85.4381  | 45.2385  |        45.7853         |
|            pit_b_224            | 64  | 49.4353 |  49.4178  | 43.8321  |        44.1227         |
|          pnasnet5large          | 16  | 56.6991 |  56.6864  | 42.5542  |        43.2171         |
|           dm_nfnet_f0           | 128 | 73.9658 |  74.0721  | 40.1696  |        41.4165         |
|            nfnet_l0             | 128 | 60.9418 |  60.9299  | 35.0593  |        35.3891         |
|      beit_base_patch16_224      | 64  | 38.2315 |  38.1946  |  34.816  |        34.6042         |
|          cait_m36_384           |  4  | 54.365  |  53.8392  | 34.0968  |        35.0604         |
|           res2next50            | 128 | 50.2074 |  50.2362  | 33.8482  |        36.0347         |
|            mixnet_l             | 128 | 43.3417 |  43.3693  | 33.4691  |        34.1204         |
|           tf_mixnet_l           | 128 | 44.7468 |  44.7004  | 33.2361  |         33.778         |
|      vit_base_patch16_224       | 64  | 35.8868 |  35.8997  | 33.1811  |        33.3936         |
|             dla102              | 128 | 46.9305 |  47.0127  | 32.3875  |        32.6512         |
|          mixer_b16_224          | 128 | 38.2411 |  38.0482  | 32.3461  |         32.48          |
|           resnest101e           | 64  | 50.8634 |  50.8448  | 31.9849  |         33.846         |
|         poolformer_m36          | 64  | 48.5117 |  48.4287  | 30.7446  |        31.4101         |
|        adv_inception_v3         | 128 | 40.5404 |  40.5552  | 29.2389  |        29.5114         |
|          inception_v3           | 128 | 40.5563 |  40.5018  | 29.2258  |        29.5587         |
|       gluon_inception_v3        | 128 | 40.5303 |  40.5701  | 29.2199  |        29.5303         |
|  swin_base_patch4_window7_224   | 64  | 38.633  |  38.6937  |  28.69   |        29.0696         |
|     swsl_resnext101_32x16d      | 32  | 39.7776 |  39.4724  | 27.8144  |        30.0158         |
|           volo_d1_224           | 64  | 43.8514 |  43.7892  | 27.4528  |         27.786         |
|          jx_nest_base           | 32  | 36.4801 |  36.3044  | 27.4244  |        27.8742         |
|        res2net50_14w_8s         | 128 | 40.4617 |  40.4363  |  26.787  |        27.4147         |
|         visformer_small         | 128 | 29.5133 |  29.5069  | 26.6377  |        27.0361         |
|          gmlp_s16_224           | 128 | 39.1046 |  38.8193  | 26.5034  |        26.6144         |
| deit_base_distilled_patch16_224 | 64  | 29.2799 |  29.2929  | 26.0479  |        26.2872         |
|         crossvit_9_240          | 128 | 30.0006 |  29.9943  |  24.265  |        24.9135         |
|          gmixer_24_224          | 128 | 35.1701 |  34.9826  |  23.703  |        23.8389         |
|             dpn107              | 32  | 33.7306 |  33.7207  | 23.3551  |        23.8635         |
|        gluon_xception65         | 32  | 28.3922 |  28.4941  | 21.9773  |        22.2356         |
|        eca_halonext26ts         | 128 | 31.4992 |  31.5405  | 21.9412  |          22.3          |
|        res2net101_26w_4s        | 64  | 32.3105 |  32.2097  | 21.7444  |        22.4656         |
|        twins_pcpvt_base         | 64  | 26.6893 |  26.7326  | 21.4259  |        21.9231         |
|       eca_botnext26ts_256       | 128 | 30.4846 |  30.4586  | 21.0217  |        21.3381         |
|          convnext_base          | 64  | 33.3193 |  32.9082  | 20.5694  |        20.9001         |
|          botnet26t_256          | 128 | 28.8232 |  28.8032  | 19.3533  |        19.4744         |
|            repvgg_a2            | 128 | 22.7659 |  22.7685  | 17.1192  |         17.224         |
|            gernet_l             | 128 | 24.2677 |  24.2874  | 16.5126  |        16.6866         |
|         coat_lite_mini          | 128 | 28.2054 |  28.2207  | 16.1199  |         16.428         |
|            fbnetv3_b            | 128 | 22.5293 |  22.4983  | 15.9502  |        16.0742         |
|          cspdarknet53           | 64  | 21.8632 |  21.8857  | 15.2871  |        15.5543         |
|           mobilevit_s           | 64  | 19.7694 |  19.8003  | 15.0637  |        15.2624         |
|        sebotnet33ts_256         | 64  | 23.0972 |  23.1062  | 14.3191  |        14.5299         |
|           rexnet_100            | 128 | 18.9534 |  18.9047  | 13.5426  |        13.8161         |
|       tf_efficientnet_b0        | 128 | 18.2809 |  18.3635  | 12.3823  |        12.4806         |
|           selecsls42b           | 128 | 14.8226 |  14.8405  |  11.205  |        11.3003         |
|        ese_vovnet19b_dw         | 128 | 16.2803 |  16.2785  |  11.194  |        11.2553         |
|           fbnetc_100            | 128 | 15.1928 |  15.2091  | 10.9585  |        11.0298         |
|            tinynet_a            | 128 | 15.8431 |  15.8779  | 10.6169  |        10.7588         |
|          resmlp_12_224          | 128 | 19.8277 |  19.6207  | 10.0354  |        10.0698         |
|          spnasnet_100           | 128 |  13.37  |  13.3906  |  9.4487  |         9.5333         |
|          ghostnet_100           | 128 | 12.8305 |  12.8864  |  8.9337  |         9.0027         |
|           mnasnet_100           | 128 | 12.3796 |  12.3788  |   8.76   |         8.8137         |
|         mobilenetv2_100         | 128 | 12.224  |  12.2117  |  8.3321  |         8.4003         |
|      mobilenetv3_large_100      | 128 | 11.1546 |  11.1764  |  8.1579  |         8.2212         |
|           regnety_002           | 128 | 8.2647  |  8.2023   |  5.4309  |         6.0336         |
|            lcnet_050            | 128 | 3.3165  |  4.0985   |  3.174   |         2.9703         |
+---------------------------------+-----+---------+-----------+----------+------------------------+
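The executive summary states that speedup is measured by normalizing against native PyTorch. A minimal sketch of that calculation follows, assuming speedup means eager latency divided by backend latency (the dashboard does not spell out the formula). The three rows are copied from the table above; the names `LATENCY_MS`, `speedup`, and `geomean` are illustrative and not part of the benchmark harness.

```python
import math

# Rows copied from the absolute latency table above (milliseconds).
LATENCY_MS = {
    "convmixer_768_32": {"eager": 88.3165, "inductor": 74.2573},
    "hrnet_w18": {"eager": 65.9985, "inductor": 50.4852},
    "tnt_s_patch16_224": {"eager": 85.455, "inductor": 45.2385},
}


def speedup(row, backend="inductor"):
    """Assumed definition: eager latency divided by backend latency."""
    return row["eager"] / row[backend]


def geomean(values):
    """Geometric mean, computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


speedups = {name: speedup(row) for name, row in LATENCY_MS.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
print(f"geomean over these rows: {geomean(speedups.values()):.2f}x")
```

The geometric mean is used here because the dashboard reports a "Geometric mean speedup" per suite; a value computed over three rows is only an illustration and will not match the suite-level figure.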

Build Summary


Run name

day_355_21_12_22_performance_float16_459

Commit hashes

pytorch commit: 88c581be87ac59ea1251f35a57b610ae81b9362d
pytorch commit date: 2022-12-21 04:51:51+00:00
functorch: Absent
torchbench commit: 43ca0857e9c7b9d90f647d1befbaee1dfe446d7e
torchbench commit date: 2022-12-16 10:47:24-08:00

TorchDynamo config flags

torch._dynamo.config.DO_NOT_USE_legacy_non_fake_example_inputs = False
torch._dynamo.config.HAS_REFS_PRIMS = True
torch._dynamo.config.capture_scalar_outputs = False
torch._dynamo.config.dead_code_elimination = True
torch._dynamo.config.disable = False
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.enforce_cond_guards_match = True
torch._dynamo.config.error_on_nested_fx_trace = True
torch._dynamo.config.guard_nn_modules = False
torch._dynamo.config.normalize_ir = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.output_code = False
torch._dynamo.config.output_graph_code = False
torch._dynamo.config.print_graph_breaks = False
torch._dynamo.config.raise_on_ctx_manager_usage = True
torch._dynamo.config.raise_on_unsafe_aot_autograd = False
torch._dynamo.config.replay_record_enabled = False
torch._dynamo.config.rewrite_assert_with_torch_assert = True
torch._dynamo.config.specialize_int_float = True
torch._dynamo.config.suppress_errors = False
torch._dynamo.config.verbose = False
torch._dynamo.config.verify_correctness = False
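The flags above are plain `name = value` lines with Python literal values, so a dump like this can be loaded into a dictionary and compared between runs. The sketch below uses only the standard library and does not import torch; `parse_flags` and `diff_flags` are hypothetical helper names, and the second run with `verbose` flipped is invented purely for illustration.

```python
import ast

# Three lines copied from the flag dump above.
DUMP = """\
torch._dynamo.config.dynamic_shapes = False
torch._dynamo.config.optimize_ddp = True
torch._dynamo.config.verbose = False
"""


def parse_flags(text):
    """Map 'module.flag = literal' lines to {flag_path: value}."""
    flags = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(" = ")
        if not sep:
            continue  # skip blank or malformed lines
        flags[name] = ast.literal_eval(value)
    return flags


def diff_flags(old, new):
    """Return {flag: (old_value, new_value)} for flags that differ."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


baseline = parse_flags(DUMP)
# Hypothetical second run with one flag flipped, for illustration only.
other = dict(baseline, **{"torch._dynamo.config.verbose": True})
print(diff_flags(baseline, other))
```

Comparing parsed dumps this way can help rule out configuration drift when numbers change between two dashboard runs.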

Torch version

torch: 2.0.0a0+git88c581b

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8302
Number CUDA Devices: 8
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.314694656
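The reported device memory is larger than the 40GB in the device name. One plausible reading, stated here as an assumption, is that the dashboard divides the device's byte count by 1e9 (decimal gigabytes). The arithmetic below converts the same figure to binary gibibytes for comparison.

```python
# Value copied from the "Device Memory [GB]" line above.
reported_gb = 42.314694656

# Assumption: the dashboard divides a byte count by 1e9 (decimal GB).
total_bytes = round(reported_gb * 1e9)

# The same byte count expressed in GiB (2**30 bytes).
gib = total_bytes / 2**30

print(f"{total_bytes} bytes = {gib:.2f} GiB")
```

Under that assumption the figure works out to roughly 39.4 GiB, which is closer to the 40GB in the device name.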
