one-off runs #3

Open
anijain2305 opened this issue Mar 22, 2023 · 12 comments

anijain2305 commented Mar 22, 2023

(The next two comments are for the max-autotune, warm-start run.)

AMP RUN

Pass rate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 90%, 53/59 | 100%, 45/45 | 68%, 41/60  |
|       aot_eager        | 88%, 52/59 | 100%, 45/45 | 92%, 55/60  |
|        inductor        | 78%, 46/59 | 84%, 38/45  | 93%, 56/60  |
| inductor_no_cudagraphs | 78%, 46/59 | 84%, 38/45  | 95%, 57/60  |
+------------------------+------------+-------------+-------------+
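Each cell gives a percentage followed by passed/total models. Below is a minimal sketch of that formatting. It assumes the percentage is simply passed/total rounded to a whole percent; `format_passrate` is a hypothetical helper, not part of the benchmark scripts.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass-rate cell such as '90%, 53/59' (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    pct = 100.0 * passed / total
    # Round to a whole percent, matching the style of the cells above.
    return f"{pct:.0f}%, {passed}/{total}"


print(format_passrate(53, 59))  # 90%, 53/59
```

Under this assumption, every cell in the table reproduces exactly. For example, 41/60 rounds to 68% and 46/59 rounds to 78%.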

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.58x    |    1.66x    |    1.38x    |
| inductor_no_cudagraphs |   1.57x    |    1.65x    |    1.39x    |
+------------------------+------------+-------------+-------------+
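A geometric mean suits ratios like speedups because a 2x gain and a 0.5x loss cancel out to 1.00x. The sketch below computes it as exp(mean(log x)). It assumes failed runs, which are reported as 0.0 in the per-model table, are left out of the aggregate. The actual dashboard may treat failures differently, and `geomean_speedup` is a hypothetical name.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups, skipping failed runs (0.0 entries)."""
    valid = [s for s in speedups if s > 0]  # assumption: 0.0 marks a failed run
    if not valid:
        return math.nan
    # exp(mean(log x)) avoids overflow/underflow from multiplying many ratios.
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


print(round(geomean_speedup([2.0, 8.0, 0.0]), 6))  # 4.0
```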

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.75    |    7.37     |    5.95     |
|       aot_eager        |    9.38    |    16.06    |    12.68    |
|        inductor        |   228.90   |   199.68    |   334.49    |
| inductor_no_cudagraphs |   30.31    |    50.32    |    43.99    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    1.00x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.89x    |
|        inductor        |   0.75x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.88x    |    0.91x    |    0.92x    |
+------------------------+------------+-------------+-------------+
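Because higher is better, the ratio is presumably eager peak memory divided by compiled peak memory. Under that reading, 0.75x means the compiled model needs about a third more peak memory than eager. The following is a minimal sketch under that assumed definition. The peak byte counts could come from something like `torch.cuda.max_memory_allocated()`, and `compression_ratio` is a hypothetical helper.

```python
import math


def compression_ratio(eager_peak_bytes: int, compiled_peak_bytes: int) -> float:
    """Eager peak memory / compiled peak memory (assumed definition; higher is better)."""
    if compiled_peak_bytes <= 0:
        return math.nan  # no measurement, e.g. the compiled run failed
    return eager_peak_bytes / compiled_peak_bytes


print(compression_ratio(3_000_000_000, 4_000_000_000))  # 0.75
```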
@anijain2305

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9745 |   0.925   |  3.6666  |         3.6217         |
|           BERT_pytorch            |  16  | 0.9975 |  0.7999   |  3.1791  |         3.248          |
|            densenet121            |  4   | 0.9888 |  0.6947   |  2.7868  |         2.7862         |
|            hf_T5_large            |  2   | 0.9806 |   0.806   |  2.3425  |         2.262          |
|             hf_Albert             |  8   | 0.9963 |  0.9603   |  2.3376  |         2.3399         |
|              hf_Bart              |  4   | 0.9801 |  0.7934   |  2.1449  |         2.4193         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9748 |  0.8967   |  2.0857  |         1.8595         |
|         phlippe_densenet          | 128  | 0.9853 |  0.7714   |  2.0062  |         2.0183         |
|           squeezenet1_1           |  32  | 0.9843 |  0.9261   |  2.0043  |         1.8585         |
|        mobilenet_v3_large         |  32  | 0.9958 |  0.7796   |  1.997   |         2.0592         |
|              hf_GPT2              |  4   | 0.9953 |  0.9565   |  1.9265  |         1.9259         |
|               hf_T5               |  8   | 0.9868 |  0.8503   |  1.9215  |         1.9342         |
|              hf_Bert              |  4   | 0.9975 |  0.8397   |  1.8429  |         1.8416         |
|           hf_Longformer           |  2   | 0.9252 |  0.5851   |  1.8011  |         1.8042         |
|          phlippe_resnet           | 128  | 0.9781 |  0.7561   |  1.8006  |         1.8106         |
|          pytorch_struct           | 200  | 0.9518 |  0.7782   |  1.7996  |         1.7666         |
|        speech_transformer         |  32  | 0.9826 |  0.7931   |  1.7197  |         1.7331         |
|          resnext50_32x4d          |  8   | 0.9882 |  0.7072   |  1.7009  |         1.6915         |
|      timm_vision_transformer      |  32  | 0.9836 |  0.8443   |  1.7005  |         1.9707         |
|            mnasnet1_0             |  32  | 0.9905 |  0.7353   |  1.6732  |         1.6549         |
| attention_is_all_you_need_pytorch | 256  | 0.9887 |  0.8359   |  1.6465  |         1.6313         |
|           fastNLP_Bert            |  6   | 0.9847 |  0.8539   |  1.635   |         1.6491         |
|           hf_Bert_large           |  4   | 1.0021 |  0.8623   |  1.6239  |         1.6323         |
|             resnet18              |  16  | 0.9895 |  0.7542   |  1.5738  |         1.5531         |
|        shufflenet_v2_x1_0         | 128  | 0.9938 |  0.7535   |  1.5633  |         1.5206         |
|               dcgan               |  32  | 0.8862 |  0.7092   |  1.4916  |         1.5106         |
|           mobilenet_v2            |  96  | 0.997  |  0.7779   |  1.4766  |         1.4746         |
|           hf_DistilBert           |  8   | 0.9836 |  0.9375   |  1.4722  |         1.4475         |
|            timm_nfnet             | 128  | 0.9864 |  0.9842   |  1.4585  |         1.4648         |
|           timm_resnest            |  32  | 0.9928 |  0.8523   |  1.4553  |         1.4551         |
|                drq                |  1   | 0.9672 |  0.7538   |  1.4447  |         1.4735         |
|           lennard_jones           | 1000 | 0.8676 |  0.7663   |  1.4389  |         1.4672         |
|         timm_efficientnet         |  32  | 0.9317 |  0.6227   |  1.3717  |         1.3928         |
|          LearningToPaint          |  96  | 0.9873 |  0.7763   |  1.2759  |         1.2733         |
|               vgg16               |  64  | 0.9994 |   0.998   |  1.2434  |         1.2439         |
|          pytorch_stargan          |  16  | 0.9948 |  0.8039   |  1.2292  |         1.2232         |
|            Super_SloMo            |  6   | 0.9977 |  0.1781   |  1.2182  |         1.2192         |
|         soft_actor_critic         | 256  | 0.7797 |  0.6707   |  1.2117  |         1.0491         |
|           pytorch_unet            |  1   | 0.9969 |  0.2047   |  1.1718  |         1.1721         |
|        Background_Matting         |  4   | 0.9985 |  0.1371   |  1.1714  |         1.1721         |
|             resnet152             |  32  | 0.9948 |  0.7447   |  1.1616  |         1.2227         |
|             resnet50              |  32  | 0.9955 |  0.7607   |  1.139   |         1.1372         |
|              yolov3               |  16  | 0.9967 |  0.8074   |  1.1153  |         1.1159         |
|              demucs               |  4   | 0.9995 |  1.0006   |  1.0262  |         1.0292         |
|            tts_angular            |  64  | 0.9531 |  0.9167   |  0.9745  |         0.9877         |
|            timm_regnet            |  32  | 0.9145 |  0.7756   |  0.9357  |         0.9334         |
|      nvidia_deeprecommender       | 256  | 0.9991 |  0.9984   |  0.9351  |         0.9353         |
|            timm_vovnet            |  32  |  0.86  |  0.7083   |  0.9249  |         0.9187         |
|   timm_vision_transformer_large   |  32  | 0.9982 |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|       doctr_reco_predictor        |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|        doctr_det_predictor        |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.9764 |    0.0    |   0.0    |          0.0           |
|           hf_GPT2_large           |  4   | 0.9843 |  0.9721   |   0.0    |         1.7378         |
|            hf_BigBird             |  2   | 0.9753 |  0.7838   |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9529 |  0.8453   |   0.0    |          0.0           |
|            hf_Reformer            |  4   | 0.9928 |  0.9501   |   0.0    |          0.0           |
|              alexnet              | 128  | 0.9991 |  0.9974   |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
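Each entry is a speedup over the eager baseline, so values above 1.0 mean the backend ran faster. A 0.0 appears to mark runs that produced no timing; compare moco and dlrm, which show fail_to_run in the accuracy results. The sketch below derives such a ratio from repeated timings. It assumes median latencies are compared, and `speedup` is a hypothetical helper rather than the harness's own function.

```python
import statistics


def speedup(eager_times_ms, compiled_times_ms):
    """Median eager latency / median compiled latency; 0.0 if either run is missing."""
    if not eager_times_ms or not compiled_times_ms:
        return 0.0  # mirrors the 0.0 placeholder used for failed runs
    return statistics.median(eager_times_ms) / statistics.median(compiled_times_ms)


print(speedup([10.0, 12.0, 11.0], [5.0, 5.5, 6.0]))  # 2.0
```

Medians are less sensitive than means to occasional slow iterations, such as a stray recompilation or a GC pause.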

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |     fail_accuracy      |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|            hf_BigBird             |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               dlrm                |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|          phlippe_resnet           |  4  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|        doctr_det_predictor        |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|       doctr_reco_predictor        |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
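Each backend column holds one status string per model. The sketch below turns a column into passed/total counts. It assumes that both `pass` and `pass_due_to_skip` count as passing and that every other status counts as a failure. That grouping is a guess, not the harness's documented rule, and `tally` is a hypothetical helper.

```python
from collections import Counter

# Assumption: statuses treated as passing when computing a pass rate.
PASSING = {"pass", "pass_due_to_skip"}


def tally(statuses):
    """Return (passed, total, per-status Counter) for one backend column."""
    counts = Counter(statuses)
    passed = sum(counts[s] for s in PASSING)
    return passed, len(statuses), counts


column = ["pass", "fail_accuracy", "pass_due_to_skip", "fail_to_run"]
print(tally(column)[:2])  # (2, 4)
```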

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|        speech_transformer         |  32  | 5.9379  |  13.5706  | 804.1871 |        42.8845         |
| attention_is_all_you_need_pytorch | 256  |  4.324  |  10.7867  | 689.7678 |        38.4134         |
|            hf_T5_large            |  2   | 26.2442 |  54.7406  | 506.3967 |        148.4014        |
|      timm_vision_transformer      |  32  | 3.3655  |  7.1652   | 436.4946 |        26.1915         |
|             hf_Albert             |  8   | 2.4513  |  8.5711   | 422.7417 |         28.404         |
|         phlippe_densenet          | 128  | 3.2458  |  6.9257   | 416.1867 |        25.5266         |
|           fastNLP_Bert            |  6   | 4.9636  |  11.1295  | 398.4251 |        34.5511         |
|          pytorch_struct           | 200  | 0.7813  |  1.3378   | 354.4235 |         6.9333         |
|           BERT_pytorch            |  16  | 4.7902  |  11.449   | 350.6787 |        34.5791         |
|           mobilenet_v2            |  96  |  3.094  |  6.9056   | 321.413  |        24.9613         |
|           hf_Bert_large           |  4   | 10.1418 |  20.7939  | 314.3734 |        60.6028         |
|            mnasnet1_0             |  32  | 3.0959  |  6.6976   | 314.3265 |        23.5121         |
|            densenet121            |  4   | 7.4234  |  17.9994  | 309.2972 |        61.1015         |
|               hf_T5               |  8   | 5.6153  |  13.4608  | 291.5986 |        39.5658         |
|        mobilenet_v3_large         |  32  | 3.3994  |  7.5884   | 256.7481 |        27.1516         |
|                drq                |  1   | 0.6686  |  1.0099   | 253.8761 |         6.3746         |
|      nvidia_deeprecommender       | 256  | 0.4823  |   0.766   | 249.458  |         5.9622         |
|           hf_Longformer           |  2   | 11.2554 |  31.1595  | 243.8476 |        123.5476        |
|              yolov3               |  16  |  4.812  |  10.4255  | 227.5451 |        36.3119         |
|              hf_GPT2              |  4   | 4.6244  |  9.6041   | 218.6052 |        29.5181         |
|        shufflenet_v2_x1_0         | 128  | 3.4342  |  7.6127   | 215.3652 |        26.8468         |
|         timm_efficientnet         |  32  | 4.9429  |  10.0025  | 207.2363 |        30.7292         |
|            timm_nfnet             | 128  | 5.7487  |  10.9879  | 206.3631 |         32.057         |
|            timm_vovnet            |  32  | 3.5961  |  6.2972   | 199.301  |        22.1326         |
|         soft_actor_critic         | 256  | 0.4404  |  0.6177   | 179.8936 |         5.4369         |
|              hf_Bart              |  4   | 10.8484 |  18.0336  | 179.5238 |        49.6952         |
|            timm_regnet            |  32  | 6.6018  |  12.1995  | 179.196  |        33.0846         |
|          LearningToPaint          |  96  | 1.4753  |  2.8955   | 167.1134 |         12.25          |
|             resnet152             |  32  | 8.8693  |  20.1297  | 163.4345 |        58.4397         |
|               vgg16               |  64  | 0.6332  |  1.1205   | 160.4233 |         7.4845         |
|          resnext50_32x4d          |  8   | 3.1743  |  7.4339   | 158.7375 |        22.7869         |
|           lennard_jones           | 1000 | 0.3987  |  0.6209   | 143.0381 |         4.5367         |
|        Background_Matting         |  4   | 3.2032  |  11.4127  | 131.4817 |        26.8162         |
|             resnet18              |  16  |  1.338  |  2.7724   | 128.2701 |         12.183         |
|           pytorch_unet            |  1   | 1.5283  |  4.4352   | 121.5264 |        13.9278         |
|       functorch_dp_cifar10        |  64  | 1.1992  |  2.5475   | 117.7511 |        12.8811         |
|          phlippe_resnet           | 128  |  1.349  |  2.7318   | 113.7247 |        10.8462         |
|              hf_Bert              |  4   | 4.9301  |  10.3482  | 110.8388 |         32.425         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2026  |  2.8968   | 89.0985  |        12.3657         |
|           timm_resnest            |  32  |  1.822  |  3.8811   | 78.4203  |        16.7527         |
|            Super_SloMo            |  6   | 2.7734  |  9.7645   | 73.1308  |        25.4957         |
|              demucs               |  4   | 1.4955  |  2.2725   | 71.8464  |         9.6114         |
|           hf_DistilBert           |  8   | 2.3655  |  5.6075   |  61.993  |        19.3363         |
|          pytorch_stargan          |  16  | 1.1848  |  3.2111   | 46.9079  |        10.7557         |
|           squeezenet1_1           |  32  | 1.0332  |  1.7378   | 44.4823  |         8.5951         |
|             resnet50              |  32  | 3.1836  |  7.4252   | 23.9834  |        23.0489         |
|               dcgan               |  32  | 0.4331  |  0.7077   | 16.4177  |         5.1875         |
|            tts_angular            |  64  | 0.4423  |  0.5108   |  4.7125  |         3.838          |
|           hf_GPT2_large           |  4   | 14.8619 |  29.6938  |   nan    |        84.9245         |
|            hf_BigBird             |  2   | 12.8484 |  39.0664  |   nan    |          nan           |
|            hf_Reformer            |  4   | 4.1752  |  6.3515   |   nan    |          nan           |
|               dlrm                | 1024 |  0.374  |  0.7853   |   nan    |          nan           |
|              alexnet              | 128  | 0.5032  |  0.7703   |   nan    |          nan           |
|               moco                |  32  | 27.3074 |    nan    |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 9.3266  |    nan    |   nan    |          nan           |
|        doctr_det_predictor        |  0   |   nan   |    nan    |   nan    |          nan           |
|       doctr_reco_predictor        |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+
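In this table, nan marks models with no compilation measurement. A per-suite average like the mean compilation time in the first comment therefore has to skip those entries. The following is a minimal sketch, assuming a plain arithmetic mean over the non-nan values. The real summary may aggregate differently, and `mean_compile_time` is a hypothetical name.

```python
import math


def mean_compile_time(latencies_s):
    """Arithmetic mean of compilation latencies, ignoring nan (failed) entries."""
    valid = [t for t in latencies_s if not math.isnan(t)]
    return sum(valid) / len(valid) if valid else math.nan


print(mean_compile_time([1.0, 3.0, float("nan")]))  # 2.0
```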

Peak memory compression ratio (higher is better)

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.1588  |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0399  |         1.0863         |
|           mobilenet_v2            |  96  | 0.9864 |  0.7651   |  1.0107  |         1.0572         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  0.9988  |         1.0163         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.9953  |         1.052          |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9852         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9693  |         1.0269         |
|            timm_nfnet             | 128  | 0.907  |  0.8752   |  0.9619  |         0.9678         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         0.9428         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9421  |         0.9421         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9317  |         0.9319         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9138  |         0.9401         |
|         timm_efficientnet         |  32  | 0.9865 |   0.819   |  0.874   |         1.072          |
|              yolov3               |  16  | 0.9923 |  0.8257   |  0.8711  |         0.8705         |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8395   |  0.8621  |         0.8979         |
|        speech_transformer         |  32  | 0.9915 |    0.9    |  0.8583  |         1.0773         |
|            timm_regnet            |  32  | 0.995  |  0.8499   |  0.8501  |         0.8484         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8456  |         0.8517         |
|             resnet50              |  32  | 0.9922 |  0.8613   |  0.8365  |         0.8344         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8357  |         0.9369         |
|        Background_Matting         |  4   | 1.0125 |  0.6487   |  0.834   |         0.8484         |
|             resnet152             |  32  | 0.9959 |  0.8916   |  0.8319  |         0.8684         |
|           timm_resnest            |  32  | 0.9888 |  0.8973   |  0.8297  |         0.9564         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         0.8201         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.7988  |         1.0061         |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.7734  |         0.8554         |
|           squeezenet1_1           |  32  | 0.9674 |  0.9309   |  0.773   |         1.0247         |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.9248         |
|              demucs               |  4   | 0.9663 |  0.9659   |  0.7661  |         0.7734         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7545  |         0.7546         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7428  |         0.8185         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7338  |         0.9955         |
|               vgg16               |  64  | 0.9922 |  0.7246   |  0.723   |         0.7231         |
|            mnasnet1_0             |  32  | 0.9819 |  0.8641   |  0.7201  |         0.8596         |
|            densenet121            |  4   | 0.9956 |  0.9802   |  0.7085  |         0.9766         |
|        mobilenet_v3_large         |  32  | 0.9801 |  0.8396   |  0.6992  |         0.9037         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.6585         |
|          resnext50_32x4d          |  8   | 0.9947 |  0.8438   |  0.6561  |         0.7855         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.597   |         0.7089         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.5458  |         0.8393         |
|             resnet18              |  16  | 0.983  |  0.8055   |  0.5409  |         0.7792         |
|           hf_Longformer           |  2   | 0.8565 |  0.8296   |  0.4206  |         0.4205         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.7086         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3272  |         0.8517         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.1818  |         0.6379         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.1811  |         0.7821         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9255   |  0.1109  |         0.6066         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.0648  |         0.7073         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |   nan    |         0.8905         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |          nan           |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |   nan    |          nan           |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |   nan    |          nan           |
|              alexnet              | 128  | 0.9452 |  0.7935   |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |          nan           |
|               moco                |  32  | 0.9958 |    nan    |   nan    |          nan           |
|        doctr_det_predictor        |  0   |  nan   |    nan    |   nan    |          nan           |
|       doctr_reco_predictor        |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|        Background_Matting         |  4   | 126.0957 | 918.6078  | 107.5371 |        107.5328        |
|            hf_T5_large            |  2   | 269.1613 | 273.5731  | 98.1109  |        97.8314         |
|               hf_T5               |  8   | 183.1776 | 210.9367  | 93.3605  |         93.504         |
|            timm_nfnet             | 128  | 119.6264 | 120.2675  | 80.7577  |        80.7667         |
|            Super_SloMo            |  6   | 79.8167  | 446.6856  | 65.4352  |        65.1566         |
|           hf_Longformer           |  2   |  122.75  | 193.0691  | 62.2797  |        62.0164         |
|              yolov3               |  16  | 68.7648  |  84.7996  | 61.4977  |         61.413         |
|            timm_regnet            |  32  | 61.4621  |  71.7881  | 59.6524  |        60.1548         |
|             resnet152             |  32  | 63.4651  |  87.8524  | 54.3377  |        55.1513         |
|               vgg16               |  64  | 66.2892  |  66.3835  | 53.3402  |        53.3047         |
|              demucs               |  4   | 53.5993  |  53.4955  | 52.2167  |        52.3699         |
|           hf_Bert_large           |  4   | 83.6988  |  94.6253  | 50.9415  |        50.8037         |
|           pytorch_unet            |  1   | 39.9741  | 194.5111  | 34.0153  |        33.9792         |
|        speech_transformer         |  32  | 59.5544  |  84.1112  | 33.4183  |        33.0468         |
|           fastNLP_Bert            |  6   | 57.0606  |  60.7527  |  33.011  |        31.1786         |
| attention_is_all_you_need_pytorch | 256  | 58.3394  |  68.5091  | 32.8929  |        32.8822         |
|              hf_Bart              |  4   |  71.723  |  86.4912  | 32.6812  |        33.0355         |
|           mobilenet_v2            |  96  | 47.1521  |  60.4154  | 31.7936  |        31.8657         |
|             hf_Albert             |  8   |  68.645  |  72.3887  | 29.6789  |        29.6467         |
|            timm_vovnet            |  32  | 28.8224  |  35.1916  | 26.8202  |        26.7869         |
|              hf_GPT2              |  4   | 49.3612  |  50.6168  | 25.3235  |        25.2676         |
|         timm_efficientnet         |  32  |  34.564  |  51.7456  | 23.5489  |        23.2529         |
|             resnet50              |  32  | 26.2812  |  37.0821  | 22.8879  |        22.8241         |
|              hf_Bert              |  4   | 40.7494  |  48.2982  | 22.4111  |        22.4317         |
|           hf_DistilBert           |  8   | 32.1005  |  35.7249  | 22.0729  |        22.0321         |
|            densenet121            |  4   | 60.8842  |  86.2346  | 20.9685  |        18.8171         |
|        shufflenet_v2_x1_0         | 128  | 32.1105  |  40.0698  | 19.7102  |        19.7218         |
|           BERT_pytorch            |  16  | 53.4104  |  66.8912  | 17.0935  |        17.0782         |
|      timm_vision_transformer      |  32  |  33.391  |  33.5448  | 16.7177  |        16.6983         |
|           timm_resnest            |  32  | 24.2609  |  28.3734  | 16.6174  |        16.6065         |
|        mobilenet_v3_large         |  32  | 28.9221  |  36.5796  | 13.3713  |        13.8347         |
|            mnasnet1_0             |  32  | 23.6481  |  31.9652  | 13.2494  |        13.1868         |
|          pytorch_stargan          |  16  | 14.7275  |  18.1919  | 11.9033  |        11.8483         |
|         phlippe_densenet          | 128  | 23.9166  |  30.3144  | 11.9014  |         11.541         |
|          resnext50_32x4d          |  8   | 22.3265  |  30.6785  | 11.8135  |        11.6231         |
|      nvidia_deeprecommender       | 256  | 10.2265  |  10.2372  | 10.9236  |        10.9303         |
|          LearningToPaint          |  96  | 12.0771  |  14.3152  |  8.7481  |         8.7039         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 13.8376  |  15.1046  |  7.2182  |         7.1309         |
|            tts_angular            |  64  |  6.5685  |  6.8807   |  6.5165  |         6.3773         |
|             resnet18              |  16  |  9.8001  |   12.82   |  5.7904  |         5.7424         |
|           squeezenet1_1           |  32  | 10.3157  |  11.8552  |  5.4993  |         5.4342         |
|          phlippe_resnet           | 128  |  9.2808  |  12.0101  |  5.0985  |         5.0227         |
|       functorch_dp_cifar10        |  64  | 10.5392  |  12.2784  |  2.8779  |         2.8591         |
|                drq                |  1   |  3.3757  |   4.345   |  2.8418  |         3.0024         |
|          pytorch_struct           | 200  |  5.596   |  6.1243   |  2.7861  |         2.688          |
|               dcgan               |  32  |  2.3663  |  3.0443   |  1.4545  |         1.4307         |
|         soft_actor_critic         | 256  |  2.6689  |   3.419   |  1.2898  |         1.8432         |
|           lennard_jones           | 1000 |  1.7473  |  2.3537   |  1.0843  |         1.0323         |
|           hf_GPT2_large           |  4   | 213.7892 | 214.8782  |   nan    |        120.237         |
|            hf_BigBird             |  2   | 197.2802 | 275.9741  |   nan    |          nan           |
|            hf_Reformer            |  4   | 81.6483  |  86.0763  |   nan    |          nan           |
|              alexnet              | 128  |  9.8389  |  9.8565   |   nan    |          nan           |
|               dlrm                | 1024 |  4.942   |  5.0729   |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 465.2338 |    nan    |   nan    |          nan           |
|               moco                |  32  | 50.3693  |    nan    |   nan    |          nan           |
|        doctr_det_predictor        |  0   |   nan    |    nan    |   nan    |          nan           |
|       doctr_reco_predictor        |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.993  |  0.9322   |  2.4897  |         2.4996         |
|      GPT2ForSequenceClassification      |  4  | 0.9814 |  0.9568   |  2.2984  |         2.2976         |
|             XGLMForCausalLM             |  8  | 0.9698 |  0.7496   |  2.2303  |         2.2367         |
|       ElectraForQuestionAnswering       | 64  | 0.988  |  0.9772   |  2.1243  |         2.1242         |
|       MT5ForConditionalGeneration       | 16  | 0.9921 |  0.8398   |  2.1178  |         2.134          |
|          MobileBertForMaskedLM          | 64  | 0.9448 |  0.8025   |  1.9947  |         1.7704         |
|               DistillGPT2               | 16  | 0.9903 |  0.9579   |  1.8904  |         1.8944         |
|            PLBartForCausalLM            |  8  | 0.9948 |  0.9622   |  1.8711  |         1.887          |
|            XLNetLMHeadModel             |  8  | 0.9959 |  0.9685   |  1.8479  |         1.8511         |
|           ElectraForCausalLM            | 32  | 0.9826 |  0.9361   |  1.7959  |         1.7954         |
|        BertForQuestionAnswering         | 16  | 0.9856 |  0.9702   |  1.772   |         1.7714         |
|       RobertaForQuestionAnswering       | 16  | 0.9855 |  0.9707   |  1.7664  |         1.7652         |
|          AllenaiLongformerBase          |  4  | 0.9464 |  0.6571   |  1.7653  |         1.7588         |
|     PLBartForConditionalGeneration      |  4  | 0.9923 |   0.933   |  1.7257  |         1.7043         |
|           RobertaForCausalLM            | 16  | 0.988  |  0.9628   |  1.6759  |         1.6768         |
|      MBartForConditionalGeneration      |  2  | 0.9965 |  0.9602   |  1.666   |         1.5361         |
|                 T5Small                 |  4  | 0.9822 |  0.8534   |  1.665   |         1.6628         |
|       T5ForConditionalGeneration        |  4  | 0.9834 |  0.8546   |  1.6609  |         1.6574         |
|            MBartForCausalLM             |  4  | 0.9931 |  0.9688   |  1.6459  |         1.6427         |
|             BartForCausalLM             |  4  | 0.9924 |  0.9684   |  1.6393  |         1.638          |
|    MegatronBertForQuestionAnswering     |  8  | 0.9811 |  0.9616   |  1.6253  |         1.6249         |
|       AlbertForQuestionAnswering        |  4  | 0.9998 |  0.8856   |  1.6244  |         1.6236         |
|                CamemBert                | 16  | 0.988  |  0.9641   |  1.6202  |         1.6186         |
|            YituTechConvBert             | 16  | 0.9863 |  0.9562   |  1.6156  |         1.6138         |
|            AlbertForMaskedLM            |  4  | 0.9998 |   0.885   |  1.6134  |         1.6143         |
|             BertForMaskedLM             | 16  | 0.9863 |  0.9617   |  1.597   |         1.5965         |
|     M2M100ForConditionalGeneration      | 16  | 1.0331 |   0.819   |  1.5961  |         1.7258         |
|           LayoutLMForMaskedLM           | 16  | 0.9874 |   0.963   |  1.5816  |         1.5888         |
|      BartForConditionalGeneration       |  2  | 0.9946 |  0.9555   |  1.5481  |         1.5446         |
|         MegatronBertForCausalLM         |  4  | 0.9818 |   0.903   |  1.509   |         1.5013         |
|         Speech2Text2ForCausalLM         | 256 | 0.9851 |  0.9354   |  1.4949  |         1.4798         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0006 |  0.8831   |  1.4595  |         1.4665         |
|     DistilBertForQuestionAnswering      | 256 | 0.9944 |  0.9881   |  1.4505  |         1.4518         |
|           PegasusForCausalLM            | 32  | 0.9892 |  0.8861   |  1.3919  |         1.3482         |
|            TrOCRForCausalLM             | 32  | 0.9928 |  0.9635   |  1.3812  |         1.3801         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9778 |  0.8852   |  1.3759  |         1.373          |
|     PegasusForConditionalGeneration     | 32  | 0.9997 |  0.9115   |  1.3254  |         1.2809         |
|          DistilBertForMaskedLM          | 128 | 0.9925 |  0.9507   |  1.2233  |         1.2236         |
|     MobileBertForQuestionAnswering      | 128 | 0.9484 |  0.8065   |  0.7854  |         0.764          |
|    LayoutLMForSequenceClassification    | 16  | 0.9851 |  0.9718   |   0.0    |          0.0           |
|       DebertaForQuestionAnswering       |  8  | 0.9473 |  0.7895   |   0.0    |          0.0           |
|          BlenderbotForCausalLM          |  4  | 0.9715 |  0.7692   |   0.0    |          0.0           |
|           DebertaForMaskedLM            |  4  | 0.8734 |  0.6369   |   0.0    |          0.0           |
|      DebertaV2ForQuestionAnswering      |  2  | 0.8385 |  0.6083   |   0.0    |          0.0           |
|          DebertaV2ForMaskedLM           |  1  | 0.8334 |  0.6075   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
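
One way to collapse a per-model speedup column like the one above into a single number is a geometric mean over the models that actually ran. The sketch below is a minimal illustration, not the benchmark harness's own code. It assumes that rows with a speedup of 0.0 (models that failed under inductor) are simply excluded. The real dashboard may aggregate differently, for example by clipping values, so its summary figures need not match this sketch exactly. The dictionary holds a few rows copied from the table; `geomean_speedup` is a hypothetical helper name.

```python
import math

# A few (model -> inductor speedup) rows copied from the huggingface AMP table.
# A speedup of 0.0 marks a model that failed to run under inductor.
SPEEDUPS = {
    "OPTForCausalLM": 2.4897,
    "GPT2ForSequenceClassification": 2.2984,
    "MobileBertForQuestionAnswering": 0.7854,
    "DebertaForMaskedLM": 0.0,
}


def geomean_speedup(speedups):
    """Geometric mean over models that ran (speedup > 0).

    Assumption: failed runs are excluded rather than clipped or penalized.
    """
    ok = [s for s in speedups if s > 0]
    if not ok:
        raise ValueError("no successful runs to aggregate")
    # exp(mean(log x)) equals the geometric mean and avoids overflow on long lists.
    return math.exp(sum(math.log(s) for s in ok) / len(ok))


if __name__ == "__main__":
    print(f"{geomean_speedup(SPEEDUPS.values()):.2f}x")
```

A geometric mean is the usual choice for ratios such as speedups. A 2x speedup on one model and a 0.5x slowdown on another then average out to 1x, which an arithmetic mean would not give.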

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 15.2018 |  38.6369  | 612.8043 |        113.3956        |
|     MobileBertForQuestionAnswering      | 128 | 15.3019 |  38.2697  | 601.811  |        114.6848        |
|       MT5ForConditionalGeneration       | 16  | 7.8685  |  18.4295  | 579.4308 |        55.6624         |
|           ElectraForCausalLM            | 32  | 7.4534  |  13.9227  | 409.8685 |        37.9467         |
|            AlbertForMaskedLM            |  4  | 2.3588  |  8.1529   | 321.446  |        27.6028         |
|            XLNetLMHeadModel             |  8  | 10.4839 |  27.2821  | 282.4343 |        84.5672         |
|       ElectraForQuestionAnswering       | 64  | 5.1265  |  11.261   | 273.235  |        33.3345         |
|       T5ForConditionalGeneration        |  4  | 5.5981  |  13.199   | 260.3542 |        40.7701         |
|     M2M100ForConditionalGeneration      | 16  | 11.5678 |  25.368   | 259.9593 |        88.0024         |
|          AllenaiLongformerBase          |  4  | 11.3305 |  31.1314  | 251.2462 |        125.0294        |
|      GPT2ForSequenceClassification      |  4  | 4.7907  |  9.7196   | 238.0007 |        30.1957         |
|            YituTechConvBert             | 16  | 10.3942 |  20.1558  | 236.3194 |        54.8597         |
|             XGLMForCausalLM             |  8  | 9.7456  |  20.9409  | 232.0268 |        74.9239         |
|      BartForConditionalGeneration       |  2  | 11.5589 |  25.6066  | 220.9387 |        78.6658         |
|             BertForMaskedLM             | 16  | 5.1121  |  10.6408  | 217.7401 |        33.0531         |
|          DistilBertForMaskedLM          | 128 |  2.499  |  5.7283   | 215.7511 |        18.7842         |
|            TrOCRForCausalLM             | 32  | 6.4664  |  11.8024  | 215.566  |         38.541         |
|     DistilBertForQuestionAnswering      | 256 | 2.4975  |  5.6655   | 198.5407 |        18.6101         |
|             BartForCausalLM             |  4  | 6.2474  |  11.8835  | 178.4346 |        39.8267         |
|       BlenderbotSmallForCausalLM        | 64  | 4.3127  |  8.3279   | 172.8293 |        28.0008         |
|               DistillGPT2               | 16  | 2.5314  |  4.9717   | 163.3151 |        17.6674         |
|         Speech2Text2ForCausalLM         | 256 | 3.2033  |  6.0461   | 161.3093 |        23.5722         |
|    MegatronBertForQuestionAnswering     |  8  | 10.0741 |  20.9543  | 146.4264 |        63.9292         |
|             OPTForCausalLM              |  2  | 5.3838  |  10.8433  | 116.3097 |         36.559         |
|           PegasusForCausalLM            | 32  | 5.8891  |  11.4565  | 106.0252 |         38.085         |
|      MBartForConditionalGeneration      |  2  | 11.4184 |  25.7575  | 104.9186 |         88.017         |
|         MegatronBertForCausalLM         |  4  | 9.9645  |  21.1117  | 100.4353 |         64.697         |
|     PLBartForConditionalGeneration      |  4  | 9.1138  |  17.3234  | 97.3324  |        47.6158         |
|     PegasusForConditionalGeneration     | 32  | 4.9909  |  18.9919  | 93.9846  |        75.2113         |
|            PLBartForCausalLM            |  8  | 3.5866  |  6.7189   | 91.7901  |        21.8006         |
|        BertForQuestionAnswering         | 16  | 5.0775  |  11.2128  | 88.0399  |        33.0342         |
|       AlbertForQuestionAnswering        |  4  |  2.308  |  7.9774   | 86.4534  |        27.2846         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.5515  |  17.9201  | 85.1413  |        53.4714         |
|                CamemBert                | 16  | 5.1905  |  11.2798  | 63.3132  |        32.8948         |
|            MBartForCausalLM             |  4  | 6.2442  |  12.2141  | 41.3957  |        38.1136         |
|                 T5Small                 |  4  | 5.5735  |  12.5455  | 40.3395  |        40.4291         |
|           RobertaForCausalLM            | 16  | 5.1457  |  11.2786  | 40.1117  |        33.7172         |
|           LayoutLMForMaskedLM           | 16  |  5.557  |  11.7614  | 35.2617  |        34.1443         |
|       RobertaForQuestionAnswering       | 16  | 5.1141  |  11.2751  | 33.8184  |        32.8534         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.1057 |  27.645   |   nan    |          nan           |
|          DebertaV2ForMaskedLM           |  1  | 15.1776 |  26.3611  |   nan    |          nan           |
|          BlenderbotForCausalLM          |  4  | 11.5496 |  22.3509  |   nan    |          nan           |
|           DebertaForMaskedLM            |  4  |  7.414  |  13.4548  |   nan    |          nan           |
|       DebertaForQuestionAnswering       |  8  | 7.1924  |  13.3661  |   nan    |          nan           |
|    LayoutLMForSequenceClassification    | 16  | 5.4712  |  11.605   |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1135  |         1.114          |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1114  |         1.1387         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0868  |         1.0868         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.0865         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9246   |  1.0617  |         1.062          |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0541         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.0382         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0356  |         1.0382         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |   1.03   |         1.0539         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.057          |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0277  |         1.0511         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0078         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9793  |         0.9793         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         0.9734         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         0.9682         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         0.9574         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         0.953          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9215  |         0.9575         |
|     PegasusForConditionalGeneration     | 32  | 0.9439 |  0.8957   |  0.8911  |         0.8911         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.8906  |         0.9089         |
|           ElectraForCausalLM            | 32  | 0.9161 |  0.7864   |  0.8896  |         0.8896         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8168   |  0.8748  |         0.8918         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8677  |         0.8849         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         0.8672         |
|            TrOCRForCausalLM             | 32  |  0.92  |  0.8307   |  0.8628  |         0.8628         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8913   |  0.8501  |         0.8501         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         0.8456         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         0.845          |
|             BartForCausalLM             |  4  | 0.951  |  0.8911   |  0.8311  |         0.8311         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.816   |         0.8729         |
|           PegasusForCausalLM            | 32  | 0.9238 |  0.8405   |  0.7966  |         0.7966         |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.787   |         0.808          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.752   |         0.7654         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7573   |  0.7364  |         0.7566         |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.6744         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6505  |         0.6644         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.6058  |         0.6058         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4696  |         0.4697         |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |   nan    |          nan           |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |          nan           |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9764 |  0.9763   |   nan    |          nan           |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |   nan    |          nan           |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |   nan    |          nan           |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
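
The compression-ratio column reads most easily under the following assumption: the ratio is eager peak memory divided by compiled peak memory. This is how the PyTorch dynamo benchmark scripts are understood to report it, but treat the definition as an assumption here. Under that reading, values above 1.0 mean the compiler lowered peak memory, and values below 1.0 mean it raised it. The helpers below are hypothetical and take peak byte counts as plain numbers. In practice these could come from something like `torch.cuda.max_memory_allocated()`, but that part is not shown.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Eager peak / compiled peak (assumed definition).

    > 1.0: the compiled run used less peak memory than eager.
    < 1.0: the compiled run used more.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def compiled_peak_relative_to_eager(ratio):
    """Invert a reported ratio: how many times eager's peak the compiled run used."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


if __name__ == "__main__":
    # Hypothetical numbers: 10 GiB eager peak vs 8 GiB compiled peak.
    gib = 1024 ** 3
    print(compression_ratio(10 * gib, 8 * gib))
    # AllenaiLongformerBase, inductor column above: 0.4696.
    print(f"{compiled_peak_relative_to_eager(0.4696):.2f}x eager peak")
```

Read this way, the AllenaiLongformerBase inductor entry of 0.4696 would mean the compiled run peaked at roughly 2.1 times the eager peak.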

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|     MobileBertForQuestionAnswering      | 128 | 173.0459 |  210.317  | 216.1644 |        216.2452        |
|            AlbertForMaskedLM            |  4  | 266.3771 |  300.754  | 165.0621 |        164.8743        |
|       AlbertForQuestionAnswering        |  4  | 264.3063 | 298.3179  | 162.6193 |        162.649         |
|            XLNetLMHeadModel             |  8  | 281.629  | 288.8707  | 151.3718 |        151.9249        |
|     PegasusForConditionalGeneration     | 32  | 139.0758 | 157.9987  | 107.5884 |        107.4553        |
|          AllenaiLongformerBase          |  4  | 192.6358 | 274.6697  | 103.0385 |        103.0073        |
|            TrOCRForCausalLM             | 32  | 139.2544 |  142.412  | 100.0796 |         99.95          |
|          MobileBertForMaskedLM          | 64  | 175.2932 | 215.3965  |  96.302  |         95.617         |
|      MBartForConditionalGeneration      |  2  | 139.8787 | 147.0231  | 89.2408  |        89.0775         |
|      BartForConditionalGeneration       |  2  | 138.7751 | 145.6953  | 88.8995  |        88.9364         |
|    MegatronBertForQuestionAnswering     |  8  | 144.4802 | 147.2926  | 87.2724  |        87.2661         |
|            YituTechConvBert             | 16  | 127.3377 | 131.3746  | 77.6685  |        77.7384         |
| BlenderbotSmallForConditionalGeneration | 64  | 113.1721 |  141.713  | 75.8347  |        75.7973         |
|                CamemBert                | 16  | 119.9261 | 123.1083  | 73.0439  |        73.1466         |
|     M2M100ForConditionalGeneration      | 16  | 106.1743 | 169.0829  | 71.3574  |         71.253         |
|           LayoutLMForMaskedLM           | 16  | 114.0876 | 116.9931  | 71.1725  |        70.8657         |
|     DistilBertForQuestionAnswering      | 256 | 104.2816 | 104.6358  | 71.1721  |        71.1248         |
|            MBartForCausalLM             |  4  | 114.3137 | 117.1971  | 69.4236  |        69.1229         |
|             BartForCausalLM             |  4  | 114.7808 | 116.9336  | 69.3089  |        69.2921         |
|          DistilBertForMaskedLM          | 128 | 85.2505  |  89.1399  | 69.2377  |        69.1994         |
|             BertForMaskedLM             | 16  | 111.4021 | 114.1966  | 68.8368  |        68.8219         |
|     PLBartForConditionalGeneration      |  4  | 118.6356 | 126.0645  |  68.807  |         68.619         |
|           RobertaForCausalLM            | 16  | 116.4052 | 119.7136  |  68.69   |        68.5675         |
|             OPTForCausalLM              |  2  | 169.854  | 183.0746  | 68.1696  |         68.278         |
|       T5ForConditionalGeneration        |  4  | 106.0459 |  122.561  | 63.0649  |        63.0061         |
|                 T5Small                 |  4  | 106.7222 | 122.3385  | 62.9937  |        62.9785         |
|            PLBartForCausalLM            |  8  | 115.8989 | 117.9432  | 62.1046  |        62.1039         |
|         MegatronBertForCausalLM         |  4  | 88.7223  |  96.5178  | 57.6411  |        57.5781         |
|               DistillGPT2               | 16  | 106.8258 | 110.3284  | 55.8867  |         55.807         |
|       RobertaForQuestionAnswering       | 16  | 96.9853  |  98.7218  | 54.1639  |        54.1492         |
|       ElectraForQuestionAnswering       | 64  | 116.1091 | 117.4931  | 53.9407  |         53.893         |
|        BertForQuestionAnswering         | 16  | 96.7365  |  98.4099  | 53.7452  |        53.7093         |
|           PegasusForCausalLM            | 32  | 69.9836  |  83.7333  | 53.2489  |        53.2104         |
|             XGLMForCausalLM             |  8  |  93.387  | 146.3344  | 52.7061  |        52.8895         |
|           ElectraForCausalLM            | 32  |  89.611  |  94.3427  | 49.0565  |        49.0309         |
|       MT5ForConditionalGeneration       | 16  | 92.3155  | 111.3906  | 43.7342  |        43.7212         |
|       BlenderbotSmallForCausalLM        | 64  | 62.8581  |  69.3089  | 42.0479  |        42.0492         |
|      GPT2ForSequenceClassification      |  4  | 93.2532  |  95.4327  | 39.8223  |        39.7509         |
|         Speech2Text2ForCausalLM         | 256 |  53.442  |  56.6301  | 35.7724  |        35.7478         |
|      DebertaV2ForQuestionAnswering      |  2  | 126.2931 | 192.3707  |   nan    |          nan           |
|          DebertaV2ForMaskedLM           |  1  | 122.3793 | 192.1189  |   nan    |          nan           |
|          BlenderbotForCausalLM          |  4  | 104.363  | 130.0217  |   nan    |          nan           |
|    LayoutLMForSequenceClassification    | 16  | 99.1831  | 100.6837  |   nan    |          nan           |
|           DebertaForMaskedLM            |  4  | 80.4079  |  98.8314  |   nan    |          nan           |
|       DebertaForQuestionAnswering       |  8  | 80.0075  |  95.9283  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9991 |   0.997   |  3.2993  |         3.3012         |
|        twins_pcpvt_base         | 64  | 0.9981 |  0.9037   |  2.1057  |         2.0902         |
|      xcit_large_24_p8_224       |  5  | 0.9936 |  0.8702   |  2.0997  |         2.1006         |
|         coat_lite_mini          | 128 | 0.9973 |  0.9957   |  2.0576  |         2.0582         |
|          gmixer_24_224          | 128 | 0.9949 |  0.8894   |  1.8599  |         1.8621         |
|         crossvit_9_240          | 128 | 0.9903 |  0.7832   |  1.7896  |         1.7833         |
|          ghostnet_100           | 128 | 0.9921 |  0.7612   |  1.7783  |         1.7826         |
|           volo_d1_224           | 64  | 0.9939 |  0.9729   |  1.7272  |         1.7248         |
|          gmlp_s16_224           | 128 | 0.9944 |  1.0822   |  1.7209  |         1.7202         |
|  swin_base_patch4_window7_224   | 64  | 0.991  |  0.9544   |  1.7083  |         1.7071         |
|           convit_base           | 64  | 0.998  |  0.9971   |  1.6228  |         1.6226         |
|            pit_b_224            | 64  | 0.9949 |  0.9924   |  1.602   |         1.6041         |
|            lcnet_050            | 128 | 0.9413 |   0.73    |  1.5877  |         1.5826         |
|          jx_nest_base           | 32  | 0.9867 |  0.9852   |  1.5469  |         1.5457         |
|       gluon_inception_v3        | 128 | 0.9963 |  0.8646   |  1.5182  |         1.518          |
|        adv_inception_v3         | 128 | 0.9959 |  0.8597   |  1.509   |         1.5098         |
|          inception_v3           | 128 | 0.9981 |  0.8632   |  1.5074  |         1.5053         |
|          convnext_base          | 64  | 0.9836 |  0.9846   |  1.4959  |         1.4947         |
|        sebotnet33ts_256         | 64  | 0.9576 |  0.7547   |  1.4718  |         1.472          |
|             dla102              | 128 | 0.9958 |  0.8154   |  1.4684  |         1.4674         |
|           mobilevit_s           | 64  | 0.9618 |  0.7313   |  1.4474  |         1.4471         |
|      beit_base_patch16_224      | 64  | 0.9969 |  0.9588   |  1.4431  |         1.4446         |
|          cait_m36_384           |  4  | 0.995  |  0.9924   |  1.4373  |         1.439          |
|            nfnet_l0             | 128 | 0.9895 |  0.8142   |  1.4362  |         1.4496         |
|           dm_nfnet_f0           | 128 | 0.9866 |  0.9853   |  1.4131  |         1.4138         |
|       eca_botnext26ts_256       | 128 | 0.9734 |  0.7194   |  1.4042  |         1.4048         |
|          resmlp_12_224          | 128 | 0.9927 |  0.8893   |  1.3938  |         1.3925         |
|          botnet26t_256          | 128 | 0.9734 |   0.851   |  1.3868  |         1.3853         |
|           mnasnet_100           | 128 | 0.9488 |  0.7407   |  1.3724  |         1.3718         |
|           resnest101e           | 64  | 0.9945 |   0.868   |  1.3632  |         1.3657         |
|          mixer_b16_224          | 128 | 0.9974 |  1.0178   |  1.3596  |         1.3597         |
|           selecsls42b           | 128 | 0.9984 |  0.8114   |  1.3545  |         1.3524         |
|           regnety_002           | 128 | 0.9545 |  0.7143   |  1.3509  |         1.3572         |
|         mobilenetv2_100         | 128 | 0.9493 |  0.7379   |  1.3464  |         1.3488         |
|      mobilenetv3_large_100      | 128 | 0.9495 |  0.7603   |  1.3463  |         1.347          |
|      vit_base_patch16_224       | 64  | 0.9961 |  0.9935   |  1.3375  |         1.3342         |
|        res2net50_14w_8s         | 128 | 0.999  |  0.7907   |  1.336   |         1.335          |
|            hrnet_w18            | 128 | 0.9921 |  0.6427   |  1.3248  |         1.3228         |
|           res2next50            | 128 | 0.9987 |  0.8247   |  1.3153  |         1.3152         |
| deit_base_distilled_patch16_224 | 64  | 0.9966 |  0.9935   |  1.3144  |         1.3149         |
|          spnasnet_100           | 128 | 0.9412 |  0.7387   |  1.3038  |         1.3053         |
|       tf_efficientnet_b0        | 128 | 0.9609 |  0.6814   |  1.293   |         1.2933         |
|           fbnetc_100            | 128 | 0.9501 |  0.7387   |  1.2924  |         1.3164         |
|         poolformer_m36          | 64  | 0.9859 |  0.9831   |  1.2743  |         1.2761         |
|           rexnet_100            | 128 | 0.952  |  0.7028   |  1.2474  |         1.245          |
|        ese_vovnet19b_dw         | 128 | 0.9583 |  0.8341   |  1.2474  |         1.248          |
|            fbnetv3_b            | 128 | 0.949  |  0.7691   |  1.2178  |         1.2426         |
|         visformer_small         | 128 | 0.9962 |  0.9446   |  1.1907  |         1.1905         |
|            tinynet_a            | 128 | 0.9472 |  0.6783   |  1.1812  |         1.1823         |
|           tf_mixnet_l           | 128 | 0.9766 |   0.827   |  1.1678  |         1.1665         |
|            mixnet_l             | 128 | 0.9757 |  0.8212   |  1.1554  |         1.1558         |
|          cspdarknet53           | 64  | 0.9318 |  0.7859   |  1.1486  |         1.148          |
|        res2net101_26w_4s        | 64  | 1.0004 |  0.7876   |  1.122   |         1.123          |
|             dpn107              | 32  | 0.9319 |  0.8071   |  1.0769  |         1.0769         |
|        gluon_xception65         | 32  | 0.9923 |  0.8426   |  1.0646  |         1.0652         |
|     swsl_resnext101_32x16d      | 32  | 0.9977 |  0.8398   |  1.0434  |         1.0432         |
|            repvgg_a2            | 128 | 0.936  |  0.7563   |  1.0375  |         1.0353         |
|            gernet_l             | 128 | 0.9358 |  0.7928   |  1.0066  |         1.0234         |
|        convmixer_768_32         | 32  | 0.9986 |   0.965   |  0.9959  |         0.9959         |
|          pnasnet5large          | 16  | 0.9857 |   0.91    |  0.9034  |         0.901          |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+---------------+---------------+---------------+------------------------+
|              name               | bs |     eager     |   aot_eager   |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+---------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 8  |     pass      |     pass      |     pass      |          pass          |
|           resnest101e           | 8  |     pass      |     pass      |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 8  |     pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 8  |     pass      |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 8  |     pass      |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 8  |     pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 8  |     pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 8  |     pass      |     pass      |     pass      |          pass          |
|           volo_d1_224           | 8  |     pass      |     pass      |     pass      |          pass          |
|          botnet26t_256          | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|          cspdarknet53           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|             dpn107              | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           fbnetc_100            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|            mixnet_l             | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           mnasnet_100           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           mobilevit_s           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           regnety_002           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|            repvgg_a2            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           rexnet_100            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|          spnasnet_100           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|            tinynet_a            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 8  | fail_accuracy | fail_accuracy |     pass      |          pass          |
|            gernet_l             | 8  | fail_accuracy | fail_accuracy |     pass      |          pass          |
|         mobilenetv2_100         | 8  | fail_accuracy | fail_accuracy |     pass      |          pass          |
|      beit_base_patch16_224      | 8  |     pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 8  |     pass      |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 8  |     pass      |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 8  |     pass      |     pass      |     pass      |          pass          |
|          cait_m36_384           | 4  |     pass      |     pass      |     pass      |          pass          |
|           convit_base           | 8  |     pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 8  |     pass      |     pass      |     pass      |          pass          |
|          convnext_base          | 8  |     pass      |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 8  |     pass      |     pass      |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 8  |     pass      |     pass      |     pass      |          pass          |
|             dla102              | 8  |     pass      |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 8  |     pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 8  |     pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 8  |     pass      |     pass      |     pass      |          pass          |
|        gluon_xception65         | 8  |     pass      |     pass      |     pass      |          pass          |
|           res2next50            | 8  |     pass      |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 8  |     pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 8  |     pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 8  |     pass      |     pass      |     pass      |          pass          |
|          jx_nest_base           | 8  |     pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 8  |     pass      |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 8  |     pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 8  |     pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 8  |     pass      |     pass      |     pass      |          pass          |
|            pit_b_224            | 8  |     pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 8  |     pass      |     pass      |     pass      |          pass          |
|         poolformer_m36          | 8  |     pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 8  |     pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 8  |     pass      |     pass      |     pass      |          pass          |
|        sebotnet33ts_256         | 8  |     pass      |     pass      | fail_accuracy |     fail_accuracy      |
|      xcit_large_24_p8_224       | 8  |     pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
|            fbnetv3_b            | 8  | fail_accuracy | fail_accuracy | fail_accuracy |     fail_accuracy      |
|         coat_lite_mini          | 8  |     pass      |     pass      |    0.0000     |          pass          |
+---------------------------------+----+---------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+-----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+-----------+------------------------+
|        twins_pcpvt_base         | 64  | 10.9375 |  22.9749  | 1493.7265 |        84.4076         |
|           mobilevit_s           | 64  | 5.1798  |  11.1877  | 1421.1621 |        54.0064         |
|         coat_lite_mini          | 128 | 3.2757  |  7.7508   | 1266.3653 |        39.1627         |
|         crossvit_9_240          | 128 | 5.7183  |  13.177   | 1126.9456 |        53.4964         |
|           volo_d1_224           | 64  | 4.9475  |  11.5611  | 960.0025  |        50.4496         |
|      xcit_large_24_p8_224       |  5  | 12.256  |  27.6366  | 952.2158  |        94.8712         |
|  swin_base_patch4_window7_224   | 64  | 8.2057  |  19.0326  | 909.9143  |        76.4838         |
|            pit_b_224            | 64  | 3.3985  |  7.8688   | 907.1938  |        34.0108         |
|          cait_m36_384           |  4  | 13.429  |  30.2438  | 904.5886  |        109.7344        |
|          jx_nest_base           | 32  | 6.4038  |  14.6293  | 898.7522  |        61.1489         |
|        sebotnet33ts_256         | 64  | 4.0935  |  8.6344   | 607.0759  |        36.0296         |
|        tnt_s_patch16_224        | 128 | 6.3572  |  15.7293  | 519.0725  |        61.7616         |
|          botnet26t_256          | 128 | 2.8803  |  6.2326   | 467.1591  |        27.2511         |
|          convnext_base          | 64  |  6.597  |  12.6165  | 450.4418  |        42.0147         |
|          ghostnet_100           | 128 | 7.6607  |  14.4827  | 443.5871  |        45.6448         |
|           rexnet_100            | 128 | 5.4778  |  10.994   | 409.2706  |        38.3455         |
|           convit_base           | 64  | 3.4205  |  9.0273   | 367.8235  |        37.0268         |
|          pnasnet5large          | 16  |  7.612  |  25.3454  | 350.2498  |        95.9074         |
|         visformer_small         | 128 |  2.572  |   5.989   | 348.9025  |        24.2141         |
|        res2net101_26w_4s        | 64  | 10.1806 |  24.2164  | 339.5458  |        74.7437         |
|            hrnet_w18            | 128 | 8.7326  |  34.5125  | 338.9126  |        134.1646        |
|        adv_inception_v3         | 128 | 5.9209  |  13.0317  | 332.1356  |         45.519         |
|          gmixer_24_224          | 128 | 5.6274  |  12.7052  | 299.2786  |        42.3277         |
|            mixnet_l             | 128 | 8.1281  |  15.8957  | 292.1469  |        43.4132         |
|           fbnetc_100            | 128 | 4.9523  |  9.3137   | 289.0544  |        32.0899         |
|        res2net50_14w_8s         | 128 | 8.7271  |  21.7623  | 281.2083  |        69.0181         |
|      beit_base_patch16_224      | 64  | 4.0954  |  9.1919   | 276.1219  |        32.1217         |
|            tinynet_a            | 128 | 5.8447  |  11.9565  | 275.2935  |        36.2317         |
|       eca_botnext26ts_256       | 128 | 3.0388  |  6.7046   |  272.341  |        29.1281         |
| deit_base_distilled_patch16_224 | 64  | 3.2211  |  6.9901   | 256.0044  |        29.7817         |
|             dpn107              | 32  | 9.6262  |  18.9494  | 248.3074  |        53.3347         |
|            fbnetv3_b            | 128 | 8.1861  |  16.7199  | 247.0458  |        51.1052         |
|          mixer_b16_224          | 128 | 2.6709  |  5.8103   | 229.0474  |        22.4949         |
|         poolformer_m36          | 64  | 7.7828  |  13.4975  | 206.1682  |        54.5545         |
|           regnety_002           | 128 | 4.7592  |   8.665   |  191.623  |         26.015         |
|          cspdarknet53           | 64  | 5.6278  |  10.6572  | 185.1414  |         33.367         |
|          gmlp_s16_224           | 128 | 5.5001  |  11.8491  | 180.5791  |        42.9278         |
|           resnest101e           | 64  | 10.7308 |  23.8881  | 180.0071  |        68.2027         |
|          resmlp_12_224          | 128 | 2.7816  |   5.76    | 161.4423  |        20.7392         |
|            nfnet_l0             | 128 | 5.2234  |  10.7758  | 160.1165  |        29.6723         |
|            gernet_l             | 128 | 4.8682  |   8.771   | 156.1277  |        26.2432         |
|             dla102              | 128 | 6.3819  |  13.8443  | 154.9948  |        43.9537         |
|        gluon_xception65         | 32  | 7.5637  |  16.5978  |  153.068  |        49.0476         |
|            repvgg_a2            | 128 | 4.7286  |  8.6174   | 120.9601  |        25.3145         |
|           mnasnet_100           | 128 | 3.9199  |  7.4746   | 113.4024  |        25.4573         |
|       tf_efficientnet_b0        | 128 | 5.0115  |  10.2675  | 111.0374  |        32.2744         |
|           res2next50            | 128 | 4.9405  |  11.8552  | 101.3286  |        39.4143         |
|        ese_vovnet19b_dw         | 128 | 2.6367  |  4.5434   | 100.8023  |         18.15          |
|        convmixer_768_32         | 32  | 1.6536  |  6.7341   |  93.2744  |        25.0943         |
|           selecsls42b           | 128 | 2.4469  |  5.3327   |  89.5487  |        22.5624         |
|           tf_mixnet_l           | 128 | 8.8405  |  16.5804  |  80.9724  |        46.1405         |
|      mobilenetv3_large_100      | 128 | 4.1222  |  8.2565   |  79.517   |        28.7507         |
|         mobilenetv2_100         | 128 | 3.9283  |  7.7782   |  68.9263  |        26.5835         |
|     swsl_resnext101_32x16d      | 32  | 5.8336  |  13.197   |  66.7856  |        40.2228         |
|            lcnet_050            | 128 | 2.4885  |  5.2702   |  60.757   |        19.2951         |
|      vit_base_patch16_224       | 64  | 3.0516  |  6.9148   |  46.9498  |        29.0295         |
|          inception_v3           | 128 | 5.8675  |  12.2238  |  45.4211  |        46.2169         |
|       gluon_inception_v3        | 128 |  5.812  |  12.1337  |  45.3753  |        46.3915         |
|          spnasnet_100           | 128 | 4.8849  |  9.1503   |  37.0112  |        30.0364         |
|           dm_nfnet_f0           | 128 | 5.9129  |  11.2757  |  31.9991  |        32.3646         |
+---------------------------------+-----+---------+-----------+-----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 1.0015 |  0.9787   |  1.1839  |         1.2053         |
|          pnasnet5large          | 16  | 1.0593 |  0.9927   |  1.1539  |         1.1723         |
|          gmixer_24_224          | 128 | 1.0014 |  0.9787   |  1.1127  |         1.1381         |
|           convit_base           | 64  |  1.0   |  0.8505   |  1.0948  |         1.0997         |
|         mobilenetv2_100         | 128 | 0.9996 |  0.7725   |  1.0266  |         1.0431         |
|           dm_nfnet_f0           | 128 | 0.9808 |  0.9006   |  1.0129  |         1.0129         |
|          resmlp_12_224          | 128 | 0.9999 |  0.9667   |  1.0097  |         1.0742         |
|            tinynet_a            | 128 | 0.9998 |  0.7975   |  0.9985  |         1.025          |
|           resnest101e           | 64  | 0.9998 |  1.0033   |  0.9933  |         0.9971         |
|       tf_efficientnet_b0        | 128 | 0.9992 |  0.7813   |  0.9873  |         0.9872         |
|        tnt_s_patch16_224        | 128 |  1.0   |  0.9781   |  0.9834  |         0.9981         |
|           rexnet_100            | 128 |  1.0   |  0.7935   |  0.9746  |         0.9984         |
|        twins_pcpvt_base         | 64  | 1.0001 |  0.9273   |  0.9727  |         1.0054         |
|        convmixer_768_32         | 32  |  1.0   |  0.9812   |  0.9657  |         0.9764         |
|             dla102              | 128 | 0.9709 |  0.9221   |  0.9535  |         0.9535         |
|          mixer_b16_224          | 128 |  1.0   |  0.9644   |  0.9438  |         0.9522         |
|      vit_base_patch16_224       | 64  | 1.0001 |   0.936   |  0.9362  |         0.9362         |
|           tf_mixnet_l           | 128 | 0.9995 |  0.8647   |  0.9345  |         0.9345         |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9344   |  0.9306  |         0.9306         |
|           mobilevit_s           | 64  | 0.9998 |  0.7836   |  0.9262  |         0.9557         |
|         visformer_small         | 128 | 1.0005 |  0.9328   |  0.9245  |         0.9347         |
|            fbnetv3_b            | 128 | 0.9989 |  0.8019   |  0.9167  |         0.9227         |
|            nfnet_l0             | 128 | 1.0005 |  0.8489   |  0.9101  |         0.9214         |
|          cspdarknet53           | 64  | 0.9996 |   0.86    |  0.9098  |         0.9098         |
| deit_base_distilled_patch16_224 | 64  | 0.9995 |  0.9358   |  0.9071  |         0.9352         |
|           volo_d1_224           | 64  | 1.001  |  0.9514   |  0.9067  |         0.9327         |
|        ese_vovnet19b_dw         | 128 | 0.9986 |  0.9082   |  0.8975  |         0.9046         |
|        sebotnet33ts_256         | 64  | 0.9957 |  0.7151   |  0.8908  |         0.9207         |
|        adv_inception_v3         | 128 |  1.0   |  0.8752   |  0.8902  |         0.8902         |
|       gluon_inception_v3        | 128 |  1.0   |  0.8752   |  0.8902  |         0.8902         |
|          inception_v3           | 128 |  1.0   |  0.8752   |  0.8902  |         0.8902         |
|            hrnet_w18            | 128 | 0.9999 |  0.9269   |  0.8872  |         0.8918         |
|        gluon_xception65         | 32  | 0.9998 |  0.8877   |  0.8832  |         0.8832         |
|          spnasnet_100           | 128 | 0.9992 |  0.8982   |  0.8787  |         0.8787         |
|      xcit_large_24_p8_224       |  5  | 0.9989 |  0.8874   |  0.8761  |         0.8964         |
|       eca_botnext26ts_256       | 128 | 0.9995 |  0.7791   |  0.8738  |         0.8738         |
|             dpn107              | 32  | 0.9932 |  0.9066   |  0.8687  |         0.8833         |
|            mixnet_l             | 128 | 0.9997 |  0.8539   |  0.8686  |         0.8686         |
|           mnasnet_100           | 128 | 0.9992 |  0.8897   |  0.8683  |         0.8684         |
|           res2next50            | 128 | 1.0003 |   0.918   |  0.866   |         0.866          |
|      mobilenetv3_large_100      | 128 | 0.9993 |  0.8597   |  0.8649  |         0.8885         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8637  |         0.8637         |
|         poolformer_m36          | 64  | 1.0014 |  0.9514   |  0.8598  |         0.8769         |
|           fbnetc_100            | 128 | 0.9989 |  0.8651   |  0.8596  |         0.8963         |
|            pit_b_224            | 64  | 1.0005 |  0.8033   |  0.8566  |         0.8744         |
|        res2net101_26w_4s        | 64  | 1.0002 |  0.9186   |  0.8505  |         0.8813         |
|        res2net50_14w_8s         | 128 | 1.0002 |  0.9151   |  0.8496  |         0.8712         |
|            gernet_l             | 128 | 0.9989 |  0.8652   |  0.8493  |         0.8499         |
|     swsl_resnext101_32x16d      | 32  | 1.0001 |  0.8706   |  0.8477  |         0.8477         |
|           selecsls42b           | 128 | 1.0006 |  0.8947   |  0.8472  |         0.8784         |
|          ghostnet_100           | 128 | 0.9983 |  0.8894   |  0.8416  |         0.8972         |
|         coat_lite_mini          | 128 | 1.0445 |   0.929   |  0.8401  |         0.8647         |
|          convnext_base          | 64  | 1.0052 |  0.9275   |  0.832   |         0.8504         |
|          botnet26t_256          | 128 | 0.9994 |  0.8791   |  0.824   |         0.8239         |
|            lcnet_050            | 128 | 0.9982 |  0.8057   |  0.8172  |         0.8281         |
|           regnety_002           | 128 | 0.9992 |  0.8629   |  0.7846  |         0.8214         |
|            repvgg_a2            | 128 | 0.9997 |  0.7933   |  0.7738  |         0.7738         |
|         crossvit_9_240          | 128 | 0.999  |  0.8819   |  0.7526  |         0.776          |
|  swin_base_patch4_window7_224   | 64  | 1.001  |  0.9237   |  0.7214  |         0.7384         |
|          jx_nest_base           | 32  | 1.0006 |  0.8943   |  0.6693  |         0.6838         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 301.0442 | 311.3992  | 301.7929 |        301.8158        |
|          pnasnet5large          | 16  | 199.1005 | 214.9961  | 218.5681 |        218.7462        |
|            hrnet_w18            | 128 | 281.2086 | 433.1533  | 211.6484 |        210.718         |
|           tf_mixnet_l           | 128 | 193.8799 | 229.2666  | 162.3698 |        162.474         |
|            mixnet_l             | 128 | 185.5385 | 220.6245  | 156.7373 |        156.7082        |
|           resnest101e           | 64  | 165.1075 | 188.9326  | 120.3588 |        120.599         |
|             dla102              | 128 | 172.5703 | 210.5239  | 117.0292 |        117.0832        |
|          cait_m36_384           |  4  | 167.8886 | 167.9834  | 116.1649 |        116.2469        |
|         poolformer_m36          | 64  | 146.8675 | 147.2348  | 113.6049 |        113.5832        |
|     swsl_resnext101_32x16d      | 32  | 118.7959 | 141.3735  | 113.603  |        113.5603        |
|       gluon_inception_v3        | 128 | 160.4058 | 184.9758  | 105.3727 |        105.4292        |
|          inception_v3           | 128 | 159.2487 | 183.7234  | 105.2961 |        105.3201        |
|        adv_inception_v3         | 128 | 159.8353 | 185.0633  | 105.2829 |        105.3636        |
|        res2net50_14w_8s         | 128 | 140.5677 | 177.7007  | 105.2146 |        105.3745        |
|           convit_base           | 64  | 163.1107 |  163.213  | 100.4048 |        100.3903        |
|             dpn107              | 32  | 114.1529 |  131.61   |  98.606  |        98.6911         |
|        tnt_s_patch16_224        | 128 | 323.7242 | 324.0157  | 97.8962  |        97.8557         |
|           res2next50            | 128 | 125.5394 | 152.0323  | 95.4568  |        95.5517         |
|        gluon_xception65         | 32  | 99.6856  | 117.3351  | 93.1259  |        92.9022         |
|            fbnetv3_b            | 128 | 115.2621 | 142.3501  | 90.0136  |        88.1486         |
|           dm_nfnet_f0           | 128 | 128.3444 | 128.9464  | 89.5194  |        89.5123         |
|        res2net101_26w_4s        | 64  | 98.7865  | 125.6659  | 87.4592  |        87.4773         |
|          mixer_b16_224          | 128 | 116.7944 | 114.4476  | 85.6791  |        85.6619         |
|  swin_base_patch4_window7_224   | 64  | 147.4891 | 153.1044  |  85.486  |         85.793         |
|          convnext_base          | 64  | 124.5485 | 124.1639  |  81.795  |        81.8644         |
|          gmlp_s16_224           | 128 | 137.5815 | 126.5099  | 79.5972  |        79.6404         |
|            nfnet_l0             | 128 | 112.5881 | 136.7591  | 77.4578  |        77.4729         |
|          cspdarknet53           | 64  | 94.9862  | 112.6971  | 77.0842  |        77.1985         |
|         visformer_small         | 128 | 91.3647  |  96.3827  | 76.4604  |        76.4324         |
|       eca_botnext26ts_256       | 128 | 108.873  | 147.1515  | 75.3926  |        75.4105         |
|            pit_b_224            | 64  | 118.9212 | 119.0104  | 73.7484  |        73.6664         |
|            gernet_l             | 128 | 77.8019  |  91.6916  | 72.3449  |        71.1734         |
|          botnet26t_256          | 128 | 101.8712 | 116.5567  | 71.5378  |         71.612         |
|      beit_base_patch16_224      | 64  | 101.5135 |  105.727  | 70.1323  |        70.1878         |
|            repvgg_a2            | 128 |  77.693  |  96.0585  | 70.1066  |        70.1973         |
|           volo_d1_224           | 64  | 121.0063 | 123.5599  | 69.8246  |        69.8118         |
|      vit_base_patch16_224       | 64  | 87.0679  |  87.2499  | 64.9352  |         65.045         |
|          jx_nest_base           | 32  | 101.6843 | 101.5289  | 64.9104  |        64.9633         |
| deit_base_distilled_patch16_224 | 64  | 85.0397  |  85.1759  |  64.47   |        64.4549         |
|          gmixer_24_224          | 128 | 118.2172 | 131.9519  | 63.4326  |        63.2182         |
|       tf_efficientnet_b0        | 128 | 84.6866  | 119.5088  | 63.0207  |        62.9982         |
|           rexnet_100            | 128 |   80.0   | 108.5706  | 61.0416  |        61.2412         |
|           fbnetc_100            | 128 | 82.7582  | 106.4172  | 60.9432  |         59.841         |
|      xcit_large_24_p8_224       |  5  | 124.3672 | 142.2014  | 60.5284  |        60.3945         |
|            tinynet_a            | 128 | 73.6418  | 102.5825  | 58.9293  |        58.9064         |
|           mobilevit_s           | 64  | 84.5795  | 111.3608  | 56.2617  |        56.2496         |
|        twins_pcpvt_base         | 64  | 131.6595 |  140.647  | 55.9878  |        55.9654         |
|         coat_lite_mini          | 128 | 113.1055 | 113.0707  | 54.7766  |        54.8024         |
|        sebotnet33ts_256         | 64  | 80.5015  | 102.3617  | 52.2965  |        52.3101         |
|          spnasnet_100           | 128 | 70.4101  |  89.7348  | 50.8511  |        50.8508         |
|          ghostnet_100           | 128 | 90.6519  | 118.0659  | 50.5423  |        50.3824         |
|        ese_vovnet19b_dw         | 128 | 64.5959  |  74.2357  | 49.6637  |        49.6421         |
|         mobilenetv2_100         | 128 | 65.4573  |  84.3407  | 46.1979  |        46.0899         |
|         crossvit_9_240          | 128 | 82.6129  | 104.4326  | 45.6656  |        45.9106         |
|           mnasnet_100           | 128 | 64.1768  |  82.2424  | 44.3672  |        44.4069         |
|           selecsls42b           | 128 | 60.0486  |  73.9224  | 44.2568  |        44.3691         |
|      mobilenetv3_large_100      | 128 | 61.2516  |  76.5285  | 43.1577  |        43.2367         |
|          resmlp_12_224          | 128 | 53.4711  |  59.8274  | 38.1437  |        38.2947         |
|           regnety_002           | 128 | 41.3628  |  51.3802  | 27.4551  |        27.3733         |
|            lcnet_050            | 128 | 31.6182  |  40.9209  | 18.7843  |        18.8229         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

williamwen42 commented Mar 24, 2023

Performance Dashboard for amp precision (max autotune, with cold start)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and of the forward pass only for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.
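A minimal sketch of the two per-model measurements described above, assuming speedup is the eager latency divided by the backend latency (so values above 1.0 mean faster than native PyTorch) and that the accuracy check is an elementwise tolerance comparison of flattened outputs or gradients. The function names and the `rel_tol`/`abs_tol` defaults are hypothetical and are not the benchmark harness's actual values.

```python
import math
from typing import Sequence


def speedup(eager_latency_s: float, compiled_latency_s: float) -> float:
    """Speedup relative to native (eager) PyTorch; > 1.0 means the backend is faster."""
    if compiled_latency_s <= 0.0:
        # Assumption: a missing or failed measurement is reported as 0.0,
        # matching how the dashboard shows failed performance runs.
        return 0.0
    return eager_latency_s / compiled_latency_s


def outputs_match(reference: Sequence[float], candidate: Sequence[float],
                  rel_tol: float = 1e-3, abs_tol: float = 1e-3) -> bool:
    """Hypothetical accuracy check: every compiled value must be close to the eager reference."""
    if len(reference) != len(candidate):
        return False
    return all(math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
               for r, c in zip(reference, candidate))
```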

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. Experimental setup does not include an optimizer.

To measure performance, compilation latency and memory footprint reduction, we remove the models that fail accuracy checks.
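A minimal sketch of how the per-suite geometric mean speedup could be aggregated after dropping models that fail accuracy. It assumes per-model speedups and accuracy results are available as dictionaries keyed by model name (a hypothetical layout), and it additionally skips 0.0 speedups (failed performance runs), since the logarithm of zero is undefined; whether the dashboard does exactly this is an assumption.

```python
import math
from typing import Mapping


def geomean_speedup(speedups: Mapping[str, float],
                    passed_accuracy: Mapping[str, bool]) -> float:
    """Geometric mean speedup over models that passed the accuracy check."""
    # Keep only accuracy-passing models with a valid (positive) speedup.
    kept = [s for name, s in speedups.items()
            if passed_accuracy.get(name, False) and s > 0.0]
    if not kept:
        return 0.0
    # Geometric mean computed in log space: exp(mean(log(s))).
    return math.exp(sum(math.log(s) for s in kept) / len(kept))
```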

Pass rate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 90%, 53/59 | 100%, 45/45 | 68%, 41/60  |
|       aot_eager        | 88%, 52/59 | 100%, 45/45 | 92%, 55/60  |
|        inductor        | 78%, 46/59 | 84%, 38/45  | 93%, 56/60  |
| inductor_no_cudagraphs | 78%, 46/59 | 84%, 38/45  | 92%, 55/60  |
+------------------------+------------+-------------+-------------+
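The cells in the table above combine a percentage with the raw counts. The figures shown (for example 53/59 as 90% and 55/60 as 92%) are consistent with rounding to the nearest whole percent; the helper below is a hypothetical sketch of that formatting, not the dashboard's own code.

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate in the table's 'NN%, passed/total' style."""
    if total <= 0:
        raise ValueError("total must be positive")
    percent = int(100 * passed / total + 0.5)  # round half up to a whole percent
    return f"{percent}%, {passed}/{total}"
```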

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.59x    |    1.67x    |    1.38x    |
| inductor_no_cudagraphs |   1.57x    |    1.68x    |    1.39x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.73    |    7.46     |    5.96     |
|       aot_eager        |    9.28    |    16.12    |    12.80    |
|        inductor        |   272.07   |   338.74    |   458.29    |
| inductor_no_cudagraphs |   273.46   |   324.87    |   448.96    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    0.97x    |    1.00x    |
|       aot_eager        |   0.86x    |    0.89x    |    0.89x    |
|        inductor        |   0.75x    |    0.90x    |    0.90x    |
| inductor_no_cudagraphs |   0.75x    |    0.90x    |    0.90x    |
+------------------------+------------+-------------+-------------+
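Since the table is labelled "higher is better", a plausible reading is that the compression ratio divides eager peak memory by the backend's peak memory, so values below 1.0 mean the backend used more memory than eager. The sketch below assumes that definition and assumes peak usage is measured in bytes; neither is stated explicitly in the dashboard.

```python
def compression_ratio(eager_peak_bytes: int, compiled_peak_bytes: int) -> float:
    """Assumed definition: eager peak / backend peak; > 1.0 means less peak memory than eager."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes
```

Under this reading, a backend peaking at 10 GiB where eager peaks at 8 GiB would score 0.8x.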

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
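The four criteria above can be read as independent threshold checks applied to each model and backend. A minimal sketch, assuming each result is available as an accuracy status string plus three numbers (a hypothetical record shape); any accuracy status other than "pass" is treated as a failure.

```python
from typing import List

# Thresholds taken from the warning criteria listed above.
SPEEDUP_THRESHOLD = 0.95
COMPILE_LATENCY_THRESHOLD_S = 120.0
COMPRESSION_RATIO_THRESHOLD = 0.9


def warnings_for(accuracy: str, speedup: float, compile_latency_s: float,
                 compression_ratio: float) -> List[str]:
    """Return the warning categories a single model/backend result falls into."""
    flags = []
    if accuracy != "pass":
        flags.append("accuracy")
    if speedup < SPEEDUP_THRESHOLD:
        flags.append("speedup")  # a 0.0 speedup usually means the performance run failed
    if compile_latency_s > COMPILE_LATENCY_THRESHOLD_S:
        flags.append("compilation_latency")
    if compression_ratio < COMPRESSION_RATIO_THRESHOLD:
        flags.append("compression_ratio")
    return flags
```

For example, the inductor numbers reported below for torchbench `timm_regnet` (speedup 0.9445, 207.459 s compile, 0.8512 compression ratio) would be flagged in all three performance categories.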

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |             dlrm              |   fail_to_run   |      fail_to_run       |
| torchbench  |          hf_BigBird           |   fail_to_run   |      fail_to_run       |
| torchbench  |        phlippe_resnet         |  fail_accuracy  |     fail_accuracy      |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |        vision_maskrcnn        | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |      doctr_det_predictor      |     0.0000      |         0.0000         |
| torchbench  |     doctr_reco_predictor      |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |      fail_to_run       |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
| timm_models |       gluon_xception65        |      pass       |     fail_accuracy      |
| timm_models |       sebotnet33ts_256        |      pass       |     fail_accuracy      |
| timm_models |       twins_pcpvt_base        |      pass       |         0.0000         |
| timm_models |            dla102             |  fail_accuracy  |          pass          |
| timm_models |     xcit_large_24_p8_224      |  fail_accuracy  |     fail_accuracy      |
| timm_models |           fbnetv3_b           |  fail_accuracy  |     fail_accuracy      |
| timm_models |        coat_lite_mini         |     0.0000      |          pass          |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-----------------------------------+----------+------------------------+
|    suite    |               name                | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------+----------+------------------------+
| torchbench  |            timm_regnet            |  0.9445  |         0.9507         |
| torchbench  |            timm_vovnet            |  0.9414  |         0.9241         |
| torchbench  |      nvidia_deeprecommender       |  0.9355  |         0.9348         |
| torchbench  |              alexnet              |   0.0    |          0.0           |
| torchbench  |            hf_Reformer            |   0.0    |          0.0           |
| torchbench  |           hf_GPT2_large           |   0.0    |          0.0           |
| torchbench  |               dlrm                |   0.0    |          0.0           |
| torchbench  |            hf_BigBird             |   0.0    |          0.0           |
| torchbench  |   timm_vision_transformer_large   |   0.0    |          0.0           |
| torchbench  |               moco                |   0.0    |          0.0           |
| torchbench  |        doctr_det_predictor        |   0.0    |          0.0           |
| torchbench  |       doctr_reco_predictor        |   0.0    |          0.0           |
| torchbench  |             tacotron2             |   0.0    |          0.0           |
| torchbench  |           torchrec_dlrm           |   0.0    |          0.0           |
| huggingface |  MobileBertForQuestionAnswering   |  0.9272  |         0.9363         |
| huggingface | LayoutLMForSequenceClassification |   0.0    |          0.0           |
| huggingface |    DebertaForQuestionAnswering    |   0.0    |          0.0           |
| huggingface |       BlenderbotForCausalLM       |   0.0    |          0.0           |
| huggingface |        DebertaForMaskedLM         |   0.0    |          0.0           |
| huggingface |   DebertaV2ForQuestionAnswering   |   0.0    |          0.0           |
| huggingface |       DebertaV2ForMaskedLM        |   0.0    |          0.0           |
| timm_models |           pnasnet5large           |  0.9045  |         0.8892         |
+-------------+-----------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+-----------------------------------------+-----------+------------------------+
|    suite    |                  name                   | inductor  | inductor_no_cudagraphs |
+-------------+-----------------------------------------+-----------+------------------------+
| torchbench  |           speech_transformer            | 811.9404  |        801.5224        |
| torchbench  |    attention_is_all_you_need_pytorch    | 671.8298  |        713.6033        |
| torchbench  |               hf_T5_large               | 514.4005  |        510.0798        |
| torchbench  |              hf_Longformer              | 448.7565  |        444.5843        |
| torchbench  |         timm_vision_transformer         | 430.9528  |        432.7792        |
| torchbench  |            phlippe_densenet             | 424.9286  |        426.024         |
| torchbench  |           mobilenet_v3_large            | 420.8786  |        415.3274        |
| torchbench  |              fastNLP_Bert               | 408.0208  |        436.9673        |
| torchbench  |                hf_Albert                | 407.7363  |        401.7922        |
| torchbench  |                 hf_GPT2                 | 377.4055  |        376.3293        |
| torchbench  |            timm_efficientnet            | 370.9045  |        370.5136        |
| torchbench  |              BERT_pytorch               | 348.6068  |        347.8106        |
| torchbench  |              hf_Bert_large              | 344.6791  |        343.3888        |
| torchbench  |                 hf_Bert                 | 343.1352  |        333.2602        |
| torchbench  |             pytorch_struct              | 343.1183  |        342.3984        |
| torchbench  |              mobilenet_v2               | 338.0467  |        338.612         |
| torchbench  |               densenet121               | 328.6991  |        343.919         |
| torchbench  |                 hf_Bart                 | 326.7358  |        322.7043        |
| torchbench  |               mnasnet1_0                | 323.0717  |        324.1732        |
| torchbench  |                  hf_T5                  | 282.9559  |        281.6635        |
| torchbench  |              hf_DistilBert              | 279.7668  |        279.3107        |
| torchbench  |                 yolov3                  | 262.6355  |        259.9473        |
| torchbench  |                resnet152                | 251.3667  |        246.7747        |
| torchbench  |               timm_vovnet               | 246.8284  |        250.8095        |
| torchbench  |                   drq                   |  243.14   |        263.0037        |
| torchbench  |         nvidia_deeprecommender          | 242.6257  |        251.857         |
| torchbench  |               timm_nfnet                |  229.374  |        225.6032        |
| torchbench  |              timm_resnest               | 229.2705  |        229.4557        |
| torchbench  |           shufflenet_v2_x1_0            | 224.7522  |        228.6695        |
| torchbench  |                resnet50                 | 214.3405  |        211.5939        |
| torchbench  |               timm_regnet               |  207.459  |        202.3439        |
| torchbench  |             resnext50_32x4d             | 181.0959  |        181.4499        |
| torchbench  |             LearningToPaint             | 173.8694  |        175.1049        |
| torchbench  |            soft_actor_critic            | 172.2714  |        181.3975        |
| torchbench  |                resnet18                 | 157.2362  |        148.1282        |
| torchbench  |                  vgg16                  | 156.8115  |        158.9342        |
| torchbench  |             phlippe_resnet              |  142.043  |        140.9482        |
| torchbench  |              lennard_jones              | 140.1834  |        136.1011        |
| torchbench  |              pytorch_unet               | 138.7686  |        140.8722        |
| torchbench  |           Background_Matting            | 135.1568  |        136.699         |
| huggingface |           PegasusForCausalLM            | 814.8897  |         255.29         |
| huggingface |          MobileBertForMaskedLM          | 621.2472  |        641.7833        |
| huggingface |     MobileBertForQuestionAnswering      |  582.039  |        579.7287        |
| huggingface |       MT5ForConditionalGeneration       | 551.3717  |        560.1428        |
| huggingface |            YituTechConvBert             | 465.3614  |        466.0206        |
| huggingface |           ElectraForCausalLM            |  437.293  |        439.8019        |
| huggingface |          AllenaiLongformerBase          | 411.2795  |        410.4339        |
| huggingface |     M2M100ForConditionalGeneration      | 380.7446  |        380.955         |
| huggingface |            XLNetLMHeadModel             | 352.8004  |        352.7516        |
| huggingface |            AlbertForMaskedLM            | 352.0903  |        370.1729        |
| huggingface |             XGLMForCausalLM             | 346.8905  |        344.7347        |
| huggingface |         MegatronBertForCausalLM         | 340.8456  |        344.1151        |
| huggingface |       ElectraForQuestionAnswering       | 340.3166  |        341.0878        |
| huggingface |     PegasusForConditionalGeneration     | 339.9314  |        317.5693        |
| huggingface |      MBartForConditionalGeneration      |  332.702  |        333.4432        |
| huggingface |                 T5Small                 | 324.3855  |        325.5567        |
| huggingface |       T5ForConditionalGeneration        | 323.6672  |        325.1325        |
| huggingface |    MegatronBertForQuestionAnswering     | 323.4524  |        324.4397        |
| huggingface |      GPT2ForSequenceClassification      | 323.1691  |        321.3846        |
| huggingface |      BartForConditionalGeneration       | 321.1777  |        333.1578        |
| huggingface |       AlbertForQuestionAnswering        | 314.2912  |        328.9652        |
| huggingface |     PLBartForConditionalGeneration      | 297.6701  |        278.446         |
| huggingface | BlenderbotSmallForConditionalGeneration | 289.8975  |        292.8094        |
| huggingface |       BlenderbotSmallForCausalLM        |  274.757  |        261.6962        |
| huggingface |       RobertaForQuestionAnswering       |  273.882  |        274.5867        |
| huggingface |               DistillGPT2               |  267.031  |        267.848         |
| huggingface |        BertForQuestionAnswering         | 265.9109  |        281.6978        |
| huggingface |           LayoutLMForMaskedLM           | 261.6479  |        262.2402        |
| huggingface |                CamemBert                | 260.9713  |        266.0405        |
| huggingface |          DistilBertForMaskedLM          | 256.5461  |        240.982         |
| huggingface |           RobertaForCausalLM            | 255.0796  |        254.953         |
| huggingface |             BertForMaskedLM             | 246.8009  |        261.1474        |
| huggingface |     DistilBertForQuestionAnswering      | 245.7984  |        244.739         |
| huggingface |             BartForCausalLM             | 240.4101  |        239.5187        |
| huggingface |             OPTForCausalLM              | 237.5466  |        237.6625        |
| huggingface |            MBartForCausalLM             | 235.4426  |        235.7554        |
| huggingface |         Speech2Text2ForCausalLM         | 234.1054  |        234.3185        |
| huggingface |            TrOCRForCausalLM             | 230.3112  |        228.9748        |
| huggingface |            PLBartForCausalLM            | 212.6627  |         213.84         |
| timm_models |            twins_pcpvt_base             | 1549.1182 |       1549.3968        |
| timm_models |               mobilevit_s               | 1260.1864 |       1250.1549        |
| timm_models |             coat_lite_mini              | 1256.0368 |       1261.5438        |
| timm_models |             crossvit_9_240              | 1181.8398 |       1163.6234        |
| timm_models |      swin_base_patch4_window7_224       | 1144.8752 |       1145.9039        |
| timm_models |               volo_d1_224               | 959.9922  |        972.2927        |
| timm_models |                pit_b_224                | 939.9928  |        944.8402        |
| timm_models |          xcit_large_24_p8_224           | 919.4447  |        930.1613        |
| timm_models |              jx_nest_base               |  909.285  |        909.8677        |
| timm_models |              cait_m36_384               | 877.7569  |        889.0475        |
| timm_models |            sebotnet33ts_256             | 723.3903  |        715.0022        |
| timm_models |            tnt_s_patch16_224            | 639.1069  |        644.4422        |
| timm_models |               convit_base               | 606.4733  |        607.7017        |
| timm_models |           eca_botnext26ts_256           | 576.4692  |        577.7936        |
| timm_models |              ghostnet_100               | 574.4232  |        585.6942        |
| timm_models |               rexnet_100                | 574.0523  |        579.1604        |
| timm_models |              botnet26t_256              | 564.6397  |        558.1972        |
| timm_models |                hrnet_w18                | 517.5075  |        521.5671        |
| timm_models |             visformer_small             | 451.9125  |        458.0089        |
| timm_models |              convnext_base              | 441.4118  |        458.4532        |
| timm_models |                fbnetv3_b                | 407.5326  |        401.4525        |
| timm_models |            res2net50_14w_8s             | 405.0288  |        402.7811        |
| timm_models |                tinynet_a                | 379.1793  |        385.4118        |
| timm_models |           tf_efficientnet_b0            | 373.0437  |        376.6531        |
| timm_models |            adv_inception_v3             |  371.087  |        371.0584        |
| timm_models |           gluon_inception_v3            | 367.0138  |        368.7707        |
| timm_models |          mobilenetv3_large_100          | 366.7194  |        368.975         |
| timm_models |              inception_v3               | 365.8731  |        372.7785        |
| timm_models |               tf_mixnet_l               | 364.5554  |        355.9007        |
| timm_models |              pnasnet5large              | 361.0583  |        357.5933        |
| timm_models |                mixnet_l                 | 357.2374  |        360.5247        |
| timm_models |              spnasnet_100               |  356.122  |        360.5331        |
| timm_models |               fbnetc_100                | 354.4488  |        358.268         |
| timm_models |            res2net101_26w_4s            | 347.2845  |        352.7768        |
| timm_models |     deit_base_distilled_patch16_224     |  333.189  |        330.9312        |
| timm_models |          vit_base_patch16_224           | 331.7911  |        330.6702        |
| timm_models |             mobilenetv2_100             | 327.4934  |        325.4134        |
| timm_models |               resnest101e               |  322.625  |        328.1115        |
| timm_models |          beit_base_patch16_224          | 318.5332  |        325.0581        |
| timm_models |               mnasnet_100               | 316.9831  |        323.0384        |
| timm_models |              gmixer_24_224              | 289.7795  |        296.1159        |
| timm_models |             poolformer_m36              | 278.4865  |        277.6924        |
| timm_models |                 dpn107                  | 277.8568  |        280.9468        |
| timm_models |               res2next50                | 275.9085  |        277.7791        |
| timm_models |              cspdarknet53               | 272.0032  |        273.5491        |
| timm_models |               selecsls42b               | 259.2803  |        260.0981        |
| timm_models |               regnety_002               | 258.7406  |        252.8447        |
| timm_models |              gmlp_s16_224               | 257.6034  |        258.6444        |
| timm_models |              resmlp_12_224              | 249.7601  |        252.793         |
| timm_models |              mixer_b16_224              | 249.4048  |        245.5265        |
| timm_models |            gluon_xception65             | 239.5066  |        234.587         |
| timm_models |                lcnet_050                | 229.6674  |        221.3622        |
| timm_models |            ese_vovnet19b_dw             | 223.3347  |        223.677         |
| timm_models |               dm_nfnet_f0               | 219.4157  |        222.9388        |
| timm_models |                gernet_l                 |  216.739  |        216.6382        |
| timm_models |                 dla102                  | 192.1844  |        191.9956        |
| timm_models |                nfnet_l0                 | 189.6519  |        186.6258        |
| timm_models |         swsl_resnext101_32x16d          |  187.444  |        191.9566        |
| timm_models |                repvgg_a2                | 176.7575  |        178.8257        |
+-------------+-----------------------------------------+-----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |            timm_efficientnet            |  0.9293  |         0.8747         |
| torchbench  |                 hf_Bert                 |  0.8815  |         0.8815         |
| torchbench  |                 yolov3                  |   0.87   |         0.8701         |
| torchbench  |           shufflenet_v2_x1_0            |  0.8596  |         0.8599         |
| torchbench  |           speech_transformer            |  0.8583  |         0.8583         |
| torchbench  |               timm_regnet               |  0.8512  |         0.8498         |
| torchbench  |              hf_DistilBert              |  0.8456  |         0.8456         |
| torchbench  |              timm_resnest               |  0.8414  |         0.8304         |
| torchbench  |         timm_vision_transformer         |  0.8357  |         0.8357         |
| torchbench  |           Background_Matting            |  0.834   |         0.834          |
| torchbench  |                resnet152                |  0.8296  |         0.8286         |
| torchbench  |               hf_T5_large               |  0.8201  |         0.8201         |
| torchbench  |            phlippe_densenet             |  0.806   |         0.7988         |
| torchbench  |           mobilenet_v3_large            |  0.7848  |         0.7275         |
| torchbench  |              pytorch_unet               |  0.7734  |         0.7734         |
| torchbench  |              squeezenet1_1              |  0.773   |         0.773          |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.7715         |
| torchbench  |                 demucs                  |  0.7665  |         0.7665         |
| torchbench  |                 hf_Bart                 |  0.7545  |         0.7543         |
| torchbench  |                resnet50                 |  0.7515  |         0.7522         |
| torchbench  |               timm_vovnet               |  0.7427  |         0.7427         |
| torchbench  |               mnasnet1_0                |  0.742   |         0.7486         |
| torchbench  |             pytorch_struct              |  0.7338  |         0.7274         |
| torchbench  |                  vgg16                  |  0.723   |         0.723          |
| torchbench  |               densenet121               |  0.7085  |         0.7085         |
| torchbench  |             resnext50_32x4d             |  0.6608  |         0.6608         |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.6585         |
| torchbench  |             LearningToPaint             |  0.6018  |         0.6018         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5458  |         0.5458         |
| torchbench  |                resnet18                 |  0.5409  |         0.5409         |
| torchbench  |              hf_Longformer              |  0.4203  |         0.4206         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.3991         |
| torchbench  |             phlippe_resnet              |  0.3202  |         0.3202         |
| torchbench  |                   drq                   |  0.1818  |         0.1818         |
| torchbench  |                  dcgan                  |  0.1811  |         0.1811         |
| torchbench  |            soft_actor_critic            |  0.1078  |         0.1078         |
| torchbench  |              lennard_jones              |  0.0648  |         0.0648         |
| huggingface |     PegasusForConditionalGeneration     |  0.8911  |         0.8911         |
| huggingface |       MT5ForConditionalGeneration       |  0.8906  |         0.8906         |
| huggingface |           ElectraForCausalLM            |  0.8896  |         0.8896         |
| huggingface |            PLBartForCausalLM            |  0.8748  |         0.8748         |
| huggingface |          DistilBertForMaskedLM          |  0.8677  |         0.8677         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         0.8672         |
| huggingface |            TrOCRForCausalLM             |  0.8558  |         0.8558         |
| huggingface |            MBartForCausalLM             |  0.8501  |         0.8501         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         0.8456         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         0.845          |
| huggingface |             BartForCausalLM             |  0.8311  |         0.8311         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.816   |         0.816          |
| huggingface |           PegasusForCausalLM            |  0.7966  |         0.7966         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.787   |         0.787          |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         0.7473         |
| huggingface |         Speech2Text2ForCausalLM         |  0.7364  |         0.7364         |
| huggingface |             XGLMForCausalLM             |  0.6744  |         0.6744         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6505  |         0.6505         |
| huggingface |     M2M100ForConditionalGeneration      |  0.6058  |         0.6058         |
| huggingface |          AllenaiLongformerBase          |  0.4696  |         0.4696         |
| timm_models |            ese_vovnet19b_dw             |  0.8975  |         0.8975         |
| timm_models |            sebotnet33ts_256             |  0.891   |         0.8908         |
| timm_models |           gluon_inception_v3            |  0.8902  |         0.8902         |
| timm_models |              inception_v3               |  0.8902  |         0.8902         |
| timm_models |            adv_inception_v3             |  0.8902  |         0.8902         |
| timm_models |                hrnet_w18                |  0.8872  |         0.8872         |
| timm_models |            gluon_xception65             |  0.8832  |         0.8832         |
| timm_models |              spnasnet_100               |  0.8786  |         0.8786         |
| timm_models |          xcit_large_24_p8_224           |  0.8761  |         0.8761         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         0.8738         |
| timm_models |                mixnet_l                 |  0.8686  |         0.8686         |
| timm_models |                 dpn107                  |  0.8685  |         0.8685         |
| timm_models |               mnasnet_100               |  0.8683  |         0.8683         |
| timm_models |              cait_m36_384               |  0.8637  |         0.8637         |
| timm_models |             poolformer_m36              |  0.8598  |         0.8598         |
| timm_models |               fbnetc_100                |  0.8596  |         0.8596         |
| timm_models |                pit_b_224                |  0.8566  |         0.8566         |
| timm_models |            res2net101_26w_4s            |  0.8505  |         0.8505         |
| timm_models |            res2net50_14w_8s             |  0.8497  |         0.8494         |
| timm_models |                gernet_l                 |  0.8495  |         0.8496         |
| timm_models |         swsl_resnext101_32x16d          |  0.8477  |         0.8477         |
| timm_models |               selecsls42b               |  0.8471  |         0.8472         |
| timm_models |               res2next50                |  0.8452  |         0.8452         |
| timm_models |              ghostnet_100               |  0.8416  |         0.8416         |
| timm_models |          mobilenetv3_large_100          |  0.8413  |         0.8413         |
| timm_models |             coat_lite_mini              |  0.8401  |         0.8401         |
| timm_models |              convnext_base              |  0.832   |         0.832          |
| timm_models |              botnet26t_256              |  0.824   |         0.824          |
| timm_models |                lcnet_050                |  0.8048  |         0.8048         |
| timm_models |                repvgg_a2                |  0.7738  |         0.7738         |
| timm_models |               regnety_002               |   0.76   |          0.76          |
| timm_models |             crossvit_9_240              |  0.7525  |         0.7526         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.7214         |
| timm_models |              jx_nest_base               |  0.6693  |         0.6693         |
+-------------+-----------------------------------------+----------+------------------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9746 |  0.9236   |  3.662   |         3.5979         |
|           BERT_pytorch            |  16  | 0.9898 |  0.7963   |  3.1291  |         3.3661         |
|            densenet121            |  4   | 0.9895 |  0.7021   |  2.8065  |         2.6894         |
|            hf_T5_large            |  2   | 0.9821 |  0.8081   |  2.509   |         2.3003         |
|              hf_Bart              |  4   | 1.0209 |  0.7921   |  2.4068  |         1.8198         |
|             hf_Albert             |  8   | 0.9967 |  0.9594   |  2.3429  |         2.3003         |
|         phlippe_densenet          | 128  | 0.9882 |  0.7687   |  2.0411  |         2.0366         |
|        mobilenet_v3_large         |  32  | 0.996  |  0.7824   |  1.9934  |         1.975          |
|              hf_GPT2              |  4   | 0.9934 |   0.959   |  1.9291  |         1.9049         |
|               hf_T5               |  8   | 0.9874 |  0.8549   |  1.9207  |         1.9199         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.979  |  0.8999   |  1.8918  |         1.9186         |
|           squeezenet1_1           |  32  | 0.9849 |  0.9328   |   1.86   |         1.8459         |
|              hf_Bert              |  4   | 0.9956 |  0.8407   |  1.8589  |         1.7836         |
|          phlippe_resnet           | 128  | 0.9899 |  0.7542   |  1.8075  |         1.8209         |
| attention_is_all_you_need_pytorch | 256  | 0.989  |  0.8368   |  1.8016  |         1.6518         |
|           hf_Longformer           |  2   | 0.9252 |  0.6047   |   1.79   |         1.7971         |
|          resnext50_32x4d          |  8   | 0.991  |  0.7309   |  1.7085  |         1.7171         |
|            mnasnet1_0             |  32  | 0.9897 |  0.7359   |  1.7049  |         1.6477         |
|        speech_transformer         |  32  | 0.9833 |  0.7947   |  1.6978  |         1.6917         |
|      timm_vision_transformer      |  32  | 0.9858 |  0.8468   |  1.6895  |         1.7383         |
|          pytorch_struct           | 200  | 0.9475 |  0.7728   |  1.6787  |         1.7968         |
|           fastNLP_Bert            |  6   | 0.9904 |  0.7991   |  1.6516  |         1.652          |
|        shufflenet_v2_x1_0         | 128  | 0.9937 |  0.7578   |  1.6243  |         1.5341         |
|           hf_Bert_large           |  4   | 0.9986 |   0.869   |  1.6226  |         1.7705         |
|                drq                |  1   | 0.9652 |  0.7589   |  1.6021  |         1.5104         |
|             resnet18              |  16  | 0.9862 |  0.7674   |  1.5523  |         1.5773         |
|               dcgan               |  32  | 0.8865 |   0.714   |  1.4782  |         1.4896         |
|           mobilenet_v2            |  96  | 0.997  |  0.7769   |  1.4765  |         1.4777         |
|           hf_DistilBert           |  8   | 1.0012 |  0.9406   |  1.4693  |         1.4257         |
|            timm_nfnet             | 128  | 0.9866 |  0.9838   |  1.468   |         1.4635         |
|           lennard_jones           | 1000 | 0.8601 |  0.7407   |  1.4464  |         1.5381         |
|           timm_resnest            |  32  | 0.9924 |   0.853   |  1.4459  |         1.4536         |
|         timm_efficientnet         |  32  | 0.9379 |  0.6245   |  1.3829  |         1.3947         |
|         soft_actor_critic         | 256  | 0.8171 |  0.6609   |  1.3122  |         1.262          |
|          LearningToPaint          |  96  | 0.9884 |  0.7668   |  1.2739  |         1.2848         |
|               vgg16               |  64  | 0.9992 |   0.998   |  1.2445  |         1.2447         |
|          pytorch_stargan          |  16  | 0.9922 |   0.807   |  1.2273  |         1.2254         |
|            Super_SloMo            |  6   | 0.9968 |  0.1781   |  1.2179  |         1.2169         |
|        Background_Matting         |  4   | 0.9991 |  0.1369   |  1.173   |         1.173          |
|           pytorch_unet            |  1   | 0.9962 |  0.2049   |  1.1719  |         1.1712         |
|             resnet152             |  32  | 0.9963 |  0.7511   |  1.1568  |         1.1564         |
|             resnet50              |  32  | 0.9972 |  0.7605   |  1.1374  |         1.1891         |
|              yolov3               |  16  | 0.996  |  0.8059   |  1.115   |         1.1151         |
|              demucs               |  4   | 1.0004 |   1.001   |  1.0261  |         1.0291         |
|            tts_angular            |  64  | 0.9587 |  0.9191   |  0.9848  |         0.9907         |
|            timm_regnet            |  32  | 0.9153 |  0.7728   |  0.9445  |         0.9507         |
|            timm_vovnet            |  32  | 0.8477 |  0.7091   |  0.9414  |         0.9241         |
|      nvidia_deeprecommender       | 256  | 0.9992 |  0.9981   |  0.9355  |         0.9348         |
|              alexnet              | 128  | 0.9988 |  0.9978   |   0.0    |          0.0           |
|            hf_Reformer            |  4   | 0.9922 |  0.9932   |   0.0    |          0.0           |
|           hf_GPT2_large           |  4   | 0.983  |  0.9716   |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.9391 |  0.8483   |   0.0    |          0.0           |
|            hf_BigBird             |  2   | 0.9798 |  0.7923   |   0.0    |          0.0           |
|   timm_vision_transformer_large   |  32  | 0.9981 |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.9808 |    0.0    |   0.0    |          0.0           |
|        doctr_det_predictor        |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|       doctr_reco_predictor        |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+
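The suite-level "Geometric mean speedup" summaries in this issue reduce per-model columns like the one above to a single number. The sketch below shows one way such a figure could be computed. It assumes that a speedup of 0.0 marks a model that failed to run and should be excluded. The benchmark harness's actual aggregation code is not shown here, and the dictionary is only a small hand-picked subset of the inductor column, not the full table.

```python
import math

# Hypothetical subset of (model -> inductor speedup) values copied from the table above.
# A value of 0.0 is assumed to mean "failed to run" and is excluded from the mean.
speedups = {
    "functorch_dp_cifar10": 3.662,
    "resnet50": 1.1374,
    "nvidia_deeprecommender": 0.9355,
    "alexnet": 0.0,
}


def geomean_speedup(values):
    """Geometric mean over the models that actually ran (speedup > 0)."""
    ran = [v for v in values if v > 0]
    if not ran:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean; it avoids overflow from a raw product.
    return math.exp(sum(math.log(v) for v in ran) / len(ran))


num_ran = sum(1 for v in speedups.values() if v > 0)
print(f"{geomean_speedup(speedups.values()):.2f}x over {num_ran} models")
```

A geometric mean is the usual choice for ratios like speedups. With it, a 2x gain and a 0.5x regression cancel out to 1.0x, which an arithmetic mean would not do.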

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               dlrm                |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|            hf_BigBird             |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|          phlippe_resnet           |  4  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  |      0.0000      | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|        doctr_det_predictor        |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|       doctr_reco_predictor        |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|        speech_transformer         |  32  | 5.8681  |  13.5367  | 811.9404 |        801.5224        |
| attention_is_all_you_need_pytorch | 256  | 4.3576  |  10.6986  | 671.8298 |        713.6033        |
|            hf_T5_large            |  2   | 26.0158 |  54.0349  | 514.4005 |        510.0798        |
|           hf_Longformer           |  2   | 11.7053 |  30.7402  | 448.7565 |        444.5843        |
|      timm_vision_transformer      |  32  | 3.2698  |  7.2605   | 430.9528 |        432.7792        |
|         phlippe_densenet          | 128  | 3.3621  |  6.9743   | 424.9286 |        426.024         |
|        mobilenet_v3_large         |  32  | 3.3586  |  7.6126   | 420.8786 |        415.3274        |
|           fastNLP_Bert            |  6   | 5.0195  |  10.9845  | 408.0208 |        436.9673        |
|             hf_Albert             |  8   | 2.4723  |  8.5721   | 407.7363 |        401.7922        |
|              hf_GPT2              |  4   | 4.5578  |  9.5606   | 377.4055 |        376.3293        |
|         timm_efficientnet         |  32  | 4.9059  |  10.1556  | 370.9045 |        370.5136        |
|           BERT_pytorch            |  16  | 4.7936  |  11.4043  | 348.6068 |        347.8106        |
|           hf_Bert_large           |  4   | 9.9442  |  20.9677  | 344.6791 |        343.3888        |
|              hf_Bert              |  4   | 4.9178  |  10.4553  | 343.1352 |        333.2602        |
|          pytorch_struct           | 200  | 0.7768  |   1.321   | 343.1183 |        342.3984        |
|           mobilenet_v2            |  96  | 3.0792  |  6.8929   | 338.0467 |        338.612         |
|            densenet121            |  4   |  7.453  |  17.5669  | 328.6991 |        343.919         |
|              hf_Bart              |  4   | 10.7711 |  17.9662  | 326.7358 |        322.7043        |
|            mnasnet1_0             |  32  | 3.0395  |  6.6307   | 323.0717 |        324.1732        |
|               hf_T5               |  8   | 5.7026  |  13.3363  | 282.9559 |        281.6635        |
|           hf_DistilBert           |  8   | 2.4985  |  5.5365   | 279.7668 |        279.3107        |
|              yolov3               |  16  | 4.7603  |  11.3064  | 262.6355 |        259.9473        |
|             resnet152             |  32  |  8.877  |  19.9992  | 251.3667 |        246.7747        |
|            timm_vovnet            |  32  |  3.53   |  6.3249   | 246.8284 |        250.8095        |
|                drq                |  1   | 0.6622  |  1.0061   |  243.14  |        263.0037        |
|      nvidia_deeprecommender       | 256  | 0.4753  |  0.7606   | 242.6257 |        251.857         |
|            timm_nfnet             | 128  | 5.6618  |  11.0017  | 229.374  |        225.6032        |
|           timm_resnest            |  32  | 1.8281  |  3.8931   | 229.2705 |        229.4557        |
|        shufflenet_v2_x1_0         | 128  | 3.4138  |  7.5888   | 224.7522 |        228.6695        |
|             resnet50              |  32  | 3.1864  |  7.3638   | 214.3405 |        211.5939        |
|            timm_regnet            |  32  |  6.717  |  12.1563  | 207.459  |        202.3439        |
|          resnext50_32x4d          |  8   | 3.1695  |  6.9125   | 181.0959 |        181.4499        |
|          LearningToPaint          |  96  | 1.4565  |  2.8318   | 173.8694 |        175.1049        |
|         soft_actor_critic         | 256  |  0.445  |  0.6082   | 172.2714 |        181.3975        |
|             resnet18              |  16  | 1.3341  |  2.7142   | 157.2362 |        148.1282        |
|               vgg16               |  64  | 0.6314  |  1.1167   | 156.8115 |        158.9342        |
|          phlippe_resnet           | 128  | 1.3364  |  2.7032   | 142.043  |        140.9482        |
|           lennard_jones           | 1000 | 0.3913  |  0.5997   | 140.1834 |        136.1011        |
|           pytorch_unet            |  1   | 1.6234  |   4.367   | 138.7686 |        140.8722        |
|        Background_Matting         |  4   | 3.0396  |  11.433   | 135.1568 |        136.699         |
|       functorch_dp_cifar10        |  64  |  1.207  |  2.3889   | 117.332  |        117.371         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2559  |  2.8988   | 101.5353 |        101.5587        |
|              demucs               |  4   | 1.4184  |  2.1307   | 78.2632  |        80.3662         |
|            Super_SloMo            |  6   |  2.738  |  9.6775   | 78.1261  |        78.2861         |
|          pytorch_stargan          |  16  | 1.1868  |  3.1623   | 48.8599  |        51.7515         |
|           squeezenet1_1           |  32  | 1.0267  |  1.7364   | 47.1619  |        46.8625         |
|               dcgan               |  32  | 0.4283  |  0.7074   | 19.3249  |         17.817         |
|            tts_angular            |  64  | 0.4437  |  0.5138   |  6.1078  |         6.1647         |
|            hf_BigBird             |  2   |  12.76  |  36.8124  |   nan    |          nan           |
|           hf_GPT2_large           |  4   | 14.2063 |  29.5919  |   nan    |          nan           |
|            hf_Reformer            |  4   |  4.117  |  5.9118   |   nan    |          nan           |
|               dlrm                | 1024 | 0.3698  |  0.7866   |   nan    |          nan           |
|              alexnet              | 128  |  0.481  |  0.7624   |   nan    |          nan           |
|               moco                |  32  | 27.4118 |    nan    |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 9.2332  |    nan    |   nan    |          nan           |
|        doctr_det_predictor        |  0   |   nan   |    nan    |   nan    |          nan           |
|       doctr_reco_predictor        |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.1588  |         1.1588         |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0399  |         1.0399         |
|           mobilenet_v2            |  96  | 0.9866 |  0.7652   |  1.0106  |         1.0111         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  0.9988  |         0.9988         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  0.9953  |         0.9953         |
|            tts_angular            |  64  | 0.9957 |  0.9957   |  0.9852  |         0.9852         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9693  |         0.9693         |
|            timm_nfnet             | 128  | 0.9071 |  0.8746   |  0.9614  |         0.9612         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         0.9428         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9317  |         0.9317         |
|         timm_efficientnet         |  32  | 0.9874 |  0.7663   |  0.9293  |         0.8747         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9138  |         0.9138         |
|              hf_Bert              |  4   | 0.9645 |  0.8338   |  0.8815  |         0.8815         |
|              yolov3               |  16  | 0.9877 |  0.8253   |   0.87   |         0.8701         |
|        shufflenet_v2_x1_0         | 128  | 0.954  |  0.8383   |  0.8596  |         0.8599         |
|        speech_transformer         |  32  | 0.9914 |   0.901   |  0.8583  |         0.8583         |
|            timm_regnet            |  32  | 0.9908 |  0.8499   |  0.8512  |         0.8498         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8456  |         0.8456         |
|           timm_resnest            |  32  | 0.9888 |  0.8817   |  0.8414  |         0.8304         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8357  |         0.8357         |
|        Background_Matting         |  4   | 1.0125 |  0.6486   |  0.834   |         0.834          |
|             resnet152             |  32  | 0.996  |  0.8915   |  0.8296  |         0.8286         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         0.8201         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.806   |         0.7988         |
|        mobilenet_v3_large         |  32  | 0.9793 |  0.8396   |  0.7848  |         0.7275         |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.7734  |         0.7734         |
|           squeezenet1_1           |  32  | 0.9674 |  0.9353   |  0.773   |         0.773          |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.7715         |
|              demucs               |  4   | 0.9663 |  0.9664   |  0.7665  |         0.7665         |
|              hf_Bart              |  4   | 0.9084 |   0.843   |  0.7545  |         0.7543         |
|             resnet50              |  32  | 0.9909 |  0.8638   |  0.7515  |         0.7522         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7427  |         0.7427         |
|            mnasnet1_0             |  32  | 0.9801 |  0.8686   |  0.742   |         0.7486         |
|          pytorch_struct           | 200  | 0.9992 |  0.5168   |  0.7338  |         0.7274         |
|               vgg16               |  64  | 0.9922 |  0.7246   |  0.723   |         0.723          |
|            densenet121            |  4   | 0.9944 |  0.9823   |  0.7085  |         0.7085         |
|          resnext50_32x4d          |  8   | 0.9947 |  0.8434   |  0.6608  |         0.6608         |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.6585         |
|          LearningToPaint          |  96  | 0.9202 |  0.7116   |  0.6018  |         0.6018         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8594   |  0.5458  |         0.5458         |
|             resnet18              |  16  | 0.983  |  0.8055   |  0.5409  |         0.5409         |
|           hf_Longformer           |  2   | 0.8565 |  0.8296   |  0.4203  |         0.4206         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.3991         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3202  |         0.3202         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.1818  |         0.1818         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.1811  |         0.1811         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9255   |  0.1078  |         0.1078         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.0648  |         0.0648         |
|               dlrm                | 1024 | 0.9995 |  0.9944   |   nan    |          nan           |
|            hf_BigBird             |  2   | 0.9493 |  0.9268   |   nan    |          nan           |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |   nan    |          nan           |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |   nan    |          nan           |
|              alexnet              | 128  | 0.9452 |  0.7919   |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |          nan           |
|               moco                |  32  |  0.99  |    nan    |   nan    |          nan           |
|        doctr_det_predictor        |  0   |  nan   |    nan    |   nan    |          nan           |
|       doctr_reco_predictor        |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|        Background_Matting         |  4   | 126.188  | 920.9148  | 107.3641 |        107.3898        |
|            hf_T5_large            |  2   | 226.1605 | 273.5668  | 97.5123  |         97.364         |
|               hf_T5               |  8   | 183.5685 | 212.1523  | 93.4684  |         93.424         |
|            timm_nfnet             | 128  | 120.3776 | 120.3382  | 80.7816  |        80.7568         |
|            Super_SloMo            |  6   |  79.768  | 446.9936  | 65.2653  |        65.3729         |
|           hf_Longformer           |  2   | 134.7514 | 185.8467  | 63.0971  |        62.7711         |
|              yolov3               |  16  | 68.7545  |  85.0545  | 61.4474  |        61.6063         |
|            timm_regnet            |  32  | 61.6607  |  72.541   | 59.1829  |        59.1793         |
|             resnet152             |  32  | 66.9857  |  88.1438  | 54.3423  |         54.388         |
|               vgg16               |  64  | 66.3144  |  66.3727  | 53.3586  |        53.3111         |
|              demucs               |  4   | 53.7002  |  53.4274  | 51.9174  |         52.292         |
|           hf_Bert_large           |  4   | 82.7755  |  95.0971  | 50.9809  |        51.0579         |
|           pytorch_unet            |  1   | 40.0974  |  194.485  | 34.0614  |        34.0626         |
|        speech_transformer         |  32  | 67.6605  |  82.6321  | 33.5419  |        34.1472         |
| attention_is_all_you_need_pytorch | 256  | 58.0688  |  67.5439  | 32.9818  |        32.9453         |
|              hf_Bart              |  4   | 72.3349  |  86.4992  | 32.4278  |        32.1897         |
|           mobilenet_v2            |  96  | 47.0772  |  60.4758  | 31.8545  |        31.8096         |
|           fastNLP_Bert            |  6   | 57.1553  |  70.013   | 31.4759  |        31.4673         |
|             hf_Albert             |  8   | 68.5193  |  72.423   | 29.6847  |        29.6681         |
|            timm_vovnet            |  32  |  28.972  |  35.2093  | 26.8181  |        26.8979         |
|              hf_GPT2              |  4   |  49.085  |  50.7364  |  25.272  |        25.2969         |
|         timm_efficientnet         |  32  |  34.504  |  52.1242  | 23.3891  |        23.4013         |
|             resnet50              |  32  | 27.0455  |  36.8316  | 22.9154  |        22.8704         |
|              hf_Bert              |  4   | 45.3523  |  48.2591  | 22.4861  |        22.6212         |
|           hf_DistilBert           |  8   | 33.4487  |  34.5952  | 22.0925  |        22.0207         |
|        shufflenet_v2_x1_0         | 128  | 32.1145  |  39.8152  | 19.6246  |        19.6546         |
|            densenet121            |  4   | 61.1776  |  84.6034  | 18.6762  |        20.0414         |
|           BERT_pytorch            |  16  |  53.911  |  65.6561  | 17.2022  |        17.2928         |
|      timm_vision_transformer      |  32  | 28.9208  |  33.1916  | 16.6903  |        16.7272         |
|           timm_resnest            |  32  | 24.2712  |  28.3163  | 16.6836  |        16.6077         |
|            mnasnet1_0             |  32  | 21.8716  |  31.7637  | 13.9869  |        13.3091         |
|        mobilenet_v3_large         |  32  | 28.6989  |  34.3167  | 13.2958  |        13.2472         |
|          pytorch_stargan          |  16  | 14.7159  |  17.9245  | 11.8429  |        11.8773         |
|          resnext50_32x4d          |  8   | 20.0615  |  27.0873  | 11.6723  |        11.7853         |
|         phlippe_densenet          | 128  | 25.8489  |  30.5896  | 11.6326  |        11.6586         |
|      nvidia_deeprecommender       | 256  | 10.2264  |  10.2346  | 10.9126  |        10.9174         |
|          LearningToPaint          |  96  | 12.0674  |  15.0326  |  8.7502  |         8.7575         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.1254  |  14.8061  |  7.1496  |         7.1753         |
|            tts_angular            |  64  |  6.623   |  6.9183   |  6.3719  |         7.0997         |
|             resnet18              |  16  |  9.0644  |  11.6187  |  5.769   |         6.2752         |
|           squeezenet1_1           |  32  | 10.2129  |  11.6679  |  5.449   |         5.4429         |
|          phlippe_resnet           | 128  |  9.1012  |  11.8759  |  5.0423  |         5.1005         |
|          pytorch_struct           | 200  |  5.7718  |  6.0466   |  3.2621  |         3.0943         |
|       functorch_dp_cifar10        |  64  | 10.6986  |  11.1796  |  2.8798  |         2.8826         |
|                drq                |  1   |  3.4406  |  4.3278   |  2.2122  |         2.2131         |
|               dcgan               |  32  |  2.3487  |  3.2693   |  1.4337  |         1.4339         |
|         soft_actor_critic         | 256  |  2.6852  |  2.4219   |  1.394   |         1.2928         |
|           lennard_jones           | 1000 |  1.7279  |  2.1056   |  1.0799  |         1.1465         |
|            hf_BigBird             |  2   | 193.8281 | 274.9347  |   nan    |          nan           |
|           hf_GPT2_large           |  4   | 212.6495 | 215.2812  |   nan    |          nan           |
|            hf_Reformer            |  4   | 81.6056  |  81.5701  |   nan    |          nan           |
|              alexnet              | 128  |  9.8388  |  9.8515   |   nan    |          nan           |
|               dlrm                | 1024 |  4.4203  |  4.9248   |   nan    |          nan           |
|   timm_vision_transformer_large   |  32  | 465.1919 |    nan    |   nan    |          nan           |
|               moco                |  32  | 52.5112  |    nan    |   nan    |          nan           |
|        doctr_det_predictor        |  0   |   nan    |    nan    |   nan    |          nan           |
|       doctr_reco_predictor        |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9942 |  0.9359   |  2.4989  |         2.505          |
|      GPT2ForSequenceClassification      |  4  | 0.9809 |  0.9575   |  2.3042  |         2.3047         |
|       ElectraForQuestionAnswering       | 64  | 0.9879 |  0.9777   |  2.1261  |         2.1251         |
|       MT5ForConditionalGeneration       | 16  | 0.994  |  0.8331   |  2.1198  |         2.3462         |
|             XGLMForCausalLM             |  8  | 0.9689 |   0.756   |  2.0803  |         2.2268         |
|     M2M100ForConditionalGeneration      | 16  | 1.0386 |  0.8242   |  2.0196  |         2.2301         |
|      MBartForConditionalGeneration      |  2  | 0.9944 |  0.9533   |  1.8923  |         1.536          |
|               DistillGPT2               | 16  | 0.9906 |  0.9577   |  1.8889  |         1.8886         |
|            PLBartForCausalLM            |  8  | 0.9921 |  0.9626   |  1.8785  |         1.8758         |
|            XLNetLMHeadModel             |  8  | 0.9966 |  0.9671   |  1.8342  |         1.8388         |
|          MobileBertForMaskedLM          | 64  | 0.9474 |  0.8102   |  1.8218  |         1.8041         |
|           ElectraForCausalLM            | 32  | 0.9833 |  0.9369   |  1.7939  |         1.7971         |
|       RobertaForQuestionAnswering       | 16  | 0.9855 |  0.9698   |  1.7829  |         1.7818         |
|        BertForQuestionAnswering         | 16  | 0.9854 |   0.971   |  1.7696  |         1.7711         |
|          AllenaiLongformerBase          |  4  | 0.9455 |  0.6584   |  1.766   |         1.7542         |
|     PLBartForConditionalGeneration      |  4  | 0.9926 |  0.9393   |  1.7384  |         1.7431         |
|           RobertaForCausalLM            | 16  | 0.9881 |  0.9635   |  1.6676  |         1.6706         |
|       T5ForConditionalGeneration        |  4  | 0.9836 |   0.855   |  1.6653  |         1.6562         |
|                 T5Small                 |  4  | 0.9834 |  0.8595   |  1.6621  |         1.6593         |
|            MBartForCausalLM             |  4  | 0.9933 |  0.9655   |  1.645   |         1.6471         |
|             BartForCausalLM             |  4  | 0.9942 |  0.9666   |  1.638   |         1.6438         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9813 |  0.9614   |   1.62   |         1.6249         |
|                CamemBert                | 16  | 0.988  |  0.9633   |  1.619   |         1.6201         |
|       AlbertForQuestionAnswering        |  4  | 0.9999 |  0.8856   |  1.6163  |         1.6168         |
|            YituTechConvBert             | 16  | 0.9863 |  0.9494   |  1.613   |         1.6131         |
|            AlbertForMaskedLM            |  4  | 0.9997 |  0.8849   |  1.6071  |         1.6083         |
|             BertForMaskedLM             | 16  | 0.9863 |  0.9606   |  1.5977  |         1.5978         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.999  |  0.9031   |  1.5926  |         1.5913         |
|           LayoutLMForMaskedLM           | 16  | 0.9861 |  0.9631   |  1.5826  |         1.5936         |
|      BartForConditionalGeneration       |  2  | 0.9959 |  0.9619   |  1.5433  |         1.5814         |
|         MegatronBertForCausalLM         |  4  | 0.9928 |  0.9097   |  1.5092  |         1.4978         |
|         Speech2Text2ForCausalLM         | 256 | 0.9886 |  0.9262   |  1.498   |         1.5172         |
|     DistilBertForQuestionAnswering      | 256 | 0.9946 |  0.9873   |  1.4495  |         1.4508         |
|           PegasusForCausalLM            | 32  | 0.9718 |  0.8882   |  1.3946  |         1.3935         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9844 |  0.8981   |  1.3803  |         1.4522         |
|            TrOCRForCausalLM             | 32  | 0.9934 |  0.9634   |  1.3757  |         1.3755         |
|     PegasusForConditionalGeneration     | 32  | 0.9862 |  0.8712   |  1.3015  |         1.4156         |
|          DistilBertForMaskedLM          | 128 | 0.9935 |  0.9525   |  1.2233  |         1.2222         |
|     MobileBertForQuestionAnswering      | 128 | 0.946  |  0.8143   |  0.9272  |         0.9363         |
|    LayoutLMForSequenceClassification    | 16  | 0.9854 |  0.9721   |   0.0    |          0.0           |
|       DebertaForQuestionAnswering       |  8  | 0.946  |  0.7679   |   0.0    |          0.0           |
|          BlenderbotForCausalLM          |  4  | 0.9651 |  0.7567   |   0.0    |          0.0           |
|           DebertaForMaskedLM            |  4  | 0.8554 |  0.6488   |   0.0    |          0.0           |
|      DebertaV2ForQuestionAnswering      |  2  | 0.837  |  0.6083   |   0.0    |          0.0           |
|          DebertaV2ForMaskedLM           |  1  | 0.8338 |  0.6069   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
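For reference, here is a minimal sketch of how a per-suite geometric mean speedup, like the ones in the summary at the top, could be computed from one speedup column. It assumes that failed runs, which appear as 0.0 in the table, are simply dropped because log(0) is undefined. That is an assumption and not necessarily what the benchmark harness does, so the result is not expected to reproduce the summary exactly.

```python
import math


def geomean_speedup(speedups):
    """Geometric mean of per-model speedups.

    Assumption: entries <= 0.0 mark models that failed to run and are
    excluded, since the log of zero is undefined.
    """
    valid = [s for s in speedups if s > 0.0]
    if not valid:
        return float("nan")
    # exp(mean(log(x))) is the geometric mean of the valid speedups
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few inductor speedups copied from the huggingface table above,
# including two 0.0 entries for models that failed to run.
sample = [2.4989, 2.3042, 1.2233, 0.9272, 0.0, 0.0]
print(f"{geomean_speedup(sample):.2f}x")
```

The geometric mean is the usual choice for ratios like speedups because a 2x win and a 0.5x loss cancel out instead of averaging to 1.25x.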

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|           PegasusForCausalLM            | 32  | 5.9802  |  11.4431  | 814.8897 |         255.29         |
|          MobileBertForMaskedLM          | 64  | 15.699  |  40.5806  | 621.2472 |        641.7833        |
|     MobileBertForQuestionAnswering      | 128 | 15.7757 |  40.3503  | 582.039  |        579.7287        |
|       MT5ForConditionalGeneration       | 16  |  8.135  |  18.6691  | 551.3717 |        560.1428        |
|            YituTechConvBert             | 16  | 10.6034 |  20.2192  | 465.3614 |        466.0206        |
|           ElectraForCausalLM            | 32  | 7.5894  |  13.9728  | 437.293  |        439.8019        |
|          AllenaiLongformerBase          |  4  | 11.4177 |  30.8158  | 411.2795 |        410.4339        |
|     M2M100ForConditionalGeneration      | 16  | 11.7046 |  25.3976  | 380.7446 |        380.955         |
|            XLNetLMHeadModel             |  8  | 10.2437 |  27.9182  | 352.8004 |        352.7516        |
|            AlbertForMaskedLM            |  4  | 2.3681  |  8.1171   | 352.0903 |        370.1729        |
|             XGLMForCausalLM             |  8  | 9.6187  |  20.6035  | 346.8905 |        344.7347        |
|         MegatronBertForCausalLM         |  4  | 10.3539 |  21.2987  | 340.8456 |        344.1151        |
|       ElectraForQuestionAnswering       | 64  | 5.2707  |  10.5109  | 340.3166 |        341.0878        |
|     PegasusForConditionalGeneration     | 32  | 5.1667  |  19.225   | 339.9314 |        317.5693        |
|      MBartForConditionalGeneration      |  2  | 11.7647 |  25.5629  | 332.702  |        333.4432        |
|                 T5Small                 |  4  | 5.5074  |  13.2374  | 324.3855 |        325.5567        |
|       T5ForConditionalGeneration        |  4  | 5.5495  |  13.1651  | 323.6672 |        325.1325        |
|    MegatronBertForQuestionAnswering     |  8  | 10.3207 |  21.2929  | 323.4524 |        324.4397        |
|      GPT2ForSequenceClassification      |  4  |  4.77   |  9.7326   | 323.1691 |        321.3846        |
|      BartForConditionalGeneration       |  2  | 11.427  |  25.7008  | 321.1777 |        333.1578        |
|       AlbertForQuestionAnswering        |  4  | 2.3495  |  8.0052   | 314.2912 |        328.9652        |
|     PLBartForConditionalGeneration      |  4  | 8.9844  |  16.5692  | 297.6701 |        278.446         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.5996  |  17.1367  | 289.8975 |        292.8094        |
|       BlenderbotSmallForCausalLM        | 64  | 4.3221  |  8.2531   | 274.757  |        261.6962        |
|       RobertaForQuestionAnswering       | 16  | 5.1173  |  11.1714  | 273.882  |        274.5867        |
|               DistillGPT2               | 16  | 2.5543  |  5.0368   | 267.031  |        267.848         |
|        BertForQuestionAnswering         | 16  | 5.1209  |  10.5719  | 265.9109 |        281.6978        |
|           LayoutLMForMaskedLM           | 16  | 5.6772  |  11.1826  | 261.6479 |        262.2402        |
|                CamemBert                | 16  | 5.2619  |  11.2078  | 260.9713 |        266.0405        |
|          DistilBertForMaskedLM          | 128 | 2.5551  |  5.7136   | 256.5461 |        240.982         |
|           RobertaForCausalLM            | 16  | 5.1179  |  11.2959  | 255.0796 |        254.953         |
|             BertForMaskedLM             | 16  | 5.1511  |  10.6419  | 246.8009 |        261.1474        |
|     DistilBertForQuestionAnswering      | 256 | 2.5246  |  5.4178   | 245.7984 |        244.739         |
|             BartForCausalLM             |  4  | 6.2038  |  11.7528  | 240.4101 |        239.5187        |
|             OPTForCausalLM              |  2  | 5.6827  |  11.3146  | 237.5466 |        237.6625        |
|            MBartForCausalLM             |  4  | 6.5467  |  12.1503  | 235.4426 |        235.7554        |
|         Speech2Text2ForCausalLM         | 256 | 3.3142  |  6.2447   | 234.1054 |        234.3185        |
|            TrOCRForCausalLM             | 32  | 6.1579  |  11.9922  | 230.3112 |        228.9748        |
|            PLBartForCausalLM            |  8  | 3.7152  |  6.7484   | 212.6627 |         213.84         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.2019 |  27.8129  |   nan    |          nan           |
|          DebertaV2ForMaskedLM           |  1  | 15.3387 |  26.5443  |   nan    |          nan           |
|          BlenderbotForCausalLM          |  4  | 11.6699 |  22.2352  |   nan    |          nan           |
|       DebertaForQuestionAnswering       |  8  | 7.2183  |  13.7242  |   nan    |          nan           |
|           DebertaForMaskedLM            |  4  | 7.2914  |  13.3464  |   nan    |          nan           |
|    LayoutLMForSequenceClassification    | 16  |  5.553  |  11.6407  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1135  |         1.1135         |
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1114  |         1.1114         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0816  |         1.0816         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0789  |         1.0789         |
|             OPTForCausalLM              |  2  | 0.9682 |  0.9246   |  1.0615  |         1.0615         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0541         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0356  |         1.0356         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0356  |         1.0356         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |   1.03   |          1.03          |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.0299         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0277  |         1.0277         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  0.9867  |         0.9867         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         0.9734         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         0.9682         |
|            YituTechConvBert             | 16  | 0.953  |  0.8749   |  0.9575  |         0.9575         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         0.9574         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         0.953          |
|     PLBartForConditionalGeneration      |  4  |  0.93  |  0.8787   |  0.9215  |         0.9215         |
|     PegasusForConditionalGeneration     | 32  | 0.9439 |  0.8957   |  0.8911  |         0.8911         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.8906  |         0.8906         |
|           ElectraForCausalLM            | 32  | 0.9161 |  0.7864   |  0.8896  |         0.8896         |
|            PLBartForCausalLM            |  8  | 0.9237 |  0.8168   |  0.8748  |         0.8748         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8677  |         0.8677         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         0.8672         |
|            TrOCRForCausalLM             | 32  |  0.92  |  0.8307   |  0.8558  |         0.8558         |
|            MBartForCausalLM             |  4  | 0.951  |  0.8913   |  0.8501  |         0.8501         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         0.8456         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         0.845          |
|             BartForCausalLM             |  4  | 0.951  |  0.8911   |  0.8311  |         0.8311         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.816   |         0.816          |
|           PegasusForCausalLM            | 32  | 0.9238 |  0.8405   |  0.7966  |         0.7966         |
|       BlenderbotSmallForCausalLM        | 64  | 0.8906 |  0.7493   |  0.787   |         0.787          |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         0.7473         |
|         Speech2Text2ForCausalLM         | 256 | 0.8865 |  0.7573   |  0.7364  |         0.7364         |
|             XGLMForCausalLM             |  8  | 0.9431 |  0.8612   |  0.6744  |         0.6744         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6505  |         0.6505         |
|     M2M100ForConditionalGeneration      | 16  | 0.955  |  0.8772   |  0.6058  |         0.6058         |
|          AllenaiLongformerBase          |  4  | 0.8568 |  0.7887   |  0.4696  |         0.4696         |
|       DebertaForQuestionAnswering       |  8  | 0.9524 |  1.0537   |   nan    |          nan           |
|          BlenderbotForCausalLM          |  4  | 0.9932 |  0.9937   |   nan    |          nan           |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9764   |   nan    |          nan           |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |   nan    |          nan           |
|           DebertaForMaskedLM            |  4  | 0.9326 |  0.9156   |   nan    |          nan           |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
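To read the table above, the sketch below assumes the compression ratio is eager peak memory divided by compiled peak memory. Under that assumption, values above 1.0 mean the compiled run peaked lower than eager and values below 1.0 mean it peaked higher. The helper names are hypothetical and only illustrate the arithmetic.

```python
def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: eager peak memory / compiled peak memory."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def relative_peak(ratio: float) -> float:
    """Compiled peak memory as a multiple of eager peak, given a ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# AllenaiLongformerBase / inductor reports 0.4696 in the table above,
# which under this definition means the compiled peak is about 2.13x eager.
print(f"{relative_peak(0.4696):.2f}x eager peak")
```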

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|     MobileBertForQuestionAnswering      | 128 | 182.5226 |  235.974  | 215.1751 |        216.4484        |
|            AlbertForMaskedLM            |  4  | 266.3614 | 300.7619  | 165.8751 |        165.7583        |
|       AlbertForQuestionAnswering        |  4  | 264.1258 | 298.0116  | 163.5808 |        163.5286        |
|            XLNetLMHeadModel             |  8  | 281.0762 | 287.9997  | 151.6205 |        151.4111        |
|     PegasusForConditionalGeneration     | 32  | 147.1145 | 181.5002  | 107.5555 |        108.9964        |
|          AllenaiLongformerBase          |  4  | 192.8537 | 273.1966  | 103.1999 |        102.9686        |
|            TrOCRForCausalLM             | 32  | 139.0841 | 143.3156  |  99.98   |        100.4592        |
|          MobileBertForMaskedLM          | 64  | 186.7725 | 241.7768  | 95.3639  |        95.5129         |
|      MBartForConditionalGeneration      |  2  | 145.0382 | 157.2691  | 90.9443  |        89.4301         |
|      BartForConditionalGeneration       |  2  | 139.8443 | 144.7792  | 89.1163  |        91.0751         |
|    MegatronBertForQuestionAnswering     |  8  | 144.707  | 147.5001  | 87.5361  |        87.4944         |
|            YituTechConvBert             | 16  | 127.1205 | 132.4614  | 77.8216  |        77.8727         |
| BlenderbotSmallForConditionalGeneration | 64  | 113.4108 | 133.5939  | 76.6448  |        76.8754         |
|                CamemBert                | 16  | 119.8878 | 123.1316  | 73.1863  |        73.2015         |
|     M2M100ForConditionalGeneration      | 16  | 116.6824 | 180.4988  |   72.1   |        71.9561         |
|     DistilBertForQuestionAnswering      | 256 | 103.9112 | 104.6181  | 71.5622  |        71.5815         |
|           LayoutLMForMaskedLM           | 16  | 114.3058 | 116.9869  | 71.1742  |        70.8273         |
|            MBartForCausalLM             |  4  | 114.2678 | 118.2349  | 69.3464  |        69.4083         |
|          DistilBertForMaskedLM          | 128 | 85.2726  |  89.0205  | 69.3028  |        69.3028         |
|             BartForCausalLM             |  4  | 114.0892 | 117.8833  | 69.2815  |        69.5855         |
|     PLBartForConditionalGeneration      |  4  | 117.413  | 126.8599  | 69.2785  |         69.06          |
|           RobertaForCausalLM            | 16  | 116.517  | 119.6429  | 69.1365  |        68.8249         |
|             BertForMaskedLM             | 16  | 111.6765 | 114.3151  | 68.8883  |        68.9191         |
|             OPTForCausalLM              |  2  | 172.5161 | 180.0393  | 68.2571  |        68.1263         |
|                 T5Small                 |  4  | 106.202  | 124.1665  | 63.0574  |        63.1405         |
|       T5ForConditionalGeneration        |  4  | 106.3016 | 123.0624  | 62.9952  |        63.0701         |
|            PLBartForCausalLM            |  8  | 115.1702 |  117.927  | 62.1232  |        62.1436         |
|         MegatronBertForCausalLM         |  4  | 88.8001  |  94.8498  | 57.5259  |        57.7304         |
|               DistillGPT2               | 16  | 106.873  | 110.3903  | 55.9704  |         55.96          |
|       ElectraForQuestionAnswering       | 64  | 116.1725 | 117.0499  |  53.847  |        53.8888         |
|        BertForQuestionAnswering         | 16  |  96.648  |  97.9249  | 53.7727  |        53.7566         |
|       RobertaForQuestionAnswering       | 16  | 96.9622  |  99.8688  |  53.749  |         53.702         |
|           PegasusForCausalLM            | 32  | 74.0405  |  84.9498  | 53.2941  |        53.2093         |
|             XGLMForCausalLM             |  8  | 94.4726  | 143.9937  | 52.5737  |        52.7642         |
|           ElectraForCausalLM            | 32  | 89.7358  |  94.1721  | 49.1765  |        49.2004         |
|       MT5ForConditionalGeneration       | 16  | 94.1218  | 111.5718  | 43.6003  |        43.7569         |
|       BlenderbotSmallForCausalLM        | 64  | 58.6737  |  65.1917  | 42.0874  |        42.3345         |
|      GPT2ForSequenceClassification      |  4  | 93.2112  |  95.5255  | 39.7861  |        39.8909         |
|         Speech2Text2ForCausalLM         | 256 | 53.5922  |  58.1395  | 35.8121  |        35.9591         |
|          DebertaV2ForMaskedLM           |  1  | 123.9398 | 192.0318  |   nan    |          nan           |
|      DebertaV2ForQuestionAnswering      |  2  | 127.6671 | 191.7707  |   nan    |          nan           |
|          BlenderbotForCausalLM          |  4  | 104.6523 | 154.9237  |   nan    |          nan           |
|           DebertaForMaskedLM            |  4  | 70.4682  | 105.8811  |   nan    |          nan           |
|    LayoutLMForSequenceClassification    | 16  | 99.2054  | 100.6683  |   nan    |          nan           |
|       DebertaForQuestionAnswering       |  8  | 80.1068  |  98.6121  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9976 |  0.9965   |  3.3016  |         3.3034         |
|      xcit_large_24_p8_224       |  5  | 0.9917 |  0.8684   |  2.4064  |         2.0561         |
|        twins_pcpvt_base         | 64  | 0.9973 |  0.9068   |  2.0944  |         2.0897         |
|         coat_lite_mini          | 128 | 0.9973 |  0.9954   |  2.0584  |         2.0591         |
|          gmixer_24_224          | 128 | 0.9953 |  0.8894   |  1.8678  |         1.8633         |
|         crossvit_9_240          | 128 | 0.9932 |  0.7829   |  1.7906  |         1.7845         |
|          ghostnet_100           | 128 | 0.9922 |  0.7644   |  1.7815  |         1.7813         |
|           volo_d1_224           | 64  | 0.9944 |  0.9734   |  1.7241  |         1.7279         |
|          gmlp_s16_224           | 128 | 0.9942 |  1.0826   |  1.7207  |         1.7198         |
|  swin_base_patch4_window7_224   | 64  | 0.9913 |  0.9525   |  1.7048  |         1.7109         |
|           convit_base           | 64  | 0.9981 |  0.9968   |  1.6208  |         1.6213         |
|            pit_b_224            | 64  | 0.995  |  0.9924   |  1.6034  |         1.6029         |
|            lcnet_050            | 128 | 0.9417 |  0.7353   |  1.5874  |         1.5867         |
|          jx_nest_base           | 32  | 0.987  |  0.9858   |  1.5438  |         1.5463         |
|       gluon_inception_v3        | 128 | 0.9962 |  0.8648   |  1.5201  |         1.5191         |
|        adv_inception_v3         | 128 | 0.9969 |  0.8593   |  1.5082  |         1.5107         |
|          inception_v3           | 128 | 0.9961 |  0.8633   |  1.5067  |         1.5071         |
|          convnext_base          | 64  | 0.9837 |   0.984   |  1.4956  |         1.4988         |
|        sebotnet33ts_256         | 64  | 0.9576 |  0.7538   |  1.473   |         1.4567         |
|             dla102              | 128 | 0.9956 |  0.8149   |  1.4697  |         1.469          |
|            nfnet_l0             | 128 | 0.9897 |  0.8134   |  1.4515  |         1.4418         |
|           mobilevit_s           | 64  | 0.9623 |  0.7315   |  1.4473  |         1.4482         |
|      beit_base_patch16_224      | 64  | 0.997  |  0.9664   |  1.4441  |         1.4451         |
|          cait_m36_384           |  4  | 0.9948 |  0.9459   |  1.4405  |         1.4398         |
|           dm_nfnet_f0           | 128 | 0.9865 |  0.9845   |  1.413   |         1.4148         |
|          resmlp_12_224          | 128 | 0.9927 |   0.889   |  1.3963  |         1.3978         |
|       eca_botnext26ts_256       | 128 | 0.9725 |  0.7194   |  1.3919  |         1.4062         |
|          botnet26t_256          | 128 | 0.9739 |  0.8514   |  1.3841  |         1.3863         |
|           mnasnet_100           | 128 | 0.9488 |  0.7405   |  1.3742  |         1.3719         |
|           resnest101e           | 64  | 0.9944 |  0.8672   |  1.3659  |         1.3658         |
|          mixer_b16_224          | 128 | 0.9974 |  1.0175   |  1.3612  |         1.3615         |
|           selecsls42b           | 128 | 0.9992 |  0.8114   |  1.355   |         1.3549         |
|      mobilenetv3_large_100      | 128 |  0.95  |  0.7594   |  1.3501  |         1.3488         |
|         mobilenetv2_100         | 128 | 0.9494 |  0.7369   |  1.3493  |         1.3468         |
|           regnety_002           | 128 | 0.9535 |  0.7166   |  1.3462  |         1.4305         |
|      vit_base_patch16_224       | 64  | 0.9961 |  0.9933   |  1.3365  |         1.3361         |
|        res2net50_14w_8s         | 128 | 0.999  |  0.7891   |  1.336   |         1.3354         |
|            hrnet_w18            | 128 | 0.9921 |  0.6354   |  1.3291  |         1.3244         |
|           fbnetc_100            | 128 | 0.9499 |  0.7389   |  1.3239  |         1.3223         |
|           res2next50            | 128 | 0.9989 |  0.8238   |  1.3155  |         1.3161         |
| deit_base_distilled_patch16_224 | 64  | 0.9963 |  0.9935   |  1.3137  |         1.3151         |
|          spnasnet_100           | 128 | 0.9424 |  0.7387   |  1.3006  |         1.3052         |
|       tf_efficientnet_b0        | 128 | 0.9604 |  0.6813   |  1.293   |         1.2928         |
|         poolformer_m36          | 64  | 0.9865 |  0.9826   |  1.2762  |         1.276          |
|           rexnet_100            | 128 | 0.9522 |  0.7028   |  1.2496  |         1.2498         |
|        ese_vovnet19b_dw         | 128 | 0.9579 |  0.8336   |  1.2493  |         1.2474         |
|            fbnetv3_b            | 128 | 0.9498 |  0.7687   |  1.2442  |         1.2436         |
|         visformer_small         | 128 | 0.9963 |  0.9448   |  1.1907  |         1.1918         |
|            tinynet_a            | 128 | 0.9466 |  0.6782   |  1.1811  |         1.1585         |
|           tf_mixnet_l           | 128 | 0.9764 |  0.8271   |  1.1666  |         1.1664         |
|            mixnet_l             | 128 | 0.9762 |  0.8209   |  1.1561  |         1.1552         |
|          cspdarknet53           | 64  | 0.9329 |  0.7856   |  1.151   |         1.1503         |
|        res2net101_26w_4s        | 64  | 1.0005 |  0.7889   |  1.1295  |         1.1286         |
|             dpn107              | 32  | 0.9307 |  0.8074   |  1.0747  |         1.0746         |
|        gluon_xception65         | 32  | 0.9924 |  0.8435   |  1.0652  |         1.064          |
|     swsl_resnext101_32x16d      | 32  | 0.9977 |  0.8395   |  1.0436  |         1.0433         |
|            gernet_l             | 128 | 0.9361 |  0.7923   |  1.0214  |         1.0219         |
|            repvgg_a2            | 128 | 0.9362 |  0.7555   |  1.0167  |         1.037          |
|        convmixer_768_32         | 32  | 0.9986 |  0.9635   |  0.996   |         0.9959         |
|          pnasnet5large          | 16  | 0.9858 |   0.912   |  0.9045  |         0.8892         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
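The suite-level "Geometric mean speedup" figures summarize per-model speedups like the ones in this table. Below is a minimal sketch of that aggregation. It assumes each speedup is a positive ratio of baseline latency to backend latency. The helper name `geometric_mean_speedup` is hypothetical and is not the benchmark harness's API.

```python
import math


def geometric_mean_speedup(speedups):
    """Return the geometric mean of positive per-model speedup ratios."""
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    # Averaging in log space makes ratios symmetric: a 2x win and a 0.5x loss cancel out.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Inductor speedups for three models, copied from the table above.
sample = {"tnt_s_patch16_224": 3.3016, "convmixer_768_32": 0.996, "pnasnet5large": 0.9045}
print(f"{geometric_mean_speedup(list(sample.values())):.2f}x")
```

A geometric mean is the usual choice for averaging ratios, because an arithmetic mean would let one large speedup outweigh several slowdowns.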

Accuracy

+---------------------------------+----+---------------+---------------+---------------+------------------------+
|              name               | bs |     eager     |   aot_eager   |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+---------------+---------------+---------------+------------------------+
|        adv_inception_v3         | 8  |     pass      |     pass      |     pass      |          pass          |
|           selecsls42b           | 8  |     pass      |     pass      |     pass      |          pass          |
|      beit_base_patch16_224      | 8  |     pass      |     pass      |     pass      |          pass          |
|        tnt_s_patch16_224        | 8  |     pass      |     pass      |     pass      |          pass          |
|         visformer_small         | 8  |     pass      |     pass      |     pass      |          pass          |
|      vit_base_patch16_224       | 8  |     pass      |     pass      |     pass      |          pass          |
|           volo_d1_224           | 8  |     pass      |     pass      |     pass      |          pass          |
|          botnet26t_256          | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|          cspdarknet53           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|             dpn107              | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|        ese_vovnet19b_dw         | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           fbnetc_100            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|            mixnet_l             | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           mnasnet_100           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           mobilevit_s           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           regnety_002           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|            repvgg_a2            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           rexnet_100            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|          spnasnet_100           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|       tf_efficientnet_b0        | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|           tf_mixnet_l           | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|            tinynet_a            | 8  | fail_accuracy |     pass      |     pass      |          pass          |
|       eca_botnext26ts_256       | 8  | fail_accuracy | fail_accuracy |     pass      |          pass          |
|            gernet_l             | 8  | fail_accuracy | fail_accuracy |     pass      |          pass          |
|         mobilenetv2_100         | 8  | fail_accuracy | fail_accuracy |     pass      |          pass          |
|        gluon_xception65         | 8  |     pass      |     pass      |     pass      |     fail_accuracy      |
|        sebotnet33ts_256         | 8  |     pass      |     pass      |     pass      |     fail_accuracy      |
|  swin_base_patch4_window7_224   | 8  |     pass      |     pass      |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 8  |     pass      |     pass      |     pass      |          pass          |
|           resnest101e           | 8  |     pass      |     pass      |     pass      |          pass          |
|            hrnet_w18            | 8  |     pass      |     pass      |     pass      |          pass          |
|          cait_m36_384           | 4  |     pass      |     pass      |     pass      |          pass          |
|           convit_base           | 8  |     pass      |     pass      |     pass      |          pass          |
|        convmixer_768_32         | 8  |     pass      |     pass      |     pass      |          pass          |
|          convnext_base          | 8  |     pass      |     pass      |     pass      |          pass          |
|         crossvit_9_240          | 8  |     pass      |     pass      |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 8  |     pass      |     pass      |     pass      |          pass          |
|           dm_nfnet_f0           | 8  |     pass      |     pass      |     pass      |          pass          |
|          ghostnet_100           | 8  |     pass      |     pass      |     pass      |          pass          |
|       gluon_inception_v3        | 8  |     pass      |     pass      |     pass      |          pass          |
|          gmixer_24_224          | 8  |     pass      |     pass      |     pass      |          pass          |
|          resmlp_12_224          | 8  |     pass      |     pass      |     pass      |          pass          |
|          gmlp_s16_224           | 8  |     pass      |     pass      |     pass      |          pass          |
|          inception_v3           | 8  |     pass      |     pass      |     pass      |          pass          |
|          pnasnet5large          | 8  |     pass      |     pass      |     pass      |          pass          |
|           res2next50            | 8  |     pass      |     pass      |     pass      |          pass          |
|        res2net50_14w_8s         | 8  |     pass      |     pass      |     pass      |          pass          |
|        res2net101_26w_4s        | 8  |     pass      |     pass      |     pass      |          pass          |
|          jx_nest_base           | 8  |     pass      |     pass      |     pass      |          pass          |
|         poolformer_m36          | 8  |     pass      |     pass      |     pass      |          pass          |
|            pit_b_224            | 8  |     pass      |     pass      |     pass      |          pass          |
|            nfnet_l0             | 8  |     pass      |     pass      |     pass      |          pass          |
|      mobilenetv3_large_100      | 8  |     pass      |     pass      |     pass      |          pass          |
|          mixer_b16_224          | 8  |     pass      |     pass      |     pass      |          pass          |
|            lcnet_050            | 8  |     pass      |     pass      |     pass      |          pass          |
|        twins_pcpvt_base         | 8  |     pass      |     pass      |     pass      |         0.0000         |
|             dla102              | 8  |     pass      |     pass      | fail_accuracy |          pass          |
|      xcit_large_24_p8_224       | 8  |     pass      | fail_accuracy | fail_accuracy |     fail_accuracy      |
|            fbnetv3_b            | 8  | fail_accuracy | fail_accuracy | fail_accuracy |     fail_accuracy      |
|         coat_lite_mini          | 8  |     pass      |     pass      |    0.0000     |          pass          |
+---------------------------------+----+---------------+---------------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+-----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+-----------+------------------------+
|        twins_pcpvt_base         | 64  | 10.9058 |  24.2534  | 1549.1182 |       1549.3968        |
|           mobilevit_s           | 64  | 5.4637  |  11.1892  | 1260.1864 |       1250.1549        |
|         coat_lite_mini          | 128 | 3.2474  |  7.9128   | 1256.0368 |       1261.5438        |
|         crossvit_9_240          | 128 | 5.9921  |  13.1413  | 1181.8398 |       1163.6234        |
|  swin_base_patch4_window7_224   | 64  | 8.6916  |  20.0535  | 1144.8752 |       1145.9039        |
|           volo_d1_224           | 64  | 4.9355  |  12.331   | 959.9922  |        972.2927        |
|            pit_b_224            | 64  | 3.3643  |  8.2352   | 939.9928  |        944.8402        |
|      xcit_large_24_p8_224       |  5  | 12.3087 |  27.5101  | 919.4447  |        930.1613        |
|          jx_nest_base           | 32  | 6.8481  |  14.5014  |  909.285  |        909.8677        |
|          cait_m36_384           |  4  | 13.3383 |  32.1735  | 877.7569  |        889.0475        |
|        sebotnet33ts_256         | 64  | 4.2668  |  8.7328   | 723.3903  |        715.0022        |
|        tnt_s_patch16_224        | 128 | 6.3291  |  15.7476  | 639.1069  |        644.4422        |
|           convit_base           | 64  | 3.6151  |  9.0698   | 606.4733  |        607.7017        |
|       eca_botnext26ts_256       | 128 | 3.0188  |  6.7253   | 576.4692  |        577.7936        |
|          ghostnet_100           | 128 | 7.6126  |  14.4555  | 574.4232  |        585.6942        |
|           rexnet_100            | 128 | 5.4521  |  10.9926  | 574.0523  |        579.1604        |
|          botnet26t_256          | 128 | 2.8889  |  5.9092   | 564.6397  |        558.1972        |
|            hrnet_w18            | 128 | 8.7491  |  34.9528  | 517.5075  |        521.5671        |
|         visformer_small         | 128 | 2.5587  |  5.9233   | 451.9125  |        458.0089        |
|          convnext_base          | 64  | 6.9519  |  12.5288  | 441.4118  |        458.4532        |
|            fbnetv3_b            | 128 | 8.1352  |  17.7019  | 407.5326  |        401.4525        |
|        res2net50_14w_8s         | 128 | 8.7918  |  21.7777  | 405.0288  |        402.7811        |
|            tinynet_a            | 128 | 5.8002  |  12.6407  | 379.1793  |        385.4118        |
|       tf_efficientnet_b0        | 128 | 4.9927  |  10.2662  | 373.0437  |        376.6531        |
|        adv_inception_v3         | 128 | 5.5206  |  12.2555  |  371.087  |        371.0584        |
|       gluon_inception_v3        | 128 | 5.5473  |  12.2703  | 367.0138  |        368.7707        |
|      mobilenetv3_large_100      | 128 | 4.1129  |  8.3298   | 366.7194  |        368.975         |
|          inception_v3           | 128 | 5.7569  |  13.1324  | 365.8731  |        372.7785        |
|           tf_mixnet_l           | 128 | 8.7849  |  16.5201  | 364.5554  |        355.9007        |
|          pnasnet5large          | 16  | 7.7177  |  25.259   | 361.0583  |        357.5933        |
|            mixnet_l             | 128 | 8.1606  |  16.0154  | 357.2374  |        360.5247        |
|          spnasnet_100           | 128 | 4.8912  |  9.0884   |  356.122  |        360.5331        |
|           fbnetc_100            | 128 | 4.9076  |  9.2235   | 354.4488  |        358.268         |
|        res2net101_26w_4s        | 64  | 10.3673 |  24.4714  | 347.2845  |        352.7768        |
| deit_base_distilled_patch16_224 | 64  | 3.2818  |   6.96    |  333.189  |        330.9312        |
|      vit_base_patch16_224       | 64  | 3.0129  |  6.9671   | 331.7911  |        330.6702        |
|         mobilenetv2_100         | 128 | 4.1163  |  7.7334   | 327.4934  |        325.4134        |
|           resnest101e           | 64  | 10.6575 |  23.9364  |  322.625  |        328.1115        |
|      beit_base_patch16_224      | 64  | 4.0731  |   9.155   | 318.5332  |        325.0581        |
|           mnasnet_100           | 128 | 3.9122  |  7.4883   | 316.9831  |        323.0384        |
|          gmixer_24_224          | 128 | 5.5918  |  12.7248  | 289.7795  |        296.1159        |
|         poolformer_m36          | 64  | 7.4032  |  13.5467  | 278.4865  |        277.6924        |
|             dpn107              | 32  | 9.3986  |  19.044   | 277.8568  |        280.9468        |
|           res2next50            | 128 | 4.9122  |  11.8557  | 275.9085  |        277.7791        |
|          cspdarknet53           | 64  | 5.6079  |  10.569   | 272.0032  |        273.5491        |
|           selecsls42b           | 128 | 2.4384  |   5.358   | 259.2803  |        260.0981        |
|           regnety_002           | 128 | 4.7881  |  9.2158   | 258.7406  |        252.8447        |
|          gmlp_s16_224           | 128 | 5.4311  |  11.8688  | 257.6034  |        258.6444        |
|          resmlp_12_224          | 128 | 2.7509  |  5.3595   | 249.7601  |        252.793         |
|          mixer_b16_224          | 128 | 2.6356  |  5.8652   | 249.4048  |        245.5265        |
|        gluon_xception65         | 32  | 7.5463  |  16.5946  | 239.5066  |        234.587         |
|            lcnet_050            | 128 | 2.4614  |  4.9731   | 229.6674  |        221.3622        |
|        ese_vovnet19b_dw         | 128 | 2.4879  |  4.4711   | 223.3347  |        223.677         |
|           dm_nfnet_f0           | 128 | 5.8579  |  11.242   | 219.4157  |        222.9388        |
|            gernet_l             | 128 | 4.8269  |  8.7057   |  216.739  |        216.6382        |
|             dla102              | 128 | 5.9674  |  13.9042  | 192.1844  |        191.9956        |
|            nfnet_l0             | 128 | 5.2141  |  10.8368  | 189.6519  |        186.6258        |
|     swsl_resnext101_32x16d      | 32  | 5.9419  |  13.3517  |  187.444  |        191.9566        |
|            repvgg_a2            | 128 | 4.6791  |  8.5646   | 176.7575  |        178.8257        |
|        convmixer_768_32         | 32  | 1.6543  |  6.9368   | 101.2559  |        99.4474         |
+---------------------------------+-----+---------+-----------+-----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 1.0015 |  0.9787   |  1.1839  |         1.1839         |
|          pnasnet5large          | 16  | 1.0593 |  0.9927   |  1.1539  |         1.1525         |
|          gmixer_24_224          | 128 | 1.0014 |  0.9787   |  1.1127  |         1.1127         |
|           convit_base           | 64  |  1.0   |  0.8505   |  1.0948  |         1.0948         |
|         mobilenetv2_100         | 128 | 0.9996 |  0.7725   |  1.0266  |         1.0266         |
|           dm_nfnet_f0           | 128 | 0.9808 |  0.9006   |  1.0129  |         1.0129         |
|          resmlp_12_224          | 128 | 0.9999 |  0.9667   |  1.0097  |         1.0097         |
|            tinynet_a            | 128 | 0.9998 |  0.7975   |  0.9985  |         0.9985         |
|           resnest101e           | 64  | 0.9998 |  1.0033   |  0.9933  |         0.9933         |
|       tf_efficientnet_b0        | 128 | 0.9992 |  0.7813   |  0.9873  |         0.9873         |
|        tnt_s_patch16_224        | 128 |  1.0   |  0.9781   |  0.9834  |         0.9834         |
|           rexnet_100            | 128 |  1.0   |  0.7935   |  0.9745  |         0.9745         |
|        twins_pcpvt_base         | 64  | 0.9995 |  0.9273   |  0.9727  |         0.9727         |
|        convmixer_768_32         | 32  |  1.0   |  0.9812   |  0.9657  |         0.9657         |
|             dla102              | 128 | 0.9708 |  0.9218   |  0.9535  |         0.9535         |
|          mixer_b16_224          | 128 |  1.0   |  0.9644   |  0.9438  |         0.9438         |
|           tf_mixnet_l           | 128 | 0.9995 |  0.8647   |  0.9345  |         0.9345         |
|      beit_base_patch16_224      | 64  | 0.9999 |  0.9344   |  0.9306  |         0.9306         |
|           mobilevit_s           | 64  | 0.9998 |  0.7836   |  0.9262  |         0.9262         |
|         visformer_small         | 128 | 1.0005 |  0.9328   |  0.9245  |         0.9245         |
|            fbnetv3_b            | 128 | 0.9989 |  0.8019   |  0.9167  |         0.9167         |
|            nfnet_l0             | 128 | 1.0005 |  0.8489   |  0.9101  |         0.9101         |
|          cspdarknet53           | 64  | 0.9996 |   0.86    |  0.9098  |         0.9098         |
|      vit_base_patch16_224       | 64  | 1.0001 |   0.936   |  0.9078  |         0.9078         |
| deit_base_distilled_patch16_224 | 64  | 0.9995 |  0.9358   |  0.9071  |         0.9071         |
|           volo_d1_224           | 64  | 1.001  |  0.9514   |  0.9067  |         0.9067         |
|        ese_vovnet19b_dw         | 128 | 0.9986 |  0.9082   |  0.8975  |         0.8975         |
|        sebotnet33ts_256         | 64  | 0.9957 |  0.7151   |  0.891   |         0.8908         |
|       gluon_inception_v3        | 128 |  1.0   |  0.8752   |  0.8902  |         0.8902         |
|          inception_v3           | 128 |  1.0   |  0.8752   |  0.8902  |         0.8902         |
|        adv_inception_v3         | 128 |  1.0   |  0.8752   |  0.8902  |         0.8902         |
|            hrnet_w18            | 128 | 0.9999 |  0.9269   |  0.8872  |         0.8872         |
|        gluon_xception65         | 32  | 0.9998 |  0.8877   |  0.8832  |         0.8832         |
|          spnasnet_100           | 128 | 0.9992 |  0.8982   |  0.8786  |         0.8786         |
|      xcit_large_24_p8_224       |  5  | 0.9989 |  0.8874   |  0.8761  |         0.8761         |
|       eca_botnext26ts_256       | 128 | 0.9995 |  0.7791   |  0.8738  |         0.8738         |
|            mixnet_l             | 128 | 0.9997 |  0.8539   |  0.8686  |         0.8686         |
|             dpn107              | 32  | 0.9932 |  0.9066   |  0.8685  |         0.8685         |
|           mnasnet_100           | 128 | 0.9992 |  0.8897   |  0.8683  |         0.8683         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8637  |         0.8637         |
|         poolformer_m36          | 64  | 1.0014 |  0.9514   |  0.8598  |         0.8598         |
|           fbnetc_100            | 128 | 0.9989 |  0.8651   |  0.8596  |         0.8596         |
|            pit_b_224            | 64  | 1.0005 |  0.8033   |  0.8566  |         0.8566         |
|        res2net101_26w_4s        | 64  | 1.0002 |  0.9186   |  0.8505  |         0.8505         |
|        res2net50_14w_8s         | 128 | 1.0002 |  0.9151   |  0.8497  |         0.8494         |
|            gernet_l             | 128 | 0.9989 |  0.8652   |  0.8495  |         0.8496         |
|     swsl_resnext101_32x16d      | 32  | 1.0002 |  0.8706   |  0.8477  |         0.8477         |
|           selecsls42b           | 128 | 1.0006 |  0.8947   |  0.8471  |         0.8472         |
|           res2next50            | 128 | 1.0003 |   0.918   |  0.8452  |         0.8452         |
|          ghostnet_100           | 128 | 0.9983 |  0.8894   |  0.8416  |         0.8416         |
|      mobilenetv3_large_100      | 128 | 0.9993 |  0.8597   |  0.8413  |         0.8413         |
|         coat_lite_mini          | 128 | 1.0445 |   0.929   |  0.8401  |         0.8401         |
|          convnext_base          | 64  | 1.0052 |  0.9275   |  0.832   |         0.832          |
|          botnet26t_256          | 128 | 0.9994 |  0.8791   |  0.824   |         0.824          |
|            lcnet_050            | 128 | 0.9982 |  0.8057   |  0.8048  |         0.8048         |
|            repvgg_a2            | 128 | 0.9997 |  0.7933   |  0.7738  |         0.7738         |
|           regnety_002           | 128 | 0.9992 |  0.8629   |   0.76   |          0.76          |
|         crossvit_9_240          | 128 | 0.999  |  0.8819   |  0.7525  |         0.7526         |
|  swin_base_patch4_window7_224   | 64  | 1.001  |  0.9237   |  0.7214  |         0.7214         |
|          jx_nest_base           | 32  | 1.0006 |  0.8943   |  0.6693  |         0.6693         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
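This sketch assumes the compression ratio is baseline peak memory divided by backend peak memory. Under that assumption, a value above 1.0 means the backend reached a lower peak than the baseline, and a value below 1.0 means it used more. In PyTorch, per-device peak allocation can be read with `torch.cuda.max_memory_allocated()` after calling `torch.cuda.reset_peak_memory_stats()`. Whether the dashboard uses exactly these calls is not stated here. The function names below are hypothetical, and the 1% "no change" band is an arbitrary choice for illustration.

```python
def peak_memory_compression_ratio(baseline_peak_bytes: int, backend_peak_bytes: int) -> float:
    """Baseline peak memory over backend peak memory; above 1.0 means the backend used less."""
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return baseline_peak_bytes / backend_peak_bytes


def describe(ratio: float) -> str:
    # Treat ratios within 1% of 1.0 as unchanged to absorb measurement noise (arbitrary threshold).
    if abs(ratio - 1.0) < 0.01:
        return "about the same"
    return "less memory" if ratio > 1.0 else "more memory"


# jx_nest_base under inductor (0.6693 in the table above) would read as "more memory".
print(describe(0.6693))
```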

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 301.0462 | 312.2161  | 302.1458 |        301.8995        |
|          pnasnet5large          | 16  | 198.8496 | 214.7347  | 218.3926 |        221.4154        |
|            hrnet_w18            | 128 | 281.0378 | 439.5517  | 210.4836 |        210.9151        |
|           tf_mixnet_l           | 128 | 194.1237 | 229.2241  | 162.5183 |        162.555         |
|            mixnet_l             | 128 | 185.6321 |  220.698  | 156.7757 |        156.7845        |
|           resnest101e           | 64  | 165.7725 | 189.7388  | 120.6507 |        120.6243        |
|             dla102              | 128 | 172.5634 |  210.823  | 116.9649 |        116.9901        |
|          cait_m36_384           |  4  | 168.4862 | 181.6364  | 116.1641 |        115.9536        |
|         poolformer_m36          | 64  | 146.9774 | 147.4164  | 113.6654 |        113.617         |
|     swsl_resnext101_32x16d      | 32  | 118.7176 | 141.2897  | 113.5715 |        113.5552        |
|        adv_inception_v3         | 128 | 159.508  | 185.1819  | 105.6749 |        105.1826        |
|          inception_v3           | 128 | 159.1964 | 183.9483  | 105.4387 |        105.2299        |
|        res2net50_14w_8s         | 128 | 140.8834 | 178.1801  | 105.2017 |        105.3541        |
|       gluon_inception_v3        | 128 | 160.4461 | 184.7298  | 105.1827 |        105.2334        |
|           convit_base           | 64  | 163.2989 | 163.4412  | 100.5729 |        100.5841        |
|             dpn107              | 32  | 114.1878 | 131.4156  | 98.7964  |        98.7699         |
|        tnt_s_patch16_224        | 128 | 323.9905 | 324.3737  | 98.0196  |        97.8647         |
|           res2next50            | 128 | 125.7301 | 152.3633  | 95.4602  |        95.5511         |
|        gluon_xception65         | 32  | 99.7088  | 117.1326  | 92.8894  |        93.1743         |
|           dm_nfnet_f0           | 128 | 128.8182 | 128.9241  | 89.5698  |        89.5179         |
|            fbnetv3_b            | 128 | 115.195  | 142.6675  | 87.9638  |        88.1022         |
|        res2net101_26w_4s        | 64  | 99.1765  | 125.6375  | 87.4515  |        87.4971         |
|          mixer_b16_224          | 128 | 116.789  | 114.4068  | 86.0718  |        86.0705         |
|  swin_base_patch4_window7_224   | 64  | 147.7465 | 153.5656  | 85.6142  |        85.4956         |
|          convnext_base          | 64  | 124.5935 | 124.2454  | 81.9299  |        81.6359         |
|          gmlp_s16_224           | 128 | 137.7481 | 126.3972  | 79.6968  |        79.8003         |
|            nfnet_l0             | 128 | 112.5273 | 136.7954  | 77.4258  |        77.4401         |
|          cspdarknet53           | 64  | 94.8595  | 112.7862  | 77.0728  |        77.0337         |
|         visformer_small         | 128 | 91.3297  |  96.3689  | 76.4668  |        76.4226         |
|       eca_botnext26ts_256       | 128 | 108.8211 |  147.228  | 76.2004  |        75.4198         |
|            pit_b_224            | 64  | 118.8002 | 119.1475  | 73.7353  |         73.683         |
|          botnet26t_256          | 128 | 101.8523 | 116.4763  | 71.6663  |        71.6238         |
|            repvgg_a2            | 128 | 77.6633  |  96.2935  | 71.6151  |        70.1499         |
|            gernet_l             | 128 | 77.6879  |  91.8388  | 71.1741  |        71.2745         |
|      beit_base_patch16_224      | 64  | 101.6785 |  104.701  | 70.0992  |        70.0879         |
|           volo_d1_224           | 64  |  120.95  | 123.7946  | 69.8023  |         69.816         |
|      vit_base_patch16_224       | 64  | 87.0411  |  87.2011  |  64.934  |         64.922         |
|          jx_nest_base           | 32  | 101.7517 | 101.6071  | 64.8841  |        64.9252         |
| deit_base_distilled_patch16_224 | 64  | 85.2421  |  85.144   | 64.4973  |        64.4607         |
|          gmixer_24_224          | 128 | 117.897  | 132.1535  |  63.091  |        63.2013         |
|       tf_efficientnet_b0        | 128 | 84.7414  | 119.5686  | 63.0366  |        63.0261         |
|           rexnet_100            | 128 | 79.9465  | 108.5101  |  60.996  |        60.9486         |
|      xcit_large_24_p8_224       |  5  | 124.4075 | 141.7627  | 60.3619  |        60.1882         |
|           fbnetc_100            | 128 | 82.8684  | 106.4635  | 59.4945  |        59.4972         |
|            tinynet_a            | 128 | 73.5495  | 102.8014  | 59.0413  |        60.2127         |
|           mobilevit_s           | 64  | 84.9274  | 111.1643  | 56.2777  |        56.1736         |
|        twins_pcpvt_base         | 64  | 131.6094 | 140.7932  | 56.0408  |        56.1467         |
|         coat_lite_mini          | 128 | 113.1211 | 113.2933  | 54.7608  |        54.7565         |
|        sebotnet33ts_256         | 64  |  80.684  | 102.4806  | 52.3276  |        53.1544         |
|          spnasnet_100           | 128 | 70.3295  |  89.8098  | 51.0077  |        50.8003         |
|          ghostnet_100           | 128 | 90.9596  |  117.569  | 50.5172  |        50.5421         |
|        ese_vovnet19b_dw         | 128 | 64.5898  |  74.2655  | 49.5358  |        49.6475         |
|         mobilenetv2_100         | 128 | 65.4813  |  84.4073  |  46.111  |        46.2088         |
|         crossvit_9_240          | 128 | 82.8826  | 104.4348  | 45.6622  |        45.9007         |
|           mnasnet_100           | 128 | 64.1628  |  82.2534  | 44.3909  |        44.4532         |
|           selecsls42b           | 128 | 59.9773  |  73.8027  | 44.2711  |        44.2379         |
|      mobilenetv3_large_100      | 128 | 61.3151  |  76.591   | 43.1009  |        43.1787         |
|          resmlp_12_224          | 128 | 53.4639  |  59.7417  | 38.1176  |        38.0715         |
|           regnety_002           | 128 | 41.3086  |  57.0772  | 27.4955  |        27.4312         |
|            lcnet_050            | 128 | 31.6613  |  40.5824  | 18.7968  |        18.8276         |
+---------------------------------+-----+----------+-----------+----------+------------------------+
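Speedup and absolute latency are related by speedup = baseline latency / backend latency, where the baseline is native PyTorch. The `eager` column in this table is measured separately from that baseline; in the speedup table, its speedup sits slightly below 1.0. Dividing the `eager` column by the `inductor` column therefore only approximates the reported speedup.

The sketch below uses the lcnet_050 row. It assumes that the eager speedup equals the baseline latency divided by the eager latency, which lets the native baseline be recovered. The estimate comes out at about 1.586, close to the reported 1.5874. The small gap is presumably because the numbers come from separate timing runs.

```python
def speedup(baseline_ms: float, backend_ms: float) -> float:
    """Speedup of a backend over the baseline; values above 1.0 mean the backend is faster."""
    return baseline_ms / backend_ms


# lcnet_050: absolute latencies (ms) from this table, eager speedup from the speedup table.
eager_ms, inductor_ms = 31.6613, 18.7968
eager_speedup = 0.9417

# Assumption: eager_speedup == baseline_ms / eager_ms, so the native baseline can be recovered.
baseline_ms = eager_ms * eager_speedup
print(f"inductor vs. eager column:      {speedup(eager_ms, inductor_ms):.4f}")
print(f"inductor vs. estimated baseline: {speedup(baseline_ms, inductor_ms):.4f}")
```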

Build Summary


Run name

day_079_20_03_23_performance_amp_778

Commit hashes

pytorch commit: 9423b863f800c6d20b9b3de4422558cbb338fb83
pytorch commit date: 2023-03-23 00:32:51+00:00
torchbench commit: d618fa8e06c13bbe441cc929c5d3bf498d0f369c
torchbench commit date: 2023-03-22 15:27:07-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git9423b86

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Collaborator

williamwen42 commented Mar 29, 2023

Performance Dashboard for amp precision (inference, no max-autotune)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward pass; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.
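The following is a minimal sketch of the speedup normalization described above. It assumes speedup is native PyTorch (eager) latency divided by the backend's latency for the same model and batch size, so values above 1.0 mean the backend is faster. `speedup` is a hypothetical helper, not the benchmark harness's actual function, and the timings are made up.

```python
from typing import Optional


def speedup(eager_ms: float, compiled_ms: Optional[float]) -> float:
    """Hypothetical helper: native PyTorch latency divided by backend latency.

    Values above 1.0 mean the backend is faster than eager PyTorch.
    """
    if compiled_ms is None or compiled_ms <= 0:
        # Assumption: a failed or missing run is reported as 0.0, the value the
        # warnings section below associates with a failed performance test.
        return 0.0
    return eager_ms / compiled_ms


# Hypothetical per-iteration latencies in milliseconds (not measured data).
print(speedup(30.0, 20.0))  # 1.5
print(speedup(30.0, None))  # 0.0
```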

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency and memory footprint reduction, we exclude models that fail accuracy checks.
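A minimal sketch of that exclusion step follows. It assumes that the statuses `pass` and `pass_due_to_skip` (both of which appear in the accuracy tables) count as passing, and that any other status, such as `fail_to_run`, `fail_accuracy` or a numeric value like `0.0000`, removes the model from the performance aggregates. `models_for_perf` is a hypothetical name.

```python
# Statuses treated as passing; this set is an assumption inferred from the
# accuracy tables, not taken from the benchmark harness.
PASSING = {"pass", "pass_due_to_skip"}


def models_for_perf(accuracy: dict) -> list:
    """Return the sorted names of models whose accuracy status counts as passing."""
    return sorted(name for name, status in accuracy.items() if status in PASSING)


# Statuses copied from the accuracy tables in this comment.
accuracy = {
    "resnet50": "pass",
    "hf_T5_large": "pass_due_to_skip",
    "llama": "fail_to_run",
    "cait_m36_384": "fail_accuracy",
    "tacotron2": "0.0000",
}
print(models_for_perf(accuracy))  # ['hf_T5_large', 'resnet50']
```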

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 94%, 59/63 | 100%, 46/46 | 100%, 60/60 |
|       aot_eager        | 90%, 57/63 | 100%, 46/46 | 100%, 60/60 |
|        inductor        | 84%, 53/63 | 100%, 46/46 | 98%, 59/60  |
| inductor_no_cudagraphs | 86%, 54/63 | 100%, 46/46 | 98%, 59/60  |
+------------------------+------------+-------------+-------------+
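Each passrate cell pairs a rounded percentage with the raw count of passing models out of the total. The sketch below reproduces that cell format. `passrate_cell` is a hypothetical helper, and which statuses count toward `passed` is not specified here.

```python
def passrate_cell(passed: int, total: int) -> str:
    """Format a cell like '94%, 59/63': rounded percentage, then the raw counts."""
    return f"{passed / total:.0%}, {passed}/{total}"


print(passrate_cell(59, 63))  # 94%, 59/63
print(passrate_cell(53, 63))  # 84%, 53/63
print(passrate_cell(59, 60))  # 98%, 59/60
```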

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.49x    |    1.37x    |    1.33x    |
| inductor_no_cudagraphs |   1.39x    |    1.33x    |    1.32x    |
+------------------------+------------+-------------+-------------+
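The suite-level numbers above are geometric means of per-model speedups. A minimal sketch of that aggregation follows. It assumes non-positive entries (the 0.0 values that mark failed runs) are dropped before averaging, both because log(0) is undefined and because models that fail accuracy checks are excluded. `geometric_mean_speedup` is a hypothetical name.

```python
import math


def geometric_mean_speedup(speedups: list) -> float:
    """Geometric mean of per-model speedups, skipping non-positive (failed) entries."""
    valid = [s for s in speedups if s > 0]  # assumption: failed runs are excluded
    if not valid:
        return 0.0
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A 2x speedup and a 0.5x slowdown cancel out under the geometric mean.
print(round(geometric_mean_speedup([2.0, 0.5]), 6))  # 1.0
print(round(geometric_mean_speedup([1.5, 1.5, 0.0]), 6))  # 1.5
```

The geometric mean is the usual choice for averaging ratios: the arithmetic mean of 2.0x and 0.5x is 1.25x, which overstates a pair of results that cancel out.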

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   16.76    |    3.72     |    2.65     |
|       aot_eager        |   26.52    |    6.18     |    5.35     |
|        inductor        |   16.49    |    21.29    |    21.32    |
| inductor_no_cudagraphs |   15.67    |    19.13    |    20.99    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.04x    |    1.01x    |    1.17x    |
|       aot_eager        |   1.01x    |    1.01x    |    1.18x    |
|        inductor        |   0.92x    |    1.16x    |    1.09x    |
| inductor_no_cudagraphs |   0.99x    |    1.25x    |    1.16x    |
+------------------------+------------+-------------+-------------+
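Because "higher is better", the compression ratio presumably divides the native PyTorch peak memory by the backend's peak memory, so a value below 1.0 means the backend needed more memory. The sketch below assumes that definition. In PyTorch a per-run peak can be read with `torch.cuda.max_memory_allocated()`, but whether the harness uses that call is an assumption, and the GB figures are made up.

```python
def compression_ratio(baseline_peak_gb: float, backend_peak_gb: float) -> float:
    """Baseline (native PyTorch) peak memory divided by the backend's peak memory.

    Values above 1.0 mean the backend used less memory; below 1.0, more.
    """
    return baseline_peak_gb / backend_peak_gb


# Hypothetical peak allocations in GB (not measured data).
print(compression_ratio(10.0, 8.0))  # 1.25
print(compression_ratio(10.0, 12.5))  # 0.8
```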

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9
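The four thresholds above can be sketched as a single check per model. The field names and the set of passing accuracy statuses are assumptions, not the harness's schema. The example values are taken from the inductor columns of the torchbench tables in this comment, and the resulting flags agree with the warning tables that follow.

```python
from dataclasses import dataclass

PASSING = {"pass", "pass_due_to_skip"}  # assumption, inferred from the accuracy tables


@dataclass
class ModelResult:
    # Hypothetical field names; values mirror the dashboard columns.
    name: str
    accuracy: str
    speedup: float
    compile_seconds: float
    compression_ratio: float


def warnings_for(r: ModelResult) -> list:
    """Apply the four warning thresholds to one model's results."""
    flags = []
    if r.accuracy not in PASSING:
        flags.append("accuracy")
    if r.speedup < 0.95:  # 0.0 usually means the performance run failed
        flags.append("speedup")
    if r.compile_seconds > 120:
        flags.append("compilation_latency")
    if r.compression_ratio < 0.9:
        flags.append("compression_ratio")
    return flags


maskrcnn = ModelResult("vision_maskrcnn", "0.0000", 1.0774, 143.7892, 0.8204)
recommender = ModelResult("nvidia_deeprecommender", "pass", 0.8654, 4.5063, 0.9977)
print(warnings_for(maskrcnn))  # ['accuracy', 'compilation_latency', 'compression_ratio']
print(warnings_for(recommender))  # ['speedup']
```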

Accuracy warnings

+-------------+--------------------------+---------------+
|    suite    |           name           |   inductor    |
+-------------+--------------------------+---------------+
| torchbench  |          llama           |  fail_to_run  |
| torchbench  | detectron2_fcos_r_50_fpn |  fail_to_run  |
| torchbench  |           moco           |  fail_to_run  |
| torchbench  |      DALLE2_pytorch      |  fail_to_run  |
| torchbench  |        hf_BigBird        |    0.0000     |
| torchbench  |        tacotron2         |    0.0000     |
| torchbench  |     vision_maskrcnn      |    0.0000     |
| torchbench  |          demucs          |    0.0000     |
| torchbench  |      torchrec_dlrm       |    0.0000     |
| timm_models |       cait_m36_384       | fail_accuracy |
+-------------+--------------------------+---------------+

Performance speedup warnings

+-------------+-------------------------------+----------+
|    suite    |             name              | inductor |
+-------------+-------------------------------+----------+
| torchbench  |    nvidia_deeprecommender     |  0.8654  |
| torchbench  |           tacotron2           |   0.0    |
| torchbench  |             llama             |   0.0    |
| torchbench  |             moco              |   0.0    |
| torchbench  |   detectron2_fcos_r_50_fpn    |   0.0    |
| torchbench  |        DALLE2_pytorch         |   0.0    |
| torchbench  |         torchrec_dlrm         |   0.0    |
| huggingface | DebertaV2ForQuestionAnswering |  0.921   |
+-------------+-------------------------------+----------+

Compilation latency (sec) warnings

+------------+-----------------+----------+
|   suite    |      name       | inductor |
+------------+-----------------+----------+
| torchbench | vision_maskrcnn | 143.7892 |
+------------+-----------------+----------+

Peak Memory Compression Ratio warnings

+-------------+---------------------------------+----------+
|    suite    |              name               | inductor |
+-------------+---------------------------------+----------+
| torchbench  |             alexnet             |  0.8928  |
| torchbench  |         LearningToPaint         |  0.8634  |
| torchbench  |      doctr_reco_predictor       |  0.8433  |
| torchbench  |         vision_maskrcnn         |  0.8204  |
| torchbench  |          BERT_pytorch           |  0.8033  |
| torchbench  |               drq               |  0.7762  |
| torchbench  |           hf_T5_large           |  0.7346  |
| torchbench  |       shufflenet_v2_x1_0        |  0.7328  |
| torchbench  |         pytorch_stargan         |  0.7292  |
| torchbench  |             demucs              |  0.7055  |
| torchbench  |        soft_actor_critic        |  0.7024  |
| torchbench  |     timm_vision_transformer     |  0.6935  |
| torchbench  |       speech_transformer        |  0.6699  |
| torchbench  |           Super_SloMo           |  0.6691  |
| torchbench  |           densenet121           |  0.6671  |
| torchbench  |          lennard_jones          |  0.5327  |
| torchbench  |  pytorch_CycleGAN_and_pix2pix   |  0.3981  |
| torchbench  |              hf_T5              |  0.359   |
| torchbench  |         phlippe_resnet          |  0.3583  |
| torchbench  |           hf_BigBird            |  0.3278  |
| torchbench  |           hf_T5_base            |  0.3239  |
| torchbench  |           hf_Reformer           |  0.2509  |
| torchbench  |          hf_Longformer          |  0.2374  |
| huggingface | DistilBertForQuestionAnswering  |  0.8871  |
| huggingface |      DebertaV2ForMaskedLM       |  0.793   |
| huggingface |           GoogleFnet            |  0.7194  |
| huggingface |  DebertaV2ForQuestionAnswering  |  0.6539  |
| huggingface |       DebertaForMaskedLM        |  0.4906  |
| huggingface |      AllenaiLongformerBase      |  0.4883  |
| huggingface |   DebertaForQuestionAnswering   |  0.2841  |
| timm_models |        twins_pcpvt_base         |  0.8882  |
| timm_models |  swin_base_patch4_window7_224   |  0.883   |
| timm_models |      xcit_large_24_p8_224       |  0.8765  |
| timm_models |            pit_b_224            |  0.8608  |
| timm_models |          mixer_b16_224          |  0.8569  |
| timm_models |         visformer_small         |  0.8474  |
| timm_models |      beit_base_patch16_224      |  0.8072  |
| timm_models | deit_base_distilled_patch16_224 |  0.7967  |
| timm_models |      vit_base_patch16_224       |  0.7965  |
| timm_models |          jx_nest_base           |  0.7852  |
| timm_models |          resmlp_12_224          |  0.771   |
| timm_models |          gmlp_s16_224           |  0.7311  |
| timm_models |          gmixer_24_224          |  0.667   |
| timm_models |         crossvit_9_240          |  0.586   |
| timm_models |        tnt_s_patch16_224        |  0.4363  |
+-------------+---------------------------------+----------+

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|                drq                |  1   | 0.9944 |  0.9762   |  3.5568  |         1.5196         |
|         soft_actor_critic         | 256  | 0.9433 |  0.9402   |  2.7425  |         1.3304         |
|        shufflenet_v2_x1_0         | 128  | 0.9897 |  1.0318   |  2.4279  |         2.3955         |
|           lennard_jones           | 1000 | 0.8317 |  0.8248   |  2.1716  |         0.8541         |
|             hf_Albert             |  16  | 0.9984 |  0.9968   |  2.0078  |         1.9641         |
|               dlrm                |  1   | 0.9848 |  1.0698   |  1.9589  |         1.1354         |
|         phlippe_densenet          | 128  | 0.9604 |  0.7585   |  1.9379  |         1.6092         |
|            hf_Reformer            |  8   | 0.983  |  0.9829   |  1.8413  |          2.03          |
|            hf_BigBird             |  4   | 0.9836 |  0.9587   |  1.7951  |         1.555          |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9795 |  1.0056   |  1.7835  |         1.6679         |
|            timm_nfnet             | 128  | 0.9628 |  0.9636   |  1.7602  |         1.7364         |
|            hf_T5_large            |  1   | 0.7306 |  0.6522   |  1.7409  |         1.1671         |
|              hf_GPT2              |  16  | 0.9871 |  0.9864   |  1.738   |         1.7322         |
|           timm_resnest            | 256  | 0.9966 |  0.9959   |  1.7028  |         1.6947         |
|           hf_GPT2_large           |  1   | 0.9657 |  0.9247   |  1.6772  |         1.6616         |
|           squeezenet1_1           | 256  | 0.9941 |  0.9902   |  1.6761  |         1.6757         |
|            densenet121            |  64  | 0.9829 |  0.9633   |  1.6597  |         1.6069         |
|        speech_transformer         |  1   | 0.9528 |  0.8302   |  1.6203  |         1.5788         |
|               hf_T5               |  4   | 0.9576 |   0.941   |  1.5934  |         1.695          |
|             resnet50              |  64  | 0.9901 |   0.979   |  1.5348  |         1.5185         |
|           mobilenet_v2            | 128  | 0.9894 |  0.9765   |  1.5342  |         1.5344         |
|        Background_Matting         |  1   | 0.9959 |  0.6317   |  1.5107  |         1.4961         |
|          resnext50_32x4d          |  64  | 0.9929 |  0.9841   |  1.5065  |         1.4864         |
|            mnasnet1_0             | 128  | 0.9847 |  0.9722   |  1.496   |         1.4936         |
|             resnet152             |  64  | 0.9893 |  0.9748   |  1.4779  |         1.4556         |
|            hf_T5_base             |  1   | 0.9066 |   0.876   |  1.445   |         1.3874         |
|           pytorch_unet            |  4   | 0.9979 |  0.6971   |  1.4357  |         1.4338         |
|        mobilenet_v3_large         | 128  | 0.9793 |  0.9678   |  1.4198  |         1.4143         |
|           fastNLP_Bert            |  16  | 0.9824 |  0.9785   |  1.3965  |         1.392          |
|           hf_Bert_large           |  4   | 1.0037 |  0.8763   |  1.3776  |         1.3774         |
|        doctr_det_predictor        |  4   | 0.9943 |   0.748   |  1.3662  |         1.3625         |
|            Super_SloMo            |  8   | 0.9978 |  0.7862   |  1.3618  |         1.3627         |
|           hf_DistilBert           |  16  | 0.9743 |  0.9703   |  1.3531  |         1.3411         |
|             resnet18              | 256  | 0.9945 |  0.9918   |  1.3134  |         1.3143         |
|           hf_Longformer           |  4   | 0.9983 |  0.4373   |  1.3074  |         1.3465         |
|            timm_regnet            |  32  | 0.9156 |  0.9025   |  1.3059  |         1.2223         |
|         timm_efficientnet         | 128  | 0.9487 |  0.9415   |  1.3038  |         1.2898         |
|               vgg16               |  8   | 0.9908 |  0.9809   |  1.3015  |         1.2668         |
|          LearningToPaint          | 256  | 0.9908 |  1.0023   |  1.2919  |         1.3194         |
|              hf_Bart              |  8   | 0.9257 |  0.8594   |  1.2873  |         1.2325         |
|           BERT_pytorch            |  32  | 0.9499 |  0.9402   |  1.2855  |         1.2555         |
|              yolov3               |  8   | 0.9829 |  0.9118   |  1.2847  |         1.2705         |
|              hf_Bert              |  8   | 0.9095 |  0.9045   |  1.2752  |         1.2506         |
|          phlippe_resnet           | 256  | 0.9632 |  0.7686   |   1.27   |         1.2245         |
|            timm_vovnet            | 128  | 0.9398 |  0.9371   |  1.2699  |         1.2619         |
|              alexnet              | 1024 | 0.9986 |  0.9983   |  1.2516  |         1.2808         |
|       functorch_dp_cifar10        | 512  | 0.9744 |  0.9747   |  1.2247  |         1.1604         |
|          pytorch_stargan          |  16  | 0.9891 |  0.8871   |  1.2033  |         1.2068         |
|       doctr_reco_predictor        |  64  | 0.9933 |  0.9797   |  1.2022  |         1.1924         |
|      timm_vision_transformer      | 128  | 0.9862 |  0.9859   |  1.1951  |         1.1848         |
|               dcgan               | 1024 | 0.994  |   1.034   |  1.1875  |         1.1965         |
|              demucs               |  32  | 0.9994 |  0.9993   |  1.1593  |         1.2062         |
| attention_is_all_you_need_pytorch | 256  | 0.969  |  0.9596   |  1.1067  |         1.0889         |
|   timm_vision_transformer_large   |  32  | 0.994  |  0.9935   |  1.0854  |         1.0777         |
|          vision_maskrcnn          |  4   | 0.8912 |   0.852   |  1.0774  |         1.084          |
|            tts_angular            | 512  | 0.9891 |  0.9884   |  0.989   |         0.9905         |
|      nvidia_deeprecommender       | 512  | 0.9943 |  0.9936   |  0.8654  |         0.9916         |
|             tacotron2             | 128  | 0.992  |  0.9919   |   0.0    |          0.0           |
|               llama               | 1024 | 0.9819 |  0.6014   |   0.0    |          0.0           |
|               moco                |  64  | 0.988  |    0.0    |   0.0    |          0.0           |
|     detectron2_fcos_r_50_fpn      |  4   | 0.802  |    0.0    |   0.0    |          0.0           |
|          DALLE2_pytorch           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |       pass       |       pass       |       pass       |          pass          |
|        Background_Matting         |  1  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|        doctr_det_predictor        |  4  |       pass       |       pass       |       pass       |          pass          |
|       doctr_reco_predictor        |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|               llama               |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|     detectron2_fcos_r_50_fpn      |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|          DALLE2_pytorch           |  4  |   fail_to_run    |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|            hf_BigBird             |  4  |       pass       |       pass       |      0.0000      |          pass          |
|             tacotron2             |  4  |       pass       |       pass       |      0.0000      |         0.0000         |
|          vision_maskrcnn          |  4  |       pass       |       pass       |      0.0000      |         0.0000         |
|              demucs               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|          vision_maskrcnn          |  4   | 12.6624  |  24.6886  | 143.7892 |        112.1183        |
|           hf_Longformer           |  4   |  7.4484  |  18.8382  | 74.5018  |        60.8011         |
|            hf_T5_large            |  1   | 14.9758  |  23.4288  | 67.1839  |        43.3799         |
|            hf_BigBird             |  4   |  9.7464  |  19.8755  | 60.6145  |         49.509         |
|            hf_T5_base             |  1   |  7.7194  |  12.5269  | 38.6659  |        26.6758         |
|            hf_Reformer            |  8   |  2.8316  |  4.4848   | 32.6493  |        28.2142         |
|           hf_GPT2_large           |  1   |  8.4584  |  14.2657  | 28.1275  |        27.3305         |
|               hf_T5               |  4   |  4.4212  |  7.0701   | 27.1365  |        20.0244         |
|        speech_transformer         |  1   |  2.1767  |  4.7553   | 25.9839  |        25.7781         |
|            densenet121            |  64  |  2.7661  |  6.5193   | 25.7481  |        24.4043         |
|   timm_vision_transformer_large   |  32  |  4.1539  |  9.1503   | 25.2112  |        24.7234         |
|              hf_Bart              |  8   |  3.8656  |  6.5933   | 24.9794  |        22.3447         |
|              yolov3               |  8   |  2.0772  |  4.6528   | 23.9742  |        23.3447         |
|            timm_nfnet             | 128  |  3.2002  |  5.4945   |  22.976  |        22.9633         |
| attention_is_all_you_need_pytorch | 256  |  1.7707  |  3.7662   | 22.3304  |          21.2          |
|             resnet152             |  64  |  2.9275  |   7.644   |  20.461  |         19.791         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.5017  |  1.1717   |  19.304  |        18.6227         |
|           hf_Bert_large           |  4   |  5.166   |  9.2175   | 17.8246  |        17.7783         |
|            Super_SloMo            |  8   |  1.5558  |   4.06    | 17.2359  |        17.1916         |
|       functorch_dp_cifar10        | 512  |  0.3612  |  0.6794   | 16.7978  |        16.0351         |
|          pytorch_stargan          |  16  |  0.5123  |  1.4316   | 16.5372  |        16.2262         |
|            timm_regnet            |  32  |  3.2648  |  5.3385   | 16.3202  |        15.7472         |
|           fastNLP_Bert            |  16  |  2.0923  |  3.9621   | 16.2619  |        14.0235         |
|             hf_Albert             |  16  |  1.9754  |  3.9679   | 15.9788  |        14.7029         |
|         timm_efficientnet         | 128  |  2.0938  |  3.9804   | 15.7054  |        15.4181         |
|              hf_GPT2              |  16  |  2.4689  |  4.3735   | 15.2184  |         15.058         |
|        mobilenet_v3_large         | 128  |  1.1834  |  2.8617   | 14.8561  |        15.1658         |
|         phlippe_densenet          | 128  |  1.1248  |  2.7419   | 14.5965  |        14.4419         |
|       doctr_reco_predictor        |  64  |  0.5259  |  1.0603   | 13.5329  |        12.2191         |
|      timm_vision_transformer      | 128  |  1.3039  |  2.9015   | 13.4639  |        13.8721         |
|           BERT_pytorch            |  32  |  2.0095  |  3.7978   | 13.1224  |        13.3125         |
|        shufflenet_v2_x1_0         | 128  |  1.1867  |  2.9829   | 13.0808  |        12.9476         |
|        doctr_det_predictor        |  4   |  1.5603  |  4.1548   | 12.8966  |        12.8151         |
|              demucs               |  32  |  0.3578  |   0.567   | 12.8307  |        10.9186         |
|           timm_resnest            | 256  |  0.7442  |  1.5706   | 12.6194  |        12.5018         |
|           mobilenet_v2            | 128  |  1.0405  |  2.6644   | 12.2616  |        10.9306         |
|           hf_DistilBert           |  16  |  1.0569  |  2.1336   | 12.0845  |        11.7387         |
|          resnext50_32x4d          |  64  |  1.074   |  2.6712   | 11.9345  |        10.7818         |
|             resnet50              |  64  |  1.0672  |  2.6556   | 11.8492  |        10.8521         |
|            mnasnet1_0             | 128  |  0.9902  |  2.5045   |  11.743  |        11.3677         |
|            timm_vovnet            | 128  |  1.872   |  2.9855   | 11.6538  |        11.3552         |
|              hf_Bert              |  8   |  2.4012  |  4.4845   | 11.4191  |         10.95          |
|        Background_Matting         |  1   |  1.0652  |   2.933   | 10.1229  |        10.3911         |
|          phlippe_resnet           | 256  |  0.4847  |  1.1107   |  9.0674  |         8.7675         |
|             resnet18              | 256  |  0.5115  |   1.128   |  9.0495  |         7.9961         |
|          LearningToPaint          | 256  |  0.5141  |  1.1709   |  8.166   |         6.7555         |
|           pytorch_unet            |  4   |  0.6357  |  1.6981   |  7.7293  |         7.6636         |
|           squeezenet1_1           | 256  |  0.3256  |   0.536   |  6.4693  |         5.4472         |
|                drq                |  1   |  0.3346  |  0.4368   |  5.7531  |         5.4296         |
|              alexnet              | 1024 |  0.1951  |  0.3048   |  5.7208  |         4.5737         |
|         soft_actor_critic         | 256  |  0.2428  |  0.3001   |  5.1839  |         3.9874         |
|               vgg16               |  8   |  0.2344  |   0.401   |  5.092   |         4.9237         |
|               dcgan               | 1024 |  0.1918  |  0.3153   |  4.7729  |         4.4336         |
|               dlrm                |  1   |  0.2857  |  0.4284   |  4.5918  |         4.1552         |
|      nvidia_deeprecommender       | 512  |  0.2301  |  0.3088   |  4.5063  |         4.2944         |
|            tts_angular            | 512  |  0.1595  |   0.189   |  4.0973  |         3.9736         |
|           lennard_jones           | 1000 |  0.1663  |  0.2289   |  4.0944  |         3.7544         |
|             tacotron2             | 128  | 827.0071 | 1255.5524 |   nan    |          nan           |
|               llama               | 1024 |  1.3762  |  2.8133   |   nan    |          nan           |
|               moco                |  64  | 24.2323  |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  4   |  8.2336  |    nan    |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|           pytorch_unet            |  4   | 1.5928 |  1.3424   |  1.5789  |         1.5928         |
|            timm_vovnet            | 128  | 1.2915 |  1.2915   |  1.4547  |         1.5115         |
|           mobilenet_v2            | 128  | 1.0717 |  1.0717   |  1.4418  |         1.5814         |
|           squeezenet1_1           | 256  |  1.0   |    1.0    |  1.3548  |         1.5039         |
|        mobilenet_v3_large         | 128  | 1.0989 |  1.0989   |  1.2775  |         1.1405         |
|              yolov3               |  8   | 1.2988 |  1.2422   |  1.276   |         1.2988         |
|            timm_nfnet             | 128  | 1.1078 |  1.5535   |  1.2336  |         1.2775         |
| attention_is_all_you_need_pytorch | 256  | 1.0311 |  1.0311   |  1.2219  |         1.2333         |
|        Background_Matting         |  1   | 1.311  |  0.6879   |  1.2134  |         1.2299         |
|         timm_efficientnet         | 128  | 1.2128 |  1.2128   |  1.1388  |         1.2128         |
|        doctr_det_predictor        |  4   | 1.0601 |  0.7397   |  1.0501  |         1.0501         |
|              hf_Bart              |  8   | 0.909  |  0.9092   |  1.0336  |         1.1927         |
|             hf_Albert             |  16  | 1.0229 |  1.0229   |  1.0072  |         1.0229         |
|           hf_DistilBert           |  16  | 1.0156 |  1.0156   |  1.0049  |         1.0156         |
|               dlrm                |  1   |  1.0   |    1.0    |  0.9979  |          1.0           |
|      nvidia_deeprecommender       | 512  | 1.001  |   1.001   |  0.9977  |         1.142          |
|           hf_GPT2_large           |  1   |  1.0   |    1.0    |  0.9973  |         0.9996         |
|              hf_GPT2              |  16  |  1.0   |    1.0    |  0.9972  |          1.0           |
|              hf_Bert              |  8   | 1.0087 |  1.0087   |  0.9969  |         1.0087         |
|           hf_Bert_large           |  4   | 1.0033 |  1.0033   |  0.9966  |         1.0033         |
|          resnext50_32x4d          |  64  | 1.0093 |  1.0477   |  0.9963  |         1.0093         |
|               vgg16               |  8   |  1.0   |    1.0    |  0.9827  |          1.0           |
|            timm_regnet            |  32  |  1.0   |    1.0    |  0.9753  |          1.0           |
|       functorch_dp_cifar10        | 512  |  1.0   |    1.0    |  0.9681  |          1.0           |
|   timm_vision_transformer_large   |  32  | 1.0155 |  1.0155   |  0.9629  |         0.9667         |
|             resnet152             |  64  | 1.0389 |  1.0388   |  0.9594  |          1.0           |
|            tts_angular            | 512  | 1.001  |   1.001   |  0.9583  |         1.001          |
|           fastNLP_Bert            |  16  | 1.0617 |  1.0616   |  0.9518  |         0.9574         |
|             resnet50              |  64  |  1.0   |    1.0    |  0.9488  |         1.0494         |
|            mnasnet1_0             | 128  | 1.124  |  1.0471   |  0.947   |         1.0471         |
|         phlippe_densenet          | 128  | 1.0594 |  1.0594   |  0.9418  |         0.9727         |
|               dcgan               | 1024 |  1.0   |    1.0    |  0.9404  |          1.0           |
|             resnet18              | 256  |  1.0   |    1.0    |  0.9296  |          1.0           |
|           timm_resnest            | 256  |  1.0   |    1.0    |  0.9074  |         0.9474         |
|              alexnet              | 1024 |  1.0   |    1.0    |  0.8928  |         1.0662         |
|          LearningToPaint          | 256  |  1.0   |    1.0    |  0.8634  |          1.0           |
|       doctr_reco_predictor        |  64  |  1.0   |    1.0    |  0.8433  |         0.852          |
|          vision_maskrcnn          |  4   | 1.3759 |  1.3753   |  0.8204  |         1.3758         |
|           BERT_pytorch            |  32  | 1.0264 |  1.0264   |  0.8033  |         0.809          |
|                drq                |  1   | 0.9613 |  0.9613   |  0.7762  |         0.9613         |
|            hf_T5_large            |  1   | 0.9541 |  0.9528   |  0.7346  |         0.9584         |
|        shufflenet_v2_x1_0         | 128  | 1.0511 |  0.9994   |  0.7328  |         0.8127         |
|          pytorch_stargan          |  16  | 1.0494 |  1.0492   |  0.7292  |         0.7292         |
|              demucs               |  32  | 0.8934 |  0.8934   |  0.7055  |         0.8934         |
|         soft_actor_critic         | 256  |  1.0   |    1.0    |  0.7024  |          1.0           |
|      timm_vision_transformer      | 128  | 1.1031 |  1.1031   |  0.6935  |         0.7525         |
|        speech_transformer         |  1   | 1.0551 |  1.0551   |  0.6699  |         0.6723         |
|            Super_SloMo            |  8   | 1.0852 |  0.7407   |  0.6691  |         0.6691         |
|            densenet121            |  64  | 1.196  |  1.1984   |  0.6671  |         0.6076         |
|           lennard_jones           | 1000 |  1.0   |    1.0    |  0.5327  |          1.0           |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.9982   |  0.3981  |         0.3989         |
|               hf_T5               |  4   | 0.6729 |  0.7461   |  0.359   |         0.8834         |
|          phlippe_resnet           | 256  | 1.4892 |  1.4892   |  0.3583  |         0.3611         |
|            hf_BigBird             |  4   | 0.8571 |  0.8571   |  0.3278  |         0.8572         |
|            hf_T5_base             |  1   | 0.7645 |  0.8122   |  0.3239  |         0.8896         |
|            hf_Reformer            |  8   | 0.7321 |  0.7393   |  0.2509  |         0.7401         |
|           hf_Longformer           |  4   | 0.3904 |  0.3904   |  0.2374  |         0.3905         |
|             tacotron2             | 128  | 0.9209 |  0.9209   |   nan    |          nan           |
|               llama               | 1024 |  1.0   |  0.6756   |   nan    |          nan           |
|               moco                |  64  | 1.0354 |    nan    |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  4   | 0.8694 |    nan    |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|          vision_maskrcnn          |  4   | 193.6408 | 200.8595  | 156.7076 |        154.8324        |
|   timm_vision_transformer_large   |  32  | 142.0358 | 142.1905  | 130.5995 |        131.2456        |
|           hf_Longformer           |  4   | 141.3068 | 322.8847  | 107.9811 |        104.8337        |
|              demucs               |  32  | 94.0761  |  94.1388  | 81.1125  |        77.9673         |
|            hf_T5_base             |  1   | 110.9219 | 115.4631  | 70.6839  |        73.2236         |
|            hf_BigBird             |  4   | 123.8805 | 127.3245  | 68.1053  |        78.4072         |
|               hf_T5               |  4   | 103.3766 | 105.4867  | 62.5763  |        58.6273         |
|              hf_GPT2              |  16  |  94.483  |  94.5901  |  53.683  |        53.8919         |
|           pytorch_unet            |  4   | 68.2269  |  97.6348  | 47.4514  |        47.4754         |
|            Super_SloMo            |  8   | 63.9705  |  81.1255  | 46.8564  |         46.875         |
|            hf_T5_large            |  1   | 103.1815 |  112.413  | 43.8136  |         65.332         |
|           fastNLP_Bert            |  16  |  53.979  |  54.4642  | 38.1553  |        38.1644         |
|        doctr_det_predictor        |  4   | 48.9311  |  64.2433  | 35.9617  |        36.6363         |
|            timm_nfnet             | 128  | 43.5001  |  43.458   | 23.6251  |        24.0825         |
|           timm_resnest            | 256  | 39.6985  |  39.7262  | 23.2228  |        23.3216         |
|           hf_GPT2_large           |  1   | 37.3139  |  39.7391  | 22.0567  |        22.4471         |
|             resnet152             |  64  | 31.8951  |  32.3969  | 21.3476  |        21.6854         |
| attention_is_all_you_need_pytorch | 256  | 24.2616  |  24.4727  | 21.2205  |        21.5668         |
|            timm_vovnet            | 128  | 24.9054  |  24.947   | 18.4175  |        18.5532         |
|      timm_vision_transformer      | 128  | 21.4662  |  21.4878  | 17.7461  |        17.9044         |
|              alexnet              | 1024 | 22.1587  |  22.1406  | 17.6596  |        17.2577         |
|              hf_Bart              |  8   |  22.396  |  24.1673  | 16.3471  |        18.0951         |
|        Background_Matting         |  1   | 22.5735  |  35.5875  |  14.897  |        15.0132         |
|           hf_Bert_large           |  4   | 20.3634  |  23.1206  | 14.8437  |        14.9634         |
|            hf_Reformer            |  8   | 27.3388  |  27.3432  |  14.587  |        13.2207         |
|             hf_Albert             |  16  | 28.8533  |  28.8725  | 14.3412  |        14.6687         |
|            timm_regnet            |  32  | 19.5412  |  19.9795  | 13.6135  |        14.5742         |
|             resnet18              | 256  | 16.4811  |  16.5351  | 12.4661  |        12.4688         |
|         timm_efficientnet         | 128  | 17.0848  |  17.2579  | 12.4632  |        12.5746         |
|            densenet121            |  64  | 20.3402  |  20.5007  | 11.9025  |        12.2509         |
|          resnext50_32x4d          |  64  | 17.9332  |  18.0899  | 11.8072  |         11.973         |
|              yolov3               |  8   |  15.035  |  16.2467  | 11.5051  |        11.6233         |
|           BERT_pytorch            |  32  |  15.348  |  15.5259  | 11.2865  |        11.5267         |
|        speech_transformer         |  1   | 19.0673  |  21.7882  | 11.0612  |        13.7705         |
|           hf_DistilBert           |  16  |  13.965  |  14.0145  | 10.0689  |        10.1606         |
|              hf_Bert              |  8   |  13.866  |  13.9758  |  9.8862  |        10.1199         |
|            tts_angular            | 512  |  9.1869  |  9.1719   |  9.2492  |         9.231          |
|             resnet50              |  64  | 14.0973  |  14.2829  |  9.0935  |         9.1947         |
|           squeezenet1_1           | 256  | 14.1416  |  14.207   |  8.3985  |         8.3959         |
|           mobilenet_v2            | 128  | 12.0409  |  12.2002  |  7.7608  |         7.7607         |
|            mnasnet1_0             | 128  | 11.5663  |  11.7183  |  7.6011  |         7.6078         |
|          LearningToPaint          | 256  |  9.2729  |  9.1613   |  7.1069  |         6.9662         |
|        mobilenet_v3_large         | 128  | 10.2227  |  10.3381  |  7.0488  |         7.1027         |
|       doctr_reco_predictor        |  64  |  6.9806  |   7.069   |  5.8021  |         5.838          |
|          pytorch_stargan          |  16  |  7.0147  |  7.8547   |  5.7597  |         5.7583         |
|      nvidia_deeprecommender       | 512  |  3.8833  |  3.8903   |  4.4712  |         3.896          |
|               dcgan               | 1024 |  4.6848  |  4.5082   |  3.9245  |         3.8996         |
|        shufflenet_v2_x1_0         | 128  |  8.2596  |  8.5497   |  3.6303  |         3.6414         |
|         phlippe_densenet          | 128  |  6.5669  |   8.271   |  3.2364  |         3.8701         |
|       functorch_dp_cifar10        | 512  |  3.5803  |   3.567   |  2.8423  |          2.99          |
|               vgg16               |  8   |  3.3561  |  3.3948   |  2.558   |         2.6189         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  4.5094  |  4.4086   |  2.4604  |         2.6617         |
|          phlippe_resnet           | 256  |  2.1931  |  2.7663   |  1.6765  |         1.7453         |
|               dlrm                |  1   |  0.8356  |  0.7833   |  0.428   |         0.9191         |
|                drq                |  1   |  0.7457  |  0.7562   |  0.2118  |         0.5984         |
|         soft_actor_critic         | 256  |  0.3534  |  0.3602   |  0.1372  |         0.2567         |
|           lennard_jones           | 1000 |  0.2871  |  0.2963   |  0.117   |         0.2923         |
|             tacotron2             | 128  | 780.5868 | 770.4857  |   nan    |          nan           |
|               llama               | 1024 | 10.3325  |  16.8184  |   nan    |          nan           |
|     detectron2_fcos_r_50_fpn      |  4   | 82.3764  |    nan    |   nan    |          nan           |
|               moco                |  64  | 47.5212  |    nan    |   nan    |          nan           |
|          DALLE2_pytorch           |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       MT5ForConditionalGeneration       | 16  | 0.9882 |  0.8915   |  2.7012  |         1.9115         |
|            XLNetLMHeadModel             |  8  | 0.9935 |  0.9935   |  2.2589  |         2.2467         |
|             XGLMForCausalLM             |  8  | 0.8538 |  0.7931   |  1.886   |         1.2059         |
|      GPT2ForSequenceClassification      |  4  | 0.9619 |   0.962   |  1.7996  |         1.7778         |
|             OPTForCausalLM              |  2  | 0.9987 |  1.0155   |  1.7991  |         1.8335         |
|          MobileBertForMaskedLM          | 64  | 0.8518 |  0.7485   |  1.7358  |         1.2738         |
|                 T5Small                 |  4  | 0.9573 |  0.9497   |  1.7247  |         1.6953         |
|       T5ForConditionalGeneration        |  4  | 0.9577 |  0.9484   |  1.7221  |         1.6901         |
|               GoogleFnet                | 16  | 0.9977 |  0.9978   |  1.6889  |         1.9025         |
|               DistillGPT2               | 16  | 0.9847 |   0.984   |  1.6874  |         1.6799         |
|           ElectraForCausalLM            | 32  | 0.9633 |  0.9618   |  1.5904  |         1.5705         |
|            PLBartForCausalLM            |  8  | 0.9932 |  0.9921   |  1.5248  |         1.5877         |
|       ElectraForQuestionAnswering       | 64  | 0.968  |  0.9674   |  1.4399  |         1.4238         |
|     M2M100ForConditionalGeneration      | 16  | 0.9398 |  0.8831   |  1.4347  |         1.3186         |
|         Speech2Text2ForCausalLM         | 256 | 0.9815 |  0.9867   |  1.402   |         1.4371         |
|             BartForCausalLM             |  4  | 0.9967 |  0.9931   |  1.3549  |         1.3676         |
|            YituTechConvBert             | 16  | 0.9741 |  0.9846   |  1.3498  |         1.3435         |
|           RobertaForCausalLM            | 16  | 0.9712 |  0.9719   |  1.3493  |         1.3389         |
|            MBartForCausalLM             |  4  | 0.9953 |  0.9922   |  1.3382  |         1.3483         |
|    LayoutLMForSequenceClassification    | 16  | 0.9661 |   0.965   |  1.3362  |         1.3247         |
|           LayoutLMForMaskedLM           | 16  | 0.9708 |  0.9707   |  1.3228  |         1.314          |
|       BlenderbotSmallForCausalLM        | 64  |  0.99  |  0.9867   |  1.3132  |         1.3359         |
|       RobertaForQuestionAnswering       | 16  | 0.9654 |  0.9643   |  1.3122  |         1.3024         |
|        BertForQuestionAnswering         | 16  | 0.9648 |  0.9643   |  1.3091  |         1.3005         |
|            AlbertForMaskedLM            |  4  | 0.9958 |  0.9978   |  1.3078  |         1.3081         |
|                CamemBert                | 16  | 0.9719 |  0.9711   |  1.3065  |         1.2949         |
|       AlbertForQuestionAnswering        |  4  | 0.9954 |  0.9959   |  1.3051  |         1.3023         |
|             BertForMaskedLM             | 16  | 0.9703 |  0.9712   |  1.2971  |         1.2926         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9684 |  0.9764   |  1.2914  |         1.1867         |
|     PLBartForConditionalGeneration      |  4  | 0.9841 |  0.9782   |  1.2877  |         1.285          |
|           DebertaForMaskedLM            |  4  | 0.6941 |  0.6132   |  1.2769  |         1.0259         |
|          DistilBertForMaskedLM          | 128 | 0.9903 |   0.99    |  1.2263  |         1.2214         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9545 |  0.9544   |  1.1994  |         1.1898         |
|            TrOCRForCausalLM             | 32  | 0.9971 |  0.9953   |  1.1984  |         1.2153         |
|     DistilBertForQuestionAnswering      | 256 | 0.9913 |  0.9919   |  1.1975  |         1.1942         |
|         MegatronBertForCausalLM         |  4  | 0.9272 |  0.9274   |  1.1712  |         1.155          |
|     MobileBertForQuestionAnswering      | 128 | 0.8314 |  0.7729   |  1.1533  |         1.1302         |
|           PegasusForCausalLM            | 32  | 0.9908 |  0.9885   |  1.1487  |         1.1549         |
|          DebertaV2ForMaskedLM           |  1  | 0.5762 |   0.512   |  1.1412  |         0.7069         |
|      BartForConditionalGeneration       |  2  | 0.9774 |  0.9697   |  1.1389  |         1.1217         |
|          AllenaiLongformerBase          |  4  | 0.8311 |  0.3603   |  1.1281  |         1.1602         |
|      MBartForConditionalGeneration      |  2  | 0.9621 |  0.9693   |  1.1183  |         1.1078         |
|     PegasusForConditionalGeneration     | 32  | 0.9729 |  0.9797   |  1.0869  |         1.0789         |
|          BlenderbotForCausalLM          |  4  | 0.8219 |  0.7984   |  1.0512  |         1.044          |
|       DebertaForQuestionAnswering       |  8  | 0.9692 |  0.8471   |  1.047   |         1.3318         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.586  |  0.5991   |  0.921   |         0.8083         |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|               GoogleFnet                | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          AllenaiLongformerBase          |  4  | 7.3464  |  17.8156  | 74.4527  |        62.3997         |
|      DebertaV2ForQuestionAnswering      |  2  | 7.7715  |  10.8712  | 39.7915  |        26.2237         |
|          DebertaV2ForMaskedLM           |  1  |  7.566  |  11.012   | 39.4228  |        25.9096         |
|     MobileBertForQuestionAnswering      | 128 | 15.4822 |  21.443   | 37.5372  |        37.4897         |
|          MobileBertForMaskedLM          | 64  | 15.4361 |  21.5753  | 36.2554  |        35.6657         |
|     M2M100ForConditionalGeneration      | 16  | 4.4089  |  8.4552   |  29.306  |        26.7293         |
|            XLNetLMHeadModel             |  8  | 5.0533  |  10.4946  | 28.1601  |         28.809         |
|       MT5ForConditionalGeneration       | 16  | 5.0652  |  7.6617   | 27.8499  |        27.7992         |
|           DebertaForMaskedLM            |  4  | 4.0626  |  6.2606   | 26.9032  |        19.8899         |
|       DebertaForQuestionAnswering       |  8  | 4.1454  |  6.1749   | 26.4084  |         19.572         |
|     PegasusForConditionalGeneration     | 32  | 4.2528  |  8.2133   | 26.3983  |        25.8311         |
|             XGLMForCausalLM             |  8  | 3.5673  |  6.8418   | 26.1937  |         22.839         |
|          BlenderbotForCausalLM          |  4  | 3.5992  |   6.631   | 23.6705  |        20.5129         |
|            YituTechConvBert             | 16  | 3.3766  |  5.7813   | 22.9145  |        22.2111         |
|      BartForConditionalGeneration       |  2  | 4.4986  |  8.5543   | 22.3036  |        21.9549         |
|           ElectraForCausalLM            | 32  | 2.4641  |  4.0428   | 22.2271  |        19.2678         |
|     PLBartForConditionalGeneration      |  4  | 3.6159  |  5.6838   | 21.7781  |        20.3014         |
|      MBartForConditionalGeneration      |  2  | 4.4836  |  8.6447   | 21.5754  |        21.3252         |
|            MBartForCausalLM             |  4  | 2.0317  |  3.6361   | 20.1181  |        16.1675         |
|            TrOCRForCausalLM             | 32  | 2.2027  |  3.6454   | 19.0339  |        17.7479         |
|       BlenderbotSmallForCausalLM        | 64  | 1.7049  |  2.7038   | 18.3077  |        15.3715         |
|       T5ForConditionalGeneration        |  4  | 3.3898  |  5.3293   | 18.2346  |        18.1498         |
|                 T5Small                 |  4  | 3.3883  |  5.3677   | 18.1819  |        17.9587         |
|               GoogleFnet                | 16  | 1.5578  |  2.3435   |  18.161  |        13.1706         |
|             BartForCausalLM             |  4  | 2.1693  |  3.6504   | 17.8162  |        16.0648         |
| BlenderbotSmallForConditionalGeneration | 64  | 3.0134  |  5.6103   | 17.7408  |        17.1638         |
|           PegasusForCausalLM            | 32  | 1.9809  |  3.5391   | 17.4076  |        15.9636         |
|         MegatronBertForCausalLM         |  4  | 4.8882  |  7.8416   | 17.1618  |        17.0175         |
|             OPTForCausalLM              |  2  |  2.159  |  3.6535   | 17.0997  |        14.9461         |
|    MegatronBertForQuestionAnswering     |  8  |  4.841  |  7.7894   | 17.0335  |        17.1013         |
|            PLBartForCausalLM            |  8  | 1.3005  |  2.1753   | 15.9801  |        14.8611         |
|         Speech2Text2ForCausalLM         | 256 | 1.1794  |   2.036   | 15.9672  |        14.5855         |
|      GPT2ForSequenceClassification      |  4  | 2.5227  |  4.0595   | 15.1507  |        14.5196         |
|           LayoutLMForMaskedLM           | 16  | 2.6751  |   4.258   | 15.1189  |        13.5883         |
|    LayoutLMForSequenceClassification    | 16  | 2.6604  |  4.2231   | 15.1042  |        13.4161         |
|     DistilBertForQuestionAnswering      | 256 | 0.9645  |   1.852   |  14.313  |        14.3004         |
|            AlbertForMaskedLM            |  4  |  1.823  |  3.4349   |  14.204  |        13.4766         |
|       ElectraForQuestionAnswering       | 64  | 2.4232  |  3.9472   |  13.947  |         12.373         |
|           RobertaForCausalLM            | 16  | 2.4405  |   3.935   | 12.6494  |        12.4924         |
|       AlbertForQuestionAnswering        |  4  |  1.818  |  3.4268   | 12.5125  |         12.191         |
|               DistillGPT2               | 16  |  1.257  |  2.2129   | 11.9388  |        11.7975         |
|        BertForQuestionAnswering         | 16  | 2.4166  |  3.9506   | 11.4339  |        10.1714         |
|       RobertaForQuestionAnswering       | 16  |  2.435  |  3.9326   |  10.593  |        10.3068         |
|             BertForMaskedLM             | 16  | 2.4361  |  3.9584   | 10.3961  |        10.1533         |
|          DistilBertForMaskedLM          | 128 | 0.9903  |  1.8112   | 10.2963  |        10.0923         |
|                CamemBert                | 16  | 2.4651  |  3.9064   | 10.2845  |        10.0996         |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|           ElectraForCausalLM            | 32  | 0.9946 |  0.9946   |  2.4838  |         2.554          |
|               DistillGPT2               | 16  | 1.0041 |  1.0041   |  2.0018  |         2.0075         |
|           RobertaForCausalLM            | 16  | 1.0065 |  1.0065   |  1.8161  |         1.8234         |
|          DistilBertForMaskedLM          | 128 | 1.0111 |  1.0111   |  1.7631  |         1.7691         |
| BlenderbotSmallForConditionalGeneration | 64  | 1.0042 |  1.0042   |  1.6885  |         1.6947         |
|                CamemBert                | 16  | 1.0084 |  1.0084   |  1.5248  |         1.5315         |
|             BertForMaskedLM             | 16  | 1.0087 |  1.0087   |  1.5151  |         1.522          |
|           LayoutLMForMaskedLM           | 16  | 1.0086 |  1.0086   |  1.5076  |         1.5143         |
|       MT5ForConditionalGeneration       | 16  |  1.0   |    1.0    |  1.4483  |         1.451          |
|       BlenderbotSmallForCausalLM        | 64  | 0.9344 |  0.9344   |  1.4426  |         1.5037         |
|       T5ForConditionalGeneration        |  4  | 1.0096 |  1.0096   |  1.415   |         1.4248         |
|                 T5Small                 |  4  | 1.0096 |  1.0096   |  1.415   |         1.4248         |
|            YituTechConvBert             | 16  | 0.9748 |  0.9748   |  1.4091  |         1.4628         |
|            PLBartForCausalLM            |  8  | 0.9305 |  0.9305   |  1.3022  |         1.4272         |
|             OPTForCausalLM              |  2  | 0.9236 |  0.9236   |  1.2556  |         1.5083         |
|         Speech2Text2ForCausalLM         | 256 | 0.8748 |  0.8748   |  1.2159  |         1.3714         |
|     PegasusForConditionalGeneration     | 32  | 0.9933 |  0.9933   |  1.1627  |         1.1905         |
|         MegatronBertForCausalLM         |  4  | 1.0025 |  1.0025   |  1.1586  |         1.1619         |
|     PLBartForConditionalGeneration      |  4  | 0.9045 |  0.9045   |  1.1461  |         1.189          |
|      MBartForConditionalGeneration      |  2  | 1.0021 |  1.0021   |  1.1091  |         1.1117         |
|            TrOCRForCausalLM             | 32  | 0.8803 |  0.8803   |  1.1054  |         1.1507         |
|             XGLMForCausalLM             |  8  | 0.9702 |  0.9702   |  1.0954  |         1.1396         |
|     M2M100ForConditionalGeneration      | 16  | 0.9362 |  0.9362   |  1.0902  |         1.1055         |
|      BartForConditionalGeneration       |  2  | 1.0021 |  1.0021   |  1.0599  |         1.0623         |
|           PegasusForCausalLM            | 32  | 0.907  |   0.907   |  1.0553  |         1.1086         |
|             BartForCausalLM             |  4  | 0.9074 |  0.9074   |  1.0188  |         1.1093         |
|            MBartForCausalLM             |  4  | 0.9074 |  0.9074   |  1.0146  |         1.1093         |
|     MobileBertForQuestionAnswering      | 128 | 1.9097 |  1.9097   |  1.0066  |         1.021          |
|            XLNetLMHeadModel             |  8  |  1.0   |    1.0    |   1.0    |          1.0           |
|       AlbertForQuestionAnswering        |  4  | 1.0896 |  1.0896   |  0.9832  |         0.9866         |
|            AlbertForMaskedLM            |  4  | 1.0894 |  1.0894   |  0.9828  |         0.9862         |
|    MegatronBertForQuestionAnswering     |  8  | 1.0339 |  1.0339   |  0.9809  |         0.9836         |
|          BlenderbotForCausalLM          |  4  | 0.9883 |  0.9883   |  0.9796  |         0.9889         |
|      GPT2ForSequenceClassification      |  4  | 1.0149 |  1.0149   |  0.9653  |         0.9695         |
|    LayoutLMForSequenceClassification    | 16  | 1.0924 |  1.0924   |  0.9616  |         0.9669         |
|        BertForQuestionAnswering         | 16  | 1.0943 |  1.0943   |  0.9606  |         0.966          |
|       RobertaForQuestionAnswering       | 16  | 1.0943 |  1.0943   |  0.9606  |         0.966          |
|       ElectraForQuestionAnswering       | 64  | 1.2329 |  1.2329   |  0.9205  |         0.9289         |
|          MobileBertForMaskedLM          | 64  | 1.0073 |  1.0073   |  0.9034  |         0.9065         |
|     DistilBertForQuestionAnswering      | 256 | 1.1397 |  1.1397   |  0.8871  |         0.8914         |
|          DebertaV2ForMaskedLM           |  1  | 0.9996 |  0.9996   |  0.793   |         1.0348         |
|               GoogleFnet                | 16  | 0.9911 |  0.9911   |  0.7194  |         1.5527         |
|      DebertaV2ForQuestionAnswering      |  2  | 1.0008 |   0.994   |  0.6539  |         1.0008         |
|           DebertaForMaskedLM            |  4  | 0.9569 |  0.9569   |  0.4906  |         1.212          |
|          AllenaiLongformerBase          |  4  | 0.6108 |  0.6108   |  0.4883  |         0.7383         |
|       DebertaForQuestionAnswering       |  8  |  0.93  |  0.8765   |  0.2841  |          0.93          |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 126.2542 | 126.5573  | 97.5407  |        97.4501         |
|       AlbertForQuestionAnswering        |  4  | 125.5669 | 125.6202  | 96.9402  |        97.0922         |
|            XLNetLMHeadModel             |  8  | 158.2154 | 158.1649  | 69.6399  |        70.2706         |
|     PegasusForConditionalGeneration     | 32  | 56.2907  |  55.6185  | 50.3291  |        50.5872         |
|            TrOCRForCausalLM             | 32  |  57.022  |  57.141   | 47.4576  |        46.9352         |
|          AllenaiLongformerBase          |  4  | 60.7885  | 140.3403  |  44.852  |        43.4621         |
|    MegatronBertForQuestionAnswering     |  8  | 55.7552  |  55.734   |  44.35   |        44.7266         |
|      MBartForConditionalGeneration      |  2  | 50.4745  |  49.271   | 43.3659  |        43.8122         |
|      BartForConditionalGeneration       |  2  | 48.7139  |  49.0767  | 42.2649  |        43.1124         |
|            YituTechConvBert             | 16  | 57.1477  |  56.4361  | 41.1798  |         41.394         |
|     MobileBertForQuestionAnswering      | 128 |  55.161  |   50.43   | 39.5618  |        40.2196         |
|     DistilBertForQuestionAnswering      | 256 | 46.7692  |  46.7596  | 38.9442  |        39.0187         |
| BlenderbotSmallForConditionalGeneration | 64  | 42.0612  |  41.3286  |  34.098  |        34.3559         |
|                CamemBert                | 16  | 45.3732  |  45.3945  | 33.7687  |        34.0647         |
|           LayoutLMForMaskedLM           | 16  | 45.6835  |  45.6522  | 33.5465  |        33.7145         |
|             BertForMaskedLM             | 16  | 44.8864  |  44.8019  | 33.5365  |        33.6575         |
|           RobertaForCausalLM            | 16  | 46.5962  |  46.4922  | 33.4839  |        33.7512         |
|          DistilBertForMaskedLM          | 128 |  41.04   |  41.0428  | 33.1664  |        33.2767         |
|            MBartForCausalLM             |  4  |  43.867  |  44.4155  | 33.1073  |        32.6457         |
|             BartForCausalLM             |  4  | 44.2067  |  44.404   | 32.6872  |        32.2174         |
|             OPTForCausalLM              |  2  |  62.434  |  61.0617  | 32.3361  |         31.721         |
|          MobileBertForMaskedLM          | 64  | 63.7915  |  58.382   | 31.1003  |         42.51          |
|      DebertaV2ForQuestionAnswering      |  2  | 48.5257  |  46.8956  | 30.5598  |        34.7174         |
|     M2M100ForConditionalGeneration      | 16  | 46.0436  |  38.9137  | 29.7022  |         32.725         |
|     PLBartForConditionalGeneration      |  4  | 38.3815  |  37.3951  | 29.1801  |        29.0664         |
|            PLBartForCausalLM            |  8  | 43.8298  |  46.9022  | 27.9291  |        27.5476         |
|         MegatronBertForCausalLM         |  4  | 34.2912  |  34.246   | 27.1647  |        27.5346         |
|    LayoutLMForSequenceClassification    | 16  | 37.3113  |  37.3188  | 26.9969  |        27.1816         |
|       RobertaForQuestionAnswering       | 16  | 36.7772  |  36.7082  | 26.9932  |        27.1737         |
|        BertForQuestionAnswering         | 16  |  36.572  |  36.5435  |  26.974  |         27.084         |
|       ElectraForQuestionAnswering       | 64  | 39.8295  |  39.7537  | 26.8014  |        27.0277         |
|               DistillGPT2               | 16  | 41.1372  |  41.1132  |  23.955  |        24.0677         |
|           PegasusForCausalLM            | 32  | 27.5298  |  27.6807  | 23.8989  |         23.676         |
|          BlenderbotForCausalLM          |  4  | 32.5449  |  31.4046  | 23.3216  |        25.7966         |
|          DebertaV2ForMaskedLM           |  1  | 44.9702  |  50.1849  | 22.5054  |        37.1475         |
|       DebertaForQuestionAnswering       |  8  | 24.0186  |  27.4533  |  22.296  |        17.4813         |
|               GoogleFnet                | 16  | 37.0892  |  37.0956  | 21.9284  |        19.4609         |
|           ElectraForCausalLM            | 32  | 34.3517  |  34.2947  | 20.8451  |        21.0074         |
|       T5ForConditionalGeneration        |  4  | 36.2782  |  36.2645  | 19.9811  |        20.3573         |
|                 T5Small                 |  4  | 36.2273  |  36.237   | 19.9487  |         20.314         |
|      GPT2ForSequenceClassification      |  4  | 34.3034  |  34.1441  | 18.3303  |        18.4762         |
|       BlenderbotSmallForCausalLM        | 64  | 22.9798  |  23.4065  | 17.3623  |        17.3722         |
|         Speech2Text2ForCausalLM         | 256 | 24.6211  |  23.4238  | 16.5433  |        16.1413         |
|             XGLMForCausalLM             |  8  | 41.2122  |  34.9513  | 16.3257  |        23.1257         |
|       MT5ForConditionalGeneration       | 16  | 40.5845  |  37.7231  | 14.9378  |        20.9932         |
|           DebertaForMaskedLM            |  4  | 27.2988  |  31.156   | 14.8458  |        18.2129         |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9862 |  1.1076   |  1.8839  |         1.8791         |
|           dm_nfnet_f0           | 128 | 0.9858 |  0.9857   |  1.8184  |         1.7779         |
|        sebotnet33ts_256         | 64  | 0.9664 |  0.9608   |  1.6653  |         1.6589         |
|            nfnet_l0             | 128 | 0.9879 |  0.9881   |  1.639   |         1.6231         |
|         poolformer_m36          | 64  | 0.9701 |  0.9717   |  1.5818  |         1.5613         |
|          cait_m36_384           |  4  | 0.9889 |  0.9983   |  1.5802  |         1.5522         |
|      xcit_large_24_p8_224       |  5  | 0.981  |  0.9948   |  1.5605  |         1.5203         |
|           resnest101e           | 64  | 0.9865 |  0.9699   |  1.5588  |         1.507          |
|           volo_d1_224           | 64  | 0.9869 |  0.9797   |  1.5525  |         1.5345         |
|       eca_botnext26ts_256       | 128 | 0.9769 |  0.9731   |  1.4959  |         1.4849         |
|          botnet26t_256          | 128 | 0.9795 |  0.9777   |  1.488   |         1.4893         |
|         coat_lite_mini          | 128 | 0.9923 |   0.993   |  1.4759  |         1.4632         |
|             dla102              | 128 | 0.9901 |  0.9885   |  1.453   |          1.45          |
|          gmixer_24_224          | 128 | 0.9883 |  1.0315   |  1.4524  |         1.4471         |
|        res2net50_14w_8s         | 128 | 0.9978 |  0.9854   |  1.4394  |         1.4235         |
|        tnt_s_patch16_224        | 128 | 0.9963 |  0.9963   |  1.4195  |         1.4088         |
|        res2net101_26w_4s        | 64  | 0.9959 |  0.9767   |  1.4145  |         1.3916         |
|            hrnet_w18            | 128 | 0.9806 |  0.9272   |  1.411   |         1.3319         |
|           res2next50            | 128 | 0.9986 |   0.991   |  1.3912  |         1.3735         |
|          jx_nest_base           | 32  | 0.9683 |  0.9622   |  1.3839  |         1.3622         |
|           convit_base           | 64  | 0.9949 |  0.9946   |  1.3723  |         1.3634         |
|            repvgg_a2            | 128 | 0.9602 |  0.9537   |  1.3686  |         1.3759         |
|          inception_v3           | 128 |  0.99  |  0.9768   |  1.3636  |         1.3638         |
|       gluon_inception_v3        | 128 |  0.99  |  0.9768   |  1.3627  |         1.3636         |
|        adv_inception_v3         | 128 |  0.99  |  0.9766   |  1.3624  |         1.364          |
|         mobilenetv2_100         | 128 | 0.9626 |   0.949   |  1.3539  |         1.3557         |
|          ghostnet_100           | 128 | 0.962  |  0.8149   |  1.3447  |         1.345          |
|          convnext_base          | 64  | 0.9781 |  0.9775   |  1.3407  |         1.3343         |
|       tf_efficientnet_b0        | 128 | 0.9702 |  0.9637   |  1.3282  |         1.3199         |
|           rexnet_100            | 128 | 0.9517 |  0.9432   |  1.3161  |         1.3125         |
|            gernet_l             | 128 | 0.9573 |  0.9498   |  1.3121  |         1.3165         |
|            tinynet_a            | 128 | 0.9588 |  0.9425   |  1.309   |         1.2996         |
|        ese_vovnet19b_dw         | 128 | 0.9633 |  0.9601   |  1.2965  |         1.2982         |
|      mobilenetv3_large_100      | 128 | 0.9532 |  0.9372   |  1.2962  |         1.2963         |
|           tf_mixnet_l           | 128 | 0.9745 |  0.9721   |  1.2883  |         1.2792         |
|          spnasnet_100           | 128 | 0.9593 |  0.9422   |  1.2823  |         1.2839         |
|          cspdarknet53           | 64  | 0.9422 |  0.9313   |  1.2817  |         1.2766         |
|           mnasnet_100           | 128 | 0.9638 |  0.9514   |  1.2774  |         1.2799         |
|          resmlp_12_224          | 128 | 0.982  |  0.9774   |  1.2698  |         1.272          |
|             dpn107              | 32  | 0.959  |  0.9508   |  1.2669  |         1.2508         |
|     swsl_resnext101_32x16d      | 32  | 0.9954 |  0.9838   |  1.2661  |         1.2378         |
|           fbnetc_100            | 128 | 0.9656 |  0.9522   |  1.265   |         1.267          |
|           selecsls42b           | 128 | 0.9965 |  0.9924   |  1.2605  |         1.2597         |
|            mixnet_l             | 128 | 0.9765 |  0.9747   |  1.2598  |         1.2488         |
|        convmixer_768_32         | 32  | 0.9968 |  0.9961   |  1.2449  |         1.2444         |
|         crossvit_9_240          | 128 | 0.973  |  0.5755   |  1.2436  |         1.2291         |
|            fbnetv3_b            | 128 | 0.9651 |  0.9507   |  1.2414  |         1.236          |
|           regnety_002           | 128 | 0.8913 |  0.8702   |  1.2373  |         1.2011         |
|          pnasnet5large          | 16  | 0.9743 |  0.9625   |  1.2296  |         1.2237         |
|           mobilevit_s           | 64  | 0.955  |   0.951   |  1.2271  |         1.2189         |
|        gluon_xception65         | 32  | 0.9838 |  0.9758   |  1.1692  |         1.1764         |
|        twins_pcpvt_base         | 64  | 0.9743 |   0.974   |  1.1605  |         1.1378         |
|            lcnet_050            | 128 | 0.9075 |  0.8804   |  1.1556  |         1.1762         |
|            pit_b_224            | 64  | 0.9888 |  0.9884   |  1.1544  |         1.145          |
|  swin_base_patch4_window7_224   | 64  | 0.9827 |  0.9826   |  1.1388  |         1.1315         |
|          mixer_b16_224          | 128 | 0.9948 |  1.0318   |  1.1372  |         1.1394         |
|      beit_base_patch16_224      | 64  | 0.9926 |  0.9923   |  1.1076  |         1.1025         |
| deit_base_distilled_patch16_224 | 64  | 0.9933 |  0.9928   |  1.0912  |         1.0904         |
|      vit_base_patch16_224       | 64  | 0.993  |  0.9925   |  1.088   |         1.0849         |
|         visformer_small         | 128 | 0.9915 |  0.9875   |  1.0797  |         1.0692         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+-------+-----------+---------------+------------------------+
|              name               | bs | eager | aot_eager |   inductor    | inductor_no_cudagraphs |
+---------------------------------+----+-------+-----------+---------------+------------------------+
|        adv_inception_v3         | 8  | pass  |   pass    |     pass      |          pass          |
|      beit_base_patch16_224      | 8  | pass  |   pass    |     pass      |          pass          |
|           mobilevit_s           | 8  | pass  |   pass    |     pass      |          pass          |
|            nfnet_l0             | 8  | pass  |   pass    |     pass      |          pass          |
|            pit_b_224            | 8  | pass  |   pass    |     pass      |          pass          |
|          pnasnet5large          | 8  | pass  |   pass    |     pass      |          pass          |
|         poolformer_m36          | 8  | pass  |   pass    |     pass      |          pass          |
|           regnety_002           | 8  | pass  |   pass    |     pass      |          pass          |
|            repvgg_a2            | 8  | pass  |   pass    |     pass      |          pass          |
|        res2net101_26w_4s        | 8  | pass  |   pass    |     pass      |          pass          |
|        res2net50_14w_8s         | 8  | pass  |   pass    |     pass      |          pass          |
|           res2next50            | 8  | pass  |   pass    |     pass      |          pass          |
|          resmlp_12_224          | 8  | pass  |   pass    |     pass      |          pass          |
|           resnest101e           | 8  | pass  |   pass    |     pass      |          pass          |
|           rexnet_100            | 8  | pass  |   pass    |     pass      |          pass          |
|        sebotnet33ts_256         | 8  | pass  |   pass    |     pass      |          pass          |
|           selecsls42b           | 8  | pass  |   pass    |     pass      |          pass          |
|          spnasnet_100           | 8  | pass  |   pass    |     pass      |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |   pass    |     pass      |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |   pass    |     pass      |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |   pass    |     pass      |          pass          |
|           tf_mixnet_l           | 8  | pass  |   pass    |     pass      |          pass          |
|            tinynet_a            | 8  | pass  |   pass    |     pass      |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |   pass    |     pass      |          pass          |
|        twins_pcpvt_base         | 8  | pass  |   pass    |     pass      |          pass          |
|         visformer_small         | 8  | pass  |   pass    |     pass      |          pass          |
|      vit_base_patch16_224       | 8  | pass  |   pass    |     pass      |          pass          |
|           volo_d1_224           | 8  | pass  |   pass    |     pass      |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |   pass    |     pass      |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |   pass    |     pass      |          pass          |
|         mobilenetv2_100         | 8  | pass  |   pass    |     pass      |          pass          |
|           mnasnet_100           | 8  | pass  |   pass    |     pass      |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |   pass    |     pass      |          pass          |
|          botnet26t_256          | 8  | pass  |   pass    |     pass      |          pass          |
|         coat_lite_mini          | 8  | pass  |   pass    |     pass      |          pass          |
|           convit_base           | 8  | pass  |   pass    |     pass      |          pass          |
|        convmixer_768_32         | 8  | pass  |   pass    |     pass      |          pass          |
|          convnext_base          | 8  | pass  |   pass    |     pass      |          pass          |
|         crossvit_9_240          | 8  | pass  |   pass    |     pass      |          pass          |
|          cspdarknet53           | 8  | pass  |   pass    |     pass      |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |   pass    |     pass      |          pass          |
|             dla102              | 8  | pass  |   pass    |     pass      |          pass          |
|           dm_nfnet_f0           | 8  | pass  |   pass    |     pass      |          pass          |
|             dpn107              | 8  | pass  |   pass    |     pass      |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |   pass    |     pass      |          pass          |
|           fbnetc_100            | 8  | pass  |   pass    |     pass      |          pass          |
|            mixnet_l             | 8  | pass  |   pass    |     pass      |          pass          |
|            fbnetv3_b            | 8  | pass  |   pass    |     pass      |          pass          |
|            gernet_l             | 8  | pass  |   pass    |     pass      |          pass          |
|          ghostnet_100           | 8  | pass  |   pass    |     pass      |          pass          |
|       gluon_inception_v3        | 8  | pass  |   pass    |     pass      |          pass          |
|        gluon_xception65         | 8  | pass  |   pass    |     pass      |          pass          |
|          gmixer_24_224          | 8  | pass  |   pass    |     pass      |          pass          |
|          gmlp_s16_224           | 8  | pass  |   pass    |     pass      |          pass          |
|            hrnet_w18            | 8  | pass  |   pass    |     pass      |          pass          |
|          inception_v3           | 8  | pass  |   pass    |     pass      |          pass          |
|          jx_nest_base           | 8  | pass  |   pass    |     pass      |          pass          |
|            lcnet_050            | 8  | pass  |   pass    |     pass      |          pass          |
|          mixer_b16_224          | 8  | pass  |   pass    |     pass      |          pass          |
|          cait_m36_384           | 4  | pass  |   pass    | fail_accuracy |     fail_accuracy      |
+---------------------------------+----+-------+-----------+---------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|           mobilevit_s           | 64  | 2.5109 |  4.5861   | 63.6771  |        62.5295         |
|        twins_pcpvt_base         | 64  | 3.3927 |  7.1876   |  62.207  |        62.4913         |
|         coat_lite_mini          | 128 | 1.4326 |  2.8992   | 47.3055  |        46.9281         |
|         poolformer_m36          | 64  | 3.1267 |  4.7846   | 39.7162  |         38.66          |
|            hrnet_w18            | 128 | 8.4835 |  18.9589  | 37.9987  |        36.7699         |
|          jx_nest_base           | 32  | 2.6134 |  4.9714   | 37.6429  |        36.5513         |
|  swin_base_patch4_window7_224   | 64  | 3.8164 |  7.3584   | 37.6034  |        37.5829         |
|          pnasnet5large          | 16  | 7.5197 |  14.7443  | 34.6454  |        33.7544         |
|          cait_m36_384           |  4  | 4.2816 |  9.6582   | 34.3131  |        33.9495         |
|           resnest101e           | 64  | 4.4083 |  9.7529   | 30.3535  |        30.0594         |
|      xcit_large_24_p8_224       |  5  | 4.0315 |  9.0001   | 29.0763  |        28.5659         |
|         crossvit_9_240          | 128 | 2.106  |  4.9942   | 27.8044  |        27.6619         |
|        res2net101_26w_4s        | 64  | 3.9006 |  10.5464  | 24.7029  |        24.4074         |
|            nfnet_l0             | 128 | 2.6694 |   4.555   | 24.5339  |        23.5969         |
|        tnt_s_patch16_224        | 128 | 2.4935 |  5.5044   | 23.5368  |        23.3995         |
|        res2net50_14w_8s         | 128 | 3.3069 |  10.0388  | 23.5204  |        23.4001         |
|             dpn107              | 32  | 5.3803 |  9.4092   | 22.7362  |        22.8578         |
|        sebotnet33ts_256         | 64  | 2.3773 |  4.1817   | 21.8676  |        21.9724         |
|           dm_nfnet_f0           | 128 | 3.3494 |  5.2263   | 21.4196  |        21.7254         |
|          botnet26t_256          | 128 | 1.6571 |  3.0716   | 21.3122  |        21.2952         |
|            fbnetv3_b            | 128 | 4.0129 |  7.5427   | 21.2972  |        20.9957         |
|           tf_mixnet_l           | 128 | 4.9118 |  7.8595   | 20.8744  |        21.6546         |
|           volo_d1_224           | 64  | 1.9156 |  4.3133   | 20.8301  |        21.6174         |
|           rexnet_100            | 128 | 2.7202 |  4.9431   | 19.9971  |        18.8192         |
|            mixnet_l             | 128 | 4.4366 |  7.4657   | 19.7177  |        19.5087         |
|          gmlp_s16_224           | 128 | 1.8286 |  3.8192   | 19.6017  |         18.215         |
|          ghostnet_100           | 128 | 2.1601 |   5.095   | 18.6402  |        18.5284         |
|          convnext_base          | 64  | 2.6738 |  4.2466   |  18.557  |        17.5518         |
|       eca_botnext26ts_256       | 128 | 1.7939 |  3.3672   |  17.878  |        17.7734         |
|        gluon_xception65         | 32  | 2.7732 |   6.728   | 17.6377  |        17.4105         |
|          gmixer_24_224          | 128 | 2.0412 |  4.0211   | 17.4138  |        16.3532         |
|            tinynet_a            | 128 | 2.6767 |  4.8185   |  17.001  |        17.0008         |
|             dla102              | 128 | 2.5362 |  5.9113   | 16.8211  |        16.7233         |
|           convit_base           | 64  | 1.4174 |  3.2202   | 16.6146  |        16.3608         |
|        adv_inception_v3         | 128 | 2.4349 |  5.3608   | 16.4565  |         16.311         |
|          cspdarknet53           | 64  | 3.2084 |   5.455   | 16.4465  |        16.2417         |
|       gluon_inception_v3        | 128 | 2.4051 |  5.3807   | 16.4334  |        15.8552         |
|           res2next50            | 128 | 1.8606 |  5.3242   | 16.0582  |        15.8784         |
|       tf_efficientnet_b0        | 128 | 2.4044 |  4.3198   | 16.0181  |         15.09          |
|     swsl_resnext101_32x16d      | 32  | 2.1616 |  5.6607   | 15.8774  |         15.358         |
|          resmlp_12_224          | 128 | 0.9143 |  1.6517   | 15.7966  |         15.438         |
|      beit_base_patch16_224      | 64  | 1.4667 |  3.1358   | 15.4823  |        15.3814         |
|      mobilenetv3_large_100      | 128 | 1.9087 |   3.713   | 15.4547  |        14.4825         |
|          inception_v3           | 128 |  2.4   |  5.3349   |  15.445  |        15.3646         |
|           regnety_002           | 128 | 2.3567 |  3.9035   | 14.5992  |         13.608         |
|            pit_b_224            | 64  | 1.3228 |  2.9014   | 14.5198  |        14.6241         |
|            lcnet_050            | 128 | 1.157  |  2.2195   | 14.5036  |        14.1976         |
| deit_base_distilled_patch16_224 | 64  | 1.1333 |  2.4567   | 13.7162  |        12.8836         |
|          spnasnet_100           | 128 | 2.4494 |  4.3582   | 13.4498  |        13.2197         |
|           fbnetc_100            | 128 | 2.513  |  4.4267   | 13.3719  |        13.1345         |
|          mixer_b16_224          | 128 | 0.9072 |  1.7908   |  13.333  |        12.1382         |
|            repvgg_a2            | 128 | 2.6836 |  4.4134   |  12.983  |        12.6781         |
|            gernet_l             | 128 | 2.7131 |  4.4845   |  12.823  |        12.2412         |
|         mobilenetv2_100         | 128 | 1.9222 |  3.7363   | 12.2066  |        11.1786         |
|           mnasnet_100           | 128 | 1.8568 |  3.5354   | 12.0597  |        11.0723         |
|      vit_base_patch16_224       | 64  | 1.1511 |  2.3926   | 11.8892  |        13.1103         |
|         visformer_small         | 128 | 1.2108 |  2.5719   | 11.7276  |        12.5315         |
|           selecsls42b           | 128 | 0.8573 |  2.1482   | 11.3474  |        12.0893         |
|        ese_vovnet19b_dw         | 128 | 1.2276 |  2.0608   | 11.0977  |        11.0487         |
|        convmixer_768_32         | 32  | 1.4111 |  3.7471   | 10.0531  |        10.7476         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|         mobilenetv2_100         | 128 | 1.2161 |  1.2161   |  1.6528  |         1.7996         |
|           rexnet_100            | 128 | 1.2146 |  1.2146   |  1.6467  |         1.7914         |
|            tinynet_a            | 128 | 1.2101 |  1.2101   |  1.6235  |         1.7668         |
|           fbnetc_100            | 128 | 1.1425 |  1.1425   |  1.5449  |         1.6797         |
|           dm_nfnet_f0           | 128 | 1.1745 |  1.7697   |  1.5321  |         1.5939         |
|            fbnetv3_b            | 128 | 1.1984 |  1.1984   |  1.5302  |         1.706          |
|      mobilenetv3_large_100      | 128 | 1.1993 |  1.1993   |  1.5295  |         1.7105         |
|        ese_vovnet19b_dw         | 128 | 1.4966 |  1.4966   |  1.4838  |         1.564          |
|           selecsls42b           | 128 | 1.5903 |  1.5903   |  1.4697  |         1.5901         |
|          pnasnet5large          | 16  | 1.5067 |  1.5067   |  1.3889  |         1.4158         |
|        sebotnet33ts_256         | 64  | 1.1865 |  1.1865   |  1.374   |         1.4004         |
|        gluon_xception65         | 32  | 1.405  |   1.405   |  1.3623  |         1.405          |
|           mnasnet_100           | 128 | 1.3777 |  1.3777   |  1.354   |         1.5095         |
|          spnasnet_100           | 128 | 1.3775 |  1.3775   |  1.3538  |         1.5093         |
|        convmixer_768_32         | 32  | 1.1936 |  1.1936   |  1.3296  |         1.4137         |
|            nfnet_l0             | 128 | 1.3942 |  1.3942   |  1.3246  |         1.3942         |
|         poolformer_m36          | 64  | 1.1888 |  1.1888   |  1.3016  |         1.3592         |
|          convnext_base          | 64  | 1.1431 |  1.1431   |  1.2874  |         1.3331         |
|       tf_efficientnet_b0        | 128 | 1.3185 |  1.3185   |  1.2451  |         1.3185         |
|            hrnet_w18            | 128 | 1.0654 |  1.0654   |  1.2414  |         1.3255         |
|          cspdarknet53           | 64  | 1.6402 |  1.6402   |  1.2103  |         1.2425         |
|        res2net50_14w_8s         | 128 | 1.2884 |  1.2884   |  1.1395  |         1.1961         |
|           res2next50            | 128 | 1.3217 |  1.3217   |  1.1334  |         1.1879         |
|            mixnet_l             | 128 | 1.1528 |  1.1528   |  1.1172  |         1.1528         |
|           tf_mixnet_l           | 128 | 1.1528 |  1.1528   |  1.1172  |         1.1528         |
|        res2net101_26w_4s        | 64  | 1.2034 |  1.2034   |  1.082   |         1.1263         |
|       eca_botnext26ts_256       | 128 | 1.1405 |  1.1405   |  1.0787  |         1.1404         |
|          botnet26t_256          | 128 | 1.1393 |  1.1393   |  1.0781  |         1.1393         |
|         coat_lite_mini          | 128 | 1.1027 |  1.1027   |  1.0754  |         1.123          |
|           mobilevit_s           | 64  | 1.164  |   1.164   |  1.0235  |         1.0685         |
|          ghostnet_100           | 128 | 1.1107 |  1.1107   |  1.0169  |         1.1107         |
|            repvgg_a2            | 128 |  1.0   |    1.0    |  1.0105  |         1.0674         |
|     swsl_resnext101_32x16d      | 32  | 1.0101 |  1.0101   |  0.9992  |         1.0101         |
|             dla102              | 128 |  1.0   |    1.0    |  0.9641  |          1.0           |
|        adv_inception_v3         | 128 | 1.0001 |  1.0001   |  0.9469  |          1.0           |
|       gluon_inception_v3        | 128 | 1.0001 |  1.0001   |  0.9469  |          1.0           |
|          inception_v3           | 128 | 1.0001 |  1.0001   |  0.9469  |          1.0           |
|           convit_base           | 64  | 1.1577 |  1.1577   |  0.9464  |         0.9715         |
|          cait_m36_384           |  4  | 1.0086 |  1.0086   |  0.934   |         0.9395         |
|            gernet_l             | 128 |  1.0   |    1.0    |  0.9336  |          1.0           |
|             dpn107              | 32  | 1.2334 |  1.2334   |  0.929   |         0.9398         |
|            lcnet_050            | 128 | 1.267  |   1.267   |  0.9279  |         1.0873         |
|           resnest101e           | 64  |  1.0   |    1.0    |  0.926   |         0.9591         |
|           volo_d1_224           | 64  |  1.0   |    1.0    |  0.9076  |         0.9519         |
|           regnety_002           | 128 |  1.0   |  0.9997   |  0.9011  |         0.9997         |
|        twins_pcpvt_base         | 64  | 1.0799 |  1.0799   |  0.8882  |         0.9152         |
|  swin_base_patch4_window7_224   | 64  | 1.3573 |  1.3573   |  0.883   |         0.899          |
|      xcit_large_24_p8_224       |  5  | 1.0001 |  1.0001   |  0.8765  |         0.8818         |
|            pit_b_224            | 64  | 1.0667 |  1.0667   |  0.8608  |         0.8725         |
|          mixer_b16_224          | 128 | 1.1733 |  1.1733   |  0.8569  |         0.8992         |
|         visformer_small         | 128 | 1.1302 |  1.1302   |  0.8474  |         0.8967         |
|      beit_base_patch16_224      | 64  | 1.0655 |  1.0655   |  0.8072  |         0.8323         |
| deit_base_distilled_patch16_224 | 64  | 1.0673 |  1.0673   |  0.7967  |         0.8224         |
|      vit_base_patch16_224       | 64  | 1.066  |   1.066   |  0.7965  |         0.8211         |
|          jx_nest_base           | 32  | 1.1099 |  1.1099   |  0.7852  |         0.7961         |
|          resmlp_12_224          | 128 | 1.1807 |  1.1807   |  0.771   |         0.8453         |
|          gmlp_s16_224           | 128 | 1.0706 |  1.1961   |  0.7311  |         0.7947         |
|          gmixer_24_224          | 128 | 1.1616 |  1.1616   |  0.667   |         0.7162         |
|         crossvit_9_240          | 128 | 1.0501 |  0.7597   |  0.586   |         0.618          |
|        tnt_s_patch16_224        | 128 | 1.2109 |  1.2143   |  0.4363  |         0.4552         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 149.1923 | 149.2683  | 104.7925 |        105.4986        |
|        convmixer_768_32         | 32  | 103.4503 |  103.554  | 82.8589  |        82.8607         |
|          pnasnet5large          | 16  | 87.1141  |  88.0131  | 69.1091  |        69.3241         |
|          convnext_base          | 64  | 93.7803  |  93.8366  | 68.4079  |        68.7651         |
|  swin_base_patch4_window7_224   | 64  | 75.6175  |  75.6251  | 65.2854  |        65.7053         |
|           dm_nfnet_f0           | 128 | 118.7708 | 118.7609  | 64.3937  |         65.807         |
|            hrnet_w18            | 128 | 88.8038  |  94.1599  |  61.907  |        65.6613         |
|            nfnet_l0             | 128 |  93.333  |  93.2669  | 56.2601  |        56.8116         |
|     swsl_resnext101_32x16d      | 32  | 69.0705  |  70.0095  | 54.3424  |        55.6951         |
|          cait_m36_384           |  4  | 84.1241  |  83.3399  | 52.6155  |        53.6378         |
|           res2next50            | 128 | 72.4039  |  72.965   | 51.9822  |        52.6281         |
|          mixer_b16_224          | 128 | 57.1711  |  55.0987  | 50.2668  |        50.2266         |
|           tf_mixnet_l           | 128 | 63.4185  |  63.6169  | 47.9635  |        48.3355         |
|           convit_base           | 64  | 66.0091  |  66.0994  | 47.8865  |        48.2165         |
|            mixnet_l             | 128 | 60.9605  |  61.0818  | 47.2602  |        47.6719         |
|             dla102              | 128 | 69.2312  |  69.3597  | 47.1591  |        47.2852         |
|            pit_b_224            | 64  | 53.5091  |  53.5016  | 45.8215  |        46.1857         |
|           resnest101e           | 64  | 68.2328  |  69.4816  | 43.1241  |        44.6106         |
|        adv_inception_v3         | 128 | 59.2757  |  60.1412  | 43.0922  |        43.0549         |
|       gluon_inception_v3        | 128 | 59.2919  |  60.0902  | 43.0874  |        43.0732         |
|          inception_v3           | 128 | 59.2951  |  60.0798  | 43.0461  |        43.0286         |
|             dpn107              | 32  | 56.3283  |  56.7856  | 42.6949  |        43.2053         |
|        res2net50_14w_8s         | 128 | 61.4325  |  62.238   | 42.5792  |        43.1201         |
|         poolformer_m36          | 64  | 68.8329  |  68.7182  | 42.2239  |        42.7849         |
|        gluon_xception65         | 32  | 47.9668  |  48.3785  | 40.3682  |        40.1352         |
|      beit_base_patch16_224      | 64  | 43.6054  |  43.6353  | 39.0751  |        39.2595         |
|      vit_base_patch16_224       | 64  | 41.1438  |  41.1629  | 37.7777  |         37.737         |
| deit_base_distilled_patch16_224 | 64  | 41.3421  |  41.3454  | 37.6964  |        37.8401         |
|         visformer_small         | 128 | 40.3744  |  40.576   | 37.1107  |        37.4877         |
|        twins_pcpvt_base         | 64  | 40.3829  |  40.4092  | 33.8905  |        34.5775         |
|        res2net101_26w_4s        | 64  | 47.3095  |  48.1181  | 33.2628  |        33.7877         |
|          gmixer_24_224          | 128 |  48.003  |  46.0203  | 32.6857  |         32.828         |
|           volo_d1_224           | 64  | 51.1698  |  51.5128  | 32.5393  |        32.8632         |
|            fbnetv3_b            | 128 | 41.0289  |  41.6895  | 31.9579  |        32.0791         |
|          jx_nest_base           | 32  | 42.8787  |  43.1312  | 30.0473  |        30.4682         |
|          gmlp_s16_224           | 128 | 55.8714  |  49.766   | 29.2488  |        29.3498         |
|          botnet26t_256          | 128 | 42.8027  |  42.8724  | 28.1824  |        28.1392         |
|       eca_botnext26ts_256       | 128 | 42.9088  |  43.0961  | 28.0556  |        28.2598         |
|         coat_lite_mini          | 128 | 41.4042  |  41.4011  | 27.8322  |        28.0837         |
|            gernet_l             | 128 | 37.4633  |  37.7791  | 27.3376  |        27.2569         |
|          cspdarknet53           | 64  | 34.8843  |  35.2782  | 25.6231  |        25.7544         |
|            repvgg_a2            | 128 | 35.6206  |  35.8679  |  25.012  |        24.8768         |
|         crossvit_9_240          | 128 | 31.1044  |  52.5942  | 24.3748  |        24.6695         |
|      xcit_large_24_p8_224       |  5  | 36.3044  |  35.8093  | 22.8528  |        23.4736         |
|       tf_efficientnet_b0        | 128 | 30.8646  |  31.1052  | 22.5788  |        22.6771         |
|           mobilevit_s           | 64  | 28.9446  |  29.0415  | 22.5057  |        22.6671         |
|        sebotnet33ts_256         | 64  | 35.6274  |  35.8246  | 20.6718  |        20.7656         |
|           fbnetc_100            | 128 | 25.9533  |  26.3184  |  19.805  |        19.7679         |
|           rexnet_100            | 128 | 27.3365  |  27.6136  |  19.78   |        19.8209         |
|           selecsls42b           | 128 | 24.6295  |   24.72   | 19.4539  |        19.4744         |
|        ese_vovnet19b_dw         | 128 | 25.7312  |  25.8037  | 19.1276  |        19.0831         |
|            tinynet_a            | 128 | 24.6358  |  25.101   | 18.0523  |        18.1949         |
|          resmlp_12_224          | 128 |  21.852  |  21.9576  | 16.8879  |        16.8753         |
|          spnasnet_100           | 128 | 22.1372  |  22.5349  | 16.5651  |        16.5328         |
|           mnasnet_100           | 128 | 20.6708  |  20.9537  | 15.6149  |        15.5647         |
|         mobilenetv2_100         | 128 | 19.9811  |  20.2817  |  14.205  |        14.1859         |
|      mobilenetv3_large_100      | 128 | 16.7199  |  17.0148  | 12.3002  |        12.3156         |
|          ghostnet_100           | 128 |  16.847  |  19.9368  | 12.0683  |        12.0477         |
|           regnety_002           | 128 | 11.1693  |  11.4997  |  8.095   |         8.3068         |
|            lcnet_050            | 128 |  5.1087  |  5.2683   |  4.0118  |         3.942          |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_087_28_03_23_performance_amp_838/torchbench_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_087_28_03_23_performance_amp_838/huggingface_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_087_28_03_23_performance_amp_838/timm_models_amp.png

Build Summary


Run name

day_087_28_03_23_performance_amp_838

Commit hashes

pytorch commit: 0c78456e24eab0e175cec7567d2dfa45ecff58dc
pytorch commit date: 2023-03-28 22:46:34+00:00
torchbench commit: d618fa8e06c13bbe441cc929c5d3bf498d0f369c
torchbench commit date: 2023-03-22 15:27:07-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gita7c8d25

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

williamwen42 changed the title from "max_autotune run" to "one-off runs" on Mar 29, 2023

williamwen42 commented Apr 7, 2023

Performance Dashboard for amp precision (inductor max-autotune with cudagraphs)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward pass; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.
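A minimal sketch of how a per-model speedup and the suite-level "Geometric mean speedup" could be computed, assuming speedup is eager (native PyTorch) latency divided by compiled latency. The helper name `speedup` is hypothetical, and the two latency pairs are copied from the timm AMP "Absolute latency (ms)" table of the earlier run; this is an illustration, not the dashboard's actual code. Note that `statistics.geometric_mean` rejects zeros, so 0.0 entries (failed runs) have to be dropped first.

```python
from statistics import geometric_mean

def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Hypothetical helper: speedup relative to native PyTorch (eager / compiled)."""
    return eager_ms / compiled_ms

# (eager, inductor) latencies in ms for two timm models, copied from the
# AMP "Absolute latency (ms)" table of the earlier run
latencies = {
    "tnt_s_patch16_224": (149.1923, 104.7925),
    "convmixer_768_32": (103.4503, 82.8589),
}

speedups = [speedup(eager, compiled) for eager, compiled in latencies.values()]

# A geometric mean treats a 2x win and a 0.5x loss as cancelling out,
# which an arithmetic mean would not
print(f"geomean speedup: {geometric_mean(speedups):.2f}x")
```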

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency and memory footprint reduction, we exclude models that fail accuracy checks.
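The exclusion step can be sketched as a filter applied before aggregating. This assumes that `pass` and `pass_due_to_skip` both count as passing and that a 0.0 speedup marks a failed performance run; the issue does not state the exact rule, and the per-model tables below still list some failing models (e.g. squeezenet1_1), so the filter presumably applies to the aggregate numbers. `filter_passing` and `PASSING` are hypothetical names; the sample values come from the torchbench tables in this comment.

```python
# Hypothetical set of accuracy statuses treated as passing; the dashboard's
# exact rule is an assumption here
PASSING = {"pass", "pass_due_to_skip"}

def filter_passing(speedups: dict, accuracy: dict) -> dict:
    """Keep models that passed accuracy and produced a non-zero speedup."""
    return {
        name: s
        for name, s in speedups.items()
        if accuracy.get(name) in PASSING and s > 0.0
    }

# Values copied from the torchbench speedup and accuracy tables in this comment
speedups = {"hf_T5_large": 2.3266, "squeezenet1_1": 2.0064, "moco": 0.0, "resnet50": 1.1808}
accuracy = {
    "hf_T5_large": "pass_due_to_skip",
    "squeezenet1_1": "fail_accuracy",
    "moco": "fail_to_run",
    "resnet50": "pass",
}

kept = filter_passing(speedups, accuracy)
print(sorted(kept))  # ['hf_T5_large', 'resnet50']
```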

Passrate

+-----------------------+------------+-------------+-------------+
|       Compiler        | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune | 78%, 47/60 | 91%, 41/45  | 95%, 57/60  |
+-----------------------+------------+-------------+-------------+

Geometric mean speedup

+-----------------------+------------+-------------+-------------+
|       Compiler        | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune |   1.61x    |    1.62x    |    1.42x    |
+-----------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+-----------------------+------------+-------------+-------------+
|       Compiler        | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune |   348.81   |   210.04    |   497.89    |
+-----------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+-----------------------+------------+-------------+-------------+
|       Compiler        | torchbench | huggingface | timm_models |
+-----------------------+------------+-------------+-------------+
| inductor_max_autotune |   0.77x    |    0.90x    |    0.91x    |
+-----------------------+------------+-------------+-------------+
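One plausible reading of "compression ratio (higher is better)" is baseline peak memory divided by the backend's peak memory, so values below 1.0 mean the compiled model needs more memory. This is an assumption: the issue does not define the baseline (the eager column in the earlier timm table is not exactly 1.0), nor how the per-suite 0.77x / 0.90x / 0.91x values are aggregated. The peak-memory numbers below are made up for illustration.

```python
def compression_ratio(baseline_peak_gb: float, compiled_peak_gb: float) -> float:
    """Assumed definition: baseline peak memory / compiled peak memory.
    A value > 1.0 means the compiled model uses less peak memory."""
    return baseline_peak_gb / compiled_peak_gb

# Hypothetical peak-memory readings, for illustration only
print(compression_ratio(10.0, 8.0))   # 1.25 -> compiled uses less memory
print(compression_ratio(10.0, 12.5))  # 0.8  -> compiled uses more memory
```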

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
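The four warning criteria can be expressed as a per-model check. The thresholds are copied from the list above; treating `pass_due_to_skip` as passing and treating `nan` compile time or compression ratio as a warning are assumptions, and `warnings_for` is a hypothetical name. The example row is timm_vovnet, taken from the torchbench tables in this comment.

```python
import math

# Thresholds copied from the "Warnings" list
SPEEDUP_MIN = 0.95
COMPILE_MAX_SEC = 120.0
COMPRESSION_MIN = 0.9

def warnings_for(accuracy: str, speedup: float, compile_sec: float, compression: float) -> list:
    """Return the reasons a model would be flagged (empty list if none)."""
    reasons = []
    if accuracy not in ("pass", "pass_due_to_skip"):  # assumed passing statuses
        reasons.append("accuracy")
    if speedup < SPEEDUP_MIN:  # a 0.0 speedup usually means the perf run failed
        reasons.append("speedup")
    if math.isnan(compile_sec) or compile_sec > COMPILE_MAX_SEC:
        reasons.append("compilation latency")
    if math.isnan(compression) or compression < COMPRESSION_MIN:
        reasons.append("compression ratio")
    return reasons

# timm_vovnet: accuracy pass, speedup 0.9487, compile 380.7795 s, compression 0.7457
print(warnings_for("pass", 0.9487, 380.7795, 0.7457))
# ['speedup', 'compilation latency', 'compression ratio']
```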

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+-----------------------+
|               name                |  bs  | inductor_max_autotune |
+-----------------------------------+------+-----------------------+
|       functorch_dp_cifar10        |  64  |        3.7416         |
|           BERT_pytorch            |  16  |        3.2846         |
|            densenet121            |  4   |        2.7968         |
|            hf_BigBird             |  2   |        2.6482         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |        2.4523         |
|             hf_Albert             |  8   |         2.355         |
|            hf_T5_large            |  2   |        2.3266         |
|        mobilenet_v3_large         |  32  |        2.0984         |
|              hf_Bart              |  4   |        2.0956         |
|         phlippe_densenet          | 128  |        2.0789         |
|               dlrm                | 1024 |        2.0617         |
|           squeezenet1_1           |  32  |        2.0064         |
|              hf_GPT2              |  4   |        1.9726         |
|               hf_T5               |  8   |        1.9579         |
|              hf_Bert              |  4   |        1.8905         |
|          phlippe_resnet           | 128  |        1.8371         |
|          pytorch_struct           | 200  |        1.8044         |
|      timm_vision_transformer      |  32  |        1.7806         |
|          resnext50_32x4d          |  8   |        1.7364         |
|        speech_transformer         |  32  |        1.7304         |
|            mnasnet1_0             |  32  |        1.7063         |
| attention_is_all_you_need_pytorch | 256  |        1.6814         |
|        shufflenet_v2_x1_0         | 128  |        1.6711         |
|           fastNLP_Bert            |  6   |        1.6644         |
|           hf_Bert_large           |  4   |        1.6419         |
|             resnet18              |  16  |        1.6279         |
|           timm_resnest            |  32  |        1.5636         |
|            timm_nfnet             | 128  |        1.5355         |
|                drq                |  1   |        1.5315         |
|           mobilenet_v2            |  96  |        1.5227         |
|           hf_DistilBert           |  8   |        1.4591         |
|         timm_efficientnet         |  32  |        1.4566         |
|               dcgan               |  32  |        1.4469         |
|           lennard_jones           | 1000 |        1.4177         |
|          LearningToPaint          |  96  |        1.3766         |
|           pytorch_unet            |  1   |        1.3561         |
|          pytorch_stargan          |  16  |        1.2663         |
|               vgg16               |  64  |        1.2499         |
|            Super_SloMo            |  6   |        1.2349         |
|        Background_Matting         |  4   |        1.2171         |
|              yolov3               |  16  |        1.2042         |
|             resnet152             |  32  |        1.1934         |
|             resnet50              |  32  |        1.1808         |
|         soft_actor_critic         | 256  |        1.1772         |
|            hf_Reformer            |  4   |        1.1453         |
|              alexnet              | 128  |         1.13          |
|              demucs               |  4   |        1.0397         |
|            timm_regnet            |  32  |        1.0208         |
|            timm_vovnet            |  32  |        0.9487         |
|            tts_angular            |  64  |        0.9484         |
|      nvidia_deeprecommender       | 256  |        0.9337         |
|               moco                |  0   |          0.0          |
|             tacotron2             |  0   |          0.0          |
|           hf_Longformer           |  0   |          0.0          |
|               sage                |  0   |          0.0          |
|                gcn                |  0   |          0.0          |
|   timm_vision_transformer_large   |  0   |          0.0          |
|           torchrec_dlrm           |  0   |          0.0          |
|                gat                |  0   |          0.0          |
|           hf_GPT2_large           |  0   |          0.0          |
+-----------------------------------+------+-----------------------+

Accuracy

+-----------------------------------+-----+-----------------------+
|               name                | bs  | inductor_max_autotune |
+-----------------------------------+-----+-----------------------+
|            hf_T5_large            |  4  |   pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  |   pass_due_to_skip    |
|           hf_GPT2_large           |  4  |   pass_due_to_skip    |
|             resnet50              |  4  |         pass          |
|            mnasnet1_0             |  4  |         pass          |
|        mobilenet_v3_large         |  4  |         pass          |
|      nvidia_deeprecommender       |  4  |         pass          |
|         phlippe_densenet          |  4  |         pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |         pass          |
|          pytorch_stargan          | 16  |         pass          |
|          pytorch_struct           | 200 |         pass          |
|           pytorch_unet            |  2  |         pass          |
|             resnet152             |  4  |         pass          |
|             resnet18              |  4  |         pass          |
|           BERT_pytorch            |  4  |         pass          |
|           lennard_jones           |  4  |         pass          |
|        shufflenet_v2_x1_0         |  4  |         pass          |
|         soft_actor_critic         | 256 |         pass          |
|        speech_transformer         |  4  |         pass          |
|         timm_efficientnet         |  4  |         pass          |
|            timm_nfnet             |  4  |         pass          |
|            timm_regnet            |  4  |         pass          |
|           timm_resnest            |  4  |         pass          |
|      timm_vision_transformer      |  4  |         pass          |
|            timm_vovnet            |  4  |         pass          |
|            tts_angular            |  4  |         pass          |
|               vgg16               |  4  |         pass          |
|          resnext50_32x4d          |  4  |         pass          |
|           mobilenet_v2            |  4  |         pass          |
|              yolov3               |  4  |         pass          |
|              hf_Bart              |  4  |         pass          |
| attention_is_all_you_need_pytorch |  4  |         pass          |
|               dcgan               |  4  |         pass          |
|              demucs               |  4  |         pass          |
|            densenet121            |  4  |         pass          |
|               dlrm                |  4  |         pass          |
|            Super_SloMo            |  4  |         pass          |
|           fastNLP_Bert            |  4  |         pass          |
|       functorch_dp_cifar10        |  4  |         pass          |
|            hf_T5_base             |  4  |         pass          |
|          LearningToPaint          |  4  |         pass          |
|             hf_Albert             |  4  |         pass          |
|              hf_Bert              |  4  |         pass          |
|           hf_Bert_large           |  4  |         pass          |
|            hf_BigBird             |  4  |         pass          |
|           hf_DistilBert           |  4  |         pass          |
|              hf_GPT2              |  2  |         pass          |
|            hf_Reformer            |  4  |         pass          |
|               hf_T5               |  4  |         pass          |
|              alexnet              |  4  |         pass          |
|           hf_Longformer           |  4  |      fail_to_run      |
|               moco                |  4  |      fail_to_run      |
|           squeezenet1_1           |  4  |     fail_accuracy     |
|          phlippe_resnet           |  4  |     fail_accuracy     |
|                drq                |  1  |     fail_accuracy     |
|          vision_maskrcnn          |  4  |    eager_variation    |
|        Background_Matting         |  4  |    eager_variation    |
|           torchrec_dlrm           |  0  |        0.0000         |
|               llama               |  0  |        0.0000         |
|             tacotron2             |  0  |        0.0000         |
|               sage                |  0  |        0.0000         |
|                gcn                |  0  |        0.0000         |
|                gat                |  0  |        0.0000         |
+-----------------------------------+-----+-----------------------+

Compilation latency (sec)

+-----------------------------------+------+-----------------------+
|               name                |  bs  | inductor_max_autotune |
+-----------------------------------+------+-----------------------+
|            densenet121            |  4   |       1210.9224       |
|        speech_transformer         |  32  |       866.9875        |
|         phlippe_densenet          | 128  |       851.8078        |
| attention_is_all_you_need_pytorch | 256  |       727.7114        |
|            mnasnet1_0             |  32  |       609.9616        |
|        mobilenet_v3_large         |  32  |       563.3833        |
|           mobilenet_v2            |  96  |       550.3897        |
|            hf_BigBird             |  2   |       533.2365        |
|            hf_T5_large            |  2   |        494.556        |
|      timm_vision_transformer      |  32  |       477.4533        |
|              yolov3               |  16  |       473.3135        |
|            timm_regnet            |  32  |       464.3725        |
|            timm_nfnet             | 128  |       446.6786        |
|             hf_Albert             |  8   |        442.971        |
|         timm_efficientnet         |  32  |        423.847        |
|           fastNLP_Bert            |  6   |        420.551        |
|          pytorch_struct           | 200  |       393.8293        |
|               dlrm                | 1024 |       383.5522        |
|           BERT_pytorch            |  16  |       381.3677        |
|            timm_vovnet            |  32  |       380.7795        |
|          resnext50_32x4d          |  8   |       375.2733        |
|                drq                |  1   |       331.8744        |
|        shufflenet_v2_x1_0         | 128  |       331.8666        |
|           hf_Bert_large           |  4   |       330.7902        |
|          LearningToPaint          |  96  |       327.6767        |
|            Super_SloMo            |  6   |       305.3558        |
|               hf_T5               |  8   |       276.8124        |
|           squeezenet1_1           |  32  |       260.9367        |
|             resnet18              |  16  |       259.5868        |
|      nvidia_deeprecommender       | 256  |        253.237        |
|       functorch_dp_cifar10        |  64  |       251.3416        |
|           pytorch_unet            |  1   |       250.9091        |
|        Background_Matting         |  4   |       248.3087        |
|               vgg16               |  64  |       234.7503        |
|              alexnet              | 128  |       218.6421        |
|            hf_Reformer            |  4   |       215.0925        |
|           timm_resnest            |  32  |       210.3936        |
|              hf_GPT2              |  4   |       203.1842        |
|          phlippe_resnet           | 128  |       200.4492        |
|         soft_actor_critic         | 256  |       184.3741        |
|             resnet152             |  32  |        182.207        |
|              hf_Bart              |  4   |       179.6105        |
|           lennard_jones           | 1000 |       155.9248        |
|   pytorch_CycleGAN_and_pix2pix    |  1   |       134.3312        |
|              hf_Bert              |  4   |       104.8345        |
|          pytorch_stargan          |  16  |        87.0111        |
|              demucs               |  4   |        72.8124        |
|               dcgan               |  32  |        61.981         |
|           hf_DistilBert           |  8   |        54.9767        |
|             resnet50              |  32  |        28.2371        |
|            tts_angular            |  64  |        5.2083         |
|                gat                |  0   |          nan          |
|                gcn                |  0   |          nan          |
|           hf_GPT2_large           |  0   |          nan          |
|           hf_Longformer           |  0   |          nan          |
|               moco                |  0   |          nan          |
|               sage                |  0   |          nan          |
|             tacotron2             |  0   |          nan          |
|   timm_vision_transformer_large   |  0   |          nan          |
|           torchrec_dlrm           |  0   |          nan          |
+-----------------------------------+------+-----------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+-----------------------+
|               name                |  bs  | inductor_max_autotune |
+-----------------------------------+------+-----------------------+
|            Super_SloMo            |  6   |        1.1595         |
|             hf_Albert             |  8   |        1.0399         |
|           mobilenet_v2            |  96  |        1.0102         |
|               hf_T5               |  8   |        0.9988         |
|           fastNLP_Bert            |  6   |        0.9953         |
|            tts_angular            |  64  |        0.9895         |
| attention_is_all_you_need_pytorch | 256  |        0.9693         |
|            timm_nfnet             | 128  |        0.9617         |
|               dlrm                | 1024 |        0.9466         |
|           BERT_pytorch            |  16  |        0.9428         |
|              hf_Bert              |  4   |        0.9421         |
|              hf_GPT2              |  4   |        0.9319         |
|         timm_efficientnet         |  32  |        0.9282         |
|           hf_Bert_large           |  4   |        0.9138         |
|              yolov3               |  16  |        0.8685         |
|        shufflenet_v2_x1_0         | 128  |         0.865         |
|        speech_transformer         |  32  |        0.8588         |
|            timm_regnet            |  32  |        0.8479         |
|           hf_DistilBert           |  8   |        0.8456         |
|      timm_vision_transformer      |  32  |        0.8357         |
|             resnet50              |  32  |        0.8346         |
|        Background_Matting         |  4   |        0.8333         |
|             resnet152             |  32  |        0.8323         |
|           timm_resnest            |  32  |        0.8293         |
|            hf_T5_large            |  2   |        0.8201         |
|         phlippe_densenet          | 128  |        0.7988         |
|        mobilenet_v3_large         |  32  |         0.785         |
|          pytorch_stargan          |  16  |        0.7724         |
|           pytorch_unet            |  1   |        0.7708         |
|              demucs               |  4   |        0.7661         |
|              hf_Bart              |  4   |        0.7626         |
|           squeezenet1_1           |  32  |        0.7625         |
|            timm_vovnet            |  32  |        0.7457         |
|            mnasnet1_0             |  32  |        0.7428         |
|          pytorch_struct           | 200  |        0.7341         |
|               vgg16               |  64  |        0.7228         |
|              alexnet              | 128  |        0.7091         |
|            densenet121            |  4   |        0.7088         |
|            hf_BigBird             |  2   |         0.696         |
|      nvidia_deeprecommender       | 256  |        0.6585         |
|          resnext50_32x4d          |  8   |        0.6558         |
|          LearningToPaint          |  96  |        0.6006         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |        0.5607         |
|             resnet18              |  16  |        0.5357         |
|            hf_Reformer            |  4   |        0.4622         |
|       functorch_dp_cifar10        |  64  |        0.4063         |
|          phlippe_resnet           | 128  |        0.3272         |
|                drq                |  1   |        0.1818         |
|               dcgan               |  32  |        0.1811         |
|         soft_actor_critic         | 256  |        0.1108         |
|           lennard_jones           | 1000 |        0.0648         |
|                gat                |  0   |          nan          |
|                gcn                |  0   |          nan          |
|           hf_GPT2_large           |  0   |          nan          |
|           hf_Longformer           |  0   |          nan          |
|               moco                |  0   |          nan          |
|               sage                |  0   |          nan          |
|             tacotron2             |  0   |          nan          |
|   timm_vision_transformer_large   |  0   |          nan          |
|           torchrec_dlrm           |  0   |          nan          |
+-----------------------------------+------+-----------------------+

Absolute latency (ms)

+-----------------------------------+------+-----------------------+
|               name                |  bs  | inductor_max_autotune |
+-----------------------------------+------+-----------------------+
|        Background_Matting         |  4   |       103.4522        |
|            hf_T5_large            |  2   |        98.2323        |
|               hf_T5               |  8   |        91.6259        |
|            timm_nfnet             | 128  |        76.4085        |
|            hf_BigBird             |  2   |        73.4201        |
|            hf_Reformer            |  4   |        70.6724        |
|            Super_SloMo            |  6   |        64.3118        |
|              yolov3               |  16  |        56.8469        |
|            timm_regnet            |  32  |        54.7105        |
|               vgg16               |  64  |        52.9478        |
|             resnet152             |  32  |        52.7409        |
|              demucs               |  4   |        51.9206        |
|           hf_Bert_large           |  4   |        50.6931        |
|        speech_transformer         |  32  |        34.397         |
|              hf_Bart              |  4   |        33.2546        |
| attention_is_all_you_need_pytorch | 256  |        32.5595        |
|           fastNLP_Bert            |  6   |        31.487         |
|           mobilenet_v2            |  96  |        30.8072        |
|           pytorch_unet            |  1   |        29.3277        |
|             hf_Albert             |  8   |        29.0455        |
|            timm_vovnet            |  32  |        26.2372        |
|              hf_GPT2              |  4   |        24.6515        |
|         timm_efficientnet         |  32  |         22.29         |
|              hf_Bert              |  4   |        22.1109        |
|             resnet50              |  32  |        21.9717        |
|           hf_DistilBert           |  8   |        21.4556        |
|            densenet121            |  4   |        21.3784        |
|        shufflenet_v2_x1_0         | 128  |        18.6762        |
|           BERT_pytorch            |  16  |        17.1107        |
|      timm_vision_transformer      |  32  |        16.7119        |
|           timm_resnest            |  32  |        15.3226        |
|            mnasnet1_0             |  32  |         13.26         |
|        mobilenet_v3_large         |  32  |        12.9303        |
|          resnext50_32x4d          |  8   |        12.117         |
|          pytorch_stargan          |  16  |        11.6841        |
|         phlippe_densenet          | 128  |        11.4957        |
|      nvidia_deeprecommender       | 256  |        10.9424        |
|              alexnet              | 128  |         8.679         |
|          LearningToPaint          |  96  |        8.3947         |
|            tts_angular            |  64  |        6.6692         |
|             resnet18              |  16  |        5.8876         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |        5.8258         |
|           squeezenet1_1           |  32  |         5.321         |
|          phlippe_resnet           | 128  |        4.9519         |
|       functorch_dp_cifar10        |  64  |        2.8959         |
|          pytorch_struct           | 200  |        2.7246         |
|                drq                |  1   |        2.1961         |
|               dlrm                | 1024 |          2.1          |
|               dcgan               |  32  |        1.5496         |
|         soft_actor_critic         | 256  |        1.3602         |
|           lennard_jones           | 1000 |        1.1525         |
|                gat                |  0   |          nan          |
|                gcn                |  0   |          nan          |
|           hf_GPT2_large           |  0   |          nan          |
|           hf_Longformer           |  0   |          nan          |
|               moco                |  0   |          nan          |
|               sage                |  0   |          nan          |
|             tacotron2             |  0   |          nan          |
|   timm_vision_transformer_large   |  0   |          nan          |
|           torchrec_dlrm           |  0   |          nan          |
+-----------------------------------+------+-----------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+-----------------------+
|                  name                   | bs  | inductor_max_autotune |
+-----------------------------------------+-----+-----------------------+
|             OPTForCausalLM              |  2  |        2.5219         |
|      GPT2ForSequenceClassification      |  4  |        2.3089         |
|          MobileBertForMaskedLM          | 64  |        2.1917         |
|       MT5ForConditionalGeneration       | 16  |        2.1437         |
|       ElectraForQuestionAnswering       | 64  |        2.1284         |
|     M2M100ForConditionalGeneration      | 16  |        1.9772         |
|               DistillGPT2               | 16  |        1.9024         |
|            PLBartForCausalLM            |  8  |        1.8722         |
|           ElectraForCausalLM            | 32  |        1.8501         |
|            XLNetLMHeadModel             |  8  |        1.8247         |
|    LayoutLMForSequenceClassification    | 16  |        1.7923         |
|       RobertaForQuestionAnswering       | 16  |        1.7901         |
|        BertForQuestionAnswering         | 16  |        1.7802         |
|     PLBartForConditionalGeneration      |  4  |         1.735         |
|             XGLMForCausalLM             |  8  |        1.7268         |
|           RobertaForCausalLM            | 16  |        1.6711         |
|       T5ForConditionalGeneration        |  4  |        1.6644         |
|                 T5Small                 |  4  |        1.6595         |
|             BartForCausalLM             |  4  |        1.6579         |
|            MBartForCausalLM             |  4  |        1.6514         |
|       AlbertForQuestionAnswering        |  4  |        1.6512         |
|            YituTechConvBert             | 16  |        1.6384         |
|            AlbertForMaskedLM            |  4  |        1.6346         |
|    MegatronBertForQuestionAnswering     |  8  |        1.6248         |
|                CamemBert                | 16  |        1.6231         |
|      BartForConditionalGeneration       |  2  |        1.6161         |
|             BertForMaskedLM             | 16  |        1.5983         |
|           LayoutLMForMaskedLM           | 16  |        1.5828         |
|      MBartForConditionalGeneration      |  2  |        1.5361         |
|         Speech2Text2ForCausalLM         | 256 |        1.5178         |
|         MegatronBertForCausalLM         |  4  |        1.4962         |
|     DistilBertForQuestionAnswering      | 256 |        1.4593         |
| BlenderbotSmallForConditionalGeneration | 64  |        1.4572         |
|     PegasusForConditionalGeneration     | 32  |        1.4289         |
|     MobileBertForQuestionAnswering      | 128 |        1.3941         |
|            TrOCRForCausalLM             | 32  |        1.3932         |
|       BlenderbotSmallForCausalLM        | 64  |        1.3854         |
|           PegasusForCausalLM            | 32  |        1.3282         |
|          DistilBertForMaskedLM          | 128 |        1.2159         |
|       DebertaForQuestionAnswering       |  8  |        1.0628         |
|           DebertaForMaskedLM            |  4  |        0.9939         |
|          DebertaV2ForMaskedLM           |  1  |        0.8898         |
|      DebertaV2ForQuestionAnswering      |  2  |        0.8359         |
|          BlenderbotForCausalLM          |  0  |          0.0          |
|          AllenaiLongformerBase          |  0  |          0.0          |
+-----------------------------------------+-----+-----------------------+
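The per-suite "Geometric mean speedup" summary is presumably computed from per-model speedups like the ones in this table. Below is a minimal sketch of that computation in plain Python. It assumes that rows reported as `0.0` or `nan` (models that failed to run, such as BlenderbotForCausalLM and AllenaiLongformerBase here) are excluded. That exclusion rule and the helper name `geomean_speedup` are illustrative assumptions and may differ from what the benchmark harness actually does.

```python
import math

def geomean_speedup(speedups):
    """Geometric mean of per-model speedups (hypothetical helper).

    Assumption: entries that are 0.0 or nan mark failed runs and are skipped.
    """
    valid = [s for s in speedups if s > 0 and not math.isnan(s)]
    if not valid:
        return float("nan")
    # exp(mean(log x)) is the numerically stable form of the geometric mean
    return math.exp(sum(math.log(s) for s in valid) / len(valid))

# A few rows copied from the huggingface speedup table, including two failed rows.
rows = {
    "OPTForCausalLM": 2.5219,
    "DebertaV2ForQuestionAnswering": 0.8359,
    "BlenderbotForCausalLM": 0.0,
    "AllenaiLongformerBase": 0.0,
}
print(f"{geomean_speedup(rows.values()):.4f}")
```

A geometric mean is the usual choice for ratios like speedups because a 2x gain and a 0.5x loss cancel out, which an arithmetic mean would not reflect.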

Accuracy

+-----------------------------------------+----+-----------------------+
|                  name                   | bs | inductor_max_autotune |
+-----------------------------------------+----+-----------------------+
|          BlenderbotForCausalLM          | 1  |   pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  |   pass_due_to_skip    |
|            AlbertForMaskedLM            | 1  |         pass          |
|           PegasusForCausalLM            | 1  |         pass          |
|       MT5ForConditionalGeneration       | 1  |         pass          |
|         MegatronBertForCausalLM         | 1  |         pass          |
|    MegatronBertForQuestionAnswering     | 1  |         pass          |
|          MobileBertForMaskedLM          | 1  |         pass          |
|     MobileBertForQuestionAnswering      | 1  |         pass          |
|             OPTForCausalLM              | 1  |         pass          |
|            PLBartForCausalLM            | 1  |         pass          |
|     PLBartForConditionalGeneration      | 1  |         pass          |
|     PegasusForConditionalGeneration     | 1  |         pass          |
|            MBartForCausalLM             | 1  |         pass          |
|           RobertaForCausalLM            | 1  |         pass          |
|       RobertaForQuestionAnswering       | 1  |         pass          |
|         Speech2Text2ForCausalLM         | 1  |         pass          |
|       T5ForConditionalGeneration        | 1  |         pass          |
|                 T5Small                 | 1  |         pass          |
|            TrOCRForCausalLM             | 1  |         pass          |
|             XGLMForCausalLM             | 1  |         pass          |
|            XLNetLMHeadModel             | 1  |         pass          |
|      MBartForConditionalGeneration      | 1  |         pass          |
|    LayoutLMForSequenceClassification    | 1  |         pass          |
|     M2M100ForConditionalGeneration      | 1  |         pass          |
|           DebertaForMaskedLM            | 1  |         pass          |
|          AllenaiLongformerBase          | 1  |         pass          |
|             BartForCausalLM             | 1  |         pass          |
|      BartForConditionalGeneration       | 1  |         pass          |
|             BertForMaskedLM             | 1  |         pass          |
|        BertForQuestionAnswering         | 1  |         pass          |
|       BlenderbotSmallForCausalLM        | 1  |         pass          |
| BlenderbotSmallForConditionalGeneration | 1  |         pass          |
|                CamemBert                | 1  |         pass          |
|       DebertaForQuestionAnswering       | 1  |         pass          |
|          DistilBertForMaskedLM          | 1  |         pass          |
|     DistilBertForQuestionAnswering      | 1  |         pass          |
|               DistillGPT2               | 1  |         pass          |
|           ElectraForCausalLM            | 1  |         pass          |
|       ElectraForQuestionAnswering       | 1  |         pass          |
|      GPT2ForSequenceClassification      | 1  |         pass          |
|           LayoutLMForMaskedLM           | 1  |         pass          |
|            YituTechConvBert             | 1  |         pass          |
|      DebertaV2ForQuestionAnswering      | 1  |      fail_to_run      |
|       AlbertForQuestionAnswering        | 1  |     fail_accuracy     |
+-----------------------------------------+----+-----------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------------------+
|                  name                   | bs  | inductor_max_autotune |
+-----------------------------------------+-----+-----------------------+
|          MobileBertForMaskedLM          | 64  |        641.83         |
|     MobileBertForQuestionAnswering      | 128 |       618.0127        |
|       MT5ForConditionalGeneration       | 16  |        598.155        |
|          DebertaV2ForMaskedLM           |  1  |       482.9045        |
|           ElectraForCausalLM            | 32  |       385.9535        |
|      DebertaV2ForQuestionAnswering      |  2  |       357.7603        |
|            AlbertForMaskedLM            |  4  |       346.3587        |
|            XLNetLMHeadModel             |  8  |       305.4182        |
|             XGLMForCausalLM             |  8  |       298.9284        |
|     M2M100ForConditionalGeneration      | 16  |       292.4152        |
|       T5ForConditionalGeneration        |  4  |        272.078        |
|       ElectraForQuestionAnswering       | 64  |       255.4572        |
|            YituTechConvBert             | 16  |       244.8943        |
|            TrOCRForCausalLM             | 32  |       244.1556        |
|      BartForConditionalGeneration       |  2  |       241.7899        |
|             BertForMaskedLM             | 16  |       233.1334        |
|       BlenderbotSmallForCausalLM        | 64  |       207.1343        |
|          DistilBertForMaskedLM          | 128 |       204.2688        |
|       DebertaForQuestionAnswering       |  8  |       203.9153        |
|             BartForCausalLM             |  4  |       194.9556        |
|      GPT2ForSequenceClassification      |  4  |       194.0422        |
|     DistilBertForQuestionAnswering      | 256 |       189.3398        |
|    LayoutLMForSequenceClassification    | 16  |       167.9029        |
|           DebertaForMaskedLM            |  4  |       160.1921        |
|         Speech2Text2ForCausalLM         | 256 |       153.7871        |
|    MegatronBertForQuestionAnswering     |  8  |       145.4149        |
|               DistillGPT2               | 16  |       134.5174        |
|             OPTForCausalLM              |  2  |       121.5937        |
|      MBartForConditionalGeneration      |  2  |       112.0044        |
|           PegasusForCausalLM            | 32  |        106.749        |
|     PegasusForConditionalGeneration     | 32  |       103.4952        |
|         MegatronBertForCausalLM         |  4  |       103.1821        |
|     PLBartForConditionalGeneration      |  4  |        97.4698        |
|        BertForQuestionAnswering         | 16  |        95.2555        |
|       AlbertForQuestionAnswering        |  4  |        89.417         |
| BlenderbotSmallForConditionalGeneration | 64  |        79.5304        |
|            PLBartForCausalLM            |  8  |        76.8499        |
|                CamemBert                | 16  |        73.4491        |
|            MBartForCausalLM             |  4  |        53.9876        |
|                 T5Small                 |  4  |        45.9055        |
|           RobertaForCausalLM            | 16  |        45.5067        |
|           LayoutLMForMaskedLM           | 16  |         41.4          |
|       RobertaForQuestionAnswering       | 16  |        38.3782        |
|          AllenaiLongformerBase          |  0  |          nan          |
|          BlenderbotForCausalLM          |  0  |          nan          |
+-----------------------------------------+-----+-----------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+-----------------------+
|                  name                   | bs  | inductor_max_autotune |
+-----------------------------------------+-----+-----------------------+
|            XLNetLMHeadModel             |  8  |        1.1342         |
|      GPT2ForSequenceClassification      |  4  |        1.1135         |
|       ElectraForQuestionAnswering       | 64  |        1.1114         |
|             OPTForCausalLM              |  2  |         1.094         |
|        BertForQuestionAnswering         | 16  |        1.0868         |
|       RobertaForQuestionAnswering       | 16  |        1.0865         |
|    LayoutLMForSequenceClassification    | 16  |        1.0583         |
|           RobertaForCausalLM            | 16  |        1.0541         |
|            YituTechConvBert             | 16  |        1.0402         |
|                 T5Small                 |  4  |        1.0382         |
|       T5ForConditionalGeneration        |  4  |        1.0356         |
|     DistilBertForQuestionAnswering      | 256 |        1.0299         |
|           LayoutLMForMaskedLM           | 16  |        1.0078         |
|             BertForMaskedLM             | 16  |        0.9864         |
|                CamemBert                | 16  |        0.9828         |
|       AlbertForQuestionAnswering        |  4  |        0.9734         |
|           ElectraForCausalLM            | 32  |        0.9731         |
|               DistillGPT2               | 16  |        0.9682         |
|            AlbertForMaskedLM            |  4  |        0.9574         |
|    MegatronBertForQuestionAnswering     |  8  |         0.953         |
|     PLBartForConditionalGeneration      |  4  |        0.9294         |
|            MBartForCausalLM             |  4  |        0.9281         |
|           PegasusForCausalLM            | 32  |         0.893         |
|            TrOCRForCausalLM             | 32  |        0.8836         |
|             BartForCausalLM             |  4  |        0.8818         |
|     PegasusForConditionalGeneration     | 32  |        0.8687         |
|      MBartForConditionalGeneration      |  2  |        0.8672         |
|      BartForConditionalGeneration       |  2  |        0.8456         |
|         MegatronBertForCausalLM         |  4  |         0.845         |
|            PLBartForCausalLM            |  8  |        0.8437         |
|       MT5ForConditionalGeneration       | 16  |        0.8222         |
| BlenderbotSmallForConditionalGeneration | 64  |         0.816         |
|          DistilBertForMaskedLM          | 128 |        0.8045         |
|     M2M100ForConditionalGeneration      | 16  |        0.7651         |
|          MobileBertForMaskedLM          | 64  |         0.752         |
|       BlenderbotSmallForCausalLM        | 64  |        0.7355         |
|         Speech2Text2ForCausalLM         | 256 |        0.7143         |
|             XGLMForCausalLM             |  8  |        0.7117         |
|     MobileBertForQuestionAnswering      | 128 |        0.6505         |
|           DebertaForMaskedLM            |  4  |        0.5504         |
|          DebertaV2ForMaskedLM           |  1  |        0.5138         |
|      DebertaV2ForQuestionAnswering      |  2  |        0.4821         |
|       DebertaForQuestionAnswering       |  8  |        0.4604         |
|          AllenaiLongformerBase          |  0  |          nan          |
|          BlenderbotForCausalLM          |  0  |          nan          |
+-----------------------------------------+-----+-----------------------+
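To read the compression ratios above, it helps to convert them into the compiled run's peak memory relative to eager. The sketch below assumes the ratio is defined as eager peak memory divided by compiled peak memory, so values above 1.0 mean the compiled model used less memory and values below 1.0 mean it used more. That definition and the helper name `relative_peak_memory` are assumptions made for illustration, not taken from this issue.

```python
import math

def relative_peak_memory(ratio):
    """Compiled peak memory as a multiple of eager peak memory.

    Assumes ratio = eager_peak / compiled_peak (hypothetical definition).
    Returns nan for missing or failed entries (0.0 or nan in the table).
    """
    if math.isnan(ratio) or ratio <= 0:
        return float("nan")
    return 1.0 / ratio

# Rows copied from the huggingface peak memory table.
for name, ratio in [
    ("XLNetLMHeadModel", 1.1342),
    ("DebertaForQuestionAnswering", 0.4604),
    ("AllenaiLongformerBase", float("nan")),
]:
    print(f"{name}: {relative_peak_memory(ratio):.2f}x eager peak memory")
```

Under this assumption, DebertaForQuestionAnswering (0.4604) would peak at roughly 2.2x the eager memory, while XLNetLMHeadModel (1.1342) would peak at roughly 0.88x.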

Absolute latency (ms)

+-----------------------------------------+-----+-----------------------+
|                  name                   | bs  | inductor_max_autotune |
+-----------------------------------------+-----+-----------------------+
|            AlbertForMaskedLM            |  4  |       162.9778        |
|       AlbertForQuestionAnswering        |  4  |       160.0323        |
|            XLNetLMHeadModel             |  8  |       152.8289        |
|      DebertaV2ForQuestionAnswering      |  2  |       129.8578        |
|     MobileBertForQuestionAnswering      | 128 |       124.2626        |
|          DebertaV2ForMaskedLM           |  1  |       117.7692        |
|     PegasusForConditionalGeneration     | 32  |       105.6076        |
|            TrOCRForCausalLM             | 32  |       100.0077        |
|      BartForConditionalGeneration       |  2  |        90.9734        |
|      MBartForConditionalGeneration      |  2  |        89.4901        |
|    MegatronBertForQuestionAnswering     |  8  |        87.4238        |
|          MobileBertForMaskedLM          | 64  |        80.8102        |
|            YituTechConvBert             | 16  |        76.6409        |
| BlenderbotSmallForConditionalGeneration | 64  |        75.9591        |
|     M2M100ForConditionalGeneration      | 16  |        74.0544        |
|                CamemBert                | 16  |        72.8774        |
|       DebertaForQuestionAnswering       |  8  |        72.3622        |
|     DistilBertForQuestionAnswering      | 256 |        71.0903        |
|           LayoutLMForMaskedLM           | 16  |        71.0194        |
|          DistilBertForMaskedLM          | 128 |        69.6847        |
|            MBartForCausalLM             |  4  |        69.1411        |
|             BertForMaskedLM             | 16  |        68.8905        |
|           RobertaForCausalLM            | 16  |        68.7737        |
|     PLBartForConditionalGeneration      |  4  |        68.4457        |
|             BartForCausalLM             |  4  |        68.2489        |
|             OPTForCausalLM              |  2  |        67.7134        |
|           DebertaForMaskedLM            |  4  |        64.0079        |
|       T5ForConditionalGeneration        |  4  |        62.8488        |
|                 T5Small                 |  4  |        62.8476        |
|            PLBartForCausalLM            |  8  |        61.7269        |
|         MegatronBertForCausalLM         |  4  |        57.6586        |
|               DistillGPT2               | 16  |        55.5112        |
|    LayoutLMForSequenceClassification    | 16  |        54.4499        |
|       ElectraForQuestionAnswering       | 64  |        53.7627        |
|        BertForQuestionAnswering         | 16  |        53.4591        |
|       RobertaForQuestionAnswering       | 16  |        53.3137        |
|           PegasusForCausalLM            | 32  |        53.0044        |
|             XGLMForCausalLM             |  8  |        52.2883        |
|           ElectraForCausalLM            | 32  |        47.5662        |
|       MT5ForConditionalGeneration       | 16  |        42.9764        |
|       BlenderbotSmallForCausalLM        | 64  |        41.8483        |
|      GPT2ForSequenceClassification      |  4  |        39.5726        |
|         Speech2Text2ForCausalLM         | 256 |        34.7189        |
|          AllenaiLongformerBase          |  0  |          nan          |
|          BlenderbotForCausalLM          |  0  |          nan          |
+-----------------------------------------+-----+-----------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+-----------------------+
|              name               | bs  | inductor_max_autotune |
+---------------------------------+-----+-----------------------+
|        tnt_s_patch16_224        | 128 |        3.3194         |
|        twins_pcpvt_base         | 64  |        2.1355         |
|      xcit_large_24_p8_224       |  5  |         2.11          |
|         coat_lite_mini          | 128 |        2.0666         |
|          gmixer_24_224          | 128 |        1.8929         |
|          gmlp_s16_224           | 128 |         1.858         |
|          ghostnet_100           | 128 |        1.8472         |
|         crossvit_9_240          | 128 |        1.8215         |
|           volo_d1_224           | 64  |        1.7402         |
|           convit_base           | 64  |        1.7127         |
|  swin_base_patch4_window7_224   | 64  |        1.7101         |
|            lcnet_050            | 128 |        1.6936         |
|            pit_b_224            | 64  |        1.6036         |
|       gluon_inception_v3        | 128 |        1.5392         |
|          inception_v3           | 128 |         1.539         |
|        adv_inception_v3         | 128 |        1.5353         |
|          jx_nest_base           | 32  |        1.5332         |
|             dla102              | 128 |        1.5258         |
|        sebotnet33ts_256         | 64  |         1.515         |
|          convnext_base          | 64  |        1.4921         |
|           dm_nfnet_f0           | 128 |        1.4853         |
|            nfnet_l0             | 128 |        1.4848         |
|           mobilevit_s           | 64  |        1.4742         |
|      beit_base_patch16_224      | 64  |        1.4573         |
|       eca_botnext26ts_256       | 128 |        1.4484         |
|          cait_m36_384           |  4  |        1.4476         |
|           regnety_002           | 128 |        1.4428         |
|      mobilenetv3_large_100      | 128 |        1.4334         |
|           mnasnet_100           | 128 |        1.4275         |
|           resnest101e           | 64  |        1.4233         |
|           selecsls42b           | 128 |        1.4112         |
|          botnet26t_256          | 128 |        1.4094         |
|        res2net50_14w_8s         | 128 |        1.3973         |
|          mixer_b16_224          | 128 |        1.3956         |
|          resmlp_12_224          | 128 |        1.3944         |
|         mobilenetv2_100         | 128 |        1.3898         |
|            hrnet_w18            | 128 |        1.3838         |
|           res2next50            | 128 |        1.3707         |
|        ese_vovnet19b_dw         | 128 |         1.364         |
|          spnasnet_100           | 128 |        1.3517         |
|       tf_efficientnet_b0        | 128 |        1.3513         |
|           fbnetc_100            | 128 |        1.3493         |
|      vit_base_patch16_224       | 64  |         1.346         |
|         poolformer_m36          | 64  |        1.3271         |
|            fbnetv3_b            | 128 |        1.3191         |
| deit_base_distilled_patch16_224 | 64  |        1.3168         |
|           rexnet_100            | 128 |        1.2987         |
|          cspdarknet53           | 64  |        1.2264         |
|            tinynet_a            | 128 |        1.2254         |
|         visformer_small         | 128 |        1.2055         |
|           tf_mixnet_l           | 128 |         1.19          |
|            mixnet_l             | 128 |        1.1786         |
|        res2net101_26w_4s        | 64  |        1.1749         |
|          pnasnet5large          | 16  |        1.1241         |
|             dpn107              | 32  |        1.0918         |
|        gluon_xception65         | 32  |        1.0849         |
|            repvgg_a2            | 128 |        1.0847         |
|     swsl_resnext101_32x16d      | 32  |        1.0601         |
|            gernet_l             | 128 |        1.0416         |
|        convmixer_768_32         | 32  |        1.0078         |
+---------------------------------+-----+-----------------------+

Accuracy

+---------------------------------+----+-----------------------+
|              name               | bs | inductor_max_autotune |
+---------------------------------+----+-----------------------+
|        adv_inception_v3         | 8  |         pass          |
|          resmlp_12_224          | 8  |         pass          |
|         mobilenetv2_100         | 8  |         pass          |
|      mobilenetv3_large_100      | 8  |         pass          |
|           mobilevit_s           | 8  |         pass          |
|            nfnet_l0             | 8  |         pass          |
|            pit_b_224            | 8  |         pass          |
|          pnasnet5large          | 8  |         pass          |
|         poolformer_m36          | 8  |         pass          |
|           regnety_002           | 8  |         pass          |
|            repvgg_a2            | 8  |         pass          |
|        res2net101_26w_4s        | 8  |         pass          |
|        res2net50_14w_8s         | 8  |         pass          |
|           res2next50            | 8  |         pass          |
|           resnest101e           | 8  |         pass          |
|            mixnet_l             | 8  |         pass          |
|           rexnet_100            | 8  |         pass          |
|        sebotnet33ts_256         | 8  |         pass          |
|           selecsls42b           | 8  |         pass          |
|          spnasnet_100           | 8  |         pass          |
|     swsl_resnext101_32x16d      | 8  |         pass          |
|       tf_efficientnet_b0        | 8  |         pass          |
|           tf_mixnet_l           | 8  |         pass          |
|            tinynet_a            | 8  |         pass          |
|        tnt_s_patch16_224        | 8  |         pass          |
|         visformer_small         | 8  |         pass          |
|      vit_base_patch16_224       | 8  |         pass          |
|           volo_d1_224           | 8  |         pass          |
|      beit_base_patch16_224      | 8  |         pass          |
|           mnasnet_100           | 8  |         pass          |
|          mixer_b16_224          | 8  |         pass          |
|       eca_botnext26ts_256       | 8  |         pass          |
|          botnet26t_256          | 8  |         pass          |
|          cait_m36_384           | 4  |         pass          |
|           convit_base           | 8  |         pass          |
|        convmixer_768_32         | 8  |         pass          |
|          convnext_base          | 8  |         pass          |
|         crossvit_9_240          | 8  |         pass          |
|          cspdarknet53           | 8  |         pass          |
| deit_base_distilled_patch16_224 | 8  |         pass          |
|             dla102              | 8  |         pass          |
|           dm_nfnet_f0           | 8  |         pass          |
|            lcnet_050            | 8  |         pass          |
|             dpn107              | 8  |         pass          |
|        ese_vovnet19b_dw         | 8  |         pass          |
|           fbnetc_100            | 8  |         pass          |
|            fbnetv3_b            | 8  |         pass          |
|            gernet_l             | 8  |         pass          |
|          ghostnet_100           | 8  |         pass          |
|       gluon_inception_v3        | 8  |         pass          |
|        gluon_xception65         | 8  |         pass          |
|          gmixer_24_224          | 8  |         pass          |
|          gmlp_s16_224           | 8  |         pass          |
|            hrnet_w18            | 8  |         pass          |
|          inception_v3           | 8  |         pass          |
|          jx_nest_base           | 8  |         pass          |
|      xcit_large_24_p8_224       | 8  |         pass          |
|  swin_base_patch4_window7_224   | 8  |     fail_accuracy     |
|        twins_pcpvt_base         | 0  |        0.0000         |
|         coat_lite_mini          | 0  |        0.0000         |
+---------------------------------+----+-----------------------+

Compilation latency (sec)

+---------------------------------+-----+-----------------------+
|              name               | bs  | inductor_max_autotune |
+---------------------------------+-----+-----------------------+
|        twins_pcpvt_base         | 64  |       1688.7575       |
|           mobilevit_s           | 64  |       1572.9991       |
|         coat_lite_mini          | 128 |       1364.6126       |
|         crossvit_9_240          | 128 |       1264.9017       |
|           rexnet_100            | 128 |       1099.5804       |
|      xcit_large_24_p8_224       |  5  |       1088.1898       |
|           volo_d1_224           | 64  |       1084.1596       |
|  swin_base_patch4_window7_224   | 64  |       985.0894        |
|            pit_b_224            | 64  |       983.8723        |
|          jx_nest_base           | 32  |       975.7591        |
|          cait_m36_384           |  4  |       953.7136        |
|          ghostnet_100           | 128 |       915.6073        |
|            hrnet_w18            | 128 |       852.9444        |
|            mixnet_l             | 128 |       827.5129        |
|        sebotnet33ts_256         | 64  |       814.3166        |
|        adv_inception_v3         | 128 |       788.1443        |
|        res2net50_14w_8s         | 128 |       785.2897        |
|          botnet26t_256          | 128 |       758.1937        |
|        res2net101_26w_4s        | 64  |       727.2155        |
|             dpn107              | 32  |       702.1314        |
|            fbnetv3_b            | 128 |       674.6594        |
|          pnasnet5large          | 16  |        628.957        |
|           fbnetc_100            | 128 |       584.9966        |
|        tnt_s_patch16_224        | 128 |       579.6948        |
|          convnext_base          | 64  |       526.0179        |
|            tinynet_a            | 128 |       510.8161        |
|           regnety_002           | 128 |        477.458        |
|             dla102              | 128 |       459.0892        |
|         visformer_small         | 128 |       444.6314        |
|           convit_base           | 64  |       435.4578        |
|           resnest101e           | 64  |       435.2065        |
|          cspdarknet53           | 64  |       395.4526        |
|        gluon_xception65         | 32  |       355.2383        |
|            nfnet_l0             | 128 |       336.8407        |
|      beit_base_patch16_224      | 64  |       336.6734        |
|         poolformer_m36          | 64  |       333.2378        |
|          gmixer_24_224          | 128 |       330.7234        |
|       eca_botnext26ts_256       | 128 |       326.0591        |
|            gernet_l             | 128 |        320.009        |
|       tf_efficientnet_b0        | 128 |       316.5582        |
|           selecsls42b           | 128 |       289.8516        |
|           mnasnet_100           | 128 |       285.7957        |
|        ese_vovnet19b_dw         | 128 |       285.7598        |
| deit_base_distilled_patch16_224 | 64  |       270.0214        |
|            repvgg_a2            | 128 |       269.9819        |
|          mixer_b16_224          | 128 |       259.1909        |
|            lcnet_050            | 128 |       225.9875        |
|          gmlp_s16_224           | 128 |       196.2178        |
|      mobilenetv3_large_100      | 128 |        189.084        |
|     swsl_resnext101_32x16d      | 32  |       179.7337        |
|          resmlp_12_224          | 128 |       173.4721        |
|           res2next50            | 128 |       145.9784        |
|         mobilenetv2_100         | 128 |       126.8471        |
|        convmixer_768_32         | 32  |       109.2042        |
|           tf_mixnet_l           | 128 |        92.7005        |
|          spnasnet_100           | 128 |        76.5348        |
|       gluon_inception_v3        | 128 |        56.6582        |
|          inception_v3           | 128 |        56.136         |
|      vit_base_patch16_224       | 64  |        48.2327        |
|           dm_nfnet_f0           | 128 |        39.9079        |
+---------------------------------+-----+-----------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+-----------------------+
|              name               | bs  | inductor_max_autotune |
+---------------------------------+-----+-----------------------+
|          gmlp_s16_224           | 128 |        1.1841         |
|          pnasnet5large          | 16  |        1.1522         |
|          gmixer_24_224          | 128 |        1.1129         |
|           convit_base           | 64  |        1.0948         |
|         mobilenetv2_100         | 128 |        1.0267         |
|           dm_nfnet_f0           | 128 |         1.013         |
|          resmlp_12_224          | 128 |         1.01          |
|            tinynet_a            | 128 |        0.9985         |
|           resnest101e           | 64  |        0.9933         |
|       tf_efficientnet_b0        | 128 |        0.9875         |
|        tnt_s_patch16_224        | 128 |        0.9834         |
|           rexnet_100            | 128 |        0.9744         |
|        twins_pcpvt_base         | 64  |        0.9729         |
|        convmixer_768_32         | 32  |         0.967         |
|             dla102              | 128 |        0.9528         |
|          mixer_b16_224          | 128 |        0.9439         |
|      vit_base_patch16_224       | 64  |        0.9362         |
|           tf_mixnet_l           | 128 |        0.9344         |
|      beit_base_patch16_224      | 64  |        0.9284         |
|           mobilevit_s           | 64  |        0.9263         |
|         visformer_small         | 128 |        0.9245         |
|            fbnetv3_b            | 128 |         0.917         |
|            nfnet_l0             | 128 |        0.9101         |
|          cspdarknet53           | 64  |        0.9098         |
| deit_base_distilled_patch16_224 | 64  |        0.9072         |
|           volo_d1_224           | 64  |        0.9068         |
|        ese_vovnet19b_dw         | 128 |        0.8976         |
|        sebotnet33ts_256         | 64  |        0.8908         |
|       gluon_inception_v3        | 128 |        0.8902         |
|          inception_v3           | 128 |        0.8902         |
|        adv_inception_v3         | 128 |        0.8902         |
|            hrnet_w18            | 128 |        0.8889         |
|        gluon_xception65         | 32  |        0.8833         |
|          spnasnet_100           | 128 |        0.8788         |
|      xcit_large_24_p8_224       |  5  |        0.8761         |
|       eca_botnext26ts_256       | 128 |        0.8738         |
|            mixnet_l             | 128 |        0.8685         |
|           mnasnet_100           | 128 |        0.8684         |
|             dpn107              | 32  |        0.8676         |
|           res2next50            | 128 |        0.8659         |
|      mobilenetv3_large_100      | 128 |         0.865         |
|          cait_m36_384           |  4  |        0.8633         |
|         poolformer_m36          | 64  |        0.8599         |
|           fbnetc_100            | 128 |        0.8597         |
|            pit_b_224            | 64  |        0.8566         |
|        res2net101_26w_4s        | 64  |        0.8506         |
|        res2net50_14w_8s         | 128 |        0.8501         |
|            gernet_l             | 128 |        0.8494         |
|           selecsls42b           | 128 |        0.8473         |
|     swsl_resnext101_32x16d      | 32  |        0.8461         |
|          ghostnet_100           | 128 |        0.8408         |
|         coat_lite_mini          | 128 |        0.8402         |
|          convnext_base          | 64  |         0.832         |
|          botnet26t_256          | 128 |        0.8241         |
|            lcnet_050            | 128 |        0.8174         |
|           regnety_002           | 128 |        0.7846         |
|            repvgg_a2            | 128 |        0.7738         |
|         crossvit_9_240          | 128 |        0.7525         |
|  swin_base_patch4_window7_224   | 64  |        0.7214         |
|          jx_nest_base           | 32  |        0.6693         |
+---------------------------------+-----+-----------------------+

Absolute latency (ms)

+---------------------------------+-----+-----------------------+
|              name               | bs  | inductor_max_autotune |
+---------------------------------+-----+-----------------------+
|        convmixer_768_32         | 32  |        297.715        |
|            hrnet_w18            | 128 |       201.8791        |
|          pnasnet5large          | 16  |        174.487        |
|           tf_mixnet_l           | 128 |       158.9862        |
|            mixnet_l             | 128 |       153.5053        |
|          cait_m36_384           |  4  |       115.5031        |
|           resnest101e           | 64  |       114.6772        |
|             dla102              | 128 |       112.6617        |
|     swsl_resnext101_32x16d      | 32  |       111.8307        |
|         poolformer_m36          | 64  |       109.0074        |
|        adv_inception_v3         | 128 |       104.2656        |
|       gluon_inception_v3        | 128 |       104.0856        |
|          inception_v3           | 128 |       103.9529        |
|        res2net50_14w_8s         | 128 |       100.7721        |
|             dpn107              | 32  |        97.2715        |
|        tnt_s_patch16_224        | 128 |        97.2103        |
|           convit_base           | 64  |        95.154         |
|           res2next50            | 128 |        91.7484        |
|        gluon_xception65         | 32  |        91.2787        |
|  swin_base_patch4_window7_224   | 64  |        85.4506        |
|           dm_nfnet_f0           | 128 |        85.0143        |
|        res2net101_26w_4s        | 64  |        84.7107        |
|          mixer_b16_224          | 128 |        83.7046        |
|            fbnetv3_b            | 128 |        82.9483        |
|          convnext_base          | 64  |        81.8352        |
|         visformer_small         | 128 |        75.483         |
|            nfnet_l0             | 128 |        75.172         |
|          gmlp_s16_224           | 128 |        73.8028        |
|            pit_b_224            | 64  |        73.6116        |
|       eca_botnext26ts_256       | 128 |        73.1492        |
|          cspdarknet53           | 64  |        72.1408        |
|          botnet26t_256          | 128 |        70.3273        |
|      beit_base_patch16_224      | 64  |        69.9807        |
|            gernet_l             | 128 |        69.8907        |
|           volo_d1_224           | 64  |        69.0967        |
|            repvgg_a2            | 128 |        66.9311        |
|          jx_nest_base           | 32  |        65.2306        |
| deit_base_distilled_patch16_224 | 64  |        64.6438        |
|      vit_base_patch16_224       | 64  |        64.3562        |
|          gmixer_24_224          | 128 |        62.1661        |
|       tf_efficientnet_b0        | 128 |        60.2175        |
|      xcit_large_24_p8_224       |  5  |        59.6563        |
|           rexnet_100            | 128 |        58.6476        |
|           fbnetc_100            | 128 |        58.2957        |
|            tinynet_a            | 128 |        56.7487        |
|        twins_pcpvt_base         | 64  |        55.8003        |
|           mobilevit_s           | 64  |        55.2001        |
|         coat_lite_mini          | 128 |        54.5083        |
|        sebotnet33ts_256         | 64  |        50.7858        |
|          spnasnet_100           | 128 |        49.0136        |
|          ghostnet_100           | 128 |        48.6445        |
|        ese_vovnet19b_dw         | 128 |        45.4094        |
|         crossvit_9_240          | 128 |        44.9644        |
|         mobilenetv2_100         | 128 |        44.7157        |
|           mnasnet_100           | 128 |        42.6209        |
|           selecsls42b           | 128 |        42.4078        |
|      mobilenetv3_large_100      | 128 |        40.5618        |
|          resmlp_12_224          | 128 |        38.0244        |
|           regnety_002           | 128 |        26.5598        |
|            lcnet_050            | 128 |        17.6225        |
+---------------------------------+-----+-----------------------+

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_095_05_04_23_performance_amp_283/torchbench_amp.png :

/data/home/williamwen/cluster/oneoff_cron_logs/day_095_05_04_23_performance_amp_283/huggingface_amp.png :

/data/home/williamwen/cluster/oneoff_cron_logs/day_095_05_04_23_performance_amp_283/timm_models_amp.png :

Build Summary


Run name

day_095_05_04_23_performance_amp_283

Commit hashes

pytorch commit: f55e72c0f6bd6da016aaa51de379e6ba6d7891cc
pytorch commit date: 2023-04-07 17:30:27+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf55e72c

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Collaborator

Performance Dashboard for amp precision (inductor max-autotune without cudagraphs)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward passes; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.
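The speedup normalization described above amounts to dividing eager (native PyTorch) latency by compiled latency. The following is a minimal sketch of that arithmetic. It is not the benchmark harness's code. The helper names are hypothetical, and the use of a median over repeats is an assumption. On a GPU, a real measurement would also need to synchronize the device before reading the timer.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], None], repeats: int = 10) -> float:
    """Median wall-clock latency of fn() in milliseconds (CPU timer only)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Eager latency divided by compiled latency; > 1.0 means the backend is faster."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_ms / compiled_ms
```

For example, `speedup(100.0, 50.0)` returns `2.0`, which the tables would show as a 2.00x speedup.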

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency and memory footprint reduction, we exclude models that fail accuracy checks.
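Excluding models that fail accuracy checks before aggregating can be sketched as a dictionary filter. This is a minimal illustration, not the dashboard's code. The function name is hypothetical, and counting `pass_due_to_skip` as passing is an assumption. The example values come from the torchbench tables below.

```python
from typing import Dict

# Assumption: statuses treated as passing; the dashboard's exact rule may differ.
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


def drop_failing_models(metric: Dict[str, float],
                        accuracy: Dict[str, str]) -> Dict[str, float]:
    """Keep only models whose accuracy status counts as passing.

    Models missing from `accuracy` are dropped as well.
    """
    return {name: value for name, value in metric.items()
            if accuracy.get(name) in PASSING_STATUSES}


speedups = {"hf_Albert": 2.3013, "squeezenet1_1": 1.3735}
accuracy = {"hf_Albert": "pass", "squeezenet1_1": "fail_accuracy"}
print(drop_failing_models(speedups, accuracy))  # {'hf_Albert': 2.3013}
```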

Passrate

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs | 80%, 48/60 | 96%, 43/45  | 95%, 57/60  |
+-------------------------------------+------------+-------------+-------------+
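Each pass-rate cell has the form `NN%, passed/total`. The sketch below reproduces that format from a list of per-model statuses. Which statuses count as passing, and which models enter the denominator, are assumptions here. The harness's own counting rules are not stated in this report.

```python
from typing import Iterable

# Assumption: these statuses count as passing.
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


def pass_rate(statuses: Iterable[str]) -> str:
    """Format a pass rate as 'NN%, passed/total', rounding to a whole percent."""
    statuses = list(statuses)
    total = len(statuses)
    passed = sum(1 for s in statuses if s in PASSING_STATUSES)
    percent = round(100 * passed / total) if total else 0
    return f"{percent}%, {passed}/{total}"


print(pass_rate(["pass"] * 48 + ["fail_to_run"] * 12))  # 80%, 48/60
```

Rounding to the nearest whole percent matches the cells shown here. For example, 43/45 is about 95.6%, which rounds to 96%.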

Geometric mean speedup

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs |   1.32x    |    1.57x    |    1.40x    |
+-------------------------------------+------------+-------------+-------------+
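A geometric mean summarizes per-model speedups as exp(mean(log(speedup))). This is a standard sketch of that formula, not necessarily the dashboard's exact code. Because a 0.0 speedup signals a failed run (see the Warnings note below), such entries must be removed before taking logarithms.

```python
import math
from typing import Iterable


def geometric_mean(values: Iterable[float]) -> float:
    """exp(mean(log(v))); every value must be positive."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))


# Failed runs report 0.0 speedup; drop them before taking logs.
raw = [2.0, 0.5, 0.0]
print(round(geometric_mean(v for v in raw if v > 0), 6))  # 1.0
```

A 2x win and a 2x slowdown cancel out to 1.00x. An arithmetic mean of the same two numbers would instead report 1.25x.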

Mean compilation time (seconds)

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs |   360.44   |   222.95    |   497.02    |
+-------------------------------------+------------+-------------+-------------+
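The per-model compilation latency table below reports `nan` for models that did not run, so a mean over that column has to skip those entries. The following is a minimal sketch under that assumption. The function name is hypothetical. The dashboard may also drop accuracy failures first, as stated above.

```python
import math
from typing import Iterable


def mean_compile_time(seconds: Iterable[float]) -> float:
    """Arithmetic mean that ignores NaN entries (failed runs)."""
    finite = [s for s in seconds if not math.isnan(s)]
    return sum(finite) / len(finite) if finite else float("nan")


# Two values from the per-model table plus one failed (nan) run.
print(mean_compile_time([4.8102, 28.1121, float("nan")]))
```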

Peak memory footprint compression ratio (higher is better)

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
| inductor_max_autotune_no_cudagraphs |   0.88x    |    1.02x    |    1.01x    |
+-------------------------------------+------------+-------------+-------------+
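Given the "higher is better" label, this sketch assumes the compression ratio is eager peak memory divided by compiled peak memory. In a real run, the peak values could come from `torch.cuda.max_memory_allocated()` after a `torch.cuda.reset_peak_memory_stats()` call. The numbers below are hypothetical.

```python
def compression_ratio(eager_peak_bytes: int, compiled_peak_bytes: int) -> float:
    """Eager peak memory divided by compiled peak memory (higher is better)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


# Hypothetical numbers: the compiled run peaks at 10 GiB versus 12 GiB in eager mode.
print(compression_ratio(12 * 2**30, 10 * 2**30))  # 1.2
```

Under this definition, a value below 1.0, such as the 0.88x torchbench figure, means the compiled run used more peak memory than eager.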

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9
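The four warning rules above translate directly into threshold checks. This sketch is illustrative only. The `ModelResult` record and the `warnings_for` helper are hypothetical names, and treating `pass_due_to_skip` as an accuracy pass is an assumption. The alexnet figures come from the torchbench tables below.

```python
from dataclasses import dataclass
from typing import List

# Assumption: statuses that do not count as an accuracy failure.
PASSING_STATUSES = {"pass", "pass_due_to_skip"}


@dataclass
class ModelResult:
    name: str
    accuracy: str
    speedup: float
    compile_latency_s: float
    compression_ratio: float


def warnings_for(r: ModelResult) -> List[str]:
    """Return the warning rules that this model trips, in report order."""
    flags = []
    if r.accuracy not in PASSING_STATUSES:
        flags.append("accuracy fails")
    if r.speedup < 0.95:  # 0.0 usually means the performance run failed
        flags.append("speedup < 0.95x")
    if r.compile_latency_s > 120:
        flags.append("compilation latency > 120 sec")
    if r.compression_ratio < 0.9:
        flags.append("compression ratio < 0.9")
    return flags


alexnet = ModelResult("alexnet", "pass", 1.1836, 228.0223, 0.8908)
print(warnings_for(alexnet))
# ['compilation latency > 120 sec', 'compression ratio < 0.9']
```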

torchbench suite with amp precision


Performance speedup

+-----------------------------------+------+-------------------------------------+
|               name                |  bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+-------------------------------------+
|             hf_Albert             |  8   |               2.3013                |
|           BERT_pytorch            |  16  |               2.1213                |
|               hf_T5               |  8   |                2.007                |
|            hf_T5_large            |  2   |               1.9716                |
|              hf_GPT2              |  4   |               1.9456                |
|           hf_GPT2_large           |  4   |               1.8553                |
|        speech_transformer         |  32  |               1.7828                |
|   pytorch_CycleGAN_and_pix2pix    |  1   |               1.7737                |
|            hf_BigBird             |  2   |               1.7036                |
| attention_is_all_you_need_pytorch | 256  |               1.6701                |
|              hf_Bert              |  4   |               1.6423                |
|           hf_Bert_large           |  4   |               1.6183                |
|           fastNLP_Bert            |  6   |               1.6095                |
|              hf_Bart              |  4   |               1.5487                |
|      timm_vision_transformer      |  32  |               1.5202                |
|            timm_nfnet             | 128  |               1.4864                |
|           timm_resnest            |  32  |               1.4809                |
|           hf_DistilBert           |  8   |                1.472                |
|           mobilenet_v2            |  96  |               1.4695                |
|       functorch_dp_cifar10        |  64  |                1.378                |
|           squeezenet1_1           |  32  |               1.3735                |
|           pytorch_unet            |  1   |               1.3534                |
|          pytorch_struct           | 200  |               1.3482                |
|               vgg16               |  64  |               1.2635                |
|               dlrm                | 1024 |               1.2449                |
|          pytorch_stargan          |  16  |               1.2381                |
|            Super_SloMo            |  6   |               1.2342                |
|        Background_Matting         |  4   |               1.2138                |
|              yolov3               |  16  |               1.2071                |
|        shufflenet_v2_x1_0         | 128  |               1.2004                |
|              alexnet              | 128  |               1.1836                |
|        mobilenet_v3_large         |  32  |               1.1808                |
|   timm_vision_transformer_large   |  32  |               1.1654                |
|                drq                |  1   |               1.1516                |
|      nvidia_deeprecommender       | 256  |               1.1048                |
|            mnasnet1_0             |  32  |               1.0859                |
|          LearningToPaint          |  96  |               1.0833                |
|            hf_Reformer            |  4   |               1.0677                |
|           lennard_jones           | 1000 |               1.0671                |
|             resnet50              |  32  |               1.0633                |
|          phlippe_resnet           | 128  |               1.0544                |
|         timm_efficientnet         |  32  |               1.0499                |
|         phlippe_densenet          | 128  |               1.0464                |
|            densenet121            |  4   |               1.0444                |
|              demucs               |  4   |                1.038                |
|             resnet152             |  32  |               1.0293                |
|            timm_regnet            |  32  |               0.9828                |
|          resnext50_32x4d          |  8   |               0.9655                |
|            tts_angular            |  64  |               0.9547                |
|             resnet18              |  16  |               0.9463                |
|            timm_vovnet            |  32  |               0.9212                |
|         soft_actor_critic         | 256  |               0.8621                |
|               dcgan               |  32  |               0.8277                |
|               sage                |  0   |                 0.0                 |
|           hf_Longformer           |  0   |                 0.0                 |
|             tacotron2             |  0   |                 0.0                 |
|               moco                |  0   |                 0.0                 |
|           torchrec_dlrm           |  0   |                 0.0                 |
|                gcn                |  0   |                 0.0                 |
|                gat                |  0   |                 0.0                 |
+-----------------------------------+------+-------------------------------------+

Accuracy

+-----------------------------------+-----+-------------------------------------+
|               name                | bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+-----+-------------------------------------+
|            hf_T5_large            |  4  |          pass_due_to_skip           |
|   timm_vision_transformer_large   |  4  |          pass_due_to_skip           |
|           hf_GPT2_large           |  4  |          pass_due_to_skip           |
|             resnet50              |  4  |                pass                 |
|            mnasnet1_0             |  4  |                pass                 |
|        mobilenet_v3_large         |  4  |                pass                 |
|      nvidia_deeprecommender       |  4  |                pass                 |
|         phlippe_densenet          |  4  |                pass                 |
|   pytorch_CycleGAN_and_pix2pix    |  1  |                pass                 |
|          pytorch_stargan          | 16  |                pass                 |
|          pytorch_struct           | 200 |                pass                 |
|           pytorch_unet            |  2  |                pass                 |
|             resnet152             |  4  |                pass                 |
|             resnet18              |  4  |                pass                 |
|           BERT_pytorch            |  4  |                pass                 |
|           lennard_jones           |  4  |                pass                 |
|        shufflenet_v2_x1_0         |  4  |                pass                 |
|         soft_actor_critic         | 256 |                pass                 |
|        speech_transformer         |  4  |                pass                 |
|         timm_efficientnet         |  4  |                pass                 |
|            timm_nfnet             |  4  |                pass                 |
|            timm_regnet            |  4  |                pass                 |
|           timm_resnest            |  4  |                pass                 |
|      timm_vision_transformer      |  4  |                pass                 |
|            timm_vovnet            |  4  |                pass                 |
|            tts_angular            |  4  |                pass                 |
|               vgg16               |  4  |                pass                 |
|          resnext50_32x4d          |  4  |                pass                 |
|           mobilenet_v2            |  4  |                pass                 |
|              yolov3               |  4  |                pass                 |
|              hf_Bart              |  4  |                pass                 |
| attention_is_all_you_need_pytorch |  4  |                pass                 |
|               dcgan               |  4  |                pass                 |
|              demucs               |  4  |                pass                 |
|            densenet121            |  4  |                pass                 |
|               dlrm                |  4  |                pass                 |
|            Super_SloMo            |  4  |                pass                 |
|           fastNLP_Bert            |  4  |                pass                 |
|       functorch_dp_cifar10        |  4  |                pass                 |
|            hf_T5_base             |  4  |                pass                 |
|          LearningToPaint          |  4  |                pass                 |
|             hf_Albert             |  4  |                pass                 |
|              hf_Bert              |  4  |                pass                 |
|            hf_BigBird             |  4  |                pass                 |
|           hf_DistilBert           |  4  |                pass                 |
|              hf_GPT2              |  2  |                pass                 |
|            hf_Reformer            |  4  |                pass                 |
|               hf_T5               |  4  |                pass                 |
|              alexnet              |  4  |                pass                 |
|           hf_Bert_large           |  4  |             fail_to_run             |
|           hf_Longformer           |  4  |             fail_to_run             |
|               moco                |  4  |             fail_to_run             |
|           squeezenet1_1           |  4  |            fail_accuracy            |
|          phlippe_resnet           |  4  |            fail_accuracy            |
|                drq                |  1  |            fail_accuracy            |
|          vision_maskrcnn          |  4  |           eager_variation           |
|        Background_Matting         |  4  |           eager_variation           |
|           torchrec_dlrm           |  0  |               0.0000                |
|               llama               |  0  |               0.0000                |
|             tacotron2             |  0  |               0.0000                |
|               sage                |  0  |               0.0000                |
|                gcn                |  0  |               0.0000                |
|                gat                |  0  |               0.0000                |
+-----------------------------------+-----+-------------------------------------+
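The status labels in the accuracy table (`pass`, `fail_accuracy`, `eager_variation`) can be read as outcomes of a tolerance comparison against eager. The sketch below is one possible interpretation, not the benchmark harness's code. It assumes `eager_variation` means two eager runs disagree with each other, and the tolerances are hypothetical. It compares flat lists of floats with `math.isclose` as a stand-in for tensor comparisons such as `torch.allclose`, whose tolerance formula differs slightly.

```python
import math
from typing import Sequence


def allclose(a: Sequence[float], b: Sequence[float],
             rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    """Element-wise closeness check for equal-length sequences."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol) for x, y in zip(a, b))


def accuracy_status(eager_run1: Sequence[float], eager_run2: Sequence[float],
                    compiled: Sequence[float],
                    rtol: float = 1e-3, atol: float = 1e-3) -> str:
    """Classify a compiled result against eager (hypothetical rules)."""
    if not allclose(eager_run1, eager_run2, rtol, atol):
        # Eager itself is not reproducible, so the comparison is not meaningful.
        return "eager_variation"
    return "pass" if allclose(eager_run1, compiled, rtol, atol) else "fail_accuracy"
```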

Compilation latency (sec)

+-----------------------------------+------+-------------------------------------+
|               name                |  bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+-------------------------------------+
|            densenet121            |  4   |              1220.2865              |
|        speech_transformer         |  32  |              872.3027               |
|           hf_GPT2_large           |  4   |              862.7958               |
|         phlippe_densenet          | 128  |              843.1009               |
| attention_is_all_you_need_pytorch | 256  |              742.8618               |
|            mnasnet1_0             |  32  |               603.51                |
|        mobilenet_v3_large         |  32  |              561.2653               |
|           mobilenet_v2            |  96  |              548.8599               |
|            hf_BigBird             |  2   |               504.069               |
|            hf_T5_large            |  2   |              494.9345               |
|              yolov3               |  16  |              472.1285               |
|   timm_vision_transformer_large   |  32  |              469.5215               |
|      timm_vision_transformer      |  32  |              469.4433               |
|            timm_regnet            |  32  |              458.2416               |
|            timm_nfnet             | 128  |              452.6057               |
|             hf_Albert             |  8   |              443.6373               |
|           fastNLP_Bert            |  6   |               432.484               |
|         timm_efficientnet         |  32  |               416.368               |
|          pytorch_struct           | 200  |              380.1856               |
|          resnext50_32x4d          |  8   |              378.2311               |
|               dlrm                | 1024 |              367.2021               |
|           BERT_pytorch            |  16  |              356.5031               |
|           hf_Bert_large           |  4   |              328.6867               |
|        shufflenet_v2_x1_0         | 128  |              327.7534               |
|                drq                |  1   |              325.9405               |
|            timm_vovnet            |  32  |               321.087               |
|            Super_SloMo            |  6   |              301.9892               |
|               hf_T5               |  8   |              298.1788               |
|          LearningToPaint          |  96  |              295.7467               |
|      nvidia_deeprecommender       | 256  |              269.3865               |
|           pytorch_unet            |  1   |              261.0985               |
|             resnet18              |  16  |              256.4528               |
|           squeezenet1_1           |  32  |              248.7594               |
|               vgg16               |  64  |              242.8924               |
|       functorch_dp_cifar10        |  64  |              241.1431               |
|              alexnet              | 128  |              228.0223               |
|              hf_GPT2              |  4   |              226.4292               |
|           timm_resnest            |  32  |              218.0259               |
|          phlippe_resnet           | 128  |              212.9454               |
|            hf_Reformer            |  4   |              206.2104               |
|         soft_actor_critic         | 256  |              194.8455               |
|             resnet152             |  32  |              183.7911               |
|              hf_Bart              |  4   |               182.812               |
|        Background_Matting         |  4   |              178.2901               |
|           lennard_jones           | 1000 |              158.1878               |
|   pytorch_CycleGAN_and_pix2pix    |  1   |              135.3382               |
|              hf_Bert              |  4   |              103.9135               |
|          pytorch_stargan          |  16  |               86.6805               |
|              demucs               |  4   |               76.9034               |
|           hf_DistilBert           |  8   |               61.5585               |
|               dcgan               |  32  |               39.0268               |
|             resnet50              |  32  |               28.1121               |
|            tts_angular            |  64  |               4.8102                |
|                gat                |  0   |                 nan                 |
|                gcn                |  0   |                 nan                 |
|           hf_Longformer           |  0   |                 nan                 |
|               moco                |  0   |                 nan                 |
|               sage                |  0   |                 nan                 |
|             tacotron2             |  0   |                 nan                 |
|           torchrec_dlrm           |  0   |                 nan                 |
+-----------------------------------+------+-------------------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+-------------------------------------+
|               name                |  bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+-------------------------------------+
|             hf_Albert             |  8   |               1.1991                |
|               hf_T5               |  8   |               1.1719                |
|           BERT_pytorch            |  16  |               1.1689                |
|            hf_T5_large            |  2   |               1.1595                |
|            Super_SloMo            |  6   |               1.1595                |
|           fastNLP_Bert            |  6   |                1.147                |
|           hf_GPT2_large           |  4   |                1.134                |
|           mobilenet_v2            |  96  |               1.1007                |
| attention_is_all_you_need_pytorch | 256  |               1.0885                |
|            hf_BigBird             |  2   |               1.0756                |
|            timm_nfnet             | 128  |                1.072                |
|              hf_GPT2              |  4   |               1.0707                |
|           hf_Bert_large           |  4   |               1.0453                |
|        Background_Matting         |  4   |               1.0399                |
|              yolov3               |  16  |               1.0062                |
|            tts_angular            |  64  |               0.9983                |
|               vgg16               |  64  |               0.9938                |
|             resnet50              |  32  |               0.9921                |
|              hf_Bert              |  4   |                0.974                |
|   timm_vision_transformer_large   |  32  |               0.9725                |
|              demucs               |  4   |               0.9657                |
|           timm_resnest            |  32  |               0.9652                |
|        shufflenet_v2_x1_0         | 128  |               0.9628                |
|               dlrm                | 1024 |               0.9565                |
|            timm_regnet            |  32  |               0.9521                |
|         timm_efficientnet         |  32  |                0.94                 |
|             resnet152             |  32  |               0.9392                |
|           hf_DistilBert           |  8   |                0.932                |
|              hf_Bart              |  4   |               0.9175                |
|      nvidia_deeprecommender       | 256  |               0.9175                |
|           pytorch_unet            |  1   |               0.8949                |
|              alexnet              | 128  |               0.8908                |
|      timm_vision_transformer      |  32  |               0.8835                |
|            timm_vovnet            |  32  |                0.882                |
|        mobilenet_v3_large         |  32  |               0.8702                |
|         phlippe_densenet          | 128  |               0.8648                |
|        speech_transformer         |  32  |               0.8606                |
|           squeezenet1_1           |  32  |               0.8434                |
|            hf_Reformer            |  4   |               0.8029                |
|            densenet121            |  4   |               0.7981                |
|          pytorch_stargan          |  16  |                0.783                |
|            mnasnet1_0             |  32  |               0.7752                |
|          resnext50_32x4d          |  8   |               0.7558                |
|          pytorch_struct           | 200  |               0.7362                |
|          LearningToPaint          |  96  |               0.7295                |
|             resnet18              |  16  |               0.6019                |
|   pytorch_CycleGAN_and_pix2pix    |  1   |               0.5911                |
|       functorch_dp_cifar10        |  64  |               0.4424                |
|          phlippe_resnet           | 128  |               0.3394                |
|                drq                |  1   |               0.1965                |
|               dcgan               |  32  |               0.1873                |
|         soft_actor_critic         | 256  |               0.1141                |
|           lennard_jones           | 1000 |               0.0666                |
|                gat                |  0   |                 nan                 |
|                gcn                |  0   |                 nan                 |
|           hf_Longformer           |  0   |                 nan                 |
|               moco                |  0   |                 nan                 |
|               sage                |  0   |                 nan                 |
|             tacotron2             |  0   |                 nan                 |
|           torchrec_dlrm           |  0   |                 nan                 |
+-----------------------------------+------+-------------------------------------+

Absolute latency (ms)

+-----------------------------------+------+-------------------------------------+
|               name                |  bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+-------------------------------------+
|   timm_vision_transformer_large   |  32  |              398.4664               |
|            hf_BigBird             |  2   |              115.0004               |
|            hf_T5_large            |  2   |              114.9077               |
|           hf_GPT2_large           |  4   |               112.83                |
|        Background_Matting         |  4   |              103.7789               |
|               hf_T5               |  8   |               89.5281               |
|            timm_nfnet             | 128  |               79.536                |
|            hf_Reformer            |  4   |               75.8413               |
|            Super_SloMo            |  6   |               64.3573               |
|             resnet152             |  32  |                61.98                |
|              yolov3               |  16  |               56.9666               |
|            timm_regnet            |  32  |               56.7352               |
|            densenet121            |  4   |               54.8286               |
|               vgg16               |  64  |               52.3855               |
|           hf_Bert_large           |  4   |               51.707                |
|              demucs               |  4   |               51.628                |
|              hf_Bart              |  4   |               49.4292               |
|           fastNLP_Bert            |  6   |               33.3076               |
| attention_is_all_you_need_pytorch | 256  |               32.9483               |
|        speech_transformer         |  32  |               32.5738               |
|           mobilenet_v2            |  96  |                32.09                |
|         timm_efficientnet         |  32  |               31.1444               |
|             hf_Albert             |  8   |               29.7185               |
|           pytorch_unet            |  1   |               29.433                |
|            timm_vovnet            |  32  |               27.2264               |
|        shufflenet_v2_x1_0         | 128  |               26.0945               |
|           BERT_pytorch            |  16  |               26.0669               |
|              hf_Bert              |  4   |               25.2858               |
|              hf_GPT2              |  4   |               25.2109               |
|             resnet50              |  32  |               25.1884               |
|        mobilenet_v3_large         |  32  |               23.766                |
|         phlippe_densenet          | 128  |               22.884                |
|      timm_vision_transformer      |  32  |               21.5993               |
|          resnext50_32x4d          |  8   |               21.5213               |
|           hf_DistilBert           |  8   |               21.4008               |
|            mnasnet1_0             |  32  |               21.3485               |
|           timm_resnest            |  32  |               16.3099               |
|          pytorch_stargan          |  16  |               11.8717               |
|          LearningToPaint          |  96  |               10.7695               |
|             resnet18              |  16  |               10.0338               |
|      nvidia_deeprecommender       | 256  |                9.254                |
|          phlippe_resnet           | 128  |               8.8261                |
|              alexnet              | 128  |               8.2875                |
|       functorch_dp_cifar10        |  64  |               7.7689                |
|   pytorch_CycleGAN_and_pix2pix    |  1   |               7.6824                |
|           squeezenet1_1           |  32  |               7.5651                |
|            tts_angular            |  64  |               6.5613                |
|          pytorch_struct           | 200  |               3.6031                |
|               dlrm                | 1024 |               3.5055                |
|                drq                |  1   |               3.0625                |
|               dcgan               |  32  |               2.6613                |
|         soft_actor_critic         | 256  |               1.9039                |
|           lennard_jones           | 1000 |                1.539                |
|                gat                |  0   |                 nan                 |
|                gcn                |  0   |                 nan                 |
|           hf_Longformer           |  0   |                 nan                 |
|               moco                |  0   |                 nan                 |
|               sage                |  0   |                 nan                 |
|             tacotron2             |  0   |                 nan                 |
|           torchrec_dlrm           |  0   |                 nan                 |
+-----------------------------------+------+-------------------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+-------------------------------------+
|                  name                   | bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+-------------------------------------+
|             OPTForCausalLM              |  2  |               2.5239                |
|      GPT2ForSequenceClassification      |  4  |               2.3477                |
|       ElectraForQuestionAnswering       | 64  |               2.0998                |
|               DistillGPT2               | 16  |                1.958                |
|       MT5ForConditionalGeneration       | 16  |               1.9534                |
|            PLBartForCausalLM            |  8  |               1.9276                |
|           ElectraForCausalLM            | 32  |                1.833                |
|            XLNetLMHeadModel             |  8  |               1.8255                |
|    LayoutLMForSequenceClassification    | 16  |               1.7915                |
|                 T5Small                 |  4  |               1.7771                |
|       RobertaForQuestionAnswering       | 16  |               1.7735                |
|       T5ForConditionalGeneration        |  4  |               1.7718                |
|        BertForQuestionAnswering         | 16  |               1.7677                |
|     PLBartForConditionalGeneration      |  4  |               1.7639                |
|      BartForConditionalGeneration       |  2  |               1.7328                |
|             BartForCausalLM             |  4  |               1.6963                |
|            MBartForCausalLM             |  4  |               1.6848                |
|           RobertaForCausalLM            | 16  |                1.67                 |
|             XGLMForCausalLM             |  8  |               1.6486                |
|       AlbertForQuestionAnswering        |  4  |                1.644                |
|    MegatronBertForQuestionAnswering     |  8  |               1.6421                |
|            AlbertForMaskedLM            |  4  |                1.625                |
|            YituTechConvBert             | 16  |               1.6188                |
|                CamemBert                | 16  |               1.6128                |
|           LayoutLMForMaskedLM           | 16  |                1.609                |
|     M2M100ForConditionalGeneration      | 16  |               1.6015                |
|             BertForMaskedLM             | 16  |               1.5876                |
|      MBartForConditionalGeneration      |  2  |               1.5766                |
|         Speech2Text2ForCausalLM         | 256 |               1.5756                |
|         MegatronBertForCausalLM         |  4  |               1.5563                |
| BlenderbotSmallForConditionalGeneration | 64  |               1.4753                |
|     DistilBertForQuestionAnswering      | 256 |               1.4538                |
|           PegasusForCausalLM            | 32  |                1.435                |
|     PegasusForConditionalGeneration     | 32  |               1.4324                |
|            TrOCRForCausalLM             | 32  |               1.4192                |
|       BlenderbotSmallForCausalLM        | 64  |               1.4111                |
|          BlenderbotForCausalLM          |  4  |               1.2585                |
|          DistilBertForMaskedLM          | 128 |               1.2406                |
|          MobileBertForMaskedLM          | 64  |               1.2012                |
|     MobileBertForQuestionAnswering      | 128 |               1.1555                |
|       DebertaForQuestionAnswering       |  8  |               0.9571                |
|           DebertaForMaskedLM            |  4  |               0.8277                |
|      DebertaV2ForQuestionAnswering      |  2  |               0.7041                |
|          DebertaV2ForMaskedLM           |  1  |               0.6608                |
|          AllenaiLongformerBase          |  0  |                 0.0                 |
+-----------------------------------------+-----+-------------------------------------+

Accuracy

+-----------------------------------------+----+-------------------------------------+
|                  name                   | bs | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+----+-------------------------------------+
|          BlenderbotForCausalLM          | 1  |          pass_due_to_skip           |
|          DebertaV2ForMaskedLM           | 1  |          pass_due_to_skip           |
|            AlbertForMaskedLM            | 1  |                pass                 |
|           PegasusForCausalLM            | 1  |                pass                 |
|       MT5ForConditionalGeneration       | 1  |                pass                 |
|         MegatronBertForCausalLM         | 1  |                pass                 |
|    MegatronBertForQuestionAnswering     | 1  |                pass                 |
|          MobileBertForMaskedLM          | 1  |                pass                 |
|     MobileBertForQuestionAnswering      | 1  |                pass                 |
|             OPTForCausalLM              | 1  |                pass                 |
|            PLBartForCausalLM            | 1  |                pass                 |
|     PLBartForConditionalGeneration      | 1  |                pass                 |
|     PegasusForConditionalGeneration     | 1  |                pass                 |
|            MBartForCausalLM             | 1  |                pass                 |
|           RobertaForCausalLM            | 1  |                pass                 |
|       RobertaForQuestionAnswering       | 1  |                pass                 |
|         Speech2Text2ForCausalLM         | 1  |                pass                 |
|       T5ForConditionalGeneration        | 1  |                pass                 |
|                 T5Small                 | 1  |                pass                 |
|            TrOCRForCausalLM             | 1  |                pass                 |
|             XGLMForCausalLM             | 1  |                pass                 |
|            XLNetLMHeadModel             | 1  |                pass                 |
|      MBartForConditionalGeneration      | 1  |                pass                 |
|    LayoutLMForSequenceClassification    | 1  |                pass                 |
|     M2M100ForConditionalGeneration      | 1  |                pass                 |
|           DebertaForMaskedLM            | 1  |                pass                 |
|          AllenaiLongformerBase          | 1  |                pass                 |
|             BartForCausalLM             | 1  |                pass                 |
|      BartForConditionalGeneration       | 1  |                pass                 |
|             BertForMaskedLM             | 1  |                pass                 |
|        BertForQuestionAnswering         | 1  |                pass                 |
|       BlenderbotSmallForCausalLM        | 1  |                pass                 |
| BlenderbotSmallForConditionalGeneration | 1  |                pass                 |
|                CamemBert                | 1  |                pass                 |
|       DebertaForQuestionAnswering       | 1  |                pass                 |
|      DebertaV2ForQuestionAnswering      | 1  |                pass                 |
|          DistilBertForMaskedLM          | 1  |                pass                 |
|     DistilBertForQuestionAnswering      | 1  |                pass                 |
|               DistillGPT2               | 1  |                pass                 |
|           ElectraForCausalLM            | 1  |                pass                 |
|       ElectraForQuestionAnswering       | 1  |                pass                 |
|      GPT2ForSequenceClassification      | 1  |                pass                 |
|           LayoutLMForMaskedLM           | 1  |                pass                 |
|            YituTechConvBert             | 1  |                pass                 |
|       AlbertForQuestionAnswering        | 1  |            fail_accuracy            |
+-----------------------------------------+----+-------------------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-------------------------------------+
|                  name                   | bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+-------------------------------------+
|          BlenderbotForCausalLM          |  4  |              649.4784               |
|          MobileBertForMaskedLM          | 64  |              639.3457               |
|     MobileBertForQuestionAnswering      | 128 |              616.8377               |
|       MT5ForConditionalGeneration       | 16  |              612.7483               |
|           ElectraForCausalLM            | 32  |              421.2196               |
|          DebertaV2ForMaskedLM           |  1  |              399.9344               |
|            AlbertForMaskedLM            |  4  |               342.793               |
|             XGLMForCausalLM             |  8  |              314.3379               |
|            XLNetLMHeadModel             |  8  |              299.0966               |
|     M2M100ForConditionalGeneration      | 16  |              298.5031               |
|      DebertaV2ForQuestionAnswering      |  2  |              288.9307               |
|       ElectraForQuestionAnswering       | 64  |              281.8455               |
|       T5ForConditionalGeneration        |  4  |              271.9564               |
|             BertForMaskedLM             | 16  |              244.5709               |
|            YituTechConvBert             | 16  |              242.3426               |
|      BartForConditionalGeneration       |  2  |              233.1992               |
|          DistilBertForMaskedLM          | 128 |              230.7913               |
|            TrOCRForCausalLM             | 32  |              230.0665               |
|      GPT2ForSequenceClassification      |  4  |              221.7695               |
|     DistilBertForQuestionAnswering      | 256 |              213.5598               |
|             BartForCausalLM             |  4  |              202.7918               |
|       BlenderbotSmallForCausalLM        | 64  |              190.1801               |
|    LayoutLMForSequenceClassification    | 16  |              176.0581               |
|       DebertaForQuestionAnswering       |  8  |              171.8915               |
|               DistillGPT2               | 16  |              168.6432               |
|         Speech2Text2ForCausalLM         | 256 |              155.1008               |
|    MegatronBertForQuestionAnswering     |  8  |              147.9811               |
|           DebertaForMaskedLM            |  4  |              126.1295               |
|             OPTForCausalLM              |  2  |              122.2415               |
|         MegatronBertForCausalLM         |  4  |              111.1831               |
|      MBartForConditionalGeneration      |  2  |              110.6487               |
|           PegasusForCausalLM            | 32  |               107.636               |
|     PegasusForConditionalGeneration     | 32  |              102.5265               |
|     PLBartForConditionalGeneration      |  4  |               97.2393               |
|       AlbertForQuestionAnswering        |  4  |               92.8096               |
|        BertForQuestionAnswering         | 16  |               92.5737               |
|            PLBartForCausalLM            |  8  |               76.7421               |
|                CamemBert                | 16  |               76.3326               |
| BlenderbotSmallForConditionalGeneration | 64  |               75.592                |
|            MBartForCausalLM             |  4  |               53.1392               |
|                 T5Small                 |  4  |               45.5744               |
|           RobertaForCausalLM            | 16  |               44.4327               |
|           LayoutLMForMaskedLM           | 16  |               40.6821               |
|       RobertaForQuestionAnswering       | 16  |               38.2552               |
|          AllenaiLongformerBase          |  0  |                 nan                 |
+-----------------------------------------+-----+-------------------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+-------------------------------------+
|                  name                   | bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+-------------------------------------+
|       AlbertForQuestionAnswering        |  4  |               1.3106                |
|            AlbertForMaskedLM            |  4  |               1.2322                |
|      GPT2ForSequenceClassification      |  4  |               1.2114                |
|                 T5Small                 |  4  |               1.1813                |
|       RobertaForQuestionAnswering       | 16  |               1.1724                |
|       ElectraForQuestionAnswering       | 64  |               1.1609                |
|       DebertaForQuestionAnswering       |  8  |               1.1528                |
|        BertForQuestionAnswering         | 16  |               1.1418                |
|    LayoutLMForSequenceClassification    | 16  |                1.137                |
|             OPTForCausalLM              |  2  |               1.1345                |
|            XLNetLMHeadModel             |  8  |               1.1342                |
|    MegatronBertForQuestionAnswering     |  8  |               1.1152                |
|     DistilBertForQuestionAnswering      | 256 |               1.1135                |
|       T5ForConditionalGeneration        |  4  |               1.1019                |
|         MegatronBertForCausalLM         |  4  |               1.0784                |
|               DistillGPT2               | 16  |               1.0642                |
|           RobertaForCausalLM            | 16  |                1.052                |
|           LayoutLMForMaskedLM           | 16  |               1.0517                |
|            YituTechConvBert             | 16  |               1.0411                |
|      MBartForConditionalGeneration      |  2  |               1.0307                |
|     PegasusForConditionalGeneration     | 32  |               1.0185                |
|          BlenderbotForCausalLM          |  4  |               0.9995                |
|     PLBartForConditionalGeneration      |  4  |               0.9987                |
|            MBartForCausalLM             |  4  |               0.9912                |
|           PegasusForCausalLM            | 32  |               0.9864                |
|             BertForMaskedLM             | 16  |               0.9848                |
|      BartForConditionalGeneration       |  2  |               0.9844                |
|                CamemBert                | 16  |               0.9812                |
|          MobileBertForMaskedLM          | 64  |               0.9802                |
|      DebertaV2ForQuestionAnswering      |  2  |                0.98                 |
|           DebertaForMaskedLM            |  4  |               0.9759                |
|           ElectraForCausalLM            | 32  |               0.9739                |
|            TrOCRForCausalLM             | 32  |               0.9583                |
|     M2M100ForConditionalGeneration      | 16  |               0.9273                |
|             BartForCausalLM             |  4  |               0.9243                |
|          DebertaV2ForMaskedLM           |  1  |               0.9165                |
|             XGLMForCausalLM             |  8  |               0.9124                |
| BlenderbotSmallForConditionalGeneration | 64  |               0.9085                |
|            PLBartForCausalLM            |  8  |               0.9066                |
|       MT5ForConditionalGeneration       | 16  |               0.8968                |
|          DistilBertForMaskedLM          | 128 |               0.8675                |
|     MobileBertForQuestionAnswering      | 128 |                0.837                |
|       BlenderbotSmallForCausalLM        | 64  |               0.8095                |
|         Speech2Text2ForCausalLM         | 256 |               0.7856                |
|          AllenaiLongformerBase          |  0  |                 nan                 |
+-----------------------------------------+-----+-------------------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+-------------------------------------+
|                  name                   | bs  | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+-------------------------------------+
|            AlbertForMaskedLM            |  4  |              164.5389               |
|       AlbertForQuestionAnswering        |  4  |              161.0295               |
|          DebertaV2ForMaskedLM           |  1  |              158.0596               |
|      DebertaV2ForQuestionAnswering      |  2  |              152.7889               |
|            XLNetLMHeadModel             |  8  |              152.6646               |
|     MobileBertForQuestionAnswering      | 128 |              148.1638               |
|          MobileBertForMaskedLM          | 64  |              145.3818               |
|     PegasusForConditionalGeneration     | 32  |              101.4923               |
|            TrOCRForCausalLM             | 32  |               97.2026               |
|          BlenderbotForCausalLM          |  4  |                95.5                 |
|     M2M100ForConditionalGeneration      | 16  |               90.5091               |
|      MBartForConditionalGeneration      |  2  |               88.1672               |
|      BartForConditionalGeneration       |  2  |               88.0842               |
|    MegatronBertForQuestionAnswering     |  8  |               86.5595               |
| BlenderbotSmallForConditionalGeneration | 64  |               82.6847               |
|       DebertaForQuestionAnswering       |  8  |               79.3787               |
|            YituTechConvBert             | 16  |               77.3507               |
|           DebertaForMaskedLM            |  4  |               75.6133               |
|                CamemBert                | 16  |               73.4118               |
|     DistilBertForQuestionAnswering      | 256 |               71.4565               |
|           LayoutLMForMaskedLM           | 16  |               69.969                |
|             BertForMaskedLM             | 16  |               69.4337               |
|           RobertaForCausalLM            | 16  |               68.8753               |
|          DistilBertForMaskedLM          | 128 |               68.2423               |
|             XGLMForCausalLM             |  8  |               67.992                |
|            MBartForCausalLM             |  4  |               67.8393               |
|     PLBartForConditionalGeneration      |  4  |               67.3449               |
|             OPTForCausalLM              |  2  |               67.1718               |
|             BartForCausalLM             |  4  |               66.9884               |
|            PLBartForCausalLM            |  8  |               60.7361               |
|       T5ForConditionalGeneration        |  4  |               59.0007               |
|                 T5Small                 |  4  |               58.8911               |
|         MegatronBertForCausalLM         |  4  |               56.8237               |
|       ElectraForQuestionAnswering       | 64  |               54.6674               |
|    LayoutLMForSequenceClassification    | 16  |               54.6074               |
|               DistillGPT2               | 16  |               54.0521               |
|       RobertaForQuestionAnswering       | 16  |               53.9412               |
|        BertForQuestionAnswering         | 16  |               53.9321               |
|           PegasusForCausalLM            | 32  |               52.0473               |
|       MT5ForConditionalGeneration       | 16  |               48.7685               |
|           ElectraForCausalLM            | 32  |               48.194                |
|       BlenderbotSmallForCausalLM        | 64  |               43.7949               |
|      GPT2ForSequenceClassification      |  4  |               38.967                |
|         Speech2Text2ForCausalLM         | 256 |               34.1697               |
|          AllenaiLongformerBase          |  0  |                 nan                 |
+-----------------------------------------+-----+-------------------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+-------------------------------------+
|              name               | bs  | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+-------------------------------------+
|        tnt_s_patch16_224        | 128 |               3.2808                |
|         coat_lite_mini          | 128 |                2.043                |
|          gmixer_24_224          | 128 |               1.8775                |
|          gmlp_s16_224           | 128 |                1.848                |
|        twins_pcpvt_base         | 64  |               1.8212                |
|         crossvit_9_240          | 128 |               1.7885                |
|           volo_d1_224           | 64  |               1.7163                |
|           convit_base           | 64  |               1.7118                |
|  swin_base_patch4_window7_224   | 64  |               1.7051                |
|            pit_b_224            | 64  |               1.5949                |
|          ghostnet_100           | 128 |               1.5827                |
|      xcit_large_24_p8_224       |  5  |                1.577                |
|        sebotnet33ts_256         | 64  |               1.5422                |
|       gluon_inception_v3        | 128 |               1.5297                |
|          inception_v3           | 128 |               1.5277                |
|             dla102              | 128 |               1.5251                |
|          jx_nest_base           | 32  |               1.5248                |
|        adv_inception_v3         | 128 |               1.5244                |
|           mnasnet_100           | 128 |               1.4935                |
|           mobilevit_s           | 64  |                1.486                |
|          convnext_base          | 64  |               1.4763                |
|      beit_base_patch16_224      | 64  |               1.4595                |
|         mobilenetv2_100         | 128 |               1.4441                |
|            lcnet_050            | 128 |               1.4427                |
|          cait_m36_384           |  4  |               1.4424                |
|           dm_nfnet_f0           | 128 |               1.4387                |
|            nfnet_l0             | 128 |                1.436                |
|       eca_botnext26ts_256       | 128 |               1.4247                |
|          botnet26t_256          | 128 |               1.4204                |
|           selecsls42b           | 128 |               1.4156                |
|          spnasnet_100           | 128 |               1.4026                |
|           fbnetc_100            | 128 |               1.3996                |
|          resmlp_12_224          | 128 |               1.3908                |
|      mobilenetv3_large_100      | 128 |                1.39                 |
|          mixer_b16_224          | 128 |               1.3896                |
|        ese_vovnet19b_dw         | 128 |               1.3818                |
|       tf_efficientnet_b0        | 128 |               1.3813                |
|        res2net50_14w_8s         | 128 |               1.3811                |
|            hrnet_w18            | 128 |                1.378                |
|           res2next50            | 128 |               1.3608                |
|           resnest101e           | 64  |                1.355                |
|      vit_base_patch16_224       | 64  |               1.3449                |
|           rexnet_100            | 128 |               1.3386                |
|         poolformer_m36          | 64  |               1.3193                |
| deit_base_distilled_patch16_224 | 64  |               1.3188                |
|            fbnetv3_b            | 128 |               1.3007                |
|          cspdarknet53           | 64  |               1.2718                |
|            tinynet_a            | 128 |               1.2295                |
|           regnety_002           | 128 |               1.2245                |
|         visformer_small         | 128 |               1.1971                |
|           tf_mixnet_l           | 128 |               1.1961                |
|            mixnet_l             | 128 |                1.187                |
|          pnasnet5large          | 16  |               1.1411                |
|             dpn107              | 32  |               1.1358                |
|            repvgg_a2            | 128 |               1.1229                |
|        res2net101_26w_4s        | 64  |               1.0952                |
|        gluon_xception65         | 32  |               1.0893                |
|            gernet_l             | 128 |               1.0709                |
|     swsl_resnext101_32x16d      | 32  |               1.0223                |
|        convmixer_768_32         | 32  |               1.0087                |
+---------------------------------+-----+-------------------------------------+

Accuracy

+---------------------------------+----+-------------------------------------+
|              name               | bs | inductor_max_autotune_no_cudagraphs |
+---------------------------------+----+-------------------------------------+
|        adv_inception_v3         | 8  |                pass                 |
|          resmlp_12_224          | 8  |                pass                 |
|         mobilenetv2_100         | 8  |                pass                 |
|      mobilenetv3_large_100      | 8  |                pass                 |
|           mobilevit_s           | 8  |                pass                 |
|            nfnet_l0             | 8  |                pass                 |
|            pit_b_224            | 8  |                pass                 |
|          pnasnet5large          | 8  |                pass                 |
|         poolformer_m36          | 8  |                pass                 |
|           regnety_002           | 8  |                pass                 |
|            repvgg_a2            | 8  |                pass                 |
|        res2net101_26w_4s        | 8  |                pass                 |
|        res2net50_14w_8s         | 8  |                pass                 |
|           res2next50            | 8  |                pass                 |
|           resnest101e           | 8  |                pass                 |
|            mixnet_l             | 8  |                pass                 |
|           rexnet_100            | 8  |                pass                 |
|        sebotnet33ts_256         | 8  |                pass                 |
|           selecsls42b           | 8  |                pass                 |
|          spnasnet_100           | 8  |                pass                 |
|     swsl_resnext101_32x16d      | 8  |                pass                 |
|       tf_efficientnet_b0        | 8  |                pass                 |
|           tf_mixnet_l           | 8  |                pass                 |
|            tinynet_a            | 8  |                pass                 |
|        tnt_s_patch16_224        | 8  |                pass                 |
|         visformer_small         | 8  |                pass                 |
|      vit_base_patch16_224       | 8  |                pass                 |
|           volo_d1_224           | 8  |                pass                 |
|      beit_base_patch16_224      | 8  |                pass                 |
|           mnasnet_100           | 8  |                pass                 |
|          mixer_b16_224          | 8  |                pass                 |
|       eca_botnext26ts_256       | 8  |                pass                 |
|          botnet26t_256          | 8  |                pass                 |
|          cait_m36_384           | 4  |                pass                 |
|           convit_base           | 8  |                pass                 |
|        convmixer_768_32         | 8  |                pass                 |
|          convnext_base          | 8  |                pass                 |
|         crossvit_9_240          | 8  |                pass                 |
|          cspdarknet53           | 8  |                pass                 |
| deit_base_distilled_patch16_224 | 8  |                pass                 |
|             dla102              | 8  |                pass                 |
|           dm_nfnet_f0           | 8  |                pass                 |
|            lcnet_050            | 8  |                pass                 |
|             dpn107              | 8  |                pass                 |
|        ese_vovnet19b_dw         | 8  |                pass                 |
|           fbnetc_100            | 8  |                pass                 |
|            fbnetv3_b            | 8  |                pass                 |
|            gernet_l             | 8  |                pass                 |
|          ghostnet_100           | 8  |                pass                 |
|       gluon_inception_v3        | 8  |                pass                 |
|        gluon_xception65         | 8  |                pass                 |
|          gmixer_24_224          | 8  |                pass                 |
|          gmlp_s16_224           | 8  |                pass                 |
|            hrnet_w18            | 8  |                pass                 |
|          inception_v3           | 8  |                pass                 |
|          jx_nest_base           | 8  |                pass                 |
|      xcit_large_24_p8_224       | 8  |                pass                 |
|  swin_base_patch4_window7_224   | 8  |            fail_accuracy            |
|        twins_pcpvt_base         | 0  |               0.0000                |
|         coat_lite_mini          | 0  |               0.0000                |
+---------------------------------+----+-------------------------------------+

Compilation latency (sec)

+---------------------------------+-----+-------------------------------------+
|              name               | bs  | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+-------------------------------------+
|        twins_pcpvt_base         | 64  |              1661.7656              |
|           mobilevit_s           | 64  |              1550.9785              |
|         coat_lite_mini          | 128 |              1446.3549              |
|         crossvit_9_240          | 128 |              1274.4466              |
|      xcit_large_24_p8_224       |  5  |              1090.0301              |
|           rexnet_100            | 128 |              1086.8005              |
|           volo_d1_224           | 64  |              1077.1989              |
|          cait_m36_384           |  4  |               985.835               |
|            pit_b_224            | 64  |              983.0208               |
|  swin_base_patch4_window7_224   | 64  |              979.3625               |
|          jx_nest_base           | 32  |              973.6643               |
|          ghostnet_100           | 128 |              917.2281               |
|            hrnet_w18            | 128 |              848.4131               |
|            mixnet_l             | 128 |              828.4946               |
|        sebotnet33ts_256         | 64  |               815.841               |
|        adv_inception_v3         | 128 |              791.4986               |
|        res2net50_14w_8s         | 128 |              782.9523               |
|          botnet26t_256          | 128 |              748.2407               |
|        res2net101_26w_4s        | 64  |              721.0519               |
|             dpn107              | 32  |              699.6225               |
|            fbnetv3_b            | 128 |              671.9632               |
|          pnasnet5large          | 16  |               625.272               |
|           fbnetc_100            | 128 |              582.0276               |
|        tnt_s_patch16_224        | 128 |              573.5951               |
|          convnext_base          | 64  |              520.1623               |
|            tinynet_a            | 128 |               513.939               |
|           regnety_002           | 128 |              483.9504               |
|         visformer_small         | 128 |              476.2705               |
|             dla102              | 128 |              461.3868               |
|           resnest101e           | 64  |              438.5754               |
|          cspdarknet53           | 64  |              394.8797               |
|            nfnet_l0             | 128 |              357.7415               |
|        gluon_xception65         | 32  |              350.2381               |
|      beit_base_patch16_224      | 64  |              341.4709               |
|         poolformer_m36          | 64  |              335.8947               |
|          gmixer_24_224          | 128 |              333.6782               |
|            gernet_l             | 128 |              330.3406               |
|       eca_botnext26ts_256       | 128 |              330.1705               |
|       tf_efficientnet_b0        | 128 |              314.6216               |
|           convit_base           | 64  |              307.2133               |
|           selecsls42b           | 128 |              298.4681               |
|        ese_vovnet19b_dw         | 128 |               284.912               |
|           mnasnet_100           | 128 |               283.627               |
|            repvgg_a2            | 128 |               266.93                |
| deit_base_distilled_patch16_224 | 64  |              266.4633               |
|          mixer_b16_224          | 128 |              258.0345               |
|            lcnet_050            | 128 |              237.0143               |
|          gmlp_s16_224           | 128 |              192.8936               |
|      mobilenetv3_large_100      | 128 |              188.8074               |
|     swsl_resnext101_32x16d      | 32  |              184.8311               |
|          resmlp_12_224          | 128 |              172.1008               |
|           res2next50            | 128 |              152.5042               |
|         mobilenetv2_100         | 128 |               128.714               |
|        convmixer_768_32         | 32  |              127.8047               |
|           tf_mixnet_l           | 128 |               92.5167               |
|          spnasnet_100           | 128 |               78.3946               |
|       gluon_inception_v3        | 128 |               55.8254               |
|          inception_v3           | 128 |               54.8863               |
|      vit_base_patch16_224       | 64  |               47.5897               |
|           dm_nfnet_f0           | 128 |               38.9676               |
+---------------------------------+-----+-------------------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+-------------------------------------+
|              name               | bs  | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+-------------------------------------+
|          pnasnet5large          | 16  |               1.2842                |
|          gmlp_s16_224           | 128 |               1.2049                |
|         poolformer_m36          | 64  |               1.1877                |
|           convit_base           | 64  |                1.157                |
|          gmixer_24_224          | 128 |               1.1482                |
|         mobilenetv2_100         | 128 |                1.118                |
|           resnest101e           | 64  |               1.0869                |
|           dm_nfnet_f0           | 128 |               1.0845                |
|       tf_efficientnet_b0        | 128 |               1.0724                |
|            tinynet_a            | 128 |               1.0712                |
|           tf_mixnet_l           | 128 |               1.0681                |
|        tnt_s_patch16_224        | 128 |               1.0504                |
|           rexnet_100            | 128 |               1.0454                |
|          resmlp_12_224          | 128 |               1.0349                |
|          cspdarknet53           | 64  |               1.0319                |
|             dla102              | 128 |               1.0312                |
|          inception_v3           | 128 |               1.0265                |
|       gluon_inception_v3        | 128 |               1.0265                |
|        twins_pcpvt_base         | 64  |               1.0223                |
|         visformer_small         | 128 |               1.0194                |
|        sebotnet33ts_256         | 64  |               1.0191                |
|        adv_inception_v3         | 128 |               1.0174                |
|          convnext_base          | 64  |               1.0165                |
|       eca_botnext26ts_256       | 128 |               0.9979                |
|            nfnet_l0             | 128 |               0.9952                |
|            hrnet_w18            | 128 |               0.9915                |
|         crossvit_9_240          | 128 |               0.9898                |
|        ese_vovnet19b_dw         | 128 |               0.9897                |
|            mixnet_l             | 128 |               0.9893                |
|          spnasnet_100           | 128 |               0.9863                |
|        convmixer_768_32         | 32  |               0.9852                |
|          cait_m36_384           |  4  |               0.9845                |
|           mobilevit_s           | 64  |               0.9818                |
|      beit_base_patch16_224      | 64  |               0.9812                |
|          mixer_b16_224          | 128 |               0.9788                |
|            pit_b_224            | 64  |               0.9773                |
|            fbnetv3_b            | 128 |               0.9772                |
|          ghostnet_100           | 128 |               0.9765                |
|     swsl_resnext101_32x16d      | 32  |               0.9747                |
|      xcit_large_24_p8_224       |  5  |               0.9737                |
|        gluon_xception65         | 32  |                0.97                 |
|           volo_d1_224           | 64  |               0.9673                |
|         coat_lite_mini          | 128 |               0.9634                |
|            gernet_l             | 128 |               0.9634                |
|             dpn107              | 32  |               0.9609                |
|          jx_nest_base           | 32  |               0.9605                |
|          botnet26t_256          | 128 |               0.9593                |
|           selecsls42b           | 128 |                0.959                |
|        res2net50_14w_8s         | 128 |                0.959                |
|      vit_base_patch16_224       | 64  |                0.955                |
| deit_base_distilled_patch16_224 | 64  |               0.9536                |
|           fbnetc_100            | 128 |               0.9536                |
|           res2next50            | 128 |               0.9531                |
|            repvgg_a2            | 128 |               0.9518                |
|        res2net101_26w_4s        | 64  |               0.9459                |
|           mnasnet_100           | 128 |               0.9395                |
|      mobilenetv3_large_100      | 128 |               0.9352                |
|  swin_base_patch4_window7_224   | 64  |               0.9044                |
|           regnety_002           | 128 |               0.8964                |
|            lcnet_050            | 128 |               0.8843                |
+---------------------------------+-----+-------------------------------------+

Absolute latency (ms)

+---------------------------------+-----+-------------------------------------+
|              name               | bs  | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+-------------------------------------+
|        convmixer_768_32         | 32  |              297.5561               |
|            hrnet_w18            | 128 |              203.6333               |
|          pnasnet5large          | 16  |              172.5757               |
|           tf_mixnet_l           | 128 |               158.779               |
|            mixnet_l             | 128 |              152.6326               |
|           resnest101e           | 64  |              121.1486               |
|     swsl_resnext101_32x16d      | 32  |              116.2847               |
|          cait_m36_384           |  4  |              115.9344               |
|             dla102              | 128 |              112.9899               |
|         poolformer_m36          | 64  |              109.8308               |
|        adv_inception_v3         | 128 |              105.1958               |
|          inception_v3           | 128 |              105.0804               |
|       gluon_inception_v3        | 128 |              104.8359               |
|        res2net50_14w_8s         | 128 |              101.7858               |
|        tnt_s_patch16_224        | 128 |               98.5791               |
|           convit_base           | 64  |               95.1954               |
|             dpn107              | 32  |               93.5896               |
|           res2next50            | 128 |               92.5685               |
|        gluon_xception65         | 32  |               91.1349               |
|        res2net101_26w_4s        | 64  |               90.9843               |
|           dm_nfnet_f0           | 128 |               87.9436               |
|  swin_base_patch4_window7_224   | 64  |               85.6668               |
|          mixer_b16_224          | 128 |               84.4027               |
|            fbnetv3_b            | 128 |               84.3302               |
|          convnext_base          | 64  |               82.8924               |
|      xcit_large_24_p8_224       |  5  |               81.6976               |
|            nfnet_l0             | 128 |               78.0815               |
|         visformer_small         | 128 |               76.0427               |
|       eca_botnext26ts_256       | 128 |               74.3494               |
|            pit_b_224            | 64  |               74.1421               |
|          gmlp_s16_224           | 128 |               74.107                |
|           volo_d1_224           | 64  |               70.2582               |
|          botnet26t_256          | 128 |               69.8646               |
|          cspdarknet53           | 64  |               69.729                |
|      beit_base_patch16_224      | 64  |               69.6892               |
|            gernet_l             | 128 |               68.0302               |
|          jx_nest_base           | 32  |               65.7983               |
|        twins_pcpvt_base         | 64  |               65.1973               |
|            repvgg_a2            | 128 |               64.726                |
|      vit_base_patch16_224       | 64  |               64.3227               |
| deit_base_distilled_patch16_224 | 64  |               64.2817               |
|          gmixer_24_224          | 128 |               62.5841               |
|       tf_efficientnet_b0        | 128 |               59.1001               |
|           rexnet_100            | 128 |               57.0838               |
|          ghostnet_100           | 128 |               57.0195               |
|            tinynet_a            | 128 |               56.9044               |
|           fbnetc_100            | 128 |               56.3749               |
|         coat_lite_mini          | 128 |               55.196                |
|           mobilevit_s           | 64  |               54.9463               |
|        sebotnet33ts_256         | 64  |               49.9624               |
|          spnasnet_100           | 128 |               47.2813               |
|         crossvit_9_240          | 128 |               45.7538               |
|        ese_vovnet19b_dw         | 128 |               44.8411               |
|         mobilenetv2_100         | 128 |               43.0641               |
|           selecsls42b           | 128 |               42.4034               |
|      mobilenetv3_large_100      | 128 |               41.9541               |
|           mnasnet_100           | 128 |               40.8903               |
|          resmlp_12_224          | 128 |               38.1688               |
|           regnety_002           | 128 |               30.7911               |
|            lcnet_050            | 128 |               20.7195               |
+---------------------------------+-----+-------------------------------------+

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_095_05_04_23_performance_amp_233/timm_models_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_095_05_04_23_performance_amp_233/huggingface_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_095_05_04_23_performance_amp_233/torchbench_amp.png

Build Summary


Run name

day_095_05_04_23_performance_amp_233

Commit hashes

pytorch commit: f55e72c0f6bd6da016aaa51de379e6ba6d7891cc
pytorch commit date: 2023-04-07 17:30:27+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf55e72c

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Collaborator

Performance Dashboard for amp precision (inductor max-autotune comparison on timm models)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward pass; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer step.

When measuring performance, compilation latency and peak memory compression ratio, we exclude models that fail the accuracy checks.
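The aggregation described above can be sketched as follows. This is a minimal illustration, not the dashboard's actual code: it assumes per-model speedup is eager latency divided by compiled latency, that the suite-level number is a geometric mean over models that pass accuracy, and that zero or invalid compiled latencies are dropped. The `results` record layout and the model names in the example are hypothetical.

```python
import math


def geomean_speedup(results):
    """Geometric mean of per-model speedups, skipping models that fail accuracy.

    `results` maps model name -> (accuracy_status, eager_latency_ms, compiled_latency_ms).
    This layout is a hypothetical choice made for illustration.
    """
    speedups = [
        eager / compiled  # assumed definition: >1.0 means the compiled model is faster
        for status, eager, compiled in results.values()
        if status == "pass" and compiled > 0  # drop accuracy failures and invalid runs
    ]
    if not speedups:
        return float("nan")
    # Geometric mean = exp(mean(log(x))); a single outlier skews it less than an arithmetic mean.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical example: model_c is excluded because it fails accuracy.
results = {
    "model_a": ("pass", 200.0, 100.0),  # 2.0x
    "model_b": ("pass", 100.0, 125.0),  # 0.8x
    "model_c": ("fail_accuracy", 100.0, 10.0),
}
print(geomean_speedup(results))  # sqrt(2.0 * 0.8) = sqrt(1.6)
```

A geometric mean is the usual choice for averaging ratios, since a 2x speedup and a 0.5x slowdown cancel out to 1.0x rather than averaging to 1.25x.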

Passrate

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               | 100%, 60/60 |
|       inductor_no_cudagraphs        | 100%, 60/60 |
|        inductor_max_autotune        | 100%, 60/60 |
| inductor_max_autotune_no_cudagraphs | 100%, 60/60 |
+-------------------------------------+-------------+

Geometric mean speedup

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    1.42x    |
|       inductor_no_cudagraphs        |    1.40x    |
|        inductor_max_autotune        |    1.47x    |
| inductor_max_autotune_no_cudagraphs |    1.44x    |
+-------------------------------------+-------------+

Mean compilation time (seconds)

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    80.25    |
|       inductor_no_cudagraphs        |    44.69    |
|        inductor_max_autotune        |   372.93    |
| inductor_max_autotune_no_cudagraphs |    52.43    |
+-------------------------------------+-------------+

Peak memory footprint compression ratio (higher is better)

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    0.91x    |
|       inductor_no_cudagraphs        |    1.03x    |
|        inductor_max_autotune        |    0.90x    |
| inductor_max_autotune_no_cudagraphs |    1.03x    |
+-------------------------------------+-------------+
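To read the compression ratios, here is a small sketch that assumes the ratio is eager peak memory divided by compiled peak memory, which matches "higher is better" but is not stated explicitly in this report. Under that assumption, a ratio below 1.0 means the compiled model peaks higher than eager. Both function names are hypothetical helpers for illustration.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Assumed definition: eager peak memory / compiled peak memory (higher is better)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def extra_memory_fraction(ratio):
    """Extra peak memory used by the compiled run relative to eager (negative means savings)."""
    return 1.0 / ratio - 1.0


# jx_nest_base, inductor vs. inductor_no_cudagraphs, from the warnings table
print(round(extra_memory_fraction(0.6693), 3))  # roughly 49% more peak memory than eager
print(round(extra_memory_fraction(0.9883), 3))  # roughly 1% more, close to parity
```

Under this reading, the gap between the 0.91x and 1.03x rows above is mostly attributable to CUDA graphs: the configurations with CUDA graphs use more peak memory than eager on average, while those without CUDA graphs use slightly less.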

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
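The flagging rules above amount to four threshold checks per model and compiler. The sketch below mirrors the listed thresholds and strict inequalities; the function name and its argument layout are hypothetical, and the speedup value in the example is a placeholder because this excerpt does not show hrnet_w18's inductor speedup.

```python
# Thresholds taken from the warning rules listed above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_SEC = 120.0
COMPRESSION_RATIO_MIN = 0.9


def warnings_for(accuracy, speedup, compile_latency_sec, compression_ratio):
    """Return the warning categories a single model/compiler result falls into."""
    flags = []
    if accuracy != "pass":
        flags.append("accuracy")
    if speedup < SPEEDUP_MIN:
        # A 0.0 speedup usually means the performance run itself failed.
        flags.append("speedup")
    if compile_latency_sec > COMPILE_LATENCY_MAX_SEC:
        flags.append("compilation_latency")
    if compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression_ratio")
    return flags


# hrnet_w18 with inductor: compile latency and compression ratio from the tables below;
# the 1.3 speedup is a placeholder, not a measured value.
print(warnings_for("pass", 1.3, 208.9208, 0.8918))
```

This explains why hrnet_w18 appears in both warning tables for inductor, but only in the compilation latency table for inductor_no_cudagraphs (150.3511 sec, with a 1.0029 compression ratio).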

Compilation latency (sec) warnings

+-------------+----------------------+----------+------------------------+
|    suite    |         name         | inductor | inductor_no_cudagraphs |
+-------------+----------------------+----------+------------------------+
| timm_models |      rexnet_100      | 227.9512 |        41.7002         |
| timm_models |      hrnet_w18       | 208.9208 |        150.3511        |
| timm_models |     ghostnet_100     | 184.1414 |        51.6369         |
| timm_models |    pnasnet5large     | 160.7814 |        104.1362        |
| timm_models |   adv_inception_v3   | 153.9169 |        50.6449         |
| timm_models |  res2net101_26w_4s   | 147.3733 |        85.8358         |
| timm_models |   twins_pcpvt_base   | 144.0143 |        67.4679         |
| timm_models |      fbnetv3_b       | 140.7021 |        56.8816         |
| timm_models |      fbnetc_100      | 125.6072 |        33.4715         |
| timm_models | xcit_large_24_p8_224 | 124.9615 |        86.6285         |
| timm_models |      tinynet_a       | 123.9613 |         40.715         |
| timm_models |     resnest101e      | 123.647  |        77.6249         |
+-------------+----------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+------------------------------+----------+------------------------+
|    suite    |             name             | inductor | inductor_no_cudagraphs |
+-------------+------------------------------+----------+------------------------+
| timm_models |         ghostnet_100         |  0.8976  |         1.0223         |
| timm_models |          hrnet_w18           |  0.8918  |         1.0029         |
| timm_models |       sebotnet33ts_256       |  0.891   |         1.1308         |
| timm_models |       adv_inception_v3       |  0.8904  |         1.0264         |
| timm_models |         inception_v3         |  0.8904  |         1.0264         |
| timm_models |      gluon_inception_v3      |  0.8904  |         1.0264         |
| timm_models |    mobilenetv3_large_100     |  0.8881  |         0.9808         |
| timm_models |            dpn107            |  0.8833  |         0.995          |
| timm_models |       gluon_xception65       |  0.8832  |         0.9952         |
| timm_models |         spnasnet_100         |  0.8786  |         0.9858         |
| timm_models |         selecsls42b          |  0.8785  |         0.9929         |
| timm_models |        poolformer_m36        |  0.8768  |         1.1865         |
| timm_models |     eca_botnext26ts_256      |  0.8738  |         1.0136         |
| timm_models |       res2net50_14w_8s       |  0.8712  |         0.9743         |
| timm_models |      res2net101_26w_4s       |  0.871   |         0.9759         |
| timm_models |           mixnet_l           |  0.8687  |         1.0035         |
| timm_models |         mnasnet_100          |  0.8683  |         0.9844         |
| timm_models |          res2next50          |  0.866   |         0.9673         |
| timm_models |         cait_m36_384         |  0.8632  |         1.0068         |
| timm_models |          fbnetc_100          |  0.8596  |         0.991          |
| timm_models |          pit_b_224           |  0.8578  |         1.0345         |
| timm_models |        convnext_base         |  0.8505  |         1.033          |
| timm_models |           gernet_l           |  0.8499  |         0.9793         |
| timm_models |    swsl_resnext101_32x16d    |  0.8461  |         0.9986         |
| timm_models |        coat_lite_mini        |  0.8402  |         1.033          |
| timm_models |          lcnet_050           |  0.8273  |         0.9465         |
| timm_models |        botnet26t_256         |  0.8239  |         0.9848         |
| timm_models |     xcit_large_24_p8_224     |  0.8225  |         1.0063         |
| timm_models |         regnety_002          |  0.8164  |         0.9526         |
| timm_models |          repvgg_a2           |  0.7738  |         0.9882         |
| timm_models |        crossvit_9_240        |  0.7526  |         0.9882         |
| timm_models | swin_base_patch4_window7_224 |  0.7214  |         0.9272         |
| timm_models |         jx_nest_base         |  0.6693  |         0.9883         |
+-------------+------------------------------+----------+------------------------+
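Every model in this warnings table has an `inductor` ratio below 0.9 in the per-model amp table that follows. For example, ese_vovnet19b_dw (0.9047) is not listed. Below is a minimal sketch of that filter. It makes two assumptions that come from reading the tables, not from the benchmark harness. First, the cutoff is 0.9. Second, "compression ratio" means eager peak memory divided by compiled peak memory, so values below 1.0 would mean the compiled run used more memory. The helper name `memory_warnings` is hypothetical.

```python
# Hypothetical helper, not part of the benchmark harness.
# Assumption: compression ratio = eager peak memory / compiled peak memory,
# so a ratio below 1.0 means the compiled run used more memory than eager.
THRESHOLD = 0.9  # assumption: cutoff inferred from the tables above/below


def memory_warnings(ratios: dict[str, float], threshold: float = THRESHOLD) -> list[tuple[str, float]]:
    """Return (name, ratio) pairs below `threshold`, highest ratio first."""
    flagged = [(name, r) for name, r in ratios.items() if r < threshold]
    # Sort descending to match the ordering used in the report tables.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# A few `inductor` ratios copied from the timm_models amp table.
inductor_ratios = {
    "gmlp_s16_224": 1.1848,
    "ese_vovnet19b_dw": 0.9047,
    "ghostnet_100": 0.8976,
    "hrnet_w18": 0.8918,
    "jx_nest_base": 0.6693,
}

for name, r in memory_warnings(inductor_ratios):
    print(f"{name:<20} {r:.4f}")
```

Under these assumptions, the models at or above 0.9 (gmlp_s16_224, ese_vovnet19b_dw) are dropped and the rest are kept in descending order, as in the warnings table.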

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|        tnt_s_patch16_224        | 128 |  3.0329  |         2.9895         |        3.3512         |               3.3078                |
|        twins_pcpvt_base         | 64  |  2.1343  |         1.7009         |        2.3417         |               1.7722                |
|      xcit_large_24_p8_224       |  5  |  2.0741  |         1.5882         |        2.4879         |               1.5875                |
|         coat_lite_mini          | 128 |  1.954   |         1.9295         |        2.0832         |                2.061                |
|          ghostnet_100           | 128 |  1.8722  |         1.5676         |        1.8762         |               1.6385                |
|          gmlp_s16_224           | 128 |  1.8698  |         1.8518         |        1.8886         |               1.8702                |
|          gmixer_24_224          | 128 |  1.7823  |         1.7637         |        1.9194         |               1.8962                |
|           volo_d1_224           | 64  |  1.7067  |         1.6817         |        1.7614         |               1.7384                |
|            lcnet_050            | 128 |  1.7011  |         1.4729         |        1.7062         |               1.4845                |
|         crossvit_9_240          | 128 |  1.6702  |         1.6427         |        1.8597         |               1.8271                |
|  swin_base_patch4_window7_224   | 64  |  1.6415  |         1.6303         |        1.7455         |                1.738                |
|           convit_base           | 64  |  1.6178  |         1.6169         |        1.7202         |                1.719                |
|          inception_v3           | 128 |  1.5391  |         1.5278         |        1.5463         |               1.5348                |
|        adv_inception_v3         | 128 |  1.5387  |         1.5304         |        1.5454         |               1.5371                |
|             dla102              | 128 |  1.5377  |         1.534          |        1.5413         |               1.5351                |
|       gluon_inception_v3        | 128 |  1.5375  |         1.5296         |        1.5476         |                1.539                |
|          convnext_base          | 64  |  1.5255  |         1.5075         |        1.5322         |               1.5137                |
|        sebotnet33ts_256         | 64  |  1.525   |         1.5548         |        1.5361         |               1.5669                |
|            nfnet_l0             | 128 |  1.5105  |         1.4579         |        1.5098         |               1.4525                |
|           dm_nfnet_f0           | 128 |  1.5081  |         1.4559         |         1.518         |               1.4655                |
|       eca_botnext26ts_256       | 128 |  1.4564  |         1.4336         |        1.4599         |               1.4387                |
|           mobilevit_s           | 64  |  1.4494  |         1.4644         |        1.4949         |               1.5133                |
|      mobilenetv3_large_100      | 128 |  1.4447  |         1.4345         |        1.4434         |               1.4414                |
|            pit_b_224            | 64  |  1.4446  |         1.4389         |        1.6209         |               1.6134                |
|           mnasnet_100           | 128 |  1.4424  |         1.4995         |        1.4387         |               1.4986                |
|           resnest101e           | 64  |  1.4417  |         1.3672         |        1.4409         |                1.37                 |
|           regnety_002           | 128 |  1.4378  |         1.253          |        1.5412         |               1.2333                |
|          botnet26t_256          | 128 |  1.417   |         1.4347         |        1.4224         |               1.4427                |
|           selecsls42b           | 128 |  1.4152  |         1.4148         |        1.4193         |               1.4181                |
|         mobilenetv2_100         | 128 |  1.3979  |         1.4541         |        1.3959         |               1.4508                |
|          jx_nest_base           | 32  |  1.3891  |         1.3804         |        1.5752         |               1.5631                |
|        res2net50_14w_8s         | 128 |  1.3834  |         1.3591         |        1.4046         |               1.3859                |
|           res2next50            | 128 |  1.3732  |         1.3664         |        1.3731         |               1.3673                |
|        ese_vovnet19b_dw         | 128 |  1.3689  |         1.3881         |        1.3818         |               1.4021                |
|          spnasnet_100           | 128 |  1.3653  |         1.4233         |        1.3628         |               1.4256                |
|          mixer_b16_224          | 128 |  1.3653  |         1.366          |        1.3999         |               1.3996                |
|            hrnet_w18            | 128 |  1.3631  |         1.3628         |        1.3972         |               1.3667                |
|       tf_efficientnet_b0        | 128 |  1.3619  |         1.3927         |        1.3604         |               1.3935                |
|           fbnetc_100            | 128 |  1.3577  |         1.4092         |         1.355         |               1.3943                |
|      beit_base_patch16_224      | 64  |  1.3577  |         1.3574         |         1.468         |               1.4675                |
|          cait_m36_384           |  4  |  1.3565  |         1.3576         |         1.456         |               1.4451                |
|         poolformer_m36          | 64  |  1.3506  |         1.3425         |         1.352         |               1.3436                |
|            fbnetv3_b            | 128 |  1.322   |         1.3385         |        1.3214         |                1.344                |
|           rexnet_100            | 128 |  1.3169  |         1.3524         |        1.3222         |               1.3589                |
|          resmlp_12_224          | 128 |  1.2766  |         1.2699         |        1.4146         |               1.4091                |
| deit_base_distilled_patch16_224 | 64  |  1.2621  |         1.2615         |        1.3254         |               1.3254                |
|          cspdarknet53           | 64  |  1.2459  |         1.2821         |        1.2541         |               1.2908                |
|      vit_base_patch16_224       | 64  |  1.2419  |         1.2409         |        1.3522         |               1.3512                |
|            tinynet_a            | 128 |  1.2355  |         1.2629         |        1.2386         |               1.2597                |
|           tf_mixnet_l           | 128 |  1.1935  |         1.1991         |        1.1977         |               1.2051                |
|            mixnet_l             | 128 |  1.1819  |         1.188          |        1.1866         |               1.1933                |
|         visformer_small         | 128 |  1.1782  |         1.1703         |        1.2101         |               1.2029                |
|        res2net101_26w_4s        | 64  |  1.1561  |         1.0921         |         1.168         |               1.0969                |
|          pnasnet5large          | 16  |  1.1282  |         1.1439         |        1.1205         |               1.1624                |
|             dpn107              | 32  |  1.1035  |         1.1502         |        1.1052         |               1.1487                |
|            repvgg_a2            | 128 |  1.0975  |         1.1313         |        1.1043         |               1.1354                |
|        gluon_xception65         | 32  |  1.0841  |         1.0881         |        1.0961         |               1.0994                |
|     swsl_resnext101_32x16d      | 32  |  1.0634  |         1.0258         |        1.0626         |               1.0227                |
|            gernet_l             | 128 |  1.0495  |         1.0804         |        1.0546         |               1.0876                |
|        convmixer_768_32         | 32  |  1.0032  |         1.004          |        1.0091         |                1.01                 |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
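The suite-level "Geometric mean speedup" summary at the top of this issue condenses per-model columns like the one above into a single number. Below is a minimal sketch of that aggregation. It assumes a plain geometric mean over the models that ran, and the benchmark harness may handle failed or skipped models differently. The function name `geomean` is illustrative, and the three values are only a sample of the `inductor` column, so the result is not the suite figure.

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow/underflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# A small sample of per-model speedups from the `inductor` column (not the full suite).
inductor_speedups = {
    "tnt_s_patch16_224": 3.0329,
    "resnest101e": 1.4417,
    "convmixer_768_32": 1.0032,
}

print(f"{geomean(list(inductor_speedups.values())):.4f}x")
```

A geometric mean is the usual choice for speedup ratios because a 2x speedup and a 0.5x slowdown cancel out to 1.0x, whereas an arithmetic mean would report 1.25x.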

Accuracy

+---------------------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|        adv_inception_v3         | 8  |   pass   |          pass          |         pass          |                pass                 |
|      beit_base_patch16_224      | 8  |   pass   |          pass          |         pass          |                pass                 |
|      mobilenetv3_large_100      | 8  |   pass   |          pass          |         pass          |                pass                 |
|           mobilevit_s           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            nfnet_l0             | 8  |   pass   |          pass          |         pass          |                pass                 |
|            pit_b_224            | 8  |   pass   |          pass          |         pass          |                pass                 |
|          pnasnet5large          | 8  |   pass   |          pass          |         pass          |                pass                 |
|         poolformer_m36          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           regnety_002           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            repvgg_a2            | 8  |   pass   |          pass          |         pass          |                pass                 |
|        res2net101_26w_4s        | 8  |   pass   |          pass          |         pass          |                pass                 |
|        res2net50_14w_8s         | 8  |   pass   |          pass          |         pass          |                pass                 |
|           res2next50            | 8  |   pass   |          pass          |         pass          |                pass                 |
|          resmlp_12_224          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           resnest101e           | 8  |   pass   |          pass          |         pass          |                pass                 |
|           rexnet_100            | 8  |   pass   |          pass          |         pass          |                pass                 |
|        sebotnet33ts_256         | 8  |   pass   |          pass          |         pass          |                pass                 |
|           selecsls42b           | 8  |   pass   |          pass          |         pass          |                pass                 |
|          spnasnet_100           | 8  |   pass   |          pass          |         pass          |                pass                 |
|  swin_base_patch4_window7_224   | 8  |   pass   |          pass          |         pass          |                pass                 |
|     swsl_resnext101_32x16d      | 8  |   pass   |          pass          |         pass          |                pass                 |
|       tf_efficientnet_b0        | 8  |   pass   |          pass          |         pass          |                pass                 |
|           tf_mixnet_l           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            tinynet_a            | 8  |   pass   |          pass          |         pass          |                pass                 |
|        tnt_s_patch16_224        | 8  |   pass   |          pass          |         pass          |                pass                 |
|        twins_pcpvt_base         | 8  |   pass   |          pass          |         pass          |                pass                 |
|         visformer_small         | 8  |   pass   |          pass          |         pass          |                pass                 |
|      vit_base_patch16_224       | 8  |   pass   |          pass          |         pass          |                pass                 |
|           volo_d1_224           | 8  |   pass   |          pass          |         pass          |                pass                 |
|         mobilenetv2_100         | 8  |   pass   |          pass          |         pass          |                pass                 |
|           mnasnet_100           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            mixnet_l             | 8  |   pass   |          pass          |         pass          |                pass                 |
|       eca_botnext26ts_256       | 8  |   pass   |          pass          |         pass          |                pass                 |
|          botnet26t_256          | 8  |   pass   |          pass          |         pass          |                pass                 |
|          cait_m36_384           | 4  |   pass   |          pass          |         pass          |                pass                 |
|         coat_lite_mini          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           convit_base           | 8  |   pass   |          pass          |         pass          |                pass                 |
|        convmixer_768_32         | 8  |   pass   |          pass          |         pass          |                pass                 |
|          convnext_base          | 8  |   pass   |          pass          |         pass          |                pass                 |
|         crossvit_9_240          | 8  |   pass   |          pass          |         pass          |                pass                 |
|          cspdarknet53           | 8  |   pass   |          pass          |         pass          |                pass                 |
| deit_base_distilled_patch16_224 | 8  |   pass   |          pass          |         pass          |                pass                 |
|             dla102              | 8  |   pass   |          pass          |         pass          |                pass                 |
|           dm_nfnet_f0           | 8  |   pass   |          pass          |         pass          |                pass                 |
|             dpn107              | 8  |   pass   |          pass          |         pass          |                pass                 |
|        ese_vovnet19b_dw         | 8  |   pass   |          pass          |         pass          |                pass                 |
|          mixer_b16_224          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           fbnetc_100            | 8  |   pass   |          pass          |         pass          |                pass                 |
|            fbnetv3_b            | 8  |   pass   |          pass          |         pass          |                pass                 |
|            gernet_l             | 8  |   pass   |          pass          |         pass          |                pass                 |
|          ghostnet_100           | 8  |   pass   |          pass          |         pass          |                pass                 |
|       gluon_inception_v3        | 8  |   pass   |          pass          |         pass          |                pass                 |
|        gluon_xception65         | 8  |   pass   |          pass          |         pass          |                pass                 |
|          gmixer_24_224          | 8  |   pass   |          pass          |         pass          |                pass                 |
|          gmlp_s16_224           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            hrnet_w18            | 8  |   pass   |          pass          |         pass          |                pass                 |
|          inception_v3           | 8  |   pass   |          pass          |         pass          |                pass                 |
|          jx_nest_base           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            lcnet_050            | 8  |   pass   |          pass          |         pass          |                pass                 |
|      xcit_large_24_p8_224       | 8  |   pass   |          pass          |         pass          |                pass                 |
+---------------------------------+----+----------+------------------------+-----------------------+-------------------------------------+

Compilation latency (sec)

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|           rexnet_100            | 128 | 227.9512 |        41.7002         |       528.9244        |               45.2923               |
|            hrnet_w18            | 128 | 208.9208 |        150.3511        |       527.2794        |              165.8731               |
|          ghostnet_100           | 128 | 184.1414 |        51.6369         |       566.6493        |               55.3906               |
|          pnasnet5large          | 16  | 160.7814 |        104.1362        |       398.3731        |              112.1264               |
|        adv_inception_v3         | 128 | 153.9169 |        50.6449         |       370.1581        |               56.4033               |
|        res2net101_26w_4s        | 64  | 147.3733 |        85.8358         |       413.4458        |               96.2796               |
|        twins_pcpvt_base         | 64  | 144.0143 |        67.4679         |       1334.9068       |               95.1846               |
|            fbnetv3_b            | 128 | 140.7021 |        56.8816         |       392.2788        |               62.289                |
|           fbnetc_100            | 128 | 125.6072 |        33.4715         |       350.3351        |               37.5104               |
|      xcit_large_24_p8_224       |  5  | 124.9615 |        86.6285         |       864.4854        |              105.9428               |
|            tinynet_a            | 128 | 123.9613 |         40.715         |       302.3102        |               45.6944               |
|           resnest101e           | 64  | 123.647  |        77.6249         |       245.0998        |               83.6248               |
|          cait_m36_384           |  4  | 119.756  |        85.9746         |       803.6332        |              118.6149               |
|            mixnet_l             | 128 | 116.8338 |        47.9094         |       473.2307        |               53.6196               |
|           mobilevit_s           | 64  | 110.6918 |        50.1101         |       1135.7656       |               60.5821               |
|  swin_base_patch4_window7_224   | 64  | 104.5658 |        61.5276         |       719.3702        |               79.4015               |
|        res2net50_14w_8s         | 128 | 101.3692 |        78.6431         |       462.8679        |               87.5904               |
|         poolformer_m36          | 64  | 93.8884  |        60.7278         |       211.4394        |               64.262                |
|         coat_lite_mini          | 128 | 87.0573  |        31.2822         |       1045.1848       |               42.2426               |
|             dpn107              | 32  | 85.8901  |        59.0832         |        360.542        |               63.0853               |
|         crossvit_9_240          | 128 | 83.8572  |        40.5448         |       1021.9742       |               57.0584               |
|          botnet26t_256          | 128 | 83.6736  |         29.015         |        445.744        |               31.4552               |
|             dla102              | 128 | 83.4309  |        49.1961         |       230.1203        |               54.2374               |
|          cspdarknet53           | 64  | 81.6337  |        36.7646         |       208.8842        |               40.8153               |
|        gluon_xception65         | 32  | 80.7204  |        57.3019         |       251.8274        |               59.1264               |
|          jx_nest_base           | 32  | 79.6116  |        52.7253         |       771.5004        |               66.4315               |
|           tf_mixnet_l           | 128 |  69.528  |        49.7329         |        71.0052        |               54.565                |
|           regnety_002           | 128 | 69.0197  |        28.8112         |       288.4092        |               33.9246               |
|           dm_nfnet_f0           | 128 | 66.6549  |        35.6506         |       259.0451        |               39.0645               |
|        tnt_s_patch16_224        | 128 | 65.4538  |        46.4816         |       424.9861        |               68.0578               |
|        sebotnet33ts_256         | 64  | 63.3396  |        35.9376         |       606.9687        |               42.0218               |
|          gmlp_s16_224           | 128 | 59.8873  |        35.7854         |       145.5686        |               47.7077               |
|           volo_d1_224           | 64  | 59.6683  |        38.8359         |       856.2288        |               55.4987               |
|            nfnet_l0             | 128 | 58.3811  |        32.8717         |       200.0886        |               37.2039               |
|            gernet_l             | 128 | 58.2368  |        28.3871         |       190.5211        |               30.981                |
|       tf_efficientnet_b0        | 128 | 54.7643  |        35.5072         |       198.6235        |               39.886                |
|          convnext_base          | 64  | 54.6605  |        37.2604         |       386.5703        |               45.9042               |
|       gluon_inception_v3        | 128 | 53.5437  |        50.8934         |        55.4889        |               55.9809               |
|          inception_v3           | 128 | 53.2153  |        50.8377         |        56.3443        |               57.0138               |
|          gmixer_24_224          | 128 | 51.4464  |         37.915         |       254.9779        |               47.3883               |
|           mnasnet_100           | 128 | 50.0611  |         28.208         |       170.2907        |               30.6131               |
|     swsl_resnext101_32x16d      | 32  | 48.2503  |        45.5488         |       164.7072        |               48.7519               |
|        ese_vovnet19b_dw         | 128 | 47.7346  |        19.3878         |       206.4203        |               21.5838               |
|      mobilenetv3_large_100      | 128 |  47.047  |        31.3849         |       119.2528        |               34.6582               |
|       eca_botnext26ts_256       | 128 | 46.9274  |        28.7514         |       248.7129        |               32.9131               |
|           res2next50            | 128 | 45.7447  |        44.2491         |       114.0371        |               47.763                |
|           convit_base           | 64  | 44.8977  |         28.75          |       303.0676        |               40.7247               |
|         mobilenetv2_100         | 128 | 44.3548  |        28.9598         |        86.289         |               32.0914               |
|         visformer_small         | 128 | 43.9894  |        23.5964         |       336.3081        |               28.9532               |
|            pit_b_224            | 64  | 42.9319  |        25.4213         |        749.448        |               37.6907               |
| deit_base_distilled_patch16_224 | 64  | 37.9943  |         22.833         |       191.8346        |               34.2248               |
|          resmlp_12_224          | 128 | 37.5703  |        16.7311         |       121.0222        |               21.0698               |
|            lcnet_050            | 128 | 36.3735  |         20.661         |       135.5437        |               23.8247               |
|        convmixer_768_32         | 32  | 34.6792  |        27.5799         |        96.8781        |               29.449                |
|          spnasnet_100           | 128 | 33.7377  |        33.3421         |        59.2232        |               36.1375               |
|      beit_base_patch16_224      | 64  | 33.2787  |        27.1632         |       305.5229        |               33.5373               |
|      vit_base_patch16_224       | 64  | 32.5272  |        22.1508         |        36.6643        |               31.6409               |
|            repvgg_a2            | 128 | 32.0223  |        28.4511         |       154.9555        |               32.5881               |
|          mixer_b16_224          | 128 | 29.1258  |         20.426         |       192.0016        |               24.9519               |
|           selecsls42b           | 128 | 28.7757  |        24.8736         |       152.2227        |               27.3081               |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|          gmlp_s16_224           | 128 |  1.1848  |         1.2263         |        1.1831         |               1.2263                |
|          pnasnet5large          | 16  |  1.1712  |         1.3174         |        1.1522         |               1.3167                |
|          gmixer_24_224          | 128 |  1.1117  |         1.1802         |        1.1144         |               1.1802                |
|           convit_base           | 64  |  1.0948  |         1.1825         |         1.098         |               1.1825                |
|         mobilenetv2_100         | 128 |  1.0431  |         1.155          |        1.0267         |                1.155                |
|           dm_nfnet_f0           | 128 |  1.013   |         1.0845         |        1.0129         |               1.0845                |
|          resmlp_12_224          | 128 |  1.0079  |         1.0838         |        1.0093         |               1.0838                |
|            tinynet_a            | 128 |  0.9984  |         1.0981         |        0.9986         |               1.0981                |
|           rexnet_100            | 128 |  0.9977  |         1.0733         |        0.9744         |               1.0734                |
|           resnest101e           | 64  |  0.9972  |         1.0989         |        0.9933         |               1.0989                |
|       tf_efficientnet_b0        | 128 |  0.9871  |         1.0917         |        0.9873         |               1.0915                |
|        tnt_s_patch16_224        | 128 |  0.9834  |         1.0597         |         0.986         |               1.0597                |
|        convmixer_768_32         | 32  |  0.9762  |         0.9981         |        0.9657         |               0.9981                |
|        twins_pcpvt_base         | 64  |  0.9729  |         1.085          |        0.9763         |                1.085                |
|           mobilevit_s           | 64  |  0.9557  |         1.0163         |        0.9262         |               1.0164                |
|             dla102              | 128 |  0.9536  |         1.0351         |        0.9528         |               1.0349                |
|          mixer_b16_224          | 128 |  0.9501  |         1.0049         |        0.9466         |               1.0049                |
|      vit_base_patch16_224       | 64  |  0.9362  |         0.9818         |        0.9362         |               0.9818                |
| deit_base_distilled_patch16_224 | 64  |  0.9353  |         0.9815         |        0.9072         |               0.9815                |
|         visformer_small         | 128 |  0.9348  |         1.029          |        0.9245         |                1.029                |
|           tf_mixnet_l           | 128 |  0.9346  |         1.0819         |        0.9344         |               1.0817                |
|      beit_base_patch16_224      | 64  |  0.9285  |         1.0106         |        0.9284         |               1.0106                |
|            fbnetv3_b            | 128 |  0.9228  |         0.9876         |         0.917         |               0.9939                |
|            nfnet_l0             | 128 |  0.9215  |         0.9953         |        0.9101         |               0.9953                |
|           volo_d1_224           | 64  |  0.9131  |         1.0027         |        0.9089         |               1.0028                |
|          cspdarknet53           | 64  |  0.9097  |         1.0473         |        0.9098         |               1.0473                |
|        ese_vovnet19b_dw         | 128 |  0.9047  |         0.9907         |        0.8976         |               0.9907                |
|          ghostnet_100           | 128 |  0.8976  |         1.0223         |        0.8408         |               1.0213                |
|            hrnet_w18            | 128 |  0.8918  |         1.0029         |        0.8898         |               1.0063                |
|        sebotnet33ts_256         | 64  |  0.891   |         1.1308         |        0.9207         |               1.1308                |
|        adv_inception_v3         | 128 |  0.8904  |         1.0264         |        0.8902         |               1.0265                |
|          inception_v3           | 128 |  0.8904  |         1.0264         |        0.8902         |               1.0265                |
|       gluon_inception_v3        | 128 |  0.8904  |         1.0264         |        0.8902         |               1.0265                |
|      mobilenetv3_large_100      | 128 |  0.8881  |         0.9808         |         0.865         |               0.9808                |
|             dpn107              | 32  |  0.8833  |         0.995          |        0.8676         |                0.995                |
|        gluon_xception65         | 32  |  0.8832  |         0.9952         |        0.8833         |               0.9952                |
|          spnasnet_100           | 128 |  0.8786  |         0.9858         |        0.8788         |               0.9858                |
|           selecsls42b           | 128 |  0.8785  |         0.9929         |        0.8473         |               0.9931                |
|         poolformer_m36          | 64  |  0.8768  |         1.1865         |        0.8592         |               1.1865                |
|       eca_botnext26ts_256       | 128 |  0.8738  |         1.0136         |        0.8738         |               1.0136                |
|        res2net50_14w_8s         | 128 |  0.8712  |         0.9743         |        0.8501         |               0.9745                |
|        res2net101_26w_4s        | 64  |  0.871   |         0.9759         |        0.8506         |               0.9759                |
|            mixnet_l             | 128 |  0.8687  |         1.0035         |        0.8686         |               1.0031                |
|           mnasnet_100           | 128 |  0.8683  |         0.9844         |        0.8684         |               0.9844                |
|           res2next50            | 128 |  0.866   |         0.9673         |        0.8659         |               0.9673                |
|          cait_m36_384           |  4  |  0.8632  |         1.0068         |        0.8633         |               1.0073                |
|           fbnetc_100            | 128 |  0.8596  |         0.991          |        0.8597         |                0.991                |
|            pit_b_224            | 64  |  0.8578  |         1.0345         |        0.8566         |               1.0345                |
|          convnext_base          | 64  |  0.8505  |         1.033          |        0.8317         |                1.033                |
|            gernet_l             | 128 |  0.8499  |         0.9793         |        0.8496         |               0.9793                |
|     swsl_resnext101_32x16d      | 32  |  0.8461  |         0.9986         |        0.8461         |               0.9986                |
|         coat_lite_mini          | 128 |  0.8402  |         1.033          |        0.8501         |                1.033                |
|            lcnet_050            | 128 |  0.8273  |         0.9465         |        0.8174         |               0.9465                |
|          botnet26t_256          | 128 |  0.8239  |         0.9848         |        0.8241         |               0.9848                |
|      xcit_large_24_p8_224       |  5  |  0.8225  |         1.0063         |         0.826         |               1.0104                |
|           regnety_002           | 128 |  0.8164  |         0.9526         |        0.7697         |               0.9526                |
|            repvgg_a2            | 128 |  0.7738  |         0.9882         |        0.7738         |               0.9882                |
|         crossvit_9_240          | 128 |  0.7526  |         0.9882         |        0.7524         |               0.9882                |
|  swin_base_patch4_window7_224   | 64  |  0.7214  |         0.9272         |        0.7297         |               0.9272                |
|          jx_nest_base           | 32  |  0.6693  |         0.9883         |        0.6705         |               0.9883                |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|        convmixer_768_32         | 32  | 298.9236 |        298.6576        |       297.1984        |              296.9094               |
|            hrnet_w18            | 128 | 204.8608 |        205.117         |       199.9165        |              204.6855               |
|          pnasnet5large          | 16  | 173.7291 |        171.2209        |       174.8788        |              168.5491               |
|           tf_mixnet_l           | 128 | 158.652  |        157.671         |       158.1293        |              156.9048               |
|            mixnet_l             | 128 | 152.9535 |        152.1897        |       152.5661        |              151.6937               |
|          cait_m36_384           |  4  | 122.7744 |        122.9833        |       114.7995        |              117.1221               |
|           resnest101e           | 64  | 113.7942 |        119.6901        |       113.8629        |              119.4312               |
|             dla102              | 128 | 111.7066 |        111.8767        |        111.516        |              111.9509               |
|     swsl_resnext101_32x16d      | 32  | 111.5857 |        115.3483        |       111.3781        |               115.949               |
|         poolformer_m36          | 64  | 107.0517 |        107.7065        |       107.0818        |              107.8243               |
|        tnt_s_patch16_224        | 128 | 106.6173 |        108.1408        |        96.4473        |               97.7809               |
|       gluon_inception_v3        | 128 | 104.292  |        104.7367        |       103.5322        |              104.0155               |
|        adv_inception_v3         | 128 | 104.0804 |        104.5105        |       103.5948        |               104.08                |
|          inception_v3           | 128 | 103.9287 |        104.8307        |       103.5586        |              104.3301               |
|        res2net50_14w_8s         | 128 | 101.6507 |        103.5652        |       100.1172        |              101.3419               |
|           convit_base           | 64  | 100.5689 |        100.6643        |        94.7881        |               94.7251               |
|             dpn107              | 32  | 96.1954  |        92.0629         |        95.9879        |               92.4044               |
|           res2next50            | 128 | 91.5825  |        92.1187         |        91.7129        |               91.9829               |
|        gluon_xception65         | 32  | 91.1507  |        90.9928         |        90.225         |               90.0748               |
|  swin_base_patch4_window7_224   | 64  | 88.9285  |        89.5009         |        83.7126        |               83.9664               |
|          mixer_b16_224          | 128 | 85.2227  |        85.0455         |        83.3849        |               82.9833               |
|        res2net101_26w_4s        | 64  | 84.9355  |        91.2943         |        84.5028        |               93.0162               |
|           dm_nfnet_f0           | 128 | 84.0395  |        86.8564         |        83.2917        |               86.2557               |
|            fbnetv3_b            | 128 |  82.681  |         81.766         |        82.8601        |               81.3775               |
|            pit_b_224            | 64  | 81.6451  |         82.029         |        72.7684        |               73.0988               |
|          convnext_base          | 64  | 80.0433  |         81.074         |        79.7035        |               80.7168               |
|         visformer_small         | 128 | 77.1963  |        77.6896         |        75.1034        |               75.4646               |
|      beit_base_patch16_224      | 64  | 74.5128  |        74.5353         |        68.8509        |               68.8849               |
|            nfnet_l0             | 128 | 74.0349  |        77.0221         |        73.9104        |               76.5953               |
|          gmlp_s16_224           | 128 | 73.2685  |        73.8547         |        72.4792        |               73.0208               |
|       eca_botnext26ts_256       | 128 | 72.7699  |        73.7935         |        72.5323        |               73.5424               |
|          jx_nest_base           | 32  | 72.1658  |        72.8174         |        63.5922        |               64.1123               |
|          cspdarknet53           | 64  | 71.1087  |        69.0098         |        70.6885        |               68.5118               |
|           volo_d1_224           | 64  | 70.5353  |        71.4076         |        68.2591        |               69.1839               |
|          botnet26t_256          | 128 |  69.98   |        69.0591         |        69.771         |               68.6946               |
|      vit_base_patch16_224       | 64  | 69.7539  |        69.6572         |        64.0009        |               63.9545               |
|            gernet_l             | 128 | 69.2196  |        67.2462         |        68.9426        |               66.829                |
| deit_base_distilled_patch16_224 | 64  | 66.9793  |        66.9572         |        63.7497        |               63.7385               |
|            repvgg_a2            | 128 | 66.1846  |        64.2209         |        65.8451        |               63.9399               |
|          gmixer_24_224          | 128 | 66.0192  |        66.5525         |        61.3279        |               61.9823               |
|      xcit_large_24_p8_224       |  5  | 60.7263  |        78.0806         |        58.0621        |               77.4042               |
|       tf_efficientnet_b0        | 128 | 59.7304  |        58.4102         |        59.7971        |               58.3392               |
|        twins_pcpvt_base         | 64  | 59.0531  |        68.5729         |        54.7026        |               71.1183               |
|           fbnetc_100            | 128 | 57.8504  |        55.8367         |        58.0617        |               56.4342               |
|           rexnet_100            | 128 | 57.8403  |        56.2595         |        57.5706        |               55.9677               |
|         coat_lite_mini          | 128 | 57.6942  |        58.2846         |        54.0633        |               54.5718               |
|            tinynet_a            | 128 | 56.3532  |        55.1254         |        56.2176        |               55.1946               |
|           mobilevit_s           | 64  | 56.0432  |        55.5573         |        54.433         |               53.7596               |
|        sebotnet33ts_256         | 64  |  50.45   |        49.4999         |        50.1737        |                49.16                |
|         crossvit_9_240          | 128 | 48.8902  |        49.6442         |        43.9164        |               44.6908               |
|          spnasnet_100           | 128 | 48.5291  |         46.618         |        48.6425        |               46.4811               |
|          ghostnet_100           | 128 | 47.9938  |        57.2881         |        47.8285        |               54.8628               |
|        ese_vovnet19b_dw         | 128 | 45.2378  |        44.5388         |        44.8207        |               44.129                |
|         mobilenetv2_100         | 128 | 44.4712  |        42.7257         |        44.5527        |               42.8295               |
|           selecsls42b           | 128 | 42.3522  |        42.3434         |        42.2274        |               42.304                |
|           mnasnet_100           | 128 | 42.1794  |        40.6623         |        42.3015        |               40.6776               |
|          resmlp_12_224          | 128 | 41.6781  |        41.8082         |        37.5586        |               37.7228               |
|      mobilenetv3_large_100      | 128 | 40.2061  |        40.5582         |        40.2527        |                40.32                |
|           regnety_002           | 128 | 25.6615  |        29.6504         |        25.6012        |               31.4771               |
|            lcnet_050            | 128 |  17.516  |        20.2249         |        17.4373        |               20.0605               |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
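Because all four columns report absolute latency for the same model and batch size, they can be compared directly. Below is a minimal Python sketch that computes how much CUDA graphs and max-autotune change latency, using three rows copied from the table. The ratio convention (baseline / candidate, so values above 1.0 mean the candidate is faster) and the helper names are our own choices for illustration.

```python
# Rows copied from the absolute latency table (ms). Column order:
# inductor, inductor_no_cudagraphs, inductor_max_autotune, inductor_max_autotune_no_cudagraphs
LATENCY_MS = {
    "lcnet_050": (17.516, 20.2249, 17.4373, 20.0605),
    "pit_b_224": (81.6451, 82.029, 72.7684, 73.0988),
    "dpn107": (96.1954, 92.0629, 95.9879, 92.4044),
}


def relative_speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio baseline / candidate: values above 1.0 mean the candidate is faster."""
    return baseline_ms / candidate_ms


def summarize(rows: dict) -> dict:
    """Per model: gain from CUDA graphs, and gain from max-autotune (CUDA graphs on)."""
    out = {}
    for name, (ind, ind_no_cg, autotune, _autotune_no_cg) in rows.items():
        out[name] = {
            # Turning CUDA graphs on: no-cudagraphs latency is the baseline.
            "cudagraphs": relative_speedup(ind_no_cg, ind),
            # Turning max-autotune on: plain inductor latency is the baseline.
            "max_autotune": relative_speedup(ind, autotune),
        }
    return out


if __name__ == "__main__":
    for name, gains in summarize(LATENCY_MS).items():
        print(f"{name}: cudagraphs {gains['cudagraphs']:.3f}x, "
              f"max-autotune {gains['max_autotune']:.3f}x")
```

On these rows, lcnet_050 runs roughly 15% faster with CUDA graphs and pit_b_224 gains about 12% from max-autotune. dpn107 is one of the models where disabling CUDA graphs is actually faster.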

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_amp_441/timm_models_amp.png :

Build Summary


Run name

day_100_10_04_23_performance_amp_441

Commit hashes

pytorch commit: f55e72c0f6bd6da016aaa51de379e6ba6d7891cc
pytorch commit date: 2023-04-07 17:30:27+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf55e72c

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42

Performance Dashboard for AMP precision (2.0 release binary oneoff)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward passes; for inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We also report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency and memory footprint reduction, we exclude models that fail the accuracy checks.

Pass rate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 88%, 53/60 | 100%, 45/45 | 100%, 60/60 |
|       aot_eager        | 87%, 52/60 | 100%, 45/45 | 97%, 58/60  |
|        inductor        | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
| inductor_no_cudagraphs | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
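Each pass-rate cell combines a whole-number percentage with the raw count. The sketch below formats a cell the same way. The rounding rule is inferred from the table itself: 52/60 appears as 87%, so percentages are rounded rather than truncated. How the dashboard actually rounds is an assumption.

```python
import math


def format_passrate(passed: int, total: int) -> str:
    """Render a pass-rate cell such as '88%, 53/60'.

    The percentage is rounded half up to a whole number. Python's built-in
    round() uses banker's rounding, so floor(x + 0.5) is used instead.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    pct = math.floor(100 * passed / total + 0.5)
    return f"{pct}%, {passed}/{total}"


# Counts taken from the inductor row of the table above.
for suite, passed, total in [("torchbench", 51, 60), ("huggingface", 41, 45), ("timm_models", 60, 60)]:
    print(f"{suite}: {format_passrate(passed, total)}")
```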

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.59x    |    1.58x    |    1.41x    |
| inductor_no_cudagraphs |   1.27x    |    1.50x    |    1.39x    |
+------------------------+------------+-------------+-------------+
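Each speedup is normalized against native PyTorch and then aggregated per suite. The Python sketch below computes a geometric mean over per-model speedups and drops models that fail the accuracy check, as the summary describes. Skipping 0.0 entries (which mark failed performance runs) is our own assumption, since log(0) is undefined. The input layout and the model names are hypothetical.

```python
import math


def geomean_speedup(results: dict) -> float:
    """Geometric mean of per-model speedups over eager PyTorch.

    results maps model name -> (accuracy_status, speedup). Models whose status
    is not "pass" are excluded, as are non-positive speedups (assumed to mark
    a failed performance run).
    """
    kept = [s for status, s in results.values() if status == "pass" and s > 0]
    if not kept:
        return float("nan")
    # exp(mean(log s)) is the geometric mean, computed in log space for stability.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))


# Hypothetical per-model results, for illustration only.
example = {
    "model_a": ("pass", 2.0),
    "model_b": ("pass", 0.5),
    "model_c": ("fail_accuracy", 3.0),  # dropped: failed the accuracy check
    "model_d": ("pass", 0.0),           # dropped: failed performance run
}
print(f"{geomean_speedup(example):.2f}x")
```

The geometric mean suits ratios. A 2.0x speedup and a 0.5x slowdown cancel out to 1.00x, whereas an arithmetic mean would report 1.25x.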

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.85    |    7.26     |    5.99     |
|       aot_eager        |    9.37    |    15.82    |    13.21    |
|        inductor        |   63.80    |    62.92    |   111.25    |
| inductor_no_cudagraphs |   64.01    |    72.27    |   110.32    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.97x    |    1.00x    |    0.99x    |
|       aot_eager        |   0.86x    |    0.90x    |    0.88x    |
|        inductor        |   0.79x    |    0.91x    |    0.91x    |
| inductor_no_cudagraphs |   0.94x    |    1.05x    |    1.01x    |
+------------------------+------------+-------------+-------------+
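The table reports a ratio rather than raw memory. The sketch below assumes the ratio is eager peak memory divided by compiled peak memory, which fits the "higher is better" note. Both that definition and the way peaks are collected are assumptions; in PyTorch, peaks could plausibly be read with `torch.cuda.max_memory_allocated()` after `torch.cuda.reset_peak_memory_stats()`. The byte counts are hypothetical.

```python
def compression_ratio(eager_peak_bytes: int, compiled_peak_bytes: int) -> float:
    """Eager peak memory divided by compiled peak memory.

    Values above 1.0 mean the compiled run needed less memory than eager
    (higher is better); values below 1.0 mean it needed more.
    """
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


GiB = 1024 ** 3
# Hypothetical peaks: the compiled run uses 12.5 GiB where eager uses 10 GiB.
ratio = compression_ratio(10 * GiB, int(12.5 * GiB))
print(f"{ratio:.2f}x")
```

Under this definition, a result like 0.80x means the compiled run needed 25% more peak memory than eager.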

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9
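The four criteria translate directly into threshold checks. The Python sketch below reports every warning category a single result falls into. The `ModelResult` record and its field names are hypothetical. Treating every accuracy status other than "pass" (for example `fail_to_run`, `fail_accuracy`, `eager_variation`) as a warning is an assumption based on the statuses shown in the accuracy table.

```python
from dataclasses import dataclass

# Thresholds from the warning criteria listed above.
SPEEDUP_MIN = 0.95
COMPILE_SECONDS_MAX = 120.0
COMPRESSION_MIN = 0.9


@dataclass
class ModelResult:
    accuracy: str           # e.g. "pass", "fail_accuracy", "fail_to_run", "eager_variation"
    speedup: float          # vs. eager PyTorch; 0.0 typically means the perf run failed
    compile_seconds: float  # may be NaN when no measurement exists
    compression: float      # peak memory compression ratio


def warnings_for(r: ModelResult) -> list:
    """Return the names of every warning category a model falls into."""
    flags = []
    if r.accuracy != "pass":
        flags.append("accuracy")
    if r.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    # NaN compares False against any number, so a missing compile time raises no warning.
    if r.compile_seconds > COMPILE_SECONDS_MAX:
        flags.append("compilation_latency")
    if r.compression < COMPRESSION_MIN:
        flags.append("compression_ratio")
    return flags


# Hypothetical result: slow compilation and higher peak memory, but fast and accurate.
print(warnings_for(ModelResult("pass", 1.1, 150.0, 0.84)))
```

All comparisons are strict, so a model sitting exactly at a threshold (for example 0.95x speedup) is not flagged.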

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |        vision_maskrcnn        | eager_variation |    eager_variation     |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |             dcgan             |  1.4097  |         0.8227         |
| torchbench  |         lennard_jones         |  1.3901  |         0.8762         |
| torchbench  |       soft_actor_critic       |  1.0289  |         0.7237         |
| torchbench  |          tts_angular          |  0.9646  |         0.949          |
| torchbench  |          timm_vovnet          |  0.9395  |         0.9242         |
| torchbench  |    nvidia_deeprecommender     |  0.8715  |         1.0183         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.0813         |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  0.9657  |         0.8392         |
| huggingface |     DebertaV2ForMaskedLM      |  0.8861  |         0.6807         |
| huggingface | DebertaV2ForQuestionAnswering |  0.8253  |         0.6939         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.2351         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 175.9942 |        175.5837        |
| torchbench  |        phlippe_densenet        | 165.4425 |        165.5889        |
| torchbench  |           hf_BigBird           | 151.3503 |        127.7386        |
| torchbench  |       timm_efficientnet        | 144.2798 |        145.5432        |
| torchbench  |       mobilenet_v3_large       | 141.1027 |        139.9887        |
| torchbench  |          densenet121           | 139.3279 |        137.4203        |
| torchbench  |          mobilenet_v2          | 133.2833 |        132.3333        |
| torchbench  |             yolov3             | 121.0227 |        118.0911        |
| torchbench  | timm_vision_transformer_large  |   nan    |        125.7471        |
| huggingface |     MobileBertForMaskedLM      | 150.6225 |        148.9238        |
| huggingface | MobileBertForQuestionAnswering | 140.7136 |        653.5987        |
| huggingface |      DebertaV2ForMaskedLM      | 138.8626 |        75.8935         |
| huggingface | DebertaV2ForQuestionAnswering  | 137.5645 |        72.7915         |
| huggingface | M2M100ForConditionalGeneration | 135.8697 |        137.9521        |
| huggingface |  MT5ForConditionalGeneration   | 134.411  |        133.8593        |
| huggingface |        XGLMForCausalLM         | 133.5907 |        132.7691        |
| timm_models |           rexnet_100           | 275.1515 |        276.6628        |
| timm_models |           hrnet_w18            | 255.7748 |        249.9176        |
| timm_models |          ghostnet_100          | 244.5686 |        243.1281        |
| timm_models |           fbnetv3_b            | 178.1302 |        174.7349        |
| timm_models |         pnasnet5large          | 167.4111 |        161.8981        |
| timm_models |          resnest101e           | 166.679  |        168.3601        |
| timm_models |          mobilevit_s           | 164.6195 |        161.0406        |
| timm_models |       gluon_inception_v3       | 162.3067 |        161.1515        |
| timm_models |        adv_inception_v3        | 162.0021 |        163.0911        |
| timm_models |           tinynet_a            | 160.6803 |        156.4597        |
| timm_models |     mobilenetv3_large_100      | 160.5822 |        153.4424        |
| timm_models |            mixnet_l            | 159.9061 |        158.8318        |
| timm_models |          inception_v3          | 156.6721 |        159.4795        |
| timm_models |          tf_mixnet_l           | 156.3445 |        156.0664        |
| timm_models |       res2net101_26w_4s        | 153.8691 |        153.8122        |
| timm_models |        twins_pcpvt_base        | 149.9154 |        147.8376        |
| timm_models |       tf_efficientnet_b0       | 149.8097 |        154.6713        |
| timm_models |           fbnetc_100           | 136.5809 |        133.1516        |
| timm_models |          spnasnet_100          |  136.5   |        137.2727        |
| timm_models |      xcit_large_24_p8_224      | 135.2229 |        132.5646        |
| timm_models |        mobilenetv2_100         | 130.7307 |        133.0086        |
| timm_models |          mnasnet_100           | 126.2514 |        126.592         |
| timm_models |        res2net50_14w_8s        | 123.4541 |        126.3791        |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |              hf_GPT2_large              |  0.8904  |         1.128          |
| torchbench  |                 yolov3                  |  0.8742  |         1.0155         |
| torchbench  |            timm_efficientnet            |  0.8696  |         0.9417         |
| torchbench  |           speech_transformer            |  0.8651  |         0.8682         |
| torchbench  |              timm_resnest               |  0.8604  |         0.9668         |
| torchbench  |           shufflenet_v2_x1_0            |  0.8598  |         0.9587         |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.8835         |
| torchbench  |               timm_regnet               |  0.8507  |         0.9508         |
| torchbench  |                resnet152                |  0.8501  |         0.9397         |
| torchbench  |           Background_Matting            |  0.8485  |         1.0406         |
| torchbench  |              hf_DistilBert              |  0.8476  |         0.9945         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.168          |
| torchbench  |              pytorch_unet               |  0.8134  |         0.9308         |
| torchbench  |            phlippe_densenet             |  0.8058  |         0.8659         |
| torchbench  |                 hf_Bart                 |  0.7933  |         0.9173         |
| torchbench  |                resnet50                 |  0.7821  |         0.8839         |
| torchbench  |                  dcgan                  |  0.7821  |         0.9645         |
| torchbench  |                 demucs                  |  0.773   |         0.9656         |
| torchbench  |              squeezenet1_1              |  0.7722  |         0.908          |
| torchbench  |             pytorch_stargan             |  0.7715  |         0.8893         |
| torchbench  |               timm_vovnet               |  0.7529  |         0.8869         |
| torchbench  |               mnasnet1_0                |  0.7438  |         0.778          |
| torchbench  |             pytorch_struct              |  0.7277  |         0.7362         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9808         |
| torchbench  |               densenet121               |  0.7096  |         0.7998         |
| torchbench  |                 alexnet                 |  0.7091  |         0.939          |
| torchbench  |           mobilenet_v3_large            |  0.6984  |         0.8724         |
| torchbench  |               hf_BigBird                |  0.6961  |         1.1191         |
| torchbench  |             resnext50_32x4d             |  0.6682  |         0.772          |
| torchbench  |         nvidia_deeprecommender          |  0.6585  |         0.8931         |
| torchbench  |                   drq                   |  0.6379  |         0.9573         |
| torchbench  |            soft_actor_critic            |  0.6066  |         0.9973         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.7463         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.5904  |         0.6004         |
| torchbench  |                resnet18                 |  0.5395  |         0.6097         |
| torchbench  |              lennard_jones              |  0.5317  |         0.9997         |
| torchbench  |               hf_Reformer               |  0.4538  |         0.8022         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         0.4424         |
| torchbench  |             phlippe_resnet              |  0.3169  |         0.3395         |
| huggingface |           PegasusForCausalLM            |  0.893   |         0.9864         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8836  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |     PegasusForConditionalGeneration     |  0.8689  |         1.0689         |
| huggingface |      MBartForConditionalGeneration      |  0.8672  |         1.0307         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0962         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8184  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.789   |         0.8779         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7651  |         0.9908         |
| huggingface |          MobileBertForMaskedLM          |  0.7473  |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.7117  |         0.9792         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8392         |
| huggingface |           DebertaForMaskedLM            |  0.5501  |         0.9978         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5197  |         0.9665         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.487   |         0.9802         |
| huggingface |       DebertaForQuestionAnswering       |  0.4601  |         1.1526         |
| timm_models |                hrnet_w18                |  0.8918  |          0.99          |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1115         |
| timm_models |              inception_v3               |  0.8904  |         1.0171         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0171         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0171         |
| timm_models |                 dpn107                  |  0.8833  |         0.9642         |
| timm_models |            gluon_xception65             |  0.8831  |         0.9705         |
| timm_models |              ghostnet_100               |  0.8807  |         0.977          |
| timm_models |              spnasnet_100               |  0.8786  |         0.9451         |
| timm_models |          mobilenetv3_large_100          |  0.877   |         0.9361         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1871         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0072         |
| timm_models |          xcit_large_24_p8_224           |  0.8721  |         0.9732         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9607         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9483         |
| timm_models |                mixnet_l                 |  0.8687  |         0.9902         |
| timm_models |               mnasnet_100               |  0.8683  |         0.9403         |
| timm_models |               res2next50                |  0.866   |         0.9547         |
| timm_models |              cait_m36_384               |  0.8632  |         0.989          |
| timm_models |               fbnetc_100                |  0.8596  |         0.9535         |
| timm_models |                pit_b_224                |  0.8578  |         1.0242         |
| timm_models |               selecsls42b               |  0.8576  |         0.9664         |
| timm_models |              convnext_base              |  0.8505  |         1.0338         |
| timm_models |                gernet_l                 |  0.8499  |         0.9706         |
| timm_models |         swsl_resnext101_32x16d          |  0.8461  |         0.9786         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0202         |
| timm_models |              botnet26t_256              |  0.8239  |         0.9779         |
| timm_models |                lcnet_050                |  0.805   |         0.884          |
| timm_models |                repvgg_a2                |  0.7738  |         0.9611         |
| timm_models |               regnety_002               |  0.7602  |         0.8966         |
| timm_models |             crossvit_9_240              |  0.7526  |         0.9898         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9045         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9604         |
+-------------+-----------------------------------------+----------+------------------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|           BERT_pytorch            |  16  | 0.9922 |  0.8069   |  3.5997  |         2.1159         |
|       functorch_dp_cifar10        |  64  | 0.9647 |  0.9154   |  3.5789  |         1.3412         |
|            densenet121            |  4   | 0.9882 |  0.7174   |  2.7605  |         1.0121         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9728 |  0.8939   |  2.6642  |         1.8162         |
|            hf_BigBird             |  2   | 0.9603 |  0.7797   |  2.6313  |         1.6709         |
|             hf_Albert             |  8   | 0.9918 |   0.956   |  2.3383  |         2.2906         |
|            hf_T5_large            |  2   | 0.9737 |  0.8029   |  2.2331  |         1.8342         |
|         phlippe_densenet          | 128  | 0.9825 |  0.7652   |  2.0708  |         0.9888         |
|        mobilenet_v3_large         |  32  | 0.9988 |   0.772   |  2.0664  |         1.1584         |
|           squeezenet1_1           |  32  | 0.9841 |  0.9338   |  2.063   |         1.2499         |
|               dlrm                | 1024 | 0.9526 |   0.827   |  1.9633  |         1.2072         |
|              hf_GPT2              |  4   | 0.9997 |  0.9597   |  1.9349  |         1.8027         |
|               hf_T5               |  8   | 0.9842 |  0.8532   |  1.9135  |         1.9983         |
|              hf_Bert              |  4   | 0.9947 |  0.8383   |  1.8512  |         1.5752         |
|          phlippe_resnet           | 128  | 0.9851 |  0.7556   |  1.8385  |         0.9817         |
|          resnext50_32x4d          |  8   | 0.9819 |  0.7161   |  1.7426  |         0.9686         |
|            mnasnet1_0             |  32  | 0.9936 |  0.7307   |  1.7159  |         1.0948         |
|      timm_vision_transformer      |  32  | 0.9809 |  0.8515   |  1.711   |         1.3864         |
|              hf_Bart              |  4   | 0.9789 |  0.8422   |  1.678   |         1.5056         |
|           hf_GPT2_large           |  4   | 0.9828 |  0.9713   |  1.6777  |         1.7374         |
|        shufflenet_v2_x1_0         | 128  | 0.9933 |  0.7467   |  1.673   |         1.2154         |
|        speech_transformer         |  32  | 0.9757 |  0.7876   |  1.6058  |         1.5851         |
|           hf_Bert_large           |  4   | 1.0027 |  0.8536   |  1.6025  |         1.5538         |
|             resnet18              |  16  | 0.9871 |   0.76    |  1.5882  |         0.9764         |
|           timm_resnest            |  32  | 0.9937 |  0.8577   |  1.5568  |         1.4949         |
|           fastNLP_Bert            |  6   | 0.9862 |   0.798   |  1.5446  |         1.5062         |
|          pytorch_struct           | 200  | 0.9148 |  0.7762   |  1.5382  |         1.1431         |
|            timm_nfnet             | 128  | 0.986  |  0.9854   |  1.5349  |         1.468          |
|           mobilenet_v2            |  96  | 0.9967 |   0.777   |  1.5261  |         1.5179         |
|                drq                |  1   | 0.9633 |  0.7385   |  1.5083  |         1.0341         |
| attention_is_all_you_need_pytorch | 256  | 0.9864 |  0.8339   |  1.5013  |         1.4689         |
|           hf_DistilBert           |  8   | 0.9963 |  0.9573   |  1.4862  |         1.4746         |
|         timm_efficientnet         |  32  | 0.9365 |  0.6228   |  1.4462  |         1.0716         |
|               dcgan               |  32  | 0.8588 |  0.6885   |  1.4097  |         0.8227         |
|           lennard_jones           | 1000 | 0.8643 |  0.7665   |  1.3901  |         0.8762         |
|           pytorch_unet            |  1   | 0.9963 |  0.2048   |  1.3577  |         1.3522         |
|          LearningToPaint          |  96  | 0.9851 |  0.7718   |  1.3021  |         1.0694         |
|          pytorch_stargan          |  16  | 0.9907 |  0.8009   |  1.2742  |         1.2469         |
|             resnet152             |  32  | 0.9946 |  0.7479   |  1.2512  |         1.0055         |
|               vgg16               |  64  | 0.9994 |  0.9983   |  1.2406  |         1.2536         |
|            Super_SloMo            |  6   | 0.997  |  0.1792   |  1.2323  |         1.2329         |
|        Background_Matting         |  4   | 0.9985 |  0.1369   |  1.2132  |         1.2076         |
|              yolov3               |  16  | 0.9957 |  0.8061   |  1.1973  |         1.1979         |
|             resnet50              |  32  | 0.994  |  0.7755   |  1.1916  |         1.0536         |
|            hf_Reformer            |  4   | 0.9857 |   0.963   |  1.1225  |         1.0582         |
|              alexnet              | 128  | 0.9989 |  0.9975   |  1.0872  |         1.1367         |
|              demucs               |  4   | 0.9987 |  1.0013   |  1.0425  |         1.0389         |
|         soft_actor_critic         | 256  | 0.8469 |   0.627   |  1.0289  |         0.7237         |
|            timm_regnet            |  32  | 0.9173 |  0.7724   |  0.9877  |         0.9643         |
|            tts_angular            |  64  | 0.9128 |  0.8758   |  0.9646  |         0.949          |
|            timm_vovnet            |  32  | 0.855  |  0.7008   |  0.9395  |         0.9242         |
|      nvidia_deeprecommender       | 256  | 0.9987 |  0.9986   |  0.8715  |         1.0183         |
|   timm_vision_transformer_large   |  32  | 0.9981 |    0.0    |   0.0    |         1.0813         |
|           hf_Longformer           |  2   | 1.0048 |  0.6888   |   0.0    |          0.0           |
|               moco                |  32  | 0.9358 |    0.0    |   0.0    |          0.0           |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            |  2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            |  4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  |  fail_accuracy   |      0.0000      | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|            hf_T5_large            |  2   | 27.0016 |  55.4844  | 175.9942 |        175.5837        |
|         phlippe_densenet          | 128  | 3.2932  |  7.1632   | 165.4425 |        165.5889        |
|            hf_BigBird             |  2   | 13.0269 |  37.7408  | 151.3503 |        127.7386        |
|         timm_efficientnet         |  32  | 4.9739  |  10.2058  | 144.2798 |        145.5432        |
|        mobilenet_v3_large         |  32  | 3.4446  |   8.179   | 141.1027 |        139.9887        |
|            densenet121            |  4   | 7.6852  |  18.1939  | 139.3279 |        137.4203        |
|           mobilenet_v2            |  96  | 3.1576  |  7.0356   | 133.2833 |        132.3333        |
|              yolov3               |  16  | 5.0054  |  10.7865  | 121.0227 |        118.0911        |
|            mnasnet1_0             |  32  | 3.1362  |   6.849   | 111.3499 |        111.0362        |
|           hf_GPT2_large           |  4   | 15.1052 |  30.3518  | 109.1564 |        104.9821        |
|             resnet152             |  32  | 9.1608  |  20.5326  | 108.0717 |        106.4709        |
|           timm_resnest            |  32  | 1.8451  |  3.9997   | 95.9391  |        100.3221        |
|        shufflenet_v2_x1_0         | 128  | 3.4764  |  7.7384   | 83.0937  |        84.7648         |
|        speech_transformer         |  32  |  5.972  |  13.784   | 78.4976  |         77.929         |
| attention_is_all_you_need_pytorch | 256  | 4.4253  |  11.1024  | 76.5471  |        74.6155         |
|            timm_nfnet             | 128  | 6.2417  |  11.1718  | 75.4307  |        73.0296         |
|            timm_regnet            |  32  | 6.8625  |  12.4395  | 72.4393  |        73.1837         |
|        Background_Matting         |  4   | 3.1076  |  11.5416  | 70.8216  |        67.8165         |
|           BERT_pytorch            |  16  | 4.9397  |  11.6625  | 70.4056  |         70.713         |
|             resnet50              |  32  | 3.2582  |  7.0396   | 67.8751  |        65.1941         |
|           hf_Bert_large           |  4   | 10.4292 |  21.3581  | 65.8159  |        63.4937         |
|            timm_vovnet            |  32  | 3.6376  |  6.4162   | 64.4312  |        63.4021         |
|           pytorch_unet            |  1   | 1.5486  |   4.451   | 60.7753  |        58.7652         |
|       functorch_dp_cifar10        |  64  | 1.2096  |  2.4342   | 57.3101  |        56.1893         |
|          resnext50_32x4d          |  8   | 3.2174  |  7.0465   |  54.085  |        54.1601         |
|      timm_vision_transformer      |  32  | 3.3304  |  7.4161   | 53.1279  |        51.8676         |
|               hf_T5               |  8   | 5.9817  |  13.6137  | 52.3317  |         51.655         |
|           fastNLP_Bert            |  6   | 5.2394  |  11.3087  | 51.3959  |         51.806         |
|              hf_Bart              |  4   | 6.3516  |  13.9029  | 49.4887  |        50.6385         |
|            hf_Reformer            |  4   |  4.165  |  6.0667   |  48.277  |         43.817         |
|          pytorch_stargan          |  16  | 1.2151  |  3.2242   | 46.5096  |         47.096         |
|          LearningToPaint          |  96  | 1.4142  |  2.9085   | 46.3854  |        45.0346         |
|             resnet18              |  16  | 1.3514  |  2.9053   | 45.4315  |        44.3227         |
|            Super_SloMo            |  6   | 2.7691  |  10.2726  | 43.8271  |        44.9401         |
|              hf_GPT2              |  4   | 4.9339  |  9.6931   | 42.6598  |         43.531         |
|              hf_Bert              |  4   | 5.1149  |  10.5558  | 39.4039  |         40.363         |
|             hf_Albert             |  8   | 2.6288  |  8.0988   | 39.0277  |        39.6987         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 1.2908  |  2.9679   | 37.8462  |        36.5206         |
|          phlippe_resnet           | 128  | 1.3605  |  2.9018   | 32.9531  |        32.6658         |
|           hf_DistilBert           |  8   | 2.4936  |  5.2869   | 32.0276  |        30.0542         |
|              demucs               |  4   | 1.4353  |  2.1884   | 31.8417  |        29.9797         |
|           squeezenet1_1           |  32  | 1.0522  |  1.7709   | 23.9868  |        25.4235         |
|          pytorch_struct           | 200  | 0.7499  |  1.3475   | 21.7295  |        21.2596         |
|               vgg16               |  64  | 0.6368  |  1.1245   | 17.3685  |         17.052         |
|              alexnet              | 128  | 0.4866  |  0.7789   | 15.4395  |        15.4033         |
|      nvidia_deeprecommender       | 256  | 0.4753  |  0.8006   |  10.797  |        10.7915         |
|                drq                |  1   | 0.6622  |  1.0246   |  9.6767  |         9.3641         |
|               dcgan               |  32  | 0.4382  |  0.7187   |  9.1784  |         8.8061         |
|         soft_actor_critic         | 256  | 0.4318  |  0.6065   |  8.3155  |         7.8794         |
|               dlrm                | 1024 |  0.379  |  0.7845   |  7.8935  |         8.4439         |
|           lennard_jones           | 1000 | 0.3995  |  0.6023   |  7.0385  |         7.041          |
|            tts_angular            |  64  | 0.4509  |  0.5187   |  6.798   |         6.7437         |
|   timm_vision_transformer_large   |  32  | 9.4809  |    nan    |   nan    |        125.7471        |
|           hf_Longformer           |  2   | 9.8627  |  30.5952  |   nan    |          nan           |
|               moco                |  32  | 33.564  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0014 |   0.822   |  1.208   |         1.208          |
|             hf_Albert             |  8   | 0.9599 |  0.9008   |  1.0863  |         1.2557         |
|           fastNLP_Bert            |  6   | 1.0003 |  0.8878   |  1.0496  |         1.1593         |
|               hf_T5               |  8   | 0.9507 |  0.8891   |  1.0163  |         1.1719         |
|           mobilenet_v2            |  96  | 0.9863 |  0.7657   |  1.0107  |         1.1025         |
|            tts_angular            |  64  | 0.9983 |  0.9983   |  0.9895  |         0.9983         |
| attention_is_all_you_need_pytorch | 256  | 0.9648 |  0.9066   |  0.9689  |         1.1266         |
|            timm_nfnet             | 128  | 0.9071 |  0.8753   |  0.9677  |         1.073          |
|               dlrm                | 1024 | 0.9995 |  0.9944   |  0.952   |         1.0009         |
|           BERT_pytorch            |  16  | 1.0003 |  0.8671   |  0.9428  |         1.1717         |
|              hf_Bert              |  4   | 0.9645 |  0.8353   |  0.9422  |         1.0258         |
|           hf_Bert_large           |  4   | 0.9845 |  0.8521   |  0.9402  |         1.0725         |
|              hf_GPT2              |  4   | 0.9357 |  0.8198   |  0.9321  |         1.0713         |
|           hf_GPT2_large           |  4   | 0.9663 |  0.8303   |  0.8904  |         1.128          |
|              yolov3               |  16  | 0.9837 |   0.846   |  0.8742  |         1.0155         |
|         timm_efficientnet         |  32  | 0.9846 |  0.7674   |  0.8696  |         0.9417         |
|        speech_transformer         |  32  | 0.9915 |    0.9    |  0.8651  |         0.8682         |
|           timm_resnest            |  32  | 0.9881 |  0.8984   |  0.8604  |         0.9668         |
|        shufflenet_v2_x1_0         | 128  | 0.9549 |  0.8395   |  0.8598  |         0.9587         |
|      timm_vision_transformer      |  32  | 0.9907 |  0.9299   |  0.8593  |         0.8835         |
|            timm_regnet            |  32  | 0.9908 |  0.8523   |  0.8507  |         0.9508         |
|             resnet152             |  32  | 0.9959 |  0.8912   |  0.8501  |         0.9397         |
|        Background_Matting         |  4   | 1.0127 |  0.6489   |  0.8485  |         1.0406         |
|           hf_DistilBert           |  8   | 0.9262 |  0.8146   |  0.8476  |         0.9945         |
|            hf_T5_large            |  2   | 0.9831 |  0.8302   |  0.8201  |         1.168          |
|           pytorch_unet            |  1   | 0.9953 |  0.7154   |  0.8134  |         0.9308         |
|         phlippe_densenet          | 128  | 0.9983 |  0.9982   |  0.8058  |         0.8659         |
|              hf_Bart              |  4   | 0.9087 |  0.7524   |  0.7933  |         0.9173         |
|             resnet50              |  32  | 0.9894 |  0.8606   |  0.7821  |         0.8839         |
|               dcgan               |  32  | 0.9647 |  0.7957   |  0.7821  |         0.9645         |
|              demucs               |  4   | 0.9661 |  0.9657   |  0.773   |         0.9656         |
|           squeezenet1_1           |  32  | 0.9666 |  0.9321   |  0.7722  |         0.908          |
|          pytorch_stargan          |  16  | 0.9914 |   0.969   |  0.7715  |         0.8893         |
|            timm_vovnet            |  32  | 0.9892 |  0.8166   |  0.7529  |         0.8869         |
|            mnasnet1_0             |  32  | 0.9801 |  0.8971   |  0.7438  |         0.778          |
|          pytorch_struct           | 200  | 0.9992 |  0.5106   |  0.7277  |         0.7362         |
|               vgg16               |  64  | 0.9923 |  0.7245   |  0.7227  |         0.9808         |
|            densenet121            |  4   | 0.994  |  0.9823   |  0.7096  |         0.7998         |
|              alexnet              | 128  | 0.9454 |  0.7939   |  0.7091  |         0.939          |
|        mobilenet_v3_large         |  32  | 0.979  |  0.8383   |  0.6984  |         0.8724         |
|            hf_BigBird             |  2   | 0.9486 |  0.9264   |  0.6961  |         1.1191         |
|          resnext50_32x4d          |  8   | 0.9942 |  0.8441   |  0.6682  |         0.772          |
|      nvidia_deeprecommender       | 256  | 0.9176 |  0.8055   |  0.6585  |         0.8931         |
|                drq                |  1   | 0.9877 |  0.8852   |  0.6379  |         0.9573         |
|         soft_actor_critic         | 256  | 0.9995 |  0.9239   |  0.6066  |         0.9973         |
|          LearningToPaint          |  96  | 0.9192 |  0.7116   |  0.5925  |         0.7463         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.9965 |  0.8796   |  0.5904  |         0.6004         |
|             resnet18              |  16  | 0.9753 |  0.7786   |  0.5395  |         0.6097         |
|           lennard_jones           | 1000 | 0.9996 |  0.9997   |  0.5317  |         0.9997         |
|            hf_Reformer            |  4   | 0.8004 |  0.8004   |  0.4538  |         0.8022         |
|       functorch_dp_cifar10        |  64  | 0.9953 |  0.8396   |  0.3991  |         0.4424         |
|          phlippe_resnet           | 128  | 0.9881 |   0.864   |  0.3169  |         0.3395         |
|   timm_vision_transformer_large   |  32  | 0.9992 |    nan    |   nan    |         0.9724         |
|           hf_Longformer           |  2   | 0.9511 |   0.893   |   nan    |          nan           |
|               moco                |  32  | 0.9994 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           hf_GPT2_large           |  4   | 212.5713 | 215.0116  | 124.5754 |        120.1776        |
|        Background_Matting         |  4   | 125.9211 | 918.9735  | 103.6754 |        104.2231        |
|            hf_T5_large            |  2   | 226.7349 | 274.3269  | 101.6134 |        123.3929        |
|               hf_T5               |  8   | 182.5237 | 212.6779  | 93.9617  |        90.7301         |
|            timm_nfnet             | 128  | 119.6924 |  119.881  | 76.9207  |        80.3142         |
|            hf_BigBird             |  2   | 199.7247 | 282.9694  | 74.0526  |        119.391         |
|            hf_Reformer            |  4   | 82.1887  |  84.0477  | 72.3456  |         76.472         |
|            Super_SloMo            |  6   | 79.7687  | 444.0523  | 64.3392  |        64.3528         |
|              yolov3               |  16  | 68.8495  |  84.9866  |  57.271  |        57.2379         |
|            timm_regnet            |  32  | 61.2759  |  72.6624  |   57.0   |        58.5843         |
|               vgg16               |  64  | 66.2181  |  66.283   | 53.4235  |         52.888         |
|             resnet152             |  32  |  65.944  |  88.1843  | 53.4196  |        69.4877         |
|           hf_Bert_large           |  4   |  83.431  |  96.3325  | 51.9581  |        52.8354         |
|              demucs               |  4   | 53.8259  |  53.7991  | 51.8241  |        51.8682         |
| attention_is_all_you_need_pytorch | 256  |  58.61   |  68.8829  | 36.1536  |        36.2749         |
|        speech_transformer         |  32  | 68.2645  |  84.1561  | 36.1351  |         40.846         |
|              hf_Bart              |  4   | 72.7967  |  91.1483  | 34.8988  |        57.3494         |
|           fastNLP_Bert            |  6   | 57.4997  |  70.1987  | 33.8033  |        34.6908         |
|           mobilenet_v2            |  96  | 47.0762  |  60.4194  | 30.7384  |        30.9666         |
|           pytorch_unet            |  1   | 39.9373  | 194.3137  | 29.3036  |        29.4218         |
|             hf_Albert             |  8   | 70.2281  |  71.3937  | 29.1167  |        29.7844         |
|              hf_GPT2              |  4   | 53.1371  |  50.6452  | 27.2817  |        27.0821         |
|            timm_vovnet            |  32  | 28.9548  |  35.2513  | 26.3159  |        26.7092         |
|              hf_Bert              |  4   | 41.7318  |  48.7282  | 22.4821  |        26.0855         |
|         timm_efficientnet         |  32  | 34.3449  |  51.7878  | 22.3151  |        30.4378         |
|             resnet50              |  32  | 26.9137  |  33.4846  |  22.034  |         25.551         |
|           hf_DistilBert           |  8   | 33.6447  |  32.7199  | 21.5907  |        21.2494         |
|            densenet121            |  4   | 60.6301  |  73.3102  | 19.1877  |        57.5659         |
|        shufflenet_v2_x1_0         | 128  | 32.1091  |  42.4701  | 18.8091  |        25.2013         |
|      timm_vision_transformer      |  32  | 33.3921  |  38.255   | 18.3129  |        22.6728         |
|           BERT_pytorch            |  16  | 63.7573  |  78.4856  | 17.6744  |        25.7188         |
|           timm_resnest            |  32  | 24.3614  |  28.0114  | 15.3539  |        16.1654         |
|          resnext50_32x4d          |  8   | 20.0174  |  27.6844  | 12.9061  |        23.0576         |
|            mnasnet1_0             |  32  | 23.5617  |  31.9797  | 12.9007  |        20.4177         |
|        mobilenet_v3_large         |  32  | 28.7084  |  36.8709  | 12.8173  |        24.9966         |
|      nvidia_deeprecommender       | 256  | 10.2273  |  10.2441  | 11.7005  |         10.039         |
|          pytorch_stargan          |  16  | 14.8667  |  18.3652  |  11.619  |        11.8572         |
|         phlippe_densenet          | 128  | 23.9634  |  30.7791  | 11.5006  |        23.8297         |
|              alexnet              | 128  |  9.8213  |  9.8453   |  9.0183  |         8.6404         |
|          LearningToPaint          |  96  |  11.366  |  15.1778  |  8.5461  |        10.5252         |
|            tts_angular            |  64  |  6.8889  |  7.1934   |  7.3287  |         6.7854         |
|             resnet18              |  16  |  9.0945  |  11.8108  |  6.1737  |         9.2452         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 15.3745  |  17.0378  |  5.7449  |         8.4458         |
|           squeezenet1_1           |  32  |  11.141  |  11.7973  |  5.4112  |         8.8339         |
|          phlippe_resnet           | 128  |  9.1401  |  12.0766  |  4.9692  |         9.2926         |
|          pytorch_struct           | 200  |  5.2106  |  6.0388   |  3.1803  |         4.7757         |
|       functorch_dp_cifar10        |  64  | 10.6262  |  11.161   |  2.829   |         7.632          |
|         soft_actor_critic         | 256  |  1.8478  |  2.7498   |  2.3774  |         3.1232         |
|               dlrm                | 1024 |  4.9623  |  5.6626   |  2.134   |         3.5273         |
|                drq                |  1   |  3.4367  |  4.4024   |  2.1296  |         3.1579         |
|               dcgan               |  32  |  2.4622  |  3.1088   |  1.535   |         2.8966         |
|           lennard_jones           | 1000 |  1.8775  |  2.1456   |  1.1697  |         1.796          |
|   timm_vision_transformer_large   |  32  | 464.5636 |    nan    |   nan    |        428.4966        |
|           hf_Longformer           |  2   | 122.311  | 162.3272  |   nan    |          nan           |
|               moco                |  32  | 55.3786  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9882 |  0.9043   |  2.4603  |         2.4844         |
|          MobileBertForMaskedLM          | 64  | 0.945  |  0.8098   |  2.4031  |         1.0739         |
|      GPT2ForSequenceClassification      |  4  | 0.9749 |  0.9511   |  2.2733  |         2.2876         |
|     MobileBertForQuestionAnswering      | 128 | 0.9492 |  0.8023   |  2.1662  |         1.1065         |
|       MT5ForConditionalGeneration       | 16  | 0.9868 |  0.8375   |  2.131   |         2.1327         |
|       ElectraForQuestionAnswering       | 64  | 0.9871 |  0.9754   |  2.1164  |         2.1086         |
|           ElectraForCausalLM            | 32  | 0.9814 |  0.9375   |  1.8425  |         1.8404         |
|            XLNetLMHeadModel             |  8  | 0.9952 |  0.9672   |  1.8089  |         1.8186         |
|    LayoutLMForSequenceClassification    | 16  | 0.9844 |  0.9706   |  1.801   |         1.7894         |
|       RobertaForQuestionAnswering       | 16  | 0.9842 |  0.9694   |  1.7883  |         1.7572         |
|        BertForQuestionAnswering         | 16  | 0.984  |  0.9695   |  1.7746  |         1.761          |
|             XGLMForCausalLM             |  8  | 1.0009 |  0.8353   |  1.7021  |         1.467          |
|           RobertaForCausalLM            | 16  | 0.9868 |  0.9619   |  1.6805  |         1.6658         |
|     M2M100ForConditionalGeneration      | 16  | 0.9694 |  0.8432   |  1.671   |         1.3683         |
|               DistillGPT2               | 16  | 0.9866 |  0.9543   |  1.6568  |         1.6994         |
|       AlbertForQuestionAnswering        |  4  | 0.9997 |  0.8856   |  1.6476  |         1.6435         |
|            PLBartForCausalLM            |  8  | 0.985  |  0.9581   |  1.6399  |         1.6817         |
|            AlbertForMaskedLM            |  4  | 0.9996 |  0.8847   |  1.6394  |         1.6363         |
|                 T5Small                 |  4  | 0.979  |  0.8493   |  1.6338  |         1.7547         |
|       T5ForConditionalGeneration        |  4  | 0.9781 |  0.8491   |  1.6216  |         1.7275         |
|     PLBartForConditionalGeneration      |  4  | 0.9863 |  0.9462   |  1.6209  |         1.6515         |
|    MegatronBertForQuestionAnswering     |  8  | 0.9802 |  0.9611   |  1.6048  |         1.6287         |
|             BertForMaskedLM             | 16  | 0.9858 |  0.9609   |  1.5947  |         1.5825         |
|           LayoutLMForMaskedLM           | 16  | 0.9861 |  0.9622   |  1.5805  |         1.5935         |
|                CamemBert                | 16  | 0.9869 |  0.9635   |  1.5453  |         1.5353         |
|         Speech2Text2ForCausalLM         | 256 | 0.9717 |  0.9143   |  1.5334  |         1.5754         |
|             BartForCausalLM             |  4  | 0.9848 |  0.9561   |  1.5161  |          1.55          |
|            YituTechConvBert             | 16  | 0.9856 |  0.9579   |  1.5119  |         1.491          |
|            MBartForCausalLM             |  4  | 0.9827 |  0.9526   |  1.5088  |         1.5417         |
|         MegatronBertForCausalLM         |  4  | 0.9946 |  0.9099   |  1.4689  |         1.4965         |
|      BartForConditionalGeneration       |  2  | 0.9949 |  0.9698   |  1.4594  |         1.4429         |
|      MBartForConditionalGeneration      |  2  | 0.9964 |  0.9611   |  1.4485  |         1.4278         |
|     DistilBertForQuestionAnswering      | 256 | 0.9938 |  0.9868   |  1.4465  |         1.4456         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9967 |  0.9194   |  1.3618  |         1.4146         |
|     PegasusForConditionalGeneration     | 32  | 0.9954 |  0.9419   |  1.343   |         1.3484         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9818 |  0.9067   |  1.2689  |         1.2793         |
|            TrOCRForCausalLM             | 32  | 0.9875 |  0.9527   |  1.2554  |         1.2906         |
|          DistilBertForMaskedLM          | 128 | 0.9924 |  0.9504   |  1.2082  |         1.2325         |
|           PegasusForCausalLM            | 32  | 0.9769 |   0.927   |  1.1827  |         1.2771         |
|       DebertaForQuestionAnswering       |  8  | 0.7931 |   0.697   |  1.0464  |         0.9605         |
|           DebertaForMaskedLM            |  4  | 0.7155 |  0.5797   |  0.9657  |         0.8392         |
|          DebertaV2ForMaskedLM           |  1  | 0.6824 |  0.5188   |  0.8861  |         0.6807         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.6852 |   0.523   |  0.8253  |         0.6939         |
|          BlenderbotForCausalLM          |  4  | 0.9807 |  0.8479   |   0.0    |         1.2351         |
|          AllenaiLongformerBase          |  4  | 1.0039 |  0.6715   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|          MobileBertForMaskedLM          | 64  | 17.3446 |  40.2025  | 150.6225 |        148.9238        |
|     MobileBertForQuestionAnswering      | 128 | 17.3915 |  39.9163  | 140.7136 |        653.5987        |
|          DebertaV2ForMaskedLM           |  1  | 15.5696 |  27.5237  | 138.8626 |        75.8935         |
|      DebertaV2ForQuestionAnswering      |  2  | 15.4095 |  27.3197  | 137.5645 |        72.7915         |
|     M2M100ForConditionalGeneration      | 16  | 12.2124 |  26.5896  | 135.8697 |        137.9521        |
|       MT5ForConditionalGeneration       | 16  | 8.1726  |  18.3017  | 134.411  |        133.8593        |
|             XGLMForCausalLM             |  8  | 9.4942  |  21.0463  | 133.5907 |        132.7691        |
|            XLNetLMHeadModel             |  8  | 10.5172 |  27.7844  | 94.9186  |         94.954         |
|           DebertaForMaskedLM            |  4  | 7.3861  |  14.0928  | 86.1956  |        55.2549         |
|       DebertaForQuestionAnswering       |  8  | 7.2761  |  13.5437  | 82.1909  |         54.083         |
|      MBartForConditionalGeneration      |  2  | 11.8022 |  26.2283  | 81.8046  |        79.1892         |
|      BartForConditionalGeneration       |  2  | 11.619  |  26.1386  | 77.0969  |        76.5973         |
|     PegasusForConditionalGeneration     | 32  | 5.3819  |  19.4859  |  70.171  |        69.1206         |
|    MegatronBertForQuestionAnswering     |  8  | 10.5797 |  21.3237  | 69.5369  |        66.5752         |
|            YituTechConvBert             | 16  | 7.1759  |  15.7872  |  69.047  |        69.9462         |
|         MegatronBertForCausalLM         |  4  | 10.5922 |  21.7549  | 67.6891  |         66.815         |
| BlenderbotSmallForConditionalGeneration | 64  | 7.7023  |  17.2411  | 56.1683  |        56.4529         |
|                 T5Small                 |  4  | 5.5999  |  12.7517  | 52.2982  |        51.5535         |
|       T5ForConditionalGeneration        |  4  | 5.6543  |  12.8023  | 51.9082  |        51.0954         |
|           ElectraForCausalLM            | 32  | 5.3042  |  10.8613  | 50.7892  |        54.2764         |
|     PLBartForConditionalGeneration      |  4  | 6.2761  |  13.4427  | 49.7039  |        49.0915         |
|    LayoutLMForSequenceClassification    | 16  | 5.6218  |  11.1507  | 48.1803  |        47.1304         |
|       ElectraForQuestionAnswering       | 64  |  5.259  |  11.5491  | 43.4886  |        46.9035         |
|            MBartForCausalLM             |  4  | 5.7291  |  11.2827  |  42.565  |        40.9752         |
|             BertForMaskedLM             | 16  | 5.3048  |  10.981   | 40.7287  |        40.9174         |
|        BertForQuestionAnswering         | 16  | 5.2297  |  10.8328  | 40.6538  |        39.3134         |
|           LayoutLMForMaskedLM           | 16  | 5.5829  |  11.2154  |  39.796  |        42.1987         |
|           RobertaForCausalLM            | 16  | 5.2612  |  10.9255  | 39.5288  |        37.6301         |
|             OPTForCausalLM              |  2  | 4.7853  |  10.2633  | 39.2228  |        38.2162         |
|             BartForCausalLM             |  4  | 5.7122  |  11.0283  | 39.1731  |        40.0025         |
|           PegasusForCausalLM            | 32  |  5.686  |  11.2324  | 38.8494  |        38.2764         |
|            TrOCRForCausalLM             | 32  | 5.6649  |  10.9667  | 38.4776  |        37.8722         |
|      GPT2ForSequenceClassification      |  4  | 4.8679  |   9.946   | 37.9841  |        36.1413         |
|       RobertaForQuestionAnswering       | 16  | 5.2309  |  10.8488  | 37.8191  |        37.8725         |
|            AlbertForMaskedLM            |  4  | 2.2973  |  8.1403   | 37.8175  |        39.2703         |
|                CamemBert                | 16  |  5.257  |  10.8434  | 36.9008  |        39.1965         |
|     DistilBertForQuestionAnswering      | 256 | 2.5187  |  5.3678   | 35.8255  |        37.2627         |
|       AlbertForQuestionAnswering        |  4  |  2.359  |  8.1263   | 34.1647  |        35.0516         |
|          DistilBertForMaskedLM          | 128 | 2.5152  |  5.5412   | 33.8824  |        35.4997         |
|       BlenderbotSmallForCausalLM        | 64  | 3.8756  |  7.5187   | 31.0369  |         29.927         |
|               DistillGPT2               | 16  | 2.5873  |  5.1278   | 30.1948  |        28.9185         |
|         Speech2Text2ForCausalLM         | 256 | 3.0315  |  5.7656   |  27.228  |         26.774         |
|            PLBartForCausalLM            |  8  | 3.0107  |  5.9704   | 26.8497  |         26.839         |
|          BlenderbotForCausalLM          |  4  | 11.0248 |  21.9303  |   nan    |        69.4999         |
|          AllenaiLongformerBase          |  4  | 9.7652  |  31.4218  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|       ElectraForQuestionAnswering       | 64  | 1.0014 |  0.9537   |  1.1387  |         1.195          |
|            XLNetLMHeadModel             |  8  | 0.9843 |  0.9603   |  1.1342  |         1.1342         |
|      GPT2ForSequenceClassification      |  4  | 1.0001 |   0.906   |  1.1139  |         1.2307         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9165   |  1.094   |         1.1346         |
|       RobertaForQuestionAnswering       | 16  | 1.0012 |  0.9279   |  1.0865  |         1.1724         |
|        BertForQuestionAnswering         | 16  | 1.0017 |  0.9284   |  1.0818  |         1.1729         |
|    LayoutLMForSequenceClassification    | 16  | 1.0014 |  0.9295   |  1.0583  |         1.1368         |
|           RobertaForCausalLM            | 16  | 0.9999 |  0.9209   |  1.0541  |         1.0519         |
|             BertForMaskedLM             | 16  | 0.9998 |  0.9207   |  1.0539  |         1.0518         |
|                CamemBert                | 16  |  1.0   |  0.9184   |  1.0511  |         1.0491         |
|            YituTechConvBert             | 16  |  1.0   |  0.9143   |  1.0402  |         1.0411         |
|       T5ForConditionalGeneration        |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|                 T5Small                 |  4  | 0.9999 |  0.9516   |  1.0382  |         1.1813         |
|     DistilBertForQuestionAnswering      | 256 | 1.0114 |  0.9556   |  1.0299  |         1.1479         |
|           LayoutLMForMaskedLM           | 16  | 0.9999 |  0.9211   |  1.0078  |         1.0518         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.7449   |  0.9734  |         1.3147         |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8475   |  0.9731  |         0.9739         |
|               DistillGPT2               | 16  |  1.0   |  0.8591   |  0.9682  |         1.0642         |
|     PLBartForConditionalGeneration      |  4  | 1.0001 |  0.9301   |  0.9649  |         1.052          |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.7338   |  0.9574  |         1.268          |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |   0.904   |  0.953   |         1.1152         |
|            MBartForCausalLM             |  4  |  1.0   |  0.8937   |  0.9281  |         0.9912         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8677   |  0.9138  |         0.9886         |
|             BartForCausalLM             |  4  |  1.0   |  0.8936   |  0.9137  |         0.9749         |
|       MT5ForConditionalGeneration       | 16  | 0.9999 |  0.8495   |  0.9089  |         1.0018         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8822   |  0.893   |         0.9864         |
|          DistilBertForMaskedLM          | 128 |  1.0   |  0.8468   |  0.8849  |         0.9624         |
|            TrOCRForCausalLM             | 32  |  1.0   |   0.873   |  0.8836  |         0.9583         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8895   |  0.8729  |         0.9803         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |   0.91    |  0.8689  |         1.0689         |
|      MBartForConditionalGeneration      |  2  |  1.0   |  0.8946   |  0.8672  |         1.0307         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8987   |  0.8456  |         1.0139         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8644   |  0.845   |         1.0962         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8137   |  0.8184  |         0.9119         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8183   |  0.789   |         0.8779         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |  0.8084   |  0.7651  |         0.9908         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8769   |  0.7473  |         1.016          |
|             XGLMForCausalLM             |  8  |  1.0   |  0.7834   |  0.7117  |         0.9792         |
|     MobileBertForQuestionAnswering      | 128 | 1.0161 |  1.0064   |  0.6569  |         0.8392         |
|           DebertaForMaskedLM            |  4  | 0.9316 |  0.9156   |  0.5501  |         0.9978         |
|          DebertaV2ForMaskedLM           |  1  | 0.977  |  0.9068   |  0.5197  |         0.9665         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9763   |  0.487   |         0.9802         |
|       DebertaForQuestionAnswering       |  8  | 0.9525 |  1.0537   |  0.4601  |         1.1526         |
|          BlenderbotForCausalLM          |  4  | 0.9978 |  0.9099   |   nan    |         0.999          |
|          AllenaiLongformerBase          |  4  | 0.9508 |  0.8684   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 266.0741 | 300.5957  | 162.3949 |        162.7744        |
|       AlbertForQuestionAnswering        |  4  | 263.9479 | 297.7421  | 160.4003 |        160.7389        |
|            XLNetLMHeadModel             |  8  | 281.0544 | 288.5521  | 155.1764 |        152.0193        |
|      DebertaV2ForQuestionAnswering      |  2  | 156.3569 | 204.4826  | 130.8562 |        176.3497        |
|          DebertaV2ForMaskedLM           |  1  | 152.8454 | 198.3433  | 122.8313 |        170.5682        |
|     PegasusForConditionalGeneration     | 32  | 140.5407 |  147.003  | 113.8014 |        111.6971        |
|            TrOCRForCausalLM             | 32  | 138.9949 | 143.9675  | 110.2195 |        106.7772        |
|      MBartForConditionalGeneration      |  2  | 139.5138 | 144.5445  | 95.3065  |        101.5438        |
|      BartForConditionalGeneration       |  2  | 138.9865 | 141.9254  | 94.5213  |        99.6872         |
|    MegatronBertForQuestionAnswering     |  8  | 144.6633 | 147.2943  | 88.5423  |        87.1288         |
|            YituTechConvBert             | 16  | 127.002  | 130.6102  | 83.1027  |        83.9486         |
| BlenderbotSmallForConditionalGeneration | 64  | 114.5927 | 120.5037  | 81.1285  |         79.534         |
|     MobileBertForQuestionAnswering      | 128 | 177.187  | 208.6011  | 80.8667  |        166.8054        |
|                CamemBert                | 16  | 119.8016 | 122.7432  | 76.5838  |        77.1308         |
|            MBartForCausalLM             |  4  | 115.3325 | 118.9225  | 75.7891  |        73.5422         |
|     M2M100ForConditionalGeneration      | 16  | 128.6238 |  133.153  | 75.3887  |        98.9848         |
|             BartForCausalLM             |  4  | 115.0677 | 118.3685  | 74.6634  |        73.1909         |
|          MobileBertForMaskedLM          | 64  | 180.1248 | 213.0675  |  73.625  |        167.0572        |
|     PLBartForConditionalGeneration      |  4  | 119.2534 | 123.0497  | 73.2631  |        72.1816         |
|       DebertaForQuestionAnswering       |  8  |  95.34   | 108.4551  | 72.6669  |        78.8288         |
|     DistilBertForQuestionAnswering      | 256 | 103.9379 | 104.6612  | 71.6685  |        71.6706         |
|           LayoutLMForMaskedLM           | 16  | 114.0344 | 116.8081  |  71.225  |        70.7144         |
|            PLBartForCausalLM            |  8  | 117.5152 | 117.9444  | 70.2469  |        68.9389         |
|          DistilBertForMaskedLM          | 128 | 85.2454  |  88.9772  |  70.024  |         68.646         |
|             OPTForCausalLM              |  2  | 170.5106 | 182.1011  |  69.414  |        68.1763         |
|             BertForMaskedLM             | 16  | 111.5972 | 114.3774  | 68.9088  |        69.4608         |
|           RobertaForCausalLM            | 16  | 116.5124 |  119.445  | 68.5252  |        69.0099         |
|           DebertaForMaskedLM            |  4  |  88.363  | 121.4233  | 66.3399  |        82.1714         |
|                 T5Small                 |  4  | 106.3489 | 122.9907  | 64.4048  |        60.8002         |
|       T5ForConditionalGeneration        |  4  | 106.5098 | 122.9733  | 64.2992  |        60.4403         |
|               DistillGPT2               | 16  | 107.1357 | 110.7336  | 63.7492  |        62.1923         |
|         MegatronBertForCausalLM         |  4  | 88.8653  |  95.8709  | 59.3401  |        58.3729         |
|           PegasusForCausalLM            | 32  | 71.0425  |  74.4433  | 59.0177  |         58.396         |
|             XGLMForCausalLM             |  8  | 90.0649  | 110.7601  | 54.2954  |        80.4944         |
|    LayoutLMForSequenceClassification    | 16  |  99.102  | 100.4559  | 54.2432  |        54.6501         |
|       ElectraForQuestionAnswering       | 64  | 116.1258 | 117.7138  | 54.0483  |        55.3243         |
|       RobertaForQuestionAnswering       | 16  |  96.963  |  98.5077  | 53.6013  |        54.3902         |
|        BertForQuestionAnswering         | 16  | 96.6566  |  98.0127  | 53.5906  |        54.0604         |
|           ElectraForCausalLM            | 32  |  89.58   |  93.7388  | 47.6422  |        48.9399         |
|       BlenderbotSmallForCausalLM        | 64  | 59.1455  |  64.8676  | 47.4692  |         46.018         |
|       MT5ForConditionalGeneration       | 16  | 94.1362  | 109.5801  | 44.2144  |        50.4218         |
|      GPT2ForSequenceClassification      |  4  | 93.8274  |  96.1364  |  40.721  |        40.0718         |
|         Speech2Text2ForCausalLM         | 256 | 55.0411  |  57.9019  | 35.1635  |        34.3504         |
|          BlenderbotForCausalLM          |  4  | 112.6806 | 122.6812  |   nan    |        89.3649         |
|          AllenaiLongformerBase          |  4  | 180.6334 | 271.5463  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with amp precision


Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.9984 |  0.9975   |  3.0126  |         2.9689         |
|      xcit_large_24_p8_224       |  5  | 0.9916 |  0.8567   |  2.0536  |         1.5766         |
|        twins_pcpvt_base         | 64  | 0.9964 |  0.9026   |  1.9888  |         1.6746         |
|         coat_lite_mini          | 128 | 0.9967 |  0.9949   |  1.9441  |         1.9189         |
|          ghostnet_100           | 128 | 0.9921 |  0.7618   |  1.8488  |         1.6141         |
|          gmlp_s16_224           | 128 | 0.9946 |  1.0823   |  1.8427  |         1.8272         |
|          gmixer_24_224          | 128 | 0.9952 |  0.8887   |  1.7584  |         1.749          |
|           volo_d1_224           | 64  | 0.9943 |  0.9732   |  1.6883  |         1.6657         |
|            lcnet_050            | 128 | 0.9406 |  0.7365   |  1.6844  |         1.4333         |
|         crossvit_9_240          | 128 | 0.9907 |  0.7829   |  1.6438  |         1.6158         |
|  swin_base_patch4_window7_224   | 64  | 0.9906 |  0.9424   |  1.6135  |         1.607          |
|           convit_base           | 64  | 0.998  |  0.9976   |  1.6129  |         1.6106         |
|       gluon_inception_v3        | 128 | 0.9965 |  0.8652   |  1.5319  |         1.5218         |
|          inception_v3           | 128 | 0.9962 |  0.8642   |  1.5309  |         1.519          |
|        adv_inception_v3         | 128 | 0.9964 |  0.8603   |  1.5307  |         1.5178         |
|             dla102              | 128 | 0.9956 |  0.8148   |  1.5256  |         1.5213         |
|          convnext_base          | 64  | 0.9837 |  0.9843   |  1.4875  |         1.4696         |
|            nfnet_l0             | 128 | 0.9892 |  0.8141   |  1.4861  |         1.4363         |
|        sebotnet33ts_256         | 64  | 0.9567 |  0.7649   |  1.4808  |         1.5326         |
|           dm_nfnet_f0           | 128 | 0.9873 |  0.9852   |  1.4754  |         1.4281         |
|       eca_botnext26ts_256       | 128 | 0.9735 |  0.7194   |  1.4387  |         1.4237         |
|      mobilenetv3_large_100      | 128 | 0.949  |  0.7604   |  1.4347  |         1.3885         |
|            pit_b_224            | 64  | 0.9947 |  0.9925   |  1.4347  |         1.4287         |
|           resnest101e           | 64  | 0.9942 |  0.8678   |  1.4338  |         1.3532         |
|           mnasnet_100           | 128 | 0.948  |  0.7407   |  1.429   |         1.4981         |
|           mobilevit_s           | 64  | 0.9614 |  0.7305   |  1.4264  |         1.4403         |
|           regnety_002           | 128 | 0.9505 |  0.7097   |  1.4125  |         1.2311         |
|           selecsls42b           | 128 | 0.9985 |  0.8117   |  1.4105  |         1.4118         |
|          botnet26t_256          | 128 | 0.973  |  0.8519   |  1.4081  |         1.4225         |
|        res2net50_14w_8s         | 128 | 0.9988 |  0.7899   |  1.3787  |         1.3566         |
|           res2next50            | 128 | 0.9991 |  0.8256   |  1.3711  |         1.3638         |
|          jx_nest_base           | 32  | 0.9869 |  0.9851   |  1.3661  |         1.3574         |
|          mixer_b16_224          | 128 | 0.997  |  1.0181   |  1.3622  |         1.3601         |
|            hrnet_w18            | 128 | 0.9925 |  0.6446   |  1.3579  |         1.3448         |
|         mobilenetv2_100         | 128 | 0.9486 |  0.7368   |  1.3578  |         1.4448         |
|          spnasnet_100           | 128 | 0.9413 |  0.7389   |  1.3569  |         1.4187         |
|        ese_vovnet19b_dw         | 128 | 0.9582 |  0.8331   |  1.3544  |         1.3722         |
|      beit_base_patch16_224      | 64  | 0.9964 |  0.9584   |  1.3519  |         1.352          |
|           fbnetc_100            | 128 | 0.9497 |  0.7394   |  1.3515  |         1.404          |
|          cait_m36_384           |  4  | 0.9948 |  0.9439   |  1.3501  |         1.3483         |
|       tf_efficientnet_b0        | 128 | 0.9603 |  0.6814   |  1.3498  |         1.384          |
|         poolformer_m36          | 64  | 0.9863 |  0.9834   |  1.3271  |         1.3175         |
|            fbnetv3_b            | 128 | 0.949  |  0.7693   |  1.3139  |         1.2565         |
|           rexnet_100            | 128 | 0.9515 |  0.7031   |  1.2965  |         1.3361         |
|          resmlp_12_224          | 128 | 0.9931 |  0.8893   |  1.2598  |         1.2563         |
| deit_base_distilled_patch16_224 | 64  | 0.9962 |   0.994   |  1.2546  |         1.2545         |
|      vit_base_patch16_224       | 64  | 0.9961 |  0.9936   |  1.2353  |         1.2352         |
|            tinynet_a            | 128 | 0.9471 |  0.6782   |  1.2245  |         1.2324         |
|          cspdarknet53           | 64  | 0.9329 |  0.7858   |  1.2226  |         1.2588         |
|           tf_mixnet_l           | 128 | 0.9758 |  0.8265   |  1.1838  |         1.191          |
|            mixnet_l             | 128 | 0.9763 |  0.8206   |  1.1745  |         1.1816         |
|         visformer_small         | 128 | 0.9962 |  0.9449   |  1.1736  |         1.1656         |
|        res2net101_26w_4s        | 64  | 0.998  |  0.7839   |  1.1582  |         1.0901         |
|          pnasnet5large          | 16  | 0.9853 |  0.9189   |  1.0927  |         1.1131         |
|             dpn107              | 32  | 0.932  |  0.8074   |  1.0901  |         1.1336         |
|            repvgg_a2            | 128 | 0.9348 |  0.7549   |  1.087   |         1.118          |
|        gluon_xception65         | 32  | 0.9921 |  0.8422   |  1.0751  |         1.0787         |
|     swsl_resnext101_32x16d      | 32  | 0.9976 |  0.8426   |  1.0564  |         1.0211         |
|            gernet_l             | 128 | 0.9354 |  0.7935   |  1.0215  |         1.0663         |
|        convmixer_768_32         | 32  | 0.9986 |  0.9645   |  1.0016  |         1.0027         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
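Per-model speedups like the ones above are ratios, so they are normally summarized per suite with a geometric mean rather than an arithmetic mean. Below is a minimal Python sketch. It assumes a plain geometric mean over every model in a column, whereas the real harness may drop failed or `nan` rows first. `geometric_mean` is a hypothetical helper, not part of the benchmark code.

```python
import math


def geometric_mean(speedups):
    """Return the geometric mean of positive speedup ratios."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    # Averaging logarithms equals taking the n-th root of the product,
    # but stays numerically stable for long lists.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Inductor speedups of the first five models in the table above.
inductor = [3.0126, 2.0536, 1.9888, 1.9441, 1.8488]
print(f"geomean: {geometric_mean(inductor):.4f}x")

# A 2x win and a 2x loss cancel out to 1x; an arithmetic mean would say 1.25x.
print(geometric_mean([2.0, 0.5]))
```

The log-space form is used so that long lists of ratios neither overflow nor underflow.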

Accuracy

+---------------------------------+----+-------+---------------+----------+------------------------+
|              name               | bs | eager |   aot_eager   | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+---------------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |     pass      |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |     pass      |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |     pass      |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |     pass      |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |     pass      |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |     pass      |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |     pass      |   pass   |          pass          |
|           regnety_002           | 8  | pass  |     pass      |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |     pass      |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |     pass      |   pass   |          pass          |
|           res2next50            | 8  | pass  |     pass      |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |     pass      |   pass   |          pass          |
|           resnest101e           | 8  | pass  |     pass      |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |     pass      |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |     pass      |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |     pass      |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |     pass      |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |     pass      |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |     pass      |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |     pass      |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |     pass      |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |     pass      |   pass   |          pass          |
|         visformer_small         | 8  | pass  |     pass      |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |     pass      |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |     pass      |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |     pass      |   pass   |          pass          |
|            lcnet_050            | 8  | pass  | fail_accuracy |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |     pass      |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |     pass      |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |     pass      |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |     pass      |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |     pass      |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |     pass      |   pass   |          pass          |
|           convit_base           | 8  | pass  |     pass      |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |     pass      |   pass   |          pass          |
|          convnext_base          | 8  | pass  |     pass      |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |     pass      |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |     pass      |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |     pass      |   pass   |          pass          |
|             dla102              | 8  | pass  |     pass      |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |     pass      |   pass   |          pass          |
|             dpn107              | 8  | pass  |     pass      |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |     pass      |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |     pass      |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |     pass      |   pass   |          pass          |
|            gernet_l             | 8  | pass  |     pass      |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |     pass      |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |     pass      |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |     pass      |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |     pass      |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |     pass      |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |     pass      |   pass   |          pass          |
|          inception_v3           | 8  | pass  |     pass      |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |     pass      |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |     pass      |   pass   |          pass          |
|            tinynet_a            | 8  | pass  | fail_accuracy |   pass   |          pass          |
+---------------------------------+----+-------+---------------+----------+------------------------+
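Suite-level pass rates in this thread are written as `percent, passed/total` (e.g. `95%, 57/60`). The sketch below shows how one column of the accuracy table could be reduced to that format. It assumes that every status other than `pass` counts as a failure. `pass_rate_summary` is a hypothetical helper. The `aot_eager` column above contains 58 `pass` entries and 2 `fail_accuracy` entries (lcnet_050 and tinynet_a):

```python
def pass_rate_summary(statuses):
    """Format accuracy statuses as '<percent>, <passed>/<total>'."""
    total = len(statuses)
    if total == 0:
        raise ValueError("no statuses given")
    # Only an exact "pass" counts; fail_accuracy and other statuses are failures.
    passed = sum(1 for s in statuses if s == "pass")
    # The ".0%" format multiplies by 100 and rounds to a whole percent.
    return f"{passed / total:.0%}, {passed}/{total}"


aot_eager = ["pass"] * 58 + ["fail_accuracy"] * 2
print(pass_rate_summary(aot_eager))  # 97%, 58/60
```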

Compilation latency (sec)

+---------------------------------+-----+---------+-----------+----------+------------------------+
|              name               | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+---------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 5.6993  |  11.1994  | 275.1515 |        276.6628        |
|            hrnet_w18            | 128 | 9.5989  |  36.4291  | 255.7748 |        249.9176        |
|          ghostnet_100           | 128 | 7.9291  |  14.9681  | 244.5686 |        243.1281        |
|            fbnetv3_b            | 128 | 8.3509  |  16.9235  | 178.1302 |        174.7349        |
|          pnasnet5large          | 16  | 8.2713  |  26.1052  | 167.4111 |        161.8981        |
|           resnest101e           | 64  | 11.0908 |  24.5085  | 166.679  |        168.3601        |
|           mobilevit_s           | 64  | 5.3554  |  11.389   | 164.6195 |        161.0406        |
|       gluon_inception_v3        | 128 | 5.6874  |  12.6224  | 162.3067 |        161.1515        |
|        adv_inception_v3         | 128 | 5.7431  |  12.661   | 162.0021 |        163.0911        |
|            tinynet_a            | 128 | 5.9901  |  12.2575  | 160.6803 |        156.4597        |
|      mobilenetv3_large_100      | 128 | 4.2468  |  8.4014   | 160.5822 |        153.4424        |
|            mixnet_l             | 128 | 8.7634  |  16.282   | 159.9061 |        158.8318        |
|          inception_v3           | 128 | 6.0812  |  12.5695  | 156.6721 |        159.4795        |
|           tf_mixnet_l           | 128 | 9.4942  |  16.827   | 156.3445 |        156.0664        |
|        res2net101_26w_4s        | 64  | 10.8721 |  25.1048  | 153.8691 |        153.8122        |
|        twins_pcpvt_base         | 64  | 10.5615 |  23.4808  | 149.9154 |        147.8376        |
|       tf_efficientnet_b0        | 128 | 5.1005  |  10.495   | 149.8097 |        154.6713        |
|           fbnetc_100            | 128 | 4.9279  |   9.256   | 136.5809 |        133.1516        |
|          spnasnet_100           | 128 | 5.0268  |  9.2387   |  136.5   |        137.2727        |
|      xcit_large_24_p8_224       |  5  | 13.4196 |  28.5272  | 135.2229 |        132.5646        |
|         mobilenetv2_100         | 128 | 4.0544  |  7.9001   | 130.7307 |        133.0086        |
|           mnasnet_100           | 128 | 4.1163  |  8.1373   | 126.2514 |        126.592         |
|        res2net50_14w_8s         | 128 | 9.4369  |  22.5034  | 123.4541 |        126.3791        |
|          cait_m36_384           |  4  | 14.6993 |  32.7877  | 117.823  |        115.3478        |
|  swin_base_patch4_window7_224   | 64  |  8.833  |  19.181   | 112.5906 |        109.616         |
|           regnety_002           | 128 | 4.8494  |  8.8289   | 109.1437 |        108.2611        |
|        sebotnet33ts_256         | 64  | 4.2077  |  8.8238   | 107.283  |        106.4203        |
|          cspdarknet53           | 64  | 5.7987  |  10.8478  | 102.8756 |        102.9829        |
|         poolformer_m36          | 64  | 7.6214  |  13.8123  | 102.5988 |        101.2815        |
|             dpn107              | 32  |  9.768  |  19.4637  | 102.452  |        99.7743         |
|       eca_botnext26ts_256       | 128 | 3.0752  |  6.8258   | 101.9253 |        99.7253         |
|             dla102              | 128 | 6.2123  |  14.0217  | 99.5575  |        98.5763         |
|            lcnet_050            | 128 | 2.5388  |  4.9787   | 96.5038  |        100.1957        |
|        gluon_xception65         | 32  | 7.8211  |  16.9041  | 96.4024  |        96.3856         |
|          botnet26t_256          | 128 | 3.0199  |  5.9755   | 93.8917  |        91.0777         |
|           selecsls42b           | 128 | 2.5041  |  5.3691   | 93.0398  |        90.9342         |
|           res2next50            | 128 | 5.0693  |  12.1397  | 90.5335  |        87.4438         |
|         coat_lite_mini          | 128 | 3.3219  |  8.3018   | 90.2117  |        91.3832         |
|         crossvit_9_240          | 128 | 5.8162  |  13.2613  | 87.8356  |        88.4924         |
|          jx_nest_base           | 32  | 6.6893  |  14.8517  |  85.221  |        83.1627         |
|            gernet_l             | 128 | 5.0745  |  8.8521   | 82.5061  |        82.9219         |
|            nfnet_l0             | 128 | 5.3011  |  11.0453  | 81.5994  |        79.7887         |
|        ese_vovnet19b_dw         | 128 | 2.5455  |  4.5574   | 77.4739  |        79.0887         |
|           volo_d1_224           | 64  | 5.3472  |  11.8284  |  75.533  |        75.7515         |
|           dm_nfnet_f0           | 128 | 5.9807  |  11.4623  | 74.5388  |        74.8234         |
|        tnt_s_patch16_224        | 128 | 6.9195  |  17.0106  | 69.4291  |        71.1891         |
|         visformer_small         | 128 | 2.6158  |  6.0686   | 68.4679  |        67.1496         |
|     swsl_resnext101_32x16d      | 32  | 6.1312  |  13.6653  | 65.6469  |        62.8965         |
|            repvgg_a2            | 128 |  4.861  |  9.2605   | 62.4885  |        61.1467         |
|          gmlp_s16_224           | 128 | 5.6677  |  11.9838  | 61.9518  |        62.6159         |
|          convnext_base          | 64  | 6.6326  |  12.5169  |  61.102  |        59.6916         |
|          gmixer_24_224          | 128 | 5.8246  |  12.8653  |  53.639  |        52.8536         |
|           convit_base           | 64  |  3.689  |  8.5796   | 49.5443  |        49.2829         |
|            pit_b_224            | 64  | 3.3955  |  7.9829   |  47.546  |        47.3044         |
| deit_base_distilled_patch16_224 | 64  | 3.2802  |  7.1145   | 45.2268  |        43.3123         |
|          resmlp_12_224          | 128 | 2.8034  |  5.2855   | 42.2218  |        41.8847         |
|      vit_base_patch16_224       | 64  |  3.091  |  7.0077   | 41.5768  |        40.4922         |
|        convmixer_768_32         | 32  |  1.678  |  6.8306   | 39.8388  |        37.2196         |
|      beit_base_patch16_224      | 64  | 3.9267  |  9.2819   |  36.842  |        34.9824         |
|          mixer_b16_224          | 128 | 2.7715  |  5.8739   | 34.7621  |        33.8914         |
+---------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          gmlp_s16_224           | 128 | 0.9951 |  0.9727   |  1.1858  |         1.2049         |
|          pnasnet5large          | 16  | 1.059  |  0.9907   |  1.1712  |         1.2836         |
|          gmixer_24_224          | 128 | 0.9928 |  0.9706   |  1.1129  |         1.1596         |
|           convit_base           | 64  | 0.9967 |  0.8482   |  1.0948  |         1.157          |
|         mobilenetv2_100         | 128 | 0.9865 |  0.7647   |  1.0266  |         1.1179         |
|           dm_nfnet_f0           | 128 | 0.9742 |  0.8946   |  1.013   |         1.0845         |
|          resmlp_12_224          | 128 | 0.9826 |  0.9506   |  1.0099  |         1.0351         |
|            tinynet_a            | 128 | 0.9892 |  0.7906   |  0.9984  |         1.0721         |
|           resnest101e           | 64  | 0.9947 |  0.9986   |  0.9972  |         1.0876         |
|       tf_efficientnet_b0        | 128 | 0.9863 |  0.7735   |  0.9872  |         1.0728         |
|        tnt_s_patch16_224        | 128 | 0.9947 |  0.9729   |  0.9834  |         1.0506         |
|        convmixer_768_32         | 32  | 0.9981 |  0.9795   |  0.9762  |         0.9854         |
|           rexnet_100            | 128 | 0.9898 |  0.7866   |  0.9747  |         1.0457         |
|        twins_pcpvt_base         | 64  | 0.9961 |  0.9232   |  0.9729  |         1.0539         |
|           mobilevit_s           | 64  | 0.9929 |  0.7794   |  0.9557  |         1.0057         |
|             dla102              | 128 | 0.9634 |  0.9151   |  0.9536  |         1.0326         |
|          mixer_b16_224          | 128 | 0.9919 |  0.9569   |  0.951   |         0.9948         |
|      vit_base_patch16_224       | 64  | 0.9949 |  0.9316   |  0.9362  |         0.955          |
| deit_base_distilled_patch16_224 | 64  | 0.9942 |  0.9313   |  0.9353  |         0.9528         |
|         visformer_small         | 128 | 0.9896 |  0.9236   |  0.9348  |         1.0194         |
|           tf_mixnet_l           | 128 | 0.9905 |   0.858   |  0.9346  |         1.0675         |
|      beit_base_patch16_224      | 64  | 0.9949 |  0.9303   |  0.9285  |         0.989          |
|            fbnetv3_b            | 128 | 0.9857 |  0.7935   |  0.9228  |         0.9793         |
|            nfnet_l0             | 128 | 0.9892 |  0.8404   |  0.9215  |         0.9952         |
|           volo_d1_224           | 64  | 0.9959 |  0.9469   |  0.9131  |         0.9727         |
|          cspdarknet53           | 64  | 0.9909 |  0.8538   |  0.9097  |         1.0328         |
|        ese_vovnet19b_dw         | 128 | 0.9861 |  0.8968   |  0.9047  |         0.9903         |
|            hrnet_w18            | 128 | 0.9909 |  0.9196   |  0.8918  |          0.99          |
|        sebotnet33ts_256         | 64  | 0.9925 |  0.7116   |  0.891   |         1.1115         |
|          inception_v3           | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|       gluon_inception_v3        | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|        adv_inception_v3         | 128 | 0.9825 |  0.8621   |  0.8904  |         1.0171         |
|             dpn107              | 32  | 0.9932 |   0.904   |  0.8833  |         0.9642         |
|        gluon_xception65         | 32  | 0.9954 |  0.8841   |  0.8831  |         0.9705         |
|          ghostnet_100           | 128 | 0.9748 |  0.8689   |  0.8807  |         0.977          |
|          spnasnet_100           | 128 | 0.9796 |  0.8826   |  0.8786  |         0.9451         |
|      mobilenetv3_large_100      | 128 | 0.9777 |  0.8424   |  0.877   |         0.9361         |
|         poolformer_m36          | 64  | 0.9981 |  0.9485   |  0.8768  |         1.1871         |
|       eca_botnext26ts_256       | 128 | 0.9881 |  0.7722   |  0.8738  |         1.0072         |
|      xcit_large_24_p8_224       |  5  | 0.9983 |  0.8871   |  0.8721  |         0.9732         |
|        res2net50_14w_8s         | 128 | 0.9912 |  0.9074   |  0.8712  |         0.9607         |
|        res2net101_26w_4s        | 64  | 0.9937 |  0.9132   |  0.871   |         0.9483         |
|            mixnet_l             | 128 |  0.99  |  0.8469   |  0.8687  |         0.9902         |
|           mnasnet_100           | 128 | 0.9777 |  0.8719   |  0.8683  |         0.9403         |
|           res2next50            | 128 | 0.9913 |  0.9106   |  0.866   |         0.9547         |
|          cait_m36_384           |  4  | 0.9998 |   0.913   |  0.8632  |         0.989          |
|           fbnetc_100            | 128 | 0.9819 |  0.8512   |  0.8596  |         0.9535         |
|            pit_b_224            | 64  | 0.9969 |  0.8011   |  0.8578  |         1.0242         |
|           selecsls42b           | 128 | 0.9806 |  0.8786   |  0.8576  |         0.9664         |
|          convnext_base          | 64  | 1.001  |   0.924   |  0.8505  |         1.0338         |
|            gernet_l             | 128 | 0.9781 |  0.8499   |  0.8499  |         0.9706         |
|     swsl_resnext101_32x16d      | 32  | 0.998  |  0.8688   |  0.8461  |         0.9786         |
|         coat_lite_mini          | 128 | 1.0337 |  0.9207   |  0.8402  |         1.0202         |
|          botnet26t_256          | 128 | 0.9842 |  0.8676   |  0.8239  |         0.9779         |
|            lcnet_050            | 128 | 0.9447 |  0.7712   |  0.805   |         0.884          |
|            repvgg_a2            | 128 | 0.9761 |  0.7778   |  0.7738  |         0.9611         |
|           regnety_002           | 128 | 0.9523 |  0.8281   |  0.7602  |         0.8966         |
|         crossvit_9_240          | 128 | 0.9851 |  0.8711   |  0.7526  |         0.9898         |
|  swin_base_patch4_window7_224   | 64  | 0.9976 |  0.9204   |  0.7214  |         0.9045         |
|          jx_nest_base           | 32  | 0.9985 |  0.8927   |  0.6693  |         0.9604         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
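Reading these ratios requires an assumption: that each value is eager peak memory divided by compiled peak memory, which matches the "peak memory footprint reduction ratio" wording these dashboards use. Under that assumption, values above 1.0 mean the compiled run needed less peak memory than eager. The sketch below converts a ratio into a relative change in peak memory. `peak_memory_change_pct` is a hypothetical helper.

```python
def peak_memory_change_pct(compression_ratio):
    """Percent change in compiled peak memory relative to eager.

    Assumes compression_ratio = eager_peak / compiled_peak, so the compiled
    peak is eager_peak / compression_ratio. Negative results are savings.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return (1.0 / compression_ratio - 1.0) * 100.0


# Inductor column values from the table above.
for name, ratio in [("gmlp_s16_224", 1.1858), ("jx_nest_base", 0.6693)]:
    print(f"{name}: {peak_memory_change_pct(ratio):+.1f}% peak memory")
```

Under this reading, gmlp_s16_224 uses roughly 16% less peak memory than eager with inductor, while jx_nest_base needs roughly 49% more.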

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 300.4785 | 311.1037  | 300.2453 |        299.2399        |
|            hrnet_w18            | 128 | 280.5567 | 433.2557  | 205.2586 |        207.4694        |
|          pnasnet5large          | 16  | 198.6957 | 212.7856  | 179.4515 |        176.8271        |
|           tf_mixnet_l           | 128 | 193.8395 | 228.9561  | 159.9145 |        159.006         |
|            mixnet_l             | 128 | 185.4911 | 220.5022  | 153.9461 |        153.0711        |
|          cait_m36_384           |  4  | 173.9893 |  181.402  | 123.7248 |        123.6517        |
|           resnest101e           | 64  | 165.259  | 188.3443  | 113.9577 |        121.5095        |
|             dla102              | 128 | 172.3799 | 210.6412  | 112.5554 |        113.0406        |
|     swsl_resnext101_32x16d      | 32  | 118.6215 | 140.3636  | 111.8232 |        115.6837        |
|         poolformer_m36          | 64  | 146.8757 | 147.1298  | 108.979  |        109.8046        |
|        tnt_s_patch16_224        | 128 | 323.4275 | 323.8514  | 107.2166 |        108.8224        |
|        adv_inception_v3         | 128 | 160.5676 | 185.9636  | 104.6174 |        105.5461        |
|          inception_v3           | 128 | 160.6964 | 185.2429  | 104.585  |        105.4366        |
|       gluon_inception_v3        | 128 | 160.8065 | 185.1928  | 104.5689 |        105.2862        |
|        res2net50_14w_8s         | 128 | 140.935  | 177.9539  | 102.2278 |        103.6838        |
|           convit_base           | 64  | 163.2652 | 163.0411  | 100.9347 |        101.0903        |
|             dpn107              | 32  | 113.7021 |  131.014  | 97.1951  |        93.4699         |
|        gluon_xception65         | 32  | 99.8143  | 117.2946  |  92.136  |        91.6258         |
|           res2next50            | 128 | 125.9044 | 152.2697  | 91.7131  |        92.2184         |
|  swin_base_patch4_window7_224   | 64  | 147.6003 | 154.6088  | 90.4329  |        90.7476         |
|           dm_nfnet_f0           | 128 | 128.6559 | 128.8463  | 85.7176  |        88.8292         |
|          mixer_b16_224          | 128 | 116.6407 | 114.2483  | 85.6271  |        85.5345         |
|        res2net101_26w_4s        | 64  | 100.7719 | 126.5222  | 85.1775  |        91.9015         |
|            fbnetv3_b            | 128 | 115.2918 | 142.0549  | 83.2112  |        87.1757         |
|            pit_b_224            | 64  | 118.7083 | 119.0224  | 82.2987  |         82.554         |
|          convnext_base          | 64  | 124.4767 | 123.9587  |  82.125  |        83.2952         |
|         visformer_small         | 128 | 91.2132  |  96.1167  | 77.5302  |        77.9409         |
|            nfnet_l0             | 128 | 112.9672 | 136.6466  | 75.1601  |        77.8623         |
|      beit_base_patch16_224      | 64  | 101.5011 | 105.6243  |  74.93   |        74.7418         |
|          gmlp_s16_224           | 128 | 137.4137 | 126.3103  | 74.3719  |        74.8948         |
|       eca_botnext26ts_256       | 128 | 108.6756 | 147.1409  | 73.6353  |        74.3115         |
|          jx_nest_base           | 32  | 101.4118 | 101.6103  | 73.2177  |        73.8287         |
|          cspdarknet53           | 64  |  94.887  | 112.5543  | 72.5498  |        70.3858         |
|            gernet_l             | 128 | 77.7055  |  91.5737  | 71.2669  |        68.2328         |
|           volo_d1_224           | 64  | 121.1454 | 123.3736  | 71.2254  |         72.128         |
|          botnet26t_256          | 128 | 101.7933 | 116.3549  | 70.4664  |        69.7069         |
|      vit_base_patch16_224       | 64  | 86.9746  |  87.1374  | 70.1112  |        69.9958         |
| deit_base_distilled_patch16_224 | 64  | 84.9154  |  85.0308  |  67.423  |         67.449         |
|            repvgg_a2            | 128 | 77.6178  |  96.1671  | 66.8631  |        64.9452         |
|          gmixer_24_224          | 128 | 118.032  | 131.9766  | 66.8484  |        67.1371         |
|      xcit_large_24_p8_224       |  5  | 144.0041 | 166.1195  | 62.3256  |        77.9301         |
|       tf_efficientnet_b0        | 128 | 84.6526  | 119.5026  | 60.2613  |        58.8375         |
|        twins_pcpvt_base         | 64  | 117.7882 | 141.6634  | 60.2158  |        69.1968         |
|           rexnet_100            | 128 | 80.0842  | 108.2213  | 58.7168  |        56.9406         |
|           fbnetc_100            | 128 | 82.7335  | 106.2835  | 58.1617  |        55.9695         |
|         coat_lite_mini          | 128 | 112.9578 | 113.2281  | 58.0038  |        58.6439         |
|           mobilevit_s           | 64  | 84.5601  | 111.3721  | 56.9916  |        56.5112         |
|            tinynet_a            | 128 | 73.5434  | 102.6301  | 56.8566  |        56.5248         |
|        sebotnet33ts_256         | 64  | 80.5053  |  100.537  | 51.9924  |         50.285         |
|         crossvit_9_240          | 128 | 82.4008  | 104.1625  | 49.7802  |        50.4592         |
|          spnasnet_100           | 128 | 70.4153  |  89.5947  | 48.8766  |        46.7056         |
|          ghostnet_100           | 128 | 90.5123  | 117.7778  | 48.5371  |        55.7042         |
|         mobilenetv2_100         | 128 | 65.4555  |  84.3222  | 45.8039  |        43.0103         |
|        ese_vovnet19b_dw         | 128 | 64.5312  |  74.3313  | 45.7309  |        45.0924         |
|           mnasnet_100           | 128 |  64.216  |  82.3891  | 42.6134  |        40.6277         |
|           selecsls42b           | 128 | 60.0775  |  73.8697  |  42.506  |        42.4269         |
|          resmlp_12_224          | 128 | 53.4379  |  59.7054  | 42.1271  |        42.2446         |
|      mobilenetv3_large_100      | 128 | 61.3007  |  76.4433  | 40.5678  |        41.9268         |
|           regnety_002           | 128 | 41.2991  |  55.4889  | 26.5131  |        30.3718         |
|            lcnet_050            | 128 | 31.6951  |  40.4419  | 17.6729  |        20.7928         |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_amp_229/huggingface_amp.png :

/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_amp_229/timm_models_amp.png :

/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_amp_229/torchbench_amp.png :

Build Summary


Run name

day_100_10_04_23_performance_amp_229

Commit hashes

pytorch commit: f55e72c0f6bd6da016aaa51de379e6ba6d7891cc
pytorch commit date: 2023-04-07 17:30:27+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf55e72c

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Collaborator

Performance Dashboard for float32 precision (2.0 release binary oneoff)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface, and timm. We run these experiments on A100 GPUs. Each experiment runs one iteration of the forward and backward passes for training, and only the forward pass for inference. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

When measuring performance, compilation latency, and memory footprint reduction, we exclude models that fail accuracy checks.
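A minimal sketch of how a per-suite geometric mean speedup could be aggregated under the exclusion rule above. It assumes each model's speedup is already normalized against eager (eager time / backend time) and that a speedup of 0.0 marks a failed performance run; `geomean_speedup` and its arguments are hypothetical names, not the benchmark harness's actual API.

```python
import math

def geomean_speedup(speedups, passed_accuracy):
    """Geometric mean of speedups over models that passed accuracy and ran.

    speedups: dict of model name -> speedup vs. eager (hypothetical input format);
              0.0 marks a failed performance run.
    passed_accuracy: set of model names that passed the accuracy check.
    """
    kept = [s for name, s in speedups.items()
            if name in passed_accuracy and s > 0.0]
    if not kept:
        return 0.0
    # Average the logs, then exponentiate: equivalent to the n-th root of the product.
    return math.exp(sum(math.log(s) for s in kept) / len(kept))

# Example using inductor speedups from the torchbench table; moco is excluded.
speedups = {"BERT_pytorch": 1.8793, "hf_Albert": 1.64, "moco": 0.0}
passed = {"BERT_pytorch", "hf_Albert"}
print(round(geomean_speedup(speedups, passed), 3))
```

The geometric mean is the usual choice for averaging ratios, since a 2x speedup and a 0.5x slowdown cancel out rather than averaging to 1.25x.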

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 82%, 50/61 | 100%, 46/46 | 100%, 60/60 |
|       aot_eager        | 77%, 47/61 | 100%, 46/46 | 100%, 60/60 |
|        inductor        | 74%, 45/61 | 93%, 43/46  | 100%, 60/60 |
| inductor_no_cudagraphs | 75%, 46/61 | 98%, 45/46  | 100%, 60/60 |
+------------------------+------------+-------------+-------------+
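The percentages in the pass-rate table are consistent with rounding passed/total to the nearest integer (for example, 45/61 = 73.8% is shown as 74%, not truncated to 73%). A small sketch of that formatting, with a hypothetical helper name:

```python
def format_passrate(passed: int, total: int) -> str:
    """Render a pass rate as '<rounded percent>%, <passed>/<total>'."""
    # Round to the nearest integer percent (Python's round() uses
    # round-half-to-even, which only matters for exact .5 values).
    pct = round(100 * passed / total)
    return f"{pct}%, {passed}/{total}"

print(format_passrate(50, 61))  # 82%, 50/61
```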

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.32x    |    1.22x    |    1.23x    |
| inductor_no_cudagraphs |   1.18x    |    1.22x    |    1.23x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    3.64    |    4.90     |    4.08     |
|       aot_eager        |    7.69    |    11.32    |    9.93     |
|        inductor        |   59.38    |    51.27    |   100.75    |
| inductor_no_cudagraphs |   58.66    |    47.84    |    99.83    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   0.99x    |    1.00x    |    1.00x    |
|       aot_eager        |   0.88x    |    0.92x    |    0.89x    |
|        inductor        |   0.81x    |    0.84x    |    0.92x    |
| inductor_no_cudagraphs |   0.97x    |    0.98x    |    1.02x    |
+------------------------+------------+-------------+-------------+
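Because higher is better, the compression ratio is presumably eager peak memory divided by backend peak memory, so values below 1.0 mean the compiled model used more memory than eager. The sketch below assumes that definition; in practice the peaks might come from something like `torch.cuda.max_memory_allocated()`, but the harness's exact measurement method is not stated here, and the numbers in the example are made up.

```python
def compression_ratio(eager_peak_bytes: int, backend_peak_bytes: int) -> float:
    """Assumed definition: eager peak memory / backend peak memory.

    > 1.0: backend uses less peak memory than eager (better).
    < 1.0: backend uses more peak memory than eager.
    """
    if backend_peak_bytes <= 0:
        raise ValueError("backend peak memory must be positive")
    return eager_peak_bytes / backend_peak_bytes

# Hypothetical peaks: eager 8 GiB, backend 10 GiB.
GiB = 1024 ** 3
print(compression_ratio(8 * GiB, 10 * GiB))  # 0.8
```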

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: 0.0 speedup typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
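The four warning criteria can be expressed as simple threshold checks. This is a sketch only: `ModelResult` and `warnings_for` are hypothetical, and treating any accuracy status that starts with "pass" (including `pass_due_to_skip`) as passing is an assumption.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ModelResult:
    # Hypothetical per-model record; field names are illustrative only.
    name: str
    accuracy: str             # e.g. "pass", "fail_to_run", "eager_variation"
    speedup: float            # vs. eager; 0.0 usually means the perf run failed
    compile_latency_s: float  # compilation latency in seconds
    compression_ratio: float  # eager peak memory / backend peak memory (assumed)

def warnings_for(r: ModelResult) -> List[str]:
    """Return the warning categories a model result falls into."""
    flags = []
    if not r.accuracy.startswith("pass"):
        flags.append("accuracy")
    if r.speedup < 0.95:
        flags.append("speedup")
    if r.compile_latency_s > 120:
        flags.append("compilation latency")
    if r.compression_ratio < 0.9:
        flags.append("compression ratio")
    return flags

# phlippe_densenet with inductor, using values from the tables in this comment.
r = ModelResult("phlippe_densenet", "pass", 1.7597, 164.7971, 0.8562)
print(warnings_for(r))  # ['compilation latency', 'compression ratio']
```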

Accuracy warnings

+-------------+-------------------------------+------------------------+-----------------+
|    suite    |             name              | inductor_no_cudagraphs |    inductor     |
+-------------+-------------------------------+------------------------+-----------------+
| torchbench  |             moco              |      fail_to_run       |   fail_to_run   |
| torchbench  |    resnet50_quantized_qat     |      fail_to_run       |   fail_to_run   |
| torchbench  |  mobilenet_v2_quantized_qat   |      fail_to_run       |   fail_to_run   |
| torchbench  |         hf_Longformer         |      fail_to_run       |   fail_to_run   |
| torchbench  |      Background_Matting       |    eager_variation     | eager_variation |
| torchbench  |          Super_SloMo          |    eager_variation     | eager_variation |
| torchbench  |            alexnet            |    eager_variation     | eager_variation |
| torchbench  | pytorch_CycleGAN_and_pix2pix  |    eager_variation     | eager_variation |
| torchbench  |         pytorch_unet          |    eager_variation     | eager_variation |
| torchbench  |             vgg16             |    eager_variation     | eager_variation |
| torchbench  |        vision_maskrcnn        |    eager_variation     | eager_variation |
| torchbench  |           tacotron2           |         0.0000         |     0.0000      |
| torchbench  |              gat              |         0.0000         |     0.0000      |
| torchbench  |              gcn              |         0.0000         |     0.0000      |
| torchbench  |             llama             |         0.0000         |     0.0000      |
| torchbench  |             sage              |         0.0000         |     0.0000      |
| torchbench  |         torchrec_dlrm         |         0.0000         |     0.0000      |
| huggingface | DebertaV2ForQuestionAnswering |          pass          |   fail_to_run   |
+-------------+-------------------------------+------------------------+-----------------+

Performance speedup warnings

+-------------+-------------------------------+------------------------+----------+
|    suite    |             name              | inductor_no_cudagraphs | inductor |
+-------------+-------------------------------+------------------------+----------+
| torchbench  |         lennard_jones         |         0.883          |  1.3312  |
| torchbench  |             dcgan             |         0.8639         |  1.2259  |
| torchbench  |       soft_actor_critic       |         0.794          |  1.0635  |
| torchbench  |          timm_vovnet          |         0.9712         |  0.9419  |
| torchbench  |    nvidia_deeprecommender     |         0.9666         |  0.7988  |
| torchbench  |              gat              |          0.0           |   0.0    |
| torchbench  |           tacotron2           |          0.0           |   0.0    |
| torchbench  |             sage              |          0.0           |   0.0    |
| torchbench  |              gcn              |          0.0           |   0.0    |
| torchbench  |         hf_GPT2_large         |         1.3826         |   0.0    |
| torchbench  |             moco              |          0.0           |   0.0    |
| torchbench  |         hf_Longformer         |          0.0           |   0.0    |
| torchbench  |    resnet50_quantized_qat     |          0.0           |   0.0    |
| torchbench  |  mobilenet_v2_quantized_qat   |          0.0           |   0.0    |
| torchbench  |         torchrec_dlrm         |          0.0           |   0.0    |
| huggingface |      DebertaForMaskedLM       |         0.8475         |  0.8603  |
| huggingface | DebertaV2ForQuestionAnswering |         0.7018         |  0.7474  |
| huggingface |     DebertaV2ForMaskedLM      |         0.6235         |  0.7285  |
| huggingface |     BlenderbotForCausalLM     |         1.0243         |   0.0    |
| huggingface |     AllenaiLongformerBase     |          0.0           |   0.0    |
| timm_models |         resmlp_12_224         |         0.9299         |  0.9303  |
+-------------+-------------------------------+------------------------+----------+

Compilation latency (sec) warnings

+-------------+-------------------------------+------------------------+----------+
|    suite    |             name              | inductor_no_cudagraphs | inductor |
+-------------+-------------------------------+------------------------+----------+
| torchbench  |       phlippe_densenet        |        164.3401        | 164.7971 |
| torchbench  |          hf_T5_large          |        147.2658        | 149.7604 |
| torchbench  |       timm_efficientnet       |        133.6245        | 135.2297 |
| torchbench  |      mobilenet_v3_large       |        127.5956        | 131.7941 |
| torchbench  |          hf_BigBird           |        115.2682        | 131.5293 |
| torchbench  |         mobilenet_v2          |        121.9159        | 125.891  |
| torchbench  |          densenet121          |        128.3598        | 125.1659 |
| huggingface |  MT5ForConditionalGeneration  |        124.5428        | 123.6133 |
| huggingface |     DebertaV2ForMaskedLM      |        56.2815         | 122.3893 |
| huggingface | DebertaV2ForQuestionAnswering |        54.1742         | 120.5788 |
| timm_models |          rexnet_100           |        273.9646        | 276.7602 |
| timm_models |           hrnet_w18           |        225.2791        | 233.938  |
| timm_models |         ghostnet_100          |        229.9564        | 231.1308 |
| timm_models |          mobilevit_s          |        178.0214        | 183.8375 |
| timm_models |           fbnetv3_b           |        157.2946        | 160.7021 |
| timm_models |      gluon_inception_v3       |        146.5848        | 153.5986 |
| timm_models |         inception_v3          |        148.1424        | 152.5033 |
| timm_models |     mobilenetv3_large_100     |        143.8063        | 152.1616 |
| timm_models |       adv_inception_v3        |        152.1385        | 149.3431 |
| timm_models |           tinynet_a           |        146.7822        | 148.3809 |
| timm_models |      tf_efficientnet_b0       |        143.9997        | 146.624  |
| timm_models |           mixnet_l            |        138.3035        | 146.5976 |
| timm_models |         pnasnet5large         |        147.4728        | 146.1797 |
| timm_models |          resnest101e          |        147.637         | 145.8378 |
| timm_models |          tf_mixnet_l          |        147.7897        | 141.5892 |
| timm_models |       res2net101_26w_4s       |        139.8832        | 133.0005 |
| timm_models |          fbnetc_100           |        130.5764        | 132.591  |
| timm_models |         spnasnet_100          |        131.7982        | 129.5098 |
| timm_models |        mobilenetv2_100        |        123.3439        | 126.203  |
+-------------+-------------------------------+------------------------+----------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+------------------------+----------+
|    suite    |                  name                   | inductor_no_cudagraphs | inductor |
+-------------+-----------------------------------------+------------------------+----------+
| torchbench  |             pytorch_stargan             |         1.0715         |  0.8997  |
| torchbench  |              timm_resnest               |         1.0032         |  0.8975  |
| torchbench  |                resnet152                |         0.9666         |  0.8892  |
| torchbench  |         timm_vision_transformer         |         0.9267         |  0.8846  |
| torchbench  |                  hf_T5                  |         1.1711         |  0.8774  |
| torchbench  |               timm_nfnet                |         1.1331         |  0.8734  |
| torchbench  |               timm_regnet               |         0.982          |  0.8628  |
| torchbench  |            phlippe_densenet             |         0.9199         |  0.8562  |
| torchbench  |              pytorch_unet               |         0.9923         |  0.8501  |
| torchbench  |           mobilenet_v3_large            |         0.9276         |  0.8424  |
| torchbench  |                resnet50                 |         0.9405         |  0.8404  |
| torchbench  |           speech_transformer            |         0.844          |   0.84   |
| torchbench  |                 alexnet                 |         1.0006         |  0.8346  |
| torchbench  |              hf_DistilBert              |         0.9835         |  0.8317  |
| torchbench  |                  dcgan                  |         0.9932         |  0.8287  |
| torchbench  |             resnext50_32x4d             |         0.919          |  0.8236  |
| torchbench  |               hf_T5_large               |         1.1293         |  0.8219  |
| torchbench  |              squeezenet1_1              |         0.9868         |  0.8131  |
| torchbench  |               mnasnet1_0                |         0.8757         |  0.8117  |
| torchbench  |                 hf_Bart                 |         1.0392         |  0.794   |
| torchbench  |    attention_is_all_you_need_pytorch    |         0.9488         |  0.7913  |
| torchbench  |                 demucs                  |         0.9979         |  0.7557  |
| torchbench  |               timm_vovnet               |         0.952          |  0.7515  |
| torchbench  |             pytorch_struct              |         0.7355         |  0.726   |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |         0.7115         |  0.6919  |
| torchbench  |               hf_BigBird                |         1.0398         |  0.6819  |
| torchbench  |                  vgg16                  |         0.9999         |  0.6712  |
| torchbench  |         nvidia_deeprecommender          |         0.9844         |  0.6651  |
| torchbench  |                   drq                   |         0.9801         |  0.6493  |
| torchbench  |               densenet121               |         0.7937         |  0.647   |
| torchbench  |             LearningToPaint             |         0.8274         |  0.6394  |
| torchbench  |                resnet18                 |         0.6981         |  0.6283  |
| torchbench  |            soft_actor_critic            |         0.9997         |  0.6192  |
| torchbench  |              lennard_jones              |          1.0           |  0.5322  |
| torchbench  |             phlippe_resnet              |         0.4791         |  0.4452  |
| torchbench  |          functorch_dp_cifar10           |         0.423          |  0.3989  |
| torchbench  |               hf_Reformer               |         0.7865         |  0.3861  |
| huggingface |               DistillGPT2               |         0.9667         |  0.8492  |
| huggingface |         MegatronBertForCausalLM         |         1.0256         |  0.8374  |
| huggingface |            MBartForCausalLM             |         0.9142         |  0.8347  |
| huggingface |             BartForCausalLM             |         0.9141         |  0.8345  |
| huggingface |            PLBartForCausalLM            |         0.9244         |  0.8267  |
| huggingface |      MBartForConditionalGeneration      |         0.9794         |  0.8239  |
| huggingface | BlenderbotSmallForConditionalGeneration |         0.9055         |  0.8181  |
| huggingface |     PegasusForConditionalGeneration     |         0.9645         |  0.8166  |
| huggingface |      BartForConditionalGeneration       |         0.9794         |  0.8137  |
| huggingface |          DistilBertForMaskedLM          |         0.8955         |  0.7997  |
| huggingface |           PegasusForCausalLM            |         0.8904         |  0.7939  |
| huggingface |       MT5ForConditionalGeneration       |         0.9119         |  0.7921  |
| huggingface |            TrOCRForCausalLM             |         0.8833         |  0.7827  |
| huggingface |       AlbertForQuestionAnswering        |         1.2465         |  0.7775  |
| huggingface |            AlbertForMaskedLM            |         1.2096         |  0.7684  |
| huggingface |     M2M100ForConditionalGeneration      |         0.9016         |  0.7498  |
| huggingface |       BlenderbotSmallForCausalLM        |         0.8426         |  0.7233  |
| huggingface |         Speech2Text2ForCausalLM         |         0.8199         |  0.7096  |
| huggingface |             XGLMForCausalLM             |         0.9136         |  0.7095  |
| huggingface |          MobileBertForMaskedLM          |         0.736          |  0.5643  |
| huggingface |          DebertaV2ForMaskedLM           |         0.988          |  0.5506  |
| huggingface |           DebertaForMaskedLM            |         1.0107         |  0.5429  |
| huggingface |     MobileBertForQuestionAnswering      |         0.5574         |  0.4677  |
| huggingface |       DebertaForQuestionAnswering       |         1.1413         |  0.4577  |
| huggingface |      DebertaV2ForQuestionAnswering      |         0.9759         |  0.4556  |
| timm_models |               volo_d1_224               |         0.9634         |  0.8975  |
| timm_models |            ese_vovnet19b_dw             |         1.0127         |  0.8974  |
| timm_models |            gluon_xception65             |         0.9923         |  0.8947  |
| timm_models |               fbnetc_100                |         0.9847         |  0.8935  |
| timm_models |                mixnet_l                 |         1.0338         |  0.8918  |
| timm_models |                lcnet_050                |         0.9552         |  0.881   |
| timm_models |               dm_nfnet_f0               |         1.1277         |  0.8735  |
| timm_models |              gmlp_s16_224               |         0.8743         |  0.8656  |
| timm_models |      swin_base_patch4_window7_224       |         0.988          |  0.8653  |
| timm_models |              botnet26t_256              |         0.9861         |  0.8623  |
| timm_models |                gernet_l                 |         0.998          |  0.8613  |
| timm_models |            twins_pcpvt_base             |         0.9398         |   0.86   |
| timm_models |              jx_nest_base               |         0.9832         |  0.8479  |
| timm_models |            sebotnet33ts_256             |         1.0449         |  0.8393  |
| timm_models |             crossvit_9_240              |         0.9829         |   0.82   |
| timm_models |             poolformer_m36              |         1.1099         |  0.8195  |
| timm_models |               regnety_002               |         0.9579         |  0.8013  |
| timm_models |                pit_b_224                |         0.9905         |  0.7981  |
| timm_models |                repvgg_a2                |         1.005          |  0.7788  |
| timm_models |              convnext_base              |         0.9504         |  0.7585  |
| timm_models |             coat_lite_mini              |         0.9347         |  0.7543  |
+-------------+-----------------------------------------+------------------------+----------+

torchbench suite with float32 precision


Performance speedup

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.9892 |  1.0233   |  2.4578  |         1.2468         |
|            densenet121            |  4   | 0.9914 |  0.7178   |  2.3994  |         1.0269         |
|            hf_BigBird             |  2   | 0.9584 |  0.8102   |  2.3453  |         1.6393         |
|           BERT_pytorch            |  16  | 0.9915 |  0.8688   |  1.8793  |         1.8898         |
|         phlippe_densenet          | 128  | 0.9859 |  0.7984   |  1.7597  |         1.0625         |
|               dlrm                | 1024 | 0.9863 |  0.9297   |  1.7536  |         1.1839         |
|             hf_Albert             |  8   | 0.9993 |  0.9991   |   1.64   |         1.6547         |
|        mobilenet_v3_large         |  32  | 0.9973 |  0.8255   |  1.601   |         1.1435         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.991  |  0.9867   |  1.529   |         1.4943         |
|          phlippe_resnet           | 128  | 0.9954 |  0.7643   |  1.5247  |         1.0439         |
|           squeezenet1_1           |  32  | 0.9867 |  0.9881   |  1.5138  |         1.2732         |
|        speech_transformer         |  32  | 0.9843 |   0.869   |  1.4552  |         1.4468         |
|            hf_T5_large            |  2   | 0.9846 |  0.8446   |  1.4357  |         1.5018         |
|            timm_nfnet             | 128  | 0.991  |   0.99    |  1.4282  |         1.3882         |
|           fastNLP_Bert            |  6   | 0.9821 |  0.9599   |  1.387   |         1.373          |
|               hf_T5               |  8   | 0.9897 |  0.8148   |  1.3856  |         1.3987         |
|          pytorch_struct           | 200  | 0.9834 |  0.7408   |  1.3682  |         1.0671         |
|           timm_resnest            |  32  | 0.9944 |  0.8806   |  1.3642  |         1.344          |
|        shufflenet_v2_x1_0         | 128  | 0.9953 |  0.7791   |  1.3535  |         1.216          |
|              hf_GPT2              |  4   | 0.9829 |  0.9545   |  1.3502  |         1.3962         |
|           mobilenet_v2            |  96  | 0.9975 |  0.8432   |  1.3448  |         1.3512         |
|           lennard_jones           | 1000 | 0.9286 |  0.8268   |  1.3312  |         0.883          |
|          resnext50_32x4d          |  8   | 0.9954 |  0.7545   |  1.3049  |         0.9658         |
|             resnet18              |  16  | 0.9946 |  0.7806   |  1.2596  |         1.0033         |
|            mnasnet1_0             |  32  | 0.992  |  0.7753   |  1.2441  |         1.0429         |
|               dcgan               |  32  | 0.9294 |  0.7511   |  1.2259  |         0.8639         |
|                drq                |  1   | 0.9611 |  0.7341   |  1.2247  |         0.9858         |
|          pytorch_stargan          |  16  | 0.9956 |  0.9596   |  1.2061  |         1.1967         |
|              hf_Bart              |  4   | 0.9974 |  0.8982   |  1.1787  |         1.4096         |
|           pytorch_unet            |  1   | 0.9973 |  0.2734   |  1.1752  |         1.1744         |
|               vgg16               |  64  | 0.9995 |  0.9988   |  1.154   |         1.1605         |
|           hf_Bert_large           |  4   | 0.973  |  0.9585   |  1.149   |         1.1431         |
|          LearningToPaint          |  96  | 0.9919 |  0.8502   |  1.1467  |         1.0756         |
|           hf_DistilBert           |  8   | 0.9866 |  0.9315   |  1.1449  |         1.1652         |
|              yolov3               |  16  | 0.9972 |  0.8491   |  1.1405  |         1.1465         |
|              hf_Bert              |  4   | 0.9974 |  0.9115   |  1.1393  |         1.1418         |
|            Super_SloMo            |  6   | 0.9985 |  0.2444   |  1.1385  |         1.1381         |
|         timm_efficientnet         |  32  | 0.9443 |  0.7005   |  1.1246  |         1.1065         |
|             resnet50              |  32  | 0.9962 |  0.8766   |  1.1175  |         1.1243         |
|      timm_vision_transformer      |  32  | 0.9914 |  0.9794   |  1.1042  |         1.1013         |
|            hf_Reformer            |  4   | 0.991  |  0.9912   |  1.1042  |         1.0966         |
|              alexnet              | 128  | 0.9989 |  0.9963   |  1.0719  |         1.1119         |
|        Background_Matting         |  4   | 0.9992 |  0.1875   |  1.0695  |         1.0644         |
|         soft_actor_critic         | 256  | 0.9479 |   0.67    |  1.0635  |         0.794          |
| attention_is_all_you_need_pytorch | 256  | 0.9891 |  0.9599   |  1.0624  |         1.0642         |
|            timm_regnet            |  32  | 0.9516 |  0.8724   |  1.0564  |         1.0494         |
|             resnet152             |  32  | 0.9956 |  0.8188   |  1.054   |         1.0427         |
|              demucs               |  4   | 0.9992 |  0.9988   |  1.0287  |         1.0306         |
|            tts_angular            |  64  | 0.9984 |  0.9825   |  0.9983  |         0.994          |
|            timm_vovnet            |  32  | 0.8891 |  0.8204   |  0.9419  |         0.9712         |
|      nvidia_deeprecommender       | 256  | 0.9988 |  0.9653   |  0.7988  |         0.9666         |
|                gat                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|             tacotron2             |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|               sage                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|                gcn                |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
|           hf_GPT2_large           |  4   | 0.9845 |  0.9627   |   0.0    |         1.3826         |
|               moco                |  32  | 0.9532 |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 1.011  |  0.6513   |   0.0    |          0.0           |
|      resnet50_quantized_qat       |  32  | 1.0004 |  0.8255   |   0.0    |          0.0           |
|    mobilenet_v2_quantized_qat     |  96  | 1.0007 |  0.8393   |   0.0    |          0.0           |
|           torchrec_dlrm           |  0   |  0.0   |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|               name                | bs  |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            |  4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|         soft_actor_critic         | 256 |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       |  4  |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          |  4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           |  4  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_struct           | 200 |       pass       |       pass       |       pass       |          pass          |
|             resnet18              |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              |  4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          |  4  |       pass       |       pass       |       pass       |          pass          |
|        shufflenet_v2_x1_0         |  4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         |  4  |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            |  4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           |  4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            |  4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      |  4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            |  4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            |  4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               |  4  |       pass       |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         |  4  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             |  4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             |  4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        |  4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            |  4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          |  4  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           |  4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               |  4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               |  4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            |  4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                |  4  |       pass       |       pass       |       pass       |          pass          |
|                drq                |  1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            |  4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch |  4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             |  4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              |  2  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            |  4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               |  4  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             |  4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              |  4  |       pass       |       pass       |       pass       |          pass          |
|               moco                |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|      resnet50_quantized_qat       |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|    mobilenet_v2_quantized_qat     |  4  |       pass       |   fail_to_run    |   fail_to_run    |      fail_to_run       |
|           hf_Longformer           |  4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|        Background_Matting         |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|            Super_SloMo            |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|              alexnet              |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|   pytorch_CycleGAN_and_pix2pix    |  1  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|           pytorch_unet            |  2  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|               vgg16               |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|          vision_maskrcnn          |  4  | eager_variation  | eager_variation  | eager_variation  |    eager_variation     |
|             tacotron2             |  4  |   fail_to_run    |   fail_to_run    |      0.0000      |         0.0000         |
|                gat                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|                gcn                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               llama               |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|               sage                |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
|           torchrec_dlrm           |  0  |      0.0000      |      0.0000      |      0.0000      |         0.0000         |
+-----------------------------------+-----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------+------+---------+-----------+----------+------------------------+
|               name                |  bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+---------+-----------+----------+------------------------+
|         phlippe_densenet          | 128  | 2.0434  |  5.2908   | 164.7971 |        164.3401        |
|            hf_T5_large            |  2   | 21.4732 |  45.4038  | 149.7604 |        147.2658        |
|         timm_efficientnet         |  32  | 3.4574  |  8.2183   | 135.2297 |        133.6245        |
|        mobilenet_v3_large         |  32  | 2.1226  |  5.5377   | 131.7941 |        127.5956        |
|            hf_BigBird             |  2   | 10.4138 |   32.39   | 131.5293 |        115.2682        |
|           mobilenet_v2            |  96  | 1.8645  |  5.2314   | 125.891  |        121.9159        |
|            densenet121            |  4   | 5.0182  |  14.1483  | 125.1659 |        128.3598        |
|              yolov3               |  16  | 3.3578  |   8.249   | 106.548  |        107.9441        |
|            mnasnet1_0             |  32  | 1.8866  |   5.04    | 102.6443 |        107.335         |
|             resnet152             |  32  | 5.8926  |  15.6119  | 96.0523  |        94.6267         |
|           timm_resnest            |  32  |  1.201  |  2.8365   | 94.2961  |        92.8541         |
|        shufflenet_v2_x1_0         | 128  | 2.1886  |  5.9156   | 75.8802  |        76.4391         |
|        speech_transformer         |  32  | 3.8319  |  9.5836   | 71.0866  |        66.3364         |
|            timm_regnet            |  32  | 4.7255  |  9.2573   | 67.3946  |        65.5732         |
|            timm_nfnet             | 128  | 4.3111  |  8.7662   | 67.0176  |        65.5873         |
| attention_is_all_you_need_pytorch | 256  | 2.9718  |  8.4526   | 66.3485  |        66.4413         |
|             resnet50              |  32  | 1.9673  |  5.2497   |  63.072  |        60.4255         |
|            timm_vovnet            |  32  | 2.6075  |  5.0828   | 60.6609  |         57.895         |
|        Background_Matting         |  4   |  1.77   |  9.3554   | 58.4763  |        61.5001         |
|           BERT_pytorch            |  16  | 3.2058  |  8.2991   | 58.3789  |        58.1252         |
|       functorch_dp_cifar10        |  64  | 0.7285  |   1.66    | 55.4975  |        51.3286         |
|           hf_Bert_large           |  4   | 7.0004  |  15.0229  | 53.0323  |        51.7548         |
|           pytorch_unet            |  1   | 1.0032  |  3.6044   |  51.237  |        52.2397         |
|          resnext50_32x4d          |  8   | 1.9993  |  5.3057   | 50.0594  |        48.2653         |
|               hf_T5               |  8   | 4.2815  |  11.0157  | 44.8169  |        42.0625         |
|          pytorch_stargan          |  16  | 0.8089  |  2.5276   | 43.9681  |         43.702         |
|             resnet18              |  16  | 0.8488  |  2.0641   | 43.1969  |        41.8598         |
|              hf_Bart              |  4   | 3.7981  |  9.5703   | 42.0159  |        40.7236         |
|      timm_vision_transformer      |  32  | 1.9661  |  5.0041   | 41.5039  |        39.7002         |
|          LearningToPaint          |  96  | 0.8934  |  2.1626   | 41.3686  |        42.7328         |
|           fastNLP_Bert            |  6   | 3.4399  |  7.8405   |  40.772  |        40.5171         |
|            hf_Reformer            |  4   | 3.4828  |  5.0013   | 39.1851  |        37.2445         |
|            Super_SloMo            |  6   | 2.0205  |  8.1186   | 36.8793  |        37.0915         |
|              hf_GPT2              |  4   | 3.2502  |  7.0724   | 35.8465  |         35.201         |
|             hf_Albert             |  8   | 1.8409  |  6.2215   |  33.981  |        35.9237         |
|              hf_Bert              |  4   | 3.4755  |  7.5101   | 33.0238  |        31.2422         |
|          phlippe_resnet           | 128  | 0.8666  |  2.0713   | 30.4201  |        29.2593         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.7983  |  2.3081   | 29.3072  |        29.8332         |
|           hf_DistilBert           |  8   | 1.5105  |   3.87    | 27.3705  |        25.8568         |
|              demucs               |  4   | 0.7699  |  1.2502   |  25.426  |        25.2651         |
|          pytorch_struct           | 200  | 0.4584  |  0.8927   | 24.9754  |        24.2444         |
|           squeezenet1_1           |  32  | 0.6277  |  1.0986   | 22.1159  |        21.7369         |
|              alexnet              | 128  | 0.3074  |  0.5007   | 13.7286  |        13.5414         |
|               vgg16               |  64  | 0.3428  |  0.6748   | 13.6218  |        13.8409         |
|                drq                |  1   | 0.4692  |  0.7062   |  9.6357  |         7.7873         |
|      nvidia_deeprecommender       | 256  | 0.3126  |  0.5004   |  9.3936  |         9.0937         |
|         soft_actor_critic         | 256  | 0.3123  |  0.4303   |  7.539   |         6.1807         |
|               dcgan               |  32  | 0.2867  |  0.5134   |  6.6982  |          6.32          |
|               dlrm                | 1024 | 0.3075  |  0.6226   |  6.668   |         6.5399         |
|            tts_angular            |  64  | 0.2707  |  0.3275   |  5.1372  |         4.9307         |
|           lennard_jones           | 1000 | 0.2497  |   0.376   |  4.9453  |         5.3687         |
|           hf_GPT2_large           |  4   | 10.5783 |  23.7133  |   nan    |        85.4726         |
|           hf_Longformer           |  2   | 7.3288  |  28.3443  |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  96  | 2.5628  |  10.6738  |   nan    |          nan           |
|      resnet50_quantized_qat       |  32  | 2.4272  |  10.1255  |   nan    |          nan           |
|               moco                |  32  | 30.4064 |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+--------+-----------+----------+------------------------+
|               name                |  bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+--------+-----------+----------+------------------------+
|            Super_SloMo            |  6   | 1.0073 |  0.9035   |  1.3192  |         1.3192         |
|           mobilenet_v2            |  96  | 1.0002 |  0.7663   |  1.1503  |         1.2552         |
|           fastNLP_Bert            |  6   | 1.0002 |  0.9109   |  1.1054  |         1.2176         |
|             hf_Albert             |  8   |  1.0   |  0.9487   |  1.0049  |         1.1711         |
|              hf_Bert              |  4   |  1.0   |   0.892   |  0.9854  |         0.9889         |
|         timm_efficientnet         |  32  | 1.0014 |  0.7903   |  0.9844  |         1.0585         |
|            tts_angular            |  64  |  1.0   |    1.0    |  0.9819  |          1.0           |
|        shufflenet_v2_x1_0         | 128  |  1.0   |  0.9152   |  0.9673  |         1.0646         |
|           hf_Bert_large           |  4   |  1.0   |  0.8872   |  0.9556  |         1.0278         |
|               dlrm                | 1024 |  1.0   |  0.9945   |  0.9522  |         1.001          |
|              yolov3               |  16  | 0.9999 |  0.8557   |  0.9276  |         1.1038         |
|           BERT_pytorch            |  16  |  1.0   |  0.8854   |  0.913   |         1.1114         |
|        Background_Matting         |  4   | 1.0027 |  0.8166   |  0.9124  |         1.0422         |
|              hf_GPT2              |  4   |  1.0   |  0.8882   |  0.9095  |         1.1129         |
|          pytorch_stargan          |  16  |  1.0   |  1.0123   |  0.8997  |         1.0715         |
|           timm_resnest            |  32  | 1.0022 |  0.9221   |  0.8975  |         1.0032         |
|             resnet152             |  32  | 1.0002 |  0.9113   |  0.8892  |         0.9666         |
|      timm_vision_transformer      |  32  | 1.0001 |  0.9359   |  0.8846  |         0.9267         |
|               hf_T5               |  8   |  1.0   |  0.9409   |  0.8774  |         1.1711         |
|            timm_nfnet             | 128  | 0.9114 |  0.8889   |  0.8734  |         1.1331         |
|            timm_regnet            |  32  | 1.0004 |   0.866   |  0.8628  |         0.982          |
|         phlippe_densenet          | 128  |  1.0   |  0.9031   |  0.8562  |         0.9199         |
|           pytorch_unet            |  1   | 1.0005 |  0.8208   |  0.8501  |         0.9923         |
|        mobilenet_v3_large         |  32  |  1.0   |  0.8899   |  0.8424  |         0.9276         |
|             resnet50              |  32  | 1.0004 |  0.8706   |  0.8404  |         0.9405         |
|        speech_transformer         |  32  | 0.9961 |  0.9115   |   0.84   |         0.844          |
|              alexnet              | 128  | 1.0003 |   0.877   |  0.8346  |         1.0006         |
|           hf_DistilBert           |  8   |  1.0   |   0.899   |  0.8317  |         0.9835         |
|               dcgan               |  32  |  1.0   |  0.8428   |  0.8287  |         0.9932         |
|          resnext50_32x4d          |  8   | 0.999  |   0.888   |  0.8236  |         0.919          |
|            hf_T5_large            |  2   |  1.0   |  0.8482   |  0.8219  |         1.1293         |
|           squeezenet1_1           |  32  | 0.9994 |  0.8302   |  0.8131  |         0.9868         |
|            mnasnet1_0             |  32  | 1.0021 |  0.9062   |  0.8117  |         0.8757         |
|              hf_Bart              |  4   |  1.0   |  0.8676   |  0.794   |         1.0392         |
| attention_is_all_you_need_pytorch | 256  | 1.0021 |  0.9238   |  0.7913  |         0.9488         |
|              demucs               |  4   | 0.9981 |  0.9982   |  0.7557  |         0.9979         |
|            timm_vovnet            |  32  | 1.0014 |  0.7568   |  0.7515  |         0.952          |
|          pytorch_struct           | 200  |  1.0   |  0.5108   |  0.726   |         0.7355         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  1.0   |  0.9023   |  0.6919  |         0.7115         |
|            hf_BigBird             |  2   | 0.9886 |  0.9851   |  0.6819  |         1.0398         |
|               vgg16               |  64  | 0.9999 |  0.6744   |  0.6712  |         0.9999         |
|      nvidia_deeprecommender       | 256  | 1.0002 |  0.8886   |  0.6651  |         0.9844         |
|                drq                |  1   |  1.0   |   0.98    |  0.6493  |         0.9801         |
|            densenet121            |  4   | 1.0027 |  0.7954   |  0.647   |         0.7937         |
|          LearningToPaint          |  96  | 0.9989 |  0.7184   |  0.6394  |         0.8274         |
|             resnet18              |  16  | 0.9996 |  0.8022   |  0.6283  |         0.6981         |
|         soft_actor_critic         | 256  | 0.9999 |  0.9689   |  0.6192  |         0.9997         |
|           lennard_jones           | 1000 |  1.0   |    1.0    |  0.5322  |          1.0           |
|          phlippe_resnet           | 128  |  1.0   |  0.8597   |  0.4452  |         0.4791         |
|       functorch_dp_cifar10        |  64  |  1.0   |  0.9209   |  0.3989  |         0.423          |
|            hf_Reformer            |  4   | 0.7852 |  0.7852   |  0.3861  |         0.7865         |
|           hf_GPT2_large           |  4   |  1.0   |  0.8611   |   nan    |         1.1216         |
|           hf_Longformer           |  2   | 0.9991 |  0.9645   |   nan    |          nan           |
|      resnet50_quantized_qat       |  32  | 1.0003 |  0.9473   |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  96  | 1.0002 |  0.8329   |   nan    |          nan           |
|               moco                |  32  | 1.0125 |    nan    |   nan    |          nan           |
|                gat                |  0   |  nan   |    nan    |   nan    |          nan           |
|                gcn                |  0   |  nan   |    nan    |   nan    |          nan           |
|               sage                |  0   |  nan   |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |  nan   |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |  nan   |    nan    |   nan    |          nan           |
+-----------------------------------+------+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|        Background_Matting         |  4   | 183.5563 | 976.4087  | 171.3196 |        172.2008        |
|            timm_nfnet             | 128  | 194.864  | 195.9181  | 135.1297 |        140.6248        |
|            hf_T5_large            |  2   | 198.1343 | 227.2955  | 134.8755 |         131.7          |
|               hf_T5               |  8   | 184.5011 | 223.9152  | 133.1855 |        131.5264        |
|            Super_SloMo            |  6   | 118.7721 | 484.8472  | 104.1504 |        104.1616        |
|               vgg16               |  64  | 107.124  |  107.332  | 92.8585  |        92.3837         |
|              yolov3               |  16  | 99.6302  | 116.8732  | 87.0073  |        86.6359         |
|            timm_regnet            |  32  | 96.4354  | 105.4046  | 86.8946  |        87.7459         |
|            hf_BigBird             |  2   | 193.3227 | 232.3756  | 82.4634  |        117.5966        |
|           hf_Bert_large           |  4   | 94.3999  |  95.8017  | 80.1398  |        80.4483         |
|             resnet152             |  32  |  83.352  | 101.1892  | 78.5245  |        79.6654         |
|            hf_Reformer            |  4   | 82.5132  |  82.6397  | 74.1087  |        74.6232         |
|              demucs               |  4   | 75.0165  |  74.9495  | 72.8915  |        72.7964         |
| attention_is_all_you_need_pytorch | 256  |  72.808  |  75.0754  | 67.9447  |        67.7891         |
|           mobilenet_v2            |  96  | 69.5288  |  82.273   | 51.5659  |        51.3613         |
|           pytorch_unet            |  1   | 58.4208  | 212.9134  | 49.5746  |         49.592         |
|              hf_Bart              |  4   | 54.9569  |  83.361   | 46.4361  |        45.5966         |
|             hf_Albert             |  8   | 76.0351  |  76.115   | 45.9599  |        45.9271         |
|           fastNLP_Bert            |  6   | 60.5081  |  61.9936  | 42.7522  |        43.2022         |
|            timm_vovnet            |  32  |  42.175  |  45.7721  | 39.7809  |        38.6882         |
|              hf_GPT2              |  4   | 51.0283  |  52.3553  | 37.0157  |        35.9351         |
|        speech_transformer         |  32  | 53.7805  |  56.5039  | 36.6872  |        37.2397         |
|           hf_DistilBert           |  8   | 40.1388  |  42.587   | 34.2391  |        33.4453         |
|             resnet50              |  32  | 38.3426  |  43.4971  | 34.0744  |        33.8635         |
|              hf_Bert              |  4   | 39.3833  |  42.5515  | 33.7546  |        34.0765         |
|         timm_efficientnet         |  32  | 38.7203  |  57.9944  |  32.23   |        33.6749         |
|      timm_vision_transformer      |  32  | 30.1458  |  30.2235  | 26.5782  |        26.8362         |
|        shufflenet_v2_x1_0         | 128  | 34.0006  |  43.8624  | 24.7304  |        27.5297         |
|           BERT_pytorch            |  16  | 55.4602  |  62.765   | 24.5789  |        24.3929         |
|           timm_resnest            |  32  | 31.1804  |  35.2712  | 22.6863  |        23.0978         |
|            densenet121            |  4   | 56.9023  |  66.2191  | 21.3421  |        46.2307         |
|            mnasnet1_0             |  32  | 26.2314  |  33.4751  | 20.6817  |        25.2216         |
|          pytorch_stargan          |  16  | 23.7977  |  24.6021  | 19.5231  |        19.7364         |
|        mobilenet_v3_large         |  32  |  27.518  |  32.2362  | 17.6835  |        23.7602         |
|          resnext50_32x4d          |  8   | 20.4098  |  28.2847  |  15.448  |        22.2046         |
|         phlippe_densenet          | 128  | 25.2362  |  30.8328  | 13.9288  |        23.1727         |
|          LearningToPaint          |  96  | 13.7963  |  16.1152  | 12.1798  |        12.7358         |
|              alexnet              | 128  | 12.2762  |  12.2953  |  11.437  |        11.0193         |
|      nvidia_deeprecommender       | 256  |  8.5635  |  8.8596   | 10.7008  |         8.8471         |
|            tts_angular            |  64  |  9.5202  |  9.6973   | 10.1497  |         9.3571         |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 12.6895  |  13.4687  |  8.0956  |         8.438          |
|             resnet18              |  16  |  9.3312  |  12.102   |  7.306   |         9.184          |
|           squeezenet1_1           |  32  |  9.4014  |  9.9142   |  7.0637  |         7.7263         |
|          phlippe_resnet           | 128  |  8.1671  |  10.7136  |  5.3025  |         7.7777         |
|       functorch_dp_cifar10        |  64  |  7.9513  |  7.7878   |  3.2445  |         6.6713         |
|          pytorch_struct           | 200  |  3.9275  |  5.0782   |  2.7707  |         3.5213         |
|                drq                |  1   |  2.2941  |  3.0523   |  2.6294  |         2.6745         |
|               dlrm                | 1024 |  3.9623  |  3.9053   |  2.2843  |         3.1055         |
|               dcgan               |  32  |  1.9172  |  2.4214   |  1.5442  |         2.139          |
|         soft_actor_critic         | 256  |  1.3241  |  1.7093   |  1.1223  |         1.4966         |
|           lennard_jones           | 1000 |  1.2416  |  1.4443   |  0.9161  |         1.324          |
|           hf_GPT2_large           |  4   | 243.5783 | 249.4878  |   nan    |        173.7771        |
|           hf_Longformer           |  2   | 138.4195 | 227.2282  |   nan    |          nan           |
|    mobilenet_v2_quantized_qat     |  96  | 143.9017 | 173.1822  |   nan    |          nan           |
|      resnet50_quantized_qat       |  32  | 89.0956  | 108.4712  |   nan    |          nan           |
|               moco                |  32  | 64.7334  |    nan    |   nan    |          nan           |
|                gat                |  0   |   nan    |    nan    |   nan    |          nan           |
|                gcn                |  0   |   nan    |    nan    |   nan    |          nan           |
|               sage                |  0   |   nan    |    nan    |   nan    |          nan           |
|             tacotron2             |  0   |   nan    |    nan    |   nan    |          nan           |
|           torchrec_dlrm           |  0   |   nan    |    nan    |   nan    |          nan           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

huggingface suite with float32 precision

Performance speedup

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|             OPTForCausalLM              |  2  | 0.9951 |  0.9287   |  1.7924  |         1.8314         |
|      GPT2ForSequenceClassification      |  4  | 0.9882 |   0.962   |  1.6283  |         1.6549         |
|            XLNetLMHeadModel             |  8  | 0.9977 |  0.9647   |  1.6185  |         1.6244         |
|          MobileBertForMaskedLM          | 64  | 0.9297 |  0.8159   |  1.6089  |         1.2759         |
|               GoogleFnet                | 16  | 0.9866 |  0.9521   |  1.5586  |         1.5449         |
|       MT5ForConditionalGeneration       | 16  | 0.9955 |   0.886   |  1.4886  |         1.4741         |
|           ElectraForCausalLM            | 32  | 0.9871 |  0.9259   |  1.4355  |         1.4274         |
|       ElectraForQuestionAnswering       | 64  | 0.9896 |  0.9805   |  1.4239  |         1.4119         |
|               DistillGPT2               | 16  | 0.9944 |  0.9386   |  1.3715  |         1.4236         |
|             XGLMForCausalLM             |  8  | 1.0018 |  0.9453   |  1.3505  |         1.3069         |
|    LayoutLMForSequenceClassification    | 16  | 0.9882 |  0.9772   |  1.2796  |         1.2844         |
|       RobertaForQuestionAnswering       | 16  | 0.9881 |  0.9761   |  1.2709  |         1.2635         |
|        BertForQuestionAnswering         | 16  | 0.9881 |  0.9761   |  1.2694  |         1.2621         |
|           RobertaForCausalLM            | 16  | 0.9905 |  0.9606   |  1.2672  |         1.2605         |
|       AlbertForQuestionAnswering        |  4  | 0.9996 |  1.0013   |  1.2531  |         1.2525         |
|            AlbertForMaskedLM            |  4  | 1.0008 |  0.9991   |  1.2504  |         1.248          |
|       T5ForConditionalGeneration        |  4  | 0.9853 |  0.8135   |  1.2391  |         1.3141         |
|                 T5Small                 |  4  | 0.9849 |  0.8164   |  1.2368  |         1.3147         |
|            PLBartForCausalLM            |  8  | 0.9959 |  0.9501   |  1.2293  |         1.2584         |
|     PLBartForConditionalGeneration      |  4  | 0.9954 |  0.9545   |  1.2248  |         1.2388         |
|     MobileBertForQuestionAnswering      | 128 | 0.9388 |  0.8869   |  1.2088  |         1.3298         |
|            YituTechConvBert             | 16  | 0.9902 |  0.9596   |  1.2023  |         1.1982         |
|                CamemBert                | 16  |  0.99  |  0.9584   |  1.1903  |         1.188          |
|             BertForMaskedLM             | 16  |  0.99  |  0.9583   |  1.1882  |         1.1886         |
|    MegatronBertForQuestionAnswering     |  8  | 0.985  |  0.9705   |  1.1852  |         1.2016         |
|           LayoutLMForMaskedLM           | 16  | 0.9905 |  0.9597   |  1.1844  |         1.2016         |
|     DistilBertForQuestionAnswering      | 256 | 0.998  |  0.9919   |  1.1479  |         1.1475         |
|         Speech2Text2ForCausalLM         | 256 | 0.9949 |  0.9142   |  1.1457  |          1.18          |
|             BartForCausalLM             |  4  | 0.9894 |  0.9552   |  1.1388  |         1.1687         |
|            MBartForCausalLM             |  4  | 0.9957 |  0.9552   |  1.1336  |         1.1575         |
|         MegatronBertForCausalLM         |  4  | 0.9755 |   0.963   |  1.1307  |         1.1548         |
|      MBartForConditionalGeneration      |  2  | 0.9915 |  0.9769   |  1.0933  |         1.1074         |
|      BartForConditionalGeneration       |  2  | 0.9909 |  0.9745   |  1.0908  |         1.1125         |
| BlenderbotSmallForConditionalGeneration | 64  | 0.9927 |  0.9307   |  1.067   |         1.0876         |
|       DebertaForQuestionAnswering       |  8  | 0.8483 |  0.7994   |  1.0467  |         1.0367         |
|     M2M100ForConditionalGeneration      | 16  | 0.9955 |  0.9589   |  1.0401  |         1.096          |
|          DistilBertForMaskedLM          | 128 | 0.9963 |  0.9483   |  1.0303  |         1.0545         |
|            TrOCRForCausalLM             | 32  | 0.9956 |  0.9481   |  1.0262  |         1.0554         |
|     PegasusForConditionalGeneration     | 32  | 0.9917 |  0.9611   |  1.0146  |         1.0469         |
|           PegasusForCausalLM            | 32  | 0.9913 |  0.9384   |  0.9987  |         1.0299         |
|       BlenderbotSmallForCausalLM        | 64  | 0.9936 |  0.9001   |  0.9902  |         1.041          |
|           DebertaForMaskedLM            |  4  | 0.7215 |   0.593   |  0.8603  |         0.8475         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.7042 |   0.635   |  0.7474  |         0.7018         |
|          DebertaV2ForMaskedLM           |  1  | 0.6643 |  0.5353   |  0.7285  |         0.6235         |
|          BlenderbotForCausalLM          |  4  | 0.9966 |  0.9254   |   0.0    |         1.0243         |
|          AllenaiLongformerBase          |  4  | 0.9965 |  0.5955   |   0.0    |          0.0           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|               GoogleFnet                | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   fail_to_run    |          pass          |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|                  name                   | bs  |  eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+
|       MT5ForConditionalGeneration       | 16  | 6.0306  |  14.9876  | 123.6133 |        124.5428        |
|          DebertaV2ForMaskedLM           |  1  | 11.0895 |  20.1937  | 122.3893 |        56.2815         |
|      DebertaV2ForQuestionAnswering      |  2  | 11.0277 |  19.613   | 120.5788 |        54.1742         |
|          MobileBertForMaskedLM          | 64  | 14.9969 |  31.3009  | 118.6652 |        116.5587        |
|     M2M100ForConditionalGeneration      | 16  | 7.3935  |  18.2064  |  112.9   |        111.2294        |
|     MobileBertForQuestionAnswering      | 128 | 15.1632 |  30.7802  | 111.0429 |        111.8068        |
|             XGLMForCausalLM             |  8  | 5.9312  |  14.3923  | 107.2883 |        107.2729        |
|       DebertaForQuestionAnswering       |  8  | 5.8017  |  10.9267  | 82.5257  |        44.9883         |
|            XLNetLMHeadModel             |  8  | 7.1862  |  21.3912  | 81.8546  |        80.7136         |
|           DebertaForMaskedLM            |  4  | 5.8912  |  11.5427  | 75.1533  |        43.9535         |
|      MBartForConditionalGeneration      |  2  | 7.5776  |  18.0258  | 62.0776  |        60.3802         |
|      BartForConditionalGeneration       |  2  | 7.4367  |  18.0763  | 59.3816  |        58.8567         |
|     PegasusForConditionalGeneration     | 32  | 4.3856  |  14.7653  | 58.6293  |        58.4029         |
|            YituTechConvBert             | 16  | 4.8974  |  11.2509  | 54.7476  |        54.8362         |
|         MegatronBertForCausalLM         |  4  | 6.9317  |  15.3855  | 54.7446  |        53.8561         |
|    MegatronBertForQuestionAnswering     |  8  | 6.8451  |  15.2455  |  53.911  |        53.2993         |
| BlenderbotSmallForConditionalGeneration | 64  | 5.0077  |  11.7142  | 45.1028  |        43.5075         |
|       T5ForConditionalGeneration        |  4  | 4.0546  |  10.1857  | 44.8251  |         43.684         |
|                 T5Small                 |  4  | 4.0669  |  10.201   | 44.6733  |        44.2553         |
|           ElectraForCausalLM            | 32  |  3.49   |  7.6327   | 43.5182  |        42.8592         |
|    LayoutLMForSequenceClassification    | 16  | 3.5895  |  7.8777   | 39.8379  |        40.2257         |
|     PLBartForConditionalGeneration      |  4  | 3.7711  |   9.302   | 39.3145  |        39.2942         |
|       ElectraForQuestionAnswering       | 64  | 3.4177  |  7.5813   | 38.1008  |        35.9447         |
|        BertForQuestionAnswering         | 16  | 3.4449  |  7.5345   | 35.3922  |        32.0624         |
|           LayoutLMForMaskedLM           | 16  |  3.666  |  7.9313   | 34.5733  |        33.0403         |
|             BertForMaskedLM             | 16  | 3.4448  |  7.5144   | 33.4232  |        32.3938         |
|            AlbertForMaskedLM            |  4  | 1.7264  |  5.8716   | 33.3856  |        32.6207         |
|            MBartForCausalLM             |  4  | 3.1934  |  7.2127   | 33.2318  |        33.3187         |
|     DistilBertForQuestionAnswering      | 256 | 1.6929  |  3.7184   | 32.9286  |        33.0869         |
|           PegasusForCausalLM            | 32  | 3.0876  |  7.1433   | 32.3449  |        31.9244         |
|             BartForCausalLM             |  4  | 3.1263  |  7.2331   | 31.3449  |        30.4841         |
|             OPTForCausalLM              |  2  | 2.9414  |  7.0093   | 30.9589  |        30.6932         |
|                CamemBert                | 16  | 3.4923  |  7.6519   | 30.7026  |        30.5741         |
|      GPT2ForSequenceClassification      |  4  |   3.3   |  7.1728   | 30.6641  |        29.6677         |
|          DistilBertForMaskedLM          | 128 | 1.6488  |  3.7467   | 30.5608  |        29.8045         |
|           RobertaForCausalLM            | 16  | 3.4806  |   7.56    | 30.4575  |        30.9289         |
|       AlbertForQuestionAnswering        |  4  | 1.7361  |  5.8229   | 29.9435  |        29.1805         |
|       RobertaForQuestionAnswering       | 16  | 3.4405  |  7.4946   | 29.8235  |        29.7004         |
|            TrOCRForCausalLM             | 32  | 3.1085  |  7.1161   | 29.7668  |        28.9086         |
|               GoogleFnet                | 16  | 1.9636  |  3.9306   | 27.9837  |        26.8862         |
|               DistillGPT2               | 16  | 1.8164  |  3.8511   | 25.6592  |        25.3368         |
|       BlenderbotSmallForCausalLM        | 64  | 2.1887  |  4.7941   | 23.9729  |        23.4394         |
|            PLBartForCausalLM            |  8  | 1.7353  |  3.8075   | 22.4415  |        21.8727         |
|         Speech2Text2ForCausalLM         | 256 | 1.7042  |  3.8062   | 20.8635  |        22.2097         |
|          BlenderbotForCausalLM          |  4  | 6.2057  |  14.3338  |   nan    |        53.9075         |
|          AllenaiLongformerBase          |  4  | 7.3138  |  27.7093  |   nan    |          nan           |
+-----------------------------------------+-----+---------+-----------+----------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|                  name                   | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+
|               GoogleFnet                | 16  |  1.0   |  0.9205   |  1.1428  |         1.1437         |
|            XLNetLMHeadModel             |  8  |  1.0   |  0.9738   |  1.0737  |         1.0737         |
|       ElectraForQuestionAnswering       | 64  |  1.0   |  0.9522   |  1.0207  |         1.0758         |
|      GPT2ForSequenceClassification      |  4  |  1.0   |  0.8955   |  1.0169  |         1.1459         |
|             OPTForCausalLM              |  2  | 0.9999 |  0.9236   |  1.0142  |         1.094          |
|       RobertaForQuestionAnswering       | 16  |  1.0   |  0.9325   |  0.9949  |         1.0711         |
|        BertForQuestionAnswering         | 16  |  1.0   |  0.9325   |  0.9949  |         1.0711         |
|    LayoutLMForSequenceClassification    | 16  | 1.0001 |  0.9327   |  0.994   |         1.0557         |
|             BertForMaskedLM             | 16  |  1.0   |  0.9392   |  0.9843  |         0.9848         |
|           RobertaForCausalLM            | 16  |  1.0   |  0.9389   |  0.9841  |         0.9847         |
|                CamemBert                | 16  |  1.0   |  0.9372   |  0.9815  |         0.982          |
|            YituTechConvBert             | 16  |  1.0   |  0.9351   |  0.9445  |         0.945          |
|     DistilBertForQuestionAnswering      | 256 |  1.0   |  0.9594   |  0.9362  |         1.0349         |
|           LayoutLMForMaskedLM           | 16  |  1.0   |  0.9393   |  0.9249  |         0.9848         |
|                 T5Small                 |  4  |  1.0   |  0.9589   |  0.9202  |         1.0871         |
|       T5ForConditionalGeneration        |  4  |  1.0   |  0.9589   |  0.9202  |         1.0871         |
|    MegatronBertForQuestionAnswering     |  8  |  1.0   |  0.9167   |  0.915   |         1.063          |
|           ElectraForCausalLM            | 32  |  1.0   |  0.8827   |  0.9094  |         0.9099         |
|     PLBartForConditionalGeneration      |  4  | 0.9999 |  0.9321   |  0.9018  |         0.9919         |
|               DistillGPT2               | 16  |  1.0   |  0.8755   |  0.8492  |         0.9667         |
|         MegatronBertForCausalLM         |  4  |  1.0   |  0.8909   |  0.8374  |         1.0256         |
|            MBartForCausalLM             |  4  |  1.0   |  0.9069   |  0.8347  |         0.9142         |
|             BartForCausalLM             |  4  |  1.0   |  0.9067   |  0.8345  |         0.9141         |
|            PLBartForCausalLM            |  8  |  1.0   |  0.8876   |  0.8267  |         0.9244         |
|      MBartForConditionalGeneration      |  2  |  1.0   |   0.882   |  0.8239  |         0.9794         |
| BlenderbotSmallForConditionalGeneration | 64  |  1.0   |  0.8954   |  0.8181  |         0.9055         |
|     PegasusForConditionalGeneration     | 32  |  1.0   |  0.9169   |  0.8166  |         0.9645         |
|      BartForConditionalGeneration       |  2  |  1.0   |  0.8824   |  0.8137  |         0.9794         |
|          DistilBertForMaskedLM          | 128 |  1.0   |   0.883   |  0.7997  |         0.8955         |
|           PegasusForCausalLM            | 32  |  1.0   |  0.8813   |  0.7939  |         0.8904         |
|       MT5ForConditionalGeneration       | 16  | 1.0006 |   0.869   |  0.7921  |         0.9119         |
|            TrOCRForCausalLM             | 32  |  1.0   |  0.8737   |  0.7827  |         0.8833         |
|       AlbertForQuestionAnswering        |  4  |  1.0   |  0.9399   |  0.7775  |         1.2465         |
|            AlbertForMaskedLM            |  4  |  1.0   |  0.9222   |  0.7684  |         1.2096         |
|     M2M100ForConditionalGeneration      | 16  |  1.0   |   0.843   |  0.7498  |         0.9016         |
|       BlenderbotSmallForCausalLM        | 64  |  1.0   |  0.8375   |  0.7233  |         0.8426         |
|         Speech2Text2ForCausalLM         | 256 |  1.0   |  0.8419   |  0.7096  |         0.8199         |
|             XGLMForCausalLM             |  8  |  1.0   |   0.818   |  0.7095  |         0.9136         |
|          MobileBertForMaskedLM          | 64  |  1.0   |  0.8258   |  0.5643  |         0.736          |
|          DebertaV2ForMaskedLM           |  1  | 0.9877 |  0.9876   |  0.5506  |         0.988          |
|           DebertaForMaskedLM            |  4  | 0.9751 |  0.9598   |  0.5429  |         1.0107         |
|     MobileBertForQuestionAnswering      | 128 |  1.0   |  0.9908   |  0.4677  |         0.5574         |
|       DebertaForQuestionAnswering       |  8  | 0.9614 |  1.0317   |  0.4577  |         1.1413         |
|      DebertaV2ForQuestionAnswering      |  2  | 0.9762 |  0.9724   |  0.4556  |         0.9759         |
|          BlenderbotForCausalLM          |  4  | 1.0005 |  1.0017   |   nan    |         1.001          |
|          AllenaiLongformerBase          |  4  | 0.9986 |  0.9301   |   nan    |          nan           |
+-----------------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|            AlbertForMaskedLM            |  4  | 389.0115 | 389.2913  | 310.855  |        312.6066        |
|       AlbertForQuestionAnswering        |  4  | 386.6616 | 385.6037  | 308.6096 |        309.324         |
|            XLNetLMHeadModel             |  8  | 378.8146 | 391.4038  | 233.8937 |        232.714         |
|     PegasusForConditionalGeneration     | 32  | 173.7936 | 179.6325  | 171.3355 |        166.1784        |
|            TrOCRForCausalLM             | 32  | 169.986  | 176.7528  | 164.9937 |        160.7474        |
|    MegatronBertForQuestionAnswering     |  8  | 175.0354 | 177.6833  | 146.1871 |        143.9478        |
|      DebertaV2ForQuestionAnswering      |  2  | 150.0347 | 166.2364  |  141.55  |        151.3088        |
|      MBartForConditionalGeneration      |  2  | 149.9696 | 152.0281  | 136.274  |        135.587         |
|      BartForConditionalGeneration       |  2  | 150.516  | 154.1089  | 135.9758 |        133.4038        |
|            YituTechConvBert             | 16  | 156.8592 | 161.6919  | 129.2601 |        129.6307        |
|     DistilBertForQuestionAnswering      | 256 |  145.52  | 146.5274  | 127.2631 |        127.1935        |
|          DebertaV2ForMaskedLM           |  1  | 133.0088 | 167.0629  | 123.1884 |        141.8007        |
|          DistilBertForMaskedLM          | 128 | 122.8925 | 129.7185  | 119.6985 |        116.5623        |
|           LayoutLMForMaskedLM           | 16  | 138.3723 |  142.863  | 116.1603 |        114.3909        |
|     MobileBertForQuestionAnswering      | 128 | 150.8937 | 158.9379  | 115.2885 |        122.3066        |
|                CamemBert                | 16  | 137.3197 | 141.6983  | 114.6608 |        114.6942        |
|             BertForMaskedLM             | 16  | 135.9518 | 140.3558  | 113.6014 |        113.5406        |
|           RobertaForCausalLM            | 16  | 144.2961 | 148.6017  | 113.1714 |        113.6962        |
| BlenderbotSmallForConditionalGeneration | 64  | 120.1667 |  129.791  | 112.0214 |        109.6656        |
|     M2M100ForConditionalGeneration      | 16  | 119.428  | 122.0898  | 111.9917 |        109.8338        |
|            MBartForCausalLM             |  4  | 123.8205 | 129.0186  | 109.3868 |        107.3908        |
|             BartForCausalLM             |  4  | 125.0899 | 129.0296  | 108.9578 |        106.2221        |
|     PLBartForConditionalGeneration      |  4  | 121.8425 | 126.4535  | 100.5217 |        97.4683         |
|            PLBartForCausalLM            |  8  | 120.6186 | 126.9439  | 97.6729  |         93.818         |
|          MobileBertForMaskedLM          | 64  | 131.454  | 153.5142  | 95.4707  |        111.4202        |
|             OPTForCausalLM              |  2  | 173.7386 | 184.1215  | 95.4297  |        93.1427         |
|         MegatronBertForCausalLM         |  4  | 104.076  | 105.1513  | 90.5093  |        88.4321         |
|    LayoutLMForSequenceClassification    | 16  |  114.87  |  116.26   | 89.1693  |        88.8637         |
|       ElectraForQuestionAnswering       | 64  | 126.3775 | 127.5094  | 88.6793  |        88.5631         |
|        BertForQuestionAnswering         | 16  | 112.4422 | 113.6181  | 88.5957  |        88.4051         |
|       RobertaForQuestionAnswering       | 16  | 112.6914 | 114.0163  | 88.1282  |        88.5579         |
|               DistillGPT2               | 16  | 121.1643 | 128.8744  | 87.9638  |        84.6725         |
|           PegasusForCausalLM            | 32  | 85.9548  |   90.78   |  86.953  |         83.362         |
|       T5ForConditionalGeneration        |  4  | 105.7331 | 128.1551  | 84.5827  |        79.4515         |
|                 T5Small                 |  4  | 105.898  |  128.076  | 84.5623  |        79.4429         |
|       DebertaForQuestionAnswering       |  8  | 97.7999  | 102.5552  | 79.4854  |        79.0613         |
|           ElectraForCausalLM            | 32  | 107.3826 | 114.4564  | 73.9921  |        74.2814         |
|             XGLMForCausalLM             |  8  |  81.788  |  85.2463  | 71.0978  |         73.876         |
|           DebertaForMaskedLM            |  4  | 84.2589  | 106.5771  | 70.5148  |        73.9413         |
|               GoogleFnet                | 16  | 103.2382 | 106.9232  | 65.9896  |        66.0227         |
|       BlenderbotSmallForCausalLM        | 64  | 64.6973  |  72.4413  | 65.0582  |        62.1556         |
|      GPT2ForSequenceClassification      |  4  | 103.6259 | 106.3697  | 62.8925  |        61.8215         |
|       MT5ForConditionalGeneration       | 16  | 88.3505  | 100.0412  | 59.4994  |        59.9986         |
|         Speech2Text2ForCausalLM         | 256 | 63.8424  |  69.6142  | 56.6166  |        54.0307         |
|          BlenderbotForCausalLM          |  4  | 102.5556 | 110.4037  |   nan    |        94.2818         |
|          AllenaiLongformerBase          |  4  | 249.5502 | 417.8179  |   nan    |          nan           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

timm_models suite with float32 precision

Performance speedup

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          ghostnet_100           | 128 | 0.9933 |  0.7712   |  1.7307  |         1.7072         |
|        tnt_s_patch16_224        | 128 | 0.9988 |  0.9982   |   1.71   |         1.704          |
|            lcnet_050            | 128 | 0.9516 |  0.7499   |  1.5224  |         1.5354         |
|         coat_lite_mini          | 128 | 0.9983 |  0.9983   |  1.4452  |         1.3993         |
|           convit_base           | 64  | 0.9981 |  0.9969   |  1.4426  |         1.443          |
|           dm_nfnet_f0           | 128 | 0.9912 |  0.9893   |  1.4254  |         1.3935         |
|            nfnet_l0             | 128 | 0.9933 |  0.7786   |  1.4172  |         1.3946         |
|          gmlp_s16_224           | 128 | 0.9949 |  1.0696   |  1.3669  |         1.3665         |
|      xcit_large_24_p8_224       |  5  | 0.9944 |  0.9545   |  1.3646  |         1.2864         |
|          gmixer_24_224          | 128 | 0.9961 |  0.8277   |  1.3569  |         1.3554         |
|           volo_d1_224           | 64  | 0.9954 |  0.9714   |  1.3563  |         1.3468         |
|            hrnet_w18            | 128 | 0.9947 |  0.7018   |  1.3469  |         1.3173         |
|             dla102              | 128 | 0.997  |  0.8733   |  1.3469  |         1.3466         |
|         crossvit_9_240          | 128 | 0.9912 |  0.8044   |  1.3385  |         1.3293         |
|        sebotnet33ts_256         | 64  | 0.9707 |  0.7635   |  1.3311  |         1.3487         |
|        adv_inception_v3         | 128 | 0.9973 |  0.8947   |  1.3083  |         1.3036         |
|          inception_v3           | 128 | 0.9974 |  0.8949   |  1.3075  |         1.3043         |
|       gluon_inception_v3        | 128 | 0.9974 |  0.8923   |  1.3073  |         1.3043         |
|        res2net50_14w_8s         | 128 | 0.9992 |  0.8676   |  1.3059  |         1.298          |
|         mobilenetv2_100         | 128 | 0.9654 |  0.8228   |  1.2848  |         1.3199         |
|      mobilenetv3_large_100      | 128 | 0.9637 |  0.8127   |  1.2834  |         1.3128         |
|        twins_pcpvt_base         | 64  | 0.9919 |  0.9876   |  1.2576  |         1.2314         |
|       tf_efficientnet_b0        | 128 | 0.9745 |  0.7141   |  1.2569  |         1.2752         |
|          botnet26t_256          | 128 | 0.9814 |  0.8942   |  1.2548  |         1.2648         |
|       eca_botnext26ts_256       | 128 | 0.9856 |  0.7508   |  1.244   |         1.2311         |
|            fbnetv3_b            | 128 | 0.9629 |  0.8267   |  1.2345  |         1.2668         |
|           resnest101e           | 64  | 0.9967 |   0.922   |  1.2341  |         1.2042         |
|        ese_vovnet19b_dw         | 128 | 0.9736 |  0.8891   |  1.2197  |         1.2312         |
|           mnasnet_100           | 128 | 0.9651 |  0.8086   |  1.2169  |         1.2509         |
|           rexnet_100            | 128 | 0.9684 |  0.7308   |  1.2136  |         1.232          |
|           selecsls42b           | 128 | 0.999  |  0.8635   |  1.2059  |         1.1993         |
|           fbnetc_100            | 128 | 0.966  |   0.821   |  1.205   |         1.2343         |
|           mobilevit_s           | 64  | 0.9732 |  0.7151   |  1.2046  |         1.1926         |
|           regnety_002           | 128 | 0.9266 |   0.775   |  1.2044  |         1.2023         |
|           res2next50            | 128 | 0.9992 |  0.9119   |  1.1955  |         1.1763         |
|          jx_nest_base           | 32  | 0.9888 |  0.9828   |  1.1901  |         1.1858         |
|          cait_m36_384           |  4  | 0.9957 |  0.9958   |  1.189   |         1.1915         |
|            pit_b_224            | 64  | 0.9963 |  0.9941   |  1.1869  |         1.1808         |
|          spnasnet_100           | 128 |  0.96  |  0.8037   |  1.1864  |         1.2203         |
|            tinynet_a            | 128 | 0.9638 |  0.7008   |  1.1818  |         1.2082         |
|          cspdarknet53           | 64  | 0.9501 |  0.8277   |  1.1728  |         1.1978         |
|         poolformer_m36          | 64  | 0.9891 |  0.9857   |  1.1708  |         1.1666         |
|  swin_base_patch4_window7_224   | 64  | 0.9943 |   0.972   |  1.1702  |         1.1689         |
|           tf_mixnet_l           | 128 | 0.9824 |  0.8382   |  1.1605  |         1.1664         |
|             dpn107              | 32  | 0.9589 |  0.9096   |  1.1563  |         1.1879         |
|            mixnet_l             | 128 | 0.9822 |  0.8338   |  1.1492  |         1.1552         |
|          pnasnet5large          | 16  | 0.9906 |  0.9535   |  1.1415  |         1.158          |
|            repvgg_a2            | 128 | 0.961  |   0.873   |  1.1363  |         1.1442         |
|        res2net101_26w_4s        | 64  | 0.9989 |  0.8949   |  1.1226  |         1.1357         |
|          mixer_b16_224          | 128 | 0.9977 |  0.9997   |  1.1062  |         1.1092         |
|      beit_base_patch16_224      | 64  | 0.9982 |  0.9793   |  1.0974  |         1.1031         |
|          convnext_base          | 64  | 0.9924 |  0.9906   |  1.0909  |         1.0866         |
| deit_base_distilled_patch16_224 | 64  | 0.9977 |  0.9964   |  1.0856  |         1.0854         |
|      vit_base_patch16_224       | 64  | 0.9986 |  0.9966   |  1.0812  |         1.0823         |
|     swsl_resnext101_32x16d      | 32  | 0.9986 |  0.9244   |  1.0667  |         1.0377         |
|        convmixer_768_32         | 32  | 0.9989 |  0.9904   |  1.0592  |         1.061          |
|            gernet_l             | 128 | 0.9644 |  0.8881   |  1.0517  |         1.061          |
|         visformer_small         | 128 | 0.9974 |  0.9528   |  1.0458  |         1.0157         |
|        gluon_xception65         | 32  | 0.9959 |  0.9063   |  1.0272  |         1.0345         |
|          resmlp_12_224          | 128 | 0.9935 |  0.8928   |  0.9303  |         0.9299         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
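The summary at the top of this issue reports a geometric mean speedup per suite. The sketch below shows how such a figure can be computed from per-model speedups like the ones in this table. It makes two assumptions that this issue does not confirm. First, each speedup is a ratio of baseline (eager) latency to compiled latency. Second, models whose result is `nan` are skipped. The benchmark harness itself may handle failed models differently. The helper name `geometric_mean_speedup` is hypothetical.

```python
import math


def geometric_mean_speedup(speedups):
    """Geometric mean of per-model speedup ratios.

    Assumption: NaN entries (failed or missing runs) are skipped rather
    than counted; the real benchmark harness may treat them differently.
    """
    valid = [s for s in speedups if not math.isnan(s)]
    if not valid:
        raise ValueError("no valid speedups to average")
    # Speedups are ratios, so average them in log space: exp(mean(log(x))).
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# Illustrative input: a few inductor speedups copied from the table above.
inductor = {
    "ghostnet_100": 1.7307,
    "tnt_s_patch16_224": 1.71,
    "resmlp_12_224": 0.9303,
}
print(f"{geometric_mean_speedup(inductor.values()):.4f}")
```

A geometric mean is used instead of an arithmetic mean because the values are ratios. A 2x speedup and a 0.5x slowdown then cancel out to 1.0x rather than averaging to 1.25x.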

Accuracy

+---------------------------------+----+-------+-----------+----------+------------------------+
|              name               | bs | eager | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+----+-------+-----------+----------+------------------------+
|        adv_inception_v3         | 8  | pass  |   pass    |   pass   |          pass          |
|      beit_base_patch16_224      | 8  | pass  |   pass    |   pass   |          pass          |
|      mobilenetv3_large_100      | 8  | pass  |   pass    |   pass   |          pass          |
|           mobilevit_s           | 8  | pass  |   pass    |   pass   |          pass          |
|            nfnet_l0             | 8  | pass  |   pass    |   pass   |          pass          |
|            pit_b_224            | 8  | pass  |   pass    |   pass   |          pass          |
|          pnasnet5large          | 8  | pass  |   pass    |   pass   |          pass          |
|         poolformer_m36          | 8  | pass  |   pass    |   pass   |          pass          |
|           regnety_002           | 8  | pass  |   pass    |   pass   |          pass          |
|            repvgg_a2            | 8  | pass  |   pass    |   pass   |          pass          |
|        res2net101_26w_4s        | 8  | pass  |   pass    |   pass   |          pass          |
|        res2net50_14w_8s         | 8  | pass  |   pass    |   pass   |          pass          |
|           res2next50            | 8  | pass  |   pass    |   pass   |          pass          |
|          resmlp_12_224          | 8  | pass  |   pass    |   pass   |          pass          |
|           resnest101e           | 8  | pass  |   pass    |   pass   |          pass          |
|           rexnet_100            | 8  | pass  |   pass    |   pass   |          pass          |
|        sebotnet33ts_256         | 8  | pass  |   pass    |   pass   |          pass          |
|           selecsls42b           | 8  | pass  |   pass    |   pass   |          pass          |
|          spnasnet_100           | 8  | pass  |   pass    |   pass   |          pass          |
|  swin_base_patch4_window7_224   | 8  | pass  |   pass    |   pass   |          pass          |
|     swsl_resnext101_32x16d      | 8  | pass  |   pass    |   pass   |          pass          |
|       tf_efficientnet_b0        | 8  | pass  |   pass    |   pass   |          pass          |
|           tf_mixnet_l           | 8  | pass  |   pass    |   pass   |          pass          |
|            tinynet_a            | 8  | pass  |   pass    |   pass   |          pass          |
|        tnt_s_patch16_224        | 8  | pass  |   pass    |   pass   |          pass          |
|        twins_pcpvt_base         | 8  | pass  |   pass    |   pass   |          pass          |
|         visformer_small         | 8  | pass  |   pass    |   pass   |          pass          |
|      vit_base_patch16_224       | 8  | pass  |   pass    |   pass   |          pass          |
|           volo_d1_224           | 8  | pass  |   pass    |   pass   |          pass          |
|         mobilenetv2_100         | 8  | pass  |   pass    |   pass   |          pass          |
|           mnasnet_100           | 8  | pass  |   pass    |   pass   |          pass          |
|            mixnet_l             | 8  | pass  |   pass    |   pass   |          pass          |
|       eca_botnext26ts_256       | 8  | pass  |   pass    |   pass   |          pass          |
|          botnet26t_256          | 8  | pass  |   pass    |   pass   |          pass          |
|          cait_m36_384           | 4  | pass  |   pass    |   pass   |          pass          |
|         coat_lite_mini          | 8  | pass  |   pass    |   pass   |          pass          |
|           convit_base           | 8  | pass  |   pass    |   pass   |          pass          |
|        convmixer_768_32         | 8  | pass  |   pass    |   pass   |          pass          |
|          convnext_base          | 8  | pass  |   pass    |   pass   |          pass          |
|         crossvit_9_240          | 8  | pass  |   pass    |   pass   |          pass          |
|          cspdarknet53           | 8  | pass  |   pass    |   pass   |          pass          |
| deit_base_distilled_patch16_224 | 8  | pass  |   pass    |   pass   |          pass          |
|             dla102              | 8  | pass  |   pass    |   pass   |          pass          |
|           dm_nfnet_f0           | 8  | pass  |   pass    |   pass   |          pass          |
|             dpn107              | 8  | pass  |   pass    |   pass   |          pass          |
|        ese_vovnet19b_dw         | 8  | pass  |   pass    |   pass   |          pass          |
|          mixer_b16_224          | 8  | pass  |   pass    |   pass   |          pass          |
|           fbnetc_100            | 8  | pass  |   pass    |   pass   |          pass          |
|            fbnetv3_b            | 8  | pass  |   pass    |   pass   |          pass          |
|            gernet_l             | 8  | pass  |   pass    |   pass   |          pass          |
|          ghostnet_100           | 8  | pass  |   pass    |   pass   |          pass          |
|       gluon_inception_v3        | 8  | pass  |   pass    |   pass   |          pass          |
|        gluon_xception65         | 8  | pass  |   pass    |   pass   |          pass          |
|          gmixer_24_224          | 8  | pass  |   pass    |   pass   |          pass          |
|          gmlp_s16_224           | 8  | pass  |   pass    |   pass   |          pass          |
|            hrnet_w18            | 8  | pass  |   pass    |   pass   |          pass          |
|          inception_v3           | 8  | pass  |   pass    |   pass   |          pass          |
|          jx_nest_base           | 8  | pass  |   pass    |   pass   |          pass          |
|            lcnet_050            | 8  | pass  |   pass    |   pass   |          pass          |
|      xcit_large_24_p8_224       | 8  | pass  |   pass    |   pass   |          pass          |
+---------------------------------+----+-------+-----------+----------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 4.0105 |  8.6583   | 276.7602 |        273.9646        |
|            hrnet_w18            | 128 | 8.6475 |  31.4324  | 233.938  |        225.2791        |
|          ghostnet_100           | 128 | 5.5447 |  11.9806  | 231.1308 |        229.9564        |
|           mobilevit_s           | 64  | 3.6748 |  8.5316   | 183.8375 |        178.0214        |
|            fbnetv3_b            | 128 | 6.0199 |  12.8531  | 160.7021 |        157.2946        |
|       gluon_inception_v3        | 128 | 3.8215 |  9.6792   | 153.5986 |        146.5848        |
|          inception_v3           | 128 | 3.8261 |  9.8092   | 152.5033 |        148.1424        |
|      mobilenetv3_large_100      | 128 | 2.979  |  6.4432   | 152.1616 |        143.8063        |
|        adv_inception_v3         | 128 | 3.8456 |  9.9342   | 149.3431 |        152.1385        |
|            tinynet_a            | 128 | 4.0906 |  9.1665   | 148.3809 |        146.7822        |
|       tf_efficientnet_b0        | 128 | 3.5411 |  8.0063   | 146.624  |        143.9997        |
|            mixnet_l             | 128 | 6.2324 |  12.6212  | 146.5976 |        138.3035        |
|          pnasnet5large          | 16  | 7.5545 |  22.6856  | 146.1797 |        147.4728        |
|           resnest101e           | 64  | 7.464  |  18.3107  | 145.8378 |        147.637         |
|           tf_mixnet_l           | 128 | 6.8421 |  13.3153  | 141.5892 |        147.7897        |
|        res2net101_26w_4s        | 64  | 6.8763 |  19.2526  | 133.0005 |        139.8832        |
|           fbnetc_100            | 128 | 3.5554 |  7.5188   | 132.591  |        130.5764        |
|          spnasnet_100           | 128 | 3.5338 |  7.4684   | 129.5098 |        131.7982        |
|         mobilenetv2_100         | 128 | 2.8945 |  6.1392   | 126.203  |        123.3439        |
|        twins_pcpvt_base         | 64  | 6.2758 |  15.7811  | 118.1931 |        116.9325        |
|           mnasnet_100           | 128 | 2.9329 |   5.887   | 113.2946 |        112.9517        |
|        sebotnet33ts_256         | 64  | 3.1461 |  7.1507   | 113.2927 |        110.2737        |
|        res2net50_14w_8s         | 128 | 5.8385 |  17.6618  | 111.7137 |         112.39         |
|      xcit_large_24_p8_224       |  5  | 7.8301 |  20.0694  | 109.2289 |        111.1661        |
|           regnety_002           | 128 | 3.3296 |   6.642   | 101.3164 |        102.7874        |
|  swin_base_patch4_window7_224   | 64  | 6.1619 |  14.7808  | 100.7511 |        102.5781        |
|       eca_botnext26ts_256       | 128 | 2.489  |  5.8238   | 98.2307  |        96.8318         |
|            lcnet_050            | 128 | 1.6455 |  3.8924   | 96.5549  |        95.9596         |
|          cait_m36_384           |  4  | 8.8692 |  21.8164  | 95.3837  |        93.2648         |
|          cspdarknet53           | 64  | 4.2361 |  8.6512   | 93.2061  |        91.5792         |
|             dpn107              | 32  | 7.2081 |  16.1371  | 91.1458  |        88.0163         |
|             dla102              | 128 | 4.0386 |  10.9094  | 89.4054  |        89.3383         |
|           selecsls42b           | 128 | 1.4854 |  4.1342   | 85.9912  |        85.7256         |
|          botnet26t_256          | 128 | 2.3637 |  4.8399   | 85.0983  |        85.4387         |
|         poolformer_m36          | 64  | 5.1194 |  10.2649  | 85.0233  |        83.9623         |
|        gluon_xception65         | 32  | 4.9185 |  12.9028  | 83.8195  |        83.2555         |
|           res2next50            | 128 | 3.2575 |  9.3991   | 81.1746  |        79.9079         |
|            gernet_l             | 128 | 3.7093 |  7.2589   | 78.4188  |        77.4023         |
|         crossvit_9_240          | 128 | 3.7687 |  9.7221   | 77.9705  |        76.6655         |
|         coat_lite_mini          | 128 | 2.2574 |  5.7788   | 76.9479  |        76.6717         |
|            nfnet_l0             | 128 | 3.7411 |  8.4894   | 72.7622  |        69.9893         |
|        ese_vovnet19b_dw         | 128 | 1.7335 |  3.6326   | 71.2505  |        75.9827         |
|          jx_nest_base           | 32  |  4.49  |  11.0734  | 70.3396  |        70.0794         |
|           dm_nfnet_f0           | 128 | 4.4244 |  9.2973   | 66.0111  |        64.7937         |
|           volo_d1_224           | 64  | 3.3102 |   8.892   | 63.4175  |        62.6754         |
|        tnt_s_patch16_224        | 128 | 4.1952 |  11.5155  | 57.9447  |        56.8855         |
|            repvgg_a2            | 128 | 3.5682 |  7.1054   | 56.6497  |        56.0347         |
|         visformer_small         | 128 | 1.7805 |  4.6255   | 56.6172  |        57.9289         |
|     swsl_resnext101_32x16d      | 32  | 4.0933 |  10.7145  |  54.425  |        54.5669         |
|          gmlp_s16_224           | 128 | 3.4351 |  7.9731   | 50.2895  |        50.0782         |
|          convnext_base          | 64  | 4.2918 |   8.671   |  44.082  |        43.2773         |
|           convit_base           | 64  | 2.2987 |  6.5112   | 41.0221  |        39.5729         |
|          gmixer_24_224          | 128 | 3.3967 |  8.7836   | 40.9155  |        42.8216         |
|            pit_b_224            | 64  | 2.4085 |  5.6843   | 40.0445  |        39.8593         |
|          resmlp_12_224          | 128 | 1.7517 |  3.5746   | 35.2694  |        36.3512         |
|        convmixer_768_32         | 32  | 1.4819 |  5.8924   | 33.1188  |        30.0091         |
| deit_base_distilled_patch16_224 | 64  | 2.0282 |  4.9177   | 30.6342  |        29.0745         |
|      beit_base_patch16_224      | 64  | 2.6669 |  6.3452   | 29.3731  |         28.95          |
|      vit_base_patch16_224       | 64  | 2.0335 |  4.8715   | 27.7207  |        28.0849         |
|          mixer_b16_224          | 128 | 1.5802 |   3.897   | 26.6254  |        26.6312         |
+---------------------------------+-----+--------+-----------+----------+------------------------+
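These tables are plain ASCII, so re-sorting or filtering them takes a small parser. One example is listing the models whose inductor compilation exceeds 120 s, the threshold the dashboard's warnings use. Below is a minimal sketch. `parse_ascii_table` and `_to_number` are hypothetical helpers and are not part of the benchmark harness. The excerpt is copied from the compilation latency table above.

```python
def _to_number(cell: str):
    """Convert a cell to int or float when possible; otherwise keep the string."""
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell


def parse_ascii_table(text: str) -> list[dict]:
    """Parse a '+---+' / '| a | b |' style table into a list of row dicts.

    The first '|'-delimited line is treated as the header; border lines are skipped.
    """
    header = None
    rows = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # border line such as +------+-----+
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append({k: _to_number(v) for k, v in zip(header, cells)})
    return rows


EXCERPT = """
+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|           rexnet_100            | 128 | 4.0105 |  8.6583   | 276.7602 |        273.9646        |
|        twins_pcpvt_base         | 64  | 6.2758 |  15.7811  | 118.1931 |        116.9325        |
|          mixer_b16_224          | 128 | 1.5802 |   3.897   | 26.6254  |        26.6312        |
+---------------------------------+-----+--------+-----------+----------+------------------------+
"""

rows = parse_ascii_table(EXCERPT)
# Keep models whose inductor compile time exceeds the 120 s warning threshold.
slow = [r["name"] for r in rows if r["inductor"] > 120]
print(slow)  # ['rexnet_100']
```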

Peak Memory Compression Ratio

+---------------------------------+-----+--------+-----------+----------+------------------------+
|              name               | bs  | eager  | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+--------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.0694 |  1.0163   |  1.2139  |         1.3259         |
|         mobilenetv2_100         | 128 | 1.0001 |  0.7668   |  1.1613  |         1.2656         |
|            tinynet_a            | 128 | 1.0001 |  0.7829   |  1.0278  |         1.0949         |
|           convit_base           | 64  |  1.0   |  0.8836   |  1.0277  |         1.1097         |
|           rexnet_100            | 128 |  1.0   |  0.7887   |  1.0166  |         1.0838         |
|            fbnetv3_b            | 128 |  1.0   |  0.8078   |  0.9954  |         1.003          |
|       tf_efficientnet_b0        | 128 |  1.0   |  0.7729   |  0.9912  |         1.0929         |
|        convmixer_768_32         | 32  |  1.0   |  0.9865   |  0.9839  |         0.9959         |
|           selecsls42b           | 128 | 1.0003 |  0.9762   |  0.9761  |         1.0215         |
|          resmlp_12_224          | 128 |  1.0   |  0.9524   |  0.9691  |         0.9868         |
|             dla102              | 128 | 0.9831 |  0.9212   |  0.9641  |         1.048          |
|           tf_mixnet_l           | 128 | 0.9999 |   0.86    |  0.9609  |         1.1153         |
|          mixer_b16_224          | 128 |  1.0   |  0.9704   |  0.9604  |         1.0065         |
|          gmixer_24_224          | 128 |  1.0   |  0.9767   |  0.9576  |         0.9917         |
|          ghostnet_100           | 128 | 1.0005 |  0.9033   |  0.9565  |         1.0646         |
|           resnest101e           | 64  |  1.0   |  0.9586   |  0.9562  |         1.0502         |
|          cspdarknet53           | 64  |  1.0   |  0.8713   |  0.9538  |         1.1043         |
|      xcit_large_24_p8_224       |  5  | 0.9995 |  0.9182   |  0.9329  |         1.0105         |
|            hrnet_w18            | 128 | 0.9997 |  0.9301   |  0.9322  |         1.0108         |
|             dpn107              | 32  | 1.0001 |  0.9526   |  0.9283  |         1.0017         |
|           mobilevit_s           | 64  |  1.0   |  0.7758   |  0.9255  |         0.9901         |
|      beit_base_patch16_224      | 64  |  1.0   |  0.9562   |  0.9248  |         0.9992         |
|        tnt_s_patch16_224        | 128 | 1.0001 |  0.9808   |  0.9221  |         1.0036         |
|          spnasnet_100           | 128 | 0.9996 |  0.9208   |  0.9175  |         0.976          |
|          inception_v3           | 128 | 1.0002 |  0.8727   |  0.917   |         1.0689         |
|       gluon_inception_v3        | 128 | 1.0002 |  0.8727   |  0.917   |         1.0689         |
|        adv_inception_v3         | 128 | 1.0002 |  0.8727   |  0.917   |         1.0689         |
|        res2net101_26w_4s        | 64  |  1.0   |  0.9279   |  0.9164  |         1.002          |
|       eca_botnext26ts_256       | 128 |  1.0   |  0.7715   |  0.916   |         1.0173         |
|      mobilenetv3_large_100      | 128 | 0.9996 |  0.8846   |  0.9144  |         0.9851         |
|      vit_base_patch16_224       | 64  |  1.0   |  0.9453   |  0.9119  |         0.9949         |
|           mnasnet_100           | 128 | 0.9998 |  0.9135   |  0.9103  |         0.9738         |
|            nfnet_l0             | 128 | 0.9999 |  0.8322   |  0.9098  |         0.998          |
| deit_base_distilled_patch16_224 | 64  | 1.0005 |  0.9469   |  0.9075  |         0.9905         |
|     swsl_resnext101_32x16d      | 32  | 1.0001 |  0.9085   |  0.9075  |          1.0           |
|        res2net50_14w_8s         | 128 | 1.0001 |  0.9171   |  0.9054  |         1.0181         |
|           res2next50            | 128 | 1.0002 |  0.9196   |  0.9014  |         1.0134         |
|         visformer_small         | 128 | 1.0004 |  0.9421   |  0.9007  |         0.9926         |
|          cait_m36_384           |  4  | 1.0001 |   0.935   |  0.9005  |         0.988          |
|           volo_d1_224           | 64  | 0.9999 |  0.9243   |  0.8975  |         0.9634         |
|        ese_vovnet19b_dw         | 128 | 0.9999 |  0.8975   |  0.8974  |         1.0127         |
|        gluon_xception65         | 32  |  1.0   |  0.8967   |  0.8947  |         0.9923         |
|           fbnetc_100            | 128 | 0.9999 |  0.8607   |  0.8935  |         0.9847         |
|            mixnet_l             | 128 |  1.0   |  0.8479   |  0.8918  |         1.0338         |
|            lcnet_050            | 128 | 1.0004 |   0.786   |  0.881   |         0.9552         |
|           dm_nfnet_f0           | 128 | 0.9113 |  0.8857   |  0.8735  |         1.1277         |
|          gmlp_s16_224           | 128 |  1.0   |  0.9822   |  0.8656  |         0.8743         |
|  swin_base_patch4_window7_224   | 64  | 0.9999 |  0.9295   |  0.8653  |         0.988          |
|          botnet26t_256          | 128 |  1.0   |  0.8666   |  0.8623  |         0.9861         |
|            gernet_l             | 128 |  1.0   |  0.8663   |  0.8613  |         0.998          |
|        twins_pcpvt_base         | 64  | 1.0005 |   0.921   |   0.86   |         0.9398         |
|          jx_nest_base           | 32  | 1.002  |  0.8971   |  0.8479  |         0.9832         |
|        sebotnet33ts_256         | 64  |  1.0   |  0.7135   |  0.8393  |         1.0449         |
|         crossvit_9_240          | 128 | 1.0001 |  0.8744   |   0.82   |         0.9829         |
|         poolformer_m36          | 64  | 0.9998 |  0.9517   |  0.8195  |         1.1099         |
|           regnety_002           | 128 | 1.0001 |  0.8225   |  0.8013  |         0.9579         |
|            pit_b_224            | 64  | 1.0001 |  0.7934   |  0.7981  |         0.9905         |
|            repvgg_a2            | 128 | 1.0005 |   0.827   |  0.7788  |         1.005          |
|          convnext_base          | 64  | 1.0001 |  0.9147   |  0.7585  |         0.9504         |
|         coat_lite_mini          | 128 | 1.0111 |  0.8823   |  0.7543  |         0.9347         |
+---------------------------------+-----+--------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        convmixer_768_32         | 32  | 355.5362 | 358.4894  | 335.1907 |        334.7495        |
|            hrnet_w18            | 128 | 376.6619 | 531.5092  | 277.106  |        283.2083        |
|        tnt_s_patch16_224        | 128 | 468.9229 | 469.0802  | 273.724  |        274.8511        |
|          pnasnet5large          | 16  | 292.2616 |  303.402  | 253.8925 |        249.7514        |
|          convnext_base          | 64  | 272.9205 | 273.5949  | 248.4064 |        249.464         |
|           tf_mixnet_l           | 128 | 258.3502 | 302.7072  | 218.2655 |        217.1794        |
|            mixnet_l             | 128 | 248.7933 | 292.7808  | 212.1573 |        211.2961        |
|           res2next50            | 128 | 251.5109 |  275.733  | 210.3495 |        214.2467        |
|     swsl_resnext101_32x16d      | 32  | 217.2013 | 234.3006  | 203.2743 |        208.8541        |
|           resnest101e           | 64  | 250.3652 | 269.8928  | 201.5444 |        207.2365        |
|  swin_base_patch4_window7_224   | 64  | 235.842  | 241.6752  | 200.5019 |        200.8853        |
|             dla102              | 128 | 250.2675 | 285.5803  | 185.2588 |        185.3246        |
|          cait_m36_384           |  4  | 216.5972 | 216.3744  | 181.4181 |        181.1613        |
|        gluon_xception65         | 32  | 180.2627 |  198.163  | 174.8302 |        173.9422        |
|        adv_inception_v3         | 128 | 223.1727 | 248.7867  | 170.856  |        170.8174        |
|       gluon_inception_v3        | 128 | 223.358  |  249.417  | 170.5258 |        170.6001        |
|          inception_v3           | 128 | 223.2336 | 248.7032  | 170.3955 |        170.6673        |
|        res2net50_14w_8s         | 128 | 212.8003 | 244.9183  | 162.7145 |        163.8175        |
|             dpn107              | 32  | 192.5966 | 203.2194  | 159.6021 |        155.7298        |
|       eca_botnext26ts_256       | 128 | 195.4949 | 255.6489  | 154.2469 |        155.9393        |
|          mixer_b16_224          | 128 | 161.6875 | 161.3904  | 147.5941 |        147.4979        |
|         poolformer_m36          | 64  | 172.914  |  173.454  | 146.0974 |        146.7028        |
|           dm_nfnet_f0           | 128 | 196.5281 | 196.8797  | 136.8117 |        139.7913        |
|           convit_base           | 64  | 196.4037 | 196.9776  | 136.3484 |        136.2296        |
|        res2net101_26w_4s        | 64  | 151.0843 | 168.0723  | 134.6003 |        132.7344        |
|         coat_lite_mini          | 128 | 192.8658 | 192.9047  | 133.2406 |        137.5414        |
|            pit_b_224            | 64  | 158.1907 | 158.4419  | 132.7388 |        133.4912        |
|            gernet_l             | 128 | 141.4057 | 153.6261  | 129.9314 |        128.6654        |
|            fbnetv3_b            | 128 | 160.2543 | 186.4211  | 124.7924 |        121.7648        |
|         visformer_small         | 128 | 127.6969 | 133.7669  | 121.6278 |        125.3865        |
|      beit_base_patch16_224      | 64  | 129.1954 | 132.0865  | 119.4448 |        117.2841        |
|            nfnet_l0             | 128 | 169.7502 | 216.1149  | 118.6176 |        120.6885        |
|          gmlp_s16_224           | 128 | 162.7403 | 151.7253  | 118.6085 |        118.6111        |
|          botnet26t_256          | 128 | 149.5536 | 164.1675  | 116.972  |        116.0351        |
|           volo_d1_224           | 64  | 152.7537 | 156.5245  | 112.349  |        112.8275        |
| deit_base_distilled_patch16_224 | 64  | 120.9586 | 121.3401  | 111.6025 |        111.4072        |
|      vit_base_patch16_224       | 64  | 120.6468 | 120.9155  | 111.1462 |        111.4671        |
|            repvgg_a2            | 128 | 125.9267 | 138.7671  | 107.1892 |        106.1287        |
|          gmixer_24_224          | 128 | 145.6243 | 175.0378  | 106.8543 |        106.8778        |
|        twins_pcpvt_base         | 64  | 135.0741 | 135.6805  | 106.5016 |        109.0349        |
|      xcit_large_24_p8_224       |  5  | 134.8545 |  141.691  | 105.0561 |        105.6414        |
|          cspdarknet53           | 64  | 129.2247 | 148.3894  | 104.7582 |        102.4885        |
|       tf_efficientnet_b0        | 128 | 131.7866 | 180.0391  | 102.1566 |        100.7133        |
|          jx_nest_base           | 32  | 120.9773 |  121.961  | 100.7285 |        100.9481        |
|           mobilevit_s           | 64  | 120.5948 | 164.0299  | 97.1536  |        98.3099         |
|           fbnetc_100            | 128 | 120.3775 | 141.7194  |  96.512  |        94.2228         |
|           rexnet_100            | 128 | 117.2615 | 155.5388  | 93.4263  |        92.2906         |
|            tinynet_a            | 128 | 107.8594 | 148.1958  | 87.7753  |        85.9203         |
|        sebotnet33ts_256         | 64  | 115.1202 | 146.1043  | 83.7935  |        82.7258         |
|          spnasnet_100           | 128 | 102.476  | 122.6691  |  82.978  |        80.7174         |
|        ese_vovnet19b_dw         | 128 | 98.8806  | 108.2632  | 78.9117  |        78.2674         |
|           selecsls42b           | 128 | 91.7843  | 106.1433  | 76.0327  |        76.4712         |
|           mnasnet_100           | 128 | 95.4617  | 113.7311  | 75.6008  |        73.7201         |
|         mobilenetv2_100         | 128 | 95.4738  | 112.1577  | 71.6766  |        69.9374         |
|         crossvit_9_240          | 128 |  94.805  | 117.0483  | 70.3365  |        70.7082         |
|          resmlp_12_224          | 128 | 63.9366  |  71.3055  | 68.4751  |        68.3251         |
|          ghostnet_100           | 128 | 110.1656 | 142.0036  | 63.1794  |        64.1466         |
|      mobilenetv3_large_100      | 128 | 83.1833  |  98.7643  | 62.4827  |        61.1277         |
|           regnety_002           | 128 | 48.5115  |  58.118   |  37.483  |        38.8874         |
|            lcnet_050            | 128 | 36.8528  |  46.8385  | 23.0214  |        22.8623         |
+---------------------------------+-----+----------+-----------+----------+------------------------+
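Speedup in these dashboards is normalized against native PyTorch (eager). If speedup is simply eager latency divided by compiled latency, which is an assumption, the absolute latency table above converts directly into per-model speedups. A minimal sketch follows. `speedup` is a hypothetical helper, and the rows are copied from that table.

```python
def speedup(eager_ms: float, compiled_ms: float) -> float:
    """Speedup of a compiled backend relative to eager (higher is better)."""
    if compiled_ms <= 0:
        raise ValueError("compiled latency must be positive")
    return eager_ms / compiled_ms


# (name, eager ms, inductor ms) copied from the float32 absolute latency table
ROWS = [
    ("lcnet_050", 36.8528, 23.0214),
    ("hrnet_w18", 376.6619, 277.106),
    ("resmlp_12_224", 63.9366, 68.4751),
]

for name, eager_ms, inductor_ms in ROWS:
    print(f"{name}: {speedup(eager_ms, inductor_ms):.2f}x")
# lcnet_050: 1.60x
# hrnet_w18: 1.36x
# resmlp_12_224: 0.93x
```

In this run, resmlp_12_224 is one of the few rows where inductor is slower than eager (68.48 ms vs 63.94 ms).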

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_float32_549/huggingface_float32.png :

/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_float32_549/torchbench_float32.png :

/data/home/williamwen/cluster/oneoff_cron_logs/day_100_10_04_23_performance_float32_549/timm_models_float32.png :

Build Summary


Run name

day_100_10_04_23_performance_float32_549

Commit hashes

pytorch commit: f55e72c0f6bd6da016aaa51de379e6ba6d7891cc
pytorch commit date: 2023-04-07 17:30:27+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf55e72c
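The torch version string is a local build label. `2.1.0a0+gitf55e72c` carries the first seven characters of the PyTorch commit listed under Commit hashes, which gives a quick way to confirm that two runs used the same build. A minimal sketch follows. `version_matches_commit` is a hypothetical helper that assumes the `+git<short-sha>` suffix convention visible here. In a live environment, the inputs could come from `torch.__version__` and `torch.version.git_version`.

```python
def version_matches_commit(torch_version: str, commit: str) -> bool:
    """Return True if a '+git<sha>' local version suffix is a prefix of commit."""
    _, sep, local = torch_version.partition("+")
    if not sep or not local.startswith("git"):
        return False  # release builds carry no '+git' suffix
    short_sha = local[len("git"):]
    return bool(short_sha) and commit.startswith(short_sha)


print(version_matches_commit(
    "2.1.0a0+gitf55e72c", "f55e72c0f6bd6da016aaa51de379e6ba6d7891cc"))  # True
```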

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312
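`Device Memory [GB]` is reported in decimal gigabytes, which is why a 40 GB A100 shows about 42.48. In binary units, the same memory is about 39.56 GiB. `CUDNN VERSION: 8401` follows the cuDNN 8.x encoding (major*1000 + minor*100 + patch), which decodes to 8.4.1.

The sketch below shows how a block like this could be gathered with PyTorch. `query_gpu_details` and the unit helpers are hypothetical. The claim that the dashboard divides `total_memory` by 1e9 is an assumption consistent with the printed number.

```python
def bytes_to_gb(n_bytes: int) -> float:
    """Decimal gigabytes (1 GB = 10**9 bytes), the unit the dashboard prints."""
    return n_bytes / 1e9


def bytes_to_gib(n_bytes: int) -> float:
    """Binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


def decode_cudnn_version(v: int) -> str:
    """cuDNN 8.x encodes its version as major*1000 + minor*100 + patch."""
    return f"{v // 1000}.{(v % 1000) // 100}.{v % 100}"


def query_gpu_details() -> dict:
    """Collect the fields shown above from the local machine (requires CUDA)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    props = torch.cuda.get_device_properties(0)
    return {
        "CUDNN VERSION": torch.backends.cudnn.version(),
        "Number CUDA Devices": torch.cuda.device_count(),
        "Device Name": props.name,
        "Device Memory [GB]": bytes_to_gb(props.total_memory),
    }


total = 42_481_549_312  # bytes implied by "Device Memory [GB]: 42.481549312"
print(f"{bytes_to_gb(total):.3f} GB = {bytes_to_gib(total):.2f} GiB")  # 42.482 GB = 39.56 GiB
print(decode_cudnn_version(8401))  # 8.4.1
```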

@williamwen42
Collaborator

Performance Dashboard for amp precision (inductor max-autotune comparison on timm models, small)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward pass. For inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded from the performance, compilation latency and memory footprint measurements.

Passrate

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |  100%, 2/2  |
|       inductor_no_cudagraphs        |  100%, 2/2  |
|        inductor_max_autotune        |  100%, 2/2  |
| inductor_max_autotune_no_cudagraphs |  100%, 2/2  |
+-------------------------------------+-------------+

Geometric mean speedup

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    2.54x    |
|       inductor_no_cudagraphs        |    2.20x    |
|        inductor_max_autotune        |    2.72x    |
| inductor_max_autotune_no_cudagraphs |    2.33x    |
+-------------------------------------+-------------+

Mean compilation time (seconds)

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |   106.20    |
|       inductor_no_cudagraphs        |    69.36    |
|        inductor_max_autotune        |   748.14    |
| inductor_max_autotune_no_cudagraphs |    81.81    |
+-------------------------------------+-------------+

Peak memory footprint compression ratio (higher is better)

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    0.90x    |
|       inductor_no_cudagraphs        |    1.03x    |
|        inductor_max_autotune        |    0.91x    |
| inductor_max_autotune_no_cudagraphs |    1.04x    |
+-------------------------------------+-------------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: a speedup of 0.0 typically signifies a failure in the performance test)
  • compilation latency > 120 sec.
  • compression ratio < 0.9
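Each of the four rules above is a simple threshold check. A minimal sketch follows, using the thresholds as listed. `Result` and `warnings_for` are hypothetical names. Applied to the inductor numbers reported later in this comment, it reproduces the two warnings tables: xcit_large_24_p8_224 is flagged for compilation latency and compression ratio, and tnt_s_patch16_224 is not flagged.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: str             # "pass" or a failure label
    speedup: float            # 0.0 typically means the performance run failed
    compile_s: float          # compilation latency in seconds
    compression_ratio: float  # peak memory compression ratio (higher is better)


def warnings_for(r: Result) -> list[str]:
    """Return the dashboard warning categories a single result falls into."""
    flags = []
    if r.accuracy != "pass":
        flags.append("accuracy")
    if r.speedup < 0.95:
        flags.append("speedup")
    if r.compile_s > 120:
        flags.append("compilation latency")
    if r.compression_ratio < 0.9:
        flags.append("compression ratio")
    return flags


# inductor results for the two timm models in this run
xcit = Result("xcit_large_24_p8_224", "pass", 2.127, 138.9123, 0.8225)
tnt = Result("tnt_s_patch16_224", "pass", 3.0267, 73.4942, 0.9834)
print(warnings_for(xcit))  # ['compilation latency', 'compression ratio']
print(warnings_for(tnt))   # []
```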

Compilation latency (sec) warnings

+-------------+----------------------+----------+------------------------+
|    suite    |         name         | inductor | inductor_no_cudagraphs |
+-------------+----------------------+----------+------------------------+
| timm_models | xcit_large_24_p8_224 | 138.9123 |        88.2145         |
+-------------+----------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+----------------------+----------+------------------------+
|    suite    |         name         | inductor | inductor_no_cudagraphs |
+-------------+----------------------+----------+------------------------+
| timm_models | xcit_large_24_p8_224 |  0.8225  |         1.0063         |
+-------------+----------------------+----------+------------------------+

timm_models suite with amp precision


Performance speedup

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 128 |  3.0267  |         2.9763         |        3.3506         |               3.2962                |
| xcit_large_24_p8_224 |  5  |  2.127   |         1.6285         |        2.2127         |               1.6412                |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
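The geometric mean summary can be recomputed from the per-model speedups above. This assumes the summary is a plain geometric mean over models that passed accuracy, since the dashboard excludes failing models from performance numbers. The sketch below uses illustrative names. It reproduces this run's summary values: 2.54x, 2.20x, 2.72x and 2.33x.

```python
import math

MODELS = ["tnt_s_patch16_224", "xcit_large_24_p8_224"]
# Both models pass accuracy for every backend in this run, so none are dropped.
PASSED = {"tnt_s_patch16_224", "xcit_large_24_p8_224"}
SPEEDUPS = {
    "inductor": [3.0267, 2.127],
    "inductor_no_cudagraphs": [2.9763, 1.6285],
    "inductor_max_autotune": [3.3506, 2.2127],
    "inductor_max_autotune_no_cudagraphs": [3.2962, 1.6412],
}


def geomean(values):
    """Geometric mean: exp of the mean of logs (all values must be > 0)."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs positive values (0.0 marks a failed run)")
    return math.exp(sum(math.log(v) for v in values) / len(values))


summary = {}
for backend, vals in SPEEDUPS.items():
    kept = [v for model, v in zip(MODELS, vals) if model in PASSED]
    summary[backend] = geomean(kept)
    print(f"{backend}: {summary[backend]:.2f}x")
```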

Accuracy

+----------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 8  |   pass   |          pass          |         pass          |                pass                 |
| xcit_large_24_p8_224 | 8  |   pass   |          pass          |         pass          |                pass                 |
+----------------------+----+----------+------------------------+-----------------------+-------------------------------------+

Compilation latency (sec)

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
| xcit_large_24_p8_224 |  5  | 138.9123 |        88.2145         |       911.3296        |              100.6539               |
|  tnt_s_patch16_224   | 128 | 73.4942  |         50.505         |       584.9524        |               62.9705               |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
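The mean compilation time summary is matched by an arithmetic mean of the per-model latencies above: 106.20, 69.36, 748.14 and 81.81 seconds. Under that reading, max-autotune with CUDA graphs takes roughly 7x longer to compile than default inductor in this run. A minimal sketch follows. Whether the dashboard uses exactly this aggregation is inferred from the numbers, not confirmed.

```python
from statistics import mean

# Compilation latency (s) for tnt_s_patch16_224 and xcit_large_24_p8_224
COMPILE_S = {
    "inductor": [138.9123, 73.4942],
    "inductor_no_cudagraphs": [88.2145, 50.505],
    "inductor_max_autotune": [911.3296, 584.9524],
    "inductor_max_autotune_no_cudagraphs": [100.6539, 62.9705],
}

for backend, secs in COMPILE_S.items():
    print(f"{backend}: {mean(secs):.2f} s")

ratio = mean(COMPILE_S["inductor_max_autotune"]) / mean(COMPILE_S["inductor"])
print(f"max-autotune / inductor compile time: {ratio:.1f}x")
```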

Peak Memory Compression Ratio

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 128 |  0.9834  |         1.0597         |         0.986         |               1.0597                |
| xcit_large_24_p8_224 |  5  |  0.8225  |         1.0063         |         0.826         |               1.0104                |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
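The compression ratio reads as eager peak memory divided by compiled peak memory, so values below 1.0 mean the compiled model needs more memory than eager. This is an assumption consistent with "higher is better". The summary table for this run matches an arithmetic mean of the per-model ratios rather than a geometric mean. For example, for inductor_max_autotune the arithmetic mean of 0.986 and 0.826 is 0.906 (reported as 0.91x), while the geometric mean would be about 0.902. The sketch below compares both. The aggregation method is inferred from these numbers only.

```python
import math
from statistics import mean

# Peak memory compression ratios for tnt_s_patch16_224 and xcit_large_24_p8_224
RATIOS = {
    "inductor": [0.9834, 0.8225],
    "inductor_no_cudagraphs": [1.0597, 1.0063],
    "inductor_max_autotune": [0.986, 0.826],
    "inductor_max_autotune_no_cudagraphs": [1.0597, 1.0104],
}


def geomean(values):
    """Geometric mean: exp of the mean of logs (all values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


for backend, ratios in RATIOS.items():
    print(f"{backend}: arithmetic {mean(ratios):.3f}, geometric {geomean(ratios):.3f}")
```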

Absolute latency (ms)

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 128 | 106.616  |        108.5939        |        96.367         |               98.0578               |
| xcit_large_24_p8_224 |  5  |  60.855  |        79.6217         |        58.0096        |                79.91                |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
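If speedup is eager latency divided by compiled latency, then multiplying each backend's absolute latency by its speedup should recover roughly the same eager latency for a given model. The sketch below checks this under that assumption, using this run's numbers. `implied_eager_ms` is a hypothetical helper. For tnt_s_patch16_224, all four backends imply about 323 ms. For xcit_large_24_p8_224, they imply 128 to 131 ms, a spread of about 2%.

```python
# name -> {backend: (absolute latency ms, speedup)} copied from the tables above
RESULTS = {
    "tnt_s_patch16_224": {
        "inductor": (106.616, 3.0267),
        "inductor_no_cudagraphs": (108.5939, 2.9763),
        "inductor_max_autotune": (96.367, 3.3506),
        "inductor_max_autotune_no_cudagraphs": (98.0578, 3.2962),
    },
    "xcit_large_24_p8_224": {
        "inductor": (60.855, 2.127),
        "inductor_no_cudagraphs": (79.6217, 1.6285),
        "inductor_max_autotune": (58.0096, 2.2127),
        "inductor_max_autotune_no_cudagraphs": (79.91, 1.6412),
    },
}


def implied_eager_ms(latency_ms: float, speedup: float) -> float:
    """Eager latency implied by speedup = eager_ms / compiled_ms."""
    return latency_ms * speedup


spreads = {}
for name, backends in RESULTS.items():
    eager = [implied_eager_ms(lat, s) for lat, s in backends.values()]
    spreads[name] = max(eager) / min(eager) - 1
    print(f"{name}: implied eager {min(eager):.1f}-{max(eager):.1f} ms "
          f"(spread {spreads[name]:.1%})")
```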

Performance graphs


/data/home/williamwen/cluster/oneoff_cron_logs/day_101_11_04_23_performance_amp_691/timm_models_amp.png :

Build Summary


Run name

day_101_11_04_23_performance_amp_691

Commit hashes

pytorch commit: f55e72c0f6bd6da016aaa51de379e6ba6d7891cc
pytorch commit date: 2023-04-07 17:30:27+00:00
torchbench commit: 735f1927996c8d9ab81f0b0c05dd1ebdb26a6250
torchbench commit date: 2023-04-05 09:43:21-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+gitf55e72c

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Collaborator

Performance Dashboard for amp precision (inductor max-autotune comparison on timm models, small, ran locally)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward pass. For inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward-pass outputs and gradients against native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.

Caveats

  1. Batch sizes have been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded from the performance, compilation latency and memory footprint measurements.

Passrate

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |  100%, 2/2  |
|       inductor_no_cudagraphs        |  100%, 2/2  |
|        inductor_max_autotune        |  100%, 2/2  |
| inductor_max_autotune_no_cudagraphs |  100%, 2/2  |
+-------------------------------------+-------------+

Geometric mean speedup

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    2.31x    |
|       inductor_no_cudagraphs        |    2.01x    |
|        inductor_max_autotune        |    3.04x    |
| inductor_max_autotune_no_cudagraphs |    2.39x    |
+-------------------------------------+-------------+

Mean compilation time (seconds)

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |   108.16    |
|       inductor_no_cudagraphs        |    68.81    |
|        inductor_max_autotune        |   890.39    |
| inductor_max_autotune_no_cudagraphs |    83.96    |
+-------------------------------------+-------------+
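Unlike the speedups, the compilation times above match a plain arithmetic mean of the per-model "Compilation latency (sec)" values. For inductor, a geometric mean of 143.0577 and 73.2719 would give about 102.4 s rather than 108.16 s. The sketch below rests on that inference and reproduces the four summary numbers.

```python
from statistics import mean

# Per-model compilation latency (seconds) from the timm_models table.
compile_s = {
    "inductor": [143.0577, 73.2719],
    "inductor_no_cudagraphs": [89.0471, 48.5673],
    "inductor_max_autotune": [1044.1055, 736.6731],
    "inductor_max_autotune_no_cudagraphs": [103.8899, 64.0348],
}

# Arithmetic mean per backend, formatted like the summary table.
summary = {backend: round(mean(times), 2) for backend, times in compile_s.items()}
for backend, value in summary.items():
    print(f"{backend}: {value:.2f}")
```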

Peak memory footprint compression ratio (higher is better)

+-------------------------------------+-------------+
|              Compiler               | timm_models |
+-------------------------------------+-------------+
|              inductor               |    0.90x    |
|       inductor_no_cudagraphs        |    1.03x    |
|        inductor_max_autotune        |    0.91x    |
| inductor_max_autotune_no_cudagraphs |    1.04x    |
+-------------------------------------+-------------+
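The ratio above is labeled "higher is better". That labeling is consistent with defining it as eager peak memory divided by compiled peak memory, but this definition is an assumption here. The summary row also appears to be an arithmetic mean of the per-model ratios. For inductor_max_autotune, the arithmetic mean of 0.986 and 0.8257 rounds to the reported 0.91x, while a geometric mean would round to 0.90x. Both `compression_ratio` and the averaging choice are inferred, not documented.

```python
import math
from statistics import mean


def compression_ratio(eager_peak_bytes: float, compiled_peak_bytes: float) -> float:
    """Assumed definition: > 1.0 means the compiled model used less peak memory."""
    return eager_peak_bytes / compiled_peak_bytes


# Per-model ratios for inductor_max_autotune from the
# "Peak Memory Compression Ratio" table (tnt_s_patch16_224, xcit_large_24_p8_224).
ratios = [0.986, 0.8257]
arith = mean(ratios)
geo = math.exp(mean([math.log(r) for r in ratios]))
print(f"arithmetic: {arith:.2f}x, geometric: {geo:.2f}x")

# Hypothetical peak-memory readings, in bytes, for illustration only.
print(compression_ratio(10.0e9, 8.0e9))  # 1.25
```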

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: a speedup of 0.0 typically indicates a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9
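The warning criteria above can be expressed as a simple predicate. The sketch below is minimal. The `Result` record and `warnings_for` helper are hypothetical, not the dashboard's code; only the three thresholds come from the list. The accuracy check treats any status other than "pass" (e.g. "fail_accuracy", "fail_to_run", "eager_variation") as a failure.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the warning criteria listed above.
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_S = 120.0
COMPRESSION_RATIO_MIN = 0.9


@dataclass
class Result:
    """Hypothetical per-model, per-backend benchmark record."""
    name: str
    accuracy: str             # e.g. "pass", "fail_accuracy", "fail_to_run"
    speedup: float            # 0.0 typically means the performance run failed
    compile_latency_s: float
    compression_ratio: float


def warnings_for(r: Result) -> List[str]:
    flags = []
    if r.accuracy != "pass":
        flags.append("accuracy")
    if r.speedup < SPEEDUP_MIN:
        flags.append("speedup")
    if r.compile_latency_s > COMPILE_LATENCY_MAX_S:
        flags.append("compilation latency")
    if r.compression_ratio < COMPRESSION_RATIO_MIN:
        flags.append("compression ratio")
    return flags


# xcit_large_24_p8_224 under inductor, using values from the tables in this comment.
xcit = Result("xcit_large_24_p8_224", "pass", 2.1495, 143.0577, 0.8223)
print(warnings_for(xcit))  # ['compilation latency', 'compression ratio']
```

This matches the dashboard, which lists xcit_large_24_p8_224 under both the compilation latency and the peak memory warnings for inductor.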

Compilation latency (sec) warnings

+-------------+----------------------+----------+------------------------+
|    suite    |         name         | inductor | inductor_no_cudagraphs |
+-------------+----------------------+----------+------------------------+
| timm_models | xcit_large_24_p8_224 | 143.0577 |        89.0471         |
+-------------+----------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+----------------------+----------+------------------------+
|    suite    |         name         | inductor | inductor_no_cudagraphs |
+-------------+----------------------+----------+------------------------+
| timm_models | xcit_large_24_p8_224 |  0.8223  |         1.0062         |
+-------------+----------------------+----------+------------------------+

timm_models suite with amp precision

Performance speedup

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 128 |  2.4789  |         2.4537         |        3.5366         |               3.5032                |
| xcit_large_24_p8_224 |  5  |  2.1495  |         1.6444         |        2.6118         |               1.6356                |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Accuracy

+----------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 8  |   pass   |          pass          |         pass          |                pass                 |
| xcit_large_24_p8_224 | 8  |   pass   |          pass          |         pass          |                pass                 |
+----------------------+----+----------+------------------------+-----------------------+-------------------------------------+

Compilation latency (sec)

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
| xcit_large_24_p8_224 |  5  | 143.0577 |        89.0471         |       1044.1055       |              103.8899               |
|  tnt_s_patch16_224   | 128 | 73.2719  |        48.5673         |       736.6731        |               64.0348               |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Peak Memory Compression Ratio

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 128 |  0.9834  |         1.0597         |         0.986         |               1.0597                |
| xcit_large_24_p8_224 |  5  |  0.8223  |         1.0062         |        0.8257         |               1.0118                |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Absolute latency (ms)

+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|         name         | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|  tnt_s_patch16_224   | 128 | 146.9206 |        148.3703        |       103.1044        |               104.003               |
| xcit_large_24_p8_224 |  5  | 61.2006  |        79.4134         |        58.8175        |               78.9549               |
+----------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
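Relative gains between backends can be read directly from these absolute latencies. The sketch below assumes the figures are per-iteration latencies of the compiled model, measured under the same conditions for every backend; `relative_speedup` is a hypothetical helper. It shows max-autotune cutting tnt_s_patch16_224 latency by roughly 1.42x. It also shows CUDA graphs helping xcit_large_24_p8_224 far more than tnt_s_patch16_224.

```python
# Absolute latency (ms) per backend, from the table above.
latency_ms = {
    "tnt_s_patch16_224": {
        "inductor": 146.9206,
        "inductor_no_cudagraphs": 148.3703,
        "inductor_max_autotune": 103.1044,
        "inductor_max_autotune_no_cudagraphs": 104.003,
    },
    "xcit_large_24_p8_224": {
        "inductor": 61.2006,
        "inductor_no_cudagraphs": 79.4134,
        "inductor_max_autotune": 58.8175,
        "inductor_max_autotune_no_cudagraphs": 78.9549,
    },
}


def relative_speedup(baseline_ms: float, candidate_ms: float) -> float:
    """> 1.0 means the candidate backend is faster than the baseline."""
    return baseline_ms / candidate_ms


gains = {}
for model, lat in latency_ms.items():
    gains[model] = {
        # max-autotune vs. default inductor (both with CUDA graphs)
        "max_autotune": relative_speedup(lat["inductor"], lat["inductor_max_autotune"]),
        # CUDA graphs on vs. off (default inductor)
        "cudagraphs": relative_speedup(lat["inductor_no_cudagraphs"], lat["inductor"]),
    }
    print(model, {k: round(v, 2) for k, v in gains[model].items()})
```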

Performance graphs

/data/home/williamwen/cluster/oneoff_cron_logs/day_103_13_04_23_performance_amp_325/timm_models_amp.png

Build Summary

Run name

day_103_13_04_23_performance_amp_325

Commit hashes

pytorch commit: 75f55ca63bd5623352c8eda8e31ff76ee5c960a7
pytorch commit date: 2023-04-13 00:45:48+00:00
torchbench commit: cd89d490ecbcca7d8ca50324522b31a1a198c753
torchbench commit date: 2023-04-13 11:05:33-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git75f55ca

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312
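The GPU details above are raw integers and decimal gigabytes. The cuDNN value 8401 is likely the integer that `torch.backends.cudnn.version()` reports. cuDNN 8.x encodes it as major*1000 + minor*100 + patch, which makes it 8.4.1; cuDNN 9 switched to major*10000 + minor*100 + patch. The device memory figure is assumed to be total bytes divided by 10^9, which is about 39.56 GiB for this 40 GB A100. `decode_cudnn_version` is a hypothetical helper.

```python
def decode_cudnn_version(v: int) -> tuple:
    """Split a cuDNN 8.x version integer (major*1000 + minor*100 + patch).

    Does not apply to cuDNN 9+, which uses major*10000 + minor*100 + patch.
    """
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(8401))  # (8, 4, 1)

# Assumption: "Device Memory [GB]" is total bytes / 1e9.
total_bytes = 42.481549312e9
gib = total_bytes / 2**30
print(f"{gib:.2f} GiB")
```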

@williamwen42
Collaborator

Performance Dashboard for amp precision (inductor max-autotune comparison on all suites, with warm start)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. Each training experiment runs one iteration of the forward and backward passes; each inference experiment runs the forward pass only. For accuracy, we check the numerical correctness of forward-pass outputs and gradients by comparing them with native PyTorch. We measure speedup relative to the performance of native PyTorch. We report mean compilation latency and the peak memory footprint reduction ratio.

Caveats

  1. Batch size has been reduced to work around out-of-memory (OOM) errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

We exclude models that fail accuracy checks when measuring performance, compilation latency, and memory footprint reduction.

Passrate

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
|              inductor               | 85%, 51/60 | 91%, 41/45  | 100%, 60/60 |
|       inductor_no_cudagraphs        | 87%, 52/60 | 96%, 43/45  | 100%, 60/60 |
|        inductor_max_autotune        | 78%, 47/60 | 91%, 41/45  | 98%, 59/60  |
| inductor_max_autotune_no_cudagraphs | 82%, 49/60 | 96%, 43/45  | 100%, 60/60 |
+-------------------------------------+------------+-------------+-------------+
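Each pass-rate cell combines a whole-number percentage with the raw count. For example, 52/60 = 86.7% is shown as 87%. The formatting can be reproduced as below; `passrate` is a hypothetical helper, and rounding to the nearest integer is inferred from the table values.

```python
def passrate(passed: int, total: int) -> str:
    """Format as '<percent rounded to an integer>%, <passed>/<total>'."""
    return f"{100 * passed / total:.0f}%, {passed}/{total}"


# Counts taken from the Passrate table above.
for passed, total in [(51, 60), (52, 60), (47, 60), (49, 60), (41, 45), (43, 45), (59, 60)]:
    print(passrate(passed, total))
```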

Geometric mean speedup

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
|              inductor               |   1.61x    |    1.60x    |    1.40x    |
|       inductor_no_cudagraphs        |   1.29x    |    1.51x    |    1.39x    |
|        inductor_max_autotune        |   1.61x    |    1.63x    |    1.44x    |
| inductor_max_autotune_no_cudagraphs |   1.35x    |    1.58x    |    1.42x    |
+-------------------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
|              inductor               |   56.68    |    59.65    |    79.10    |
|       inductor_no_cudagraphs        |   30.39    |    42.67    |    46.98    |
|        inductor_max_autotune        |   257.92   |   186.71    |   381.29    |
| inductor_max_autotune_no_cudagraphs |   37.42    |    56.47    |    56.80    |
+-------------------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+-------------------------------------+------------+-------------+-------------+
|              Compiler               | torchbench | huggingface | timm_models |
+-------------------------------------+------------+-------------+-------------+
|              inductor               |   0.79x    |    0.91x    |    0.91x    |
|       inductor_no_cudagraphs        |   1.07x    |    1.06x    |    1.05x    |
|        inductor_max_autotune        |   0.76x    |    0.89x    |    0.91x    |
| inductor_max_autotune_no_cudagraphs |   1.07x    |    1.06x    |    1.05x    |
+-------------------------------------+------------+-------------+-------------+

Warnings

We flag models where:
  • accuracy fails
  • speedup < 0.95x (NOTE: a speedup of 0.0 typically indicates a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9

Accuracy warnings

+-------------+-------------------------------+-----------------+------------------------+
|    suite    |             name              |    inductor     | inductor_no_cudagraphs |
+-------------+-------------------------------+-----------------+------------------------+
| torchbench  |         hf_Longformer         |   fail_to_run   |      fail_to_run       |
| torchbench  |             moco              |   fail_to_run   |      fail_to_run       |
| torchbench  |      Background_Matting       | eager_variation |    eager_variation     |
| torchbench  |              gat              |     0.0000      |         0.0000         |
| torchbench  |              gcn              |     0.0000      |         0.0000         |
| torchbench  |             llama             |     0.0000      |         0.0000         |
| torchbench  |             sage              |     0.0000      |         0.0000         |
| torchbench  |           tacotron2           |     0.0000      |         0.0000         |
| torchbench  |         torchrec_dlrm         |     0.0000      |         0.0000         |
| huggingface | DebertaV2ForQuestionAnswering |   fail_to_run   |          pass          |
| huggingface |  AlbertForQuestionAnswering   |  fail_accuracy  |     fail_accuracy      |
+-------------+-------------------------------+-----------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |         lennard_jones         |  1.4551  |         0.8351         |
| torchbench  |             dcgan             |  1.3676  |          0.84          |
| torchbench  |       soft_actor_critic       |  1.0041  |         0.8306         |
| torchbench  |          timm_vovnet          |  0.9088  |         0.9047         |
| torchbench  |    nvidia_deeprecommender     |  0.8719  |         1.0185         |
| torchbench  | timm_vision_transformer_large |   0.0    |         1.084          |
| torchbench  |              gat              |   0.0    |          0.0           |
| torchbench  |              gcn              |   0.0    |          0.0           |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |             sage              |   0.0    |          0.0           |
| torchbench  |           tacotron2           |   0.0    |          0.0           |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      DebertaForMaskedLM       |  1.0944  |         0.9127         |
| huggingface |     DebertaV2ForMaskedLM      |  1.0122  |         0.7354         |
| huggingface | DebertaV2ForQuestionAnswering |  0.9377  |         0.7692         |
| huggingface |     BlenderbotForCausalLM     |   0.0    |         1.1121         |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+----------+------------------------+
|    suite    |              name              | inductor | inductor_no_cudagraphs |
+-------------+--------------------------------+----------+------------------------+
| torchbench  |          hf_T5_large           | 173.0714 |        132.2447        |
| torchbench  |        phlippe_densenet        | 160.9749 |        30.0922         |
| torchbench  |           hf_BigBird           | 149.3215 |        103.7268        |
| torchbench  |          densenet121           | 133.1323 |        73.2249         |
| torchbench  |          mobilenet_v2          | 122.4436 |        30.0659         |
| huggingface |     MobileBertForMaskedLM      | 144.5755 |        102.6639        |
| huggingface | MobileBertForQuestionAnswering | 142.8046 |        101.7791        |
| huggingface |      DebertaV2ForMaskedLM      | 140.6903 |        57.1807         |
| huggingface | DebertaV2ForQuestionAnswering  | 140.1457 |        61.8347         |
| huggingface | M2M100ForConditionalGeneration | 137.1882 |        71.3113         |
| huggingface |  MT5ForConditionalGeneration   | 133.1446 |        48.8807         |
| huggingface |        XGLMForCausalLM         | 121.334  |        58.2817         |
| timm_models |           rexnet_100           | 224.9662 |        43.8651         |
| timm_models |           hrnet_w18            | 192.1312 |        151.1129        |
| timm_models |         pnasnet5large          | 158.5366 |        110.0134        |
| timm_models |          ghostnet_100          | 153.8495 |        53.7871         |
| timm_models |       res2net101_26w_4s        | 150.918  |        87.7778         |
| timm_models |        twins_pcpvt_base        | 147.4936 |        70.2721         |
| timm_models |        adv_inception_v3        | 145.5702 |        52.6549         |
| timm_models |           fbnetv3_b            | 132.0557 |        60.2455         |
| timm_models |      xcit_large_24_p8_224      | 126.2305 |        86.2627         |
| timm_models |          resnest101e           | 124.9575 |        79.8068         |
| timm_models |           tinynet_a            | 120.3408 |        43.3221         |
+-------------+--------------------------------+----------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |              hf_GPT2_large              |  0.8904  |         1.1718         |
| torchbench  |                 yolov3                  |  0.8748  |         1.0642         |
| torchbench  |            timm_efficientnet            |  0.8701  |         1.0972         |
| torchbench  |                resnet152                |  0.8697  |         1.0021         |
| torchbench  |           speech_transformer            |  0.8681  |         1.0968         |
| torchbench  |           shufflenet_v2_x1_0            |  0.8627  |         1.0886         |
| torchbench  |              timm_resnest               |  0.8616  |         1.0911         |
| torchbench  |               Super_SloMo               |  0.8614  |         1.2225         |
| torchbench  |         timm_vision_transformer         |  0.8593  |         0.9978         |
| torchbench  |               timm_regnet               |  0.8513  |         1.0004         |
| torchbench  |           Background_Matting            |  0.8485  |         1.0482         |
| torchbench  |              hf_DistilBert              |  0.8476  |         1.0783         |
| torchbench  |                 hf_Bert                 |  0.8411  |         1.0767         |
| torchbench  |                resnet50                 |  0.8353  |         1.0021         |
| torchbench  |              hf_Bert_large              |  0.8302  |         1.0916         |
| torchbench  |               hf_T5_large               |  0.8201  |         1.1919         |
| torchbench  |               timm_vovnet               |  0.8185  |         1.0133         |
| torchbench  |              pytorch_unet               |  0.8134  |         1.0094         |
| torchbench  |            phlippe_densenet             |  0.8058  |         1.0057         |
| torchbench  |                  dcgan                  |  0.7955  |         0.9998         |
| torchbench  |                 hf_Bart                 |  0.793   |         1.0113         |
| torchbench  |              squeezenet1_1              |  0.7867  |         1.0815         |
| torchbench  |           mobilenet_v3_large            |  0.7849  |          1.0           |
| torchbench  |                 demucs                  |  0.7826  |         0.9998         |
| torchbench  |             pytorch_stargan             |  0.7715  |         1.0716         |
| torchbench  |                 alexnet                 |  0.7396  |         1.0013         |
| torchbench  |                  vgg16                  |  0.7227  |         0.9886         |
| torchbench  |               mnasnet1_0                |  0.7144  |         1.0027         |
| torchbench  |               densenet121               |  0.7071  |         0.9989         |
| torchbench  |             pytorch_struct              |  0.697   |          1.0           |
| torchbench  |               hf_BigBird                |  0.6949  |         1.1929         |
| torchbench  |         nvidia_deeprecommender          |  0.6857  |         0.9711         |
| torchbench  |             resnext50_32x4d             |  0.6786  |          1.0           |
| torchbench  |                   drq                   |  0.6429  |         0.9687         |
| torchbench  |            soft_actor_critic            |  0.6067  |         0.9974         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       |  0.6065  |         1.0224         |
| torchbench  |             LearningToPaint             |  0.5925  |         0.9944         |
| torchbench  |                resnet18                 |  0.5891  |         0.9931         |
| torchbench  |              lennard_jones              |  0.5317  |         1.0001         |
| torchbench  |               hf_Reformer               |  0.4539  |         1.0027         |
| torchbench  |          functorch_dp_cifar10           |  0.3991  |         1.0609         |
| torchbench  |             phlippe_resnet              |  0.3169  |         1.008          |
| huggingface |           ElectraForCausalLM            |  0.8941  |         0.9739         |
| huggingface |           PegasusForCausalLM            |  0.893   |         0.9864         |
| huggingface |          DistilBertForMaskedLM          |  0.8849  |         0.9624         |
| huggingface |            TrOCRForCausalLM             |  0.8836  |         0.9583         |
| huggingface | BlenderbotSmallForConditionalGeneration |  0.8729  |         0.9803         |
| huggingface |     PegasusForConditionalGeneration     |  0.8689  |         1.0689         |
| huggingface |      MBartForConditionalGeneration      |  0.8574  |         1.0307         |
| huggingface |      BartForConditionalGeneration       |  0.8456  |         1.0139         |
| huggingface |         MegatronBertForCausalLM         |  0.845   |         1.0961         |
| huggingface |       BlenderbotSmallForCausalLM        |  0.8184  |         0.9119         |
| huggingface |         Speech2Text2ForCausalLM         |  0.789   |         0.8779         |
| huggingface |     M2M100ForConditionalGeneration      |  0.7651  |         0.9908         |
| huggingface |          MobileBertForMaskedLM          |  0.752   |         1.016          |
| huggingface |             XGLMForCausalLM             |  0.7117  |         0.9792         |
| huggingface |     MobileBertForQuestionAnswering      |  0.6569  |         0.8579         |
| huggingface |           DebertaForMaskedLM            |  0.5646  |         1.0748         |
| huggingface |          DebertaV2ForMaskedLM           |  0.5187  |         0.9894         |
| huggingface |       DebertaForQuestionAnswering       |  0.4867  |         1.2209         |
| huggingface |      DebertaV2ForQuestionAnswering      |  0.4855  |         1.0041         |
| timm_models |              ghostnet_100               |  0.8976  |         1.0514         |
| timm_models |                hrnet_w18                |  0.8918  |         1.0121         |
| timm_models |            sebotnet33ts_256             |  0.891   |         1.1401         |
| timm_models |              inception_v3               |  0.8904  |         1.0459         |
| timm_models |            adv_inception_v3             |  0.8904  |         1.0459         |
| timm_models |           gluon_inception_v3            |  0.8904  |         1.0459         |
| timm_models |          mobilenetv3_large_100          |  0.8881  |         1.0046         |
| timm_models |                 dpn107                  |  0.8833  |         0.9977         |
| timm_models |            gluon_xception65             |  0.8832  |         0.9998         |
| timm_models |              spnasnet_100               |  0.8786  |         1.0063         |
| timm_models |               selecsls42b               |  0.8785  |         1.0139         |
| timm_models |             poolformer_m36              |  0.8768  |         1.1916         |
| timm_models |           eca_botnext26ts_256           |  0.8738  |         1.0257         |
| timm_models |            res2net50_14w_8s             |  0.8712  |         0.9828         |
| timm_models |            res2net101_26w_4s            |  0.871   |         0.9822         |
| timm_models |                mixnet_l                 |  0.8687  |         1.0134         |
| timm_models |               mnasnet_100               |  0.8683  |         1.0074         |
| timm_models |               res2next50                |  0.866   |         0.9759         |
| timm_models |              cait_m36_384               |  0.8636  |         1.0068         |
| timm_models |               fbnetc_100                |  0.8596  |         1.0104         |
| timm_models |                pit_b_224                |  0.8578  |         1.0382         |
| timm_models |              convnext_base              |  0.8505  |         1.0373         |
| timm_models |                gernet_l                 |  0.8499  |         1.0005         |
| timm_models |         swsl_resnext101_32x16d          |  0.8477  |         1.0007         |
| timm_models |             coat_lite_mini              |  0.8402  |         1.0437         |
| timm_models |                lcnet_050                |  0.8273  |         1.0008         |
| timm_models |              botnet26t_256              |  0.8239  |          1.0           |
| timm_models |          xcit_large_24_p8_224           |  0.8228  |         1.0079         |
| timm_models |               regnety_002               |  0.8165  |         1.0004         |
| timm_models |                repvgg_a2                |  0.7738  |         1.0131         |
| timm_models |             crossvit_9_240              |  0.7526  |         1.0019         |
| timm_models |      swin_base_patch4_window7_224       |  0.7214  |         0.9303         |
| timm_models |              jx_nest_base               |  0.6693  |         0.9905         |
+-------------+-----------------------------------------+----------+------------------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|               name                |  bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|       functorch_dp_cifar10        |  64  |  3.7683  |         1.4164         |        3.8468         |               1.4282                |
|           BERT_pytorch            |  16  |  3.2988  |         2.2199         |        3.3208         |               2.2676                |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  2.9018  |         1.8137         |        2.3668         |               1.8208                |
|            densenet121            |  4   |  2.7444  |         1.0715         |        2.7296         |               1.0748                |
|            hf_BigBird             |  2   |  2.6731  |         1.7162         |        2.6194         |               1.7758                |
|            hf_T5_large            |  2   |  2.4329  |         2.0038         |        2.5267         |               2.1386                |
|             hf_Albert             |  8   |  2.3975  |         2.3005         |        2.3685         |               2.3139                |
|               dlrm                | 1024 |  2.2374  |         1.1677         |        2.0264         |               1.2637                |
|           squeezenet1_1           |  32  |  2.1124  |         1.3397         |        2.0063         |               1.4167                |
|         phlippe_densenet          | 128  |  2.0806  |         1.029          |         2.087         |               1.0705                |
|        mobilenet_v3_large         |  32  |  2.0468  |         1.211          |        2.0609         |               1.2376                |
|          pytorch_struct           | 200  |  1.9681  |         1.1189         |        2.0867         |               1.4794                |
|               hf_T5               |  8   |  1.9589  |         1.9749         |        2.0015         |               2.0269                |
|              hf_Bert              |  4   |  1.9253  |         1.7074         |        1.9583         |               1.7481                |
|              hf_Bart              |  4   |  1.8693  |         1.557          |        1.7178         |               1.6633                |
|              hf_GPT2              |  4   |  1.8582  |         1.9035         |        2.0785         |               2.0666                |
|          phlippe_resnet           | 128  |  1.832   |         1.0113         |        1.8121         |               1.0676                |
|           hf_GPT2_large           |  4   |  1.7265  |         1.7904         |          0.0          |               1.9209                |
|          resnext50_32x4d          |  8   |  1.7133  |         0.9962         |        1.7077         |               0.9997                |
|            mnasnet1_0             |  32  |  1.7066  |         1.0654         |        1.6934         |                1.114                |
|        speech_transformer         |  32  |  1.6683  |         1.6225         |        1.8907         |               1.8447                |
|        shufflenet_v2_x1_0         | 128  |  1.6325  |         1.2108         |        1.6143         |               1.2218                |
|           hf_Bert_large           |  4   |  1.6266  |         1.6462         |        1.6564         |               1.6987                |
|             resnet18              |  16  |  1.5869  |         0.9844         |        1.5787         |               1.0058                |
|           hf_DistilBert           |  8   |  1.5777  |         1.5038         |        1.4896         |               1.5064                |
|      timm_vision_transformer      |  32  |  1.5761  |         1.4238         |        1.7334         |               1.5843                |
|           timm_resnest            |  32  |  1.573   |         1.5269         |        1.5745         |               1.5469                |
|            timm_nfnet             | 128  |  1.5573  |         1.5076         |        1.5801         |               1.5156                |
| attention_is_all_you_need_pytorch | 256  |  1.5525  |         1.554          |        1.7288         |                1.711                |
|           fastNLP_Bert            |  6   |  1.5451  |         1.5499         |        1.6927         |               1.6834                |
|           mobilenet_v2            |  96  |  1.5203  |         1.5083         |        1.5202         |               1.5225                |
|                drq                |  1   |  1.4926  |         1.0524         |        1.4722         |               1.1479                |
|           lennard_jones           | 1000 |  1.4551  |         0.8351         |        1.3814         |                1.07                 |
|         timm_efficientnet         |  32  |  1.3786  |         1.0638         |        1.3933         |               1.0552                |
|               dcgan               |  32  |  1.3676  |          0.84          |        1.4554         |               0.8373                |
|           pytorch_unet            |  1   |  1.3593  |         1.3532         |        1.3587         |               1.3564                |
|          LearningToPaint          |  96  |  1.3205  |         1.0678         |        1.3599         |               1.1066                |
|          pytorch_stargan          |  16  |  1.281   |         1.2489         |         1.267         |               1.2428                |
|            Super_SloMo            |  6   |  1.2511  |         1.2343         |        1.2587         |               1.2411                |
|               vgg16               |  64  |  1.2412  |         1.2537         |        1.2509         |               1.2643                |
|        Background_Matting         |  4   |  1.2119  |         1.2059         |        1.2177         |               1.2108                |
|             resnet152             |  32  |  1.205   |         1.0171         |        1.1848         |                1.037                |
|              yolov3               |  16  |  1.1977  |         1.1969         |        1.2052         |               1.2084                |
|             resnet50              |  32  |  1.1807  |         1.0715         |        1.1844         |               1.0776                |
|            hf_Reformer            |  4   |  1.1415  |         1.0689         |        1.1457         |               1.0826                |
|              alexnet              | 128  |  1.089   |         1.1351         |        1.1322         |               1.1834                |
|              demucs               |  4   |  1.039   |         1.0374         |        1.0363         |               1.0385                |
|         soft_actor_critic         | 256  |  1.0041  |         0.8306         |        1.1711         |               0.8459                |
|            timm_regnet            |  32  |  1.001   |         0.9535         |        1.0103         |               0.9605                |
|            tts_angular            |  64  |  0.9571  |         0.9597         |        0.9585         |               0.9524                |
|            timm_vovnet            |  32  |  0.9088  |         0.9047         |        0.9139         |               0.9079                |
|      nvidia_deeprecommender       | 256  |  0.8719  |         1.0185         |        0.9331         |               1.1032                |
|   timm_vision_transformer_large   |  32  |   0.0    |         1.084          |          0.0          |               1.1625                |
|                gat                |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
|                gcn                |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
|           hf_Longformer           |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
|               moco                |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
|               sage                |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
|             tacotron2             |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
|           torchrec_dlrm           |  0   |   0.0    |          0.0           |          0.0          |                 0.0                 |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+

Accuracy

+-----------------------------------+-----+------------------+------------------------+-----------------------+-------------------------------------+
|               name                | bs  |     inductor     | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+-----+------------------+------------------------+-----------------------+-------------------------------------+
|           hf_GPT2_large           |  4  | pass_due_to_skip |    pass_due_to_skip    |   pass_due_to_skip    |          pass_due_to_skip           |
|   timm_vision_transformer_large   |  4  | pass_due_to_skip |    pass_due_to_skip    |   pass_due_to_skip    |          pass_due_to_skip           |
|            hf_T5_large            |  4  | pass_due_to_skip |    pass_due_to_skip    |   pass_due_to_skip    |          pass_due_to_skip           |
|         timm_efficientnet         |  4  |       pass       |          pass          |         pass          |                pass                 |
|   pytorch_CycleGAN_and_pix2pix    |  1  |       pass       |          pass          |         pass          |                pass                 |
|          pytorch_struct           | 200 |       pass       |          pass          |         pass          |                pass                 |
|           pytorch_unet            |  2  |       pass       |          pass          |         pass          |                pass                 |
|             resnet152             |  4  |       pass       |          pass          |         pass          |                pass                 |
|             resnet18              |  4  |       pass       |          pass          |         pass          |                pass                 |
|             resnet50              |  4  |       pass       |          pass          |         pass          |                pass                 |
|          resnext50_32x4d          |  4  |       pass       |          pass          |         pass          |                pass                 |
|        shufflenet_v2_x1_0         |  4  |       pass       |          pass          |         pass          |                pass                 |
|         soft_actor_critic         | 256 |       pass       |          pass          |         pass          |                pass                 |
|        speech_transformer         |  4  |       pass       |          pass          |         pass          |                pass                 |
|            timm_nfnet             |  4  |       pass       |          pass          |         pass          |                pass                 |
|      nvidia_deeprecommender       |  4  |       pass       |          pass          |         pass          |                pass                 |
|            timm_regnet            |  4  |       pass       |          pass          |         pass          |                pass                 |
|           timm_resnest            |  4  |       pass       |          pass          |         pass          |                pass                 |
|      timm_vision_transformer      |  4  |       pass       |          pass          |         pass          |                pass                 |
|            timm_vovnet            |  4  |       pass       |          pass          |         pass          |                pass                 |
|            tts_angular            |  4  |       pass       |          pass          |         pass          |                pass                 |
|               vgg16               |  4  |       pass       |          pass          |         pass          |                pass                 |
|              yolov3               |  4  |       pass       |          pass          |         pass          |                pass                 |
|                drq                |  1  |       pass       |          pass          |     fail_accuracy     |            fail_accuracy            |
|          phlippe_resnet           |  4  |       pass       |          pass          |     fail_accuracy     |            fail_accuracy            |
|           squeezenet1_1           |  4  |       pass       |          pass          |     fail_accuracy     |            fail_accuracy            |
|          vision_maskrcnn          |  4  |       pass       |          pass          |         pass          |               0.0000                |
|         phlippe_densenet          |  4  |       pass       |          pass          |         pass          |                pass                 |
|          pytorch_stargan          | 16  |       pass       |          pass          |         pass          |                pass                 |
|        mobilenet_v3_large         |  4  |       pass       |          pass          |         pass          |                pass                 |
|             hf_Albert             |  4  |       pass       |          pass          |         pass          |                pass                 |
|           BERT_pytorch            |  4  |       pass       |          pass          |         pass          |                pass                 |
|          LearningToPaint          |  4  |       pass       |          pass          |         pass          |                pass                 |
|            Super_SloMo            |  4  |       pass       |          pass          |         pass          |                pass                 |
|              alexnet              |  4  |       pass       |          pass          |         pass          |                pass                 |
| attention_is_all_you_need_pytorch |  4  |       pass       |          pass          |         pass          |                pass                 |
|               dcgan               |  4  |       pass       |          pass          |         pass          |                pass                 |
|              demucs               |  4  |       pass       |          pass          |         pass          |                pass                 |
|           mobilenet_v2            |  4  |       pass       |          pass          |         pass          |                pass                 |
|               dlrm                |  4  |       pass       |          pass          |         pass          |                pass                 |
|           fastNLP_Bert            |  4  |       pass       |          pass          |         pass          |                pass                 |
|       functorch_dp_cifar10        |  4  |       pass       |          pass          |         pass          |                pass                 |
|            densenet121            |  4  |       pass       |          pass          |         pass          |                pass                 |
|              hf_Bart              |  4  |       pass       |          pass          |         pass          |                pass                 |
|            hf_Reformer            |  4  |       pass       |          pass          |         pass          |                pass                 |
|            mnasnet1_0             |  4  |       pass       |          pass          |         pass          |                pass                 |
|           lennard_jones           |  4  |       pass       |          pass          |         pass          |                pass                 |
|              hf_Bert              |  4  |       pass       |          pass          |         pass          |                pass                 |
|               hf_T5               |  4  |       pass       |          pass          |         pass          |                pass                 |
|            hf_T5_base             |  4  |       pass       |          pass          |         pass          |                pass                 |
|              hf_GPT2              |  2  |       pass       |          pass          |         pass          |                pass                 |
|           hf_DistilBert           |  4  |       pass       |          pass          |         pass          |                pass                 |
|            hf_BigBird             |  4  |       pass       |          pass          |         pass          |                pass                 |
|           hf_Bert_large           |  4  |       pass       |          pass          |         pass          |                pass                 |
|           hf_Longformer           |  4  |   fail_to_run    |      fail_to_run       |      fail_to_run      |             fail_to_run             |
|               moco                |  4  |   fail_to_run    |      fail_to_run       |      fail_to_run      |             fail_to_run             |
|        Background_Matting         |  4  | eager_variation  |    eager_variation     |    eager_variation    |           eager_variation           |
|                gat                |  0  |      0.0000      |         0.0000         |        0.0000         |               0.0000                |
|                gcn                |  0  |      0.0000      |         0.0000         |        0.0000         |               0.0000                |
|               llama               |  0  |      0.0000      |         0.0000         |        0.0000         |               0.0000                |
|               sage                |  0  |      0.0000      |         0.0000         |        0.0000         |               0.0000                |
|             tacotron2             |  0  |      0.0000      |         0.0000         |        0.0000         |               0.0000                |
|           torchrec_dlrm           |  0  |      0.0000      |         0.0000         |        0.0000         |               0.0000                |
+-----------------------------------+-----+------------------+------------------------+-----------------------+-------------------------------------+

Compilation latency (sec)

+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|               name                |  bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|            hf_T5_large            |  2   | 173.0714 |        132.2447        |        416.242        |              177.2517               |
|         phlippe_densenet          | 128  | 160.9749 |        30.0922         |       535.7692        |               32.8417               |
|            hf_BigBird             |  2   | 149.3215 |        103.7268        |        498.612        |              130.1443               |
|            densenet121            |  4   | 133.1323 |        73.2249         |       733.9116        |               76.8014               |
|           mobilenet_v2            |  96  | 122.4436 |        30.0659         |       339.5142        |               31.3981               |
|         timm_efficientnet         |  32  | 111.0412 |        36.4138         |        266.193        |               38.9304               |
|            mnasnet1_0             |  32  | 110.6588 |        29.0324         |       378.7044        |               29.7176               |
|        mobilenet_v3_large         |  32  | 109.9406 |        31.6035         |       344.9076        |               34.1691               |
|           hf_GPT2_large           |  4   | 108.8581 |        77.1005         |          nan          |               96.8448               |
|              yolov3               |  16  | 92.0432  |        43.0049         |       284.2655        |               45.0176               |
|        speech_transformer         |  32  |  80.649  |        39.1877         |        807.794        |               52.1384               |
|        shufflenet_v2_x1_0         | 128  | 79.0767  |        32.8056         |       225.9032        |               32.968                |
| attention_is_all_you_need_pytorch | 256  | 75.6929  |        34.7959         |        634.072        |               46.5701               |
|           BERT_pytorch            |  16  | 71.7002  |        33.6489         |       320.9033        |               42.5153               |
|             resnet152             |  32  | 71.3272  |        69.3953         |       177.8991        |               73.8442               |
|        Background_Matting         |  4   | 70.9447  |        30.4912         |       132.6704        |               33.0487               |
|            timm_nfnet             | 128  | 67.1746  |        37.2659         |       271.9672        |               39.586                |
|            timm_regnet            |  32  |  61.764  |        38.7132         |        296.823        |               41.0341               |
|           hf_Bert_large           |  4   | 61.6953  |        55.8367         |       289.4748        |               72.8998               |
|           timm_resnest            |  32  | 60.5023  |        19.6536         |       114.3721        |               20.7796               |
|       functorch_dp_cifar10        |  64  |  56.865  |         15.62          |       134.2888        |               16.4183               |
|              hf_Bart              |  4   | 50.0551  |         39.499         |       157.2438        |               50.1397               |
|           fastNLP_Bert            |  6   | 49.5644  |        32.8088         |       424.9577        |               42.0206               |
|            hf_Reformer            |  4   | 49.2329  |        15.8472         |       187.3093        |               17.8342               |
|               hf_T5               |  8   | 49.1459  |        33.6421         |       247.7637        |               46.7628               |
|      timm_vision_transformer      |  32  | 48.9995  |        24.0413         |       385.9061        |               31.4757               |
|          pytorch_stargan          |  16  | 47.5214  |        14.5602         |        33.711         |               14.0179               |
|           pytorch_unet            |  1   | 45.6827  |        16.5621         |       138.7921        |               17.9691               |
|          LearningToPaint          |  96  | 43.5813  |        16.6341         |        208.533        |               17.3278               |
|            Super_SloMo            |  6   | 43.1872  |        29.9183         |       130.8438        |               31.9602               |
|          resnext50_32x4d          |  8   | 42.9535  |        28.0696         |       248.8633        |               28.8052               |
|              hf_GPT2              |  4   | 41.8947  |        28.9462         |       175.7557        |               35.6423               |
|             hf_Albert             |  8   | 40.9266  |        23.7433         |        369.308        |               33.0731               |
|              hf_Bert              |  4   | 36.9631  |        29.8651         |        87.4618        |               39.0034               |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 36.7504  |        16.6227         |        94.3367        |               16.3521               |
|            timm_vovnet            |  32  | 35.6387  |         26.284         |       252.3846        |               27.6386               |
|             resnet18              |  16  | 30.4133  |        15.5566         |       174.7456        |               15.7017               |
|             resnet50              |  32  | 28.5418  |        28.5283         |        29.5055        |               29.8457               |
|              demucs               |  4   | 27.9362  |        12.1266         |        66.4479        |               11.9827               |
|           hf_DistilBert           |  8   | 27.6289  |        17.9827         |        46.8685        |               22.3509               |
|          phlippe_resnet           | 128  | 24.4142  |        13.4551         |       158.3286        |               15.0748               |
|           squeezenet1_1           |  32  | 21.7804  |        11.6383         |       145.0019        |               11.9146               |
|          pytorch_struct           | 200  | 20.3197  |         6.8722         |       353.2137        |               8.9324                |
|              alexnet              | 128  | 15.5657  |         8.5006         |       169.6181        |               8.8066                |
|               vgg16               |  64  | 12.7831  |         8.3682         |       165.2221        |               9.5088                |
|                drq                |  1   | 10.9657  |         6.7722         |       263.0723        |               8.6234                |
|      nvidia_deeprecommender       | 256  |  10.344  |         6.318          |       231.7501        |               7.0803                |
|         soft_actor_critic         | 256  |  9.3882  |         6.4124         |       166.2186        |               6.0958                |
|               dlrm                | 1024 |  8.8048  |         6.2748         |       322.0582        |               7.5683                |
|               dcgan               |  32  |  8.4643  |         6.4032         |        34.7612        |               6.6723                |
|           lennard_jones           | 1000 |  7.0308  |         6.2225         |       141.9293        |               5.9564                |
|            tts_angular            |  64  |  6.0732  |         5.0228         |        5.2435         |                5.007                |
|   timm_vision_transformer_large   |  32  |   nan    |        73.4664         |          nan          |              106.1873               |
|                gat                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|                gcn                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|           hf_Longformer           |  0   |   nan    |          nan           |          nan          |                 nan                 |
|               moco                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|               sage                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|             tacotron2             |  0   |   nan    |          nan           |          nan          |                 nan                 |
|           torchrec_dlrm           |  0   |   nan    |          nan           |          nan          |                 nan                 |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+

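Peak Memory Compression Ratio (eager peak memory / compiled peak memory; values above 1.0 mean the compiled model uses less memory)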

+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|               name                |  bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|             hf_Albert             |  8   |  1.0378  |         1.3253         |        0.9955         |               1.3253                |
|               hf_T5               |  8   |  1.0163  |         1.2478         |        0.9988         |               1.2478                |
|           mobilenet_v2            |  96  |  1.0102  |         1.1747         |        1.0102         |               1.1747                |
|            tts_angular            |  64  |  0.9904  |          1.0           |        0.9896         |                 1.0                 |
|            timm_nfnet             | 128  |  0.9689  |         1.1079         |        0.9619         |               1.1066                |
| attention_is_all_you_need_pytorch | 256  |  0.9689  |         1.1774         |        1.0017         |               1.1736                |
|           fastNLP_Bert            |  6   |  0.9575  |         1.2381         |        0.9595         |               1.2381                |
|               dlrm                | 1024 |  0.9525  |         1.0009         |        0.9466         |               1.0009                |
|           BERT_pytorch            |  16  |  0.9428  |         1.3212         |        0.9428         |               1.3212                |
|              hf_GPT2              |  4   |  0.9321  |         1.1566         |         0.932         |               1.1772                |
|           hf_GPT2_large           |  4   |  0.8904  |         1.1718         |          nan          |               1.1777                |
|              yolov3               |  16  |  0.8748  |         1.0642         |        0.8723         |               1.0736                |
|         timm_efficientnet         |  32  |  0.8701  |         1.0972         |        0.9259         |               1.1033                |
|             resnet152             |  32  |  0.8697  |         1.0021         |        0.8286         |                 1.0                 |
|        speech_transformer         |  32  |  0.8681  |         1.0968         |        0.8618         |               1.0967                |
|        shufflenet_v2_x1_0         | 128  |  0.8627  |         1.0886         |        0.8631         |               1.1038                |
|           timm_resnest            |  32  |  0.8616  |         1.0911         |        0.8431         |               1.1309                |
|            Super_SloMo            |  6   |  0.8614  |         1.2225         |        0.8606         |               1.2225                |
|      timm_vision_transformer      |  32  |  0.8593  |         0.9978         |        0.8357         |               0.9978                |
|            timm_regnet            |  32  |  0.8513  |         1.0004         |        0.8485         |               1.0005                |
|        Background_Matting         |  4   |  0.8485  |         1.0482         |        0.8333         |               1.0482                |
|           hf_DistilBert           |  8   |  0.8476  |         1.0783         |        0.8456         |               1.0783                |
|              hf_Bert              |  4   |  0.8411  |         1.0767         |        0.8411         |               1.0767                |
|             resnet50              |  32  |  0.8353  |         1.0021         |        0.8368         |               1.0001                |
|           hf_Bert_large           |  4   |  0.8302  |         1.0916         |        0.8302         |               1.0916                |
|            hf_T5_large            |  2   |  0.8201  |         1.1919         |        0.8201         |               1.1919                |
|            timm_vovnet            |  32  |  0.8185  |         1.0133         |        0.7426         |               1.0135                |
|           pytorch_unet            |  1   |  0.8134  |         1.0094         |        0.7708         |               1.0094                |
|         phlippe_densenet          | 128  |  0.8058  |         1.0057         |        0.7988         |               1.0056                |
|               dcgan               |  32  |  0.7955  |         0.9998         |        0.1811         |               0.9998                |
|              hf_Bart              |  4   |  0.793   |         1.0113         |        0.7623         |               1.0102                |
|           squeezenet1_1           |  32  |  0.7867  |         1.0815         |         0.763         |               1.0814                |
|        mobilenet_v3_large         |  32  |  0.7849  |          1.0           |         0.698         |                 1.0                 |
|              demucs               |  4   |  0.7826  |         0.9998         |        0.7662         |               0.9998                |
|          pytorch_stargan          |  16  |  0.7715  |         1.0716         |        0.7743         |               1.0716                |
|              alexnet              | 128  |  0.7396  |         1.0013         |        0.7396         |               1.0397                |
|               vgg16               |  64  |  0.7227  |         0.9886         |        0.7228         |               1.0332                |
|            mnasnet1_0             |  32  |  0.7144  |         1.0027         |        0.7485         |               1.0034                |
|            densenet121            |  4   |  0.7071  |         0.9989         |        0.7107         |               1.0012                |
|          pytorch_struct           | 200  |  0.697   |          1.0           |        0.9395         |               1.0001                |
|            hf_BigBird             |  2   |  0.6949  |         1.1929         |         0.694         |               1.1929                |
|      nvidia_deeprecommender       | 256  |  0.6857  |         0.9711         |        0.6857         |               1.0001                |
|          resnext50_32x4d          |  8   |  0.6786  |          1.0           |        0.6565         |               1.0016                |
|                drq                |  1   |  0.6429  |         0.9687         |        0.1818         |                1.035                |
|         soft_actor_critic         | 256  |  0.6067  |         0.9974         |        0.1108         |               0.9974                |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  0.6065  |         1.0224         |        0.5458         |               1.0224                |
|          LearningToPaint          |  96  |  0.5925  |         0.9944         |        0.6015         |               0.9944                |
|             resnet18              |  16  |  0.5891  |         0.9931         |        0.5364         |               0.9931                |
|           lennard_jones           | 1000 |  0.5317  |         1.0001         |        0.0648         |               1.0587                |
|            hf_Reformer            |  4   |  0.4539  |         1.0027         |        0.4622         |               1.0027                |
|       functorch_dp_cifar10        |  64  |  0.3991  |         1.0609         |        0.4626         |               1.0609                |
|          phlippe_resnet           | 128  |  0.3169  |         1.008          |        0.3272         |                1.008                |
|   timm_vision_transformer_large   |  32  |   nan    |         0.9723         |          nan          |               0.9791                |
|                gat                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|                gcn                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|           hf_Longformer           |  0   |   nan    |          nan           |          nan          |                 nan                 |
|               moco                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|               sage                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|             tacotron2             |  0   |   nan    |          nan           |          nan          |                 nan                 |
|           torchrec_dlrm           |  0   |   nan    |          nan           |          nan          |                 nan                 |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+

Absolute latency (ms)

+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|               name                |  bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+
|           hf_GPT2_large           |  4   | 120.9949 |        116.6407        |          nan          |              108.8995               |
|        Background_Matting         |  4   | 103.5854 |        104.166         |       103.2133        |              103.9099               |
|            hf_T5_large            |  2   | 94.0877  |        112.042         |        90.1229        |              105.0678               |
|               hf_T5               |  8   | 91.4678  |        90.5267         |        89.5872        |               88.9915               |
|            timm_nfnet             | 128  | 75.3214  |        78.3379         |        74.6814        |               77.7218               |
|            hf_BigBird             |  2   |  74.517  |        113.6677        |        73.379         |              112.2197               |
|            hf_Reformer            |  4   | 71.5373  |        75.7153         |        70.6722        |               74.8327               |
|            Super_SloMo            |  6   | 63.3501  |        64.2984         |        63.1509        |               64.0087               |
|              yolov3               |  16  | 57.2038  |        57.2744         |        56.834         |               56.5797               |
|            timm_regnet            |  32  | 55.6464  |        58.7053         |        55.2539        |               57.8637               |
|               vgg16               |  64  | 53.3929  |        52.7979         |        52.9403        |               52.317                |
|             resnet152             |  32  | 53.2036  |        63.0864         |        53.1877        |               62.3167               |
|              demucs               |  4   | 51.5908  |        51.6758         |        51.8294        |               51.7754               |
|           hf_Bert_large           |  4   | 50.6631  |        50.0943         |        49.6293        |               48.9384               |
|        speech_transformer         |  32  |  35.982  |          36.0          |        31.6151        |               31.3294               |
| attention_is_all_you_need_pytorch | 256  | 35.4362  |        35.6083         |        31.7431        |               31.9067               |
|              hf_Bart              |  4   | 34.0784  |        35.0328         |        31.4883        |               32.9934               |
|           fastNLP_Bert            |  6   | 33.4188  |        33.9433         |        31.2706        |               31.3598               |
|           mobilenet_v2            |  96  | 30.8429  |         31.165         |        30.8653        |               30.844                |
|           pytorch_unet            |  1   | 29.2184  |        29.4466         |        29.2677        |               29.3277               |
|             hf_Albert             |  8   | 29.0675  |        29.6172         |        28.819         |               29.442                |
|            timm_vovnet            |  32  | 26.9919  |         27.541         |        27.0308        |               27.3558               |
|              hf_GPT2              |  4   | 25.9285  |        25.7391         |        23.4441        |               23.5272               |
|         timm_efficientnet         |  32  | 23.2412  |        30.5885         |        23.1287        |               30.8732               |
|             resnet50              |  32  | 22.1006  |        25.0252         |        22.1314        |               24.5177               |
|              hf_Bert              |  4   | 21.8011  |         23.871         |        21.3681        |               23.5383               |
|           hf_DistilBert           |  8   | 21.2775  |        20.8865         |        21.0167        |               20.8359               |
|            densenet121            |  4   | 19.8637  |        51.8501         |        20.3152        |               51.2219               |
|        shufflenet_v2_x1_0         | 128  | 18.7948  |         25.566         |        18.7733        |               25.155                |
|      timm_vision_transformer      |  32  | 17.9678  |        19.9034         |        16.3517        |               18.1088               |
|           BERT_pytorch            |  16  | 17.0725  |        24.6427         |        16.3854        |               24.0209               |
|           timm_resnest            |  32  | 15.2495  |        15.7393         |        15.2485        |               15.5518               |
|            mnasnet1_0             |  32  | 13.0466  |        21.6279         |        13.3825        |               19.9102               |
|        mobilenet_v3_large         |  32  | 12.9163  |        22.2031         |        13.1215        |               21.7979               |
|          resnext50_32x4d          |  8   | 11.9392  |        20.6573         |        12.0657        |               20.1557               |
|          pytorch_stargan          |  16  |  11.77   |        11.9393         |        11.6463        |               11.8596               |
|      nvidia_deeprecommender       | 256  |  11.707  |        10.0214         |        10.9381        |               9.2529                |
|         phlippe_densenet          | 128  | 11.4082  |        22.9777         |        11.4472        |               22.7842               |
|              alexnet              | 128  |  9.0139  |         8.6398         |        8.6663         |                8.295                |
|          LearningToPaint          |  96  |  8.5569  |        10.7769         |        8.3681         |               10.2813               |
|            tts_angular            |  64  |  6.5413  |         6.5988         |        6.5713         |               6.5622                |
|             resnet18              |  16  |   5.84   |         9.4998         |        5.6093         |               9.0532                |
|   pytorch_CycleGAN_and_pix2pix    |  1   |  5.7967  |         7.6938         |        5.7312         |               7.5394                |
|           squeezenet1_1           |  32  |  5.4322  |         7.7193         |        5.1811         |               7.2148                |
|          phlippe_resnet           | 128  |  5.0226  |         8.9907         |         5.038         |                8.785                |
|       functorch_dp_cifar10        |  64  |  2.7787  |         7.303          |         2.806         |               7.1296                |
|                drq                |  1   |  2.4704  |         3.1289         |        2.4997         |               2.9058                |
|          pytorch_struct           | 200  |  2.453   |         4.1952         |        2.3217         |               3.3137                |
|               dlrm                | 1024 |  2.1539  |         3.6077         |        2.1012         |               3.4372                |
|         soft_actor_critic         | 256  |  1.9309  |         2.1787         |        1.3703         |               2.0865                |
|               dcgan               |  32  |  1.5488  |         2.5518         |        1.5306         |               2.5165                |
|           lennard_jones           | 1000 |  1.1898  |         2.4098         |        1.1487         |               1.5424                |
|   timm_vision_transformer_large   |  32  |   nan    |        429.9693        |          nan          |              398.9973               |
|                gat                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|                gcn                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|           hf_Longformer           |  0   |   nan    |          nan           |          nan          |                 nan                 |
|               moco                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|               sage                |  0   |   nan    |          nan           |          nan          |                 nan                 |
|             tacotron2             |  0   |   nan    |          nan           |          nan          |                 nan                 |
|           torchrec_dlrm           |  0   |   nan    |          nan           |          nan          |                 nan                 |
+-----------------------------------+------+----------+------------------------+-----------------------+-------------------------------------+

huggingface suite with amp precision

Performance speedup

+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|                  name                   | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|          MobileBertForMaskedLM          | 64  |  2.8379  |         1.1804         |         2.542         |               1.3537                |
|     MobileBertForQuestionAnswering      | 128 |  2.7692  |         1.1808         |        1.4973         |               1.3123                |
|      GPT2ForSequenceClassification      |  4  |  2.3231  |         2.3576         |        2.3887         |               2.4404                |
|             OPTForCausalLM              |  2  |  2.2847  |         2.3123         |        2.3442         |               2.3729                |
|       MT5ForConditionalGeneration       | 16  |  2.2568  |         1.9766         |         2.283         |               2.1255                |
|       ElectraForQuestionAnswering       | 64  |  2.1765  |         2.1411         |        2.1955         |               2.1551                |
|           ElectraForCausalLM            | 32  |  1.8447  |         1.8684         |         1.851         |                1.883                |
|    LayoutLMForSequenceClassification    | 16  |  1.8334  |         1.8149         |        1.8365         |               1.8208                |
|            XLNetLMHeadModel             |  8  |  1.8198  |         1.8157         |         1.849         |               1.8362                |
|        BertForQuestionAnswering         | 16  |  1.8037  |         1.805          |        1.8094         |               1.8103                |
|       RobertaForQuestionAnswering       | 16  |  1.8007  |         1.8073         |        1.8071         |               1.8103                |
|     M2M100ForConditionalGeneration      | 16  |  1.7192  |         1.4136         |        1.5253         |               1.5079                |
|           RobertaForCausalLM            | 16  |  1.6812  |         1.6989         |        1.6865         |               1.7033                |
|               DistillGPT2               | 16  |  1.6794  |         1.7193         |        1.9253         |               1.9826                |
|                 T5Small                 |  4  |  1.6666  |         1.8066         |        1.7059         |               1.8308                |
|       T5ForConditionalGeneration        |  4  |  1.6661  |         1.805          |        1.7018         |               1.8589                |
|    MegatronBertForQuestionAnswering     |  8  |  1.6556  |         1.6779         |        1.6706         |               1.6946                |
|       AlbertForQuestionAnswering        |  4  |  1.6425  |         1.6439         |        1.6396         |               1.6462                |
|             XGLMForCausalLM             |  8  |  1.6414  |         1.567          |         1.686         |               1.6278                |
|            AlbertForMaskedLM            |  4  |  1.6328  |         1.6345         |        1.6189         |               1.6393                |
|           LayoutLMForMaskedLM           | 16  |  1.6184  |         1.6414         |        1.6147         |               1.6395                |
|             BertForMaskedLM             | 16  |  1.5979  |         1.6155         |        1.6036         |               1.6168                |
|     PLBartForConditionalGeneration      |  4  |  1.5907  |         1.6254         |        1.7091         |               1.7489                |
|                CamemBert                | 16  |  1.5473  |         1.5626         |        1.6282         |               1.6418                |
|         MegatronBertForCausalLM         |  4  |  1.5329  |         1.5639         |        1.5751         |               1.6277                |
|            YituTechConvBert             | 16  |  1.523   |         1.5246         |        1.6573         |               1.6564                |
|            PLBartForCausalLM            |  8  |  1.4744  |         1.505          |        1.6825         |               1.7212                |
|             BartForCausalLM             |  4  |  1.4667  |         1.499          |        1.6073         |               1.6423                |
|            MBartForCausalLM             |  4  |  1.4619  |         1.4935         |        1.6016         |               1.6392                |
|     DistilBertForQuestionAnswering      | 256 |  1.4549  |         1.4549         |        1.4664         |                1.469                |
|      BartForConditionalGeneration       |  2  |  1.4517  |         1.477          |        1.5394         |               1.5653                |
|      MBartForConditionalGeneration      |  2  |  1.4393  |         1.4681         |        1.5284         |               1.5603                |
|         Speech2Text2ForCausalLM         | 256 |  1.423   |         1.4552         |         1.438         |               1.4968                |
| BlenderbotSmallForConditionalGeneration | 64  |  1.3494  |         1.3821         |        1.4451         |               1.4879                |
|     PegasusForConditionalGeneration     | 32  |  1.2481  |         1.2879         |        1.3445         |               1.3718                |
|            TrOCRForCausalLM             | 32  |  1.2403  |         1.2717         |        1.3742         |               1.4124                |
|          DistilBertForMaskedLM          | 128 |  1.2142  |         1.2433         |        1.2239         |               1.2496                |
|       BlenderbotSmallForCausalLM        | 64  |  1.2128  |         1.2526         |        1.3706         |                1.402                |
|       DebertaForQuestionAnswering       |  8  |  1.1834  |         1.0615         |        1.2014         |               1.0904                |
|           PegasusForCausalLM            | 32  |  1.1733  |         1.2101         |        1.3076         |               1.3469                |
|           DebertaForMaskedLM            |  4  |  1.0944  |         0.9127         |        1.1479         |               0.9382                |
|          DebertaV2ForMaskedLM           |  1  |  1.0122  |         0.7354         |        1.0227         |               0.7303                |
|      DebertaV2ForQuestionAnswering      |  2  |  0.9377  |         0.7692         |        0.9557         |               0.7836                |
|          BlenderbotForCausalLM          |  4  |   0.0    |         1.1121         |          0.0          |               1.1282                |
|          AllenaiLongformerBase          |  0  |   0.0    |          0.0           |          0.0          |                 0.0                 |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------------+-----------------------+-------------------------------------+
|                  name                   | bs |     inductor     | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------------+-----------------------+-------------------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip |    pass_due_to_skip    |   pass_due_to_skip    |          pass_due_to_skip           |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip |    pass_due_to_skip    |   pass_due_to_skip    |          pass_due_to_skip           |
|       MT5ForConditionalGeneration       | 1  |       pass       |          pass          |         pass          |                pass                 |
|         MegatronBertForCausalLM         | 1  |       pass       |          pass          |         pass          |                pass                 |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |          pass          |         pass          |                pass                 |
|          MobileBertForMaskedLM          | 1  |       pass       |          pass          |         pass          |                pass                 |
|     MobileBertForQuestionAnswering      | 1  |       pass       |          pass          |         pass          |                pass                 |
|             OPTForCausalLM              | 1  |       pass       |          pass          |         pass          |                pass                 |
|            PLBartForCausalLM            | 1  |       pass       |          pass          |         pass          |                pass                 |
|     PLBartForConditionalGeneration      | 1  |       pass       |          pass          |         pass          |                pass                 |
|           PegasusForCausalLM            | 1  |       pass       |          pass          |         pass          |                pass                 |
|     PegasusForConditionalGeneration     | 1  |       pass       |          pass          |         pass          |                pass                 |
|           RobertaForCausalLM            | 1  |       pass       |          pass          |         pass          |                pass                 |
|       RobertaForQuestionAnswering       | 1  |       pass       |          pass          |         pass          |                pass                 |
|         Speech2Text2ForCausalLM         | 1  |       pass       |          pass          |         pass          |                pass                 |
|       T5ForConditionalGeneration        | 1  |       pass       |          pass          |         pass          |                pass                 |
|                 T5Small                 | 1  |       pass       |          pass          |         pass          |                pass                 |
|            TrOCRForCausalLM             | 1  |       pass       |          pass          |         pass          |                pass                 |
|             XGLMForCausalLM             | 1  |       pass       |          pass          |         pass          |                pass                 |
|            XLNetLMHeadModel             | 1  |       pass       |          pass          |         pass          |                pass                 |
|            YituTechConvBert             | 1  |       pass       |          pass          |         pass          |                pass                 |
|      MBartForConditionalGeneration      | 1  |       pass       |          pass          |         pass          |                pass                 |
|            MBartForCausalLM             | 1  |       pass       |          pass          |         pass          |                pass                 |
|     M2M100ForConditionalGeneration      | 1  |       pass       |          pass          |         pass          |                pass                 |
|    LayoutLMForSequenceClassification    | 1  |       pass       |          pass          |         pass          |                pass                 |
|            AlbertForMaskedLM            | 1  |       pass       |          pass          |         pass          |                pass                 |
|          AllenaiLongformerBase          | 1  |       pass       |          pass          |         pass          |                pass                 |
|             BartForCausalLM             | 1  |       pass       |          pass          |         pass          |                pass                 |
|      BartForConditionalGeneration       | 1  |       pass       |          pass          |         pass          |                pass                 |
|             BertForMaskedLM             | 1  |       pass       |          pass          |         pass          |                pass                 |
|        BertForQuestionAnswering         | 1  |       pass       |          pass          |         pass          |                pass                 |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |          pass          |         pass          |                pass                 |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |          pass          |         pass          |                pass                 |
|                CamemBert                | 1  |       pass       |          pass          |         pass          |                pass                 |
|           DebertaForMaskedLM            | 1  |       pass       |          pass          |         pass          |                pass                 |
|       DebertaForQuestionAnswering       | 1  |       pass       |          pass          |         pass          |                pass                 |
|          DistilBertForMaskedLM          | 1  |       pass       |          pass          |         pass          |                pass                 |
|     DistilBertForQuestionAnswering      | 1  |       pass       |          pass          |         pass          |                pass                 |
|               DistillGPT2               | 1  |       pass       |          pass          |         pass          |                pass                 |
|           ElectraForCausalLM            | 1  |       pass       |          pass          |         pass          |                pass                 |
|       ElectraForQuestionAnswering       | 1  |       pass       |          pass          |         pass          |                pass                 |
|      GPT2ForSequenceClassification      | 1  |       pass       |          pass          |         pass          |                pass                 |
|           LayoutLMForMaskedLM           | 1  |       pass       |          pass          |         pass          |                pass                 |
|      DebertaV2ForQuestionAnswering      | 1  |   fail_to_run    |          pass          |      fail_to_run      |                pass                 |
|       AlbertForQuestionAnswering        | 1  |  fail_accuracy   |     fail_accuracy      |     fail_accuracy     |            fail_accuracy            |
+-----------------------------------------+----+------------------+------------------------+-----------------------+-------------------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|                  name                   | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|          MobileBertForMaskedLM          | 64  | 144.5755 |        102.6639        |       546.5416        |              139.5054               |
|     MobileBertForQuestionAnswering      | 128 | 142.8046 |        101.7791        |       537.4145        |              138.2121               |
|          DebertaV2ForMaskedLM           |  1  | 140.6903 |        57.1807         |       466.6899        |               76.2266               |
|      DebertaV2ForQuestionAnswering      |  2  | 140.1457 |        61.8347         |       348.7994        |               79.0066               |
|     M2M100ForConditionalGeneration      | 16  | 137.1882 |        71.3113         |       217.4307        |               97.9629               |
|       MT5ForConditionalGeneration       | 16  | 133.1446 |        48.8807         |       483.0852        |               67.2957               |
|             XGLMForCausalLM             |  8  | 121.334  |        58.2817         |       212.8778        |               78.0456               |
|            XLNetLMHeadModel             |  8  | 95.8362  |        78.7795         |       252.4958        |              104.9234               |
|      MBartForConditionalGeneration      |  2  | 82.4087  |        70.6402         |       109.5138        |               96.8137               |
|           DebertaForMaskedLM            |  4  | 79.7105  |        33.6328         |       158.0097        |               42.3338               |
|       DebertaForQuestionAnswering       |  8  |  76.661  |        33.5332         |       201.8764        |               41.2801               |
|      BartForConditionalGeneration       |  2  | 75.1442  |        68.2448         |       217.2458        |               94.2897               |
|     PegasusForConditionalGeneration     | 32  | 67.9314  |        62.7712         |       104.6898        |               88.1643               |
|    MegatronBertForQuestionAnswering     |  8  |  66.518  |        57.6726         |       134.8533        |               78.1936               |
|            YituTechConvBert             | 16  |  66.375  |         45.547         |       209.9396        |               57.9979               |
|         MegatronBertForCausalLM         |  4  | 61.8968  |         58.287         |       103.0386        |               78.2287               |
| BlenderbotSmallForConditionalGeneration | 64  | 53.1261  |        46.1383         |        74.7762        |               64.5962               |
|           ElectraForCausalLM            | 32  | 50.6319  |        33.6646         |       386.2907        |               42.0994               |
|       T5ForConditionalGeneration        |  4  | 50.1657  |        37.1969         |       233.8668        |               49.591                |
|     PLBartForConditionalGeneration      |  4  | 47.3857  |        37.9846         |        85.0962        |               50.7367               |
|       ElectraForQuestionAnswering       | 64  | 42.7933  |        32.2662         |       241.3952        |               40.2974               |
|    LayoutLMForSequenceClassification    | 16  | 41.7579  |        34.0668         |       160.4051        |               41.8708               |
|             BertForMaskedLM             | 16  | 38.7295  |        30.3644         |       214.0867        |               39.9959               |
|             OPTForCausalLM              |  2  | 37.5386  |        30.2833         |        99.0268        |               38.8445               |
|            AlbertForMaskedLM            |  4  | 37.4582  |        24.6813         |       300.1565        |               34.2281               |
|            MBartForCausalLM             |  4  | 36.7723  |        30.4993         |        48.9687        |               40.8436               |
|           PegasusForCausalLM            | 32  | 36.5871  |        30.4338         |        94.427         |               40.2854               |
|      GPT2ForSequenceClassification      |  4  | 35.9461  |        28.2024         |       188.6827        |               37.4743               |
|            TrOCRForCausalLM             | 32  | 35.7626  |        29.3602         |       186.3087        |               39.7047               |
|     DistilBertForQuestionAnswering      | 256 | 35.6994  |        19.9854         |       184.4006        |               23.3917               |
|           LayoutLMForMaskedLM           | 16  | 35.3459  |        33.2269         |        40.5109        |               40.9049               |
|                 T5Small                 |  4  | 34.8294  |        37.3824         |        47.6236        |               48.8331               |
|             BartForCausalLM             |  4  | 34.3041  |         29.607         |       168.1618        |               39.0359               |
|          DistilBertForMaskedLM          | 128 | 33.9409  |        18.8863         |       206.8346        |               23.0878               |
|                CamemBert                | 16  | 33.6916  |        31.2407         |        68.9291        |               40.2634               |
|           RobertaForCausalLM            | 16  | 33.5368  |        31.5115         |        44.2814        |               41.1254               |
|        BertForQuestionAnswering         | 16  | 31.8795  |        31.2068         |        92.8413        |               39.9058               |
|       RobertaForQuestionAnswering       | 16  | 30.7016  |         32.569         |        40.1121        |               40.9837               |
|       BlenderbotSmallForCausalLM        | 64  | 28.3922  |        22.7183         |       155.0844        |               27.9117               |
|               DistillGPT2               | 16  | 28.2467  |        16.8918         |       139.5229        |               21.4802               |
|       AlbertForQuestionAnswering        |  4  | 24.8724  |        23.6732         |        88.2348        |               33.7614               |
|         Speech2Text2ForCausalLM         | 256 | 24.1175  |        18.0462         |       131.2631        |               23.6207               |
|            PLBartForCausalLM            |  8  | 24.0539  |        18.9783         |        66.2115        |               23.7083               |
|          BlenderbotForCausalLM          |  4  |   nan    |        56.4431         |          nan          |               75.0952               |
|          AllenaiLongformerBase          |  0  |   nan    |          nan           |          nan          |                 nan                 |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|                  name                   | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|            XLNetLMHeadModel             |  8  |  1.1551  |         1.1551         |        1.1551         |               1.1551                |
|       ElectraForQuestionAnswering       | 64  |  1.1376  |         1.195          |        1.1104         |                1.195                |
|      GPT2ForSequenceClassification      |  4  |  1.1139  |          1.23          |        1.1135         |                1.23                 |
|             OPTForCausalLM              |  2  |  1.0939  |         1.1343         |         1.094         |               1.1343                |
|        BertForQuestionAnswering         | 16  |  1.0607  |         1.1729         |        1.0607         |               1.1729                |
|       RobertaForQuestionAnswering       | 16  |  1.0603  |         1.1724         |        1.0603         |               1.1724                |
|    LayoutLMForSequenceClassification    | 16  |  1.0583  |         1.1734         |        1.0583         |               1.1736                |
|                 T5Small                 |  4  |  1.0382  |         1.1813         |        1.0382         |               1.1813                |
|       T5ForConditionalGeneration        |  4  |  1.0382  |         1.1813         |        1.0356         |               1.1813                |
|     DistilBertForQuestionAnswering      | 256 |  1.0299  |         1.1486         |        1.0418         |               1.1486                |
|           LayoutLMForMaskedLM           | 16  |  1.0078  |         1.0517         |        1.0078         |               1.0517                |
|           RobertaForCausalLM            | 16  |  1.0077  |         1.052          |        1.0077         |                1.052                |
|             BertForMaskedLM             | 16  |  1.0075  |         1.0518         |        0.9463         |               1.0518                |
|                CamemBert                | 16  |  1.0035  |         1.0492         |        0.9417         |               1.0492                |
|            YituTechConvBert             | 16  |  0.9911  |         1.0411         |        0.9911         |               1.0411                |
|       AlbertForQuestionAnswering        |  4  |  0.9729  |         1.3147         |        0.9729         |               1.3147                |
|               DistillGPT2               | 16  |  0.9682  |         1.0641         |        0.9682         |               1.0641                |
|     PLBartForConditionalGeneration      |  4  |  0.9649  |         1.0521         |        0.9294         |               1.0521                |
|    MegatronBertForQuestionAnswering     |  8  |  0.953   |         1.1152         |         0.953         |               1.1152                |
|            AlbertForMaskedLM            |  4  |  0.9501  |         1.268          |        0.9501         |                1.268                |
|            MBartForCausalLM             |  4  |  0.9281  |         0.9912         |        0.9281         |               0.9912                |
|            PLBartForCausalLM            |  8  |  0.914   |         0.9887         |        0.8439         |               0.9887                |
|             BartForCausalLM             |  4  |  0.9137  |         0.9749         |        0.8818         |               0.9749                |
|       MT5ForConditionalGeneration       | 16  |  0.9089  |         1.0018         |        0.8222         |               1.0018                |
|           ElectraForCausalLM            | 32  |  0.8941  |         0.9739         |        0.8941         |               0.9739                |
|           PegasusForCausalLM            | 32  |  0.893   |         0.9864         |         0.893         |               0.9864                |
|          DistilBertForMaskedLM          | 128 |  0.8849  |         0.9624         |        0.8045         |               0.9624                |
|            TrOCRForCausalLM             | 32  |  0.8836  |         0.9583         |        0.8836         |               0.9583                |
| BlenderbotSmallForConditionalGeneration | 64  |  0.8729  |         0.9803         |         0.816         |               0.9803                |
|     PegasusForConditionalGeneration     | 32  |  0.8689  |         1.0689         |        0.8687         |               1.0689                |
|      MBartForConditionalGeneration      |  2  |  0.8574  |         1.0307         |        0.8574         |               1.0307                |
|      BartForConditionalGeneration       |  2  |  0.8456  |         1.0139         |        0.8456         |               1.0139                |
|         MegatronBertForCausalLM         |  4  |  0.845   |         1.0961         |         0.845         |               1.0961                |
|       BlenderbotSmallForCausalLM        | 64  |  0.8184  |         0.9119         |        0.7355         |               0.9119                |
|         Speech2Text2ForCausalLM         | 256 |  0.789   |         0.8779         |        0.7143         |               0.8779                |
|     M2M100ForConditionalGeneration      | 16  |  0.7651  |         0.9908         |        0.7651         |               0.9908                |
|          MobileBertForMaskedLM          | 64  |  0.752   |         1.016          |        0.7654         |                1.016                |
|             XGLMForCausalLM             |  8  |  0.7117  |         0.9792         |        0.7117         |               0.9792                |
|     MobileBertForQuestionAnswering      | 128 |  0.6569  |         0.8579         |        0.6505         |               0.8579                |
|           DebertaForMaskedLM            |  4  |  0.5646  |         1.0748         |        0.5649         |               1.0733                |
|          DebertaV2ForMaskedLM           |  1  |  0.5187  |         0.9894         |        0.5129         |               1.0005                |
|       DebertaForQuestionAnswering       |  8  |  0.4867  |         1.2209         |         0.487         |                1.218                |
|      DebertaV2ForQuestionAnswering      |  2  |  0.4855  |         1.0041         |        0.4806         |               1.0036                |
|          BlenderbotForCausalLM          |  4  |   nan    |         0.999          |          nan          |                0.999                |
|          AllenaiLongformerBase          |  0  |   nan    |          nan           |          nan          |                 nan                 |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|                  name                   | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|            AlbertForMaskedLM            |  4  | 163.1963 |        163.1587        |       165.2836        |              162.4534               |
|       AlbertForQuestionAnswering        |  4  | 160.9963 |        160.7677        |        161.433        |              160.4504               |
|            XLNetLMHeadModel             |  8  | 153.1104 |        153.6633        |       151.6684        |              152.5373               |
|      DebertaV2ForQuestionAnswering      |  2  | 116.3646 |        157.4995        |       112.4073        |              136.5503               |
|     PegasusForConditionalGeneration     | 32  | 109.4586 |        107.0701        |       102.1534        |               99.8395               |
|            TrOCRForCausalLM             | 32  | 108.956  |        106.1477        |        98.8934        |               95.5395               |
|          DebertaV2ForMaskedLM           |  1  | 106.0534 |        143.0314        |       104.3972        |              141.3999               |
|      MBartForConditionalGeneration      |  2  | 93.8608  |        92.1148         |        87.9214        |               86.2895               |
|      BartForConditionalGeneration       |  2  | 93.0417  |         91.187         |        87.7946        |               86.2112               |
|    MegatronBertForQuestionAnswering     |  8  |  85.885  |        84.7233         |        84.9406        |               83.9433               |
|            YituTechConvBert             | 16  | 82.2597  |        82.3462         |        75.6132        |               75.5079               |
| BlenderbotSmallForConditionalGeneration | 64  |  80.032  |        78.3281         |        74.7315        |               73.1472               |
|                CamemBert                | 16  | 76.4407  |        75.7046         |        72.7593        |               71.9841               |
|            MBartForCausalLM             |  4  | 74.3929  |        72.7671         |        67.9001        |               66.3893               |
|     M2M100ForConditionalGeneration      | 16  | 74.2504  |        76.7499         |        71.7425        |               70.8671               |
|             BartForCausalLM             |  4  | 74.0096  |        72.4829         |        67.6696        |               66.1328               |
|     PLBartForConditionalGeneration      |  4  | 71.6926  |        70.4056         |        66.6692        |               65.2681               |
|     DistilBertForQuestionAnswering      | 256 | 71.2658  |        71.5186         |        70.6492        |               70.572                |
|          DistilBertForMaskedLM          | 128 | 69.9579  |        68.3028         |        69.4614        |               67.6883               |
|     MobileBertForQuestionAnswering      | 128 | 69.8735  |        151.3741        |       113.2246        |              133.5794               |
|           LayoutLMForMaskedLM           | 16  | 69.6713  |        68.9603         |        69.6797        |               68.6248               |
|            PLBartForCausalLM            |  8  | 69.5727  |        68.1384         |        60.9211        |               59.6209               |
|             BertForMaskedLM             | 16  | 68.7292  |        67.9596         |        68.6083        |                67.97                |
|           RobertaForCausalLM            | 16  | 68.3556  |        67.7435         |        68.2043        |               67.5193               |
|             OPTForCausalLM              |  2  | 68.2014  |        67.3931         |        66.4461        |               65.6217               |
|       DebertaForQuestionAnswering       |  8  | 64.2309  |        71.1741         |        63.2963        |               69.3185               |
|               DistillGPT2               | 16  | 63.1898  |        61.4512         |        54.8378        |               53.2618               |
|       T5ForConditionalGeneration        |  4  | 62.6908  |        58.7865         |        61.1997        |               57.4435               |
|                 T5Small                 |  4  | 62.6659  |        58.7947         |        61.1709        |               57.1797               |
|          MobileBertForMaskedLM          | 64  | 60.8558  |        153.5208        |        68.6975        |              131.0284               |
|           DebertaForMaskedLM            |  4  | 58.2843  |        68.9715         |        56.5048        |               66.6704               |
|           PegasusForCausalLM            | 32  | 58.1039  |        56.5348         |        52.1993        |               50.7132               |
|         MegatronBertForCausalLM         |  4  | 56.6531  |        55.5386         |        54.9803        |               53.7762               |
|             XGLMForCausalLM             |  8  | 53.3488  |        56.2894         |        51.1424        |               53.1789               |
|    LayoutLMForSequenceClassification    | 16  | 53.3059  |        53.9067         |        53.1758        |               53.649                |
|       RobertaForQuestionAnswering       | 16  | 53.0766  |        53.3038         |        52.8924        |               52.7856               |
|        BertForQuestionAnswering         | 16  | 52.7161  |        52.7784         |        52.5468        |               52.4791               |
|       ElectraForQuestionAnswering       | 64  | 52.5425  |        53.4447         |        52.1755        |               53.1431               |
|           ElectraForCausalLM            | 32  | 47.7277  |        47.1817         |        47.551         |               46.6989               |
|       BlenderbotSmallForCausalLM        | 64  | 46.5247  |        45.4596         |        41.3807        |               40.2216               |
|       MT5ForConditionalGeneration       | 16  | 41.7186  |        47.6859         |        40.5796        |               44.2481               |
|      GPT2ForSequenceClassification      |  4  | 39.3523  |         38.805         |        38.2517        |               37.8337               |
|         Speech2Text2ForCausalLM         | 256 | 34.6072  |        33.8303         |        34.2895        |               33.5782               |
|          BlenderbotForCausalLM          |  4  |   nan    |        81.9093         |          nan          |               80.6264               |
|          AllenaiLongformerBase          |  0  |   nan    |          nan           |          nan          |                 nan                 |
+-----------------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

timm_models suite with amp precision

Performance speedup

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|        tnt_s_patch16_224        | 128 |  3.026   |         2.9865         |        3.3514         |               3.3072                |
|      xcit_large_24_p8_224       |  5  |  2.0139  |         1.6339         |        2.4595         |               1.6378                |
|        twins_pcpvt_base         | 64  |  1.9894  |         1.7401         |        2.1625         |               1.8586                |
|         coat_lite_mini          | 128 |  1.9522  |         1.9288         |        2.0839         |               2.0577                |
|          gmlp_s16_224           | 128 |  1.8654  |         1.8486         |        1.8895         |               1.8629                |
|          ghostnet_100           | 128 |  1.8548  |         1.6132         |        1.8609         |               1.6433                |
|          gmixer_24_224          | 128 |  1.7826  |         1.7619         |        1.9152         |               1.8867                |
|           volo_d1_224           | 64  |  1.7045  |         1.6798         |        1.7626         |               1.7374                |
|         crossvit_9_240          | 128 |  1.6634  |         1.6403         |        1.8579         |               1.8229                |
|  swin_base_patch4_window7_224   | 64  |  1.6379  |         1.6272         |         1.744         |                1.731                |
|           convit_base           | 64  |  1.6182  |         1.6148         |        1.7199         |               1.7183                |
|            lcnet_050            | 128 |  1.6062  |         1.3809         |        1.6373         |               1.3716                |
|       gluon_inception_v3        | 128 |  1.5356  |         1.525          |        1.5428         |               1.5267                |
|        adv_inception_v3         | 128 |  1.534   |         1.5217         |        1.5427         |               1.5317                |
|          inception_v3           | 128 |  1.5338  |         1.5212         |        1.5417         |               1.5298                |
|             dla102              | 128 |  1.5293  |         1.5277         |         1.535         |               1.5308                |
|          convnext_base          | 64  |  1.5259  |         1.5017         |        1.5366         |               1.5175                |
|            nfnet_l0             | 128 |  1.5072  |         1.4537         |        1.5108         |               1.4558                |
|           dm_nfnet_f0           | 128 |  1.5035  |         1.4557         |        1.5205         |               1.4665                |
|        sebotnet33ts_256         | 64  |  1.4938  |         1.5199         |        1.5026         |               1.5275                |
|            pit_b_224            | 64  |  1.4442  |         1.4382         |        1.6208         |               1.6125                |
|       eca_botnext26ts_256       | 128 |  1.4347  |         1.4163         |        1.4388         |               1.4222                |
|           resnest101e           | 64  |  1.4289  |         1.356          |        1.4328         |                1.358                |
|           mobilevit_s           | 64  |  1.4186  |         1.4164         |        1.4727         |               1.4845                |
|           selecsls42b           | 128 |  1.4075  |         1.4066         |        1.4143         |               1.4126                |
|          botnet26t_256          | 128 |  1.3956  |         1.4138         |        1.4006         |               1.4196                |
|      mobilenetv3_large_100      | 128 |  1.3932  |         1.3941         |        1.4065         |               1.3966                |
|           mnasnet_100           | 128 |  1.3907  |         1.4058         |        1.3931         |               1.4508                |
|          jx_nest_base           | 32  |  1.3891  |         1.3766         |        1.5705         |               1.5543                |
|           regnety_002           | 128 |  1.3792  |         1.2145         |        1.3874         |               1.2139                |
|        res2net50_14w_8s         | 128 |  1.3741  |         1.3534         |        1.3968         |               1.3763                |
|           res2next50            | 128 |  1.3689  |         1.3591         |        1.3681         |               1.3583                |
|          mixer_b16_224          | 128 |  1.3627  |         1.365          |        1.3983         |               1.3978                |
|          cait_m36_384           |  4  |  1.357   |         1.3535         |        1.4529         |               1.4558                |
|      beit_base_patch16_224      | 64  |  1.3548  |         1.357          |        1.4633         |               1.4652                |
|         poolformer_m36          | 64  |  1.3537  |         1.3425         |        1.3518         |               1.3438                |
|         mobilenetv2_100         | 128 |  1.3515  |         1.4039         |        1.3522         |                 1.4                 |
|            hrnet_w18            | 128 |  1.3475  |         1.3503         |        1.3873         |               1.3799                |
|        ese_vovnet19b_dw         | 128 |  1.3401  |         1.3596         |         1.354         |                1.374                |
|       tf_efficientnet_b0        | 128 |  1.3298  |         1.3638         |        1.3289         |               1.3611                |
|          spnasnet_100           | 128 |  1.3072  |         1.3647         |        1.3132         |               1.3681                |
|           fbnetc_100            | 128 |  1.2879  |         1.3661         |        1.3194         |               1.3669                |
|           rexnet_100            | 128 |  1.2836  |         1.3222         |        1.2894         |               1.3234                |
|            fbnetv3_b            | 128 |  1.2737  |         1.2946         |        1.2817         |               1.3094                |
|          resmlp_12_224          | 128 |  1.2729  |         1.2682         |        1.4107         |               1.4059                |
| deit_base_distilled_patch16_224 | 64  |  1.2603  |         1.2604         |        1.3269         |               1.3254                |
|      vit_base_patch16_224       | 64  |  1.2399  |         1.2395         |        1.3509         |                1.351                |
|            tinynet_a            | 128 |  1.2046  |         1.2237         |        1.2066         |               1.2342                |
|          cspdarknet53           | 64  |  1.2031  |         1.2386         |        1.2138         |               1.2477                |
|           tf_mixnet_l           | 128 |  1.1816  |         1.188          |        1.1874         |               1.1934                |
|         visformer_small         | 128 |  1.1765  |         1.1682         |        1.2089         |               1.2015                |
|            mixnet_l             | 128 |  1.1655  |         1.1752         |        1.1761         |               1.1822                |
|        res2net101_26w_4s        | 64  |  1.1612  |         1.0781         |        1.1667         |               1.0969                |
|          pnasnet5large          | 16  |  1.1218  |         1.1364         |        1.1377         |               1.1553                |
|        gluon_xception65         | 32  |  1.0792  |         1.0805         |        1.0901         |               1.0935                |
|             dpn107              | 32  |  1.0693  |         1.1112         |        1.0704         |               1.1096                |
|            repvgg_a2            | 128 |  1.0667  |         1.1006         |        1.0748         |               1.1042                |
|     swsl_resnext101_32x16d      | 32  |  1.059   |         1.022          |        1.0586         |               1.0208                |
|            gernet_l             | 128 |  1.0229  |         1.0491         |        1.0293         |               1.0556                |
|        convmixer_768_32         | 32  |  1.0015  |         1.0022         |        1.0081         |               1.0086                |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
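The summary at the top of this issue reports one geometric mean speedup per suite. The sketch below shows how a per-model speedup column like the one above could be rolled up into that single number. It is a minimal Python sketch, not the benchmark harness's own code. It assumes that failed or missing runs (reported as `nan`) are dropped; the real harness may handle them differently. `geometric_mean` is a hypothetical helper name.

```python
import math


def geometric_mean(speedups):
    """Geometric mean of per-model speedups.

    Assumption: NaN or non-positive entries (failed/missing runs) are skipped
    rather than counted; the real benchmark harness may handle them differently.
    """
    valid = [s for s in speedups if s > 0 and not math.isnan(s)]
    if not valid:
        return float("nan")
    # exp(mean(log x)) equals the n-th root of the product, but avoids overflow
    return math.exp(sum(math.log(s) for s in valid) / len(valid))


# A few values from the `inductor` column above (illustrative subset only)
print(geometric_mean([3.026, 2.0139, 1.9894, 1.0015]))
```

A geometric mean suits ratios such as speedups better than an arithmetic mean. A 2x speedup on one model and a 0.5x slowdown on another then average to exactly 1.0x.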

Accuracy

+---------------------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+----+----------+------------------------+-----------------------+-------------------------------------+
|        adv_inception_v3         | 8  |   pass   |          pass          |         pass          |                pass                 |
|      beit_base_patch16_224      | 8  |   pass   |          pass          |         pass          |                pass                 |
|      mobilenetv3_large_100      | 8  |   pass   |          pass          |         pass          |                pass                 |
|           mobilevit_s           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            nfnet_l0             | 8  |   pass   |          pass          |         pass          |                pass                 |
|            pit_b_224            | 8  |   pass   |          pass          |         pass          |                pass                 |
|          pnasnet5large          | 8  |   pass   |          pass          |         pass          |                pass                 |
|         poolformer_m36          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           regnety_002           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            repvgg_a2            | 8  |   pass   |          pass          |         pass          |                pass                 |
|        res2net101_26w_4s        | 8  |   pass   |          pass          |         pass          |                pass                 |
|        res2net50_14w_8s         | 8  |   pass   |          pass          |         pass          |                pass                 |
|           res2next50            | 8  |   pass   |          pass          |         pass          |                pass                 |
|          resmlp_12_224          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           resnest101e           | 8  |   pass   |          pass          |         pass          |                pass                 |
|           rexnet_100            | 8  |   pass   |          pass          |         pass          |                pass                 |
|        sebotnet33ts_256         | 8  |   pass   |          pass          |         pass          |                pass                 |
|           selecsls42b           | 8  |   pass   |          pass          |         pass          |                pass                 |
|          spnasnet_100           | 8  |   pass   |          pass          |         pass          |                pass                 |
|     swsl_resnext101_32x16d      | 8  |   pass   |          pass          |         pass          |                pass                 |
|       tf_efficientnet_b0        | 8  |   pass   |          pass          |         pass          |                pass                 |
|           tf_mixnet_l           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            tinynet_a            | 8  |   pass   |          pass          |         pass          |                pass                 |
|        tnt_s_patch16_224        | 8  |   pass   |          pass          |         pass          |                pass                 |
|        twins_pcpvt_base         | 8  |   pass   |          pass          |         pass          |                pass                 |
|         visformer_small         | 8  |   pass   |          pass          |         pass          |                pass                 |
|      vit_base_patch16_224       | 8  |   pass   |          pass          |         pass          |                pass                 |
|           volo_d1_224           | 8  |   pass   |          pass          |         pass          |                pass                 |
|      xcit_large_24_p8_224       | 8  |   pass   |          pass          |         pass          |                pass                 |
|         mobilenetv2_100         | 8  |   pass   |          pass          |         pass          |                pass                 |
|           mnasnet_100           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            mixnet_l             | 8  |   pass   |          pass          |         pass          |                pass                 |
|       eca_botnext26ts_256       | 8  |   pass   |          pass          |         pass          |                pass                 |
|          botnet26t_256          | 8  |   pass   |          pass          |         pass          |                pass                 |
|          cait_m36_384           | 4  |   pass   |          pass          |         pass          |                pass                 |
|         coat_lite_mini          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           convit_base           | 8  |   pass   |          pass          |         pass          |                pass                 |
|        convmixer_768_32         | 8  |   pass   |          pass          |         pass          |                pass                 |
|          convnext_base          | 8  |   pass   |          pass          |         pass          |                pass                 |
|         crossvit_9_240          | 8  |   pass   |          pass          |         pass          |                pass                 |
|          cspdarknet53           | 8  |   pass   |          pass          |         pass          |                pass                 |
| deit_base_distilled_patch16_224 | 8  |   pass   |          pass          |         pass          |                pass                 |
|             dla102              | 8  |   pass   |          pass          |         pass          |                pass                 |
|           dm_nfnet_f0           | 8  |   pass   |          pass          |         pass          |                pass                 |
|             dpn107              | 8  |   pass   |          pass          |         pass          |                pass                 |
|        ese_vovnet19b_dw         | 8  |   pass   |          pass          |         pass          |                pass                 |
|          mixer_b16_224          | 8  |   pass   |          pass          |         pass          |                pass                 |
|           fbnetc_100            | 8  |   pass   |          pass          |         pass          |                pass                 |
|            fbnetv3_b            | 8  |   pass   |          pass          |         pass          |                pass                 |
|            gernet_l             | 8  |   pass   |          pass          |         pass          |                pass                 |
|          ghostnet_100           | 8  |   pass   |          pass          |         pass          |                pass                 |
|       gluon_inception_v3        | 8  |   pass   |          pass          |         pass          |                pass                 |
|        gluon_xception65         | 8  |   pass   |          pass          |         pass          |                pass                 |
|          gmixer_24_224          | 8  |   pass   |          pass          |         pass          |                pass                 |
|          gmlp_s16_224           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            hrnet_w18            | 8  |   pass   |          pass          |         pass          |                pass                 |
|          inception_v3           | 8  |   pass   |          pass          |         pass          |                pass                 |
|          jx_nest_base           | 8  |   pass   |          pass          |         pass          |                pass                 |
|            lcnet_050            | 8  |   pass   |          pass          |         pass          |                pass                 |
|  swin_base_patch4_window7_224   | 8  |   pass   |          pass          |     fail_accuracy     |                pass                 |
+---------------------------------+----+----------+------------------------+-----------------------+-------------------------------------+

Compilation latency (sec)

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|           rexnet_100            | 128 | 224.9662 |        43.8651         |       549.7045        |               49.6792               |
|            hrnet_w18            | 128 | 192.1312 |        151.1129        |       538.2087        |              169.1894               |
|          pnasnet5large          | 16  | 158.5366 |        110.0134        |       425.2666        |              119.9122               |
|          ghostnet_100           | 128 | 153.8495 |        53.7871         |        597.803        |               58.2259               |
|        res2net101_26w_4s        | 64  | 150.918  |        87.7778         |       430.3811        |               97.7569               |
|        twins_pcpvt_base         | 64  | 147.4936 |        70.2721         |       1360.7422       |              104.1596               |
|        adv_inception_v3         | 128 | 145.5702 |        52.6549         |       390.6734        |               59.3601               |
|            fbnetv3_b            | 128 | 132.0557 |        60.2455         |       405.0947        |               66.0463               |
|      xcit_large_24_p8_224       |  5  | 126.2305 |        86.2627         |       930.2928        |               115.641               |
|           resnest101e           | 64  | 124.9575 |        79.8068         |       252.5227        |               86.9188               |
|            tinynet_a            | 128 | 120.3408 |        43.3221         |       320.7028        |               47.1822               |
|           mobilevit_s           | 64  | 118.2427 |         57.282         |       1212.6895       |               69.4255               |
|          cait_m36_384           |  4  | 115.4874 |         84.669         |       867.7116        |              132.8021               |
|            mixnet_l             | 128 | 113.4047 |        51.9441         |       486.8732        |               57.2357               |
|  swin_base_patch4_window7_224   | 64  | 106.1148 |         63.544         |       759.2707        |               89.0701               |
|        res2net50_14w_8s         | 128 | 100.3224 |        80.5095         |       479.9018        |               90.3053               |
|         poolformer_m36          | 64  | 94.9903  |        62.2564         |       225.6091        |               66.3959               |
|           fbnetc_100            | 128 |  93.989  |         36.047         |       377.2218        |               40.1527               |
|             dpn107              | 32  | 90.7086  |        64.0835         |       373.3056        |               69.6322               |
|         coat_lite_mini          | 128 | 90.1127  |        33.6073         |       1208.8042       |               44.5213               |
|          cspdarknet53           | 64  | 86.3952  |        40.8298         |       227.6929        |               45.2179               |
|         crossvit_9_240          | 128 |  85.881  |        41.9985         |       1078.4562       |               63.9877               |
|        gluon_xception65         | 32  | 85.8334  |        57.6158         |       204.0599        |               62.532                |
|          jx_nest_base           | 32  | 82.1427  |        52.5359         |       818.2768        |               76.7963               |
|             dla102              | 128 | 75.4119  |        52.0476         |       239.3686        |               57.9405               |
|           tf_mixnet_l           | 128 | 71.7949  |        52.5263         |        74.7729        |               57.3458               |
|           regnety_002           | 128 | 71.2927  |        31.3177         |       299.6921        |               34.7583               |
|          botnet26t_256          | 128 |  68.366  |        29.0346         |       507.1324        |               34.5135               |
|        tnt_s_patch16_224        | 128 | 67.6963  |        48.9934         |       453.8764        |               77.1973               |
|        sebotnet33ts_256         | 64  | 66.4868  |        37.2848         |       632.0998        |               47.2274               |
|           volo_d1_224           | 64  | 63.0412  |        39.7819         |       884.4952        |               62.0114               |
|          gmlp_s16_224           | 128 | 61.0161  |        38.1389         |       152.6563        |               54.1933               |
|            nfnet_l0             | 128 | 60.6763  |        35.0895         |       208.7935        |               39.0511               |
|          convnext_base          | 64  | 60.0963  |        41.6565         |        420.167        |               51.3909               |
|       tf_efficientnet_b0        | 128 | 57.7291  |        38.1859         |       205.9058        |               41.6089               |
|       gluon_inception_v3        | 128 | 54.2118  |        53.0126         |        57.1827        |               61.1203               |
|          inception_v3           | 128 | 54.1294  |        52.9654         |        56.6752        |               58.1736               |
|            gernet_l             | 128 | 51.0487  |        31.4274         |        191.511        |               35.3015               |
|          gmixer_24_224          | 128 |  50.865  |        37.4809         |       277.4413        |               54.7143               |
|       eca_botnext26ts_256       | 128 | 49.9108  |        31.6743         |       272.9045        |               35.6355               |
|           convit_base           | 64  | 48.3676  |        30.4493         |       335.4578        |               45.7735               |
|      mobilenetv3_large_100      | 128 | 48.3606  |        33.7093         |       127.4584        |                37.67                |
|     swsl_resnext101_32x16d      | 32  | 48.1704  |        46.9654         |       104.9527        |               51.188                |
|           mnasnet_100           | 128 | 47.4842  |        31.3142         |        178.561        |               33.5369               |
|           res2next50            | 128 | 47.4303  |        46.5696         |       121.2247        |               49.7811               |
|        ese_vovnet19b_dw         | 128 | 46.6961  |        22.2385         |       158.4995        |               24.6631               |
|            pit_b_224            | 64  | 46.5782  |         29.299         |       800.5762        |               44.3182               |
|         visformer_small         | 128 | 46.3822  |        25.6801         |       358.1299        |               30.8381               |
| deit_base_distilled_patch16_224 | 64  |   43.4   |        24.8982         |       210.4033        |               37.4958               |
|         mobilenetv2_100         | 128 | 41.6236  |        31.8541         |        90.446         |               34.6492               |
|            lcnet_050            | 128 | 40.7226  |        23.6077         |       141.2591        |               25.3412               |
|          resmlp_12_224          | 128 | 40.1924  |        18.6476         |       128.4408        |               24.8915               |
|           dm_nfnet_f0           | 128 | 38.1535  |        38.9289         |        40.2725        |               42.1344               |
|      beit_base_patch16_224      | 64  | 37.0688  |        27.5769         |        272.031        |               40.8913               |
|          spnasnet_100           | 128 |  36.105  |        35.8612         |        60.9458        |               39.5017               |
|        convmixer_768_32         | 32  | 35.6133  |        30.5962         |       101.1971        |               31.817                |
|      vit_base_patch16_224       | 64  | 33.7325  |        25.0333         |        43.3898        |               36.6412               |
|            repvgg_a2            | 128 | 32.9142  |        30.5913         |       164.1643        |               33.4591               |
|           selecsls42b           | 128 | 31.5101  |        27.2454         |        155.84         |               29.7677               |
|          mixer_b16_224          | 128 | 31.2014  |        20.8052         |       206.4417        |               29.0151               |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|          gmlp_s16_224           | 128 |  1.1848  |         1.2358         |        1.1831         |               1.2358                |
|          pnasnet5large          | 16  |  1.1712  |         1.3207         |        1.1522         |               1.3201                |
|          gmixer_24_224          | 128 |  1.1117  |         1.1923         |        1.1144         |               1.1923                |
|           convit_base           | 64  |  1.0948  |         1.1869         |         1.098         |               1.1869                |
|         mobilenetv2_100         | 128 |  1.0431  |         1.1739         |        1.0267         |               1.1739                |
|           dm_nfnet_f0           | 128 |  1.013   |         1.0932         |         1.013         |               1.0932                |
|          resmlp_12_224          | 128 |  1.0079  |         1.1048         |        1.0093         |               1.1048                |
|            tinynet_a            | 128 |  0.9984  |         1.1113         |         0.999         |                1.111                |
|           rexnet_100            | 128 |  0.9977  |         1.0864         |        0.9744         |               1.0862                |
|           resnest101e           | 64  |  0.9972  |         1.1047         |        0.9933         |               1.1047                |
|       tf_efficientnet_b0        | 128 |  0.9871  |         1.1078         |        0.9876         |               1.1074                |
|        tnt_s_patch16_224        | 128 |  0.9834  |         1.066          |         0.986         |                1.066                |
|        convmixer_768_32         | 32  |  0.9762  |         0.9999         |        0.9657         |               0.9999                |
|        twins_pcpvt_base         | 64  |  0.9729  |         1.0909         |        0.9763         |               1.0909                |
|           mobilevit_s           | 64  |  0.9557  |         1.0236         |        0.9263         |               1.0236                |
|             dla102              | 128 |  0.9536  |         1.0437         |        0.9528         |               1.0434                |
|          mixer_b16_224          | 128 |  0.9501  |         1.0133         |        0.9466         |               1.0133                |
|      vit_base_patch16_224       | 64  |  0.9362  |         0.9867         |        0.9362         |               0.9867                |
| deit_base_distilled_patch16_224 | 64  |  0.9353  |         0.9863         |        0.9072         |               0.9863                |
|         visformer_small         | 128 |  0.9348  |         1.0408         |        0.9245         |               1.0408                |
|           tf_mixnet_l           | 128 |  0.9346  |         1.0921         |        0.9343         |                1.092                |
|      beit_base_patch16_224      | 64  |  0.9308  |         1.0156         |        0.9307         |               1.0156                |
|            fbnetv3_b            | 128 |  0.9228  |         1.0004         |         0.917         |               1.0069                |
|            nfnet_l0             | 128 |  0.9215  |         1.0065         |        0.9101         |               1.0065                |
|           volo_d1_224           | 64  |  0.9131  |         1.0077         |        0.9089         |               1.0078                |
|          cspdarknet53           | 64  |  0.9097  |         1.0569         |        0.9098         |               1.0569                |
|        ese_vovnet19b_dw         | 128 |  0.9047  |         1.0046         |        0.8976         |               1.0046                |
|          ghostnet_100           | 128 |  0.8976  |         1.0514         |        0.8408         |                1.05                 |
|            hrnet_w18            | 128 |  0.8918  |         1.0121         |         0.889         |               1.0144                |
|        sebotnet33ts_256         | 64  |  0.891   |         1.1401         |        0.9207         |               1.1401                |
|          inception_v3           | 128 |  0.8904  |         1.0459         |        0.8902         |               1.0459                |
|        adv_inception_v3         | 128 |  0.8904  |         1.0459         |        0.8902         |               1.0459                |
|       gluon_inception_v3        | 128 |  0.8904  |         1.0459         |        0.8902         |               1.0459                |
|      mobilenetv3_large_100      | 128 |  0.8881  |         1.0046         |         0.865         |               1.0046                |
|             dpn107              | 32  |  0.8833  |         0.9977         |        0.8676         |               0.9977                |
|        gluon_xception65         | 32  |  0.8832  |         0.9998         |        0.8833         |               0.9998                |
|          spnasnet_100           | 128 |  0.8786  |         1.0063         |        0.8788         |               1.0063                |
|           selecsls42b           | 128 |  0.8785  |         1.0139         |        0.8473         |               1.0145                |
|         poolformer_m36          | 64  |  0.8768  |         1.1916         |        0.8592         |               1.1916                |
|       eca_botnext26ts_256       | 128 |  0.8738  |         1.0257         |        0.8738         |               1.0257                |
|        res2net50_14w_8s         | 128 |  0.8712  |         0.9828         |        0.8501         |                0.983                |
|        res2net101_26w_4s        | 64  |  0.871   |         0.9822         |        0.8506         |               0.9822                |
|            mixnet_l             | 128 |  0.8687  |         1.0134         |        0.8686         |               1.0134                |
|           mnasnet_100           | 128 |  0.8683  |         1.0074         |        0.8684         |               1.0074                |
|           res2next50            | 128 |  0.866   |         0.9759         |         0.866         |               0.9759                |
|          cait_m36_384           |  4  |  0.8636  |         1.0068         |        0.8637         |               1.0073                |
|           fbnetc_100            | 128 |  0.8596  |         1.0104         |        0.8597         |               1.0104                |
|            pit_b_224            | 64  |  0.8578  |         1.0382         |        0.8566         |               1.0382                |
|          convnext_base          | 64  |  0.8505  |         1.0373         |        0.8317         |               1.0373                |
|            gernet_l             | 128 |  0.8499  |         1.0005         |        0.8497         |               1.0005                |
|     swsl_resnext101_32x16d      | 32  |  0.8477  |         1.0007         |        0.8477         |               1.0007                |
|         coat_lite_mini          | 128 |  0.8402  |         1.0437         |        0.8501         |               1.0437                |
|            lcnet_050            | 128 |  0.8273  |         1.0008         |        0.8174         |               1.0008                |
|          botnet26t_256          | 128 |  0.8239  |          1.0           |         0.824         |                 1.0                 |
|      xcit_large_24_p8_224       |  5  |  0.8228  |         1.0079         |        0.8263         |               1.0124                |
|           regnety_002           | 128 |  0.8165  |         1.0004         |        0.7848         |               1.0004                |
|            repvgg_a2            | 128 |  0.7738  |         1.0131         |        0.7738         |               1.0131                |
|         crossvit_9_240          | 128 |  0.7526  |         1.0019         |        0.7524         |               1.0019                |
|  swin_base_patch4_window7_224   | 64  |  0.7214  |         0.9303         |        0.7297         |               0.9303                |
|          jx_nest_base           | 32  |  0.6693  |         0.9905         |        0.6705         |               0.9905                |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Absolute latency (ms)

+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|              name               | bs  | inductor | inductor_no_cudagraphs | inductor_max_autotune | inductor_max_autotune_no_cudagraphs |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+
|        convmixer_768_32         | 32  | 300.0291 |        299.9443        |       297.8893        |              297.7707               |
|            hrnet_w18            | 128 | 208.3656 |        206.8841        |       201.1329        |              202.9471               |
|          pnasnet5large          | 16  | 174.5393 |         173.23         |       172.1158        |              169.8113               |
|           tf_mixnet_l           | 128 | 160.135  |        159.2144        |       159.3293        |              158.4715               |
|            mixnet_l             | 128 | 155.1819 |        154.0876        |       153.9369        |              153.1759               |
|          cait_m36_384           |  4  | 123.0772 |        123.3186        |       114.7842        |              114.6307               |
|           resnest101e           | 64  | 114.7176 |        121.007         |       114.3349        |              120.8923               |
|             dla102              | 128 | 112.2881 |        112.3634        |       111.9551        |              112.1236               |
|     swsl_resnext101_32x16d      | 32  | 111.9813 |        115.7084        |       111.6536        |              116.0687               |
|         poolformer_m36          | 64  | 106.9875 |        107.7497        |       107.0367        |              107.6677               |
|        tnt_s_patch16_224        | 128 | 106.6506 |        108.0809        |        96.3865        |               97.5664               |
|        adv_inception_v3         | 128 | 104.3768 |        105.2563        |        103.742        |               104.443               |
|       gluon_inception_v3        | 128 | 104.3702 |        104.9805        |       103.8921        |              105.0778               |
|          inception_v3           | 128 | 104.3155 |        105.2835        |       103.8708        |              104.6451               |
|        res2net50_14w_8s         | 128 | 102.1548 |        103.7138        |       100.6208        |              102.0375               |
|           convit_base           | 64  | 100.682  |        100.7395        |        94.6361        |               94.6613               |
|             dpn107              | 32  | 99.1128  |        95.5076         |        98.9991        |               95.5365               |
|           res2next50            | 128 | 91.9468  |        92.5855         |        91.9398        |               92.7046               |
|        gluon_xception65         | 32  |  91.748  |         91.625         |        90.8134        |               90.5599               |
|  swin_base_patch4_window7_224   | 64  | 89.0227  |        89.5743         |        83.7603        |               84.1753               |
|            fbnetv3_b            | 128 | 85.9862  |        84.4588         |        85.4503        |               83.5889               |
|          mixer_b16_224          | 128 | 85.6864  |        85.5349         |        83.5185        |               83.6657               |
|        res2net101_26w_4s        | 64  | 85.5716  |         92.931         |        84.9911        |               91.5106               |
|           dm_nfnet_f0           | 128 | 84.0895  |        86.9625         |        83.2779        |               86.2461               |
|            pit_b_224            | 64  | 81.6561  |        82.0702         |        72.7687        |               73.1262               |
|          convnext_base          | 64  | 80.0661  |         81.453         |        79.5788        |               80.357                |
|         visformer_small         | 128 | 77.2749  |        77.8055         |        75.2202        |               75.6284               |
|      beit_base_patch16_224      | 64  | 74.6681  |        74.5177         |        69.5741        |               69.1183               |
|            nfnet_l0             | 128 | 74.0506  |         76.767         |        73.9031        |               76.7771               |
|       eca_botnext26ts_256       | 128 | 73.7967  |         74.732         |        73.5375        |               74.313                |
|          cspdarknet53           | 64  | 73.5508  |        71.4896         |        72.9836        |               70.9455               |
|          gmlp_s16_224           | 128 | 73.5253  |        73.9639         |        72.5233        |               73.5151               |
|          jx_nest_base           | 32  | 71.9518  |        72.9016         |        63.6997        |               64.5556               |
|            gernet_l             | 128 | 71.1357  |        69.3325         |        70.6905        |               68.8367               |
|          botnet26t_256          | 128 | 71.0828  |         70.029         |        70.7261        |               69.7494               |
|           volo_d1_224           | 64  | 70.5427  |        71.4547         |        68.3347        |               69.2369               |
|      vit_base_patch16_224       | 64  | 69.7831  |        69.7801         |        64.1791        |               64.0354               |
|            repvgg_a2            | 128 | 68.0045  |        65.9019         |        67.5598        |               65.7685               |
| deit_base_distilled_patch16_224 | 64  | 67.5126  |        67.0506         |        63.7581        |               63.7308               |
|          gmixer_24_224          | 128 |  66.039  |        66.6008         |        61.471         |               62.2492               |
|       tf_efficientnet_b0        | 128 |  61.184  |        59.6981         |        61.3009        |               59.8215               |
|           fbnetc_100            | 128 | 61.1237  |        57.6254         |        59.6898        |               57.5408               |
|      xcit_large_24_p8_224       |  5  | 60.9077  |        76.2594         |        58.2081        |               77.2834               |
|           rexnet_100            | 128 | 59.3765  |        57.5236         |        59.0595        |               57.6692               |
|        twins_pcpvt_base         | 64  | 59.1109  |        68.5273         |        54.6172        |               63.0969               |
|            tinynet_a            | 128 | 57.7272  |         56.954         |        57.7552        |               56.4225               |
|         coat_lite_mini          | 128 | 57.6648  |        58.3542         |        54.0857        |               54.7672               |
|           mobilevit_s           | 64  | 57.2611  |         57.544         |        55.2786        |               54.8024               |
|        sebotnet33ts_256         | 64  |  51.564  |        50.6513         |        51.2031        |               50.5236               |
|          spnasnet_100           | 128 | 50.7093  |        48.5572         |        50.4032        |               48.4586               |
|         crossvit_9_240          | 128 | 49.1771  |        49.8118         |        44.0418        |               44.8204               |
|          ghostnet_100           | 128 | 48.4646  |        55.8543         |        48.3651        |               54.7459               |
|        ese_vovnet19b_dw         | 128 | 46.1538  |        45.5122         |        45.7289        |               45.0294               |
|         mobilenetv2_100         | 128 | 45.9662  |        44.2884         |        45.9857        |               44.5279               |
|           mnasnet_100           | 128 |  43.762  |        43.3129         |        43.6721        |               41.9699               |
|           selecsls42b           | 128 | 42.5461  |        42.6063         |        42.3623        |               42.4158               |
|      mobilenetv3_large_100      | 128 | 41.7814  |        41.7146         |        41.3158        |               41.6747               |
|          resmlp_12_224          | 128 | 41.6567  |        41.8046         |        37.6055        |               37.7592               |
|           regnety_002           | 128 | 26.9701  |        31.0044         |        26.8701        |               30.6785               |
|            lcnet_050            | 128 | 18.6049  |        21.6797         |        18.1966        |               21.8385               |
+---------------------------------+-----+----------+------------------------+-----------------------+-------------------------------------+

Performance graphs

/data/home/williamwen/cluster/oneoff_cron_logs/day_104_14_04_23_performance_amp_147/torchbench_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_104_14_04_23_performance_amp_147/huggingface_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_104_14_04_23_performance_amp_147/timm_models_amp.png

Build Summary

Run name

day_104_14_04_23_performance_amp_147

Commit hashes

pytorch commit: 75f55ca63bd5623352c8eda8e31ff76ee5c960a7
pytorch commit date: 2023-04-13 00:45:48+00:00
torchbench commit: cd89d490ecbcca7d8ca50324522b31a1a198c753
torchbench commit date: 2023-04-13 11:05:33-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git75f55ca

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 2
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312

@williamwen42
Collaborator

Performance Dashboard for amp precision (Python 3.11)

Executive Summary

We evaluate different backends across three benchmark suites: torchbench, huggingface and timm. We run these experiments on A100 GPUs. For training, each experiment runs one iteration of the forward and backward passes. For inference, it runs the forward pass only. For accuracy, we check the numerical correctness of the forward pass outputs and gradients by comparing them with native PyTorch. We measure speedup by normalizing against the performance of native PyTorch. We report mean compilation latency and the peak memory footprint compression ratio.
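The accuracy comparison described above can be sketched in pure Python, so it runs without PyTorch. This sketch assumes that outputs and gradients have already been flattened into lists of floats. It uses an allclose-style tolerance test. The tolerances `RTOL`/`ATOL` and the helper names are hypothetical, not the dashboard's actual values or code. The status strings `pass` and `fail_accuracy` are the ones that appear in this report's tables.

```python
import math

# Hypothetical tolerances; the dashboard's real thresholds are not given in this report.
RTOL = 1e-3
ATOL = 1e-3


def close(actual, expected, rtol=RTOL, atol=ATOL):
    """Elementwise check in the style of torch.allclose: |a - e| <= atol + rtol * |e|.

    Non-finite compiled values (NaN/inf) are treated as mismatches.
    """
    if len(actual) != len(expected):
        return False
    return all(
        math.isfinite(a) and abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


def accuracy_status(compiled_outputs, eager_outputs, compiled_grads, eager_grads):
    """Return 'pass' only if both forward outputs and gradients match the eager run."""
    if close(compiled_outputs, eager_outputs) and close(compiled_grads, eager_grads):
        return "pass"
    return "fail_accuracy"
```

Gradients are checked alongside outputs because a backend can produce a correct forward result and still miscompile the backward pass.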

Caveats

  1. Batch size has been reduced to work around OOM errors. Work is in progress to reduce the peak memory footprint.
  2. Experiments do not cover dynamic shapes.
  3. The experimental setup does not include an optimizer.

Models that fail accuracy checks are excluded from the performance, compilation latency and memory footprint measurements.
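The exclusion rule and the geometric mean speedup in the summary tables can be sketched together. This sketch makes two assumptions. First, per-model speedup is taken to be eager latency divided by compiled latency. Second, only models with status `pass` and a nonzero compiled latency contribute. The input layout and function name are hypothetical.

```python
import math


def geometric_mean_speedup(results):
    """Geometric mean speedup over models that passed the accuracy check.

    `results` maps model name -> (accuracy_status, eager_latency_ms, compiled_latency_ms).
    Returns 0.0 when no model qualifies.
    """
    speedups = [
        eager / compiled
        for status, eager, compiled in results.values()
        if status == "pass" and compiled > 0  # drop models that fail accuracy checks
    ]
    if not speedups:
        return 0.0
    # Geometric mean = exp(mean(log(speedup))).
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))
```

A geometric mean is the usual choice for averaging ratios. For example, a 2x model and a 0.5x model average to 1.0x, whereas an arithmetic mean would report 1.25x.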

Passrate

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          | 87%, 55/63 | 100%, 45/45 | 98%, 60/61  |
|       aot_eager        | 87%, 55/63 | 100%, 45/45 | 98%, 60/61  |
|        inductor        | 83%, 52/63 | 93%, 42/45  | 97%, 59/61  |
| inductor_no_cudagraphs | 84%, 53/63 | 98%, 44/45  | 98%, 60/61  |
+------------------------+------------+-------------+-------------+
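Each passrate cell pairs a rounded percentage with the raw count. This sketch assumes a model counts as passing only when its status is `pass`. Other statuses in this report include `fail_accuracy`, `fail_to_run`, `infra_error` and `OOM`. Rounding to the nearest percent matches the cells above: for example, 52/63 = 82.5% is shown as 83%. The function name is hypothetical.

```python
def passrate(statuses):
    """Format a passrate cell such as '87%, 55/63' from per-model status strings."""
    total = len(statuses)
    passed = sum(1 for s in statuses if s == "pass")
    # Round to the nearest whole percent; guard against an empty suite.
    pct = round(100 * passed / total) if total else 0
    return f"{pct}%, {passed}/{total}"
```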

Geometric mean speedup

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   1.00x    |    1.00x    |    1.00x    |
|        inductor        |   1.62x    |    1.65x    |    1.46x    |
| inductor_no_cudagraphs |   1.30x    |    1.58x    |    1.40x    |
+------------------------+------------+-------------+-------------+

Mean compilation time (seconds)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |    4.08    |    6.46     |    5.00     |
|       aot_eager        |    8.77    |    14.66    |    11.60    |
|        inductor        |   53.21    |    53.30    |    90.52    |
| inductor_no_cudagraphs |   58.03    |    52.81    |   102.89    |
+------------------------+------------+-------------+-------------+

Peak memory footprint compression ratio (higher is better)

+------------------------+------------+-------------+-------------+
|        Compiler        | torchbench | huggingface | timm_models |
+------------------------+------------+-------------+-------------+
|         eager          |   1.00x    |    1.00x    |    1.00x    |
|       aot_eager        |   0.99x    |    0.96x    |    1.00x    |
|        inductor        |   1.03x    |    0.98x    |    1.01x    |
| inductor_no_cudagraphs |   1.00x    |    1.01x    |    1.00x    |
+------------------------+------------+-------------+-------------+
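Because higher is better, the compression ratio can be read as eager peak memory divided by compiled peak memory. Under that reading, a value below 1.00x means the compiled run used more memory than eager. This definition is an assumption consistent with the "higher is better" label. The sketch below only encodes it, using hypothetical names and example byte counts.

```python
def compression_ratio(eager_peak_bytes, compiled_peak_bytes):
    """Eager peak memory divided by compiled peak memory (higher is better)."""
    if compiled_peak_bytes <= 0:
        raise ValueError("compiled peak memory must be positive")
    return eager_peak_bytes / compiled_peak_bytes


def format_ratio(ratio):
    """Format a ratio the way the summary table does, e.g. 1.03x."""
    return f"{ratio:.2f}x"
```

For example, a model that peaks at 30 GB in eager mode and 25 GB when compiled would be reported as 1.20x.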

Summary Statistics Diff

For each relevant compiler, we compare the summary statistics of the two most recent reports that actually ran that compiler.

Current report name: /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name: /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Passrate diff

+------------------------+-------------+------------+------------+
|        compiler        |    suite    | prev_value | cur_value  |
+------------------------+-------------+------------+------------+
|        inductor        | torchbench  | 86%, 51/59 | 83%, 49/59 |
|        inductor        | huggingface | 91%, 41/45 | 91%, 41/45 |
|        inductor        | timm_models | 95%, 58/61 | 98%, 60/61 |
| inductor_no_cudagraphs | torchbench  | 86%, 51/59 | 83%, 49/59 |
| inductor_no_cudagraphs | huggingface | 98%, 44/45 | 96%, 43/45 |
| inductor_no_cudagraphs | timm_models | 95%, 58/61 | 98%, 60/61 |
+------------------------+-------------+------------+------------+

Geometric mean speedup diff

+------------------------+-------------+------------+-----------+
|        compiler        |    suite    | prev_value | cur_value |
+------------------------+-------------+------------+-----------+
|        inductor        | torchbench  |   1.60x    |   1.54x   |
|        inductor        | huggingface |   1.59x    |   1.57x   |
|        inductor        | timm_models |   1.37x    |   1.36x   |
| inductor_no_cudagraphs | torchbench  |   1.35x    |   1.28x   |
| inductor_no_cudagraphs | huggingface |   1.51x    |   1.51x   |
| inductor_no_cudagraphs | timm_models |   1.37x    |   1.34x   |
+------------------------+-------------+------------+-----------+
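The report-selection rule from the Summary Statistics Diff description can be sketched as follows. Reports that did not run a given compiler are skipped, and the last two that did supply the previous and current values. The `(report_name, compilers_run)` layout, the report names in the example, and the function names are hypothetical. The values 1.60x and 1.54x come from the inductor/torchbench row above.

```python
def two_most_recent(reports, compiler):
    """Return (previous, current) report names among those that ran `compiler`.

    `reports` is a list of (report_name, compilers_run) tuples, oldest first.
    Returns None if fewer than two reports ran the compiler.
    """
    ran = [name for name, compilers in reports if compiler in compilers]
    if len(ran) < 2:
        return None
    return ran[-2], ran[-1]


def speedup_delta(prev_value, cur_value):
    """Change in geometric mean speedup between two reports, rounded to 2 decimals."""
    return round(cur_value - prev_value, 2)
```

With the inductor/torchbench values, `speedup_delta(1.60, 1.54)` gives -0.06.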

Warnings

We flag models where:
  • the accuracy check fails
  • speedup < 0.95x (NOTE: a speedup of 0.0 typically signifies a failure in the performance test)
  • compilation latency > 120 sec
  • compression ratio < 0.9
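The four warning rules can be expressed as a single check per model and compiler. This sketch assumes the thresholds listed above are applied exactly as written, as strict inequalities. It also assumes any status other than `pass` counts as an accuracy failure. The function name and reason strings are hypothetical.

```python
SPEEDUP_MIN = 0.95
COMPILE_LATENCY_MAX_S = 120.0
COMPRESSION_MIN = 0.9


def warnings_for(status, speedup, compile_latency_s, compression_ratio):
    """Return the list of warning reasons for one model under one compiler."""
    reasons = []
    if status != "pass":
        reasons.append(f"accuracy: {status}")
    if speedup < SPEEDUP_MIN:
        # A speedup of 0.0 usually means the performance run itself failed.
        reasons.append("perf run failed" if speedup == 0.0 else f"speedup {speedup:.2f}x")
    if compile_latency_s > COMPILE_LATENCY_MAX_S:
        reasons.append(f"compilation latency {compile_latency_s:.0f}s")
    if compression_ratio < COMPRESSION_MIN:
        reasons.append(f"compression ratio {compression_ratio:.2f}")
    return reasons
```

For instance, in the per-model compilation latency table of the report above, rexnet_100 under inductor (224.9662 s) would trip the 120 sec rule.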

Accuracy warnings

+-------------+-------------------------------+---------------+------------------------+
|    suite    |             name              |   inductor    | inductor_no_cudagraphs |
+-------------+-------------------------------+---------------+------------------------+
| torchbench  |        vision_maskrcnn        |  infra_error  |      infra_error       |
| torchbench  |        DALLE2_pytorch         |  infra_error  |      infra_error       |
| torchbench  |   detectron2_fcos_r_50_fpn    |  infra_error  |      infra_error       |
| torchbench  |              drq              |  infra_error  |      infra_error       |
| torchbench  |        pytorch_struct         |  infra_error  |      infra_error       |
| torchbench  |       soft_actor_critic       |  infra_error  |      infra_error       |
| torchbench  |       timm_efficientdet       |  infra_error  |      infra_error       |
| torchbench  |         torchrec_dlrm         |  infra_error  |      infra_error       |
| torchbench  |         hf_Longformer         |  fail_to_run  |      fail_to_run       |
| torchbench  |             llama             | fail_accuracy |     fail_accuracy      |
| huggingface | DebertaV2ForQuestionAnswering |  infra_error  |          pass          |
| timm_models |          convit_base          |  fail_to_run  |      fail_to_run       |
| timm_models |         cait_m36_384          |      OOM      |          pass          |
+-------------+-------------------------------+---------------+------------------------+

Performance speedup warnings

+-------------+-------------------------------+----------+------------------------+
|    suite    |             name              | inductor | inductor_no_cudagraphs |
+-------------+-------------------------------+----------+------------------------+
| torchbench  |       phlippe_densenet        | 1.917714 |        0.901889        |
| torchbench  |             dcgan             | 1.514868 |        0.895629        |
| torchbench  |       basic_gnn_edgecnn       | 1.316141 |          0.0           |
| torchbench  |   detectron2_fcos_r_50_fpn    |   0.0    |          0.0           |
| torchbench  |       timm_efficientdet       |   0.0    |          0.0           |
| torchbench  |       soft_actor_critic       |   0.0    |          0.0           |
| torchbench  |        pytorch_struct         |   0.0    |          0.0           |
| torchbench  |              drq              |   0.0    |          0.0           |
| torchbench  |         hf_Longformer         |   0.0    |          0.0           |
| torchbench  |        DALLE2_pytorch         |   0.0    |          0.0           |
| torchbench  |             moco              |   0.0    |          0.0           |
| torchbench  |             dlrm              |   0.0    |        1.226872        |
| torchbench  | timm_vision_transformer_large |   0.0    |        0.993088        |
| torchbench  |         torchrec_dlrm         |   0.0    |          0.0           |
| huggingface |      LayoutLMForMaskedLM      |   0.0    |        1.604682        |
| huggingface |     AllenaiLongformerBase     |   0.0    |          0.0           |
+-------------+-------------------------------+----------+------------------------+

Compilation latency (sec) warnings

+-------------+--------------------------------+------------+------------------------+
|    suite    |              name              |  inductor  | inductor_no_cudagraphs |
+-------------+--------------------------------+------------+------------------------+
| torchbench  |          hf_T5_large           | 164.083421 |       164.456502       |
| torchbench  |           hf_BigBird           | 163.680048 |       126.065458       |
| torchbench  |          densenet121           | 113.059074 |       124.351767       |
| torchbench  |       timm_efficientnet        | 110.753224 |       133.045389       |
| torchbench  |        phlippe_densenet        | 101.337409 |       153.191907       |
| huggingface | MobileBertForQuestionAnswering | 134.464177 |       134.520693       |
| huggingface |     MobileBertForMaskedLM      | 132.344597 |       133.560424       |
| timm_models |           hrnet_w18            | 215.300548 |       235.439188       |
| timm_models |           rexnet_100           | 198.835597 |       287.845509       |
| timm_models |          ghostnet_100          | 182.636547 |       232.169208       |
| timm_models |         pnasnet5large          | 147.549783 |       156.780616       |
| timm_models |          resnest101e           | 138.563704 |       158.368072       |
| timm_models |          mobilevit_s           | 135.482809 |       157.381025       |
| timm_models |           fbnetv3_b            | 131.883844 |       162.084463       |
| timm_models |       gluon_inception_v3       | 131.07415  |       158.07422        |
| timm_models |          tf_mixnet_l           | 128.487929 |       147.30566        |
| timm_models |       res2net101_26w_4s        | 128.054408 |       138.097871       |
| timm_models |        adv_inception_v3        | 126.647641 |       150.583826       |
| timm_models |          inception_v3          | 126.359993 |       154.997186       |
| timm_models |           tinynet_a            | 123.713257 |       143.520164       |
| timm_models |            mixnet_l            | 122.645935 |       147.809105       |
| timm_models |       tf_efficientnet_b0       | 119.095017 |       135.472967       |
| timm_models |     mobilenetv3_large_100      | 116.468401 |       141.481082       |
| timm_models |           fbnetc_100           | 105.643603 |       134.635195       |
| timm_models |          spnasnet_100          | 104.156516 |       131.219595       |
+-------------+--------------------------------+------------+------------------------+

Peak Memory Compression Ratio warnings

+-------------+-----------------------------------------+----------+------------------------+
|    suite    |                  name                   | inductor | inductor_no_cudagraphs |
+-------------+-----------------------------------------+----------+------------------------+
| torchbench  |            basic_gnn_edgecnn            | 1.268084 |          0.0           |
| torchbench  |             pytorch_stargan             | 0.893437 |        0.889299        |
| torchbench  |                resnet50                 | 0.890619 |        0.887016        |
| torchbench  |               timm_vovnet               | 0.888781 |        0.887004        |
| torchbench  |         timm_vision_transformer         | 0.85232  |        0.846964        |
| torchbench  |           speech_transformer            | 0.846621 |        0.844683        |
| torchbench  |           mobilenet_v3_large            | 0.788748 |        0.78255         |
| torchbench  |               mnasnet1_0                | 0.784332 |        0.774557        |
| torchbench  |             resnext50_32x4d             | 0.780881 |        0.771616        |
| torchbench  |              squeezenet1_1              | 0.776372 |        0.775402        |
| torchbench  |             LearningToPaint             | 0.757111 |         0.7482         |
| torchbench  |            phlippe_densenet             | 0.729494 |        0.713997        |
| torchbench  |               densenet121               | 0.691168 |        0.670463        |
| torchbench  |                resnet18                 | 0.618876 |        0.61026         |
| torchbench  |      pytorch_CycleGAN_and_pix2pix       | 0.603505 |        0.600365        |
| torchbench  |          functorch_dp_cifar10           | 0.453125 |        0.444502        |
| torchbench  |             phlippe_resnet              | 0.378591 |        0.36166         |
| torchbench  |        detectron2_fcos_r_50_fpn         |   0.0    |          0.0           |
| torchbench  |            timm_efficientdet            |   0.0    |          0.0           |
| torchbench  |            soft_actor_critic            |   0.0    |          0.0           |
| torchbench  |             pytorch_struct              |   0.0    |          0.0           |
| torchbench  |                   drq                   |   0.0    |          0.0           |
| torchbench  |      timm_vision_transformer_large      |   0.0    |        0.973508        |
| torchbench  |             DALLE2_pytorch              |   0.0    |          0.0           |
| torchbench  |                  moco                   |   0.0    |          0.0           |
| torchbench  |              hf_Longformer              |   0.0    |          0.0           |
| torchbench  |                  dlrm                   |   0.0    |        1.000856        |
| torchbench  |              torchrec_dlrm              |   0.0    |          0.0           |
| huggingface |            TrOCRForCausalLM             | 0.87395  |        0.881037        |
| huggingface | BlenderbotSmallForConditionalGeneration | 0.864986 |        0.897783        |
| huggingface |            PLBartForCausalLM            |  0.863   |        0.860945        |
| huggingface |           ElectraForCausalLM            | 0.861134 |        0.93223         |
| huggingface |     MobileBertForQuestionAnswering      | 0.857907 |        0.857131        |
| huggingface |          DistilBertForMaskedLM          | 0.851792 |        0.849938        |
| huggingface |       BlenderbotSmallForCausalLM        | 0.804903 |        0.803499        |
| huggingface |         Speech2Text2ForCausalLM         | 0.77739  |        0.775883        |
| huggingface |           LayoutLMForMaskedLM           |   0.0    |        0.924424        |
| huggingface |          AllenaiLongformerBase          |   0.0    |          0.0           |
| timm_models |             crossvit_9_240              | 0.871764 |        0.870197        |
| timm_models |               regnety_002               | 0.866936 |        0.862637        |
| timm_models |                lcnet_050                | 0.843427 |        0.838246        |
| timm_models |              jx_nest_base               | 0.733958 |        0.732922        |
+-------------+-----------------------------------------+----------+------------------------+

Metrics over time

Compilation time over time (figure): /data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/comp_time_over_time.png

Geometric mean speedup over time (figure): /data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/geomean_over_time.png

Pass rate over time (figure): /data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/passrate_over_time.png

Memory over time (figure): /data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/memory_over_time.png

Recent Regressions

For each relevant compiler, we compare the two most recent reports that actually ran that compiler, and list models that were not flagged in the previous report but are now flagged as problematic according to the criteria in the 'Warnings' section.
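The regression rule amounts to "flagged now, not flagged before". Here is a minimal sketch for a single numeric metric. The function name and the dict-of-floats input format are hypothetical. Treating a model that is missing from the previous report as previously unflagged is an assumption.

```python
from typing import Callable, Dict, List


def new_regressions(
    prev: Dict[str, float],
    cur: Dict[str, float],
    is_flagged: Callable[[float], bool],
) -> List[str]:
    """Models flagged in the current report but not in the previous one."""
    regressed = []
    for name, cur_value in cur.items():
        if not is_flagged(cur_value):
            continue  # not a problem now, so not a regression
        prev_value = prev.get(name)
        # Assumption: a model missing from the previous report counts as unflagged.
        if prev_value is None or not is_flagged(prev_value):
            regressed.append(name)
    return sorted(regressed)


# inductor_no_cudagraphs torchbench speedups from the
# "Performance speedup regressions" table in this report.
prev_speedup = {"lennard_jones": 1.0479, "tts_angular": 1.0306, "dlrm": 1.3205}
cur_speedup = {"lennard_jones": 0.9232, "tts_angular": 0.8749, "dlrm": 0.0}
print(new_regressions(prev_speedup, cur_speedup, lambda s: s < 0.95))
```

The same function covers compilation latency with `lambda t: t > 120.0`. For example, hf_BigBird under inductor_no_cudagraphs (117.2348 s to 120.1973 s) is reported only because it crosses the 120 s threshold.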

Regressions for torchbench

Current report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name (compiler: inductor_no_cudagraphs, suite: torchbench): /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Current report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name (compiler: inductor, suite: torchbench): /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Accuracy regressions

+------------------------+------------------------------+-------------+---------------+
|        compiler        |             name             | prev_status |  cur_status   |
+------------------------+------------------------------+-------------+---------------+
| inductor_no_cudagraphs | pytorch_CycleGAN_and_pix2pix |    pass     | fail_accuracy |
| inductor_no_cudagraphs |        phlippe_resnet        |    pass     | fail_accuracy |
|        inductor        | pytorch_CycleGAN_and_pix2pix |    pass     | fail_accuracy |
|        inductor        |        phlippe_resnet        |    pass     | fail_accuracy |
+------------------------+------------------------------+-------------+---------------+

Performance speedup regressions

+------------------------+-------------------------------+-------------+------------+
|        compiler        |             name              | prev_status | cur_status |
+------------------------+-------------------------------+-------------+------------+
| inductor_no_cudagraphs |         lennard_jones         |   1.0479    |   0.9232   |
| inductor_no_cudagraphs |          tts_angular          |   1.0306    |   0.8749   |
| inductor_no_cudagraphs |             dlrm              |   1.3205    |    0.0     |
| inductor_no_cudagraphs | timm_vision_transformer_large |    1.103    |    0.0     |
|        inductor        |          tts_angular          |   1.0226    |   0.9018   |
|        inductor        |         hf_Longformer         |   1.6303    |    0.0     |
|        inductor        | timm_vision_transformer_large |    1.074    |    0.0     |
+------------------------+-------------------------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+--------------------+-------------+------------+
|        compiler        |        name        | prev_status | cur_status |
+------------------------+--------------------+-------------+------------+
| inductor_no_cudagraphs |  phlippe_densenet  |   35.5879   |  177.3165  |
| inductor_no_cudagraphs | timm_efficientnet  |   75.8329   |  152.4158  |
| inductor_no_cudagraphs |    densenet121     |   82.9904   |  147.3931  |
| inductor_no_cudagraphs | mobilenet_v3_large |   65.4633   |  145.8978  |
| inductor_no_cudagraphs |   hf_GPT2_large    |   78.3684   |  140.4915  |
| inductor_no_cudagraphs |    mobilenet_v2    |   34.0223   |  139.8573  |
| inductor_no_cudagraphs |       yolov3       |   52.7001   |  125.7029  |
| inductor_no_cudagraphs |     hf_BigBird     |  117.2348   |  120.1973  |
|        inductor        |  phlippe_densenet  |   37.1538   |  171.3755  |
|        inductor        | timm_efficientnet  |   76.7066   |  150.8667  |
|        inductor        | mobilenet_v3_large |   68.574    |  148.1743  |
|        inductor        |    densenet121     |   78.3237   |  142.1128  |
|        inductor        |   hf_GPT2_large    |   80.0068   |  141.2736  |
|        inductor        |    mobilenet_v2    |   34.8149   |  136.3726  |
|        inductor        |       yolov3       |   53.8302   |  124.8586  |
+------------------------+--------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+------------------------+-------------------------------+-------------+------------+
|        compiler        |             name              | prev_status | cur_status |
+------------------------+-------------------------------+-------------+------------+
| inductor_no_cudagraphs |      speech_transformer       |   1.0888    |   0.869    |
| inductor_no_cudagraphs |         squeezenet1_1         |   1.1148    |   0.7678   |
| inductor_no_cudagraphs |        LearningToPaint        |   0.9966    |   0.7466   |
| inductor_no_cudagraphs |       phlippe_densenet        |   1.0062    |   0.7179   |
| inductor_no_cudagraphs |          densenet121          |   0.9945    |   0.6035   |
| inductor_no_cudagraphs | pytorch_CycleGAN_and_pix2pix  |   1.0224    |   0.6004   |
| inductor_no_cudagraphs |        phlippe_resnet         |   1.0037    |   0.3443   |
| inductor_no_cudagraphs | timm_vision_transformer_large |   0.9762    |    0.0     |
| inductor_no_cudagraphs |             dlrm              |   1.0009    |    0.0     |
|        inductor        |      shufflenet_v2_x1_0       |   0.9343    |   0.8656   |
|        inductor        |      speech_transformer       |   1.0825    |   0.8651   |
+------------------------+-------------------------------+-------------+------------+

Regressions for huggingface

Current report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name (compiler: inductor_no_cudagraphs, suite: huggingface): /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Current report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name (compiler: inductor, suite: huggingface): /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Accuracy regressions

+------------------------+-------------------------------+-------------+---------------+
|        compiler        |             name              | prev_status |  cur_status   |
+------------------------+-------------------------------+-------------+---------------+
| inductor_no_cudagraphs | DebertaV2ForQuestionAnswering |    pass     | fail_accuracy |
+------------------------+-------------------------------+-------------+---------------+

Performance speedup regressions

+------------------------+-------------------------------+-------------+------------+
|        compiler        |             name              | prev_status | cur_status |
+------------------------+-------------------------------+-------------+------------+
| inductor_no_cudagraphs |      DebertaForMaskedLM       |   0.9907    |   0.9352   |
| inductor_no_cudagraphs |     AllenaiLongformerBase     |   1.6455    |    0.0     |
|        inductor        | DebertaV2ForQuestionAnswering |   1.0834    |   0.9392   |
+------------------------+-------------------------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+--------------------------------+-------------+------------+
|        compiler        |              name              | prev_status | cur_status |
+------------------------+--------------------------------+-------------+------------+
| inductor_no_cudagraphs |     MobileBertForMaskedLM      |  113.7818   |  140.0078  |
| inductor_no_cudagraphs |  MT5ForConditionalGeneration   |   89.1825   |  135.7902  |
| inductor_no_cudagraphs | MobileBertForQuestionAnswering |  104.0908   |  135.3323  |
| inductor_no_cudagraphs | M2M100ForConditionalGeneration |   92.5092   |  121.2122  |
|        inductor        |     MobileBertForMaskedLM      |  115.4381   |  142.8343  |
|        inductor        |  MT5ForConditionalGeneration   |   92.2199   |  138.8022  |
|        inductor        | MobileBertForQuestionAnswering |  108.0764   |  138.2921  |
|        inductor        | M2M100ForConditionalGeneration |   96.6702   |  127.9952  |
+------------------------+--------------------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+------------------------+-----------------------+-------------+------------+
|        compiler        |         name          | prev_status | cur_status |
+------------------------+-----------------------+-------------+------------+
| inductor_no_cudagraphs | AllenaiLongformerBase |   0.9124    |    0.0     |
+------------------------+-----------------------+-------------+------------+

Regressions for timm_models

Current report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name (compiler: inductor_no_cudagraphs, suite: timm_models): /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Current report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/oneoff_cron_logs/day_047_16_02_23_performance_amp_184

Previous report name (compiler: inductor, suite: timm_models): /data/home/williamwen/cluster/oneoff_cron_logs/day_040_09_02_23_performance_amp_322

Performance speedup regressions

+----------+---------------+-------------+------------+
| compiler |     name      | prev_status | cur_status |
+----------+---------------+-------------+------------+
| inductor | pnasnet5large |   1.1413    |   0.9452   |
+----------+---------------+-------------+------------+

Compilation latency (sec) regressions

+------------------------+-----------------------+-------------+------------+
|        compiler        |         name          | prev_status | cur_status |
+------------------------+-----------------------+-------------+------------+
| inductor_no_cudagraphs |      rexnet_100       |   55.9265   |  315.721   |
| inductor_no_cudagraphs |     ghostnet_100      |   62.387    |  255.9795  |
| inductor_no_cudagraphs |       fbnetv3_b       |    65.65    |  184.9282  |
| inductor_no_cudagraphs |      tf_mixnet_l      |   61.3387   |  173.799   |
| inductor_no_cudagraphs | mobilenetv3_large_100 |   41.7463   |  173.3434  |
| inductor_no_cudagraphs |       tinynet_a       |   49.8997   |  172.5106  |
| inductor_no_cudagraphs |      mobilevit_s      |   79.2543   |  171.8832  |
| inductor_no_cudagraphs |       mixnet_l        |   58.8432   |  171.804   |
| inductor_no_cudagraphs |     inception_v3      |   58.888    |  171.7195  |
| inductor_no_cudagraphs |      resnest101e      |   100.691   |  171.5599  |
| inductor_no_cudagraphs |   adv_inception_v3    |   61.8915   |  171.0537  |
| inductor_no_cudagraphs |  gluon_inception_v3   |   58.6345   |  169.8777  |
| inductor_no_cudagraphs |  tf_efficientnet_b0   |   45.0207   |  167.3042  |
| inductor_no_cudagraphs | xcit_large_24_p8_224  |  105.4047   |  160.8476  |
| inductor_no_cudagraphs |   res2net101_26w_4s   |   92.7082   |  159.3268  |
| inductor_no_cudagraphs |   twins_pcpvt_base    |  117.3875   |  157.5274  |
| inductor_no_cudagraphs |     pnasnet5large     |  100.0627   |  152.5736  |
| inductor_no_cudagraphs |      fbnetc_100       |   41.5344   |  152.4179  |
| inductor_no_cudagraphs |     spnasnet_100      |   40.7063   |  151.848   |
| inductor_no_cudagraphs |    mobilenetv2_100    |   35.9906   |  141.7804  |
| inductor_no_cudagraphs |      mnasnet_100      |   36.0892   |  134.8647  |
| inductor_no_cudagraphs |   res2net50_14w_8s    |   85.8919   |  131.2387  |
| inductor_no_cudagraphs |     cait_m36_384      |  107.5555   |  126.2237  |
|        inductor        |      rexnet_100       |   59.9265   |  309.363   |
|        inductor        |     ghostnet_100      |   63.4552   |  258.8826  |
|        inductor        |       fbnetv3_b       |   66.6791   |  186.0991  |
|        inductor        |       tinynet_a       |   50.1639   |  173.8911  |
|        inductor        |   adv_inception_v3    |   59.5714   |  173.5962  |
|        inductor        |       mixnet_l        |   61.2409   |  173.3327  |
|        inductor        |      resnest101e      |   102.515   |  173.1165  |
|        inductor        | mobilenetv3_large_100 |   42.9618   |  173.0226  |
|        inductor        |      mobilevit_s      |   79.8525   |  172.8851  |
|        inductor        |     inception_v3      |   59.6333   |  172.2805  |
|        inductor        |      tf_mixnet_l      |   62.0863   |  172.127   |
|        inductor        |  gluon_inception_v3   |   59.1854   |  168.4864  |
|        inductor        |  tf_efficientnet_b0   |   45.4103   |  161.3287  |
|        inductor        | xcit_large_24_p8_224  |  108.3117   |  160.3484  |
|        inductor        |   res2net101_26w_4s   |   93.3066   |  159.9451  |
|        inductor        |   twins_pcpvt_base    |  119.6717   |  158.7075  |
|        inductor        |     pnasnet5large     |  105.0551   |  158.0835  |
|        inductor        |     spnasnet_100      |   42.2941   |  150.974   |
|        inductor        |      fbnetc_100       |   42.1337   |  150.8096  |
|        inductor        |    mobilenetv2_100    |   36.7172   |  142.5709  |
|        inductor        |      mnasnet_100      |   37.6126   |  134.0404  |
|        inductor        |   res2net50_14w_8s    |   92.4949   |  133.6794  |
|        inductor        |     cait_m36_384      |  112.9868   |  127.3313  |
+------------------------+-----------------------+-------------+------------+

Peak Memory Compression Ratio regressions

+------------------------+--------------+-------------+------------+
|        compiler        |     name     | prev_status | cur_status |
+------------------------+--------------+-------------+------------+
| inductor_no_cudagraphs | regnety_002  |   1.0009    |   0.8625   |
| inductor_no_cudagraphs |  lcnet_050   |   1.0001    |   0.8411   |
|        inductor        | ghostnet_100 |   0.9077    |   0.8805   |
+------------------------+--------------+-------------+------------+

torchbench suite with amp precision

Performance speedup

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|       functorch_dp_cifar10        |  64  | 0.990581 |  0.54021  | 3.712228 |        1.413733        |
|           BERT_pytorch            |  16  | 1.006936 | 0.450595  | 3.463346 |        2.108239        |
|            hf_BigBird             |  2   | 0.981411 | 0.415691  | 2.859171 |        1.640458        |
|           basic_gnn_gin           |  1   | 1.032703 | 0.568248  | 2.732045 |        1.429185        |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.984672 | 0.500195  | 2.477184 |        1.870084        |
|            densenet121            |  4   | 0.996921 | 0.423079  | 2.444519 |        1.088812        |
|            hf_T5_large            |  2   | 1.017803 | 0.439253  | 2.437908 |        2.023823        |
|             hf_Albert             |  8   | 1.000671 | 0.702945  | 2.363868 |        2.341655        |
|              hf_Bert              |  4   | 1.028005 | 0.463149  | 2.009287 |        1.640587        |
|        mobilenet_v3_large         |  32  | 1.003432 | 0.504719  | 1.990287 |         1.1577         |
|         timm_efficientnet         |  32  | 1.013296 | 0.456372  | 1.983019 |        1.114798        |
|         phlippe_densenet          | 128  | 0.995252 | 0.491501  | 1.917714 |        0.901889        |
|              hf_GPT2              |  4   | 1.017823 | 0.571805  | 1.888901 |        1.90254         |
|           squeezenet1_1           |  32  | 0.98282  | 0.597015  | 1.86335  |        1.298258        |
|           hf_GPT2_large           |  4   | 0.998962 | 0.753518  | 1.741626 |        1.719248        |
|           lennard_jones           | 1000 | 0.915411 | 0.438693  | 1.738416 |        0.97385         |
|          phlippe_resnet           | 128  | 0.98917  |  0.5133   | 1.726683 |        1.09865         |
|          resnext50_32x4d          |  8   | 0.994479 | 0.422441  | 1.726142 |        0.996377        |
|               hf_T5               |  8   | 0.999187 | 0.832055  | 1.714156 |        1.730936        |
|           hf_Bert_large           |  4   | 1.035508 |  0.46677  | 1.677549 |        1.626562        |
|          basic_gnn_sage           |  1   | 1.029749 | 0.547532  |  1.677   |        1.355606        |
|              hf_Bart              |  4   | 1.016383 | 0.452196  | 1.662917 |        1.214776        |
|           timm_resnest            |  32  | 0.997901 | 0.696343  | 1.650129 |        1.531564        |
|            mnasnet1_0             |  32  | 0.994613 | 0.477843  | 1.648648 |        1.068923        |
|            timm_nfnet             | 128  | 0.999422 | 0.989363  | 1.622877 |        1.526521        |
|             resnet18              |  16  | 0.993864 | 0.473616  | 1.594736 |        1.029539        |
| attention_is_all_you_need_pytorch | 256  | 1.003547 | 0.479328  | 1.593045 |        1.705072        |
|           mobilenet_v2            |  96  | 0.999291 | 0.681107  | 1.581873 |        1.386965        |
|        shufflenet_v2_x1_0         | 128  | 0.997039 | 0.529758  | 1.57826  |        1.231394        |
|      timm_vision_transformer      |  32  | 0.995796 | 0.462421  | 1.572057 |        1.290214        |
|               dcgan               |  32  | 0.934627 | 0.462908  | 1.514868 |        0.895629        |
|           hf_DistilBert           |  8   | 0.999675 | 0.659354  | 1.502309 |        1.544261        |
|           fastNLP_Bert            |  6   | 0.972396 | 0.512643  | 1.482898 |        1.453191        |
|        speech_transformer         |  32  | 0.998451 | 0.444351  | 1.44784  |        1.570706        |
|          LearningToPaint          |  96  | 0.99264  | 0.534404  | 1.363809 |        1.075582        |
|           pytorch_unet            |  1   | 0.999237 | 0.231561  | 1.352636 |        1.330816        |
|         basic_gnn_edgecnn         |  1   | 0.991719 | 0.721736  | 1.316141 |          0.0           |
|           basic_gnn_gcn           |  1   | 0.933433 | 0.502323  | 1.305415 |        1.217759        |
|          pytorch_stargan          |  16  | 0.995532 | 0.512034  | 1.289811 |        1.262911        |
|            timm_vovnet            |  32  | 1.022856 | 0.586446  | 1.271282 |        1.185148        |
|               vgg16               |  64  | 0.999728 | 0.990964  | 1.261118 |        1.251211        |
|              yolov3               |  16  | 0.999093 | 0.698116  | 1.239094 |        1.221426        |
|             resnet50              |  32  | 0.998222 | 0.529793  | 1.238431 |        1.084923        |
|             resnet152             |  32  | 0.997807 | 0.477685  | 1.211381 |        1.038721        |
|        Background_Matting         |  4   | 0.999552 | 0.155649  | 1.19138  |        1.181734        |
|            hf_Reformer            |  4   | 0.995598 |  0.8563   | 1.174101 |        1.14423         |
|            timm_regnet            |  32  | 1.014454 | 0.631259  | 1.153648 |        1.062594        |
|              alexnet              | 128  | 0.998507 | 0.968282  | 1.141633 |        1.137853        |
|            Super_SloMo            |  6   | 0.999032 | 0.207671  |  1.124   |        1.09898         |
|              demucs               |  4   | 0.999256 | 0.999251  | 1.060159 |        1.038659        |
|            tts_angular            |  64  | 0.974489 | 0.830191  | 0.990276 |        1.000686        |
|      nvidia_deeprecommender       | 256  | 0.998672 | 0.999366  | 0.979605 |        1.018991        |
|     detectron2_fcos_r_50_fpn      |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|         timm_efficientdet         |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|         soft_actor_critic         |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|          pytorch_struct           |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|                drq                |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 1.016201 | 0.413757  |   0.0    |          0.0           |
|          DALLE2_pytorch           |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.982534 |    0.0    |   0.0    |          0.0           |
|               dlrm                | 1024 | 0.964046 | 0.515002  |   0.0    |        1.226872        |
|   timm_vision_transformer_large   |  32  | 1.000023 | 0.979788  |   0.0    |        0.993088        |
|           torchrec_dlrm           |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

Accuracy

+-----------------------------------+----+------------------+------------------+------------------+------------------------+
|               name                | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------+----+------------------+------------------+------------------+------------------------+
|        Background_Matting         | 4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            hf_T5_large            | 4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|   timm_vision_transformer_large   | 4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|           hf_GPT2_large           | 4  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|        shufflenet_v2_x1_0         | 4  |       pass       |       pass       |       pass       |          pass          |
|               moco                | 4  |       pass       |       pass       |       pass       |          pass          |
|         phlippe_densenet          | 4  |       pass       |       pass       |       pass       |          pass          |
|          phlippe_resnet           | 4  |       pass       |       pass       |       pass       |          pass          |
|   pytorch_CycleGAN_and_pix2pix    | 1  |       pass       |       pass       |       pass       |          pass          |
|          pytorch_stargan          | 16 |       pass       |       pass       |       pass       |          pass          |
|           pytorch_unet            | 2  |       pass       |       pass       |       pass       |          pass          |
|             resnet152             | 4  |       pass       |       pass       |       pass       |          pass          |
|             resnet18              | 4  |       pass       |       pass       |       pass       |          pass          |
|             resnet50              | 4  |       pass       |       pass       |       pass       |          pass          |
|          resnext50_32x4d          | 4  |       pass       |       pass       |       pass       |          pass          |
|           squeezenet1_1           | 4  |       pass       |       pass       |       pass       |          pass          |
|        speech_transformer         | 32 |       pass       |       pass       |       pass       |          pass          |
|           mobilenet_v2            | 4  |       pass       |       pass       |       pass       |          pass          |
|         timm_efficientnet         | 4  |       pass       |       pass       |       pass       |          pass          |
|            timm_nfnet             | 4  |       pass       |       pass       |       pass       |          pass          |
|            timm_regnet            | 4  |       pass       |       pass       |       pass       |          pass          |
|           timm_resnest            | 4  |       pass       |       pass       |       pass       |          pass          |
|      timm_vision_transformer      | 4  |       pass       |       pass       |       pass       |          pass          |
|            timm_vovnet            | 4  |       pass       |       pass       |       pass       |          pass          |
|            tts_angular            | 4  |       pass       |       pass       |       pass       |          pass          |
|               vgg16               | 4  |       pass       |       pass       |       pass       |          pass          |
|              yolov3               | 4  |       pass       |       pass       |       pass       |          pass          |
|           BERT_pytorch            | 4  |  fail_accuracy   |       pass       |       pass       |          pass          |
|        mobilenet_v3_large         | 4  |       pass       |       pass       |       pass       |          pass          |
|      nvidia_deeprecommender       | 4  |       pass       |       pass       |       pass       |          pass          |
|            mnasnet1_0             | 4  |       pass       |       pass       |       pass       |          pass          |
|               dlrm                | 4  |       pass       |       pass       |       pass       |          pass          |
|          LearningToPaint          | 4  |       pass       |       pass       |       pass       |          pass          |
|            Super_SloMo            | 4  |       pass       |       pass       |       pass       |          pass          |
|              alexnet              | 4  |       pass       |       pass       |       pass       |          pass          |
| attention_is_all_you_need_pytorch | 4  |       pass       |       pass       |       pass       |          pass          |
|         basic_gnn_edgecnn         | 1  |       pass       |       pass       |       pass       |          pass          |
|           basic_gnn_gcn           | 1  |       pass       |       pass       |       pass       |          pass          |
|           basic_gnn_gin           | 1  |       pass       |       pass       |       pass       |          pass          |
|           lennard_jones           | 4  |       pass       |       pass       |       pass       |          pass          |
|               dcgan               | 4  |       pass       |       pass       |       pass       |          pass          |
|              demucs               | 4  |       pass       |       pass       |       pass       |          pass          |
|            densenet121            | 4  |       pass       |       pass       |       pass       |          pass          |
|          basic_gnn_sage           | 1  |       pass       |       pass       |       pass       |          pass          |
|           fastNLP_Bert            | 4  |       pass       |       pass       |       pass       |          pass          |
|           hf_DistilBert           | 4  |       pass       |       pass       |       pass       |          pass          |
|       functorch_dp_cifar10        | 4  |       pass       |       pass       |       pass       |          pass          |
|               hf_T5               | 4  |       pass       |       pass       |       pass       |          pass          |
|            hf_Reformer            | 4  |       pass       |       pass       |       pass       |          pass          |
|              hf_GPT2              | 2  |       pass       |       pass       |       pass       |          pass          |
|            hf_T5_base             | 4  |       pass       |       pass       |       pass       |          pass          |
|            hf_BigBird             | 4  |       pass       |       pass       |       pass       |          pass          |
|           hf_Bert_large           | 4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bert              | 4  |       pass       |       pass       |       pass       |          pass          |
|              hf_Bart              | 4  |       pass       |       pass       |       pass       |          pass          |
|             hf_Albert             | 4  |       pass       |       pass       |       pass       |          pass          |
|          vision_maskrcnn          | 1  |       pass       |       pass       |   infra_error    |      infra_error       |
|          DALLE2_pytorch           | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|     detectron2_fcos_r_50_fpn      | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|                drq                | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|          pytorch_struct           | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|         soft_actor_critic         | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|         timm_efficientdet         | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|           torchrec_dlrm           | 0  |   infra_error    |   infra_error    |   infra_error    |      infra_error       |
|           hf_Longformer           | 4  |       pass       |       pass       |   fail_to_run    |      fail_to_run       |
|               llama               | 4  |  fail_accuracy   |       pass       |  fail_accuracy   |     fail_accuracy      |
+-----------------------------------+----+------------------+------------------+------------------+------------------------+
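One way to collapse an accuracy column into the "percent, passed/total" style used in the summary pass-rate table is a simple count. This is a minimal sketch, not the benchmark runner's actual logic. It assumes that any status beginning with `pass` (including `pass_due_to_skip`) counts as a pass and that every other status (`fail_accuracy`, `fail_to_run`, `infra_error`) counts as a failure. `pass_rate` and `format_rate` are hypothetical helper names.

```python
def pass_rate(statuses):
    """Return (percent, passed, total) for one compiler column.

    Assumption (not taken from the benchmark code): statuses starting with
    "pass" are passes; everything else is a failure.
    """
    total = len(statuses)
    passed = sum(1 for s in statuses if s.startswith("pass"))
    # Round to the nearest whole percent; an empty column reports 0%.
    percent = round(100 * passed / total) if total else 0
    return percent, passed, total


def format_rate(statuses):
    """Render the result as e.g. "90%, 53/59"."""
    percent, passed, total = pass_rate(statuses)
    return f"{percent}%, {passed}/{total}"


# A few inductor-column statuses copied from the accuracy table above.
inductor = ["pass_due_to_skip", "pass", "infra_error", "fail_to_run", "fail_accuracy"]
print(format_rate(inductor))
```

Note that the percentages depend on whether `infra_error` rows, which never ran at all, are kept in the denominator. The sketch keeps them.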

Compilation latency (sec)

+-----------------------------------+------+-----------+-----------+------------+------------------------+
|               name                |  bs  |   eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+-----------+-----------+------------+------------------------+
|            hf_T5_large            |  2   | 31.439603 | 57.802168 | 164.083421 |       164.456502       |
|            hf_BigBird             |  2   | 14.210805 | 39.864807 | 163.680048 |       126.065458       |
|            densenet121            |  4   |  6.08875  | 17.663866 | 113.059074 |       124.351767       |
|         timm_efficientnet         |  32  | 3.993969  | 9.531764  | 110.753224 |       133.045389       |
|        mobilenet_v3_large         |  32  | 2.413839  | 6.663535  | 102.873356 |       106.446296       |
|           hf_GPT2_large           |  4   | 13.364166 | 27.884342 | 101.458251 |       101.483187       |
|         phlippe_densenet          | 128  | 2.547705  |  6.22488  | 101.337409 |       153.191907       |
|              yolov3               |  16  | 2.648541  | 9.837036  | 94.954118  |       110.717039       |
|             resnet152             |  32  |  6.50623  | 17.380757 | 94.415224  |       98.678362        |
|           mobilenet_v2            |  96  | 2.221713  | 6.185012  | 92.460311  |       101.414281       |
|           timm_resnest            |  32  | 1.375978  | 3.281833  |  72.12699  |       92.236499        |
|            mnasnet1_0             |  32  | 2.233631  | 6.156288  | 70.783563  |       85.075042        |
|        speech_transformer         |  32  | 4.408621  | 12.618386 | 70.360211  |       71.684556        |
|            timm_nfnet             | 128  | 5.738571  | 9.848938  | 64.911987  |       69.544176        |
|        shufflenet_v2_x1_0         | 128  | 2.541028  | 6.612881  | 64.884759  |       69.163346        |
|            timm_regnet            |  32  | 6.148549  | 11.355753 | 63.242359  |       66.675486        |
|           hf_Bert_large           |  4   | 8.977086  | 19.504087 | 60.793473  |       60.410969        |
|        Background_Matting         |  4   | 2.332617  | 9.612557  | 58.472384  |       66.595764        |
| attention_is_all_you_need_pytorch | 256  | 3.505044  | 9.522789  |  57.67811  |       58.422907        |
|           BERT_pytorch            |  16  | 3.713043  | 10.090656 | 52.195144  |       53.596293        |
|            timm_vovnet            |  32  | 3.107935  | 6.053281  | 49.165858  |       57.570949        |
|             resnet50              |  32  | 2.346468  | 7.000923  | 48.652712  |       55.526006        |
|           fastNLP_Bert            |  6   | 4.164716  | 10.128299 | 48.572427  |       46.713766        |
|               hf_T5               |  8   | 4.915321  | 11.419154 | 48.113126  |       45.577417        |
|           pytorch_unet            |  1   | 1.076603  | 3.689207  | 47.830672  |       57.432964        |
|              hf_Bart              |  4   | 4.546189  | 11.59483  | 47.061006  |       49.216481        |
|            hf_Reformer            |  4   |  4.48975  | 6.167701  | 46.172077  |       41.411991        |
|          resnext50_32x4d          |  8   | 2.366839  | 5.909192  | 42.762865  |       45.120019        |
|       functorch_dp_cifar10        |  64  | 0.887442  | 2.125804  | 42.667529  |       54.300915        |
|            Super_SloMo            |  6   | 2.285815  | 8.035609  | 41.629987  |       43.139268        |
|              hf_GPT2              |  4   | 4.747487  |  8.90541  |  39.97558  |       40.067813        |
|          pytorch_stargan          |  16  | 0.865556  | 2.822374  | 39.396471  |       45.636277        |
|      timm_vision_transformer      |  32  | 2.302068  | 5.710583  | 39.236317  |       41.092706        |
|             resnet18              |  16  | 0.946753  | 2.344216  | 38.098066  |       44.582691        |
|             hf_Albert             |  8   | 2.360897  | 7.744948  |  36.59468  |       39.796835        |
|          LearningToPaint          |  96  | 1.035538  | 2.504758  | 35.489835  |       40.383513        |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 0.922841  | 2.633678  | 34.938649  |       37.547208        |
|              hf_Bert              |  4   |  4.12986  | 9.508273  | 34.485988  |       36.734931        |
|           hf_DistilBert           |  8   |  2.77078  | 4.502302  | 29.687752  |       30.247295        |
|          phlippe_resnet           | 128  | 0.953731  | 2.422928  | 28.932409  |       33.693714        |
|              demucs               |  4   | 1.100834  | 1.852201  | 27.642722  |       27.286456        |
|           squeezenet1_1           |  32  | 0.694507  | 1.502967  | 20.689336  |       24.304327        |
|           basic_gnn_gcn           |  1   | 0.709566  | 0.960833  | 16.970181  |       15.818721        |
|         basic_gnn_edgecnn         |  1   | 1.282825  | 2.266287  | 16.496915  |          0.0           |
|               vgg16               |  64  | 0.420619  | 0.901943  | 15.236677  |        16.48442        |
|              alexnet              | 128  | -0.653623 | 0.619124  | 14.853093  |       15.645378        |
|      nvidia_deeprecommender       | 256  | 0.322201  | 0.617053  | 12.085651  |       10.625166        |
|          basic_gnn_sage           |  1   | 0.617571  | 0.798499  | 11.174681  |        9.523117        |
|               dcgan               |  32  | 0.338477  | 0.591528  |  11.05737  |       10.258963        |
|           basic_gnn_gin           |  1   | 0.620047  | 0.898669  |  9.574421  |        9.717836        |
|           lennard_jones           | 1000 | 0.258804  | 0.486293  |  8.655805  |        8.538321        |
|            tts_angular            |  64  | 0.300747  |  0.39467  |  8.265909  |        9.247066        |
|               dlrm                | 1024 | 0.372334  | 1.616226  |    0.0     |        9.201727        |
|   timm_vision_transformer_large   |  32  | 7.192703  | 17.534214 |    0.0     |       109.763888       |
|           hf_Longformer           |  2   | 7.310068  | 32.69951  |    0.0     |          0.0           |
|               moco                |  32  | 24.68456  |    0.0    |    0.0     |          0.0           |
|          DALLE2_pytorch           |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
|     detectron2_fcos_r_50_fpn      |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
|                drq                |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
|          pytorch_struct           |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
|         soft_actor_critic         |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
|         timm_efficientdet         |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
|           torchrec_dlrm           |  0   |    0.0    |    0.0    |    0.0     |          0.0           |
+-----------------------------------+------+-----------+-----------+------------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------+------+----------+-----------+----------+------------------------+
|               name                |  bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------+------+----------+-----------+----------+------------------------+
|           basic_gnn_gcn           |  1   |   1.0    | 1.168533  | 3.650016 |        3.059778        |
|          basic_gnn_sage           |  1   | 1.028514 |    1.0    | 2.008225 |        1.65962         |
|           basic_gnn_gin           |  1   | 1.002292 | 0.974245  | 1.983976 |        1.788851        |
|         basic_gnn_edgecnn         |  1   | 1.000679 | 1.136944  | 1.268084 |          0.0           |
|             hf_Albert             |  8   | 0.999911 | 0.973313  | 1.264965 |        1.262069        |
|            Super_SloMo            |  6   | 1.011702 |  1.01433  | 1.213417 |        1.22401         |
|            hf_BigBird             |  2   | 1.004615 | 0.994082  | 1.160882 |        1.155953        |
|           BERT_pytorch            |  16  | 1.000289 | 1.004587  | 1.11184  |        1.096388        |
|           mobilenet_v2            |  96  | 0.999635 |  0.95159  | 1.108167 |        1.102092        |
|           fastNLP_Bert            |  6   | 1.000268 | 0.991456  | 1.098417 |        1.08374         |
|            hf_T5_large            |  2   | 0.999967 | 1.015386  | 1.093518 |        1.100758        |
|           hf_GPT2_large           |  4   | 0.99993  | 0.968515  | 1.081592 |        1.118083        |
|            timm_nfnet             | 128  | 0.91347  | 0.989046  | 1.076428 |        1.072757        |
|           lennard_jones           | 1000 |   1.0    | 1.000112  | 1.068689 |        0.999804        |
|              hf_GPT2              |  4   | 1.000028 | 0.959248  | 1.065758 |        1.101962        |
|               hf_T5               |  8   | 0.999954 | 0.992418  | 1.043161 |        1.10217         |
|        Background_Matting         |  4   | 1.011173 | 0.705627  | 1.039805 |        1.039524        |
|              yolov3               |  16  | 0.999819 | 0.994149  | 1.02349  |        1.022172        |
|               dcgan               |  32  |   1.0    | 1.015395  | 1.006245 |        0.999846        |
|            hf_Reformer            |  4   |   1.0    |    1.0    | 1.005698 |          1.0           |
|           hf_Bert_large           |  4   |   1.0    | 0.989943  | 1.004658 |        1.003581        |
| attention_is_all_you_need_pytorch | 256  | 1.003319 | 1.001208  | 1.003191 |        1.018958        |
|              demucs               |  4   | 1.000058 | 1.000184  | 1.001963 |        0.999831        |
|            tts_angular            |  64  |   1.0    |    1.0    | 0.996691 |          1.0           |
|        shufflenet_v2_x1_0         | 128  | 1.002631 | 1.003486  | 0.995438 |        0.985483        |
|               vgg16               |  64  |   1.0    | 0.999917  | 0.99064  |        0.988421        |
|              hf_Bert              |  4   | 1.000248 | 0.985513  | 0.975042 |        0.968575        |
|      nvidia_deeprecommender       | 256  |   1.0    | 0.970936  | 0.973313 |        0.971137        |
|           hf_DistilBert           |  8   | 0.999326 | 0.982577  | 0.972378 |        0.967066        |
|           timm_resnest            |  32  | 0.999633 | 1.100041  | 0.958733 |        0.952261        |
|            timm_regnet            |  32  | 0.999704 | 0.999869  | 0.952921 |        0.950449        |
|         timm_efficientnet         |  32  | 0.999872 | 0.958033  | 0.94973  |        0.94381         |
|              alexnet              | 128  | 1.000735 | 1.001243  | 0.94399  |        0.938735        |
|             resnet152             |  32  | 0.999192 | 1.001836  | 0.943303 |        0.939124        |
|           pytorch_unet            |  1   | 1.000597 | 0.866079  | 0.930504 |        0.93105         |
|              hf_Bart              |  4   | 1.000543 | 0.921673  | 0.911767 |        0.941756        |
|          pytorch_stargan          |  16  | 0.998426 | 1.050024  | 0.893437 |        0.889299        |
|             resnet50              |  32  | 1.000517 | 1.003551  | 0.890619 |        0.887016        |
|            timm_vovnet            |  32  | 1.001787 | 1.001536  | 0.888781 |        0.887004        |
|      timm_vision_transformer      |  32  | 0.999677 | 1.004442  | 0.85232  |        0.846964        |
|        speech_transformer         |  32  | 0.999301 | 1.000002  | 0.846621 |        0.844683        |
|        mobilenet_v3_large         |  32  | 1.002749 | 0.996106  | 0.788748 |        0.78255         |
|            mnasnet1_0             |  32  | 0.997135 | 0.999441  | 0.784332 |        0.774557        |
|          resnext50_32x4d          |  8   | 1.000255 | 1.001975  | 0.780881 |        0.771616        |
|           squeezenet1_1           |  32  | 1.000239 | 0.998744  | 0.776372 |        0.775402        |
|          LearningToPaint          |  96  |   1.0    | 1.001431  | 0.757111 |         0.7482         |
|         phlippe_densenet          | 128  |   1.0    | 0.999753  | 0.729494 |        0.713997        |
|            densenet121            |  4   | 0.999194 | 0.982488  | 0.691168 |        0.670463        |
|             resnet18              |  16  | 0.999306 | 0.999031  | 0.618876 |        0.61026         |
|   pytorch_CycleGAN_and_pix2pix    |  1   |   1.0    |  0.9865   | 0.603505 |        0.600365        |
|       functorch_dp_cifar10        |  64  |   1.0    | 0.999337  | 0.453125 |        0.444502        |
|          phlippe_resnet           | 128  | 1.000654 | 1.000288  | 0.378591 |        0.36166         |
|     detectron2_fcos_r_50_fpn      |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|         timm_efficientdet         |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|         soft_actor_critic         |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|          pytorch_struct           |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|                drq                |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|   timm_vision_transformer_large   |  32  | 0.999978 | 1.003925  |   0.0    |        0.973508        |
|          DALLE2_pytorch           |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
|               moco                |  32  | 0.977628 |    0.0    |   0.0    |          0.0           |
|           hf_Longformer           |  2   | 0.999106 | 0.981581  |   0.0    |          0.0           |
|               dlrm                | 1024 |   1.0    | 1.000239  |   0.0    |        1.000856        |
|           torchrec_dlrm           |  0   |   0.0    |    0.0    |   0.0    |          0.0           |
+-----------------------------------+------+----------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------+------+------------+------------+------------+------------------------+
|               name                |  bs  |   eager    | aot_eager  |  inductor  | inductor_no_cudagraphs |
+-----------------------------------+------+------------+------------+------------+------------------------+
|           hf_GPT2_large           |  4   | 209.743514 | 278.263137 | 119.851999 |       121.387135       |
|        Background_Matting         |  4   | 127.166333 | 815.382554 | 106.692103 |       107.651865       |
|               hf_T5               |  8   | 179.867163 | 214.081589 | 103.929014 |       103.976291       |
|            hf_T5_large            |  2   | 220.827315 | 534.394825 | 92.848731  |       122.636986       |
|            timm_nfnet             | 128  | 118.48404  | 118.522766 | 72.383207  |       76.617107        |
|            Super_SloMo            |  6   | 81.272053  | 389.501557 | 71.989336  |       73.631438        |
|            hf_Reformer            |  4   | 81.199195  | 94.346731  | 68.901985  |       70.726586        |
|            hf_BigBird             |  2   | 200.190482 | 463.53716  | 66.188665  |       118.239417       |
|              yolov3               |  16  | 69.449507  | 100.042606 | 56.058524  |       56.917833        |
|               vgg16               |  64  | 66.053084  | 66.675128  | 52.395551  |        52.8455         |
|             resnet152             |  32  | 62.922322  | 131.994108 | 52.076394  |       61.520554        |
|              demucs               |  4   | 53.378007  |  53.4464   | 50.519243  |       51.323677        |
|            timm_regnet            |  32  | 57.432232  | 90.482038  | 49.597881  |       54.684078        |
|           hf_Bert_large           |  4   | 80.805967  | 176.396209 | 49.089284  |       51.675919        |
|        speech_transformer         |  32  | 59.786829  | 132.323443 | 41.385651  |        38.71846        |
|           fastNLP_Bert            |  6   | 58.055502  | 108.546454 | 37.595166  |       38.576146        |
| attention_is_all_you_need_pytorch | 256  | 56.749064  | 110.79778  | 33.845332  |       35.132171        |
|              hf_Bart              |  4   | 54.824691  | 117.855948 | 32.748684  |       53.885642        |
|           mobilenet_v2            |  96  | 49.590641  | 73.289949  | 31.269572  |       35.868631        |
|           pytorch_unet            |  1   | 40.562235  | 175.148977 | 29.931212  |       30.529352        |
|             hf_Albert             |  8   |  68.56327  |  97.30032  | 28.961743  |        29.81502        |
|              hf_GPT2              |  4   | 48.076233  | 87.964772  | 25.598932  |       28.113427        |
|            densenet121            |  4   | 51.575704  | 142.20561  | 21.708126  |       50.003876        |
|        shufflenet_v2_x1_0         | 128  | 33.597051  | 62.070596  | 20.963371  |       26.907238        |
|             resnet50              |  32  |  26.38896  | 49.775186  | 20.806872  |       24.357018        |
|              hf_Bert              |  4   | 41.596632  | 89.431368  | 20.777042  |       25.918155        |
|           hf_DistilBert           |  8   | 31.808461  | 47.486587  | 20.757692  |       21.297847        |
|            timm_vovnet            |  32  | 25.020608  |  44.16279  | 19.903423  |       21.885843        |
|         timm_efficientnet         |  32  | 35.170023  | 76.950871  | 17.270957  |       31.628223        |
|           BERT_pytorch            |  16  | 52.476098  |  119.2507  | 15.477095  |       25.364927        |
|      timm_vision_transformer      |  32  | 25.157028  | 50.651734  | 14.976379  |       18.969449        |
|           timm_resnest            |  32  | 24.105572  | 34.667665  | 14.541757  |       15.751461        |
|            mnasnet1_0             |  32  | 22.941997  | 51.009472  | 13.985521  |       23.125186        |
|        mobilenet_v3_large         |  32  | 26.935175  | 58.649976  | 13.744624  |       27.768159        |
|         phlippe_densenet          | 128  | 26.885159  | 51.138504  | 12.783092  |       28.133978        |
|          resnext50_32x4d          |  8   | 20.321671  | 46.339256  | 11.907205  |       20.199126        |
|          pytorch_stargan          |  16  | 14.755791  | 28.485545  | 11.295711  |       11.662139        |
|      nvidia_deeprecommender       | 256  | 10.314474  | 10.311137  |  10.50392  |       10.110573        |
|              alexnet              | 128  |  9.749033  |  10.03852  |  8.526583  |        8.554891        |
|          LearningToPaint          |  96  | 11.514781  | 23.293387  |  8.199904  |       11.019652        |
|            tts_angular            |  64  |  6.470567  |  7.580891  |  6.19554   |        6.247491        |
|          phlippe_resnet           | 128  | 10.148995  | 19.853486  |  5.83463   |        9.243718        |
|         basic_gnn_edgecnn         |  1   |  7.576859  | 10.434382  |  5.638062  |          0.0           |
|             resnet18              |  16  |  8.898828  |  18.6218   |  5.602462  |        8.931554        |
|           squeezenet1_1           |  32  | 10.179315  | 17.106445  |  5.473075  |        7.992387        |
|   pytorch_CycleGAN_and_pix2pix    |  1   | 13.228025  | 26.220481  |  5.395211  |        7.201255        |
|           basic_gnn_gcn           |  1   |  4.916064  |  8.996784  |  3.458337  |        3.718402        |
|       functorch_dp_cifar10        |  64  | 12.012928  | 18.743119  |  3.235116  |        7.399274        |
|          basic_gnn_sage           |  1   |  3.294526  |  6.12314   |  2.022108  |        2.454052        |
|               dcgan               |  32  |  2.334653  |  4.536521  |  1.536079  |        2.35901         |
|           basic_gnn_gin           |  1   |  4.050946  |  6.602456  |  1.400484  |        3.192411        |
|           lennard_jones           | 1000 |  1.574114  |  3.572559  |  0.873083  |        1.60309         |
|         soft_actor_critic         |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
|          pytorch_struct           |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
|     detectron2_fcos_r_50_fpn      |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
|         timm_efficientdet         |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
|                drq                |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
|   timm_vision_transformer_large   |  32  | 416.854919 | 425.844473 |    0.0     |       420.331654       |
|          DALLE2_pytorch           |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
|               moco                |  32  | 49.197174  |    0.0     |    0.0     |          0.0           |
|           hf_Longformer           |  2   | 116.920036 | 286.932827 |    0.0     |          0.0           |
|               dlrm                | 1024 |  6.501852  |  8.253714  |    0.0     |        3.466805        |
|           torchrec_dlrm           |  0   |    0.0     |    0.0     |    0.0     |          0.0           |
+-----------------------------------+------+------------+------------+------------+------------------------+
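A per-model speedup can be derived from the absolute latency table as eager latency divided by compiled latency. Those ratios can then be aggregated with a geometric mean. This is a minimal sketch under two assumptions. First, a `0.0` latency marks a run that produced no result, so those rows are skipped. Second, the derived numbers may not match the "Performance speedup" tables exactly, because those come from their own measurements. `speedups` and `geomean` are hypothetical helper names.

```python
import math


def speedups(rows):
    """Map model name -> eager latency / compiled latency.

    Assumption: a latency of 0.0 means the run produced no result
    (e.g. infra_error or fail_to_run), so such rows are skipped.
    """
    return {
        name: eager / compiled
        for name, eager, compiled in rows
        if eager > 0 and compiled > 0
    }


def geomean(values):
    """Geometric mean computed as exp(mean(log x)) for numerical stability."""
    values = list(values)
    if not values:
        raise ValueError("geomean of an empty sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (name, eager ms, inductor ms) copied from the absolute latency table above.
rows = [
    ("hf_GPT2_large", 209.743514, 119.851999),
    ("resnet50", 26.38896, 20.806872),
    ("moco", 49.197174, 0.0),  # no inductor result, so this row is skipped
]
per_model = speedups(rows)
print({name: round(s, 2) for name, s in per_model.items()})
print(f"{geomean(per_model.values()):.2f}x")
```

A geometric mean is the usual choice for averaging ratios. A 2x speedup and a 0.5x slowdown average to 1.0x, whereas an arithmetic mean would report 1.25x.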

huggingface suite with amp precision


Performance speedup

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|     MobileBertForQuestionAnswering      | 128 | 1.013948 |  0.46647  | 2.780108 |        1.182392        |
|       DebertaForQuestionAnswering       | 16  | 0.999415 |  0.93414  | 2.748288 |        2.693413        |
|      DebertaV2ForQuestionAnswering      |  1  | 1.019968 | 0.534772  | 2.647552 |        1.959343        |
|       MT5ForConditionalGeneration       | 16  | 1.021138 | 0.484657  | 2.506785 |        1.919343        |
|             OPTForCausalLM              |  2  | 0.99936  | 0.912277  | 2.356594 |        2.349125        |
|      GPT2ForSequenceClassification      |  4  | 0.999497 |  0.86511  | 2.31787  |        2.272015        |
|       ElectraForQuestionAnswering       | 64  | 1.000032 | 0.951253  | 2.153292 |        2.091876        |
|          MobileBertForMaskedLM          | 128 | 1.014689 | 0.482264  | 1.994832 |        1.154881        |
|           DebertaForMaskedLM            |  8  | 0.999973 | 0.789217  | 1.949198 |        1.92653         |
|           ElectraForCausalLM            | 32  | 0.998817 | 0.859431  | 1.861379 |        1.883629        |
|    LayoutLMForSequenceClassification    | 16  | 0.999907 | 0.928003  | 1.825873 |        1.771729        |
|       RobertaForQuestionAnswering       | 16  | 1.000025 | 0.936475  | 1.802971 |        1.761879        |
|        BertForQuestionAnswering         | 16  | 1.00079  | 0.930211  | 1.797494 |        1.745037        |
|             XGLMForCausalLM             |  8  | 1.011627 | 0.483379  | 1.746136 |        1.54555         |
|           RobertaForCausalLM            | 16  | 1.000061 | 0.937462  | 1.679495 |        1.651481        |
|    MegatronBertForQuestionAnswering     |  8  | 0.998948 | 0.796862  | 1.668544 |        1.62954         |
|       AlbertForQuestionAnswering        |  4  | 1.000328 |  0.87335  | 1.655547 |        1.644761        |
|               DistillGPT2               | 16  | 1.000361 | 0.945663  | 1.650823 |        1.675055        |
|     M2M100ForConditionalGeneration      | 16  | 1.007375 | 0.476325  | 1.649832 |        1.491729        |
|            AlbertForMaskedLM            |  4  | 1.000027 | 0.873116  | 1.645831 |        1.636325        |
|            XLNetLMHeadModel             |  8  | 0.99919  | 0.888554  | 1.626346 |        1.634082        |
|         MegatronBertForCausalLM         |  4  | 1.029205 | 0.484722  | 1.615127 |        1.552879        |
|             BertForMaskedLM             | 16  | 0.999089 | 0.937167  | 1.595655 |        1.584557        |
|     PLBartForConditionalGeneration      |  4  | 0.999868 | 0.869075  | 1.590527 |        1.588789        |
|          DebertaV2ForMaskedLM           |  2  | 1.020883 | 0.555165  | 1.567072 |        1.544834        |
|       T5ForConditionalGeneration        |  4  | 1.000079 | 0.692371  | 1.545598 |        1.585897        |
|                 T5Small                 |  4  | 0.999779 | 0.689581  | 1.536741 |        1.581885        |
|                CamemBert                | 16  | 1.000056 | 0.939538  | 1.53429  |        1.53267         |
|      BartForConditionalGeneration       |  2  | 1.002339 | 0.595178  | 1.505955 |        1.475818        |
|      MBartForConditionalGeneration      |  2  | 1.000198 | 0.568799  | 1.498238 |        1.467302        |
|             BartForCausalLM             |  4  | 0.99972  | 0.933782  | 1.494171 |        1.49175         |
|            MBartForCausalLM             |  4  | 0.999195 | 0.933679  | 1.488263 |        1.491064        |
|            YituTechConvBert             | 16  | 1.000015 | 0.859239  | 1.476193 |        1.456752        |
|            PLBartForCausalLM            |  8  | 1.000065 | 0.944641  | 1.461295 |        1.485481        |
|         Speech2Text2ForCausalLM         | 256 | 0.999598 | 0.884283  | 1.459613 |        1.483713        |
|     DistilBertForQuestionAnswering      | 256 | 0.99982  | 0.973493  | 1.445399 |        1.434791        |
| BlenderbotSmallForConditionalGeneration | 64  | 1.006367 | 0.704126  | 1.362908 |        1.363553        |
|     PegasusForConditionalGeneration     | 32  | 1.005231 | 0.611184  | 1.319683 |        1.31133         |
|            TrOCRForCausalLM             | 32  | 1.000129 | 0.938249  | 1.253856 |        1.264002        |
|          DistilBertForMaskedLM          | 128 | 0.99951  | 0.934382  | 1.231039 |        1.239865        |
|       BlenderbotSmallForCausalLM        | 64  | 0.999792 | 0.802935  | 1.226609 |        1.263912        |
|           PegasusForCausalLM            | 32  | 1.000389 | 0.752019  | 1.204053 |        1.192735        |
|          BlenderbotForCausalLM          |  4  | 1.011696 |  0.49972  | 1.138141 |        1.119599        |
|           LayoutLMForMaskedLM           | 16  | 1.00001  | 0.938123  |   0.0    |        1.604682        |
|          AllenaiLongformerBase          |  4  | 0.999358 | 0.582473  |   0.0    |          0.0           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

Accuracy

+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|                  name                   | bs |      eager       |    aot_eager     |     inductor     | inductor_no_cudagraphs |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+
|          BlenderbotForCausalLM          | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|          DebertaV2ForMaskedLM           | 1  | pass_due_to_skip | pass_due_to_skip | pass_due_to_skip |    pass_due_to_skip    |
|            YituTechConvBert             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PLBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|      MBartForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|       MT5ForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|         MegatronBertForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|    MegatronBertForQuestionAnswering     | 1  |       pass       |       pass       |       pass       |          pass          |
|          MobileBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     MobileBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|             OPTForCausalLM              | 1  |       pass       |       pass       |       pass       |          pass          |
|            PLBartForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|           PegasusForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|            XLNetLMHeadModel             | 1  |       pass       |       pass       |       pass       |          pass          |
|            MBartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|           RobertaForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       RobertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|         Speech2Text2ForCausalLM         | 1  |       pass       |       pass       |       pass       |          pass          |
|       T5ForConditionalGeneration        | 1  |       pass       |       pass       |       pass       |          pass          |
|                 T5Small                 | 1  |       pass       |       pass       |       pass       |          pass          |
|            TrOCRForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|             XGLMForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|     PegasusForConditionalGeneration     | 1  |       pass       |       pass       |       pass       |          pass          |
|     M2M100ForConditionalGeneration      | 1  |       pass       |       pass       |       pass       |          pass          |
|    LayoutLMForSequenceClassification    | 1  |       pass       |       pass       |       pass       |          pass          |
| BlenderbotSmallForConditionalGeneration | 1  |       pass       |       pass       |       pass       |          pass          |
|            AlbertForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       AlbertForQuestionAnswering        | 1  |       pass       |       pass       |       pass       |          pass          |
|          AllenaiLongformerBase          | 1  |       pass       |       pass       |       pass       |          pass          |
|             BartForCausalLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|      BartForConditionalGeneration       | 1  |       pass       |       pass       |       pass       |          pass          |
|             BertForMaskedLM             | 1  |       pass       |       pass       |       pass       |          pass          |
|        BertForQuestionAnswering         | 1  |       pass       |       pass       |       pass       |          pass          |
|       BlenderbotSmallForCausalLM        | 1  |       pass       |       pass       |       pass       |          pass          |
|                CamemBert                | 1  |       pass       |       pass       |       pass       |          pass          |
|           LayoutLMForMaskedLM           | 1  |       pass       |       pass       |       pass       |          pass          |
|           DebertaForMaskedLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       DebertaForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|          DistilBertForMaskedLM          | 1  |       pass       |       pass       |       pass       |          pass          |
|     DistilBertForQuestionAnswering      | 1  |       pass       |       pass       |       pass       |          pass          |
|               DistillGPT2               | 1  |       pass       |       pass       |       pass       |          pass          |
|           ElectraForCausalLM            | 1  |       pass       |       pass       |       pass       |          pass          |
|       ElectraForQuestionAnswering       | 1  |       pass       |       pass       |       pass       |          pass          |
|      GPT2ForSequenceClassification      | 1  |       pass       |       pass       |       pass       |          pass          |
|      DebertaV2ForQuestionAnswering      | 1  |       pass       |       pass       |   infra_error    |          pass          |
+-----------------------------------------+----+------------------+------------------+------------------+------------------------+

Compilation latency (sec)

+-----------------------------------------+-----+-----------+-----------+------------+------------------------+
|                  name                   | bs  |   eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+-----------------------------------------+-----+-----------+-----------+------------+------------------------+
|     MobileBertForQuestionAnswering      | 128 | 28.005908 | 50.090302 | 134.464177 |       134.520693       |
|          MobileBertForMaskedLM          | 128 | 26.541355 | 50.022521 | 132.344597 |       133.560424       |
|     M2M100ForConditionalGeneration      | 16  | 8.232879  | 22.093632 | 98.271694  |       98.993619        |
|            XLNetLMHeadModel             |  8  | 8.730799  | 23.367344 |  94.50377  |       98.369967        |
|             XGLMForCausalLM             |  8  | 7.002015  | 17.639691 | 85.759767  |       86.436074        |
|       MT5ForConditionalGeneration       | 16  | 6.961684  | 17.397204 | 84.680189  |       84.005012        |
|      MBartForConditionalGeneration      |  2  | 9.524429  | 22.614339 |  78.37576  |       77.586508        |
|      BartForConditionalGeneration       |  2  | 8.640034  | 21.831722 |  73.21505  |       73.113895        |
|          DebertaV2ForMaskedLM           |  2  | 15.014841 | 25.998448 | 71.168706  |       66.025264        |
|     PegasusForConditionalGeneration     | 32  | 4.152695  | 18.741433 | 70.392856  |       69.228935        |
|         MegatronBertForCausalLM         |  4  |  9.41297  | 20.29049  | 68.510422  |       65.352325        |
|      DebertaV2ForQuestionAnswering      |  1  | 13.291003 | 26.004463 | 68.001044  |       66.357613        |
|          BlenderbotForCausalLM          |  4  | 6.043639  | 18.061699 | 66.767457  |       64.225277        |
|    MegatronBertForQuestionAnswering     |  8  | 9.412642  | 19.553306 | 65.844739  |       65.134062        |
|            YituTechConvBert             | 16  | 5.986609  | 14.311981 | 62.389043  |       64.085156        |
| BlenderbotSmallForConditionalGeneration | 64  | 5.725183  | 14.854252 | 57.020904  |       53.715058        |
|                 T5Small                 |  4  | 4.829753  | 11.646779 | 47.015723  |       46.676919        |
|       T5ForConditionalGeneration        |  4  | 4.887451  | 11.57768  | 46.680816  |       46.210788        |
|     PLBartForConditionalGeneration      |  4  | 4.427963  | 11.516188 |  46.19498  |       46.879618        |
|           ElectraForCausalLM            | 32  | 4.277916  | 9.506654  | 43.828684  |       46.754753        |
|           DebertaForMaskedLM            |  8  | 6.988147  | 12.874999 | 42.348477  |       41.364712        |
|    LayoutLMForSequenceClassification    | 16  | 4.324636  |  9.89485  | 41.894954  |        42.14851        |
|       DebertaForQuestionAnswering       | 16  | 6.876514  | 12.653998 | 39.520491  |       38.947953        |
|           RobertaForCausalLM            | 16  | 4.909576  | 9.835368  | 38.496135  |       35.025488        |
|             BertForMaskedLM             | 16  | 4.817023  | 9.755045  | 38.025903  |        37.77552        |
|            MBartForCausalLM             |  4  | 3.601726  | 8.722706  | 37.898803  |       36.378014        |
|                CamemBert                | 16  | 4.252059  | 10.003216 | 37.556412  |       36.801517        |
|            AlbertForMaskedLM            |  4  | 2.140659  | 7.868742  | 37.496397  |       36.848402        |
|        BertForQuestionAnswering         | 16  | 4.104459  | 9.663629  | 37.093781  |       36.475531        |
|            TrOCRForCausalLM             | 32  | 3.671399  | 8.717741  | 36.849418  |       36.250642        |
|       ElectraForQuestionAnswering       | 64  | 4.231301  | 9.704069  | 36.808427  |       37.085661        |
|             BartForCausalLM             |  4  | 3.880073  | 8.798065  | 36.216865  |       35.372078        |
|           PegasusForCausalLM            | 32  | 4.530307  | 9.036373  | 36.065896  |       34.979598        |
|       AlbertForQuestionAnswering        |  4  | 3.143062  | 7.671309  | 35.832595  |       34.116697        |
|             OPTForCausalLM              |  2  | 3.334007  | 8.370733  |  34.41215  |       36.543114        |
|      GPT2ForSequenceClassification      |  4  |  4.23757  | 8.806861  | 34.021485  |       32.988405        |
|       RobertaForQuestionAnswering       | 16  | 5.321378  | 9.405191  | 33.953307  |       35.571662        |
|     DistilBertForQuestionAnswering      | 256 | 2.684745  | 4.576588  | 32.997891  |       33.474498        |
|          DistilBertForMaskedLM          | 128 | 1.906759  | 5.577871  | 32.012122  |       31.835771        |
|       BlenderbotSmallForCausalLM        | 64  | 2.390067  | 6.839191  | 30.847765  |       28.647454        |
|               DistillGPT2               | 16  | 1.794796  | 4.343471  | 28.615149  |       27.169641        |
|         Speech2Text2ForCausalLM         | 256 | 2.636885  | 4.348747  | 26.361592  |       26.945285        |
|            PLBartForCausalLM            |  8  | 1.961152  | 4.522749  | 25.933666  |       25.806095        |
|           LayoutLMForMaskedLM           | 16  | 4.381036  | 9.961645  |    0.0     |       37.867065        |
|          AllenaiLongformerBase          |  4  |  7.66396  | 30.761964 |    0.0     |          0.0           |
+-----------------------------------------+-----+-----------+-----------+------------+------------------------+

Peak Memory Compression Ratio

+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|                  name                   | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+
|       DebertaForQuestionAnswering       | 16  | 1.021271 | 1.147063  | 1.340031 |        1.337637        |
|       AlbertForQuestionAnswering        |  4  | 0.999999 | 0.791079  | 1.315569 |        1.314665        |
|            AlbertForMaskedLM            |  4  |   1.0    | 0.783779  | 1.257824 |        1.268002        |
|      GPT2ForSequenceClassification      |  4  | 1.000115 | 0.978078  | 1.097279 |        1.126616        |
|             OPTForCausalLM              |  2  | 1.000031 | 0.968122  | 1.090841 |        1.128266        |
|     DistilBertForQuestionAnswering      | 256 | 1.011432 | 1.024273  | 1.084986 |        1.082283        |
|       ElectraForQuestionAnswering       | 64  | 1.001377 |  1.00139  | 1.072642 |        1.07173         |
|        BertForQuestionAnswering         | 16  | 1.001687 | 1.001795  | 1.069941 |        1.065794        |
|       RobertaForQuestionAnswering       | 16  | 1.001249 | 1.001349  | 1.069558 |        1.065445        |
|           DebertaForMaskedLM            |  8  | 0.999756 | 1.026032  | 1.055076 |        1.108048        |
|    LayoutLMForSequenceClassification    | 16  | 1.001432 |  1.00144  | 1.039867 |        1.035639        |
|            XLNetLMHeadModel             |  8  |   1.0    | 0.984302  | 1.033141 |        1.033141        |
|    MegatronBertForQuestionAnswering     |  8  |   1.0    | 0.999999  | 1.029183 |        1.028523        |
|                 T5Small                 |  4  | 0.999923 | 0.987768  | 1.02166  |        1.067157        |
|       T5ForConditionalGeneration        |  4  | 0.999923 | 0.987768  | 1.02166  |        1.067157        |
|         MegatronBertForCausalLM         |  4  |   1.0    | 0.986938  | 1.020586 |        1.032361        |
|          BlenderbotForCausalLM          |  4  | 0.997825 | 0.998157  | 1.000304 |        0.99879         |
|      DebertaV2ForQuestionAnswering      |  1  | 1.000099 | 1.000099  | 0.999938 |        0.999281        |
|      MBartForConditionalGeneration      |  2  |   1.0    | 0.973796  | 0.995659 |        1.02194         |
|     PegasusForConditionalGeneration     | 32  | 0.999993 | 0.944958  | 0.989469 |        1.048692        |
|      BartForConditionalGeneration       |  2  |   1.0    | 0.973768  | 0.979934 |        1.005418        |
|          DebertaV2ForMaskedLM           |  2  | 0.999666 | 0.981407  | 0.974219 |        0.990623        |
|           RobertaForCausalLM            | 16  | 0.999899 | 0.958734  | 0.94639  |        0.983123        |
|               DistillGPT2               | 16  | 0.999964 | 0.915694  | 0.940389 |        1.027408        |
|            MBartForCausalLM             |  4  |   1.0    |  0.95108  | 0.936612 |        0.982652        |
|          MobileBertForMaskedLM          | 128 | 0.999985 | 0.932688  | 0.935191 |        0.983458        |
|            YituTechConvBert             | 16  |   1.0    | 0.955142  | 0.930818 |        0.929368        |
|             BertForMaskedLM             | 16  | 0.999764 | 0.958621  | 0.927181 |        0.924076        |
|                CamemBert                | 16  | 0.999989 | 0.957465  | 0.924779 |        0.921675        |
|             BartForCausalLM             |  4  |   1.0    | 0.951014  | 0.921898 |        0.966594        |
|             XGLMForCausalLM             |  8  |   1.0    | 0.943529  | 0.91812  |        0.969992        |
|     M2M100ForConditionalGeneration      | 16  |   1.0    | 0.938979  | 0.910994 |        0.966946        |
|     PLBartForConditionalGeneration      |  4  | 1.00005  | 0.930006  | 0.908319 |        0.973149        |
|           PegasusForCausalLM            | 32  |   1.0    | 0.926025  | 0.904791 |        0.973312        |
|       MT5ForConditionalGeneration       | 16  | 0.999948 | 0.922812  | 0.903874 |        0.991231        |
|            TrOCRForCausalLM             | 32  |   1.0    | 0.919942  | 0.87395  |        0.881037        |
| BlenderbotSmallForConditionalGeneration | 64  |   1.0    | 0.889539  | 0.864986 |        0.897783        |
|            PLBartForCausalLM            |  8  |   1.0    | 0.923832  |  0.863   |        0.860945        |
|           ElectraForCausalLM            | 32  | 0.999976 | 0.917402  | 0.861134 |        0.93223         |
|     MobileBertForQuestionAnswering      | 128 | 1.016118 | 1.024925  | 0.857907 |        0.857131        |
|          DistilBertForMaskedLM          | 128 | 1.000018 | 0.917034  | 0.851792 |        0.849938        |
|       BlenderbotSmallForCausalLM        | 64  |   1.0    | 0.890384  | 0.804903 |        0.803499        |
|         Speech2Text2ForCausalLM         | 256 |   1.0    | 0.888278  | 0.77739  |        0.775883        |
|           LayoutLMForMaskedLM           | 16  | 0.999888 | 0.958901  |   0.0    |        0.924424        |
|          AllenaiLongformerBase          |  4  | 0.998844 | 0.951083  |   0.0    |          0.0           |
+-----------------------------------------+-----+----------+-----------+----------+------------------------+

Absolute latency (ms)

+-----------------------------------------+-----+------------+------------+------------+------------------------+
|                  name                   | bs  |   eager    | aot_eager  |  inductor  | inductor_no_cudagraphs |
+-----------------------------------------+-----+------------+------------+------------+------------------------+
|            XLNetLMHeadModel             |  8  | 277.016386 | 311.764992 | 170.93676  |       169.043056       |
|            AlbertForMaskedLM            |  4  | 266.852506 | 305.86614  | 162.205125 |       163.083869       |
|       AlbertForQuestionAnswering        |  4  | 265.070686 | 303.194028 | 160.052774 |       160.974258       |
|            TrOCRForCausalLM             | 32  | 135.298269 | 144.146378 | 108.454232 |       107.836772       |
|     PegasusForConditionalGeneration     | 32  | 136.739217 | 234.549796 | 107.324999 |       109.105772       |
|          MobileBertForMaskedLM          | 128 | 175.810528 | 397.080822 | 93.477214  |       162.804799       |
|            YituTechConvBert             | 16  | 133.302608 | 156.60748  |  90.92096  |       93.080105        |
|      MBartForConditionalGeneration      |  2  | 133.92787  | 238.272106 | 89.959816  |       92.407365        |
|      BartForConditionalGeneration       |  2  | 133.74041  | 224.192551 |  88.67486  |       91.007708        |
|    MegatronBertForQuestionAnswering     |  8  | 141.779694 | 178.000234 | 85.562515  |       87.803048        |
|          BlenderbotForCausalLM          |  4  | 89.909457  | 182.567356 | 79.563173  |        81.7712         |
| BlenderbotSmallForConditionalGeneration | 64  | 108.340561 | 155.596584 | 79.292134  |       79.038295        |
|                CamemBert                | 16  | 118.496928 | 126.03016  | 77.124599  |       78.028231        |
|          DebertaV2ForMaskedLM           |  2  | 118.481165 | 212.967894 |  74.15952  |       76.213999        |
|            MBartForCausalLM             |  4  | 108.309004 | 116.002761 | 73.098479  |       73.205261        |
|             BartForCausalLM             |  4  | 108.563164 |  115.7605  | 72.313173  |        72.48485        |
|     PLBartForConditionalGeneration      |  4  | 113.500478 | 130.329322 | 71.762294  |       72.370534        |
|     DistilBertForQuestionAnswering      | 256 | 102.873574 | 105.65135  | 71.370854  |       72.075112        |
|     M2M100ForConditionalGeneration      | 16  | 116.169399 | 244.385957 | 71.244798  |       87.374891        |
|            PLBartForCausalLM            |  8  | 102.536118 | 108.485397 | 70.476753  |        69.49154        |
|                 T5Small                 |  4  | 103.32242  | 150.514971 | 68.882033  |       66.842522        |
|       T5ForConditionalGeneration        |  4  | 103.362044 | 149.945978 | 68.863693  |       66.818725        |
|             BertForMaskedLM             | 16  | 109.965019 | 117.322671 | 68.813818  |       69.507316        |
|           RobertaForCausalLM            | 16  | 114.930009 | 122.707966 | 68.787922  |       69.598675        |
|          DistilBertForMaskedLM          | 128 | 84.142975  | 89.979572  | 68.647007  |       67.866745        |
|     MobileBertForQuestionAnswering      | 128 | 181.745989 | 400.246042 | 66.049968  |       156.101302       |
|             OPTForCausalLM              |  2  | 155.246568 | 169.915345 |  65.80678  |       66.600839        |
|               DistillGPT2               | 16  | 105.696834 | 111.898442 | 64.033449  |       63.104804        |
|           PegasusForCausalLM            | 32  | 68.190021  | 90.585331  | 57.286371  |       57.135705        |
|         MegatronBertForCausalLM         |  4  |  86.17124  | 185.62622  | 54.927648  |       58.193645        |
|    LayoutLMForSequenceClassification    | 16  | 97.871068  | 105.438826 | 53.932518  |       55.204354        |
|        BertForQuestionAnswering         | 16  |  95.33042  | 102.560064 | 53.431936  |       54.651938        |
|       ElectraForQuestionAnswering       | 64  | 115.256001 | 120.816882 | 53.357514  |       55.019793        |
|       RobertaForQuestionAnswering       | 16  | 95.807973  | 102.049819 | 53.026195  |        54.84576        |
|       DebertaForQuestionAnswering       | 16  | 145.63176  | 155.702317 | 52.938425  |        54.0114         |
|             XGLMForCausalLM             |  8  | 92.257911  | 193.848468 | 50.147052  |       56.908814        |
|           DebertaForMaskedLM            |  8  | 93.849216  | 119.057684 | 48.558856  |       48.678837        |
|           ElectraForCausalLM            | 32  | 87.737753  | 101.708558 | 46.940744  |       47.323442        |
|       BlenderbotSmallForCausalLM        | 64  | 56.735862  | 70.750482  | 46.229695  |        46.02597        |
|      DebertaV2ForQuestionAnswering      |  1  | 107.643376 | 207.988094 | 41.279191  |        53.56479        |
|       MT5ForConditionalGeneration       | 16  | 96.460884  | 209.539665 | 39.911509  |       52.325371        |
|      GPT2ForSequenceClassification      |  4  | 90.507337  | 104.834707 | 39.596057  |       39.892535        |
|         Speech2Text2ForCausalLM         | 256 | 49.689272  | 55.855297  | 34.465984  |       33.965759        |
|           LayoutLMForMaskedLM           | 16  | 112.489355 | 119.939849 |    0.0     |       70.264157        |
|          AllenaiLongformerBase          |  4  | 197.24792  | 342.259095 |    0.0     |          0.0           |
+-----------------------------------------+-----+------------+------------+------------+------------------------+

timm_models suite with amp precision


Performance speedup

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|        tnt_s_patch16_224        | 128 | 0.999937 | 0.979273  | 2.535318 |        2.488158        |
|            levit_128            | 128 | 1.006095 | 0.479146  | 2.163689 |        1.462609        |
|      xcit_large_24_p8_224       |  5  | 0.998343 | 0.478302  | 2.115537 |        1.591851        |
|          cait_m36_384           |  2  | 0.942583 | 0.474984  | 2.051115 |        1.572233        |
|            lcnet_050            | 128 | 0.998996 |  0.68977  | 2.045065 |        1.718694        |
|          ghostnet_100           | 128 | 0.99974  | 0.657087  | 1.957761 |        1.731641        |
|        twins_pcpvt_base         | 64  | 1.002899 | 0.554792  | 1.937105 |        1.573457        |
|         coat_lite_mini          | 128 | 0.999538 |  0.90529  | 1.897422 |        1.862499        |
|           regnety_002           | 128 | 1.018046 | 0.581444  | 1.856761 |        1.440385        |
|          gmlp_s16_224           | 128 | 1.000266 | 0.973587  | 1.743779 |        1.707174        |
|          gmixer_24_224          | 128 | 0.999746 |  0.81072  | 1.713867 |        1.679773        |
|      mobilenetv3_large_100      | 128 | 0.999267 | 0.757196  | 1.641153 |        1.585806        |
|           mnasnet_100           | 128 | 0.999342 | 0.783825  | 1.630094 |        1.619385        |
|         crossvit_9_240          | 128 | 1.003288 | 0.578453  | 1.628457 |        1.310233        |
|         mobilenetv2_100         | 128 | 0.999443 | 0.767241  | 1.621301 |        1.60987         |
|  swin_base_patch4_window7_224   | 64  | 0.999643 | 0.735531  | 1.620062 |        1.578529        |
|           volo_d1_224           | 64  | 0.999606 | 0.919902  | 1.603852 |        1.570803        |
|           dm_nfnet_f0           | 128 | 0.999379 | 0.971582  | 1.603275 |        1.518219        |
|        sebotnet33ts_256         | 64  | 0.999996 | 0.780509  | 1.600624 |        1.573112        |
|            nfnet_l0             | 128 | 0.999007 | 0.780911  | 1.590227 |        1.501868        |
|          spnasnet_100           | 128 | 0.999321 | 0.781063  | 1.570904 |        1.546223        |
|           convit_base           | 64  | 0.999734 |  0.97573  | 1.564137 |        1.536557        |
|             dla102              | 128 | 0.999475 |  0.82009  | 1.558571 |        1.537289        |
|       gluon_inception_v3        | 128 | 0.999921 | 0.860162  | 1.557212 |        1.528026        |
|           fbnetc_100            | 128 | 0.999546 | 0.774868  | 1.555221 |        1.541267        |
|          inception_v3           | 128 | 0.999767 | 0.860256  | 1.554518 |        1.524263        |
|        adv_inception_v3         | 128 | 0.999568 |  0.85449  | 1.554092 |        1.525432        |
|          convnext_base          | 64  | 0.999662 | 0.916394  | 1.537376 |        1.505963        |
|       tf_efficientnet_b0        | 128 | 0.999678 | 0.699331  | 1.512592 |        1.490584        |
|       eca_botnext26ts_256       | 128 | 0.999655 | 0.740707  | 1.491524 |        1.48158         |
|           mobilevit_s           | 64  | 0.999561 | 0.640437  | 1.490321 |        1.37385         |
|            fbnetv3_b            | 128 | 0.99911  |  0.73631  | 1.473592 |        1.454377        |
|          botnet26t_256          | 128 | 0.999733 | 0.874946  | 1.46725  |        1.457085        |
|           SelecSls42b           | 128 | 0.999507 |  0.81013  | 1.462733 |        1.433993        |
|           resnest101e           | 64  | 0.999128 | 0.739484  | 1.43997  |        1.344963        |
|        ese_vovnet19b_dw         | 128 | 0.999513 | 0.861125  | 1.437404 |        1.429039        |
|          cspdarknet53           | 64  | 0.999553 | 0.829967  | 1.433597 |        1.414689        |
|           rexnet_100            | 128 | 0.999486 | 0.713482  | 1.430253 |        1.404121        |
|            tinynet_a            | 128 | 0.999533 | 0.620371  | 1.407258 |        1.257235        |
|           res2next50            | 128 | 0.999736 | 0.810016  | 1.396026 |        1.37432         |
|        eca_halonext26ts         | 128 | 0.999875 | 0.746769  | 1.383023 |        1.37401         |
|         poolformer_m36          | 64  | 0.999077 |  0.96823  | 1.38185  |        1.348848        |
|        res2net50_14w_8s         | 128 | 0.999756 |  0.71351  | 1.373582 |        1.354588        |
|          mixer_b16_224          | 128 | 0.999596 | 0.968242  | 1.327419 |        1.317113        |
|            repvgg_a2            | 128 | 0.999542 | 0.789153  | 1.263607 |        1.251531        |
|             dpn107              | 32  | 1.00005  | 0.736984  | 1.25536  |        1.231054        |
|           tf_mixnet_l           | 128 | 0.999972 | 0.834594  | 1.242394 |        1.227238        |
|            mixnet_l             | 128 | 0.999308 | 0.828089  | 1.237938 |        1.218866        |
|            pit_b_224            | 64  | 0.999462 | 0.910021  | 1.227257 |        1.207004        |
|          jx_nest_base           | 32  | 0.999827 | 0.664991  | 1.220553 |        1.19083         |
|         visformer_small         | 128 | 0.999601 |  0.94578  | 1.188301 |        1.136696        |
|            gernet_l             | 128 | 0.999329 | 0.824337  | 1.164958 |        1.153659        |
|          resmlp_12_224          | 128 | 1.000008 | 0.761694  | 1.152558 |        1.139756        |
| deit_base_distilled_patch16_224 | 64  | 0.99969  | 0.956208  | 1.141128 |        1.125946        |
|      vit_base_patch16_224       | 64  | 0.999596 | 0.960421  | 1.129073 |        1.116839        |
|      beit_base_patch16_224      | 64  | 0.999567 |  0.8994   | 1.126642 |        1.118745        |
|     swsl_resnext101_32x16d      | 32  | 0.999512 | 0.820644  | 1.088081 |        1.025927        |
|        res2net101_26w_4s        | 64  | 0.999153 | 0.564931  | 1.046018 |        1.084082        |
|          pnasnet5large          | 16  | 0.998644 | 0.736695  | 1.042094 |        1.119958        |
|        convmixer_768_32         | 32  | 0.999638 | 0.947029  | 0.998149 |        0.995412        |
|            hrnet_w18            | 128 | 0.968426 | 0.705496  | 0.972432 |        1.036891        |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Accuracy

+---------------------------------+----+---------------+---------------+-------------+------------------------+
|              name               | bs |     eager     |   aot_eager   |  inductor   | inductor_no_cudagraphs |
+---------------------------------+----+---------------+---------------+-------------+------------------------+
|           SelecSls42b           | 8  |     pass      |     pass      |    pass     |          pass          |
|        adv_inception_v3         | 8  |     pass      |     pass      |    pass     |          pass          |
|           mobilevit_s           | 8  |     pass      |     pass      |    pass     |          pass          |
|            nfnet_l0             | 8  |     pass      |     pass      |    pass     |          pass          |
|            pit_b_224            | 8  |     pass      |     pass      |    pass     |          pass          |
|          pnasnet5large          | 8  |     pass      |     pass      |    pass     |          pass          |
|         poolformer_m36          | 8  |     pass      |     pass      |    pass     |          pass          |
|           regnety_002           | 8  |     pass      |     pass      |    pass     |          pass          |
|            repvgg_a2            | 8  |     pass      |     pass      |    pass     |          pass          |
|        res2net101_26w_4s        | 8  |     pass      |     pass      |    pass     |          pass          |
|        res2net50_14w_8s         | 8  |     pass      |     pass      |    pass     |          pass          |
|           res2next50            | 8  |     pass      |     pass      |    pass     |          pass          |
|          resmlp_12_224          | 8  |     pass      |     pass      |    pass     |          pass          |
|           resnest101e           | 8  |     pass      |     pass      |    pass     |          pass          |
|           rexnet_100            | 8  |     pass      |     pass      |    pass     |          pass          |
|        sebotnet33ts_256         | 8  |     pass      |     pass      |    pass     |          pass          |
|          spnasnet_100           | 8  |     pass      |     pass      |    pass     |          pass          |
|  swin_base_patch4_window7_224   | 8  |     pass      |     pass      |    pass     |          pass          |
|     swsl_resnext101_32x16d      | 8  |     pass      |     pass      |    pass     |          pass          |
|       tf_efficientnet_b0        | 8  |     pass      |     pass      |    pass     |          pass          |
|           tf_mixnet_l           | 8  |     pass      |     pass      |    pass     |          pass          |
|            tinynet_a            | 8  |     pass      |     pass      |    pass     |          pass          |
|        tnt_s_patch16_224        | 8  |     pass      |     pass      |    pass     |          pass          |
|        twins_pcpvt_base         | 8  |     pass      |     pass      |    pass     |          pass          |
|         visformer_small         | 8  |     pass      |     pass      |    pass     |          pass          |
|      vit_base_patch16_224       | 8  |     pass      |     pass      |    pass     |          pass          |
|           volo_d1_224           | 8  |     pass      |     pass      |    pass     |          pass          |
|      xcit_large_24_p8_224       | 8  |     pass      |     pass      |    pass     |          pass          |
|            lcnet_050            | 8  |     pass      | fail_accuracy |    pass     |          pass          |
|      mobilenetv3_large_100      | 8  |     pass      |     pass      |    pass     |          pass          |
|         mobilenetv2_100         | 8  |     pass      |     pass      |    pass     |          pass          |
|           mnasnet_100           | 8  |     pass      |     pass      |    pass     |          pass          |
|            mixnet_l             | 8  |     pass      |     pass      |    pass     |          pass          |
|      beit_base_patch16_224      | 8  |     pass      |     pass      |    pass     |          pass          |
|          botnet26t_256          | 8  |     pass      |     pass      |    pass     |          pass          |
|         coat_lite_mini          | 8  |     pass      |     pass      |    pass     |          pass          |
|        convmixer_768_32         | 8  |     pass      |     pass      |    pass     |          pass          |
|          convnext_base          | 8  |     pass      |     pass      |    pass     |          pass          |
|         crossvit_9_240          | 8  |     pass      |     pass      |    pass     |          pass          |
|          cspdarknet53           | 8  |     pass      |     pass      |    pass     |          pass          |
| deit_base_distilled_patch16_224 | 8  |     pass      |     pass      |    pass     |          pass          |
|             dla102              | 8  |     pass      |     pass      |    pass     |          pass          |
|           dm_nfnet_f0           | 8  |     pass      |     pass      |    pass     |          pass          |
|             dpn107              | 8  |     pass      |     pass      |    pass     |          pass          |
|       eca_botnext26ts_256       | 8  |     pass      |     pass      |    pass     |          pass          |
|        eca_halonext26ts         | 8  |     pass      |     pass      |    pass     |          pass          |
|        ese_vovnet19b_dw         | 8  |     pass      |     pass      |    pass     |          pass          |
|           fbnetc_100            | 8  |     pass      |     pass      |    pass     |          pass          |
|            fbnetv3_b            | 8  |     pass      |     pass      |    pass     |          pass          |
|            gernet_l             | 8  |     pass      |     pass      |    pass     |          pass          |
|          ghostnet_100           | 8  |     pass      |     pass      |    pass     |          pass          |
|       gluon_inception_v3        | 8  |     pass      |     pass      |    pass     |          pass          |
|          gmixer_24_224          | 8  |     pass      |     pass      |    pass     |          pass          |
|          gmlp_s16_224           | 8  |     pass      |     pass      |    pass     |          pass          |
|            hrnet_w18            | 8  |     pass      |     pass      |    pass     |          pass          |
|          inception_v3           | 8  |     pass      |     pass      |    pass     |          pass          |
|          jx_nest_base           | 8  |     pass      |     pass      |    pass     |          pass          |
|            levit_128            | 8  |     pass      |     pass      |    pass     |          pass          |
|          mixer_b16_224          | 8  |     pass      |     pass      |    pass     |          pass          |
|           convit_base           | 8  | fail_accuracy |     pass      | fail_to_run |      fail_to_run       |
|          cait_m36_384           | 8  |     pass      |     pass      |     OOM     |          pass          |
+---------------------------------+----+---------------+---------------+-------------+------------------------+

Compilation latency (sec)

+---------------------------------+-----+-----------+-----------+------------+------------------------+
|              name               | bs  |   eager   | aot_eager |  inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+-----------+-----------+------------+------------------------+
|            hrnet_w18            | 128 | 11.526093 | 34.704092 | 215.300548 |       235.439188       |
|           rexnet_100            | 128 | 4.709586  | 9.921003  | 198.835597 |       287.845509       |
|          ghostnet_100           | 128 |  4.28569  | 11.676922 | 182.636547 |       232.169208       |
|          pnasnet5large          | 16  |  9.84799  | 27.578502 | 147.549783 |       156.780616       |
|           resnest101e           | 64  | 9.160534  | 21.012353 | 138.563704 |       158.368072       |
|           mobilevit_s           | 64  | 4.339656  | 9.846744  | 135.482809 |       157.381025       |
|            fbnetv3_b            | 128 | 6.780701  | 14.103187 | 131.883844 |       162.084463       |
|       gluon_inception_v3        | 128 | 6.771645  | 13.31762  | 131.07415  |       158.07422        |
|           tf_mixnet_l           | 128 | 7.631285  | 15.099839 | 128.487929 |       147.30566        |
|        res2net101_26w_4s        | 64  | 6.925115  | 22.037917 | 128.054408 |       138.097871       |
|        adv_inception_v3         | 128 | 6.916839  | 13.522195 | 126.647641 |       150.583826       |
|          inception_v3           | 128 | 6.908191  | 13.404646 | 126.359993 |       154.997186       |
|            tinynet_a            | 128 | 4.562391  | 10.756223 | 123.713257 |       143.520164       |
|            mixnet_l             | 128 | 8.114457  | 14.762465 | 122.645935 |       147.809105       |
|       tf_efficientnet_b0        | 128 | 3.876222  |  8.95223  | 119.095017 |       135.472967       |
|      mobilenetv3_large_100      | 128 | 3.268903  | 7.370348  | 116.468401 |       141.481082       |
|      xcit_large_24_p8_224       |  5  | 9.060889  | 23.917738 | 114.981726 |       117.396615       |
|        res2net50_14w_8s         | 128 | 6.656659  | 20.358322 | 110.228272 |       114.338428       |
|            levit_128            | 128 | 4.723784  | 11.611245 | 105.940064 |       118.947059       |
|           fbnetc_100            | 128 | 4.740683  | 8.405713  | 105.643603 |       134.635195       |
|          cait_m36_384           |  2  | 10.683412 | 26.892269 | 104.168617 |       101.645813       |
|          spnasnet_100           | 128 | 3.801504  |  8.38845  | 104.156516 |       131.219595       |
|  swin_base_patch4_window7_224   | 64  | 6.711992  | 16.408464 | 102.961338 |       103.763249       |
|        twins_pcpvt_base         | 64  | 8.097481  | 18.079822 | 101.116118 |       105.321891       |
|        eca_halonext26ts         | 128 | 2.803947  | 6.547626  | 95.464727  |       106.42396        |
|         poolformer_m36          | 64  | 7.666118  | 13.079596 | 93.365106  |       100.51483        |
|         mobilenetv2_100         | 128 | 3.233801  | 7.084594  | 88.824387  |       103.558762       |
|             dpn107              | 32  | 9.221649  | 18.13529  |  87.67872  |        90.00538        |
|        sebotnet33ts_256         | 64  | 3.449897  | 7.858807  |  84.98418  |       96.118227        |
|           regnety_002           | 128 | 4.030516  |  8.01265  | 84.177373  |       87.946725        |
|          cspdarknet53           | 64  | 5.331471  | 10.261594 | 82.409863  |       93.775798        |
|          jx_nest_base           | 32  | 6.090223  | 13.536968 | 81.942001  |       80.856023        |
|             dla102              | 128 | 4.956245  | 13.010602 | 81.783964  |       90.879032        |
|           mnasnet_100           | 128 | 3.710201  | 6.811075  | 79.663665  |       100.111465       |
|         coat_lite_mini          | 128 | 2.811215  | 7.210743  | 79.427025  |        82.83736        |
|       eca_botnext26ts_256       | 128 | 2.733343  | 6.420717  |  76.05704  |       90.096494        |
|            lcnet_050            | 128 | 2.341389  | 4.422994  | 73.907701  |       96.204906        |
|         crossvit_9_240          | 128 | 4.295831  | 11.204699 | 73.348105  |       74.572179        |
|           res2next50            | 128 | 4.780722  | 10.478523 | 73.289263  |       82.035364        |
|          botnet26t_256          | 128 | 2.510238  | 5.418026  | 71.797055  |       86.853273        |
|           volo_d1_224           | 64  |  3.63585  | 10.17847  | 68.454106  |       69.542334        |
|            nfnet_l0             | 128 | 4.139216  | 9.284106  | 65.549119  |       71.427918        |
|           dm_nfnet_f0           | 128 | 4.891267  | 9.810936  | 65.227689  |        69.27176        |
|        tnt_s_patch16_224        | 128 | 4.937536  | 13.830504 | 65.109728  |       65.763122        |
|            gernet_l             | 128 | 4.505373  | 8.082768  | 64.167078  |       74.439501        |
|           SelecSls42b           | 128 | 1.928453  | 4.732315  | 61.956307  |       85.059221        |
|        ese_vovnet19b_dw         | 128 |  2.16477  | 4.153617  |  59.97885  |       74.747155        |
|     swsl_resnext101_32x16d      | 32  | 4.890121  | 12.505957 | 59.081047  |       59.323381        |
|         visformer_small         | 128 | 2.055817  | 5.208936  | 56.404626  |       60.933241        |
|          convnext_base          | 64  | 5.300881  | 11.968768 | 54.463518  |       55.232698        |
|          gmlp_s16_224           | 128 | 4.926444  | 10.300078 | 54.267164  |       53.275909        |
|          gmixer_24_224          | 128 | 4.541943  | 12.086148 | 48.574792  |       47.430169        |
|            repvgg_a2            | 128 | 4.114284  | 7.906238  | 48.392955  |       56.701221        |
|           convit_base           | 64  | 2.941901  | 7.936277  | 43.926032  |       42.370222        |
|          resmlp_12_224          | 128 | 2.033362  | 4.268435  | 36.677943  |       36.590531        |
|      beit_base_patch16_224      | 64  | 3.389528  | 7.645599  |  36.5096   |        33.7776         |
|        convmixer_768_32         | 32  | 1.667835  | 6.788494  |  35.26413  |       34.369227        |
|            pit_b_224            | 64  | 2.808171  | 6.189776  | 35.009204  |       34.056854        |
|          mixer_b16_224          | 128 | 3.228623  | 4.958011  | 33.479241  |       29.756883        |
|      vit_base_patch16_224       | 64  | 2.341231  | 5.392029  |  33.37032  |       32.161311        |
| deit_base_distilled_patch16_224 | 64  | 2.321951  | 5.310861  | 32.587011  |       32.188822        |
+---------------------------------+-----+-----------+-----------+------------+------------------------+

Peak Memory Compression Ratio

+---------------------------------+-----+----------+-----------+----------+------------------------+
|              name               | bs  |  eager   | aot_eager | inductor | inductor_no_cudagraphs |
+---------------------------------+-----+----------+-----------+----------+------------------------+
|          pnasnet5large          | 16  | 1.058353 | 1.058169  | 1.288607 |        1.284813        |
|          gmlp_s16_224           | 128 | 1.00147  | 1.001336  | 1.205843 |        1.204895        |
|         poolformer_m36          | 64  | 1.003335 | 1.003329  | 1.195425 |        1.192848        |
|          gmixer_24_224          | 128 | 1.001673 | 1.001863  | 1.160637 |        1.159565        |
|           convit_base           | 64  | 0.999863 | 1.001607  | 1.158278 |        1.157029        |
|         mobilenetv2_100         | 128 | 1.000388 | 0.951814  | 1.121887 |        1.118241        |
|        sebotnet33ts_256         | 64  | 1.000271 | 1.000207  | 1.112976 |        1.111256        |
|           resnest101e           | 64  |   1.0    | 1.103821  | 1.084709 |        1.08367         |
|          resmlp_12_224          | 128 | 0.999919 | 1.008706  | 1.076966 |        1.074078        |
|           dm_nfnet_f0           | 128 | 0.91384  | 0.989346  | 1.07696  |        1.072741        |
|       tf_efficientnet_b0        | 128 | 1.000043 | 0.956726  | 1.070345 |        1.067359        |
|           tf_mixnet_l           | 128 | 1.000146 | 0.982793  | 1.067061 |        1.064731        |
|            tinynet_a            | 128 | 0.999648 | 0.959928  | 1.064684 |        1.06071         |
|        twins_pcpvt_base         | 64  | 1.000222 | 1.000677  | 1.061434 |        1.061039        |
|        tnt_s_patch16_224        | 128 | 1.000049 | 1.005661  | 1.051242 |        1.050604        |
|           rexnet_100            | 128 | 0.999839 |  0.95797  | 1.04929  |        1.046467        |
|  swin_base_patch4_window7_224   | 64  | 1.000137 | 1.001841  | 1.047412 |        1.046694        |
|          convnext_base          | 64  | 1.005168 | 1.004929  | 1.038726 |        1.037884        |
|             dla102              | 128 | 0.975942 | 1.000439  | 1.026486 |        1.026963        |
|         coat_lite_mini          | 128 | 1.044455 | 1.045678  | 1.020978 |        1.020201        |
|         visformer_small         | 128 | 1.000821 | 1.000654  | 1.02056  |        1.019496        |
|        adv_inception_v3         | 128 | 1.000748 | 1.000689  | 1.019977 |        1.01808         |
|       gluon_inception_v3        | 128 | 1.000748 | 1.000689  | 1.019977 |        1.01808         |
|          inception_v3           | 128 | 1.000748 | 1.000689  | 1.019977 |        1.01808         |
|          cspdarknet53           | 64  |   1.0    |  0.99992  | 1.017356 |        1.014332        |
|       eca_botnext26ts_256       | 128 | 0.999972 | 0.977713  | 1.005351 |        1.004339        |
|          ghostnet_100           | 128 | 0.998638 | 0.997661  | 1.00505  |        1.00194         |
|        eca_halonext26ts         | 128 | 0.999902 | 0.977751  | 1.001027 |        0.999981        |
|             dpn107              | 32  | 1.000838 | 1.001744  | 0.998565 |        0.999818        |
|          mixer_b16_224          | 128 | 0.999945 |  1.00009  | 0.995661 |        0.994736        |
|            hrnet_w18            | 128 | 1.000189 |  1.00005  | 0.992477 |        0.989726        |
|           mobilevit_s           | 64  | 1.000096 |  0.9616   | 0.989891 |        0.98861         |
|            mixnet_l             | 128 | 1.000338 | 0.980962  | 0.989015 |        0.986977        |
|      beit_base_patch16_224      | 64  | 0.999671 | 1.003491  | 0.988595 |        0.987106        |
|        convmixer_768_32         | 32  |   1.0    | 0.999873  | 0.987411 |        0.986378        |
|          cait_m36_384           |  2  | 1.000008 | 0.999611  |  0.9837  |        0.97734         |
|     swsl_resnext101_32x16d      | 32  | 1.000515 |  1.00025  | 0.979647 |        0.978856        |
|      xcit_large_24_p8_224       |  5  | 0.998685 | 0.998616  | 0.977595 |        0.973401        |
|          botnet26t_256          | 128 | 1.000038 | 0.999927  | 0.975558 |        0.974401        |
|        ese_vovnet19b_dw         | 128 | 1.000829 | 1.000484  | 0.975309 |        0.973402        |
|            gernet_l             | 128 | 1.000219 | 0.999742  | 0.973937 |        0.970591        |
|           volo_d1_224           | 64  | 1.001172 | 1.002349  | 0.973156 |        0.973038        |
|            nfnet_l0             | 128 | 1.000313 | 0.980272  | 0.973054 |        0.969296        |
|            fbnetv3_b            | 128 | 1.000086 | 0.972896  | 0.972444 |        0.969908        |
|           SelecSls42b           | 128 | 1.001155 | 1.000911  | 0.971568 |        0.967975        |
|        res2net101_26w_4s        | 64  | 1.00123  | 1.001229  | 0.967101 |        0.962937        |
|            repvgg_a2            | 128 | 1.000425 | 1.000528  | 0.965269 |        0.960691        |
|        res2net50_14w_8s         | 128 | 1.000212 | 1.000114  |  0.9641  |        0.961767        |
|           fbnetc_100            | 128 | 0.999813 | 1.000333  | 0.958404 |        0.953838        |
|           res2next50            | 128 | 1.000587 | 1.001572  | 0.957722 |        0.955826        |
|          spnasnet_100           | 128 |   1.0    | 1.001177  | 0.951742 |        0.946511        |
|           mnasnet_100           | 128 |   1.0    | 1.000843  | 0.946617 |        0.94138         |
|      mobilenetv3_large_100      | 128 |   1.0    | 0.993162  | 0.940769 |        0.93762         |
|      vit_base_patch16_224       | 64  | 0.999946 | 1.015251  | 0.938716 |        0.937614        |
| deit_base_distilled_patch16_224 | 64  | 0.999166 | 1.010207  | 0.937285 |        0.935767        |
|            pit_b_224            | 64  | 0.999855 | 1.003211  | 0.932635 |        0.931235        |
|            levit_128            | 128 | 1.002678 | 1.002644  | 0.905741 |        0.902892        |
|         crossvit_9_240          | 128 | 0.999212 | 1.000282  | 0.871764 |        0.870197        |
|           regnety_002           | 128 |   1.0    |  0.99901  | 0.866936 |        0.862637        |
|            lcnet_050            | 128 | 1.000406 | 0.967801  | 0.843427 |        0.838246        |
|          jx_nest_base           | 32  | 0.999875 | 1.000032  | 0.733958 |        0.732922        |
+---------------------------------+-----+----------+-----------+----------+------------------------+

Absolute latency (ms)

+---------------------------------+-----+------------+------------+------------+------------------------+
|              name               | bs  |   eager    | aot_eager  |  inductor  | inductor_no_cudagraphs |
+---------------------------------+-----+------------+------------+------------+------------------------+
|            hrnet_w18            | 128 | 446.878638 | 611.969061 | 443.782737 |       415.895688       |
|        convmixer_768_32         | 32  | 294.645387 | 311.184605 | 294.741448 |       295.729404       |
|          pnasnet5large          | 16  | 232.732296 | 316.778479 | 222.770819 |       208.410549       |
|           tf_mixnet_l           | 128 | 196.872753 | 235.871277 | 158.46994  |       160.293271       |
|            mixnet_l             | 128 | 188.98348  | 227.929676 | 152.45178  |       154.710412       |
|        tnt_s_patch16_224        | 128 | 363.341809 | 370.530738 | 143.092395 |       145.923881       |
|           resnest101e           | 64  | 169.119573 | 228.064175 | 117.528888 |       126.475063       |
|             dla102              | 128 | 182.030465 | 221.841947 | 116.864859 |       118.460653       |
|           convit_base           | 64  | 181.540373 | 186.158534 | 116.059079 |       118.158031       |
|      beit_base_patch16_224      | 64  | 124.679585 | 138.527407 | 110.360664 |       111.161983       |
|        res2net50_14w_8s         | 128 | 150.324309 | 210.688859 | 109.422432 |       111.274625       |
|     swsl_resnext101_32x16d      | 32  | 118.458758 | 144.639785 | 108.966196 |       116.039383       |
|        adv_inception_v3         | 128 | 165.980914 | 193.918667 | 106.649283 |       108.680916       |
|          inception_v3           | 128 | 165.997887 | 192.462692 | 106.587813 |       108.714911       |
|       gluon_inception_v3        | 128 | 165.857755 | 192.871784 | 106.553094 |       108.537034       |
|         poolformer_m36          | 64  | 145.918709 | 150.027273 | 104.905518 |       107.808836       |
|        res2net101_26w_4s        | 64  | 102.486962 | 181.458481 | 97.619046  |       94.436208        |
|         visformer_small         | 128 | 110.407552 | 116.551611 |  92.84181  |       97.240383        |
|           res2next50            | 128 | 127.266728 | 156.846056 | 91.100126  |       92.625477        |
|          mixer_b16_224          | 128 | 116.580148 | 120.330186 | 87.796646  |       88.465373        |
|             dpn107              | 32  | 110.562657 | 149.513341 | 87.788073  |       89.607133        |
|  swin_base_patch4_window7_224   | 64  | 142.119912 | 192.836912 | 87.491142  |       89.951344        |
|          jx_nest_base           | 32  | 104.265713 | 156.002293 | 85.117311  |       87.159175        |
|           volo_d1_224           | 64  | 134.242745 | 146.037595 | 83.782373  |       85.412843        |
|        eca_halonext26ts         | 128 | 114.145529 | 152.799788 | 82.533784  |       82.990996        |
|            fbnetv3_b            | 128 | 119.365616 | 161.694348 | 80.882374  |       81.984004        |
|          convnext_base          | 64  | 122.041312 | 133.30372  | 79.309284  |        80.94466        |
|          gmlp_s16_224           | 128 | 136.750852 | 140.324328 | 78.414999  |       79.949312        |
|           dm_nfnet_f0           | 128 | 120.83973  | 124.176647 |  75.11734  |       79.402598        |
|       eca_botnext26ts_256       | 128 | 110.273459 | 148.783847 | 73.941916  |       74.422772        |
|          botnet26t_256          | 128 | 106.74703  | 122.109975 | 72.835181  |        73.21282        |
|          cait_m36_384           |  2  | 169.039562 | 302.431277 | 68.975127  |       83.133636        |
|          gmixer_24_224          | 128 | 117.516488 | 144.904971 | 68.639002  |       69.842059        |
|            nfnet_l0             | 128 | 104.999651 | 134.500987 | 66.132067  |       70.338611        |
|            gernet_l             | 128 | 76.509915  | 92.917121  | 65.809415  |       66.360813        |
|          cspdarknet53           | 64  | 93.222172  | 112.320786 | 64.998876  |       65.869118        |
|            pit_b_224            | 64  | 78.724206  | 86.501541  | 64.041408  |       65.063404        |
|           rexnet_100            | 128 | 86.483862  | 121.371797 | 60.386357  |       61.641991        |
|      vit_base_patch16_224       | 64  | 68.160133  | 70.768295  | 60.357166  |        61.00665        |
| deit_base_distilled_patch16_224 | 64  | 68.712661  | 71.701813  | 60.125983  |       60.883998        |
|            repvgg_a2            | 128 | 75.521186  | 95.544417  | 59.744078  |       60.204655        |
|      xcit_large_24_p8_224       |  5  | 125.378018 | 257.084322 |  59.59185  |       79.512499        |
|         coat_lite_mini          | 128 | 112.387396 | 124.341592 | 59.210266  |       60.389376        |
|           mobilevit_s           | 64  | 85.465966  | 133.228319 | 57.256435  |       62.152627        |
|       tf_efficientnet_b0        | 128 | 85.684352  | 122.393691 | 56.514047  |       57.474226        |
|        twins_pcpvt_base         | 64  | 106.863494 | 189.787044 | 55.294221  |       67.516871        |
|           fbnetc_100            | 128 | 83.424177  | 107.685281 |  53.60418  |       54.139423        |
|            tinynet_a            | 128 | 72.048109  | 115.805231 | 51.070798  |       58.165759        |
|        sebotnet33ts_256         | 64  | 81.416487  | 104.254351 | 50.799102  |       51.710679        |
|          ghostnet_100           | 128 | 96.153795  | 146.156668 | 48.912041  |       55.489097        |
|          spnasnet_100           | 128 |  73.10982  | 93.504571  | 46.414704  |       47.265541        |
|          resmlp_12_224          | 128 | 52.917786  |  69.49245  | 45.985379  |       46.419233        |
|        ese_vovnet19b_dw         | 128 | 63.288247  | 73.521745  | 44.024072  |       44.288846        |
|           SelecSls42b           | 128 | 62.412993  | 77.107054  | 42.637271  |        43.54646        |
|           mnasnet_100           | 128 |  67.29454  | 85.625227  | 41.141425  |       41.438283        |
|         mobilenetv2_100         | 128 | 64.984806  | 84.671564  | 40.120431  |       40.339911        |
|         crossvit_9_240          | 128 |  64.26486  | 113.953064 | 39.703625  |       48.413579        |
|      mobilenetv3_large_100      | 128 | 63.010654  |   83.221   | 38.300878  |       39.772848        |
|            levit_128            | 128 | 53.758502  | 113.336888 | 25.693503  |       36.250379        |
|           regnety_002           | 128 | 40.779376  | 70.479183  | 23.091201  |       30.022773        |
|            lcnet_050            | 128 | 31.582245  | 45.758263  | 15.384557  |       18.366872        |
+---------------------------------+-----+------------+------------+------------+------------------------+

Performance graphs

/data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/timm_models_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/torchbench_amp.png

/data/home/williamwen/cluster/oneoff_cron_logs/day_144_24_05_23_performance_amp_246/huggingface_amp.png

Build Summary

Run name

day_144_24_05_23_performance_amp_246

Commit hashes

pytorch commit: a370bca9a97aaf3c1bc36adbc2d68428fde8e74c
pytorch commit date: 2023-05-18 22:52:38+00:00
torchbench commit: 3f2a2a1583f5ec480e4882f632445807d1c4d487
torchbench commit date: 2023-05-24 10:26:51-07:00

TorchDynamo config flags

Torch version

torch: 2.1.0a0+git956bd03

Environment variables

TORCH_CUDA_ARCH_LIST = 8.0
CUDA_HOME = /usr/local/cuda-11.6
USE_LLVM = /usr/lib/llvm-10

GPU details

CUDNN VERSION: 8401
Number CUDA Devices: 1
Device Name: NVIDIA A100-SXM4-40GB
Device Memory [GB]: 42.481549312
