Intel® Neural Compressor provides validated examples for multiple compression techniques. Links to typical examples can be found in the example tables, and the performance/accuracy results are available here.
- Validated Quantization Examples
  - 1.1. TensorFlow Models with Intel TensorFlow 2.13.0
  - 1.2. PyTorch Models with Torch 2.0.1+cpu in PTQ Mode
  - 1.3. PyTorch Models with Torch 2.0.1+cpu in QAT Mode
  - 1.4. PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu
  - 1.5. PyTorch Models with Torch 2.0.1+cpu in WOQ Mode
- Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime
System summary: Test by Intel on 09/01/2023. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0,
CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models are benchmarked using 1 socket, 4 cores/instance, 14 instances, and batch size 1.
Some large models are measured using 1 socket, 56 cores/instance, 1 instance, and batch size 1.
Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
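The table headers abbreviate these two benchmark settings as `1s4c14ins1bs` and `1s56c1ins1bs` (sockets, cores per instance, instances, batch size). As a minimal sketch, the hypothetical helper `parse_layout` below (an illustration, not part of Intel Neural Compressor) decodes such a tag and checks that the instances fit on one 56-core socket from the system summary:

```python
import re
from dataclasses import dataclass

CORES_PER_SOCKET = 56  # taken from the system summary (56 cores/socket)


@dataclass(frozen=True)
class Layout:
    sockets: int
    cores_per_instance: int
    instances: int
    batch_size: int

    @property
    def cores_used(self) -> int:
        # Total cores occupied by all benchmark instances.
        return self.cores_per_instance * self.instances


def parse_layout(tag: str) -> Layout:
    """Decode a tag such as '1s4c14ins1bs' (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)s(\d+)c(\d+)ins(\d+)bs", tag)
    if match is None:
        raise ValueError(f"unrecognized layout tag: {tag!r}")
    return Layout(*(int(group) for group in match.groups()))


for tag in ("1s4c14ins1bs", "1s56c1ins1bs"):
    layout = parse_layout(tag)
    # Both settings should occupy no more cores than the sockets provide.
    assert layout.cores_used <= CORES_PER_SOCKET * layout.sockets
    print(tag, "->", layout.cores_used, "cores in use")
```

Both settings occupy all 56 cores of the socket; they differ only in whether the cores are split across 14 small instances or given to a single instance.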
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 v1.0 | pb | 74.12% | 74.27% | -0.21% | 2914.42 | 621.91 | 4.69x |
ResNet50 v1.5 | pb | 76.23% | 76.46% | -0.31% | 2160.07 | 545.47 | 3.96x |
ResNet101 | pb | 77.50% | 76.45% | 1.37% | 1508.97 | 428.02 | 3.53x |
Inception V1 | pb | 70.44% | 69.74% | 1.01% | 3290.75 | 1229.78 | 2.68x |
Inception V2 | pb | 74.38% | 73.97% | 0.57% | 2404.57 | 1048.49 | 2.29x |
Inception V3 | pb | 76.71% | 76.75% | -0.05% | 1669.09 | 500.95 | 3.33x |
Inception V4 | pb | 80.18% | 80.27% | -0.11% | 1073.14 | 245.13 | 4.38x |
Inception ResNet V2 | pb | 80.34% | 80.40% | -0.07% | 374.52 | 172.06 | 2.18x |
MobileNet V1 | pb | 71.78% | 70.96% | 1.16% | 5478.88 | 1756.33 | 3.12x |
MobileNet V2 | pb | 72.52% | 71.76% | 1.07% | 4133.01 | 1748.06 | 2.36x |
VGG16 | pb | 72.64% | 70.89% | 2.47% | 1534.50 | 236.62 | 6.49x |
VGG19 | pb | 72.69% | 71.01% | 2.37% | 1377.40 | 197.77 | 6.96x |
ResNetV2 50 | pb | 70.39% | 69.64% | 1.07% | 1125.32 | 656.38 | 1.71x |
ResNetV2 101 | pb | 72.62% | 71.87% | 1.04% | 709.50 | 367.00 | 1.93x |
ResNetV2 152 | pb | 73.11% | 72.37% | 1.03% | 497.24 | 265.34 | 1.87x |
Densenet 121 | pb | 73.59% | 72.89% | 0.97% | 557.67 | 456.61 | 1.22x |
Densenet 161 | pb | 76.35% | 76.29% | 0.08% | 353.18 | 235.35 | 1.50x |
Densenet 169 | pb | 74.34% | 74.65% | -0.41% | 435.44 | 385.73 | 1.13x |
EfficientNet B0 | ckpt | 76.15% | 76.76% | -0.79% | 786.55 | 723.69 | 1.09x |
SSD ResNet50 V1 | pb | 37.88% | 38.00% | -0.31% | 130.09 | 30.78 | 4.23x |
SSD MobileNet V1 | pb | 22.98% | 23.13% | -0.64% | 1291.02 | 683.50 | 1.89x |
SSD ResNet50 v1 | ckpt | 37.89% | 38.00% | -0.30% | 127.30 | 27.63 | 4.61x |
SSD MobileNet v1 | ckpt | 22.96% | 23.13% | -0.72% | 1295.23 | 453.76 | 2.85x |
SSD ResNet34 | pb | 21.70% | 22.09% | -1.76% | 242.91 | 14.03 | 17.31x |
Faster R-CNN Inception ResNet V2 | pb | 37.47% | 38.31% | -2.18% | 5.44 | 3.02 | 1.80x |
Faster R-CNN Inception ResNet V2 | SavedModel | 37.79% | 38.31% | -1.34% | 5.43 | 3.00 | 1.81x |
Faster R-CNN ResNet101 | pb | 30.32% | 30.39% | -0.23% | 166.37 | 23.54 | 7.07x |
Faster R-CNN ResNet101 | SavedModel | 30.33% | 30.39% | -0.20% | 151.54 | 18.58 | 8.16x |
Faster R-CNN ResNet50 | pb | 26.64% | 26.59% | 0.21% | 173.33 | 28.58 | 6.07x |
YOLOv3 | pb | 82.13% | 82.35% | -0.28% | 230.69 | 88.35 | 2.61x |
BERT large SQuAD | pb | 92.36 | 92.99 | -0.67% | 59.76 | 17.71 | 3.37x |
BERT large SQuAD (ONNX Model Zoo) | pb | 92.26 | 92.98 | -0.78% | 41.65 | 16.14 | 2.58x |
BERT base MRPC | ckpt | 87.01% | 86.52% | 0.57% | 416.57 | 177.06 | 2.35x |
Transformer LT | pb | 25.68 | 25.86 | -0.67% | 41.19 | 21.94 | 1.88x |
Transformer lt MLPerf | pb | 27.27 | 27.17 | 0.39% | 9.77 | 4.51 | 2.17x |
Wide Deep large DS | pb | 77.75% | 77.67% | 0.10% | 75552.26 | 50803.82 | 1.49x |
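The two ratio columns follow directly from the formulas in the header. The sketch below recomputes them for the ResNet101 row above; the function names are illustrative. Ratios recomputed from the rounded values shown in the table can differ from the published ones in the last digit for some rows, presumably because the published figures were derived from unrounded measurements.

```python
def accuracy_ratio(int8: float, fp32: float) -> float:
    """Relative accuracy change, (INT8 - FP32) / FP32."""
    return (int8 - fp32) / fp32


def performance_ratio(int8: float, fp32: float) -> float:
    """Throughput speedup, INT8 / FP32."""
    return int8 / fp32


# ResNet101 row: accuracy in percent, throughput in samples/sec.
acc = accuracy_ratio(77.50, 76.45)
perf = performance_ratio(1508.97, 428.02)
print(f"{acc:+.2%}  {perf:.2f}x")  # prints "+1.37%  3.53x"
```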
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
Mask R-CNN Inception V2 | pb | 28.60% | 28.73% | -0.44% | 41.96 | 25.66 | 1.64x |
Mask R-CNN Inception V2 | ckpt | 28.60% | 28.73% | -0.44% | 41.56 | 24.35 | 1.71x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.61% | 69.76% | -0.22% | 1673.05 | 653.13 | 2.56x |
ResNet50 | static | 75.92% | 76.15% | -0.30% | 1170.62 | 329.70 | 3.55x |
Inception V3 | static | 69.47% | 69.52% | -0.07% | 977.08 | 335.55 | 2.91x |
ResNeSt50 | static | 80.80% | 81.04% | -0.30% | 404.51 | 40.04 | 10.10x |
ResNeXt101_32x8d | static | 78.94% | 79.31% | -0.46% | 562.16 | 109.77 | 5.12x |
Efficientnet_b0 | static | 76.89% | 77.67% | -1.01% | 696.79 | 667.27 | 1.04x |
Efficientnet_b3 | static | 77.82% | 78.54% | -0.93% | 508.85 | 397.32 | 1.28x |
Efficientnet_b7 | static | 73.55% | 73.92% | -0.50% | 234.87 | 149.65 | 1.57x |
Peleenet | static | 71.85% | 72.10% | -0.35% | 858.18 | 588.33 | 1.46x |
SE_ResNeXt50_32x4d | static | 79.03% | 79.08% | -0.07% | 739.61 | 283.60 | 2.61x |
YOLO V3 | static | 55.09% | 54.93% | 0.31% | 161.92 | 60.48 | 2.68x |
SSD ResNet34 | static | 19.52 | 19.63 | -0.58% | 141.26 | 11.78 | 11.99x |
Roberta base MRPC | static | 92.69% | 93.59% | -0.96% | 404.62 | 174.02 | 2.33x |
CamemBERT base MRPC | static | 88.93% | 89.28% | -0.39% | 395.08 | 171.78 | 2.30x |
DistilBERT base MRPC | static | 89.53% | 90.27% | -0.82% | 795.98 | 341.60 | 2.33x |
DistilBERT base MRPC | dynamic | 90.20% | 90.27% | -0.07% | 744.78 | 343.36 | 2.17x |
ALBERT base MRPC | static | 92.63% | 92.63% | 0.00% | 374.41 | 163.39 | 2.29x |
Funnel MRPC | static | 91.60% | 92.25% | -0.71% | 300.02 | 182.21 | 1.65x |
Xlm Roberta MRPC | static | 88.36% | 88.62% | -0.29% | 399.27 | 173.62 | 2.30x |
Xlm Roberta MRPC | dynamic | 88.24% | 88.24% | 0.00% | 385.00 | 174.37 | 2.21x |
BERT base MRPC | static | 89.63% | 90.42% | -0.87% | 407.79 | 173.24 | 2.35x |
BERT base COLA | static | 54.51% | 53.39% | 2.10% | 412.12 | 172.97 | 2.38x |
BERT base STSB | static | 87.55% | 88.05% | -0.57% | 413.19 | 173.17 | 2.39x |
BERT base SST-2 | static | 91.51% | 92.32% | -0.87% | 409.94 | 172.77 | 2.37x |
BERT large COLA | static | 62.84% | 63.35% | -0.80% | 141.90 | 51.55 | 2.75x |
BERT base RTE | static | 72.56% | 72.56% | 0.00% | 401.42 | 174.02 | 2.31x |
BERT large MRPC | static | 90.22% | 90.38% | -0.17% | 139.59 | 51.66 | 2.70x |
BERT large QNLI | static | 90.87% | 91.54% | -0.74% | 406.48 | 172.94 | 2.35x |
BERT large RTE | static | 73.29% | 74.01% | -0.98% | 141.92 | 51.41 | 2.76x |
BERT large RTE | dynamic | 71.48% | 74.01% | -3.41% | 128.46 | 51.61 | 2.49x |
BERT large SQuAD | static | 92.27 | 93.16 | -0.95% | 37.59 | 16.48 | 2.28x |
Reformer Crime and Punishment | static | 1.88 | 1.87 | 0.23% | 446.29 | 398.25 | 1.12x |
lvwerra/pegasus-samsum | static | 42.50 | 42.67 | -0.39% | 102.63 | 37.94 | 2.71x |
T5 Small | dynamic | 2.65 | 3.16 | -16.25% | 770.18 | 450.79 | 1.71x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
EleutherAI/gpt-j-6B | static | 3.36 | 2.34 | 43.85% | 0.88 | 0.28 | 3.14x |
openai/whisper-large | dynamic | 97.07% | 96.96% | 0.12% | 0.59 | 0.47 | 1.25x |
abeja/gpt-neox-japanese-2.7b | static | 4.30 | 3.52 | 22.06% | 1.04 | 0.55 | 1.90x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.74% | 69.76% | -0.03% | 1646.74 | 657.43 | 2.50x |
ResNet50 | static | 76.05% | 76.15% | -0.12% | 1098.80 | 322.34 | 3.41x |
ResNeXt101_32x8d | static | 79.28% | 79.31% | -0.04% | 568.02 | 109.50 | 5.19x |
MobileNet V2 | static | 69.73% | 71.84% | -2.93% | 1383.77 | 761.35 | 1.82x |
BERT base MRPC | static | 89.50% | 90.40% | -1.00% | 401.83 | 173.17 | 2.32x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet18 | static | 69.56% | 69.76% | -0.29% | 5701.04 | 1593.88 | 3.58x |
ResNet50 | static | 75.98% | 76.15% | -0.22% | 2090.03 | 685.29 | 3.05x |
ResNeXt101_32x16d_wsl | static | 84.04% | 84.17% | -0.15% | 556.86 | 79.42 | 7.01x |
SSD ResNet34 | static | 19.93% | 20.00% | -0.38% | 91.53 | 15.62 | 5.86x |
bert-large-uncased-whole-word-masking-finetuned-squad | static | 92.93 | 93.16 | -0.25% | 162.94 | 22.37 | 7.29x |
distilbert-base-uncased-distilled-squad | static | 86.09 | 86.84 | -0.86% | 558.66 | 151.25 | 3.69x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
EleutherAI/gpt-j-6B | static | 78.70% | 79.20% | -0.63% | 4.88 | 1.57 | 3.11x |
Model name | Configuration | Lambada_openai Accuracy | Hellaswag Accuracy | Winogrande Accuracy | Piqa Accuracy | Average Accuracy [Mean accuracy of previous four tasks] | Accuracy Ratio [INT4/FP32] | Wikitext Word_perplexity |
---|---|---|---|---|---|---|---|---|
EleutherAI/gpt-j-6b | FP32 | 0.6831 | 0.4954 | 0.6409 | 0.7541 | 0.6434 | / | 10.8816 |
EleutherAI/gpt-j-6b | GPTQ W4G128Asym | 0.679 | 0.4895 | 0.6433 | 0.7476 | 0.6399 | 0.9945 | 11.0999 |
EleutherAI/gpt-j-6b | GPTQ W4G32Asym | 0.6829 | 0.4923 | 0.6401 | 0.7486 | 0.6410 | 0.9963 | 11.0141 |
EleutherAI/gpt-j-6b | GPTQ W4G128Sym | 0.685 | 0.4907 | 0.6361 | 0.7443 | 0.6390 | 0.9932 | 11.1498 |
EleutherAI/gpt-j-6b | GPTQ W4G32Sym | 0.6911 | 0.4899 | 0.6448 | 0.7497 | 0.6439 | 1.0008 | 11.0927 |
facebook/opt-6.7b | FP32 | 0.6769 | 0.5049 | 0.6543 | 0.7628 | 0.6497 | / | 12.2862 |
facebook/opt-6.7b | GPTQ W4G32Asym | 0.6804 | 0.4984 | 0.6535 | 0.7568 | 0.6473 | 0.9962 | 12.4193 |
facebook/opt-6.7b | GPTQ W4G32Sym | 0.6885 | 0.4973 | 0.6433 | 0.753 | 0.6455 | 0.9935 | 12.4607 |
decapoda-research/llama-7b-hf | FP32 | 0.7361 | 0.5642 | 0.6709 | 0.7835 | 0.6887 | / | 9.4202 |
decapoda-research/llama-7b-hf | GPTQ W4G32Asym | 0.7244 | 0.5603 | 0.6614 | 0.7835 | 0.6824 | 0.9909 | 9.5881 |
decapoda-research/llama-13b-hf | FP32 | 0.7627 | 0.5911 | 0.7009 | 0.7878 | 0.7106 | / | 8.212 |
decapoda-research/llama-13b-hf | GPTQ W4G128Asym | 0.7518 | 0.5843 | 0.6961 | 0.7911 | 0.7058 | 0.9932 | 8.4319 |
decapoda-research/llama-13b-hf | GPTQ W4G32Asym | 0.7572 | 0.5898 | 0.7056 | 0.7894 | 0.7105 | 0.9998 | 8.3429 |
decapoda-research/llama-13b-hf | GPTQ W4G128Sym | 0.7596 | 0.5841 | 0.6977 | 0.7905 | 0.7080 | 0.9963 | 8.4916 |
decapoda-research/llama-30b-hf | FP32 | 0.7759 | 0.6266 | 0.7277 | 0.8096 | 0.7350 | / | 6.2384 |
decapoda-research/llama-30b-hf | GPTQ W4G128Asym | 0.778 | 0.624 | 0.7269 | 0.8047 | 0.7334 | 0.9979 | 6.4237 |
decapoda-research/llama-30b-hf | GPTQ W4G32Asym | 0.7706 | 0.6239 | 0.7285 | 0.8058 | 0.7322 | 0.9963 | 6.4697 |
decapoda-research/llama-30b-hf | GPTQ W4G128Sym | 0.7836 | 0.6195 | 0.7269 | 0.8047 | 0.7337 | 0.9983 | 6.5604 |
meta-llama/Llama-2-7b-chat-hf | FP32 | 0.7058 | 0.5732 | 0.648 | 0.7715 | 0.6746 | / | 11.7107 |
meta-llama/Llama-2-7b-chat-hf | GPTQ W4G128Asym | 0.6982 | 0.5637 | 0.6527 | 0.7704 | 0.6713 | 0.9950 | 11.9702 |
meta-llama/Llama-2-7b-chat-hf | GPTQ W4G32Asym | 0.6953 | 0.5682 | 0.6575 | 0.7758 | 0.6742 | 0.9994 | 11.9317 |
meta-llama/Llama-2-7b-hf | FP32 | 0.7392 | 0.567 | 0.6709 | 0.7835 | 0.6902 | / | 8.7911 |
meta-llama/Llama-2-7b-hf | GPTQ W4G32Asym | 0.7353 | 0.5642 | 0.6622 | 0.7829 | 0.6862 | 0.9942 | 8.9635 |
meta-llama/Llama-2-7b-hf | GPTQ W4G128Sym | 0.7246 | 0.5617 | 0.6756 | 0.7797 | 0.6854 | 0.9931 | 9.2799 |
meta-llama/Llama-2-13b-chat-hf | FP32 | 0.7312 | 0.6059 | 0.7103 | 0.7835 | 0.7077 | / | 10.2213 |
meta-llama/Llama-2-13b-chat-hf | GPTQ W4G128Asym | 0.7273 | 0.6018 | 0.7088 | 0.7742 | 0.7030 | 0.9934 | 2538.083 |
meta-llama/Llama-2-13b-chat-hf | GPTQ W4G32Asym | 0.7283 | 0.6053 | 0.7024 | 0.7764 | 0.7031 | 0.9935 | 1889.374 |
meta-llama/Llama-2-13b-chat-hf | GPTQ W4G128Sym | 0.727 | 0.5997 | 0.7024 | 0.778 | 0.7018 | 0.9916 | 2504.497 |
meta-llama/Llama-2-13b-hf | FP32 | 0.7677 | 0.5972 | 0.6961 | 0.7878 | 0.7122 | / | 7.8984 |
meta-llama/Llama-2-13b-hf | GPTQ W4G128Asym | 0.7627 | 0.5933 | 0.689 | 0.7851 | 0.7075 | 0.9934 | 1556.448 |
meta-llama/Llama-2-13b-hf | GPTQ W4G32Asym | 0.7675 | 0.5934 | 0.6977 | 0.7856 | 0.7111 | 0.9984 | 1514.927 |
meta-llama/Llama-2-13b-hf | GPTQ W4G128Sym | 0.7566 | 0.5899 | 0.7032 | 0.7856 | 0.7088 | 0.9953 | 1374.728 |
bigscience/bloom-7b1 | FP32 | 0.5764 | 0.4628 | 0.6456 | 0.7269 | 0.6029 | / | 30.6438 |
bigscience/bloom-7b1 | GPTQ W4G32Sym | 0.5799 | 0.4542 | 0.6361 | 0.7312 | 0.6004 | 0.9957 | 32.0626 |
bigscience/bloomz-7b1 | FP32 | 0.5593 | 0.4789 | 0.6527 | 0.7628 | 0.6134 | / | 51.7432 |
bigscience/bloomz-7b1 | GPTQ W4G32Asym | 0.5525 | 0.4731 | 0.6504 | 0.7617 | 0.6094 | 0.9935 | 52.7828 |
databricks/dolly-v1-6b | FP32 | 0.6866 | 0.5098 | 0.6433 | 0.7622 | 0.6505 | / | 11.3242 |
databricks/dolly-v1-6b | GPTQ W4G128Asym | 0.6878 | 0.5058 | 0.6393 | 0.7633 | 0.6491 | 0.9978 | 11.5514 |
databricks/dolly-v1-6b | GPTQ W4G32Asym | 0.6864 | 0.5084 | 0.6519 | 0.7568 | 0.6509 | 1.0006 | 11.4728 |
databricks/dolly-v1-6b | GPTQ W4G128Sym | 0.6876 | 0.5045 | 0.6433 | 0.7541 | 0.6474 | 0.9952 | 11.6474 |
databricks/dolly-v2-7b | FP32 | 0.6379 | 0.5282 | 0.614 | 0.7448 | 0.6312 | / | 16.161 |
databricks/dolly-v2-7b | GPTQ W4G32Asym | 0.6377 | 0.5228 | 0.5991 | 0.7448 | 0.6261 | 0.9919 | 16.4096 |
EleutherAI/gpt-neo-2.7b | FP32 | 0.6224 | 0.4271 | 0.577 | 0.722 | 0.5871 | / | 13.9359 |
EleutherAI/gpt-neo-2.7b | GPTQ W4G128Asym | 0.6123 | 0.4227 | 0.5738 | 0.7203 | 0.5823 | 0.9917 | 14.3377 |
EleutherAI/gpt-neo-2.7b | GPTQ W4G32Asym | 0.615 | 0.4259 | 0.5714 | 0.7247 | 0.5843 | 0.9951 | 14.2083 |
EleutherAI/gpt-neo-2.7b | GPTQ W4G32Sym | 0.6154 | 0.4208 | 0.5777 | 0.7198 | 0.5834 | 0.9937 | 14.3121 |
EleutherAI/gpt-neox-20b | FP32 | 0.7233 | 0.5359 | 0.6614 | 0.7753 | 0.6740 | / | 9.195 |
EleutherAI/gpt-neox-20b | GPTQ W4G128Asym | 0.7186 | 0.5328 | 0.6535 | 0.7699 | 0.6687 | 0.9922 | 9.3463 |
EleutherAI/gpt-neox-20b | GPTQ W4G32Asym | 0.7268 | 0.533 | 0.659 | 0.7715 | 0.6726 | 0.9979 | 9.2897 |
mosaicml/mpt-7b | FP32 | 0.7056 | 0.5718 | 0.6859 | 0.7927 | 0.6890 | / | 9.9324 |
mosaicml/mpt-7b | GPTQ W4G128Asym | 0.7006 | 0.5655 | 0.6803 | 0.7965 | 0.6857 | 0.9952 | 10.1515 |
mosaicml/mpt-7b-chat | FP32 | 0.655 | 0.5752 | 0.6748 | 0.7845 | 0.6724 | / | 13.5951 |
mosaicml/mpt-7b-chat | GPTQ W4G128Asym | 0.6472 | 0.5716 | 0.6685 | 0.784 | 0.6678 | 0.9932 | 13.8539 |
mosaicml/mpt-7b-instruct | FP32 | 0.6918 | 0.5819 | 0.678 | 0.7927 | 0.6861 | / | 10.8863 |
mosaicml/mpt-7b-instruct | GPTQ W4G128Asym | 0.6864 | 0.5765 | 0.6827 | 0.7873 | 0.6832 | 0.9958 | 11.1451 |
mosaicml/mpt-7b-storywriter | FP32 | 0.693 | 0.5477 | 0.663 | 0.784 | 0.6719 | / | 9.9125 |
mosaicml/mpt-7b-storywriter | GPTQ W4G128Asym | 0.6854 | 0.5443 | 0.6661 | 0.7813 | 0.6693 | 0.9961 | 10.1137 |
tiiuae/falcon-rw-7b | FP32 | 0.6604 | 0.5419 | 0.6598 | 0.7753 | 0.6594 | / | 11.7616 |
tiiuae/falcon-rw-7b | GPTQ W4G128Asym | 0.6484 | 0.5369 | 0.6575 | 0.7807 | 0.6559 | 0.9947 | 11.9411 |
tiiuae/falcon-rw-7b | GPTQ W4G32Asym | 0.6571 | 0.5398 | 0.6582 | 0.7764 | 0.6579 | 0.9978 | 11.8809 |
tiiuae/falcon-rw-7b | GPTQ W4G128Sym | 0.652 | 0.535 | 0.6575 | 0.7682 | 0.6532 | 0.9906 | 12.0048 |
tiiuae/falcon-7b-instruct | FP32 | 0.6437 | 0.5177 | 0.6669 | 0.7824 | 0.6527 | / | 14.5053 |
tiiuae/falcon-7b-instruct | GPTQ W4G128Asym | 0.6301 | 0.5142 | 0.6654 | 0.7835 | 0.6483 | 0.9933 | 14.8146 |
tiiuae/falcon-7b-instruct | GPTQ W4G32Asym | 0.6377 | 0.517 | 0.6598 | 0.7807 | 0.6488 | 0.9941 | 14.6953 |
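The Average column is the mean of the four task accuracies, and the Accuracy Ratio column divides the INT4 average by the FP32 average. A short sketch, using the EleutherAI/gpt-j-6b FP32 and GPTQ W4G32Asym values from the table above, reproduces both figures:

```python
from statistics import mean

# Task accuracies for EleutherAI/gpt-j-6b, in table order:
# Lambada_openai, Hellaswag, Winogrande, Piqa.
fp32_tasks = [0.6831, 0.4954, 0.6409, 0.7541]
int4_tasks = [0.6829, 0.4923, 0.6401, 0.7486]  # GPTQ W4G32Asym

fp32_avg = mean(fp32_tasks)
int4_avg = mean(int4_tasks)
ratio = int4_avg / fp32_avg  # Accuracy Ratio [INT4/FP32]

print(f"FP32 average: {fp32_avg:.4f}")  # 0.6434
print(f"INT4 average: {int4_avg:.4f}")  # 0.6410
print(f"INT4/FP32:    {ratio:.4f}")     # 0.9963
```

Note that the Wikitext word perplexity is reported separately and does not enter this ratio.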
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s4c14ins1bs | FP32 Throughput (samples/sec), 1s4c14ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
ResNet50 V1.5 | qlinearops | 72.16% | 72.29% | -0.19% | 1566.70 | 724.89 | 2.16x |
ResNet50 V1.5 | qdq | 72.14% | 72.29% | -0.22% | 1567.15 | 716.57 | 2.19x |
ResNet50 V1.5 MLPerf | qlinearops | 76.11% | 76.46% | -0.46% | 1414.92 | 718.25 | 1.97x |
ResNet50 V1.5 MLPerf | qdq | 76.13% | 76.46% | -0.44% | 1459.45 | 721.54 | 2.02x |
ResNet50 V1.5 (ONNX Model Zoo) | qlinearops | 74.82% | 74.99% | -0.22% | 1593.71 | 753.89 | 2.11x |
ResNet50 V1.5 (ONNX Model Zoo) | qdq | 74.82% | 74.99% | -0.23% | 1582.24 | 752.38 | 2.10x |
MobileNet V2 | qlinearops | 65.49% | 66.89% | -2.09% | 7139.93 | 4289.29 | 1.66x |
MobileNet V2 | qdq | 65.49% | 66.89% | -2.10% | 7335.80 | 4080.31 | 1.80x |
MobileNet V2 (ONNX Model Zoo) | qlinearops | 68.38% | 69.48% | -1.59% | 7236.84 | 4299.29 | 1.68x |
MobileNet V2 (ONNX Model Zoo) | qdq | 68.38% | 69.48% | -1.59% | 6842.58 | 4496.44 | 1.52x |
VGG16 | qlinearops | 66.56% | 66.69% | -0.19% | 591.43 | 178.91 | 3.31x |
VGG16 | qdq | 66.59% | 66.69% | -0.15% | 614.91 | 183.79 | 3.35x |
VGG16 (ONNX Model Zoo) | qlinearops | 72.33% | 72.40% | -0.09% | 590.04 | 182.90 | 3.23x |
VGG16 (ONNX Model Zoo) | qdq | 72.33% | 72.40% | -0.09% | 614.75 | 179.93 | 3.42x |
MobileNet V3 MLPerf | qlinearops | 75.56% | 75.74% | -0.24% | 5703.81 | 2578.80 | 2.21x |
MobileNet V3 MLPerf | qdq | 75.56% | 75.74% | -0.24% | 5610.37 | 2603.41 | 2.16x |
ShuffleNet V2 (ONNX Model Zoo) | qlinearops | 66.09% | 66.36% | -0.41% | 6689.57 | 3690.63 | 1.81x |
ShuffleNet V2 (ONNX Model Zoo) | qdq | 66.09% | 66.36% | -0.41% | 5692.38 | 3758.23 | 1.51x |
GoogleNet (ONNX Model Zoo) | qlinearops | 67.71% | 67.79% | -0.12% | 1792.52 | 1111.26 | 1.61x |
GoogleNet (ONNX Model Zoo) | qdq | 67.73% | 67.79% | -0.09% | 1821.10 | 1104.52 | 1.65x |
SqueezeNet (ONNX Model Zoo) | qlinearops | 56.54% | 56.87% | -0.57% | 9472.72 | 5582.40 | 1.70x |
SqueezeNet (ONNX Model Zoo) | qdq | 56.54% | 56.87% | -0.57% | 9861.50 | 5566.72 | 1.77x |
CaffeNet (ONNX Model Zoo) | qlinearops | 56.21% | 56.30% | -0.16% | 3348.37 | 1141.01 | 2.93x |
CaffeNet (ONNX Model Zoo) | qdq | 56.25% | 56.30% | -0.09% | 3509.70 | 1142.19 | 3.07x |
AlexNet (ONNX Model Zoo) | qlinearops | 54.73% | 54.79% | -0.10% | 2426.58 | 987.34 | 2.46x |
AlexNet (ONNX Model Zoo) | qdq | 54.71% | 54.79% | -0.14% | 2208.63 | 1016.53 | 2.17x |
ZFNet (ONNX Model Zoo) | qlinearops | 55.84% | 55.96% | -0.21% | 930.06 | 532.61 | 1.75x |
ZFNet (ONNX Model Zoo) | qdq | 55.86% | 55.96% | -0.18% | 919.83 | 417.00 | 2.21x |
Inception V1 (ONNX Model Zoo) | qlinearops | 67.21% | 67.24% | -0.05% | 1880.94 | 1159.97 | 1.62x |
Inception V1 (ONNX Model Zoo) | qdq | 67.21% | 67.24% | -0.05% | 1798.96 | 1151.37 | 1.56x |
EfficientNet (ONNX Model Zoo) | qlinearops | 76.98% | 77.11% | -0.17% | 2890.97 | 1380.23 | 2.09x |
EfficientNet (ONNX Model Zoo) | qdq | 76.99% | 77.11% | -0.16% | 2548.20 | 1362.69 | 1.87x |
DenseNet (ONNX Model Zoo) | qlinearops | 60.53% | 60.96% | -0.70% | 657.12 | 507.94 | 1.29x |
SSD (ONNX Model Zoo) | qlinearops | 18.47% | 18.98% | -2.69% | 57.63 | 14.64 | 3.94x |
SSD (ONNX Model Zoo) | qdq | 18.62% | 18.98% | -1.89% | 56.96 | 14.58 | 3.91x |
SSD MobileNet V1 | qlinearops | 22.44% | 23.10% | -2.86% | 1286.79 | 904.83 | 1.42x |
SSD MobileNet V1 | qdq | 22.44% | 23.10% | -2.86% | 1121.02 | 856.82 | 1.31x |
SSD MobileNet V1 (ONNX Model Zoo) | qlinearops | 22.96% | 23.02% | -0.27% | 1098.80 | 829.55 | 1.32x |
SSD MobileNet V1 (ONNX Model Zoo) | qdq | 22.96% | 23.02% | -0.27% | 1044.34 | 790.39 | 1.32x |
SSD MobileNet V2 | qlinearops | 23.87% | 24.67% | -3.25% | 849.89 | 627.62 | 1.35x |
YOLOv3 (ONNX Model Zoo) | qlinearops | 27.01% | 28.73% | -5.99% | 66.22 | 83.98 | 0.79x |
YOLOv4 (ONNX Model Zoo) | qlinearops | 32.30% | 33.71% | -4.19% | 70.87 | 66.16 | 1.07x |
DUC (ONNX Model Zoo) | qlinearops | 81.63% | 81.92% | -0.36% | 9.15 | 4.90 | 1.87x |
Tiny YOLOv3 (ONNX Model Zoo) | qlinearops | 11.74% | 12.42% | -5.48% | 1119.16 | 161.90 | 6.91x |
Ultra Face (ONNX Model Zoo) | qlinearops | 83.17% | 83.65% | -0.57% | 8537.50 | 1934.53 | 4.41x |
Emotion FERPlus (ONNX Model Zoo) | qlinearops | 7.97% | 8.00% | -0.35% | 3568.69 | 3121.38 | 1.14x |
ArcFace (ONNX Model Zoo) | qlinearops | 99.80% | 99.80% | 0.00% | 494.07 | 244.21 | 2.02x |
BERT base MRPC | qlinearops | 85.54% | 86.03% | -0.57% | 398.76 | 226.09 | 1.76x |
BERT base MRPC | qdq | 85.54% | 86.03% | -0.57% | 392.94 | 223.06 | 1.76x |
BERT base MRPC | integerops | 85.29% | 86.03% | -0.85% | 473.72 | 223.12 | 2.12x |
DistilBERT base MRPC | qdq | 84.07% | 84.56% | -0.58% | 548.57 | 400.62 | 1.37x |
DistilBERT base MRPC | integerops | 85.54% | 84.56% | 1.16% | 964.62 | 400.86 | 2.41x |
Mobile bert MRPC | qdq | 85.54% | 86.28% | -0.85% | 540.59 | 394.98 | 1.37x |
Mobile bert MRPC | integerops | 85.54% | 86.28% | -0.85% | 602.34 | 397.35 | 1.52x |
Roberta base MRPC | integerops | 90.93% | 89.95% | 1.09% | 487.62 | 222.08 | 2.20x |
BERT SQuAD (ONNX Model Zoo) | integerops | 80.29 | 80.67 | -0.47% | 189.27 | 97.40 | 1.94x |
MobileBERT SQuAD MLPerf (ONNX Model Zoo) | integerops | 89.87 | 90.03 | -0.17% | 146.72 | 125.33 | 1.17x |
BiDAF (ONNX Model Zoo) | integerops | 65.93% | 66.08% | -0.23% | 2757.59 | 2277.14 | 1.21x |
GPT2 lm head WikiText (ONNX Model Zoo) | integerops | 31.98 | 29.00 | 10.31% | 15.47 | 9.78 | 1.58x |
BERT base cased MRPC (HuggingFace) | qlinearops | 90.21% | 90.42% | -0.23% | 360.90 | 212.41 | 1.70x |
BERT base uncased MRPC (HuggingFace) | integerops | 89.58% | 90.42% | -0.93% | 484.68 | 212.34 | 2.28x |
Roberta base MRPC (HuggingFace) | qlinearops | 91.00% | 91.38% | -0.41% | 353.24 | 213.83 | 1.65x |
Roberta base MRPC (HuggingFace) | integerops | 90.85% | 91.38% | -0.58% | 490.42 | 212.57 | 2.31x |
XLM Roberta base MRPC (HuggingFace) | qlinearops | 89.37% | 90.10% | -0.81% | 304.10 | 214.51 | 1.42x |
XLM Roberta base MRPC (HuggingFace) | integerops | 89.66% | 90.10% | -0.50% | 347.25 | 214.13 | 1.62x |
Camembert base MRPC (HuggingFace) | qlinearops | 89.28% | 89.28% | 0.00% | 272.62 | 216.98 | 1.26x |
Camembert base MRPC (HuggingFace) | integerops | 89.19% | 89.28% | -0.10% | 489.58 | 216.06 | 2.27x |
MiniLM L12 H384 uncased MRPC (HuggingFace) | qlinearops | 90.13% | 90.97% | -0.93% | 1054.31 | 585.78 | 1.80x |
MiniLM L12 H384 uncased MRPC (HuggingFace) | integerops | 91.07% | 90.97% | 0.10% | 1072.47 | 590.03 | 1.82x |
DistilBERT base uncased SST-2 (HuggingFace) | qlinearops | 90.71% | 91.06% | -0.38% | 890.23 | 398.72 | 2.23x |
DistilBERT base uncased SST-2 (HuggingFace) | integerops | 90.25% | 91.06% | -0.88% | 746.66 | 397.78 | 1.88x |
Albert base v2 SST-2 (HuggingFace) | qlinearops | 92.09% | 92.32% | -0.25% | 268.37 | 211.96 | 1.27x |
Albert base v2 SST-2 (HuggingFace) | integerops | 91.74% | 92.32% | -0.62% | 265.65 | 212.21 | 1.25x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 89.45% | 90.14% | -0.76% | 1958.82 | 1130.40 | 1.73x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 89.91% | 90.14% | -0.26% | 2022.09 | 1130.14 | 1.79x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | qlinearops | 87.70% | 88.29% | -0.67% | 397.45 | 212.84 | 1.87x |
MiniLM L6 H384 uncased SST-2 (HuggingFace) | integerops | 88.19% | 88.29% | -0.12% | 489.19 | 213.14 | 2.30x |
Electra small discriminator MRPC (HuggingFace) | qlinearops | 89.92% | 89.83% | 0.09% | 1797.98 | 1077.51 | 1.67x |
Electra small discriminator MRPC (HuggingFace) | integerops | 89.27% | 89.83% | -0.63% | 1930.55 | 1139.74 | 1.69x |
BERT mini MRPC (HuggingFace) | qlinearops | 86.21% | 86.52% | -0.35% | 5510.81 | 3334.89 | 1.65x |
BERT mini MRPC (HuggingFace) | integerops | 86.16% | 86.52% | -0.41% | 5627.19 | 3365.08 | 1.67x |
Xlnet base cased MRPC (HuggingFace) | qlinearops | 90.05% | 89.86% | 0.21% | 108.83 | 92.24 | 1.18x |
Xlnet base cased MRPC (HuggingFace) | integerops | 89.58% | 89.86% | -0.31% | 110.83 | 90.80 | 1.22x |
BART large MRPC (HuggingFace) | qlinearops | 91.77% | 91.20% | 0.63% | 59.18 | 51.49 | 1.15x |
BART large MRPC (HuggingFace) | integerops | 92.36% | 91.20% | 1.28% | 96.38 | 51.47 | 1.87x |
DeBERTa v3 base MRPC (HuggingFace) | qlinearops | 91.85% | 92.23% | -0.40% | 163.17 | 146.13 | 1.12x |
DeBERTa v3 base MRPC (HuggingFace) | integerops | 92.39% | 92.23% | 0.17% | 168.41 | 145.58 | 1.16x |
Spanbert SQuAD (HuggingFace) | qlinearops | 91.14 | 91.98 | -0.91% | 69.53 | 42.72 | 1.63x |
Spanbert SQuAD (HuggingFace) | integerops | 91.40 | 91.98 | -0.63% | 79.82 | 42.58 | 1.87x |
Bert base multilingual cased SQuAD (HuggingFace) | qlinearops | 88.42 | 89.13 | -0.79% | 70.47 | 42.73 | 1.65x |
Bert base multilingual cased SQuAD (HuggingFace) | integerops | 88.70 | 89.13 | -0.48% | 79.35 | 42.46 | 1.87x |
DistilBert base uncased SQuAD (HuggingFace) | qlinearops | 86.33 | 86.86 | -0.62% | 113.00 | 67.85 | 1.67x |
DistilBert base uncased SQuAD (HuggingFace) | integerops | 86.05 | 86.86 | -0.94% | 159.51 | 67.90 | 2.35x |
BERT large uncased whole word masking SQuAD (HuggingFace) | qlinearops | 92.34 | 93.16 | -0.88% | 24.64 | 12.75 | 1.93x |
BERT large uncased whole word masking SQuAD (HuggingFace) | integerops | 92.99 | 93.16 | -0.18% | 26.79 | 12.76 | 2.10x |
Roberta large SQuAD v2 (HuggingFace) | qlinearops | 89.03 | 89.02 | 0.02% | 16.91 | 12.98 | 1.30x |
Roberta large SQuAD v2 (HuggingFace) | integerops | 89.04 | 89.02 | 0.02% | 26.80 | 12.95 | 2.07x |
GPT2 WikiText (HuggingFace) | qlinearops | 30.25 | 29.00 | 4.33% | 12.82 | 9.80 | 1.31x |
GPT2 WikiText (HuggingFace) | integerops | 29.68 | 29.00 | 2.36% | 13.68 | 9.76 | 1.40x |
DistilGPT2 WikiText (HuggingFace) | qlinearops | 44.93 | 43.43 | 3.46% | 20.66 | 16.78 | 1.23x |
DistilGPT2 WikiText (HuggingFace) | integerops | 44.62 | 43.43 | 2.74% | 21.97 | 16.77 | 1.31x |
LayoutLM FUNSD (HuggingFace) | qlinearops | 78.15% | 78.35% | -0.25% | 59.50 | 42.98 | 1.38x |
LayoutLM FUNSD (HuggingFace) | integerops | 77.58% | 78.35% | -0.98% | 64.93 | 43.20 | 1.50x |
LayoutLMv3 FUNSD (HuggingFace) | qlinearops | 90.00% | 90.49% | -0.54% | 30.97 | 27.97 | 1.11x |
LayoutLMv3 FUNSD (HuggingFace) | integerops | 90.07% | 90.49% | -0.46% | 35.15 | 27.72 | 1.27x |
LayoutLMv2 (HuggingFace) | qlinearops | 81.36% | 81.17% | 0.23% | 48.61 | 38.93 | 1.25x |
LayoutLMv2 (HuggingFace) | integerops | 80.86% | 81.17% | -0.39% | 45.52 | 36.10 | 1.26x |
CodeBert (HuggingFace) | qlinearops | 64.97% | 65.41% | -0.67% | 64.99 | 44.20 | 1.47x |
CodeBert (HuggingFace) | integerops | 64.93% | 65.41% | -0.73% | 77.99 | 43.63 | 1.79x |
Model | Example | INT8 Accuracy | FP32 Accuracy | Accuracy Ratio [(INT8-FP32)/FP32] | INT8 Throughput (samples/sec), 1s56c1ins1bs | FP32 Throughput (samples/sec), 1s56c1ins1bs | Performance Ratio [INT8/FP32] |
---|---|---|---|---|---|---|---|
Faster R-CNN (ONNX Model Zoo) | qlinearops | 34.06% | 34.37% | -0.88% | 4.15 | 3.47 | 1.20x |
Faster R-CNN (ONNX Model Zoo) | qdq | 33.98% | 34.37% | -1.12% | 4.19 | 3.49 | 1.20x |
Mask R-CNN (ONNX Model Zoo) | qlinearops | 33.13% | 33.72% | -1.74% | 3.46 | 3.02 | 1.15x |
Mask R-CNN (ONNX Model Zoo) | qdq | 33.29% | 33.72% | -1.28% | 3.46 | 3.02 | 1.15x |
FCN (ONNX Model Zoo) | qlinearops | 64.54% | 64.98% | -0.67% | 28.04 | 12.59 | 2.23x |
FCN (ONNX Model Zoo) | qdq | 64.54% | 64.98% | -0.67% | 28.22 | 12.67 | 2.23x |
GPT-J-6B (HuggingFace) | qlinearops | 78.46% | 79.17% | -0.91% | 1.74 | 0.66 | 2.62x |
GPT-J-6B (HuggingFace) | integerops | 78.93% | 79.17% | -0.31% | 1.68 | 0.67 | 2.52x |
Model name | Configuration | Lambada_openai Accuracy | Lambada_openai Perplexity | Accuracy Ratio [INT4/FP32] |
---|---|---|---|---|
meta-llama/Llama-2-7b-chat-hf | FP32 | 0.7058 | 3.2788 | / |
meta-llama/Llama-2-7b-chat-hf | GPTQ W4G32Asym | 0.7002 | 3.4124 | 0.9921 |
meta-llama/Llama-2-7b-hf | FP32 | 0.7392 | 3.3950 | / |
meta-llama/Llama-2-7b-hf | GPTQ W4G32Asym | 0.7312 | 3.5711 | 0.9892 |
meta-llama/Llama-2-13b-chat-hf | FP32 | 0.7312 | 2.9163 | / |
meta-llama/Llama-2-13b-chat-hf | GPTQ W4G128Asym | 0.7240 | 2.9945 | 0.9902 |
meta-llama/Llama-2-13b-hf | FP32 | 0.7677 | 3.0438 | / |
meta-llama/Llama-2-13b-hf | GPTQ W4G128Asym | 0.7634 | 3.1186 | 0.9944 |
meta-llama/Llama-2-13b-hf | GPTQ W4G32Asym | 0.7615 | 3.1276 | 0.9919 |
meta-llama/Llama-2-70b-chat-hf | FP32 | 0.7543 | 2.6181 | / |
meta-llama/Llama-2-70b-chat-hf | RTN W4G32Asym | 0.7518 | 2.6496 | 0.9967 |
meta-llama/Llama-2-70b-hf | FP32 | 0.7964 | 2.6612 | / |
meta-llama/Llama-2-70b-hf | RTN W4G32Sym | 0.7941 | 2.7243 | 0.9971 |
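The Configuration labels pack several weight-only quantization settings into one token. The sketch below assumes the usual reading of that naming convention, which the tables themselves do not spell out: the leading word is the algorithm (GPTQ, or RTN for round-to-nearest), `W4` is the weight bit width, `G32`/`G128` is the group size, and `Sym`/`Asym` is the quantization scheme. `parse_woq_config` is a hypothetical helper, not a library function.

```python
import re
from typing import NamedTuple


class WoqConfig(NamedTuple):
    algorithm: str    # e.g. "GPTQ" or "RTN"
    weight_bits: int  # digits after "W"
    group_size: int   # digits after "G"
    symmetric: bool   # True for "Sym", False for "Asym"


def parse_woq_config(label: str) -> WoqConfig:
    """Split a label such as 'GPTQ W4G128Asym' into its parts (assumed convention)."""
    match = re.fullmatch(r"(\w+) W(\d+)G(\d+)(Sym|Asym)", label)
    if match is None:
        raise ValueError(f"unrecognized configuration label: {label!r}")
    algorithm, bits, group, scheme = match.groups()
    return WoqConfig(algorithm, int(bits), int(group), scheme == "Sym")


for label in ("GPTQ W4G128Asym", "RTN W4G32Sym"):
    print(label, "->", parse_woq_config(label))
```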
Model | Task | Dataset | Dense Accuracy | Sparse Accuracy | Relative Drop | Sparsity ratio | Sparsity Pattern | Comments | Balanced or unbalanced ratio |
---|---|---|---|---|---|---|---|---|---|
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=76.2 | -0.80% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | question answering | SQuAD-v1.1 | f1=76.87 | f1=77.62 | +0.98% | 50% | structured 2:4 | snip momentum | balanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=86.15 | -0.86% | 80% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | question answering | SQuAD-v1.1 | f1=86.90 | f1=87.50 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=87.78 | -0.92% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-base-uncased | question answering | SQuAD-v1.1 | f1=88.59 | f1=89.40 | +0.91% | 50% | structured 2:4 | snip momentum | balanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=90.91 | -0.35% | 80% | structured 4x1 | snip momentum | unbalanced |
Bert-large | question answering | SQuAD-v1.1 | f1=91.23 | f1=91.67 | +0.48% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.22 | -0.34% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=87.33 | -0.22% | 90% | structured 4x1 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.89 | -0.72% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | MRPC | f1=87.52 | f1=86.8 | -0.83% | 60% | structured per channel | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=89.85 | -0.46% | 90% | structured 4x1 | snip momentum | unbalanced |
Distilbert-base-uncased | text classification | MRPC | f1=90.26 | f1=90.88 | +0.69% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 90% | structured 4x1 | snip momentum | unbalanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=87.73 | +0.14% | 50% | structured 2:4 | snip momentum | balanced |
Bert-Mini | text classification | SST-2 | accuracy=87.61 | accuracy=86.92 | -0.79% | 50% | structured per channel | snip momentum | unbalanced |
ResNet50 | image recognition | ImageNet | top1 acc = 78.95 | top1 acc = 80.10 | -1.43% | 75% | structured 2x1 | snip momentum | unbalanced |
YOLO-v5s6 | object detection | COCO | AP0.50:0.95/AP0.50=0.404/0.6 | AP0.50:0.95/AP0.50=0.393/0.584 | -2.72% | 80% | unstructured | snip momentum | unbalanced |
Bert-Large | question answering | SQuAD-v1.1 | f1=91.34 | f1=90.7 | -0.07% | 80% | structured 2x1 | group lasso | unbalanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [82.45, 83.27] | [-2.51%, -1.80%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | MNLI | [m, mm] = [84.57, 84.79] | [m, mm] = [83.20, 84.11] | [-1.62%, -0.80%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.51 | -0.88% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 92.20 | -0.13% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | SST-2 | accuracy = 92.32 | accuracy = 91.97 | -0.38% | 20% | unstructured | gradient sensitivity | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.48, 87.06] | [-0.68%, -1.12%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QQP | [accuracy, f1] = [91.10, 88.05] | [accuracy, f1] = [90.92, 87.78] | [-0.20%, -0.31%] | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.39 | -1.26% | 70% | unstructured | Prune once for all | balanced |
Bert-Base | text classification | QNLI | accuracy = 91.54 | accuracy = 90.87 | -0.73% | 50% | structured 1:2 | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [77.27, 85.75] | [-2.61%, -1.54%] | 70% | unstructured | Prune once for all | balanced |
Bert-Base | question answering | | [em, f1] = [79.34, 87.10] | [em, f1] = [78.03, 86.50] | [-1.65%, -0.69%] | 50% | structured 1:2 | Prune once for all | balanced |
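The `2:4` and `1:2` entries in the Sparsity Pattern column are N:M patterns: within every group of M consecutive weights, only N are kept, which is why both correspond to a 50% sparsity ratio. The sketch below builds such a mask in plain Python. It selects weights by magnitude purely for illustration; the pruning criteria named in the Comments column (snip momentum, Prune once for all) score weights differently, and `mask_n_of_m` is a hypothetical helper.

```python
def mask_n_of_m(weights, n=2, m=4):
    """Return a 0/1 mask keeping the n largest-magnitude weights per group of m."""
    if len(weights) % m:
        raise ValueError("number of weights must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes within this group.
        ranked = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)
        keep = set(ranked[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
mask = mask_n_of_m(weights)          # 2:4 pattern
sparsity = 1 - sum(mask) / len(mask)
print(mask, f"{sparsity:.0%}")       # [1, 0, 0, 1, 0, 1, 1, 0] 50%
```

Calling `mask_n_of_m(weights, n=1, m=2)` gives the `1:2` pattern with the same 50% ratio.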
Example Name | Dataset | Student (Metrics) | Teacher (Metrics) | Student With Distillation (Metrics Improvement) | Student With Distributed Distillation (Metrics Improvement) |
---|---|---|---|---|---|
MobileNet example | CIFAR-10 | MobileNetV2-0.35 (0.7965 ACC) | WideResNet40-2 (0.9522 ACC) | 0.8178 ACC (0.0213 ACC) | 0.8235 ACC (0.027 ACC) |
CNN example | CIFAR-100 | CNN-2 (0.5494 ACC) | CNN-10 (0.7153 ACC) | 0.5540 ACC (0.0046 ACC) | 0.5523 ACC (0.0029 ACC) |
VGG example | CIFAR-100 | VGG-8-BN (0.7022 ACC) | VGG-13-BN (0.7415 ACC) | 0.7025 ACC (0.0003 ACC) | NA |
ResNet example | ImageNet | ResNet18 (0.6739 ACC) | ResNet50 (0.7399 ACC) | 0.6845 ACC (0.0106 ACC) | NA |
BlendCnn example | MRPC | BlendCnn (0.7034 ACC) | BERT-Base (0.8382 ACC) | 0.7034 ACC (0 ACC) | NA |
BiLSTM example | SST-2 | BiLSTM (0.8314 ACC) | RoBERTa-Base (0.9403 ACC) | 0.9048 ACC (0.0734 ACC) | NA |
DistilBERT example | SQuAD | DistilBERT (0.7323/0.8256 EM/F1) | BERT-Base (0.8084/0.8814 EM/F1) | 0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1) | NA |
TinyBERT example | MNLI | TinyBERT (0.8018/0.8044 m/mm) | BERT-Base (0.8363/0.8411 m/mm) | 0.8025/0.8074 m/mm (0.0007/0.0030 m/mm) | NA |
BERT-3 example | QQP | BERT-3 (0.8626/0.8213 EM/F1) | BERT-Base (0.9091/0.8782 EM/F1) | 0.8684/0.8259 EM/F1 (0.0058/0.0046 EM/F1) | NA |
DistilRoBERTa example | COLA | DistilRoBERTa (0.6057 ACC) | RoBERTa-Large (0.6455 ACC) | 0.6187 ACC (0.0130 ACC) | NA |
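The value in parentheses in the distillation columns is the absolute gain of the distilled student over the baseline student, per metric. A small sketch with a hypothetical `improvement` helper reproduces the MobileNet and DistilBERT entries, including the two-metric `EM/F1` form:

```python
def improvement(student: str, distilled: str) -> str:
    """Per-metric absolute gain for 'a' or 'a/b' metric strings, to 4 decimals."""
    base = [float(value) for value in student.split("/")]
    new = [float(value) for value in distilled.split("/")]
    if len(base) != len(new):
        raise ValueError("student and distilled must report the same metrics")
    return "/".join(f"{n - b:.4f}" for b, n in zip(base, new))


# MobileNet example on CIFAR-10 (ACC).
print(improvement("0.7965", "0.8178"))                # 0.0213
# DistilBERT example on SQuAD (EM/F1).
print(improvement("0.7323/0.8256", "0.7442/0.8371"))  # 0.0119/0.0115
```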
Model (ONNX QDQ) | AWS c6i.2xlarge (Intel) CPU Execution Provider | AWS c6a.2xlarge (AMD) CPU Execution Provider | AWS c6g.2xlarge (ARM) CPU Execution Provider | NVidia A100 CUDA Execution Provider |
---|---|---|---|---|
ResNet50 | 74.76% | 68.95% | 74.76% | 74.75% |
BERT-base | 85.54% | 84.56% | 85.54% | 84.31% |
ResNet50 V1.5 | 72.20% | 67.70% | 72.20% | 72.29% |
MobileNet V2 | 65.82% | 58.56% | 65.83% | 65.63% |
SSD MobileNet V1 | 22.45% | 16.53% | 22.45% | 22.35% |
DistilBERT base MRPC | 84.56% | 83.82% | 84.56% | 84.56% |
SqueezeNet | 56.54% | 53.52% | 56.54% | 56.55% |
SSD | 18.63% | 18.54% | 18.63% | 18.61% |
AlexNet | 54.71% | 47.06% | 54.71% | 54.79% |
CaffeNet | 56.25% | 52.35% | 56.27% | 56.24% |
GoogleNet | 67.73% | 63.56% | 67.72% | 67.76% |
ZFNet | 55.86% | 45.09% | 55.86% | 55.89% |
Inception V1 | 67.21% | 63.03% | 67.20% | 67.21% |
SSD MobileNet V1 (ONNX Model Zoo) | 22.86% | 16.94% | 22.80% | 22.87% |
Mobile bert MRPC | 85.54% | 84.56% | 85.54% | 85.54% |
Roberta base MRPC | 89.46% | 90.44% | 89.71% | 89.71% |
ResNet50 V1.5 MLPerf | 76.14% | 72.80% | 76.14% | 76.17% |
VGG16 | 66.69% | 64.25% | 66.69% | 66.64% |
VGG16 (ONNX Model Zoo) | 72.31% | 69.35% | 72.32% | 72.34% |
MobileNet V3 MLPerf | 75.57% | 70.78% | 75.56% | 75.52% |
EfficientNet | 77.61% | 76.52% | 77.56% | 77.60% |
MobileNet V2 (ONNX Model Zoo) | 68.51% | 62.48% | 68.58% | 68.48% |
ShuffleNet V2 | 66.12% | 58.41% | 66.11% | 66.11% |