Validated Models

Intel® Neural Compressor has validated examples for multiple compression techniques. Links to typical examples can be found in the example tables, and the performance and accuracy results are available here.

1. Validated Quantization Examples

1.1. TensorFlow Models with Intel TensorFlow 2.13.0

1.2. PyTorch Models with Torch 2.0.1+cpu in PTQ Mode

1.3. PyTorch Models with Torch 2.0.1+cpu in QAT Mode

1.4. PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu

1.5. PyTorch Models with Torch 2.0.1+cpu in WOQ Mode

1.6. ONNX Models with ONNX Runtime 1.15.1

1.7. ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode

2. Validated Pruning Examples

3. Validated Knowledge Distillation Examples

4. Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Validated Quantization Examples

System summary: Tested by Intel on 09/01/2023. 1-node, 1x Intel(R) Xeon(R) Platinum 8480+ @3.8GHz, 56 cores/socket, HT On, Turbo On, Total Memory 256GB (16x16GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3A14.TEL2P1, microcode 0x2b0001b0,
CentOS Stream 8, gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), DL Models, Frameworks: TensorFlow/ONNXRT/PyTorch, Datatype: FP32/INT8/BF16.
Most models are benchmarked using 1 socket, 4 cores/instance, 14 instances and batch size 1 (1s4c14ins1bs in the table headers).
Some large models are benchmarked using 1 socket, 56 cores/instance, 1 instance and batch size 1 (1s56c1ins1bs in the table headers).

Performance varies by use, configuration and other factors.
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks
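The performance headers encode the benchmark setup in compact labels such as 1s4c14ins1bs and 1s56c1ins1bs. This sketch reads a label as sockets, cores per instance, instances and batch size. That reading is an interpretation of the system summary above, not a documented format. The sketch parses the label and checks that both setups occupy all 56 cores of one socket. The names parse_label and BenchConfig are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed label format: <sockets>s<cores per instance>c<instances>ins<batch size>bs
_LABEL = re.compile(r"^(\d+)s(\d+)c(\d+)ins(\d+)bs$")


@dataclass(frozen=True)
class BenchConfig:
    sockets: int
    cores_per_instance: int
    instances: int
    batch_size: int

    @property
    def total_cores(self) -> int:
        # Cores occupied by all instances running concurrently.
        return self.cores_per_instance * self.instances


def parse_label(label: str) -> BenchConfig:
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized benchmark label: {label!r}")
    return BenchConfig(*(int(g) for g in match.groups()))


for label in ("1s4c14ins1bs", "1s56c1ins1bs"):
    cfg = parse_label(label)
    print(label, cfg.total_cores)  # 56 in both cases
```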

TensorFlow Models with Intel TensorFlow 2.13.0

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s4c14ins1bs (samples/sec)	FP32 Throughput 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet50 v1.0	pb	74.12%	74.27%	-0.21%	2914.42	621.91	4.69x
ResNet50 v1.5	pb	76.23%	76.46%	-0.31%	2160.07	545.47	3.96x
ResNet101	pb	77.50%	76.45%	1.37%	1508.97	428.02	3.53x
Inception V1	pb	70.44%	69.74%	1.01%	3290.75	1229.78	2.68x
Inception V2	pb	74.38%	73.97%	0.57%	2404.57	1048.49	2.29x
Inception V3	pb	76.71%	76.75%	-0.05%	1669.09	500.95	3.33x
Inception V4	pb	80.18%	80.27%	-0.11%	1073.14	245.13	4.38x
Inception ResNet V2	pb	80.34%	80.40%	-0.07%	374.52	172.06	2.18x
MobileNet V1	pb	71.78%	70.96%	1.16%	5478.88	1756.33	3.12x
MobileNet V2	pb	72.52%	71.76%	1.07%	4133.01	1748.06	2.36x
VGG16	pb	72.64%	70.89%	2.47%	1534.50	236.62	6.49x
VGG19	pb	72.69%	71.01%	2.37%	1377.40	197.77	6.96x
ResNetV2 50	pb	70.39%	69.64%	1.07%	1125.32	656.38	1.71x
ResNetV2 101	pb	72.62%	71.87%	1.04%	709.50	367.00	1.93x
ResNetV2 152	pb	73.11%	72.37%	1.03%	497.24	265.34	1.87x
Densenet 121	pb	73.59%	72.89%	0.97%	557.67	456.61	1.22x
Densenet 161	pb	76.35%	76.29%	0.08%	353.18	235.35	1.50x
Densenet 169	pb	74.34%	74.65%	-0.41%	435.44	385.73	1.13x
EfficientNet B0	ckpt	76.15%	76.76%	-0.79%	786.55	723.69	1.09x
SSD ResNet50 V1	pb	37.88%	38.00%	-0.31%	130.09	30.78	4.23x
SSD MobileNet V1	pb	22.98%	23.13%	-0.64%	1291.02	683.50	1.89x
SSD ResNet50 v1	ckpt	37.89%	38.00%	-0.30%	127.30	27.63	4.61x
SSD MobileNet v1	ckpt	22.96%	23.13%	-0.72%	1295.23	453.76	2.85x
SSD ResNet34	pb	21.70%	22.09%	-1.76%	242.91	14.03	17.31x
Faster R-CNN Inception ResNet V2	pb	37.47%	38.31%	-2.18%	5.44	3.02	1.80x
Faster R-CNN Inception ResNet V2	SavedModel	37.79%	38.31%	-1.34%	5.43	3.00	1.81x
Faster R-CNN ResNet101	pb	30.32%	30.39%	-0.23%	166.37	23.54	7.07x
Faster R-CNN ResNet101	SavedModel	30.33%	30.39%	-0.20%	151.54	18.58	8.16x
Faster R-CNN ResNet50	pb	26.64%	26.59%	0.21%	173.33	28.58	6.07x
YOLOv3	pb	82.13%	82.35%	-0.28%	230.69	88.35	2.61x
BERT large SQuAD	pb	92.36	92.99	-0.67%	59.76	17.71	3.37x
BERT large SQuAD (ONNX Model Zoo)	pb	92.26	92.98	-0.78%	41.65	16.14	2.58x
BERT base MRPC	ckpt	87.01%	86.52%	0.57%	416.57	177.06	2.35x
Transformer LT	pb	25.68	25.86	-0.67%	41.19	21.94	1.88x
Transformer lt MLPerf	pb	27.27	27.17	0.39%	9.77	4.51	2.17x
Wide Deep large DS	pb	77.75%	77.67%	0.10%	75552.26	50803.82	1.49x

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s56c1ins1bs (samples/sec)	FP32 Throughput 1s56c1ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
Mask R-CNN Inception V2	pb	28.60%	28.73%	-0.44%	41.96	25.66	1.64x
Mask R-CNN Inception V2	ckpt	28.60%	28.73%	-0.44%	41.56	24.35	1.71x
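The two ratio columns follow the formulas in their headers. Accuracy Ratio is (INT8 - FP32) / FP32, and Performance Ratio is INT8 throughput / FP32 throughput. This short Python sketch reproduces the ResNet101 row; the helper names are illustrative.

```python
def accuracy_ratio(int8: float, fp32: float) -> float:
    """Relative accuracy change (INT8 - FP32) / FP32, in percent."""
    return (int8 - fp32) / fp32 * 100.0


def performance_ratio(int8_throughput: float, fp32_throughput: float) -> float:
    """Throughput speedup INT8 / FP32."""
    return int8_throughput / fp32_throughput


# ResNet101 row of the TensorFlow table.
print(f"{accuracy_ratio(77.50, 76.45):+.2f}%")       # +1.37%
print(f"{performance_ratio(1508.97, 428.02):.2f}x")  # 3.53x
```

The published ratios were probably computed from unrounded measurements. Recomputing them from the rounded table entries can therefore differ in the last digit. For ResNet50 v1.0, (74.12 - 74.27) / 74.27 gives -0.20%, while the table lists -0.21%.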

PyTorch Models with Torch 2.0.1+cpu in PTQ Mode

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s4c14ins1bs (samples/sec)	FP32 Throughput 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet18	static	69.61%	69.76%	-0.22%	1673.05	653.13	2.56x
ResNet50	static	75.92%	76.15%	-0.30%	1170.62	329.70	3.55x
Inception V3	static	69.47%	69.52%	-0.07%	977.08	335.55	2.91x
ResNeSt50	static	80.80%	81.04%	-0.30%	404.51	40.04	10.10x
ResNeXt101_32x8d	static	78.94%	79.31%	-0.46%	562.16	109.77	5.12x
Efficientnet_b0	static	76.89%	77.67%	-1.01%	696.79	667.27	1.04x
Efficientnet_b3	static	77.82%	78.54%	-0.93%	508.85	397.32	1.28x
Efficientnet_b7	static	73.55%	73.92%	-0.50%	234.87	149.65	1.57x
Peleenet	static	71.85%	72.10%	-0.35%	858.18	588.33	1.46x
SE_ResNeXt50_32x4d	static	79.03%	79.08%	-0.07%	739.61	283.60	2.61x
YOLO V3	static	55.09%	54.93%	0.31%	161.92	60.48	2.68x
SSD ResNet34	static	19.52	19.63	-0.58%	141.26	11.78	11.99x
Roberta base MRPC	static	92.69%	93.59%	-0.96%	404.62	174.02	2.33x
CamemBERT base MRPC	static	88.93%	89.28%	-0.39%	395.08	171.78	2.30x
DistilBERT base MRPC	static	89.53%	90.27%	-0.82%	795.98	341.60	2.33x
DistilBERT base MRPC	dynamic	90.20%	90.27%	-0.07%	744.78	343.36	2.17x
ALBERT base MRPC	static	92.63%	92.63%	0.00%	374.41	163.39	2.29x
Funnel MRPC	static	91.60%	92.25%	-0.71%	300.02	182.21	1.65x
Xlm Roberta MRPC	static	88.36%	88.62%	-0.29%	399.27	173.62	2.30x
Xlm Roberta MRPC	dynamic	88.24%	88.24%	0.00%	385.00	174.37	2.21x
BERT base MRPC	static	89.63%	90.42%	-0.87%	407.79	173.24	2.35x
BERT base COLA	static	54.51%	53.39%	2.10%	412.12	172.97	2.38x
BERT base STSB	static	87.55%	88.05%	-0.57%	413.19	173.17	2.39x
BERT base SST-2	static	91.51%	92.32%	-0.87%	409.94	172.77	2.37x
BERT large COLA	static	62.84%	63.35%	-0.80%	141.90	51.55	2.75x
BERT base RTE	static	72.56%	72.56%	0.00%	401.42	174.02	2.31x
BERT large MRPC	static	90.22%	90.38%	-0.17%	139.59	51.66	2.70x
BERT large QNLI	static	90.87%	91.54%	-0.74%	406.48	172.94	2.35x
BERT large RTE	static	73.29%	74.01%	-0.98%	141.92	51.41	2.76x
BERT large RTE	dynamic	71.48%	74.01%	-3.41%	128.46	51.61	2.49x
BERT large SQuAD	static	92.27	93.16	-0.95%	37.59	16.48	2.28x
Reformer Crime and Punishment	static	1.88	1.87	0.23%	446.29	398.25	1.12x
lvwerra/pegasus-samsum	static	42.50	42.67	-0.39%	102.63	37.94	2.71x
T5 Small	dynamic	2.65	3.16	-16.25%	770.18	450.79	1.71x

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s56c1ins1bs (samples/sec)	FP32 Throughput 1s56c1ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
EleutherAI/gpt-j-6B	static	3.36	2.34	43.85%	0.88	0.28	3.14x
openai/whisper-large	dynamic	97.07%	96.96%	0.12%	0.59	0.47	1.25x
abeja/gpt-neox-japanese-2.7b	static	4.30	3.52	22.06%	1.04	0.55	1.90x
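The Example column distinguishes static from dynamic post-training quantization. In static mode, activation ranges are collected once on calibration data and frozen into the model. In dynamic mode, they are computed from each input at run time. The sketch below shows that difference with a plain min/max asymmetric uint8 scheme. The scheme and the function names are illustrative assumptions; the calibration algorithms Intel Neural Compressor actually uses may differ.

```python
from typing import Iterable, List, Tuple


def affine_params(lo: float, hi: float, qmin: int = 0, qmax: int = 255) -> Tuple[float, int]:
    """Asymmetric min/max parameters for uint8: returns (scale, zero_point)."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def static_params(calibration_batches: Iterable[List[float]]) -> Tuple[float, int]:
    # Static: observe the calibration set once; the result is frozen into the model.
    batches = list(calibration_batches)
    lo = min(min(batch) for batch in batches)
    hi = max(max(batch) for batch in batches)
    return affine_params(lo, hi)


def dynamic_params(batch: List[float]) -> Tuple[float, int]:
    # Dynamic: recompute the parameters from every incoming activation tensor.
    return affine_params(min(batch), max(batch))


frozen = static_params([[-1.0, 0.5], [0.2, 3.0]])  # scale = 4/255, zero_point = 64
per_input = dynamic_params([0.0, 2.55])            # scale ~= 0.01, zero_point = 0
```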

PyTorch Models with Torch 2.0.1+cpu in QAT Mode

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s4c14ins1bs (samples/sec)	FP32 Throughput 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet18	static	69.74%	69.76%	-0.03%	1646.74	657.43	2.50x
ResNet50	static	76.05%	76.15%	-0.12%	1098.80	322.34	3.41x
ResNeXt101_32x8d	static	79.28%	79.31%	-0.04%	568.02	109.50	5.19x
MobileNet V2	static	69.73%	71.84%	-2.93%	1383.77	761.35	1.82x
BERT base MRPC	static	89.50%	90.40%	-1.00%	401.83	173.17	2.32x

PyTorch Models with Intel® Extension for PyTorch* 2.0.1+cpu

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s4c14ins1bs (samples/sec)	FP32 Throughput 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet18	static	69.56%	69.76%	-0.29%	5701.04	1593.88	3.58x
ResNet50	static	75.98%	76.15%	-0.22%	2090.03	685.29	3.05x
ResNeXt101_32x16d_wsl	static	84.04%	84.17%	-0.15%	556.86	79.42	7.01x
SSD ResNet34	static	19.93%	20.00%	-0.38%	91.53	15.62	5.86x
bert-large-uncased-whole-word-masking-finetuned-squad	static	92.93	93.16	-0.25%	162.94	22.37	7.29x
distilbert-base-uncased-distilled-squad	static	86.09	86.84	-0.86%	558.66	151.25	3.69x

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s56c1ins1bs (samples/sec)	FP32 Throughput 1s56c1ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
EleutherAI/gpt-j-6B	static	78.70%	79.20%	-0.63%	4.88	1.57	3.11x

PyTorch Models with Torch 2.0.1+cpu in WOQ Mode

Model name	Configuration	Lambada_openai Accuracy	Hellaswag Accuracy	Winogrande Accuracy	Piqa Accuracy	Average Accuracy [Mean of previous four tasks]	Accuracy Ratio [INT4/FP32]	Wikitext Word_perplexity
EleutherAI/gpt-j-6b	FP32	0.6831	0.4954	0.6409	0.7541	0.6434	/	10.8816
	GPTQ W4G128Asym	0.679	0.4895	0.6433	0.7476	0.6399	0.9945	11.0999
	GPTQ W4G32Asym	0.6829	0.4923	0.6401	0.7486	0.6410	0.9963	11.0141
	GPTQ W4G128Sym	0.685	0.4907	0.6361	0.7443	0.6390	0.9932	11.1498
	GPTQ W4G32Sym	0.6911	0.4899	0.6448	0.7497	0.6439	1.0008	11.0927
facebook/opt-6.7b	FP32	0.6769	0.5049	0.6543	0.7628	0.6497	/	12.2862
	GPTQ W4G32Asym	0.6804	0.4984	0.6535	0.7568	0.6473	0.9962	12.4193
	GPTQ W4G32Sym	0.6885	0.4973	0.6433	0.753	0.6455	0.9935	12.4607
decapoda-research/llama-7b-hf	FP32	0.7361	0.5642	0.6709	0.7835	0.6887	/	9.4202
decapoda-research/llama-7b-hf	GPTQ W4G32Asym	0.7244	0.5603	0.6614	0.7835	0.6824	0.9909	9.5881
decapoda-research/llama-13b-hf	FP32	0.7627	0.5911	0.7009	0.7878	0.7106	/	8.212
	GPTQ W4G128Asym	0.7518	0.5843	0.6961	0.7911	0.7058	0.9932	8.4319
	GPTQ W4G32Asym	0.7572	0.5898	0.7056	0.7894	0.7105	0.9998	8.3429
	GPTQ W4G128Sym	0.7596	0.5841	0.6977	0.7905	0.7080	0.9963	8.4916
decapoda-research/llama-30b-hf	FP32	0.7759	0.6266	0.7277	0.8096	0.7350	/	6.2384
	GPTQ W4G128Asym	0.778	0.624	0.7269	0.8047	0.7334	0.9979	6.4237
	GPTQ W4G32Asym	0.7706	0.6239	0.7285	0.8058	0.7322	0.9963	6.4697
	GPTQ W4G128Sym	0.7836	0.6195	0.7269	0.8047	0.7337	0.9983	6.5604
meta-llama/Llama-2-7b-chat-hf	FP32	0.7058	0.5732	0.648	0.7715	0.6746	/	11.7107
	GPTQ W4G128Asym	0.6982	0.5637	0.6527	0.7704	0.6713	0.9950	11.9702
	GPTQ W4G32Asym	0.6953	0.5682	0.6575	0.7758	0.6742	0.9994	11.9317
meta-llama/Llama-2-7b-hf	FP32	0.7392	0.567	0.6709	0.7835	0.6902	/	8.7911
	GPTQ W4G32Asym	0.7353	0.5642	0.6622	0.7829	0.6862	0.9942	8.9635
	GPTQ W4G128Sym	0.7246	0.5617	0.6756	0.7797	0.6854	0.9931	9.2799
meta-llama/Llama-2-13b-chat-hf	FP32	0.7312	0.6059	0.7103	0.7835	0.7077	/	10.2213
	GPTQ W4G128Asym	0.7273	0.6018	0.7088	0.7742	0.7030	0.9934	2538.083
	GPTQ W4G32Asym	0.7283	0.6053	0.7024	0.7764	0.7031	0.9935	1889.374
	GPTQ W4G128Sym	0.727	0.5997	0.7024	0.778	0.7018	0.9916	2504.497
meta-llama/Llama-2-13b-hf	FP32	0.7677	0.5972	0.6961	0.7878	0.7122	/	7.8984
	GPTQ W4G128Asym	0.7627	0.5933	0.689	0.7851	0.7075	0.9934	1556.448
	GPTQ W4G32Asym	0.7675	0.5934	0.6977	0.7856	0.7111	0.9984	1514.927
	GPTQ W4G128Sym	0.7566	0.5899	0.7032	0.7856	0.7088	0.9953	1374.728
bigscience/bloom-7b1	FP32	0.5764	0.4628	0.6456	0.7269	0.6029	/	30.6438
bigscience/bloom-7b1	GPTQ W4G32Sym	0.5799	0.4542	0.6361	0.7312	0.6004	0.9957	32.0626
bigscience/bloomz-7b1	FP32	0.5593	0.4789	0.6527	0.7628	0.6134	/	51.7432
bigscience/bloomz-7b1	GPTQ W4G32Asym	0.5525	0.4731	0.6504	0.7617	0.6094	0.9935	52.7828
databricks/dolly-v1-6b	FP32	0.6866	0.5098	0.6433	0.7622	0.6505	/	11.3242
	GPTQ W4G128Asym	0.6878	0.5058	0.6393	0.7633	0.6491	0.9978	11.5514
	GPTQ W4G32Asym	0.6864	0.5084	0.6519	0.7568	0.6509	1.0006	11.4728
	GPTQ W4G128Sym	0.6876	0.5045	0.6433	0.7541	0.6474	0.9952	11.6474
databricks/dolly-v2-7b	FP32	0.6379	0.5282	0.614	0.7448	0.6312	/	16.161
databricks/dolly-v2-7b	GPTQ W4G32Asym	0.6377	0.5228	0.5991	0.7448	0.6261	0.9919	16.4096
EleutherAI/gpt-neo-2.7b	FP32	0.6224	0.4271	0.577	0.722	0.5871	/	13.9359
	GPTQ W4G128Asym	0.6123	0.4227	0.5738	0.7203	0.5823	0.9917	14.3377
	GPTQ W4G32Asym	0.615	0.4259	0.5714	0.7247	0.5843	0.9951	14.2083
	GPTQ W4G32Sym	0.6154	0.4208	0.5777	0.7198	0.5834	0.9937	14.3121
EleutherAI/gpt-neox-20b	FP32	0.7233	0.5359	0.6614	0.7753	0.6740	/	9.195
	GPTQ W4G128Asym	0.7186	0.5328	0.6535	0.7699	0.6687	0.9922	9.3463
	GPTQ W4G32Asym	0.7268	0.533	0.659	0.7715	0.6726	0.9979	9.2897
mosaicml/mpt-7b	FP32	0.7056	0.5718	0.6859	0.7927	0.6890	/	9.9324
mosaicml/mpt-7b	GPTQ W4G128Asym	0.7006	0.5655	0.6803	0.7965	0.6857	0.9952	10.1515
mosaicml/mpt-7b-chat	FP32	0.655	0.5752	0.6748	0.7845	0.6724	/	13.5951
mosaicml/mpt-7b-chat	GPTQ W4G128Asym	0.6472	0.5716	0.6685	0.784	0.6678	0.9932	13.8539
mosaicml/mpt-7b-instruct	FP32	0.6918	0.5819	0.678	0.7927	0.6861	/	10.8863
mosaicml/mpt-7b-instruct	GPTQ W4G128Asym	0.6864	0.5765	0.6827	0.7873	0.6832	0.9958	11.1451
mosaicml/mpt-7b-storywriter	FP32	0.693	0.5477	0.663	0.784	0.6719	/	9.9125
mosaicml/mpt-7b-storywriter	GPTQ W4G128Asym	0.6854	0.5443	0.6661	0.7813	0.6693	0.9961	10.1137
tiiuae/falcon-rw-7b	FP32	0.6604	0.5419	0.6598	0.7753	0.6594	/	11.7616
	GPTQ W4G128Asym	0.6484	0.5369	0.6575	0.7807	0.6559	0.9947	11.9411
	GPTQ W4G32Asym	0.6571	0.5398	0.6582	0.7764	0.6579	0.9978	11.8809
	GPTQ W4G128Sym	0.652	0.535	0.6575	0.7682	0.6532	0.9906	12.0048
tiiuae/falcon-7b-instruct	FP32	0.6437	0.5177	0.6669	0.7824	0.6527	/	14.5053
	GPTQ W4G128Asym	0.6301	0.5142	0.6654	0.7835	0.6483	0.9933	14.8146
	GPTQ W4G32Asym	0.6377	0.517	0.6598	0.7807	0.6488	0.9941	14.6953
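In this table, the Average column is the mean of the four task accuracies. The Accuracy Ratio divides the quantized average by the FP32 average. Wikitext word perplexity (lower is better) is reported separately and is not part of the average. This short Python check uses the EleutherAI/gpt-j-6b GPTQ W4G128Asym row and a hypothetical woq_summary helper.

```python
from statistics import fmean
from typing import Dict, Tuple

TASKS = ("lambada_openai", "hellaswag", "winogrande", "piqa")


def woq_summary(fp32: Dict[str, float], quantized: Dict[str, float]) -> Tuple[float, float]:
    """Return (quantized average accuracy, quantized/FP32 average ratio)."""
    fp32_avg = fmean(fp32[t] for t in TASKS)
    q_avg = fmean(quantized[t] for t in TASKS)
    return q_avg, q_avg / fp32_avg


# EleutherAI/gpt-j-6b: FP32 row and GPTQ W4G128Asym row.
fp32 = {"lambada_openai": 0.6831, "hellaswag": 0.4954, "winogrande": 0.6409, "piqa": 0.7541}
w4g128 = {"lambada_openai": 0.679, "hellaswag": 0.4895, "winogrande": 0.6433, "piqa": 0.7476}
avg, ratio = woq_summary(fp32, w4g128)
print(f"average={avg:.5f} ratio={ratio:.4f}")  # average=0.63985 ratio=0.9945
```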

ONNX Models with ONNX Runtime 1.15.1

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s4c14ins1bs (samples/sec)	FP32 Throughput 1s4c14ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
ResNet50 V1.5	qlinearops	72.16%	72.29%	-0.19%	1566.70	724.89	2.16x
ResNet50 V1.5	qdq	72.14%	72.29%	-0.22%	1567.15	716.57	2.19x
ResNet50 V1.5 MLPerf	qlinearops	76.11%	76.46%	-0.46%	1414.92	718.25	1.97x
ResNet50 V1.5 MLPerf	qdq	76.13%	76.46%	-0.44%	1459.45	721.54	2.02x
ResNet50 V1.5 (ONNX Model Zoo)	qlinearops	74.82%	74.99%	-0.22%	1593.71	753.89	2.11x
ResNet50 V1.5 (ONNX Model Zoo)	qdq	74.82%	74.99%	-0.23%	1582.24	752.38	2.10x
MobileNet V2	qlinearops	65.49%	66.89%	-2.09%	7139.93	4289.29	1.66x
MobileNet V2	qdq	65.49%	66.89%	-2.10%	7335.80	4080.31	1.80x
MobileNet V2 (ONNX Model Zoo)	qlinearops	68.38%	69.48%	-1.59%	7236.84	4299.29	1.68x
MobileNet V2 (ONNX Model Zoo)	qdq	68.38%	69.48%	-1.59%	6842.58	4496.44	1.52x
VGG16	qlinearops	66.56%	66.69%	-0.19%	591.43	178.91	3.31x
VGG16	qdq	66.59%	66.69%	-0.15%	614.91	183.79	3.35x
VGG16 (ONNX Model Zoo)	qlinearops	72.33%	72.40%	-0.09%	590.04	182.90	3.23x
VGG16 (ONNX Model Zoo)	qdq	72.33%	72.40%	-0.09%	614.75	179.93	3.42x
MobileNet V3 MLPerf	qlinearops	75.56%	75.74%	-0.24%	5703.81	2578.80	2.21x
MobileNet V3 MLPerf	qdq	75.56%	75.74%	-0.24%	5610.37	2603.41	2.16x
ShuffleNet V2 (ONNX Model Zoo)	qlinearops	66.09%	66.36%	-0.41%	6689.57	3690.63	1.81x
ShuffleNet V2 (ONNX Model Zoo)	qdq	66.09%	66.36%	-0.41%	5692.38	3758.23	1.51x
GoogleNet (ONNX Model Zoo)	qlinearops	67.71%	67.79%	-0.12%	1792.52	1111.26	1.61x
GoogleNet (ONNX Model Zoo)	qdq	67.73%	67.79%	-0.09%	1821.10	1104.52	1.65x
SqueezeNet (ONNX Model Zoo)	qlinearops	56.54%	56.87%	-0.57%	9472.72	5582.40	1.70x
SqueezeNet (ONNX Model Zoo)	qdq	56.54%	56.87%	-0.57%	9861.50	5566.72	1.77x
CaffeNet (ONNX Model Zoo)	qlinearops	56.21%	56.30%	-0.16%	3348.37	1141.01	2.93x
CaffeNet (ONNX Model Zoo)	qdq	56.25%	56.30%	-0.09%	3509.70	1142.19	3.07x
AlexNet (ONNX Model Zoo)	qlinearops	54.73%	54.79%	-0.10%	2426.58	987.34	2.46x
AlexNet (ONNX Model Zoo)	qdq	54.71%	54.79%	-0.14%	2208.63	1016.53	2.17x
ZFNet (ONNX Model Zoo)	qlinearops	55.84%	55.96%	-0.21%	930.06	532.61	1.75x
ZFNet (ONNX Model Zoo)	qdq	55.86%	55.96%	-0.18%	919.83	417.00	2.21x
Inception V1 (ONNX Model Zoo)	qlinearops	67.21%	67.24%	-0.05%	1880.94	1159.97	1.62x
Inception V1 (ONNX Model Zoo)	qdq	67.21%	67.24%	-0.05%	1798.96	1151.37	1.56x
EfficientNet (ONNX Model Zoo)	qlinearops	76.98%	77.11%	-0.17%	2890.97	1380.23	2.09x
EfficientNet (ONNX Model Zoo)	qdq	76.99%	77.11%	-0.16%	2548.20	1362.69	1.87x
DenseNet (ONNX Model Zoo)	qlinearops	60.53%	60.96%	-0.70%	657.12	507.94	1.29x
SSD (ONNX Model Zoo)	qlinearops	18.47%	18.98%	-2.69%	57.63	14.64	3.94x
SSD (ONNX Model Zoo)	qdq	18.62%	18.98%	-1.89%	56.96	14.58	3.91x
SSD MobileNet V1	qlinearops	22.44%	23.10%	-2.86%	1286.79	904.83	1.42x
SSD MobileNet V1	qdq	22.44%	23.10%	-2.86%	1121.02	856.82	1.31x
SSD MobileNet V1 (ONNX Model Zoo)	qlinearops	22.96%	23.02%	-0.27%	1098.80	829.55	1.32x
SSD MobileNet V1 (ONNX Model Zoo)	qdq	22.96%	23.02%	-0.27%	1044.34	790.39	1.32x
SSD MobileNet V2	qlinearops	23.87%	24.67%	-3.25%	849.89	627.62	1.35x
YOLOv3 (ONNX Model Zoo)	qlinearops	27.01%	28.73%	-5.99%	66.22	83.98	0.79x
YOLOv4 (ONNX Model Zoo)	qlinearops	32.30%	33.71%	-4.19%	70.87	66.16	1.07x
DUC (ONNX Model Zoo)	qlinearops	81.63%	81.92%	-0.36%	9.15	4.90	1.87x
Tiny YOLOv3 (ONNX Model Zoo)	qlinearops	11.74%	12.42%	-5.48%	1119.16	161.90	6.91x
Ultra Face (ONNX Model Zoo)	qlinearops	83.17%	83.65%	-0.57%	8537.50	1934.53	4.41x
Emotion FERPlus (ONNX Model Zoo)	qlinearops	7.97%	8.00%	-0.35%	3568.69	3121.38	1.14x
ArcFace (ONNX Model Zoo)	qlinearops	99.80%	99.80%	0.00%	494.07	244.21	2.02x
BERT base MRPC	qlinearops	85.54%	86.03%	-0.57%	398.76	226.09	1.76x
BERT base MRPC	qdq	85.54%	86.03%	-0.57%	392.94	223.06	1.76x
BERT base MRPC	integerops	85.29%	86.03%	-0.85%	473.72	223.12	2.12x
DistilBERT base MRPC	qdq	84.07%	84.56%	-0.58%	548.57	400.62	1.37x
DistilBERT base MRPC	integerops	85.54%	84.56%	1.16%	964.62	400.86	2.41x
Mobile bert MRPC	qdq	85.54%	86.28%	-0.85%	540.59	394.98	1.37x
Mobile bert MRPC	integerops	85.54%	86.28%	-0.85%	602.34	397.35	1.52x
Roberta base MRPC	integerops	90.93%	89.95%	1.09%	487.62	222.08	2.20x
BERT SQuAD (ONNX Model Zoo)	integerops	80.29	80.67	-0.47%	189.27	97.40	1.94x
MobileBERT SQuAD MLPerf (ONNX Model Zoo)	integerops	89.87	90.03	-0.17%	146.72	125.33	1.17x
BiDAF (ONNX Model Zoo)	integerops	65.93%	66.08%	-0.23%	2757.59	2277.14	1.21x
GPT2 lm head WikiText (ONNX Model Zoo)	integerops	31.98	29.00	10.31%	15.47	9.78	1.58x
BERT base cased MRPC (HuggingFace)	qlinearops	90.21%	90.42%	-0.23%	360.90	212.41	1.70x
BERT base uncased MRPC (HuggingFace)	integerops	89.58%	90.42%	-0.93%	484.68	212.34	2.28x
Roberta base MRPC (HuggingFace)	qlinearops	91.00%	91.38%	-0.41%	353.24	213.83	1.65x
Roberta base MRPC (HuggingFace)	integerops	90.85%	91.38%	-0.58%	490.42	212.57	2.31x
XLM Roberta base MRPC (HuggingFace)	qlinearops	89.37%	90.10%	-0.81%	304.10	214.51	1.42x
XLM Roberta base MRPC (HuggingFace)	integerops	89.66%	90.10%	-0.50%	347.25	214.13	1.62x
Camembert base MRPC (HuggingFace)	qlinearops	89.28%	89.28%	0.00%	272.62	216.98	1.26x
Camembert base MRPC (HuggingFace)	integerops	89.19%	89.28%	-0.10%	489.58	216.06	2.27x
MiniLM L12 H384 uncased MRPC (HuggingFace)	qlinearops	90.13%	90.97%	-0.93%	1054.31	585.78	1.80x
MiniLM L12 H384 uncased MRPC (HuggingFace)	integerops	91.07%	90.97%	0.10%	1072.47	590.03	1.82x
DistilBERT base uncased SST-2 (HuggingFace)	qlinearops	90.71%	91.06%	-0.38%	890.23	398.72	2.23x
DistilBERT base uncased SST-2 (HuggingFace)	integerops	90.25%	91.06%	-0.88%	746.66	397.78	1.88x
Albert base v2 SST-2 (HuggingFace)	qlinearops	92.09%	92.32%	-0.25%	268.37	211.96	1.27x
Albert base v2 SST-2 (HuggingFace)	integerops	91.74%	92.32%	-0.62%	265.65	212.21	1.25x
MiniLM L6 H384 uncased SST-2 (HuggingFace)	qlinearops	89.45%	90.14%	-0.76%	1958.82	1130.40	1.73x
MiniLM L6 H384 uncased SST-2 (HuggingFace)	integerops	89.91%	90.14%	-0.26%	2022.09	1130.14	1.79x
MiniLM L6 H384 uncased SST-2 (HuggingFace)	qlinearops	87.70%	88.29%	-0.67%	397.45	212.84	1.87x
MiniLM L6 H384 uncased SST-2 (HuggingFace)	integerops	88.19%	88.29%	-0.12%	489.19	213.14	2.30x
Electra small discriminator MRPC (HuggingFace)	qlinearops	89.92%	89.83%	0.09%	1797.98	1077.51	1.67x
Electra small discriminator MRPC (HuggingFace)	integerops	89.27%	89.83%	-0.63%	1930.55	1139.74	1.69x
BERT mini MRPC (HuggingFace)	qlinearops	86.21%	86.52%	-0.35%	5510.81	3334.89	1.65x
BERT mini MRPC (HuggingFace)	integerops	86.16%	86.52%	-0.41%	5627.19	3365.08	1.67x
Xlnet base cased MRPC (HuggingFace)	qlinearops	90.05%	89.86%	0.21%	108.83	92.24	1.18x
Xlnet base cased MRPC (HuggingFace)	integerops	89.58%	89.86%	-0.31%	110.83	90.80	1.22x
BART large MRPC (HuggingFace)	qlinearops	91.77%	91.20%	0.63%	59.18	51.49	1.15x
BART large MRPC (HuggingFace)	integerops	92.36%	91.20%	1.28%	96.38	51.47	1.87x
DeBERTa v3 base MRPC (HuggingFace)	qlinearops	91.85%	92.23%	-0.40%	163.17	146.13	1.12x
DeBERTa v3 base MRPC (HuggingFace)	integerops	92.39%	92.23%	0.17%	168.41	145.58	1.16x
Spanbert SQuAD (HuggingFace)	qlinearops	91.14	91.98	-0.91%	69.53	42.72	1.63x
Spanbert SQuAD (HuggingFace)	integerops	91.40	91.98	-0.63%	79.82	42.58	1.87x
Bert base multilingual cased SQuAD (HuggingFace)	qlinearops	88.42	89.13	-0.79%	70.47	42.73	1.65x
Bert base multilingual cased SQuAD (HuggingFace)	integerops	88.70	89.13	-0.48%	79.35	42.46	1.87x
DistilBert base uncased SQuAD (HuggingFace)	qlinearops	86.33	86.86	-0.62%	113.00	67.85	1.67x
DistilBert base uncased SQuAD (HuggingFace)	integerops	86.05	86.86	-0.94%	159.51	67.90	2.35x
BERT large uncased whole word masking SQuAD (HuggingFace)	qlinearops	92.34	93.16	-0.88%	24.64	12.75	1.93x
BERT large uncased whole word masking SQuAD (HuggingFace)	integerops	92.99	93.16	-0.18%	26.79	12.76	2.10x
Roberta large SQuAD v2 (HuggingFace)	qlinearops	89.03	89.02	0.02%	16.91	12.98	1.30x
Roberta large SQuAD v2 (HuggingFace)	integerops	89.04	89.02	0.02%	26.80	12.95	2.07x
GPT2 WikiText (HuggingFace)	qlinearops	30.25	29.00	4.33%	12.82	9.80	1.31x
GPT2 WikiText (HuggingFace)	integerops	29.68	29.00	2.36%	13.68	9.76	1.40x
DistilGPT2 WikiText (HuggingFace)	qlinearops	44.93	43.43	3.46%	20.66	16.78	1.23x
DistilGPT2 WikiText (HuggingFace)	integerops	44.62	43.43	2.74%	21.97	16.77	1.31x
LayoutLM FUNSD (HuggingFace)	qlinearops	78.15%	78.35%	-0.25%	59.50	42.98	1.38x
LayoutLM FUNSD (HuggingFace)	integerops	77.58%	78.35%	-0.98%	64.93	43.20	1.50x
LayoutLMv3 FUNSD (HuggingFace)	qlinearops	90.00%	90.49%	-0.54%	30.97	27.97	1.11x
LayoutLMv3 FUNSD (HuggingFace)	integerops	90.07%	90.49%	-0.46%	35.15	27.72	1.27x
LayoutLMv2 (HuggingFace)	qlinearops	81.36%	81.17%	0.23%	48.61	38.93	1.25x
LayoutLMv2 (HuggingFace)	integerops	80.86%	81.17%	-0.39%	45.52	36.10	1.26x
CodeBert (HuggingFace)	qlinearops	64.97%	65.41%	-0.67%	64.99	44.20	1.47x
CodeBert (HuggingFace)	integerops	64.93%	65.41%	-0.73%	77.99	43.63	1.79x

Model	Example	INT8 Accuracy	FP32 Accuracy	Accuracy Ratio [(INT8-FP32)/FP32]	INT8 Throughput 1s56c1ins1bs (samples/sec)	FP32 Throughput 1s56c1ins1bs (samples/sec)	Performance Ratio [INT8/FP32]
Faster R-CNN (ONNX Model Zoo)	qlinearops	34.06%	34.37%	-0.88%	4.15	3.47	1.20x
Faster R-CNN (ONNX Model Zoo)	qdq	33.98%	34.37%	-1.12%	4.19	3.49	1.20x
Mask R-CNN (ONNX Model Zoo)	qlinearops	33.13%	33.72%	-1.74%	3.46	3.02	1.15x
Mask R-CNN (ONNX Model Zoo)	qdq	33.29%	33.72%	-1.28%	3.46	3.02	1.15x
FCN (ONNX Model Zoo)	qlinearops	64.54%	64.98%	-0.67%	28.04	12.59	2.23x
FCN (ONNX Model Zoo)	qdq	64.54%	64.98%	-0.67%	28.22	12.67	2.23x
GPT-J-6B (HuggingFace)	qlinearops	78.46%	79.17%	-0.91%	1.74	0.66	2.62x
GPT-J-6B (HuggingFace)	integerops	78.93%	79.17%	-0.31%	1.68	0.67	2.52x

ONNX Models with ONNX Runtime 1.15.0 in WOQ Mode

Model name	Configuration	Lambada_openai Accuracy	Lambada_openai Perplexity	Accuracy Ratio [INT4/FP32]
meta-llama/Llama-2-7b-chat-hf	FP32	0.7058	3.2788	/
meta-llama/Llama-2-7b-chat-hf	GPTQ W4G32Asym	0.7002	3.4124	0.9921
meta-llama/Llama-2-7b-hf	FP32	0.7392	3.3950	/
meta-llama/Llama-2-7b-hf	GPTQ W4G32Asym	0.7312	3.5711	0.9892
meta-llama/Llama-2-13b-chat-hf	FP32	0.7312	2.9163	/
meta-llama/Llama-2-13b-chat-hf	GPTQ W4G128Asym	0.7240	2.9945	0.9902
meta-llama/Llama-2-13b-hf	FP32	0.7677	3.0438	/
	GPTQ W4G128Asym	0.7634	3.1186	0.9944
	GPTQ W4G32Asym	0.7615	3.1276	0.9919
meta-llama/Llama-2-70b-chat-hf	FP32	0.7543	2.6181	/
meta-llama/Llama-2-70b-chat-hf	RTN W4G32Asym	0.7518	2.6496	0.9967
meta-llama/Llama-2-70b-hf	FP32	0.7964	2.6612	/
meta-llama/Llama-2-70b-hf	RTN W4G32Sym	0.7941	2.7243	0.9971
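The configuration names appear to follow a compact convention:

- W4 means 4-bit weights.
- G32 or G128 is the number of consecutive weights that share one scale and zero point.
- Sym or Asym means symmetric or asymmetric quantization.

RTN (round-to-nearest) is the simplest such scheme. GPTQ additionally updates the remaining unquantized weights to compensate for rounding error, which is not shown here. The Python sketch below applies group-wise RTN to a flat weight list. It is illustrative only, not the Intel Neural Compressor implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def rtn_group(w: List[float], bits: int = 4, sym: bool = False) -> Tuple[List[int], float, int]:
    """Round-to-nearest quantization of one group; returns (codes, scale, zero_point)."""
    qmax = 2 ** bits - 1                 # unsigned codes: 0..15 for 4 bits
    if sym:
        zero_point = 2 ** (bits - 1)     # midpoint code, 8 for 4 bits
        scale = max(abs(x) for x in w) / (2 ** (bits - 1) - 1) or 1.0
    else:
        lo, hi = min(min(w), 0.0), max(max(w), 0.0)
        scale = (hi - lo) / qmax or 1.0
        zero_point = round(-lo / scale)
    codes = [min(max(round(x / scale) + zero_point, 0), qmax) for x in w]
    return codes, scale, zero_point


def rtn_fake_quant(weights: List[float], group_size: int = 32, bits: int = 4,
                   sym: bool = False) -> List[float]:
    """Quantize each group of `group_size` weights, then dequantize to expose the error."""
    out: List[float] = []
    for start in range(0, len(weights), group_size):
        codes, scale, zp = rtn_group(weights[start:start + group_size], bits, sym)
        out.extend((c - zp) * scale for c in codes)
    return out


# "W4G32Asym": 4-bit weights, groups of 32, asymmetric.
weights = [0.01 * i for i in range(-40, 24)]  # 64 weights -> 2 groups
deq = rtn_fake_quant(weights, group_size=32, bits=4, sym=False)
err = max(abs(a - b) for a, b in zip(weights, deq))  # at most half of the largest group scale
```

Smaller groups (G32 rather than G128) give each scale fewer weights to cover. The cost is storing more scales and zero points.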

Validated Pruning Examples

Model	Task / Dataset	Dense Accuracy / Sparse Accuracy	Relative Drop	Sparsity Ratio / Sparsity Pattern	Comments / Balanced or Unbalanced Ratio
Bert-Mini	question answering SQuAD-v1.1	f1=76.87 f1=76.2	-0.80%	80% structured 4x1	snip momentum unbalanced
Bert-Mini	question answering SQuAD-v1.1	f1=76.87 f1=77.62	+0.98%	50% structured 2:4	snip momentum balanced
Distilbert-base-uncased	question answering SQuAD-v1.1	f1=86.90 f1=86.15	-0.86%	80% structured 4x1	snip momentum unbalanced
Distilbert-base-uncased	question answering SQuAD-v1.1	f1=86.90 f1=87.50	+0.69%	50% structured 2:4	snip momentum balanced
Bert-base-uncased	question answering SQuAD-v1.1	f1=88.59 f1=87.78	-0.92%	80% structured 4x1	snip momentum unbalanced
Bert-base-uncased	question answering SQuAD-v1.1	f1=88.59 f1=89.40	+0.91%	50% structured 2:4	snip momentum balanced
Bert-large	question answering SQuAD-v1.1	f1=91.23 f1=90.91	-0.35%	80% structured 4x1	snip momentum unbalanced
Bert-large	question answering SQuAD-v1.1	f1=91.23 f1=91.67	+0.48%	50% structured 2:4	snip momentum balanced
Bert-Mini	text classification MRPC	f1=87.52 f1=87.22	-0.34%	90% structured 4x1	snip momentum unbalanced
Bert-Mini	text classification MRPC	f1=87.52 f1=87.33	-0.22%	90% structured 4x1	snip momentum balanced
Bert-Mini	text classification MRPC	f1=87.52 f1=86.89	-0.72%	50% structured 2:4	snip momentum balanced
Bert-Mini	text classification MRPC	f1=87.52 f1=86.8	-0.83%	60% structured per channel	snip momentum unbalanced
Distilbert-base-uncased	text classification MRPC	f1=90.26 f1=89.85	-0.46%	90% structured 4x1	snip momentum unbalanced
Distilbert-base-uncased	text classification MRPC	f1=90.26 f1=90.88	+0.69%	50% structured 2:4	snip momentum balanced
Bert-Mini	text classification SST-2	accuracy=87.61 accuracy=86.92	-0.79%	90% structured 4x1	snip momentum unbalanced
Bert-Mini	text classification SST-2	accuracy=87.61 accuracy=87.73	+0.14%	50% structured 2:4	snip momentum balanced
Bert-Mini	text classification SST-2	accuracy=87.61 accuracy=86.92	-0.79%	50% structured per channel	snip momentum unbalanced
ResNet50	image recognition ImageNet	top1 acc = 78.95 top1 acc = 80.10	-1.43%	75% structured 2x1	snip momentum unbalanced
YOLO-v5s6	object detection COCO	AP0.50:0.95/AP0.50=0.404/0.6 AP0.50:0.95/AP0.50=0.393/0.584	-2.72%	80% unstructured	snip momentum unbalanced
Bert-Large	question answering SQuAD-v1.1	f1=91.34 f1=90.7	-0.70%	80% structured 2x1	group lasso unbalanced
Bert-Base	text classification MNLI	[m, mm] = [84.57, 84.79] [m, mm] = [82.45, 83.27]	[-2.51%, -1.80%]	70% unstructured	Prune once for all balanced
Bert-Base	text classification MNLI	[m, mm] = [84.57, 84.79] [m, mm] = [83.20, 84.11]	[-1.62%, -0.80%]	50% structured 1:2	Prune once for all balanced
Bert-Base	text classification SST-2	accuracy = 92.32 accuracy = 91.51	-0.88%	70% unstructured	Prune once for all balanced
Bert-Base	text classification SST-2	accuracy = 92.32 accuracy = 92.20	-0.13%	50% structured 1:2	Prune once for all balanced
Bert-Base	text classification SST-2	accuracy = 92.32 accuracy = 91.97	-0.38%	20% unstructured	gradient sensitivity balanced
Bert-Base	text classification QQP	[accuracy, f1] = [91.10, 88.05] [accuracy, f1] = [90.48, 87.06]	[-0.68%, -1.12%]	70% unstructured	Prune once for all balanced
Bert-Base	text classification QQP	[accuracy, f1] = [91.10, 88.05] [accuracy, f1] = [90.92, 87.78]	[-0.20%, -0.31%]	50% structured 1:2	Prune once for all balanced
Bert-Base	text classification QNLI	accuracy = 91.54 accuracy = 90.39	-1.26%	70% unstructured	Prune once for all balanced
Bert-Base	text classification QNLI	accuracy = 91.54 accuracy = 90.87	-0.73%	50% structured 1:2	Prune once for all balanced
Bert-Base	question answering	[em, f1] = [79.34, 87.10] [em, f1] = [77.27, 85.75]	[-2.61%, -1.54%]	70% unstructured	Prune once for all balanced
Bert-Base	question answering	[em, f1] = [79.34, 87.10] [em, f1] = [78.03, 86.50]	[-1.65%, -0.69%]	50% structured 1:2	Prune once for all balanced
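The structured 2:4 pattern allows at most 2 non-zero weights in every block of 4 consecutive weights. The 1:2 pattern has the same 50% density. The minimal Python sketch below builds such a pattern by keeping the largest-magnitude weights in each block. The magnitude criterion is an assumption for illustration. The methods in the table (snip momentum, group lasso, Prune Once for All, gradient sensitivity) use their own importance scores and are normally combined with training.

```python
from typing import List


def nm_prune(weights: List[float], n: int = 2, m: int = 4) -> List[float]:
    """Zero all but the n largest-magnitude weights in each block of m consecutive weights."""
    if len(weights) % m:
        raise ValueError("the number of weights must be a multiple of m")
    pruned: List[float] = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # Positions of the n largest |w| values inside this block.
        keep = set(sorted(range(m), key=lambda i: abs(block[i]), reverse=True)[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
print(nm_prune(row))  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

The fixed per-block budget is what makes N:M sparsity "balanced": every block carries the same number of non-zeros. Patterns such as 4x1 or per-channel pruning at a global ratio may leave different rows with different densities.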

Validated Knowledge Distillation Examples

Example Name	Dataset	Student (Metrics)	Teacher (Metrics)	Student With Distillation (Metrics Improvement)	Student With Distributed Distillation (Metrics Improvement)
MobileNet example	CIFAR-10	MobileNetV2-0.35 (0.7965 ACC)	WideResNet40-2 (0.9522 ACC)	0.8178 ACC (0.0213 ACC)	0.8235 ACC (0.027 ACC)
CNN example	CIFAR-100	CNN-2 (0.5494 ACC)	CNN-10 (0.7153 ACC)	0.5540 ACC (0.0046 ACC)	0.5523 ACC (0.0029 ACC)
VGG example	CIFAR-100	VGG-8-BN (0.7022 ACC)	VGG-13-BN (0.7415 ACC)	0.7025 ACC (0.0003 ACC)	NA
ResNet example	ImageNet	ResNet18 (0.6739 ACC)	ResNet50 (0.7399 ACC)	0.6845 ACC (0.0106 ACC)	NA
BlendCnn example	MRPC	BlendCnn (0.7034 ACC)	BERT-Base (0.8382 ACC)	0.7034 ACC (0 ACC)	NA
BiLSTM example	SST-2	BiLSTM (0.8314 ACC)	RoBERTa-Base (0.9403 ACC)	0.9048 ACC (0.0734 ACC)	NA
DistilBERT example	SQuAD	DistilBERT (0.7323/0.8256 EM/F1)	BERT-Base (0.8084/0.8814 EM/F1)	0.7442/0.8371 EM/F1 (0.0119/0.0115 EM/F1)	NA
TinyBERT example	MNLI	TinyBERT (0.8018/0.8044 m/mm)	BERT-Base (0.8363/0.8411 m/mm)	0.8025/0.8074 m/mm (0.0007/0.0030 m/mm)	NA
BERT-3 example	QQP	BERT-3 (0.8626/0.8213 ACC/F1)	BERT-Base (0.9091/0.8782 ACC/F1)	0.8684/0.8259 ACC/F1 (0.0058/0.0046 ACC/F1)	NA
DistilRoBERTa example	COLA	DistilRoBERTa (0.6057 ACC)	RoBERTa-Large (0.6455 ACC)	0.6187 ACC (0.0130 ACC)	NA
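Knowledge distillation trains the student to mimic the teacher's output distribution, usually alongside the ordinary label loss. One common formulation is the soft-target loss of Hinton et al. It mixes a temperature-scaled KL divergence with the usual cross-entropy. The Python sketch below computes it for a single example.

The temperature and alpha defaults are illustrative assumptions. Individual examples in the table may use different losses; TinyBERT, for instance, also distills intermediate layers.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student: List[float], teacher: List[float], label: int,
                      temperature: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher, temperature)
    p_student = softmax(student, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student))
    hard_ce = -math.log(softmax(student)[label])
    # T^2 keeps the soft-target gradients on a scale comparable to the hard-label term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce


print(distillation_loss([1.0, 0.5, -0.5], [2.0, 0.0, -1.0], label=0))
```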

Validated ONNX QDQ INT8 Models on Multiple Hardware through ONNX Runtime

Model (ONNX QDQ)	AWS c6i.2xlarge (Intel) CPU Execution Provider	AWS c6a.2xlarge (AMD) CPU Execution Provider	AWS c6g.2xlarge (ARM) CPU Execution Provider	NVIDIA A100 CUDA Execution Provider
ResNet50	74.76%	68.95%	74.76%	74.75%
BERT-base	85.54%	84.56%	85.54%	84.31%
ResNet50 V1.5	72.20%	67.70%	72.20%	72.29%
MobileNet V2	65.82%	58.56%	65.83%	65.63%
SSD MobileNet V1	22.45%	16.53%	22.45%	22.35%
DistilBERT base MRPC	84.56%	83.82%	84.56%	84.56%
SqueezeNet	56.54%	53.52%	56.54%	56.55%
SSD	18.63%	18.54%	18.63%	18.61%
AlexNet	54.71%	47.06%	54.71%	54.79%
CaffeNet	56.25%	52.35%	56.27%	56.24%
GoogleNet	67.73%	63.56%	67.72%	67.76%
ZFNet	55.86%	45.09%	55.86%	55.89%
Inception V1	67.21%	63.03%	67.20%	67.21%
SSD MobileNet V1 (ONNX Model Zoo)	22.86%	16.94%	22.80%	22.87%
Mobile bert MRPC	85.54%	84.56%	85.54%	85.54%
Roberta base MRPC	89.46%	90.44%	89.71%	89.71%
ResNet50 V1.5 MLPerf	76.14%	72.80%	76.14%	76.17%
VGG16	66.69%	64.25%	66.69%	66.64%
VGG16 (ONNX Model Zoo)	72.31%	69.35%	72.32%	72.34%
MobileNet V3 MLPerf	75.57%	70.78%	75.56%	75.52%
EfficientNet	77.61%	76.52%	77.56%	77.60%
MobileNet V2 (ONNX Model Zoo)	68.51%	62.48%	68.58%	68.48%
ShuffleNet V2	66.12%	58.41%	66.11%	66.11%
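A QDQ model expresses quantization with the standard QuantizeLinear and DequantizeLinear operators. Presumably this is what lets one INT8 model file be evaluated on each Execution Provider in the table above. The Python sketch below mirrors the element-wise formulas from the ONNX operator specification for per-tensor 8-bit quantization: round half to even, then saturate. It is a reference illustration, not ONNX Runtime code, and the function names are hypothetical.

```python
from typing import List


def quantize_linear(x: List[float], scale: float, zero_point: int,
                    signed: bool = False) -> List[int]:
    """Per-tensor QuantizeLinear: saturate(round(x / scale) + zero_point) to int8/uint8."""
    qmin, qmax = (-128, 127) if signed else (0, 255)
    # Python's round() rounds half to even, the rounding mode the ONNX spec requires.
    return [min(max(round(v / scale) + zero_point, qmin), qmax) for v in x]


def dequantize_linear(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Per-tensor DequantizeLinear: (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]


q = quantize_linear([-1.0, 0.0, 0.25, 0.75, 100.0], scale=0.5, zero_point=128)
print(q)                               # [126, 128, 128, 130, 255]
print(dequantize_linear(q, 0.5, 128))  # [-1.0, 0.0, 0.0, 1.0, 63.5]
```

The inputs 0.25 and 0.75 land exactly on ties (0.5 and 1.5 after scaling) and round to 0 and 2 respectively. The input 100.0 saturates at 255, which is why its round trip returns 63.5.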
