- Diagnosis Introduction
- Supported Feature Matrix
- Get Started
- Example
- Step by Step Diagnosis Example with TensorFlow
- Step by Step Diagnosis Example with ONNXRT
The diagnosis feature provides methods to debug accuracy loss during quantization and to profile the performance gap during benchmarking. There are two ways to diagnose a model with Intel® Neural Compressor: the non-GUI mode described below, and the GUI mode provided by the Neural Insights component.
The workflow is as follows: first, configure the script with diagnosis enabled; then run it and check the diagnosis information in the terminal; finally, verify whether the result is satisfactory and repeat the steps if needed.
| Types | Diagnosis data | Framework | Backend |
|---|---|---|---|
| Post-Training Static Quantization (PTQ) | weights and activations | TensorFlow | TensorFlow/Intel TensorFlow |
| | | ONNX Runtime | QLinearops/QDQ |
| Benchmark Profiling | OP execute duration | TensorFlow | TensorFlow/Intel TensorFlow |
| | | ONNX Runtime | QLinearops/QDQ |
First, install Intel® Neural Compressor:

```bash
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install
```
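To verify the installation, you can print the installed version (a quick sanity check; this assumes the package exposes the usual `__version__` attribute):

```python
import neural_compressor

# Print the installed Neural Compressor version.
print(neural_compressor.__version__)
```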
Modify the quantization/benchmark script to run diagnosis by adding the argument `diagnosis=True` to `PostTrainingQuantConfig`/`BenchmarkConfig`, as shown below.
```python
config = PostTrainingQuantConfig(diagnosis=True, ...)
config = BenchmarkConfig(diagnosis=True, ...)
```
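For the benchmark path, a minimal sketch of a diagnosis-enabled run could look like the following; `model` and `dataloader` are placeholders for your own objects, and the `fit`/`BenchmarkConfig` usage follows the 2.x Python API, so verify it against your installed version:

```python
from neural_compressor.benchmark import fit
from neural_compressor.config import BenchmarkConfig

# Enable diagnosis while benchmarking; the warmup/iteration values are illustrative.
config = BenchmarkConfig(diagnosis=True, warmup=10, iteration=100)

# `model` and `dataloader` stand in for your own model and data loader.
fit(model, config, b_dataloader=dataloader)
```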
The following explains how to run diagnosis for the ONNX ResNet50 model.
Download the ImageNet ILSVRC2012 validation dataset.
Download the labels:

```bash
wget http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
tar -xvzf caffe_ilsvrc12.tar.gz val.txt
```
Then execute the script that uses the quantization API in another terminal, passing the `--diagnose` flag.
```bash
python examples/onnxrt/image_recognition/resnet50_torchvision/quantization/ptq_static/main.py \
    --model_path=/path/to/resnet50_v1.onnx/ \
    --dataset_location=/path/to/ImageNet/ \
    --label_path=/path/to/val.txt/ \
    --tune \
    --diagnose
```
To run profiling, execute the script with the parameters shown in the command below.
```bash
python examples/onnxrt/image_recognition/resnet50_torchvision/quantization/ptq_static/main.py \
    --model_path=/path/to/resnet50_v1.onnx/ \
    --dataset_location=/path/to/ImageNet/ \
    --label_path=/path/to/val.txt/ \
    --mode=performance \
    --benchmark \
    --diagnose
```
After the script finishes, the results appear in your terminal. The activations summary is a table with the OP name, MSE (mean squared error), and activation minimum and maximum, sorted by MSE.
The weights summary table reports statistics such as minimum, maximum, mean, standard deviation, and variance for the input model, also sorted by MSE.
Neural Compressor diagnosis mode provides weights and activation data that includes several useful metrics for diagnosing potential losses of model accuracy.
The data is presented in the terminal as a table where each row describes a single OP in the model. The following measures are reported:
- MSE (Mean Squared Error) - measures how large the difference is between the weights of the input model and the optimized model for a specific OP
- Input model min - minimum value of the input OP tensor data
- Input model max - maximum value of the input OP tensor data
- Input model mean - mean value of the input OP tensor data
- Input model standard deviation - standard deviation of the input OP tensor data
- Input model variance - variance of the input OP tensor data
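For reference, these statistics follow their standard definitions (the notation below is ours, not from the tool's output: $x_i$ are the input model's tensor values, $y_i$ the optimized model's values, and $n$ the number of elements; the tool's exact normalization of MSE may differ):

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2,\qquad
\mu = \frac{1}{n}\sum_{i=1}^{n}x_i,\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2
$$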
Use this data as follows:
- Check the nodes in MSE order. A high MSE usually means a higher chance that accuracy loss happened during quantization, so falling back those OPs may recover some accuracy.
- Check the min-max data range. A dispersed data range usually means higher accuracy loss, so you can also try to fall back those OPs.
- Check the remaining statistics for outliers, try falling back some OPs, and test the quantization accuracy, as in the fallback example below.
Note: These debug rules are only a reference and cannot always be trusted; sometimes an accuracy regression is hard to explain.
For example, to fall back a suspicious OP to FP32, pass an `op_name_dict` entry for it to `PostTrainingQuantConfig`:

```python
from neural_compressor import quantization, PostTrainingQuantConfig

# Keep the activations of the suspicious OP in FP32 instead of quantizing them.
op_name_dict = {"v0/cg/conv0/conv2d/Conv2D": {"activation": {"dtype": ["fp32"]}}}

config = PostTrainingQuantConfig(
    diagnosis=True,
    op_name_dict=op_name_dict,
)
# `model`, `dataloader`, and `eval` come from the surrounding script.
q_model = quantization.fit(
    model,
    config,
    calib_dataloader=dataloader,
    eval_func=eval,
)
```
The profiling section contains a table with nodes sorted by total execution time, so you can check which operations take the most time.
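As a generic illustration of how such a ranking is built (a minimal sketch with made-up numbers, not Neural Compressor's actual profiling code), per-OP durations can be accumulated and sorted in descending order:

```python
from collections import defaultdict

# Hypothetical per-call measurements: (op_name, duration in ms).
measurements = [("Conv", 1.2), ("Relu", 0.1), ("Conv", 1.5), ("MatMul", 0.9)]

# Accumulate the total execution time per OP.
totals = defaultdict(float)
for op, duration in measurements:
    totals[op] += duration

# Print OPs sorted by total execution time, longest first.
for op, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op}: {total:.1f} ms")
```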