This feature automatically enables quantization on deep learning models and evaluates the result for the best performance. It is a code-free solution: quantization algorithms are enabled on the model with no manual coding needed. Supported optimization features:
- Post-Training Static Quantization for Stock PyTorch (with FX backend)
- Post-Training Static Quantization for IPEX
- Post-Training Dynamic Quantization for Stock PyTorch (see the sketch after this list)
- Mixed Precision for Stock PyTorch (see the autocast sketch further below)
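For reference, the kind of code this feature inserts automatically can also be written by hand. Below is a minimal sketch of Post-Training Dynamic Quantization on stock PyTorch using PyTorch's built-in `quantize_dynamic` API; the toy model and shapes are illustrative, not taken from this project:

```python
import torch

# Toy FP32 model standing in for any PyTorch module (illustrative only).
model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

# Post-Training Dynamic Quantization: weights are converted to int8 ahead
# of time; activations are quantized on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},   # module types to quantize
    dtype=torch.qint8,   # quantized weight dtype
)

# Inference runs as before, now with dynamically quantized Linear layers.
x = torch.randn(1, 128)
with torch.no_grad():
    y = quantized_model(x)
```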
Supported model types:
- HuggingFace Transformers models
- torchvision models
- Broad models (under development)
Usage options:
- PyPI distribution with a one-line API call
- JupyterLab extension
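Before the end-to-end examples, here is a corresponding reference sketch for the Mixed Precision feature on stock PyTorch, using PyTorch's `autocast` context manager; the toy model is illustrative, and CPU bfloat16 autocast availability depends on the PyTorch version and hardware:

```python
import torch

# Toy FP32 model and input (illustrative only).
model = torch.nn.Linear(128, 10).eval()
x = torch.randn(1, 128)

# Mixed Precision: under autocast, eligible ops run in bfloat16 while
# numerically sensitive ops stay in float32.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)
```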
Example for HuggingFace Transformers models (text-classification/run_glue.py):
```python
from neural_coder import auto_quant

auto_quant(
    code="https://github.com/huggingface/transformers/blob/v4.21-release/examples/pytorch/text-classification/run_glue.py",
    args="--model_name_or_path albert-base-v2 --task_name sst2 --do_eval --output_dir result",
)
```
Example for torchvision models (imagenet/main.py):
```python
from neural_coder import auto_quant

auto_quant(
    code="https://github.com/pytorch/examples/blob/main/imagenet/main.py",
    args="-a alexnet --pretrained -e /path/to/imagenet/",
)
```
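The same one-line call should also work on a script on disk; a hedged sketch, assuming the `code` parameter accepts local file paths as well as URLs (the path and arguments here are hypothetical placeholders, so verify against your installed version):

```python
from neural_coder import auto_quant

# Hypothetical local script; the `code` argument is assumed to accept a
# local file path as well as a URL (may vary by version).
auto_quant(
    code="path/to/my_script.py",
    args="--some-flag value",  # hypothetical arguments for the script
)
```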