In this guide, weight data will be dumped using Neural Insights, with the PyTorch GPT-J-6B model as an example.
First, you need to install Intel® Neural Compressor:
```shell
# Install Neural Compressor
git clone https://github.com/intel/neural-compressor.git
cd neural-compressor
pip install -r requirements.txt
python setup.py install

# Install Neural Insights
pip install -r neural_insights/requirements.txt
python setup.py install neural_insights
```
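Before moving on, it can help to confirm that both packages import cleanly. This is a quick sanity check, not part of the official steps; the `neural_insights` module name is assumed from the repository layout above:

```python
# Sanity check after installation: both imports should succeed.
import neural_compressor
import neural_insights  # module name assumed from the neural_insights/ directory

print("Neural Compressor:", neural_compressor.__version__)
```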
Then change into the GPT-J example directory and install its requirements:

```shell
cd examples/pytorch/nlp/huggingface_models/language-modeling/quantization/ptq_static/fx
pip install -r requirements.txt
```
Before applying quantization, modify some code in the `run_clm.py` file to enable Neural Insights:
- Set the `diagnosis` argument to `True` in `PostTrainingQuantConfig` so that Neural Insights will dump the weights of quantizable ops in this model (see the sketch after this list for how the config is consumed):
  ```python
  from neural_compressor import PostTrainingQuantConfig

  conf = PostTrainingQuantConfig(
      accuracy_criterion=accuracy_criterion,
      diagnosis=True,
  )
  ```
- Quantize the model with the following command:
  ```shell
  python run_clm.py \
      --model_name_or_path EleutherAI/gpt-j-6B \
      --dataset_name wikitext \
      --dataset_config_name wikitext-2-raw-v1 \
      --do_train \
      --do_eval \
      --tune \
      --output_dir saved_results
  ```
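For orientation, here is a minimal, self-contained sketch of how such a config is consumed by Neural Compressor's tuning entry point, `neural_compressor.quantization.fit`. It follows the library's quick-start pattern with a toy model and dummy calibration data; it is an illustration of the API, not the actual code in `run_clm.py`:

```python
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.quantization import fit

# Toy FP32 model standing in for GPT-J; any quantizable torch.nn.Module works.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())

# Dummy calibration data standing in for the wikitext dataloader in run_clm.py.
dataset = Datasets("pytorch")["dummy"](shape=(1, 3, 224, 224))
calib_dataloader = DataLoader(framework="pytorch", dataset=dataset)

# diagnosis=True triggers the Neural Insights weight dumps during tuning.
conf = PostTrainingQuantConfig(diagnosis=True)

q_model = fit(model=model, conf=conf, calib_dataloader=calib_dataloader)
q_model.save("saved_results")
```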
Results will be dumped into the `nc_workspace` directory with a structure similar to the following:
```
├── history.snapshot
├── input_model.pt
├── inspect_saved
│   ├── fp32
│   │   └── inspect_result.pkl
│   └── quan
│       └── inspect_result.pkl
├── model_summary.txt
└── weights_table.csv
```
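If you want a quick look at the dumped data without launching the Neural Insights UI, the files can be opened directly. The sketch below is a best-effort assumption: it treats `weights_table.csv` as a plain CSV and `inspect_result.pkl` as a standard pickle; their exact schemas are defined by Neural Insights, so explore the loaded objects rather than relying on specific keys:

```python
import pickle
import pandas as pd

# Weights summary table (assumed to be a plain CSV).
weights = pd.read_csv("nc_workspace/weights_table.csv")
print(weights.head())

# FP32 vs. quantized inspection dumps (assumed to be standard pickles;
# the object layout is Neural Insights-specific, so inspect it interactively).
for run in ("fp32", "quan"):
    with open(f"nc_workspace/inspect_saved/{run}/inspect_result.pkl", "rb") as f:
        result = pickle.load(f)
    print(run, type(result))
```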