Introduction To Building and Refitting Weight-stripped Engines from ONNX Models

Table Of Contents

- Description
- How does this sample work?
- Prerequisites
- Running the sample
- Sample --help options
- Additional resources
- License
- Changelog
- Known issues

Description

This sample, sample_weight_stripping, is a Python sample that uses TensorRT to build a weight-stripped engine and later refits it into a fully weighted engine for inference.

How does this sample work?

This sample demonstrates how to build a weight-stripped engine from an ONNX model file using the TensorRT Python API, which greatly reduces the size of the saved engine. The weight-stripped engine is later refitted by the ONNX parser refitter, with the original ONNX model as input. The refitted, fully weighted engine is then used for inference with no loss in performance or accuracy. This sample uses ResNet-50 to showcase these features. The build step is sketched below.
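The following is a minimal sketch of how such a build can look with the TensorRT Python API, assuming TensorRT 10.x (where trt.BuilderFlag.STRIP_PLAN is available); the ONNX path and output file names are illustrative, and the bundled build_engines.py script remains the authoritative implementation.

    # Minimal sketch: build a weight-stripped engine and a normal engine
    # from the same ONNX model. Assumes TensorRT 10.x Python bindings;
    # file names below are illustrative.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)

    def build_serialized_engine(onnx_path, strip_weights):
        builder = trt.Builder(logger)
        network = builder.create_network(0)  # explicit-batch network
        parser = trt.OnnxParser(network, logger)
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError(parser.get_error(0))

        config = builder.create_builder_config()
        if strip_weights:
            # Omit refittable weights from the serialized plan so the saved
            # engine stays small; the weights are supplied again at refit time.
            config.set_flag(trt.BuilderFlag.STRIP_PLAN)
            config.set_flag(trt.BuilderFlag.REFIT_IDENTICAL)
        return builder.build_serialized_network(network, config)

    # Weight-stripped plan (small) vs. normal plan (contains all weights).
    with open("stripped_engine.trt", "wb") as f:
        f.write(build_serialized_engine("ResNet50.onnx", strip_weights=True))
    with open("normal_engine.trt", "wb") as f:
        f.write(build_serialized_engine("ResNet50.onnx", strip_weights=False))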

Prerequisites

  1. Install the dependencies for Python.

    pip3 install -r requirements.txt

Running the sample

  1. Build and save both the normal engine and the weight-stripped engine:

    python3 build_engines.py --output_stripped_engine=stripped_engine.trt --output_normal_engine=normal_engine.trt
    

    After running this step, two TensorRT engines are saved: stripped_engine.trt contains the weight-stripped engine (~2.3MB) and normal_engine.trt contains the normal engine with all weights included (~51MB). Building a weight-stripped engine greatly reduces the size of the saved engine file.

    Note: If the TensorRT sample data is not installed in the default location, for example /usr/src/tensorrt/data/, the model path must be specified. For example, --stripped_onnx=/path/to/my/data/ sets the model path for building the weight-stripped engine and --original_onnx=/path/to/my/data/ sets the model path for building the normal engine. In most cases, both can point to the same ONNX model.

  2. Refit the weight-stripped engine and perform inference with both the refitted engine and the normal engine (a minimal sketch of the refit step follows these steps):

    python3 refit_engine_and_infer.py --stripped_engine=stripped_engine.trt --normal_engine=normal_engine.trt
    
  3. Verify that the sample ran successfully. If so, you should see output similar to the following. The prediction results of the refitted stripped engine are the same as those of the normal engine, and there is no performance loss.

    Normal engine inference time on 100 cases: 0.1066 seconds
    Refitted stripped engine inference time on 100 cases: 0.0606 seconds
    Normal engine correctly recognized data/samples/resnet50/tabby_tiger_cat.jpg as tiger cat
    Refitted stripped engine correctly recognized data/samples/resnet50/tabby_tiger_cat.jpg as tiger cat
    
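As referenced in step 2, below is a minimal sketch of what the refit step inside refit_engine_and_infer.py can look like, assuming TensorRT 10.x (where the ONNX parser refitter, trt.OnnxParserRefitter, is available); file names are illustrative.

    # Minimal sketch: refit a weight-stripped engine from the original
    # ONNX model. Assumes TensorRT 10.x Python bindings; names are illustrative.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    # Deserialize the weight-stripped engine that was saved earlier.
    with open("stripped_engine.trt", "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    # Pull the missing weights back in from the original ONNX model.
    refitter = trt.Refitter(engine, logger)
    parser_refitter = trt.OnnxParserRefitter(refitter, logger)
    if not parser_refitter.refit_from_file("ResNet50.onnx"):
        raise RuntimeError("Failed to load weights from the ONNX model")
    if not refitter.refit_cuda_engine():
        raise RuntimeError("Failed to refit the engine")

    # The engine now holds all weights again; create an execution context
    # and run inference exactly as with the normal engine.
    context = engine.create_execution_context()

Because the refitted engine ends up with the same structure and weights as the normally built engine, its inference results and performance match the normal engine.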

Sample --help options

To see the full list of available options and their descriptions, use the -h or --help command line option.

Additional resources

The following resources provide a deeper understanding of importing a model into TensorRT using Python:

ResNet-50

Documentation

License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

Changelog

February 2024

Initial release of this sample.

Known issues

There are no known issues in this sample.