Introduction To Building and Refitting Weight-stripped Engines from ONNX Models

Table Of Contents

Description
How does this sample work?
Prerequisites
Running the sample
- Sample --help options
Additional resources
License
Changelog
Known issues

Description

This sample, sample_weight_stripping, is a Python sample that uses TensorRT to build a weight-stripped engine and later refits it into a full engine for inference.

How does this sample work?

This sample demonstrates how to build a weight-stripped engine from an ONNX model file using the TensorRT Python API, which reduces the size of the saved engine. The weight-stripped engine is later refitted by the parser refitter, with the original ONNX model as input. The refitted full engine is then used for inference, with no loss in performance or accuracy. This sample uses ResNet50 to showcase these features.
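The two phases described above can be sketched with the TensorRT Python API roughly as follows. This is a minimal sketch under stated assumptions, not the code in build_engines.py or refit_engine_and_infer.py: the function names and path arguments are hypothetical, and it assumes a TensorRT release that provides `BuilderFlag.STRIP_PLAN` and `OnnxParserRefitter`.

```python
def build_engine(onnx_path, engine_path, stripped):
    """Build an engine from an ONNX file and save it (hypothetical helper).

    With stripped=True the builder is asked to leave refittable weights
    out of the serialized plan.
    """
    import tensorrt as trt  # imported lazily; requires TensorRT to be installed

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [parser.get_error(i).desc() for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    if stripped:
        config.set_flag(trt.BuilderFlag.STRIP_PLAN)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)


def refit_engine(stripped_engine_path, onnx_path):
    """Load a weight-stripped engine and refit it from the ONNX model (hypothetical helper)."""
    import tensorrt as trt  # imported lazily; requires TensorRT to be installed

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(stripped_engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())

    refitter = trt.Refitter(engine, logger)
    parser_refitter = trt.OnnxParserRefitter(refitter, logger)
    # Read the weights from the ONNX file, then write them into the engine.
    if not parser_refitter.refit_from_file(onnx_path):
        raise RuntimeError("could not read weights from the ONNX model")
    if not refitter.refit_cuda_engine():
        raise RuntimeError("refit failed")
    return engine
```

The import sits inside each function so that the module can be loaded on a machine without TensorRT; the sample scripts themselves may be organized differently.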

Prerequisites

Install the dependencies for Python.
```
pip3 install -r requirements.txt
```

Running the sample

Build and save both the normal engine and the weight-stripped engine:
```
python3 build_engines.py --output_stripped_engine=stripped_engine.trt --output_normal_engine=normal_engine.trt
```
After this step, two TensorRT engines are saved: stripped_engine.trt contains the weight-stripped engine (~2.3MB) and normal_engine.trt contains a normal engine with all weights included (~51MB). Building a stripped engine greatly reduces the size of the saved engine file.
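To put a number on the size difference, the following sketch computes the fraction of the file size saved. The helper names are hypothetical; the figures passed in the example are the approximate sizes quoted above, so the result is only indicative.

```python
import os


def size_reduction(stripped_bytes, normal_bytes):
    """Return the fraction of the normal engine size saved by stripping."""
    if normal_bytes <= 0:
        raise ValueError("normal engine size must be positive")
    return 1.0 - stripped_bytes / normal_bytes


def engine_size_reduction(stripped_path, normal_path):
    """Same calculation, reading the sizes of two saved engine files."""
    return size_reduction(os.path.getsize(stripped_path),
                          os.path.getsize(normal_path))


# Approximate sizes from the text: ~2.3 MB stripped, ~51 MB normal.
print(f"{size_reduction(2.3e6, 51e6):.1%}")  # prints 95.5%
```

After running build_engines.py, `engine_size_reduction("stripped_engine.trt", "normal_engine.trt")` would give the measured value for the files on disk.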

Note: If the TensorRT sample data is not installed in the default location (for example, /usr/src/tensorrt/data/), the model directory must be specified. For example, --stripped_onnx=/path/to/my/data/ sets the model path for building the weight-stripped engine and --original_onnx=/path/to/my/data/ sets the model path for building the normal engine. In most cases, both can use the same ONNX model.
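As an illustration of the note above, this sketch assembles the build command with both model-path options pointing at the same location. The `build_command` helper is hypothetical; only the script name and the four flags come from this README.

```python
def build_command(model_path=None):
    """Return the argument list for build_engines.py.

    When model_path is given, it is passed to both --stripped_onnx and
    --original_onnx, since both builds can usually share one ONNX model.
    """
    cmd = [
        "python3",
        "build_engines.py",
        "--output_stripped_engine=stripped_engine.trt",
        "--output_normal_engine=normal_engine.trt",
    ]
    if model_path is not None:
        cmd.append(f"--stripped_onnx={model_path}")
        cmd.append(f"--original_onnx={model_path}")
    return cmd


# Placeholder path taken from the note; replace it with the real location.
print(" ".join(build_command("/path/to/my/data/")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the build from another script.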

Refit the weight-stripped engine and perform inference with the weight-stripped engine and the normal engine:

python3 refit_engine_and_infer.py --stripped_engine=stripped_engine.trt --normal_engine=normal_engine.trt

Verify that the sample ran successfully. If it did, you should see output similar to the following. The prediction results of the refitted stripped engine are the same as those of the normal engine, and there is no performance loss.

Normal engine inference time on 100 cases: 0.1066 seconds
Refitted stripped engine inference time on 100 cases: 0.0606 seconds
Normal engine correctly recognized data/samples/resnet50/tabby_tiger_cat.jpg as tiger cat
Refitted stripped engine correctly recognized data/samples/resnet50/tabby_tiger_cat.jpg as tiger cat
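The agreement between the two engines can also be checked programmatically. The sketch below parses the recognition lines and confirms that both engines report the same image and label. It assumes the output format shown above, which may differ between releases; the `predictions` helper and the regular expression are hypothetical.

```python
import re

# Matches lines such as "<engine> correctly recognized <image> as <label>".
LINE_RE = re.compile(
    r"^(?P<engine>.+?) correctly recognized (?P<image>\S+) as (?P<label>.+)$"
)


def predictions(output):
    """Map each engine name to the (image, label) pair it reported."""
    result = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            result[match.group("engine")] = (match.group("image"),
                                             match.group("label"))
    return result


sample_output = """\
Normal engine inference time on 100 cases: 0.1066 seconds
Refitted stripped engine inference time on 100 cases: 0.0606 seconds
Normal engine correctly recognized data/samples/resnet50/tabby_tiger_cat.jpg as tiger cat
Refitted stripped engine correctly recognized data/samples/resnet50/tabby_tiger_cat.jpg as tiger cat
"""

preds = predictions(sample_output)
# Both engines should agree on the image and the predicted label.
assert len(preds) == 2 and len(set(preds.values())) == 1
```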

Sample --help options

To see the full list of available options and their descriptions, use the -h or --help command line option.

Additional resources

The following resources provide a deeper understanding of importing a model into TensorRT using Python:

ResNet-50

Deep Residual Learning for Image Recognition

Documentation

License

For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

Changelog

February 2024

Initial release of this sample.

Known issues

There are no known issues in this sample.
