44 changes: 16 additions & 28 deletions examples/onnx_ptq/README.md
@@ -26,6 +26,8 @@ Please use the TensorRT docker image (e.g., `nvcr.io/nvidia/tensorrt:26.02-py3`)

> **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.

+> **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
Comment on lines 27 to +29
⚠️ Potential issue | 🟡 Minor

Remove duplicate note.

Line 29 duplicates the onnxruntime-gpu compatibility note that already appears on line 27.

Proposed fix
 > **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
 
-> **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
-
 Set the following environment variables inside the TensorRT docker.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-> **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
-
-> **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
+> **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/onnx_ptq/README.md` around lines 27 - 29, The README contains a
duplicated compatibility note about using onnxruntime-gpu with the
nvcr.io/nvidia/tensorrt:25.06-py3 image; remove the redundant copy so the note
appears only once. Locate the duplicate note text ("If you are using
`onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3`...")
in examples/onnx_ptq/README.md and delete the second occurrence, leaving a
single instance of the note.


Set the following environment variables inside the TensorRT docker.

```bash
@@ -172,53 +174,39 @@ python -m modelopt.onnx.quantization \

This feature requires `TensorRT 10+` and `ORT>=1.20`. For proper usage, please make sure that the paths to `libcudnn*.so` and TensorRT `lib/` are in the `LD_LIBRARY_PATH` env variable and that the `tensorrt` python package is installed.

-Please see the sample example below.
-
-**Step 1**: Obtain the sample ONNX model and TensorRT plugin from [TensorRT-Custom-Plugin-Example](https://github.com/leimao/TensorRT-Custom-Plugin-Example).
-
-  **1.1.** Change directory to `TensorRT-Custom-Plugin-Example`:
-
-```bash
-cd /path/to/TensorRT-Custom-Plugin-Example
-```
+A self-contained example is provided in the [`custom_op_plugin/`](./custom_op_plugin/) subfolder. Please see the steps below.
Do we need to add a link to the original custom op repo?


-  **1.2.** Compile the TensorRT plugin:
+**Step 1**: Build the TensorRT plugin and create the sample ONNX model.

-```bash
-cmake -B build \
-    -DNVINFER_LIB=$TRT_LIBPATH/libnvinfer.so.10 \
-    -DNVINFER_PLUGIN_LIB=$TRT_LIBPATH/libnvinfer_plugin.so.10 \
-    -DNVONNXPARSER_LIB=$TRT_LIBPATH/libnvonnxparser.so.10 \
-    -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=/usr/include/x86_64-linux-gnu
-```
+  **1.1.** Compile the TensorRT plugin:

```bash
-cmake --build build --config Release --parallel
+cmake -S custom_op_plugin/plugin -B /tmp/plugin_build
+cmake --build /tmp/plugin_build --config Release --parallel
```

-This generates a plugin in `TensorRT-Custom-Plugin-Example/build/src/plugins/IdentityConvIPluginV2IOExt/libidentity_conv_iplugin_v2_io_ext.so`
+This generates `/tmp/plugin_build/libidentity_conv_plugin.so`.

-  **1.3.** Create the ONNX file.
+  **1.2.** Create the ONNX model with a custom `IdentityConv` operator:

```bash
-python scripts/create_identity_neural_network.py
+python custom_op_plugin/create_identity_neural_network.py \
+    --output_path=/tmp/identity_neural_network.onnx
```

-This generates the identity_neural_network.onnx model in `TensorRT-Custom-Plugin-Example/data/identity_neural_network.onnx`
-
-**Step 2**: Quantize the ONNX model. We will be using the `libidentity_conv_iplugin_v2_io_ext.so` plugin for this example.
+**Step 2**: Quantize the ONNX model using the compiled plugin.

```bash
python -m modelopt.onnx.quantization \
-    --onnx_path=/path/to/identity_neural_network.onnx \
-    --trt_plugins=/path/to/libidentity_conv_iplugin_v2_io_ext.so
+    --onnx_path=/tmp/identity_neural_network.onnx \
+    --trt_plugins=/tmp/plugin_build/libidentity_conv_plugin.so
```
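
The prerequisite stated earlier in this section (paths to `libcudnn*.so` and the TensorRT `lib/` directory on `LD_LIBRARY_PATH`) can be checked before running the quantization command. A minimal sketch, assuming that the presence of `libnvinfer.so*` is a reasonable stand-in for "the TensorRT `lib/` directory"; `find_on_ld_library_path` and `missing_prerequisites` are hypothetical helpers, not part of ModelOpt:

```python
import glob
import os


def find_on_ld_library_path(pattern, env=None):
    """Return files matching `pattern` in any directory listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    matches = []
    for directory in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if directory:  # skip empty entries such as a trailing separator
            matches.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return matches


def missing_prerequisites(env=None):
    """List the library patterns that no LD_LIBRARY_PATH directory satisfies."""
    # Assumption: libnvinfer.so* marks the TensorRT lib/ directory.
    patterns = ["libcudnn*.so*", "libnvinfer.so*"]
    return [p for p in patterns if not find_on_ld_library_path(p, env)]


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("Missing from LD_LIBRARY_PATH:", missing or "nothing")
```

This only inspects the directories named in the environment variable; libraries found through other loader mechanisms (for example the system linker cache) would be reported as missing even though they may still load.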

**Step 3**: Deploy the quantized model with TensorRT.

```bash
-trtexec --onnx=/path/to/identity_neural_network.quant.onnx \
-    --staticPlugins=/path/to/libidentity_conv_iplugin_v2_io_ext.so
+trtexec --onnx=/tmp/identity_neural_network.quant.onnx \
+    --staticPlugins=/tmp/plugin_build/libidentity_conv_plugin.so
```
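
The updated steps can also be driven from a single script. The sketch below only assembles the command lines from the new README text as argument lists; it does not run them. The function names are hypothetical, and the `.quant.onnx` output name is inferred from the file name used in Step 3 rather than from any documented guarantee.

```python
import shlex
from pathlib import PurePosixPath


def quantized_path(onnx_path):
    """Map model.onnx to model.quant.onnx, the name used in Step 3."""
    path = PurePosixPath(onnx_path)
    return path.with_name(path.stem + ".quant.onnx")


def workflow_commands(onnx_path, plugin_build_dir):
    """Return the README commands as argv lists, in execution order."""
    plugin = PurePosixPath(plugin_build_dir) / "libidentity_conv_plugin.so"
    return [
        # Step 1.1: configure and build the plugin.
        ["cmake", "-S", "custom_op_plugin/plugin", "-B", str(plugin_build_dir)],
        ["cmake", "--build", str(plugin_build_dir), "--config", "Release", "--parallel"],
        # Step 1.2: create the ONNX model with the custom operator.
        ["python", "custom_op_plugin/create_identity_neural_network.py",
         f"--output_path={onnx_path}"],
        # Step 2: quantize with the compiled plugin.
        ["python", "-m", "modelopt.onnx.quantization",
         f"--onnx_path={onnx_path}", f"--trt_plugins={plugin}"],
        # Step 3: deploy the quantized model with TensorRT.
        ["trtexec", f"--onnx={quantized_path(onnx_path)}", f"--staticPlugins={plugin}"],
    ]


if __name__ == "__main__":
    for argv in workflow_commands("/tmp/identity_neural_network.onnx", "/tmp/plugin_build"):
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the `examples/onnx_ptq` directory to execute the step and stop at the first failure.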

### Optimize Q/DQ node placement with Autotune
105 changes: 105 additions & 0 deletions examples/onnx_ptq/custom_op_plugin/create_identity_neural_network.py
@@ -0,0 +1,105 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Create a simple identity neural network with a custom IdentityConv operator.

This script generates an ONNX model consisting of three convolutional layers where the
second Conv node is replaced with a custom ``IdentityConv`` operator. The custom operator
is not defined in the standard ONNX operator set and requires a TensorRT plugin to parse.

Based on https://github.com/leimao/TensorRT-Custom-Plugin-Example.
"""

import argparse
import os

import numpy as np
import onnx
import onnx_graphsurgeon as gs


def create_identity_neural_network(output_path: str) -> None:
    """Create and save an ONNX model with a custom IdentityConv operator."""
    opset_version = 15

    input_shape = (1, 3, 480, 960)
    input_channels = input_shape[1]

    # Configure identity convolution weights (depthwise, 1x1 kernel with all ones).
    weights_shape = (input_channels, 1, 1, 1)
    num_groups = input_channels
    weights_data = np.ones(weights_shape, dtype=np.float32)

    # Build the ONNX graph using onnx-graphsurgeon.
    x0 = gs.Variable(name="X0", dtype=np.float32, shape=input_shape)
    w0 = gs.Constant(name="W0", values=weights_data)
    x1 = gs.Variable(name="X1", dtype=np.float32, shape=input_shape)
    w1 = gs.Constant(name="W1", values=weights_data)
    x2 = gs.Variable(name="X2", dtype=np.float32, shape=input_shape)
    w2 = gs.Constant(name="W2", values=weights_data)
    x3 = gs.Variable(name="X3", dtype=np.float32, shape=input_shape)

    conv_attrs = {
        "kernel_shape": [1, 1],
        "strides": [1, 1],
        "pads": [0, 0, 0, 0],
        "group": num_groups,
    }

    node_1 = gs.Node(name="Conv-1", op="Conv", inputs=[x0, w0], outputs=[x1], attrs=conv_attrs)

    # The second node uses the custom IdentityConv operator instead of standard Conv.
    # This operator requires a TensorRT plugin to be loaded at runtime.
    node_2 = gs.Node(
        name="Conv-2",
        op="IdentityConv",
        inputs=[x1, w1],
        outputs=[x2],
        attrs={
            **conv_attrs,
            "plugin_version": "1",
            "plugin_namespace": "",
        },
    )

    node_3 = gs.Node(name="Conv-3", op="Conv", inputs=[x2, w2], outputs=[x3], attrs=conv_attrs)

    graph = gs.Graph(
        nodes=[node_1, node_2, node_3],
        inputs=[x0],
        outputs=[x3],
        opset=opset_version,
    )
    model = gs.export_onnx(graph)
    # Shape inference does not work with the custom operator.
    dirname = os.path.dirname(output_path)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    onnx.save(model, output_path)
    print(f"Saved ONNX model to {output_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Create an ONNX model with a custom IdentityConv operator."
    )
    parser.add_argument(
        "--output_path",
        type=str,
        default="identity_neural_network.onnx",
        help="Path to save the generated ONNX model.",
    )
    args = parser.parse_args()
    create_identity_neural_network(args.output_path)
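
On why these weights give an identity for the two standard `Conv` nodes: with `group` equal to the channel count, each output channel reads only its own input channel, and a 1x1 kernel with weight 1.0 (stride 1, no padding, no bias) multiplies every pixel by one. A dependency-free sketch of that arithmetic, using plain nested lists rather than an ONNX runtime; `depthwise_conv_1x1` is a hypothetical helper written only for illustration:

```python
def depthwise_conv_1x1(x, weights):
    """Depthwise 1x1 convolution on a CHW nested list.

    x:       x[c][h][w] input values
    weights: weights[c] is the scalar kernel for channel c (the 1x1 kernel flattened)
    """
    if len(x) != len(weights):
        raise ValueError("depthwise conv needs one kernel per input channel")
    return [
        [[value * weights[c] for value in row] for row in channel]
        for c, channel in enumerate(x)
    ]


# A 3-channel, 2x2 input, mirroring the 3 input channels used in the script.
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
ones = [1.0, 1.0, 1.0]  # same role as np.ones((3, 1, 1, 1)) in the script

y = depthwise_conv_1x1(x, ones)
assert y == x  # all-ones depthwise 1x1 weights leave the input unchanged
```

The sketch says nothing about what the `IdentityConv` plugin does internally; it only illustrates the standard grouped-convolution arithmetic that the chosen attributes and weights imply.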
39 changes: 39 additions & 0 deletions examples/onnx_ptq/custom_op_plugin/plugin/CMakeLists.txt
@@ -0,0 +1,39 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

cmake_minimum_required(VERSION 3.18)

project(IDENTITY-CONV-PLUGIN VERSION 0.0.1 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(CUDAToolkit REQUIRED)

# TensorRT libraries
find_library(NVINFER_LIB nvinfer HINTS /usr/lib/x86_64-linux-gnu/ PATH_SUFFIXES lib lib64 REQUIRED)
find_library(NVINFER_PLUGIN_LIB nvinfer_plugin HINTS /usr/lib/x86_64-linux-gnu/ PATH_SUFFIXES lib lib64 REQUIRED)

add_library(
    identity_conv_plugin
    SHARED
    PluginUtils.cpp
    IdentityConvPlugin.cpp
    IdentityConvPluginCreator.cpp
    PluginRegistration.cpp
)

target_include_directories(identity_conv_plugin PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(identity_conv_plugin PRIVATE ${NVINFER_LIB} ${NVINFER_PLUGIN_LIB} CUDA::cudart)
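
After the `identity_conv_plugin` target builds, one quick way to confirm that the shared library and the libraries it links against resolve is to load it with `ctypes`. A minimal sketch; `can_load_plugin` is a hypothetical helper, and the path in the usage block is the one given in the README steps. Loading this way only shows that the dynamic linker can resolve the library, not that TensorRT accepts the plugin.

```python
import ctypes


def can_load_plugin(path):
    """Return True if the dynamic linker can load the shared library at `path`."""
    try:
        ctypes.CDLL(path)
    except OSError:
        # Raised when the file is missing or a dependency cannot be resolved.
        return False
    return True


if __name__ == "__main__":
    plugin = "/tmp/plugin_build/libidentity_conv_plugin.so"
    print(plugin, "loads" if can_load_plugin(plugin) else "does not load")
```

A failure here usually points at `LD_LIBRARY_PATH` rather than at the build itself, since the link step in the CMake file above already located `nvinfer` and `nvinfer_plugin` at configure time.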