NVIDIA · ajrasane · Mar 24, 2026 · Mar 24, 2026 · Mar 25, 2026 · coderabbitai
@@ -26,6 +26,8 @@ Please use the TensorRT docker image (e.g., `nvcr.io/nvidia/tensorrt:26.02-py3`)
 
 > **Note:** If you are using `onnxruntime-gpu`, we recommend using `nvcr.io/nvidia/tensorrt:25.06-py3` as it is built with CUDA 12, which is required by the stable `onnxruntime-gpu` package.
 
+
+
 Set the following environment variables inside the TensorRT docker.
 
 ```bash
@@ -172,53 +174,39 @@ python -m modelopt.onnx.quantization \
 
 This feature requires `TensorRT 10+` and `ORT>=1.20`. For proper usage, please make sure that the paths to `libcudnn*.so` and TensorRT `lib/` are in the `LD_LIBRARY_PATH` env variable and that the `tensorrt` python package is installed.
 
-Please see the sample example below.
-
-**Step 1**: Obtain the sample ONNX model and TensorRT plugin from [TensorRT-Custom-Plugin-Example](https://github.com/leimao/TensorRT-Custom-Plugin-Example).
-
-&#160; **1.1.** Change directory to `TensorRT-Custom-Plugin-Example`:
-
-```bash
-cd /path/to/TensorRT-Custom-Plugin-Example
-```
+A self-contained example is provided in the [`custom_op_plugin/`](./custom_op_plugin/) subfolder. Please see the steps below.
 
-&#160; **1.2.** Compile the TensorRT plugin:
+**Step 1**: Build the TensorRT plugin and create the sample ONNX model.
 
-```bash
-cmake -B build \
-    -DNVINFER_LIB=$TRT_LIBPATH/libnvinfer.so.10 \
-    -DNVINFER_PLUGIN_LIB=$TRT_LIBPATH/libnvinfer_plugin.so.10 \
-    -DNVONNXPARSER_LIB=$TRT_LIBPATH/libnvonnxparser.so.10 \
-    -DCMAKE_CXX_STANDARD_INCLUDE_DIRECTORIES=/usr/include/x86_64-linux-gnu
-```
+&#160; **1.1.** Compile the TensorRT plugin:
 
 ```bash
-cmake --build build --config Release --parallel
+cmake -S custom_op_plugin/plugin -B /tmp/plugin_build
+cmake --build /tmp/plugin_build --config Release --parallel
 ```
 
-This generates a plugin in `TensorRT-Custom-Plugin-Example/build/src/plugins/IdentityConvIPluginV2IOExt/libidentity_conv_iplugin_v2_io_ext.so`
+This generates `/tmp/plugin_build/libidentity_conv_plugin.so`.
 
-&#160; **1.3.** Create the ONNX file.
+&#160; **1.2.** Create the ONNX model with a custom `IdentityConv` operator:
 
 ```bash
-python scripts/create_identity_neural_network.py
+python custom_op_plugin/create_identity_neural_network.py \
+    --output_path=/tmp/identity_neural_network.onnx
 ```
 
-This generates the identity_neural_network.onnx model in `TensorRT-Custom-Plugin-Example/data/identity_neural_network.onnx`
-
-**Step 2**: Quantize the ONNX model. We will be using the `libidentity_conv_iplugin_v2_io_ext.so` plugin for this example.
+**Step 2**: Quantize the ONNX model using the compiled plugin.
 
 ```bash
 python -m modelopt.onnx.quantization \
-    --onnx_path=/path/to/identity_neural_network.onnx \
-    --trt_plugins=/path/to/libidentity_conv_iplugin_v2_io_ext.so
+    --onnx_path=/tmp/identity_neural_network.onnx \
+    --trt_plugins=/tmp/plugin_build/libidentity_conv_plugin.so
 ```
 
 **Step 3**: Deploy the quantized model with TensorRT.
 
 ```bash
-trtexec --onnx=/path/to/identity_neural_network.quant.onnx \
-    --staticPlugins=/path/to/libidentity_conv_iplugin_v2_io_ext.so
+trtexec --onnx=/tmp/identity_neural_network.quant.onnx \
+    --staticPlugins=/tmp/plugin_build/libidentity_conv_plugin.so
 ```
 
 ### Optimize Q/DQ node placement with Autotune

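Steps 2 and 3 of the README hunk above can also be chained from Python. The following is a minimal sketch, not part of this PR: the helper names `quantized_path` and `build_commands` are hypothetical, and the `.quant.onnx` output name is an assumption inferred from the paths used in Steps 2 and 3. Only the flags shown in the README are used.

```python
import shlex
from pathlib import Path


def quantized_path(onnx_path: str) -> str:
    """Return the `.quant.onnx` path that Step 3 reads (naming assumed from the README)."""
    p = Path(onnx_path)
    return str(p.with_name(p.stem + ".quant.onnx"))


def build_commands(onnx_path: str, plugin_path: str) -> list:
    """Build argv lists for Step 2 (quantize) and Step 3 (deploy with trtexec)."""
    quantize = [
        "python", "-m", "modelopt.onnx.quantization",
        f"--onnx_path={onnx_path}",
        f"--trt_plugins={plugin_path}",
    ]
    deploy = [
        "trtexec",
        f"--onnx={quantized_path(onnx_path)}",
        f"--staticPlugins={plugin_path}",
    ]
    return [quantize, deploy]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works without TensorRT.
    for cmd in build_commands(
        "/tmp/identity_neural_network.onnx",
        "/tmp/plugin_build/libidentity_conv_plugin.so",
    ):
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each argv list can be passed to `subprocess.run(cmd, check=True)` inside the TensorRT container.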
@@ -0,0 +1,105 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Create a simple identity neural network with a custom IdentityConv operator.
+
+This script generates an ONNX model consisting of three convolutional layers where the
+second Conv node is replaced with a custom ``IdentityConv`` operator. The custom operator
+is not defined in the standard ONNX operator set and requires a TensorRT plugin to parse.
+
+Based on https://github.com/leimao/TensorRT-Custom-Plugin-Example.
+"""
+
+import argparse
+import os
+
+import numpy as np
+import onnx
+import onnx_graphsurgeon as gs
+
+
+def create_identity_neural_network(output_path: str) -> None:
+    """Create and save an ONNX model with a custom IdentityConv operator."""
+    opset_version = 15
+
+    input_shape = (1, 3, 480, 960)
+    input_channels = input_shape[1]
+
+    # Configure identity convolution weights (depthwise, 1x1 kernel with all ones).
+    weights_shape = (input_channels, 1, 1, 1)
+    num_groups = input_channels
+    weights_data = np.ones(weights_shape, dtype=np.float32)
+
+    # Build the ONNX graph using onnx-graphsurgeon.
+    x0 = gs.Variable(name="X0", dtype=np.float32, shape=input_shape)
+    w0 = gs.Constant(name="W0", values=weights_data)
+    x1 = gs.Variable(name="X1", dtype=np.float32, shape=input_shape)
+    w1 = gs.Constant(name="W1", values=weights_data)
+    x2 = gs.Variable(name="X2", dtype=np.float32, shape=input_shape)
+    w2 = gs.Constant(name="W2", values=weights_data)
+    x3 = gs.Variable(name="X3", dtype=np.float32, shape=input_shape)
+
+    conv_attrs = {
+        "kernel_shape": [1, 1],
+        "strides": [1, 1],
+        "pads": [0, 0, 0, 0],
+        "group": num_groups,
+    }
+
+    node_1 = gs.Node(name="Conv-1", op="Conv", inputs=[x0, w0], outputs=[x1], attrs=conv_attrs)
+
+    # The second node uses the custom IdentityConv operator instead of standard Conv.
+    # This operator requires a TensorRT plugin to be loaded at runtime.
+    node_2 = gs.Node(
+        name="Conv-2",
+        op="IdentityConv",
+        inputs=[x1, w1],
+        outputs=[x2],
+        attrs={
+            **conv_attrs,
+            "plugin_version": "1",
+            "plugin_namespace": "",
+        },
+    )
+
+    node_3 = gs.Node(name="Conv-3", op="Conv", inputs=[x2, w2], outputs=[x3], attrs=conv_attrs)
+
+    graph = gs.Graph(
+        nodes=[node_1, node_2, node_3],
+        inputs=[x0],
+        outputs=[x3],
+        opset=opset_version,
+    )
+    model = gs.export_onnx(graph)
+    # Shape inference is intentionally skipped: it does not work with the custom operator.
+    dirname = os.path.dirname(output_path)
+    if dirname:
+        os.makedirs(dirname, exist_ok=True)
+    onnx.save(model, output_path)
+    print(f"Saved ONNX model to {output_path}")
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(
+        description="Create an ONNX model with a custom IdentityConv operator."
+    )
+    parser.add_argument(
+        "--output_path",
+        type=str,
+        default="identity_neural_network.onnx",
+        help="Path to save the generated ONNX model.",
+    )
+    args = parser.parse_args()
+    create_identity_neural_network(args.output_path)
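
The script above relies on a depthwise 1x1 convolution with all-ones weights being an identity mapping. The following pure-Python sketch illustrates that reasoning on a tiny tensor. It is an illustration only: `depthwise_conv_1x1` is a hypothetical helper, and treating the custom `IdentityConv` layer like the `Conv` it replaces is an assumption based on the operator's name, not something verified against the plugin.

```python
def depthwise_conv_1x1(x, weights):
    """Apply a depthwise 1x1 convolution (group == channels, stride 1, no padding).

    x is a nested list shaped [N][C][H][W]; weights holds one scalar per channel,
    i.e. the (C, 1, 1, 1) kernel flattened.
    """
    return [
        [
            [[value * weights[c] for value in row] for row in channel]
            for c, channel in enumerate(sample)
        ]
        for sample in x
    ]


# A tiny stand-in for the (1, 3, 480, 960) input used by the script.
x = [[[[float(c * 10 + h * 2 + w) for w in range(2)] for h in range(2)] for c in range(3)]]
ones = [1.0, 1.0, 1.0]

# Three stacked layers, mirroring Conv-1, Conv-2, and Conv-3.
y = x
for _ in range(3):
    y = depthwise_conv_1x1(y, ones)

# Multiplying every element by 1.0 leaves the tensor unchanged.
assert y == x
```

Because each output channel depends only on the matching input channel and a single weight, the whole network reduces to elementwise multiplication by 1.0.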
@@ -0,0 +1,39 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+cmake_minimum_required(VERSION 3.18)
+
+project(IDENTITY-CONV-PLUGIN VERSION 0.0.1 LANGUAGES CXX)
+
+set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+
+find_package(CUDAToolkit REQUIRED)
+
+# TensorRT libraries
+find_library(NVINFER_LIB nvinfer HINTS /usr/lib/x86_64-linux-gnu/ PATH_SUFFIXES lib lib64 REQUIRED)
+find_library(NVINFER_PLUGIN_LIB nvinfer_plugin HINTS /usr/lib/x86_64-linux-gnu/ PATH_SUFFIXES lib lib64 REQUIRED)
+
+add_library(
+    identity_conv_plugin
+    SHARED
+    PluginUtils.cpp
+    IdentityConvPlugin.cpp
+    IdentityConvPluginCreator.cpp
+    PluginRegistration.cpp
+)
+
+target_include_directories(identity_conv_plugin PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
+target_link_libraries(identity_conv_plugin PRIVATE ${NVINFER_LIB} ${NVINFER_PLUGIN_LIB} CUDA::cudart)
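
With the `add_library(identity_conv_plugin SHARED ...)` target above, a Linux build is expected to emit `libidentity_conv_plugin.so` in the build directory. The sketch below checks for that file before it is passed to `--trt_plugins` or `--staticPlugins`. It is a hedged example: `plugin_library_path` is a hypothetical helper, and it assumes the default Linux `lib<target>.so` naming with no custom output directory.

```python
from pathlib import Path


def plugin_library_path(build_dir: str, target: str = "identity_conv_plugin") -> Path:
    """Return the shared library CMake emits for `target` on Linux, or raise if missing."""
    path = Path(build_dir) / f"lib{target}.so"
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; run the cmake build step first")
    return path
```

For example, `plugin_library_path("/tmp/plugin_build")` resolves to the path used in the README steps once the build has completed.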