20 changes: 10 additions & 10 deletions README.md
@@ -1,8 +1,8 @@
# TensorRT Extension for Stable Diffusion

This extension enables the best performance on NVIDIA RTX GPUs for Stable Diffusion with TensorRT.

You need to install the extension and generate optimized engines before using the extension. Please follow the instructions below to set everything up.

Supports Stable Diffusion 1.5 and 2.1. Native SDXL support coming in a future release. Please use the [dev branch](https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/dev) if you would like to use it today. Note that the dev branch is not intended for production work and may break other things that you are currently using.

@@ -17,10 +17,10 @@ Example instructions for Automatic1111:

## How to use

1. Click on the “Generate Default Engines” button. This step takes 2-10 minutes depending on your GPU. You can generate engines for other combinations.
2. Go to Settings → User Interface → Quick Settings List, add sd_unet. Apply these settings, then reload the UI.
3. Back in the main UI, select the TRT model from the sd_unet dropdown menu at the top of the page.
4. You can now start generating images accelerated by TRT. If you need to create more Engines, go to the TensorRT tab.

Happy prompting!

@@ -29,15 +29,15 @@ Happy prompting!
TensorRT uses optimized engines for specific resolutions and batch sizes. You can generate as many optimized engines as desired. Types:

- The “Export Default Engines” selection adds support for resolutions between 512x512 and 768x768 for Stable Diffusion 1.5 and 768x768 to 1024x1024 for SDXL with batch sizes 1 to 4.
- Static engines support a single specific output resolution and batch size.
- Dynamic engines support a range of resolutions and batch sizes, at a small cost in performance. Wider ranges will use more VRAM.

Each preset can be adjusted with the “Advanced Settings” option. More detailed instructions can be found [here](https://nvidia.custhelp.com/app/answers/detail/a_id/5487/~/tensorrt-extension-for-stable-diffusion-web-ui).
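The difference between static and dynamic engines can be sketched as a shape-coverage check. This is a minimal illustration with hypothetical names (`EngineProfile`, `covers`), not the extension's actual engine-selection code:

```python
from dataclasses import dataclass

@dataclass
class EngineProfile:
    """Shape ranges an engine was built for: (min, max) per dimension."""
    batch: tuple   # (min_batch, max_batch)
    height: tuple  # (min_height, max_height)
    width: tuple   # (min_width, max_width)

    def covers(self, batch: int, height: int, width: int) -> bool:
        """True if this engine can serve the requested shape."""
        return (self.batch[0] <= batch <= self.batch[1]
                and self.height[0] <= height <= self.height[1]
                and self.width[0] <= width <= self.width[1])

# Static engine: a single resolution and batch size (min == max everywhere).
static_512 = EngineProfile(batch=(1, 1), height=(512, 512), width=(512, 512))

# Dynamic engine: the SD 1.5 default range, batch sizes 1 to 4.
dynamic_sd15 = EngineProfile(batch=(1, 4), height=(512, 768), width=(512, 768))

print(static_512.covers(1, 512, 512))    # True
print(static_512.covers(1, 768, 768))    # False: outside the static shape
print(dynamic_sd15.covers(4, 640, 768))  # True: inside the dynamic range
```

A wider dynamic range covers more requests from one engine, which is why it trades some performance and VRAM for flexibility.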

### Common Issues/Limitations

**HIRES FIX:** If using the hires.fix option in Automatic1111 you must build engines that match both the starting and ending resolutions. For instance, if initial size is `512 x 512` and hires.fix upscales to `1024 x 1024`, you must either generate two engines, one at 512 and one at 1024, or generate a single dynamic engine that covers the whole range.
Having two separate engines will heavily impact performance at the moment. Stay tuned for updates.
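The hires.fix requirement can be stated as a small check. The helper names below are illustrative only and not part of the extension:

```python
def engines_needed_for_hires_fix(width, height, upscale_by):
    """Both the initial and the upscaled resolution must be covered by an engine."""
    hires_w = int(width * upscale_by)
    hires_h = int(height * upscale_by)
    return [(width, height), (hires_w, hires_h)]

def covered_by_dynamic_engine(resolutions, lo, hi):
    """True if a single dynamic engine with range [lo, hi] covers every resolution."""
    return all(lo <= w <= hi and lo <= h <= hi for w, h in resolutions)

res = engines_needed_for_hires_fix(512, 512, 2.0)
print(res)                                        # [(512, 512), (1024, 1024)]
print(covered_by_dynamic_engine(res, 512, 1024))  # True: one dynamic engine suffices
```

If no single dynamic engine spans the range, the alternative is two static engines, one per resolution, with the performance caveat noted above.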

**Resolution:** When generating images, the resolution needs to be a multiple of 64. This applies to hires.fix as well: both the low-res and high-res dimensions must be divisible by 64.
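A requested size can be snapped to the nearest valid value with a small helper (illustrative only, not part of the extension):

```python
def round_to_multiple(value, multiple=64):
    """Round a dimension to the nearest multiple (64 here), never below one multiple."""
    return max(multiple, round(value / multiple) * multiple)

print(round_to_multiple(500))   # 512
print(round_to_multiple(1000))  # 1024
```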

@@ -55,4 +55,4 @@ Having two separate engines will heavily impact performance at the moment. Stay
- Linux: >= 450.80.02
- Windows: >=452.39

We always recommend keeping the driver up-to-date for system-wide performance improvements.
2 changes: 1 addition & 1 deletion scripts/trt.py
@@ -77,7 +77,7 @@ def forward(self, x, timesteps, context, *args, **kwargs):
        if "y" in kwargs:
            feed_dict["y"] = kwargs["y"].float()

        # Need to check compatibility on the fly
        if self.shape_hash != hash(x.shape):
            nvtx.range_push("switch_engine")
            if x.shape[-1] % 8 or x.shape[-2] % 8:
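The `forward` hunk above swaps engines only when the input shape changes, using a hash of the shape as a cheap change detector. The idea can be sketched in isolation (a hypothetical class, not the extension's actual implementation):

```python
class EngineSwitcher:
    """Sketch of the shape-hash check: switch engines only when the input
    shape changes, and reject latent shapes not divisible by 8."""

    def __init__(self):
        self.shape_hash = None  # hash of the last shape we served

    def maybe_switch(self, shape):
        # Latent width/height must be divisible by 8 (512 px / 8 = 64 latent).
        if shape[-1] % 8 or shape[-2] % 8:
            raise ValueError("latent width/height must be divisible by 8")
        new_hash = hash(tuple(shape))
        if new_hash != self.shape_hash:
            self.shape_hash = new_hash
            return True   # caller should activate an engine matching this shape
        return False      # same shape as last call: keep the current engine
```

Hashing the shape avoids comparing the full tuple on every step and makes the common case (unchanged shape) a single integer comparison.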
4 changes: 2 additions & 2 deletions ui_trt.py
@@ -511,13 +511,13 @@ def get_version_from_filename(name):

def get_lora_checkpoints():
    available_lora_models = {}
    candidates = list(
        shared.walk_files(
            shared.cmd_opts.lora_dir,
            allowed_extensions=[".pt", ".ckpt", ".safetensors"],
        )
    )
    for filename in candidates:
        name = os.path.splitext(os.path.basename(filename))[0]
        try:
            metadata = sd_models.read_metadata_from_safetensors(filename)