JIT: Skip redundant AND masking in NarrowWithSaturation codegen by laveeshb · Pull Request #122898 · dotnet/runtime

laveeshb · 2026-01-05T21:05:43Z

When NarrowWithSaturation is used with unsigned narrow types (e.g., UInt32 → UInt16, UInt16 → Byte), we're already clamping via Min() internally - so the subsequent vpand to mask the result is redundant. This adds an inputsAlreadyClamped param to gtNewSimdNarrowNode so callers like NarrowWithSaturation can skip the extra AND for unsigned types.

Note: For signed narrow types (e.g., Int32 → Int16), the AND masking is still required. After clamping, negative values have sign-extended upper bits that must be cleared before PackUnsignedSaturate can correctly pack them.
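The unsigned case can be checked with a scalar model of a single lane. The sketch below is not JIT code: the helper names are hypothetical, and it assumes only the documented behaviour of vpackuswb, which reads each 16-bit lane as signed and saturates it to [0, 255]. It compares the UInt16 → Byte sequence with and without the AND over every possible input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of one vpackuswb lane: the 16-bit input is interpreted as
// SIGNED and saturated to the unsigned 8-bit range [0, 255].
inline uint8_t PackUnsignedSaturate16(uint16_t laneBits)
{
    int16_t asSigned = static_cast<int16_t>(laneBits);
    if (asSigned < 0)
        return 0;
    if (asSigned > 255)
        return 255;
    return static_cast<uint8_t>(asSigned);
}

// "Before" sequence for one lane: vpminuw, vpand, vpackuswb.
inline uint8_t NarrowWithMask(uint16_t value)
{
    uint16_t clamped = std::min<uint16_t>(value, 255);
    uint16_t masked  = static_cast<uint16_t>(clamped & 0x00FF);
    return PackUnsignedSaturate16(masked);
}

// "After" sequence for one lane: vpminuw, vpackuswb.
inline uint8_t NarrowWithoutMask(uint16_t value)
{
    uint16_t clamped = std::min<uint16_t>(value, 255);
    return PackUnsignedSaturate16(clamped);
}

// Exhaustively compares both sequences over all 65536 UInt16 inputs.
inline bool MaskIsRedundantForUInt16ToByte()
{
    for (uint32_t v = 0; v <= 0xFFFF; v++)
    {
        uint16_t lane = static_cast<uint16_t>(v);
        if (NarrowWithMask(lane) != NarrowWithoutMask(lane))
            return false;
    }
    return true;
}
```

Because the unsigned Min() already leaves the upper byte at zero, the AND never changes a value, which is why the two sequences agree for every input.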

Approach suggested by @tannergooding in the issue.

Before (unsigned narrowing):

vpminuw  xmm1, xmm0, xmmword ptr [...]
vpand    xmm1, xmm1, xmm0
vpminuw  xmm2, xmm0, xmmword ptr [...]
vpand    xmm0, xmm2, xmm0
vpackuswb xmm0, xmm1, xmm0

After (unsigned narrowing):

vpminuw  xmm1, xmm0, xmmword ptr [...]
vpminuw  xmm0, xmm0, xmmword ptr [...]
vpackuswb xmm0, xmm1, xmm0

ARM64 isn't affected - it uses AdvSimd intrinsics that handle this natively.

Tested with System.Runtime.Intrinsics and System.Numerics.Vectors test suites.

dotnet-policy-service · 2026-01-05T21:06:56Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR optimizes the codegen for Vector128/256.NarrowWithSaturation by eliminating redundant AND masking instructions on x86/x64. The optimization recognizes that when inputs are already clamped to the target range by preceding Min/Max operations, the subsequent AND operations are unnecessary.

Key Changes:

Added an optional inputsAlreadyClamped parameter to gtNewSimdNarrowNode to skip AND masking when inputs are guaranteed to be within target range
Updated the NarrowWithSaturation codegen path to pass inputsAlreadyClamped=true after explicit clamping operations
Added regression test with disasm checks to verify the optimization

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
src/coreclr/jit/compiler.h	Added optional `inputsAlreadyClamped` parameter (default false) to `gtNewSimdNarrowNode` function signature
src/coreclr/jit/gentree.cpp	Implemented conditional logic to skip AND masking operations when `inputsAlreadyClamped` is true for four specific narrowing scenarios (TYP_UBYTE and TYP_USHORT paths in both AVX2 and SSE2)
src/coreclr/jit/hwintrinsicxarch.cpp	Modified `NarrowWithSaturation` implementation to pass `inputsAlreadyClamped=true` when calling `gtNewSimdNarrowNode` after explicit Min/Max clamping
src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.csproj	Added test project configuration with disasm checking enabled and JIT optimization settings
src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.cs	Added regression test with disasm assertions to verify `vpand` is not generated and functional tests to verify correctness for UInt16→Byte and UInt32→UInt16 narrowing

src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.cs

laveeshb · 2026-01-20T15:26:41Z

Friendly ping - @tannergooding since you suggested the approach in #116526, would you mind taking a look when you have a moment?

laveeshb · 2026-01-21T15:03:04Z

@tannergooding I see some CI tasks have failed. AFAICT these are not due to my changes. I had tested them locally. Can you advise how to unblock the PR?

tannergooding · 2026-01-21T20:42:27Z

src/coreclr/jit/hwintrinsicxarch.cpp

                                              /* isMax */ false, /* isMagnitude */ false, /* isNumber */ false);

-                    retNode = gtNewSimdNarrowNode(retType, op1, op2, narrowSimdBaseType, simdSize);
+                    retNode = gtNewSimdNarrowNode(retType, op1, op2, narrowSimdBaseType, simdSize,


Looks like this is failing System.Numerics.Tests.GenericVectorTests.NarrowWithSaturationInt32Test

laveeshb · 2026-02-11

@tannergooding Thanks for pointing this out! I investigated the root cause:

The Problem:
The inputsAlreadyClamped = true optimization was incorrectly applied to all narrow types, but it's only valid for unsigned types.

For signed narrowing (e.g., Int32 → Int16):

Values are clamped to [-32768, 32767]

But negative values still have sign-extended upper bits (e.g., -1 is 0xFFFFFFFF)

PackUnsignedSaturate needs the AND masking to clear these upper bits before packing

Without the mask, PackUnsignedSaturate interprets the full 32-bit value as signed, so a negative input such as -1 saturates to 0 instead of keeping its low 16 bits (0xFFFF)

For unsigned narrowing (e.g., UInt32 → UInt16):

Values are clamped to [0, 65535]

The upper bits are naturally zero after clamping

The AND masking is redundant and can safely be skipped
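The signed failure described above can be reproduced with a one-lane scalar model. This is an illustrative sketch with hypothetical helper names, not the JIT's implementation. It assumes PackUnsignedSaturate behaves like the x86 packusdw instruction for Int32 → Int16, meaning each 32-bit lane is read as signed and saturated to [0, 65535].

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one packusdw lane: the 32-bit input is interpreted as
// SIGNED and saturated to the unsigned 16-bit range [0, 65535].
inline uint16_t PackUnsignedSaturate32(uint32_t laneBits)
{
    int32_t asSigned = static_cast<int32_t>(laneBits);
    if (asSigned < 0)
        return 0;
    if (asSigned > 0xFFFF)
        return 0xFFFF;
    return static_cast<uint16_t>(asSigned);
}

// Clamp step that runs before narrowing: limits the value to [-32768, 32767].
inline int32_t ClampToInt16Range(int32_t value)
{
    if (value < -32768)
        return -32768;
    if (value > 32767)
        return 32767;
    return value;
}

// Int32 -> Int16 narrowing for one lane, with the AND mask made optional so
// the effect of skipping it can be observed.
inline int16_t NarrowSigned(int32_t value, bool applyMask)
{
    uint32_t bits = static_cast<uint32_t>(ClampToInt16Range(value));
    if (applyMask)
    {
        // Clears the sign-extended upper 16 bits, e.g. 0xFFFFFFFF -> 0x0000FFFF.
        bits &= 0xFFFFu;
    }
    return static_cast<int16_t>(PackUnsignedSaturate32(bits));
}
```

With the mask, NarrowSigned(-1, true) keeps the low 16 bits and yields -1. Without it, the pack step sees a negative 32-bit value and saturates to 0, which matches the kind of wrong result the failing Int32 test would catch.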

The Fix:
Changed the code to only set inputsAlreadyClamped = true for unsigned narrow types:

bool inputsAlreadyClamped = varTypeIsUnsigned(narrowSimdBaseType);

I've pushed the fix and updated the PR description to clarify this limitation. The optimization now only applies to TYP_UBYTE and TYP_USHORT narrow types.

Vector128/256.NarrowWithSaturation was generating redundant vpand instructions because gtNewSimdNarrowNode didn't know the inputs were already clamped to the target range by the preceding Min() operations.

This change adds an optional parameter to gtNewSimdNarrowNode to skip the AND masking when the caller knows inputs are already in range.

The fix applies to x86/x64 only. ARM64 uses native AdvSimd instructions that don't have this issue.

Before:

vpminuw  xmm1, xmm0, xmmword ptr [...]
vpand    xmm1, xmm1, xmm0 ; redundant
vpminuw  xmm2, xmm0, xmmword ptr [...]
vpand    xmm0, xmm2, xmm0 ; redundant
vpackuswb xmm0, xmm1, xmm0

After:

vpminuw  xmm1, xmm0, xmmword ptr [...]
vpminuw  xmm0, xmm0, xmmword ptr [...]
vpackuswb xmm0, xmm1, xmm0

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/coreclr/jit/gentree.cpp:25216

inputsAlreadyClamped is not referenced on the TARGET_ARM64 side of gtNewSimdNarrowNode, which can trigger unused-parameter warnings on non-xarch builds. Consider explicitly suppressing it for the ARM64 path (or only naming/using it under TARGET_XARCH) to avoid build breaks under Werror configurations.
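One way to address this kind of warning is sketched below. It is a standalone illustration, not the actual gtNewSimdNarrowNode signature: EmitsMask and DEMO_TARGET_XARCH are hypothetical names, and the C++17 [[maybe_unused]] attribute is only one option. The thread does not say whether the JIT would prefer the attribute, a cast to void, or leaving the parameter unnamed on the other target.

```cpp
#include <cassert>

// Hypothetical stand-in for a function whose extra flag only matters on one
// target. DEMO_TARGET_XARCH is an illustrative macro, not the JIT's real
// target define.
// [[maybe_unused]] tells the compiler that leaving the parameter unreferenced
// on some build configurations is intentional.
inline bool EmitsMask([[maybe_unused]] bool inputsAlreadyClamped)
{
#if defined(DEMO_TARGET_XARCH)
    // The flag is consulted only here: skip the mask when the caller has
    // already clamped the inputs.
    return !inputsAlreadyClamped;
#else
    // On other targets the flag is never read, which is what would trigger
    // an unused-parameter warning without the attribute.
    return false;
#endif
}
```

Keeping the parameter named on every target keeps a single signature in the header, at the cost of an annotation on the definition.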

Copilot · 2026-02-11T15:41:17Z

src/coreclr/jit/hwintrinsicxarch.cpp

+                    // For unsigned narrow types (TYP_UBYTE, TYP_USHORT), the clamping already ensures
+                    // the upper bits are zero, so the AND masking in gtNewSimdNarrowNode is redundant.
+                    // For signed narrow types, we still need the AND masking to clear sign-extended bits
+                    // before PackUnsignedSaturate can correctly pack the values.


The comment says this is only relevant for unsigned narrow types TYP_UBYTE/TYP_USHORT, but the implementation sets inputsAlreadyClamped based on varTypeIsUnsigned(narrowSimdBaseType), which also includes TYP_UINT. Either tighten the condition to the types that actually benefit from skipping the mask (the PackUnsignedSaturate paths) or broaden the comment to match the condition.

Suggested change

-                    // For unsigned narrow types (TYP_UBYTE, TYP_USHORT), the clamping already ensures
-                    // the upper bits are zero, so the AND masking in gtNewSimdNarrowNode is redundant.
-                    // For signed narrow types, we still need the AND masking to clear sign-extended bits
-                    // before PackUnsignedSaturate can correctly pack the values.
+                    // For unsigned narrow element types (for example, TYP_UBYTE, TYP_USHORT, TYP_UINT),
+                    // the clamping already ensures the upper bits are zero, so the AND masking in
+                    // gtNewSimdNarrowNode is redundant. For signed narrow types, we still need the AND
+                    // masking to clear sign-extended bits before PackUnsignedSaturate can correctly pack
+                    // the values.

…uration

The inputsAlreadyClamped optimization was incorrectly applied to all types, but it's only valid for unsigned narrow types (TYP_UBYTE, TYP_USHORT).

For signed narrow types, the AND masking is still required to clear the sign-extended upper bits before PackUnsignedSaturate can correctly pack the values. Without this masking, negative values would have incorrect upper bits that would cause PackUnsignedSaturate to produce wrong results.

This fixes System.Numerics.Tests.GenericVectorTests.NarrowWithSaturationInt32Test

Copilot AI review requested due to automatic review settings January 5, 2026 21:05

github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jan 5, 2026

dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Jan 5, 2026

Copilot started reviewing on behalf of laveeshb January 5, 2026 21:06

Copilot AI reviewed Jan 5, 2026

laveeshb force-pushed the fix/narrow-with-saturation-codegen branch from bd48091 to ea91488 Compare January 5, 2026 21:20

This was referenced Jan 6, 2026

[Android][CoreCLR] System.Security.Cryptography.Tests killed by lowmemorykiller #118603

Open

iOS.Device test WorkItemExecutions #122874

Open

saucecontrol reviewed Jan 6, 2026

src/tests/JIT/Regression/JitBlue/Runtime_116526/Runtime_116526.cs (outdated, resolved)

laveeshb force-pushed the fix/narrow-with-saturation-codegen branch from ea91488 to 431213e Compare January 6, 2026 02:25

tannergooding approved these changes Jan 20, 2026

tannergooding enabled auto-merge (squash) January 20, 2026 21:58

build-analysis bot mentioned this pull request Jan 21, 2026

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

3 tasks

tannergooding reviewed Jan 21, 2026

auto-merge was automatically disabled February 11, 2026 15:33
Head branch was pushed to by a user without write access

Copilot AI review requested due to automatic review settings February 11, 2026 15:33

Copilot started reviewing on behalf of laveeshb February 11, 2026 15:34

laveeshb force-pushed the fix/narrow-with-saturation-codegen branch from 847f263 to e6cef00 Compare February 11, 2026 15:34

Copilot AI reviewed Feb 11, 2026

laveeshb force-pushed the fix/narrow-with-saturation-codegen branch from e6cef00 to d600dbc Compare February 11, 2026 16:26

This was referenced Feb 11, 2026

System.IO.Tests.File_GetSetTimes_SafeFileHandle.WritingShouldUpdateWriteTime_After_SetLastAccessTime failing #97020

Open

System.Security.Cryptography.CryptographicException : m_safeCertContext is an invalid handle. #124279

Closed

laveeshb wants to merge 2 commits into dotnet:main from laveeshb:fix/narrow-with-saturation-codegen
