Conversation
Pull request overview
This PR updates the System.IO.Hashing Adler-32 implementation to use hardware intrinsics for vectorized processing (where available) and extends the test suite to validate correctness across a wider set of input shapes.
Changes:
- Add SIMD-accelerated Adler-32 update paths for Arm (AdvSimd) and x86 (Vector128 / AVX2 / AVX-512 BW), with scalar fallback.
- Refactor the scalar update to share the ModBase/NMax constants and introduce a scalar tail helper for the vector paths.
- Add new test coverage for many input lengths, max-byte data, and incremental append chunking, validated against a reference Adler-32 implementation (a sketch of that reference follows below).
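For context, Adler-32 keeps two running sums modulo 65521: s1, the sum of all bytes seeded with 1, and s2, the sum of every intermediate s1. The sketch below restates that definition in the per-byte-reduction style the tests' reference implementation is described as using; the method name is illustrative and not part of the PR.

static uint Adler32Reference(ReadOnlySpan<byte> data)
{
    const uint ModBase = 65521; // largest prime below 2^16
    uint s1 = 1;                // running byte sum, seeded with 1
    uint s2 = 0;                // sum of every intermediate s1

    foreach (byte b in data)
    {
        s1 = (s1 + b) % ModBase;
        s2 = (s2 + s1) % ModBase;
    }

    return (s2 << 16) | s1;     // s2 in the high 16 bits, s1 in the low 16 bits
}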
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.IO.Hashing/src/System/IO/Hashing/Adler32.cs | Introduces vectorized Adler-32 update implementations (Arm/x86) with feature detection and scalar fallback. |
| src/libraries/System.IO.Hashing/tests/Adler32Tests.cs | Adds reference-based tests targeting multiple length boundaries, overflow-stressing inputs, and incremental append behavior. |
Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones |
Force-pushed from 81edcb0 to 04d2d15
🤖 Copilot Code Review — PR #124409

Holistic Assessment

Motivation: The Adler-32 implementation added in #123601 was scalar-only. Adler-32 is used in zlib/deflate, so vectorization is a well-motivated performance improvement for a hot-path algorithm. The PR is justified.

Approach: The implementation follows the well-known approach from zlib/chromium of decomposing the Adler-32 s2 weighted sum into position-weighted sums computed via SIMD multiply-add intrinsics, with a prefix-sum accumulator tracking inter-block s1 contributions. Four vectorized paths are provided (AdvSimd, SSE2/SSSE3, AVX2, AVX-512BW) plus a widening fallback for Vector128 without SSSE3. This matches the pattern used by other vectorized hash implementations in this library (Crc32.Vectorized.cs, Crc64.Vectorized.cs, XxHashShared.cs).

Summary: ✅ LGTM. After extensive mathematical verification of all four vectorized paths (tracing the s1/s2 accumulation through multi-block examples), the code is correct. The test coverage is thorough with a reference implementation, and the patterns are consistent with the rest of the codebase. One minor suggestion below. No blocking issues found.

Detailed Findings

✅ Vectorized s2 Accumulation — Verified correct across all paths
I traced through the prefix-sum logic (the vps/vs3 accumulators) across multi-block examples for each path.
✅ Vector512 Weight Correction — Correct
The lower-half SumAbsoluteDifferences shifted left by 5 compensates for the weights vector repeating 32..1 instead of running 64..33 over the first 32 bytes of each 64-byte block.

✅ Vector128 Non-SSSE3 Fallback — Correct and overflow-safe
The widening fallback sums widened ushort/int intermediates before accumulating into uint lanes, so nothing can overflow within an NMax-bounded chunk.

✅ Endianness Guard — Appropriate
The vectorized paths are only taken when BitConverter.IsLittleEndian, consistent with the other vectorized hashes in this library.

✅ AVX2 Lane Independence — Verified
✅ Test Coverage — Thorough
Tests cover: (1) various lengths hitting all vector width boundaries and tail transitions, (2) all-0xFF stress testing for accumulator overflow safety, (3) incremental append with varying chunk sizes, all validated against a clean reference implementation that applies modular reduction per byte. Good use of

💡 Missing benchmark data
The PR description doesn't include benchmark numbers. While the algorithmic improvement is well-established from zlib implementations, having BenchmarkDotNet results would help quantify the speedup for the .NET implementation specifically. This is a nice-to-have for the PR description, not a blocker — the approach is proven.
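To make the accumulation described above concrete, the standard block decomposition (zlib-style algebra, not text from the PR) for a block of $n$ bytes $b_0, \dots, b_{n-1}$ starting from state $(s_1, s_2)$ is:

$$
s_1' = s_1 + \sum_{i=0}^{n-1} b_i,
\qquad
s_2' = s_2 + n \cdot s_1 + \sum_{i=0}^{n-1} (n - i)\, b_i
\pmod{65521}.
$$

The weighted sum is what the multiply-add intrinsics compute with the 32..1 weight vectors (n = 32, or n = 64 plus a correction in the AVX-512 path), and the n·s1 term, carried across consecutive blocks, is what the vps/vs3 prefix-sum accumulators track before the final << 5 or << 6 scaling.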
@EgorBot -amd -intel -arm
On my machine with AVX2... Before:
After:
using System.IO.Hashing;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);
[MemoryDiagnoser]
public class Bench
{
private byte[] _bytes;
[Params(1, 10, 1000, 10_000)]
public int Size { get; set; }
[GlobalSetup]
public void Setup()
{
_bytes = new byte[Size];
for (int i = 0; i < Size; i++)
{
_bytes[i] = (byte)('a' + (i % 26));
}
}
[Benchmark]
public uint Adler() => Adler32.HashToUInt32(_bytes);
}
if (BitConverter.IsLittleEndian &&
    Vector128.IsHardwareAccelerated &&
    source.Length >= Vector128<byte>.Count * 2)
Could this be put in a CanBeVectorized helper just like in Crc32.Vectorized.cs (and maybe put the vectorized code in Adler32.Vectorized.cs)?
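For illustration, a helper in the spirit of Crc32.Vectorized.cs could look roughly like the sketch below; the name CanBeVectorized and the exact threshold are assumptions based on that file's pattern, not code from this PR.

// Hypothetical helper mirroring the Crc32.Vectorized.cs pattern; not part of this PR.
private static bool CanBeVectorized(ReadOnlySpan<byte> source) =>
    BitConverter.IsLittleEndian &&
    Vector128.IsHardwareAccelerated &&
    source.Length >= Vector128<byte>.Count * 2;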
[MethodImpl(MethodImplOptions.NoInlining)]
private static uint UpdateVector128(uint adler, ReadOnlySpan<byte> source)
{
    Debug.Assert(source.Length >= Vector128<byte>.Count * 2);

    const int BlockSize = 32; // two Vector128<byte> loads

    uint s1 = adler & 0xFFFF;
    uint s2 = (adler >> 16) & 0xFFFF;

    ref byte sourceRef = ref MemoryMarshal.GetReference(source);
    int length = source.Length;

    Vector128<sbyte> tap1 = Vector128.Create(32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17);
    Vector128<sbyte> tap2 = Vector128.Create(16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1);

    do
    {
        int n = Math.Min(length, NMax);
        int blocks = n / BlockSize;
        n = blocks * BlockSize;
        length -= n;

        Vector128<uint> vs1 = Vector128<uint>.Zero;
        Vector128<uint> vs2 = Vector128.CreateScalar(s2);
        Vector128<uint> vps = Vector128.CreateScalar(s1 * (uint)blocks);
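
        // vs2 carries the running s2; vps is seeded with s1 * blocks and grows by vs1 (the byte
        // prefix sum) once per block, so the final "vps << 5" adds the 32 * s1 contribution that
        // each 32-byte block owes to s2.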
        do
        {
            Vector128<byte> bytes1 = Vector128.LoadUnsafe(ref sourceRef);
            Vector128<byte> bytes2 = Vector128.LoadUnsafe(ref sourceRef, 16);
            sourceRef = ref Unsafe.Add(ref sourceRef, BlockSize);

            vps += vs1;

            if (Ssse3.IsSupported)
            {
                vs1 += Sse2.SumAbsoluteDifferences(bytes1, Vector128<byte>.Zero).AsUInt32();
                vs1 += Sse2.SumAbsoluteDifferences(bytes2, Vector128<byte>.Zero).AsUInt32();

                vs2 += Sse2.MultiplyAddAdjacent(Ssse3.MultiplyAddAdjacent(bytes1, tap1), Vector128<short>.One).AsUInt32();
                vs2 += Sse2.MultiplyAddAdjacent(Ssse3.MultiplyAddAdjacent(bytes2, tap2), Vector128<short>.One).AsUInt32();
            }
            else
            {
                (Vector128<ushort> lo1, Vector128<ushort> hi1) = Vector128.Widen(bytes1);
                (Vector128<ushort> lo2, Vector128<ushort> hi2) = Vector128.Widen(bytes2);
                (Vector128<uint> sumLo, Vector128<uint> sumHi) = Vector128.Widen(lo1 + hi1 + lo2 + hi2);
                vs1 += sumLo + sumHi;
                vs2 += WeightedSumWidening128(bytes1, tap1) + WeightedSumWidening128(bytes2, tap2);

                [MethodImpl(MethodImplOptions.AggressiveInlining)]
                static Vector128<uint> WeightedSumWidening128(Vector128<byte> data, Vector128<sbyte> weights)
                {
                    (Vector128<ushort> dLo, Vector128<ushort> dHi) = Vector128.Widen(data);
                    (Vector128<short> wLo, Vector128<short> wHi) = Vector128.Widen(weights);

                    (Vector128<int> pLo1, Vector128<int> pHi1) = Vector128.Widen(dLo.AsInt16() * wLo);
                    (Vector128<int> pLo2, Vector128<int> pHi2) = Vector128.Widen(dHi.AsInt16() * wHi);

                    return (pLo1 + pHi1 + pLo2 + pHi2).AsUInt32();
                }
            }
        }
        while (--blocks > 0);

        vs2 += vps << 5;

        s1 += Vector128.Sum(vs1);
        s2 = Vector128.Sum(vs2);

        s1 %= ModBase;
        s2 %= ModBase;
    }
    while (length >= BlockSize);

    if (length > 0)
    {
        UpdateScalarTail(ref sourceRef, length, ref s1, ref s2);
    }

    return (s2 << 16) | s1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static uint UpdateVector256(uint adler, ReadOnlySpan<byte> source)
{
    Debug.Assert(source.Length >= Vector256<byte>.Count);

    const int BlockSize = 32;

    uint s1 = adler & 0xFFFF;
    uint s2 = (adler >> 16) & 0xFFFF;

    ref byte sourceRef = ref MemoryMarshal.GetReference(source);
    int length = source.Length;

    Vector256<sbyte> weights = Vector256.Create(32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1);

    do
    {
        int n = Math.Min(length, NMax);
        int blocks = n / BlockSize;
        n = blocks * BlockSize;
        length -= n;

        Vector256<uint> vs1 = Vector256.CreateScalar(s1);
        Vector256<uint> vs2 = Vector256.CreateScalar(s2);
        Vector256<uint> vs3 = Vector256<uint>.Zero;

        do
        {
            Vector256<byte> data = Vector256.LoadUnsafe(ref sourceRef);
            sourceRef = ref Unsafe.Add(ref sourceRef, BlockSize);

            Vector256<uint> vs1_0 = vs1;
            vs1 += Avx2.SumAbsoluteDifferences(data, Vector256<byte>.Zero).AsUInt32();
            vs3 += vs1_0;
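
            // vs3 accumulates the pre-update vs1 (the initial s1 plus the byte prefix sums) and is
            // scaled by the block size via "vs3 <<= 5" after the loop, forming the inter-block s1
            // contribution to s2.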
            Vector256<short> mad = Avx2.MultiplyAddAdjacent(data, weights);
            vs2 += Avx2.MultiplyAddAdjacent(mad, Vector256<short>.One).AsUInt32();
        }
        while (--blocks > 0);

        vs3 <<= 5;
        vs2 += vs3;

        s1 = (uint)Vector256.Sum(vs1.AsUInt64()); // SumAbsoluteDifferences stores the results in the even lanes
        s2 = Vector256.Sum(vs2);

        s1 %= ModBase;
        s2 %= ModBase;
    }
    while (length >= BlockSize);

    if (length > 0)
    {
        UpdateScalarTail(ref sourceRef, length, ref s1, ref s2);
    }

    return (s2 << 16) | s1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static uint UpdateVector512(uint adler, ReadOnlySpan<byte> source)
{
    Debug.Assert(source.Length >= Vector512<byte>.Count);

    const int BlockSize = 64;

    uint s1 = adler & 0xFFFF;
    uint s2 = (adler >> 16) & 0xFFFF;

    ref byte sourceRef = ref MemoryMarshal.GetReference(source);
    int length = source.Length;

    Vector512<sbyte> weights = Vector512.Create(
        32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1,
        32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1);

    do
    {
        int n = Math.Min(length, NMax);
        int blocks = n / BlockSize;
        n = blocks * BlockSize;
        length -= n;

        Vector512<uint> vs1 = Vector512.CreateScalar(s1);
        Vector512<uint> vs2 = Vector512.CreateScalar(s2);
        Vector512<uint> vs3 = Vector512<uint>.Zero;

        do
        {
            Vector512<byte> data = Vector512.LoadUnsafe(ref sourceRef);
            sourceRef = ref Unsafe.Add(ref sourceRef, BlockSize);

            Vector512<uint> vs1_0 = vs1;
            vs1 += Avx512BW.SumAbsoluteDifferences(data, Vector512<byte>.Zero).AsUInt32();
            vs3 += vs1_0;
            vs2 += Avx512BW.MultiplyAddAdjacent(Avx512BW.MultiplyAddAdjacent(data, weights), Vector512<short>.One).AsUInt32();
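
            // The weights vector repeats 32..1 in both 256-bit halves, so the first 32 bytes of
            // each 64-byte block are under-weighted by 32; adding the SAD of the lower half
            // shifted left by 5 restores the missing 32 * byte contribution.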
            Vector256<uint> sumLo = Avx2.SumAbsoluteDifferences(data.GetLower(), Vector256<byte>.Zero).AsUInt32();
            vs2 += Vector512.Create(sumLo << 5, Vector256<uint>.Zero);
        }
        while (--blocks > 0);

        vs3 <<= 6;
        vs2 += vs3;

        s1 = (uint)Vector512.Sum(vs1.AsUInt64());
        s2 = Vector512.Sum(vs2);

        s1 %= ModBase;
        s2 %= ModBase;
    }
    while (length >= BlockSize);

    if (length >= Vector256<byte>.Count)
    {
        return UpdateVector256((s2 << 16) | s1, MemoryMarshal.CreateReadOnlySpan(ref sourceRef, length));
    }

    if (length > 0)
    {
        UpdateScalarTail(ref sourceRef, length, ref s1, ref s2);
    }

    return (s2 << 16) | s1;
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static uint UpdateArm128(uint adler, ReadOnlySpan<byte> source)
{
    Debug.Assert(source.Length >= Vector128<byte>.Count * 2);

    const int BlockSize = 32; // two Vector128<byte> loads

    uint s1 = adler & 0xFFFF;
    uint s2 = (adler >> 16) & 0xFFFF;

    ref byte sourceRef = ref MemoryMarshal.GetReference(source);
    int length = source.Length;

    do
    {
        int n = Math.Min(length, NMax);
        int blocks = n / BlockSize;
        n = blocks * BlockSize;
        length -= n;

        Vector128<uint> vs1 = Vector128<uint>.Zero;
        Vector128<uint> vps = Vector128.CreateScalar(s1 * (uint)blocks);

        Vector128<ushort> vColumnSum1 = Vector128<ushort>.Zero;
        Vector128<ushort> vColumnSum2 = Vector128<ushort>.Zero;
        Vector128<ushort> vColumnSum3 = Vector128<ushort>.Zero;
        Vector128<ushort> vColumnSum4 = Vector128<ushort>.Zero;
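
        // Per-column byte sums are kept as ushorts for the whole outer iteration (NMax bounds the
        // number of blocks, so they cannot overflow). Since the weighted sum over all blocks equals
        // the per-column weight times the column sum, the 32..1 weights are applied only once,
        // after the inner loop.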
        do
        {
            Vector128<byte> bytes1 = Vector128.LoadUnsafe(ref sourceRef);
            Vector128<byte> bytes2 = Vector128.LoadUnsafe(ref sourceRef, 16);
            sourceRef = ref Unsafe.Add(ref sourceRef, BlockSize);

            vps += vs1;

            vs1 = AdvSimd.AddPairwiseWideningAndAdd(
                vs1,
                AdvSimd.AddPairwiseWideningAndAdd(
                    AdvSimd.AddPairwiseWidening(bytes1),
                    bytes2));

            vColumnSum1 = AdvSimd.AddWideningLower(vColumnSum1, bytes1.GetLower());
            vColumnSum2 = AdvSimd.AddWideningLower(vColumnSum2, bytes1.GetUpper());
            vColumnSum3 = AdvSimd.AddWideningLower(vColumnSum3, bytes2.GetLower());
            vColumnSum4 = AdvSimd.AddWideningLower(vColumnSum4, bytes2.GetUpper());
        }
        while (--blocks > 0);

        Vector128<uint> vs2 = vps << 5;
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum1.GetLower(), Vector64.Create((ushort)32, 31, 30, 29));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum1.GetUpper(), Vector64.Create((ushort)28, 27, 26, 25));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum2.GetLower(), Vector64.Create((ushort)24, 23, 22, 21));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum2.GetUpper(), Vector64.Create((ushort)20, 19, 18, 17));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum3.GetLower(), Vector64.Create((ushort)16, 15, 14, 13));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum3.GetUpper(), Vector64.Create((ushort)12, 11, 10, 9));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum4.GetLower(), Vector64.Create((ushort)8, 7, 6, 5));
        vs2 = AdvSimd.MultiplyWideningLowerAndAdd(vs2, vColumnSum4.GetUpper(), Vector64.Create((ushort)4, 3, 2, 1));

        s1 += Vector128.Sum(vs1);
        s2 += Vector128.Sum(vs2);

        s1 %= ModBase;
        s2 %= ModBase;
    }
    while (length >= BlockSize);

    if (length > 0)
    {
        UpdateScalarTail(ref sourceRef, length, ref s1, ref s2);
    }

    return (s2 << 16) | s1;
}
private static void UpdateScalarTail(ref byte sourceRef, int length, ref uint s1, ref uint s2)
{
    Debug.Assert(length is > 0 and < NMax);
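
    // NMax bounds how many bytes can be accumulated before s2 could overflow a uint,
    // so a single modular reduction after the loop is sufficient.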
    foreach (byte b in MemoryMarshal.CreateReadOnlySpan(ref sourceRef, length))
    {
        s1 += b;
        s2 += s1;
    }

    s1 %= ModBase;
    s2 %= ModBase;
}
I think all of this could be put in Adler32.Vectorized.cs, since the csproj can separate the .NET-specific code from, say, the .NET Standard target(s).
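A rough sketch of how that split could look; the file layout, member names, and simplified class shape here are assumptions for illustration, not the PR's actual code.

// Adler32.cs -- scalar core, compiled for every target (including .NET Standard).
public partial class Adler32
{
    private const uint ModBase = 65521;
    private const int NMax = 5552;

    private static uint UpdateScalar(uint adler, ReadOnlySpan<byte> source)
    {
        // ... scalar loop elided; unchanged from the current Adler32.cs ...
        return adler;
    }
}

// Adler32.Vectorized.cs -- intrinsics-based paths, included in the compilation only for
// targets that support them (for example via a Condition on the <Compile> item).
public partial class Adler32
{
    private static bool CanBeVectorized(ReadOnlySpan<byte> source) =>
        BitConverter.IsLittleEndian &&
        Vector128.IsHardwareAccelerated &&
        source.Length >= Vector128<byte>.Count * 2;

    private static uint UpdateVectorized(uint adler, ReadOnlySpan<byte> source)
    {
        // ... dispatch to UpdateVector128 / UpdateVector256 / UpdateVector512 / UpdateArm128 elided ...
        return adler;
    }
}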