Skip to content

Comments

Fix incorrect atomic loop optimization when body contains backtracking#124254

Merged
stephentoub merged 1 commit intodotnet:mainfrom
stephentoub:fixatomicloop
Feb 11, 2026
Merged

Fix incorrect atomic loop optimization when body contains backtracking#124254
stephentoub merged 1 commit intodotnet:mainfrom
stephentoub:fixatomicloop

Conversation

@stephentoub
Copy link
Member

MakeLoopAtomic wraps a loop in an Atomic group, which discards all backtracking state from the body. However, CanBeMadeAtomic only proves that giving back iterations is unnecessary... it does not account for within-body backtracking. When the body itself contains backtracking constructs, the atomic wrapper incorrectly suppresses that internal backtracking, producing wrong match results.

The fix adds a MayContainBacktracking check that walks the loop body tree before allowing MakeLoopAtomic. A loop is only made atomic when the body has no backtracking constructs of its own. The backtracking-construct detection logic is extracted into a reusable IsBacktrackingConstruct property on RegexNode, which RegexTreeAnalyzer now also uses, removing the duplicated switch.

MakeLoopAtomic wraps a loop in an Atomic group, which discards all backtracking state from the body. However, CanBeMadeAtomic only proves that giving back iterations is unnecessary... it does not account for within-body backtracking. When the body itself contains backtracking constructs, the atomic wrapper incorrectly suppresses that internal backtracking, producing wrong match results.

The fix adds a MayContainBacktracking check that walks the loop body tree before allowing MakeLoopAtomic. A loop is only made atomic when the body has no backtracking constructs of its own. The backtracking-construct detection logic is extracted into a reusable IsBacktrackingConstruct property on RegexNode, which RegexTreeAnalyzer now also uses, removing the duplicated switch.

Tests are added covering lazy/greedy single-char loops, char-class loops, alternations, and general multi-char loops inside outer loop bodies, as well as cases where the optimization should still apply (non-backtracking bodies).
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an unsound regex optimization where wrapping certain loops in an Atomic group could incorrectly suppress within-body backtracking (not just “giving back” loop iterations), leading to wrong match results. The PR adds a conservative check to ensure a loop body has no intrinsic backtracking constructs before applying MakeLoopAtomic, and refactors backtracking-construct detection into a reusable RegexNode property.

Changes:

  • Add a subtree walk (MayContainBacktracking) to block MakeLoopAtomic when the loop body contains backtracking constructs.
  • Extract backtracking-construct detection into RegexNode.IsBacktrackingConstruct and reuse it from RegexTreeAnalyzer.
  • Add regression tests covering nested-loop cases where atomic wrapping must not suppress inner backtracking.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs Adds regression cases for loops whose bodies require backtracking to produce correct matches.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexTreeAnalyzer.cs Replaces a duplicated switch with RegexNode.IsBacktrackingConstruct.
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs Adds IsBacktrackingConstruct + MayContainBacktracking and gates MakeLoopAtomic accordingly.
Comments suppressed due to low confidence (1)

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs:2612

  • The MayContainBacktracking summary/remarks mention “variable-width loops”, but the detection is based on IsBacktrackingConstruct which keys off M != N (variable iteration count) and alternation. Consider rewording the summary to avoid implying it’s about width rather than iteration variability.
        /// <summary>
        /// Checks whether a node tree may contain backtracking constructs (variable-width loops or alternations).
        /// </summary>

@stephentoub
Copy link
Member Author

/ba-g failures are unrelated

@stephentoub stephentoub enabled auto-merge (squash) February 11, 2026 18:22
@stephentoub stephentoub merged commit 5d62ba2 into dotnet:main Feb 11, 2026
94 of 100 checks passed
@stephentoub stephentoub deleted the fixatomicloop branch February 11, 2026 18:22
@stephentoub
Copy link
Member Author

/backport to release/10.0

@github-actions
Copy link
Contributor

Started backporting to release/10.0 (link to workflow run)

@jeffhandley
Copy link
Member

Here's the repro app used to validate the servicing fix:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

// Repro for: https://github.com/dotnet/runtime/pull/124254
// Main PR:   https://github.com/dotnet/runtime/pull/124254
// Servicing: https://github.com/dotnet/runtime/pull/124287 (10.0.4)
// Repro from: https://github.com/dotnet/runtime/pull/124287 (test cases)
//
// MakeLoopAtomic incorrectly wraps loops whose body contains backtracking
// constructs (e.g., lazy quantifiers, alternations). The atomic wrapper
// suppresses within-body backtracking, producing wrong match results.
// Expected: Regex matches the full intended substring.
// Actual:   Regex fails to match or matches incorrectly.

using System;
using System.Text.RegularExpressions;

class Repro_124254
{
    static int Main()
    {
        int failures = 0;

        // Lazy loop inside loop body: .*? must expand past first ) to find correct closing delimiter
        failures += Check(@"\w+(\(.*?\))?,", @"Foo(one()?,two,three),", @"Foo(one()?,two,three),");

        // Another variant: lazy dot-star inside optional group
        failures += Check(@"\w+(<.*?>)?!", "foo<a>b>!", "foo<a>b>!");

        // Greedy loop body with backtracking
        failures += Check(@"\w+(\(.+\))?,", "Foo(a),(b)x,", "Foo(a),");

        // Alternate inside loop body
        failures += Check(@"\w+(\((?:a|a\)b)\))?,", "Foo(a)b),", "Foo(a)b),");

        if (failures == 0)
        {
            Console.WriteLine("PASS: All regex patterns matched correctly");
            return 0;
        }

        Console.WriteLine($"FAIL: {failures} pattern(s) failed");
        return 1;
    }

    static int Check(string pattern, string input, string expected)
    {
        var match = Regex.Match(input, pattern);
        if (!match.Success || match.Value != expected)
        {
            Console.WriteLine($"FAIL: Pattern '{pattern}' on '{input}'");
            Console.WriteLine($"  Expected: '{expected}'");
            Console.WriteLine($"  Actual:   '{(match.Success ? match.Value : "(no match)")}'");
            return 1;
        }

        return 0;
    }
}

Results: repro_124254 — Regex atomic loop optimization

Main PR: #124254
Servicing PRs: #124287 (10.0.4)

Baseline (expect failure — bug should reproduce)

  • .NET 10.0.3 — FAIL (expected) ✓ — 4/4 regex patterns matched incorrectly

Validation (expect pass — fix should resolve)

  • .NET 10.0.4 — PASS (expected) ✓ — All regex patterns matched correctly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants