
fix: robust subagent completion propagation #13321

Open
ASidorenkoCode wants to merge 1 commit into anomalyco:dev from ASidorenkoCode:fix/subagent-completion-propagation



@ASidorenkoCode (Contributor) commented Feb 12, 2026

Summary

Fixes the parent session hanging indefinitely when a subagent (Task tool) completes but SessionPrompt.prompt() never resolves. Fixes #9003, #10802, #11865, #6792.

Three targeted changes:

  1. await the prune() call (prompt.ts:653). It was fire-and-forget, so its writes raced with the MessageV2.stream() reads that immediately follow it.
  2. Diagnostic logging (prompt.ts:305-312). When the assistant had finished but the user message ID was newer, the loop exit check silently continued; it now logs a warning so this case is visible.
  3. Event-driven recovery (task.ts). Subscribe to the child's MessageV2.Event.Updated events; if the child reaches a terminal state (finish or error) but the prompt doesn't resolve within 3s, force-cancel to unblock the parent. This is not a timeout: it fires only with positive proof of completion.
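Change 1 is a standard fire-and-forget race. The sketch below reproduces it in isolation, assuming a hypothetical in-memory store and a stand-in prune() that only trims a list after one timer tick; neither reflects what SessionCompaction.prune() or MessageV2.stream() actually do internally.

```typescript
// Hypothetical in-memory store standing in for session message storage.
const store = new Map<string, string[]>()

// Stand-in for an async prune (hypothetical): after an I/O-like delay,
// keep only the last `keep` entries for the session.
async function prune(sessionID: string, keep: number): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 5))
  const parts = store.get(sessionID) ?? []
  store.set(sessionID, parts.slice(-keep))
}

// Stand-in for the reads that follow the prune call.
function read(sessionID: string): string[] {
  return [...(store.get(sessionID) ?? [])]
}

// Before: prune is not awaited, so the read runs before prune writes anything.
async function readFireAndForget(sessionID: string): Promise<string[]> {
  void prune(sessionID, 1)
  return read(sessionID)
}

// After: awaiting prune guarantees the read sees the pruned state.
async function readAfterAwait(sessionID: string): Promise<string[]> {
  await prune(sessionID, 1)
  return read(sessionID)
}
```

With three stored entries, readFireAndForget returns all three (the prune lands later), while readAfterAwait returns only the last one.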

Fixes the parent session hanging when a subagent completes but the prompt
loop doesn't exit. Three changes:

1. `await` the `SessionCompaction.prune()` call that was fire-and-forget,
   eliminating a race between prune writes and the following stream reads.

2. Add diagnostic logging when the loop exit condition fails (assistant
   finished but user message ID is newer), making stuck loops visible.

3. Add event-driven recovery in the task tool: subscribe to child message
   updates, and if the child's assistant message reaches a terminal state
   (finish or error) but the prompt doesn't resolve within 3s, force-cancel
   to unblock the parent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
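For change 2, the exit decision can be pictured as the small function below. This is a hypothetical sketch, not the code at prompt.ts:305-312: MsgInfo, Warn, and checkExit are illustrative names, and it assumes message IDs sort lexicographically in creation order.

```typescript
// Hypothetical minimal message shape; only the fields the check needs.
interface MsgInfo {
  id: string // assumed to sort lexicographically by creation time
  finish?: string // set once the assistant message has finished
}

type Warn = (message: string, meta: Record<string, string>) => void

// Returns "exit" when the latest assistant message has finished and is newer
// than the latest user message; otherwise "continue".
function checkExit(lastUser: MsgInfo, lastAssistant: MsgInfo | undefined, warn: Warn): "exit" | "continue" {
  if (!lastAssistant?.finish) return "continue"
  if (lastUser.id < lastAssistant.id) return "exit"
  // The previously silent case: assistant finished, yet the user message is newer.
  warn("assistant finished but user message is newer; loop continues", {
    user: lastUser.id,
    assistant: lastAssistant.id,
  })
  return "continue"
}
```

The loop still continues in the newer-user-message case, as before; the only difference is that it now leaves a trace in the log.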
@github-actions (Contributor)

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@ASidorenkoCode
Copy link
Contributor Author

ASidorenkoCode commented Feb 12, 2026

Reproduction & verification

Simulating the stuck condition

To reliably reproduce the race condition, I injected an abort-aware delay in prompt.ts at the child session's loop break point (after processor.process() returns "stop", before break). This only triggers for child sessions (session.parentID is set), simulating the condition where the child's loop doesn't exit after completion:

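if (result === "stop") {
  // Only child sessions (parentID set) are delayed; top-level sessions break immediately.
  if (session.parentID) {
    log.warn("INJECTED DELAY: child loop stuck after processor returned stop", { sessionID })
    // Wait up to 120s, but resolve early if the session is cancelled via its abort signal.
    await new Promise<void>((resolve) => {
      const timer = setTimeout(resolve, 120_000)
      abort.addEventListener("abort", () => { clearTimeout(timer); resolve() }, { once: true })
    })
  }
  break
}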

The delay is abort-aware so that SessionPrompt.cancel() can interrupt it, mirroring real stuck conditions, which wait on abort-aware I/O.
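The same abort-aware wait can be factored into a reusable helper. abortableDelay is a hypothetical name, not a function in the PR. Beyond the inline version above, it also resolves immediately for a signal that is already aborted (the abort event is dispatched only once, so a listener added afterwards never fires) and removes its listener when the timer wins.

```typescript
// Resolves after `ms` milliseconds, or as soon as `signal` aborts.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    // An already-aborted signal will never dispatch "abort" again.
    if (signal.aborted) return resolve()
    const timer = setTimeout(() => {
      // Timer won: drop the listener so it does not outlive the delay.
      signal.removeEventListener("abort", onAbort)
      resolve()
    }, ms)
    function onAbort() {
      // Abort won: cancel the pending timer and resolve early.
      clearTimeout(timer)
      resolve()
    }
    signal.addEventListener("abort", onAbort, { once: true })
  })
}
```

Resolving (rather than rejecting) on abort matches the injected delay above, where the loop simply proceeds to break once cancelled.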

Before (without fix, with injected delay)

=== BEFORE TEST ===
• quick test  Explore Agent
I'll launch an explore agent...

--- 60s elapsed ---
RESULT: STILL RUNNING = HUNG (parent blocked by stuck child)

Log confirmed the child was stuck:

WARN service=session.prompt sessionID=ses_... INJECTED DELAY: child loop stuck after processor returned stop

The parent waited indefinitely and had to be killed.

After (with fix, no injected delay)

=== AFTER TEST ===
• list top-level files  Explore Agent
✓ list top-level files  Explore Agent

(full directory listing...)

## Top-Level Files & Directories Report
(parent produced final summary and exited cleanly)

Log confirmed clean exit:

INFO service=session.prompt sessionID=ses_... exiting loop
INFO service=session.prompt sessionID=ses_... cancel

Additional finding

During testing, I discovered that processor errors set error on the assistant message but NOT finish. I updated the recovery mechanism to detect both terminal states (finish and error), so it covers both clean completions and error exits.
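A minimal sketch of the recovery pattern, assuming a generic subscribe/cancel interface: AssistantInfo, Subscribe, isTerminal, and runWithCompletionRecovery are hypothetical names, not opencode's Bus or MessageV2 API. It treats either finish or error as terminal, and no timer exists until a terminal event for the child arrives, which is what separates it from a plain timeout.

```typescript
// Hypothetical shape of a child assistant message update.
interface AssistantInfo {
  sessionID: string
  finish?: string // set on clean completion
  error?: unknown // set on processor errors (finish stays unset)
}

// Hypothetical event subscription: registers a handler, returns an unsubscribe function.
type Subscribe = (handler: (info: AssistantInfo) => void) => () => void

// Either field marks the message as done.
function isTerminal(info: AssistantInfo): boolean {
  return info.finish !== undefined || info.error !== undefined
}

// Awaits `prompt`. If the child reports a terminal state but `prompt` has not
// settled within `graceMs`, calls `cancel`, which is assumed to make `prompt` settle.
async function runWithCompletionRecovery<T>(
  childSessionID: string,
  prompt: Promise<T>,
  subscribe: Subscribe,
  cancel: () => void,
  graceMs = 3_000,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const unsubscribe = subscribe((info) => {
    // Ignore other sessions, non-terminal updates, and repeat terminal events.
    if (info.sessionID !== childSessionID || !isTerminal(info) || timer !== undefined) return
    timer = setTimeout(cancel, graceMs)
  })
  try {
    return await prompt
  } finally {
    // Runs on normal resolution and after a forced cancel alike.
    unsubscribe()
    if (timer !== undefined) clearTimeout(timer)
  }
}
```

If the prompt settles on its own within the grace period, the pending timer is cleared and cancel never runs. A real implementation would also want the subscription in place before the child starts, so an early terminal event cannot be missed.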

@github-actions (Contributor)

The following comment was made by an LLM and may be inaccurate:

No duplicate PRs found


Development

Successfully merging this pull request may close these issues.

main agent hangs because of subagent (explore)

1 participant