
ci-analysis skill: let the agent reason about its own tools#124398

Merged
lewing merged 2 commits into dotnet:main from lewing:skill/ci-analysis-hlx-update on Feb 13, 2026

Conversation

lewing (Member) commented Feb 13, 2026

Refactors the ci-analysis skill to remove explicit MCP tool name references from all documentation.

Why

The agent has MCP tool descriptions in its context at runtime — it already knows what each tool does and what parameters it takes. Skills should provide domain knowledge the agent doesn't have: gotchas, priority orderings, data locations, and anti-patterns. Re-documenting tool parameters or providing step-by-step "call tool X then tool Y" recipes is fragile (breaks when tools change), redundant, and overly prescriptive.

What changed

  • Replaced tool call chains with action descriptions
  • Replaced parameter-level details with workflow guidance
  • Subagent delegation prompts describe goals, not tool calls
  • Kept all domain-specific gotchas, anti-patterns, and priority orderings

Net result: 89 lines removed, 45 added — less to maintain, less to break when tools change.
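
To illustrate the shift (a hypothetical sketch, not text taken from the actual skill files):

```markdown
<!-- Before: step-by-step tool recipe (fragile, duplicates tool descriptions) -->
1. Call hlx_status with filter:"failed" to list failing work items.
2. Call hlx_logs on each failing work item.

<!-- After: action description (tool-agnostic domain guidance) -->
1. List the failing work items for the Helix job.
2. Pull each failure's console log and look for lines ending in [FAIL].
```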

Testing

Multi-model tested with Claude Sonnet 4 and GPT-5 against real CI investigation (PR #124095). Both correctly identified and used the right tools for all scenarios without explicit tool names in the skill.

Copilot AI review requested due to automatic review settings February 13, 2026 19:04
@github-actions bot added the needs-area-label label Feb 13, 2026
@lewing force-pushed the skill/ci-analysis-hlx-update branch from cc5149b to 014192c on February 13, 2026 19:06
Copilot AI (Contributor) left a comment

Pull request overview

Updates the ci-analysis skill documentation to align with Helix MCP server v0.1.3 API/tooling, steering investigators toward MCP tools over manual Helix REST calls.

Changes:

  • Updated guidance for hlx_batch_status (array input, max 50) and hlx_status (new filter:"failed|passed|all" enum).
  • Documented and promoted new MCP investigation tools: hlx_test_results, hlx_search_file, hlx_find_files, and preference for hlx_search_log.
  • Added “prefer MCP tools” notes to reduce reliance on manual curl-based Helix API usage.
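
The parameter changes above can be sketched as tool-call arguments (the JSON envelope and argument names are assumptions for illustration; only the array input, the 50-job cap, and the filter enum come from this PR):

```json
[
  { "tool": "hlx_batch_status",
    "arguments": { "jobs": ["job-aaaa", "job-bbbb"] } },

  { "tool": "hlx_status",
    "arguments": { "job": "job-aaaa", "filter": "failed" } }
]
```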

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .github/skills/ci-analysis/references/manual-investigation.md | Adds an explicit note to prefer MCP tools before using raw Helix REST API calls. |
| .github/skills/ci-analysis/references/helix-artifacts.md | Documents new remote-investigation MCP tools and recommends hlx_test_results over manual XML parsing. |
| .github/skills/ci-analysis/references/delegation-patterns.md | Updates delegation guidance to use hlx_search_log and hlx_test_results for more direct extraction. |
| .github/skills/ci-analysis/SKILL.md | Updates core skill guidance to match Helix MCP v0.1.3 API semantics (filter, array batch status) and new preferred tools. |

- hlx_batch_status: document array param (was comma-separated), max 50 limit
- hlx_status: document filter enum (failed|passed|all), was bool
- hlx_test_results: preferred over manual testResults.xml download+parse
- hlx_search_file: quick remote file search without downloading
- hlx_find_files: generalized file discovery with glob patterns
- hlx_search_log: preferred over hlx_logs for pattern extraction
- manual-investigation.md: add MCP-preferred note before raw API section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
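
The manual path that hlx_test_results replaces can be sketched as follows (a hedged example: the XML shape is the standard xUnit v2 results format, and the content below is illustrative, not from a real Helix job):

```shell
# Sample of the xUnit testResults.xml an agent previously had to
# download and parse by hand (illustrative content only).
cat > /tmp/testResults.xml <<'EOF'
<assembly>
  <collection>
    <test name="Ns.Suite.CaseA" result="Fail" />
    <test name="Ns.Suite.CaseB" result="Pass" />
  </collection>
</assembly>
EOF

# Extract the names of failing tests.
grep 'result="Fail"' /tmp/testResults.xml | sed -n 's/.*name="\([^"]*\)".*/\1/p'
```

hlx_test_results returns this information directly, which is why the skill now prefers it over download-and-parse.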
@lewing force-pushed the skill/ci-analysis-hlx-update branch from 014192c to 8c51dd2 on February 13, 2026 19:16
Skills should provide domain knowledge (gotchas, priority orderings,
data locations) not re-document tool parameters or step-by-step tool
call recipes. The agent has MCP tool descriptions in its context and
can map actions to tools on its own.

Multi-model tested: Sonnet 4 and GPT-5 both correctly navigated
real CI investigation on PR dotnet#124095 without explicit tool names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 13, 2026 19:53
@lewing changed the title from "ci-analysis skill: update for Helix MCP v0.1.3 API" to "ci-analysis skill: update for Helix MCP v0.1.3 and remove explicit tool references" Feb 13, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

.github/skills/ci-analysis/references/delegation-patterns.md:20

  • In the delegation prompt, “search console logs” is underspecified now that explicit hlx_logs usage was removed. Given this skill’s guidance to prefer hlx_search_log for pattern extraction, please name the intended MCP tool(s) and the pattern to use (e.g., search for lines ending in [FAIL]) so the subagent prompt is directly executable.
For each, search console logs for lines ending with [FAIL] (xUnit format).
If hlx MCP is not available, fall back to:
  ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}"

Extract lines ending with [FAIL] (xUnit format). Ignore [OUTPUT] and [PASS] lines.
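
The extraction the reviewer asks to make explicit can be sketched in shell (the log content is illustrative; the [FAIL]/[PASS]/[OUTPUT] markers are the xUnit console format the skill describes):

```shell
# Illustrative slice of a Helix console log in xUnit console format.
cat > /tmp/console.log <<'EOF'
Ns.Suite.CaseA [PASS]
    some captured stdout [OUTPUT]
Ns.Suite.CaseB [FAIL]
Ns.Suite.CaseC [FAIL]
EOF

# Keep only failure lines; [PASS] and [OUTPUT] lines are dropped.
grep '\[FAIL\]$' /tmp/console.log
```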

@lewing changed the title from "ci-analysis skill: update for Helix MCP v0.1.3 and remove explicit tool references" to "ci-analysis skill: let the agent reason about its own tools" Feb 13, 2026
lewing (Member, Author) commented Feb 13, 2026

Design note: tension between routing signals and tool-agnostic guidance

The official skills guidance recommends explicit MCP tool references in two places:

  1. Frontmatter INVOKES: — tells the router which tools the skill uses, preventing collisions when multiple skills/tools could match the same prompt
  2. "MCP Tools Used" tables — step-by-step tool → command → purpose mappings in the skill body

This PR removes category (2) — the step-by-step tool call recipes in reference docs — but the reasoning applies differently to each category:

Why step-by-step recipes are fragile (what this PR removes)

The official guide's own evaluation-first principle says: "Document only what [the agent] gets wrong without guidance." The agent already has MCP tool descriptions in its context — it knows what hlx_test_results does and what parameters it takes. Re-documenting that in the skill creates two sources of truth that drift when tools change (as just happened with hlx v0.1.3). Multi-model testing confirmed both Sonnet 4 and GPT-5 found the right tools without explicit names.

Why routing signals may still matter (what we should consider)

The official guide's INVOKES: pattern exists to solve a different problem: when a skill and an MCP tool have overlapping descriptions, the router needs help disambiguating. The ci-analysis skill doesn't currently have this collision problem — there's no competing "ci-analysis" MCP tool. But adding INVOKES: to the frontmatter is cheap (~20 tokens) and provides forward compatibility if the tool landscape changes.

The principle

  • Frontmatter routing signals (INVOKES:, USE FOR:, DO NOT USE FOR:) → ✅ Keep; they help the router
  • Step-by-step tool call recipes (call hlx_status with filter:"all") → ❌ Remove; they duplicate tool descriptions and break when APIs change

The skill should teach the agent domain knowledge it doesn't have (gotchas, priority orderings, data locations) and let it reason about its own tools from their descriptions.
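
A minimal frontmatter sketch of the routing signals worth keeping (field values here are illustrative, not copied from the actual SKILL.md):

```yaml
---
name: ci-analysis
description: Investigate CI failures on dotnet/runtime pull requests.
# Routing signals: cheap (~20 tokens) and they help the router
# disambiguate when skills and tools have overlapping descriptions.
INVOKES: hlx_status, hlx_batch_status, hlx_search_log, hlx_test_results
USE FOR: diagnosing failed CI checks and Helix work item failures
DO NOT USE FOR: authoring or modifying CI pipeline definitions
---
```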

@lewing lewing requested a review from steveisok February 13, 2026 21:37
@lewing lewing merged commit 70d3aeb into dotnet:main Feb 13, 2026
26 checks passed
lewing added a commit to lewing/runtime that referenced this pull request Feb 14, 2026
…ogression

Waza eval progression testing (16 runs across 4 skill versions) revealed
the tool-agnostic refactor (dotnet#124398) caused a 68% regression in tool
calls (25→42) for the build progression task. Root cause: domain-specific
examples were incorrectly classified as tool schema restatements.

Changes:
- build-progression-analysis.md: restore key AzDO query parameters
  (branchName, queryOrder, top, project) as inline hints
- build-progression-analysis.md: restore gh api merge parent extraction
  example and mention get_commit MCP alternative
- build-progression-analysis.md: restore logId:5 / startLine:500 hints
  with bold emphasis for checkout log extraction
- build-progression-analysis.md: add stop signal — present findings when
  the progression table and transition are identified
- delegation-patterns.md: add bold emphasis on log ID/line hints in
  subagent prompt template
- SKILL.md: mention refs/pull/{PR}/merge branch pattern in step 1

These are domain examples (branch ref formats, field names, log locations,
jq expressions) that agents cannot infer from tool descriptions alone.
Simple tasks (retry) still benefit from less prescriptive guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
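
The restored merge-parent extraction can be sketched with jq (the JSON below is a hand-made stand-in for what `gh api repos/{owner}/{repo}/commits/refs/pull/{PR}/merge` returns; the shas are fake):

```shell
# A PR's CI runs against the synthetic refs/pull/{PR}/merge commit.
# Its first parent is the base branch tip; the second is the PR head.
cat > /tmp/merge_commit.json <<'EOF'
{
  "sha": "aaaa111",
  "parents": [ { "sha": "base222" }, { "sha": "head333" } ]
}
EOF

# Extract the base branch tip (parent 0).
jq -r '.parents[0].sha' /tmp/merge_commit.json
```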
richlander pushed a commit to richlander/runtime that referenced this pull request Feb 14, 2026
…24398)

Refactors the ci-analysis skill to remove explicit MCP tool name
references from all documentation.

### Why

The agent has MCP tool descriptions in its context at runtime — it
already knows what each tool does and what parameters it takes. Skills
should provide domain knowledge the agent *doesn't* have: gotchas,
priority orderings, data locations, and anti-patterns. Re-documenting
tool parameters or providing step-by-step "call tool X then tool Y"
recipes is fragile (breaks when tools change), redundant, and overly
prescriptive.

### What changed

- Replaced tool call chains with action descriptions
- Replaced parameter-level details with workflow guidance  
- Subagent delegation prompts describe goals, not tool calls
- Kept all domain-specific gotchas, anti-patterns, and priority
orderings

**Net result: 89 lines removed, 45 added** — less to maintain, less to
break when tools change.

### Testing

Multi-model tested with Claude Sonnet 4 and GPT-5 against real CI
investigation (PR dotnet#124095). Both correctly identified and used the right
tools for all scenarios without explicit tool names in the skill.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners
