
ci-analysis skill: let the agent reason about its own tools#124398

Merged
lewing merged 2 commits into dotnet:main from lewing:skill/ci-analysis-hlx-update on Feb 13, 2026

Conversation

lewing (Member) commented Feb 13, 2026

Refactors the ci-analysis skill to remove explicit MCP tool name references from all documentation.

Why

The agent has MCP tool descriptions in its context at runtime — it already knows what each tool does and what parameters it takes. Skills should provide domain knowledge the agent doesn't have: gotchas, priority orderings, data locations, and anti-patterns. Re-documenting tool parameters or providing step-by-step "call tool X then tool Y" recipes is fragile (breaks when tools change), redundant, and overly prescriptive.

What changed

  • Replaced tool call chains with action descriptions
  • Replaced parameter-level details with workflow guidance
  • Subagent delegation prompts describe goals, not tool calls
  • Kept all domain-specific gotchas, anti-patterns, and priority orderings

Net result: 89 lines removed, 45 added — less to maintain, less to break when tools change.
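
To illustrate the shift (a hypothetical sketch, not text taken from the actual skill files):

```markdown
<!-- Before: step-by-step tool recipe (fragile, duplicates tool descriptions) -->
1. Call hlx_status with filter:"failed" to list failing work items.
2. Call hlx_logs on each failing work item.

<!-- After: action description (tool-agnostic domain guidance) -->
1. List the failing work items for the Helix job.
2. Pull each failure's console log and look for lines ending in [FAIL].
```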

Testing

Multi-model tested with Claude Sonnet 4 and GPT-5 against real CI investigation (PR #124095). Both correctly identified and used the right tools for all scenarios without explicit tool names in the skill.

Copilot AI review requested due to automatic review settings February 13, 2026 19:04
@github-actions bot added the needs-area-label label Feb 13, 2026
@lewing force-pushed the skill/ci-analysis-hlx-update branch from cc5149b to 014192c on February 13, 2026 19:06
Copilot AI (Contributor) left a comment

Pull request overview

Updates the ci-analysis skill documentation to align with Helix MCP server v0.1.3 API/tooling, steering investigators toward MCP tools over manual Helix REST calls.

Changes:

  • Updated guidance for hlx_batch_status (array input, max 50) and hlx_status (new filter:"failed|passed|all" enum).
  • Documented and promoted new MCP investigation tools: hlx_test_results, hlx_search_file, hlx_find_files, and preference for hlx_search_log.
  • Added “prefer MCP tools” notes to reduce reliance on manual curl-based Helix API usage.
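
The parameter changes above can be sketched as tool-call arguments (the JSON envelope and argument names are assumptions for illustration; only the array input, the 50-job cap, and the filter enum come from this PR):

```json
[
  { "tool": "hlx_batch_status",
    "arguments": { "jobs": ["job-aaaa", "job-bbbb"] } },

  { "tool": "hlx_status",
    "arguments": { "job": "job-aaaa", "filter": "failed" } }
]
```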

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .github/skills/ci-analysis/references/manual-investigation.md | Adds an explicit note to prefer MCP tools before using raw Helix REST API calls. |
| .github/skills/ci-analysis/references/helix-artifacts.md | Documents new remote-investigation MCP tools and recommends hlx_test_results over manual XML parsing. |
| .github/skills/ci-analysis/references/delegation-patterns.md | Updates delegation guidance to use hlx_search_log and hlx_test_results for more direct extraction. |
| .github/skills/ci-analysis/SKILL.md | Updates core skill guidance to match Helix MCP v0.1.3 API semantics (filter, array batch status) and new preferred tools. |

- hlx_batch_status: document array param (was comma-separated), max 50 limit
- hlx_status: document filter enum (failed|passed|all), was bool
- hlx_test_results: preferred over manual testResults.xml download+parse
- hlx_search_file: quick remote file search without downloading
- hlx_find_files: generalized file discovery with glob patterns
- hlx_search_log: preferred over hlx_logs for pattern extraction
- manual-investigation.md: add MCP-preferred note before raw API section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
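
The manual path that hlx_test_results replaces can be sketched as follows (a hedged example: the XML shape is the standard xUnit v2 results format, and the content below is illustrative, not from a real Helix job):

```shell
# Sample of the xUnit testResults.xml an agent previously had to
# download and parse by hand (illustrative content only).
cat > /tmp/testResults.xml <<'EOF'
<assembly>
  <collection>
    <test name="Ns.Suite.CaseA" result="Fail" />
    <test name="Ns.Suite.CaseB" result="Pass" />
  </collection>
</assembly>
EOF

# Extract the names of failing tests.
grep 'result="Fail"' /tmp/testResults.xml | sed -n 's/.*name="\([^"]*\)".*/\1/p'
```

hlx_test_results returns this information directly, which is why the skill now prefers it over download-and-parse.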
@lewing force-pushed the skill/ci-analysis-hlx-update branch from 014192c to 8c51dd2 on February 13, 2026 19:16
Skills should provide domain knowledge (gotchas, priority orderings,
data locations) not re-document tool parameters or step-by-step tool
call recipes. The agent has MCP tool descriptions in its context and
can map actions to tools on its own.

Multi-model tested: Sonnet 4 and GPT-5 both correctly navigated
real CI investigation on PR dotnet#124095 without explicit tool names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 13, 2026 19:53
@lewing changed the title from "ci-analysis skill: update for Helix MCP v0.1.3 API" to "ci-analysis skill: update for Helix MCP v0.1.3 and remove explicit tool references" Feb 13, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Comments suppressed due to low confidence (1)

.github/skills/ci-analysis/references/delegation-patterns.md:20

  • In the delegation prompt, “search console logs” is underspecified now that explicit hlx_logs usage was removed. Given this skill’s guidance to prefer hlx_search_log for pattern extraction, please name the intended MCP tool(s) and the pattern to use (e.g., search for lines ending in [FAIL]) so the subagent prompt is directly executable.
For each, search console logs for lines ending with [FAIL] (xUnit format).
If hlx MCP is not available, fall back to:
  ./scripts/Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}"

Extract lines ending with [FAIL] (xUnit format). Ignore [OUTPUT] and [PASS] lines.
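
The extraction the reviewer asks to make explicit can be sketched in shell (the log content is illustrative; the [FAIL]/[PASS]/[OUTPUT] markers are the xUnit console format the skill describes):

```shell
# Illustrative slice of a Helix console log in xUnit console format.
cat > /tmp/console.log <<'EOF'
Ns.Suite.CaseA [PASS]
    some captured stdout [OUTPUT]
Ns.Suite.CaseB [FAIL]
Ns.Suite.CaseC [FAIL]
EOF

# Keep only failure lines; [PASS] and [OUTPUT] lines are dropped.
grep '\[FAIL\]$' /tmp/console.log
```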

@lewing changed the title from "ci-analysis skill: update for Helix MCP v0.1.3 and remove explicit tool references" to "ci-analysis skill: let the agent reason about its own tools" Feb 13, 2026
lewing (Member, Author) commented Feb 13, 2026

Design note: tension between routing signals and tool-agnostic guidance

The official skills guidance recommends explicit MCP tool references in two places:

  1. Frontmatter INVOKES: — tells the router which tools the skill uses, preventing collisions when multiple skills/tools could match the same prompt
  2. "MCP Tools Used" tables — step-by-step tool → command → purpose mappings in the skill body

This PR removes category (2) — the step-by-step tool call recipes in reference docs — but the reasoning applies differently to each category:

Why step-by-step recipes are fragile (what this PR removes)

The official guide's own evaluation-first principle says: "Document only what [the agent] gets wrong without guidance." The agent already has MCP tool descriptions in its context — it knows what hlx_test_results does and what parameters it takes. Re-documenting that in the skill creates two sources of truth that drift when tools change (as just happened with hlx v0.1.3). Multi-model testing confirmed both Sonnet 4 and GPT-5 found the right tools without explicit names.

Why routing signals may still matter (what we should consider)

The official guide's INVOKES: pattern exists to solve a different problem: when a skill and an MCP tool have overlapping descriptions, the router needs help disambiguating. The ci-analysis skill doesn't currently have this collision problem — there's no competing "ci-analysis" MCP tool. But adding INVOKES: to the frontmatter is cheap (~20 tokens) and provides forward compatibility if the tool landscape changes.

The principle

  • Frontmatter routing signals (INVOKES:, USE FOR:, DO NOT USE FOR:) → ✅ Keep; they help the router
  • Step-by-step tool call recipes (call hlx_status with filter:"all") → ❌ Remove; they duplicate tool descriptions and break when APIs change

The skill should teach the agent domain knowledge it doesn't have (gotchas, priority orderings, data locations) and let it reason about its own tools from their descriptions.
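
A minimal frontmatter sketch of the routing signals worth keeping (field values here are illustrative, not copied from the actual SKILL.md):

```yaml
---
name: ci-analysis
description: Investigate CI failures on dotnet/runtime pull requests.
# Routing signals: cheap (~20 tokens) and they help the router
# disambiguate when skills and tools have overlapping descriptions.
INVOKES: hlx_status, hlx_batch_status, hlx_search_log, hlx_test_results
USE FOR: diagnosing failed CI checks and Helix work item failures
DO NOT USE FOR: authoring or modifying CI pipeline definitions
---
```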

@lewing lewing requested a review from steveisok February 13, 2026 21:37
@lewing lewing merged commit 70d3aeb into dotnet:main Feb 13, 2026
26 checks passed
lewing added a commit to lewing/runtime that referenced this pull request Feb 14, 2026
…ogression

Waza eval progression testing (16 runs across 4 skill versions) revealed
the tool-agnostic refactor (dotnet#124398) caused a 68% regression in tool
calls (25→42) for the build progression task. Root cause: domain-specific
examples were incorrectly classified as tool schema restatements.

Changes:
- build-progression-analysis.md: restore key AzDO query parameters
  (branchName, queryOrder, top, project) as inline hints
- build-progression-analysis.md: restore gh api merge parent extraction
  example and mention get_commit MCP alternative
- build-progression-analysis.md: restore logId:5 / startLine:500 hints
  with bold emphasis for checkout log extraction
- build-progression-analysis.md: add stop signal — present findings when
  the progression table and transition are identified
- delegation-patterns.md: add bold emphasis on log ID/line hints in
  subagent prompt template
- SKILL.md: mention refs/pull/{PR}/merge branch pattern in step 1

These are domain examples (branch ref formats, field names, log locations,
jq expressions) that agents cannot infer from tool descriptions alone.
Simple tasks (retry) still benefit from less prescriptive guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
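
The restored merge-parent extraction can be sketched with jq (the JSON below is a hand-made stand-in for what `gh api repos/{owner}/{repo}/commits/refs/pull/{PR}/merge` returns; the shas are fake):

```shell
# A PR's CI runs against the synthetic refs/pull/{PR}/merge commit.
# Its first parent is the base branch tip; the second is the PR head.
cat > /tmp/merge_commit.json <<'EOF'
{
  "sha": "aaaa111",
  "parents": [ { "sha": "base222" }, { "sha": "head333" } ]
}
EOF

# Extract the base branch tip (parent 0).
jq -r '.parents[0].sha' /tmp/merge_commit.json
```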
richlander pushed a commit to richlander/runtime that referenced this pull request Feb 14, 2026
…24398)

Refactors the ci-analysis skill to remove explicit MCP tool name
references from all documentation.

### Why

The agent has MCP tool descriptions in its context at runtime — it
already knows what each tool does and what parameters it takes. Skills
should provide domain knowledge the agent *doesn't* have: gotchas,
priority orderings, data locations, and anti-patterns. Re-documenting
tool parameters or providing step-by-step "call tool X then tool Y"
recipes is fragile (breaks when tools change), redundant, and overly
prescriptive.

### What changed

- Replaced tool call chains with action descriptions
- Replaced parameter-level details with workflow guidance  
- Subagent delegation prompts describe goals, not tool calls
- Kept all domain-specific gotchas, anti-patterns, and priority
orderings

**Net result: 89 lines removed, 45 added** — less to maintain, less to
break when tools change.

### Testing

Multi-model tested with Claude Sonnet 4 and GPT-5 against real CI
investigation (PR dotnet#124095). Both correctly identified and used the right
tools for all scenarios without explicit tool names in the skill.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners
