ci-analysis skill: let the agent reason about its own tools#124398
lewing merged 2 commits into dotnet:main
Force-pushed from cc5149b to 014192c
Pull request overview
Updates the ci-analysis skill documentation to align with Helix MCP server v0.1.3 API/tooling, steering investigators toward MCP tools over manual Helix REST calls.
Changes:
- Updated guidance for `hlx_batch_status` (array input, max 50) and `hlx_status` (new `filter: "failed|passed|all"` enum).
- Documented and promoted new MCP investigation tools: `hlx_test_results`, `hlx_search_file`, `hlx_find_files`, and a preference for `hlx_search_log`.
- Added "prefer MCP tools" notes to reduce reliance on manual curl-based Helix API usage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .github/skills/ci-analysis/references/manual-investigation.md | Adds an explicit note to prefer MCP tools before using raw Helix REST API calls. |
| .github/skills/ci-analysis/references/helix-artifacts.md | Documents new remote-investigation MCP tools and recommends `hlx_test_results` over manual XML parsing. |
| .github/skills/ci-analysis/references/delegation-patterns.md | Updates delegation guidance to use `hlx_search_log` and `hlx_test_results` for more direct extraction. |
| .github/skills/ci-analysis/SKILL.md | Updates core skill guidance to match Helix MCP v0.1.3 API semantics (`filter`, array batch status) and new preferred tools. |
- `hlx_batch_status`: document array param (was comma-separated), max 50 limit
- `hlx_status`: document `filter` enum (failed|passed|all), was bool
- `hlx_test_results`: preferred over manual testResults.xml download+parse
- `hlx_search_file`: quick remote file search without downloading
- `hlx_find_files`: generalized file discovery with glob patterns
- `hlx_search_log`: preferred over `hlx_logs` for pattern extraction
- manual-investigation.md: add MCP-preferred note before raw API section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
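The 50-item cap on batch status means a long job list needs chunking before each call. A minimal sketch under assumed inputs (the job IDs are fabricated, and a real invocation would go through the MCP client rather than `split`):

```shell
# Chunk a hypothetical list of 120 Helix job IDs into batches of at most 50,
# the documented hlx_batch_status limit; each batch file would then feed one
# tool call. Job IDs here are made up for illustration.
seq 1 120 | sed 's/^/job-/' > /tmp/jobs.txt
split -l 50 /tmp/jobs.txt /tmp/batch-
wc -l /tmp/batch-*   # three batches: 50, 50, 20 lines
```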
Force-pushed from 014192c to 8c51dd2
Skills should provide domain knowledge (gotchas, priority orderings, data locations), not re-document tool parameters or step-by-step tool-call recipes. The agent has MCP tool descriptions in its context and can map actions to tools on its own.

Multi-model tested: Sonnet 4 and GPT-5 both correctly navigated a real CI investigation on PR dotnet#124095 without explicit tool names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
.github/skills/ci-analysis/references/delegation-patterns.md:20
- In the delegation prompt, “search console logs” is underspecified now that explicit `hlx_logs` usage was removed. Given this skill’s guidance to prefer `hlx_search_log` for pattern extraction, please name the intended MCP tool(s) and the pattern to use (e.g., search for lines ending in `[FAIL]`) so the subagent prompt is directly executable.
For each, search console logs for lines ending with [FAIL] (xUnit format).
If hlx MCP is not available, fall back to:
./scripts/Get-CIStatus.ps1 -HelixJob "{JOB}" -WorkItem "{ITEM}"
Extract lines ending with [FAIL] (xUnit format). Ignore [OUTPUT] and [PASS] lines.
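The fallback extraction above can be sketched with plain `grep`; the log content below is a fabricated sample, and only the `[FAIL]`/`[PASS]`/`[OUTPUT]` suffix convention comes from the xUnit console format described in the prompt:

```shell
# Fabricated xUnit-style console log; real logs would come from
# Get-CIStatus.ps1 or the Helix MCP tools.
cat > /tmp/console.log <<'EOF'
System.Net.Tests.HttpTest.GetAsync [PASS]
  request retried once [OUTPUT]
System.Net.Tests.HttpTest.PostAsync [FAIL]
EOF
# Keep only failing tests: lines ending in [FAIL], ignoring [OUTPUT]/[PASS].
grep '\[FAIL\]$' /tmp/console.log
# prints: System.Net.Tests.HttpTest.PostAsync [FAIL]
```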
Design note: tension between routing signals and tool-agnostic guidance

The official skills guidance recommends explicit MCP tool references in two places: (1) frontmatter routing signals and (2) step-by-step tool call recipes in reference docs. This PR removes category (2), but the reasoning applies differently to each category:

Why step-by-step recipes are fragile (what this PR removes)

The official guide's own evaluation-first principle says: "Document only what [the agent] gets wrong without guidance." The agent already has MCP tool descriptions in its context — it knows what each tool does and what parameters it takes.

Why routing signals may still matter (what we should consider)

The official guide's …

The principle

Frontmatter routing signals (…) The skill should teach the agent domain knowledge it doesn't have (gotchas, priority orderings, data locations) and let it reason about its own tools from their descriptions.
…ogression

Waza eval progression testing (16 runs across 4 skill versions) revealed the tool-agnostic refactor (dotnet#124398) caused a 68% regression in tool calls (25→42) for the build progression task. Root cause: domain-specific examples were incorrectly classified as tool schema restatements.

Changes:
- build-progression-analysis.md: restore key AzDO query parameters (branchName, queryOrder, top, project) as inline hints
- build-progression-analysis.md: restore gh api merge parent extraction example and mention get_commit MCP alternative
- build-progression-analysis.md: restore logId:5 / startLine:500 hints with bold emphasis for checkout log extraction
- build-progression-analysis.md: add stop signal — present findings when the progression table and transition are identified
- delegation-patterns.md: add bold emphasis on log ID/line hints in subagent prompt template
- SKILL.md: mention refs/pull/{PR}/merge branch pattern in step 1

These are domain examples (branch ref formats, field names, log locations, jq expressions) that agents cannot infer from tool descriptions alone. Simple tasks (retry) still benefit from less prescriptive guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
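The restored `gh api` merge-parent hint can be sketched as follows. The JSON here is a fabricated sample of the GitHub commit API response shape, and the owner/repo/SHA in the comment are placeholders:

```shell
# For a refs/pull/{PR}/merge commit, parents[0] is the base-branch head and
# parents[1] is the PR head. Against the real API the same filter would run as
#   gh api repos/<owner>/<repo>/commits/<merge-sha> --jq '.parents[1].sha'
# Here we feed jq a fabricated sample of that response shape instead.
echo '{"parents":[{"sha":"base0000"},{"sha":"prhead00"}]}' | jq -r '.parents[1].sha'
# prints: prhead00
```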
…24398)

Refactors the ci-analysis skill to remove explicit MCP tool name references from all documentation.

### Why

The agent has MCP tool descriptions in its context at runtime — it already knows what each tool does and what parameters it takes. Skills should provide domain knowledge the agent *doesn't* have: gotchas, priority orderings, data locations, and anti-patterns. Re-documenting tool parameters or providing step-by-step "call tool X then tool Y" recipes is fragile (breaks when tools change), redundant, and overly prescriptive.

### What changed

- Replaced tool call chains with action descriptions
- Replaced parameter-level details with workflow guidance
- Subagent delegation prompts describe goals, not tool calls
- Kept all domain-specific gotchas, anti-patterns, and priority orderings

**Net result: 89 lines removed, 45 added** — less to maintain, less to break when tools change.

### Testing

Multi-model tested with Claude Sonnet 4 and GPT-5 against real CI investigation (PR dotnet#124095). Both correctly identified and used the right tools for all scenarios without explicit tool names in the skill.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>