Meta Skill Feedback: Durable Streams

Test run against: Durable Streams (pre-1.0 HTTP streaming protocol + multi-language client/server monorepo)
Date: 2026-02-25 through 2026-03-03
Tester: Kyle Mathews (library maintainer)
Comparison baseline: Hand-crafted skills in PR #219
Domain Discovery
What worked well
Phase 1 reading was thorough and well-ordered. README → protocol spec → source → tests built context correctly. The concept inventory was comprehensive — identified all public exports, config options, error types, and protocol headers.
Failure modes from docs/source were high quality. "Awaiting IdempotentProducer.append()" (the #1 agent mistake from the tweaks-to-skills notes) was nailed from reading the type signature alone. "Treating Stream-Up-To-Date as EOF" caught a subtle protocol distinction.
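The Stream-Up-To-Date distinction can be illustrated with a stand-in reader (field names like `streamUpToDate` are assumptions for illustration, not the real protocol types):

```typescript
// Stand-in for a stream of batches; "up to date" marks the live tail,
// not the end of the stream.
interface Batch {
  messages: string[];
  streamUpToDate: boolean;
}

function* fakeStream(): Generator<Batch> {
  yield { messages: ["m1", "m2"], streamUpToDate: false };
  yield { messages: ["m3"], streamUpToDate: true }; // caught up, NOT ended
  yield { messages: ["m4"], streamUpToDate: true }; // writes keep arriving
}

// Wrong: treating Stream-Up-To-Date as EOF silently drops later writes.
function readUntilUpToDate(s: Generator<Batch>): string[] {
  const out: string[] = [];
  for (const b of s) {
    out.push(...b.messages);
    if (b.streamUpToDate) break; // bug: m4 is never seen
  }
  return out;
}

// Correct: up-to-date only means "caught up for now"; keep consuming.
function readAll(s: Generator<Batch>): string[] {
  const out: string[] = [];
  for (const b of s) out.push(...b.messages);
  return out;
}
```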
Tension identification was valuable. The four tensions are real architectural forces. "Fire-and-forget throughput vs error visibility" is the most important thing for agents to understand about IdempotentProducer.
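That throughput-vs-visibility tension can be sketched with a stand-in producer (the real IdempotentProducer API may differ; the batching model here is illustrative only):

```typescript
// Toy producer: appends queue up and are flushed in a microtask, so
// awaiting each append forces one "round trip" per message, while
// fire-and-forget lets appends coalesce into a single batch.
class StubProducer {
  batches: string[][] = [];
  private queue: string[] = [];

  append(msg: string): Promise<void> {
    this.queue.push(msg);
    return Promise.resolve().then(() => {
      if (this.queue.length > 0) {
        this.batches.push(this.queue);
        this.queue = [];
      }
    });
  }

  flush(): Promise<void> {
    return Promise.resolve().then(() => {
      if (this.queue.length > 0) {
        this.batches.push(this.queue);
        this.queue = [];
      }
    });
  }
}

// Wrong: awaiting every append serializes writes (one batch per message).
async function awaitedWrites(p: StubProducer): Promise<void> {
  for (const m of ["a", "b", "c"]) await p.append(m);
}

// Correct: fire-and-forget, then await a single flush for error visibility.
async function pipelinedWrites(p: StubProducer): Promise<void> {
  for (const m of ["a", "b", "c"]) void p.append(m);
  await p.flush();
}
```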
Gap identification fed excellent interview questions. "How should agents choose between stream(), DurableStream, and IdempotentProducer?" surfaced a clear three-API decision table.
What needs improvement
Over-split into 5 domains instead of 3. Grouped by architecture (lifecycle/writing/reading) instead of developer tasks ("I'm using Durable Streams"). The grouping heuristic needs a validation step: "Would a developer working on a single feature need to load multiple skills? If so, merge them."
Missed all framework-integration failure modes. The six highest-impact maintainer-sourced failure modes (SSR incompatibility, component lifecycle, browser connection limits, singleton patterns) weren't discoverable from the library's own source. Phase 1 needs to read peer dependency docs for integration constraints.
Failure modes were abstract, not actionable. The domain map describes mechanisms but doesn't show wrong/correct code pairs. Add wrong_pattern and correct_pattern fields to the failure_mode schema so they feed directly into SKILL.md generation.
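For example, a failure_mode entry could carry the code pair directly (the field names below are a proposed shape, not the current schema):

```yaml
failure_modes:
  - id: await-idempotent-append
    mechanism: "Awaiting IdempotentProducer.append() serializes writes and defeats pipelining"
    wrong_pattern: |
      for (const m of msgs) await producer.append(m)
    correct_pattern: |
      for (const m of msgs) producer.append(m)
      await producer.flush()
```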
"4–7 domains" target is too rigid. For Durable Streams, 3 was correct. Target should be driven by library complexity, not a fixed range.
Phase-by-phase grades
| Phase | Grade | Notes |
| --- | --- | --- |
| Phase 1 (Read everything) | A | Thorough, comprehensive |
| Phase 2a (Group concepts) | C+ | Over-split core client |
| Phase 2e (Failure modes) | A- | High quality from docs, missed framework patterns |
| Phase 2f (Tensions) | A | All four tensions are real |
| Phase 3b (Gap-targeted) | A- | Found stale-offset bug and README error |
| Phase 3c (Agent-specific) | B+ | Key finding: discoverability > misuse |
Tree Generator
Observations
The tree generator step was straightforward given clean domain map artifacts. With 3 domains mapping to 3 skills (all tier 1), the tree was essentially flat.
For libraries with deeper skill hierarchies, the tree generator would matter more. For Durable Streams, the domain map already captured the structure.
Reference file identification (api.md, errors.md for the core skill) carried through correctly from domain map reference_candidates.
Suggestion
For flat skill trees (≤3 skills, all tier 1), the tree generator step could be simplified or auto-generated from the domain map. The current step adds overhead without adding information for simple libraries.
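The flat-tree shortcut could be a small check over the domain map (the Domain shape below mirrors fields mentioned in this report; exact schema names are assumptions):

```typescript
interface Domain {
  name: string;
  tier: number;
  reference_candidates: string[];
}

interface SkillNode {
  skill: string;
  references: string[];
}

// Returns an auto-generated flat tree for simple maps, or null to signal
// that the full tree-generator step is still needed.
function autoFlatTree(domains: Domain[]): SkillNode[] | null {
  const isFlat = domains.length <= 3 && domains.every((d) => d.tier === 1);
  if (!isFlat) return null;
  return domains.map((d) => ({
    skill: d.name,
    references: d.reference_candidates,
  }));
}
```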
Generate Skill
What worked well
Generated SKILL.md files included wrong/correct code pairs for all failure modes — translating the domain map's abstract mechanisms into actionable examples.
The API Decision Table format (stream vs DurableStream vs IdempotentProducer) was a natural fit and matched the hand-crafted skill structure.
Reference files (api.md, errors.md) were generated with appropriate depth.
What needs improvement
500-line limit was tight. The core durable-streams SKILL.md hit 530 lines and needed manual trimming. The limit should either be increased or the generator should be aware of it and prioritize content accordingly.
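One way to make the generator budget-aware is priority-ordered packing: keep the most important sections that fit under the limit and drop the rest. A minimal sketch (section names and priorities are illustrative, not the generator's real schema):

```typescript
// Lower priority number = more important section.
interface Section {
  name: string;
  lines: number;
  priority: number;
}

function fitToBudget(sections: Section[], budget = 500): string[] {
  const byPriority = [...sections].sort((a, b) => a.priority - b.priority);
  const kept: string[] = [];
  let used = 0;
  for (const s of byPriority) {
    // Skip any section that would push the file over the line budget.
    if (used + s.lines <= budget) {
      kept.push(s.name);
      used += s.lines;
    }
  }
  return kept;
}
```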
No awareness of monorepo structure. Skills were generated assuming a single skills/ root. For monorepos where each package publishes independently, skills need to live in per-package directories (e.g., packages/client/skills/, packages/state/skills/). The scaffold prompt should ask about monorepo structure.
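The per-package layout this implies might look like the following (package names taken from the report's example; the exact structure is a suggestion, not current scaffold output):

```
packages/
  client/
    package.json        # own "bin": { "intent": ... } + @tanstack/intent dep
    skills/
      durable-streams/
        SKILL.md
        api.md
        errors.md
  state/
    package.json
    skills/
      ...
```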
Scaffold CLI Feedback
Issues encountered during npx @tanstack/intent scaffold
Scaffold says npx intent validate but the correct command is npx @tanstack/intent validate. The scaffold output references intent as a bare command in the checklist, but it's not globally installed at that point.
No validate command existed at time of scaffold. The scaffold checklist includes "Run npx intent validate" but the CLI only had list and install. (This may have been added in a later version — we were on 0.0.4.)
Scanner requires an intent config field in package.json for discovery. The scanner should just look for "bin": { "intent": ... } — the bin entry is already the signal that a package ships Intent skills. Requiring a separate intent config object is redundant boilerplate. The repo and docs fields in that config can be derived from the existing repository and homepage fields in package.json.
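Under that proposal, a package would need nothing beyond fields it already has; the scanner would key off `bin.intent` and derive repo/docs from `repository` and `homepage`. A hypothetical fragment (names and paths are made up for illustration):

```json
{
  "name": "@example/client",
  "repository": "https://github.com/example/durable-streams",
  "homepage": "https://example.dev/durable-streams",
  "bin": { "intent": "./dist/intent-cli.js" },
  "dependencies": { "@tanstack/intent": "^0.0.4" }
}
```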
No monorepo guidance. The scaffold assumes a single skills/ root directory. For monorepos with multiple publishable packages, each package needs its own skills directory, bin entry, and @tanstack/intent dependency. The scaffold should ask about monorepo structure and generate per-package setup.
_artifacts/ exclusion from publishing should be automatic. The scaffold tells maintainers to "exclude skills/_artifacts/ from package publishing" but doesn't say how. This required adding "!skills/_artifacts" to the files array in package.json. The scaffold should either generate this or the scanner should ignore _artifacts/ by convention.
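The workaround described above, as a package.json fragment the scaffold could generate directly:

```json
{
  "files": ["dist", "skills", "!skills/_artifacts"]
}
```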
Summary: What I'd want from v2
Grouping driven by developer tasks, not architecture — The biggest single improvement for domain discovery
Peer dependency reading — Framework integration failure modes are the highest value and currently missed
Code pattern pairs in failure modes — Wrong + correct code, not just mechanism descriptions
Monorepo support in scaffold — Per-package skill directories with clear setup instructions
Scanner uses bin.intent for discovery — Drop the redundant intent config field requirement
Flexible domain count — Don't enforce 4–7; let library complexity drive it
Line limit awareness in generator — Either increase the 500-line limit or make the generator budget-aware
Correct CLI commands in scaffold output — Use npx @tanstack/intent validate, not npx intent validate