diff --git a/docs/architecture/decisions/001-dual-engine-architecture.md b/docs/architecture/decisions/001-dual-engine-architecture.md new file mode 100644 index 00000000..ce084176 --- /dev/null +++ b/docs/architecture/decisions/001-dual-engine-architecture.md @@ -0,0 +1,105 @@ +# ADR-001: Dual-Engine Architecture (JS/WASM + Rust Native) + +**Date:** 2026-03-18 +**Status:** Accepted +**Context:** Architectural audit v3.1.4 (2026-03-16) raised the dual-engine maintenance cost as a concern. This ADR documents the rationale, trade-offs, and long-term trajectory. + +--- + +## Decision + +Codegraph maintains two parsing and analysis engines: + +1. **Rust native engine** — compiled to platform-specific `.node` addons via napi-rs, distributed as optional npm packages (`@optave/codegraph-{platform}-{arch}`) +2. **JS/WASM engine** — tree-sitter WASM grammars running in `web-tree-sitter`, built from devDependencies on `npm install` + +The `--engine auto` default (and recommended mode) uses native when available, WASM as fallback. Both engines feed the same SQLite graph — downstream queries and analysis are engine-agnostic. + +This is a settled architectural decision. + +--- + +## Context + +### The problem codegraph solves + +AI coding assistants waste tokens re-orienting themselves in large codebases, hallucinate dependencies, and miss blast radius. Codegraph exists to fix this for **large codebases** — the ones where AI agents actually struggle. Small codebases don't have this problem: an agent can read most of the code in a single context window. The tool's value scales with codebase size, which means performance at scale is not optional — it's the core requirement. 
+ +### Why two engines exist + +The two engines serve fundamentally different deployment constraints: + +| Constraint | Rust native | JS/WASM | +|-----------|------------|---------| +| **Performance on large codebases** | 3-10x faster parsing, parallel via rayon | Single-threaded, slower | +| **CI/CD pipelines** | Requires prebuilt binary for the CI runner's platform | Runs anywhere Node.js runs — no binary needed | +| **VS Code extensions** | Cannot load native addons in VS Code web or restricted extension hosts | WASM runs in any V8 environment including VS Code webviews | +| **Browser environments** | Not possible | WASM runs natively | +| **Platform coverage** | Limited to platforms with prebuilt binaries (currently: linux-x64, darwin-arm64, darwin-x64, win32-x64) | Universal — any platform with Node.js ≥20 | +| **Install simplicity** | `npm install` pulls prebuilt binary via optionalDependencies (no Rust toolchain) | `npm install` builds WASM grammars from devDeps (no native compilation) | + +A single-engine architecture would force a choice: + +- **Rust-only** eliminates the WASM maintenance cost but locks out VS Code plugin development, browser-based visualization, and any CI runner without a prebuilt binary. This is the approach taken by `esbuild` — viable for a build tool, not for a tool that needs to run inside editor extensions and web contexts. +- **WASM-only** eliminates the native maintenance cost but sacrifices the 3-10x performance advantage that makes the tool viable on large codebases. A 15-second initial build on WASM becomes a 3-second build on native — the difference between "fast enough for interactive use" and "waiting for the tool." + +Neither trade-off is acceptable for codegraph's target use case. + +--- + +## Trade-offs + +### Costs of dual-engine + +1. **Maintenance multiplier.** Bug fixes and new features in parsing, extraction, import resolution, and analysis may need to be applied in both JS and Rust. This is a real, ongoing cost. + +2. 
**Parity verification.** The two engines must produce identical graphs for the same input. Parity tests exist but test specific inputs, not full behavioral equivalence. Divergence between engines is a class of bug that single-engine tools don't have. + +3. **New language cost.** Adding a language requires an extractor in both engines (Rust + JS/WASM). This doubles the per-language implementation effort. + +4. **Cognitive overhead.** Contributors must understand two codebases (32K LOC JS + 10K LOC Rust) with different idioms, toolchains, and debugging workflows. + +### Benefits of dual-engine + +1. **Performance where it matters.** Native Rust parsing at 3-10x WASM speed is the difference between codegraph being viable or not on 100K+ LOC codebases. With multi-repo integration on the roadmap, graphs will span multiple repositories — making parse performance even more critical. + +2. **Universal portability.** WASM fallback guarantees the tool works everywhere Node.js runs, regardless of platform, environment restrictions, or binary availability. This is essential for VS Code extensions, browser-based visualization, and CI runners on uncommon architectures. + +3. **Graceful degradation.** Users on unsupported platforms or restricted environments get full functionality at reduced speed, rather than no functionality. The `--engine auto` strategy handles this transparently. + +4. **Future optionality.** The WASM engine enables deployment targets that don't exist yet — browser-based code review tools, WebContainer environments (StackBlitz), cloud IDEs with restricted filesystem access. + +### Current parity state + +Today, some analysis phases (AST node extraction, CFG, dataflow, complexity) fall back to WASM even when the native engine is selected for parsing. This is a temporary state — Phase 6 (Native Analysis Acceleration) will port these remaining phases to Rust, eliminating the fallback and making the native path fully self-contained. 
Once complete, the WASM engine will be a true fallback for environments that can't run native code, not a required component of the native pipeline. + +--- + +## Trajectory + +The dual-engine architecture is not static. The expected evolution: + +1. **Phase 6 (Native Analysis Acceleration):** Port remaining JS-only build phases to Rust. After this, `--engine native` runs the entire pipeline in Rust with zero WASM dependency. The WASM engine becomes a standalone fallback path, not a supplement to native. + +2. **Multi-repo integration:** As codegraph supports cross-repository graphs, the data volume grows multiplicatively. A 5-repo graph with 50K LOC per repo means 250K LOC of parsing — native performance becomes non-negotiable. + +3. **VS Code extension:** The WASM engine enables in-editor graph queries without requiring users to install platform-specific binaries. The extension can run entirely in WASM for portability, with an option to delegate to a native CLI process for heavy operations (initial build, full rebuild). + +4. **Parity convergence:** As the Rust engine reaches full feature parity, the WASM engine's role narrows to "portable fallback." Maintenance cost decreases proportionally — the WASM engine receives bug fixes but not new features, since new analysis capabilities are implemented in Rust first and the WASM path is exercised only for compatibility. + +--- + +## Alternatives considered + +| Alternative | Why rejected | +|------------|-------------| +| **Rust-only (like esbuild)** | Locks out VS Code extensions, browser visualization, and CI runners without prebuilt binaries. Acceptable for a build tool, not for an analysis tool that must integrate into diverse environments | +| **WASM-only** | 3-10x slower on large codebases. Unacceptable for the target use case (100K+ LOC where AI agents struggle). 
Single-threaded WASM can't leverage multi-core parsing | +| **Native tree-sitter Node.js bindings** (not web-tree-sitter) | Would give native speed without Rust custom code, but only for parsing. Import resolution, edge building, and analysis would still be JS. Doesn't solve the full pipeline performance problem. Also adds `node-gyp` compilation step for all users, not just platform-specific prebuilt binaries | +| **Single Rust binary with WASM compilation target** (like oxc, biome, swc) | Would unify the codebase into one language. But codegraph's CLI orchestration, MCP server, and embeddings layer rely heavily on the Node.js ecosystem (`commander`, `@modelcontextprotocol/sdk`, `@huggingface/transformers`, `better-sqlite3`). Rewriting these in Rust is a multi-year effort with no user-facing benefit. The current split — Rust for hot-path parsing/analysis, JS for orchestration/MCP/CLI — puts each language where it's strongest | + +--- + +## Decision outcome + +The dual-engine architecture stays. The maintenance cost is real but bounded — it applies to parsing, extraction, and resolution (the hot path), not to the CLI, MCP, queries, or presentation layers which remain JS-only. The performance and portability benefits are load-bearing for the tool's target use case. As the native engine reaches full parity (Phase 6), the WASM engine's maintenance surface shrinks to "portable fallback" rather than "parallel implementation." diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md index 23f84128..1e64cdd2 100644 --- a/docs/roadmap/BACKLOG.md +++ b/docs/roadmap/BACKLOG.md @@ -29,8 +29,8 @@ Both items are now **DONE**. 
These directly improved agent experience and graph | ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| -| 83 | ~~Hook-optimized `codegraph brief` command~~ | New `codegraph brief ` command designed for Claude Code hook context injection. Returns a compact, token-efficient summary per file: each symbol with its role and caller count (e.g. `buildGraph [core, 12 callers]`), blast radius count on importers (`Imported by: src/cli.js (+8 transitive)`), and overall file risk tier. Current `deps --json` output used by `enrich-context.sh` is shallow — just file-level imports/importedBy and symbol names with no role or blast radius info. The `brief` command would include: **(a)** symbol roles in the output — knowing a file defines `core` vs `leaf` symbols changes editing caution; **(b)** per-symbol transitive caller counts — makes blast radius visible without a separate `fn-impact` call; **(c)** file-level risk tier (high/medium/low based on max fan-in and role composition). Output optimized for `additionalContext` injection — single compact block, not verbose JSON. Also add `--brief` flag to `deps` as an alias. | Embeddability | The `enrich-context.sh` hook is the only codegraph context agents actually see (they ignore CLAUDE.md instructions to run commands manually). Making that passively-injected context richer — with roles, caller counts, and risk tiers — directly reduces blind edits to high-impact code. Currently the hook shows `Defines: function buildGraph` but not that it's a core symbol with 12 transitive callers | ✓ | ✓ | 4 | No | — | **DONE** — `codegraph brief ` command with symbol roles, caller counts, and risk tiers. CLI command, MCP tool, and presentation layer. 
([#480](https://github.com/optave/codegraph/pull/480)) | -| 71 | ~~Basic type inference for typed languages~~ | Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code. | Resolution | Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables | ✓ | ✓ | 5 | No | — | **DONE** — Type inference for all typed languages (TS, Java, Go, Rust, C#, PHP, Python). WASM + native engines. ([#501](https://github.com/optave/codegraph/pull/501)) | +| 83 | ~~Hook-optimized `codegraph brief` command~~ | ~~New `codegraph brief ` command designed for Claude Code hook context injection. Returns a compact, token-efficient summary per file: each symbol with its role and caller count (e.g. `buildGraph [core, 12 callers]`), blast radius count on importers (`Imported by: src/cli.js (+8 transitive)`), and overall file risk tier. Current `deps --json` output used by `enrich-context.sh` is shallow — just file-level imports/importedBy and symbol names with no role or blast radius info. The `brief` command would include: **(a)** symbol roles in the output — knowing a file defines `core` vs `leaf` symbols changes editing caution; **(b)** per-symbol transitive caller counts — makes blast radius visible without a separate `fn-impact` call; **(c)** file-level risk tier (high/medium/low based on max fan-in and role composition). 
Output optimized for `additionalContext` injection — single compact block, not verbose JSON. Also add `--brief` flag to `deps` as an alias.~~ | Embeddability | ~~The `enrich-context.sh` hook is the only codegraph context agents actually see (they ignore CLAUDE.md instructions to run commands manually). Making that passively-injected context richer — with roles, caller counts, and risk tiers — directly reduces blind edits to high-impact code. Currently the hook shows `Defines: function buildGraph` but not that it's a core symbol with 12 transitive callers~~ | ✓ | ✓ | 4 | No | — | **DONE** — `codegraph brief ` command with symbol roles, caller counts, and risk tiers. CLI command, MCP tool, and presentation layer. ([#480](https://github.com/optave/codegraph/pull/480)) | +| 71 | ~~Basic type inference for typed languages~~ | ~~Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code.~~ | Resolution | ~~Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables~~ | ✓ | ✓ | 5 | No | — | **DONE** — Type inference for all typed languages (TS, Java, Go, Rust, C#, PHP, Python). WASM + native engines. 
([#501](https://github.com/optave/codegraph/pull/501)) | ### Tier 1 — Zero-dep + Foundation-aligned (build these first) @@ -86,7 +86,7 @@ CFG is built for every function on every build (~100ms) but only consumed by the | ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| -| 45 | Cyclomatic complexity from CFG | `complexity.js` re-walks the tree-sitter AST to compute cyclomatic complexity. The CFG already has the data — cyclomatic complexity is literally `edges - nodes + 2` on the stored graph. Replace or supplement the AST walk with a SQL query against `cfg_blocks`/`cfg_edges`. | Analysis | Eliminates redundant AST walking for the most common complexity metric; single source of truth for control flow | ✓ | ✓ | 3 | No | — | +| 45 | ~~Cyclomatic complexity from CFG~~ | `complexity.js` re-walks the tree-sitter AST to compute cyclomatic complexity. The CFG already has the data — cyclomatic complexity is literally `edges - nodes + 2` on the stored graph. Replace or supplement the AST walk with a SQL query against `cfg_blocks`/`cfg_edges`. | Analysis | Eliminates redundant AST walking for the most common complexity metric; single source of truth for control flow | ✓ | ✓ | 3 | No | — | **DONE** — Cyclomatic complexity now derived from CFG structure (`E - N + 2`) as part of the unified AST analysis framework. CFG visitor rewrite in PR #392. | | 46 | Unreachable block detection in `check` | Query `cfg_blocks` for blocks with zero incoming edges (excluding entry blocks) to detect dead branches and unreachable code paths. Add as a `check` predicate: `--no-unreachable-blocks`. 
| CI | Catches dead code that static call-graph analysis misses — unreachable branches inside functions, not just unused functions | ✓ | ✓ | 3 | No | — | | 47 | CFG summary in `audit` | Include CFG structural summary in `audit` reports: block count, branch count, max path depth, try/catch coverage ratio. Agents get control flow complexity at a glance without running `cfg` separately. | Orchestration | Audit reports become more complete — agents understand control flow complexity alongside dependency and metric data in one call | ✓ | ✓ | 3 | No | — | | 48 | CFG metrics in triage risk scoring | Add CFG-derived dimensions to `triage.js`: branch density (edges/blocks), max nesting path length, exception handler ratio. Functions with complex control flow but low cyclomatic complexity (many small branches) get flagged. | Intelligence | Triage catches control-flow-heavy functions that cyclomatic complexity alone underweights | ✓ | ✓ | 3 | No | — | @@ -156,7 +156,7 @@ These address fundamental limitations in the parsing and resolution pipeline tha | ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| | 72 | Interprocedural dataflow analysis | Extend the existing intraprocedural dataflow (ID 14) to propagate `flows_to`/`returns`/`mutates` edges across function boundaries. When function A calls B with argument X, and B's dataflow shows X flows to its return value, connect A's call site to the downstream consumers of B's return. Requires stitching per-function dataflow summaries at call edges — no new parsing, just graph traversal over existing `dataflow` + `edges` tables. Start with single-level propagation (caller↔callee), not transitive closure. 
| Analysis | Current dataflow stops at function boundaries, missing the most important flows — data passing through helper functions, middleware chains, and factory patterns. Single-function scope means `dataflow` can't answer "where does this user input end up?" across call boundaries. Cross-function propagation is the difference between toy dataflow and useful taint-like analysis | ✓ | ✓ | 5 | No | 14 | -| 73 | Improved dynamic call resolution | Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability. | Resolution | In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness | ✓ | ✓ | 5 | No | — | +| 73 | ~~Improved dynamic call resolution~~ | ~~Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. 
Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability.~~ | Resolution | ~~In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness~~ | ✓ | ✓ | 5 | No | — | **PROMOTED** — Moved to ROADMAP Phase 4.2 (Receiver Type Tracking for Method Dispatch) | | 81 | Track dynamic `import()` and re-exports as graph edges | Extract `import()` expressions as `dynamic-imports` edges in both WASM extraction paths (query-based and walk-based). Destructured names (`const { a } = await import(...)`) feed into `importedNames` for call resolution. **Partially done:** WASM JS/TS extraction works (PR #389). Remaining: **(a)** native Rust engine support — `crates/codegraph-core/src/extractors/javascript.rs` doesn't extract `import()` calls; **(b)** non-static paths (`import(\`./plugins/${name}.js\`)`, `import(variable)`) are skipped with a debug warning; **(c)** re-export consumer counting in `exports --unused` only checks `calls` edges, not `imports`/`dynamic-imports` — symbols consumed only via import edges show as zero-consumer false positives. 
| Resolution | Fixes false "zero consumers" reports for symbols consumed via dynamic imports. 95 `dynamic-imports` edges found in codegraph's own codebase — these were previously invisible to impact analysis, exports audit, and dead-export hooks | ✓ | ✓ | 5 | No | — | | 82 | Extract names from `import().then()` callback patterns | `extractDynamicImportNames` only extracts destructured names from `const { a } = await import(...)` (walks up to `variable_declarator`). The `.then()` pattern — `import('./foo.js').then(({ a, b }) => ...)` — produces an edge with empty names because the destructured parameters live in the `.then()` callback, not a `variable_declarator`. Detect when an `import()` call's parent is a `member_expression` with `.then`, find the arrow/function callback in `.then()`'s arguments, and extract parameter names from its destructuring pattern. | Resolution | `.then()`-style dynamic imports are common in older codebases and lazy-loading patterns (React.lazy, Webpack code splitting). Without name extraction, these produce file-level edges only — no symbol-level `calls` edges, so the imported symbols still appear as zero-consumer false positives | ✓ | ✓ | 4 | No | 81 | @@ -166,13 +166,30 @@ These close gaps in search expressiveness, cross-repo navigation, implementation | ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| -| 74 | Interface and trait implementation tracking | Extract `implements`/`extends`/trait-impl relationships from tree-sitter AST and store as `implements` edges in the graph. New `codegraph implementations ` command returns all concrete types that implement a given interface, abstract class, or trait. Inverse: `codegraph interfaces ` returns what a type implements. Cross-reference with existing `contains` edges for full type hierarchy. 
Covers TypeScript interfaces, Java interfaces/abstract classes, Go interfaces (structural matching via method set comparison), Rust traits, C# interfaces, PHP interfaces. | Navigation | Agents can answer "who implements this interface?" and "what contract does this type satisfy?" in one call — currently impossible without reading every file. Directly prevents missed blast radius when an interface signature changes, since all implementors are affected but invisible to the current call-graph-only impact analysis | ✓ | ✓ | 5 | No | — | +| 74 | ~~Interface and trait implementation tracking~~ | ~~Extract `implements`/`extends`/trait-impl relationships from tree-sitter AST and store as `implements` edges in the graph. New `codegraph implementations ` command returns all concrete types that implement a given interface, abstract class, or trait. Inverse: `codegraph interfaces ` returns what a type implements. Cross-reference with existing `contains` edges for full type hierarchy. Covers TypeScript interfaces, Java interfaces/abstract classes, Go interfaces (structural matching via method set comparison), Rust traits, C# interfaces, PHP interfaces.~~ | Navigation | ~~Agents can answer "who implements this interface?" and "what contract does this type satisfy?" in one call — currently impossible without reading every file. Directly prevents missed blast radius when an interface signature changes, since all implementors are affected but invisible to the current call-graph-only impact analysis~~ | ✓ | ✓ | 5 | No | — | **PROMOTED** — Moved to ROADMAP Phase 4.3 (Interface and Trait Implementation Tracking) | | 75 | Diff and commit content search | Search within git diffs and commit messages using the existing graph's file/symbol awareness. `codegraph search-history "pattern" --since 30d` searches `git log -p` output, returning matches with commit SHA, author, date, file, and enclosing function (resolved via line-number intersection with `nodes` table). 
Supports `--author`, `--file`, `--kind` filters to scope by symbol type. Unlike `co-change` (which tracks statistical co-occurrence), this searches actual diff content — "find every commit that modified the `authenticate` function" or "find when `TODO: hack` was introduced." | Search | Agents can trace when and why a function changed without leaving the graph — answers "who introduced this bug?" and "what changed in this module last month?" in one query. Eliminates manual `git log -p` + grep workflows that burn tokens on raw diff output | ✓ | ✓ | 4 | No | — | | 76 | Regression watchers (query-based commit monitors) | Define persistent watch rules that evaluate graph queries against each new commit during `build --watch` or incremental rebuild. Rules are declared in `.codegraphrc.json` under `monitors[]` — each rule has a name, a query type (`check` predicate, `search` pattern, `ast` pattern, or custom SQL), and an action (`warn`, `fail`, or `webhook`). Examples: "alert when a new call to `deprecatedAPI()` appears in a diff", "fail when a new `eval()` AST node is added", "warn when fan-in of any function exceeds 20 after this commit." Results surfaced in CLI output during watch mode and as a `monitors` section in `diff-impact`. | CI | Proactive detection of regressions as they happen — agents and CI pipelines get immediate feedback when a commit introduces banned patterns, exceeds thresholds, or violates architectural rules. Shifts detection left from periodic audits to per-commit triggers | ✓ | ✓ | 4 | No | — | | 77 | Metric trend tracking (code insights) | `codegraph trends` computes key graph metrics (total symbols, avg complexity, dead code count, cycle count, community drift score, boundary violations) at historical git revisions and outputs a time-series table or JSON. Uses `git stash && git checkout && build && collect && restore` loop over sampled commits (configurable `--samples N` defaulting to 10 evenly-spaced commits). 
Stores results in a `metric_snapshots` table for incremental updates. `--since` and `--until` for date range. `--metric` to select specific metrics. Enables tracking migration progress ("how many files still use old API?"), tech debt trends, and codebase growth over time without external dashboards. | Intelligence | Agents and teams can answer "is our codebase getting healthier or worse?" with data instead of intuition — tracks complexity trends, dead code accumulation, architectural drift, and migration progress over time. Historical backfill from git history means instant visibility into months of trends | ✓ | ✓ | 3 | No | — | | 78 | Cross-repo symbol resolution | In multi-repo mode, resolve import edges that cross repository boundaries. When repo A imports `@org/shared-lib`, and repo B is `@org/shared-lib` in the registry, create cross-repo edges linking A's import to B's actual exported symbol. Requires matching npm/pip/go package names to registered repos. Store cross-repo edges with a `repo` qualifier in the `edges` table. Enables cross-repo `fn-impact` (changing a shared library function shows impact across all consuming repos), cross-repo `path` queries, and cross-repo `diff-impact`. | Navigation | Multi-repo mode currently treats each repo as isolated — agents can search across repos but can't trace dependencies between them. Cross-repo edges enable "if I change this shared utility, which downstream repos break?" 
— the highest-value question in monorepo and multi-repo architectures | ✓ | ✓ | 5 | No | — | | 79 | Advanced query language with boolean operators and output shaping | Extend `codegraph search` and `codegraph where` with a structured query syntax supporting: **(a)** boolean operators — `kind:function AND file:src/` , `name:parse OR name:extract`, `NOT kind:class`; **(b)** compound filters — `kind:method AND complexity.cognitive>15 AND role:core`; **(c)** output shaping — `--select symbols` (just names), `--select files` (distinct files), `--select owners` (CODEOWNERS for matches), `--select stats` (aggregate counts by kind/file/role); **(d)** result aggregation — `--group-by file`, `--group-by kind`, `--group-by community` with counts. Parse the query into a SQL WHERE clause against the `nodes`/`function_complexity`/`edges` tables. Expose as `query_language` MCP tool parameter. | Search | Current search is either keyword/semantic (fuzzy) or exact-name (`where`). Agents needing "all core functions with cognitive complexity > 15 in src/api/" must chain multiple commands and filter manually — wasting tokens on intermediate results. A structured query language answers compound questions in one call | ✓ | ✓ | 4 | No | — | -| 80 | Find implementations in impact analysis | When a function signature or interface definition changes, automatically include all implementations/subtypes in `fn-impact` and `diff-impact` blast radius. Currently impact only follows `calls` edges — changing an interface method signature breaks every implementor, but this is invisible. Requires ID 74's `implements` edges. Add `--include-implementations` flag (on by default) to impact commands. | Analysis | Catches the most dangerous class of missed blast radius — interface/trait changes that silently break all implementors. 
A single method signature change on a widely-implemented interface can break dozens of files, none of which appear in the current call-graph-only impact analysis | ✓ | ✓ | 5 | No | 74 | +| 80 | ~~Find implementations in impact analysis~~ | ~~When a function signature or interface definition changes, automatically include all implementations/subtypes in `fn-impact` and `diff-impact` blast radius. Currently impact only follows `calls` edges — changing an interface method signature breaks every implementor, but this is invisible. Requires ID 74's `implements` edges. Add `--include-implementations` flag (on by default) to impact commands.~~ | Analysis | ~~Catches the most dangerous class of missed blast radius — interface/trait changes that silently break all implementors. A single method signature change on a widely-implemented interface can break dozens of files, none of which appear in the current call-graph-only impact analysis~~ | ✓ | ✓ | 5 | No | 74 | **PROMOTED** — Folded into ROADMAP Phase 4.3 (`--include-implementations` flag on impact commands) | + +### Tier 1j — Audit-identified gaps (from Architecture Audit v3.1.4) + +Items identified by the architectural audit (v3.1.4) that don't fit existing tiers. Most are zero-dep and foundation-aligned (see individual rows for exceptions). + +| ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | +|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| +| 87 | ~~Confidence annotations on query output~~ | ~~Every query output should include resolution statistics: `{ resolved: N, unresolved_method_calls: M, confidence: 0.XX }`. When `fn-impact` shows a blast radius of 5 functions, it should note how many method-dispatch calls may be missing. CLI displays as a footer line; MCP tools include in JSON. 
Makes the tool honest about its known blind spots instead of presenting precise-looking numbers without qualification.~~ | Embeddability | ~~Agents and users can assess how much to trust a result — a blast radius of 5 with 82% confidence is actionable, while 5 with 40% confidence means "run more analysis". Directly prevents the "missed blast radius" problem by making the gap visible~~ | ✓ | ✓ | 5 | No | — | **PROMOTED** — Moved to ROADMAP Phase 7.10 | +| 88 | SCIP/LSP integration for top languages | Use compiler-grade symbol indices from TypeScript (tsc/tsserver), Python (Pyright), Go (gopls), and Rust (rust-analyzer) for import resolution and call graph construction. These tools export SCIP or LSP-compatible symbol tables that give 100% precision on resolved symbols — at near-zero implementation cost compared to building custom type inference. Current heuristic resolution becomes the fallback for languages without SCIP support. | Resolution | The single highest-value accuracy improvement available. Compiler-backed resolution eliminates the entire class of method-dispatch, overload, and type-narrowing resolution failures for the 4 most popular typed languages. Transforms codegraph from "~75% accurate heuristic" to "~99% accurate for TS/Py/Go/Rust, ~75% for the rest" | ✓ | ✓ | 5 | No | — | +| 89 | ~~Call resolution precision/recall benchmark suite~~ | ~~Hand-annotated fixture projects per language with `expected-edges.json` manifests declaring the correct call edges. Benchmark runner compares codegraph's resolved edges against expected, reports precision (correct / total resolved) and recall (correct / total expected). CI gate fails if metrics drop below baseline. Without this, there's no way to know if a change improves or degrades call graph quality. 
CodeScene publishes precision metrics; Sourcegraph SCIP indexers have per-language benchmarks; codegraph has none.~~ | Testing | ~~The only way to measure whether resolution changes actually improve accuracy — without it, "improvements" might silently break other resolution paths. Also provides regression protection as new languages and resolution modes are added~~ | ✓ | ✓ | 5 | No | — | **PROMOTED** — Moved to ROADMAP Phase 4.4 | +| 90 | ~~`package.json` `exports` field resolution~~ | ~~Modern Node.js standard since v12. Import resolution currently uses brute-force filesystem probing (tries 10+ extensions via `fs.existsSync()`). It doesn't read `package.json` `exports` field, meaning conditional exports, subpath patterns, and package self-references are invisible. Support subpath patterns, conditional exports (`"import"`, `"require"`, `"default"`), and fall back to filesystem probing only when `exports` is absent.~~ | Resolution | ~~Fixes resolution failures for any project using modern Node.js package conventions — which is most projects published since 2020. Currently produces low-confidence heuristic matches for imports that should resolve deterministically~~ | ✓ | ✓ | 4 | No | — | **PROMOTED** — Moved to ROADMAP Phase 4.5 | +| 91 | ~~Monorepo workspace resolution~~ | ~~`pnpm-workspace.yaml`, npm workspaces (`package.json` `workspaces`), and `lerna.json` are not recognized. Internal package imports (`@myorg/utils`) fall through to global resolution with low confidence. Detect workspace root, enumerate workspace packages, resolve internal imports to actual source files with high confidence (0.95).~~ | Resolution | ~~Fixes resolution for the most common monorepo patterns — affects every monorepo user. 
Internal package imports currently produce wrong or missing edges~~ | ✓ | ✓ | 4 | No | — | **PROMOTED** — Moved to ROADMAP Phase 4.6 (resolution layer only; full monorepo graph in Phase 12.2) | +| 92 | Auto-generate MCP tool schemas from types | The `tool-registry.js` (801 LOC) contains hand-maintained JSON schemas for 40+ tools. With TypeScript (Phase 5), these can be derived from types via `zod` or `typebox`. Schema drift between the implementation and MCP schema is inevitable with hand-maintenance. | Embeddability | Eliminates an entire class of MCP bugs (schema says one thing, implementation does another). Reduces maintenance burden for adding new MCP tools | ✗ | ✓ | 3 | No | Phase 5 (TypeScript) | +| 93 | ~~Shell completion for CLI~~ | ~~Commander supports shell completion but it's not implemented. `codegraph completion bash\|zsh\|fish` outputs the appropriate script. Basic UX gap for a CLI tool with 40+ commands.~~ | Developer Experience | ~~Tab completion makes the CLI discoverable — users find commands without reading docs. Reduces friction for new users and agents exploring available commands~~ | ✓ | ✓ | 2 | No | — | **PROMOTED** — Moved to ROADMAP Phase 7.11 | +| 94 | VS Code extension | Replace the planned Web UI (removed from roadmap Phase 12) with a VS Code extension providing: webview-based graph visualization (reusing the existing `viewer.js` HTML), go-to-definition via graph edges, inline impact annotations on hover, integration with the MCP server for queries, and a sidebar panel for triage/audit results. VS Code is the right UI target for developer tools in 2026. | Visualization | Developers get graph intelligence directly in their editor — no context switching to a browser or terminal. Impact annotations on hover surface blast radius without running commands | ✗ | ✓ | 3 | No | — | +| 95 | SARIF output for cycle detection | Add SARIF output format so cycle detection integrates with GitHub Code Scanning, showing issues inline in PRs. 
Currently planned for Phase 11 but could be delivered as early as Phase 7 since it's a pure output format addition. | CI | GitHub Code Scanning integration surfaces cycle violations directly in PR review — no separate CI step or comment bot needed | ✓ | ✓ | 3 | No | — | +| 96 | Fix README runtime dependency count | README claims "Only 3 runtime dependencies" but there are 5 — it omits `graphology` and `graphology-communities-louvain` which are in `package.json` `dependencies` (not optional). Correct to 5. | Documentation | Accuracy — users and contributors should be able to trust the README | ✓ | ✓ | 1 | No | — | ### Tier 2 — Foundation-aligned, needs dependencies diff --git a/docs/roadmap/ROADMAP.md b/docs/roadmap/ROADMAP.md index 01f082b9..9e6873c2 100644 --- a/docs/roadmap/ROADMAP.md +++ b/docs/roadmap/ROADMAP.md @@ -2,7 +2,7 @@ > **Current version:** 3.2.0 | **Status:** Active development | **Updated:** March 2026 -Codegraph is a strong local-first code graph CLI. This roadmap describes planned improvements across eleven phases -- closing gaps with commercial code intelligence platforms while preserving codegraph's core strengths: fully local, open source, zero cloud dependency by default. +Codegraph is a strong local-first code graph CLI. This roadmap describes planned improvements across twelve phases -- closing gaps with commercial code intelligence platforms while preserving codegraph's core strengths: fully local, open source, zero cloud dependency by default. **LLM strategy:** All LLM-powered features are **optional enhancements**. Everything works without an API key. When configured (OpenAI, Anthropic, Ollama, or any OpenAI-compatible endpoint), users unlock richer semantic search and natural language queries. @@ -17,14 +17,15 @@ Codegraph is a strong local-first code graph CLI. 
This roadmap describes planned | [**2.5**](#phase-25--analysis-expansion) | Analysis Expansion | Complexity metrics, community detection, flow tracing, co-change, manifesto, boundary rules, check, triage, audit, batch, hybrid search | **Complete** (v2.7.0) | | [**2.7**](#phase-27--deep-analysis--graph-enrichment) | Deep Analysis & Graph Enrichment | Dataflow analysis, intraprocedural CFG, AST node storage, expanded node/edge types, extractors refactoring, CLI consolidation, interactive viewer, exports command, normalizeSymbol | **Complete** (v3.0.0) | | [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring (Vertical Slice) | Unified AST analysis framework, command/query separation, repository pattern, queries.js decomposition, composable MCP, CLI commands, domain errors, builder pipeline, presentation layer, domain grouping, curated API, unified graph model, qualified names, CLI composability | **Complete** (v3.1.5) | -| [**4**](#phase-4--native-analysis-acceleration) | Native Analysis Acceleration | Move JS-only build phases (AST nodes, CFG, dataflow, insert nodes, structure, roles, complexity) to Rust; fix incremental rebuild data loss on native; sub-100ms 1-file rebuilds | Planned | +| [**4**](#phase-4--resolution-accuracy) | Resolution Accuracy | Dead role sub-categories, receiver type tracking, interface/trait implementation edges, resolution precision/recall benchmarks, `package.json` exports field, monorepo workspace resolution | Planned | | [**5**](#phase-5--typescript-migration) | TypeScript Migration | Project setup, core type definitions, leaf -> core -> orchestration module migration, test migration, supply-chain security, CI coverage gates | Planned | -| [**6**](#phase-6--runtime--extensibility) | Runtime & Extensibility | Event-driven pipeline, unified engine strategy, subgraph export filtering, transitive confidence, query caching, configuration profiles, pagination, plugin system, DX & onboarding | Planned | -| 
[**7**](#phase-7--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, enhanced embeddings, build-time semantic metadata, module summaries | Planned | -| [**8**](#phase-8--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned | -| [**9**](#phase-9--expanded-language-support) | Expanded Language Support | 8 new languages (11 -> 19), parser utilities | Planned | -| [**10**](#phase-10--github-integration--ci) | GitHub Integration & CI | Reusable GitHub Action, LLM-enhanced PR review, visual impact graphs, SARIF output | Planned | -| [**11**](#phase-11--interactive-visualization--advanced-features) | Visualization & Advanced | Web UI, dead code detection, monorepo, agentic search, refactoring analysis | Planned | +| [**6**](#phase-6--native-analysis-acceleration) | Native Analysis Acceleration | Move JS-only build phases (AST nodes, CFG, dataflow, insert nodes, structure, roles, complexity) to Rust; fix incremental rebuild data loss on native; sub-100ms 1-file rebuilds | Planned | +| [**7**](#phase-7--runtime--extensibility) | Runtime & Extensibility | Event-driven pipeline, unified engine strategy, subgraph export filtering, transitive confidence, query caching, configuration profiles, pagination, plugin system, DX & onboarding, confidence annotations, shell completion | Planned | +| [**8**](#phase-8--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, enhanced embeddings, build-time semantic metadata, module summaries | Planned | +| [**9**](#phase-9--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned | +| [**10**](#phase-10--expanded-language-support) | Expanded Language Support | 8 new languages (11 -> 19), parser utilities | Planned | +| [**11**](#phase-11--github-integration--ci) | GitHub Integration & CI | Reusable 
GitHub Action, LLM-enhanced PR review, visual impact graphs, SARIF output | Planned | +| [**12**](#phase-12--advanced-features) | Advanced Features | Dead code detection, monorepo, agentic search, refactoring analysis | Planned | ### Dependency graph @@ -34,13 +35,14 @@ Phase 1 (Rust Core) |--> Phase 2.5 (Analysis Expansion) |--> Phase 2.7 (Deep Analysis & Graph Enrichment) |--> Phase 3 (Architectural Refactoring) - |--> Phase 4 (Native Analysis Acceleration) + |--> Phase 4 (Resolution Accuracy) |--> Phase 5 (TypeScript Migration) - |--> Phase 6 (Runtime & Extensibility) - |--> Phase 7 (Embeddings + Metadata) --> Phase 8 (NL Queries + Narration) - |--> Phase 9 (Languages) - |--> Phase 10 (GitHub/CI) <-- Phase 7 (risk_score, side_effects) -Phases 1-8 --> Phase 11 (Visualization + Refactoring Analysis) + |--> Phase 6 (Native Analysis Acceleration) + |--> Phase 7 (Runtime & Extensibility) + |--> Phase 8 (Embeddings + Metadata) --> Phase 9 (NL Queries + Narration) + |--> Phase 10 (Languages) + |--> Phase 11 (GitHub/CI) <-- Phase 8 (risk_score, side_effects) +Phases 1-9 --> Phase 12 (Advanced Features) ``` --- @@ -991,116 +993,83 @@ src/domain/ --- -## Phase 4 -- Native Analysis Acceleration +## Phase 4 -- Resolution Accuracy -**Goal:** Move the remaining JS-only build phases to Rust so that `--engine native` eliminates all redundant WASM visitor walks. Today only 3 of 10 build phases (parse, resolve imports, build edges) run in Rust — the other 7 execute identical JavaScript regardless of engine, leaving ~50% of native build time on the table. +> **Status:** Planned -**Why its own phase:** This is a substantial Rust engineering effort — porting 6 JS visitors to `crates/codegraph-core/`, fixing a data loss bug in incremental rebuilds, and optimizing the 1-file rebuild path. Doing this before the TS migration avoids rewriting the same visitor code twice (once to TS, once to Rust). The Phase 3 module boundaries make each phase a self-contained target. 
+**Goal:** Close the most impactful gaps in call graph accuracy before investing in type safety or native acceleration. The entire value proposition — blast radius, impact analysis, dependency chains — rests on the call graph. These targeted improvements make the graph trustworthy. -**Evidence (v3.1.4 benchmarks on 398 files):** +**Why before TypeScript:** These fixes operate on the existing JS codebase and produce measurable accuracy gains immediately. TypeScript types will further improve resolution later, but receiver tracking, dead role fixes, and precision benchmarks don't require types to implement. -| Phase | Native | WASM | Ratio | Status | -|-------|-------:|-----:|------:|--------| -| Parse | 468ms | 1483ms | 3.2x faster | Already Rust | -| Build edges | 88ms | 152ms | 1.7x faster | Already Rust | -| Resolve imports | 8ms | 9ms | ~1x | Already Rust | -| **AST nodes** | **361ms** | **347ms** | **~1x** | JS visitor — biggest win | -| **CFG** | **126ms** | **125ms** | **~1x** | JS visitor | -| **Dataflow** | **100ms** | **98ms** | **~1x** | JS visitor | -| **Insert nodes** | **143ms** | **148ms** | **~1x** | Pure SQLite batching | -| **Roles** | **29ms** | **32ms** | **~1x** | JS classification | -| **Structure** | **13ms** | **17ms** | **~1x** | JS directory tree | -| Complexity | 16ms | 77ms | 5x faster | Partly pre-computed | +### 4.1 -- Fix "Dead" Role Sub-categories -**Target:** Reduce native full-build time from ~1,400ms to ~700ms (2x improvement) by eliminating ~690ms of redundant JS visitor work. +The current `dead` role classification conflates genuinely different categories, making the tool's own metrics misleading. Of ~509 dead callable symbols in codegraph's own codebase: 151 are Rust FFI (invisible by design), 94 are CLI/MCP entry points (framework dispatch), 26 are AST visitors (dynamic dispatch), 125 are repository methods (receiver type unknown), and only ~94 are genuine dead code or resolution misses. 
-### 4.1 -- AST Node Extraction in Rust +- Add sub-categories to role classification: `dead-leaf` (parameters, properties, constants — leaf nodes by definition), `dead-entry` (framework dispatch: CLI commands, MCP tools, event handlers), `dead-ffi` (cross-language FFI boundaries), `dead-unresolved` (genuinely unreferenced callables — the real dead code) +- Update `classifyNodeRoles()` to use the new sub-categories +- Update `roles` command, `audit`, and `triage` to report sub-categories +- MCP `node_roles` tool gains `--role dead-entry`, `--role dead-unresolved` etc. -The largest single opportunity. Currently the native parser returns partial AST node data, so the JS `buildAstNodes()` visitor re-walks all WASM trees anyway (~361ms). +**Affected files:** `src/graph/classifiers/roles.js`, `src/shared/kinds.js`, `src/domain/analysis/roles.js`, `src/features/triage.js` -- Extend `crates/codegraph-core/` to extract all AST node types (`call`, `new`, `string`, `regex`, `throw`, `await`) during the native parse phase -- Return complete AST node data in the `FileSymbols` result so `run-analyses.js` can skip the WASM walker entirely -- Validate parity: ensure native extraction produces identical node counts to the WASM visitor (benchmark already tracks this via `nodes/file`) +### 4.2 -- Receiver Type Tracking for Method Dispatch -**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/ast.js`, `src/domain/graph/builder/stages/run-analyses.js` +The single highest-impact resolution improvement. Currently `obj.method()` resolves to ANY exported `method` in scope — no receiver type tracking. This misses repository pattern calls (`repo.findCallers()`), builder chains, and visitor dispatch. 
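
A minimal sketch of the receiver-tracking idea in this section, under simplifying assumptions. The function names (`buildTypeMap`, `resolveMethodCall`) and the declaration record shape are hypothetical, not codegraph's actual extractor output, and a real per-file map would also need to handle scoping and shadowing. The confidence values mirror the ones listed for this item.

```javascript
// Hypothetical declaration records, as an extractor might emit them per file:
//   kind "constructor" -> const x = new SomeClass()
//   kind "annotation"  -> const x: SomeClass = ...
//   kind "factory"     -> const x = createSomeClass()
const CONFIDENCE = { constructor: 1.0, annotation: 0.9, factory: 0.7 };

// Build a per-file map from variable name to { type, confidence }.
function buildTypeMap(declarations) {
  const typeMap = new Map();
  for (const { name, type, kind } of declarations) {
    const confidence = CONFIDENCE[kind];
    if (confidence === undefined) continue; // unknown evidence: do not guess
    const existing = typeMap.get(name);
    // Keep the strongest evidence seen for a variable.
    if (!existing || confidence > existing.confidence) {
      typeMap.set(name, { type, confidence });
    }
  }
  return typeMap;
}

// Resolve `receiver.method()` to a qualified target, or null when the
// receiver type is unknown (the caller would fall back to name-only matching).
function resolveMethodCall(typeMap, receiver, method) {
  const entry = typeMap.get(receiver);
  if (!entry) return null;
  return { target: `${entry.type}.${method}`, confidence: entry.confidence };
}

const typeMap = buildTypeMap([
  { name: "repo", type: "NodeRepository", kind: "constructor" },
  { name: "builder", type: "GraphBuilder", kind: "factory" },
]);
const repoCall = resolveMethodCall(typeMap, "repo", "findCallers");
const missingCall = resolveMethodCall(typeMap, "unknownVar", "run");
```

Returning `null` for an untracked receiver keeps the existing heuristic as an explicit fallback instead of silently replacing it.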
-### 4.2 -- CFG Construction in Rust +- Track variable-to-type assignments: when `const x = new SomeClass()` or `const x: SomeClass = ...`, record `x → SomeClass` in a per-file type map +- During edge building, resolve `x.method()` to `SomeClass.method` using the type map +- Leverage the existing `qualified_name` and `scope` columns (Phase 3.12) for matching +- Confidence: `1.0` for explicit constructor, `0.9` for type annotation, `0.7` for factory function return +- Covers: TypeScript annotated variables, constructor assignments, factory patterns -The intraprocedural control-flow graph visitor runs in JS even on native builds (~126ms). +**Affected files:** `src/domain/graph/builder/stages/build-edges.js`, `src/extractors/*.js` -- Port `createCfgVisitor()` logic to Rust: basic block detection, branch/loop edges, entry/exit nodes -- Return CFG block data per function in `FileSymbols` so the JS visitor is fully bypassed -- Validate parity: CFG block counts and edge counts must match the WASM visitor output +### 4.3 -- Interface and Trait Implementation Tracking -**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/cfg.js`, `src/ast-analysis/visitors/cfg-visitor.js` +Extract `implements`/`extends`/trait-impl relationships from tree-sitter AST and store as `implements` edges. When an interface signature changes, all implementors appear in impact analysis. -### 4.3 -- Dataflow Analysis in Rust +- New `codegraph implementations ` command — all concrete types implementing a given interface/trait +- Inverse: `codegraph interfaces ` — what a type implements +- Covers: TypeScript interfaces, Java interfaces/abstract classes, Go interfaces (structural matching), Rust traits, C# interfaces, PHP interfaces +- `fn-impact` and `diff-impact` include implementors in blast radius by default (`--include-implementations`, on by default) -Dataflow edges are computed by a JS visitor that walks WASM trees (~100ms on native builds). 
- -- Port `createDataflowVisitor()` to Rust: variable definitions, assignments, reads, def-use chains -- Return dataflow edges in `FileSymbols` -- Validate parity against WASM visitor output - -**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/dataflow.js`, `src/ast-analysis/visitors/dataflow-visitor.js` - -### 4.4 -- Batch SQLite Inserts via Rust - -`insertNodes` is pure SQLite work (~143ms) but runs row-by-row from JS. Batching in Rust can reduce JS↔native boundary crossings. - -- Expose a `batchInsertNodes(nodes[])` function from Rust that uses a single prepared statement in a transaction -- Alternatively, generate the SQL batch on the JS side and execute as a single `better-sqlite3` call (may be sufficient without Rust) -- Benchmark both approaches; pick whichever is faster - -**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/db/index.js`, `src/domain/graph/builder/stages/insert-nodes.js` +**Affected files:** `src/extractors/*.js`, `src/domain/graph/builder/stages/build-edges.js`, `src/domain/analysis/impact.js` -### 4.5 -- Role Classification & Structure in Rust +### 4.4 -- Call Resolution Precision/Recall Benchmark Suite -Smaller wins (~42ms combined) but complete the picture of a fully native build pipeline. +No tests currently measure call resolution accuracy. Add a benchmark suite that tracks precision/recall across versions. 
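
A rough sketch of how such a benchmark could score one fixture, assuming edges are compared as `caller->callee` pairs. The function names, the edge shape (`from`/`to`), and the empty-set convention are assumptions for illustration; the manifest format itself is not specified here.

```javascript
// Encode an edge as a string key so edges can be compared via Set membership.
function edgeKey(edge) {
  return `${edge.from}->${edge.to}`;
}

// precision = correct / total resolved, recall = correct / total expected.
// Convention chosen for this sketch: an empty denominator scores 1.
function scoreResolution(expectedEdges, resolvedEdges) {
  const expected = new Set(expectedEdges.map(edgeKey));
  const resolved = new Set(resolvedEdges.map(edgeKey));
  let correct = 0;
  for (const key of resolved) {
    if (expected.has(key)) correct++;
  }
  return {
    correct,
    precision: resolved.size === 0 ? 1 : correct / resolved.size,
    recall: expected.size === 0 ? 1 : correct / expected.size,
  };
}

// CI gate: both metrics must stay at or above the stored baseline.
function passesGate(score, baseline) {
  return score.precision >= baseline.precision && score.recall >= baseline.recall;
}

const expectedEdges = [
  { from: "main", to: "parse" },
  { from: "main", to: "build" },
  { from: "build", to: "insert" },
  { from: "build", to: "classify" },
];
const resolvedEdges = [
  { from: "main", to: "parse" },
  { from: "main", to: "build" },
  { from: "build", to: "insert" },
  { from: "build", to: "log" }, // false positive: not in the manifest
];
const score = scoreResolution(expectedEdges, resolvedEdges);
const gateOk = passesGate(score, { precision: 0.85, recall: 0.8 });
```

With three of four edges correct on each side, this fixture scores 0.75 precision and 0.75 recall, so it would fail a gate set at 0.85 and 0.80.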
-- Port `classifyNodeRoles()` to Rust: hub/leaf/bridge/utility classification based on in/out degree and betweenness -- Port directory structure building and metrics aggregation -- Return role assignments and structure data alongside parse results +- Create `tests/benchmarks/resolution/` with hand-annotated fixture projects per language +- Each fixture declares expected call edges in an `expected-edges.json` manifest +- Benchmark runner compares codegraph's resolved edges against expected edges, reports precision (correct / total resolved) and recall (correct / total expected) +- Track metrics per language and per resolution mode (static, receiver-typed, interface-dispatched) +- CI gate: fail if precision or recall drops below baseline for any language +- Initial target: ≥85% precision, ≥80% recall for TypeScript and JavaScript -**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/structure.js`, `src/domain/graph/builder/stages/build-structure.js` +**New directory:** `tests/benchmarks/resolution/` -### 4.6 -- Complete Complexity Pre-computation +### 4.5 -- `package.json` Exports Field Resolution -Complexity is partly pre-computed by native (~16ms vs 77ms WASM) but not all functions are covered. +Modern Node.js standard since v12. Currently codegraph's import resolution uses brute-force filesystem probing (tries 10+ extensions via `fs.existsSync()`). It doesn't read the `package.json` `exports` field, meaning conditional exports, subpath patterns, and package self-references are invisible. 
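
A simplified sketch of what reading the `exports` field could look like, covering only conditional exports and single-`*` subpath patterns. `resolveExports` and `resolveTarget` are hypothetical names, not existing codegraph functions, and this is not a full implementation of Node.js's resolution algorithm. Returning `undefined` when the field is absent is an assumed convention for "fall back to filesystem probing".

```javascript
// Resolve one exports target. Conditions are tried in the object's own key
// order, which is how Node.js prioritizes them; "default" always matches.
function resolveTarget(target, conditions) {
  if (typeof target === "string") return target;
  if (Array.isArray(target)) {
    for (const item of target) {
      const hit = resolveTarget(item, conditions);
      if (hit) return hit;
    }
    return null;
  }
  if (target && typeof target === "object") {
    for (const [condition, value] of Object.entries(target)) {
      if (condition === "default" || conditions.includes(condition)) {
        const hit = resolveTarget(value, conditions);
        if (hit) return hit;
      }
    }
  }
  return null;
}

// subpath is "." for the package root or "./x/y" for a deep import.
function resolveExports(exportsField, subpath, conditions) {
  if (exportsField === undefined) return undefined; // no exports: probe filesystem
  // A bare string or a conditions-only object is shorthand for the "." entry.
  const isSubpathMap =
    typeof exportsField === "object" &&
    exportsField !== null &&
    !Array.isArray(exportsField) &&
    Object.keys(exportsField).every((key) => key.startsWith("."));
  const map = isSubpathMap ? exportsField : { ".": exportsField };

  if (Object.hasOwn(map, subpath)) return resolveTarget(map[subpath], conditions);

  // Subpath patterns: in this sketch the longest matching prefix wins.
  let best = null;
  for (const key of Object.keys(map)) {
    const star = key.indexOf("*");
    if (star === -1) continue;
    const prefix = key.slice(0, star);
    const suffix = key.slice(star + 1);
    const fits =
      subpath.length >= prefix.length + suffix.length &&
      subpath.startsWith(prefix) &&
      subpath.endsWith(suffix);
    if (fits && (!best || prefix.length > best.prefix.length)) {
      best = { key, prefix, suffix };
    }
  }
  if (!best) return null; // subpath is not exported
  const captured = subpath.slice(best.prefix.length, subpath.length - best.suffix.length);
  const target = resolveTarget(map[best.key], conditions);
  return target ? target.replaceAll("*", captured) : null;
}

const pkgExports = {
  ".": { import: "./dist/index.mjs", require: "./dist/index.cjs" },
  "./lib/*": "./src/*.js",
};
const mainEntry = resolveExports(pkgExports, ".", ["import"]);
const libEntry = resolveExports(pkgExports, "./lib/utils", ["import"]);
const noExports = resolveExports(undefined, ".", ["import"]);
```

Distinguishing `undefined` (no `exports` field) from `null` (field present, subpath not exported) lets the caller probe the filesystem only in the first case.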
-- Ensure native parse computes cognitive, cyclomatic, Halstead, and MI metrics for every function, not just a subset -- Eliminate the WASM fallback path in `buildComplexityMetrics()` when running native +- Parse `package.json` `exports` field during import resolution +- Support subpath patterns (`"./lib/*": "./src/*.js"`) +- Support conditional exports (`"import"`, `"require"`, `"default"`) +- Fall back to current filesystem probing only when `exports` field is absent -**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/complexity.js` - -### 4.7 -- Fix Incremental Rebuild Data Loss on Native Engine - -**Bug:** On native 1-file rebuilds, complexity, CFG, and dataflow data for the changed file is **silently lost**. `purgeFilesFromGraph` removes the old data, but the analysis phases never re-compute it because: - -1. The native parser does not produce a `_tree` (WASM tree-sitter tree) -2. The unified walker at `src/ast-analysis/engine.js:108-109` skips files without `_tree` -3. The `buildXxx` functions check for pre-computed fields (`d.complexity`, `d.cfg?.blocks`) which the native parser does not provide for these analyses -4. Result: 0.1ms no-op — the phases run but do nothing - -This is confirmed by the v3.1.4 1-file rebuild data: complexity (0.1ms), CFG (0.1ms), dataflow (0.2ms) on native — these are just module import overhead, not actual computation. Contrast with v3.1.3 where the numbers were higher (1.3ms, 8.7ms, 4ms) because earlier versions triggered a WASM fallback tree via `ensureWasmTrees`. - -**Fix (prerequisite: 4.1–4.3):** Once the native parser returns complete AST nodes, CFG blocks, and dataflow edges in `FileSymbols`, the `run-analyses` stage can store them directly without needing a WASM tree. 
The incremental path must: - -- Ensure `parseFilesAuto()` returns pre-computed analysis data for the single changed file -- Have `run-analyses.js` store that data (currently it only stores if `_tree` exists or if pre-computed fields are present — the latter path needs to work reliably) -- Add an integration test: rebuild 1 file on native engine, then query its complexity/CFG/dataflow and assert non-empty results +**Affected files:** `src/domain/graph/resolve.js` -**Affected files:** `src/ast-analysis/engine.js`, `src/domain/graph/builder/stages/run-analyses.js`, `src/domain/parser.js`, `tests/integration/` +### 4.6 -- Monorepo Workspace Resolution -### 4.8 -- Incremental Rebuild Performance +`pnpm-workspace.yaml`, npm workspaces (`package.json` `workspaces`), and `lerna.json` are not recognized. Internal package imports (`@myorg/utils`) fall through to global resolution with low confidence. -With analysis data loss fixed, optimize the 1-file rebuild path end-to-end. Current native 1-file rebuild is 265ms — dominated by parse (51ms), structure (13ms), roles (27ms), edges (13ms), insert (12ms), and finalize (12ms). +> **Scope note:** This phase covers the *resolution layer only* — detecting workspace packages and resolving internal imports to source files. Full monorepo graph support (package node type, cross-package edges, `build --workspace` flag) is deferred to Phase 12.2. -- **Skip unchanged phases:** Structure and roles are graph-wide computations. 
On a 1-file change, only the changed file's nodes/edges need updating — skip full reclassification unless the file's degree changed significantly -- **Incremental edge rebuild:** Only rebuild edges involving the changed file's symbols, not the full edge set -- **Benchmark target:** Sub-100ms native 1-file rebuilds (from current 265ms) +- Detect workspace root and enumerate workspace packages +- Resolve internal package imports to actual source files within the monorepo +- Assign high confidence (0.95) to workspace-resolved imports -**Affected files:** `src/domain/graph/builder/stages/build-structure.js`, `src/domain/graph/builder/stages/build-edges.js`, `src/domain/graph/builder/pipeline.js` +**Affected files:** `src/domain/graph/resolve.js`, `src/infrastructure/config.js` --- @@ -1108,7 +1077,7 @@ With analysis data loss fixed, optimize the 1-file rebuild path end-to-end. Curr **Goal:** Migrate the codebase from plain JavaScript to TypeScript, leveraging the clean module boundaries established in Phase 3. Incremental module-by-module migration starting from leaf modules inward. -**Why after Phase 4:** The architectural refactoring (Phase 3) creates small, well-bounded modules with explicit interfaces. Phase 4 moves the remaining hot-path visitor code to Rust — doing TS migration first would mean rewriting those visitors to TypeScript only to delete them when porting to Rust. With both phases complete, the JS layer is purely orchestration and presentation, which is the ideal surface for TypeScript. +**Why after Phase 4:** The resolution accuracy work (Phase 4) operates on the existing JS codebase and produces immediate accuracy gains. TypeScript migration builds on Phase 3's clean module boundaries to add type safety across the entire codebase. Every subsequent phase benefits from types: MCP schema auto-generation, API contracts, refactoring safety. 
The Phase 4 resolution improvements (receiver tracking, interface edges) establish the resolution model that TypeScript types will formalize. ### 5.1 -- Project Setup @@ -1236,15 +1205,138 @@ Migrate top-level orchestration and entry points: **Affected files:** `.github/workflows/ci.yml`, `vitest.config.js`, `tests/` +### 5.9 -- Kill List (Technical Debt Cleanup) + +Items to remove or rework during the TypeScript migration, identified by architectural audit: + +1. **Remove Maintainability Index computation** — The 1991 Coleman-Oman formula (171 - 5.2*ln(V) - 0.23*G - 16.2*ln(LOC)) was validated on Fortran and C, not modern languages with closures, async/await, and higher-order functions. Microsoft deprecated their MI implementation in 2023. Remove from `ast-analysis/metrics.js` and `complexity` output, or replace with a validated metric +2. **Scope Halstead metrics to imperative code** — Halstead operator/operand counting is meaningless for JSX, template literals, HCL, and declarative code. Either scope to imperative code blocks or remove +3. **Migrate custom `graph/model.js` to `graphology`** — `graphology` is already a runtime dependency. The custom model reimplements `addNode`, `addEdge`, `successors`, `predecessors`, `inDegree`, `outDegree` — all available natively in `graphology`. Migrate during the TypeScript migration to avoid maintaining two graph representations +4. **Skip WASM loading on platforms with native binaries** — On supported platforms (darwin-arm64, linux-x64, win32-x64), WASM should not be loaded at all. Currently `loadNative()` is checked on every call in `resolve.js` + --- -## Phase 6 -- Runtime & Extensibility +## Phase 6 -- Native Analysis Acceleration + +**Goal:** Move the remaining JS-only build phases to Rust so that `--engine native` eliminates all redundant WASM visitor walks. 
Today only 3 of 10 build phases (parse, resolve imports, build edges) run in Rust — the other 7 execute identical JavaScript regardless of engine, leaving ~50% of native build time on the table. + +**Why its own phase:** This is a substantial Rust engineering effort — porting 6 JS visitors to `crates/codegraph-core/`, fixing a data loss bug in incremental rebuilds, and optimizing the 1-file rebuild path. With TypeScript types (Phase 5) defining the interface contracts, the Rust ports can target well-typed boundaries. The Phase 3 module boundaries make each phase a self-contained target. + +**Evidence (v3.1.4 benchmarks on 398 files):** + +| Phase | Native | WASM | Ratio | Status | +|-------|-------:|-----:|------:|--------| +| Parse | 468ms | 1483ms | 3.2x faster | Already Rust | +| Build edges | 88ms | 152ms | 1.7x faster | Already Rust | +| Resolve imports | 8ms | 9ms | ~1x | Already Rust | +| **AST nodes** | **361ms** | **347ms** | **~1x** | JS visitor — biggest win | +| **CFG** | **126ms** | **125ms** | **~1x** | JS visitor | +| **Dataflow** | **100ms** | **98ms** | **~1x** | JS visitor | +| **Insert nodes** | **143ms** | **148ms** | **~1x** | Pure SQLite batching | +| **Roles** | **29ms** | **32ms** | **~1x** | JS classification | +| **Structure** | **13ms** | **17ms** | **~1x** | JS directory tree | +| Complexity | 16ms | 77ms | 5x faster | Partly pre-computed | + +**Target:** Reduce native full-build time from ~1,400ms to ~700ms (2x improvement) by eliminating ~690ms of redundant JS visitor work. + +### 6.1 -- AST Node Extraction in Rust + +The largest single opportunity. Currently the native parser returns partial AST node data, so the JS `buildAstNodes()` visitor re-walks all WASM trees anyway (~361ms). 
+ +- Extend `crates/codegraph-core/` to extract all AST node types (`call`, `new`, `string`, `regex`, `throw`, `await`) during the native parse phase +- Return complete AST node data in the `FileSymbols` result so `run-analyses.js` can skip the WASM walker entirely +- Validate parity: ensure native extraction produces identical node counts to the WASM visitor (benchmark already tracks this via `nodes/file`) + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/ast.js`, `src/domain/graph/builder/stages/run-analyses.js` + +### 6.2 -- CFG Construction in Rust + +The intraprocedural control-flow graph visitor runs in JS even on native builds (~126ms). + +- Port `createCfgVisitor()` logic to Rust: basic block detection, branch/loop edges, entry/exit nodes +- Return CFG block data per function in `FileSymbols` so the JS visitor is fully bypassed +- Validate parity: CFG block counts and edge counts must match the WASM visitor output + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/cfg.js`, `src/ast-analysis/visitors/cfg-visitor.js` + +### 6.3 -- Dataflow Analysis in Rust + +Dataflow edges are computed by a JS visitor that walks WASM trees (~100ms on native builds). + +- Port `createDataflowVisitor()` to Rust: variable definitions, assignments, reads, def-use chains +- Return dataflow edges in `FileSymbols` +- Validate parity against WASM visitor output + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/dataflow.js`, `src/ast-analysis/visitors/dataflow-visitor.js` + +### 6.4 -- Batch SQLite Inserts via Rust + +`insertNodes` is pure SQLite work (~143ms) but runs row-by-row from JS. Batching in Rust can reduce JS↔native boundary crossings. 
+ +- Expose a `batchInsertNodes(nodes[])` function from Rust that uses a single prepared statement in a transaction +- Alternatively, generate the SQL batch on the JS side and execute as a single `better-sqlite3` call (may be sufficient without Rust) +- Benchmark both approaches; pick whichever is faster + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/db/index.js`, `src/domain/graph/builder/stages/insert-nodes.js` + +### 6.5 -- Role Classification & Structure in Rust + +Smaller wins (~42ms combined) but complete the picture of a fully native build pipeline. + +- Port `classifyNodeRoles()` to Rust: hub/leaf/bridge/utility classification based on in/out degree and betweenness +- Port directory structure building and metrics aggregation +- Return role assignments and structure data alongside parse results + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/structure.js`, `src/domain/graph/builder/stages/build-structure.js` + +### 6.6 -- Complete Complexity Pre-computation + +Complexity is partly pre-computed by native (~16ms vs 77ms WASM) but not all functions are covered. + +- Ensure native parse computes cognitive and cyclomatic metrics for every function, not just a subset +- Halstead and MI are scoped by Phase 5.9 (Kill List): MI will be removed entirely; Halstead will be limited to imperative code blocks. Native acceleration should only target the metrics that survive the Kill List +- Eliminate the WASM fallback path in `buildComplexityMetrics()` when running native + +**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/features/complexity.js` + +### 6.7 -- Fix Incremental Rebuild Data Loss on Native Engine + +**Bug:** On native 1-file rebuilds, complexity, CFG, and dataflow data for the changed file is **silently lost**. `purgeFilesFromGraph` removes the old data, but the analysis phases never re-compute it because: + +1. The native parser does not produce a `_tree` (WASM tree-sitter tree) +2. 
The unified walker at `src/ast-analysis/engine.js:108-109` skips files without `_tree` +3. The `buildXxx` functions check for pre-computed fields (`d.complexity`, `d.cfg?.blocks`) which the native parser does not provide for these analyses +4. Result: 0.1ms no-op — the phases run but do nothing + +This is confirmed by the v3.1.4 1-file rebuild data: complexity (0.1ms), CFG (0.1ms), dataflow (0.2ms) on native — these are just module import overhead, not actual computation. Contrast with v3.1.3 where the numbers were higher (1.3ms, 8.7ms, 4ms) because earlier versions triggered a WASM fallback tree via `ensureWasmTrees`. + +**Fix (prerequisite: 6.1–6.3):** Once the native parser returns complete AST nodes, CFG blocks, and dataflow edges in `FileSymbols`, the `run-analyses` stage can store them directly without needing a WASM tree. The incremental path must: + +- Ensure `parseFilesAuto()` returns pre-computed analysis data for the single changed file +- Have `run-analyses.js` store that data (currently it only stores if `_tree` exists or if pre-computed fields are present — the latter path needs to work reliably) +- Add an integration test: rebuild 1 file on native engine, then query its complexity/CFG/dataflow and assert non-empty results + +**Affected files:** `src/ast-analysis/engine.js`, `src/domain/graph/builder/stages/run-analyses.js`, `src/domain/parser.js`, `tests/integration/` + +### 6.8 -- Incremental Rebuild Performance + +With analysis data loss fixed, optimize the 1-file rebuild path end-to-end. Current native 1-file rebuild is 265ms — dominated by parse (51ms), structure (13ms), roles (27ms), edges (13ms), insert (12ms), and finalize (12ms). + +- **Skip unchanged phases:** Structure and roles are graph-wide computations. 
On a 1-file change, only the changed file's nodes/edges need updating — skip full reclassification unless the file's degree changed significantly +- **Incremental edge rebuild:** Only rebuild edges involving the changed file's symbols, not the full edge set +- **Benchmark target:** Sub-100ms native 1-file rebuilds (from current 265ms) + +**Affected files:** `src/domain/graph/builder/stages/build-structure.js`, `src/domain/graph/builder/stages/build-edges.js`, `src/domain/graph/builder/pipeline.js` + +--- + +## Phase 7 -- Runtime & Extensibility **Goal:** Harden the runtime for large codebases and open the platform to external contributors. These items were deferred from Phase 3 -- they depend on the clean module boundaries and domain layering established there, and benefit from TypeScript's type safety (Phase 5) for safe refactoring of cross-cutting concerns like caching, streaming, and plugin contracts. -**Why after TypeScript Migration:** Several of these items introduce new internal contracts (plugin API, cache interface, streaming protocol, engine strategy). Defining those contracts in TypeScript from the start avoids a second migration pass and gives contributors type-checked extension points. +**Why after TypeScript Migration:** Several of these items introduce new internal contracts (plugin API, cache interface, streaming protocol, engine strategy). Defining those contracts in TypeScript from the start avoids a second migration pass and gives contributors type-checked extension points (Phase 5). -### 6.1 -- Event-Driven Pipeline +### 7.1 -- Event-Driven Pipeline Replace the synchronous build/analysis pipeline with an event/streaming architecture. Enables progress reporting, cancellation tokens, and bounded memory usage on large repositories (10K+ files). 
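One way to picture the event/streaming shape described for 7.1 is a generator that yields an event after each file, so the consumer controls the pace and can cancel between files. This is a minimal sketch under stated assumptions: `BuildEvent`, `CancellationToken`, and `runPipeline` are hypothetical names for illustration and are not part of codegraph's current API.

```typescript
// Hypothetical event types for illustration only (not codegraph's API).
type BuildEvent =
  | { kind: "progress"; file: string; done: number; total: number }
  | { kind: "cancelled"; done: number; total: number }
  | { kind: "complete"; total: number };

// Minimal cancellation token: the caller sets `cancelled` to stop the build.
interface CancellationToken {
  cancelled: boolean;
}

// Processes one file at a time and yields an event after each one.
// Because the generator is lazy, nothing runs until the consumer asks for
// the next event, which keeps the work in flight bounded to a single file.
function* runPipeline(
  files: readonly string[],
  processFile: (file: string) => void,
  token: CancellationToken,
): Generator<BuildEvent> {
  let done = 0;
  for (const file of files) {
    // Cancellation is checked between files, never in the middle of one.
    if (token.cancelled) {
      yield { kind: "cancelled", done, total: files.length };
      return;
    }
    processFile(file);
    done += 1;
    yield { kind: "progress", file, done, total: files.length };
  }
  yield { kind: "complete", total: files.length };
}
```

A CLI progress bar or an MCP handler could consume the same event stream with a `for...of` loop. A real pipeline would likely be asynchronous; the synchronous generator here only keeps the sketch short.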
@@ -1256,7 +1348,7 @@ Replace the synchronous build/analysis pipeline with an event/streaming architec **Affected files:** `src/domain/graph/builder.js`, `src/cli/`, `src/mcp/` -### 6.2 -- Unified Engine Interface (Strategy Pattern) +### 7.2 -- Unified Engine Interface (Strategy Pattern) Replace scattered `engine.name === 'native'` / `engine === 'wasm'` branching throughout the codebase with a formal Strategy pattern. Each engine implements a common `ParsingEngine` interface with methods like `parse(file)`, `batchParse(files)`, `supports(language)`, and `capabilities()`. @@ -1268,7 +1360,7 @@ Replace scattered `engine.name === 'native'` / `engine === 'wasm'` branching thr **Affected files:** `src/infrastructure/native.js`, `src/domain/parser.js`, `src/domain/graph/builder.js` -### 6.3 -- Subgraph Export Filtering +### 7.3 -- Subgraph Export Filtering Add focus and depth controls to `codegraph export` so users can produce usable visualizations of specific subsystems rather than the entire graph. @@ -1285,7 +1377,7 @@ codegraph export --focus "buildGraph" --depth 3 --format dot **Affected files:** `src/features/export.js`, `src/presentation/export.js` -### 6.4 -- Transitive Import-Aware Confidence +### 7.4 -- Transitive Import-Aware Confidence Improve import resolution accuracy by walking the import graph before falling back to proximity heuristics. Currently the 6-level priority system uses directory proximity as a strong signal, but this can mis-resolve when a symbol is re-exported through an index file several directories away. @@ -1296,7 +1388,7 @@ Improve import resolution accuracy by walking the import graph before falling ba **Affected files:** `src/domain/graph/resolve.js` -### 6.5 -- Query Result Caching +### 7.5 -- Query Result Caching Add an LRU/TTL cache layer between the analysis/query functions and the SQLite repository. With 34+ MCP tools that often run overlapping queries within a session, caching eliminates redundant DB round-trips. 
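The LRU/TTL layer described for 7.5 could look roughly like the sketch below. It is illustrative only: `QueryCache` and `cached` are hypothetical names, the key format is an assumption, and the injectable clock exists just to make expiry testable.

```typescript
// Hypothetical LRU + TTL cache for illustration (not codegraph's API).
class QueryCache<V> {
  // A Map iterates in insertion order, so the first key is the least
  // recently used one as long as reads re-insert the entry.
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxEntries: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // TTL elapsed: drop the stale entry
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in the Map).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  // Intended to be called after a rebuild, when cached results may be stale.
  clear(): void {
    this.entries.clear();
  }
}

// Wraps a query so repeated calls with the same key skip the computation.
// Note: a cached value of `undefined` is treated as a miss in this sketch.
function cached<V>(cache: QueryCache<V>, key: string, compute: () => V): V {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = compute();
  cache.set(key, value);
  return value;
}
```

Whatever shape the real cache takes, it needs an invalidation hook tied to graph rebuilds, since a TTL alone can serve results from a graph that has since changed.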
@@ -1309,7 +1401,7 @@ Add an LRU/TTL cache layer between the analysis/query functions and the SQLite r **Affected files:** `src/domain/analysis/`, `src/db/index.js` -### 6.6 -- Configuration Profiles +### 7.6 -- Configuration Profiles Support named configuration profiles for monorepos and multi-service projects where different parts of the codebase need different settings. @@ -1330,7 +1422,7 @@ Support named configuration profiles for monorepos and multi-service projects wh **Affected files:** `src/infrastructure/config.js`, `src/cli/` -### 6.7 -- Pagination Standardization +### 7.7 -- Pagination Standardization Standardize SQL-level `LIMIT`/`OFFSET` pagination across all repository queries and surface it consistently through the CLI and MCP. @@ -1342,7 +1434,7 @@ Standardize SQL-level `LIMIT`/`OFFSET` pagination across all repository queries **Affected files:** `src/shared/paginate.js`, `src/db/index.js`, `src/domain/analysis/`, `src/mcp/` -### 6.8 -- Plugin System for Custom Commands +### 7.8 -- Plugin System for Custom Commands Allow users to extend codegraph with custom commands by dropping a JS/TS module into `~/.codegraph/plugins/` (global) or `.codegraph/plugins/` (project-local). @@ -1370,7 +1462,7 @@ export function data(db: Database, args: ParsedArgs, config: Config): object { **Affected files:** `src/cli/`, `src/mcp/`, new `src/infrastructure/plugins.js` -### 6.9 -- Developer Experience & Onboarding +### 7.9 -- Developer Experience & Onboarding Lower the barrier to first successful use. Today codegraph requires manual install, manual config, and prior knowledge of which command to run next. @@ -1382,15 +1474,36 @@ Lower the barrier to first successful use. Today codegraph requires manual insta **Affected files:** new `src/cli/commands/init.js`, `docs/benchmarks/`, `docs/editors/`, `src/presentation/result-formatter.js` +### 7.10 -- Confidence Annotations on Query Output + +Every query output should communicate its known limitations. 
When `fn-impact` shows a blast radius of 5 functions, it should note how many method-dispatch calls may be missing. + +- Add `confidence` and `resolution_stats` fields to all `*Data()` function return values +- Format: `{ resolved: N, unresolved_method_calls: M, confidence: 0.XX }` +- CLI displays as a footer line: `"5 affected functions (confidence: 82%, 3 unresolved method calls in scope)"` +- MCP tools include the fields in JSON responses + +**Affected files:** `src/domain/analysis/*.js`, `src/presentation/result-formatter.js` + +### 7.11 -- Shell Completion + +Commander supports shell completion but it's not implemented. Basic UX gap for a CLI tool with 40+ commands. + +- Generate bash/zsh/fish completion scripts via Commander's built-in support +- `codegraph completion bash|zsh|fish` outputs the script +- Document in README + +**Affected files:** `src/cli/index.js` + --- -## Phase 7 -- Intelligent Embeddings +## Phase 8 -- Intelligent Embeddings **Goal:** Dramatically improve semantic search quality by embedding natural-language descriptions instead of raw code. -> **Phase 7.3 (Hybrid Search) was completed early** during Phase 2.5 -- FTS5 BM25 + semantic search with RRF fusion is already shipped in v2.7.0. +> **Phase 8.3 (Hybrid Search) was completed early** during Phase 2.5 -- FTS5 BM25 + semantic search with RRF fusion is already shipped in v2.7.0. 
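For readers unfamiliar with RRF, the fusion step mentioned in the note above can be sketched in a few lines. This shows the general Reciprocal Rank Fusion technique, not codegraph's shipped implementation: `rrfFuse` is a hypothetical name, and `k = 60` is the commonly used default rather than a value taken from the source.

```typescript
// Reciprocal Rank Fusion: every ranked list contributes 1 / (k + rank) to an
// item's score, with rank starting at 1. Items ranked well in several lists
// rise to the top without any need to normalize the underlying scores.
function rrfFuse(rankings: readonly (readonly string[])[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first, and return only the ids.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because RRF works on ranks alone, BM25 scores and vector similarities never have to be placed on a common scale before fusing.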
-### 7.1 -- LLM Description Generator +### 8.1 -- LLM Description Generator For each function/method/class node, generate a concise natural-language description: @@ -1418,7 +1531,7 @@ For each function/method/class node, generate a concise natural-language descrip **New file:** `src/describer.js` -### 7.2 -- Enhanced Embedding Pipeline +### 8.2 -- Enhanced Embedding Pipeline - When descriptions exist, embed the description text instead of raw code - Keep raw code as fallback when no description is available @@ -1429,11 +1542,11 @@ For each function/method/class node, generate a concise natural-language descrip **Affected files:** `src/embedder.js` -### ~~7.3 -- Hybrid Search~~ ✅ Completed in Phase 2.5 +### ~~8.3 -- Hybrid Search~~ ✅ Completed in Phase 2.5 Shipped in v2.7.0. FTS5 BM25 keyword search + semantic vector search with RRF fusion. Three search modes: `hybrid` (default), `semantic`, `keyword`. -### 7.4 -- Build-time Semantic Metadata +### 8.4 -- Build-time Semantic Metadata Enrich nodes with LLM-generated metadata beyond descriptions. Computed incrementally at build time (only for changed nodes), stored as columns on the `nodes` table. @@ -1446,9 +1559,9 @@ Enrich nodes with LLM-generated metadata beyond descriptions. Computed increment - MCP tool: `assess ` -- returns complexity rating + specific concerns - Cascade invalidation: when a node changes, mark dependents for re-enrichment -**Depends on:** 7.1 (LLM provider abstraction) +**Depends on:** 8.1 (LLM provider abstraction) -### 7.5 -- Module Summaries +### 8.5 -- Module Summaries Aggregate function descriptions + dependency direction into file-level narratives. 
@@ -1456,17 +1569,17 @@ Aggregate function descriptions + dependency direction into file-level narrative - MCP tool: `explain_module ` -- returns module purpose, key exports, role in the system - `naming_conventions` metadata per module -- detected patterns (camelCase, snake_case, verb-first), flag outliers -**Depends on:** 7.1 (function-level descriptions must exist first) +**Depends on:** 8.1 (function-level descriptions must exist first) > **Full spec:** See [llm-integration.md](./llm-integration.md) for detailed architecture, infrastructure table, and prompt design. --- -## Phase 8 -- Natural Language Queries +## Phase 9 -- Natural Language Queries **Goal:** Allow developers to ask questions about their codebase in plain English. -### 8.1 -- Query Engine +### 9.1 -- Query Engine ```bash codegraph ask "How does the authentication flow work?" @@ -1492,7 +1605,7 @@ codegraph ask "How does the authentication flow work?" **New file:** `src/nlquery.js` -### 8.2 -- Conversational Sessions +### 9.2 -- Conversational Sessions Multi-turn conversations with session memory. @@ -1506,7 +1619,7 @@ codegraph sessions clear - Store conversation history in SQLite table `sessions` - Include prior Q&A pairs in subsequent prompts -### 8.3 -- MCP Integration +### 9.3 -- MCP Integration New MCP tool: `ask_codebase` -- natural language query via MCP. @@ -1514,7 +1627,7 @@ Enables AI coding agents (Claude Code, Cursor, etc.) to ask codegraph questions **Affected files:** `src/mcp.js` -### 8.4 -- LLM-Narrated Graph Queries +### 9.4 -- LLM-Narrated Graph Queries Graph traversal + LLM narration for questions that require both structural data and natural-language explanation. Each query walks the graph first, then sends the structural result to the LLM for narration. 
@@ -1527,9 +1640,9 @@ Graph traversal + LLM narration for questions that require both structural data Pre-computed `flow_narratives` table caches results for key entry points at build time, invalidated when any node in the chain changes. -**Depends on:** 7.4 (`side_effects` metadata), 7.1 (descriptions for narration context) +**Depends on:** 8.4 (`side_effects` metadata), 8.1 (descriptions for narration context) -### 8.5 -- Onboarding & Navigation Tools +### 9.5 -- Onboarding & Navigation Tools Help new contributors and AI agents orient in an unfamiliar codebase. @@ -1538,15 +1651,15 @@ Help new contributors and AI agents orient in an unfamiliar codebase. - MCP tool: `get_started` -- returns ordered list: "start here, then read this, then this" - `change_plan ` -- LLM reads description, graph identifies relevant modules, returns touch points and test coverage gaps -**Depends on:** 7.5 (module summaries for context), 8.1 (query engine) +**Depends on:** 8.5 (module summaries for context), 9.1 (query engine) --- -## Phase 9 -- Expanded Language Support +## Phase 10 -- Expanded Language Support **Goal:** Go from 11 -> 19 supported languages. -### 9.1 -- Batch 1: High Demand +### 10.1 -- Batch 1: High Demand | Language | Extensions | Grammar | Effort | |----------|-----------|---------|--------| @@ -1555,7 +1668,7 @@ Help new contributors and AI agents orient in an unfamiliar codebase. | Kotlin | `.kt`, `.kts` | `tree-sitter-kotlin` | Low | | Swift | `.swift` | `tree-sitter-swift` | Medium | -### 9.2 -- Batch 2: Growing Ecosystems +### 10.2 -- Batch 2: Growing Ecosystems | Language | Extensions | Grammar | Effort | |----------|-----------|---------|--------| @@ -1564,7 +1677,7 @@ Help new contributors and AI agents orient in an unfamiliar codebase. 
| Lua | `.lua` | `tree-sitter-lua` | Low | | Zig | `.zig` | `tree-sitter-zig` | Low | -### 9.3 -- Parser Abstraction Layer +### 10.3 -- Parser Abstraction Layer Extract shared patterns from existing extractors into reusable helpers. @@ -1580,13 +1693,13 @@ Extract shared patterns from existing extractors into reusable helpers. --- -## Phase 10 -- GitHub Integration & CI +## Phase 11 -- GitHub Integration & CI **Goal:** Bring codegraph's analysis into pull request workflows. > **Note:** Phase 2.5 delivered `codegraph check` (CI validation predicates with exit code 0/1), which provides the foundation for GitHub Action integration. The boundary violation, blast radius, and cycle detection predicates are already available. -### 10.1 -- Reusable GitHub Action +### 11.1 -- Reusable GitHub Action A reusable GitHub Action that runs on PRs: @@ -1609,7 +1722,7 @@ A reusable GitHub Action that runs on PRs: **New file:** `.github/actions/codegraph-ci/action.yml` -### 10.2 -- PR Review Integration +### 11.2 -- PR Review Integration ```bash codegraph review --pr @@ -1632,7 +1745,7 @@ Requires `gh` CLI. For each changed function: **New file:** `src/github.js` -### 10.3 -- Visual Impact Graphs for PRs +### 11.3 -- Visual Impact Graphs for PRs Extend the existing `diff-impact --format mermaid` foundation with CI automation and LLM annotations. @@ -1653,15 +1766,17 @@ Extend the existing `diff-impact --format mermaid` foundation with CI automation - Highlight fragile nodes: high churn + high fan-in = high breakage risk - Track blast radius trends: "this PR's blast radius is 2x larger than your average" -**Depends on:** 10.1 (GitHub Action), 7.4 (`risk_score`, `side_effects`) +**Depends on:** 11.1 (GitHub Action), 8.4 (`risk_score`, `side_effects`) + +### 11.4 -- SARIF Output -### 10.4 -- SARIF Output +> **Note:** SARIF output could be delivered as early as Phase 7 for IDE integration, since it only requires serializing existing cycle/check data into the SARIF JSON schema. 
Add SARIF output format for cycle detection. SARIF integrates with GitHub Code Scanning, showing issues inline in the PR. **Affected files:** `src/export.js` -### 10.5 -- Auto-generated Docstrings +### 11.5 -- Auto-generated Docstrings ```bash codegraph annotate @@ -1670,34 +1785,13 @@ codegraph annotate --changed-only LLM-generated docstrings aware of callers, callees, and types. Diff-aware: only regenerate for functions whose code or dependencies changed. Stores in `docstrings` column on nodes table -- does not modify source files unless explicitly requested. -**Depends on:** 7.1 (LLM provider abstraction), 7.4 (side effects context) +**Depends on:** 8.1 (LLM provider abstraction), 8.4 (side effects context) --- -## Phase 11 -- Interactive Visualization & Advanced Features - -### 11.1 -- Interactive Web Visualization (Partially Complete) - -> **Phase 2.7 progress:** `codegraph plot` (Phase 2.7.8) ships a self-contained HTML viewer with vis-network. It supports layout switching, color/size/cluster overlays, drill-down, community detection, and a detail panel. The remaining work is the server-based experience below. - -```bash -codegraph viz -``` - -Opens a local web UI at `localhost:3000` extending the static HTML viewer with: - -- Server-side filtering for large graphs (the current `plot` command embeds all data as JSON, scaling poorly past ~1K nodes) -- Lazy edge loading and progressive disclosure -- Code preview on hover (reads from source files via local server) -- Filter panel: toggle node kinds, confidence thresholds, test files -- Edge styling by type (imports=solid, calls=dashed, extends=bold) -- Persistent view state (zoom, pan, expanded nodes saved across sessions) - -**Data source:** Serve from DB via lightweight HTTP server, lazy-load on interaction. 
- -**New file:** `src/visualizer.js` +## Phase 12 -- Advanced Features -### 11.2 -- Dead Code Detection +### 12.1 -- Dead Code Detection ```bash codegraph dead @@ -1710,7 +1804,7 @@ Find functions/methods/classes with zero incoming edges (never called). Filters **Affected files:** `src/queries.js` -### 11.3 -- Cross-Repository Support (Monorepo) +### 12.2 -- Cross-Repository Support (Monorepo) Support multi-package monorepos with cross-package edges. @@ -1720,7 +1814,7 @@ Support multi-package monorepos with cross-package edges. - `codegraph build --workspace` to scan all packages - Impact analysis across package boundaries -### 11.4 -- Agentic Search +### 12.3 -- Agentic Search Recursive reference-following search that traces connections. @@ -1742,7 +1836,7 @@ codegraph agent-search "payment processing" **New file:** `src/agentic-search.js` -### 11.5 -- Refactoring Analysis +### 12.4 -- Refactoring Analysis LLM-powered structural analysis that identifies refactoring opportunities. The graph provides the structural data; the LLM interprets it. @@ -1757,9 +1851,9 @@ LLM-powered structural analysis that identifies refactoring opportunities. The g > **Note:** `hotspots` and `boundary_analysis` already have data foundations from Phase 2.5 (structure.js hotspots, boundaries.js evaluation). This phase adds LLM interpretation on top. -**Depends on:** 7.4 (`risk_score`, `complexity_notes`), 7.5 (module summaries) +**Depends on:** 8.4 (`risk_score`, `complexity_notes`), 8.5 (module summaries) -### 11.6 -- Auto-generated Docstrings +### 12.5 -- Auto-generated Docstrings ```bash codegraph annotate @@ -1768,7 +1862,7 @@ codegraph annotate --changed-only LLM-generated docstrings aware of callers, callees, and types. Diff-aware: only regenerate for functions whose code or dependencies changed. Stores in `docstrings` column on nodes table -- does not modify source files unless explicitly requested. 
-**Depends on:** 7.1 (LLM provider abstraction), 7.4 (side effects context) +**Depends on:** 8.1 (LLM provider abstraction), 8.4 (side effects context) > **Full spec:** See [llm-integration.md](./llm-integration.md) for detailed architecture, infrastructure tables, and prompt design for all LLM-powered features. @@ -1786,12 +1880,15 @@ Each phase includes targeted verification: | **2.5** | All 59 test files pass; integration tests for every new command; engine parity tests | | **2.7** | All 70 test files pass; CFG + AST + dataflow integration tests; extractors produce identical output to pre-refactoring inline extractors (shipped as v3.0.0) | | **3** | All existing tests pass; each refactored module produces identical output to the pre-refactoring version; unit tests for pure analysis modules; InMemoryRepository tests | -| **4** | `tsc --noEmit` passes with zero errors; all existing tests pass after migration; no runtime behavior changes | -| **5** | Compare `codegraph search` quality before/after descriptions; verify `side_effects` and `risk_score` populated for LLM-enriched builds | -| **6** | `codegraph ask "How does import resolution work?"` against codegraph itself; verify `trace_flow` and `get_started` produce coherent narration | -| **7** | Parse sample files for each new language, verify definitions/calls/imports | -| **8** | Test PR in a fork, verify GitHub Action comment with Mermaid graph and risk labels is posted | -| **9** | `codegraph viz` loads; `hotspots` returns ranked list with LLM commentary; `split_analysis` produces actionable output | +| **4** | Hand-annotated fixture projects with expected call edges; precision ≥85%, recall ≥80% for JS/TS; dead role sub-categories produce correct classifications on codegraph's own codebase | +| **5** | `tsc --noEmit` passes with zero errors; all existing tests pass after migration; no runtime behavior changes | +| **6** | Native full-build time reduced from ~1,400ms to ~700ms; 1-file rebuild 
complexity/CFG/dataflow data verified non-empty on native engine | +| **7** | Event pipeline emits progress events; plugin system loads and executes a sample plugin; confidence annotations appear on query output | +| **8** | Compare `codegraph search` quality before/after descriptions; verify `side_effects` and `risk_score` populated for LLM-enriched builds | +| **9** | `codegraph ask "How does import resolution work?"` against codegraph itself; verify `trace_flow` and `get_started` produce coherent narration | +| **10** | Parse sample files for each new language, verify definitions/calls/imports | +| **11** | Test PR in a fork, verify GitHub Action comment with Mermaid graph and risk labels is posted | +| **12** | `hotspots` returns ranked list with LLM commentary; `split_analysis` produces actionable output; dead code detection filters correctly | **Full integration test** after all phases: @@ -1804,7 +1901,6 @@ codegraph trace_flow handleRequest # LLM-narrated execution flow codegraph hotspots # Fragility report with risk scores codegraph diff-impact HEAD~5 codegraph review --pr 42 # LLM-enhanced PR review -codegraph viz ``` ---