From fec01d4213e85f9902364de263b37d4234364a41 Mon Sep 17 00:00:00 2001 From: carlos-alm <127798846+carlos-alm@users.noreply.github.com> Date: Mon, 16 Mar 2026 04:50:10 -0600 Subject: [PATCH 1/3] docs: promote #83 (brief command) and #71 (type inference) to Tier 0 in backlog These two items deliver the highest immediate impact on agent experience and graph accuracy without requiring Rust porting or TypeScript migration. They should be implemented before any Phase 4+ roadmap work. - #83: hook-optimized `codegraph brief` enriches passively-injected context - #71: basic type inference closes the biggest resolution gap for TS/Java --- docs/roadmap/BACKLOG.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/docs/roadmap/BACKLOG.md b/docs/roadmap/BACKLOG.md index 7c7cea66..c5876a18 100644 --- a/docs/roadmap/BACKLOG.md +++ b/docs/roadmap/BACKLOG.md @@ -21,6 +21,17 @@ Each item has a short title, description, category, expected benefit, and four a ## Backlog +### Tier 0 — Promote before Phase 4-5 (highest immediate impact) + +These two items directly improve agent experience and graph accuracy today, without requiring Rust porting or TypeScript migration. They should be implemented before any Phase 4+ roadmap work begins. + +**Rationale:** Item #83 enriches the *passively-injected* context that agents actually see via hooks — the single highest-leverage surface for reducing blind edits. Item #71 closes the biggest accuracy gap in the graph for TypeScript and Java, where missing type-aware resolution causes hallucinated "no callers" results. + +| ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | +|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| +| 83 | Hook-optimized `codegraph brief` command | New `codegraph brief ` command designed for Claude Code hook context injection. Returns a compact, token-efficient summary per file: each symbol with its role and caller count (e.g. `buildGraph [core, 12 callers]`), blast radius count on importers (`Imported by: src/cli.js (+8 transitive)`), and overall file risk tier. Current `deps --json` output used by `enrich-context.sh` is shallow — just file-level imports/importedBy and symbol names with no role or blast radius info. The `brief` command would include: **(a)** symbol roles in the output — knowing a file defines `core` vs `leaf` symbols changes editing caution; **(b)** per-symbol transitive caller counts — makes blast radius visible without a separate `fn-impact` call; **(c)** file-level risk tier (high/medium/low based on max fan-in and role composition). Output optimized for `additionalContext` injection — single compact block, not verbose JSON. Also add `--brief` flag to `deps` as an alias. | Embeddability | The `enrich-context.sh` hook is the only codegraph context agents actually see (they ignore CLAUDE.md instructions to run commands manually). Making that passively-injected context richer — with roles, caller counts, and risk tiers — directly reduces blind edits to high-impact code. Currently the hook shows `Defines: function buildGraph` but not that it's a core symbol with 12 transitive callers | ✓ | ✓ | 4 | No | — | +| 71 | Basic type inference for typed languages | Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code. | Resolution | Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables | ✓ | ✓ | 5 | No | — | + ### Tier 1 — Zero-dep + Foundation-aligned (build these first) Non-breaking, ordered by problem-fit: @@ -144,7 +155,6 @@ These address fundamental limitations in the parsing and resolution pipeline tha | ID | Title | Description | Category | Benefit | Zero-dep | Foundation-aligned | Problem-fit (1-5) | Breaking | Depends on | |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------| -| 71 | Basic type inference for typed languages | Extract type annotations from TypeScript and Java AST nodes (variable declarations, function parameters, return types, generics) to resolve method calls through typed references. Currently `const x: Router = express.Router(); x.get(...)` produces no edge because `x.get` can't be resolved without knowing `x` is a `Router`. Tree-sitter already parses type annotations — we just don't use them for resolution. Start with declared types (no flow inference), which covers the majority of TS/Java code. | Resolution | Dramatically improves call graph completeness for TypeScript and Java — the two languages where developers annotate types explicitly and expect tooling to use them. Directly prevents hallucinated "no callers" results for methods called through typed variables | ✓ | ✓ | 5 | No | — | | 72 | Interprocedural dataflow analysis | Extend the existing intraprocedural dataflow (ID 14) to propagate `flows_to`/`returns`/`mutates` edges across function boundaries. When function A calls B with argument X, and B's dataflow shows X flows to its return value, connect A's call site to the downstream consumers of B's return. Requires stitching per-function dataflow summaries at call edges — no new parsing, just graph traversal over existing `dataflow` + `edges` tables. Start with single-level propagation (caller↔callee), not transitive closure. | Analysis | Current dataflow stops at function boundaries, missing the most important flows — data passing through helper functions, middleware chains, and factory patterns. Single-function scope means `dataflow` can't answer "where does this user input end up?" across call boundaries. Cross-function propagation is the difference between toy dataflow and useful taint-like analysis | ✓ | ✓ | 5 | No | 14 | | 73 | Improved dynamic call resolution | Upgrade the current "best-effort" dynamic dispatch resolution for Python, Ruby, and JavaScript. Three concrete improvements: **(a)** receiver-type tracking — when `x = SomeClass()` is followed by `x.method()`, resolve `method` to `SomeClass.method` using the assignment chain (leverages existing `ast_nodes` + `dataflow` tables); **(b)** common pattern recognition — resolve `EventEmitter.on('event', handler)` callback registration, `Promise.then/catch` chains, `Array.map/filter/reduce` with named function arguments, and decorator/annotation patterns; **(c)** confidence-tiered edges — mark dynamically-resolved edges with a confidence score (high for direct assignment, medium for pattern match, low for heuristic) so consumers can filter by reliability. | Resolution | In Python/Ruby/JS, 30-60% of real calls go through dynamic dispatch — method calls on variables, callbacks, event handlers, higher-order functions. The current best-effort resolution misses most of these, leaving massive gaps in the call graph for the languages where codegraph is most commonly used. Even partial improvement here has outsized impact on graph completeness | ✓ | ✓ | 5 | No | — | | 81 | Track dynamic `import()` and re-exports as graph edges | Extract `import()` expressions as `dynamic-imports` edges in both WASM extraction paths (query-based and walk-based). Destructured names (`const { a } = await import(...)`) feed into `importedNames` for call resolution. **Partially done:** WASM JS/TS extraction works (PR #389). Remaining: **(a)** native Rust engine support — `crates/codegraph-core/src/extractors/javascript.rs` doesn't extract `import()` calls; **(b)** non-static paths (`import(\`./plugins/${name}.js\`)`, `import(variable)`) are skipped with a debug warning; **(c)** re-export consumer counting in `exports --unused` only checks `calls` edges, not `imports`/`dynamic-imports` — symbols consumed only via import edges show as zero-consumer false positives. | Resolution | Fixes false "zero consumers" reports for symbols consumed via dynamic imports. 95 `dynamic-imports` edges found in codegraph's own codebase — these were previously invisible to impact analysis, exports audit, and dead-export hooks | ✓ | ✓ | 5 | No | — | @@ -163,7 +173,6 @@ These close gaps in search expressiveness, cross-repo navigation, implementation | 78 | Cross-repo symbol resolution | In multi-repo mode, resolve import edges that cross repository boundaries. When repo A imports `@org/shared-lib`, and repo B is `@org/shared-lib` in the registry, create cross-repo edges linking A's import to B's actual exported symbol. Requires matching npm/pip/go package names to registered repos. Store cross-repo edges with a `repo` qualifier in the `edges` table. Enables cross-repo `fn-impact` (changing a shared library function shows impact across all consuming repos), cross-repo `path` queries, and cross-repo `diff-impact`. | Navigation | Multi-repo mode currently treats each repo as isolated — agents can search across repos but can't trace dependencies between them. Cross-repo edges enable "if I change this shared utility, which downstream repos break?" — the highest-value question in monorepo and multi-repo architectures | ✓ | ✓ | 5 | No | — | | 79 | Advanced query language with boolean operators and output shaping | Extend `codegraph search` and `codegraph where` with a structured query syntax supporting: **(a)** boolean operators — `kind:function AND file:src/` , `name:parse OR name:extract`, `NOT kind:class`; **(b)** compound filters — `kind:method AND complexity.cognitive>15 AND role:core`; **(c)** output shaping — `--select symbols` (just names), `--select files` (distinct files), `--select owners` (CODEOWNERS for matches), `--select stats` (aggregate counts by kind/file/role); **(d)** result aggregation — `--group-by file`, `--group-by kind`, `--group-by community` with counts. Parse the query into a SQL WHERE clause against the `nodes`/`function_complexity`/`edges` tables. Expose as `query_language` MCP tool parameter. | Search | Current search is either keyword/semantic (fuzzy) or exact-name (`where`). Agents needing "all core functions with cognitive complexity > 15 in src/api/" must chain multiple commands and filter manually — wasting tokens on intermediate results. A structured query language answers compound questions in one call | ✓ | ✓ | 4 | No | — | | 80 | Find implementations in impact analysis | When a function signature or interface definition changes, automatically include all implementations/subtypes in `fn-impact` and `diff-impact` blast radius. Currently impact only follows `calls` edges — changing an interface method signature breaks every implementor, but this is invisible. Requires ID 74's `implements` edges. Add `--include-implementations` flag (on by default) to impact commands. | Analysis | Catches the most dangerous class of missed blast radius — interface/trait changes that silently break all implementors. A single method signature change on a widely-implemented interface can break dozens of files, none of which appear in the current call-graph-only impact analysis | ✓ | ✓ | 5 | No | 74 | -| 83 | Hook-optimized `codegraph brief` command | New `codegraph brief ` command designed for Claude Code hook context injection. Returns a compact, token-efficient summary per file: each symbol with its role and caller count (e.g. `buildGraph [core, 12 callers]`), blast radius count on importers (`Imported by: src/cli.js (+8 transitive)`), and overall file risk tier. Current `deps --json` output used by `enrich-context.sh` is shallow — just file-level imports/importedBy and symbol names with no role or blast radius info. The `brief` command would include: **(a)** symbol roles in the output — knowing a file defines `core` vs `leaf` symbols changes editing caution; **(b)** per-symbol transitive caller counts — makes blast radius visible without a separate `fn-impact` call; **(c)** file-level risk tier (high/medium/low based on max fan-in and role composition). Output optimized for `additionalContext` injection — single compact block, not verbose JSON. Also add `--brief` flag to `deps` as an alias. | Embeddability | The `enrich-context.sh` hook is the only codegraph context agents actually see (they ignore CLAUDE.md instructions to run commands manually). Making that passively-injected context richer — with roles, caller counts, and risk tiers — directly reduces blind edits to high-impact code. Currently the hook shows `Defines: function buildGraph` but not that it's a core symbol with 12 transitive callers | ✓ | ✓ | 4 | No | — | ### Tier 2 — Foundation-aligned, needs dependencies From 39f996787da027f70046d52145ad6b20f10638c6 Mon Sep 17 00:00:00 2001 From: carlos-alm <127798846+carlos-alm@users.noreply.github.com> Date: Tue, 17 Mar 2026 04:46:38 -0600 Subject: [PATCH 2/3] fix: include constant nodes in edge building and downstream queries (#487) The edge-building stage excluded 'constant'-kind nodes from the node lookup maps, so no import/dependency edges were ever created for exported constants. This made them invisible to where, fn-impact, query, roles, and all other graph queries. Also adds missing 'record' kind to build-edges.js to match CORE_SYMBOL_KINDS. Impact: 6 functions changed, 33 affected --- src/domain/analysis/context.js | 2 +- src/domain/graph/builder/stages/build-edges.js | 2 +- src/domain/graph/watcher.js | 4 ++-- src/features/export.js | 6 +++--- src/features/graph-enrichment.js | 4 ++-- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/domain/analysis/context.js b/src/domain/analysis/context.js index e3409208..db157cf2 100644 --- a/src/domain/analysis/context.js +++ b/src/domain/analysis/context.js @@ -95,7 +95,7 @@ function explainFileImpl(db, target, getFileLines) { function explainFunctionImpl(db, target, noTests, getFileLines) { let nodes = db .prepare( - `SELECT * FROM nodes WHERE name LIKE ? AND kind IN ('function','method','class','interface','type','struct','enum','trait','record','module') ORDER BY file, line`, + `SELECT * FROM nodes WHERE name LIKE ? AND kind IN ('function','method','class','interface','type','struct','enum','trait','record','module','constant') ORDER BY file, line`, ) .all(`%${target}%`); if (noTests) nodes = nodes.filter((n) => !isTestFile(n.file)); diff --git a/src/domain/graph/builder/stages/build-edges.js b/src/domain/graph/builder/stages/build-edges.js index a8879b62..a1529454 100644 --- a/src/domain/graph/builder/stages/build-edges.js +++ b/src/domain/graph/builder/stages/build-edges.js @@ -28,7 +28,7 @@ export async function buildEdges(ctx) { // Pre-load all nodes into lookup maps const allNodes = db .prepare( - `SELECT id, name, kind, file, line FROM nodes WHERE kind IN ('function','method','class','interface','struct','type','module','enum','trait')`, + `SELECT id, name, kind, file, line FROM nodes WHERE kind IN ('function','method','class','interface','struct','type','module','enum','trait','record','constant')`, ) .all(); ctx.nodesByName = new Map(); diff --git a/src/domain/graph/watcher.js b/src/domain/graph/watcher.js index 15b4b4a6..3fea7954 100644 --- a/src/domain/graph/watcher.js +++ b/src/domain/graph/watcher.js @@ -57,10 +57,10 @@ export async function watchProject(rootDir, opts = {}) { countNodes: db.prepare('SELECT COUNT(*) as c FROM nodes WHERE file = ?'), countEdgesForFile: null, findNodeInFile: db.prepare( - "SELECT id, file FROM nodes WHERE name = ? AND kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module') AND file = ?", + "SELECT id, file FROM nodes WHERE name = ? AND kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant') AND file = ?", ), findNodeByName: db.prepare( - "SELECT id, file FROM nodes WHERE name = ? AND kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module')", + "SELECT id, file FROM nodes WHERE name = ? AND kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant')", ), listSymbols: db.prepare("SELECT name, kind, line FROM nodes WHERE file = ? AND kind != 'file'"), }; diff --git a/src/features/export.js b/src/features/export.js index 6f93faae..3bd064e3 100644 --- a/src/features/export.js +++ b/src/features/export.js @@ -67,8 +67,8 @@ function loadFunctionLevelEdges(db, { noTests, minConfidence, limit }) { FROM edges e JOIN nodes n1 ON e.source_id = n1.id JOIN nodes n2 ON e.target_id = n2.id - WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module') - AND n2.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module') + WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant') + AND n2.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant') AND e.kind = 'calls' AND e.confidence >= ? `, @@ -308,7 +308,7 @@ export function exportGraphSON(db, opts = {}) { let nodes = db .prepare(` SELECT id, name, kind, file, line, role FROM nodes - WHERE kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'file') + WHERE kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant', 'file') `) .all(); if (noTests) nodes = nodes.filter((n) => !isTestFile(n.file)); diff --git a/src/features/graph-enrichment.js b/src/features/graph-enrichment.js index 96e47e2c..adb9fb8e 100644 --- a/src/features/graph-enrichment.js +++ b/src/features/graph-enrichment.js @@ -42,8 +42,8 @@ function prepareFunctionLevelData(db, noTests, minConf, cfg) { FROM edges e JOIN nodes n1 ON e.source_id = n1.id JOIN nodes n2 ON e.target_id = n2.id - WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module') - AND n2.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module') + WHERE n1.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant') + AND n2.kind IN ('function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'constant') AND e.kind = 'calls' AND e.confidence >= ? `, From 8266b334442e0bf130ef5325f8c341a9f3ca5436 Mon Sep 17 00:00:00 2001 From: carlos-alm <127798846+carlos-alm@users.noreply.github.com> Date: Tue, 17 Mar 2026 05:10:47 -0600 Subject: [PATCH 3/3] fix: include constant kind in whereSymbol and findMatchingNodes queries (#487) whereSymbolImpl used ALL_SYMBOL_KINDS (core 10, no constant) so codegraph where could not find constants. findMatchingNodes defaulted to FUNCTION_KINDS which also excluded constant. Switch whereSymbolImpl to EVERY_SYMBOL_KIND and add constant to FUNCTION_KINDS default. Impact: 1 functions changed, 6 affected --- src/domain/analysis/symbol-lookup.js | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/domain/analysis/symbol-lookup.js b/src/domain/analysis/symbol-lookup.js index 47a7d403..f6e42913 100644 --- a/src/domain/analysis/symbol-lookup.js +++ b/src/domain/analysis/symbol-lookup.js @@ -14,11 +14,11 @@ import { openReadonlyOrFail, } from '../../db/index.js'; import { isTestFile } from '../../infrastructure/test-filter.js'; -import { ALL_SYMBOL_KINDS } from '../../shared/kinds.js'; +import { EVERY_SYMBOL_KIND } from '../../shared/kinds.js'; import { getFileHash, normalizeSymbol } from '../../shared/normalize.js'; import { paginateResult } from '../../shared/paginate.js'; -const FUNCTION_KINDS = ['function', 'method', 'class']; +const FUNCTION_KINDS = ['function', 'method', 'class', 'constant']; /** * Find nodes matching a name query, ranked by relevance. @@ -103,12 +103,12 @@ export function queryNameData(name, customDbPath, opts = {}) { } function whereSymbolImpl(db, target, noTests) { - const placeholders = ALL_SYMBOL_KINDS.map(() => '?').join(', '); + const placeholders = EVERY_SYMBOL_KIND.map(() => '?').join(', '); let nodes = db .prepare( `SELECT * FROM nodes WHERE name LIKE ? AND kind IN (${placeholders}) ORDER BY file, line`, ) - .all(`%${target}%`, ...ALL_SYMBOL_KINDS); + .all(`%${target}%`, ...EVERY_SYMBOL_KIND); if (noTests) nodes = nodes.filter((n) => !isTestFile(n.file)); const hc = new Map();