refactor: deduplicate BFS impact traversal and centralize config loading#463
refactor: deduplicate BFS impact traversal and centralize config loading#463carlos-alm merged 11 commits intomainfrom
Conversation
Extract shared bfsTransitiveCallers() into domain/analysis/impact.js, replacing 4 near-identical BFS loops across audit, check, and impact modules. The onVisit callback handles site-specific tracking (edges, affected sets) without duplicating the core traversal. Make feature functions (audit, check, manifesto, complexity, diff-impact) accept opts.config to reuse the config already loaded at CLI entry, falling back to loadConfig() for programmatic API callers. Impact: 16 functions changed, 37 affected
|
Claude finished @carlos-alm's task —— View job PR Review: Deduplicate BFS Impact Traversal and Centralize Config LoadingTodo List:
Phase 4: Code Quality Assessment ✅Code Quality Analysis:
Potential Issues: Phase 5: Backlog ComplianceZero-dep: ✅ No new runtime dependencies added Foundation-aligned: ✅ Fully compliant - Pure refactoring that improves code maintainability without violating any core principles Problem-fit: 3/5 - This is useful infrastructure work that improves maintainability and reduces redundant I/O, but doesn't directly address the core "lost AI agents" problem. It's good engineering hygiene that makes the codebase easier to navigate. Breaking: No - All changes are backward compatible Tier: 1 (zero-dep ✅ + foundation-aligned ✅) Root Cause AnalysisProblem Being Solved: Legitimate code duplication (4x BFS loops, ~33 lines each) and unnecessary config I/O on CLI paths where config is already loaded Backlog Compliance
Critical ConcernsNone identified. This is clean, well-structured refactoring work. Final Recommendation
Note: Unable to verify the "all 1862 tests pass" claim due to permission restrictions, but code analysis shows no breaking changes. |
Greptile SummaryThis PR delivers two complementary cleanups: it extracts a shared Key changes:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
BFS["bfsTransitiveCallers(db, startId, opts)\n─────────────────────────────\nonVisit callback for side effects\nreturns { totalDependents, levels }"]
subgraph callers["Feature callers"]
A["auditData\n(audit.js)"]
B["checkMaxBlastRadius\n(check.js)"]
C["fnImpactData\n(impact.js)"]
D["diffImpactData\n(impact.js)\n+ onVisit: edges + allAffected"]
end
A -->|"bfsTransitiveCallers(db, nodeId, { noTests, maxDepth })"| BFS
B -->|"bfsTransitiveCallers(db, def.id, { noTests, maxDepth })"| BFS
C -->|"bfsTransitiveCallers(db, node.id, { noTests, maxDepth })"| BFS
D -->|"bfsTransitiveCallers(db, fn.id, { noTests, maxDepth, onVisit })"| BFS
BFS --> FDC["findDistinctCallers(db, fid)\nSELECT DISTINCT callers via edges table"]
subgraph config["Config loading (CLI path)"]
CTX["ctx.config\nloaded once at CLI startup\nfrom process.cwd()"]
end
CTX -->|"opts.config"| A
CTX -->|"opts.config"| B
CTX -->|"opts.config"| E["checkData\n(check.js)"]
CTX -->|"opts.config"| D
CTX -->|"opts.config"| F["complexityData\n(complexity.js)"]
CTX -->|"opts.config"| G["manifestoData\n(manifesto.js)"]
Last reviewed commit: d6b80f2 |
src/features/check.js
Outdated
| const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth }); | ||
| const totalCallers = totalDependents; |
There was a problem hiding this comment.
Redundant alias variable
totalCallers is just a rename of totalDependents with no transformation. You could destructure with a rename directly to keep things concise:
| const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth }); | |
| const totalCallers = totalDependents; | |
| const { totalDependents: totalCallers } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth }); |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — replaced the two-line pattern with a single destructuring rename: const { totalDependents: totalCallers } = bfsTransitiveCallers(...).
Impact: 7 functions changed, 3 affected
| if (isRepo) { | ||
| edges = db | ||
| .getCallEdges() | ||
| .filter((e) => e.confidence != null && e.confidence >= minConfidence); | ||
| } else { | ||
| edges = db | ||
| .prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?") | ||
| .all(minConfidence); |
There was a problem hiding this comment.
Full edge scan for minConfidence on Repository path
When a Repository instance is passed and minConfidence is set, this branch fetches every call edge from getCallEdges() and then filters in JavaScript, whereas the raw-db path issues a targeted SQL WHERE confidence >= ?. For large graphs with tens of thousands of call edges this could be a meaningful performance difference.
Consider adding a getCallEdgesByMinConfidence(minConfidence) method to the Repository interface so the Repository path can also push the filter down to the store (or to the in-memory map) instead of loading everything:
// e.g. in InMemoryRepository
getCallEdgesByMinConfidence(min) {
return [...this.#edges.values()]
.filter((e) => e.kind === 'calls' && e.confidence != null && e.confidence >= min)
.map((e) => ({ source_id: e.source_id, target_id: e.target_id, confidence: e.confidence }));
}
If you're not ready to extend the interface, at minimum add a comment noting the trade-off so it's explicit for future maintainers.
There was a problem hiding this comment.
Added a comment documenting the trade-off. A dedicated getCallEdgesByMinConfidence method is a good future improvement but out of scope for this deduplication PR.
src/graph/builders/dependency.js
Outdated
| const isRepo = db instanceof Repository; | ||
|
|
||
| let nodes = getFileNodesAll(db); | ||
| let nodes = isRepo ? db.getFileNodesAll() : getFileNodesAll(db); |
There was a problem hiding this comment.
New isRepo paths are untested
The db instanceof Repository dispatch in both buildFileLevelGraph and buildFunctionLevelGraph (including the minConfidence filter branch) is not exercised by any test in tests/graph/builders/dependency.test.js — all existing tests pass a raw better-sqlite3 handle. Since this appears to be a net-new capability (no production caller currently passes a Repository to buildDependencyGraph), the Repository code paths have zero test coverage.
Consider adding test cases that pass an InMemoryRepository to buildDependencyGraph to verify the dispatch behaves identically to the raw-db path.
There was a problem hiding this comment.
Added test cases that pass an InMemoryRepository to buildDependencyGraph for both file-level and function-level graphs, verifying the Repository dispatch path.
…t minConfidence trade-off Impact: 1 functions changed, 11 affected
| /** | ||
| * BFS traversal to find transitive callers of a node. | ||
| * | ||
| * @param {object} db - Open SQLite database handle | ||
| * @param {number} startId - Starting node ID | ||
| * @param {{ noTests?: boolean, maxDepth?: number, onVisit?: (caller: object, parentId: number, depth: number) => void }} options | ||
| * @returns {{ totalDependents: number, levels: Record<number, Array<{name:string, kind:string, file:string, line:number}>> }} | ||
| */ |
There was a problem hiding this comment.
JSDoc db type overly broad for an exported function
bfsTransitiveCallers is now part of the public API (export function). The JSDoc describes db only as {object} - Open SQLite database handle, but the function's body calls findDistinctCallers(db, fid) from ../../db/index.js, which relies on cachedStmt (a WeakMap-backed prepared-statement cache) and therefore requires a raw better-sqlite3 handle — a Repository instance would fail at runtime here.
Tightening the type annotation and adding a note prevents future callers from accidentally passing a Repository:
| /** | |
| * BFS traversal to find transitive callers of a node. | |
| * | |
| * @param {object} db - Open SQLite database handle | |
| * @param {number} startId - Starting node ID | |
| * @param {{ noTests?: boolean, maxDepth?: number, onVisit?: (caller: object, parentId: number, depth: number) => void }} options | |
| * @returns {{ totalDependents: number, levels: Record<number, Array<{name:string, kind:string, file:string, line:number}>> }} | |
| */ | |
| * @param {import('better-sqlite3').Database} db - Open read-only SQLite database handle (not a Repository) |
There was a problem hiding this comment.
Fixed in 404c2e9 — tightened the JSDoc type from generic object to import(better-sqlite3).Database with a note that Repository instances are not accepted.
|
|
||
| // Load config defaults for check predicates | ||
| const config = loadConfig(repoRoot); | ||
| const config = opts.config || loadConfig(repoRoot); |
There was a problem hiding this comment.
opts.config origin may differ from old loadConfig(repoRoot) fallback
Previously checkData always loaded the config relative to the database file's parent directory:
// old
const config = loadConfig(repoRoot); // repoRoot = dirname(dbPath)/..Now the CLI path passes ctx.config, which is loadConfig(process.cwd()) — loaded once at startup from the shell's working directory. These are equivalent when the user invokes codegraph from the repo root, but will silently diverge if --db points to a database outside the current working tree (e.g. codegraph check --db /other/project/.codegraph/graph.db). In that scenario the check predicates will read thresholds from the caller's project config instead of the target project's config.
The same observation applies to diffImpactData in src/domain/analysis/impact.js. This is an acceptable trade-off for the hot path (avoids duplicate I/O), but it's worth documenting with a comment so future maintainers understand why the two sources can differ.
There was a problem hiding this comment.
Added a comment documenting the trade-off — opts.config is loaded from process.cwd() at startup, which may differ from the DB's parent repo root when --db points externally. Accepted trade-off to avoid duplicate I/O.
Impact: 1 functions changed, 2 affected
Combine main's dbOrRepo variable fix and barrel import with the PR's improved minConfidence filtering (JS filter instead of silent drop). Impact: 14 functions changed, 13 affected
Summary
bfsTransitiveCallers()insrc/domain/analysis/impact.js, replacing 4 near-identical BFS loops acrossaudit.js,check.js, andimpact.js(bothfnImpactDataanddiffImpactData). AnonVisitcallback handles site-specific tracking without duplicating the core traversal.auditData,checkData,manifestoData,complexityData,diffImpactData) acceptopts.configto reuse the config already loaded once at CLI entry, falling back toloadConfig()for programmatic API callers.Net result: -41 lines, single source of truth for caller BFS, no redundant config I/O from CLI paths.
Test plan
codegraph audit,codegraph check --staged,codegraph diff-impact,codegraph complexity, andcodegraph check(manifesto mode) produce identical output before/after