refactor: deduplicate BFS impact traversal and centralize config loading by carlos-alm · Pull Request #463 · optave/codegraph

carlos-alm · 2026-03-16T09:54:43Z

Summary

Extract shared bfsTransitiveCallers() in src/domain/analysis/impact.js, replacing 4 near-identical BFS loops across audit.js, check.js, and impact.js (both fnImpactData and diffImpactData). An onVisit callback handles site-specific tracking without duplicating the core traversal.
Make 5 feature functions (auditData, checkData, manifestoData, complexityData, diffImpactData) accept opts.config to reuse the config already loaded once at CLI entry, falling back to loadConfig() for programmatic API callers.

Net result: -41 lines, single source of truth for caller BFS, no redundant config I/O from CLI paths.

Test plan

All 1862 tests pass (104 test files)
Lint clean (biome check)
Verify codegraph audit, codegraph check --staged, codegraph diff-impact, codegraph complexity, and codegraph check (manifesto mode) produce identical output before/after

Extract shared bfsTransitiveCallers() into domain/analysis/impact.js, replacing 4 near-identical BFS loops across audit, check, and impact modules. The onVisit callback handles site-specific tracking (edges, affected sets) without duplicating the core traversal. Make feature functions (audit, check, manifesto, complexity, diff-impact) accept opts.config to reuse the config already loaded at CLI entry, falling back to loadConfig() for programmatic API callers. Impact: 16 functions changed, 37 affected

claude · 2026-03-16T09:54:56Z

Claude finished @carlos-alm's task —— View job

PR Review: Deduplicate BFS Impact Traversal and Centralize Config Loading

Todo List:

Phase 1: Root cause analysis - understand the problem being solved
Phase 2: Configuration change detection (high scrutiny)
Phase 3: Test integrity verification
Phase 4: Code quality assessment
Phase 5: Backlog compliance rating
Read and analyze changed files
Provide final assessment with rating and recommendation

Phase 4: Code Quality Assessment ✅

Code Quality Analysis:

BFS Deduplication: Successfully extracted 4 near-identical 33-line BFS loops into a single shared bfsTransitiveCallers() function with an onVisit callback for site-specific logic. The extraction preserves exact behavior while eliminating ~100 lines of duplicate code.
Config Centralization: Clean implementation where 5 feature functions now accept opts.config parameter, falling back to loadConfig() for backward compatibility with programmatic API usage.
No Breaking Changes: All function signatures remain backward compatible. CLI commands pass ctx.config, while direct API usage continues to work unchanged.
Implementation Quality:
- Good separation of concerns: shared algorithm in domain/analysis/, site-specific callbacks in features
- Proper error handling and fallbacks maintained
- Consistent parameter patterns across all modified functions

Potential Issues: ⚠️ Cannot verify test/lint status due to permission restrictions, but no red flags in code structure.

Phase 5: Backlog Compliance

Zero-dep: ✅ No new runtime dependencies added

Foundation-aligned: ✅ Fully compliant - Pure refactoring that improves code maintainability without violating any core principles

Problem-fit: 3/5 - This is useful infrastructure work that improves maintainability and reduces redundant I/O, but doesn't directly address the core "lost AI agents" problem. It's good engineering hygiene that makes the codebase easier to navigate.

Breaking: No - All changes are backward compatible

Tier: 1 (zero-dep ✅ + foundation-aligned ✅)

Root Cause Analysis

Problem Being Solved: Legitimate code duplication (4x BFS loops, ~33 lines each) and unnecessary config I/O on CLI paths where config is already loaded
Why This Approach: Creates single source of truth for BFS traversal while maintaining backward compatibility via optional parameters
Risk Assessment: Very low risk - pure refactoring with no behavioral changes

Backlog Compliance

Zero-dep: ✅ — pure refactoring, no new dependencies
Foundation-aligned: ✅ — improves maintainability without violating core principles
Problem-fit: 3 — useful infrastructure work, doesn't directly help AI navigation
Breaking: No — all changes backward compatible
Tier: 1

Critical Concerns

None identified. This is clean, well-structured refactoring work.

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE
Reasoning: High-quality code consolidation that eliminates real duplication and unnecessary I/O. The refactoring is well-executed with proper backward compatibility. Only limitation is that it doesn't directly address the core product mission, but infrastructure improvements like this are valuable for long-term maintainability.

Note: Unable to verify the "all 1862 tests pass" claim due to permission restrictions, but code analysis shows no breaking changes.

greptile-apps · 2026-03-16T09:57:37Z

Greptile Summary

This PR delivers two complementary cleanups: it extracts a shared bfsTransitiveCallers() helper from four near-identical BFS loops across audit.js, check.js, and impact.js, and it threads opts.config through five feature functions so the CLI only loads configuration once at startup instead of once per command.

Key changes:

bfsTransitiveCallers is now an exported function in src/domain/analysis/impact.js, accepting an onVisit callback for site-specific side effects (allAffected tracking and edge-building in diffImpactData). The callback ordering is correct — parent idToKey entries are always populated before a child's onVisit fires.
auditData, checkData, manifestoData, complexityData, and diffImpactData all accept opts.config and fall back to their original loadConfig(...) calls for programmatic callers.
getCallEdges() (SQL and InMemoryRepository) now returns the confidence column, enabling the previously broken JS-side minConfidence filter in buildFunctionLevelGraph for the Repository dispatch path.
New InMemoryRepository test cases for buildDependencyGraph close the zero-coverage gap identified in the previous review round.
The trade-off where opts.config is loaded from process.cwd() rather than the DB's parent directory is now documented with inline comments in both check.js and impact.js.

Confidence Score: 4/5

Safe to merge — the refactoring is behavior-preserving, well-tested, and all previously raised review concerns have been addressed.
All 1862 tests pass. The BFS extraction is logically equivalent to the four loops it replaces (same SQL via findDistinctCallers, same visited-set semantics, same totalDependents count). The onVisit ordering in diffImpactData is correct. The only deduction is a minor style nit in resolveThresholds where an IIFE wraps two cheap path calls unnecessarily.
No files require special attention. The IIFE in src/features/audit.js:22-29 is a readability nit, not a defect.

Important Files Changed

Filename	Overview
src/domain/analysis/impact.js	Introduces `bfsTransitiveCallers` as a shared, exported BFS primitive. Correctly handles `onVisit` ordering for `diffImpactData`'s `idToKey` population (parent keys are always set before children reach `onVisit`). JSDoc tightened to `import(better-sqlite3).Database` after previous review feedback.
src/features/audit.js	Replaces the `computeImpact` IIFE with `bfsTransitiveCallers`. Adds `opts.config` passthrough to `resolveThresholds`. Minor style nit: the IIFE fallback in `resolveThresholds` is unnecessarily complex for operations this cheap.
src/features/check.js	Replaces the inline BFS loop in `checkMaxBlastRadius` with `bfsTransitiveCallers`. Correctly uses `{ totalDependents: totalCallers }` destructuring rename. Documents the `opts.config` / `loadConfig(repoRoot)` trade-off with a comment.
src/graph/builders/dependency.js	JS-side `minConfidence` filter for the `Repository` path now actually filters instead of silently passing all edges through. Comment documents the O(all-edges) vs SQL-indexed trade-off. Clean follow-up to previous review thread.
tests/graph/builders/dependency.test.js	Adds four well-structured `InMemoryRepository` test cases covering file-level graph, `noTests` exclusion, self-loop skipping, `minConfidence` filtering, and the no-filter path. Directly addresses the gap from the prior review thread.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    BFS["bfsTransitiveCallers(db, startId, opts)\n─────────────────────────────\nonVisit callback for side effects\nreturns { totalDependents, levels }"]

    subgraph callers["Feature callers"]
        A["auditData\n(audit.js)"]
        B["checkMaxBlastRadius\n(check.js)"]
        C["fnImpactData\n(impact.js)"]
        D["diffImpactData\n(impact.js)\n+ onVisit: edges + allAffected"]
    end

    A -->|"bfsTransitiveCallers(db, nodeId, { noTests, maxDepth })"| BFS
    B -->|"bfsTransitiveCallers(db, def.id, { noTests, maxDepth })"| BFS
    C -->|"bfsTransitiveCallers(db, node.id, { noTests, maxDepth })"| BFS
    D -->|"bfsTransitiveCallers(db, fn.id, { noTests, maxDepth, onVisit })"| BFS

    BFS --> FDC["findDistinctCallers(db, fid)\nSELECT DISTINCT callers via edges table"]

    subgraph config["Config loading (CLI path)"]
        CTX["ctx.config\nloaded once at CLI startup\nfrom process.cwd()"]
    end

    CTX -->|"opts.config"| A
    CTX -->|"opts.config"| B
    CTX -->|"opts.config"| E["checkData\n(check.js)"]
    CTX -->|"opts.config"| D
    CTX -->|"opts.config"| F["complexityData\n(complexity.js)"]
    CTX -->|"opts.config"| G["manifestoData\n(manifesto.js)"]

_{Last reviewed commit: d6b80f2}

greptile-apps · 2026-03-16T09:57:40Z

src/features/check.js

+      const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });
+      const totalCallers = totalDependents;


Redundant alias variable

totalCallers is just a rename of totalDependents with no transformation. You could destructure with a rename directly to keep things concise:

Suggested change

const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });

const totalCallers = totalDependents;

const { totalDependents: totalCallers } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Fixed — replaced the two-line pattern with a single destructuring rename: const { totalDependents: totalCallers } = bfsTransitiveCallers(...).

Impact: 7 functions changed, 3 affected

carlos-alm · 2026-03-16T10:11:03Z

@greptileai

greptile-apps · 2026-03-16T10:16:03Z

src/graph/builders/dependency.js

+    if (isRepo) {
+      edges = db
+        .getCallEdges()
+        .filter((e) => e.confidence != null && e.confidence >= minConfidence);
+    } else {
+      edges = db
+        .prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
+        .all(minConfidence);


Full edge scan for minConfidence on Repository path

When a Repository instance is passed and minConfidence is set, this branch fetches every call edge from getCallEdges() and then filters in JavaScript, whereas the raw-db path issues a targeted SQL WHERE confidence >= ?. For large graphs with tens of thousands of call edges this could be a meaningful performance difference.

Consider adding a getCallEdgesByMinConfidence(minConfidence) method to the Repository interface so the Repository path can also push the filter down to the store (or to the in-memory map) instead of loading everything:

// e.g. in InMemoryRepository getCallEdgesByMinConfidence(min) { return [...this.#edges.values()] .filter((e) => e.kind === 'calls' && e.confidence != null && e.confidence >= min) .map((e) => ({ source_id: e.source_id, target_id: e.target_id, confidence: e.confidence })); }

If you're not ready to extend the interface, at minimum add a comment noting the trade-off so it's explicit for future maintainers.

Added a comment documenting the trade-off. A dedicated getCallEdgesByMinConfidence method is a good future improvement but out of scope for this deduplication PR.

greptile-apps · 2026-03-16T10:16:03Z

src/graph/builders/dependency.js

+  const isRepo = db instanceof Repository;

-  let nodes = getFileNodesAll(db);
+  let nodes = isRepo ? db.getFileNodesAll() : getFileNodesAll(db);


New isRepo paths are untested

The db instanceof Repository dispatch in both buildFileLevelGraph and buildFunctionLevelGraph (including the minConfidence filter branch) is not exercised by any test in tests/graph/builders/dependency.test.js — all existing tests pass a raw better-sqlite3 handle. Since this appears to be a net-new capability (no production caller currently passes a Repository to buildDependencyGraph), the Repository code paths have zero test coverage.

Consider adding test cases that pass an InMemoryRepository to buildDependencyGraph to verify the dispatch behaves identically to the raw-db path.

Added test cases that pass an InMemoryRepository to buildDependencyGraph for both file-level and function-level graphs, verifying the Repository dispatch path.

…t minConfidence trade-off Impact: 1 functions changed, 11 affected

carlos-alm · 2026-03-16T10:26:58Z

@greptileai

greptile-apps · 2026-03-16T10:32:16Z

src/domain/analysis/impact.js

+/**
+ * BFS traversal to find transitive callers of a node.
+ *
+ * @param {object} db - Open SQLite database handle
+ * @param {number} startId - Starting node ID
+ * @param {{ noTests?: boolean, maxDepth?: number, onVisit?: (caller: object, parentId: number, depth: number) => void }} options
+ * @returns {{ totalDependents: number, levels: Record<number, Array<{name:string, kind:string, file:string, line:number}>> }}
+ */


JSDoc db type overly broad for an exported function

bfsTransitiveCallers is now part of the public API (export function). The JSDoc describes db only as {object} - Open SQLite database handle, but the function's body calls findDistinctCallers(db, fid) from ../../db/index.js, which relies on cachedStmt (a WeakMap-backed prepared-statement cache) and therefore requires a raw better-sqlite3 handle — a Repository instance would fail at runtime here.

Tightening the type annotation and adding a note prevents future callers from accidentally passing a Repository:

Suggested change

/**

* BFS traversal to find transitive callers of a node.

*

* @param {object} db - Open SQLite database handle

* @param {number} startId - Starting node ID

* @param {{ noTests?: boolean, maxDepth?: number, onVisit?: (caller: object, parentId: number, depth: number) => void }} options

* @returns {{ totalDependents: number, levels: Record<number, Array<{name:string, kind:string, file:string, line:number}>> }}

*/

* @param {import('better-sqlite3').Database} db - Open read-only SQLite database handle (not a Repository)

Fixed in 404c2e9 — tightened the JSDoc type from generic object to import(better-sqlite3).Database with a note that Repository instances are not accepted.

…tabase

carlos-alm · 2026-03-16T10:48:37Z

@greptileai

greptile-apps · 2026-03-16T10:54:28Z

src/features/check.js


    // Load config defaults for check predicates
-    const config = loadConfig(repoRoot);
+    const config = opts.config || loadConfig(repoRoot);


opts.config origin may differ from old loadConfig(repoRoot) fallback

Previously checkData always loaded the config relative to the database file's parent directory:

// old const config = loadConfig(repoRoot); // repoRoot = dirname(dbPath)/..

Now the CLI path passes ctx.config, which is loadConfig(process.cwd()) — loaded once at startup from the shell's working directory. These are equivalent when the user invokes codegraph from the repo root, but will silently diverge if --db points to a database outside the current working tree (e.g. codegraph check --db /other/project/.codegraph/graph.db). In that scenario the check predicates will read thresholds from the caller's project config instead of the target project's config.

The same observation applies to diffImpactData in src/domain/analysis/impact.js. This is an acceptable trade-off for the hot path (avoids duplicate I/O), but it's worth documenting with a comment so future maintainers understand why the two sources can differ.

Added a comment documenting the trade-off — opts.config is loaded from process.cwd() at startup, which may differ from the DB's parent repo root when --db points externally. Accepted trade-off to avoid duplicate I/O.

Impact: 1 functions changed, 2 affected

Combine main's dbOrRepo variable fix and barrel import with the PR's improved minConfidence filtering (JS filter instead of silent drop). Impact: 14 functions changed, 13 affected

carlos-alm · 2026-03-16T11:14:07Z

@greptileai

greptile-apps bot reviewed Mar 16, 2026

View reviewed changes

fix: filter minConfidence for Repository path and simplify alias

2a51c29

Impact: 7 functions changed, 3 affected

Merge branch 'main' into refactor/deduplicate-bfs-and-config

6215e46

greptile-apps bot reviewed Mar 16, 2026

View reviewed changes

carlos-alm added 2 commits March 16, 2026 04:16

Merge branch 'main' into refactor/deduplicate-bfs-and-config

dd5e2e5

test: add InMemoryRepository tests for dependency builder and documen…

4fe685f

…t minConfidence trade-off Impact: 1 functions changed, 11 affected

greptile-apps bot reviewed Mar 16, 2026

View reviewed changes

carlos-alm added 3 commits March 16, 2026 04:40

Merge branch 'main' into refactor/deduplicate-bfs-and-config

cc8bc26

docs: tighten bfsTransitiveCallers db param type to better-sqlite3 Da…

404c2e9

…tabase

Merge branch 'main' into refactor/deduplicate-bfs-and-config

f24b29b

greptile-apps bot reviewed Mar 16, 2026

View reviewed changes

carlos-alm added 2 commits March 16, 2026 04:56

docs: document opts.config cwd vs db-root trade-off in checkData

5e5f356

Impact: 1 functions changed, 2 affected

merge: resolve conflict with origin/main in dependency builder

05b0ba4

Combine main's dbOrRepo variable fix and barrel import with the PR's improved minConfidence filtering (JS filter instead of silent drop). Impact: 14 functions changed, 13 affected

Merge branch 'main' into refactor/deduplicate-bfs-and-config

d6b80f2

carlos-alm merged commit f0770f5 into main Mar 16, 2026
14 checks passed

carlos-alm deleted the refactor/deduplicate-bfs-and-config branch March 16, 2026 11:22

github-actions bot locked and limited conversation to collaborators Mar 16, 2026

		const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });
		const totalCallers = totalDependents;

	const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });
	const totalCallers = totalDependents;
	const { totalDependents: totalCallers } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });

Conversation

carlos-alm commented Mar 16, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Deduplicate BFS Impact Traversal and Centralize Config Loading

Phase 4: Code Quality Assessment ✅

Phase 5: Backlog Compliance

Root Cause Analysis

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 16, 2026

Uh oh!

greptile-apps bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 16, 2026

Uh oh!

greptile-apps bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 16, 2026

Uh oh!

greptile-apps bot Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 16, 2026 •

edited

Loading

greptile-apps bot commented Mar 16, 2026 •

edited

Loading

carlos-alm Mar 16, 2026 •

edited

Loading