Skip to content

refactor: deduplicate BFS impact traversal and centralize config loading#463

Merged
carlos-alm merged 11 commits intomainfrom
refactor/deduplicate-bfs-and-config
Mar 16, 2026
Merged

refactor: deduplicate BFS impact traversal and centralize config loading#463
carlos-alm merged 11 commits intomainfrom
refactor/deduplicate-bfs-and-config

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Extract shared bfsTransitiveCallers() in src/domain/analysis/impact.js, replacing 4 near-identical BFS loops across audit.js, check.js, and impact.js (both fnImpactData and diffImpactData). An onVisit callback handles site-specific tracking without duplicating the core traversal.
  • Make 5 feature functions (auditData, checkData, manifestoData, complexityData, diffImpactData) accept opts.config to reuse the config already loaded once at CLI entry, falling back to loadConfig() for programmatic API callers.

Net result: -41 lines, single source of truth for caller BFS, no redundant config I/O from CLI paths.

Test plan

  • All 1862 tests pass (104 test files)
  • Lint clean (biome check)
  • Verify codegraph audit, codegraph check --staged, codegraph diff-impact, codegraph complexity, and codegraph check (manifesto mode) produce identical output before/after

Extract shared bfsTransitiveCallers() into domain/analysis/impact.js,
replacing 4 near-identical BFS loops across audit, check, and impact
modules. The onVisit callback handles site-specific tracking (edges,
affected sets) without duplicating the core traversal.

Make feature functions (audit, check, manifesto, complexity, diff-impact)
accept opts.config to reuse the config already loaded at CLI entry,
falling back to loadConfig() for programmatic API callers.

Impact: 16 functions changed, 37 affected
@claude
Copy link

claude bot commented Mar 16, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Deduplicate BFS Impact Traversal and Centralize Config Loading

Todo List:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection (high scrutiny)
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Phase 5: Backlog compliance rating
  • Read and analyze changed files
  • Provide final assessment with rating and recommendation

Phase 4: Code Quality Assessment ✅

Code Quality Analysis:

  • BFS Deduplication: Successfully extracted 4 near-identical 33-line BFS loops into a single shared bfsTransitiveCallers() function with an onVisit callback for site-specific logic. The extraction preserves exact behavior while eliminating ~100 lines of duplicate code.
  • Config Centralization: Clean implementation where 5 feature functions now accept opts.config parameter, falling back to loadConfig() for backward compatibility with programmatic API usage.
  • No Breaking Changes: All function signatures remain backward compatible. CLI commands pass ctx.config, while direct API usage continues to work unchanged.
  • Implementation Quality:
    • Good separation of concerns: shared algorithm in domain/analysis/, site-specific callbacks in features
    • Proper error handling and fallbacks maintained
    • Consistent parameter patterns across all modified functions

Potential Issues: ⚠️ Cannot verify test/lint status due to permission restrictions, but no red flags in code structure.

Phase 5: Backlog Compliance

Zero-dep: ✅ No new runtime dependencies added

Foundation-aligned: ✅ Fully compliant - Pure refactoring that improves code maintainability without violating any core principles

Problem-fit: 3/5 - This is useful infrastructure work that improves maintainability and reduces redundant I/O, but doesn't directly address the core "lost AI agents" problem. It's good engineering hygiene that makes the codebase easier to navigate.

Breaking: No - All changes are backward compatible

Tier: 1 (zero-dep ✅ + foundation-aligned ✅)

Root Cause Analysis

Problem Being Solved: Legitimate code duplication (4x BFS loops, ~33 lines each) and unnecessary config I/O on CLI paths where config is already loaded
Why This Approach: Creates single source of truth for BFS traversal while maintaining backward compatibility via optional parameters
Risk Assessment: Very low risk - pure refactoring with no behavioral changes

Backlog Compliance

  • Zero-dep: ✅ — pure refactoring, no new dependencies
  • Foundation-aligned: ✅ — improves maintainability without violating core principles
  • Problem-fit: 3 — useful infrastructure work, doesn't directly help AI navigation
  • Breaking: No — all changes backward compatible
  • Tier: 1

Critical Concerns

None identified. This is clean, well-structured refactoring work.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality code consolidation that eliminates real duplication and unnecessary I/O. The refactoring is well-executed with proper backward compatibility. Only limitation is that it doesn't directly address the core product mission, but infrastructure improvements like this are valuable for long-term maintainability.

Note: Unable to verify the "all 1862 tests pass" claim due to permission restrictions, but code analysis shows no breaking changes.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR delivers two complementary cleanups: it extracts a shared bfsTransitiveCallers() helper from four near-identical BFS loops across audit.js, check.js, and impact.js, and it threads opts.config through five feature functions so the CLI only loads configuration once at startup instead of once per command.

Key changes:

  • bfsTransitiveCallers is now an exported function in src/domain/analysis/impact.js, accepting an onVisit callback for site-specific side effects (allAffected tracking and edge-building in diffImpactData). The callback ordering is correct — parent idToKey entries are always populated before a child's onVisit fires.
  • auditData, checkData, manifestoData, complexityData, and diffImpactData all accept opts.config and fall back to their original loadConfig(...) calls for programmatic callers.
  • getCallEdges() (SQL and InMemoryRepository) now returns the confidence column, enabling the previously broken JS-side minConfidence filter in buildFunctionLevelGraph for the Repository dispatch path.
  • New InMemoryRepository test cases for buildDependencyGraph close the zero-coverage gap identified in the previous review round.
  • The trade-off where opts.config is loaded from process.cwd() rather than the DB's parent directory is now documented with inline comments in both check.js and impact.js.

Confidence Score: 4/5

  • Safe to merge — the refactoring is behavior-preserving, well-tested, and all previously raised review concerns have been addressed.
  • All 1862 tests pass. The BFS extraction is logically equivalent to the four loops it replaces (same SQL via findDistinctCallers, same visited-set semantics, same totalDependents count). The onVisit ordering in diffImpactData is correct. The only deduction is a minor style nit in resolveThresholds where an IIFE wraps two cheap path calls unnecessarily.
  • No files require special attention. The IIFE in src/features/audit.js:22-29 is a readability nit, not a defect.

Important Files Changed

Filename Overview
src/domain/analysis/impact.js Introduces bfsTransitiveCallers as a shared, exported BFS primitive. Correctly handles onVisit ordering for diffImpactData's idToKey population (parent keys are always set before children reach onVisit). JSDoc tightened to import(better-sqlite3).Database after previous review feedback.
src/features/audit.js Replaces the computeImpact IIFE with bfsTransitiveCallers. Adds opts.config passthrough to resolveThresholds. Minor style nit: the IIFE fallback in resolveThresholds is unnecessarily complex for operations this cheap.
src/features/check.js Replaces the inline BFS loop in checkMaxBlastRadius with bfsTransitiveCallers. Correctly uses { totalDependents: totalCallers } destructuring rename. Documents the opts.config / loadConfig(repoRoot) trade-off with a comment.
src/graph/builders/dependency.js JS-side minConfidence filter for the Repository path now actually filters instead of silently passing all edges through. Comment documents the O(all-edges) vs SQL-indexed trade-off. Clean follow-up to previous review thread.
tests/graph/builders/dependency.test.js Adds four well-structured InMemoryRepository test cases covering file-level graph, noTests exclusion, self-loop skipping, minConfidence filtering, and the no-filter path. Directly addresses the gap from the prior review thread.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    BFS["bfsTransitiveCallers(db, startId, opts)\n─────────────────────────────\nonVisit callback for side effects\nreturns { totalDependents, levels }"]

    subgraph callers["Feature callers"]
        A["auditData\n(audit.js)"]
        B["checkMaxBlastRadius\n(check.js)"]
        C["fnImpactData\n(impact.js)"]
        D["diffImpactData\n(impact.js)\n+ onVisit: edges + allAffected"]
    end

    A -->|"bfsTransitiveCallers(db, nodeId, { noTests, maxDepth })"| BFS
    B -->|"bfsTransitiveCallers(db, def.id, { noTests, maxDepth })"| BFS
    C -->|"bfsTransitiveCallers(db, node.id, { noTests, maxDepth })"| BFS
    D -->|"bfsTransitiveCallers(db, fn.id, { noTests, maxDepth, onVisit })"| BFS

    BFS --> FDC["findDistinctCallers(db, fid)\nSELECT DISTINCT callers via edges table"]

    subgraph config["Config loading (CLI path)"]
        CTX["ctx.config\nloaded once at CLI startup\nfrom process.cwd()"]
    end

    CTX -->|"opts.config"| A
    CTX -->|"opts.config"| B
    CTX -->|"opts.config"| E["checkData\n(check.js)"]
    CTX -->|"opts.config"| D
    CTX -->|"opts.config"| F["complexityData\n(complexity.js)"]
    CTX -->|"opts.config"| G["manifestoData\n(manifesto.js)"]
Loading

Last reviewed commit: d6b80f2

Comment on lines +100 to +101
const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });
const totalCallers = totalDependents;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant alias variable

totalCallers is just a rename of totalDependents with no transformation. You could destructure with a rename directly to keep things concise:

Suggested change
const { totalDependents } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });
const totalCallers = totalDependents;
const { totalDependents: totalCallers } = bfsTransitiveCallers(db, def.id, { noTests, maxDepth });

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced the two-line pattern with a single destructuring rename: const { totalDependents: totalCallers } = bfsTransitiveCallers(...).

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +76 to +83
if (isRepo) {
edges = db
.getCallEdges()
.filter((e) => e.confidence != null && e.confidence >= minConfidence);
} else {
edges = db
.prepare("SELECT source_id, target_id FROM edges WHERE kind = 'calls' AND confidence >= ?")
.all(minConfidence);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full edge scan for minConfidence on Repository path

When a Repository instance is passed and minConfidence is set, this branch fetches every call edge from getCallEdges() and then filters in JavaScript, whereas the raw-db path issues a targeted SQL WHERE confidence >= ?. For large graphs with tens of thousands of call edges this could be a meaningful performance difference.

Consider adding a getCallEdgesByMinConfidence(minConfidence) method to the Repository interface so the Repository path can also push the filter down to the store (or to the in-memory map) instead of loading everything:

// e.g. in InMemoryRepository
getCallEdgesByMinConfidence(min) {
  return [...this.#edges.values()]
    .filter((e) => e.kind === 'calls' && e.confidence != null && e.confidence >= min)
    .map((e) => ({ source_id: e.source_id, target_id: e.target_id, confidence: e.confidence }));
}

If you're not ready to extend the interface, at minimum add a comment noting the trade-off so it's explicit for future maintainers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment documenting the trade-off. A dedicated getCallEdgesByMinConfidence method is a good future improvement but out of scope for this deduplication PR.

Comment on lines +31 to +33
const isRepo = db instanceof Repository;

let nodes = getFileNodesAll(db);
let nodes = isRepo ? db.getFileNodesAll() : getFileNodesAll(db);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New isRepo paths are untested

The db instanceof Repository dispatch in both buildFileLevelGraph and buildFunctionLevelGraph (including the minConfidence filter branch) is not exercised by any test in tests/graph/builders/dependency.test.js — all existing tests pass a raw better-sqlite3 handle. Since this appears to be a net-new capability (no production caller currently passes a Repository to buildDependencyGraph), the Repository code paths have zero test coverage.

Consider adding test cases that pass an InMemoryRepository to buildDependencyGraph to verify the dispatch behaves identically to the raw-db path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test cases that pass an InMemoryRepository to buildDependencyGraph for both file-level and function-level graphs, verifying the Repository dispatch path.

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +23 to +30
/**
* BFS traversal to find transitive callers of a node.
*
* @param {object} db - Open SQLite database handle
* @param {number} startId - Starting node ID
* @param {{ noTests?: boolean, maxDepth?: number, onVisit?: (caller: object, parentId: number, depth: number) => void }} options
* @returns {{ totalDependents: number, levels: Record<number, Array<{name:string, kind:string, file:string, line:number}>> }}
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSDoc db type overly broad for an exported function

bfsTransitiveCallers is now part of the public API (export function). The JSDoc describes db only as {object} - Open SQLite database handle, but the function's body calls findDistinctCallers(db, fid) from ../../db/index.js, which relies on cachedStmt (a WeakMap-backed prepared-statement cache) and therefore requires a raw better-sqlite3 handle — a Repository instance would fail at runtime here.

Tightening the type annotation and adding a note prevents future callers from accidentally passing a Repository:

Suggested change
/**
* BFS traversal to find transitive callers of a node.
*
* @param {object} db - Open SQLite database handle
* @param {number} startId - Starting node ID
* @param {{ noTests?: boolean, maxDepth?: number, onVisit?: (caller: object, parentId: number, depth: number) => void }} options
* @returns {{ totalDependents: number, levels: Record<number, Array<{name:string, kind:string, file:string, line:number}>> }}
*/
* @param {import('better-sqlite3').Database} db - Open read-only SQLite database handle (not a Repository)

Copy link
Contributor Author

@carlos-alm carlos-alm Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 404c2e9 — tightened the JSDoc type from generic object to import(better-sqlite3).Database with a note that Repository instances are not accepted.

@carlos-alm
Copy link
Contributor Author

@greptileai


// Load config defaults for check predicates
const config = loadConfig(repoRoot);
const config = opts.config || loadConfig(repoRoot);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts.config origin may differ from old loadConfig(repoRoot) fallback

Previously checkData always loaded the config relative to the database file's parent directory:

// old
const config = loadConfig(repoRoot); // repoRoot = dirname(dbPath)/..

Now the CLI path passes ctx.config, which is loadConfig(process.cwd()) — loaded once at startup from the shell's working directory. These are equivalent when the user invokes codegraph from the repo root, but will silently diverge if --db points to a database outside the current working tree (e.g. codegraph check --db /other/project/.codegraph/graph.db). In that scenario the check predicates will read thresholds from the caller's project config instead of the target project's config.

The same observation applies to diffImpactData in src/domain/analysis/impact.js. This is an acceptable trade-off for the hot path (avoids duplicate I/O), but it's worth documenting with a comment so future maintainers understand why the two sources can differ.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment documenting the trade-off — opts.config is loaded from process.cwd() at startup, which may differ from the DB's parent repo root when --db points externally. Accepted trade-off to avoid duplicate I/O.

Combine main's dbOrRepo variable fix and barrel import with the PR's
improved minConfidence filtering (JS filter instead of silent drop).

Impact: 14 functions changed, 13 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit f0770f5 into main Mar 16, 2026
14 checks passed
@carlos-alm carlos-alm deleted the refactor/deduplicate-bfs-and-config branch March 16, 2026 11:22
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant