Skip to content

feat: Repository abstraction with InMemoryRepository (3.13)#444

Merged
carlos-alm merged 13 commits intomainfrom
feat/testing-pyramid
Mar 16, 2026
Merged

feat: Repository abstraction with InMemoryRepository (3.13)#444
carlos-alm merged 13 commits intomainfrom
feat/testing-pyramid

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Introduces Repository base class, InMemoryRepository (Maps-backed, zero SQLite dependency), and SqliteRepository (delegates to existing fn(db, ...) functions)
  • Adds fluent TestRepoBuilder fixture factory (tests/helpers/fixtures.js) for ergonomic test graph construction
  • 105 new tests: 51 unit tests for InMemoryRepository correctness + 54 parity tests verifying behavioral equivalence between both implementations

Test plan

  • npx vitest run tests/unit/in-memory-repository.test.js — 51 tests pass
  • npx vitest run tests/unit/repository-parity.test.js — 54 tests pass
  • npm test — no regressions in existing test suite
  • npm run lint — clean

…pyramid (3.13)

Introduce a Repository base class, InMemoryRepository (Maps-backed, no
SQLite), and SqliteRepository (delegates to existing fn(db,...) functions).
Includes fluent TestRepoBuilder fixture factory and 105 new tests
(51 unit + 54 parity) verifying behavioral equivalence between both
implementations. Existing code and tests remain unchanged.

Impact: 139 functions changed, 6 affected
@claude
Copy link

claude bot commented Mar 14, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Repository Abstraction (3.13)

Review Tasks

  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Assessment
  • Phase 4: Code Quality Review
  • Phase 5: Backlog Compliance Rating
  • Final Assessment & Recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses testing pyramid concerns by introducing repository abstraction that enables fast in-memory testing without SQLite dependency. The current codebase likely has slow tests that require full SQLite database setup, making unit testing cumbersome.

Why This Approach:

  • Repository pattern abstracts data access layer
  • InMemoryRepository enables fast, isolated unit tests
  • SqliteRepository maintains backward compatibility with existing SQLite-based functions
  • TestRepoBuilder provides ergonomic test fixture creation

Risk Assessment: Low risk - this is a pure additive abstraction with no breaking changes to existing APIs.

Phase 2: Configuration Change Detection ✅

No Configuration Changes: ✅ No changes to biome.json, package.json, vitest.config.js, or other config files that could be silencing warnings/errors. This is purely additive code.

Phase 3: Test Integrity Assessment ✅

Test Quality: ✅ EXCELLENT test integrity

  • 51 unit tests for InMemoryRepository correctness with comprehensive coverage of all Repository methods
  • 54 parity tests using describe.each to run identical assertions against both SqliteRepository and InMemoryRepository
  • Tests use realistic test data with proper relationships (function calls, inheritance, etc.)
  • TestRepoBuilder provides ergonomic fluent API for test fixture creation
  • No weakened assertions or reduced test strictness
  • Tests verify behavioral equivalence between implementations

Test Design Quality:

  • Uses proper seeding functions for both repo types with identical test data
  • Proper test isolation with beforeEach/afterEach hooks
  • Tests all Repository interface methods systematically
  • Good edge case coverage (missing IDs, empty results, filtering)

Phase 4: Code Quality Review ✅

Architecture & Design Quality

✅ Excellent Architecture:

  • Repository pattern implementation is clean and well-structured
  • Abstract base class with proper inheritance hierarchy
  • Clear separation of concerns between data access implementations
  • SqliteRepository delegates to existing tested functions (no duplication)
  • InMemoryRepository provides identical interface backed by Maps

Implementation Quality

✅ High-Quality Code:

  • SQL LIKE emulation: likeToRegex() function correctly handles % and _ wildcards with proper escaping (src/db/repository/in-memory-repository.js:10-26)
  • Proper private fields: Uses # private fields throughout InMemoryRepository
  • Consistent error handling: Abstract base methods throw descriptive "not implemented" errors
  • Type safety: JSDoc annotations for parameters and return types
  • Performance considerations: Set-based deduplication, proper Map usage, sorted results
  • Edge case handling: Graceful handling of missing nodes/edges, proper null/undefined checks

Security Analysis

✅ No Security Issues:

  • No SQL injection risks (in-memory implementation)
  • No file system access vulnerabilities
  • No command execution or shell injection
  • Proper input sanitization in LIKE pattern conversion

Potential Issues

Minor observations (non-blocking):

  1. likeToRegex function re-implements SQL LIKE behavior - this is appropriate for this use case
  2. Some methods in InMemoryRepository have O(n) complexity vs SQLite's indexed queries, but this is expected and acceptable for test scenarios

Phase 5: Backlog Compliance Rating ✅

Backlog Compliance Assessment

Zero-dep: ✅ Pass - No new runtime dependencies added. Builds entirely on existing JavaScript Maps/Sets and existing SQLite functions.

Foundation-aligned: ✅ Pass - Fully aligned with FOUNDATION.md principles:

  • P1 (always-current): Testing infrastructure, no impact on core graph currency
  • P4 (zero-cost core): This is testing infrastructure, not core functionality
  • P5 (embeddable-first): Improves programmatic API testability
  • P8 (optimize for development): Directly improves developer testing experience

Problem-fit: 3/5 - Good fit. While this doesn't directly address AI token waste/hallucination/blast-radius issues, it enables better testing of the core functionality that does solve those problems. Testing pyramid improvements are foundation-level work that makes the core problem-solving code more maintainable.

Breaking: No - Purely additive. All existing APIs unchanged, new Repository interfaces are additive.

Tier Classification: Tier 1 (Zero-dep ✅ + Foundation-aligned ✅)


Root Cause Analysis

Problem Being Solved: Testing pyramid improvement - enables fast, isolated unit tests for codegraph query logic without SQLite overhead
Why This Approach: Repository pattern is industry-standard abstraction for data layer testability
Risk Assessment: Very low risk - pure additive abstraction

Backlog Compliance

  • Zero-dep: ✅ - no new dependencies
  • Foundation-aligned: ✅ - testing infrastructure aligns with all principles
  • Problem-fit: 3/5 - enables better testing of core problem-solving functionality
  • Breaking: No - purely additive
  • Tier: 1

Critical Concerns

None. This is well-executed foundational testing infrastructure.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Exceptional implementation quality, comprehensive test coverage, zero-risk additive change that improves codebase maintainability. This is precisely the kind of testing infrastructure that enables faster, more confident development of the core functionality. The 105 new tests and behavioral parity verification give high confidence in the implementation.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR introduces a clean Repository abstraction (Repository base class, InMemoryRepository, SqliteRepository) along with a fluent TestRepoBuilder fixture factory and 105 new tests (51 unit + 54 parity). The core abstractions are well-designed and the majority of previously-flagged divergences have been addressed.

Key changes:

  • Repository base class in base.js defines a throw-by-default interface covering all node lookups, edge queries, graph-read methods, and optional table checks.
  • InMemoryRepository in in-memory-repository.js backs the interface with plain Map stores, includes escapeLike applied to file-path LIKE filters, and correctly mirrors noTests exclusion patterns (__test__, __tests__, .stories.).
  • SqliteRepository in sqlite-repository.js is a thin delegation wrapper over existing fn(db, …) functions.
  • TestRepoBuilder in tests/helpers/fixtures.js provides a fluent builder with a duplicate-name guard and ordered check-before-insert.

One remaining divergence:

  • #iterateFunctionNodesImpl applies escapeLike(opts.pattern) before building the name-match regex, making _ a literal. The SQLite path calls nameLike(opts.pattern) in query-builder.js, which does not escape, so _ remains a SQL wildcard. This is the same class of divergence as the acknowledged fileFilter issue in findNodesWithFanIn, but it affects listFunctionNodes / iterateFunctionNodes and has not yet been discussed. The parity test for pattern filtering uses 'Baz' (no special characters), so this divergence is not caught by the test suite.

Confidence Score: 4/5

  • Safe to merge with one remaining parity gap in opts.pattern escaping for listFunctionNodes.
  • The implementation is thorough and most previously-flagged divergences have been fixed. The one remaining issue — opts.pattern escaping in #iterateFunctionNodesImpl vs SQLite's unescaped nameLike — mirrors a known, acknowledged pre-existing bug on the SQLite side (same as fileFilter), so the in-memory version is actually more correct. The risk is limited to callers passing patterns with _ or %, which is uncommon in practice. No regressions to existing tests, and the parity test suite covers the critical paths.
  • src/db/repository/in-memory-repository.js (lines 556-559) and tests/unit/repository-parity.test.js (pattern filter test) need attention for the opts.pattern escaping divergence.

Important Files Changed

Filename Overview
src/db/repository/in-memory-repository.js New 584-line Maps-backed repository; well-structured with escapeLike applied to opts.file filters, but opts.pattern in #iterateFunctionNodesImpl also applies escapeLike while SQLite's nameLike does not — creating an unacknowledged behavioral divergence for patterns containing _ or %.
src/db/repository/base.js Clean abstract base class defining the full Repository interface; all methods throw 'not implemented' by default; no issues found.
src/db/repository/sqlite-repository.js Thin, correct wrapper delegating all calls to existing fn(db, …) functions; exposes a db getter for legacy raw-access callers; no issues found.
tests/helpers/fixtures.js Fluent TestRepoBuilder with duplicate-name guard, clear error messages for unknown nodes, and correct ordering of check-before-insert; no issues found.
tests/unit/repository-parity.test.js 92 parity tests covering counts, node/edge lookups, import queries, scope/qualified-name with escapeLike, and noTests with tests and .stories. patterns; missing a parity test for opts.pattern with _ wildcard that would expose the nameLike escaping divergence.

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class Repository {
        <<abstract>>
        +findNodeById(id)
        +findNodesByFile(file)
        +findFileNodes(fileLike)
        +findNodesWithFanIn(namePattern, opts)
        +countNodes()
        +countEdges()
        +countFiles()
        +getNodeId(name, kind, file, line)
        +getFunctionNodeId(name, file, line)
        +bulkNodeIdsByFile(file)
        +findNodeChildren(parentId)
        +findNodesByScope(scopeName, opts)
        +findNodeByQualifiedName(qualifiedName, opts)
        +listFunctionNodes(opts)
        +iterateFunctionNodes(opts)
        +findNodesForTriage(opts)
        +findCallees(nodeId)
        +findCallers(nodeId)
        +findDistinctCallers(nodeId)
        +findAllOutgoingEdges(nodeId)
        +findAllIncomingEdges(nodeId)
        +findCalleeNames(nodeId)
        +findCallerNames(nodeId)
        +findImportTargets(nodeId)
        +findImportSources(nodeId)
        +findImportDependents(nodeId)
        +findCrossFileCallTargets(file)
        +countCrossFileCallers(nodeId, file)
        +getClassHierarchy(classNodeId)
        +findIntraFileCallEdges(file)
        +getCallableNodes()
        +getCallEdges()
        +getFileNodesAll()
        +getImportEdges()
        +hasCfgTables()
        +hasEmbeddings()
        +hasDataflowTable()
        +getComplexityForNode(nodeId)
    }

    class InMemoryRepository {
        -nodes Map
        -edges Map
        -complexity Map
        -nextNodeId int
        -nextEdgeId int
        +addNode(attrs) int
        +addEdge(attrs) int
        +addComplexity(nodeId, metrics)
        -escapeLike(s) string
        -likeToRegex(pattern) RegExp
        -computeFanIn() Map
        -iterateFunctionNodesImpl(opts) Iterator
    }

    class SqliteRepository {
        -db Database
        +db Database
    }

    class TestRepoBuilder {
        -pending object
        +fn(name, file, line, extra) TestRepoBuilder
        +method(name, file, line, extra) TestRepoBuilder
        +cls(name, file, line, extra) TestRepoBuilder
        +file(filePath) TestRepoBuilder
        +node(name, kind, file, line, extra) TestRepoBuilder
        +calls(sourceName, targetName) TestRepoBuilder
        +imports(sourceName, targetName) TestRepoBuilder
        +extends(sourceName, targetName) TestRepoBuilder
        +edge(sourceName, targetName, kind) TestRepoBuilder
        +complexity(name, metrics) TestRepoBuilder
        +build() object
    }

    Repository <|-- InMemoryRepository
    Repository <|-- SqliteRepository
    TestRepoBuilder ..> InMemoryRepository : creates
Loading

Last reviewed commit: 1f5b03b

Comment on lines +225 to +260
findNodesForTriage(opts = {}) {
const kindsToUse = opts.kind ? [opts.kind] : ['function', 'method', 'class'];
let nodes = [...this.#nodes.values()].filter((n) => kindsToUse.includes(n.kind));

if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
}
if (opts.file) {
const fileRe = likeToRegex(`%${opts.file}%`);
nodes = nodes.filter((n) => fileRe.test(n.file));
}
if (opts.role) {
nodes = nodes.filter((n) => n.role === opts.role);
}

const fanInMap = this.#computeFanIn();
return nodes
.sort((a, b) => a.file.localeCompare(b.file) || a.line - b.line)
.map((n) => {
const cx = this.#complexity.get(n.id);
return {
id: n.id,
name: n.name,
kind: n.kind,
file: n.file,
line: n.line,
end_line: n.end_line,
role: n.role,
fan_in: fanInMap.get(n.id) ?? 0,
cognitive: cx?.cognitive ?? 0,
mi: cx?.maintainability_index ?? 0,
cyclomatic: cx?.cyclomatic ?? 0,
max_nesting: cx?.max_nesting ?? 0,
churn: 0, // no co-change data in-memory
};
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing input validation for kind and role

The SQLite version in nodes.js:39-47 validates opts.kind against EVERY_SYMBOL_KIND and opts.role against VALID_ROLES, throwing a ConfigError for invalid values. The in-memory version silently accepts any value, which means invalid inputs will silently return empty results instead of throwing. This behavioral divergence will cause the parity tests to miss bugs where callers pass incorrect kind/role strings.

Suggested change
findNodesForTriage(opts = {}) {
const kindsToUse = opts.kind ? [opts.kind] : ['function', 'method', 'class'];
let nodes = [...this.#nodes.values()].filter((n) => kindsToUse.includes(n.kind));
if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
}
if (opts.file) {
const fileRe = likeToRegex(`%${opts.file}%`);
nodes = nodes.filter((n) => fileRe.test(n.file));
}
if (opts.role) {
nodes = nodes.filter((n) => n.role === opts.role);
}
const fanInMap = this.#computeFanIn();
return nodes
.sort((a, b) => a.file.localeCompare(b.file) || a.line - b.line)
.map((n) => {
const cx = this.#complexity.get(n.id);
return {
id: n.id,
name: n.name,
kind: n.kind,
file: n.file,
line: n.line,
end_line: n.end_line,
role: n.role,
fan_in: fanInMap.get(n.id) ?? 0,
cognitive: cx?.cognitive ?? 0,
mi: cx?.maintainability_index ?? 0,
cyclomatic: cx?.cyclomatic ?? 0,
max_nesting: cx?.max_nesting ?? 0,
churn: 0, // no co-change data in-memory
};
});
findNodesForTriage(opts = {}) {
if (opts.kind && !['function', 'method', 'class', 'interface', 'type', 'struct', 'enum', 'trait', 'record', 'module', 'parameter', 'property', 'constant'].includes(opts.kind)) {
throw new Error(`Invalid kind: ${opts.kind}`);
}
if (opts.role && !['entry', 'core', 'utility', 'adapter', 'dead', 'leaf'].includes(opts.role)) {
throw new Error(`Invalid role: ${opts.role}`);
}
const kindsToUse = opts.kind ? [opts.kind] : ['function', 'method', 'class'];
let nodes = [...this.#nodes.values()].filter((n) => kindsToUse.includes(n.kind));
if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
}
if (opts.file) {
const fileRe = likeToRegex(`%${opts.file}%`);
nodes = nodes.filter((n) => fileRe.test(n.file));
}
if (opts.role) {
nodes = nodes.filter((n) => n.role === opts.role);
}
const fanInMap = this.#computeFanIn();
return nodes
.sort((a, b) => a.file.localeCompare(b.file) || a.line - b.line)
.map((n) => {
const cx = this.#complexity.get(n.id);
return {
id: n.id,
name: n.name,
kind: n.kind,
file: n.file,
line: n.line,
end_line: n.end_line,
role: n.role,
fan_in: fanInMap.get(n.id) ?? 0,
cognitive: cx?.cognitive ?? 0,
mi: cx?.maintainability_index ?? 0,
cyclomatic: cx?.cyclomatic ?? 0,
max_nesting: cx?.max_nesting ?? 0,
churn: 0, // no co-change data in-memory
};
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added kind/role validation matching the SQLite version. InMemoryRepository.findNodesForTriage now imports ConfigError, EVERY_SYMBOL_KIND, and VALID_ROLES from kinds.js and throws ConfigError for invalid values. Also added parity tests for both invalid kind and invalid role.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2c9c5fc — added kind/role validation with ConfigError, matching SqliteRepository behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — validation for kind against EVERY_SYMBOL_KIND and role against VALID_ROLES was added in a previous push (lines 237-246). Parity tests for both invalid-kind and invalid-role throws are also in place.

Comment on lines +453 to +460
return results.sort((a, b) => {
const callerA = [...this.#nodes.values()].find(
(n) => n.name === a.caller_name && n.file === file,
);
const callerB = [...this.#nodes.values()].find(
(n) => n.name === b.caller_name && n.file === file,
);
return (callerA?.line ?? 0) - (callerB?.line ?? 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort comparator spreads nodes Map per comparison

Each call to the sort comparator spreads this.#nodes into a new array and does a linear .find() on it. With m nodes and r results, this is O(r * log(r) * m) work. Since the caller/callee nodes are already resolved in the loop above (lines 447-448), consider building a name→line lookup map before sorting to avoid the repeated scans.

Suggested change
return results.sort((a, b) => {
const callerA = [...this.#nodes.values()].find(
(n) => n.name === a.caller_name && n.file === file,
);
const callerB = [...this.#nodes.values()].find(
(n) => n.name === b.caller_name && n.file === file,
);
return (callerA?.line ?? 0) - (callerB?.line ?? 0);
return results.sort((a, b) => {
const nodesArr = [...this.#nodes.values()];
const callerA = nodesArr.find(
(n) => n.name === a.caller_name && n.file === file,
);
const callerB = nodesArr.find(
(n) => n.name === b.caller_name && n.file === file,
);

A more efficient approach would be to build the lookup once before sorting:

const lineByName = new Map();
for (const n of this.#nodes.values()) {
  if (n.file === file) lineByName.set(n.name, n.line);
}
return results.sort((a, b) => (lineByName.get(a.caller_name) ?? 0) - (lineByName.get(b.caller_name) ?? 0));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced the per-comparison [...this.#nodes.values()].find() with a pre-built lineByName Map. Now O(n) to build the map + O(r log r) to sort, instead of O(r * log(r) * n).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2c9c5fc — findIntraFileCallEdges now builds a lineByName lookup Map once before sorting instead of spreading the nodes Map per comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — findIntraFileCallEdges builds a lineByName lookup map once before sorting (lines 474-480), avoiding the repeated spread+find per comparator call.

*/
class TestRepoBuilder {
#pending = { nodes: [], edges: [], complexity: [] };
#nameIndex = new Map(); // name → { kind, file, line, ... }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#nameIndex is written but never read

#nameIndex is populated in #addNode (line 131) but is never used anywhere in the class. It appears to be dead code — possibly left over from an earlier iteration where it was used for edge resolution in build() (which now uses the ids Map instead).

Suggested change
#nameIndex = new Map(); // name → { kind, file, line, ... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — removed the dead #nameIndex field and its .set() call in #addNode. It was indeed leftover from an earlier iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2c9c5fc — removed dead #nameIndex field from TestRepoBuilder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already removed — #nameIndex no longer exists in the current version of fixtures.js.

- Add kind/role validation to InMemoryRepository.findNodesForTriage
  matching SqliteRepository behavior (throws ConfigError)
- Optimize findIntraFileCallEdges sort with pre-built lookup map
  instead of spreading nodes Map per comparison
- Remove dead #nameIndex field from TestRepoBuilder
- Add validation parity tests for invalid kind/role

Impact: 5 functions changed, 0 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +104 to +107
for (const n of this.#pending.nodes) {
const id = repo.addNode(n);
ids.set(n.name, id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate node names silently corrupt edge resolution

ids.set(n.name, id) uses the node name as the sole key. If a test graph contains two nodes with the same name (e.g. fn('render', 'components/A.js', 1) and fn('render', 'components/B.js', 1)), the second insertion silently overwrites the first. Any subsequent .calls(…, 'render') will then wire to the wrong node with no error thrown, producing a graph that looks valid but has incorrect edges.

The Error paths a few lines down (Unknown source node) only fire when a name is missing entirely — they offer no protection against the overwrite case.

Consider throwing on a detected duplicate during build():

    for (const n of this.#pending.nodes) {
      if (ids.has(n.name)) {
        throw new Error(`Duplicate node name "${n.name}" — use unique names or disambiguate with name@file:line`);
      }
      const id = repo.addNode(n);
      ids.set(n.name, id);
    }

Alternatively, the documentation for fn / node / etc. should prominently warn that names must be unique within a single builder chain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1ac56afbuild() now throws an Error if a duplicate node name is detected: Duplicate node name: "X" (already assigned ID N). This prevents silent edge corruption when the same name appears twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — build() checks ids.has(n.name) before calling repo.addNode(n), throwing on duplicates. See lines 105-109 in the current fixtures.js.

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +297 to +306
findCallers(nodeId) {
const results = [];
for (const e of this.#edges.values()) {
if (e.target_id === nodeId && e.kind === 'calls') {
const n = this.#nodes.get(e.source_id);
if (n) results.push({ id: n.id, name: n.name, kind: n.kind, file: n.file, line: n.line });
}
}
return results;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findCallers doesn't deduplicate while findCallees does

findCallees (lines 276–295) uses a seen Set to suppress duplicate entries when the same target_id appears across multiple edges. findCallers has no equivalent guard. If two calls edges from the same source to the same target exist in the graph (which addEdge does not prevent), findCallees(source) returns the target once, while findCallers(target) returns the source twice.

This asymmetry is not covered by any unit or parity test — every test uses single edges per source-target pair, so neither a SQLite SELECT nor the in-memory scan ever exercises the duplicate case. If the underlying SQLite query for findCallers also returns duplicates in this scenario the implementations are consistent; if the SQL uses DISTINCT, the in-memory version diverges silently.

At minimum, a parity test with a duplicate edge would confirm the intended semantics:

it('findCallers with duplicate edges', () => {
  // Add two edges from the same caller to the same target
  repo.addEdge({ source_id: ids.bar, target_id: ids.foo, kind: 'calls' });
  const callers = repo.findCallers(ids.foo);
  // Assert expected count (1 or 3 depending on intended semantics)
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By design — the SQLite findCallers (edges.js:48-50) also has no DISTINCT, so the in-memory version correctly matches. findDistinctCallers is the deduplicated variant used for impact analysis BFS. The asymmetry with findCallees is pre-existing behavior that both implementations match on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a divergence — the SQLite \ query also does not use \ (it's a plain ). Both implementations return duplicates if duplicate edges exist. For deduplicated results, callers should use \ which exists for exactly this purpose.

Comment on lines +104 to +112
for (const n of this.#pending.nodes) {
const id = repo.addNode(n);
if (ids.has(n.name)) {
throw new Error(
`Duplicate node name: "${n.name}" — use unique names or qualify with file path`,
);
}
ids.set(n.name, id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate node added to repo before the check fires

repo.addNode(n) is called on line 105 before the ids.has(n.name) check on line 106. When a duplicate name is encountered, the duplicate node is inserted into repo first, and only then is the error thrown. Since build() propagates the error and the partially-built repo is never returned, there is no observable bug — but it's cleaner to check before adding:

Suggested change
for (const n of this.#pending.nodes) {
const id = repo.addNode(n);
if (ids.has(n.name)) {
throw new Error(
`Duplicate node name: "${n.name}" — use unique names or qualify with file path`,
);
}
ids.set(n.name, id);
}
for (const n of this.#pending.nodes) {
if (ids.has(n.name)) {
throw new Error(
`Duplicate node name: "${n.name}" — use unique names or qualify with file path`,
);
}
const id = repo.addNode(n);
ids.set(n.name, id);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 31ebeac — moved the duplicate-name check before repo.addNode() so the repo is never mutated when a duplicate is detected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed — the duplicate check now runs before addNode. See lines 105-109 in the current fixtures.js.

Move the duplicate-name guard before repo.addNode() so the repo
is not mutated when a duplicate is detected.

Impact: 2 functions changed, 0 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +199 to +202
if (opts.file) {
const fileRe = likeToRegex(`%${opts.file}%`);
nodes = nodes.filter((n) => fileRe.test(n.file));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts.file LIKE escaping divergence vs SQLite

findNodesByScope in nodes.js (line 276) uses escapeLike() + ESCAPE '\\' before building the LIKE pattern, so % and _ in opts.file are treated as literal characters. The in-memory version passes opts.file directly into likeToRegex(\%${opts.file}%`), which converts every _to a regex.(any character) and%to.*`.

Concrete divergence: findNodesByScope('SomeClass', { file: '__tests__' }) in SQLite correctly finds files containing the literal string __tests__, while the in-memory version matches files containing any two characters, then tests, then any two characters — returning false positives. The same applies at line 200 in findNodeByQualifiedName (lines 211–212).

Fix: apply the same escapeLike treatment before building the regex:

// after: import { likeToRegex } from the helper, OR add an escapeLike helper
const escapedFile = opts.file.replace(/[%_\\]/g, '\\$&'); // mirror escapeLike()
const fileRe = likeToRegex(`%${escapedFile}%`);

This brings the in-memory filtering in line with the SQL ESCAPE '\\' behaviour.

The same divergence exists in findNodeByQualifiedName at src/db/repository/in-memory-repository.js:211.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6ca47ea — added escapeLike() helper (mirrors nodes.js:254) and applied it to all 6 likeToRegex call sites in in-memory-repository.js. The _ and % chars are now treated as literals, matching SQLite ESCAPE behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — all opts.file usages in the in-memory repo now use escapeLike(opts.file) before building the LIKE pattern. See findNodesByScope (line 210), findNodeByQualifiedName (line 221), findNodesForTriage (line 254), and #iterateFunctionNodesImpl (line 547). The parity test for _ as literal in file filter (line 356) validates this.

Comment on lines +1 to +10
import Database from 'better-sqlite3';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { initSchema } from '../../src/db/migrations.js';
import { InMemoryRepository } from '../../src/db/repository/in-memory-repository.js';
import { SqliteRepository } from '../../src/db/repository/sqlite-repository.js';

/**
* Parity tests — run the same assertions against both SqliteRepository and
* InMemoryRepository to verify behavioral equivalence.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing parity coverage for several methods

Several InMemoryRepository methods that have unit tests in in-memory-repository.test.js are absent from the parity suite, meaning potential SQLite-vs-in-memory divergences in those methods go undetected:

  • findNodesByScope / findNodeByQualifiedName — the escaping divergence noted above wouldn't have been caught without parity tests
  • findNodesWithFanIn (name-pattern filtering + fan-in computation)
  • findFileNodes (LIKE pattern for file-kind nodes)
  • findNodeChildren
  • findImportTargets / findImportSources / findImportDependents
  • getCallableNodes / getFileNodesAll / getImportEdges
  • hasCfgTables / hasEmbeddings

The seeded SQLite fixture in seedSqliteRepo already has bar with scope = 'BarClass' and qualified_name = 'BarClass.bar', making it straightforward to add parity tests for the scope/qualified-name lookups without changing the fixture. Adding these tests would have caught the LIKE escaping issue automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6ca47ea — added parity tests for: findNodesByScope, findNodeByQualifiedName (with file filter), findNodesWithFanIn, findFileNodes, getFileNodesAll, findNodeChildren, findImportTargets, findImportSources, findImportDependents, getImportEdges, getCallableNodes, hasCfgTables, hasEmbeddings. Also added a LIKE-escaping test (findNodesByScope treats _ as literal in file filter) that would have caught the divergence. Seed fixtures now include file-kind nodes, import edges, and parent_id relationships.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added parity tests for findNodesByScope, findNodeByQualifiedName, findFileNodes, findNodeChildren, findNodesWithFanIn, import queries, getCallableNodes, getFileNodesAll, getImportEdges, hasCfgTables, and hasEmbeddings in the latest parity suite (92 tests total). The LIKE escaping test specifically covers the _ wildcard divergence.

findNodesByScope, findNodeByQualifiedName, findNodesWithFanIn,
listFunctionNodes, and findNodesForTriage all passed user-provided
strings directly into likeToRegex without escaping % and _ chars.
This caused divergence from SQLite which uses escapeLike + ESCAPE '\'.

Also adds parity tests for findNodesByScope, findNodeByQualifiedName,
findNodesWithFanIn, findFileNodes, findNodeChildren, import queries,
getCallableNodes, getFileNodesAll, getImportEdges, and hasCfgTables.

Impact: 7 functions changed, 5 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +554 to +555
if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noTests filter misses 3 exclusion patterns from SQLite

The in-memory noTests guard only checks for .test. and .spec. in the file path, but the SQLite equivalent (NodeQuery.excludeTests() in src/db/query-builder.js lines 157–163) applies five conditions:

n.file NOT LIKE '%.test.%'
n.file NOT LIKE '%.spec.%'
n.file NOT LIKE '%__test__%'
n.file NOT LIKE '%__tests__%'
n.file NOT LIKE '%.stories.%'

Files inside __tests__/ or __test__/ directories (the Jest/Vitest convention) and Storybook .stories.js files would be included by InMemoryRepository when noTests: true but excluded by SqliteRepository. This same divergence exists in both #iterateFunctionNodesImpl (affecting listFunctionNodes / iterateFunctionNodes) and findNodesForTriage (lines 250–251).

Because the parity fixture only seeds a tests/foo.test.js filename, the parity tests do not exercise any of the three missing patterns.

Suggested change
if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
if (opts.noTests) {
nodes = nodes.filter(
(n) =>
!n.file.includes('.test.') &&
!n.file.includes('.spec.') &&
!n.file.includes('__test__') &&
!n.file.includes('__tests__') &&
!n.file.includes('.stories.'),
);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added the three missing patterns (__test__, __tests__, .stories.) to match the SQLite excludeTests behavior in #iterateFunctionNodesImpl. Also added parity test fixtures with __tests__ and .stories. nodes so this divergence is caught automatically going forward.

Comment on lines +250 to +251
if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same noTests divergence in findNodesForTriage

The same three patterns (__test__, __tests__, .stories.) that are excluded by the SQLite implementation are missing here. See the comment at line 554 for the full analysis. Both occurrences need the same fix.

Suggested change
if (opts.noTests) {
nodes = nodes.filter((n) => !n.file.includes('.test.') && !n.file.includes('.spec.'));
if (opts.noTests) {
nodes = nodes.filter(
(n) =>
!n.file.includes('.test.') &&
!n.file.includes('.spec.') &&
!n.file.includes('__test__') &&
!n.file.includes('__tests__') &&
!n.file.includes('.stories.'),
);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — same three patterns added here in findNodesForTriage.

Comment on lines +133 to +136
if (opts.file) {
const fileRe = likeToRegex(`%${escapeLike(opts.file)}%`);
nodes = nodes.filter((n) => fileRe.test(n.file));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findNodesWithFanIn file filter escaping is stricter than SQLite

The SQLite findNodesWithFanIn calls q.fileFilter(opts.file) (query-builder.js line 170), which wraps the value with %…% but does not call escapeLike. This means in SQLite a _ in opts.file is treated as a LIKE wildcard (any single character). The in-memory version calls escapeLike(opts.file) first, making _ a literal.

In practice, this is a minor discrepancy (in-memory is actually more correct), but it causes a subtle parity gap for callers that pass file fragments containing underscores (e.g. src_utils, __init__). A parity test covering opts.file with an underscore would expose this difference. Consider either:

  • Removing escapeLike here to match SQLite behavior, or
  • Adding escapeLike to the SQLite fileFilter method (a wider fix) and documenting the semantic difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — the in-memory version is stricter (more correct) because it applies . The SQLite \ method in query-builder.js doesn't call , which is a pre-existing bug on the SQLite side. We'll fix that separately rather than regressing the in-memory version to match the buggy behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #446 — added \ to \ and \ in \ with , so \ and \ are now treated as literals. Also deduplicated the \ function — \ now imports it from . Added parity test for \ with underscore in file filter.

Comment on lines +209 to +213
it('listFunctionNodes excludes tests', () => {
const rows = repo.listFunctionNodes({ noTests: true });
expect(rows.length).toBe(4); // foo, bar, Baz, bazMethod
expect(rows.every((r) => !r.file.includes('.test.'))).toBe(true);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parity tests don't cover __tests__/.stories. exclusion patterns

The parity test for listFunctionNodes excludes tests seeds only a tests/foo.test.js filename (matched by .test.), so all three missing noTests patterns (__test__, __tests__, .stories.) are never exercised. Consider adding a node whose file contains one of these patterns to the seedSqliteRepo / seedInMemoryRepo fixtures, then asserting rows.every((r) => !r.file.includes('__tests__')). This would have caught the divergence described in the comment on in-memory-repository.js.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added jestFn (src/__tests__/helper.js) and storyFn (src/Button.stories.js) to both seed functions. The listFunctionNodes excludes tests parity test now asserts that none of the excluded patterns appear in the results.

The in-memory noTests filter only checked .test. and .spec. patterns,
missing __test__, __tests__, and .stories. that the SQLite query-builder
excludes. Add the missing patterns to both findNodesForTriage and
iterateFunctionNodes. Extend parity test fixtures with __tests__ and
.stories. nodes so the divergence is caught automatically.

Impact: 5 functions changed, 0 affected
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +556 to +559
}
if (opts.pattern) {
const patternRe = likeToRegex(`%${escapeLike(opts.pattern)}%`);
nodes = nodes.filter((n) => patternRe.test(n.name));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts.pattern escaping diverges from SQLite's nameLike

The in-memory version applies escapeLike(opts.pattern) before building the regex, so a _ in the pattern is treated as a literal underscore. The SQLite path calls nameLike(opts.pattern) in query-builder.js, which does not escape, so _ is a LIKE wildcard (any single character) in SQL.

Concrete divergence: listFunctionNodes({ pattern: 'get_User' }) will return only get_User in InMemoryRepository, but will also return getAUser, getBUser, etc., in SqliteRepository.

This is the same class of divergence as the acknowledged fileFilter issue in findNodesWithFanIn, but it affects the opts.pattern / nameLike path and has not been discussed in previous threads. The resolution options are the same: either add escapeLike to nameLike in query-builder.js (preferred, fixes SQLite too), or document the intentional asymmetry here.

Comment on lines +233 to +236
it('listFunctionNodes filters by pattern', () => {
const rows = repo.listFunctionNodes({ pattern: 'Baz' });
expect(rows.length).toBe(2); // Baz + bazMethod (LIKE is case-insensitive)
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parity test for listFunctionNodes pattern doesn't cover LIKE wildcards

The pattern 'Baz' contains no LIKE special characters, so the escaping divergence between escapeLike(opts.pattern) (in-memory) and the raw nameLike(pattern) call on the SQL side is never exercised here. Adding a node whose name would only be matched if _ is treated as a wildcard would catch the divergence:

it('listFunctionNodes treats _ as literal in pattern', () => {
  // '_az' should NOT match 'Baz' — underscore is a literal, not a wildcard
  const rows = repo.listFunctionNodes({ pattern: '_az' });
  expect(rows.length).toBe(0);
});

Note: this test would currently fail on SqliteRepository because SQLite's nameLike doesn't escape, confirming the bug described in the comment on in-memory-repository.js.

@carlos-alm carlos-alm merged commit cdd7d29 into main Mar 16, 2026
66 checks passed
@carlos-alm carlos-alm deleted the feat/testing-pyramid branch March 16, 2026 06:25
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant