feat: harden multi-repo registry and add structural analysis by carlos-alm · Pull Request #18 · optave/codegraph

carlos-alm · 2026-02-22T08:52:34Z

Summary

Registry pruning: New pruneRegistry() removes stale entries where the repo directory no longer exists on disk, exposed via codegraph registry prune
MCP repo allowlist: --repos <names> option on codegraph mcp restricts which registered repos the MCP server can access (tool calls and list_repos are filtered)
Name collision auto-suffix: registerRepo now auto-suffixes when a basename collides with a different path (api → api-2), preventing silent overwrites during auto-registration
Structural analysis: New src/structure.js module makes directories first-class graph nodes with containment edges and per-file/per-directory metrics (symbol density, avg fan-out, cohesion). Adds codegraph structure and codegraph hotspots CLI commands, MCP tools, and DOT/Mermaid directory clusters
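The two registry rules in the summary above (auto-suffixing colliding names and pruning stale entries) can be sketched roughly as follows. This is a minimal illustration, not the code in `src/registry.js`: it assumes the registry is a plain object mapping names to `{ path }` entries, and `registerRepoSketch`, `pruneRegistrySketch`, and the injected `dirExists` predicate are hypothetical names.

```javascript
// Hypothetical sketch: registry is a plain object of name -> { path }.
// The real src/registry.js may store and persist entries differently.

// Register repoPath under baseName, trying baseName-2, baseName-3, ...
// while the candidate name is already taken by a different path.
function registerRepoSketch(registry, baseName, repoPath) {
  let name = baseName;
  let suffix = 2;
  while (registry[name] && registry[name].path !== repoPath) {
    name = `${baseName}-${suffix}`;
    suffix += 1;
  }
  registry[name] = { path: repoPath };
  return name;
}

// Drop entries whose directory no longer exists and return the removed names.
// dirExists is injected so the sketch runs without touching the disk;
// fs.existsSync would be a natural choice in practice.
function pruneRegistrySketch(registry, dirExists) {
  const removed = [];
  for (const [name, entry] of Object.entries(registry)) {
    if (!dirExists(entry.path)) {
      removed.push(name);
      delete registry[name];
    }
  }
  return removed;
}

const registry = {};
registerRepoSketch(registry, 'api', '/work/a/api');
const second = registerRepoSketch(registry, 'api', '/work/b/api');
const again = registerRepoSketch(registry, 'api', '/work/a/api');
// Pretend /work/b/api has been deleted from disk.
const removed = pruneRegistrySketch(registry, (p) => p !== '/work/b/api');
console.log(second, again, removed);
```

Re-registering the same path keeps its existing name, so only a genuinely different path with the same basename receives a suffix.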

Changed files

| File | Changes |
| --- | --- |
| `src/registry.js` | `pruneRegistry()`, `registerRepo()` auto-suffix |
| `src/mcp.js` | `options.allowedRepos` param + filtering |
| `src/cli.js` | `registry prune`, `--repos` on mcp, `structure`, `hotspots` commands |
| `src/index.js` | New exports |
| `src/structure.js` | New module: build-time + query-time structure analysis |
| `src/builder.js` | Directory collection, line counts, `buildStructure` call |
| `src/db.js` | `node_metrics` table in schema |
| `src/export.js` | Directory cluster support in DOT/Mermaid |
| `tests/unit/registry.test.js` | Prune + collision tests |
| `tests/unit/mcp.test.js` | Allowlist filtering tests |
| `tests/unit/structure.test.js` | Structure metrics unit tests |
| `tests/integration/cli.test.js` | CLI integration tests |
| `tests/integration/structure.test.js` | Full build + query integration tests |

Test plan

npm run lint — 0 errors
npm test — 362 passed, 0 failed
Manual: codegraph registry prune on a system with stale entries
Manual: codegraph mcp --repos repo1,repo2 restricts tool access
Manual: codegraph structure and codegraph hotspots on a real project

🤖 Generated with Claude Code

greptile-apps · 2026-02-22T08:55:48Z

Greptile Summary

Adds directories as first-class graph nodes with computed metrics (cohesion, fan-in/out, symbol density) and two new commands (structure, hotspots) plus MCP tools for AI integration.

Key changes:

New src/structure.js module with buildStructure() for metrics computation and query functions
node_metrics table added to v1 schema (no migration needed since there are no existing users)
collectFiles() now tracks directories during traversal
DOT exporter uses DB directory nodes with cohesion labels (fallback to path.dirname() for old DBs)
Registry enhancements: auto-suffixing for basename collisions, prune command for stale entries
MCP server gains --repos allowlist for access control
20 new tests (7 unit, 13 integration) with comprehensive coverage

Security concern:

hotspotsData() in src/structure.js:329 uses string interpolation for SQL ORDER BY clause. While metricToColumn() validates inputs via switch statement, this pattern is vulnerable if future changes allow untrusted input to bypass validation.

Confidence Score: 4/5

Safe to merge with one SQL injection risk requiring attention
The implementation is well-architected with comprehensive tests (20 new tests, all passing), proper error handling, and backward compatibility. However, the SQL injection vulnerability in hotspotsData() with dynamic ORDER BY construction prevents a perfect score. The risk is currently mitigated by input validation in metricToColumn(), but the pattern is fragile.
src/structure.js requires fixing the SQL injection risk at line 329

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/structure.js | New module introducing directory-level graph nodes, metrics computation (cohesion, fan-in/out), and query functions. SQL injection risk in `hotspotsData` line 329 with dynamic ORDER BY. |
| src/builder.js | Updated `collectFiles` to track directories and added structure analysis integration after edge building. Clean implementation with proper error handling. |
| src/db.js | Added `node_metrics` table to v1 schema with proper foreign keys and indexes. Schema design is sound. |
| src/export.js | DOT exporter now uses directory nodes from DB with cohesion labels, falls back gracefully to `path.dirname()` for old DBs. |
| src/cli.js | Added `structure` and `hotspots` commands with proper option parsing, plus `registry prune` command and MCP `--repos` filter option. |
| src/mcp.js | Added `structure` and `hotspots` MCP tools with proper schema validation, plus `allowedRepos` access control for multi-repo scenarios. |
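To make the `src/export.js` row concrete, here is a rough sketch of how one directory could be emitted as a DOT cluster with a cohesion label. It relies only on the Graphviz convention that a subgraph whose name starts with `cluster` is drawn as a box; the function name `dotClusterSketch` and the label format are assumptions, not the exporter's actual output.

```javascript
// Hypothetical sketch of a DOT cluster for one directory.
// File names are assumed not to contain double quotes.
function dotClusterSketch(dir, cohesion, files) {
  // Graphviz treats subgraphs named "cluster..." as visual clusters.
  const id = `cluster_${dir.replace(/[^A-Za-z0-9_]/g, '_')}`;
  const label = cohesion === null
    ? dir
    : `${dir} (cohesion ${cohesion.toFixed(2)})`;
  const lines = [`  subgraph ${id} {`, `    label="${label}";`];
  for (const file of files) {
    lines.push(`    "${file}";`);
  }
  lines.push('  }');
  return lines.join('\n');
}

const cluster = dotClusterSketch('src/utils', 0.5, ['src/utils/a.js', 'src/utils/b.js']);
console.log(cluster);
```

For an old database without directory nodes, the same function could be fed groups derived from `path.dirname()` with `cohesion` set to `null`, which matches the fallback described in the table.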

Entity Relationship Diagram

%%{init: {'theme': 'neutral'}}%%
erDiagram
    nodes ||--o{ node_metrics : "has metrics"
    nodes ||--o{ edges : "source"
    nodes ||--o{ edges : "target"
    
    nodes {
        int id PK
        string name
        string kind "file, directory, function, class, etc"
        string file
        int line
        int end_line
    }
    
    node_metrics {
        int node_id PK,FK
        int line_count "files only"
        int symbol_count "deduplicated definitions"
        int import_count "files only"
        int export_count "files only"
        int fan_in "cross-boundary imports"
        int fan_out "cross-boundary imports"
        real cohesion "intra_edges divided by total_edges"
        int file_count "directories only"
    }
    
    edges {
        int id PK
        int source_id FK
        int target_id FK
        string kind "imports, contains, calls, etc"
        real confidence
        int dynamic
    }
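
The diagram defines cohesion as intra-directory edges divided by total edges, with fan-in and fan-out counting cross-boundary imports. A minimal sketch of that arithmetic for one directory, assuming edges are given as `{ source, target }` file paths and that "total" means every edge touching the directory (both are assumptions, as is the name `directoryMetricsSketch`):

```javascript
// Hypothetical sketch: compute fan-in, fan-out, and cohesion for one directory
// from a list of import edges between files.
function directoryMetricsSketch(dir, edges) {
  const prefix = dir.endsWith('/') ? dir : `${dir}/`;
  const inside = (file) => file.startsWith(prefix);
  let intra = 0;
  let fanIn = 0;
  let fanOut = 0;
  for (const { source, target } of edges) {
    const fromInside = inside(source);
    const toInside = inside(target);
    if (fromInside && toInside) intra += 1; // both ends in the directory
    else if (fromInside) fanOut += 1;       // import leaving the directory
    else if (toInside) fanIn += 1;          // import entering the directory
  }
  const total = intra + fanIn + fanOut;
  // With no edges at all the ratio is undefined, so report null.
  return { fanIn, fanOut, cohesion: total === 0 ? null : intra / total };
}

const edges = [
  { source: 'src/a.js', target: 'src/b.js' },
  { source: 'src/b.js', target: 'src/c.js' },
  { source: 'src/a.js', target: 'lib/x.js' },
  { source: 'tests/t.js', target: 'src/b.js' },
  { source: 'lib/x.js', target: 'lib/y.js' },
];
const srcMetrics = directoryMetricsSketch('src', edges);
const docsMetrics = directoryMetricsSketch('docs', edges);
console.log(srcMetrics, docsMetrics);
```

Here `src` has two internal edges and two cross-boundary edges, giving a cohesion of 0.5, while `docs` has no edges and gets a `null` metric of the kind that `NULLS LAST` is meant to handle.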

Last reviewed commit: 691fefe

greptile-apps

13 files reviewed, 1 comment

greptile-apps · 2026-02-22T08:55:51Z

src/structure.js

+      FROM nodes n
+      JOIN node_metrics nm ON n.id = nm.node_id
+      WHERE n.kind = ?
+      ORDER BY ${orderCol} DESC


SQL injection risk: ${orderCol} is directly interpolated into the SQL query. While metricToColumn() validates the input with a switch statement, this pattern is fragile. If future code changes allow untrusted input to reach this function, it creates a vulnerability.

Suggested change

ORDER BY ${orderCol} DESC

ORDER BY ${orderCol} DESC NULLS LAST

Also consider using parameterized queries or a whitelist of allowed column names directly in the query builder for defense in depth.
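
SQL bind parameters stand in for values, not identifiers, so an `ORDER BY` column cannot simply be passed as `?`. The allowlist suggestion could look roughly like the sketch below, which rejects unknown metrics instead of interpolating them. The metric keys, the `SELECT` list, and the name `buildHotspotsSql` are illustrative assumptions, not the code in `src/structure.js`.

```javascript
// Hypothetical allowlist: only these column expressions can reach the SQL text.
const ORDER_COLUMNS = new Map([
  ['fan-in', 'nm.fan_in'],
  ['fan-out', 'nm.fan_out'],
]);

// Build the hotspots query for a metric, refusing anything not in the allowlist.
function buildHotspotsSql(metric) {
  const orderCol = ORDER_COLUMNS.get(metric);
  if (orderCol === undefined) {
    throw new Error(`Unknown metric: ${metric}`);
  }
  return [
    'SELECT n.name, nm.fan_in, nm.fan_out',
    'FROM nodes n',
    'JOIN node_metrics nm ON n.id = nm.node_id',
    'WHERE n.kind = ?', // the node kind is still bound as a parameter
    `ORDER BY ${orderCol} DESC NULLS LAST`,
  ].join('\n');
}

const fanInSql = buildHotspotsSql('fan-in');
console.log(fanInSql);
```

A `Map` is used rather than a plain object so that keys such as `constructor` cannot resolve to inherited properties.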

Good catch on NULLS LAST — added in a41668f.

On the SQL injection concern: metricToColumn() is a closed switch/case that only returns hardcoded column expressions (nm.fan_in, nm.fan_out, etc.) with a safe default fallback. No external input reaches the SQL string. That said, the NULLS LAST addition is a genuine correctness fix — without it NULL metrics would sort to the top.

Registry hardening (3 of 4 audit concerns):
- Add pruneRegistry() to remove stale entries where repo dir no longer exists
- Add --repos allowlist on MCP server for repo-level access control
- Auto-suffix name collisions in registerRepo (api → api-2) when no explicit name

Structural analysis (new):
- Add src/structure.js with directory nodes, containment edges, and metrics (symbol density, avg fan-out, cohesion scores)
- Add structure/hotspots CLI commands
- Extend DOT/Mermaid export with directory clusters
- Add 'directory' and 'contains' kinds to DB schema

CLI additions:
- codegraph registry prune
- codegraph mcp --repos <names>
- codegraph structure [dir]
- codegraph hotspots

…oding guide Remove normalizePath import from parser.js and inline the trivial path normalization at the two call sites, eliminating the file-level cycle that codegraph detected when analyzing its own codebase. Add dogfooding section to CLAUDE.md and docs/dogfooding-guide.md with self-analysis findings and improvement action items.

The MCP server now only exposes the local project's graph by default. Multi-repo access (repo param, list_repos tool) requires explicit opt-in via --multi-repo or --repos, preventing AI agents from silently querying other registered codebases.
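
The opt-in policy described above can be sketched as a small pure function. `options.allowedRepos` appears in this PR; `options.multiRepo` is an assumed name for whatever `--multi-repo` sets, and `parseReposFlag`, `visibleRepos`, and `assertRepoAllowed` are hypothetical helpers rather than the code in `src/mcp.js`.

```javascript
// Parse a "--repos repo1,repo2" value into a clean list of names.
function parseReposFlag(value) {
  return value.split(',').map((s) => s.trim()).filter(Boolean);
}

// Decide which registered repos the server may expose.
// Default: none (local project only). --repos narrows to an allowlist,
// --multi-repo without an allowlist exposes every registered repo.
function visibleRepos(registeredNames, options = {}) {
  const { multiRepo = false, allowedRepos } = options;
  if (Array.isArray(allowedRepos)) {
    const allowed = new Set(allowedRepos);
    return registeredNames.filter((name) => allowed.has(name));
  }
  return multiRepo ? [...registeredNames] : [];
}

// Guard for a tool call that names a repo explicitly.
function assertRepoAllowed(repo, registeredNames, options) {
  if (!visibleRepos(registeredNames, options).includes(repo)) {
    throw new Error(`Repo "${repo}" is not accessible from this MCP server`);
  }
}

const registered = ['api', 'web', 'infra'];
const localOnly = visibleRepos(registered);
const everything = visibleRepos(registered, { multiRepo: true });
const restricted = visibleRepos(registered, {
  allowedRepos: parseReposFlag('api, web'),
});
console.log(localOnly, everything, restricted);
```

Using the same `visibleRepos` result for both `list_repos` and per-call checks keeps the two views from drifting apart.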

greptile-apps bot reviewed Feb 22, 2026

carlos-alm force-pushed the feat/registry-hardening branch from 691fefe to a413ea7 on February 22, 2026 08:58

carlos-alm changed the title ~~feat: structure analysis — directories as first-class graph nodes~~ feat: harden multi-repo registry and add structural analysis Feb 22, 2026

github-actions bot added 5 commits February 22, 2026 02:04

fix: add NULLS LAST to hotspots ORDER BY clause

a41668f

Ensures NULL metrics sort to the end rather than the top when ranking hotspots by fan-in, fan-out, or density.

fix: eliminate SQL interpolation in hotspotsData

f8790d7

Replace metricToColumn() string interpolation with a static map of pre-built prepared statements — one per metric. No dynamic string ever reaches SQL now, regardless of future callers.
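
A rough sketch of the "one prepared statement per metric" idea, assuming a better-sqlite3-style handle where `db.prepare(sql)` returns a statement with an `all(...params)` method. The metric keys, the `SELECT` list, the `LIMIT` parameter, and the fallback to fan-in are illustrative assumptions; a stand-in `fakeDb` is used so the sketch runs without a database.

```javascript
// Every SQL string is a literal; the caller's metric only selects a map key.
function prepareHotspotStatements(db) {
  return new Map([
    ['fan-in', db.prepare(
      'SELECT n.name, nm.fan_in AS value FROM nodes n ' +
      'JOIN node_metrics nm ON n.id = nm.node_id ' +
      'WHERE n.kind = ? ORDER BY nm.fan_in DESC NULLS LAST LIMIT ?')],
    ['fan-out', db.prepare(
      'SELECT n.name, nm.fan_out AS value FROM nodes n ' +
      'JOIN node_metrics nm ON n.id = nm.node_id ' +
      'WHERE n.kind = ? ORDER BY nm.fan_out DESC NULLS LAST LIMIT ?')],
  ]);
}

// Unknown metrics fall back to a fixed statement instead of reaching SQL.
function hotspotsSketch(statements, metric, kind, limit) {
  const stmt = statements.get(metric) ?? statements.get('fan-in');
  return stmt.all(kind, limit);
}

// Stand-in database: records prepared SQL and echoes each call back.
const fakeDb = {
  prepared: [],
  prepare(sql) {
    this.prepared.push(sql);
    return { all: (...params) => ({ sql, params }) };
  },
};

const statements = prepareHotspotStatements(fakeDb);
const byFanOut = hotspotsSketch(statements, 'fan-out', 'directory', 5);
const fallback = hotspotsSketch(statements, 'x; DROP TABLE nodes', 'file', 3);
console.log(byFanOut.params, fallback.params);
```

Because the statements are built once up front, a hostile `metric` value can at most pick the fallback entry; it never becomes part of a query string.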

docs: prohibit Claude Code / Anthropic references in output

85484e8

carlos-alm merged commit ece0d99 into main Feb 22, 2026
13 checks passed

carlos-alm deleted the feat/registry-hardening branch February 22, 2026 09:43

carlos-alm mentioned this pull request Mar 1, 2026

feat: add CODEOWNERS integration #195

Merged

4 tasks

claude bot mentioned this pull request Mar 2, 2026

feat: add CODEOWNERS module with parsing, matching, and ownership data #221

Closed

4 tasks

greptile-apps bot mentioned this pull request Mar 23, 2026

docs: update competitive analysis for v3.2.0 and March 2026 landscape #559

Merged

3 tasks

carlos-alm merged 6 commits into main from feat/registry-hardening
