feat: harden multi-repo registry and add structural analysis #18

Merged
carlos-alm merged 6 commits into main from feat/registry-hardening on Feb 22, 2026

@carlos-alm commented Feb 22, 2026

Summary

  • Registry pruning: New pruneRegistry() removes stale entries where the repo directory no longer exists on disk, exposed via codegraph registry prune
  • MCP repo allowlist: --repos <names> option on codegraph mcp restricts which registered repos the MCP server can access (tool calls and list_repos are filtered)
  • Name collision auto-suffix: registerRepo now auto-suffixes when a basename collides with a different path (api → api-2), preventing silent overwrites during auto-registration
  • Structural analysis: New src/structure.js module makes directories first-class graph nodes with containment edges and per-file/per-directory metrics (symbol density, avg fan-out, cohesion). Adds codegraph structure and codegraph hotspots CLI commands, MCP tools, and DOT/Mermaid directory clusters
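
The two registry behaviors in the list above (pruning and collision suffixing) could look roughly like the sketch below. It is an illustration, not the code in src/registry.js: the in-memory `Map` shape, the injected `exists` callback, and the optional `explicitName` parameter are assumptions. Only the function names and the api → api-2 suffix come from this PR.

```javascript
// Hypothetical registry shape: Map<name, { path }>.
// The real module presumably persists this to disk.

// Remove entries whose directory is gone. `exists` is injected so the
// sketch stays testable; in Node it could be fs.existsSync.
function pruneRegistry(registry, exists) {
  const removed = [];
  for (const [name, entry] of registry) {
    if (!exists(entry.path)) {
      registry.delete(name); // deleting during Map iteration is safe
      removed.push(name);
    }
  }
  return removed;
}

// Register a repo under its basename. If that name already points at a
// different path, try name-2, name-3, ... instead of overwriting.
// An explicit name is used as given (assumed behavior).
function registerRepo(registry, repoPath, explicitName) {
  const base = explicitName ?? repoPath.split(/[\\/]/).filter(Boolean).pop();
  let name = base;
  if (!explicitName) {
    for (let n = 2; registry.has(name) && registry.get(name).path !== repoPath; n++) {
      name = `${base}-${n}`;
    }
  }
  registry.set(name, { path: repoPath });
  return name;
}
```

Re-registering the same path keeps its existing name, so repeated auto-registration of one repo does not pile up suffixed duplicates.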

Changed files

| File | Changes |
| --- | --- |
| src/registry.js | pruneRegistry(), registerRepo() auto-suffix |
| src/mcp.js | options.allowedRepos param + filtering |
| src/cli.js | registry prune, --repos on mcp, structure, hotspots commands |
| src/index.js | New exports |
| src/structure.js | New module: build-time + query-time structure analysis |
| src/builder.js | Directory collection, line counts, buildStructure call |
| src/db.js | node_metrics table in schema |
| src/export.js | Directory cluster support in DOT/Mermaid |
| tests/unit/registry.test.js | Prune + collision tests |
| tests/unit/mcp.test.js | Allowlist filtering tests |
| tests/unit/structure.test.js | Structure metrics unit tests |
| tests/integration/cli.test.js | CLI integration tests |
| tests/integration/structure.test.js | Full build + query integration tests |

Test plan

  • npm run lint — 0 errors
  • npm test — 362 passed, 0 failed
  • Manual: codegraph registry prune on a system with stale entries
  • Manual: codegraph mcp --repos repo1,repo2 restricts tool access
  • Manual: codegraph structure and codegraph hotspots on a real project

🤖 Generated with Claude Code

greptile-apps bot commented Feb 22, 2026

Greptile Summary

Adds directories as first-class graph nodes with computed metrics (cohesion, fan-in/out, symbol density) and two new commands (structure, hotspots) plus MCP tools for AI integration.

Key changes:

  • New src/structure.js module with buildStructure() for metrics computation and query functions
  • node_metrics table added to v1 schema (no migration needed since there are no existing users)
  • collectFiles() now tracks directories during traversal
  • DOT exporter uses DB directory nodes with cohesion labels (fallback to path.dirname() for old DBs)
  • Registry enhancements: auto-suffixing for basename collisions, prune command for stale entries
  • MCP server gains --repos allowlist for access control
  • 20 new tests (7 unit, 13 integration) with comprehensive coverage
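
As a rough illustration of the DOT exporter change listed above, the sketch below groups files into Graphviz clusters and labels each with its cohesion when directory metrics are available. Without them it derives directories from file paths, mirroring the path.dirname() fallback for old DBs. The function name `toDotClusters`, the `dirMetrics` map, and the label format are assumptions, not the exporter's actual output.

```javascript
// Stand-in for path.dirname() on forward-slash relative paths.
const dirnameOf = (file) => {
  const i = file.lastIndexOf("/");
  return i === -1 ? "." : file.slice(0, i);
};

// files: array of relative file paths.
// dirMetrics: optional Map<dir, { cohesion }> read from directory nodes.
// When it is empty (an older DB), clusters come from the paths alone.
function toDotClusters(files, dirMetrics = new Map()) {
  const byDir = new Map();
  for (const file of files) {
    const dir = dirnameOf(file);
    if (!byDir.has(dir)) byDir.set(dir, []);
    byDir.get(dir).push(file);
  }
  const lines = ["digraph codegraph {"];
  let i = 0;
  for (const [dir, members] of byDir) {
    const m = dirMetrics.get(dir);
    const label = m && m.cohesion != null
      ? `${dir} (cohesion ${m.cohesion.toFixed(2)})`
      : dir;
    // Graphviz draws a subgraph as a cluster when its name starts with "cluster".
    lines.push(`  subgraph cluster_${i++} {`);
    lines.push(`    label="${label}";`);
    for (const f of members) lines.push(`    "${f}";`);
    lines.push("  }");
  }
  lines.push("}");
  return lines.join("\n");
}
```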

Security concern:

  • hotspotsData() in src/structure.js:329 uses string interpolation for SQL ORDER BY clause. While metricToColumn() validates inputs via switch statement, this pattern is vulnerable if future changes allow untrusted input to bypass validation.

Confidence Score: 4/5

  • Safe to merge with one SQL injection risk requiring attention
  • The implementation is well-architected with comprehensive tests (20 new tests, all passing), proper error handling, and backward compatibility. However, the SQL injection vulnerability in hotspotsData() with dynamic ORDER BY construction prevents a perfect score. The risk is currently mitigated by input validation in metricToColumn(), but the pattern is fragile.
  • src/structure.js requires fixing the SQL injection risk at line 329

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/structure.js | New module introducing directory-level graph nodes, metrics computation (cohesion, fan-in/out), and query functions. SQL injection risk in hotspotsData line 329 with dynamic ORDER BY. |
| src/builder.js | Updated collectFiles to track directories and added structure analysis integration after edge building. Clean implementation with proper error handling. |
| src/db.js | Added node_metrics table to v1 schema with proper foreign keys and indexes. Schema design is sound. |
| src/export.js | DOT exporter now uses directory nodes from DB with cohesion labels, falls back gracefully to path.dirname() for old DBs. |
| src/cli.js | Added structure and hotspots commands with proper option parsing, plus registry prune command and MCP --repos filter option. |
| src/mcp.js | Added structure and hotspots MCP tools with proper schema validation, plus allowedRepos access control for multi-repo scenarios. |

Entity Relationship Diagram

%%{init: {'theme': 'neutral'}}%%
erDiagram
    nodes ||--o{ node_metrics : "has metrics"
    nodes ||--o{ edges : "source"
    nodes ||--o{ edges : "target"
    
    nodes {
        int id PK
        string name
        string kind "file, directory, function, class, etc"
        string file
        int line
        int end_line
    }
    
    node_metrics {
        int node_id PK,FK
        int line_count "files only"
        int symbol_count "deduplicated definitions"
        int import_count "files only"
        int export_count "files only"
        int fan_in "cross-boundary imports"
        int fan_out "cross-boundary imports"
        real cohesion "intra_edges divided by total_edges"
        int file_count "directories only"
    }
    
    edges {
        int id PK
        int source_id FK
        int target_id FK
        string kind "imports, contains, calls, etc"
        real confidence
        int dynamic
    }
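
The node_metrics comments above describe cohesion as intra_edges divided by total_edges and fan_in/fan_out as cross-boundary imports. A minimal sketch of that arithmetic for one directory follows. It assumes total_edges means intra plus cross-boundary edges and that an edge-free directory gets a NULL cohesion. The function name and edge shape are hypothetical.

```javascript
// edges: [{ source, target }] import edges between relative file paths.
// An edge with both ends inside `dir` is intra; an edge crossing the
// boundary counts toward fan_out (leaving) or fan_in (entering).
function directoryMetrics(dir, edges) {
  const inside = (file) => file.startsWith(dir + "/");
  let intra = 0;
  let fanIn = 0;
  let fanOut = 0;
  for (const { source, target } of edges) {
    const s = inside(source);
    const t = inside(target);
    if (s && t) intra++;
    else if (s) fanOut++;
    else if (t) fanIn++;
  }
  const total = intra + fanIn + fanOut;
  return {
    fan_in: fanIn,
    fan_out: fanOut,
    // No edges touch the directory: leave cohesion null rather than divide by zero.
    cohesion: total === 0 ? null : intra / total,
  };
}
```

Under these assumptions, a directory with one internal import, one outgoing import, and one incoming import has a cohesion of 1/3.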

Last reviewed commit: 691fefe

@greptile-apps bot left a comment

13 files reviewed, 1 comment

src/structure.js Outdated
FROM nodes n
JOIN node_metrics nm ON n.id = nm.node_id
WHERE n.kind = ?
ORDER BY ${orderCol} DESC

SQL injection risk: ${orderCol} is directly interpolated into the SQL query. While metricToColumn() validates the input with a switch statement, this pattern is fragile. If future code changes allow untrusted input to reach this function, it creates a vulnerability.

Suggested change:
- ORDER BY ${orderCol} DESC
+ ORDER BY ${orderCol} DESC NULLS LAST

Also consider using parameterized queries or a whitelist of allowed column names directly in the query builder for defense in depth.

Reply from the PR author (@carlos-alm):

Good catch on NULLS LAST — added in a41668f.

On the SQL injection concern: metricToColumn() is a closed switch/case that only returns hardcoded column expressions (nm.fan_in, nm.fan_out, etc.) with a safe default fallback. No external input reaches the SQL string. That said, the NULLS LAST addition is a genuine correctness fix — without it NULL metrics would sort to the top.
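
For readers following the thread, the closed switch/case pattern described in this reply might look like the sketch below. The metric names, the density expression, the choice of default, and the SELECT list are assumptions. Only nm.fan_in, nm.fan_out, the FROM/JOIN/WHERE lines, and the NULLS LAST suffix appear in this PR.

```javascript
// Map a user-facing metric name to a hardcoded column expression.
// Unknown input falls through to a fixed default, so whatever the caller
// passes, only one of these literals can end up in the SQL text.
function metricToColumn(metric) {
  switch (metric) {
    case "fan-in":
      return "nm.fan_in";
    case "fan-out":
      return "nm.fan_out";
    case "density":
      // Hypothetical expression; the real density column is not shown in the PR.
      return "CAST(nm.symbol_count AS REAL) / NULLIF(nm.line_count, 0)";
    default:
      return "nm.fan_in"; // assumed default
  }
}

function hotspotsSql(metric) {
  const orderCol = metricToColumn(metric);
  return [
    "SELECT n.name", // assumed column list
    "FROM nodes n",
    "JOIN node_metrics nm ON n.id = nm.node_id",
    "WHERE n.kind = ?",
    `ORDER BY ${orderCol} DESC NULLS LAST`,
  ].join("\n");
}
```

The node kind still travels as a bound parameter (`?`). Only the ORDER BY expression is chosen in code, because column names cannot be bound as parameters.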

Registry hardening (3 of 4 audit concerns):
- Add pruneRegistry() to remove stale entries where repo dir no longer exists
- Add --repos allowlist on MCP server for repo-level access control
- Auto-suffix name collisions in registerRepo (api → api-2) when no explicit name

Structural analysis (new):
- Add src/structure.js with directory nodes, containment edges, and metrics
  (symbol density, avg fan-out, cohesion scores)
- Add structure/hotspots CLI commands
- Extend DOT/Mermaid export with directory clusters
- Add 'directory' and 'contains' kinds to DB schema

CLI additions:
- codegraph registry prune
- codegraph mcp --repos <names>
- codegraph structure [dir]
- codegraph hotspots
@carlos-alm carlos-alm force-pushed the feat/registry-hardening branch from 691fefe to a413ea7 Compare February 22, 2026 08:58
@carlos-alm carlos-alm changed the title feat: structure analysis — directories as first-class graph nodes feat: harden multi-repo registry and add structural analysis Feb 22, 2026
Ensures NULL metrics sort to the end rather than the top when
ranking hotspots by fan-in, fan-out, or density.

…oding guide

Remove normalizePath import from parser.js and inline the trivial path
normalization at the two call sites, eliminating the file-level cycle
that codegraph detected when analyzing its own codebase.

Add dogfooding section to CLAUDE.md and docs/dogfooding-guide.md with
self-analysis findings and improvement action items.

Replace metricToColumn() string interpolation with a static map of
pre-built prepared statements — one per metric. No dynamic string
ever reaches SQL now, regardless of future callers.
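
A static map of prepared statements, as this commit message describes, could be sketched as follows. The sketch assumes a database handle exposing `prepare(sql)` that returns a statement with `all(...params)`, which is the shape better-sqlite3 provides. Whether codegraph uses that library is not stated here, and the metric keys, SELECT list, and fallback choice are hypothetical.

```javascript
// Every query is a literal fixed at module load; nothing supplied by a
// caller is ever concatenated into SQL text.
const BASE =
  "SELECT n.name FROM nodes n " + // assumed column list
  "JOIN node_metrics nm ON n.id = nm.node_id " +
  "WHERE n.kind = ? ";

const HOTSPOT_SQL = Object.freeze({
  "fan-in": BASE + "ORDER BY nm.fan_in DESC NULLS LAST",
  "fan-out": BASE + "ORDER BY nm.fan_out DESC NULLS LAST",
  "symbols": BASE + "ORDER BY nm.symbol_count DESC NULLS LAST", // hypothetical metric
});

// Prepare one statement per metric up front.
function prepareHotspotStatements(db) {
  const stmts = new Map();
  for (const [metric, sql] of Object.entries(HOTSPOT_SQL)) {
    stmts.set(metric, db.prepare(sql));
  }
  return stmts;
}

// The metric only selects which prepared statement runs; an unknown
// metric falls back to fan-in (assumed default).
function hotspotsData(stmts, metric, kind) {
  const stmt = stmts.get(metric) ?? stmts.get("fan-in");
  return stmt.all(kind);
}
```

Compared with validating a name and then interpolating it, this leaves no code path where a caller's string becomes part of a query.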

The MCP server now only exposes the local project's graph by default.
Multi-repo access (repo param, list_repos tool) requires explicit
opt-in via --multi-repo or --repos, preventing AI agents from silently
querying other registered codebases.
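
The opt-in rule in this commit message can be sketched as a small access resolver. This is an illustration under assumptions: `options.allowedRepos` is named in the changed-files table, while `options.multiRepo`, the returned helper names, and the `{ name }` repo shape are hypothetical.

```javascript
// options.multiRepo: true when --multi-repo was passed (hypothetical field).
// options.allowedRepos: array of names from --repos, or undefined.
function resolveRepoAccess(options = {}) {
  const allowed = options.allowedRepos ? new Set(options.allowedRepos) : null;
  // Either flag opts in; with neither, only the local project is exposed.
  const multiRepo = Boolean(options.multiRepo || allowed);
  return {
    multiRepo,
    // May a tool call target this registered repo by name?
    canAccess: (name) => multiRepo && (!allowed || allowed.has(name)),
    // What list_repos would be allowed to return.
    listRepos: (registered) =>
      multiRepo ? registered.filter((r) => !allowed || allowed.has(r.name)) : [],
  };
}

// Example: `codegraph mcp --repos repo1,repo2`
const access = resolveRepoAccess({ allowedRepos: "repo1,repo2".split(",") });
```

With no options, `canAccess` is false for every name, which matches the local-only default described above.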
@carlos-alm carlos-alm merged commit ece0d99 into main Feb 22, 2026
13 checks passed
@carlos-alm carlos-alm deleted the feat/registry-hardening branch February 22, 2026 09:43
@carlos-alm carlos-alm mentioned this pull request Mar 1, 2026
4 tasks