feat: harden multi-repo registry and add structural analysis #18
carlos-alm merged 6 commits into main
Greptile Summary
Adds directories as first-class graph nodes with computed metrics (cohesion, fan-in/out, symbol density) and two new commands.
Key changes:
Security concern:
Confidence Score: 4/5
Important Files Changed
Entity Relationship Diagram
%%{init: {'theme': 'neutral'}}%%
erDiagram
nodes ||--o{ node_metrics : "has metrics"
nodes ||--o{ edges : "source"
nodes ||--o{ edges : "target"
nodes {
int id PK
string name
string kind "file, directory, function, class, etc"
string file
int line
int end_line
}
node_metrics {
int node_id PK,FK
int line_count "files only"
int symbol_count "deduplicated definitions"
int import_count "files only"
int export_count "files only"
int fan_in "cross-boundary imports"
int fan_out "cross-boundary imports"
real cohesion "intra_edges divided by total_edges"
int file_count "directories only"
}
edges {
int id PK
int source_id FK
int target_id FK
string kind "imports, contains, calls, etc"
real confidence
int dynamic
}
Last reviewed commit: 691fefe
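The `node_metrics` columns in the diagram describe cohesion as intra-directory edges divided by total edges, with fan-in and fan-out counting cross-boundary imports. The sketch below is one way those three numbers could be derived for a single directory. It is not taken from `src/structure.js`: the function names, the `{ source, target }` edge shape, the choice to count total edges as intra + fan-in + fan-out, and returning `null` when a directory has no edges are all assumptions made for illustration.

```javascript
// Hypothetical sketch: directory metrics from file-level import edges.
// `edges` is an array of { source, target } file paths using "/" separators.

/** True when `file` lives under directory `dir` at any depth. */
function isInside(file, dir) {
  const prefix = dir.endsWith('/') ? dir : dir + '/';
  return file.startsWith(prefix);
}

/** Count intra-directory and cross-boundary edges, then derive cohesion. */
function directoryMetrics(dir, edges) {
  let intra = 0;
  let fanIn = 0;
  let fanOut = 0;
  for (const { source, target } of edges) {
    const sourceInside = isInside(source, dir);
    const targetInside = isInside(target, dir);
    if (sourceInside && targetInside) intra += 1; // both ends inside the directory
    else if (sourceInside) fanOut += 1;           // import leaving the directory
    else if (targetInside) fanIn += 1;            // import entering the directory
  }
  const total = intra + fanIn + fanOut;
  // Assumption: cohesion is undefined (null) when no edge touches the directory.
  return { fanIn, fanOut, cohesion: total === 0 ? null : intra / total };
}

const edges = [
  { source: 'src/a.js', target: 'src/b.js' },
  { source: 'src/b.js', target: 'src/a.js' },
  { source: 'src/a.js', target: 'lib/util.js' },
  { source: 'tests/a.test.js', target: 'src/a.js' },
];
console.log(directoryMetrics('src', edges)); // { fanIn: 1, fanOut: 1, cohesion: 0.5 }
```

Under these assumptions a directory whose files only import each other scores 1, and a directory that exists purely as a pass-through scores 0.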
src/structure.js
Outdated
FROM nodes n
JOIN node_metrics nm ON n.id = nm.node_id
WHERE n.kind = ?
ORDER BY ${orderCol} DESC
SQL injection risk: ${orderCol} is directly interpolated into the SQL query. While metricToColumn() validates the input with a switch statement, this pattern is fragile. If future code changes allow untrusted input to reach this function, it creates a vulnerability.
- ORDER BY ${orderCol} DESC
+ ORDER BY ${orderCol} DESC NULLS LAST
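Also consider keeping a whitelist of allowed column names directly in the query builder for defense in depth. Parameterized queries bind values rather than identifiers, so they cannot stand in for an ORDER BY column name on their own.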
Good catch on NULLS LAST — added in a41668f.
On the SQL injection concern: metricToColumn() is a closed switch/case that only returns hardcoded column expressions (nm.fan_in, nm.fan_out, etc.) with a safe default fallback. No external input reaches the SQL string. That said, the NULLS LAST addition is a genuine correctness fix — without it NULL metrics would sort to the top.
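The reply above describes `metricToColumn()` as a closed switch with a safe default, but the PR page does not show its body. The following is a hypothetical reconstruction of that pattern, not the author's code. Only `nm.fan_in` and `nm.fan_out` are named in the thread; the metric keys, the choice of fallback, the SELECT list, and the `buildHotspotsSql` helper are assumptions. The density metric is left out because its column expression is not shown.

```javascript
// Hypothetical reconstruction of a closed switch that maps a metric name to a
// hardcoded column expression. Unknown input never reaches the SQL text.
function metricToColumn(metric) {
  switch (metric) {
    case 'fan-in':
      return 'nm.fan_in';
    case 'fan-out':
      return 'nm.fan_out';
    default:
      // Assumed fallback: anything unrecognised ranks by fan-in.
      return 'nm.fan_in';
  }
}

// Assumed query shape; only the FROM/JOIN/WHERE/ORDER BY lines appear in the diff.
function buildHotspotsSql(metric) {
  const orderCol = metricToColumn(metric);
  return [
    `SELECT n.name, ${orderCol} AS score`,
    'FROM nodes n',
    'JOIN node_metrics nm ON n.id = nm.node_id',
    'WHERE n.kind = ?',
    `ORDER BY ${orderCol} DESC NULLS LAST`,
  ].join('\n');
}

console.log(buildHotspotsSql('fan-out'));
console.log(metricToColumn('nm.fan_in; DROP TABLE nodes')); // nm.fan_in
```

The interpolated value can only ever be one of the string literals in the switch, which is the property the reply relies on. The reviewer's point is that this guarantee lives in a different function from the query and is easy to break later.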
Registry hardening (3 of 4 audit concerns):
- Add pruneRegistry() to remove stale entries where repo dir no longer exists
- Add --repos allowlist on MCP server for repo-level access control
- Auto-suffix name collisions in registerRepo (api → api-2) when no explicit name

Structural analysis (new):
- Add src/structure.js with directory nodes, containment edges, and metrics (symbol density, avg fan-out, cohesion scores)
- Add structure/hotspots CLI commands
- Extend DOT/Mermaid export with directory clusters
- Add 'directory' and 'contains' kinds to DB schema

CLI additions:
- codegraph registry prune
- codegraph mcp --repos <names>
- codegraph structure [dir]
- codegraph hotspots
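The two registry changes in this commit message, pruning stale entries and auto-suffixing colliding names, can be sketched as below. This is an illustration only. The registry shape (`{ repos: { name: { path } } }`), the injected `exists` predicate, the `uniqueName` helper, and the behaviour of an explicit name overwriting an existing entry are assumptions; only the function names `pruneRegistry` and `registerRepo` and the `api → api-2` suffix rule come from the PR.

```javascript
// Hypothetical registry shape: { repos: { [name]: { path } } }.

/** Remove entries whose directory no longer exists; return the removed names. */
function pruneRegistry(registry, exists) {
  const removed = [];
  for (const [name, entry] of Object.entries(registry.repos)) {
    if (!exists(entry.path)) {
      delete registry.repos[name];
      removed.push(name);
    }
  }
  return removed;
}

/** Pick `base`, or `base-2`, `base-3`, ... when the name belongs to another path. */
function uniqueName(registry, base, repoPath) {
  const isFree = (name) => {
    const entry = registry.repos[name];
    return !entry || entry.path === repoPath; // unused, or already this repo
  };
  if (isFree(base)) return base;
  let n = 2;
  while (!isFree(`${base}-${n}`)) n += 1;
  return `${base}-${n}`;
}

/** Register a repo under an explicit name, or under its (suffixed) basename. */
function registerRepo(registry, repoPath, explicitName) {
  const base = repoPath.split('/').filter(Boolean).pop();
  const name = explicitName ?? uniqueName(registry, base, repoPath);
  registry.repos[name] = { path: repoPath };
  return name;
}

const registry = { repos: {} };
console.log(registerRepo(registry, '/work/api'));    // api
console.log(registerRepo(registry, '/clients/api')); // api-2
console.log(registerRepo(registry, '/work/api'));    // api
console.log(pruneRegistry(registry, (p) => p === '/work/api')); // [ 'api-2' ]
```

In a Node program the `exists` argument could be `fs.existsSync`. Passing it in keeps the sketch free of file-system access and easy to test.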
691fefe to a413ea7
Ensures NULL metrics sort to the end rather than the top when ranking hotspots by fan-in, fan-out, or density.
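The ordering this commit asks for, descending by metric with missing values at the end, can be illustrated in memory with a comparator. This is only an analogue of `ORDER BY ... DESC NULLS LAST` for readers who want to see the intended ranking. The row shape and the function name are assumptions, and the project itself performs this ordering in SQL.

```javascript
// In-memory analogue of "ORDER BY <metric> DESC NULLS LAST" (illustrative only).
function byMetricDescNullsLast(key) {
  return (a, b) => {
    const x = a[key];
    const y = b[key];
    if (x == null && y == null) return 0; // both missing: keep relative order
    if (x == null) return 1;              // missing metric sorts after b
    if (y == null) return -1;             // missing metric sorts after a
    return y - x;                         // larger metric first
  };
}

const rows = [
  { name: 'src/a.js', fan_in: null },
  { name: 'src/b.js', fan_in: 3 },
  { name: 'src/c.js', fan_in: 7 },
];
const ranked = [...rows].sort(byMetricDescNullsLast('fan_in'));
console.log(ranked.map((r) => r.name)); // [ 'src/c.js', 'src/b.js', 'src/a.js' ]
```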
…oding guide

Remove normalizePath import from parser.js and inline the trivial path normalization at the two call sites, eliminating the file-level cycle that codegraph detected when analyzing its own codebase.

Add dogfooding section to CLAUDE.md and docs/dogfooding-guide.md with self-analysis findings and improvement action items.
Replace metricToColumn() string interpolation with a static map of pre-built prepared statements — one per metric. No dynamic string ever reaches SQL now, regardless of future callers.
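A static map of prepared statements, as this commit describes, might look like the sketch below. It assumes a database handle with `prepare(sql)` returning a statement that has `all(...params)`, which is the shape better-sqlite3 exposes; the PR does not say which driver is used. The metric keys, the SELECT list, the fallback to fan-in, and both function names are hypothetical, and the density statement is omitted because its column expression is not shown.

```javascript
// Each metric has a complete, literal SQL string. Nothing is interpolated.
const HOTSPOT_SQL = Object.freeze({
  'fan-in': `
    SELECT n.name, nm.fan_in AS score
    FROM nodes n
    JOIN node_metrics nm ON n.id = nm.node_id
    WHERE n.kind = ?
    ORDER BY nm.fan_in DESC NULLS LAST`,
  'fan-out': `
    SELECT n.name, nm.fan_out AS score
    FROM nodes n
    JOIN node_metrics nm ON n.id = nm.node_id
    WHERE n.kind = ?
    ORDER BY nm.fan_out DESC NULLS LAST`,
});

/** Prepare every statement once; `db.prepare` is assumed, not confirmed by the PR. */
function prepareHotspotStatements(db) {
  const statements = new Map();
  for (const [metric, sql] of Object.entries(HOTSPOT_SQL)) {
    statements.set(metric, db.prepare(sql));
  }
  return statements;
}

/** Look up the statement by metric; the caller's string is only ever a map key. */
function queryHotspots(statements, metric, kind) {
  const stmt = statements.get(metric) ?? statements.get('fan-in'); // assumed fallback
  return stmt.all(kind); // `kind` is bound as a parameter
}
```

Compared with interpolating a validated column name, this form leaves no code path where a caller-supplied string becomes part of the SQL text, at the cost of one statement per metric.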
The MCP server now only exposes the local project's graph by default. Multi-repo access (repo param, list_repos tool) requires explicit opt-in via --multi-repo or --repos, preventing AI agents from silently querying other registered codebases.
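The access rule in this commit (local project by default, other repos only after opting in with `--multi-repo` or `--repos`) can be sketched as a small policy object. This is an assumption-laden illustration rather than the code in `src/mcp.js`. `allowedRepos` is the option name the PR mentions; `multiRepo`, `createRepoAccess`, `parseReposFlag`, and the decision to model a disabled `list_repos` as an empty list are hypothetical.

```javascript
/** Split a "--repos repo1,repo2" value into names (hypothetical helper). */
function parseReposFlag(value) {
  return value.split(',').map((s) => s.trim()).filter(Boolean);
}

/** Build the access policy for one MCP server instance. */
function createRepoAccess({ multiRepo = false, allowedRepos = null } = {}) {
  // Passing an allowlist counts as opting in to multi-repo access.
  const multiEnabled = multiRepo || allowedRepos !== null;
  const allow = allowedRepos === null ? null : new Set(allowedRepos);
  return {
    /** Names that a list_repos call may reveal. */
    listRepos(registered) {
      if (!multiEnabled) return [];
      return allow ? registered.filter((name) => allow.has(name)) : registered;
    },
    /** Whether a tool call with this `repo` argument may proceed. */
    canAccess(repoName) {
      if (repoName == null) return true; // no repo param: the local project
      if (!multiEnabled) return false;   // default: other repos are off limits
      return allow ? allow.has(repoName) : true;
    },
  };
}

const access = createRepoAccess({ allowedRepos: parseReposFlag('repo1,repo2') });
console.log(access.canAccess('repo1'));                    // true
console.log(access.canAccess('repo3'));                    // false
console.log(access.listRepos(['repo1', 'repo2', 'repo3'])); // [ 'repo1', 'repo2' ]
```

A real server might not register the `list_repos` tool at all in the default mode; returning an empty list here just keeps the sketch compact.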
Summary
- pruneRegistry() removes stale entries where the repo directory no longer exists on disk, exposed via codegraph registry prune
- --repos <names> option on codegraph mcp restricts which registered repos the MCP server can access (tool calls and list_repos are filtered)
- registerRepo now auto-suffixes when a basename collides with a different path (api → api-2), preventing silent overwrites during auto-registration
- src/structure.js module makes directories first-class graph nodes with containment edges and per-file/per-directory metrics (symbol density, avg fan-out, cohesion). Adds codegraph structure and codegraph hotspots CLI commands, MCP tools, and DOT/Mermaid directory clusters

Changed files
- src/registry.js: pruneRegistry(), registerRepo() auto-suffix
- src/mcp.js: options.allowedRepos param + filtering
- src/cli.js: registry prune, --repos on mcp, structure, hotspots commands
- src/index.js
- src/structure.js
- src/builder.js: buildStructure call
- src/db.js: node_metrics table in schema
- src/export.js
- tests/unit/registry.test.js
- tests/unit/mcp.test.js
- tests/unit/structure.test.js
- tests/integration/cli.test.js
- tests/integration/structure.test.js

Test plan
- npm run lint: 0 errors
- npm test: 362 passed, 0 failed
- codegraph registry prune on a system with stale entries
- codegraph mcp --repos repo1,repo2 restricts tool access
- codegraph structure and codegraph hotspots on a real project

🤖 Generated with Claude Code