feat: add CODEOWNERS module with parsing, matching, and ownership data #221
Closed
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| # Codegraph Feature Backlog | ||
|
|
||
| **Last updated:** 2026-03-02 | ||
| **Last updated:** 2026-03-01 | ||
| **Source:** Features derived from [COMPETITIVE_ANALYSIS.md](../../generated/COMPETITIVE_ANALYSIS.md) and internal roadmap discussions. | ||
|
|
||
| --- | ||
|
|
| @@ -31,18 +31,18 @@ Non-breaking, ordered by problem-fit: | ||
| | 1 | ~~Dead code detection~~ | ~~Find symbols with zero incoming edges (excluding entry points and exports). Agents constantly ask "is this used?" — the graph already has the data, we just need to surface it. Inspired by narsil-mcp, axon, codexray, CKB.~~ | Analysis | ~~Agents stop wasting tokens investigating dead code; developers get actionable cleanup lists without external tools~~ | ✓ | ✓ | 4 | No | **DONE** — Delivered as part of node classification (ID 4). `codegraph roles --role dead -T` lists all symbols with zero fan-in that aren't exported. | | ||
| | 2 | ~~Shortest path A→B~~ | ~~BFS/Dijkstra on the existing edges table to find how symbol A reaches symbol B. We have `fn` for single-node chains but no A→B pathfinding. Inspired by codexray, arbor.~~ | Navigation | ~~Agents can answer "how does this function reach that one?" in one call instead of manually tracing chains~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph path <from> <to>` command with `--reverse`, `--max-depth`, `--kinds` options. BFS pathfinding on the edges table. `symbol_path` MCP tool. | | ||
| | 12 | ~~Execution flow tracing~~ | ~~Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp.~~ | Navigation | ~~Agents can answer "what happens when a user hits POST /login?" by tracing the full execution path in one query~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph flow` command with entry point detection and BFS flow tracing. MCP tools `flow` and `entry_points` added. Merged in PR #118. | | ||
| | 16 | ~~Branch structural diff~~ | ~~Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon.~~ | Analysis | ~~Teams can review structural impact of feature branches before merge; agents get branch-aware context~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph branch-compare <base> <target>` builds separate graphs for each ref using git worktrees, diffs at the symbol level (added/removed/changed), and traces transitive caller impact via BFS. Supports `--json`, `--format mermaid`, `--depth`, `--no-tests`, `--engine`. `branchCompareData` and `branchCompareMermaid` exported from programmatic API. `branch_compare` MCP tool. | | ||
| | 20 | ~~Streaming / chunked results~~ | ~~Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally.~~ | Embeddability | ~~Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload~~ | ✓ | ✓ | 4 | No | **DONE** — Universal pagination (`limit`/`offset` + `_pagination` metadata) on all 21 MCP tools with per-tool defaults in `MCP_DEFAULTS` and hard cap of 1000. NDJSON streaming (`--ndjson`/`--limit`/`--offset`) on ~14 CLI commands via shared `printNdjson` helper. Generator APIs (`iterListFunctions`, `iterRoles`, `iterWhere`, `iterComplexity`) using `better-sqlite3` `.iterate()` for memory-efficient streaming. PR #207. | | ||
| | 16 | ~~Branch structural diff~~ | ~~Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon.~~ | Analysis | ~~Teams can review structural impact of feature branches before merge; agents get branch-aware context~~ | ✓ | ✓ | 4 | No | **DONE** — `src/branch-compare.js` module with `branchCompareData`, `branchCompareMermaid`, `branchCompare`. CLI: `codegraph branch-compare <base> <target>` with `--depth`, `-T`, `-j`, `-f` options. MCP: `branch_compare` tool. Git worktree isolation, symbol-level diff, transitive impact analysis. | | ||
| | 20 | Streaming / chunked results | Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally. | Embeddability | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | ✓ | ✓ | 4 | No | | ||
| | 27 | Composite audit command | Single `codegraph audit <file-or-function>` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 4.4 `risk_score`/`complexity_notes`/`side_effects` when available. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) Gauntlet phase. | Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | | ||
| | 28 | Batch querying | Accept a list of targets (file or JSON) and return all query results in one JSON payload. Applies to `audit`, `fn-impact`, `context`, and other per-symbol commands. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) swarm pattern. | Orchestration | A swarm of 20+ agents auditing different files can be fed from a single orchestrator call instead of N sequential invocations — reduces overhead and enables parallel dispatch | ✓ | ✓ | 4 | No | | ||
| | 29 | Triage priority queue | Single `codegraph triage` command that merges `map` connectivity, `hotspots` fan-in/fan-out, node roles, and optionally git churn + `risk_score` into one ranked audit queue. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) RECON phase. | Orchestration | Orchestrating agent gets a single prioritized list of what to audit first — replaces manual synthesis of 3+ commands, saves RECON phase from burning tokens on orientation | ✓ | ✓ | 4 | No | | ||
| | 32 | MCP orchestration tools | Expose `audit`, `triage`, and `check` as MCP tools alongside existing tools. Enables multi-agent orchestrators (Claude Code agent teams, custom MCP clients) to run the full Titan Paradigm loop through the MCP protocol without CLI overhead. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md). | Embeddability | Agents query the graph through MCP with zero CLI overhead — fewer tokens, faster round-trips, native integration with AI agent frameworks | ✓ | ✓ | 4 | No | | ||
| | 5 | ~~TF-IDF lightweight search~~ | ~~SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray.~~ | Search | ~~Users get useful search without the 500MB embedding model download; faster startup for small projects~~ | ✓ | ✓ | 3 | No | **DONE** — Subsumed by ID 15 (Hybrid BM25 + semantic search). FTS5 full-text index (`fts_index` table) populated during `codegraph embed`. `--mode keyword` provides BM25-only search with zero embedding overhead. PR #198. | | ||
| | 5 | TF-IDF lightweight search | SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray. | Search | Users get useful search without the 500MB embedding model download; faster startup for small projects | ✓ | ✓ | 3 | No | | ||
| | 13 | Architecture boundary rules | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). Violations flagged in `diff-impact` and CI. Inspired by codegraph-rust, stratify. | Architecture | Prevents architectural decay in CI; agents are warned before introducing forbidden cross-module dependencies | ✓ | ✓ | 3 | No | | ||
| | 15 | ~~Hybrid BM25 + semantic search~~ | ~~Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local.~~ | Search | ~~Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both~~ | ✓ | ✓ | 3 | No | **DONE** — FTS5 full-text index alongside embeddings for BM25 keyword search. `ftsSearchData()` for keyword-only, `hybridSearchData()` for RRF fusion. `search` command defaults to `--mode hybrid`, with `semantic` and `keyword` alternatives. MCP `semantic_search` tool gains `mode` parameter. Graceful fallback on older DBs. Zero new deps — FTS5 ships with better-sqlite3. PR #198. | | ||
| | 18 | ~~CODEOWNERS integration~~ | ~~Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in `diff-impact`. Inspired by CKB.~~ | Developer Experience | ~~`diff-impact` tells agents which teams to notify; ownership-aware impact analysis reduces missed reviews~~ | ✓ | ✓ | 3 | No | **DONE** — `src/owners.js` module with CODEOWNERS parser, matcher, and data functions. `codegraph owners [target]` CLI command with `--owner`, `--boundary`, `-f`, `-k`, `-T`, `-j` options. `code_owners` MCP tool. Integrated into `diff-impact` (affected owners + suggested reviewers). No new deps — glob patterns via ~30-line `patternToRegex`. PR #195. | | ||
| | 15 | Hybrid BM25 + semantic search | Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local. | Search | Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both | ✓ | ✓ | 3 | No | | ||
|
Comment on lines 33 to +42 (Contributor): This change removes the "DONE" markers and implementation details from 6 already-completed features (IDs 5, 15, 16, 18, 20, 31). ID 18 specifically references the CODEOWNERS module that this PR claims to add, but that work was already completed in PR #195.
||
| | 18 | CODEOWNERS integration | Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in `diff-impact`. Inspired by CKB. | Developer Experience | `diff-impact` tells agents which teams to notify; ownership-aware impact analysis reduces missed reviews | ✓ | ✓ | 3 | No | | ||
| | 22 | ~~Manifesto-driven pass/fail~~ | ~~User-defined rule engine with custom thresholds (e.g. "cognitive > 15 = fail", "cyclomatic > 10 = fail", "imports > 10 = decompose"). Outputs pass/fail per function/file. Generalizes ID 13 (boundary rules) into a generic rule system.~~ | Analysis | ~~Enables autonomous multi-agent audit workflows (GAUNTLET pattern); CI integration for code health gates with configurable thresholds~~ | ✓ | ✓ | 3 | No | **DONE** — `codegraph manifesto` with 9 configurable rules (cognitive, cyclomatic, nesting, MI, Halstead volume/effort/bugs, fan-in, fan-out). Warn/fail thresholds via `.codegraphrc.json`. Exit code 1 on any fail-level breach — CI gate ready. PR #138. | | ||
| | 31 | ~~Graph snapshots~~ | ~~`codegraph snapshot save <name>` / `codegraph snapshot restore <name>` for lightweight SQLite DB backup and restore. Enables orchestrators to checkpoint before refactoring passes and instantly rollback without rebuilding. After Phase 4, also preserves embeddings and semantic metadata. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase.~~ | Orchestration | ~~Multi-agent workflows get instant rollback without re-running expensive builds or LLM calls — orchestrator checkpoints before each pass and restores on failure~~ | ✓ | ✓ | 3 | No | **DONE** — `codegraph snapshot` subcommand group with `save`, `restore`, `list`, `delete`. Uses SQLite `VACUUM INTO` for atomic, WAL-free snapshots in `.codegraph/snapshots/`. All 6 functions exported via programmatic API. PR #192. | | ||
| | 31 | Graph snapshots | `codegraph snapshot save <name>` / `codegraph snapshot restore <name>` for lightweight SQLite DB backup and restore. Enables orchestrators to checkpoint before refactoring passes and instantly rollback without rebuilding. After Phase 4, also preserves embeddings and semantic metadata. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. | Orchestration | Multi-agent workflows get instant rollback without re-running expensive builds or LLM calls — orchestrator checkpoints before each pass and restores on failure | ✓ | ✓ | 3 | No | | ||
| | 30 | Change validation predicates | `codegraph check --staged` with configurable predicates: `--no-new-cycles`, `--max-blast-radius N`, `--no-signature-changes`, `--no-boundary-violations`. Returns exit code 0/1 for CI gates and state machines. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) STATE MACHINE phase. | CI | Automated rollback triggers without parsing JSON — orchestrators and CI pipelines get first-class pass/fail signals for blast radius, cycles, and contract changes | ✓ | ✓ | 3 | No | | ||
| | 6 | ~~Formal code health metrics~~ | ~~Cyclomatic complexity, Maintainability Index, and Halstead metrics per function — we already parse the AST, the data is there. Inspired by code-health-meter (published in ACM TOSEM 2025).~~ | Analysis | ~~Agents can prioritize refactoring targets; `hotspots` becomes richer with quantitative health scores per function~~ | ✓ | ✓ | 2 | No | **DONE** — `codegraph complexity` provides cognitive, cyclomatic, nesting depth, Halstead (volume, difficulty, effort, bugs), and Maintainability Index per function. `--health` for full Halstead view, `--sort mi` to rank by MI, `--above-threshold` for flagged functions. `function_complexity` DB table. `complexity` MCP tool. PR #130 + #139. | | ||
| | 11 | ~~Community detection~~ | ~~Leiden/Louvain algorithm to discover natural module boundaries vs actual file organization. Reveals which symbols are tightly coupled and whether the directory structure matches. Inspired by axon, GitNexus, CodeGraphMCPServer.~~ | Intelligence | ~~Surfaces architectural drift — when directory structure no longer matches actual dependency clusters; guides refactoring~~ | ✓ | ✓ | 2 | No | **DONE** — `codegraph communities` with Louvain algorithm. File-level and `--functions` function-level detection. `--drift` for drift analysis (split/merge candidates). `--resolution` tunable. `communities` MCP tool. PR #133/#134. | | ||
|
|
||
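For readers unfamiliar with what the ID 18 row describes (a CODEOWNERS parser and matcher with glob patterns handled by a small `patternToRegex`), the following is a minimal sketch of the general technique. It is not the code in `src/owners.js`: the function bodies, the `parseCodeowners` and `ownersFor` names, and the simplified glob rules are all assumptions made for illustration. The only behavior taken as given is the standard CODEOWNERS convention that the last matching rule wins.

```javascript
// Hypothetical sketch only: names and glob handling are assumptions,
// not the implementation shipped in src/owners.js.

// Convert a simplified CODEOWNERS glob into an anchored RegExp.
function patternToRegex(pattern) {
  // A slash anywhere except the very end anchors the pattern to the repo root.
  const anchored = pattern.slice(0, -1).includes('/');
  let glob = pattern.replace(/^\//, '');
  // A trailing slash means "this directory and everything beneath it".
  const dirOnly = glob.endsWith('/');
  if (dirOnly) glob = glob.slice(0, -1);

  let re = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*') {
      if (glob[i + 1] === '*') {
        // "**/" matches zero or more directories; a bare "**" matches anything.
        if (glob[i + 2] === '/') { re += '(?:.*/)?'; i += 2; }
        else { re += '.*'; i += 1; }
      } else {
        re += '[^/]*'; // "*" never crosses a directory separator
      }
    } else if (ch === '?') {
      re += '[^/]';
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  const prefix = anchored ? '^' : '^(?:.*/)?';
  const suffix = dirOnly ? '/.*$' : '(?:/.*)?$';
  return new RegExp(prefix + re + suffix);
}

// Parse CODEOWNERS text into rules, skipping blank lines and comments.
function parseCodeowners(text) {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith('#'))
    .map((line) => {
      const [pattern, ...owners] = line.split(/\s+/);
      return { pattern, owners, regex: patternToRegex(pattern) };
    });
}

// Return the owners of a repo-relative path; the last matching rule wins.
function ownersFor(rules, filePath) {
  for (let i = rules.length - 1; i >= 0; i--) {
    if (rules[i].regex.test(filePath)) return rules[i].owners;
  }
  return [];
}

const exampleRules = parseCodeowners(`
# default owner
*        @org/core
*.js     @org/js-team
/docs/   @org/docs
`);
console.log(ownersFor(exampleRules, 'src/owners.js')); // [ '@org/js-team' ]
console.log(ownersFor(exampleRules, 'docs/guide.md')); // [ '@org/docs' ]
console.log(ownersFor(exampleRules, 'README.md'));     // [ '@org/core' ]
```

The sketch deliberately omits escaped spaces, negation, and character classes; a real matcher would need to decide how closely to follow gitignore-style semantics.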
Review comment: the "Last updated" date was changed backwards, from 2026-03-02 to 2026-03-01.