feat: add CODEOWNERS module with parsing, matching, and ownership data#221
feat: add CODEOWNERS module with parsing, matching, and ownership data#221carlos-alm wants to merge 1 commit intomainfrom
Conversation
Adds src/owners.js implementing CODEOWNERS parsing (3 standard locations), glob-to-regex pattern matching (**, *, ?, anchored/unanchored), last-match-wins owner resolution, file-level and symbol-level ownership data, cross-owner boundary detection, and diff-impact integration helper. Includes unit tests (parseCodeownersContent, patternToRegex, matchOwners, parseCodeowners) and integration tests (ownersData, ownersForFiles with fixture DB and CODEOWNERS file). Also updates BACKLOG.md to mark branch structural diff as done and reorganize backlog items.
|
Claude finished @carlos-alm's task —— View job Root Cause AnalysisProblem Being Solved: This PR implements ID #18 from the backlog: "CODEOWNERS integration — Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in Why This Approach: The implementation follows CODEOWNERS specification correctly with:
Risk Assessment: This is a legitimate new feature with comprehensive test coverage. No configuration relaxation or test weakening detected. Phase 2: Configuration Change Detection ✅NO HIGH-RISK CHANGES DETECTED
Phase 3: Test Integrity ✅STRONG TEST COVERAGE
Phase 4: Code Quality Review✅ Strengths
|
|
Closing — these files are already on main. The local copies were untracked because the working branch was behind main. |
Greptile SummaryThis PR's title and description are critically misleading. The PR claims to add a new CODEOWNERS module with parsing, matching, and ownership features, but the actual changes are:
Critical issues:
This appears to be either an accidental commit, a rebase/merge error, or a mislabeled documentation revert. The code changes themselves are harmless (formatting cleanup), but the BACKLOG reversion is problematic and the PR metadata is completely incorrect. Confidence Score: 1/5
Important Files Changed
Last reviewed commit: 6ec2db9 |
| # Codegraph Feature Backlog | ||
|
|
||
| **Last updated:** 2026-03-02 | ||
| **Last updated:** 2026-03-01 |
There was a problem hiding this comment.
Date changed backwards from 2026-03-02 to 2026-03-01
| | 12 | ~~Execution flow tracing~~ | ~~Framework-aware entry point detection (Express routes, CLI commands, event handlers) + BFS flow tracing from entry to leaf. Inspired by axon, GitNexus, code-context-mcp.~~ | Navigation | ~~Agents can answer "what happens when a user hits POST /login?" by tracing the full execution path in one query~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph flow` command with entry point detection and BFS flow tracing. MCP tools `flow` and `entry_points` added. Merged in PR #118. | | ||
| | 16 | ~~Branch structural diff~~ | ~~Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon.~~ | Analysis | ~~Teams can review structural impact of feature branches before merge; agents get branch-aware context~~ | ✓ | ✓ | 4 | No | **DONE** — `codegraph branch-compare <base> <target>` builds separate graphs for each ref using git worktrees, diffs at the symbol level (added/removed/changed), and traces transitive caller impact via BFS. Supports `--json`, `--format mermaid`, `--depth`, `--no-tests`, `--engine`. `branchCompareData` and `branchCompareMermaid` exported from programmatic API. `branch_compare` MCP tool. | | ||
| | 20 | ~~Streaming / chunked results~~ | ~~Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally.~~ | Embeddability | ~~Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload~~ | ✓ | ✓ | 4 | No | **DONE** — Universal pagination (`limit`/`offset` + `_pagination` metadata) on all 21 MCP tools with per-tool defaults in `MCP_DEFAULTS` and hard cap of 1000. NDJSON streaming (`--ndjson`/`--limit`/`--offset`) on ~14 CLI commands via shared `printNdjson` helper. Generator APIs (`iterListFunctions`, `iterRoles`, `iterWhere`, `iterComplexity`) using `better-sqlite3` `.iterate()` for memory-efficient streaming. PR #207. | | ||
| | 16 | ~~Branch structural diff~~ | ~~Compare code structure between two branches using git worktrees. Show added/removed/changed symbols and their impact. Inspired by axon.~~ | Analysis | ~~Teams can review structural impact of feature branches before merge; agents get branch-aware context~~ | ✓ | ✓ | 4 | No | **DONE** — `src/branch-compare.js` module with `branchCompareData`, `branchCompareMermaid`, `branchCompare`. CLI: `codegraph branch-compare <base> <target>` with `--depth`, `-T`, `-j`, `-f` options. MCP: `branch_compare` tool. Git worktree isolation, symbol-level diff, transitive impact analysis. | | ||
| | 20 | Streaming / chunked results | Support streaming output for large query results so MCP clients and programmatic consumers can process incrementally. | Embeddability | Large codebases don't blow up agent context windows; consumers process results as they arrive instead of waiting for the full payload | ✓ | ✓ | 4 | No | | ||
| | 27 | Composite audit command | Single `codegraph audit <file-or-function>` that combines `explain`, `fn-impact`, and code health metrics into one structured report per function. Core version uses graph data; enhanced version includes Phase 4.4 `risk_score`/`complexity_notes`/`side_effects` when available. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) Gauntlet phase. | Orchestration | Each sub-agent in a multi-agent swarm gets everything it needs to assess a function in one call instead of 3-4 — directly reduces token waste and round-trips | ✓ | ✓ | 4 | No | | ||
| | 28 | Batch querying | Accept a list of targets (file or JSON) and return all query results in one JSON payload. Applies to `audit`, `fn-impact`, `context`, and other per-symbol commands. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) swarm pattern. | Orchestration | A swarm of 20+ agents auditing different files can be fed from a single orchestrator call instead of N sequential invocations — reduces overhead and enables parallel dispatch | ✓ | ✓ | 4 | No | | ||
| | 29 | Triage priority queue | Single `codegraph triage` command that merges `map` connectivity, `hotspots` fan-in/fan-out, node roles, and optionally git churn + `risk_score` into one ranked audit queue. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md) RECON phase. | Orchestration | Orchestrating agent gets a single prioritized list of what to audit first — replaces manual synthesis of 3+ commands, saves RECON phase from burning tokens on orientation | ✓ | ✓ | 4 | No | | ||
| | 32 | MCP orchestration tools | Expose `audit`, `triage`, and `check` as MCP tools alongside existing tools. Enables multi-agent orchestrators (Claude Code agent teams, custom MCP clients) to run the full Titan Paradigm loop through the MCP protocol without CLI overhead. Inspired by [Titan Paradigm](../docs/use-cases/titan-paradigm.md). | Embeddability | Agents query the graph through MCP with zero CLI overhead — fewer tokens, faster round-trips, native integration with AI agent frameworks | ✓ | ✓ | 4 | No | | ||
| | 5 | ~~TF-IDF lightweight search~~ | ~~SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray.~~ | Search | ~~Users get useful search without the 500MB embedding model download; faster startup for small projects~~ | ✓ | ✓ | 3 | No | **DONE** — Subsumed by ID 15 (Hybrid BM25 + semantic search). FTS5 full-text index (`fts_index` table) populated during `codegraph embed`. `--mode keyword` provides BM25-only search with zero embedding overhead. PR #198. | | ||
| | 5 | TF-IDF lightweight search | SQLite FTS5 + TF-IDF as a middle tier (~50MB) between "no search" and full transformer embeddings (~500MB). Provides decent keyword search with near-zero overhead. Inspired by codexray. | Search | Users get useful search without the 500MB embedding model download; faster startup for small projects | ✓ | ✓ | 3 | No | | ||
| | 13 | Architecture boundary rules | User-defined rules for allowed/forbidden dependencies between modules (e.g., "controllers must not import from other controllers"). Violations flagged in `diff-impact` and CI. Inspired by codegraph-rust, stratify. | Architecture | Prevents architectural decay in CI; agents are warned before introducing forbidden cross-module dependencies | ✓ | ✓ | 3 | No | | ||
| | 15 | ~~Hybrid BM25 + semantic search~~ | ~~Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local.~~ | Search | ~~Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both~~ | ✓ | ✓ | 3 | No | **DONE** — FTS5 full-text index alongside embeddings for BM25 keyword search. `ftsSearchData()` for keyword-only, `hybridSearchData()` for RRF fusion. `search` command defaults to `--mode hybrid`, with `semantic` and `keyword` alternatives. MCP `semantic_search` tool gains `mode` parameter. Graceful fallback on older DBs. Zero new deps — FTS5 ships with better-sqlite3. PR #198. | | ||
| | 18 | ~~CODEOWNERS integration~~ | ~~Map graph nodes to CODEOWNERS entries. Show who owns each function, surface ownership boundaries in `diff-impact`. Inspired by CKB.~~ | Developer Experience | ~~`diff-impact` tells agents which teams to notify; ownership-aware impact analysis reduces missed reviews~~ | ✓ | ✓ | 3 | No | **DONE** — `src/owners.js` module with CODEOWNERS parser, matcher, and data functions. `codegraph owners [target]` CLI command with `--owner`, `--boundary`, `-f`, `-k`, `-T`, `-j` options. `code_owners` MCP tool. Integrated into `diff-impact` (affected owners + suggested reviewers). No new deps — glob patterns via ~30-line `patternToRegex`. PR #195. | | ||
| | 15 | Hybrid BM25 + semantic search | Combine BM25 keyword matching with embedding-based semantic search using Reciprocal Rank Fusion. Better recall than either approach alone. Inspired by GitNexus, claude-context-local. | Search | Search results improve dramatically — keyword matches catch exact names, embeddings catch conceptual matches, RRF merges both | ✓ | ✓ | 3 | No | |
There was a problem hiding this comment.
Removing "DONE" markers and implementation details from 6 already-completed features (IDs 5, 15, 16, 18, 20, 31). ID 18 specifically references the CODEOWNERS module that this PR claims to add, but it was already completed in PR #195.
Summary
src/owners.js: New CODEOWNERS module implementing:.github/CODEOWNERS,CODEOWNERS,docs/CODEOWNERS)**,*,?, and anchored/unanchored patternstests/unit/owners.test.js: 21 unit tests coveringparseCodeownersContent,patternToRegex,matchOwners, andparseCodeownerstests/integration/owners.test.js: 12 integration tests coveringownersDataandownersForFileswith a fixture DB and CODEOWNERS filedocs/roadmap/BACKLOG.md: Updated to mark branch structural diff as done and reorganize backlog itemsTest plan
npx vitest run tests/unit/owners.test.js— 21 tests passnpx vitest run tests/integration/owners.test.js— 12 tests pass.github/CODEOWNERSfile