optave · carlos-alm · Mar 3, 2026 · Mar 3, 2026
diff --git a/docs/roadmap/ROADMAP.md b/docs/roadmap/ROADMAP.md
@@ -15,7 +15,7 @@ Codegraph is a strong local-first code graph CLI. This roadmap describes planned
 | [**1**](#phase-1--rust-core) | Rust Core | Rust parsing engine via napi-rs, parallel parsing, incremental tree-sitter, JS orchestration layer | **Complete** (v1.3.0) |
 | [**2**](#phase-2--foundation-hardening) | Foundation Hardening | Parser registry, complete MCP, test coverage, enhanced config, multi-repo MCP | **Complete** (v1.4.0) |
 | [**2.5**](#phase-25--analysis-expansion) | Analysis Expansion | Complexity metrics, community detection, flow tracing, co-change, manifesto, boundary rules, check, triage, audit, batch, hybrid search | **Complete** (v2.6.0) |
-| [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring | Command/query separation, repository pattern, queries.js decomposition, composable MCP, CLI commands, domain errors, curated API, unified graph model | Planned |
+| [**3**](#phase-3--architectural-refactoring) | Architectural Refactoring | Command/query separation, repository pattern, queries.js decomposition, composable MCP, CLI commands, domain errors, curated API, unified graph model, dead symbol cleanup, community drift reduction, break function cycles | Planned |
 | [**4**](#phase-4--typescript-migration) | TypeScript Migration | Project setup, core type definitions, leaf -> core -> orchestration module migration, test migration | Planned |
 | [**5**](#phase-5--intelligent-embeddings) | Intelligent Embeddings | LLM-generated descriptions, enhanced embeddings, build-time semantic metadata, module summaries | Planned |
 | [**6**](#phase-6--natural-language-queries) | Natural Language Queries | `ask` command, conversational sessions, LLM-narrated graph queries, onboarding tools | Planned |
@@ -623,7 +623,62 @@ The repository pattern (3.2) enables true unit testing:
 
 **Current gap:** Many "unit" tests still hit SQLite because there's no repository abstraction.
 
-### 3.14 -- Remaining Items (Lower Priority)
+### 3.14 -- Dead Symbol Cleanup
+
+**Current state:** Role classification reports 221 dead symbols -- 27% of all classified code. Root causes: the dual-function pattern (every `*Data()` has an uncalled `*()` counterpart), 120+ speculative exports in `index.js`, and rapid feature addition without pruning.
+
+**Deliverables:**
+
+1. **Audit pass:** Categorize all dead symbols as truly dead (remove), entry points (annotate), or public API (keep or drop from `index.js`)
+2. **Manifesto rule:** Add `max-dead-ratio` with `warn: 0.15`, `fail: 0.25` to prevent regression
+3. **CI gate:** `codegraph check` fails if dead ratio exceeds threshold
+
+**Note:** Sections 3.1 (command/query separation) and 3.6 (curated API surface) eliminate the two biggest dead-code factories. This item captures the explicit cleanup and prevention gate.
+
+**Target:** Reduce dead symbol ratio from 27% to under 10%.
+
+**Affected files:** `src/manifesto.js`, `src/check.js`, dead code across all modules
+
+### 3.15 -- Community Drift Reduction
+
+**Current state:** Louvain community detection reports 40% drift -- files belong to a different logical community than their directory placement suggests. The flat `src/` layout with 35 modules gives no structural signal about which modules are coupled.
+
+**Deliverables:**
+
+1. **Directory restructuring:** Align file organization to detected communities (this happens naturally through 3.1-3.11):
+   ```
+   src/
+     analysis/       # Community: query/impact/context/explain/roles
+     commands/       # Community: CLI-specific formatting
+     health/         # Community: audit/triage/manifesto/check/complexity
+     graph/          # Community: structure/communities/cochange/cycles
+     infrastructure/ # Community: db/pagination/config/logger
+   ```
+2. **Track drift as a metric:** Add modularity score and drift percentage to `codegraph stats` output
+3. **Manifesto rule:** Add `max-community-drift` with `warn: 0.30`, `fail: 0.45`
+
+**Target:** Reduce drift from 40% to under 20%.
+
+**Affected files:** `src/communities.js`, `src/manifesto.js`, `src/queries.js` (stats), directory structure
+
+### 3.16 -- Break Function-Level Cycles
+
+**Current state:** 9 function-level circular dependencies. File-level imports are acyclic, but function call graphs contain mutual recursion and indirect loops. These make impact analysis unreliable and complicate module decomposition.
+
+**Deliverables:**
+
+1. **Classify each cycle:**
+   - **Intentional recursion** (tree walkers, AST visitors) -- document and exempt from CI gate
+   - **Accidental coupling** (A→B→C→A) -- refactor by extracting shared logic or inverting dependencies
+   - **Layering violations** (query→builder→query) -- break with parameter passing or interface boundaries
+2. **Break accidental cycles** through extraction, dependency inversion, or callback patterns
+3. **CI gate:** Add `no-new-cycles` predicate to `codegraph check` at function scope
+
+**Target:** 0 accidental cycles (intentional recursion documented and exempted).
+
+**Affected files:** `src/check.js`, functions involved in the 9 cycles
+
+### 3.17 -- Remaining Items (Lower Priority)
 
 These items from the original Phase 3 are still valid but less urgent:
 

diff --git a/generated/architecture.md b/generated/architecture.md
@@ -564,6 +564,116 @@ await pipeline.run(rootDir)
 
 ---
 
+## 18. Dead Symbol Cleanup -- 27% of Classified Code Is Unused
+
+**Not in original analysis** -- the `roles` classification that surfaces dead symbols didn't exist yet.
+
+**Current state:** Codegraph's own role classification reports 221 dead symbols -- 27% of all classified code. In a project this young (~10 days old at time of measurement), a quarter of the symbols being unused signals systematic overproduction: speculative helpers, leftover refactoring artifacts, and the dual-function pattern generating display functions that nothing calls.
+
+**Root causes:**
+- The `*Data()` / `*()` dual-function pattern (Section 1) means every data function has a display counterpart. MCP and programmatic consumers only call `*Data()`, leaving many `*()` functions uncalled
+- `index.js` exports 120+ symbols (Section 13) with no consumer tracking -- functions are exported "just in case"
+- Rapid feature addition without pruning -- each new module adds helpers that may only be used during development
+
+**Ideal approach -- continuous dead code hygiene:**
+
+1. **Audit pass:** Run `codegraph roles --role dead -T` and categorize results:
+   - **Truly dead:** Remove immediately (unused helpers, orphaned formatters)
+   - **Entry points:** CLI handlers, MCP tool handlers, test utilities -- mark as `@entry` or add to a known-entries list so the classifier doesn't flag them
+   - **Public API:** Exported but uncalled internally -- decide if they're part of the supported API or remove from `index.js`
+
+2. **CI gate:** Add a dead-symbol threshold to `manifesto.js` rules:
+   ```json
+   {
+     "rule": "max-dead-ratio",
+     "warn": 0.15,
+     "fail": 0.25,
+     "message": "Dead symbol ratio exceeds {threshold}"
+   }
+   ```
+
+3. **Prevention:** The Command/Query separation (Section 1) and curated API surface (Section 13) eliminate the two biggest dead-code factories. Once display functions are internal to the CLI layer and exports are curated, new dead code becomes visible immediately.
+
+**Target:** Reduce dead symbol ratio from 27% to under 10%.
+
+---
+
+## 19. Community Drift -- 40% of Files Are in the Wrong Logical Module
+
+**Not in original analysis** -- `communities.js` didn't exist yet.
+
+**Current state:** Louvain community detection on the dependency graph finds that 40% of files belong to a different logical community than their directory suggests. This means the file organization actively misleads developers about which modules are coupled.
+
+**What drift means concretely:**
+- Files in `src/` root that should be grouped (e.g., `triage.js`, `audit.js`, `manifesto.js` form a "code health" community but live alongside unrelated modules)
+- Utility functions in domain modules that are actually shared infrastructure
+- Tight coupling between files in different conceptual areas (e.g., `structure.js` and `queries.js` are more coupled to each other than to their neighbors)
+
+**Ideal approach -- align directory structure to communities:**
+
+1. **Measure baseline:** `codegraph communities -T` to get current modularity score and drift percentage
+2. **Map communities to directories:** The restructuring proposed in Sections 1, 3, 4, 5 would naturally create directories that match logical communities:
+   ```
+   src/
+     analysis/       # Community: query/impact/context/explain/roles
+     commands/       # Community: CLI-specific formatting
+     health/         # Community: audit/triage/manifesto/check/complexity
+     graph/          # Community: structure/communities/cochange/cycles
+     infrastructure/ # Community: db/pagination/config/logger
+   ```
+3. **Track drift as a metric:** Add modularity score and drift percentage to `stats` output. Regressing drift should trigger a warning.
+4. **CI gate:** Add a drift threshold to `manifesto.js`:
+   ```json
+   {
+     "rule": "max-community-drift",
+     "warn": 0.30,
+     "fail": 0.45,
+     "message": "Community drift exceeds {threshold}"
+   }
+   ```
+
+**Target:** Reduce drift from 40% to under 20% through directory restructuring.
+
+---
+
+## 20. Function-Level Cycles -- 9 Circular Dependencies
+
+**Not in original analysis** -- cycle detection existed but function-level cycles weren't measured.
+
+**Current state:** `codegraph cycles` reports 9 function-level circular dependencies. While the codebase has no file-level cycles (imports are acyclic), function call graphs contain mutual recursion and indirect loops.
+
+**Why this matters:**
+- Circular call chains make impact analysis unreliable -- a change to any function in a cycle potentially affects all others
+- They complicate the proposed decomposition (Sections 1, 3) -- you can't cleanly split modules if their functions are mutually dependent
+- They indicate hidden coupling that the module structure doesn't reveal
+
+**Ideal approach:**
+
+1. **Identify and classify:** Run `codegraph cycles` and categorize each cycle:
+   - **Intentional recursion:** Mutual recursion in tree walkers, AST visitors -- document with comments, exclude from CI gates
+   - **Accidental coupling:** Function A calls B which calls C which calls A -- these need refactoring
+   - **Layering violations:** A query function calling a builder function that calls back into queries -- break by introducing an interface boundary
+
+2. **Break accidental cycles:**
+   - **Extract shared logic:** If A and B both need the same computation, extract it to a third function that both call
+   - **Invert dependencies:** If a low-level function calls a high-level one, pass the needed data as a parameter instead
+   - **Event/callback:** For unavoidable bidirectional communication, use callbacks or events instead of direct calls
+
+3. **CI gate:** Add to `check.js` predicates:
+   ```json
+   {
+     "rule": "no-new-cycles",
+     "scope": "function",
+     "message": "New function-level cycle introduced: {cycle}"
+   }
+   ```
+
+4. **Prevention:** The layered architecture proposed throughout this document (analysis → infrastructure → db) naturally prevents cycles -- lower layers never import from higher layers.
+
+**Target:** Reduce from 9 cycles to 0 accidental cycles (intentional recursion documented and exempted).
+
+---
+
 ## Remaining Items (Unchanged from Original)
 
 - **Config profiles (S8):** Single flat config, no monorepo profiles. Still relevant but not blocking anything.
@@ -593,13 +703,16 @@ await pipeline.run(rootDir)
 | **14** | **Testing pyramid with InMemoryRepository** | **Medium** | Quality | S11 (unchanged) |
 | **15** | **Event-driven pipeline for streaming** | **Medium** | Scalability, UX | S7 (unchanged) |
 | **16** | **Query result caching (25 MCP tools)** | **Low-Medium** | Performance | S14 (unchanged) |
-| **17** | **Unified engine interface (Strategy)** | **Low-Medium** | Abstraction | S6 (was Medium-High) |
-| **18** | **Subgraph export with filtering** | **Low-Medium** | Usability | S16 (unchanged) |
-| **19** | **Transitive import-aware confidence** | **Low** | Accuracy | S9 (unchanged) |
-| **20** | **Parser plugin system** | **Low** | Modularity | S1 (was High -- parser.js shrank to 404 lines) |
-| **21** | **Config profiles for monorepos** | **Low** | Feature | S8 (unchanged) |
-
-**The structural priority shifted.** In the original analysis, the parser monolith was #1 -- it's now #20 because the native engine solved it. The new #1 is the command/query separation: the dual-function anti-pattern replicated across 15 modules is the single biggest source of code duplication and coupling in the codebase. Items 1-3 are the foundation -- they restructure the core and everything else becomes easier. Items 4-7 are high-impact but can be done in parallel. Items 8-10 are large-file decompositions that follow naturally once the shared infrastructure exists.
+| **17** | **Dead symbol cleanup (27% dead code ratio)** | **Medium** | Code hygiene | New |
+| **18** | **Reduce community drift (40% misplaced files)** | **Medium** | Cohesion | New |
+| **19** | **Break function-level cycles (9 circular deps)** | **Medium** | Correctness | New |
+| **20** | **Unified engine interface (Strategy)** | **Low-Medium** | Abstraction | S6 (was Medium-High) |
+| **21** | **Subgraph export with filtering** | **Low-Medium** | Usability | S16 (unchanged) |
+| **22** | **Transitive import-aware confidence** | **Low** | Accuracy | S9 (unchanged) |
+| **23** | **Parser plugin system** | **Low** | Modularity | S1 (was High -- parser.js shrank to 404 lines) |
+| **24** | **Config profiles for monorepos** | **Low** | Feature | S8 (unchanged) |
+
+**The structural priority shifted.** In the original analysis, the parser monolith was #1 -- it's now #23 because the native engine solved it. The new #1 is the command/query separation: the dual-function anti-pattern replicated across 15 modules is the single biggest source of code duplication and coupling in the codebase. Items 1-3 are the foundation -- they restructure the core and everything else becomes easier. Items 4-7 are high-impact but can be done in parallel. Items 8-10 are large-file decompositions that follow naturally once the shared infrastructure exists. Items 17-19 (dead symbols, community drift, function cycles) are health metrics that improve naturally as the structural changes land -- but also benefit from explicit CI gates to prevent regression.
 
 ---