docs: add community leverage features to backlog (IDs 100-102) by carlos-alm · Pull Request #546 · optave/codegraph

carlos-alm · 2026-03-20T07:51:43Z

Summary

Add three new backlog items leveraging the vendored Leiden algorithm:
- #100 Weighted community labels: auto-generate human-readable names from member files/symbols
- #101 Hierarchical community decomposition: multi-resolution nested communities
- #102 Community-aware impact scoring: factor community boundaries into blast radius risk
Mark #96 (README dep count) as done after graphology removal
Update #54 to reference Leiden instead of Louvain

Test plan

Verify BACKLOG.md renders correctly on GitHub
New IDs don't conflict with existing entries

Add three Leiden-powered features: weighted community labels (#97), hierarchical decomposition (#98), and community-aware impact scoring (#99). Mark ID 96 (README dep count) as done after graphology removal. Update ID 54 to reference Leiden instead of Louvain.
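
For readers skimming the thread, the community-aware impact scoring idea can be illustrated with a small sketch. This is not codegraph's implementation: the function names, the sample symbols, and the additive weighting are all assumptions made for illustration, and `crossCommunityCount` is interpreted here as the number of communities touched outside the changed symbol's own.

```python
from typing import Dict, Iterable


def cross_community_count(
    changed: str,
    impacted: Iterable[str],
    community_of: Dict[str, int],
) -> int:
    """Count distinct communities, other than the changed symbol's own, touched by the blast radius."""
    home = community_of[changed]
    touched = {community_of[s] for s in impacted if s in community_of}
    touched.discard(home)
    return len(touched)


def risk_score(blast_radius: int, crossings: int, crossing_weight: float = 2.0) -> float:
    """Hypothetical weighting: each extra community crossed adds `crossing_weight` to the raw blast radius."""
    return blast_radius + crossing_weight * crossings


# Hypothetical community assignments (symbol -> community ID).
community_of = {
    "parse": 0, "lex": 0, "tokenize": 0, "ast": 0, "walk": 0, "emit": 0,
    "render": 1, "query": 2, "embed": 3,
}

local = ["lex", "tokenize", "ast", "walk", "emit"]        # 5 impacted, all in community 0
spread = ["lex", "tokenize", "render", "query", "embed"]  # 5 impacted, spanning 4 communities

local_crossings = cross_community_count("parse", local, community_of)
spread_crossings = cross_community_count("parse", spread, community_of)
print(local_crossings, risk_score(len(local), local_crossings))
print(spread_crossings, risk_score(len(spread), spread_crossings))
```

Both cases have a blast radius of 5, but only the second crosses community boundaries, so it scores higher under this assumed weighting.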

claude · 2026-03-20T07:51:55Z

Claude finished @carlos-alm's task

PR Review: Community Leverage Features to Backlog ❌

Todo List:

Read and analyze BACKLOG.md changes
Phase 1: Root cause analysis
Phase 2: Configuration change detection
Phase 3: Test integrity check
Phase 4: Code quality assessment
Phase 5: Backlog compliance evaluation
Provide final recommendation

Root Cause Analysis

Problem Being Solved: Adding three new backlog items to leverage the recently vendored Leiden algorithm for improved community detection features, updating terminology from Louvain to Leiden, and marking a completed item as done.

Why This Approach: The PR leverages the newly vendored Leiden algorithm by proposing features that build on the existing communities infrastructure. This is a logical follow-up to the Leiden integration.

Risk Assessment: CRITICAL ISSUE FOUND — ID conflict in backlog entries.

Backlog Compliance

Zero-dep: ✓ — All three new features build on existing tree-sitter/SQLite/AST infrastructure
Foundation-aligned: ✓ — Aligns with P1 (always-current), P4 (zero-cost core), P5 (embeddable-first), P8 (planning/developing focus)
Problem-fit: 3-4 — Improves blast radius accuracy (ID 99 scores 4), architectural understanding (IDs 97,98 score 3)
Breaking: No — All features are purely additive
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

❌ BLOCKING: Duplicate ID 97

The PR introduces ID 97 for "Weighted community labels" in the new "Tier 1e′" section at line 121, but ID 97 already exists in "Tier 1i" at line 185 for "Unified multi-repo graph". This creates a conflict where the same ID refers to two completely different features.

❌ BLOCKING: Inconsistent ID sequence

From scanning the backlog, the highest ID numbers in use appear to be in the mid-90s range. The new community features should use the next available sequential IDs (likely 100, 101, 102) rather than conflicting with existing entries.

Phase Assessment Results

Phase 1 ✅: Root cause is sound — leveraging vendored Leiden algorithm with additional features.

Phase 2 ✅: No configuration changes (documentation-only PR).

Phase 3 ✅: No test integrity concerns (documentation-only).

Phase 4 ❌: Critical ID conflict prevents approval. Content quality is otherwise excellent:

New section "Tier 1e′" follows proper formatting
Item descriptions are comprehensive and well-written
Item #54 correctly updated from "Louvain" to "Leiden"
Item #96 correctly marked as DONE with proper explanation
All features align with established patterns and quality standards

Phase 5 ✅: Features score well on backlog compliance criteria.

Required Fix

Before this PR can be approved, the ID conflict must be resolved:

Change IDs 97, 98, 99 to 100, 101, 102 (or whatever the next available sequential IDs are)
Verify no other ID conflicts exist across the entire backlog
Update any internal references if the new features reference each other by ID
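
The "verify no other ID conflicts" step could be automated with something like the sketch below. It assumes backlog rows are pipe-table lines whose first cell is a numeric ID; the sample table is a simplified stand-in rather than the real BACKLOG.md, and `find_duplicate_ids` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, List

# A table row starts with optional whitespace, a pipe, then a numeric ID cell.
ROW_ID = re.compile(r"^\s*\|\s*(\d+)\s*\|")


def find_duplicate_ids(markdown: str) -> Dict[int, List[int]]:
    """Map each ID used by more than one table row to the 1-based line numbers using it."""
    seen: Dict[int, List[int]] = defaultdict(list)
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = ROW_ID.match(line)
        if match:
            seen[int(match.group(1))].append(lineno)
    return {row_id: lines for row_id, lines in seen.items() if len(lines) > 1}


# Simplified sample reproducing the collision reported in this review.
sample = """\
| ID | Title |
|----|-------|
| 96 | README dep count |
| 97 | Weighted community labels |
| 98 | Hierarchical community decomposition |
| 97 | Unified multi-repo graph |
"""
duplicates = find_duplicate_ids(sample)
print(duplicates)
```

Against the real file, the same function could be fed the contents of docs/roadmap/BACKLOG.md; an empty result would mean no two rows share an ID.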

Final Recommendation

Rating: ⭐⭐☆☆☆ (2/5)
Action: REQUEST CHANGES
Reasoning: Critical ID conflict makes the PR non-mergeable despite high content quality. The features themselves are well-designed and aligned with codegraph's mission, but the duplicate ID 97 must be fixed before approval.

greptile-apps · 2026-03-20T07:54:26Z

Greptile Summary

This PR adds three new backlog items (#100–#102) under a new "Tier 1e.1 — Community leverage" section, all building on the upcoming vendored Leiden algorithm (PR #545). It also marks item #96 as BLOCKED on #545 (correctly reverting a premature DONE), and renames the algorithm reference in item #54 from Louvain to Leiden. All previous review feedback has been properly addressed — the section is now forward-looking, tier naming uses ASCII-safe 1e.1, IDs are conflict-free, and items #100–#102 correctly list #545 in "Depends on".

Key changes:

New Tier 1e.1 section with items #100 (weighted community labels), #101 (hierarchical decomposition), #102 (community-aware impact scoring), all correctly gated on #545
Item #96 reverted from DONE to BLOCKED with an accurate explanation tied to #545
Item #54 description updated from "Louvain communities" to "Leiden communities"

One remaining inconsistency:

Item #54's description now explicitly says "Leiden communities", but its "Depends on" column still shows —. The identical pattern (explicit Leiden reference without a #545 dependency) was caught and fixed for #100–#102 in a prior review. The same fix should apply here, or the description should be made algorithm-agnostic so the feature can be implemented today against the existing graphology-communities-louvain backend.
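
A check for this kind of inconsistency might look like the following sketch. It is an illustration under assumptions: the sample table is cut down to four columns (the real backlog has more), the "Depends on" column index is passed in explicitly, and `rows_missing_dependency` is a hypothetical helper.

```python
import re
from typing import List

# Split on pipes that are not escaped as \| inside a cell.
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def rows_missing_dependency(
    markdown: str,
    depends_col: int,
    keyword: str = "Leiden",
    dependency: str = "#545",
) -> List[int]:
    """Return IDs of rows that mention `keyword` but do not list `dependency` in the depends-on cell."""
    flagged = []
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in UNESCAPED_PIPE.split(line.strip().strip("|"))]
        if not cells[0].isdigit() or len(cells) <= depends_col:
            continue  # header, separator, prose, or a row that is too short
        mentions = any(keyword in cell for cell in cells[:depends_col])
        if mentions and dependency not in cells[depends_col]:
            flagged.append(int(cells[0]))
    return flagged


# Simplified four-column sample: ID | Title | Description | Depends on
sample = """\
| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. | — |
| 100 | Weighted community labels | Auto-generate a human-readable label for each community. | #545 |
| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels. | #545 |
"""
flagged = rows_missing_dependency(sample, depends_col=3)
print(flagged)
```

Only the row for item 54 is flagged in this sample, which matches the inconsistency described above.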

Confidence Score: 4/5

Safe to merge with one minor inconsistency: item #54's "Depends on" column was not updated alongside its Leiden naming change.
All previous review issues have been thoroughly addressed. The one remaining inconsistency (item #54 now says "Leiden communities" but lacks a #545 dependency, unlike the identical pattern fixed for #100–#102) is a documentation-only concern with no runtime impact. The backlog IDs are conflict-free, the section prose is forward-looking and accurate, and the three new items are well-specified with correct dependencies.
docs/roadmap/BACKLOG.md, specifically item #54's "Depends on" column at line 112.

Important Files Changed

| Filename | Overview |
|----------|----------|
| docs/roadmap/BACKLOG.md | Adds Tier 1e.1 section with three new community-leverage backlog items (#100–#102, all correctly depending on #545), marks #96 as BLOCKED on #545, and updates #54 to reference Leiden. One inconsistency remains: #54's description now says "Leiden communities" but its "Depends on" column was not updated to #545, unlike the identical pattern that was fixed for #100–#102. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    P545["PR #545\nVendor Leiden algorithm"]

    P545 --> I96["#96 Fix README dep count\n(BLOCKED → resolves automatically)"]
    P545 --> I100["#100 Weighted community labels\n(new, Tier 1e.1)"]
    P545 --> I101["#101 Hierarchical community decomposition\n(new, Tier 1e.1)"]
    P545 --> I102["#102 Community-aware impact scoring\n(new, Tier 1e.1)"]

    I54["#54 Co-change vs dependency communities\n(updated: Louvain → Leiden)\nDepends on: —  ⚠️"]

    P545 -.->|"description says 'Leiden communities'\nbut Depends on = —"| I54

    style P545 fill:#4a90d9,color:#fff
    style I96 fill:#f0ad4e,color:#000
    style I100 fill:#5cb85c,color:#fff
    style I101 fill:#5cb85c,color:#fff
    style I102 fill:#5cb85c,color:#fff
    style I54 fill:#d9534f,color:#fff

Last reviewed commit: "docs: add #545 depen..."

greptile-apps · 2026-03-20T07:54:29Z

docs/roadmap/BACKLOG.md

+| 97 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
+| 98 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
+| 99 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |


Duplicate ID 97 — conflicts with existing "Unified multi-repo graph" entry

The three new community leverage items are numbered 97, 98, 99, but ID 97 was already assigned to the "Unified multi-repo graph" item on line 185. The base branch (45585a9) confirms that IDs 1–97 were fully occupied before this PR.

The test plan checkbox "New IDs don't conflict with existing entries" remains unchecked — and indeed the conflict exists.

IDs 98 and 99 were not previously used, so only #97 collides. The new items should be renumbered starting at 100:

Suggested change

| 97 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |

| 98 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |

| 99 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |

| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |

| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |

| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |

Also update the PR title and description to reference IDs 100–102.

Fixed — renumbered all three items from 97/98/99 to 100/101/102. No conflicts with existing entries now.

greptile-apps · 2026-03-20T07:54:30Z

docs/roadmap/BACKLOG.md


+### Tier 1e′ — Community leverage (build on vendored Leiden algorithm)
+
+Community detection uses a vendored Leiden/Louvain optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.


Section description still uses "Leiden/Louvain" after Louvain references are being updated

The new section's introductory paragraph says "vendored Leiden/Louvain optimiser", yet one stated goal of this PR is to replace Louvain references with Leiden (as done for item #54). This is inconsistent — if the vendored package is now purely the Leiden algorithm, it should be described as such here too.

Suggested change

Community detection uses a vendored Leiden/Louvain optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

Fixed — updated section description to say "vendored Leiden optimiser" instead of "Leiden/Louvain optimiser".

IDs 97-99 conflicted with existing entry #97 (unified multi-repo graph). Renumbered to 100-102. Also corrected "Leiden/Louvain" to "Leiden" in the section description for consistency with the rest of the PR.

…to docs/backlog-leiden-features

carlos-alm · 2026-03-20T08:20:40Z

Addressed both review concerns: renumbered IDs 97/98/99 to 100/101/102 (no more collision with existing ID 97), and corrected section description from "Leiden/Louvain" to "Leiden".

carlos-alm · 2026-03-20T08:20:45Z

@greptileai

carlos-alm · 2026-03-20T08:20:50Z

@claude

claude · 2026-03-20T08:21:01Z

Claude finished @carlos-alm's task

Claude analyzing PR changes ✅

Todo List:

Read current BACKLOG.md to verify changes
Check if ID conflicts are resolved (100-102 vs original 97-99)
Verify terminology consistency (Leiden vs Louvain)
Provide updated review assessment

✅ Fixes Verified - All Issues Resolved

Both critical concerns from the previous review have been successfully addressed:

✅ ID Conflict Fixed

Issue: Duplicate ID 97 conflicted with existing "Unified multi-repo graph" entry
Status: RESOLVED — New community features are now properly numbered as IDs 100, 101, 102
Verification:
- Line 121: ID 100 "Weighted community labels"
- Line 122: ID 101 "Hierarchical community decomposition"
- Line 123: ID 102 "Community-aware impact scoring"
- Line 185: Original ID 97 "Unified multi-repo graph" remains unchanged

✅ Terminology Consistency Fixed

Issue: Section description said "Leiden/Louvain optimiser" despite updating other Louvain references
Status: RESOLVED — Section now consistently uses "vendored Leiden optimiser"
Verification: Line 117 shows proper terminology: Community detection uses a vendored Leiden optimiser

Updated Assessment

Quality: ⭐⭐⭐⭐⭐ (5/5)

All content is well-written and properly formatted
Features align with existing patterns and architecture
No remaining conflicts or inconsistencies

Backlog Compliance: ✅ EXCELLENT

Zero-dep: ✓ All features build on existing infrastructure
Foundation-aligned: ✓ Aligns with core principles
Problem-fit: 3-4 (solid contribution to blast radius accuracy and architectural understanding)
Breaking: No — All features are purely additive
Tier: 1 (meets all criteria for highest priority tier)

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE ✅
Reasoning: All blocking issues resolved. The PR successfully adds three well-designed community leverage features, correctly updates terminology from Louvain to Leiden, and marks completed work as done. No remaining concerns.

greptile-apps · 2026-03-20T08:23:54Z

docs/roadmap/BACKLOG.md

+| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
+| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
+| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |


PR title and description reference stale IDs 97-99

The items in the file were correctly renumbered to 100, 101, 102 (as requested in a prior review), but the PR title still reads (IDs 97-99) and the PR description still lists **#97** Weighted community labels, **#98** Hierarchical community decomposition, and **#99** Community-aware impact scoring. These should be updated to reference the final IDs so the PR history stays accurate for future readers searching by ID.

Fixed — updated PR title and description to reference the correct IDs 100-102 instead of 97-99.

greptile-apps · 2026-03-20T08:23:55Z

docs/roadmap/BACKLOG.md

+| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
 | 55 | Missing co-change partner warning in `check` | When `--staged` changes touch file A but not its historical partner file B (high Jaccard), emit a warning: "file A historically co-changes with file B — did you forget to update it?" | CI | Catches incomplete changes in CI — the most common source of subtle bugs is changing one file in a coupled pair but forgetting the other | ✓ | ✓ | 4 | No | — |

+### Tier 1e′ — Community leverage (build on vendored Leiden algorithm)


Unconventional prime-suffix tier naming

Inserting Tier 1e′ between Tier 1e (Co-change) and Tier 1f (Embeddings) works around renaming downstream tiers, but the prime ′ (U+2032) is a non-ASCII character that can be hard to type, search, and anchor-link against. Consider renaming to Tier 1e2 or, if the tier ordering matters for sequencing, explicitly noting in the section header why it sits between 1e and 1f (e.g., Tier 1e.1 or a brief comment in the prose). This is a minor consistency nit given the rest of the document uses plain ASCII tier names.


Fixed — renamed Tier 1e′ to Tier 1e.1 for ASCII compatibility and conventional numbering.

carlos-alm · 2026-03-20T09:10:49Z

@greptileai

greptile-apps · 2026-03-20T09:14:48Z

docs/roadmap/BACKLOG.md


+### Tier 1e.1 — Community leverage (build on vendored Leiden algorithm)
+
+Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.


Section description references a non-existent path

The prose says Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`), but this directory does not exist. The actual implementation lives at `src/graph/algorithms/louvain.js` and still imports `graphology-communities-louvain` (confirmed in both the source file and `package.json`). The Leiden vendoring that this whole section's premise is built on has not landed in the codebase yet.

This makes the new Tier 1e.1 section description factually incorrect as written. The path should either be corrected to the real module (src/graph/algorithms/louvain.js), or the section note should be marked as forward-looking ("will use a vendored Leiden optimiser once #X lands") rather than describing the present state.

Fixed — updated section description to be forward-looking: 'will use a vendored Leiden optimiser (PR #545)' instead of asserting the path already exists. Also notes current implementation uses graphology-communities-louvain.

greptile-apps · 2026-03-20T09:14:49Z

docs/roadmap/BACKLOG.md

@@ -190,7 +200,7 @@ Items identified by the architectural audit (v3.1.4) that don't fit existing tie
 | 93 | ~~Shell completion for CLI~~ | ~~Commander supports shell completion but it's not implemented. `codegraph completion bash\|zsh\|fish` outputs the appropriate script. Basic UX gap for a CLI tool with 40+ commands.~~ | Developer Experience | ~~Tab completion makes the CLI discoverable — users find commands without reading docs. Reduces friction for new users and agents exploring available commands~~ | ✓ | ✓ | 2 | No | — | **PROMOTED** — Moved to ROADMAP Phase 7.11 |
 | 94 | VS Code extension | Replace the planned Web UI (removed from roadmap Phase 12) with a VS Code extension providing: webview-based graph visualization (reusing the existing `viewer.js` HTML), go-to-definition via graph edges, inline impact annotations on hover, integration with the MCP server for queries, and a sidebar panel for triage/audit results. VS Code is the right UI target for developer tools in 2026. | Visualization | Developers get graph intelligence directly in their editor — no context switching to a browser or terminal. Impact annotations on hover surface blast radius without running commands | ✗ | ✓ | 3 | No | — |


#96 marked DONE based on a change that hasn't shipped

The DONE note reads: "graphology and graphology-communities-louvain removed by vendoring Leiden algorithm. Runtime dependency count is now back to 3." However, both packages are still present as runtime dependencies in package.json:

"graphology": "^0.26.0", "graphology-communities-louvain": "^2.0.2",

And src/graph/algorithms/louvain.js still imports graphology-communities-louvain as its algorithm backend. The Leiden vendoring (which this DONE justification is predicated on) has not actually been merged — there is no src/graph/algorithms/leiden/ directory in the repository. Marking #96 as DONE here is premature and will mislead readers into thinking the dependency count is already 3 when it is still 5.
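
The dependency-count claim is easy to check mechanically. The sketch below is an illustration only: the two graphology entries are the ones quoted above, while the three `example-dep-*` names are hypothetical placeholders standing in for the other runtime dependencies, which this thread does not name.

```python
import json
from typing import List


def runtime_dependencies(package_json_text: str) -> List[str]:
    """Return the sorted names listed under "dependencies" in a package.json document."""
    manifest = json.loads(package_json_text)
    return sorted(manifest.get("dependencies", {}))


# Hypothetical manifest: only the graphology entries come from the review above.
sample = """
{
  "dependencies": {
    "example-dep-a": "1.0.0",
    "example-dep-b": "1.0.0",
    "example-dep-c": "1.0.0",
    "graphology": "^0.26.0",
    "graphology-communities-louvain": "^2.0.2"
  }
}
"""
deps = runtime_dependencies(sample)
still_present = [name for name in deps if name.startswith("graphology")]
print(len(deps), still_present)
```

While `still_present` is non-empty, a DONE note claiming the count is back to 3 would not match the manifest.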

Fixed — reverted #96 from DONE to BLOCKED on #545. graphology and graphology-communities-louvain are still live runtime dependencies until the Leiden vendoring PR merges.

The section description referenced src/graph/algorithms/leiden/ which does not exist yet (pending PR #545). Item #96 was incorrectly marked DONE when graphology is still a live dependency. Updated section to be forward-looking and reverted #96 to BLOCKED on #545.

carlos-alm · 2026-03-20T10:17:57Z

@greptileai

greptile-apps · 2026-03-20T10:20:26Z

docs/roadmap/BACKLOG.md

+| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
+| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
+| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |


"Depends on" column should reference #545 for items #100–#102

The section description explicitly states these features are available "once the Leiden vendoring lands", yet all three items still have — in the "Depends on" column. This is inconsistent with:

The section's own prose: "These features deepen community awareness across the analysis pipeline once the Leiden vendoring lands."

Item #96, which was just corrected to list #545 in the same column for the same reason.

A reader scanning the table independently of the section header would see no dependency and might schedule any of these items before #545 is merged, only to discover the Leiden resolution/quality-function API they need isn't available yet.

Suggested change

-| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
-| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
-| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |
+| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | #545 |
+| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | #545 |
+| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | #545 |
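To make the item 102 description concrete, here is a minimal sketch of one way a `crossCommunityCount` could be derived and folded into a risk score. Only the `crossCommunityCount` name comes from the backlog text. Reading it as "number of distinct communities in the impact set", the `communityOf` map, the `communityAwareRisk` helper, and the `boundaryWeight` value of 0.5 are assumptions for illustration, not codegraph's implementation.

```javascript
// Hypothetical sketch: these helpers do not exist in codegraph as written.
// communityOf maps a symbol or file ID to its community ID.

/** Count the distinct communities touched by an impact set. */
function crossCommunityCount(impacted, communityOf) {
  const communities = new Set();
  for (const id of impacted) {
    if (communityOf.has(id)) communities.add(communityOf.get(id));
  }
  return communities.size;
}

/**
 * Scale blast radius by the number of community boundaries crossed.
 * boundaryWeight is an assumed tuning knob, not a codegraph setting.
 */
function communityAwareRisk(impacted, communityOf, boundaryWeight = 0.5) {
  const blastRadius = impacted.length;
  const crossed = Math.max(0, crossCommunityCount(impacted, communityOf) - 1);
  return blastRadius * (1 + boundaryWeight * crossed);
}

const communityOf = new Map([
  ['a', 0], ['b', 0], ['c', 0], ['d', 0], ['e', 0],
  ['p', 0], ['q', 1], ['r', 2], ['s', 3], ['t', 3],
]);

const local = ['a', 'b', 'c', 'd', 'e'];  // blast radius 5, one community
const spread = ['p', 'q', 'r', 's', 't']; // blast radius 5, four communities

console.log(communityAwareRisk(local, communityOf));  // 5
console.log(communityAwareRisk(spread, communityOf)); // 12.5
```

With these assumed weights, the two impact sets from the description (radius 5 in one community versus radius 5 across four) end up with clearly different scores even though their raw fan-out is identical.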

Fixed — added #545 as dependency for all three community leverage items (100, 101, 102) in the 'Depends on' column, consistent with the section description and #96.

carlos-alm · 2026-03-20T10:23:32Z

@greptileai

greptile-apps · 2026-03-20T10:26:35Z

docs/roadmap/BACKLOG.md

 |----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
 | 53 | Co-change coupling in triage | Files with high co-change Jaccard scores are maintenance risks — changing one always requires changing the other. Add `coChangeCoupling` as a triage scoring dimension using the max Jaccard score for each file. | Intelligence | Triage surfaces tightly coupled file pairs that static dependency analysis may miss — hidden coupling from shared business logic or implicit contracts | ✓ | ✓ | 4 | No | — |
-| 54 | Co-change communities vs dependency communities | Compare Louvain communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
+| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |


Item #54 "Depends on" not updated after renaming to "Leiden communities"

The description was updated from "Compare Louvain communities" to "Compare Leiden communities", but the "Depends on" column still shows —. This is the same inconsistency that was caught and fixed for items #100–#102 in a prior review: those items explicitly reference Leiden and now correctly list #545 in "Depends on".

A developer scanning item #54 independently of the section header would see no dependency, attempt to implement it against the current graphology-communities-louvain backend, and produce output the description calls "Leiden communities" — which is factually wrong until #545 ships.

Two consistent options:

Option A — Update "Depends on" to #545, matching the pattern already applied to #100–#102:

Suggested change

-| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
+| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | #545 |

Option B — Keep the dependency as — but make the description algorithm-agnostic so it can be implemented today with the existing Louvain backend:

| 54 | Co-change communities vs dependency communities | Compare dependency-graph communities with clusters from co-change data. ...
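The algorithm-agnostic reading in Option B can be sketched without reference to any particular community detector. The example below is a hedged illustration: `jaccard`, `hiddenCoupling`, the 0.5 threshold, and the file names are made up for the example, and `communityOf` stands in for whatever partition the backend (Louvain or Leiden) produces.

```javascript
// Hypothetical sketch of item 54; names, threshold, and data are illustrative only.

/** Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|. */
function jaccard(a, b) {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared += 1;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

/**
 * Find file pairs that co-change often but sit in different dependency communities.
 * commitsByFile: Map<file, Set<commitId>>; communityOf: Map<file, communityId>.
 */
function hiddenCoupling(commitsByFile, communityOf, threshold = 0.5) {
  const files = [...commitsByFile.keys()];
  const pairs = [];
  for (let i = 0; i < files.length; i += 1) {
    for (let j = i + 1; j < files.length; j += 1) {
      const a = files[i];
      const b = files[j];
      const score = jaccard(commitsByFile.get(a), commitsByFile.get(b));
      if (score >= threshold && communityOf.get(a) !== communityOf.get(b)) {
        pairs.push({ a, b, score });
      }
    }
  }
  return pairs;
}

// Made-up sample data: parser and lexer share a community, cli does not.
const commitsByFile = new Map([
  ['src/parser.js', new Set(['c1', 'c2', 'c3'])],
  ['src/lexer.js', new Set(['c1', 'c2', 'c3'])],
  ['src/cli.js', new Set(['c2', 'c3', 'c4'])],
]);
const communityOf = new Map([
  ['src/parser.js', 0],
  ['src/lexer.js', 0],
  ['src/cli.js', 1],
]);

for (const { a, b, score } of hiddenCoupling(commitsByFile, communityOf)) {
  console.log(`${a} <-> ${b} (Jaccard ${score})`);
}
```

Because the function only consumes a file-to-community map, swapping the detection backend would not change this comparison step, which is the point of Option B.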

greptile-apps bot reviewed Mar 20, 2026

carlos-alm added 3 commits March 20, 2026 02:11

Merge branch 'main' into docs/backlog-leiden-features

39a3c19

fix(backlog): renumber community items to IDs 100-102, fix Leiden text

c97b217

IDs 97-99 conflicted with existing entry #97 (unified multi-repo graph). Renumbered to 100-102. Also corrected "Leiden/Louvain" to "Leiden" in the section description for consistency with the rest of the PR.

Merge remote-tracking branch 'origin/docs/backlog-leiden-features' into docs/backlog-leiden-features

afc420d

greptile-apps bot reviewed Mar 20, 2026

carlos-alm added 2 commits March 20, 2026 02:42

Merge branch 'main' into docs/backlog-leiden-features

41a0329

docs: rename Tier 1e′ to Tier 1e.1 for ASCII compatibility

7f4a036

carlos-alm changed the title ~~docs: add community leverage features to backlog (IDs 97-99)~~ docs: add community leverage features to backlog (IDs 100-102) Mar 20, 2026

greptile-apps bot reviewed Mar 20, 2026

docs: add #545 dependency to community leverage items 100-102

9217342

greptile-apps bot reviewed Mar 20, 2026


-### Tier 1e′ — Community leverage (build on vendored Leiden algorithm)

-Community detection uses a vendored Leiden/Louvain optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

+### Tier 1e.1 — Community leverage (build on vendored Leiden algorithm)

+Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

@@ -190,7 +200,7 @@ Items identified by the architectural audit (v3.1.4) that don't fit existing tie
 | 93 | ~~Shell completion for CLI~~ | ~~Commander supports shell completion but it's not implemented. `codegraph completion bash\|zsh\|fish` outputs the appropriate script. Basic UX gap for a CLI tool with 40+ commands.~~ | Developer Experience | ~~Tab completion makes the CLI discoverable — users find commands without reading docs. Reduces friction for new users and agents exploring available commands~~ | ✓ | ✓ | 2 | No | — | PROMOTED — Moved to ROADMAP Phase 7.11 |
 | 94 | VS Code extension | Replace the planned Web UI (removed from roadmap Phase 12) with a VS Code extension providing: webview-based graph visualization (reusing the existing `viewer.js` HTML), go-to-definition via graph edges, inline impact annotations on hover, integration with the MCP server for queries, and a sidebar panel for triage/audit results. VS Code is the right UI target for developer tools in 2026. | Visualization | Developers get graph intelligence directly in their editor — no context switching to a browser or terminal. Impact annotations on hover surface blast radius without running commands | ✗ | ✓ | 3 | No | — |

greptile-apps bot commented Mar 20, 2026

Greptile Summary

Confidence Score: 4/5

claude bot commented Mar 20, 2026

✅ Fixes Verified - All Issues Resolved

✅ ID Conflict Fixed

✅ Terminology Consistency Fixed

Updated Assessment

Quality: ⭐⭐⭐⭐⭐ (5/5)

Backlog Compliance: ✅ EXCELLENT
