docs: add community leverage features to backlog (IDs 100-102) #546

Open
carlos-alm wants to merge 8 commits into main from docs/backlog-leiden-features

Conversation

Contributor

@carlos-alm carlos-alm commented Mar 20, 2026

Summary

Test plan

  • Verify BACKLOG.md renders correctly on GitHub
  • New IDs don't conflict with existing entries

Add three Leiden-powered features: weighted community labels (#97),
hierarchical decomposition (#98), and community-aware impact scoring
(#99). Mark ID 96 (README dep count) as done after graphology removal.
Update ID 54 to reference Leiden instead of Louvain.

claude bot commented Mar 20, 2026

Claude finished @carlos-alm's task


PR Review: Community Leverage Features to Backlog ❌

Todo List:

  • Read and analyze BACKLOG.md changes
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity check
  • Phase 4: Code quality assessment
  • Phase 5: Backlog compliance evaluation
  • Provide final recommendation

Root Cause Analysis

Problem Being Solved: Adding three new backlog items to leverage the recently vendored Leiden algorithm for improved community detection features, updating terminology from Louvain to Leiden, and marking a completed item as done.

Why This Approach: The PR leverages the newly vendored Leiden algorithm by proposing features that build on the existing communities infrastructure. This is a logical follow-up to the Leiden integration.

Risk Assessment: CRITICAL ISSUE FOUND — ID conflict in backlog entries.

Backlog Compliance

  • Zero-dep: ✓ — All three new features build on existing tree-sitter/SQLite/AST infrastructure
  • Foundation-aligned: ✓ — Aligns with P1 (always-current), P4 (zero-cost core), P5 (embeddable-first), P8 (planning/developing focus)
  • Problem-fit: 3-4 — Improves blast radius accuracy (ID 99 scores 4), architectural understanding (IDs 97,98 score 3)
  • Breaking: No — All features are purely additive
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

❌ BLOCKING: Duplicate ID 97

The PR introduces ID 97 for "Weighted community labels" in the new "Tier 1e′" section at line 121, but ID 97 already exists in "Tier 1i" at line 185 for "Unified multi-repo graph". This creates a conflict where the same ID refers to two completely different features.

❌ BLOCKING: Inconsistent ID sequence

From scanning the backlog, the highest ID numbers in use appear to be in the mid-90s range. The new community features should use the next available sequential IDs (likely 100, 101, 102) rather than conflicting with existing entries.

Phase Assessment Results

Phase 1 ✅: Root cause is sound — leveraging vendored Leiden algorithm with additional features.

Phase 2 ✅: No configuration changes (documentation-only PR).

Phase 3 ✅: No test integrity concerns (documentation-only).

Phase 4 ❌: Critical ID conflict prevents approval. Content quality is otherwise excellent.

Phase 5 ✅: Features score well on backlog compliance criteria.

Required Fix

Before this PR can be approved, the ID conflict must be resolved:

  1. Change IDs 97, 98, 99 to 100, 101, 102 (or whatever the next available sequential IDs are)
  2. Verify no other ID conflicts exist across the entire backlog
  3. Update any internal references if the new features reference each other by ID
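
Step 2 of the required fix lends itself to a small script. The following is a minimal sketch rather than anything in the PR: it assumes every backlog row is a pipe-table line whose first cell is the numeric ID and whose second cell is the title, and `find_duplicate_ids` is a hypothetical helper name. The sample rows are abbreviated stand-ins for the real entries.

```python
import re
from collections import defaultdict

# Assumption: each backlog row starts with "| <numeric id> | <title> |".
ROW_RE = re.compile(r"^\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|")


def find_duplicate_ids(markdown: str) -> dict[int, list[str]]:
    """Map every ID used by more than one table row to the titles that claim it."""
    titles_by_id: dict[int, list[str]] = defaultdict(list)
    for line in markdown.splitlines():
        match = ROW_RE.match(line)
        if match:
            titles_by_id[int(match.group(1))].append(match.group(2))
    return {row_id: titles for row_id, titles in titles_by_id.items() if len(titles) > 1}


# Abbreviated rows modelled on the conflict described in this review.
sample = """\
| ID | Title | Description |
|----|-------|-------------|
| 96 | Fix README dep count | ... |
| 97 | Weighted community labels | ... |
| 98 | Hierarchical community decomposition | ... |
| 97 | Unified multi-repo graph | ... |
"""

for row_id, titles in find_duplicate_ids(sample).items():
    print(f"ID {row_id} is used by: {', '.join(titles)}")
```

Header and separator rows are skipped because their first cell is not numeric, so the check can be pointed at the whole file rather than a single tier.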

Final Recommendation

  • Rating: ⭐⭐☆☆☆ (2/5)
  • Action: REQUEST CHANGES
  • Reasoning: Critical ID conflict makes the PR non-mergeable despite high content quality. The features themselves are well-designed and aligned with codegraph's mission, but the duplicate ID 97 must be fixed before approval.

Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR adds three new backlog items (#100-#102) under a new "Tier 1e.1" section (Community leverage), all building on the upcoming vendored Leiden algorithm (PR #545). It also marks item #96 as BLOCKED on #545 (correctly reverting a premature DONE) and renames the algorithm reference in item #54 from Louvain to Leiden. All previous review feedback has been addressed: the section is now forward-looking, tier naming uses ASCII-safe 1e.1, IDs are conflict-free, and items #100-#102 correctly list #545 in "Depends on".

Key changes:

One remaining inconsistency:

Confidence Score: 4/5

Important Files Changed

| Filename | Overview |
|----------|----------|
| docs/roadmap/BACKLOG.md | Adds Tier 1e.1 section with three new community-leverage backlog items (#100-#102, all correctly depending on #545), marks #96 as BLOCKED on #545, and updates #54 to reference Leiden. One inconsistency remains: #54's description now says "Leiden communities" but its "Depends on" column was not updated to #545, unlike the identical pattern that was fixed for #100-#102. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    P545["PR #545\nVendor Leiden algorithm"]

    P545 --> I96["#96 Fix README dep count\n(BLOCKED → resolves automatically)"]
    P545 --> I100["#100 Weighted community labels\n(new, Tier 1e.1)"]
    P545 --> I101["#101 Hierarchical community decomposition\n(new, Tier 1e.1)"]
    P545 --> I102["#102 Community-aware impact scoring\n(new, Tier 1e.1)"]

    I54["#54 Co-change vs dependency communities\n(updated: Louvain → Leiden)\nDepends on: —  ⚠️"]

    P545 -.->|"description says 'Leiden communities'\nbut Depends on = —"| I54

    style P545 fill:#4a90d9,color:#fff
    style I96 fill:#f0ad4e,color:#000
    style I100 fill:#5cb85c,color:#fff
    style I101 fill:#5cb85c,color:#fff
    style I102 fill:#5cb85c,color:#fff
    style I54 fill:#d9534f,color:#fff

Last reviewed commit: "docs: add #545 depen..."

Comment on lines +121 to +123
| 97 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
| 98 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
| 99 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |
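
To make the first labelling heuristic in the "Weighted community labels" row concrete, here is a minimal sketch that covers only the "most common directory prefix" rule. The function name `label_community`, the output format, and the file paths are illustrative assumptions; the backlog item does not specify any of them.

```python
from collections import Counter
from pathlib import PurePosixPath


def label_community(file_paths: list[str]) -> str:
    """Label a community by the directory that holds the largest share of its files."""
    if not file_paths:
        return "unlabeled"
    # Count how many member files live directly in each directory.
    directories = Counter(str(PurePosixPath(path).parent) for path in file_paths)
    directory, count = directories.most_common(1)[0]
    share = count / len(file_paths)
    return f"{directory} ({share:.0%} of files)"


# Hypothetical community members, not taken from the codegraph repository.
members = [
    "src/db/index.js",
    "src/db/migrations.js",
    "src/db/query.js",
    "src/cli/db-command.js",
]
print(label_community(members))
```

A directory name is still not a phrase like "database layer"; the other heuristics listed in the row (dominant symbol kinds, shared naming patterns) would be needed to get from a path to a descriptive label.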

Contributor

P1 Duplicate ID 97 — conflicts with existing "Unified multi-repo graph" entry

The three new community leverage items are numbered 97, 98, 99, but ID 97 was already assigned to the "Unified multi-repo graph" item on line 185. The base branch (45585a9) confirms that IDs 1–97 were fully occupied before this PR.

The test plan checkbox "New IDs don't conflict with existing entries" remains unchecked — and indeed the conflict exists.

IDs 98 and 99 were not previously used, so only #97 collides. The new items should be renumbered starting at 100:

Suggested change
| 97 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
| 98 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
| 99 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |
| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |

Also update the PR title and description to reference IDs 100–102.

Contributor Author

Fixed — renumbered all three items from 97/98/99 to 100/101/102. No conflicts with existing entries now.


### Tier 1e′ — Community leverage (build on vendored Leiden algorithm)

Community detection uses a vendored Leiden/Louvain optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

Contributor

P2 Section description still uses "Leiden/Louvain" after Louvain references are being updated

The new section's introductory paragraph says "vendored Leiden/Louvain optimiser", yet one stated goal of this PR is to replace Louvain references with Leiden (as done for item #54). This is inconsistent — if the vendored package is now purely the Leiden algorithm, it should be described as such here too.

Suggested change
Community detection uses a vendored Leiden/Louvain optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.
Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

Contributor Author

Fixed — updated section description to say "vendored Leiden optimiser" instead of "Leiden/Louvain optimiser".

IDs 97-99 conflicted with existing entry #97 (unified multi-repo graph).
Renumbered to 100-102. Also corrected "Leiden/Louvain" to "Leiden" in
the section description for consistency with the rest of the PR.
@carlos-alm
Contributor Author

Addressed both review concerns: renumbered IDs 97/98/99 to 100/101/102 (no more collision with existing ID 97), and corrected section description from "Leiden/Louvain" to "Leiden".

@carlos-alm
Contributor Author

@greptileai

@carlos-alm
Contributor Author

@claude


claude bot commented Mar 20, 2026

Claude finished @carlos-alm's task


Claude analyzing PR changes

Todo List:

  • Read current BACKLOG.md to verify changes
  • Check if ID conflicts are resolved (100-102 vs original 97-99)
  • Verify terminology consistency (Leiden vs Louvain)
  • Provide updated review assessment

Fixes Verified - All Issues Resolved

Both critical concerns from the previous review have been successfully addressed:

ID Conflict Fixed

  • Issue: Duplicate ID 97 conflicted with existing "Unified multi-repo graph" entry
  • Status: RESOLVED — New community features are now properly numbered as IDs 100, 101, 102
  • Verification:
    • Line 121: ID 100 "Weighted community labels"
    • Line 122: ID 101 "Hierarchical community decomposition"
    • Line 123: ID 102 "Community-aware impact scoring"
    • Line 185: Original ID 97 "Unified multi-repo graph" remains unchanged

Terminology Consistency Fixed

  • Issue: Section description said "Leiden/Louvain optimiser" despite updating other Louvain references
  • Status: RESOLVED — Section now consistently uses "vendored Leiden optimiser"
  • Verification: Line 117 shows proper terminology: Community detection uses a vendored Leiden optimiser

Updated Assessment

Quality: ⭐⭐⭐⭐⭐ (5/5)

  • All content is well-written and properly formatted
  • Features align with existing patterns and architecture
  • No remaining conflicts or inconsistencies

Backlog Compliance: ✅ EXCELLENT

  • Zero-dep: ✓ All features build on existing infrastructure
  • Foundation-aligned: ✓ Aligns with core principles
  • Problem-fit: 3-4 (solid contribution to blast radius accuracy and architectural understanding)
  • Breaking: No — All features are purely additive
  • Tier: 1 (meets all criteria for highest priority tier)

Final Recommendation

  • Rating: ⭐⭐⭐⭐⭐ (5/5)
  • Action: APPROVE
  • Reasoning: All blocking issues resolved. The PR successfully adds three well-designed community leverage features, correctly updates terminology from Louvain to Leiden, and marks completed work as done. No remaining concerns.

Comment on lines +121 to +123
| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |
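
The "Community-aware impact scoring" row can be illustrated with a small sketch. Everything here is an assumption for illustration: the row names `crossCommunityCount` but does not define it, so this version counts the distinct communities reached by the blast radius other than the changed symbol's own, and the linear weighting in `risk_score` (including the 0.5 weight) is hypothetical.

```python
def cross_community_count(changed: str, impacted: list[str], community_of: dict[str, int]) -> int:
    """Count distinct communities, other than the changed symbol's own, reached by the impact set."""
    home = community_of[changed]
    touched = {community_of[symbol] for symbol in impacted}
    return len(touched - {home})


def risk_score(blast_radius: int, crossings: int, crossing_weight: float = 0.5) -> float:
    """Scale the raw blast radius by the number of community boundaries it crosses (assumed formula)."""
    return blast_radius * (1.0 + crossing_weight * crossings)


# Hypothetical symbol-to-community assignment.
community_of = {"parse": 0, "a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "f": 1, "g": 2, "h": 3}

local_impact = ["a", "b", "c", "d", "e"]   # blast radius 5, all in community 0
spread_impact = ["a", "b", "f", "g", "h"]  # blast radius 5, spanning communities 0-3

local_crossings = cross_community_count("parse", local_impact, community_of)
spread_crossings = cross_community_count("parse", spread_impact, community_of)

local_score = risk_score(len(local_impact), local_crossings)
spread_score = risk_score(len(spread_impact), spread_crossings)
print(local_crossings, local_score)
print(spread_crossings, spread_score)
```

Both impact sets have the same size, so a fan-out-only score would rank them equally; only the crossing term separates the contained change from the one that spans four communities.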

Contributor

P2 PR title and description reference stale IDs 97-99

The items in the file were correctly renumbered to 100, 101, 102 (as requested in a prior review), but the PR title still reads (IDs 97-99) and the PR description still lists **#97** Weighted community labels, **#98** Hierarchical community decomposition, and **#99** Community-aware impact scoring. These should be updated to reference the final IDs so the PR history stays accurate for future readers searching by ID.

Contributor Author

Fixed — updated PR title and description to reference the correct IDs 100-102 instead of 97-99.

| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
| 55 | Missing co-change partner warning in `check` | When `--staged` changes touch file A but not its historical partner file B (high Jaccard), emit a warning: "file A historically co-changes with file B — did you forget to update it?" | CI | Catches incomplete changes in CI — the most common source of subtle bugs is changing one file in a coupled pair but forgetting the other | ✓ | ✓ | 4 | No | — |
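
The co-change warning in row 55 depends on a Jaccard score between files. A minimal sketch follows, assuming each file's history is available as a set of commit identifiers. The 0.6 threshold, the function names, and the sample history are illustrative assumptions; the row only says "high Jaccard".

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def missing_partners(
    staged: set[str],
    commits_by_file: dict[str, set[str]],
    threshold: float = 0.6,  # assumed cut-off for "high Jaccard"
) -> list[tuple[str, str, float]]:
    """Return (changed file, absent partner, score) for partners missing from the staged set."""
    warnings = []
    for changed in sorted(staged):
        changed_commits = commits_by_file.get(changed, set())
        for other, other_commits in sorted(commits_by_file.items()):
            if other in staged:
                continue  # partner is already part of this change
            score = jaccard(changed_commits, other_commits)
            if score >= threshold:
                warnings.append((changed, other, score))
    return warnings


# Hypothetical commit history: file -> commits that touched it.
history = {
    "parser.js": {"c1", "c2", "c3", "c4"},
    "parser.test.js": {"c1", "c2", "c3"},
    "cli.js": {"c4", "c5"},
}

for changed, partner, score in missing_partners({"parser.js"}, history):
    print(f"{changed} historically co-changes with {partner} (Jaccard {score:.2f})")
```

Item 54 could reuse the same score: a pair that clears the threshold while sitting in different dependency communities is the hidden coupling that row describes.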

### Tier 1e′ — Community leverage (build on vendored Leiden algorithm)

Contributor

P2 Unconventional prime-suffix tier naming

Inserting Tier 1e′ between Tier 1e (Co-change) and Tier 1f (Embeddings) works around renaming downstream tiers, but the prime (U+2032) is a non-ASCII character that can be hard to type, search, and anchor-link against. Consider renaming to Tier 1e2 or, if the tier ordering matters for sequencing, explicitly noting in the section header why it sits between 1e and 1f (e.g., Tier 1e.1 or a brief comment in the prose). This is a minor consistency nit given the rest of the document uses plain ASCII tier names.
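
A check for this kind of naming drift could look like the sketch below. It assumes tier headings have the form `### Tier <name> ...` and inspects only the tier name itself, since the rest of a heading may legitimately contain non-ASCII punctuation. The helper name and the sample headings are illustrative.

```python
import re

# Assumption: tier headings look like "### Tier <name> <separator> <title>".
TIER_RE = re.compile(r"^#+\s*Tier\s+(\S+)")


def non_ascii_tier_names(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, tier name) for every tier name containing non-ASCII characters."""
    findings = []
    for line_number, line in enumerate(markdown.splitlines(), start=1):
        match = TIER_RE.match(line)
        if match and not match.group(1).isascii():
            findings.append((line_number, match.group(1)))
    return findings


# "\u2032" is the prime character (U+2032) discussed in this comment.
sample = (
    "### Tier 1e - Co-change\n"
    "\n"
    "### Tier 1e\u2032 - Community leverage\n"
    "\n"
    "### Tier 1f - Embeddings\n"
)

for line_number, name in non_ascii_tier_names(sample):
    print(f"line {line_number}: tier name {name!r} is not plain ASCII")
```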

Contributor Author

Fixed — renamed Tier 1e′ to Tier 1e.1 for ASCII compatibility and conventional numbering.

@carlos-alm carlos-alm changed the title from "docs: add community leverage features to backlog (IDs 97-99)" to "docs: add community leverage features to backlog (IDs 100-102)" Mar 20, 2026
@carlos-alm
Contributor Author

@greptileai


### Tier 1e.1 — Community leverage (build on vendored Leiden algorithm)

Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`) with full control over resolution, quality functions, and refinement. Currently only consumed by `communities` command and HTML viewer enrichment. These features deepen community awareness across the analysis pipeline.

Contributor

P1 Section description references a non-existent path

The prose says "Community detection uses a vendored Leiden optimiser (`src/graph/algorithms/leiden/`)", but this directory does not exist. The actual implementation lives at `src/graph/algorithms/louvain.js` and still imports `graphology-communities-louvain` (confirmed in both the source file and `package.json`). The Leiden vendoring that this whole section's premise is built on has not landed in the codebase yet.

This makes the new Tier 1e.1 section description factually incorrect as written. The path should either be corrected to the real module (src/graph/algorithms/louvain.js), or the section note should be marked as forward-looking ("will use a vendored Leiden optimiser once #X lands") rather than describing the present state.

Contributor Author

Fixed — updated section description to be forward-looking: 'will use a vendored Leiden optimiser (PR #545)' instead of asserting the path already exists. Also notes current implementation uses graphology-communities-louvain.

@@ -190,7 +200,7 @@ Items identified by the architectural audit (v3.1.4) that don't fit existing tie
| 93 | ~~Shell completion for CLI~~ | ~~Commander supports shell completion but it's not implemented. `codegraph completion bash\|zsh\|fish` outputs the appropriate script. Basic UX gap for a CLI tool with 40+ commands.~~ | Developer Experience | ~~Tab completion makes the CLI discoverable — users find commands without reading docs. Reduces friction for new users and agents exploring available commands~~ | ✓ | ✓ | 2 | No | — | **PROMOTED** — Moved to ROADMAP Phase 7.11 |
| 94 | VS Code extension | Replace the planned Web UI (removed from roadmap Phase 12) with a VS Code extension providing: webview-based graph visualization (reusing the existing `viewer.js` HTML), go-to-definition via graph edges, inline impact annotations on hover, integration with the MCP server for queries, and a sidebar panel for triage/audit results. VS Code is the right UI target for developer tools in 2026. | Visualization | Developers get graph intelligence directly in their editor — no context switching to a browser or terminal. Impact annotations on hover surface blast radius without running commands | ✗ | ✓ | 3 | No | — |

Contributor

P1 #96 marked DONE based on a change that hasn't shipped

The DONE note reads: "graphology and graphology-communities-louvain removed by vendoring Leiden algorithm. Runtime dependency count is now back to 3." However, both packages are still present as runtime dependencies in package.json:

"graphology": "^0.26.0",
"graphology-communities-louvain": "^2.0.2",

And src/graph/algorithms/louvain.js still imports graphology-communities-louvain as its algorithm backend. The Leiden vendoring (which this DONE justification is predicated on) has not actually been merged — there is no src/graph/algorithms/leiden/ directory in the repository. Marking #96 as DONE here is premature and will mislead readers into thinking the dependency count is already 3 when it is still 5.
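
The claim that a dependency has been removed can be checked mechanically before an item is marked DONE. This is a minimal sketch, assuming the manifest is a standard `package.json` with a `dependencies` object. The helper name is hypothetical, and the inline manifest contains only the two entries quoted above rather than the project's full dependency list.

```python
import json


def remaining_dependencies(package_json_text: str, expected_removed: list[str]) -> list[str]:
    """Return the packages that were supposed to be removed but are still runtime dependencies."""
    dependencies = json.loads(package_json_text).get("dependencies", {})
    return sorted(name for name in expected_removed if name in dependencies)


# Trimmed manifest holding only the two entries quoted in this comment.
package_json = """{
  "dependencies": {
    "graphology": "^0.26.0",
    "graphology-communities-louvain": "^2.0.2"
  }
}"""

still_present = remaining_dependencies(
    package_json, ["graphology", "graphology-communities-louvain"]
)
if still_present:
    print("Not done yet, still depends on:", ", ".join(still_present))
```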

Contributor Author

Fixed — reverted #96 from DONE to BLOCKED on #545. graphology and graphology-communities-louvain are still live runtime dependencies until the Leiden vendoring PR merges.

The section description referenced src/graph/algorithms/leiden/ which
does not exist yet (pending PR #545). Item #96 was incorrectly marked
DONE when graphology is still a live dependency. Updated section to be
forward-looking and reverted #96 to BLOCKED on #545.
@carlos-alm
Contributor Author

@greptileai

Comment on lines +121 to +123
| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number | ✓ | ✓ | 3 | No | — |
| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions | ✓ | ✓ | 3 | No | — |
| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture | ✓ | ✓ | 4 | No | — |
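
For the "Hierarchical community decomposition" row, the nesting step can be sketched independently of the Leiden optimiser itself. The code below assumes two partitions are already available as node-to-community maps, one fine and one coarse, and assigns each fine community to the coarse community that holds most of its members. The majority-vote rule is an assumption made here, since partitions computed at separate resolutions are not guaranteed to nest exactly. All names and data are illustrative.

```python
from collections import Counter, defaultdict


def nest_communities(fine: dict[str, int], coarse: dict[str, int]) -> dict[int, int]:
    """Map each fine community to the coarse community containing most of its nodes."""
    votes: dict[int, Counter] = defaultdict(Counter)
    for node, fine_id in fine.items():
        votes[fine_id][coarse[node]] += 1
    return {fine_id: counter.most_common(1)[0][0] for fine_id, counter in votes.items()}


def as_tree(parents: dict[int, int]) -> dict[int, list[int]]:
    """Invert the parent map into coarse community -> sorted list of nested fine communities."""
    children: dict[int, list[int]] = defaultdict(list)
    for fine_id, coarse_id in sorted(parents.items()):
        children[coarse_id].append(fine_id)
    return dict(children)


# Hypothetical partitions of five nodes at a high and a low resolution.
fine_partition = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 2}
coarse_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

parents = nest_communities(fine_partition, coarse_partition)
tree = as_tree(parents)
print(tree)
```

If the vendored optimiser exposes its internal coarsening levels, as the row suggests, those levels nest by construction and the majority vote becomes unnecessary.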

Contributor

P2 "Depends on" column should reference #545 for items #100-#102

The section description explicitly states these features are available "once the Leiden vendoring lands", yet all three items still list no dependency in the "Depends on" column. This is inconsistent with:

A reader scanning the table independently of the section header would see no dependency and might schedule any of these items before #545 is merged, only to discover the Leiden resolution/quality-function API they need isn't available yet.

Suggested change
| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number ||| 3 | No | — |
| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions ||| 3 | No | — |
| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture ||| 4 | No | — |
| 100 | Weighted community labels | Auto-generate a human-readable label for each community from its member files and symbols. Heuristics: most common directory prefix, dominant symbol kinds, shared naming patterns (e.g., "parsing pipeline", "CLI presentation", "graph algorithms"). Store labels in `communities` output and `graph-enrichment.js`. Expose as `--labels` flag on `communities` command. | Intelligence | Raw community IDs (0, 1, 2…) are meaningless to agents and humans. Labels like "database layer" or "test utilities" make community output immediately actionable — agents can reference architectural groups by name instead of number ||| 3 | No | #545 |
| 101 | Hierarchical community decomposition | Run Leiden at multiple resolution levels (e.g., γ=0.5, 1.0, 2.0) and expose nested community structure — macro-clusters containing sub-clusters. The vendored optimiser already computes multi-level coarsening internally; surface it as `communities --hierarchical` with a tree output showing which fine-grained communities nest inside coarse ones. Store hierarchy in a `community_hierarchy` table or JSON metadata. | Architecture | Single-resolution communities force a choice between broad architectural groups and tight cohesion clusters. Hierarchical decomposition gives both — agents can zoom from "this is the graph subsystem" to "specifically the Leiden algorithm cluster within it" without re-running at different resolutions ||| 3 | No | #545 |
| 102 | Community-aware impact scoring | Factor community boundaries into `fn-impact` and `diff-impact` risk scoring. Changes that cross community boundaries are architecturally riskier than changes within a single community — they indicate coupling between modules that should be independent. Add `crossCommunityCount` to impact output and weight it in triage risk scoring. A function with blast radius 5 all within one community is lower risk than blast radius 5 spanning 4 communities. | Analysis | Directly improves blast radius accuracy — the core problem codegraph exists to solve. Community-crossing impact is a strong signal for architectural coupling that raw call-chain fan-out doesn't capture ||| 4 | No | #545 |
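
For reference, the `crossCommunityCount` idea in item 102 could be sketched roughly as below. This is only an illustration under stated assumptions: the function names, the `Map`-based community lookup, the choice to count distinct communities other than the origin's own, and the linear `penalty` weighting are all hypothetical and do not come from the codegraph codebase.

```javascript
// Hypothetical sketch; none of these names exist in codegraph.
// communityOf: Map from node name to community ID.
// impacted: array of node names in the blast radius of a change to originNode.
function crossCommunityCount(originNode, impacted, communityOf) {
  const home = communityOf.get(originNode);
  const foreign = new Set();
  for (const node of impacted) {
    const community = communityOf.get(node);
    // Nodes with no known community are ignored rather than counted.
    if (community !== undefined && community !== home) foreign.add(community);
  }
  return foreign.size;
}

// Assumed weighting: each foreign community adds a fixed fraction of the
// base blast radius. The real triage formula may look quite different.
function impactRisk(blastRadius, crossCount, penalty = 0.5) {
  return blastRadius * (1 + penalty * crossCount);
}

const exampleCommunities = new Map([
  ["parse", 0], ["lex", 0], ["emit", 0],
  ["cli", 1], ["db", 2], ["graph", 3],
]);

// Same blast radius of 5, different spread across communities.
const local = crossCommunityCount("parse", ["lex", "emit", "lex", "emit", "lex"], exampleCommunities);
const spread = crossCommunityCount("parse", ["lex", "emit", "cli", "db", "graph"], exampleCommunities);
console.log(impactRisk(5, local));  // 5
console.log(impactRisk(5, spread)); // 12.5
```

With this weighting, a blast radius of 5 confined to one community scores lower than the same radius spread over four, which is the ordering the item description asks for.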

Contributor Author

Fixed — added #545 as dependency for all three community leverage items (100, 101, 102) in the 'Depends on' column, consistent with the section description and #96.

@carlos-alm
Contributor Author

@greptileai

|----|-------|-------------|----------|---------|----------|-------------------|-------------------|----------|------------|
| 53 | Co-change coupling in triage | Files with high co-change Jaccard scores are maintenance risks — changing one always requires changing the other. Add `coChangeCoupling` as a triage scoring dimension using the max Jaccard score for each file. | Intelligence | Triage surfaces tightly coupled file pairs that static dependency analysis may miss — hidden coupling from shared business logic or implicit contracts | ✓ | ✓ | 4 | No | — |
| 54 | Co-change communities vs dependency communities | Compare Louvain communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
Contributor

P1 Item #54 "Depends on" not updated after renaming to "Leiden communities"

The description was updated from "Compare Louvain communities" to "Compare Leiden communities", but the "Depends on" column still shows `—`. This is the same inconsistency that was caught and fixed for items #100–#102 in a prior review: those items explicitly reference Leiden and now correctly list #545 in "Depends on".

A developer scanning item #54 independently of the section header would see no dependency, attempt to implement it against the current graphology-communities-louvain backend, and produce output the description calls "Leiden communities" — which is factually wrong until #545 ships.

Two consistent options:

Option A: Update "Depends on" to #545, matching the pattern already applied to #100–#102:

Suggested change
| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | — |
| 54 | Co-change communities vs dependency communities | Compare Leiden communities from the dependency graph with clusters from co-change data. Files that co-change frequently but live in different dependency communities indicate hidden coupling or architectural drift. | Architecture | Surfaces coupling that the static graph can't see — two modules may have no import relationship but always change together due to shared assumptions | ✓ | ✓ | 3 | No | #545 |

Option B: Keep the dependency as `—` but make the description algorithm-agnostic so it can be implemented today with the existing Louvain backend:

| 54 | Co-change communities vs dependency communities | Compare dependency-graph communities with clusters from co-change data. ...
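
To show that the comparison in item 54 does not depend on which algorithm produced the communities, here is a minimal sketch. It assumes co-change data is available as a map from file to the set of commit IDs that touched it, and that community membership is a plain file-to-ID map. The names `jaccard`, `hiddenCoupling`, `commitsByFile`, and the default threshold of 0.5 are hypothetical, not part of codegraph.

```javascript
// Hypothetical sketch; names and data shapes are assumptions.
// Jaccard similarity of two Sets of commit IDs.
function jaccard(a, b) {
  let intersection = 0;
  for (const x of a) if (b.has(x)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Returns file pairs that co-change often (score >= threshold) but sit in
// different dependency communities, sorted by descending score.
// communityOf may come from Louvain, Leiden, or any other partition.
function hiddenCoupling(commitsByFile, communityOf, threshold = 0.5) {
  const files = [...commitsByFile.keys()];
  const pairs = [];
  for (let i = 0; i < files.length; i++) {
    for (let j = i + 1; j < files.length; j++) {
      const a = files[i];
      const b = files[j];
      // Same community (or both unassigned): coupling is already visible.
      if (communityOf.get(a) === communityOf.get(b)) continue;
      const score = jaccard(commitsByFile.get(a), commitsByFile.get(b));
      if (score >= threshold) pairs.push({ a, b, score });
    }
  }
  return pairs.sort((x, y) => y.score - x.score);
}
```

The pairwise loop is quadratic in the number of files, so a real implementation would probably restrict candidates to pairs that share at least one commit.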
