
docs: add unified multi-repo graph to backlog (#97) #514

Merged
carlos-alm merged 4 commits into main from docs/backlog-unified-multi-repo on Mar 19, 2026

@carlos-alm
Contributor

Summary

Test plan

  • Verify BACKLOG.md renders correctly with the new row
  • No other files modified

Add backlog item for building a single graph spanning multiple
repositories, connected via package imports, API boundary inference,
and shared schema/proto files. Depends on #78 (cross-repo symbol
resolution). Placed in Tier 1i.
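Connection mode (a) can be sketched as a lookup from import specifiers into a package-name registry. The `ImportRef`/`CrossRepoEdge` shapes, the `cross-repo-import` edge kind, and the in-memory `Map` standing in for `~/.codegraph/registry.json` are all hypothetical; this is an illustration under those assumptions, not codegraph's implementation. It stops at the repo level: linking the import to the other repo's actual exported symbol (the symbol-level work covered by #78) would be a further step.

```typescript
// Hypothetical shapes for illustration; not codegraph's actual schema.
interface ImportRef {
  repo: string; // repo containing the import statement
  file: string; // file path within that repo
  specifier: string; // e.g. "@org/lib" or "@org/lib/utils"
}

interface CrossRepoEdge {
  source: string; // repo-qualified importing file, "repo:path"
  targetRepo: string; // repo that publishes the imported package
  kind: "cross-repo-import";
}

// Reduce an import specifier to its package name, or null for relative/absolute imports.
// Scoped packages keep two segments ("@org/lib/utils" -> "@org/lib").
function packageName(specifier: string): string | null {
  if (specifier.startsWith(".") || specifier.startsWith("/")) return null;
  const parts = specifier.split("/");
  if (specifier.startsWith("@")) {
    return parts.length >= 2 ? `${parts[0]}/${parts[1]}` : null;
  }
  return parts[0];
}

// registry: package name -> repo name (a stand-in for ~/.codegraph/registry.json).
function resolveCrossRepoImports(
  imports: ImportRef[],
  registry: Map<string, string>,
): CrossRepoEdge[] {
  const edges: CrossRepoEdge[] = [];
  for (const imp of imports) {
    const pkg = packageName(imp.specifier);
    if (pkg === null) continue;
    const targetRepo = registry.get(pkg);
    // Skip packages not in the registry and imports of a repo's own package.
    if (targetRepo === undefined || targetRepo === imp.repo) continue;
    edges.push({ source: `${imp.repo}:${imp.file}`, targetRepo, kind: "cross-repo-import" });
  }
  return edges;
}
```

For example, with a registry mapping `@org/lib` to repo `lib`, an import of `@org/lib/format` in `web:src/app.ts` yields one edge to `lib`, while `./local` and unregistered packages such as `react` are skipped.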
@claude

claude bot commented Mar 19, 2026

Claude finished @carlos-alm's task.


PR Review Complete

Root Cause Analysis

Problem Being Solved: Adding backlog item #97 "Unified multi-repo graph" to the documentation for prioritization tracking
Why This Approach: Documentation-only change to register a proposed feature for future implementation in the backlog system
Risk Assessment: Extremely low risk - pure documentation addition with no code changes

Backlog Compliance

  • Zero-dep: ✓. Documentation change only; the proposed feature is designed to build on the existing tree-sitter/SQLite infrastructure
  • Foundation-aligned: ✓ — Aligns with P1 (always-current graph), P4 (zero-cost core), P5 (embeddable), and P8 (structural intelligence)
  • Problem-fit: 5 — Directly addresses "missed blast radius" problem by enabling cross-repo impact analysis
  • Breaking: No — Documentation addition only
  • Tier: 1i (Search, navigation, and monitoring improvements), correctly placed

Technical Assessment

This PR adds exactly one line (line 175) to docs/roadmap/BACKLOG.md:

| 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | No | 78 |

The addition is:

  • Well-formatted: Follows the exact table structure and column format
  • Properly categorized: Placed in Tier 1i (Search, navigation, and monitoring improvements)
  • Correctly scored: Problem-fit rating of 5 is justified - this directly solves the "missed blast radius" problem across repo boundaries
  • Dependency-aware: Correctly lists the dependency on #78 (Cross-repo symbol resolution)
  • Comprehensive: Detailed description covers implementation approach, connection modes, and benefits
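The API boundary inference mode (b) in the row quoted above could work roughly like this: split each client-side call URL and each server route pattern into path segments, treat `:id` (Express/Gin) and `<id>`/`<int:id>` (Flask) segments as wildcards, and emit a `cross-repo-api` edge when the HTTP method and segments line up. The data shapes and the default confidence of 0.5 are assumptions for illustration, not codegraph's design. Only literal paths are handled; template strings, absolute URLs with a host, and base-URL prefixes would need extra work, which is one reason such edges would reasonably carry lower confidence.

```typescript
// Hypothetical shapes; the default confidence value is an assumption.
interface ApiCall { repo: string; file: string; method: string; url: string }
interface RouteDef { repo: string; file: string; method: string; pattern: string }
interface ApiEdge { from: string; to: string; kind: "cross-repo-api"; confidence: number }

// Split a path into non-empty segments, dropping any query string or fragment.
function pathSegments(path: string): string[] {
  const clean = path.split(/[?#]/)[0];
  return clean.split("/").filter((s) => s.length > 0);
}

// Express/Gin parameters look like ":id"; Flask parameters look like "<id>" or "<int:id>".
function isParamSegment(seg: string): boolean {
  return seg.startsWith(":") || (seg.startsWith("<") && seg.endsWith(">"));
}

// A URL matches a pattern when segment counts agree and every non-parameter segment is equal.
function routeMatches(url: string, pattern: string): boolean {
  const u = pathSegments(url);
  const p = pathSegments(pattern);
  if (u.length !== p.length) return false;
  return p.every((seg, i) => isParamSegment(seg) || seg === u[i]);
}

function inferApiEdges(calls: ApiCall[], routes: RouteDef[], confidence = 0.5): ApiEdge[] {
  const edges: ApiEdge[] = [];
  for (const call of calls) {
    for (const route of routes) {
      if (route.repo === call.repo) continue; // same-repo calls are not cross-repo edges
      if (route.method.toUpperCase() !== call.method.toUpperCase()) continue;
      if (!routeMatches(call.url, route.pattern)) continue;
      edges.push({
        from: `${call.repo}:${call.file}`,
        to: `${route.repo}:${route.file}`,
        kind: "cross-repo-api",
        confidence,
      });
    }
  }
  return edges;
}
```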

Critical Concerns

None. This is a straightforward documentation addition with no technical risks.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: Well-written documentation addition that properly captures a high-value feature proposal. The backlog format is followed exactly, the feature aligns with foundation principles, and the problem-fit rating is justified. The only minor deduction is that the change is purely administrative rather than delivering user value.

@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR adds backlog item #97 (Unified multi-repo graph) to the Tier 1i table in docs/roadmap/BACKLOG.md. It is a documentation-only change that introduces a new row describing a feature for building a single SQLite graph spanning multiple repositories via three connection modes (package imports, API boundary inference, shared schema/proto files). The previously flagged issues (Breaking: Yes and out-of-order ID) have been addressed.

One remaining documentation consistency issue was found: the schema description for #97 overlaps with dependency #78's own modification of the edges table.

Confidence Score: 4/5

Important Files Changed

| Filename | Overview |
|---|---|
| docs/roadmap/BACKLOG.md | Adds backlog item #97 (Unified multi-repo graph) to Tier 1i; Breaking and Depends on columns are correctly set, but the schema description overlaps with dependency #78's own edges table modification. |
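One way to make the overlap with #78 harmless is an idempotent migration: whichever item lands first adds the `repo` column, and the other skips it. SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form, so this sketch checks the existing columns first (for example, the `name` field of each row returned by `PRAGMA table_info(nodes)`). The helper name and the `TEXT` column type are assumptions, not part of either backlog item.

```typescript
// Hypothetical helper: given the columns currently present on each table,
// return the ALTER TABLE statements still needed so that both `nodes` and
// `edges` carry a `repo` column. Running it twice yields no statements the second time.
function planRepoColumnMigration(existing: Record<string, string[]>): string[] {
  const statements: string[] = [];
  for (const table of ["nodes", "edges"]) {
    const cols = existing[table];
    if (cols === undefined) throw new Error(`table ${table} not found`);
    if (!cols.includes("repo")) {
      statements.push(`ALTER TABLE ${table} ADD COLUMN repo TEXT`);
    }
  }
  return statements;
}
```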

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["#78 Cross-repo symbol resolution\n(prerequisite)"] --> B["Adds repo qualifier\nto edges table"]
    A --> C["Resolves package-name → repo\nvia registry.json"]

    B --> D["#97 Unified multi-repo graph\n(this PR)"]
    C --> D

    D --> E["codegraph build --repos path1 path2 ..."]
    E --> F["Parse each repo independently"]
    F --> G["Merge step: stitch into\none SQLite DB"]

    G --> H["Mode A: npm/pip/go\npackage imports"]
    G --> I["Mode B: API boundary inference\ncross-repo-api edges"]
    G --> J["Mode C: shared .proto /\nOpenAPI / GraphQL schemas"]

    H --> K["repo:path qualified nodes & edges"]
    I --> K
    J --> K

    K --> L["All existing commands work transparently\nfn-impact · diff-impact · path · audit · triage · exports"]
```
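The merge step in the flowchart can be sketched as namespacing each repo's nodes and edges before concatenating them, so that identical file paths or symbol IDs in different repos cannot collide. The `RepoGraph`/`UnifiedGraph` shapes are hypothetical in-memory stand-ins for the SQLite tables; cross-repo edges from modes A to C would be added afterwards on top of the merged graph.

```typescript
// Hypothetical, simplified per-repo graph shape (not codegraph's schema).
interface RepoGraph {
  repo: string;
  nodes: { id: string; file: string; name: string }[];
  edges: { source: string; target: string; kind: string }[];
}

interface UnifiedGraph {
  nodes: { id: string; repo: string; file: string; name: string }[];
  edges: { source: string; target: string; kind: string; repo: string }[];
}

function mergeRepoGraphs(graphs: RepoGraph[]): UnifiedGraph {
  const seen = new Set<string>();
  const unified: UnifiedGraph = { nodes: [], edges: [] };
  for (const g of graphs) {
    // Repo names become path prefixes, so they must be unique.
    if (seen.has(g.repo)) throw new Error(`duplicate repo name: ${g.repo}`);
    seen.add(g.repo);
    const qualify = (id: string) => `${g.repo}:${id}`;
    for (const n of g.nodes) {
      unified.nodes.push({ id: qualify(n.id), repo: g.repo, file: qualify(n.file), name: n.name });
    }
    // Intra-repo edges keep their kind; the repo column records ownership.
    for (const e of g.edges) {
      unified.edges.push({ source: qualify(e.source), target: qualify(e.target), kind: e.kind, repo: g.repo });
    }
  }
  return unified;
}
```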

Last reviewed commit: "fix: mark unified mu..."

| 77 | Metric trend tracking (code insights) | `codegraph trends` computes key graph metrics (total symbols, avg complexity, dead code count, cycle count, community drift score, boundary violations) at historical git revisions and outputs a time-series table or JSON. Uses `git stash && git checkout <rev> && build && collect && restore` loop over sampled commits (configurable `--samples N` defaulting to 10 evenly-spaced commits). Stores results in a `metric_snapshots` table for incremental updates. `--since` and `--until` for date range. `--metric` to select specific metrics. Enables tracking migration progress ("how many files still use old API?"), tech debt trends, and codebase growth over time without external dashboards. | Intelligence | Agents and teams can answer "is our codebase getting healthier or worse?" with data instead of intuition — tracks complexity trends, dead code accumulation, architectural drift, and migration progress over time. Historical backfill from git history means instant visibility into months of trends | ✓ | ✓ | 3 | No | — |
| 78 | Cross-repo symbol resolution | In multi-repo mode, resolve import edges that cross repository boundaries. When repo A imports `@org/shared-lib`, and repo B is `@org/shared-lib` in the registry, create cross-repo edges linking A's import to B's actual exported symbol. Requires matching npm/pip/go package names to registered repos. Store cross-repo edges with a `repo` qualifier in the `edges` table. Enables cross-repo `fn-impact` (changing a shared library function shows impact across all consuming repos), cross-repo `path` queries, and cross-repo `diff-impact`. | Navigation | Multi-repo mode currently treats each repo as isolated — agents can search across repos but can't trace dependencies between them. Cross-repo edges enable "if I change this shared utility, which downstream repos break?" — the highest-value question in monorepo and multi-repo architectures | ✓ | ✓ | 5 | No | — |
| 79 | Advanced query language with boolean operators and output shaping | Extend `codegraph search` and `codegraph where` with a structured query syntax supporting: **(a)** boolean operators — `kind:function AND file:src/` , `name:parse OR name:extract`, `NOT kind:class`; **(b)** compound filters — `kind:method AND complexity.cognitive>15 AND role:core`; **(c)** output shaping — `--select symbols` (just names), `--select files` (distinct files), `--select owners` (CODEOWNERS for matches), `--select stats` (aggregate counts by kind/file/role); **(d)** result aggregation — `--group-by file`, `--group-by kind`, `--group-by community` with counts. Parse the query into a SQL WHERE clause against the `nodes`/`function_complexity`/`edges` tables. Expose as `query_language` MCP tool parameter. | Search | Current search is either keyword/semantic (fuzzy) or exact-name (`where`). Agents needing "all core functions with cognitive complexity > 15 in src/api/" must chain multiple commands and filter manually — wasting tokens on intermediate results. A structured query language answers compound questions in one call | ✓ | ✓ | 4 | No | — |
| 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | No | 78 |
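Row #79 in the context above proposes parsing a structured query into a SQL WHERE clause. A minimal sketch, assuming a hypothetical field-to-column mapping (`n.kind`, `fc.cognitive`, and the other aliases are illustrative, not codegraph's actual schema), with AND binding tighter than OR and no parentheses. Values are passed as bound parameters and field names go through an allow-list, so user input never lands in the SQL text directly.

```typescript
// Hypothetical allow-list of query fields -> SQL column expressions.
// The aliases (n = nodes, fc = function_complexity) and column names are assumptions.
const QUERY_FIELDS = new Map<string, string>([
  ["kind", "n.kind"],
  ["name", "n.name"],
  ["role", "n.role"],
  ["file", "n.file"],
  ["complexity.cognitive", "fc.cognitive"],
]);

type Param = string | number;
interface WhereClause { sql: string; params: Param[] }

// Turn one "field<op>value" token into a predicate, pushing its value onto params.
function parseFilter(token: string, params: Param[]): string {
  const m = /^([a-z.]+)(:|>=|<=|>|<)(.+)$/.exec(token);
  if (m === null) throw new Error(`cannot parse filter: ${token}`);
  const [, field, op, value] = m;
  const column = QUERY_FIELDS.get(field); // Map lookup avoids prototype keys like "constructor"
  if (column === undefined) throw new Error(`unknown field: ${field}`);
  if (op === ":") {
    // "file:" is a prefix match; % and _ inside the value are not escaped in this sketch.
    params.push(field === "file" ? `${value}%` : value);
    return field === "file" ? `${column} LIKE ?` : `${column} = ?`;
  }
  const num = Number(value);
  if (Number.isNaN(num)) throw new Error(`expected a number in: ${token}`);
  params.push(num);
  return `${column} ${op} ?`;
}

// AND binds tighter than OR; NOT applies to the single filter after it.
function parseQuery(query: string): WhereClause {
  const tokens = query.trim().split(/\s+/).filter((t) => t.length > 0);
  const params: Param[] = [];
  const orGroups: string[][] = [[]];
  let expectFilter = true;
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (tok === "AND" || tok === "OR") {
      if (expectFilter) throw new Error(`unexpected ${tok}`);
      if (tok === "OR") orGroups.push([]);
      expectFilter = true;
      continue;
    }
    if (!expectFilter) throw new Error(`missing AND/OR before ${tok}`);
    const negate = tok === "NOT";
    const filter = negate ? tokens[++i] : tok;
    if (filter === undefined) throw new Error("NOT must be followed by a filter");
    const pred = parseFilter(filter, params);
    orGroups[orGroups.length - 1].push(negate ? `NOT (${pred})` : pred);
    expectFilter = false;
  }
  if (expectFilter) throw new Error("query is empty or ends with AND/OR");
  const sql = orGroups
    .map((g) => (g.length > 1 ? `(${g.join(" AND ")})` : g[0]))
    .join(" OR ");
  return { sql, params };
}
```

Under these assumptions, `kind:method AND complexity.cognitive>15 AND role:core` becomes `(n.kind = ? AND fc.cognitive > ? AND n.role = ?)` with parameters `["method", 15, "core"]`.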
Contributor

P1 Breaking column should be Yes, not No

The backlog's own column definition (line 17) states: "Yes means existing CLI output, API signatures, DB schema, or MCP tool contracts change in incompatible ways."

Item #97 explicitly requires two changes that match this definition:

  1. DB schema change: "Store a repo column on nodes and edges tables to partition ownership." Adding a column to existing tables is a schema change.
  2. CLI output / file path format change: the unified graph uses repo:path-qualified paths everywhere, which alters the output format of every existing query command that emits file paths (fn-impact, path, audit, exports, etc.).

Both of these qualify as breaking by the document's own rules. The Breaking cell should be Yes, and the entry will need a note about requiring a major version bump (consistent with how this column is described).
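To make the output-format argument concrete, here is a sketch of how `repo:path` strings might be produced and parsed; both helpers are hypothetical. Any existing consumer that passes the file field of command output straight to the filesystem would receive `backend:src/api/users.ts` instead of `src/api/users.ts`, which is not a valid relative path, and that is the incompatibility described in the second point. Splitting on the first colon only, and rejecting colons in repo names, keeps paths that themselves contain colons intact.

```typescript
// Hypothetical helpers for the repo:path format described in #97.
function qualifyPath(repo: string, path: string): string {
  // Repo names must be non-empty and colon-free so the first colon is unambiguous.
  if (repo.length === 0 || repo.includes(":")) throw new Error(`invalid repo name: ${repo}`);
  return `${repo}:${path}`;
}

// Split on the FIRST colon only; the path part may itself contain colons.
function parseQualifiedPath(qualified: string): { repo: string; path: string } {
  const idx = qualified.indexOf(":");
  if (idx <= 0) throw new Error(`not repo-qualified: ${qualified}`);
  return { repo: qualified.slice(0, idx), path: qualified.slice(idx + 1) };
}
```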

Suggested change
| 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | No | 78 |
| 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | Yes | 78 |

Contributor Author

Fixed: changed the Breaking column from No to Yes. Both the DB schema change (new repo column on nodes/edges) and the output format change (repo:path-qualified paths) qualify as breaking per the document's own definition.

Contributor

P2 ID #97 inserted out of numerical order

All other rows in Tier 1i are IDs 74–79 (ascending), followed by the struck-through promoted item #80. Inserting #97 between #79 and #80 breaks the ascending ID sequence and could make it harder to locate items by ID when the table grows.

The highest previously assigned ID in the file is #96 (Tier 1j). Since IDs appear to be assigned sequentially and #97 is the next in that sequence, consider either:

This is minor but worth aligning with the existing convention before the table gets larger.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Contributor Author

Acknowledged — ID #97 is the latest sequentially assigned ID. It's placed at the end of Tier 1i because it's the newest addition, grouped by relevance to the navigation category. The jump from #79 to #97 reflects that IDs #80–96 were assigned to other tiers. The ordering is consistent with creation order within the tier.

DB schema change (new repo column on nodes/edges) and output format
change (repo:path-qualified paths) both qualify as breaking per the
backlog's own column definition.
@carlos-alm
Contributor Author

@greptileai

carlos-alm merged commit c05f210 into main on Mar 19, 2026
14 checks passed
carlos-alm deleted the docs/backlog-unified-multi-repo branch on March 19, 2026 at 06:27
github-actions bot locked and limited conversation to collaborators on Mar 19, 2026