docs: add unified multi-repo graph to backlog (#97) #514
Conversation
Add backlog item for building a single graph spanning multiple repositories, connected via package imports, API boundary inference, and shared schema/proto files. Depends on #78 (cross-repo symbol resolution). Placed in Tier 1i.
Claude finished @carlos-alm's task.

PR Review Complete

Root Cause Analysis
Problem Being Solved: Adding backlog item #97 "Unified multi-repo graph" to the documentation for prioritization tracking.

Backlog Compliance
Technical Assessment
This PR adds exactly one line (line 175) to `docs/roadmap/BACKLOG.md`: the table row for item #97.
Critical Concerns
None. This is a straightforward documentation addition with no technical risks.

Final Recommendation
Greptile Summary
This PR adds backlog item #97 (Unified multi-repo graph) to the Tier 1i table in `docs/roadmap/BACKLOG.md`. One remaining documentation consistency issue was found:
Confidence Score: 4/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["#78 Cross-repo symbol resolution\n(prerequisite)"] --> B["Adds repo qualifier\nto edges table"]
    A --> C["Resolves package-name → repo\nvia registry.json"]
    B --> D["#97 Unified multi-repo graph\n(this PR)"]
    C --> D
    D --> E["codegraph build --repos path1 path2 ..."]
    E --> F["Parse each repo independently"]
    F --> G["Merge step: stitch into\none SQLite DB"]
    G --> H["Mode A: npm/pip/go\npackage imports"]
    G --> I["Mode B: API boundary inference\ncross-repo-api edges"]
    G --> J["Mode C: shared .proto /\nOpenAPI / GraphQL schemas"]
    H --> K["repo:path qualified nodes & edges"]
    I --> K
    J --> K
    K --> L["All existing commands work transparently\nfn-impact · diff-impact · path · audit · triage · exports"]
```
Last reviewed commit: "fix: mark unified mu..."
docs/roadmap/BACKLOG.md
| | 77 | Metric trend tracking (code insights) | `codegraph trends` computes key graph metrics (total symbols, avg complexity, dead code count, cycle count, community drift score, boundary violations) at historical git revisions and outputs a time-series table or JSON. Uses `git stash && git checkout <rev> && build && collect && restore` loop over sampled commits (configurable `--samples N` defaulting to 10 evenly-spaced commits). Stores results in a `metric_snapshots` table for incremental updates. `--since` and `--until` for date range. `--metric` to select specific metrics. Enables tracking migration progress ("how many files still use old API?"), tech debt trends, and codebase growth over time without external dashboards. | Intelligence | Agents and teams can answer "is our codebase getting healthier or worse?" with data instead of intuition — tracks complexity trends, dead code accumulation, architectural drift, and migration progress over time. Historical backfill from git history means instant visibility into months of trends | ✓ | ✓ | 3 | No | — | | ||
| | 78 | Cross-repo symbol resolution | In multi-repo mode, resolve import edges that cross repository boundaries. When repo A imports `@org/shared-lib`, and repo B is `@org/shared-lib` in the registry, create cross-repo edges linking A's import to B's actual exported symbol. Requires matching npm/pip/go package names to registered repos. Store cross-repo edges with a `repo` qualifier in the `edges` table. Enables cross-repo `fn-impact` (changing a shared library function shows impact across all consuming repos), cross-repo `path` queries, and cross-repo `diff-impact`. | Navigation | Multi-repo mode currently treats each repo as isolated — agents can search across repos but can't trace dependencies between them. Cross-repo edges enable "if I change this shared utility, which downstream repos break?" — the highest-value question in monorepo and multi-repo architectures | ✓ | ✓ | 5 | No | — | | ||
| | 79 | Advanced query language with boolean operators and output shaping | Extend `codegraph search` and `codegraph where` with a structured query syntax supporting: **(a)** boolean operators — `kind:function AND file:src/` , `name:parse OR name:extract`, `NOT kind:class`; **(b)** compound filters — `kind:method AND complexity.cognitive>15 AND role:core`; **(c)** output shaping — `--select symbols` (just names), `--select files` (distinct files), `--select owners` (CODEOWNERS for matches), `--select stats` (aggregate counts by kind/file/role); **(d)** result aggregation — `--group-by file`, `--group-by kind`, `--group-by community` with counts. Parse the query into a SQL WHERE clause against the `nodes`/`function_complexity`/`edges` tables. Expose as `query_language` MCP tool parameter. | Search | Current search is either keyword/semantic (fuzzy) or exact-name (`where`). Agents needing "all core functions with cognitive complexity > 15 in src/api/" must chain multiple commands and filter manually — wasting tokens on intermediate results. A structured query language answers compound questions in one call | ✓ | ✓ | 4 | No | — | | ||
| | 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | No | 78 | |
The Breaking column should be `Yes`, not `No`.
The backlog's own column definition (line 17) states: "Yes means existing CLI output, API signatures, DB schema, or MCP tool contracts change in incompatible ways."
Item #97 explicitly requires two changes that match this definition:
- DB schema change — "Store a `repo` column on `nodes` and `edges` tables to partition ownership" — adding a column to existing tables is a schema change.
- CLI output / file path format change — the unified graph uses `repo:path`-qualified paths everywhere, which alters the output format of every existing query command that emits file paths (`fn-impact`, `path`, `audit`, `exports`, etc.).
Both of these qualify as breaking by the document's own rules. The Breaking cell should be Yes, and the entry will need a note about requiring a major version bump (consistent with how this column is described).
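To make the schema half of the argument concrete, here is a minimal sketch of the kind of migration item #97 implies. The table definitions are simplified stand-ins, not codegraph's actual schema; the point is that existing rows and any consumer that reads `file` as a bare path see a changed contract once `repo` columns and `repo:path` values land.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the existing nodes/edges tables (illustrative only).
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, file TEXT)")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, kind TEXT)")

# The change #97 requires: a repo column partitioning ownership on both tables.
conn.execute("ALTER TABLE nodes ADD COLUMN repo TEXT")
conn.execute("ALTER TABLE edges ADD COLUMN repo TEXT")

# File paths become repo-qualified, changing what every path-emitting command prints.
conn.execute(
    "INSERT INTO nodes (name, file, repo) VALUES (?, ?, ?)",
    ("getUsers", "frontend:src/api/users.ts", "frontend"),
)
```

Pre-migration rows would carry `NULL` in the new column, which is exactly the kind of incompatibility the Breaking column's definition covers.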
| | 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | No | 78 | | |
| | 97 | Unified multi-repo graph | New `codegraph build --repos <path1> <path2> ...` (or `.codegraphrc.json` `repos[]` list) that builds a single unified graph spanning multiple repositories. Each repo is parsed independently, then a merge step stitches them into one SQLite DB with repo-qualified file paths (`repo:path`). Three connection modes: **(a)** npm/pip/go package imports — repo A imports `@org/lib` which is repo B, resolved via `package.json`/`setup.py`/`go.mod` name matching; **(b)** API boundary inference — repo A calls `fetch('/api/users')` and repo B defines an Express/Flask/Gin route for `/api/users`, linked as a `cross-repo-api` edge with lower confidence; **(c)** shared schema/proto — repos sharing `.proto`, OpenAPI, or GraphQL schema files get edges through the shared contract types. All existing query commands (`fn-impact`, `diff-impact`, `path`, `audit`, `triage`, `exports`) work transparently on the unified graph — changing a shared library function shows impact across all consuming repos in one query. Requires a `repos` registry mapping package names to local paths (extend existing `~/.codegraph/registry.json`). Store a `repo` column on `nodes` and `edges` tables to partition ownership. | Navigation | Current multi-repo mode (`--multi-repo`) keeps each repo's graph isolated — you can search across repos but can't trace how a change in one repo impacts another. Real-world systems span multiple repos connected by package imports, API integrations, or shared schemas. A unified graph answers "if I change this endpoint handler, which frontend components break?" or "if I update this shared utility, which downstream services are affected?" — the highest-value cross-cutting questions that currently require manual tracing across repo boundaries | ✓ | ✓ | 5 | Yes | 78 | |
Fixed — changed the Breaking column from `No` to `Yes`. Both the DB schema change (new `repo` column on `nodes`/`edges`) and the output format change (`repo:path`-qualified paths) qualify as breaking per the document's own definition.
docs/roadmap/BACKLOG.md
ID #97 inserted out of numerical order
All other rows in Tier 1i are IDs 74–79 (ascending), followed by the struck-through promoted item #80. Inserting #97 between #79 and #80 breaks the ascending ID sequence and could make it harder to locate items by ID when the table grows.
The highest previously assigned ID in the file is #96 (Tier 1j). Since IDs appear to be assigned sequentially and #97 is the next in that sequence, consider either:
- Appending #97 after #79 to keep Tier 1i grouped by relevance, while documenting that the ID reflects creation order rather than tier order, or
- Using a consistent comment/note (like the PROMOTED annotations) to indicate why the sequence jumps.
This is minor but worth aligning with the existing convention before the table gets larger.
Acknowledged — ID #97 is the latest sequentially assigned ID. It's placed at the end of Tier 1i because it's the newest addition, grouped by relevance to the navigation category. The jump from #79 to #97 reflects that IDs #80–96 were assigned to other tiers. The ordering is consistent with creation order within the tier.
DB schema change (new `repo` column on `nodes`/`edges`) and output format change (`repo:path`-qualified paths) both qualify as breaking per the backlog's own column definition.
Summary
Test plan