feat: add .pyi/.phtml/.rake/.gemspec extensions, condense comparison table#502
carlos-alm merged 44 commits into main
- Remove dead `truncate` function from ast-analysis/shared.js (0 consumers)
- Remove dead `truncStart` function from presentation/table.js (0 consumers)
- Un-export `BATCH_CHUNK` in builder/helpers.js (only used internally)
Skipped sync.json targets that were false positives:
- BUILTIN_RECEIVERS: used by incremental.js + build-edges.js
- TRANSIENT_CODES/RETRY_DELAY_MS: internal to readFileSafe
- MAX_COL_WIDTH: internal to printAutoTable
- findFunctionNode: re-exported from index.js, used in tests
Impact: 1 functions changed, 32 affected
Impact: 29 functions changed, 105 affected
…ures Impact: 5 functions changed, 7 affected
connection.js: add debug() logging to all 8 catch-with-fallback blocks so failures are observable without changing behavior. migrations.js: replace 14 try/catch blocks in initSchema with hasColumn() and hasTable() guards. CREATE INDEX calls use IF NOT EXISTS directly. getBuildMeta uses hasTable() check instead of try/catch. Impact: 10 functions changed, 19 affected
Add debug() logging to 10 empty catch blocks across context.js, symbol-lookup.js, exports.js, impact.js, and module-map.js. All catches retain their fallback behavior but failures are now observable via debug logging. Impact: 6 functions changed, 18 affected
Add debug() logging to 6 empty catch blocks: 3 in disposeParsers() for WASM resource cleanup, 2 in ensureWasmTrees() for file read and parse failures, and 1 in getActiveEngine() for version lookup. Impact: 3 functions changed, 0 affected
Add debug() logging to 9 empty catch blocks across complexity.js (5), cfg.js (2), and dataflow.js (2). All catches for file read and parse failures now log the error message before continuing. Impact: 4 functions changed, 2 affected
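The catch-with-fallback pattern these commits describe can be sketched in a few lines. This is a minimal illustration, not codegraph's code: `debug` here is a hypothetical stand-in that records messages in an array (the real logger's output target is not shown in this PR), and `parseJsonSafe` is an invented example function.

```javascript
// Hypothetical debug logger: records messages so swallowed failures stay observable.
const debugLog = [];
function debug(message) {
  debugLog.push(message);
}

// Same fallback behavior as an empty `catch {}`, but the failure is now logged.
function parseJsonSafe(text, fallback) {
  try {
    return JSON.parse(text);
  } catch (err) {
    debug(`parseJsonSafe: using fallback after ${err.name}: ${err.message}`);
    return fallback;
  }
}

console.log(parseJsonSafe('{"a":1}', {}).a); // 1
console.log(parseJsonSafe('not json', null)); // null
console.log(debugLog.length); // 1
```

The returned fallback is unchanged, so callers see exactly the behavior of the silent version; only observability improves.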
Split the monolithic walkJavaScriptNode switch (13 cases, cognitive 228) into 11 focused handler functions. The dispatcher is now a thin switch that delegates to handleFunctionDecl, handleClassDecl, handleMethodDef, handleInterfaceDecl, handleTypeAliasDecl, handleVariableDecl, handleEnumDecl, handleCallExpr, handleImportStmt, handleExportStmt, and handleExpressionStmt. The expression_statement case now reuses the existing handleCommonJSAssignment helper, eliminating ~50 lines of duplication. Worst handler complexity: handleVariableDecl (cognitive 20), down from the original monolithic function (cognitive 279). Impact: 13 functions changed, 3 affected
Split walkPythonNode switch into 7 focused handlers: handlePyFunctionDef, handlePyClassDef, handlePyCall, handlePyImport, handlePyExpressionStmt, handlePyImportFrom, plus the decorated_definition inline dispatch. Moved extractPythonParameters, extractPythonClassProperties, walkInitBody, and findPythonParentClass from closures to module-scope functions. Impact: 12 functions changed, 5 affected
Split walkJavaNode switch into 8 focused handlers plus an extractJavaInterfaces helper. Moved findJavaParentClass to module scope. The class_declaration case (deepest nesting in the file) is now split between handleJavaClassDecl and extractJavaInterfaces. Impact: 12 functions changed, 5 affected
Apply the same per-category handler decomposition to all remaining language extractors: Go (6 handlers), Ruby (8 handlers), PHP (11 handlers), C# (11 handlers), Rust (9 handlers), HCL (4 handlers). Each extractor now follows the template established by the JS extractor:
- Thin entry function creates ctx, delegates to walkXNode
- walkXNode is a thin dispatcher switch
- Each case is a named handler function at module scope
- Helper functions (findParentClass, etc.) moved to module scope
Impact: 66 functions changed, 23 affected
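The extractor template above can be sketched as follows. This is a simplified illustration, not codegraph's extractor: real extractors walk tree-sitter nodes (`node.type`, `childForFieldName`), whereas here nodes are plain objects with hypothetical `name` and `children` fields, and the handler names are illustrative. In this sketch the dispatcher recurses into children after each handler.

```javascript
// Handlers live at module scope; each owns exactly one node kind.
function handleFunctionDecl(node, ctx) {
  ctx.definitions.push({ name: node.name, kind: 'function' });
}

function handleClassDecl(node, ctx) {
  ctx.definitions.push({ name: node.name, kind: 'class' });
}

function handleCallExpr(node, ctx) {
  ctx.calls.push({ name: node.name });
}

// Thin dispatcher: no logic beyond routing by node type.
function walkNode(node, ctx) {
  switch (node.type) {
    case 'function_declaration':
      handleFunctionDecl(node, ctx);
      break;
    case 'class_declaration':
      handleClassDecl(node, ctx);
      break;
    case 'call_expression':
      handleCallExpr(node, ctx);
      break;
    default:
      break;
  }
  for (const child of node.children || []) walkNode(child, ctx);
}

// Thin entry point: create the context, delegate, return the results.
function extractSymbols(root) {
  const ctx = { definitions: [], calls: [] };
  walkNode(root, ctx);
  return ctx;
}

const tree = {
  type: 'program',
  children: [
    { type: 'function_declaration', name: 'main', children: [{ type: 'call_expression', name: 'helper' }] },
    { type: 'class_declaration', name: 'Router' },
  ],
};
const result = extractSymbols(tree);
console.log(result.definitions.map((d) => d.name)); // [ 'main', 'Router' ]
```

Each handler's complexity is scored on its own, which is why the dispatcher's cognitive complexity drops while total behavior stays the same.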
…pers Move nested handler functions to module level in cfg-visitor.js, dataflow-visitor.js, and complexity-visitor.js — reducing cognitive complexity of each factory function from 100-337 down to thin coordinators. Extract WASM pre-parse, visitor setup, result storage, and build delegation from runAnalyses into focused helper functions. Impact: 66 functions changed, 43 affected
Extract edge-building by type (import, call-native, call-JS, class hierarchy) from buildEdges. Extract per-phase insertion logic from insertNodes. Extract scoped/incremental/full-build paths and reverse-dep cascade from detectChanges. Extract setup, engine init, alias loading from pipeline.js. Extract node/edge-building helpers from incremental.js rebuildFile. Impact: 44 functions changed, 19 affected
Impact: 37 functions changed, 29 affected
Impact: 5 functions changed, 3 affected
…sification Impact: 8 functions changed, 3 affected
Impact: 10 functions changed, 5 affected
Impact: 5 functions changed, 2 affected
Impact: 5 functions changed, 2 affected
Impact: 12 functions changed, 6 affected
…age) Extract per-section validators from validateBoundaryConfig (cog 101→2). Extract buildCommunityObjects and analyzeDrift from communitiesData (cog 32→4). Extract buildTriageItems and computeTriageSummary from triageData (bugs 1.4→0.48). Impact: 13 functions changed, 11 affected
Extract printDiffFunctions/Coupled/Ownership/Boundaries/Summary from diffImpact (cog 28→6, cyc 21→7). Extract printExportHeader/Symbols from fileExports. Extract printNotFound/PathSteps from symbolPath. Impact: 12 functions changed, 7 affected
Extract runManifesto/validateKind from check execute (cyc 14→10). Extract runHotspots/validateFilters/parseWeights from triage execute (cyc 13→4). Extract loadMCPSdk/createLazyLoaders/resolveDbPath/validateMultiRepoAccess from startMCPServer (cog 34→13, cyc 19→7). Impact: 14 functions changed, 4 affected
Move createRequire assignment after all import declarations to satisfy Biome's import sorting rule. Impact: 3 functions changed, 1 affected
Impact: 28 functions changed, 35 affected
Impact: 31 functions changed, 8 affected
…elds in lockfile
…nto release/3.2.0
Extract per-file typeMap (varName → typeName) from type annotations, new expressions, and typed parameters for JS/TS, Java, Go, Rust, C#, PHP, and Python. The edge resolver uses typeMap to connect variable receiver calls (x.method()) to their declared types (Type.method()) with 0.9 confidence. Implemented in both WASM (JS extractors) and native (Rust extractors + edge builder) engines for full parity. Updated README language table with symbols-extracted, type-inference, and engine-parity columns. Impact: 73 functions changed, 54 affected
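The resolution step can be sketched as follows, modeled on the type-aware lookup quoted in the review below (`typeMap.get(call.receiver)`, then a `Type.method` lookup filtered to `kind === 'method'`). The function name `resolveReceiverCall`, the edge shape, and attaching the 0.9 confidence per returned edge are assumptions for illustration; the type map is a plain `Map` of variable name to type name.

```javascript
// Hypothetical resolver: variable receiver -> declared type -> qualified method node.
function resolveReceiverCall(call, typeMap, nodesByName) {
  if (!call.receiver) return [];
  const typeName = typeMap.get(call.receiver);
  if (!typeName) return [];
  const qualified = `${typeName}.${call.name}`;
  return (nodesByName.get(qualified) || [])
    .filter((n) => n.kind === 'method')
    .map((n) => ({ target: n, confidence: 0.9 }));
}

// typeMap entry as it might come from `const router = new Router()`.
const typeMap = new Map([['router', 'Router']]);
// Lookup keyed by qualified name, e.g. 'Router.get'.
const nodesByName = new Map([
  ['Router.get', [{ name: 'Router.get', kind: 'method', file: 'router.js' }]],
]);

const edges = resolveReceiverCall({ name: 'get', receiver: 'router' }, typeMap, nodesByName);
console.log(edges.length, edges[0].target.name, edges[0].confidence); // 1 Router.get 0.9
```

Calls whose receiver has no type-map entry fall through with no edge, leaving other resolution strategies to handle them.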
… extractor (#501) Impact: 1 functions changed, 2 affected
…501) Impact: 2 functions changed, 3 affected
…on table
- Python: .pyi (type stubs), PHP: .phtml, Ruby: .rake/.gemspec
- Updated WASM registry, native Rust registry, and import resolution
- Condensed README comparison table from 28×8 to 12×5, verified claims against actual repos, added security scanning row, grouped features
- Marked backlog #71 (type inference) as done (#501)
- Ruby/HCL type inference column: N/A (dynamic/declarative)
Impact: 3 functions changed, 3 affected
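Registering the new extensions amounts to mapping each one onto an existing language. The sketch below is a hypothetical, simplified table and lookup (`EXTENSION_LANGUAGE` and `languageForFile` are invented names); codegraph's actual WASM registry has a different shape, and per the commit the native Rust registry and import resolution need the same entries.

```javascript
// Hypothetical extension -> language table (not codegraph's real registry shape).
const EXTENSION_LANGUAGE = new Map([
  ['.py', 'python'],
  ['.pyi', 'python'], // Python type stubs
  ['.php', 'php'],
  ['.phtml', 'php'], // PHP templates
  ['.rb', 'ruby'],
  ['.rake', 'ruby'],
  ['.gemspec', 'ruby'],
]);

// Take the last extension of the file's base name and look it up.
function languageForFile(filePath) {
  const base = filePath.split('/').pop();
  const dot = base.lastIndexOf('.');
  if (dot < 0) return null;
  return EXTENSION_LANGUAGE.get(base.slice(dot).toLowerCase()) ?? null;
}

console.log(languageForFile('lib/tasks/db.rake')); // ruby
console.log(languageForFile('stubs/os.pyi')); // python
console.log(languageForFile('README.md')); // null
```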
Greptile Summary
This PR extends language support with four new file extensions (…
Key observations:
Confidence Score: 4/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[parseFile / parseFiles] --> B{Native engine?}
B -- No --> C[WASM extract\ntypeMap included]
B -- Yes --> D[native.parseFile / parseFiles]
D --> E{typeMap empty\nor missing?}
E -- No --> F[Return patched result]
E -- Yes --> G[backfillTypeMap\nWASM AST extraction]
G --> F
F --> H[buildCallEdgesNative]
H --> I[native.buildCallEdges]
I --> J{hasReceiver\nin nativeEdges?}
J -- Yes >= 3.2.0 + annotations --> K[Done — native edges used]
J -- No old binary OR no annotations --> L[supplementReceiverEdges\nJS shim]
L --> K
style J fill:#f9c,stroke:#c66
style L fill:#ffd,stroke:#aa0
Last reviewed commit: "fix: replace regex t..."
    'app.post("/items");',
    '',
  ].join('\n'),
);
Misleading "native deferred" comment
The comment says // Force WASM engine — typeMap resolution is JS-only (native deferred), but native Rust support for typeMap is fully implemented in this very PR — edge_builder.rs has a complete type-map-aware resolution path, and parser_registry.rs / types.rs expose typeMap through the NAPI layer.
The real reason to force WASM in this test is likely that the native binary isn't guaranteed to be compiled in the CI environment where unit/integration tests run, not that the feature is unimplemented. Using an incorrect rationale here will mislead future contributors and could cause someone to remove native support thinking it's still "deferred".
Suggested fix:
);
// Force WASM engine — native binary may not be present in all test environments
await buildGraph(typedDir, { skipRegistry: true, engine: 'wasm' });
Fixed — updated the comment to: "Force WASM engine — native binary may not be present in all test environments". The original rationale was incorrect since native Rust type inference is fully implemented in this PR.
src/extractors/javascript.js
function walk(node) {
  const t = node.type;
  if (t === 'variable_declarator') {
    const nameN = node.childForFieldName('name');
    if (nameN && nameN.type === 'identifier') {
      const typeAnno = findChild(node, 'type_annotation');
      if (typeAnno) {
        const typeName = extractSimpleTypeName(typeAnno);
        if (typeName) typeMap.set(nameN.text, typeName);
      } else {
        const valueN = node.childForFieldName('value');
        if (valueN && valueN.type === 'new_expression') {
          const ctorType = extractNewExprTypeName(valueN);
          if (ctorType) typeMap.set(nameN.text, ctorType);
        }
extractTypeMapWalk has no recursion depth guard
Every other JS type-map extractor added in this PR (php.js, python.js, go.js, rust.js, csharp.js) protects against deep ASTs with an explicit depth limit:
function extractPhpTypeMapDepth(node, ctx, depth) {
  if (depth >= 200) return;
The inner walk closure in extractTypeMapWalk recurses unconditionally into every child node. For pathologically large or deeply nested files (e.g., auto-generated code, deeply nested template expressions), this can exhaust the call stack and throw a RangeError: Maximum call stack size exceeded.
Add a depth parameter and guard, consistent with every other extractor in this PR:
function extractTypeMapWalk(rootNode, typeMap) {
  function walk(node, depth) {
    if (depth >= 200) return;
    const t = node.type;
    if (t === 'variable_declarator') {
Fixed — added if (depth >= 200) return; guard to extractTypeMapWalk, consistent with every other type-map extractor in this PR. The walk closure now takes a depth parameter, incremented on each recursive call.
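A depth-guarded recursive walk of this kind can be sketched as below. It is an illustration under assumptions, not the PR's `extractTypeMapWalk`: nodes are plain `{ type, text, children }` objects rather than tree-sitter nodes, and `collectIdentifiers` is a hypothetical stand-in for the type-map logic. The limit of 200 matches the value used by the PR's extractors.

```javascript
const MAX_DEPTH = 200; // same limit the PR's type-map extractors use

// Collect identifier names, never recursing deeper than MAX_DEPTH levels.
function collectIdentifiers(root) {
  const names = [];
  function walk(node, depth) {
    if (depth >= MAX_DEPTH) return; // stop instead of risking stack exhaustion
    if (node.type === 'identifier') names.push(node.text);
    for (const child of node.children || []) walk(child, depth + 1);
  }
  walk(root, 0);
  return names;
}

// A pathologically deep chain: the leaf sits 10,000 levels down.
let deep = { type: 'identifier', text: 'leaf', children: [] };
for (let i = 0; i < 10000; i++) deep = { type: 'parenthesized_expression', children: [deep] };

console.log(collectIdentifiers(deep)); // [] (the leaf is beyond the depth limit)
```

The trade-off is that bindings nested deeper than the limit are silently skipped; for type-map extraction that only costs some resolution precision, whereas a stack overflow would abort the whole file.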
const methodCandidates = (ctx.nodesByName.get(call.name) || []).filter(
  (n) => n.name.endsWith(`.${call.name}`) && n.kind === 'method',
);
if (methodCandidates.length > 0) return methodCandidates;

// Type-aware resolution: translate variable receiver to its declared type
if (call.receiver && typeMap) {
  const typeName = typeMap.get(call.receiver);
  if (typeName) {
    const qualifiedName = `${typeName}.${call.name}`;
    const typed = (ctx.nodesByName.get(qualifiedName) || []).filter((n) => n.kind === 'method');
    if (typed.length > 0) return typed;
  }
}
methodCandidates guard is effectively dead code
ctx.nodesByName is keyed by node.name (see setupNodeLookups), where class methods are stored under their fully-qualified name (e.g., 'Router.get'). When call.name is the simple method name 'get', ctx.nodesByName.get('get') returns only nodes whose literal .name property equals 'get' — i.e., top-level functions, never class methods. The subsequent .endsWith('.get') filter then eliminates those too, so methodCandidates is always empty and the early-return guard never fires.
This means the new type-aware qualifiedName lookup below is always reachable (good), but the intent of the methodCandidates block is misleading and will confuse anyone reading this path. Consider removing the dead block or adding a comment that explains why methodCandidates is always empty:
// nodesByName is keyed by fully-qualified name (e.g., 'Router.get'),
// so get(call.name) never returns qualified methods — this block is a no-op
// and falls through to the type-aware qualified lookup below.
const methodCandidates = ...
Fixed — removed the dead methodCandidates block entirely. Since nodesByName is keyed by fully-qualified name (e.g. Router.get), get(call.name) with a simple name never returns class methods. The type-aware resolution path is now the clear entry point in resolveByMethodOrGlobal.
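Why the removed block could never fire can be shown with a toy index. The node list below is invented; the indexing rule (key by `node.name`, with methods stored under qualified names) follows the review's description of `setupNodeLookups`.

```javascript
// Hypothetical nodes: class methods carry qualified names.
const nodes = [
  { name: 'get', kind: 'function' }, // a top-level function named `get`
  { name: 'Router.get', kind: 'method' },
  { name: 'Cache.get', kind: 'method' },
];

// Index keyed by node.name.
const nodesByName = new Map();
for (const n of nodes) {
  if (!nodesByName.has(n.name)) nodesByName.set(n.name, []);
  nodesByName.get(n.name).push(n);
}

// The removed block: look up the simple name, then keep only `*.get` methods.
const methodCandidates = (nodesByName.get('get') || []).filter(
  (n) => n.name.endsWith('.get') && n.kind === 'method',
);
console.log(methodCandidates.length); // 0: the key 'get' never holds qualified methods

// A qualified lookup, as type-aware resolution performs, does find the method.
console.log(nodesByName.get('Router.get').length); // 1
```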
if (node.type === 'var_spec') {
  const nameNode = node.childForFieldName('name');
  const typeNode = node.childForFieldName('type');
  if (nameNode && typeNode) {
    const typeName = extractGoTypeName(typeNode);
    if (typeName) ctx.typeMap.set(nameNode.text, typeName);
  }
}

// Function/method parameter types: parameter_declaration has identifiers then a type
if (node.type === 'parameter_declaration') {
Go multi-name var declarations only capture the first name
In Go, a single var_spec can declare multiple variables with the same type:
var x, y MyService
In tree-sitter-go the name field of var_spec is a list, but node.childForFieldName('name') only returns the first name identifier. The same limitation exists in the corresponding Rust implementation (extract_go_type_map_depth in go.rs, line ~337).
Variables after the first (y in the example) will silently fail to appear in the type map, causing type-aware method-call resolution to miss them. This won't cause any errors but means the feature is incomplete for this pattern, which is common in Go:
var req *http.Request
var rw, backup http.ResponseWriter
Consider iterating named children or using children_by_field_name (in tree-sitter Rust) / walking all identifier children before the type node (in JS) to capture all declared names in a var_spec.
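Fixed: both JS (extractGoTypeMapDepth) and Rust (extract_go_type_map_depth) now iterate all identifier children in a var_spec node instead of using childForFieldName('name') / child_by_field_name('name'). Multi-name declarations like var x, y MyService now register all names in the type map.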
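The multi-name fix can be sketched with a mock `var_spec`. This is a hypothetical illustration: `makeVarSpec` builds a plain object that only mimics the `namedChildren` and `childForFieldName` parts of a tree-sitter node, and the type name is taken as raw text, whereas the real extractor normalizes it via extractGoTypeName.

```javascript
// Mock of a tree-sitter-go `var_spec` such as `var rw, backup ResponseWriter`.
function makeVarSpec(names, typeName) {
  const typeNode = { type: 'type_identifier', text: typeName };
  const nameNodes = names.map((n) => ({ type: 'identifier', text: n }));
  return {
    type: 'var_spec',
    namedChildren: [...nameNodes, typeNode],
    // Like the real API, the 'name' field yields only the first name node.
    childForFieldName: (field) => (field === 'name' ? nameNodes[0] : field === 'type' ? typeNode : null),
  };
}

function addVarSpecTypes(node, typeMap) {
  const typeNode = node.childForFieldName('type');
  if (!typeNode) return;
  // Visit every identifier child, not just childForFieldName('name').
  for (const child of node.namedChildren) {
    if (child.type === 'identifier') typeMap.set(child.text, typeNode.text);
  }
}

const typeMap = new Map();
addVarSpecTypes(makeVarSpec(['rw', 'backup'], 'ResponseWriter'), typeMap);
console.log(typeMap.get('rw'), typeMap.get('backup')); // ResponseWriter ResponseWriter
```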
…view # Conflicts: # README.md
- Add GitNexus column to feature comparison table (verified against repo)
- Fix codegraph license: MIT → Apache-2.0
- Remove "Other tools evaluated" footnote (GitNexus now in table)
- Update Limitations: reflect that type inference exists, clarify tsc gap
- Sync Roadmap section with actual ROADMAP.md (11 phases, correct versions)
Native binaries < 3.2.0 don't emit typeMap from parsing or handle receiver-based method calls in buildCallEdges. This adds two fallbacks:
1. extractTypeMapRegex: when native parse results lack typeMap, extract `const x = new Foo()` patterns via regex so the edge builder can resolve variable receivers to their declared types.
2. supplementReceiverEdges: after native buildCallEdges runs, if no receiver edges were produced, create them JS-side using the same buildReceiverEdge + type-resolved method-call logic as the WASM path.
Both fallbacks are gated on the absence of the feature and will be no-ops once native binaries >= 3.2.0 are published.
Fixes build-parity test failure on CI (native 31 edges vs WASM 33).
Impact: 6 functions changed, 15 affected
Impact: 2 functions changed, 5 affected
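The gating described in point 2 can be sketched as follows. Everything here is hypothetical: the edge shape `{ source, target, kind, receiver }`, the helper names, and passing the shim in as a callback are assumptions made to keep the example self-contained; it is not the PR's supplementReceiverEdges.

```javascript
// Hypothetical edge shape: { source, target, kind, receiver? }.
function hasReceiverEdges(edges) {
  return edges.some((e) => e.kind === 'calls' && Boolean(e.receiver));
}

// Run the JS shim only when native output lacks receiver edges.
function buildCallEdgesWithFallback(nativeEdges, supplement) {
  if (hasReceiverEdges(nativeEdges)) return nativeEdges; // newer binary: shim is a no-op
  return nativeEdges.concat(supplement());
}

const shim = () => [{ source: 'main', target: 'Router.get', kind: 'calls', receiver: 'router' }];
const oldBinaryEdges = [{ source: 'main', target: 'helper', kind: 'calls' }];
const newBinaryEdges = [{ source: 'main', target: 'Router.get', kind: 'calls', receiver: 'router' }];

console.log(buildCallEdgesWithFallback(oldBinaryEdges, shim).length); // 2
console.log(buildCallEdgesWithFallback(newBinaryEdges, shim).length); // 1
```

As the flowchart above notes, this kind of gate cannot tell an old binary apart from a newer one that produced no receiver edges (for example, code without annotations), so the shim also runs in that case.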
The comment incorrectly stated native type-map resolution was deferred, but native Rust support is fully implemented in this PR. The actual reason is that the native binary may not be present in CI.
Every other type-map extractor guards at depth 200, but the JS/TS walker recursed unconditionally. This could cause stack overflow on pathologically large or deeply nested auto-generated files. Impact: 2 functions changed, 3 affected
var x, y MyService now registers all names in the type map, not just
the first. Fixed in both JS (extractGoTypeMapDepth) and Rust
(extract_go_type_map_depth) extractors by iterating identifier children
instead of using child_by_field_name('name').
Impact: 2 functions changed, 4 affected
src/domain/parser.js
function extractTypeMapRegex(filePath) {
  let code;
  try {
    code = fs.readFileSync(filePath, 'utf-8');
  } catch {
    return [];
  }
  const entries = [];
  for (const m of code.matchAll(/(?:const|let|var)\s+(\w+)\s*=\s*new\s+(\w+)/g)) {
    entries.push({ name: m[1], typeName: m[2] });
  }
  return entries;
Regex matches inside comments and string literals
The pattern /(?:const|let|var)\s+(\w+)\s*=\s*new\s+(\w+)/g has no awareness of comment or string context. Code like:
// Legacy: const client = new HttpClient()
const help = "Example: const x = new Foo()";
…would produce spurious { name: 'client', typeName: 'HttpClient' } and { name: 'x', typeName: 'Foo' } entries in the type map. Those entries then flow into supplementReceiverEdges, where ctx.nodesByName.get('HttpClient.someMethod') might accidentally find a real HttpClient.someMethod node in a different file and emit a false calls edge — corrupting the graph for the duration that the shim is active (i.e., all users still on a native binary < 3.2.0).
Even if the false-positive rate is low in typical code, the consequence (phantom call edges) is silent and hard to debug. Consider stripping single-line and block comments before running the regex, or narrowing the pattern to avoid the most common false-positive shapes (e.g., require the match not be preceded by // on the same line).
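The false positives are easy to reproduce. The sketch below reuses the exact pattern from the snippet above but takes the code as a string instead of a file path (an adaptation for self-containment); the sample lines are the review's own examples plus one genuine declaration.

```javascript
// Same pattern as the PR's extractTypeMapRegex; no awareness of comments or strings.
const TYPE_MAP_RE = /(?:const|let|var)\s+(\w+)\s*=\s*new\s+(\w+)/g;

function extractTypeMapFromCode(code) {
  return [...code.matchAll(TYPE_MAP_RE)].map((m) => ({ name: m[1], typeName: m[2] }));
}

const sample = [
  '// Legacy: const client = new HttpClient()',
  'const help = "Example: const x = new Foo()";',
  'const router = new Router();',
].join('\n');

console.log(extractTypeMapFromCode(sample).map((e) => e.name)); // [ 'client', 'x', 'router' ]
```

Only `router` is a real binding; `client` comes from a comment and `x` from a string literal. An AST-based extractor, like the WASM backfill adopted in the reply below, sees those spans as comment and string nodes and never visits them as declarations.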
Fixed — replaced the regex-based extractTypeMapRegex with backfillTypeMap, which uses WASM tree-sitter AST extraction via wasmExtractSymbols. This is context-aware and will not match patterns inside comments or string literals.
 if (native) {
   const result = native.parseFile(filePath, source, !!opts.dataflow, opts.ast !== false);
-  return result ? patchNativeResult(result) : null;
+  if (!result) return null;
+  const patched = patchNativeResult(result);
+  if (!patched.typeMap || patched.typeMap.length === 0) {
+    patched.typeMap = extractTypeMapRegex(filePath);
+  }
+  return patched;
Redundant disk read when source is already in memory
parseFileAuto already has the file content in the source parameter. Calling extractTypeMapRegex(filePath) re-reads the same bytes from disk unnecessarily. For single-file incremental rebuilds this doubles the I/O for every file processed on older native binaries.
extractTypeMapRegex could accept an optional source string to avoid the extra read:
function extractTypeMapRegex(filePath, source) {
  let code = source;
  if (!code) {
    try { code = fs.readFileSync(filePath, 'utf-8'); } catch { return []; }
  }
  // ...
}
Then the parseFileAuto call site becomes:
patched.typeMap = extractTypeMapRegex(filePath, source);
The parseFilesAuto and parseFileIncremental paths don't have source readily available, so they keep the current extractTypeMapRegex(r.file) form unchanged.
Fixed — the new backfillTypeMap function accepts the source as a parameter and only falls back to fs.readFileSync when source is not provided. In the batch path (parseFilesAuto), source is already available from the native parse results, so no redundant disk read occurs.
) The regex pattern matched inside comments and string literals, producing spurious type map entries that could create phantom call edges. Replace with WASM tree-sitter extraction which uses the AST and is immune to false positives from non-code contexts. Also eliminates redundant disk reads in parseFileAuto where source was already in memory. Impact: 4 functions changed, 2 affected
Summary
- .pyi (Python type stubs), .phtml (PHP templates), .rake/.gemspec (Ruby): updated WASM registry, native Rust registry, and import resolution (both JS and Rust sides)
- …— to N/A with footnote (dynamic/declarative languages)
Test plan
- `vitest run tests/unit/parser.test.js`: 6/6 pass
- .pyi/.phtml/.rake/.gemspec files are picked up by `codegraph build`