feat: type inference for all typed languages (WASM + native)#501
carlos-alm merged 35 commits into main
- Remove dead `truncate` function from ast-analysis/shared.js (0 consumers)
- Remove dead `truncStart` function from presentation/table.js (0 consumers)
- Un-export `BATCH_CHUNK` in builder/helpers.js (only used internally)

Skipped sync.json targets that were false positives:
- BUILTIN_RECEIVERS: used by incremental.js + build-edges.js
- TRANSIENT_CODES/RETRY_DELAY_MS: internal to readFileSafe
- MAX_COL_WIDTH: internal to printAutoTable
- findFunctionNode: re-exported from index.js, used in tests

Impact: 1 functions changed, 32 affected
Impact: 29 functions changed, 105 affected
…ures Impact: 5 functions changed, 7 affected
connection.js: add debug() logging to all 8 catch-with-fallback blocks so failures are observable without changing behavior. migrations.js: replace 14 try/catch blocks in initSchema with hasColumn() and hasTable() guards. CREATE INDEX calls use IF NOT EXISTS directly. getBuildMeta uses hasTable() check instead of try/catch. Impact: 10 functions changed, 19 affected
Add debug() logging to 10 empty catch blocks across context.js, symbol-lookup.js, exports.js, impact.js, and module-map.js. All catches retain their fallback behavior but failures are now observable via debug logging. Impact: 6 functions changed, 18 affected
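The catch-with-fallback pattern described in these commits can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `debug` helper, the `debugLog` array, and `parseJsonSafe` are all hypothetical names invented for the example.

```javascript
// Hypothetical debug logger; the project's real debug() helper is not shown in this PR.
const debugLog = [];
function debug(message) {
  debugLog.push(message);
  if (typeof process !== 'undefined' && process.env.DEBUG) {
    console.error(`[debug] ${message}`);
  }
}

// Catch-with-fallback: the caller still receives the fallback value,
// but the failure is recorded instead of being silently swallowed.
function parseJsonSafe(text, fallback = null) {
  try {
    return JSON.parse(text);
  } catch (err) {
    debug(`parseJsonSafe failed: ${err.message}`);
    return fallback;
  }
}

const parsedOk = parseJsonSafe('{"a": 1}');
const parsedBad = parseJsonSafe('{not json', {});
```

Behavior is unchanged for callers (the fallback is still returned); the only difference is that the error message is now observable.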
Add debug() logging to 6 empty catch blocks: 3 in disposeParsers() for WASM resource cleanup, 2 in ensureWasmTrees() for file read and parse failures, and 1 in getActiveEngine() for version lookup. Impact: 3 functions changed, 0 affected
Add debug() logging to 9 empty catch blocks across complexity.js (5), cfg.js (2), and dataflow.js (2). All catches for file read and parse failures now log the error message before continuing. Impact: 4 functions changed, 2 affected
Split the monolithic walkJavaScriptNode switch (13 cases, cognitive 228) into 11 focused handler functions. The dispatcher is now a thin switch that delegates to handleFunctionDecl, handleClassDecl, handleMethodDef, handleInterfaceDecl, handleTypeAliasDecl, handleVariableDecl, handleEnumDecl, handleCallExpr, handleImportStmt, handleExportStmt, and handleExpressionStmt. The expression_statement case now reuses the existing handleCommonJSAssignment helper, eliminating ~50 lines of duplication. Worst handler complexity: handleVariableDecl (cognitive 20), down from the original monolithic function (cognitive 279). Impact: 13 functions changed, 3 affected
Split walkPythonNode switch into 7 focused handlers: handlePyFunctionDef, handlePyClassDef, handlePyCall, handlePyImport, handlePyExpressionStmt, handlePyImportFrom, plus the decorated_definition inline dispatch. Moved extractPythonParameters, extractPythonClassProperties, walkInitBody, and findPythonParentClass from closures to module-scope functions. Impact: 12 functions changed, 5 affected
Split walkJavaNode switch into 8 focused handlers plus an extractJavaInterfaces helper. Moved findJavaParentClass to module scope. The class_declaration case (deepest nesting in the file) is now split between handleJavaClassDecl and extractJavaInterfaces. Impact: 12 functions changed, 5 affected
Apply the same per-category handler decomposition to all remaining language extractors: Go (6 handlers), Ruby (8 handlers), PHP (11 handlers), C# (11 handlers), Rust (9 handlers), HCL (4 handlers).

Each extractor now follows the template established by the JS extractor:
- Thin entry function creates ctx, delegates to walkXNode
- walkXNode is a thin dispatcher switch
- Each case is a named handler function at module scope
- Helper functions (findParentClass, etc.) moved to module scope

Impact: 66 functions changed, 23 affected
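The extractor template described above (thin entry function, thin dispatcher, module-scope handlers) might look roughly like this. It is a sketch under assumptions: the node shape is a plain-object mock rather than a tree-sitter node, and `extractSymbols`, `walkNode`, and the three handlers are illustrative names, not the project's actual functions.

```javascript
// Illustrative handlers at module scope; names are hypothetical.
function handleFunctionDecl(node, ctx) {
  ctx.definitions.push({ kind: 'function', name: node.name });
}

function handleClassDecl(node, ctx) {
  ctx.definitions.push({ kind: 'class', name: node.name });
}

function handleCallExpr(node, ctx) {
  ctx.calls.push({ name: node.name });
}

// Thin dispatcher: one case per node type, no inline logic.
function walkNode(node, ctx) {
  switch (node.type) {
    case 'function_declaration':
      handleFunctionDecl(node, ctx);
      break;
    case 'class_declaration':
      handleClassDecl(node, ctx);
      break;
    case 'call_expression':
      handleCallExpr(node, ctx);
      break;
  }
  for (const child of node.children ?? []) walkNode(child, ctx);
}

// Thin entry function: creates ctx and delegates to the walker.
function extractSymbols(rootNode) {
  const ctx = { definitions: [], calls: [] };
  walkNode(rootNode, ctx);
  return ctx;
}

// Mock tree standing in for a parsed AST.
const mockTree = {
  type: 'program',
  children: [
    { type: 'class_declaration', name: 'Router', children: [] },
    {
      type: 'function_declaration',
      name: 'main',
      children: [{ type: 'call_expression', name: 'listen', children: [] }],
    },
  ],
};
const extracted = extractSymbols(mockTree);
```

Keeping each case as a named function means complexity is measured per handler rather than accumulating inside one large switch.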
…pers Move nested handler functions to module level in cfg-visitor.js, dataflow-visitor.js, and complexity-visitor.js — reducing cognitive complexity of each factory function from 100-337 down to thin coordinators. Extract WASM pre-parse, visitor setup, result storage, and build delegation from runAnalyses into focused helper functions. Impact: 66 functions changed, 43 affected
Extract edge-building by type (import, call-native, call-JS, class hierarchy) from buildEdges. Extract per-phase insertion logic from insertNodes. Extract scoped/incremental/full-build paths and reverse-dep cascade from detectChanges. Extract setup, engine init, alias loading from pipeline.js. Extract node/edge-building helpers from incremental.js rebuildFile. Impact: 44 functions changed, 19 affected
Impact: 37 functions changed, 29 affected
Impact: 5 functions changed, 3 affected
…sification Impact: 8 functions changed, 3 affected
Impact: 10 functions changed, 5 affected
Impact: 5 functions changed, 2 affected
Impact: 5 functions changed, 2 affected
Impact: 12 functions changed, 6 affected
…age) Extract per-section validators from validateBoundaryConfig (cog 101→2). Extract buildCommunityObjects and analyzeDrift from communitiesData (cog 32→4). Extract buildTriageItems and computeTriageSummary from triageData (bugs 1.4→0.48). Impact: 13 functions changed, 11 affected
Extract printDiffFunctions/Coupled/Ownership/Boundaries/Summary from diffImpact (cog 28→6, cyc 21→7). Extract printExportHeader/Symbols from fileExports. Extract printNotFound/PathSteps from symbolPath. Impact: 12 functions changed, 7 affected
Extract runManifesto/validateKind from check execute (cyc 14→10). Extract runHotspots/validateFilters/parseWeights from triage execute (cyc 13→4). Extract loadMCPSdk/createLazyLoaders/resolveDbPath/validateMultiRepoAccess from startMCPServer (cog 34→13, cyc 19→7). Impact: 14 functions changed, 4 affected
Move createRequire assignment after all import declarations to satisfy Biome's import sorting rule. Impact: 3 functions changed, 1 affected
Impact: 28 functions changed, 35 affected
Impact: 31 functions changed, 8 affected
…elds in lockfile
…nto release/3.2.0
Extract per-file typeMap (varName → typeName) from type annotations, new expressions, and typed parameters for JS/TS, Java, Go, Rust, C#, PHP, and Python. The edge resolver uses typeMap to connect variable receiver calls (x.method()) to their declared types (Type.method()) with 0.9 confidence. Implemented in both WASM (JS extractors) and native (Rust extractors + edge builder) engines for full parity. Updated README language table with symbols-extracted, type-inference, and engine-parity columns. Impact: 73 functions changed, 54 affected
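As a rough illustration of the extraction step, the sketch below builds a per-file typeMap from a mock AST. The node shape (`name`, `annotation`, `init`, `children`) is an assumption made for the example and does not match the real tree-sitter API; `collectTypeMap` is a hypothetical name, and preferring an annotation over a `new` expression is also an assumption.

```javascript
// Walk a mock AST and record varName -> typeName.
function collectTypeMap(node, typeMap = new Map()) {
  if (node.type === 'variable_declarator') {
    // Assumed precedence: explicit annotation (const x: Type) first,
    // then a constructor call (new Type()).
    const fromNew =
      node.init?.type === 'new_expression' ? node.init.callee : undefined;
    const typeName = node.annotation ?? fromNew;
    if (typeName) typeMap.set(node.name, typeName);
  } else if (node.type === 'required_parameter' && node.annotation) {
    // Typed parameter: (x: Type) => ...
    typeMap.set(node.name, node.annotation);
  }
  for (const child of node.children ?? []) collectTypeMap(child, typeMap);
  return typeMap;
}

const mockAst = {
  type: 'program',
  children: [
    { type: 'variable_declarator', name: 'router', annotation: 'Router' },
    { type: 'variable_declarator', name: 'db', init: { type: 'new_expression', callee: 'Database' } },
    { type: 'variable_declarator', name: 'count', init: { type: 'number' } },
    { type: 'required_parameter', name: 'req', annotation: 'Request' },
  ],
};
const extractedTypeMap = collectTypeMap(mockAst);
```

Variables with neither an annotation nor a constructor call (such as `count` here) are simply left out of the map, so calls on them take the untyped resolution path.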
Claude finished @carlos-alm's task

Review Complete

Reviewing PR #501: Type inference for all typed languages

Root Cause Analysis

Problem Being Solved: Type inference addresses a fundamental limitation in codegraph's call resolution. Currently, typed method calls like …

Why This Approach: The solution extracts type annotations and …

Risk Assessment: Low risk. This is purely additive functionality with comprehensive test coverage and no configuration changes.

Backlog Compliance
Critical Concerns

None. This is a high-quality implementation of a crucial feature.

Final Recommendation
Greptile Summary

This PR adds per-file type inference across 8 languages (JS/TS, Java, Go, Rust, C#, PHP, Python) by extracting a …

Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant File as Source File
participant Ext as Extractor (JS/Rust)
participant TM as typeMap
participant EB as Edge Builder
participant DB as Graph DB
File->>Ext: parse AST
Ext->>TM: populate typeMap<br/>(annotations, new exprs, typed params)
Note over TM: { varName → TypeName }
Ext->>EB: symbols + typeMap
EB->>EB: process call: x.method()
EB->>TM: lookup receiver "x"
TM-->>EB: "TypeName"
alt type resolved
EB->>DB: call edge → TypeName.method() (confidence=0.9)
EB->>DB: receiver edge → TypeName (confidence=0.9)
else no type info
EB->>DB: fallback resolution (confidence=0.7)
end
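
The resolution branch in the diagram can be sketched as below. The two confidence values (0.9 and 0.7) come from the diagram; the edge shape, the `kind` labels, and the function name `resolveReceiverCall` are hypothetical and only meant to show the control flow.

```javascript
// Confidence values taken from the diagram above.
const TYPED_CONFIDENCE = 0.9;
const FALLBACK_CONFIDENCE = 0.7;

// Hypothetical resolver: returns the edges that would be written for one call.
function resolveReceiverCall(call, fileTypeMap) {
  const typeName = fileTypeMap.get(call.receiver);
  if (typeName) {
    // Type resolved: emit a call edge and a receiver edge.
    return [
      { kind: 'calls', target: `${typeName}.${call.method}`, confidence: TYPED_CONFIDENCE },
      { kind: 'receiver', target: typeName, confidence: TYPED_CONFIDENCE },
    ];
  }
  // No type info: fall back to a name-only match at lower confidence.
  return [{ kind: 'calls', target: call.method, confidence: FALLBACK_CONFIDENCE }];
}

const fileTypeMap = new Map([['x', 'TypeName']]);
const typedEdges = resolveReceiverCall({ receiver: 'x', method: 'method' }, fileTypeMap);
const untypedEdges = resolveReceiverCall({ receiver: 'y', method: 'method' }, fileTypeMap);
```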
Last reviewed commit: "refactor: remove red..."
src/extractors/python.js
Outdated
const typeNode = node.childForFieldName('type');
if (nameNode && nameNode.type === 'identifier' && typeNode) {
  const typeName = extractPythonTypeName(typeNode);
  if (typeName && typeName !== 'self' && typeName !== 'cls') {
self/cls filter checks the type name, not the variable name
The guard typeName !== 'self' && typeName !== 'cls' compares the declared type against the strings 'self'/'cls', not the parameter name. Its intent is to skip self and cls parameters (which always reference the current instance), but because the comparison is on the wrong variable, it fails to filter them.
Concretely, if someone writes def foo(self: Router): self.get("/"), this branch evaluates 'Router' !== 'self' → true, so self → Router gets inserted into typeMap. Later in resolveByMethodOrGlobal, the type-aware lookup runs before the special-case self scoped fallback, so self.get(...) would be incorrectly resolved to Router.get via the type map instead of going through the normal same-class resolution.
The Rust counterpart in crates/codegraph-core/src/extractors/python.rs correctly does:
let name = node_text(&name_node, source);
if name != "self" && name != "cls" {
The JS version should match:
- if (typeName && typeName !== 'self' && typeName !== 'cls') {
+ if (typeName && nameNode.text !== 'self' && nameNode.text !== 'cls') {
Fixed: changed `typeName !== 'self'` to `nameNode.text !== 'self'` (and the same for `'cls'`) to match the Rust implementation. Also added the same self/cls guard to the `typed_default_parameter` branch, which was missing it entirely.
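The difference between the two guards discussed in this thread can be shown with a small standalone sketch. The helper names `shouldRecordBuggy` and `shouldRecordFixed` are invented for the illustration; only the two boolean conditions mirror the original and suggested lines.

```javascript
// Original guard: compares the declared TYPE against 'self'/'cls'.
function shouldRecordBuggy(paramName, typeName) {
  return Boolean(typeName) && typeName !== 'self' && typeName !== 'cls';
}

// Suggested guard: compares the parameter NAME against 'self'/'cls'.
function shouldRecordFixed(paramName, typeName) {
  return Boolean(typeName) && paramName !== 'self' && paramName !== 'cls';
}

// Parameters of: def foo(self: Router, req: Request)
const pyParams = [
  ['self', 'Router'],
  ['req', 'Request'],
];
const buggyMap = new Map(pyParams.filter(([name, type]) => shouldRecordBuggy(name, type)));
const fixedMap = new Map(pyParams.filter(([name, type]) => shouldRecordFixed(name, type)));
```

With the original guard, `self` ends up mapped to `Router`; with the suggested guard, only `req` is recorded.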
src/extractors/javascript.js
Outdated
// Populate typeMap for parameter type annotations (walk path handles variables inline)
extractTypeMapWalk(tree.rootNode, ctx.typeMap);
Redundant double processing of variable declarations in the walk path
The comment says "walk path handles variables inline", but extractTypeMapWalk (called on the next line) also processes variable_declarator nodes — it is not limited to parameters. This means every const/let/var declaration is walked twice: once inside handleVariableDecl (called by walkJavaScriptNode) and again inside extractTypeMapWalk.
Because Map.set is idempotent, this produces correct results, but the redundancy is wasteful and the comment is misleading. Options to clean this up:
- Remove the variable_declarator branch from extractTypeMapWalk (keep only required_parameter/optional_parameter) and update the comment to "handles parameters only".
- Remove the inline typeMap logic from handleVariableDecl and rely solely on extractTypeMapWalk.
The extractSymbolsQuery path (which calls extractTypeMapWalk but not handleVariableDecl) does not have this issue.
Fixed — removed the redundant inline typeMap logic from handleVariableDecl. extractTypeMapWalk now handles all variable_declarator processing in both the query and walk paths. Updated the comment to reflect this.
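Why the double processing was harmless but wasteful can be seen in a tiny sketch. The `record` helper and the `writes` counter are hypothetical; the two calls stand in for the two code paths named in the review comment.

```javascript
const seenTypes = new Map();
let writes = 0;

// Hypothetical helper that counts how often the map is written.
function record(name, typeName) {
  writes += 1;
  seenTypes.set(name, typeName);
}

record('router', 'Router'); // stands in for the inline logic in handleVariableDecl
record('router', 'Router'); // stands in for the same declaration seen by extractTypeMapWalk
```

The map ends up with a single entry either way, so results are correct; the second write is pure overhead, which is what removing the inline logic eliminates.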
function extractPythonTypeMap(node, ctx) {
  extractPythonTypeMapDepth(node, ctx, 0);
}

function extractPythonTypeMapDepth(node, ctx, depth) {
  if (depth >= 200) return;

  // typed_parameter: identifier : type
  if (node.type === 'typed_parameter') {
    const nameNode = node.child(0);
    const typeNode = node.childForFieldName('type');
    if (nameNode && nameNode.type === 'identifier' && typeNode) {
      const typeName = extractPythonTypeName(typeNode);
      if (typeName && typeName !== 'self' && typeName !== 'cls') {
        ctx.typeMap.set(nameNode.text, typeName);
      }
    }
  }

  // typed_default_parameter: name : type = default
  if (node.type === 'typed_default_parameter') {
    const nameNode = node.childForFieldName('name');
    const typeNode = node.childForFieldName('type');
    if (nameNode && nameNode.type === 'identifier' && typeNode) {
      const typeName = extractPythonTypeName(typeNode);
      if (typeName) ctx.typeMap.set(nameNode.text, typeName);
    }
  }

  for (let i = 0; i < node.childCount; i++) {
    const child = node.child(i);
    if (child) extractPythonTypeMapDepth(child, ctx, depth + 1);
  }
}
Python annotated assignments (x: Type = ...) listed in PR description but not implemented
The PR description's type-sources table lists x: Type = ... as a supported Python type source alongside def foo(x: Type). However, the implementation only covers typed_parameter and typed_default_parameter (function parameters). Python module-level or class-body annotated assignments (in tree-sitter-python, an assignment node with a type field) are not handled here, nor in the native Rust counterpart (crates/codegraph-core/src/extractors/python.rs). For example:
router: Router = Router()
db: Database = create_engine(...)
This is a documentation-implementation mismatch that could mislead users expecting that style of annotation to drive edge resolution.
If annotated assignments are intentionally out of scope for this PR, the PR description's table should drop x: Type = ... for Python.
Confirmed — Python annotated assignments (x: Type = ...) are not implemented in either the JS or Rust extractor. This is a documentation-implementation gap in the PR description. The PR description table for Python should list only "def foo(x: Type)" as a supported source. Will update the PR description to remove the incorrect entry.
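For reference, a handler for annotated assignments could look something like the sketch below if support were added later. It is not part of this PR. It assumes tree-sitter-python exposes these statements as `assignment` nodes with `left` and `type` fields, and it uses a small mock (`mockNode`) in place of real tree-sitter nodes; `handlePyAnnotatedAssignment` is a hypothetical name.

```javascript
// Mock of the small part of the node API used below.
function mockNode(type, text, fields = {}) {
  return { type, text, childForFieldName: (name) => fields[name] ?? null };
}

// Hypothetical handler; not implemented in either extractor in this PR.
function handlePyAnnotatedAssignment(node, typeMap) {
  if (node.type !== 'assignment') return;
  const left = node.childForFieldName('left');
  const typeNode = node.childForFieldName('type');
  // Only simple `name: Type = ...` targets are recorded.
  if (left && left.type === 'identifier' && typeNode) {
    typeMap.set(left.text, typeNode.text);
  }
}

const pyTypeMap = new Map();

// router: Router = Router()
handlePyAnnotatedAssignment(
  mockNode('assignment', 'router: Router = Router()', {
    left: mockNode('identifier', 'router'),
    type: mockNode('type', 'Router'),
  }),
  pyTypeMap,
);

// count = 0 (no annotation, so nothing is recorded)
handlePyAnnotatedAssignment(
  mockNode('assignment', 'count = 0', { left: mockNode('identifier', 'count') }),
  pyTypeMap,
);
```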
Summary
- Extract per-file `typeMap` (varName → typeName) from type annotations, `new` expressions, and typed parameters
- Resolve `x.method()` → `Type.method()` with 0.9 confidence, in both `build-edges.js` (JS fallback) and `edge_builder.rs` (native)

Type sources by language

| Language | Type sources |
| --- | --- |
| JS/TS | `const x: Type`, `new Type()`, `(x: Type) =>` |
| Java | `Type x = ...`, `void foo(Type x)` |
| Go | `var x Type`, `func foo(x Type)` |
| Rust | `let x: Type`, `fn foo(x: Type)` |
| C# | `Type x = ...`, `void Foo(Type x)` |
| PHP | `function foo(Type $x)` |
| Python | `def foo(x: Type)` |

Test plan