feat: add .pyi/.phtml/.rake/.gemspec extensions, condense comparison table #502

Merged
carlos-alm merged 44 commits into main from fix/type-inference-review
Mar 18, 2026

@carlos-alm
Contributor

Summary

  • New file extensions: .pyi (Python type stubs), .phtml (PHP templates), .rake/.gemspec (Ruby) — updated WASM registry, native Rust registry, and import resolution (both JS and Rust sides)
  • Condensed comparison table: 28 rows × 8 competitors → 12 rows × 4 competitors. Grouped related features, verified all claims against actual repos, fixed incorrect counts (joern ~12 not 14, code-graph-rag 7 not 11), added security scanning row ("intentionally out of scope"), moved lesser tools to footnote
  • Backlog: #71 (type inference) marked as DONE, with a link to PR #501 (feat: type inference for all typed languages (WASM + native))
  • Ruby/HCL type inference column changed to N/A, with a footnote (dynamic/declarative languages)
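
As a rough illustration of the extension additions in the first bullet, an extension-to-language lookup might look like the sketch below. The names `EXTENSION_LANGUAGE` and `languageForFile` are hypothetical and are not taken from the codegraph registries; only the four new extensions and their languages come from this PR.

```javascript
// Hypothetical extension registry. Only the .pyi/.phtml/.rake/.gemspec
// entries reflect this PR; the rest is illustrative scaffolding.
const EXTENSION_LANGUAGE = new Map([
  ['.py', 'python'],
  ['.pyi', 'python'], // Python type stubs
  ['.php', 'php'],
  ['.phtml', 'php'], // PHP templates
  ['.rb', 'ruby'],
  ['.rake', 'ruby'],
  ['.gemspec', 'ruby'],
]);

// Return the language for a path based on its last extension, or null.
function languageForFile(filePath) {
  const base = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  if (dot <= 0) return null; // no extension, or a dotfile such as ".env"
  return EXTENSION_LANGUAGE.get(base.slice(dot).toLowerCase()) ?? null;
}
```

A single lookup table like this keeps the extension list in one place, although the PR itself updates separate WASM and native Rust registries.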

Test plan

  • vitest run tests/unit/parser.test.js — 6/6 pass
  • Verify README table renders correctly on GitHub
  • Verify .pyi/.phtml/.rake/.gemspec files are picked up by codegraph build

- Remove dead `truncate` function from ast-analysis/shared.js (0 consumers)
- Remove dead `truncStart` function from presentation/table.js (0 consumers)
- Un-export `BATCH_CHUNK` in builder/helpers.js (only used internally)

Skipped sync.json targets that were false positives:
- BUILTIN_RECEIVERS: used by incremental.js + build-edges.js
- TRANSIENT_CODES/RETRY_DELAY_MS: internal to readFileSafe
- MAX_COL_WIDTH: internal to printAutoTable
- findFunctionNode: re-exported from index.js, used in tests

Impact: 1 functions changed, 32 affected
…ures

Impact: 5 functions changed, 7 affected
connection.js: add debug() logging to all 8 catch-with-fallback blocks
so failures are observable without changing behavior.

migrations.js: replace 14 try/catch blocks in initSchema with hasColumn()
and hasTable() guards. CREATE INDEX calls use IF NOT EXISTS directly.
getBuildMeta uses hasTable() check instead of try/catch.

Impact: 10 functions changed, 19 affected
Add debug() logging to 10 empty catch blocks across context.js,
symbol-lookup.js, exports.js, impact.js, and module-map.js.
All catches retain their fallback behavior but failures are now
observable via debug logging.
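
The catch-with-fallback pattern these commits describe can be sketched as follows. This is a minimal illustration, not the project's code: the `debug` function here is a stand-in that collects messages in an array, and `parseJsonOrDefault` is a hypothetical helper.

```javascript
// Hypothetical debug logger: collects messages so callers can inspect them.
const debugLog = [];
function debug(message) {
  debugLog.push(message);
}

// Parse JSON, falling back to a default. The fallback behavior is unchanged;
// the only addition is that the failure is now recorded.
function parseJsonOrDefault(text, fallback) {
  try {
    return JSON.parse(text);
  } catch (err) {
    debug(`parseJsonOrDefault failed: ${err.message}`);
    return fallback;
  }
}
```

The caller still receives the fallback value on failure, so behavior is preserved while the error becomes visible in the log.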

Impact: 6 functions changed, 18 affected
Add debug() logging to 6 empty catch blocks: 3 in disposeParsers()
for WASM resource cleanup, 2 in ensureWasmTrees() for file read and
parse failures, and 1 in getActiveEngine() for version lookup.

Impact: 3 functions changed, 0 affected
Add debug() logging to 9 empty catch blocks across complexity.js (5),
cfg.js (2), and dataflow.js (2). All catches for file read and parse
failures now log the error message before continuing.

Impact: 4 functions changed, 2 affected
Split the monolithic walkJavaScriptNode switch (13 cases, cognitive 228)
into 11 focused handler functions. The dispatcher is now a thin switch
that delegates to handleFunctionDecl, handleClassDecl, handleMethodDef,
handleInterfaceDecl, handleTypeAliasDecl, handleVariableDecl,
handleEnumDecl, handleCallExpr, handleImportStmt, handleExportStmt,
and handleExpressionStmt.

The expression_statement case now reuses the existing
handleCommonJSAssignment helper, eliminating ~50 lines of duplication.

Worst handler complexity: handleVariableDecl (cognitive 20), down from
the original monolithic function (cognitive 279).

Impact: 13 functions changed, 3 affected
Split walkPythonNode switch into 7 focused handlers: handlePyFunctionDef,
handlePyClassDef, handlePyCall, handlePyImport, handlePyExpressionStmt,
handlePyImportFrom, plus the decorated_definition inline dispatch.

Moved extractPythonParameters, extractPythonClassProperties, walkInitBody,
and findPythonParentClass from closures to module-scope functions.

Impact: 12 functions changed, 5 affected
Split walkJavaNode switch into 8 focused handlers plus an
extractJavaInterfaces helper. Moved findJavaParentClass to module scope.
The class_declaration case (deepest nesting in the file) is now split
between handleJavaClassDecl and extractJavaInterfaces.

Impact: 12 functions changed, 5 affected
Apply the same per-category handler decomposition to all remaining
language extractors: Go (6 handlers), Ruby (8 handlers), PHP (11
handlers), C# (11 handlers), Rust (9 handlers), HCL (4 handlers).

Each extractor now follows the template established by the JS extractor:
- Thin entry function creates ctx, delegates to walkXNode
- walkXNode is a thin dispatcher switch
- Each case is a named handler function at module scope
- Helper functions (findParentClass, etc.) moved to module scope
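
The template listed above can be sketched roughly as below. The handler names echo the commit messages, but the node shape (`type`, `name`, `children`) and the `ctx.definitions` field are assumptions made for this example, not the extractors' real data structures.

```javascript
// Named handlers at module scope (hypothetical node and ctx shapes).
function handleFunctionDecl(node, ctx) {
  ctx.definitions.push({ name: node.name, kind: 'function' });
}

function handleClassDecl(node, ctx) {
  ctx.definitions.push({ name: node.name, kind: 'class' });
}

// Thin dispatcher: one case per node type, then recurse into children.
function walkNode(node, ctx) {
  switch (node.type) {
    case 'function_declaration':
      handleFunctionDecl(node, ctx);
      break;
    case 'class_declaration':
      handleClassDecl(node, ctx);
      break;
  }
  for (const child of node.children ?? []) walkNode(child, ctx);
}

// Thin entry function: creates ctx and delegates to the dispatcher.
function extractSymbols(rootNode) {
  const ctx = { definitions: [] };
  walkNode(rootNode, ctx);
  return ctx.definitions;
}
```

Because each case body is a one-line call, the complexity of the dispatcher stays flat as handlers are added.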

Impact: 66 functions changed, 23 affected
…pers

Move nested handler functions to module level in cfg-visitor.js,
dataflow-visitor.js, and complexity-visitor.js — reducing cognitive
complexity of each factory function from 100-337 down to thin
coordinators. Extract WASM pre-parse, visitor setup, result storage,
and build delegation from runAnalyses into focused helper functions.

Impact: 66 functions changed, 43 affected
Extract edge-building by type (import, call-native, call-JS, class
hierarchy) from buildEdges. Extract per-phase insertion logic from
insertNodes. Extract scoped/incremental/full-build paths and
reverse-dep cascade from detectChanges. Extract setup, engine init,
alias loading from pipeline.js. Extract node/edge-building helpers
from incremental.js rebuildFile.

Impact: 44 functions changed, 19 affected
Impact: 5 functions changed, 3 affected
…sification

Impact: 8 functions changed, 3 affected
Impact: 10 functions changed, 5 affected
Impact: 5 functions changed, 2 affected
Impact: 12 functions changed, 6 affected
…age)

Extract per-section validators from validateBoundaryConfig (cog 101→2).
Extract buildCommunityObjects and analyzeDrift from communitiesData (cog 32→4).
Extract buildTriageItems and computeTriageSummary from triageData (bugs 1.4→0.48).

Impact: 13 functions changed, 11 affected
Extract printDiffFunctions/Coupled/Ownership/Boundaries/Summary from
diffImpact (cog 28→6, cyc 21→7). Extract printExportHeader/Symbols from
fileExports. Extract printNotFound/PathSteps from symbolPath.

Impact: 12 functions changed, 7 affected
Extract runManifesto/validateKind from check execute (cyc 14→10).
Extract runHotspots/validateFilters/parseWeights from triage execute (cyc 13→4).
Extract loadMCPSdk/createLazyLoaders/resolveDbPath/validateMultiRepoAccess
from startMCPServer (cog 34→13, cyc 19→7).

Impact: 14 functions changed, 4 affected
Move createRequire assignment after all import declarations
to satisfy Biome's import sorting rule.

Impact: 3 functions changed, 1 affected
Impact: 28 functions changed, 35 affected
Impact: 31 functions changed, 8 affected
Extract per-file typeMap (varName → typeName) from type annotations,
new expressions, and typed parameters for JS/TS, Java, Go, Rust, C#,
PHP, and Python. The edge resolver uses typeMap to connect variable
receiver calls (x.method()) to their declared types (Type.method())
with 0.9 confidence.
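
A minimal sketch of that resolution step, under assumed data shapes: `typeMap` as a `Map` from variable name to type name and `nodesByName` as a `Map` from qualified name to node list. The function name `resolveReceiverCall` and the returned edge shape are hypothetical; the 0.9 confidence value comes from the commit message.

```javascript
// Resolve `receiver.method()` to `Type.method` using the per-file typeMap.
// Returns an empty list when the receiver has no known declared type.
function resolveReceiverCall(call, typeMap, nodesByName) {
  const typeName = typeMap.get(call.receiver);
  if (!typeName) return [];
  const qualifiedName = `${typeName}.${call.name}`;
  return (nodesByName.get(qualifiedName) ?? [])
    .filter((n) => n.kind === 'method')
    .map((n) => ({ target: n.name, kind: 'calls', confidence: 0.9 }));
}
```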

Implemented in both WASM (JS extractors) and native (Rust extractors +
edge builder) engines for full parity. Updated README language table
with symbols-extracted, type-inference, and engine-parity columns.

Impact: 73 functions changed, 54 affected
… extractor (#501)

Impact: 1 functions changed, 2 affected
…on table

- Python: .pyi (type stubs), PHP: .phtml, Ruby: .rake/.gemspec
- Updated WASM registry, native Rust registry, and import resolution
- Condensed README comparison table from 28×8 to 12×5, verified claims
  against actual repos, added security scanning row, grouped features
- Marked backlog #71 (type inference) as done (#501)
- Ruby/HCL type inference column: N/A (dynamic/declarative)

Impact: 3 functions changed, 3 affected
@greptile-apps
Contributor

greptile-apps bot commented Mar 18, 2026

Greptile Summary

This PR extends language support with four new file extensions (.pyi, .phtml, .rake, .gemspec) across both the WASM registry and native Rust registry, and bundles several previously-reviewed fixes: the AST-aware backfillTypeMap shim for native binaries < 3.2.0, depth guards on extractTypeMapWalk, multi-name Go var_spec iteration in both JS and Rust, removal of the dead methodCandidates block, and the corrected integration test comment. The README comparison table is condensed from 28×8 to 12×6 and backlog item #71 is marked done.

Key observations:

  • hasReceiver shim heuristic is overly broad — every native build of a repository with zero type-annotated code (plain JS, Ruby, Python without annotations) will run supplementReceiverEdges unnecessarily because nativeEdges.some(e => e.kind === 'receiver') is false. The function is a no-op in those cases but adds an O(all-calls) pass on every incremental build until all binaries hit 3.2.0.
  • Missing /__init__.pyi in import probes: .pyi was added to the file-level probe in both resolve.js and import_resolution.rs, but /__init__.pyi was not added alongside /__init__.py. Python stub-only packages (those shipping package/__init__.pyi without a .py) won't have their imports resolved.
  • The backfillTypeMap / createParsers() path is correctly memoized; no per-call WASM init overhead.
  • Go multi-name var_spec fix is correct in both JS and Rust — value expressions are always wrapped inside an expression_list child node, so iterating direct identifier children captures only the declared names.

Confidence Score: 4/5

  • Safe to merge — all logic is correct and the two flagged issues are low-severity gaps rather than regressions.
  • The core extension additions and the four fixes from previous review rounds are all correct. The hasReceiver heuristic adds unnecessary work (not incorrect results) on annotation-free repos, and the missing /__init__.pyi is a minor coverage gap in the new .pyi feature, not a breaking change. No data integrity or graph-corruption risk was found.
  • src/domain/graph/builder/stages/build-edges.js (hasReceiver heuristic), src/domain/graph/resolve.js and crates/codegraph-core/src/import_resolution.rs (missing /__init__.pyi probe)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/domain/graph/builder/stages/build-edges.js | Adds supplementReceiverEdges shim for native binaries < 3.2.0; removes dead methodCandidates block. The hasReceiver heuristic used to gate the shim will fire on every build for annotation-free repos, adding unnecessary overhead. |
| src/domain/parser.js | Adds backfillTypeMap (WASM-based, AST-aware) replacing the previous regex shim; integrates backfill into parseFileAuto, parseFilesAuto, and parseFileIncremental. Logic is correct; createParsers() is cached so repeated calls are cheap. |
| src/domain/graph/resolve.js | Adds .pyi to the JS import resolution extension probe list. Missing /__init__.pyi means package-level Python stub-only packages won't resolve correctly. |
| crates/codegraph-core/src/import_resolution.rs | Adds .pyi to the Rust import resolution extension probe list (same gap as the JS side: no /__init__.pyi). |
| crates/codegraph-core/src/parser_registry.rs | Adds .pyi → Python, .rake/.gemspec → Ruby, .phtml → PHP to the native Rust language kind matcher. Straightforward and correct. |
| crates/codegraph-core/src/extractors/go.rs | Fixes multi-name var_spec extraction by iterating all identifier children rather than using child_by_field_name("name"). Values are always wrapped in an expression_list child, so only name identifiers are captured. |
| src/extractors/go.js | JS mirror of the Rust var_spec multi-name fix, correct for the same reasons. |
| src/extractors/javascript.js | Adds depth parameter and if (depth >= 200) return guard to extractTypeMapWalk, consistent with every other type-map extractor in the codebase. |
| tests/integration/build.test.js | Fixes misleading comment: "native deferred" → "native binary may not be present in all test environments". |
| docs/roadmap/BACKLOG.md | Marks backlog item #71 (type inference) as DONE with a link to PR #501. Copy change only. |
| README.md | Condenses comparison table from 28×8 to 12×6, updates language extension lists for Python/PHP/Ruby, clarifies type-inference footnote for Ruby/HCL, and revises roadmap numbering. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[parseFile / parseFiles] --> B{Native engine?}
    B -- No --> C[WASM extract\ntypeMap included]
    B -- Yes --> D[native.parseFile / parseFiles]
    D --> E{typeMap empty\nor missing?}
    E -- No --> F[Return patched result]
    E -- Yes --> G[backfillTypeMap\nWASM AST extraction]
    G --> F

    F --> H[buildCallEdgesNative]
    H --> I[native.buildCallEdges]
    I --> J{hasReceiver\nin nativeEdges?}
    J -- Yes >= 3.2.0 + annotations --> K[Done — native edges used]
    J -- No old binary OR no annotations --> L[supplementReceiverEdges\nJS shim]
    L --> K

    style J fill:#f9c,stroke:#c66
    style L fill:#ffd,stroke:#aa0

Last reviewed commit: "fix: replace regex t..."

'app.post("/items");',
'',
].join('\n'),
);
Contributor

P1 Misleading "native deferred" comment

The comment says // Force WASM engine — typeMap resolution is JS-only (native deferred), but native Rust support for typeMap is fully implemented in this very PR — edge_builder.rs has a complete type-map-aware resolution path, and parser_registry.rs / types.rs expose typeMap through the NAPI layer.

The real reason to force WASM in this test is likely that the native binary isn't guaranteed to be compiled in the CI environment where unit/integration tests run, not that the feature is unimplemented. Using an incorrect rationale here will mislead future contributors and could cause someone to remove native support thinking it's still "deferred".

Suggested fix:

Suggested change
);
// Force WASM engine — native binary may not be present in all test environments
await buildGraph(typedDir, { skipRegistry: true, engine: 'wasm' });

Contributor Author

Fixed — updated the comment to: "Force WASM engine — native binary may not be present in all test environments". The original rationale was incorrect since native Rust type inference is fully implemented in this PR.

Comment on lines +828 to +842
function walk(node) {
  const t = node.type;
  if (t === 'variable_declarator') {
    const nameN = node.childForFieldName('name');
    if (nameN && nameN.type === 'identifier') {
      const typeAnno = findChild(node, 'type_annotation');
      if (typeAnno) {
        const typeName = extractSimpleTypeName(typeAnno);
        if (typeName) typeMap.set(nameN.text, typeName);
      } else {
        const valueN = node.childForFieldName('value');
        if (valueN && valueN.type === 'new_expression') {
          const ctorType = extractNewExprTypeName(valueN);
          if (ctorType) typeMap.set(nameN.text, ctorType);
        }
Contributor

P1 extractTypeMapWalk has no recursion depth guard

Every other JS type-map extractor added in this PR (php.js, python.js, go.js, rust.js, csharp.js) protects against deep ASTs with an explicit depth limit:

function extractPhpTypeMapDepth(node, ctx, depth) {
  if (depth >= 200) return;

The inner walk closure in extractTypeMapWalk recurses unconditionally into every child node. For pathologically large or deeply nested files (e.g., auto-generated code, deeply nested template expressions), this can exhaust the call stack and throw a RangeError: Maximum call stack size exceeded.

Add a depth parameter and guard, consistent with every other extractor in this PR:

Suggested change
- function walk(node) {
-   const t = node.type;
-   if (t === 'variable_declarator') {
-     const nameN = node.childForFieldName('name');
-     if (nameN && nameN.type === 'identifier') {
-       const typeAnno = findChild(node, 'type_annotation');
-       if (typeAnno) {
-         const typeName = extractSimpleTypeName(typeAnno);
-         if (typeName) typeMap.set(nameN.text, typeName);
-       } else {
-         const valueN = node.childForFieldName('value');
-         if (valueN && valueN.type === 'new_expression') {
-           const ctorType = extractNewExprTypeName(valueN);
-           if (ctorType) typeMap.set(nameN.text, ctorType);
-         }
+ function extractTypeMapWalk(rootNode, typeMap) {
+   function walk(node, depth) {
+     if (depth >= 200) return;
+     const t = node.type;
+     if (t === 'variable_declarator') {

Contributor Author

Fixed — added if (depth >= 200) return; guard to extractTypeMapWalk, consistent with every other type-map extractor in this PR. The walk closure now takes a depth parameter, incremented on each recursive call.
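
A minimal sketch of a depth-guarded recursive walk, assuming plain objects with a `children` array in place of tree-sitter nodes. `countNodes` and `buildChain` are hypothetical helpers written for this example; the limit of 200 is the value quoted in the review.

```javascript
const MAX_DEPTH = 200; // same limit the review quotes for the other extractors

// Count visited nodes, refusing to descend past MAX_DEPTH levels.
function countNodes(root) {
  let visited = 0;
  function walk(node, depth) {
    if (depth >= MAX_DEPTH) return;
    visited += 1;
    for (const child of node.children) walk(child, depth + 1);
  }
  walk(root, 0);
  return visited;
}

// Build a single-child chain of the given length (iteratively, so the
// construction itself cannot overflow the stack).
function buildChain(length) {
  let node = { children: [] };
  for (let i = 1; i < length; i++) node = { children: [node] };
  return node;
}
```

With the guard in place, a chain far deeper than the limit is cut off at 200 levels instead of recursing until the call stack is exhausted. The trade-off is that anything nested below the limit is silently skipped.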

Comment on lines +247 to +261
const methodCandidates = (ctx.nodesByName.get(call.name) || []).filter(
  (n) => n.name.endsWith(`.${call.name}`) && n.kind === 'method',
);
if (methodCandidates.length > 0) return methodCandidates;

// Type-aware resolution: translate variable receiver to its declared type
if (call.receiver && typeMap) {
  const typeName = typeMap.get(call.receiver);
  if (typeName) {
    const qualifiedName = `${typeName}.${call.name}`;
    const typed = (ctx.nodesByName.get(qualifiedName) || []).filter((n) => n.kind === 'method');
    if (typed.length > 0) return typed;
  }
}

Contributor

P2 methodCandidates guard is effectively dead code

ctx.nodesByName is keyed by node.name (see setupNodeLookups), where class methods are stored under their fully-qualified name (e.g., 'Router.get'). When call.name is the simple method name 'get', ctx.nodesByName.get('get') returns only nodes whose literal .name property equals 'get' — i.e., top-level functions, never class methods. The subsequent .endsWith('.get') filter then eliminates those too, so methodCandidates is always empty and the early-return guard never fires.

This means the new type-aware qualifiedName lookup below is always reachable (good), but the intent of the methodCandidates block is misleading and will confuse anyone reading this path. Consider removing the dead block or adding a comment that explains why methodCandidates is always empty:

// nodesByName is keyed by fully-qualified name (e.g., 'Router.get'),
// so get(call.name) never returns qualified methods — this block is a no-op
// and falls through to the type-aware qualified lookup below.
const methodCandidates = ...

Contributor Author

Fixed — removed the dead methodCandidates block entirely. Since nodesByName is keyed by fully-qualified name (e.g. Router.get), get(call.name) with a simple name never returns class methods. The type-aware resolution path is now the clear entry point in resolveByMethodOrGlobal.
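
The reasoning can be checked with a small stand-alone sketch. The index below is built the way the review describes (keyed by each node's own name, with methods stored under qualified names); the sample nodes and the wrapper `methodCandidatesFor` are made up for illustration.

```javascript
// Index keyed by each node's own name; methods carry qualified names.
const nodes = [
  { name: 'Router.get', kind: 'method' },
  { name: 'get', kind: 'function' },
];
const nodesByName = new Map();
for (const n of nodes) {
  if (!nodesByName.has(n.name)) nodesByName.set(n.name, []);
  nodesByName.get(n.name).push(n);
}

// The removed lookup: fetch by simple name, then require a ".name" suffix.
function methodCandidatesFor(callName) {
  return (nodesByName.get(callName) ?? []).filter(
    (n) => n.name.endsWith(`.${callName}`) && n.kind === 'method',
  );
}
```

A lookup by `'get'` can only return nodes named exactly `'get'`, and none of those end with `'.get'`, so the filter always yields an empty list.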

Comment on lines +215 to +225
if (node.type === 'var_spec') {
  const nameNode = node.childForFieldName('name');
  const typeNode = node.childForFieldName('type');
  if (nameNode && typeNode) {
    const typeName = extractGoTypeName(typeNode);
    if (typeName) ctx.typeMap.set(nameNode.text, typeName);
  }
}

// Function/method parameter types: parameter_declaration has identifiers then a type
if (node.type === 'parameter_declaration') {
Contributor

P2 Go multi-name var declarations only capture the first name

In Go, a single var_spec can declare multiple variables with the same type:

var x, y MyService

In tree-sitter-go the name field of var_spec is a list, but node.childForFieldName('name') only returns the first name identifier. The same limitation exists in the corresponding Rust implementation (extract_go_type_map_depth in go.rs, line ~337).

Variables after the first (y in the example) will silently fail to appear in the type map, causing type-aware method-call resolution to miss them. This won't cause any errors but means the feature is incomplete for this pattern, which is common in Go:

var req *http.Request
var rw, backup http.ResponseWriter

Consider iterating named children or using children_by_field_name (in tree-sitter Rust) / walking all identifier children before the type node (in JS) to capture all declared names in a var_spec.

Contributor Author

Fixed: both JS (extractGoTypeMapDepth) and Rust (extract_go_type_map_depth) now iterate all identifier children in a var_spec node instead of using child_by_field_name('name'). Multi-name declarations like var x, y MyService now register all names in the type map.
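
A rough sketch of the identifier-iteration approach, using a hand-built object as a stand-in for a tree-sitter `var_spec` node. The node shape (`type`, `text`, `children`) and the helper name `collectVarSpecNames` are assumptions for this example and do not reproduce the real tree-sitter API or the project's extractor.

```javascript
// Collect every declared name in a var_spec-like node by scanning all
// direct children of type 'identifier'.
function collectVarSpecNames(varSpec) {
  return varSpec.children
    .filter((child) => child.type === 'identifier')
    .map((child) => child.text);
}

// Stand-in for `var x, y MyService`.
const varSpec = {
  type: 'var_spec',
  children: [
    { type: 'identifier', text: 'x' },
    { type: ',', text: ',' },
    { type: 'identifier', text: 'y' },
    { type: 'type_identifier', text: 'MyService' },
  ],
};

const typeMap = new Map();
for (const name of collectVarSpecNames(varSpec)) typeMap.set(name, 'MyService');
```

Both `x` and `y` end up in the map, whereas a single name-field lookup would record only `x`.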

- Add GitNexus column to feature comparison table (verified against repo)
- Fix codegraph license: MIT → Apache-2.0
- Remove "Other tools evaluated" footnote (GitNexus now in table)
- Update Limitations: reflect that type inference exists, clarify tsc gap
- Sync Roadmap section with actual ROADMAP.md (11 phases, correct versions)

Native binaries < 3.2.0 don't emit typeMap from parsing or handle
receiver-based method calls in buildCallEdges. This adds two fallbacks:

1. extractTypeMapRegex: when native parse results lack typeMap, extract
   `const x = new Foo()` patterns via regex so the edge builder can
   resolve variable receivers to their declared types.

2. supplementReceiverEdges: after native buildCallEdges runs, if no
   receiver edges were produced, create them JS-side using the same
   buildReceiverEdge + type-resolved method-call logic as the WASM path.

Both fallbacks are gated on the absence of the feature and will be
no-ops once native binaries >= 3.2.0 are published.
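
The gating on "absence of the feature" for the second fallback might look like the sketch below. The `kind === 'receiver'` check mirrors the one quoted in the review summary; `withReceiverFallback` and the `supplement` callback are hypothetical names used only here.

```javascript
// Run the JS-side supplement only when the native result contains no
// receiver edges. `supplement` is a callback that returns extra edges.
function withReceiverFallback(nativeEdges, supplement) {
  const hasReceiver = nativeEdges.some((e) => e.kind === 'receiver');
  return hasReceiver ? nativeEdges : [...nativeEdges, ...supplement()];
}
```

Note that this check cannot distinguish an old binary from a repository that simply has no receiver calls, which is the overhead the review summary points out.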

Fixes build-parity test failure on CI (native 31 edges vs WASM 33).

Impact: 6 functions changed, 15 affected

Impact: 2 functions changed, 5 affected
The comment incorrectly stated native type-map resolution was deferred,
but native Rust support is fully implemented in this PR. The actual
reason is that the native binary may not be present in CI.

Every other type-map extractor guards at depth 200, but the JS/TS
walker recursed unconditionally. This could cause stack overflow on
pathologically large or deeply nested auto-generated files.

Impact: 2 functions changed, 3 affected

nodesByName is keyed by fully-qualified name (e.g. Router.get), so
get(call.name) with a simple name like 'get' never returns class
methods — the filter was always empty. Removed the unreachable block
so the type-aware resolution path is the clear entry point.

var x, y MyService now registers all names in the type map, not just
the first. Fixed in both JS (extractGoTypeMapDepth) and Rust
(extract_go_type_map_depth) extractors by iterating identifier children
instead of using child_by_field_name('name').

Impact: 2 functions changed, 4 affected
@carlos-alm
Contributor Author

@greptileai

Comment on lines +386 to +397
function extractTypeMapRegex(filePath) {
  let code;
  try {
    code = fs.readFileSync(filePath, 'utf-8');
  } catch {
    return [];
  }
  const entries = [];
  for (const m of code.matchAll(/(?:const|let|var)\s+(\w+)\s*=\s*new\s+(\w+)/g)) {
    entries.push({ name: m[1], typeName: m[2] });
  }
  return entries;
Contributor

P1 Regex matches inside comments and string literals

The pattern /(?:const|let|var)\s+(\w+)\s*=\s*new\s+(\w+)/g has no awareness of comment or string context. Code like:

// Legacy: const client = new HttpClient()
const help = "Example: const x = new Foo()";

…would produce spurious { name: 'client', typeName: 'HttpClient' } and { name: 'x', typeName: 'Foo' } entries in the type map. Those entries then flow into supplementReceiverEdges, where ctx.nodesByName.get('HttpClient.someMethod') might accidentally find a real HttpClient.someMethod node in a different file and emit a false calls edge — corrupting the graph for the duration that the shim is active (i.e., all users still on a native binary < 3.2.0).

Even if the false-positive rate is low in typical code, the consequence (phantom call edges) is silent and hard to debug. Consider stripping single-line and block comments before running the regex, or narrowing the pattern to avoid the most common false-positive shapes (e.g., require the match not be preceded by // on the same line).
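
The false positives are easy to reproduce in isolation. The sketch below applies the same pattern to the two-line sample from this comment; `extractWithRegex` is a hypothetical wrapper that takes the source text directly instead of reading a file.

```javascript
const TYPE_ASSIGN = /(?:const|let|var)\s+(\w+)\s*=\s*new\s+(\w+)/g;

// Apply the regex to raw source text with no comment or string awareness.
function extractWithRegex(code) {
  return [...code.matchAll(TYPE_ASSIGN)].map((m) => ({ name: m[1], typeName: m[2] }));
}

const sample = [
  '// Legacy: const client = new HttpClient()',
  'const help = "Example: const x = new Foo()";',
].join('\n');

// Both entries come from non-code text: one from a comment, one from a string.
const entries = extractWithRegex(sample);
```

Neither `client` nor `x` is a real variable in the sample, yet both would enter the type map.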

Contributor Author

Fixed — replaced the regex-based extractTypeMapRegex with backfillTypeMap, which uses WASM tree-sitter AST extraction via wasmExtractSymbols. This is context-aware and will not match patterns inside comments or string literals.

Comment on lines 434 to +441
  if (native) {
    const result = native.parseFile(filePath, source, !!opts.dataflow, opts.ast !== false);
-   return result ? patchNativeResult(result) : null;
+   if (!result) return null;
+   const patched = patchNativeResult(result);
+   if (!patched.typeMap || patched.typeMap.length === 0) {
+     patched.typeMap = extractTypeMapRegex(filePath);
+   }
+   return patched;
Contributor

P2 Redundant disk read when source is already in memory

parseFileAuto already has the file content in the source parameter. Calling extractTypeMapRegex(filePath) re-reads the same bytes from disk unnecessarily. For single-file incremental rebuilds this doubles the I/O for every file processed on older native binaries.

extractTypeMapRegex could accept an optional source string to avoid the extra read:

function extractTypeMapRegex(filePath, source) {
  let code = source;
  if (!code) {
    try { code = fs.readFileSync(filePath, 'utf-8'); } catch { return []; }
  }
  // ...
}

Then the parseFileAuto call site becomes:

patched.typeMap = extractTypeMapRegex(filePath, source);

The parseFilesAuto and parseFileIncremental paths don't have source readily available, so they keep the current extractTypeMapRegex(r.file) form unchanged.

Contributor Author

Fixed — the new backfillTypeMap function accepts the source as a parameter and only falls back to fs.readFileSync when source is not provided. In the batch path (parseFilesAuto), source is already available from the native parse results, so no redundant disk read occurs.
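
The "use the in-memory source, read from disk only as a fallback" idea can be sketched as a small helper. `sourceFor` is a hypothetical name and is not claimed to match how backfillTypeMap is written; it only illustrates the parameter-with-fallback shape described in this reply.

```javascript
const fs = require('node:fs');

// Return file contents, preferring an in-memory copy when the caller has one.
// Returns null if no copy was supplied and the file cannot be read.
function sourceFor(filePath, source) {
  if (typeof source === 'string') return source;
  try {
    return fs.readFileSync(filePath, 'utf-8');
  } catch {
    return null;
  }
}
```

Checking `typeof source === 'string'` rather than truthiness means an empty file passed as `''` is also served from memory instead of triggering a disk read.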

The regex pattern matched inside comments and string literals, producing
spurious type map entries that could create phantom call edges. Replace
with WASM tree-sitter extraction which uses the AST and is immune to
false positives from non-code contexts. Also eliminates redundant disk
reads in parseFileAuto where source was already in memory.

Impact: 4 functions changed, 2 affected
@carlos-alm
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 5c316a6 into main Mar 18, 2026
28 checks passed
@carlos-alm carlos-alm deleted the fix/type-inference-review branch March 18, 2026 23:19
@github-actions github-actions bot locked and limited conversation to collaborators Mar 18, 2026