-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Summary
The native Rust engine (crates/codegraph-core/) crashes with a segfault when parsing files with deeply nested syntax structures. The walk_node function in all language extractors recurses over the full AST without any depth limit, causing stack overflow that terminates the Node.js process.
Workaround: Use --engine wasm to bypass the native addon entirely.
Root Cause
Every language extractor's walk_node function follows this pattern:
fn walk_node(node: &Node, source: &[u8], symbols: &mut FileSymbols) {
match node.kind() { /* ... */ }
for i in 0..node.child_count() {
if let Some(child) = node.child(i) {
walk_node(&child, source, symbols); // RECURSIVE — NO DEPTH LIMIT
}
}
}Each recursive call consumes ~256+ bytes of stack. When AST depth exceeds the thread stack limit (~1MB on Windows, ~8MB on Linux), Rust panics and the napi-rs binding propagates this as a native crash (segfault).
Affected Components
| Rank | Component | File | Depth Limit |
|---|---|---|---|
| CRITICAL | walk_node (all extractors) |
src/extractors/*.rs |
None |
| CRITICAL | walk_ast_nodes (JS) |
src/extractors/javascript.rs:411 |
None |
| CRITICAL | walk_ast_nodes_with_config |
src/extractors/helpers.rs:208 |
None |
| HIGH | scan_import_names (JS) |
src/extractors/javascript.rs:1115 |
None |
| HIGH | extract_implements_from_node (JS) |
src/extractors/javascript.rs:765 |
None |
| MODERATE | walk_children (complexity) |
src/complexity.rs:381 |
Semantic only |
| MODERATE | process_if (CFG) |
src/cfg.rs:443 |
None |
| OK | visit (dataflow) |
src/dataflow.rs:854 |
200 (has protection) |
Note: The dataflow module (dataflow.rs) already implements MAX_VISIT_DEPTH = 200 — this pattern should be adopted everywhere.
Affected Extractors
javascript.rs(line 19) — also has second recursive walker at line 411go.rs(line 19)python.rs(line 19)java.rs(line 35)csharp.rs(line 36)ruby.rs(line 35)php.rs(line 35)rust_lang.rs(line 32)
Trigger Conditions
Files with AST depth exceeding ~30-40 levels (platform-dependent), caused by:
- Deeply nested function definitions or callbacks
- Nested object/array literals
- Nested template strings with expressions
- Deeply chained method calls
- Heavily indented control flow (if/else, try/catch)
Proposed Fix
- Add a
MAX_WALK_DEPTHconstant (e.g., 200, matching dataflow's existingMAX_VISIT_DEPTH) - Thread a
depth: usizeparameter through all recursivewalk_node,walk_ast_nodes, and helper functions - Early-return when depth is exceeded — log a warning but don't crash
- Alternatively, convert to iterative traversal using an explicit stack (
Vec<Node>) to eliminate stack overflow risk entirely - Add
std::panic::catch_unwindaround the napi-rs entry point as a safety net
Option 4 (iterative) is the most robust long-term solution. Option 1-3 is a simpler short-term fix.
Reproduction
Observed during a Titan GAUNTLET audit session (Batch 3 — Extractors). The native engine segfaulted while running codegraph commands against its own codebase. Switching to --engine wasm worked around the issue.
Labels
bug, native-engine, safety