Skip to content

refactor(native): extract generic walk_tree to deduplicate extractor boilerplate#703

Merged
carlos-alm merged 2 commits intomainfrom
refactor/walk-tree-dedup
Mar 30, 2026
Merged

refactor(native): extract generic walk_tree to deduplicate extractor boilerplate#703
carlos-alm merged 2 commits intomainfrom
refactor/walk-tree-dedup

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Add walk_tree<F> generic function to helpers.rs — a depth-limited tree walker that accepts a closure for language-specific node matching, then handles the depth check and child recursion automatically
  • Replace duplicated walk_node/walk_node_depth pattern across all 9 language extractors with walk_tree + a match_X_node function
  • Replace duplicated type map walkers (extract_X_type_map/extract_X_type_map_depth) across 7 extractors with same pattern
  • Net: -190 lines (66 added, 256 removed), zero-cost abstraction (monomorphized generics, no dyn dispatch)

Test plan

  • All 272 parser tests pass
  • All 562 integration tests pass
  • Rust cargo check (blocked by missing MSVC build tools on dev machine — CI will validate)

…pth duplication

The walk_node/walk_node_depth pattern was duplicated identically across
all 9 language extractors (~190 lines of boilerplate). Each extractor
repeated the same depth check, match dispatch, and child traversal loop
— only the match arms differed.

Add a generic `walk_tree<F>` function to helpers.rs that handles depth
limiting and child recursion, accepting a closure for language-specific
node matching. Refactor all 9 extractors and 7 type map walkers to use
it. Zero-cost abstraction (monomorphized, no dyn dispatch).
@claude
Copy link
Copy Markdown

claude bot commented Mar 30, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Native Tree Walker Refactoring ⭐⭐⭐⭐

Todo List:

  • Phase 1: Root cause analysis - understand the deduplication problem
  • Phase 2: Configuration change detection - check for risky changes
  • Phase 3: Test integrity - verify no weakened assertions
  • Phase 4: Code quality review - examine the Rust refactoring implementation
  • Phase 5: Backlog compliance - rate against project criteria
  • Provide final assessment with rating and recommendation

Root Cause Analysis

Problem Being Solved: Elimination of code duplication across Rust language extractors. The walk_node/walk_node_depth pattern was identically repeated across all 9 language extractors (~190 lines), with only the node matching logic differing between languages.

Why This Approach: Generic function walk_tree<F> with closure parameter is the idiomatic Rust solution. Monomorphized generics provide zero-cost abstraction - no dyn dispatch overhead. The refactor maintains identical behavior while centralizing the depth-checking and child traversal logic.

Risk Assessment: Very low risk. This is pure refactoring with no behavioral changes. All tests pass and the generic function perfectly replicates the previous logic.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies, builds on existing tree-sitter/Rust infrastructure
  • Foundation-aligned: ✓ — Maintains native speed (P2), improves code maintainability without changing core functionality
  • Problem-fit: 2 — Code quality improvement that reduces maintenance burden but doesn't directly address the core "lost AI" problem
  • Breaking: No — Pure internal refactoring, no API or behavior changes
  • Tier: 1

Code Quality Assessment

Excellent refactoring execution:

  1. Perfect abstraction: The generic walk_tree<F> in helpers.rs:101-126 extracts exactly the right level of commonality - depth limiting and child traversal - while leaving language-specific matching as a closure parameter.

  2. Zero-cost design: Uses monomorphized generics, not trait objects. Each language gets its own compiled version with no runtime dispatch overhead.

  3. Complete coverage: Transforms all 9 language extractors (javascript.rs:13, python.rs:13, go.rs:13, etc.) plus 7 type map walkers. Consistent pattern across the entire codebase.

  4. Behavior preservation: The new walk_tree_depth function (lines 108-126) is a direct translation of the previous duplicated logic - same depth check, same recursion, same parameters.

  5. Clean API: Each extractor now calls walk_tree(&tree.root_node(), source, &mut symbols, match_language_node) - much clearer intent than the previous boilerplate.

Minor Observations

  • The generic AST walker walk_ast_nodes_with_config (lines 250+) also follows this pattern well
  • Language-specific configs (PYTHON_AST_CONFIG, GO_AST_CONFIG, etc.) are well-organized
  • JavaScript extractor still has its own walk_ast_nodes function but uses the generic config version for type mapping

Critical Concerns

None identified. This is exemplary refactoring work.

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent code quality improvement with zero behavioral risk. Eliminates 190 lines of duplication through proper abstraction. All tests pass, no configuration changes, purely internal improvement. The only reason this isn't 5-star is that it's "just" refactoring - no new functionality or user-facing improvements.

This PR demonstrates exactly how technical debt should be addressed - identify the pattern, extract the right abstraction, apply consistently, verify with tests. Well done.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR introduces a single generic walk_tree<F> helper in helpers.rs and uses it to eliminate the walk_node / walk_node_depth and extract_X_type_map / extract_X_type_map_depth boilerplate pairs that were copy-pasted across all 9 language extractors. The abstraction is sound — depth limiting, per-node visitor dispatch, and unconditional child recursion are all handled once in the shared walker.

Key correctness checks:

  • Depth guard preservedwalk_tree_depth still checks depth >= MAX_WALK_DEPTH before calling the closure or recursing, matching the old per-extractor guard exactly.
  • decorated_definition removal (Python) — the old match arm walked all children and returned, which is identical to the default path in walk_tree_depth (call closure → iterate children). No behavioral regression.
  • walk_type_map_children removal (JavaScript) — the old "annotation wins" path called return walk_type_map_children(node, …) to recurse into children; the new return; inside match_js_type_map relies on walk_tree_depth to iterate those same children, producing the same result.
  • Closure borrow patternwalk_tree takes F by value and passes &F to the private walk_tree_depth, which is the standard Rust idiom for recursive monomorphized closures without heap allocation.
  • _depth parameter — all match_*_node functions accept but ignore the depth argument; the convention is clean and leaves the door open for depth-sensitive logic in a future visitor without a signature change.

Confidence Score: 5/5

Safe to merge — pure refactoring with no behavioral changes, verified against all three structural patterns in the diff.

All changed call-sites produce identical traversal order, depth limits, and symbol emission. No P0/P1 findings. 272 parser tests + 562 integration tests pass according to the test plan.

No files require special attention.

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/helpers.rs Introduces the generic walk_tree + private walk_tree_depth pair; correctly passes &F to recursive calls to avoid moving the closure; depth guard preserved; FileSymbols import added.
crates/codegraph-core/src/extractors/python.rs Removes the decorated_definition match arm — semantically equivalent because that arm only walked all children then returned, which is exactly what walk_tree_depth does unconditionally.
crates/codegraph-core/src/extractors/javascript.rs Replaces return walk_type_map_children(…) with return; — correct because walk_tree_depth iterates all children after every match_node call regardless of return path.
crates/codegraph-core/src/extractors/csharp.rs Boilerplate walk_node/walk_node_depth and extract_csharp_type_map/extract_csharp_type_map_depth removed and replaced with two walk_tree calls; no behavioral change.
crates/codegraph-core/src/extractors/go.rs Same pattern applied; both match_go_node and match_go_type_map correctly delegate child recursion to the shared walker.
crates/codegraph-core/src/extractors/java.rs Standard boilerplate removal; match_java_node and match_java_type_map extracted cleanly.
crates/codegraph-core/src/extractors/php.rs Both symbol and type-map walkers replaced; no logic changes.
crates/codegraph-core/src/extractors/ruby.rs Only symbol walker replaced (no type-map pass existed); straightforward conversion.
crates/codegraph-core/src/extractors/hcl.rs Single symbol walker replaced; HCL block handling unchanged.
crates/codegraph-core/src/extractors/rust_lang.rs Both passes replaced; match_rust_node and match_rust_type_map extracted without behavioral change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["extract() — SymbolExtractor"] --> B["walk_tree(root, source, symbols, match_X_node)"]
    A --> C["walk_ast_nodes_with_config(...)"]
    A --> D["walk_tree(root, source, symbols, match_X_type_map)"]

    B --> E["walk_tree_depth(node, source, symbols, depth=0, &match_X_node)"]
    D --> F["walk_tree_depth(node, source, symbols, depth=0, &match_X_type_map)"]

    E --> G{depth >= MAX_WALK_DEPTH?}
    G -- Yes --> H[return]
    G -- No --> I["match_X_node(node, source, symbols, depth)"]
    I --> J["for each child: walk_tree_depth(child, depth+1, &match_X_node)"]
    J --> E
Loading

Reviews (1): Last reviewed commit: "refactor(native): extract generic walk_t..." | Re-trigger Greptile

@carlos-alm carlos-alm merged commit a45aa65 into main Mar 30, 2026
18 checks passed
@carlos-alm carlos-alm deleted the refactor/walk-tree-dedup branch March 30, 2026 10:03
@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant