
feat: add Elixir, Lua, Dart, Zig, Haskell, OCaml language support #718

Open
carlos-alm wants to merge 9 commits into main from feat/batch2-languages

Conversation

@carlos-alm
Contributor

Summary

  • Add Batch 2 language support from ROADMAP Phase 7: Elixir, Lua, Dart, Zig, Haskell, OCaml
  • Each language includes dual-engine extractors (WASM TypeScript + native Rust), AST configs, and parser tests
  • Brings total supported languages from 17 to 23

Language Details

| Language | Extensions | Key Constructs | Notes |
|---|---|---|---|
| Elixir | .ex, .exs | modules, functions, protocols, use/import/require | All constructs are generic call nodes |
| Lua | .lua | functions, methods, require() imports | require() detected as imports |
| Dart | .dart | classes, enums, mixins, extensions, imports | No call_expression; uses selector/argument_part |
| Zig | .zig | functions, structs, enums, unions, @import, tests | Structs/enums are anonymous, named by parent decl |
| Haskell | .hs | functions, data/newtype/type, typeclasses, imports | Grammar misspells type_synomym |
| OCaml | .ml, .mli | let bindings, modules, types, open, applications | Sub-grammar under grammars/ocaml |

Files Changed

  • 18 new files: 6 TS extractors, 6 Rust extractors, 6 test files
  • 10 modified files: types, registry, build config, Cargo.toml, package.json

Test plan

  • All 6 new parser test files pass (34 tests)
  • Full test suite passes (2257 tests, 0 failures)
  • TypeScript compiles cleanly (tsc --noEmit)
  • Lint passes (no new errors)
  • All 6 WASM grammars build successfully
  • CI passes
  • Rust native build compiles with new crate dependencies

- CHANGELOG: fix total language count from 14 to 17
- README: add 6 new languages to multi-language differentiator row
- ROADMAP: update Phase 7 overview to reflect Batch 1 completion
Add Batch 2 languages from the ROADMAP Phase 7 plan. Each language
includes dual-engine support (WASM + native Rust extractors), AST
configs, and parser tests.

Language details:
- Elixir (.ex, .exs): modules, functions, protocols, imports/use/require
- Lua (.lua): functions, methods, require() imports, table patterns
- Dart (.dart): classes, enums, mixins, extensions, imports, inheritance
- Zig (.zig): functions, structs, enums, unions, @import, test decls
- Haskell (.hs): functions, data/newtype/type, typeclasses, imports
- OCaml (.ml, .mli): let bindings, modules, types, open, applications

Notable grammar quirks handled:
- Elixir: all constructs are generic `call` nodes, distinguished by
  identifier text (defmodule, def, defp, use, import, etc.)
- Dart: no call_expression node — calls detected via selector/argument_part
- Zig: structs/enums are anonymous, named by parent variable_declaration
- Haskell: grammar misspells type_synomym (missing 'n')
- OCaml: sub-grammar under grammars/ocaml, Rust export LANGUAGE_OCAML
- Dart Rust crate uses old language() function instead of LANGUAGE constant
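
The Elixir quirk listed above (every construct is a generic `call` node, told apart by identifier text) can be sketched roughly as follows. This is a minimal illustration, not the PR's extractor: the `CallNode` shape, the `ElixirConstruct` names, and the `classifyElixirCall` helper are hypothetical, and `defprotocol` is an assumed keyword beyond those named in the commit message.

```typescript
// Hypothetical, simplified stand-in for a tree-sitter `call` node.
interface CallNode {
  type: "call";
  target: string; // text of the called identifier, e.g. "defmodule"
}

type ElixirConstruct =
  | "module"
  | "function"
  | "private_function"
  | "protocol"
  | "import"
  | "call";

// A Map (rather than a plain object) avoids accidental hits on
// inherited keys such as "constructor" or "toString".
const KEYWORD_KINDS = new Map<string, ElixirConstruct>([
  ["defmodule", "module"],
  ["def", "function"],
  ["defp", "private_function"],
  ["defprotocol", "protocol"], // assumed keyword, not listed in the PR text
  ["use", "import"],
  ["import", "import"],
  ["require", "import"],
]);

// Anything that is not a known keyword is treated as an ordinary call.
function classifyElixirCall(node: CallNode): ElixirConstruct {
  return KEYWORD_KINDS.get(node.target) ?? "call";
}
```

A lookup table keeps the keyword list in one place, so adding another definition macro is a one-line change.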
@claude

claude bot commented Mar 30, 2026

Claude finished @carlos-alm's task


PR Review: Batch 2 Language Support

Reviewing this PR that adds 6 new language parsers (Elixir, Lua, Dart, Zig, Haskell, OCaml) following the comprehensive review framework.

Review Progress

  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Check
  • Phase 4: Code Quality Review
  • Phase 5: Backlog Compliance Assessment
  • Final Assessment and Rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Expanding language support from ROADMAP Phase 7 to meet growing demand for multi-language codebases. This adds 6 carefully selected languages (Elixir, Lua, Dart, Zig, Haskell, OCaml) that complement existing coverage and address specific ecosystem needs.

Why This Approach: Follows established dual-engine pattern with both WASM TypeScript extractors and native Rust extractors. Maintains architecture consistency with existing language implementations.

Risk Assessment: Low risk - purely additive functionality following proven patterns. All new parsers are non-required (gracefully degrade if WASM unavailable). Comprehensive test coverage included.

Phase 2: Configuration Change Detection ✅

✅ No High-Risk Configuration Changes Detected

  • No Biome/linting rule relaxation - only existing any[] usage follows established patterns
  • No test timeout increases or coverage threshold changes
  • No build configuration or module resolution changes
  • No TypeScript strict mode flag modifications
  • Only additions to devDependencies (tree-sitter grammars) and Cargo.toml (new language crates)

Phase 3: Test Integrity Check ✅

✅ Strong Test Coverage for New Languages

  • 6 new test files added (one per language): elixir.test.ts, lua.test.ts, dart.test.ts, zig.test.ts, haskell.test.ts, ocaml.test.ts
  • 34 total tests across new languages covering core extraction patterns
  • Each test file follows established pattern: module/function/class extraction, imports, calls
  • No weakened assertions - all tests use proper expect().toContainEqual() assertions
  • Tests validate both symbol extraction and kind classification correctly

Phase 4: Code Quality Review ✅

✅ Excellent Implementation Quality

Architecture Consistency:

  • All 6 languages correctly added to LANGUAGE_REGISTRY with required: false (graceful degradation)
  • Dual extractors: TypeScript in src/extractors/ (169-304 LOC each) + Rust in crates/codegraph-core/src/extractors/
  • New Cargo dependencies: 6 tree-sitter crates added (elixir, lua, dart, zig, haskell, ocaml)
  • Follows existing patterns perfectly - no architectural divergence
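
The `required: false` behaviour described above might look roughly like the sketch below. It is an assumption-laden illustration: the `LanguageEntry` shape, the sample entries (including marking `javascript` as required), and the `buildExtensionMap` helper are hypothetical and do not reproduce the project's actual LANGUAGE_REGISTRY.

```typescript
// Hypothetical registry entry; the real registry carries more fields.
interface LanguageEntry {
  id: string;
  extensions: string[];
  required: boolean;
}

// Sample entries for illustration only.
const SAMPLE_REGISTRY: LanguageEntry[] = [
  { id: "javascript", extensions: [".js"], required: true },
  { id: "elixir", extensions: [".ex", ".exs"], required: false },
  { id: "ocaml", extensions: [".ml", ".mli"], required: false },
];

// Builds an extension lookup, skipping optional languages whose grammar
// is unavailable and failing only when a required grammar is missing.
function buildExtensionMap(
  registry: LanguageEntry[],
  isAvailable: (id: string) => boolean,
): Map<string, LanguageEntry> {
  const byExtension = new Map<string, LanguageEntry>();
  for (const entry of registry) {
    if (!isAvailable(entry.id)) {
      if (entry.required) {
        throw new Error(`required grammar missing: ${entry.id}`);
      }
      continue; // optional language: degrade gracefully
    }
    for (const ext of entry.extensions) {
      byExtension.set(ext, entry);
    }
  }
  return byExtension;
}
```

Under this sketch, a missing Elixir grammar simply means `.ex` files are not mapped, while the rest of the registry keeps working.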

Code Quality:

  • No TODOs/FIXMEs/HACKs in new code - production-ready implementations
  • Consistent error handling and node traversal patterns
  • Appropriate symbol kind mappings (e.g., Elixir protocols → interface, Dart mixins → class)
  • Clean separation of concerns: parsing, symbol extraction, type mapping

Phase 5: Backlog Compliance Assessment ✅

Backlog Compliance

  • Zero-dep: ✅ — No new runtime dependencies. Only adds tree-sitter grammar devDependencies for build-time WASM generation
  • Foundation-aligned: ✅ — Fully aligned with all 8 core principles. Maintains single registry (P6), preserves dual-engine architecture (P2), zero-cost core (P4)
  • Problem-fit: 4 — Significantly improves agent accuracy by expanding language coverage from 17 to 23 languages. Directly reduces "unsupported language" gaps that force agents to work blind
  • Breaking: No — Purely additive. All new languages are required: false with graceful degradation
  • Tier: 1 — Zero-dep ✅ + Foundation-aligned ✅

Critical Concerns

None identified. This is a textbook implementation of the established language addition pattern.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality implementation that follows all established patterns. Comprehensive test coverage, dual-engine consistency, graceful degradation, and zero architectural debt. The only reason this isn't 5/5 is that language additions, while valuable, don't break new architectural ground — they execute a proven pattern excellently.

This PR successfully delivers 6 new language parsers with production-ready quality and zero architectural compromise. Strongly recommend approval.

@greptile-apps
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR adds Batch 2 language support (Elixir, Lua, Dart, Zig, Haskell, OCaml) from Phase 7 of the roadmap, bringing the total to 23 languages. Each language ships with a WASM/TypeScript extractor, a native Rust extractor, AST node configs, and parser tests — following the same dual-engine pattern established in Batch 1 (PR #708). The approach is well-structured and the TS extractors are generally thorough and well-commented.

Key findings:

  • elixir.rs: dead visibility variable (line 94). let visibility = if keyword == "defp" { ... } is computed but never stored in the Definition push. This produces a Rust compiler warning and means the native engine silently drops public/private visibility for all Elixir functions, while the TS engine correctly stores it.

  • elixir.rs — engine parity gap in handle_defmodule (line 62): The Rust extractor hardcodes children: None for module definitions, whereas elixir.ts's handleDefmodule calls collectModuleMembers to populate the module's function children. Users querying module membership will see different results between --engine=native and --engine=wasm.

  • parser_registry.rs: .mli files mapped to wrong OCaml grammar (line 126). Both .ml and .mli resolve to Self::Ocaml → LANGUAGE_OCAML. OCaml interface files (.mli) use a distinct grammar (LANGUAGE_OCAML_INTERFACE) that lacks expression syntax. Parsing .mli with the regular grammar will produce heavily error-node-polluted ASTs and likely miss most interface declarations. The WASM side has the same gap: build-wasm.ts only builds grammars/ocaml with no interface grammar counterpart.
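
One way to close the .mli gap on the WASM side is to choose the grammar per extension instead of per language. The sketch below only illustrates that idea: the `OcamlGrammar` labels and the `ocamlGrammarFor` helper are hypothetical, and it assumes an interface grammar has been built alongside grammars/ocaml, which this PR does not do.

```typescript
// Hypothetical grammar labels; the real build would map these to WASM files.
type OcamlGrammar = "ocaml" | "ocaml_interface";

// Picks a grammar from the file extension; returns null for non-OCaml files.
function ocamlGrammarFor(filePath: string): OcamlGrammar | null {
  if (filePath.endsWith(".mli")) return "ocaml_interface"; // signatures only
  if (filePath.endsWith(".ml")) return "ocaml"; // implementation files
  return null;
}
```

The native engine would need the equivalent split in its extension match, routing .mli to LANGUAGE_OCAML_INTERFACE as the finding suggests.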

Confidence Score: 4/5

Safe to merge for the TS/WASM engine path; the Rust native engine has three issues worth fixing before the native CI gate is re-enabled.

All 34 parser tests pass and the TS/WASM path is clean. However, the native Rust engine has a confirmed dead variable (visibility in elixir.rs), an engine parity gap (module children in elixir.rs), and a grammar mismatch that will produce wrong results for .mli files (parser_registry.rs). The Rust native build is also still marked unchecked in the PR test plan.

crates/codegraph-core/src/extractors/elixir.rs (visibility variable + module children), crates/codegraph-core/src/parser_registry.rs (OCaml .mli grammar)

Important Files Changed

| Filename | Overview |
|---|---|
| crates/codegraph-core/src/extractors/elixir.rs | New Elixir Rust extractor with two parity gaps vs the TS engine: visibility computed but dropped from Definition struct (dead variable + missing field), and handle_defmodule hardcodes children: None while TS collects module members |
| crates/codegraph-core/src/parser_registry.rs | Registry maps both .ml and .mli to LANGUAGE_OCAML; .mli interface files should use LANGUAGE_OCAML_INTERFACE to avoid heavily error-node-polluted ASTs |
| crates/codegraph-core/src/extractors/dart.rs | New Dart Rust extractor with class/enum/mixin/extension/import support; extension body methods are not attributed to the extension type (consistent with TS extractor) |
| crates/codegraph-core/src/extractors/haskell.rs | New Haskell Rust extractor; correctly uses the misspelled type_synomym grammar node; parity with TS extractor looks good |
| crates/codegraph-core/src/extractors/ocaml.rs | New OCaml Rust extractor; logic is sound and mirrors the TS extractor well, though the registry-level .mli grammar issue affects both engines |
| crates/codegraph-core/src/extractors/zig.rs | New Zig Rust extractor; correctly handles anonymous struct/enum/union names from parent variable declarations, @import detection, and test blocks |
| src/extractors/elixir.ts | New Elixir TS extractor; correctly passes currentModule through the recursive walk for qualified function names, and collects module children; richer than the Rust counterpart |
| src/extractors/dart.ts | New Dart TS extractor with selector-based call detection; class/mixin/extension/enum support; inheritance edges extracted correctly |
| src/domain/parser.ts | Adds all 6 new languages to the LANGUAGE_REGISTRY with correct extensions, WASM filenames, and extractor functions; no issues found |
| scripts/build-wasm.ts | Adds 6 new WASM grammar entries; OCaml uses grammars/ocaml sub-path (only the .ml grammar, not the .mli interface grammar) |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    File["Source File (.ex/.lua/.dart/.zig/.hs/.ml/.mli)"]
    Ext["Extension → LanguageKind\n(parser_registry.rs / LANGUAGE_REGISTRY)"]
    File --> Ext
    Ext --> WASM["WASM Engine (TypeScript)"]
    Ext --> NATIVE["Native Engine (Rust)"]
    WASM --> TSE["TS Extractor\nelixir.ts / lua.ts / dart.ts\nzig.ts / haskell.ts / ocaml.ts"]
    NATIVE --> RSE["Rust Extractor\nelixir.rs / lua.rs / dart.rs\nzig.rs / haskell.rs / ocaml.rs"]
    TSE --> OUT["ExtractorOutput\ndefinitions · calls · imports\nclasses · exports · typeMap"]
    RSE --> OUT2["FileSymbols\ndefinitions · calls · imports\nclasses · ast_nodes"]
    subgraph Issues
        I1["⚠️ elixir.rs: visibility dropped\n(dead var, parity gap)"]
        I2["⚠️ elixir.rs: module children = None\n(parity gap vs TS)"]
        I3["⚠️ parser_registry.rs:\n.mli → LANGUAGE_OCAML\n(should be LANGUAGE_OCAML_INTERFACE)"]
    end
    RSE -.->|affects| I1
    RSE -.->|affects| I2
    Ext -.->|affects| I3

Reviews (2): Last reviewed commit: "docs: update language counts from 17 to ..."

Comment on lines +9 to +12
### Features

* add C, C++, Kotlin, Swift, Scala, Bash language support ([#708](https://github.com/optave/ops-codegraph-tool/pull/708))

Contributor


P2 v3.6.0 CHANGELOG entry describes the wrong language batch

The v3.6.0 entry reads: "This release adds first-class support for C, C++, Kotlin, Swift, Scala, and Bash — bringing the total supported languages to 17." That describes Batch 1 (PR #708), not the six languages being added in this PR. The entry should reference Elixir, Lua, Dart, Zig, Haskell, and OCaml bringing the total to 23.

Suggested change
### Features
* add C, C++, Kotlin, Swift, Scala, Bash language support ([#708](https://github.com/optave/ops-codegraph-tool/pull/708))
**Six new languages: Elixir, Lua, Dart, Zig, Haskell, OCaml.** This release adds first-class support for Elixir, Lua, Dart, Zig, Haskell, and OCaml — bringing the total supported languages to 23. Each language ships with dual-engine extractors (WASM TypeScript + native Rust), AST configs, and parser tests.

Contributor Author


Fixed in 441fcd1. The CHANGELOG v3.6.0 entry now correctly references Elixir, Lua, Dart, Zig, Haskell, and OCaml (Batch 2), with the total updated to 23.

…unction names (#718)

walkElixirNode never updated currentModule when descending into defmodule
bodies, so all functions got unqualified names (e.g. `greet` instead of
`Greeter.greet`), breaking parity with the native Rust engine. Strengthen
the test assertion to check qualified names.
…is_pub (#718)

WASM extractor: handleZigFunction now skips nodes inside struct/union
containers since extractZigContainerMethods already emits them as
qualified methods. Rust extractor: remove dead is_pub call from
handle_zig_function (Definition struct lacks a visibility field).
CHANGELOG v3.6.0 entry incorrectly described Batch 1 languages instead
of Batch 2. README comparison table, feature descriptions, and ROADMAP
phase summary all updated to reflect 23 supported languages.
@carlos-alm
Contributor Author

Addressing all Greptile review comments

P1 — Elixir WASM currentModule never propagated (f47be65)
Fixed in src/extractors/elixir.ts. walkElixirNode now detects defmodule call nodes and updates nextModule before recursing into children. Functions inside modules now get qualified names (e.g. Greeter.greet), matching the native Rust engine. Test assertion strengthened to verify qualified name.
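
The module-propagation fix described above can be pictured with a small recursive walk. This is a simplified sketch rather than the actual walkElixirNode: the `ExNode` shape and the `collectFunctionNames` helper are hypothetical, and joining nested module names with a dot is an assumption.

```typescript
// Hypothetical, pre-classified node shape standing in for tree-sitter nodes.
interface ExNode {
  kind: "defmodule" | "def" | "other";
  name?: string;
  children: ExNode[];
}

// Walks the tree and returns function names qualified by enclosing module.
function collectFunctionNames(
  node: ExNode,
  currentModule: string | null = null,
  out: string[] = [],
): string[] {
  let nextModule = currentModule;
  if (node.kind === "defmodule" && node.name) {
    // Update the module context BEFORE recursing into the body.
    nextModule = currentModule ? `${currentModule}.${node.name}` : node.name;
  } else if (node.kind === "def" && node.name) {
    out.push(currentModule ? `${currentModule}.${node.name}` : node.name);
  }
  for (const child of node.children) {
    collectFunctionNames(child, nextModule, out);
  }
  return out;
}

// A module Greeter containing greet, plus a top-level function.
const sampleTree: ExNode = {
  kind: "other",
  children: [
    {
      kind: "defmodule",
      name: "Greeter",
      children: [{ kind: "def", name: "greet", children: [] }],
    },
    { kind: "def", name: "top", children: [] },
  ],
};

console.log(collectFunctionNames(sampleTree)); // [ 'Greeter.greet', 'top' ]
```

The bug class here is passing `currentModule` unchanged to the children; computing `nextModule` first is what yields `Greeter.greet` instead of `greet`.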

P1 — Zig WASM duplicate struct method emission (0f7c637)
Fixed in src/extractors/zig.ts. Added isInsideZigContainer() guard to handleZigFunction — if the node is inside a struct_declaration or union_declaration, it returns early since extractZigContainerMethods already emits qualified method definitions.
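
A guard like isInsideZigContainer() is typically a walk up the parent chain. The sketch below shows one plausible shape for it, not the code from the commit: the `ZigNode` interface is a hypothetical stand-in for a tree-sitter node, and the body of the function is an assumption based on the description above.

```typescript
// Hypothetical minimal node with a parent pointer.
interface ZigNode {
  type: string;
  parent: ZigNode | null;
}

// Container node types named in the fix description.
const ZIG_CONTAINER_TYPES = new Set(["struct_declaration", "union_declaration"]);

// Returns true if any ancestor of `node` is a struct or union container.
function isInsideZigContainer(node: ZigNode): boolean {
  for (let p = node.parent; p !== null; p = p.parent) {
    if (ZIG_CONTAINER_TYPES.has(p.type)) return true;
  }
  return false;
}
```

A top-level function handler can return early when this guard is true, leaving container methods to the code path that emits them with qualified names, which avoids the duplicate definitions.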

P2 — is_pub computed but never used in zig.rs (0f7c637)
Removed the dead is_pub call from handle_zig_function in the Rust extractor. The Definition struct lacks a visibility field, so the value couldn't be wired through. Added #[allow(dead_code)] + TODO on the helper function for when the struct gains visibility support.

P2 — README/ROADMAP/CHANGELOG language counts (441fcd1)

  • CHANGELOG v3.6.0 entry now references Batch 2 languages (Elixir, Lua, Dart, Zig, Haskell, OCaml) and total of 23
  • README comparison table: 17 → 23
  • README feature descriptions: all "17 languages" → "23 languages" (5 occurrences)
  • README multi-language list: added Elixir, Lua, Dart, Zig, Haskell, OCaml
  • ROADMAP Phase 7 summary: updated to reflect Batch 1+2 shipped, 11 remaining in 2 batches

CI failure (incremental-parity test): Resolved by merging main into the branch — the test relied on a CFG fix that landed on main after this branch diverged.

@carlos-alm
Contributor Author

@greptileai review
