
feat: dev publish workflow, parser refactor, and quality-of-life improvements #25

Merged
carlos-alm merged 19 commits into main from feat/dev-publish-workflow
Feb 23, 2026

Conversation

@carlos-alm

Summary

  • Publish workflow consolidation: merge dev + stable publish into a single publish.yml workflow with workflow_dispatch inputs
  • Parser refactor: split monolithic parser.js extractors into per-language files under src/extractors/
  • codegraph stats command: new CLI command for graph health overview
  • Native engine fixes: normalize import paths, throw on explicit --engine native when addon unavailable
  • Registry improvements: TTL-based pruning for idle entries, auto-prune stale entries, skip temp dir registration, isolate CLI tests from real registry
  • Embedding regression CI: real-model integration test with dedicated weekly workflow
  • Worktree workflow hooks: guard-git.sh, track-edits.sh, rebuild-graph.sh for parallel session safety
  • Windows compatibility: replace jq with node in hooks
  • Competitive analysis: expanded from 21 to 135+ tools

Test plan

  • Full test suite passes (385 passed, 43 skipped) on all platforms
  • Lint clean
  • Verify codegraph stats output on a real repo
  • Verify publish.yml workflow_dispatch triggers correctly for dev and stable channels
  • Confirm embedding regression workflow runs on schedule and relevant PR changes

Add dev-publish.yml that triggers on every merge to main, publishing
prerelease versions (e.g. 2.0.1-dev.abc1234) with --tag dev. Includes
concurrency control and skip logic for stable release version bumps.

Simplify publish.yml to release-event-only: remove workflow_dispatch
trigger and inputs, add explicit --tag latest, disable prepublishOnly
in CI since tests already run in preflight.

Merge dev-publish and stable-publish into one publish.yml to satisfy
npm Trusted Publishing's one-workflow-per-package constraint.

- push to main → dev release (e.g. 2.0.1-dev.abc1234) with --tag dev
- GitHub Release event → stable release with --tag latest
- Concurrency group cancels in-flight dev publishes on rapid merges
- Skip logic for stable version bump commits (chore: release v*)
- Version computed in dedicated job, shared via outputs
- No git commits/tags/PRs for dev releases
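A minimal sketch of the dev versioning and skip rules listed above, not taken from publish.yml. The function names are hypothetical, and the 7-character short SHA is an assumption based on the `2.0.1-dev.abc1234` example.

```javascript
// Hypothetical helpers; names are illustrative, not from the workflow.

// Bump the patch number of the last stable version and append a
// prerelease suffix built from the short commit SHA.
function computeDevVersion(stableVersion, commitSha) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(stableVersion);
  if (!match) {
    throw new Error(`Expected MAJOR.MINOR.PATCH, got "${stableVersion}"`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  const shortSha = commitSha.slice(0, 7); // assumed short-SHA length
  return `${major}.${minor}.${patch + 1}-dev.${shortSha}`;
}

// Stable version bump commits (chore: release v*) should not trigger
// a dev publish.
function isStableReleaseCommit(commitMessage) {
  return commitMessage.startsWith('chore: release v');
}

console.log(computeDevVersion('2.0.0', 'abc1234def5678')); // 2.0.1-dev.abc1234
console.log(isStableReleaseCommit('chore: release v2.0.1')); // true
```

In a workflow, a value like this would typically be computed once in a dedicated job and passed to later jobs through job outputs, as the list above describes.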

The Rust resolve_import function produced paths like src/./db.js
because Path::join preserves . segments from relative imports.
This caused import edges to be lost for native engine users.

Fix by normalizing the resolved PathBuf immediately after join using
components().collect(), and upgrading normalize_path to also clean
. / .. segments. Remove the temporary path.normalize() JS workaround
since the Rust side now returns clean paths directly.

Prevent registry pollution from temp directory builds by checking
os.tmpdir() before auto-registration. Auto-prune stale entries on
`registry list` (CLI) and `list_repos` (MCP) so users and AI agents
always see a clean registry.
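One way the os.tmpdir() check could look, as a sketch only. `isInsideTempDir` is a hypothetical name, and the actual check in builder.js may compare paths differently.

```javascript
const os = require('node:os');
const path = require('node:path');

// Returns true when `dir` is the temp directory itself or nested inside it.
// `tmpRoot` is a parameter only so the check can be exercised in tests.
function isInsideTempDir(dir, tmpRoot = os.tmpdir()) {
  const relative = path.relative(path.resolve(tmpRoot), path.resolve(dir));
  // A relative path that climbs out ("..") or is absolute (different
  // drive on Windows) means `dir` lives outside the temp root.
  const escapes = relative === '..' || relative.startsWith('..' + path.sep);
  return !escapes && !path.isAbsolute(relative);
}

console.log(isInsideTempDir(path.join(os.tmpdir(), 'build-1'))); // true
```

This sketch does not resolve symlinks, so on systems where the temp directory is reached through a symlink a real implementation might need to compare real paths instead.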

Prevent concurrent Claude Code instances from unstaging, deleting,
or reverting each other's files.

Wrap native resolveImport and resolveImports results with
normalizePath(path.normalize()) to catch any remaining ./.. segments
that the Rust engine might produce on edge cases.
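A rough illustration of the JS-side cleanup described above. `toForwardSlashes` is a hypothetical stand-in for the project's normalizePath helper, assumed here to convert backslashes to forward slashes. The sketch uses `path.posix.normalize` so it behaves the same on every platform, whereas the commit uses `path.normalize`.

```javascript
const path = require('node:path');

// Hypothetical stand-in for the project's normalizePath helper.
function toForwardSlashes(p) {
  return p.replace(/\\/g, '/');
}

// Unify separators first, then collapse "." and ".." segments.
function cleanResolvedPath(resolved) {
  return path.posix.normalize(toForwardSlashes(resolved));
}

console.log(cleanResolvedPath('src/./db.js'));      // src/db.js
console.log(cleanResolvedPath('src/lib/../db.js')); // src/db.js
```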

Track lastAccessedAt on registry entries (updated on build and MCP
query). pruneRegistry now removes entries not accessed within a
configurable TTL (default 30 days) in addition to missing directories.
CLI `registry prune --ttl <days>` exposes the TTL parameter.
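A sketch of the pruning rule, under an assumed registry shape of `{ name: { path, lastAccessedAt } }` with ISO timestamps. The real registry.json layout and the pruneRegistry signature may differ, and `pruneEntries` is a hypothetical name.

```javascript
const fs = require('node:fs');

const DAY_MS = 24 * 60 * 60 * 1000;

// Split entries into kept and removed. `now` and `exists` are injectable
// so the rule can be tested without touching the clock or the disk.
function pruneEntries(entries, options = {}) {
  const { ttlDays = 30, now = Date.now(), exists = fs.existsSync } = options;
  const kept = {};
  const removed = [];
  for (const [name, entry] of Object.entries(entries)) {
    // Entries without a parsable lastAccessedAt yield NaN and are kept.
    const idleMs = now - Date.parse(entry.lastAccessedAt);
    if (!exists(entry.path)) {
      removed.push({ name, reason: 'missing' });
    } else if (idleMs > ttlDays * DAY_MS) {
      removed.push({ name, reason: 'expired' });
    } else {
      kept[name] = entry;
    }
  }
  return { kept, removed };
}
```

Keeping entries that lack a timestamp is a design choice in this sketch, so that registries written before lastAccessedAt existed are not wiped on the first prune.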

Move 9 language extractors (~1,630 lines) from parser.js into
src/extractors/, mirroring the native engine's per-file structure.
parser.js re-exports all extractors for backward compatibility.

Also fix config.test.js to match current embeddings default (jina-code).

Mark registry cleanup (#4) and git-diff guard (#6) as fixed.
Update testing summary for native engine and registry status.

…PATH env var

The cli.test.js afterAll called pruneRegistry() with no arguments, operating
on the real ~/.codegraph/registry.json. The run() helper also lacked HOME
isolation, risking writes to the real registry. Add CODEGRAPH_REGISTRY_PATH
env var support to registry.js, isolate all CLI spawns with a fake HOME, and
remove the bare pruneRegistry() call.
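The env var override could be resolved roughly like this. It is a sketch rather than the code in registry.js, and `resolveRegistryPath` is a hypothetical name. The default location is the ~/.codegraph/registry.json path mentioned above.

```javascript
const os = require('node:os');
const path = require('node:path');

// An explicit CODEGRAPH_REGISTRY_PATH wins; otherwise fall back to the
// default file under the user's home directory.
function resolveRegistryPath(env = process.env, homeDir = os.homedir()) {
  if (env.CODEGRAPH_REGISTRY_PATH) {
    return env.CODEGRAPH_REGISTRY_PATH;
  }
  return path.join(homeDir, '.codegraph', 'registry.json');
}

console.log(resolveRegistryPath({ CODEGRAPH_REGISTRY_PATH: '/tmp/test-registry.json' }));
// /tmp/test-registry.json
```

Tests that spawn the CLI can then point the variable at a throwaway file so the real registry is never read or written.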

Previously --engine native silently fell back to WASM with a warning,
which was easy to miss. Now it throws a clear error with install
instructions, matching user intent. Auto mode remains unchanged.
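The new behavior can be sketched as follows. `selectEngine` and `loadNativeAddon` are hypothetical names, the engine values 'native', 'wasm', and 'auto' are assumptions, and the error text is illustrative rather than the actual message.

```javascript
// `loadNativeAddon` is assumed to return the addon, or null when it
// cannot be loaded on this platform.
function selectEngine(requested, loadNativeAddon) {
  if (requested === 'wasm') return 'wasm';
  const addon = loadNativeAddon();
  if (addon) return 'native';
  if (requested === 'native') {
    // Explicit request: fail loudly instead of falling back.
    throw new Error(
      'Native engine requested but the native addon could not be loaded.'
    );
  }
  // Auto mode: fall back to WASM as before.
  return 'wasm';
}

console.log(selectEngine('auto', () => null)); // wasm
```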

…ctors

Each extractor's internal AST traversal function was identically named
`walk`, making codegraph search results ambiguous across languages.
Renamed to walkPythonNode, walkJavaScriptNode, walkGoNode, etc. so
symbol searches return unique, language-specific hits.

Unified command showing node/edge breakdowns, language distribution,
cycle counts, coupling hotspots, and embedding status. Supports --json.

Add guard-git.sh (PreToolUse) to block dangerous git commands that
interfere with parallel sessions, and track-edits.sh (PostToolUse) to
log edited files so commits can be validated against the session log.
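To make the two checks concrete, here is a sketch of the guard logic in Node. The blocked command list (add -A, reset, clean, stash) follows the review summary on this PR. The regular expressions, reasons, and function names are illustrative assumptions, not the contents of guard-git.sh.

```javascript
// Illustrative patterns; the real hook's rules may be broader or narrower.
const BLOCKED_COMMANDS = [
  { pattern: /\bgit\s+add\s+(-A|--all)\b/, reason: 'stages files from other sessions' },
  { pattern: /\bgit\s+reset\b/, reason: 'can unstage or revert other sessions\' work' },
  { pattern: /\bgit\s+clean\b/, reason: 'can delete other sessions\' files' },
  { pattern: /\bgit\s+stash\b/, reason: 'hides other sessions\' changes' },
];

// PreToolUse-style check: decide whether a shell command may run.
function checkGitCommand(command) {
  const hit = BLOCKED_COMMANDS.find(({ pattern }) => pattern.test(command));
  return hit ? { allowed: false, reason: hit.reason } : { allowed: true };
}

// Commit validation: staged files that this session never edited.
function findForeignStagedFiles(stagedFiles, sessionEditedFiles) {
  const edited = new Set(sessionEditedFiles);
  return stagedFiles.filter((file) => !edited.has(file));
}

console.log(checkGitCommand('git add -A').allowed);        // false
console.log(checkGitCommand('git add src/cli.js').allowed); // true
```

A commit would be rejected when `findForeignStagedFiles` returns a non-empty list, since those files were presumably staged by another session.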

Update CLAUDE.md with worktree-first workflow and fix the Claude Code
hooks example in recommended-practices.md to use the correct schema.

Add integration test that runs the embed+search pipeline with the real
minilm model to catch regressions from model updates, embedding text
format changes, or pipeline bugs. Includes a dedicated CI workflow
(weekly + on relevant PR changes) with HuggingFace model caching.

check-readme.sh and enrich-context.sh used jq for JSON parsing, which
isn't available on Windows. Switch to node inline scripts matching the
pattern used by guard-git.sh and track-edits.sh. Also commit
rebuild-graph.sh which was wired into settings.json but never tracked.
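A sketch of the kind of node logic that can stand in for a jq filter in a hook. The payload shape (`tool_input.file_path`) is an assumption about what the hook receives, and `extractFilePath` is a hypothetical name. The actual inline scripts in the hooks may differ.

```javascript
// Rough equivalent of: jq -r '.tool_input.file_path // empty'
// Returns '' for malformed JSON or a missing field, so the calling
// shell script can treat an empty result as "nothing to do".
function extractFilePath(rawPayload) {
  try {
    const payload = JSON.parse(rawPayload);
    const filePath = payload?.tool_input?.file_path;
    return typeof filePath === 'string' ? filePath : '';
  } catch {
    return '';
  }
}

const samplePayload = JSON.stringify({
  tool_name: 'Edit',
  tool_input: { file_path: 'src/parser.js' },
});
console.log(extractFilePath(samplePayload)); // src/parser.js
```

In a shell hook the payload would arrive on stdin, for example by reading file descriptor 0 with `fs.readFileSync(0, 'utf8')` in a `node -e` script.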
@greptile-apps

greptile-apps bot commented Feb 23, 2026

Greptile Summary

Consolidated dev+stable publish workflows into a single workflow_dispatch-triggered publish.yml that auto-computes versions and npm tags. Dev releases now publish automatically on every commit to main with a -dev.SHA suffix and the dev npm tag, while stable releases extract the version from GitHub release tags.

Refactored the 1917-line monolithic parser.js into a clean architecture with language-specific extractors in src/extractors/ (JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, HCL). All extraction logic moved to dedicated files with shared helpers, improving maintainability.

Added codegraph stats command that provides comprehensive graph health metrics: node/edge counts by kind, language distribution, cycle detection, top coupling hotspots, and embedding metadata.

Enhanced registry with TTL-based auto-pruning (30-day default) that removes idle entries, lastAccessedAt tracking updated on each resolution, and CODEGRAPH_REGISTRY_PATH env var support. Builder now skips auto-registration for temp directories.

Fixed native engine import path normalization issues by cleaning . and .. segments via PathBuf components in Rust and normalizing results in JS wrapper.

Implemented worktree workflow safety with three new hooks: guard-git.sh blocks dangerous git commands (add -A, reset, clean, stash) and validates commits against session edit log; track-edits.sh logs all Edit/Write operations; rebuild-graph.sh auto-rebuilds graph incrementally after source edits.

Replaced jq with node in all hooks for Windows compatibility.

Added embedding regression test with real ML model validation and dedicated weekly CI workflow.

All test changes include proper isolation (skipRegistry, temp HOME override) to prevent cross-test registry pollution.

Confidence Score: 4/5

  • Safe to merge with minor verification needed for workflow behavior
  • Large refactor is well-structured with comprehensive test coverage (385 tests passing). All changes are backward-compatible with proper exports. The publish workflow consolidation is a significant change that should be verified via test trigger, but the logic is sound. Registry TTL and worktree hooks add valuable safety features. Native engine path normalization fixes real cross-platform bugs.
  • Verify .github/workflows/publish.yml triggers correctly for both dev and stable flows via workflow_dispatch test runs before relying on automated publishes

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/publish.yml | Consolidated dev+stable publish into single workflow with auto-versioning; dev releases publish on every main commit with -dev.SHA suffix and dev npm tag |
| src/parser.js | Refactored from 1917-line monolith to thin wrapper that re-exports extractors from src/extractors/; all language-specific logic moved to dedicated files |
| src/extractors/javascript.js | Extracted JavaScript/TypeScript parser logic from parser.js with no functional changes, just module reorganization |
| src/cli.js | Added stats command for graph health overview, auto-prune on registry list, changed default embed model to jina-code, dynamic version from package.json |
| src/queries.js | Added stats and statsData functions for comprehensive graph health metrics (nodes, edges, languages, cycles, hotspots, embeddings); fixed git repo detection in diffImpact |
| src/registry.js | Added TTL-based pruning (30-day default), lastAccessedAt tracking, CODEGRAPH_REGISTRY_PATH env var support, preserves addedAt on re-registration |
| src/builder.js | Skip auto-registration for temp directories, added skipRegistry option for test isolation |
| src/resolve.js | Normalize import paths from native engine to fix cross-platform path inconsistencies, guard against missing aliases.paths |
| crates/codegraph-core/src/import_resolution.rs | Improved path normalization to clean . and .. segments via PathBuf components, ensuring cross-platform consistency |
| .claude/hooks/guard-git.sh | New PreToolUse hook that blocks dangerous git commands (add -A, reset, clean, stash) and validates commits against session edit log for parallel session safety |
| .claude/hooks/track-edits.sh | New PostToolUse hook that logs all Edit/Write operations to session-edits.log for commit validation by guard-git.sh; uses node instead of jq for Windows compatibility |
| tests/unit/registry.test.js | Comprehensive test coverage for TTL pruning, lastAccessedAt tracking, env var override, and addedAt preservation |
| tests/search/embedding-regression.test.js | New real-model integration test that validates semantic search quality with 5 regression queries, skips gracefully when transformers not installed |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Main Push to main] --> B{Event Type?}
    B -->|push| C[Dev Publish Flow]
    B -->|release| D[Stable Release Flow]
    
    C --> E[Compute Dev Version<br/>MAJOR.MINOR.PATCH+1-dev.SHA]
    D --> F[Extract Version from Tag]
    
    E --> G[Run Tests]
    F --> G
    
    G --> H[Build Native Binaries<br/>Linux/macOS/Windows]
    H --> I[Publish to npm]
    
    I --> J{Is Stable?}
    J -->|Yes| K[Create Version Bump PR<br/>+ Push Git Tag]
    J -->|No| L[Skip PR/Tag<br/>Show Dev Install Instructions]
    
    subgraph "Parser Refactor"
        M[parser.js<br/>1917 lines] -.refactor.-> N[parser.js<br/>thin wrapper]
        N --> O[extractors/javascript.js]
        N --> P[extractors/python.js]
        N --> Q[extractors/go.js]
        N --> R[extractors/rust.js]
        N --> S[extractors/java.js]
        N --> T[extractors/...]
        O & P & Q & R & S & T --> U[extractors/helpers.js]
    end
    
    subgraph "Worktree Hooks"
        V[Edit/Write] --> W[track-edits.sh<br/>Log to session-edits.log]
        W --> X[rebuild-graph.sh<br/>Incremental rebuild]
        Y[git commit] --> Z[guard-git.sh<br/>Validate staged files]
        Z -.check.-> W
    end
    
    subgraph "Registry TTL"
        AA[registerRepo] --> AB[Set lastAccessedAt]
        AC[resolveRepoDbPath] --> AB
        AD[pruneRegistry] --> AE{Check each entry}
        AE -->|Missing dir| AF[Remove: missing]
        AE -->|Idle > 30 days| AG[Remove: expired]
        AE -->|Active| AH[Keep]
    end

Last reviewed commit: ac0b198


@greptile-apps greptile-apps bot left a comment


45 files reviewed, no comments


@carlos-alm carlos-alm merged commit c6bdbaa into main Feb 23, 2026
21 checks passed
@carlos-alm carlos-alm deleted the feat/dev-publish-workflow branch February 23, 2026 00:46