feat: add query + incremental regression benchmarks #80

Merged

carlos-alm merged 1 commit into main from fix/review-comments-benchmarks on Feb 24, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Add query benchmark (scripts/query-benchmark.js): measures fnDepsData and fnImpactData at depth 1/3/5 (median of 5 runs) plus diffImpactData with a synthetic staged diff, both engines
  • Add incremental benchmark (scripts/incremental-benchmark.js): measures full/no-op/1-file rebuild tiers and import resolution throughput (native batch vs JS fallback), both engines
  • Add report updaters (update-query-report.js, update-incremental-report.js) following the exact pattern of existing updaters — trend arrows, HTML comment history, deduplicate by version
  • Add 2 new CI jobs to benchmark.yml (query-benchmark, incremental-benchmark)
  • Add 2 new rows to the CONTRIBUTING.md regression benchmarks table
  • Add Lightweight Footprint section to README with live shields.io badges: unpacked size, dependency GitHub stars, and npm weekly downloads
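All of the scripts above share one output contract: progress goes to stderr, JSON to stdout, so the `2>/dev/null | jq .` invocations in the test plan stay valid. A minimal sketch of that contract (function name and result fields are hypothetical placeholders, not the real script's API):

```javascript
// Sketch of the benchmark output contract: human-readable progress on stderr,
// machine-readable JSON on stdout, so piping stdout into jq or a report
// updater always yields valid JSON. runQueryBenchmark is a stand-in.

function log(msg) {
  process.stderr.write(`${msg}\n`); // progress/log lines: stderr only
}

function runQueryBenchmark() {
  log('running query benchmark...');
  // ...real measurements would go here; these numbers are placeholders...
  return { engine: 'native', depth1Ms: 1.2, depth3Ms: 4.8, depth5Ms: 11.5 };
}

const result = runQueryBenchmark();
process.stdout.write(JSON.stringify(result, null, 2) + '\n'); // JSON only
```

Keeping the two streams separate is what lets the same script serve both interactive runs (logs visible) and CI pipelines (clean JSON).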

Test plan

  • node scripts/query-benchmark.js 2>/dev/null | jq . — produces valid JSON
  • node scripts/incremental-benchmark.js 2>/dev/null | jq . — produces valid JSON
  • Pipe into updaters — correct markdown generated
  • npm run lint passes
  • Workflow YAML validates
  • shields.io badge URLs return 200

…int section

Add 4 new regression benchmarks (query depth scaling, diff-impact latency,
incremental build tiers, import resolution throughput) with dual-engine
support, report updaters, and CI workflow jobs. Add lightweight footprint
section to README with live shields.io badges for unpacked size, dependency
stars, and weekly downloads.
@claude

claude bot commented Feb 24, 2026

Claude finished @carlos-alm's task


PR Review: Query + Incremental Regression Benchmarks

Review Tasks:

  • Phase 1: Root Cause Analysis - Understanding the problem and approach
  • Phase 2: Configuration Change Detection - Check for high-risk changes
  • Phase 3: Test Integrity - Verify test quality and assertions
  • Phase 4: Code Quality - Review implementation details
  • Final Assessment - Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses the need for comprehensive regression testing of codegraph's query and incremental build performance. The existing benchmark CI only covers build benchmarks and embedding benchmarks, but doesn't test query performance (fnDeps, fnImpact, diffImpact) or incremental build scenarios (full/no-op/1-file rebuilds).

Why This Approach: The solution follows the established benchmark pattern used by existing benchmarks in the codebase:

  • Self-benchmarking on the codegraph repository itself
  • Consistent JSON output format with report updaters
  • CI integration that creates PRs with updated reports
  • Both native and WASM engine testing
  • Median timing over multiple runs for statistical reliability

Risk Assessment: This is a LOW-RISK PR that adds monitoring/observability without changing core functionality. The benchmarks are measurement-only and the CI jobs follow the same secure pattern as existing benchmark jobs.

Phase 2: Configuration Change Detection ✅

No High-Risk Configuration Changes Found:

  • ✅ No biome.json, eslint, or TypeScript config modifications
  • ✅ No test timeout increases or coverage threshold reductions
  • ✅ No disabled linting rules or relaxed build configurations
  • ✅ No package.json dependency or script changes

The PR adds new CI jobs but follows the same secure pattern as existing benchmark workflows.

Phase 3: Test Integrity ✅

No Test Weakening Found:

  • ✅ No skipped tests or weakened assertions
  • ✅ No test files modified alongside code changes
  • ✅ Benchmark scripts include proper error handling (scripts/query-benchmark.js:69)
  • ✅ Scripts handle native engine unavailability gracefully
  • ✅ Proper stdout/stderr separation for clean JSON output

Phase 4: Code Quality ✅

Security Review:

  • Git operations are secure: Uses execFileSync with explicit arguments array, no shell injection risk
  • Proper cleanup: Git staging changes are properly restored in finally blocks (scripts/query-benchmark.js:130)
  • File operations are safe: Creates temporary files only in project directory, restores original content

Code Quality:

  • Consistent patterns: Follows exact same structure as existing benchmark scripts
  • Proper error handling: Scripts include error boundaries and graceful fallbacks
  • Statistical reliability: Uses median of 5 runs for query benchmarks, 3 for incremental
  • Documentation: Scripts are well-documented with usage examples
  • CI Integration: New workflow jobs mirror existing security model

Implementation Details:

  • ✅ Dynamic target selection using graph connectivity analysis (scripts/query-benchmark.js:55-75)
  • ✅ Synthetic diff creation for diffImpact testing with proper cleanup
  • ✅ Import resolution benchmarking covers both native batch and JS fallback paths
  • ✅ Trend calculation with percentage changes and directional arrows
  • ✅ README footprint section uses legitimate shields.io badges

Final Assessment

Root Cause Analysis

Problem Being Solved: Expanding regression testing coverage to include query performance and incremental build scenarios that were previously unmeasured.
Why This Approach: Follows established codebase patterns and provides comprehensive performance monitoring across both engines.
Risk Assessment: Low-risk observability enhancement with no functional code changes.

Critical Concerns

None identified. This is a well-implemented observability enhancement that follows security best practices and existing codebase patterns.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality implementation that addresses a legitimate need for performance regression testing. The code follows established patterns, includes proper security measures, and provides valuable observability without introducing risk. The only reason for 4/5 instead of 5/5 is that this is infrastructure/tooling rather than core feature work.

Minor observations for future consideration:

  • Consider adding performance threshold alerts in CI if benchmarks regress significantly
  • The 836 additions with 0 deletions are appropriate for this type of purely additive enhancement

@greptile-apps
Contributor

greptile-apps bot commented Feb 24, 2026

Greptile Summary

Added query and incremental regression benchmarks to complement existing build/embedding benchmarks. The new scripts measure query depth scaling (fnDeps/fnImpact at depths 1/3/5), diff-impact latency, and incremental build performance (full/no-op/1-file rebuild tiers) plus import resolution throughput (native vs JS).

  • Both benchmarks follow the established pattern: redirect logs to stderr, output JSON to stdout, test both engines, use median of multiple runs
  • Report updaters match existing structure: HTML comment storage, version deduplication, trend arrows with 2% threshold
  • CI jobs replicate the embedding-benchmark pattern: run on workflow dispatch or after successful tests, create PRs for updates
  • Documentation additions are accurate: CONTRIBUTING.md correctly maps benchmarks to relevant source files, README badges use valid shields.io URLs
  • README footprint section adds transparency about dependencies without introducing bloat
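The "trend arrows with 2% threshold" rule the summary mentions can be sketched as a tiny pure function (a sketch under the stated assumptions; the real updater's formatting may differ):

```javascript
// Sketch of the trend-arrow rule: deltas within ±2% count as flat,
// otherwise show a directional arrow with the percentage change.
// For timings, ↑ means slower and ↓ means faster.
function trendArrow(prevMs, currMs, thresholdPct = 2) {
  const deltaPct = ((currMs - prevMs) / prevMs) * 100;
  if (Math.abs(deltaPct) < thresholdPct) return '→ (flat)';
  const arrow = deltaPct > 0 ? '↑' : '↓';
  return `${arrow} ${Math.abs(deltaPct).toFixed(1)}%`;
}
```

A dead band like this keeps run-to-run measurement noise from producing a churn of meaningless arrows in the report history.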

Confidence Score: 5/5

  • Safe to merge - well-structured addition following existing patterns with no runtime impact
  • All new code follows established patterns from existing benchmark infrastructure, no changes to production code paths, only adds observability tooling and documentation
  • No files require special attention

Important Files Changed

  • scripts/query-benchmark.js: query benchmark measuring fnDeps/fnImpact depth scaling and diffImpact latency, following existing benchmark patterns
  • scripts/incremental-benchmark.js: incremental build benchmark measuring full/no-op/1-file rebuilds and import resolution throughput
  • scripts/update-query-report.js: report updater that generates markdown with trend arrows and historical data, matching the existing updater pattern
  • scripts/update-incremental-report.js: report updater for incremental benchmarks with historical tracking and trend calculation
  • .github/workflows/benchmark.yml: two new CI jobs (query-benchmark, incremental-benchmark) following existing workflow patterns

Last reviewed commit: 0fd1967

@greptile-apps bot left a comment

7 files reviewed, no comments


@carlos-alm carlos-alm merged commit eb37ec5 into main Feb 24, 2026
17 checks passed
@carlos-alm carlos-alm deleted the fix/review-comments-benchmarks branch February 24, 2026 04:03