feat: add query + incremental regression benchmarks #80

Merged

carlos-alm merged 1 commit into main from fix/review-comments-benchmarks on Feb 24, 2026

Conversation

@carlos-alm
Contributor

Summary

  • Add query benchmark (scripts/query-benchmark.js): measures fnDepsData and fnImpactData at depth 1/3/5 (median of 5 runs) plus diffImpactData with a synthetic staged diff, both engines
  • Add incremental benchmark (scripts/incremental-benchmark.js): measures full/no-op/1-file rebuild tiers and import resolution throughput (native batch vs JS fallback), both engines
  • Add report updaters (update-query-report.js, update-incremental-report.js) following the exact pattern of existing updaters — trend arrows, HTML comment history, deduplicate by version
  • Add 2 new CI jobs to benchmark.yml (query-benchmark, incremental-benchmark)
  • Add 2 new rows to the CONTRIBUTING.md regression benchmarks table
  • Add Lightweight Footprint section to README with live shields.io badges: unpacked size, dependency GitHub stars, and npm weekly downloads
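All of the scripts above share one output contract: progress goes to stderr, JSON to stdout, so the `2>/dev/null | jq .` invocations in the test plan stay valid. A minimal sketch of that contract (function name and result fields are hypothetical placeholders, not the real script's API):

```javascript
// Sketch of the benchmark output contract: human-readable progress on stderr,
// machine-readable JSON on stdout, so piping stdout into jq or a report
// updater always yields valid JSON. runQueryBenchmark is a stand-in.

function log(msg) {
  process.stderr.write(`${msg}\n`); // progress/log lines: stderr only
}

function runQueryBenchmark() {
  log('running query benchmark...');
  // ...real measurements would go here; these numbers are placeholders...
  return { engine: 'native', depth1Ms: 1.2, depth3Ms: 4.8, depth5Ms: 11.5 };
}

const result = runQueryBenchmark();
process.stdout.write(JSON.stringify(result, null, 2) + '\n'); // JSON only
```

Keeping the two streams separate is what lets the same script serve both interactive runs (logs visible) and CI pipelines (clean JSON).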

Test plan

  • node scripts/query-benchmark.js 2>/dev/null | jq . — produces valid JSON
  • node scripts/incremental-benchmark.js 2>/dev/null | jq . — produces valid JSON
  • Pipe into updaters — correct markdown generated
  • npm run lint passes
  • Workflow YAML validates
  • shields.io badge URLs return 200

…int section

Add 4 new regression benchmarks (query depth scaling, diff-impact latency,
incremental build tiers, import resolution throughput) with dual-engine
support, report updaters, and CI workflow jobs. Add lightweight footprint
section to README with live shields.io badges for unpacked size, dependency
stars, and weekly downloads.
@claude

claude bot commented Feb 24, 2026

Claude finished @carlos-alm's task


PR Review: Query + Incremental Regression Benchmarks

Review Tasks:

  • Phase 1: Root Cause Analysis - Understanding the problem and approach
  • Phase 2: Configuration Change Detection - Check for high-risk changes
  • Phase 3: Test Integrity - Verify test quality and assertions
  • Phase 4: Code Quality - Review implementation details
  • Final Assessment - Provide rating and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses the need for comprehensive regression testing of codegraph's query and incremental build performance. The existing benchmark CI only covers build benchmarks and embedding benchmarks, but doesn't test query performance (fnDeps, fnImpact, diffImpact) or incremental build scenarios (full/no-op/1-file rebuilds).

Why This Approach: The solution follows the established benchmark pattern used by existing benchmarks in the codebase:

  • Self-benchmarking on the codegraph repository itself
  • Consistent JSON output format with report updaters
  • CI integration that creates PRs with updated reports
  • Both native and WASM engine testing
  • Median timing over multiple runs for statistical reliability

Risk Assessment: This is a LOW-RISK PR that adds monitoring/observability without changing core functionality. The benchmarks are measurement-only and the CI jobs follow the same secure pattern as existing benchmark jobs.

Phase 2: Configuration Change Detection ✅

No High-Risk Configuration Changes Found:

  • ✅ No biome.json, eslint, or TypeScript config modifications
  • ✅ No test timeout increases or coverage threshold reductions
  • ✅ No disabled linting rules or relaxed build configurations
  • ✅ No package.json dependency or script changes

The PR adds new CI jobs but follows the same secure pattern as existing benchmark workflows.

Phase 3: Test Integrity ✅

No Test Weakening Found:

  • ✅ No skipped tests or weakened assertions
  • ✅ No test files modified alongside code changes
  • ✅ Benchmark scripts include proper error handling (scripts/query-benchmark.js:69)
  • ✅ Scripts handle native engine unavailability gracefully
  • ✅ Proper stdout/stderr separation for clean JSON output

Phase 4: Code Quality ✅

Security Review:

  • Git operations are secure: Uses execFileSync with explicit arguments array, no shell injection risk
  • Proper cleanup: Git staging changes are properly restored in finally blocks (scripts/query-benchmark.js:130)
  • File operations are safe: Creates temporary files only in project directory, restores original content

Code Quality:

  • Consistent patterns: Follows exact same structure as existing benchmark scripts
  • Proper error handling: Scripts include error boundaries and graceful fallbacks
  • Statistical reliability: Uses median of 5 runs for query benchmarks, 3 for incremental
  • Documentation: Scripts are well-documented with usage examples
  • CI Integration: New workflow jobs mirror existing security model

Implementation Details:

  • ✅ Dynamic target selection using graph connectivity analysis (scripts/query-benchmark.js:55-75)
  • ✅ Synthetic diff creation for diffImpact testing with proper cleanup
  • ✅ Import resolution benchmarking covers both native batch and JS fallback paths
  • ✅ Trend calculation with percentage changes and directional arrows
  • ✅ README footprint section uses legitimate shields.io badges

Final Assessment

Root Cause Analysis

Problem Being Solved: Expanding regression testing coverage to include query performance and incremental build scenarios that were previously unmeasured.
Why This Approach: Follows established codebase patterns and provides comprehensive performance monitoring across both engines.
Risk Assessment: Low-risk observability enhancement with no functional code changes.

Critical Concerns

None identified. This is a well-implemented observability enhancement that follows security best practices and existing codebase patterns.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality implementation that addresses a legitimate need for performance regression testing. The code follows established patterns, includes proper security measures, and provides valuable observability without introducing risk. The only reason for 4/5 instead of 5/5 is that this is infrastructure/tooling rather than core feature work.

Minor observations for future consideration:

  • Consider adding performance threshold alerts in CI if benchmarks regress significantly
  • The 836 additions with 0 deletions are appropriate for this type of purely additive enhancement

@greptile-apps
Contributor

greptile-apps bot commented Feb 24, 2026

Greptile Summary

Added query and incremental regression benchmarks to complement existing build/embedding benchmarks. The new scripts measure query depth scaling (fnDeps/fnImpact at depths 1/3/5), diff-impact latency, and incremental build performance (full/no-op/1-file rebuild tiers) plus import resolution throughput (native vs JS).

  • Both benchmarks follow the established pattern: redirect logs to stderr, output JSON to stdout, test both engines, use median of multiple runs
  • Report updaters match existing structure: HTML comment storage, version deduplication, trend arrows with 2% threshold
  • CI jobs replicate the embedding-benchmark pattern: run on workflow dispatch or after successful tests, create PRs for updates
  • Documentation additions are accurate: CONTRIBUTING.md correctly maps benchmarks to relevant source files, README badges use valid shields.io URLs
  • README footprint section adds transparency about dependencies without introducing bloat
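The "trend arrows with 2% threshold" rule the summary mentions can be sketched as a tiny pure function (a sketch under the stated assumptions; the real updater's formatting may differ):

```javascript
// Sketch of the trend-arrow rule: deltas within ±2% count as flat,
// otherwise show a directional arrow with the percentage change.
// For timings, ↑ means slower and ↓ means faster.
function trendArrow(prevMs, currMs, thresholdPct = 2) {
  const deltaPct = ((currMs - prevMs) / prevMs) * 100;
  if (Math.abs(deltaPct) < thresholdPct) return '→ (flat)';
  const arrow = deltaPct > 0 ? '↑' : '↓';
  return `${arrow} ${Math.abs(deltaPct).toFixed(1)}%`;
}
```

A dead band like this keeps run-to-run measurement noise from producing a churn of meaningless arrows in the report history.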

Confidence Score: 5/5

  • Safe to merge - well-structured addition following existing patterns with no runtime impact
  • All new code follows established patterns from existing benchmark infrastructure, no changes to production code paths, only adds observability tooling and documentation
  • No files require special attention

Important Files Changed

  • scripts/query-benchmark.js: query benchmark measuring fnDeps/fnImpact depth scaling and diffImpact latency, following existing benchmark patterns
  • scripts/incremental-benchmark.js: incremental build benchmark measuring full/no-op/1-file rebuilds and import resolution throughput
  • scripts/update-query-report.js: report updater that generates markdown with trend arrows and historical data, matching the existing updater pattern
  • scripts/update-incremental-report.js: report updater for incremental benchmarks with historical tracking and trend calculation
  • .github/workflows/benchmark.yml: two new CI jobs (query-benchmark, incremental-benchmark) following existing workflow patterns

Last reviewed commit: 0fd1967

@greptile-apps bot left a comment

7 files reviewed, no comments


@carlos-alm carlos-alm merged commit eb37ec5 into main Feb 24, 2026
17 checks passed
@carlos-alm carlos-alm deleted the fix/review-comments-benchmarks branch February 24, 2026 04:03