
fix: drop Node 18 from CI, require Node >=20 #21

Merged: SimplyLiz merged 10 commits into develop from main on Mar 22, 2026

@SimplyLiz (Owner)

Summary

  • Drop Node 18 from CI test matrix — vitest 4 (via rolldown) requires node:util.styleText (Node 20+)
  • Update engines to >=20 in package.json
  • Simplify CI test step (no more Node 18 special-case)
  • Update CLAUDE.md to reflect Node >=20

The library code itself is pure ESM and technically runs on Node 18, but the test runner can't. Since .nvmrc targets 22 and coverage already required 20+, this aligns everything.
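The Node 18 failure comes from a missing runtime API, not from the library code. A minimal sketch of a fail-fast guard a pre-test script could run is below; `nodeMajor`, `hasStyleText`, and `assertTestRunnerSupported` are hypothetical names for illustration, not part of this repository or of vitest.

```typescript
import * as util from "node:util";

// Minimum major version, taken from the new "engines" range (>=20).
const MIN_NODE_MAJOR = 20;

// Parse the major component of a Node version string, e.g. "22.4.1" -> 22.
export function nodeMajor(version: string = process.versions.node): number {
  const major = Number.parseInt(version.split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) {
    throw new Error(`Unrecognized Node version string: ${version}`);
  }
  return major;
}

// True when util.styleText exists at runtime (the API vitest 4 needs per this PR).
export function hasStyleText(): boolean {
  const mod = util as unknown as Record<string, unknown>;
  return typeof mod["styleText"] === "function";
}

// Hypothetical guard: fail with a clear message instead of an obscure
// import error deep inside the test runner.
export function assertTestRunnerSupported(version: string = process.versions.node): void {
  if (nodeMajor(version) < MIN_NODE_MAJOR) {
    throw new Error(`Node ${version} is below the supported >=${MIN_NODE_MAJOR} range`);
  }
}
```

Probing `util.styleText` directly can be more precise than checking the major version: the Node.js docs list it as added in v21.7.0 and v20.12.0, so early Node 20 releases lack it even though they satisfy `>=20`.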

Test plan

  • CI should pass on Node 20 and 22 (no 18 job)
  • Lint, bench, e2e unaffected

SimplyLiz and others added 10 commits February 25, 2026 07:06
Add domain-agnostic framing (legal, medical, documentation, support)
and rename "Code-aware" to "Structure-aware" in feature list.
Separate quality benchmark system (bench/quality.ts) that measures
compression fidelity independently from the existing perf/regression
suite. Includes:

- quality-analysis.ts: compressed-only retention metrics, semantic
  fidelity scoring (fact extraction + negation detection), per-message
  quality breakdown, and recencyWindow tradeoff sweep
- quality-scenarios.ts: 6 edge case scenarios (single-char, giant
  message, code-only, entity-dense, prose-only, mixed languages)
- quality.ts: standalone runner with --save/--check against its own
  baseline namespace (bench/baselines/quality/)
- backfill.ts: retroactively generates quality baselines for older
  git refs via temporary worktrees
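The compressed-only retention idea can be sketched as follows. This is an illustrative TypeScript sketch, not the code in quality-analysis.ts: `MessagePair`, the `compressed` flag, and the regex-based `extractEntities` are assumptions standing in for whatever the benchmark actually uses.

```typescript
// Hypothetical shape for one message before and after compression.
export interface MessagePair {
  original: string;
  output: string;
  // True only when the compressor actually rewrote this message.
  compressed: boolean;
}

// Crude, illustrative entity extractor: numbers and capitalized identifiers.
export function extractEntities(text: string): Set<string> {
  const matches = text.match(/\b(?:[A-Z][A-Za-z0-9_]+|\d+(?:\.\d+)?)\b/g) ?? [];
  return new Set(matches);
}

// Entity retention over compressed messages only. Untouched messages would
// trivially score 1.0 and mask real losses, so they are skipped.
// Returns null when there is nothing to measure, so "no data" is never
// mistaken for "perfect retention".
export function compressedOnlyRetention(pairs: MessagePair[]): number | null {
  let kept = 0;
  let total = 0;
  for (const p of pairs) {
    if (!p.compressed) continue;
    const after = extractEntities(p.output);
    for (const entity of extractEntities(p.original)) {
      total += 1;
      if (after.has(entity)) kept += 1;
    }
  }
  return total === 0 ? null : kept / total;
}
```

For instance, two compressed messages that keep 3 of their 4 entities score 0.75; also counting an untouched message with one entity would report 0.8, which is the masking effect the design notes describe.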

Key design decisions:
- Retention measured only on compressed messages (fixes the all-1.0
  masking problem in the existing analyzeRetention)
- Code block integrity is byte-identical verification, not just fence
  count
- Zero-tolerance regression on code block integrity, 5% on entity
  retention, 10% on fact retention
- Completely isolated from existing --check (separate baseline files)
- Backfilled v1.0.0 baseline for historical comparison
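The regression gate above can be expressed as a per-metric tolerance check. A minimal sketch, assuming metrics are ratios in [0, 1] and that the 5% and 10% tolerances are absolute drops from the baseline (the commit message does not say whether they are absolute or relative); `QualityMetrics` and `findRegressions` are hypothetical names.

```typescript
// Hypothetical subset of a quality baseline; all values are ratios in [0, 1].
export interface QualityMetrics {
  codeBlockIntegrity: number; // fraction of code blocks preserved byte-for-byte
  entityRetention: number;
  factRetention: number;
}

// Allowed drop per metric: zero on code blocks, 5% on entities, 10% on facts.
// Assumption: percentages are absolute drops on the 0..1 scale.
const TOLERANCE: Record<keyof QualityMetrics, number> = {
  codeBlockIntegrity: 0,
  entityRetention: 0.05,
  factRetention: 0.1,
};

// Returns one message per metric that dropped past its tolerance.
export function findRegressions(baseline: QualityMetrics, current: QualityMetrics): string[] {
  const failures: string[] = [];
  for (const key of Object.keys(TOLERANCE) as (keyof QualityMetrics)[]) {
    const drop = baseline[key] - current[key];
    // Small epsilon so floating-point noise does not trip the checks.
    if (drop > TOLERANCE[key] + 1e-9) {
      failures.push(
        `${key}: ${baseline[key].toFixed(3)} -> ${current[key].toFixed(3)} (allowed drop ${TOLERANCE[key]})`,
      );
    }
  }
  return failures;
}
```

A relative reading (for example 5% of the baseline value) would only change the comparison to `drop > TOLERANCE[key] * baseline[key]`.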
… LLM judge

Replace broken quality metrics (keywordRetention, factRetention, negationErrors)
with five meaningful ones: task-based probes (~70 across 13 scenarios),
information density, compressed-only quality score, negative compression
detection, and summary coherence checks.
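Task-based probes replace keyword counting with checks that the compressed output still supports a specific question. A hedged sketch with two hypothetical probe kinds, a substring check and a minimum output length (mirroring the min-output-chars probes in this commit); `Probe`, `runProbes`, and `probeScore` are illustrative and not the project's `ProbeDefinition` API.

```typescript
// Hypothetical probe shape; the real ProbeDefinition type may differ.
export type Probe =
  | { kind: "contains"; label: string; needle: string }
  | { kind: "minOutputChars"; label: string; min: number };

export interface ProbeResult {
  label: string;
  passed: boolean;
}

// Run each probe against the compressed output of one scenario.
export function runProbes(output: string, probes: Probe[]): ProbeResult[] {
  return probes.map((probe): ProbeResult => {
    if (probe.kind === "contains") {
      // Could a reader still answer a question that depends on this string?
      return { label: probe.label, passed: output.includes(probe.needle) };
    }
    // Guards against over-aggressive compression that drops nearly everything.
    return { label: probe.label, passed: output.length >= probe.min };
  });
}

// Fraction of probes that passed, or null when no probes were defined.
export function probeScore(results: ProbeResult[]): number | null {
  if (results.length === 0) return null;
  return results.filter((r) => r.passed).length / results.length;
}
```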

- Add ProbeDefinition type and getProbesForScenario() with curated probes
- Add computeInformationDensity(), computeCompressedQualityScore(),
  detectNegativeCompressions(), checkCoherence() analysis functions
- Add min-output-chars probes to catch over-aggressive compression
- Add lang aliases to countCodeBlocks (typescript/ts, python/py, yaml/yml)
- Fix regression thresholds: coherence/negativeCompressions track increases
  from baseline, not just zero-to-nonzero transitions
- Add --llm-judge flag with multi-provider support (OpenAI, Anthropic,
  Gemini, Ollama) for LLM-as-judge scoring (display-only, not in baseline)
- Add Gemini provider to bench/llm.ts (@google/genai SDK)
- Add bench:quality:judge npm script
- Update docs/benchmarks.md with quality metrics, probes, LLM judge, and
  regression threshold documentation
- Update CLAUDE.md with quality benchmark commands
- Re-save quality baseline with new format
- Bump version to 1.3.0
- Add quality history documentation with version comparison
- Add --features flag for opt-in feature benchmarking
- Update CHANGELOG with all 1.3.0 changes
- Save baselines for v1.3.0
- Regenerate benchmark-results.md
- Link quality-history.md from README and docs index
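Counting code blocks with language aliases and checking them byte-for-byte fit naturally in one pass. A minimal sketch, assuming triple-backtick fences only; the alias table copies the three pairs listed in the commit, and `extractCodeBlocks` and `codeBlocksIntact` are illustrative, not the project's `countCodeBlocks`. Backticks are written as `\u0060` so the sketch can itself sit inside a fenced block.

```typescript
// Alias table copied from the pairs named in the commit message.
const LANG_ALIASES: Record<string, string> = {
  ts: "typescript",
  py: "python",
  yml: "yaml",
};

export interface CodeBlock {
  lang: string; // normalized language tag, "" when the fence has none
  body: string; // exact text between the fences
}

// Match an opening fence with an optional tag, then a lazy body up to the
// next fence. \u0060 is a backtick.
const FENCE_RE = /\u0060{3}([^\n\u0060]*)\n([\s\S]*?)\u0060{3}/g;

export function extractCodeBlocks(text: string): CodeBlock[] {
  const blocks: CodeBlock[] = [];
  for (const m of text.matchAll(FENCE_RE)) {
    const raw = (m[1] ?? "").trim().toLowerCase();
    blocks.push({ lang: LANG_ALIASES[raw] ?? raw, body: m[2] ?? "" });
  }
  return blocks;
}

// Zero tolerance: every block must survive in order with an identical body.
// Language tags are compared after alias normalization, so "ts" == "typescript".
export function codeBlocksIntact(original: string, compressed: string): boolean {
  const before = extractCodeBlocks(original);
  const after = extractCodeBlocks(compressed);
  if (before.length !== after.length) return false;
  return before.every((b, i) => {
    const a = after[i];
    return a !== undefined && a.lang === b.lang && a.body === b.body;
  });
}
```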
# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	README.md
Re-apply: version bump to 1.3.0, CHANGELOG 1.3.0 section, quality
benchmark npm scripts, CLAUDE.md commands, Gemini provider in llm.ts,
quality-history link in README and docs index, @google/genai devDep.
v1.3.0 — Quality benchmark overhaul + LLM judge
SimplyLiz merged commit 53862a1 into develop on Mar 22, 2026
30 of 40 checks passed