
feat: v2 compression features — quality metrics, flow detection, tiered budget, depth control#19

Merged
SimplyLiz merged 20 commits into develop from feature/v2-improvements on Mar 20, 2026

Conversation

@SimplyLiz (Owner)

Summary

  • 11 new opt-in compression features with zero-dependency, backward-compatible defaults
  • Quality metrics computed automatically on every compression (entity_retention, structural_integrity, quality_score)
  • 7 new source modules: entities, entropy, flow, coreference, cluster, discourse, ml-classifier
  • 663 tests (up from 540), including 8 adversarial edge-case tests
  • Default path produces identical output to v1.1.0 — all new features are opt-in

Feature impact (bench results, default recencyWindow=0)

| Feature | Best ratio gain | Quality floor | Round-trip |
| --- | --- | --- | --- |
| `compressionDepth: 'moderate'` | +168% (Deep) | 100% all scenarios | PASS |
| `relevanceThreshold: 3` | +427% (Long Q&A) | 80% min | PASS |
| `conversationFlow` | +141% (Long Q&A) | 64% min | PASS |
| `importanceScoring` | neutral | 100% | PASS |
| `coreference` | neutral | 100% | PASS |
| `semanticClustering` | +21% (Agentic) | 92% min | PASS |
| `discourseAware` | -8 to -28% (experimental) | 100% | PASS |

Recommended usage

```ts
// Safe upgrade — zero quality cost
compress(messages, { compressionDepth: 'moderate' });

// Aggressive but measured
compress(messages, { compressionDepth: 'moderate', relevanceThreshold: 3 });
```

Key fixes during development

  • Flow chains and clusters are only marked as processed after successful compression (prevents message drops)
  • Semantic clusters restricted to consecutive indices (preserves round-trip ordering)
  • Flow chains exclude code-fenced messages (preserves structural integrity)
  • Adaptive budgets gated behind explicit depth setting (preserves default path parity)
  • Importance threshold raised from 0.35 → 0.65 (prevents over-preservation)

Test plan

  • 663 unit tests pass (24 files)
  • All 8 bench scenarios × 8 V2 configs pass round-trip
  • Default path output matches develop baseline exactly
  • npm run bench quality metrics all ≥ 0.94 on default path
  • npm run bench:compare shows v1 vs v2 side-by-side
  • Adversarial tests cover: pronoun-heavy, scattered entities, correction chains, code-interleaved, near-duplicates, 10k+ messages, mixed SQL/JSON/bash
  • Lint clean, format clean

- Extract entity logic to src/entities.ts with enhanced extraction
  (file paths, URLs, version numbers)
- Compute entity_retention, structural_integrity, reference_coherence,
  and composite quality_score in CompressResult
- Add relevanceThreshold option: low-value messages replaced with
  compact stubs instead of low-quality summaries
- Export bestSentenceScore for external relevance scoring
- Add roadmap-v2.md tracking all planned improvements
- Tiered budget: keeps recencyWindow fixed, progressively compresses
  older content by priority tier (tighten → stub → truncate) instead
  of shrinking the recency window via binary search
- Adaptive summary budget: scales with content density — entity-dense
  messages get up to 45% budget, sparse content gets down to 15%
- budgetStrategy option: 'binary-search' (default) or 'tiered'
- Both sync and async paths supported for tiered strategy
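
The tiered pass order described above can be sketched roughly as follows. This is illustrative only: the tier numbers, the 4-chars-per-token estimate, and the pass implementations are assumptions, not the library's actual code.

```typescript
// Sketch only: tier model, token estimate, and passes are illustrative.
type TieredMessage = { content: string; tier: 1 | 2 | 3 }; // 1 = highest priority

const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Escalating reductions: keep first sentences -> short stub -> hard truncate.
const tighten = (s: string): string => s.split(". ").slice(0, 2).join(". ");
const stub = (s: string): string => (s.length > 40 ? s.slice(0, 40) + "..." : s);
const truncate = (s: string): string => s.slice(0, 10);

function tieredCompress(messages: TieredMessage[], tokenBudget: number): string[] {
  const passes = [tighten, stub, truncate];
  let out = messages.map((m) => m.content);
  for (const pass of passes) {
    // The lowest-priority tier is reduced first; stop as soon as the budget fits.
    for (const tier of [3, 2, 1] as const) {
      const total = out.reduce((n, s) => n + estimateTokens(s), 0);
      if (total <= tokenBudget) return out;
      out = out.map((s, i) => (messages[i].tier === tier ? pass(s) : s));
    }
  }
  return out;
}
```

The point of the design is visible in the loop order: content is never shrunk more than necessary, and high-priority (recent) messages are only touched after every harsher pass on older tiers has failed to fit the budget.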
- New entropyScorer option: plug in a small LM for self-information
  based sentence importance scoring (Selective Context paper)
- entropyScorerMode: 'replace' (entropy only) or 'augment' (weighted
  average with heuristic, default)
- src/entropy.ts: splitSentences, normalizeScores, combineScores utils
- Sync and async paths supported; async scorer throws in sync mode
- Zero new dependencies: scorer is user-provided function
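
Since the actual `src/entropy.ts` code isn't shown in this PR description, here is a minimal sketch of what an `'augment'`-style combiner could look like. The min-max normalization and the 0.5 default weight are assumptions for illustration.

```typescript
// Illustrative sketch: min-max normalize both score arrays, then take a
// weighted average (the 'augment' mode). Not the library's actual code.
function normalizeScores(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0.5); // flat input carries no signal
  return scores.map((s) => (s - min) / (max - min));
}

function combineScores(
  heuristic: number[],
  entropy: number[],
  entropyWeight = 0.5, // assumed default weight
): number[] {
  const h = normalizeScores(heuristic);
  const e = normalizeScores(entropy);
  return h.map((hv, i) => (1 - entropyWeight) * hv + entropyWeight * e[i]);
}
```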
- Detects Q&A pairs, request→action→confirmation chains, corrections,
  and acknowledgment patterns in message history
- Groups flow chains into single compression units producing more
  coherent summaries (e.g., "Q: how does X work? → A: it uses Y")
- conversationFlow option: opt-in, default false
- Flow chains override soft preservation (recency, short content)
  but not hard blocks (system role, dedup, tool_calls)
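
A minimal sketch of the Q&A-pair half of this detection. The message shape and the trailing-question-mark heuristic here are illustrative, not the library's actual detector.

```typescript
// Sketch: a user message ending in "?" followed by an assistant reply
// forms a Q&A chain candidate. Heuristic is illustrative only.
type Msg = { role: "user" | "assistant" | "system"; content: string };

function detectQAPairs(messages: Msg[]): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < messages.length - 1; i++) {
    const q = messages[i];
    const a = messages[i + 1];
    if (q.role === "user" && q.content.trimEnd().endsWith("?") && a.role === "assistant") {
      pairs.push([i, i + 1]); // indices of the chain members
    }
  }
  return pairs;
}
```

Each detected pair would then be compressed as a single unit, which is what enables summaries of the "Q: how does X work? → A: it uses Y" shape.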

- compressionDepth option controls summarization aggressiveness
- gentle: standard sentence selection (default, backward compatible)
- moderate: 50% tighter budgets for more aggressive compression
- aggressive: entity-only stubs for maximum ratio
- auto: progressively tries gentle → moderate → aggressive until
  tokenBudget fits, with quality gate (stops if quality < 0.60)
- Both sync and async paths supported
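
The gentle → moderate → aggressive escalation with the 0.60 quality gate could be sketched like this, where `compressAtDepth` is a stand-in for the real compressor:

```typescript
// Sketch of 'auto' depth escalation: try each depth in order, return as soon
// as the result fits the budget, and back off if quality drops below 0.60.
type Depth = "gentle" | "moderate" | "aggressive";
type Result = { tokens: number; quality: number; depth: Depth };

function autoDepth(
  compressAtDepth: (d: Depth) => Result,
  tokenBudget: number,
): Result {
  let last: Result = compressAtDepth("gentle");
  for (const d of ["gentle", "moderate", "aggressive"] as const) {
    const r = d === "gentle" ? last : compressAtDepth(d);
    if (r.quality < 0.6) return last; // quality gate: keep the previous level
    last = r;
    if (r.tokens <= tokenBudget) return r; // budget fits, stop escalating
  }
  return last;
}
```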
- Coreference tracking (coreference option): when a compressed message
  defines an entity referenced by a preserved message, the definition
  is inlined into the summary to prevent orphaned references
- Semantic clustering (semanticClustering option): groups messages by
  topic using TF-IDF cosine similarity + entity overlap Jaccard, then
  compresses each cluster as a unit for better topic coherence
- Both features are opt-in, zero new dependencies
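
A hypothetical sketch of the similarity signal behind this clustering, using plain term-frequency cosine. The real implementation also applies IDF weighting and mixes in entity-overlap Jaccard; this shows only the core comparison.

```typescript
// Sketch: cosine similarity over simple term-frequency vectors.
function termFreq(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const w of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    tf.set(w, (tf.get(w) ?? 0) + 1);
  }
  return tf;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [w, f] of a) dot += f * (b.get(w) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((s, f) => s + f * f, 0));
  const na = norm(a), nb = norm(b);
  return na && nb ? dot / (na * nb) : 0; // 0 for empty vectors
}
```

Messages whose pairwise similarity clears a threshold would be grouped into one cluster and compressed together, which is what keeps a topic's summary in one place.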
- Segments text into Elementary Discourse Units with dependency graph
- Clause boundary detection via discourse markers (then, because, which...)
- Pronoun/demonstrative, temporal, and causal dependency edges
- When selecting EDUs for summary, dependency parents are included
  (up to 2 levels) to prevent incoherent output
- discourseAware option: opt-in, default false
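
An illustrative take on marker-based clause splitting. The marker list here is a tiny subset and the splitter is not the library's actual segmenter; the real code additionally builds dependency edges between the resulting units.

```typescript
// Sketch: break a sentence into clause-like units at discourse markers.
const MARKERS = ["because", "then", "which", "so", "but"]; // illustrative subset

function splitClauses(sentence: string): string[] {
  const words = sentence.split(/\s+/).filter(Boolean);
  const clauses: string[] = [];
  let current: string[] = [];
  for (const w of words) {
    if (MARKERS.includes(w.toLowerCase()) && current.length > 0) {
      clauses.push(current.join(" ")); // close the clause before the marker
      current = [w];
    } else {
      current.push(w);
    }
  }
  if (current.length > 0) clauses.push(current.join(" "));
  return clauses;
}
```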
- 8 adversarial test cases: pronoun-heavy, scattered entities,
  correction chains, code-interleaved prose, near-duplicates with
  critical differences, 10k+ char messages, mixed SQL/JSON/bash,
  and full round-trip integrity with all features enabled
- Update roadmap: 14 of 16 items complete
- ML token classifier (mlTokenClassifier option): per-token keep/remove
  classification via user-provided model (LLMLingua-2 style). Includes
  sync/async support, whitespace tokenizer, mock classifier for testing
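
A rough sketch of the whitespace-tokenizer-plus-mock-classifier pattern described above. The `TokenDecision` shape and the digit/underscore "keep" heuristic are assumptions for illustration, not the package's `MLTokenClassifier` interface.

```typescript
// Sketch: per-token keep/remove classification over whitespace tokens.
type TokenDecision = { token: string; keep: boolean };

function classifyTokens(
  text: string,
  classify: (token: string) => boolean,
): TokenDecision[] {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => ({ token, keep: classify(token) }));
}

// Mock classifier: keep tokens that look technical (digits, underscores,
// dots, or capitals). A real model would replace this heuristic.
const mockClassifier = (t: string): boolean => /[0-9_.A-Z]/.test(t);

function applyClassifier(text: string): string {
  return classifyTokens(text, mockClassifier)
    .filter((d) => d.keep)
    .map((d) => d.token)
    .join(" ");
}
```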
- A/B comparison tool (npm run bench:compare): side-by-side comparison
  of default vs v2 features across coding, deep conversation, and
  agentic scenarios. Reports ratio, quality, entity retention, tokens
- All 16/16 roadmap items now complete

- bench/run.ts: new Quality Metrics (v2) table showing entity retention,
  structural integrity, reference coherence, and quality score per scenario
- bench/baseline.ts: QualityResult type, quality section in generated docs,
  average quality score in summary table
- bench/compare.ts: add Long Q&A and Technical explanation scenarios,
  rename V2 option set to "V2 balanced" (no relevanceThreshold)
- flow.ts: exclude messages with code fences from flow chain detection
  to prevent Q&A chains from dropping code content
- package.json: add bench:compare script
- New docs/v2-features.md: full documentation for all 11 new features
  with usage examples, how-it-works sections, and explicit tradeoff
  analysis for each feature
- docs/api-reference.md: updated exports listing, 13 new options in
  CompressOptions table, 5 new result fields, new types
  (MLTokenClassifier, TokenClassification)
- docs/token-budget.md: added tiered budget strategy and compression
  depth sections with cross-links
- docs/README.md: added V2 Features to index
- Each feature documents: what it does, how to use it, how it works
  internally, and what you give up (the tradeoff)
- Flow chains and clusters no longer skip non-member messages between
  chain endpoints. Previously, a chain spanning indices [1,4] would
  skip indices 2,3 even if they weren't chain members (dropping code)
- Importance threshold raised from 0.35 to 0.65. The old threshold
  preserved nearly all messages in entity-rich conversations, reducing
  compression ratio by up to 30% with no quality benefit
- EDU scorer replaced length-based heuristic with information-density
  scoring (identifiers, numbers, emphasis) to avoid keeping long filler
  clauses over short technical ones
- Quick reference table, feature section, and TSDoc all flag the 8-28%
  ratio regression without a custom ML scorer
- Explain why: dependency tracking inherently fights compression by
  pulling in parent EDUs, and the rule-based scorer can't distinguish
  load-bearing dependencies from decorative ones
- Recommend using exported segmentEDUs/scoreEDUs/selectEDUs directly
  with a custom scorer instead of the discourseAware option
- Remove discourseAware from recommended feature combinations
Adaptive entity-aware budgets were changing default compression output
(6% regression on coding scenario) because extractEntities was called
unconditionally. Now entity-adaptive budgets only activate when
compressionDepth is explicitly set to moderate/aggressive/auto.

Default path (no v2 options) now produces identical output to develop.
- Flow chains and clusters only mark themselves as processed AFTER
  successful compression. Previously they were marked on entry,
  causing non-compressed chain members to be silently dropped
- Semantic clusters restricted to consecutive indices only —
  non-consecutive merges broke round-trip because uncompress can't
  restore interleaved message ordering
- Added V2 Features Comparison section to bench reporter showing
  each feature individually and recommended combo vs default, with
  per-scenario ratio/quality and delta row
- All 8 scenarios × 8 configs pass round-trip verification
```ts
  if (score > best) best = score;
}
return best;
}
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
  scoreMap.set(i, rawScores[i]);
}
} else {
  // augment: weighted average of heuristic and entropy
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
if (userSummarizer) {
  next = gen.next(await withFallback(text, userSummarizer, budget));
} else {
  next = gen.next(summarize(text, budget, externalScores));
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
if (entities.length === 0) return '';

// For each entity, find the sentence where it first appears
const sentences = sourceContent.match(/[^.!?\n]+[.!?]+/g) ?? [sourceContent];
```

**Check failure (Code scanning / CodeQL), reported twice on this snippet:** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
 */
export function segmentEDUs(text: string): EDU[] {
  // First split into sentences
  const sentences = text.match(/[^.!?\n]+[.!?]+/g) ?? [text.trim()];
```

**Check failure (Code scanning / CodeQL), reported three times on this snippet:** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
 * Returns the sentences and their original indices for reassembly.
 */
export function splitSentences(text: string): string[] {
  const sentences = text.match(/[^.!?\n]+[.!?]+/g);
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
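
One way these CodeQL findings could be addressed is to replace the backtracking-prone `/[^.!?\n]+[.!?]+/g` match with a single linear pass over the string. The sketch below is not part of this PR; its handling of terminator-less trailing text also differs slightly from the regex (which drops it unless a fallback is applied).

```typescript
// Sketch: linear-time sentence splitting without regex backtracking.
// Runs in O(n) even on strings with long runs of spaces.
function splitSentencesLinear(text: string): string[] {
  const out: string[] = [];
  let start = 0;
  let i = 0;
  while (i < text.length) {
    const c = text[i];
    if (c === "." || c === "!" || c === "?") {
      // Consume the full run of terminators, then emit the sentence.
      while (i < text.length && ".!?".includes(text[i])) i++;
      const s = text.slice(start, i).trim();
      if (s) out.push(s);
      start = i;
    } else if (c === "\n") {
      const s = text.slice(start, i).trim();
      if (s) out.push(s);
      start = ++i;
    } else {
      i++;
    }
  }
  const tail = text.slice(start).trim();
  if (tail) out.push(tail);
  return out;
}
```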
@SimplyLiz SimplyLiz merged commit ac04bef into develop Mar 20, 2026
10 of 11 checks passed
@SimplyLiz SimplyLiz deleted the feature/v2-improvements branch March 20, 2026 22:52