
feat: v2 compression features — quality metrics, flow detection, tiered budget, depth control#19

Merged
SimplyLiz merged 20 commits into develop from feature/v2-improvements on Mar 20, 2026

Conversation

@SimplyLiz (Owner)

Summary

  • 11 new opt-in compression features with zero-dependency, backward-compatible defaults
  • Quality metrics computed automatically on every compression (entity_retention, structural_integrity, quality_score)
  • 7 new source modules: entities, entropy, flow, coreference, cluster, discourse, ml-classifier
  • 663 tests (up from 540), including 8 adversarial edge-case tests
  • Default path produces identical output to v1.1.0 — all new features are opt-in

Feature impact (bench results, default recencyWindow=0)

| Feature | Best ratio gain | Quality floor | Round-trip |
| --- | --- | --- | --- |
| `compressionDepth: 'moderate'` | +168% (Deep) | 100% all scenarios | PASS |
| `relevanceThreshold: 3` | +427% (Long Q&A) | 80% min | PASS |
| `conversationFlow` | +141% (Long Q&A) | 64% min | PASS |
| `importanceScoring` | neutral | 100% | PASS |
| `coreference` | neutral | 100% | PASS |
| `semanticClustering` | +21% (Agentic) | 92% min | PASS |
| `discourseAware` | -8 to -28% (experimental) | 100% | PASS |

Recommended usage

```ts
// Safe upgrade — zero quality cost
compress(messages, { compressionDepth: 'moderate' });

// Aggressive but measured
compress(messages, { compressionDepth: 'moderate', relevanceThreshold: 3 });
```

Key fixes during development

  • Flow chains and clusters are only marked as processed after successful compression (prevents message drops)
  • Semantic clusters restricted to consecutive indices (preserves round-trip ordering)
  • Flow chains exclude code-fenced messages (preserves structural integrity)
  • Adaptive budgets gated behind explicit depth setting (preserves default path parity)
  • Importance threshold raised from 0.35 → 0.65 (prevents over-preservation)

Test plan

  • 663 unit tests pass (24 files)
  • All 8 bench scenarios × 8 V2 configs pass round-trip
  • Default path output matches develop baseline exactly
  • npm run bench quality metrics all ≥ 0.94 on default path
  • npm run bench:compare shows v1 vs v2 side-by-side
  • Adversarial tests cover: pronoun-heavy, scattered entities, correction chains, code-interleaved, near-duplicates, 10k+ messages, mixed SQL/JSON/bash
  • Lint clean, format clean

- Extract entity logic to src/entities.ts with enhanced extraction
  (file paths, URLs, version numbers)
- Compute entity_retention, structural_integrity, reference_coherence,
  and composite quality_score in CompressResult
- Add relevanceThreshold option: low-value messages replaced with
  compact stubs instead of low-quality summaries
- Export bestSentenceScore for external relevance scoring
- Add roadmap-v2.md tracking all planned improvements
- Tiered budget: keeps recencyWindow fixed, progressively compresses
  older content by priority tier (tighten → stub → truncate) instead
  of shrinking the recency window via binary search
- Adaptive summary budget: scales with content density — entity-dense
  messages get up to 45% budget, sparse content gets down to 15%
- budgetStrategy option: 'binary-search' (default) or 'tiered'
- Both sync and async paths supported for tiered strategy
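
The tiered pass order described above can be sketched roughly as follows. This is illustrative only: the tier numbers, the 4-chars-per-token estimate, and the pass implementations are assumptions, not the library's actual code.

```typescript
// Sketch only: tier model, token estimate, and passes are illustrative.
type TieredMessage = { content: string; tier: 1 | 2 | 3 }; // 1 = highest priority

const estimateTokens = (s: string): number => Math.ceil(s.length / 4);

// Escalating reductions: keep first sentences -> short stub -> hard truncate.
const tighten = (s: string): string => s.split(". ").slice(0, 2).join(". ");
const stub = (s: string): string => (s.length > 40 ? s.slice(0, 40) + "..." : s);
const truncate = (s: string): string => s.slice(0, 10);

function tieredCompress(messages: TieredMessage[], tokenBudget: number): string[] {
  const passes = [tighten, stub, truncate];
  let out = messages.map((m) => m.content);
  for (const pass of passes) {
    // The lowest-priority tier is reduced first; stop as soon as the budget fits.
    for (const tier of [3, 2, 1] as const) {
      const total = out.reduce((n, s) => n + estimateTokens(s), 0);
      if (total <= tokenBudget) return out;
      out = out.map((s, i) => (messages[i].tier === tier ? pass(s) : s));
    }
  }
  return out;
}
```

The point of the design is visible in the loop order: content is never shrunk more than necessary, and high-priority (recent) messages are only touched after every harsher pass on older tiers has failed to fit the budget.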
- New entropyScorer option: plug in a small LM for self-information
  based sentence importance scoring (Selective Context paper)
- entropyScorerMode: 'replace' (entropy only) or 'augment' (weighted
  average with heuristic, default)
- src/entropy.ts: splitSentences, normalizeScores, combineScores utils
- Sync and async paths supported; async scorer throws in sync mode
- Zero new dependencies: scorer is user-provided function
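
Since the actual `src/entropy.ts` code isn't shown in this PR description, here is a minimal sketch of what an `'augment'`-style combiner could look like. The min-max normalization and the 0.5 default weight are assumptions for illustration.

```typescript
// Illustrative sketch: min-max normalize both score arrays, then take a
// weighted average (the 'augment' mode). Not the library's actual code.
function normalizeScores(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0.5); // flat input carries no signal
  return scores.map((s) => (s - min) / (max - min));
}

function combineScores(
  heuristic: number[],
  entropy: number[],
  entropyWeight = 0.5, // assumed default weight
): number[] {
  const h = normalizeScores(heuristic);
  const e = normalizeScores(entropy);
  return h.map((hv, i) => (1 - entropyWeight) * hv + entropyWeight * e[i]);
}
```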
- Detects Q&A pairs, request→action→confirmation chains, corrections,
  and acknowledgment patterns in message history
- Groups flow chains into single compression units producing more
  coherent summaries (e.g., "Q: how does X work? → A: it uses Y")
- conversationFlow option: opt-in, default false
- Flow chains override soft preservation (recency, short content)
  but not hard blocks (system role, dedup, tool_calls)
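
A minimal sketch of the Q&A-pair half of this detection. The message shape and the trailing-question-mark heuristic here are illustrative, not the library's actual detector.

```typescript
// Sketch: a user message ending in "?" followed by an assistant reply
// forms a Q&A chain candidate. Heuristic is illustrative only.
type Msg = { role: "user" | "assistant" | "system"; content: string };

function detectQAPairs(messages: Msg[]): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < messages.length - 1; i++) {
    const q = messages[i];
    const a = messages[i + 1];
    if (q.role === "user" && q.content.trimEnd().endsWith("?") && a.role === "assistant") {
      pairs.push([i, i + 1]); // indices of the chain members
    }
  }
  return pairs;
}
```

Each detected pair would then be compressed as a single unit, which is what enables summaries of the "Q: how does X work? → A: it uses Y" shape.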

- compressionDepth option controls summarization aggressiveness
- gentle: standard sentence selection (default, backward compatible)
- moderate: 50% tighter budgets for more aggressive compression
- aggressive: entity-only stubs for maximum ratio
- auto: progressively tries gentle → moderate → aggressive until
  tokenBudget fits, with quality gate (stops if quality < 0.60)
- Both sync and async paths supported
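
The gentle → moderate → aggressive escalation with the 0.60 quality gate could be sketched like this, where `compressAtDepth` is a stand-in for the real compressor:

```typescript
// Sketch of 'auto' depth escalation: try each depth in order, return as soon
// as the result fits the budget, and back off if quality drops below 0.60.
type Depth = "gentle" | "moderate" | "aggressive";
type Result = { tokens: number; quality: number; depth: Depth };

function autoDepth(
  compressAtDepth: (d: Depth) => Result,
  tokenBudget: number,
): Result {
  let last: Result = compressAtDepth("gentle");
  for (const d of ["gentle", "moderate", "aggressive"] as const) {
    const r = d === "gentle" ? last : compressAtDepth(d);
    if (r.quality < 0.6) return last; // quality gate: keep the previous level
    last = r;
    if (r.tokens <= tokenBudget) return r; // budget fits, stop escalating
  }
  return last;
}
```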
- Coreference tracking (coreference option): when a compressed message
  defines an entity referenced by a preserved message, the definition
  is inlined into the summary to prevent orphaned references
- Semantic clustering (semanticClustering option): groups messages by
  topic using TF-IDF cosine similarity + entity overlap Jaccard, then
  compresses each cluster as a unit for better topic coherence
- Both features are opt-in, zero new dependencies
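
A hypothetical sketch of the similarity signal behind this clustering, using plain term-frequency cosine. The real implementation also applies IDF weighting and mixes in entity-overlap Jaccard; this shows only the core comparison.

```typescript
// Sketch: cosine similarity over simple term-frequency vectors.
function termFreq(text: string): Map<string, number> {
  const tf = new Map<string, number>();
  for (const w of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    tf.set(w, (tf.get(w) ?? 0) + 1);
  }
  return tf;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [w, f] of a) dot += f * (b.get(w) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((s, f) => s + f * f, 0));
  const na = norm(a), nb = norm(b);
  return na && nb ? dot / (na * nb) : 0; // 0 for empty vectors
}
```

Messages whose pairwise similarity clears a threshold would be grouped into one cluster and compressed together, which is what keeps a topic's summary in one place.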
- Segments text into Elementary Discourse Units with dependency graph
- Clause boundary detection via discourse markers (then, because, which...)
- Pronoun/demonstrative, temporal, and causal dependency edges
- When selecting EDUs for summary, dependency parents are included
  (up to 2 levels) to prevent incoherent output
- discourseAware option: opt-in, default false
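
An illustrative take on marker-based clause splitting. The marker list here is a tiny subset and the splitter is not the library's actual segmenter; the real code additionally builds dependency edges between the resulting units.

```typescript
// Sketch: break a sentence into clause-like units at discourse markers.
const MARKERS = ["because", "then", "which", "so", "but"]; // illustrative subset

function splitClauses(sentence: string): string[] {
  const words = sentence.split(/\s+/).filter(Boolean);
  const clauses: string[] = [];
  let current: string[] = [];
  for (const w of words) {
    if (MARKERS.includes(w.toLowerCase()) && current.length > 0) {
      clauses.push(current.join(" ")); // close the clause before the marker
      current = [w];
    } else {
      current.push(w);
    }
  }
  if (current.length > 0) clauses.push(current.join(" "));
  return clauses;
}
```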
- 8 adversarial test cases: pronoun-heavy, scattered entities,
  correction chains, code-interleaved prose, near-duplicates with
  critical differences, 10k+ char messages, mixed SQL/JSON/bash,
  and full round-trip integrity with all features enabled
- Update roadmap: 14 of 16 items complete
- ML token classifier (mlTokenClassifier option): per-token keep/remove
  classification via user-provided model (LLMLingua-2 style). Includes
  sync/async support, whitespace tokenizer, mock classifier for testing
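
A rough sketch of the whitespace-tokenizer-plus-mock-classifier pattern described above. The `TokenDecision` shape and the digit/underscore "keep" heuristic are assumptions for illustration, not the package's `MLTokenClassifier` interface.

```typescript
// Sketch: per-token keep/remove classification over whitespace tokens.
type TokenDecision = { token: string; keep: boolean };

function classifyTokens(
  text: string,
  classify: (token: string) => boolean,
): TokenDecision[] {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((token) => ({ token, keep: classify(token) }));
}

// Mock classifier: keep tokens that look technical (digits, underscores,
// dots, or capitals). A real model would replace this heuristic.
const mockClassifier = (t: string): boolean => /[0-9_.A-Z]/.test(t);

function applyClassifier(text: string): string {
  return classifyTokens(text, mockClassifier)
    .filter((d) => d.keep)
    .map((d) => d.token)
    .join(" ");
}
```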
- A/B comparison tool (npm run bench:compare): side-by-side comparison
  of default vs v2 features across coding, deep conversation, and
  agentic scenarios. Reports ratio, quality, entity retention, tokens
- All 16/16 roadmap items now complete

- bench/run.ts: new Quality Metrics (v2) table showing entity retention,
  structural integrity, reference coherence, and quality score per scenario
- bench/baseline.ts: QualityResult type, quality section in generated docs,
  average quality score in summary table
- bench/compare.ts: add Long Q&A and Technical explanation scenarios,
  rename V2 option set to "V2 balanced" (no relevanceThreshold)
- flow.ts: exclude messages with code fences from flow chain detection
  to prevent Q&A chains from dropping code content
- package.json: add bench:compare script
- New docs/v2-features.md: full documentation for all 11 new features
  with usage examples, how-it-works sections, and explicit tradeoff
  analysis for each feature
- docs/api-reference.md: updated exports listing, 13 new options in
  CompressOptions table, 5 new result fields, new types
  (MLTokenClassifier, TokenClassification)
- docs/token-budget.md: added tiered budget strategy and compression
  depth sections with cross-links
- docs/README.md: added V2 Features to index
- Each feature documents: what it does, how to use it, how it works
  internally, and what you give up (the tradeoff)
- Flow chains and clusters no longer skip non-member messages between
  chain endpoints. Previously, a chain spanning indices [1,4] would
  skip indices 2,3 even if they weren't chain members (dropping code)
- Importance threshold raised from 0.35 to 0.65. The old threshold
  preserved nearly all messages in entity-rich conversations, reducing
  compression ratio by up to 30% with no quality benefit
- EDU scorer replaced length-based heuristic with information-density
  scoring (identifiers, numbers, emphasis) to avoid keeping long filler
  clauses over short technical ones
- Quick reference table, feature section, and TSDoc all flag the 8-28%
  ratio regression without a custom ML scorer
- Explain why: dependency tracking inherently fights compression by
  pulling in parent EDUs, and the rule-based scorer can't distinguish
  load-bearing dependencies from decorative ones
- Recommend using exported segmentEDUs/scoreEDUs/selectEDUs directly
  with a custom scorer instead of the discourseAware option
- Remove discourseAware from recommended feature combinations
Adaptive entity-aware budgets were changing default compression output
(6% regression on coding scenario) because extractEntities was called
unconditionally. Now entity-adaptive budgets only activate when
compressionDepth is explicitly set to moderate/aggressive/auto.

Default path (no v2 options) now produces identical output to develop.
- Flow chains and clusters only mark themselves as processed AFTER
  successful compression. Previously they were marked on entry,
  causing non-compressed chain members to be silently dropped
- Semantic clusters restricted to consecutive indices only —
  non-consecutive merges broke round-trip because uncompress can't
  restore interleaved message ordering
- Added V2 Features Comparison section to bench reporter showing
  each feature individually and recommended combo vs default, with
  per-scenario ratio/quality and delta row
- All 8 scenarios × 8 configs pass round-trip verification
```ts
  if (score > best) best = score;
}
return best;
}
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
  scoreMap.set(i, rawScores[i]);
}
} else {
  // augment: weighted average of heuristic and entropy
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
if (userSummarizer) {
  next = gen.next(await withFallback(text, userSummarizer, budget));
} else {
  next = gen.next(summarize(text, budget, externalScores));
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
if (entities.length === 0) return '';

// For each entity, find the sentence where it first appears
const sentences = sourceContent.match(/[^.!?\n]+[.!?]+/g) ?? [sourceContent];
```

**Check failure (Code scanning / CodeQL), reported twice on this snippet:** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
 */
export function segmentEDUs(text: string): EDU[] {
  // First split into sentences
  const sentences = text.match(/[^.!?\n]+[.!?]+/g) ?? [text.trim()];
```

**Check failure (Code scanning / CodeQL), reported three times on this snippet:** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
```ts
 * Returns the sentences and their original indices for reassembly.
 */
export function splitSentences(text: string): string[] {
  const sentences = text.match(/[^.!?\n]+[.!?]+/g);
```

**Check failure (Code scanning / CodeQL):** Polynomial regular expression used on uncontrolled data (High). This regular expression that depends on library input may run slow on strings with many repetitions of ' '.
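
One way these CodeQL findings could be addressed is to replace the backtracking-prone `/[^.!?\n]+[.!?]+/g` match with a single linear pass over the string. The sketch below is not part of this PR; its handling of terminator-less trailing text also differs slightly from the regex (which drops it unless a fallback is applied).

```typescript
// Sketch: linear-time sentence splitting without regex backtracking.
// Runs in O(n) even on strings with long runs of spaces.
function splitSentencesLinear(text: string): string[] {
  const out: string[] = [];
  let start = 0;
  let i = 0;
  while (i < text.length) {
    const c = text[i];
    if (c === "." || c === "!" || c === "?") {
      // Consume the full run of terminators, then emit the sentence.
      while (i < text.length && ".!?".includes(text[i])) i++;
      const s = text.slice(start, i).trim();
      if (s) out.push(s);
      start = i;
    } else if (c === "\n") {
      const s = text.slice(start, i).trim();
      if (s) out.push(s);
      start = ++i;
    } else {
      i++;
    }
  }
  const tail = text.slice(start).trim();
  if (tail) out.push(tail);
  return out;
}
```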
@SimplyLiz SimplyLiz merged commit ac04bef into develop Mar 20, 2026
10 of 11 checks passed
@SimplyLiz SimplyLiz deleted the feature/v2-improvements branch March 20, 2026 22:52