feat: Agent Improvement Engine (insights, improve, evolve) by alishakawaguchi · Pull Request #765 · entireio/cli

alishakawaguchi · 2026-03-25T00:44:27Z

Summary

Adds three new features to help users improve their AI coding sessions based on collected session data:

entire insights — Session quality scoring, cross-session trends, and agent comparisons. SQLite-cached, <1s response time.
entire improve — Two-phase friction analysis (SQLite index → transcript deep-read → Claude CLI) that generates context file improvement suggestions (CLAUDE.md, AGENTS.md, .cursorrules, .gemini) with evidence and unified diffs.
Evolution loop — Auto-triggers improvement suggestions after N sessions (configurable, opt-in).

Architecture

SQLite cache (.entire/insights.db) — local analytics cache using modernc.org/sqlite (pure Go, CGO_ENABLED=0 compatible). Populated from entire/checkpoints/v1 branch with staleness detection.
Commit-time scoring — Session quality scores computed during condensation (pure math, <1ms, no AI call). Written to insights/scores/ on the checkpoint branch for future frontend consumption.
Shared llmcli package — Common Claude CLI execution extracted from summarize/claude.go. Both summarize and improve use it with different prompts.
termstyle package — Shared terminal styling extracted from status_style.go to avoid duplication across renderers.

New packages

Package	Purpose
`cmd/entire/cli/termstyle/`	Shared terminal styling
`cmd/entire/cli/llmcli/`	Shared Claude CLI execution
`cmd/entire/cli/insightsdb/`	SQLite cache layer
`cmd/entire/cli/insights/`	Scoring algorithm + trend analysis
`cmd/entire/cli/improve/`	Context file detection, friction analyzer, suggestion generator
`cmd/entire/cli/evolve/`	Evolution loop trigger + suggestion tracker

Key decisions

Requires summarization enabled — insights/improve gate on IsSummarizeEnabled()
Improve uses two-phase analysis: SQLite finds recurring friction themes, then reads transcript excerpts from git for evidence before generating suggestions
Evolution loop is opt-in (evolve.enabled: false by default)
Binary size: +~8MB from modernc.org/sqlite (32MB → 40MB)

Test plan

Run mise run test:ci — all unit + integration tests pass
Verify entire insights renders scores, trends, and agent comparisons
Verify entire improve --dry-run shows friction patterns without AI call
Verify entire improve generates context file suggestions
Verify .entire/insights.db is created and populated on first run
Verify scoring happens at commit time (check CondenseResult.SessionScore)
Verify pre-existing tests still pass (strategy, summarize, settings)

🤖 Generated with Claude Code

Note

Medium Risk
Adds new CLI commands plus a new SQLite cache and hooks into session condensation to compute/store quality scores, which could impact core session/metadata workflows and local disk state. Also introduces Claude CLI execution via a shared runner and transcript reads, increasing integration surface with external tooling.

Overview
Introduces a new analytics and improvement workflow: entire insights computes per-session quality scores, trend metrics, and agent comparisons from a local SQLite cache, with both terminal and --json output.

Adds entire improve to analyze recent sessions for recurring friction (SQLite index + optional transcript deep-read) and then call the Claude CLI to generate context-file suggestions (with evidence and unified diffs), including a --dry-run mode that skips AI/transcript reads.

Adds an opt-in evolution loop (settings.evolve) to track sessions since the last improvement run and print a tip prompting users to run entire improve after a configurable threshold; also refactors shared infrastructure by extracting llmcli (Claude CLI runner + git isolation) and termstyle (shared lipgloss styling), and adds the modernc.org/sqlite dependency for the new cache.

^{Written by Cursor Bugbot for commit 7816386. Configure here.}

Adds EvolveSettings struct, EvolveConfig field on EntireSettings, and GetEvolveConfig() convenience method with defaults (SessionThreshold=5). Includes mergeJSON support and unit tests covering nil/zero/explicit cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Entire-Checkpoint: a9304e803fa1

Create cmd/entire/cli/llmcli with Runner.Execute(), StripGitEnv(), and ExtractJSONFromMarkdown() so the upcoming improve/generator.go can reuse the same CLI invocation logic without duplicating it from summarize. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Entire-Checkpoint: bb2d026600c0

Move terminal color detection, width calculation, token formatting, and lipgloss style construction into a new exported cmd/entire/cli/termstyle package so upcoming renderers (insights, improve) can reuse them without duplicating the logic or importing the cli package. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Entire-Checkpoint: 04021bde7725

After summary generation in CondenseSession(), compute a SessionScore using pure math via insights.ScoreSession/ComputeOverall and return it in CondenseResult. No AI call, no latency impact (<1ms). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Entire-Checkpoint: a7a24df28aa9

…gestion generator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds the `entire insights` command that reads session quality data from a SQLite cache backed by the entire/checkpoints/v1 branch, then renders scores, trends, and agent comparisons in the terminal or as JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add the evolve package with three responsibilities: threshold-based triggering (ShouldTrigger/IncrementSessionCount/RecordRun), in-memory suggestion lifecycle tracking (Tracker with Accept/Reject/MeasureImpact), and user-facing notification (CheckAndNotify) when the session count meets the configured threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Entire-Checkpoint: a67b9372b855

Adds `entire improve` — a two-phase pipeline that queries recurring friction from the SQLite insights cache, deep-reads transcripts for evidence, detects context files, and calls Claude to generate unified diff suggestions. Also adds JSON tags to insightsdb.FrictionTheme. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Entire-Checkpoint: 64a14f6741b8

…engine - Add context.Context to all insightsdb methods (noctx) - Wrap external package errors (wrapcheck) - Add nolint for tx.Rollback errcheck and maintidx on CondenseSession - Fix nilerr in cache refresh, unparam in renderInsightsTerminal - Add insightsdb cache/db files, insights scoring package - Run go mod tidy for modernc.org/sqlite dependency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: b5a4d26c47f7

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

cursor · 2026-03-25T00:47:11Z

cmd/entire/cli/improve_cmd.go

+	summaries := sessionRowsToSummaries(rows)
+	analysis := improve.AnalyzePatterns(summaries)
+	// Overlay the transcript excerpts we fetched into the analysis.
+	applyExcerpts(analysis.RepeatedFriction, patterns)


Theme mismatch prevents transcript excerpts from being applied

High Severity

applyExcerpts matches dst and src patterns by Theme, but the two slices use incompatible theme formats. buildFrictionPatterns sets Theme to the raw friction text from the database (e.g., "Lint errors not caught by agent"), while AnalyzePatterns sets Theme to a classified keyword (e.g., "lint"). The map lookup in applyExcerpts will never find a match, so transcript excerpts collected in Phase 2 are silently discarded and never included in the prompt sent to Claude.

Additional Locations (1)

cmd/entire/cli/improve_cmd.go#L137-L148

cursor · 2026-03-25T00:47:11Z

cmd/entire/cli/insightsdb/cache.go

+		return nil
+	}
+	return f
+}


nullableFloat stores legitimate zero scores as NULL

Medium Severity

nullableFloat converts any zero float to SQL NULL, but the scoring functions (scoreFriction, scoreFocus, scoreFirstPassSuccess) can legitimately produce 0.0 for poor-performing sessions (e.g., 5+ friction items yields ScoreFriction = 0). This conflates "score not yet computed" with "worst possible score" at the database level, making it impossible to distinguish the two states in queries.

cursor · 2026-03-25T00:47:11Z

cmd/entire/cli/strategy/manual_commit_condensation.go

+		total += totalTokensFromUsage(tu.SubagentTokens)
+	}
+	return total
+}


Duplicate token summation function added in strategy package

Low Severity

totalTokensFromUsage is functionally identical to termstyle.TotalTokens (which was extracted from the old totalTokens in status_style.go in this same PR). Both recursively sum InputTokens + CacheCreationTokens + CacheReadTokens + OutputTokens including subagent tokens. The new function duplicates existing logic rather than reusing it.

Copilot

Pull request overview

Adds an “Agent Improvement Engine” to Entire CLI: new insights and improve commands powered by a local SQLite cache, plus an opt-in “evolve” loop to nudge users to run improvements after N sessions.

Changes:

Introduces entire insights (session scoring, trends, agent comparisons) backed by a SQLite cache (modernc.org/sqlite).
Introduces entire improve (recurring friction detection + optional transcript deep-read + Claude CLI suggestions with unified diffs).
Extracts shared utilities into new llmcli (Claude CLI runner) and termstyle (terminal styling) packages; adds evolve settings + basic evolve state helpers.

Reviewed changes

Copilot reviewed 40 out of 41 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
go.mod	Adds `modernc.org/sqlite` dependency (plus indirects).
go.sum	Updates module checksums for new dependencies.
cmd/entire/cli/termstyle/termstyle.go	New shared terminal styling helpers (color/width detection, rules, token formatting).
cmd/entire/cli/termstyle/termstyle_test.go	Unit tests for termstyle utilities.
cmd/entire/cli/summarize/claude.go	Refactors summarization Claude invocation to use shared `llmcli`.
cmd/entire/cli/llmcli/llmcli.go	New shared Claude CLI runner with git isolation + markdown JSON extraction.
cmd/entire/cli/llmcli/llmcli_test.go	Tests for runner defaults, error paths, git isolation, markdown extraction.
cmd/entire/cli/status_style.go	Switches existing status renderer to delegate styling/token helpers to `termstyle`.
cmd/entire/cli/strategy/manual_commit_types.go	Extends `CondenseResult` with optional `SessionScore`.
cmd/entire/cli/strategy/manual_commit_condensation.go	Computes session score during condensation and returns it in `CondenseResult`.
cmd/entire/cli/strategy/manual_commit_condensation_test.go	Adds tests for token/turn-count helper functions used in scoring.
cmd/entire/cli/insightsdb/db.go	New SQLite cache DB open/migrations + table listing for tests.
cmd/entire/cli/insightsdb/db_test.go	Tests for DB creation/migrations/idempotency.
cmd/entire/cli/insightsdb/cache.go	Cache insert/query helpers and schema mapping structs.
cmd/entire/cli/insightsdb/cache_test.go	Tests for cache meta + insert behavior + denormalized fields.
cmd/entire/cli/insightsdb/queries.go	Cache query methods (last N, by agent, recurring friction, etc.).
cmd/entire/cli/insightsdb/queries_test.go	Tests for query ordering/filtering/recurring friction behavior.
cmd/entire/cli/insights/insights.go	New insights domain types (scores, trends, report).
cmd/entire/cli/insights/scorer.go	New scoring algorithm + overall weighting.
cmd/entire/cli/insights/scorer_test.go	Unit tests for scoring functions.
cmd/entire/cli/insights/trends.go	Trend analysis + per-agent comparisons.
cmd/entire/cli/insights/trends_test.go	Tests for trends + agent comparisons.
cmd/entire/cli/insights_cmd.go	New `entire insights` command: cache refresh + renderers + score computation.
cmd/entire/cli/improve/improve.go	New improve domain types (context files, suggestions, pattern analysis).
cmd/entire/cli/improve/context_files.go	Detects known context files and reads content.
cmd/entire/cli/improve/context_files_test.go	Tests for context file detection behavior.
cmd/entire/cli/improve/analyzer.go	Builds repeated-friction themes + deduped learnings/open items from summaries.
cmd/entire/cli/improve/analyzer_test.go	Tests for analyzer theming/deduplication/threshold behavior.
cmd/entire/cli/improve/generator.go	Generates context-file suggestions via Claude CLI using `llmcli`.
cmd/entire/cli/improve/generator_test.go	Tests for generator parsing, defaults, IDs, timestamps, error handling.
cmd/entire/cli/improve_cmd.go	New `entire improve` command: friction query, deep-read excerpts, suggestion rendering.
cmd/entire/cli/evolve/evolve.go	New evolve state + suggestion record types.
cmd/entire/cli/evolve/trigger.go	Trigger logic for session-threshold evolution loop.
cmd/entire/cli/evolve/trigger_test.go	Tests for trigger logic/state updates.
cmd/entire/cli/evolve/notify.go	Emits user-facing “tip” to run `entire improve` when threshold reached.
cmd/entire/cli/evolve/notify_test.go	Tests for notification behavior.
cmd/entire/cli/evolve/tracker.go	In-memory tracker for suggestion lifecycle and simple impact measurement.
cmd/entire/cli/evolve/tracker_test.go	Tests for tracker operations.
cmd/entire/cli/settings/settings.go	Adds `evolve` settings + defaulting + JSON merge handling.
cmd/entire/cli/settings/settings_evolve_test.go	Tests for evolve settings defaulting/override behavior.
cmd/entire/cli/root.go	Registers new `insights` and `improve` commands.

Copilot · 2026-03-25T00:51:15Z

cmd/entire/cli/insights/trends.go

+	// Split into first and second halves.
+	mid := len(scores) / 2
+	firstHalf := scores[:mid]
+	secondHalf := scores[mid:]
+
+	firstAvg := average(firstHalf, extract)
+	secondAvg := average(secondHalf, extract)
+


ComputeTrends assumes the input slice is ordered oldest→newest when splitting into halves. In entire insights, sessions are queried ORDER BY created_at DESC, so the newest sessions land in the first half and trend directions will be inverted. Consider sorting by CreatedAt ascending inside ComputeTrends (or explicitly reversing in the caller) before computing averages/data points.

Copilot · 2026-03-25T00:51:16Z

cmd/entire/cli/improve/generator.go

+// buildPrompt constructs the prompt for the Claude CLI.
+// All untrusted content (friction text, learnings, context file content) is wrapped
+// in XML tags to prevent prompt injection.
+func buildPrompt(analysis PatternAnalysis, contextFiles []ContextFile) string {
+	var sb strings.Builder
+
+	sb.WriteString(`Analyze recurring patterns from recent AI coding sessions and suggest
+improvements to the project's context files.
+
+`)
+
+	// Repeated friction section
+	sb.WriteString("<repeated_friction>\n")
+	if len(analysis.RepeatedFriction) == 0 {
+		sb.WriteString("(no repeated friction patterns found)\n")
+	} else {
+		for _, p := range analysis.RepeatedFriction {
+			fmt.Fprintf(&sb, "Theme: %s issues (occurred %d times)\n", p.Theme, p.Count)
+			for _, ex := range p.Examples {
+				fmt.Fprintf(&sb, "  - %q\n", ex)
+			}
+			if p.TranscriptExcerpt != "" {
+				fmt.Fprintf(&sb, "  Excerpt: %q\n", p.TranscriptExcerpt)
+			}
+		}


The prompt-building comment says untrusted content is wrapped in XML tags to prevent prompt injection, but the inserted friction/learnings/context contents are not escaped. A friction string containing < / </repeated_friction> could break the structure and undermine the mitigation. Consider escaping/encoding untrusted strings (e.g., XML-escape or JSON-marshal values) and updating the comment to reflect the actual guarantees.

Copilot · 2026-03-25T00:51:16Z

cmd/entire/cli/improve/generator.go

+	suggestions := make([]Suggestion, 0, len(resp.Suggestions))
+	for i, s := range resp.Suggestions {
+		suggestions = append(suggestions, Suggestion{
+			ID:          fmt.Sprintf("sug-%d-%d", now.Unix(), i),


Suggestion IDs are based on now.Unix() (seconds) plus the loop index. Two entire improve runs within the same second can generate identical IDs (especially if they produce the same number of suggestions), which will collide with the suggestions.id primary key in SQLite. Consider using UnixNano, a UUID, or a random suffix for IDs.

Suggested change

ID: fmt.Sprintf("sug-%d-%d", now.Unix(), i),

ID: fmt.Sprintf("sug-%d-%d", now.UnixNano(), i),

Copilot · 2026-03-25T00:51:16Z

cmd/entire/cli/improve/analyzer.go

+	// Build repeated friction list (threshold: 2+ occurrences)
+	var repeated []FrictionPattern
+	for theme, acc := range byTheme {
+		if acc.count < 2 {
+			continue
+		}
+		sessions := make([]string, 0, len(acc.sessions))
+		for id := range acc.sessions {
+			sessions = append(sessions, id)
+		}
+		repeated = append(repeated, FrictionPattern{
+			Theme:            theme,
+			Count:            acc.count,
+			Examples:         acc.examples,
+			AffectedSessions: sessions,
+		})
+	}


AnalyzePatterns builds RepeatedFriction by iterating over a map, so the order of patterns (and therefore CLI output/prompt content) is nondeterministic across runs. This can lead to noisy diffs in generated suggestions and make behavior harder to reason about. Consider sorting repeated deterministically (e.g., by Count desc, then Theme asc) before returning.

Copilot · 2026-03-25T00:51:17Z

cmd/entire/cli/insights_cmd.go

+		row.InputTokens = meta.TokenUsage.InputTokens + meta.TokenUsage.CacheCreationTokens + meta.TokenUsage.CacheReadTokens
+		row.OutputTokens = meta.TokenUsage.OutputTokens
+		row.TotalTokens = row.InputTokens + row.OutputTokens


metadataToSessionRow populates InputTokens with input+cache tokens but never sets CacheTokens, even though the DB schema and SessionRow struct have a dedicated cache_tokens column. This makes cached token breakdown inconsistent and prevents showing cache token usage accurately. Suggest setting InputTokens to just input tokens, CacheTokens to cache creation+read, and TotalTokens to input+cache+output.

Suggested change

row.InputTokens = meta.TokenUsage.InputTokens + meta.TokenUsage.CacheCreationTokens + meta.TokenUsage.CacheReadTokens

row.OutputTokens = meta.TokenUsage.OutputTokens

row.TotalTokens = row.InputTokens + row.OutputTokens

// Keep token categories distinct so cache usage can be reported accurately.

row.InputTokens = meta.TokenUsage.InputTokens

row.CacheTokens = meta.TokenUsage.CacheCreationTokens + meta.TokenUsage.CacheReadTokens

row.OutputTokens = meta.TokenUsage.OutputTokens

row.TotalTokens = row.InputTokens + row.CacheTokens + row.OutputTokens

Copilot · 2026-03-25T00:51:17Z

cmd/entire/cli/insightsdb/cache.go

+// nullableFloat converts a zero float to a SQL NULL value.
+// Non-zero floats are passed through as-is.
+func nullableFloat(f float64) interface{} {
+	if f == 0 {
+		return nil
+	}


nullableFloat converts 0.0 to SQL NULL. For score fields, 0 can be a valid computed value (e.g., severe friction), so this loses information in the DB and can change behavior for any future SQL aggregates/filters. Prefer inserting 0 as 0 and using an explicit “score computed” flag/column if you need to represent “unknown”.

Suggested change

// nullableFloat converts a zero float to a SQL NULL value.

// Non-zero floats are passed through as-is.

func nullableFloat(f float64) interface{} {

if f == 0 {

return nil

}

// nullableFloat passes floats through as-is so that 0.0 is preserved as a valid value.

func nullableFloat(f float64) interface{} {

Copilot · 2026-03-25T00:51:17Z

cmd/entire/cli/insightsdb/queries.go

+// QuerySessionsWithFriction returns checkpoint IDs of sessions containing
+// friction matching the given SQL LIKE pattern (e.g., "%tool call failed%").
+func (idb *InsightsDB) QuerySessionsWithFriction(ctx context.Context, pattern string) ([]string, error) {
+	rows, err := idb.db.QueryContext(ctx,
+		"SELECT DISTINCT checkpoint_id FROM friction WHERE text LIKE ?",
+		pattern,
+	)


QuerySessionsWithFriction returns only checkpoint IDs, dropping session_index even though the friction table is keyed by (checkpoint_id, session_index). Callers (e.g., transcript deep-read) can end up reading the wrong session (or always index 0) and miss the friction evidence. Consider returning (checkpoint_id, session_index) pairs (or a small struct) and adjusting callers accordingly.

Copilot · 2026-03-25T00:51:18Z

cmd/entire/cli/improve_cmd.go

+	// Phase 2: Deep-read transcripts for top friction themes.
+	patterns := buildFrictionPatterns(frictionThemes)
+	if err = attachTranscriptExcerpts(ctx, idb, patterns, worktreeRoot); err != nil {
+		// Non-fatal: proceed without transcript excerpts.
+		_ = err
+	}
+
+	// Phase 3: Detect context files.
+	contextFiles := improve.DetectContextFiles(worktreeRoot)
+
+	// Phase 4: Build analysis from session data + friction patterns, then generate.
+	summaries := sessionRowsToSummaries(rows)
+	analysis := improve.AnalyzePatterns(summaries)
+	// Overlay the transcript excerpts we fetched into the analysis.
+	applyExcerpts(analysis.RepeatedFriction, patterns)


Phase 2/4 excerpt wiring looks broken: patterns := buildFrictionPatterns(frictionThemes) sets Theme to the raw friction text, but AnalyzePatterns produces RepeatedFriction themes like "lint", "test", etc. applyExcerpts matches by Theme, so transcript excerpts will almost never attach to the analysis passed into Generator.Generate. Consider running attachTranscriptExcerpts directly on analysis.RepeatedFriction (or ensuring both phases use the same theme key).

Suggested change

// Phase 2: Deep-read transcripts for top friction themes.

patterns := buildFrictionPatterns(frictionThemes)

if err = attachTranscriptExcerpts(ctx, idb, patterns, worktreeRoot); err != nil {

// Non-fatal: proceed without transcript excerpts.

_ = err

}

// Phase 3: Detect context files.

contextFiles := improve.DetectContextFiles(worktreeRoot)

// Phase 4: Build analysis from session data + friction patterns, then generate.

summaries := sessionRowsToSummaries(rows)

analysis := improve.AnalyzePatterns(summaries)

// Overlay the transcript excerpts we fetched into the analysis.

applyExcerpts(analysis.RepeatedFriction, patterns)

// Phase 2: (deprecated wiring) Deep-read transcripts will be attached after analysis is built.

// Phase 3: Detect context files.

contextFiles := improve.DetectContextFiles(worktreeRoot)

// Phase 4: Build analysis from session data, then attach transcript excerpts and generate.

summaries := sessionRowsToSummaries(rows)

analysis := improve.AnalyzePatterns(summaries)

// Attach transcript excerpts directly to the repeated friction patterns used in generation.

if err = attachTranscriptExcerpts(ctx, idb, analysis.RepeatedFriction, worktreeRoot); err != nil {

// Non-fatal: proceed without transcript excerpts.

_ = err

}

Copilot · 2026-03-25T00:51:18Z

cmd/entire/cli/improve_cmd.go

+// truncateString truncates s to at most maxLen bytes, appending "..." if truncated.
+func truncateString(s string, maxLen int) string {
+	if len(s) <= maxLen {
+		return s
+	}
+	return s[:maxLen] + "..."
+}


truncateString slices by bytes (s[:maxLen]), which can cut multi-byte UTF-8 sequences (common in transcripts) and produce invalid text. Consider truncating by runes (or using utf8.ValidString / []rune), and ensure the ellipsis doesn’t exceed the requested limit if that matters.

Copilot · 2026-03-25T00:51:18Z

cmd/entire/cli/improve/generator.go

+func (g *Generator) Generate(ctx context.Context, analysis PatternAnalysis, contextFiles []ContextFile) ([]Suggestion, error) {
+	prompt := buildPrompt(analysis, contextFiles)
+
+	raw, err := g.Runner.Execute(ctx, prompt)


Generator.Generate assumes g.Runner is non-nil and will panic if a caller constructs Generator{}. Since Generator is exported, consider defaulting g.Runner to &llmcli.Runner{} when nil (or returning a clear error) to avoid panics in other packages/tests.

Suggested change

raw, err := g.Runner.Execute(ctx, prompt)

runner := g.Runner

if runner == nil {

runner = &llmcli.Runner{}

}

raw, err := runner.Execute(ctx, prompt)

The ireturn linter now flags these regardless of nolint directives. These are type-assertion helpers that must return interfaces by design. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 47ce155f7bff

- Replace byte-based truncateString with UTF-8 safe stringutil.TruncateRunes - Fix TotalTokens in insights_cmd to include subagent tokens via termstyle.TotalTokens - Deduplicate totalTokensFromUsage in strategy by delegating to termstyle.TotalTokens - Replace hand-rolled splitCSV with strings.Split - DRY up Accept/Reject in evolve tracker via shared resolve method - Cap friction examples at 10 per theme to prevent unbounded LLM prompt growth - Remove unnecessary WHAT comments in evolve/tracker.go - Restore nolint:ireturn directives removed in prior commit (fixes lint) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: d63aba52b8ce

Parse total_cost_usd and usage fields from Claude CLI JSON response. llmcli.Execute now returns UsageInfo alongside the result, and entire improve shows a cost/token summary line after the report. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 8e74831eb583

Keep session scoring from agent-improvements and writeOpts variable with v2 dual-write from main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 71977e30b301

- Generate summaries on-demand for sessions that lack them during insights/improve runs, scoped to --last N sessions requested - Fix token efficiency scoring: replace exp decay (always 0 for real data) with sigmoid centered at 500k tokens/turn - Fix first-pass scoring: raise baseline to 90, reduce penalties (friction 10→5, turns 3→2, open items 5→3) - Fix friction scoring: reduce penalty from 20→15 per item - Fix focus scoring: use gaussian on turns/files ratio instead of flat band - Add has_summary column to sessions table for tracking - Add UpdateSessionSummary for backfill cache updates - Remove nullableFloat (stored valid 0.0 scores as NULL) - Always compute partial scores even without summaries - Show "(no summary)" indicator in insights output - Restore nolint:ireturn directives on capabilities.go Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: dea7491a6bda

Entire-Checkpoint: faa75c2dab8f

Entire-Checkpoint: b4d20b476fee

Entire-Checkpoint: 89347a654b11

Entire-Checkpoint: 74a14a71f350

Scaffold the memorylooptui package (styles, keys, messages, render helpers, root model with tab switching) and implement the memories tab with bubbles/table for record listing, status filter cycling, search mode, detail pane toggle, and lifecycle action keybindings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 0f0d39f4f655

- Fix truncate() to slice by rune instead of byte to prevent UTF-8 corruption - Replace hand-rolled ANSI width parser with lipgloss.Width() - Distinguish empty state messages: no store vs no records vs no filter matches - Refactor updateSearch to use key.Matches() and msg.Runes instead of msg.String() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: c63d65192881

…d-memory form - Injection tab: log table with prompt tester and match results - History tab: refresh history table with R trigger - Settings tab: mode/policy/max cycling with auto-save - Add-memory form: inline textinput fields on memories tab (n key) - Root model: input capture coordination, refresh placeholder Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 0b5a0dcac774

…ail pane on by default - Change accent from cyan (6) to orange/amber (214) across all tabs - Make inactive tabs use lighter gray (245) so they're visible on dark terminals - Show detail pane by default instead of hiding behind Enter toggle Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 45aeb0cc713e

- Increase maxWidth from 80 to 120 for wider terminal support - Add rounded-border card to memories detail pane - Add injection log detail view with bordered card for selected entries - Add bordered card around prompt tester match results - Make prompt tester input more visible with orange arrow prompt - Inline pushState calls in handler methods for correct state propagation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 1b62f97ce468

…tions - Border color changed from dark gray (8) to light gray (245) for visibility - Added blank line between tab bar and filter bar - Added spacing between memory table and detail card - Fixed card width calculation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 14ef2cb624b1

…ngs key fix - Settings tab: each setting in a bordered card with proper layout - Injection tab: constrain log table height to prevent overflow - Settings update: capture changed msg in local struct to fix closure scope - Stats line condensed to single row Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 9872a596d85e

- Selected setting options shown with orange background + black text chip - Unselected options in light gray - Max injected number in bold orange - Removed debug file logging from settings and root Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 9393df2b476e

Entire-Checkpoint: 55da86a28935

Entire-Checkpoint: d0d138a8b16d

…ILS header Redesign the TUI navigation to look like proper tabs with an amber underline under the active tab and a MEMORY LOOP app title. Convert filter labels to uppercase chip buttons matching the Settings tab pattern. Add DETAILS section header above the memory detail card. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: abe1cf53d792

Add blank lines between title, metadata, body, why, and stats sections. Increase card padding and add spacing between filter chips and table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: a68a4e85fda9

…larity Injection tab: use section headers, increase card padding, add blank lines between sections, align detail card fields. History tab: add descriptive header explaining what refreshes do, widen scope column from 8 to 16 chars. Increase prompt preview storage limit from 120 to 500 chars so injection detail cards show more of the original prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 7c16b9136913

Add `entire skill` command that discovers skill files across agent config directories (.claude/skills/, .gemini/agents/, etc.), tracks their usage from insightsdb session data in a dedicated skill-analytics.db, and generates AI-powered improvement suggestions via an interactive Bubbletea TUI with picker screen and 3-tab dashboard (Stats, Friction, Improve). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 12bbd28d76e5

alishakawaguchi and others added 9 commits March 24, 2026 16:47

feat(improve): add context file detection, friction analyzer, and sug…

c788f86

…gestion generator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 25, 2026 00:44

Copilot started reviewing on behalf of alishakawaguchi March 25, 2026 00:44 View session

cursor bot reviewed Mar 25, 2026

View reviewed changes

Copilot AI reviewed Mar 25, 2026

View reviewed changes

alishakawaguchi and others added 17 commits March 25, 2026 15:12

merge: resolve conflict in manual_commit_condensation.go

dff7537

Keep session scoring from agent-improvements and writeOpts variable with v2 dual-write from main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 71977e30b301

feat: add structured session facets for improve

4d85473

Entire-Checkpoint: faa75c2dab8f

chore: commit remaining insights changes

e07b227

Entire-Checkpoint: b4d20b476fee

docs: add heavyweight memory loop design and plan

48bf34a

Entire-Checkpoint: 89347a654b11

feat: add heavyweight memory loop workflow

ae6f1b5

Entire-Checkpoint: 74a14a71f350

alishakawaguchi and others added 6 commits March 26, 2026 18:05

docs: add memory-loop TUI redesign design

f9b642b

Entire-Checkpoint: 55da86a28935

docs: add memory-loop TUI restyle design

d4a1f54

Entire-Checkpoint: d0d138a8b16d

	ID: fmt.Sprintf("sug-%d-%d", now.Unix(), i),
	ID: fmt.Sprintf("sug-%d-%d", now.UnixNano(), i),

-	raw, err := g.Runner.Execute(ctx, prompt)
+	runner := g.Runner
+	if runner == nil {
+		runner = &llmcli.Runner{}
+	}
+	raw, err := runner.Execute(ctx, prompt)

Conversation

alishakawaguchi commented Mar 25, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Architecture

New packages

Key decisions

Test plan

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 25, 2026

Choose a reason for hiding this comment

Theme mismatch prevents transcript excerpts from being applied

Uh oh!

cursor bot Mar 25, 2026

Choose a reason for hiding this comment

nullableFloat stores legitimate zero scores as NULL

Uh oh!

cursor bot Mar 25, 2026

Choose a reason for hiding this comment

Duplicate token summation function added in strategy package

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

alishakawaguchi commented Mar 25, 2026 •

edited by cursor bot

Loading

`nullableFloat` stores legitimate zero scores as NULL