feat: Agent Improvement Engine (insights, improve, evolve) #765

Draft
alishakawaguchi wants to merge 32 commits into main from agent-improvements

alishakawaguchi (Contributor) commented Mar 25, 2026

Summary

Adds three new features to help users improve their AI coding sessions based on collected session data:

  • entire insights — Session quality scoring, cross-session trends, and agent comparisons. SQLite-cached, <1s response time.
  • entire improve — Two-phase friction analysis (SQLite index → transcript deep-read → Claude CLI) that generates context file improvement suggestions (CLAUDE.md, AGENTS.md, .cursorrules, .gemini) with evidence and unified diffs.
  • Evolution loop — Auto-triggers improvement suggestions after N sessions (configurable, opt-in).

Architecture

  • SQLite cache (.entire/insights.db) — local analytics cache using modernc.org/sqlite (pure Go, CGO_ENABLED=0 compatible). Populated from entire/checkpoints/v1 branch with staleness detection.
  • Commit-time scoring — Session quality scores computed during condensation (pure math, <1ms, no AI call). Written to insights/scores/ on the checkpoint branch for future frontend consumption.
  • Shared llmcli package — Common Claude CLI execution extracted from summarize/claude.go. Both summarize and improve use it with different prompts.
  • termstyle package — Shared terminal styling extracted from status_style.go to avoid duplication across renderers.

New packages

| Package | Purpose |
| --- | --- |
| cmd/entire/cli/termstyle/ | Shared terminal styling |
| cmd/entire/cli/llmcli/ | Shared Claude CLI execution |
| cmd/entire/cli/insightsdb/ | SQLite cache layer |
| cmd/entire/cli/insights/ | Scoring algorithm + trend analysis |
| cmd/entire/cli/improve/ | Context file detection, friction analyzer, suggestion generator |
| cmd/entire/cli/evolve/ | Evolution loop trigger + suggestion tracker |

Key decisions

  • Requires summarization enabled — insights/improve gate on IsSummarizeEnabled()
  • Improve uses two-phase analysis: SQLite finds recurring friction themes, then reads transcript excerpts from git for evidence before generating suggestions
  • Evolution loop is opt-in (evolve.enabled: false by default)
  • Binary size: +~8MB from modernc.org/sqlite (32MB → 40MB)

Test plan

  • Run mise run test:ci — all unit + integration tests pass
  • Verify entire insights renders scores, trends, and agent comparisons
  • Verify entire improve --dry-run shows friction patterns without AI call
  • Verify entire improve generates context file suggestions
  • Verify .entire/insights.db is created and populated on first run
  • Verify scoring happens at commit time (check CondenseResult.SessionScore)
  • Verify pre-existing tests still pass (strategy, summarize, settings)

🤖 Generated with Claude Code


Note

Medium Risk
Adds new CLI commands plus a new SQLite cache and hooks into session condensation to compute/store quality scores, which could impact core session/metadata workflows and local disk state. Also introduces Claude CLI execution via a shared runner and transcript reads, increasing integration surface with external tooling.

Overview
Introduces a new analytics and improvement workflow: entire insights computes per-session quality scores, trend metrics, and agent comparisons from a local SQLite cache, with both terminal and --json output.

Adds entire improve to analyze recent sessions for recurring friction (SQLite index + optional transcript deep-read) and then call the Claude CLI to generate context-file suggestions (with evidence and unified diffs), including a --dry-run mode that skips AI/transcript reads.

Adds an opt-in evolution loop (settings.evolve) to track sessions since the last improvement run and print a tip prompting users to run entire improve after a configurable threshold; also refactors shared infrastructure by extracting llmcli (Claude CLI runner + git isolation) and termstyle (shared lipgloss styling), and adds the modernc.org/sqlite dependency for the new cache.

Written by Cursor Bugbot for commit 7816386.

alishakawaguchi and others added 9 commits March 24, 2026 16:47
Adds EvolveSettings struct, EvolveConfig field on EntireSettings, and
GetEvolveConfig() convenience method with defaults (SessionThreshold=5).
Includes mergeJSON support and unit tests covering nil/zero/explicit cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a9304e803fa1
Create cmd/entire/cli/llmcli with Runner.Execute(), StripGitEnv(), and
ExtractJSONFromMarkdown() so the upcoming improve/generator.go can reuse
the same CLI invocation logic without duplicating it from summarize.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: bb2d026600c0
Move terminal color detection, width calculation, token formatting, and
lipgloss style construction into a new exported cmd/entire/cli/termstyle
package so upcoming renderers (insights, improve) can reuse them without
duplicating the logic or importing the cli package.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 04021bde7725
After summary generation in CondenseSession(), compute a SessionScore
using pure math via insights.ScoreSession/ComputeOverall and return it
in CondenseResult. No AI call, no latency impact (<1ms).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a7a24df28aa9
…gestion generator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the `entire insights` command that reads session quality data from
a SQLite cache backed by the entire/checkpoints/v1 branch, then renders
scores, trends, and agent comparisons in the terminal or as JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the evolve package with three responsibilities: threshold-based
triggering (ShouldTrigger/IncrementSessionCount/RecordRun), in-memory
suggestion lifecycle tracking (Tracker with Accept/Reject/MeasureImpact),
and user-facing notification (CheckAndNotify) when the session count
meets the configured threshold.
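
A minimal sketch of the threshold check this commit describes. The type and function names below (`evolveConfig`, `evolveState`, `shouldTrigger`, `recordSession`, `recordRun`) are hypothetical stand-ins, not the package's actual API; only the opt-in flag and the default threshold of 5 come from the PR text.

```go
package main

// evolveConfig is a hypothetical stand-in for the evolve settings.
// The PR states the loop is opt-in and defaults to a threshold of 5.
type evolveConfig struct {
	Enabled          bool
	SessionThreshold int
}

// evolveState is a hypothetical counter of sessions since the last improve run.
type evolveState struct {
	SessionsSinceLastRun int
}

// shouldTrigger reports whether the tip should be shown: the loop must be
// enabled and the session count must have reached the threshold.
func shouldTrigger(cfg evolveConfig, st evolveState) bool {
	if !cfg.Enabled || cfg.SessionThreshold <= 0 {
		return false
	}
	return st.SessionsSinceLastRun >= cfg.SessionThreshold
}

// recordSession returns the state after one more completed session.
func recordSession(st evolveState) evolveState {
	st.SessionsSinceLastRun++
	return st
}

// recordRun returns the state after an improve run, resetting the counter.
func recordRun(st evolveState) evolveState {
	st.SessionsSinceLastRun = 0
	return st
}
```

Passing the state by value keeps the helpers free of side effects, which makes the trigger logic easy to unit test without touching disk.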

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a67b9372b855
Adds `entire improve` — a two-phase pipeline that queries recurring
friction from the SQLite insights cache, deep-reads transcripts for
evidence, detects context files, and calls Claude to generate unified
diff suggestions. Also adds JSON tags to insightsdb.FrictionTheme.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 64a14f6741b8
…engine

- Add context.Context to all insightsdb methods (noctx)
- Wrap external package errors (wrapcheck)
- Add nolint for tx.Rollback errcheck and maintidx on CondenseSession
- Fix nilerr in cache refresh, unparam in renderInsightsTerminal
- Add insightsdb cache/db files, insights scoring package
- Run go mod tidy for modernc.org/sqlite dependency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b5a4d26c47f7
Copilot AI review requested due to automatic review settings March 25, 2026 00:44

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


summaries := sessionRowsToSummaries(rows)
analysis := improve.AnalyzePatterns(summaries)
// Overlay the transcript excerpts we fetched into the analysis.
applyExcerpts(analysis.RepeatedFriction, patterns)

Theme mismatch prevents transcript excerpts from being applied

High Severity

applyExcerpts matches dst and src patterns by Theme, but the two slices use incompatible theme formats. buildFrictionPatterns sets Theme to the raw friction text from the database (e.g., "Lint errors not caught by agent"), while AnalyzePatterns sets Theme to a classified keyword (e.g., "lint"). The map lookup in applyExcerpts will never find a match, so transcript excerpts collected in Phase 2 are silently discarded and never included in the prompt sent to Claude.


		return nil
	}
	return f
}

nullableFloat stores legitimate zero scores as NULL

Medium Severity

nullableFloat converts any zero float to SQL NULL, but the scoring functions (scoreFriction, scoreFocus, scoreFirstPassSuccess) can legitimately produce 0.0 for poor-performing sessions (e.g., 5+ friction items yields ScoreFriction = 0). This conflates "score not yet computed" with "worst possible score" at the database level, making it impossible to distinguish the two states in queries.
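
One way to keep "not computed" and "worst possible score" distinct is to make absence explicit in the Go value rather than inferring it from zero. The sketch below is an illustration only: `nullableScore` and the use of a pointer to signal absence are assumptions, not code from this PR. `sql.NullFloat64` is the standard library type for a nullable REAL column.

```go
package main

import "database/sql"

// nullableScore maps an optional score to a nullable SQL value.
// A nil pointer means the score was never computed and is stored as NULL;
// any non-nil value, including 0.0, is stored as a real number.
func nullableScore(score *float64) sql.NullFloat64 {
	if score == nil {
		return sql.NullFloat64{}
	}
	return sql.NullFloat64{Float64: *score, Valid: true}
}
```

With this shape, a query such as `WHERE score IS NULL` selects only unscored sessions, while a session that scored 0 still participates in aggregates.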


		total += totalTokensFromUsage(tu.SubagentTokens)
	}
	return total
}

Duplicate token summation function added in strategy package

Low Severity

totalTokensFromUsage is functionally identical to termstyle.TotalTokens (which was extracted from the old totalTokens in status_style.go in this same PR). Both recursively sum InputTokens + CacheCreationTokens + CacheReadTokens + OutputTokens including subagent tokens. The new function duplicates existing logic rather than reusing it.


Copilot AI left a comment


Pull request overview

Adds an “Agent Improvement Engine” to Entire CLI: new insights and improve commands powered by a local SQLite cache, plus an opt-in “evolve” loop to nudge users to run improvements after N sessions.

Changes:

  • Introduces entire insights (session scoring, trends, agent comparisons) backed by a SQLite cache (modernc.org/sqlite).
  • Introduces entire improve (recurring friction detection + optional transcript deep-read + Claude CLI suggestions with unified diffs).
  • Extracts shared utilities into new llmcli (Claude CLI runner) and termstyle (terminal styling) packages; adds evolve settings + basic evolve state helpers.

Reviewed changes

Copilot reviewed 40 out of 41 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| go.mod | Adds modernc.org/sqlite dependency (plus indirects). |
| go.sum | Updates module checksums for new dependencies. |
| cmd/entire/cli/termstyle/termstyle.go | New shared terminal styling helpers (color/width detection, rules, token formatting). |
| cmd/entire/cli/termstyle/termstyle_test.go | Unit tests for termstyle utilities. |
| cmd/entire/cli/summarize/claude.go | Refactors summarization Claude invocation to use shared llmcli. |
| cmd/entire/cli/llmcli/llmcli.go | New shared Claude CLI runner with git isolation + markdown JSON extraction. |
| cmd/entire/cli/llmcli/llmcli_test.go | Tests for runner defaults, error paths, git isolation, markdown extraction. |
| cmd/entire/cli/status_style.go | Switches existing status renderer to delegate styling/token helpers to termstyle. |
| cmd/entire/cli/strategy/manual_commit_types.go | Extends CondenseResult with optional SessionScore. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Computes session score during condensation and returns it in CondenseResult. |
| cmd/entire/cli/strategy/manual_commit_condensation_test.go | Adds tests for token/turn-count helper functions used in scoring. |
| cmd/entire/cli/insightsdb/db.go | New SQLite cache DB open/migrations + table listing for tests. |
| cmd/entire/cli/insightsdb/db_test.go | Tests for DB creation/migrations/idempotency. |
| cmd/entire/cli/insightsdb/cache.go | Cache insert/query helpers and schema mapping structs. |
| cmd/entire/cli/insightsdb/cache_test.go | Tests for cache meta + insert behavior + denormalized fields. |
| cmd/entire/cli/insightsdb/queries.go | Cache query methods (last N, by agent, recurring friction, etc.). |
| cmd/entire/cli/insightsdb/queries_test.go | Tests for query ordering/filtering/recurring friction behavior. |
| cmd/entire/cli/insights/insights.go | New insights domain types (scores, trends, report). |
| cmd/entire/cli/insights/scorer.go | New scoring algorithm + overall weighting. |
| cmd/entire/cli/insights/scorer_test.go | Unit tests for scoring functions. |
| cmd/entire/cli/insights/trends.go | Trend analysis + per-agent comparisons. |
| cmd/entire/cli/insights/trends_test.go | Tests for trends + agent comparisons. |
| cmd/entire/cli/insights_cmd.go | New entire insights command: cache refresh + renderers + score computation. |
| cmd/entire/cli/improve/improve.go | New improve domain types (context files, suggestions, pattern analysis). |
| cmd/entire/cli/improve/context_files.go | Detects known context files and reads content. |
| cmd/entire/cli/improve/context_files_test.go | Tests for context file detection behavior. |
| cmd/entire/cli/improve/analyzer.go | Builds repeated-friction themes + deduped learnings/open items from summaries. |
| cmd/entire/cli/improve/analyzer_test.go | Tests for analyzer theming/deduplication/threshold behavior. |
| cmd/entire/cli/improve/generator.go | Generates context-file suggestions via Claude CLI using llmcli. |
| cmd/entire/cli/improve/generator_test.go | Tests for generator parsing, defaults, IDs, timestamps, error handling. |
| cmd/entire/cli/improve_cmd.go | New entire improve command: friction query, deep-read excerpts, suggestion rendering. |
| cmd/entire/cli/evolve/evolve.go | New evolve state + suggestion record types. |
| cmd/entire/cli/evolve/trigger.go | Trigger logic for session-threshold evolution loop. |
| cmd/entire/cli/evolve/trigger_test.go | Tests for trigger logic/state updates. |
| cmd/entire/cli/evolve/notify.go | Emits user-facing “tip” to run entire improve when threshold reached. |
| cmd/entire/cli/evolve/notify_test.go | Tests for notification behavior. |
| cmd/entire/cli/evolve/tracker.go | In-memory tracker for suggestion lifecycle and simple impact measurement. |
| cmd/entire/cli/evolve/tracker_test.go | Tests for tracker operations. |
| cmd/entire/cli/settings/settings.go | Adds evolve settings + defaulting + JSON merge handling. |
| cmd/entire/cli/settings/settings_evolve_test.go | Tests for evolve settings defaulting/override behavior. |
| cmd/entire/cli/root.go | Registers new insights and improve commands. |

Comment on lines +56 to +63
// Split into first and second halves.
mid := len(scores) / 2
firstHalf := scores[:mid]
secondHalf := scores[mid:]

firstAvg := average(firstHalf, extract)
secondAvg := average(secondHalf, extract)

Copilot AI Mar 25, 2026

ComputeTrends assumes the input slice is ordered oldest→newest when splitting into halves. In entire insights, sessions are queried ORDER BY created_at DESC, so the newest sessions land in the first half and trend directions will be inverted. Consider sorting by CreatedAt ascending inside ComputeTrends (or explicitly reversing in the caller) before computing averages/data points.

Comment on lines +71 to +95
// buildPrompt constructs the prompt for the Claude CLI.
// All untrusted content (friction text, learnings, context file content) is wrapped
// in XML tags to prevent prompt injection.
func buildPrompt(analysis PatternAnalysis, contextFiles []ContextFile) string {
	var sb strings.Builder

	sb.WriteString(`Analyze recurring patterns from recent AI coding sessions and suggest
improvements to the project's context files.

`)

	// Repeated friction section
	sb.WriteString("<repeated_friction>\n")
	if len(analysis.RepeatedFriction) == 0 {
		sb.WriteString("(no repeated friction patterns found)\n")
	} else {
		for _, p := range analysis.RepeatedFriction {
			fmt.Fprintf(&sb, "Theme: %s issues (occurred %d times)\n", p.Theme, p.Count)
			for _, ex := range p.Examples {
				fmt.Fprintf(&sb, " - %q\n", ex)
			}
			if p.TranscriptExcerpt != "" {
				fmt.Fprintf(&sb, " Excerpt: %q\n", p.TranscriptExcerpt)
			}
		}

Copilot AI Mar 25, 2026

The prompt-building comment says untrusted content is wrapped in XML tags to prevent prompt injection, but the inserted friction/learnings/context contents are not escaped. A friction string containing < / </repeated_friction> could break the structure and undermine the mitigation. Consider escaping/encoding untrusted strings (e.g., XML-escape or JSON-marshal values) and updating the comment to reflect the actual guarantees.

suggestions := make([]Suggestion, 0, len(resp.Suggestions))
for i, s := range resp.Suggestions {
	suggestions = append(suggestions, Suggestion{
		ID: fmt.Sprintf("sug-%d-%d", now.Unix(), i),

Copilot AI Mar 25, 2026

Suggestion IDs are based on now.Unix() (seconds) plus the loop index. Two entire improve runs within the same second can generate identical IDs (especially if they produce the same number of suggestions), which will collide with the suggestions.id primary key in SQLite. Consider using UnixNano, a UUID, or a random suffix for IDs.

Suggested change
- ID: fmt.Sprintf("sug-%d-%d", now.Unix(), i),
+ ID: fmt.Sprintf("sug-%d-%d", now.UnixNano(), i),
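
The random-suffix alternative mentioned above could look roughly like the sketch below. `newSuggestionID` is a hypothetical helper; the `sug-` prefix mirrors the existing ID format, and the 8-byte length is an arbitrary choice for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSuggestionID builds an ID from 8 random bytes plus the loop index,
// so two runs in the same second (or nanosecond) still get distinct IDs.
func newSuggestionID(index int) (string, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", fmt.Errorf("generating suggestion ID: %w", err)
	}
	return fmt.Sprintf("sug-%s-%d", hex.EncodeToString(b[:]), index), nil
}
```

Compared with UnixNano, this drops the time ordering embedded in the ID, so it fits best when a separate created-at column already exists.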

Comment on lines +87 to +103
// Build repeated friction list (threshold: 2+ occurrences)
var repeated []FrictionPattern
for theme, acc := range byTheme {
	if acc.count < 2 {
		continue
	}
	sessions := make([]string, 0, len(acc.sessions))
	for id := range acc.sessions {
		sessions = append(sessions, id)
	}
	repeated = append(repeated, FrictionPattern{
		Theme:            theme,
		Count:            acc.count,
		Examples:         acc.examples,
		AffectedSessions: sessions,
	})
}

Copilot AI Mar 25, 2026

AnalyzePatterns builds RepeatedFriction by iterating over a map, so the order of patterns (and therefore CLI output/prompt content) is nondeterministic across runs. This can lead to noisy diffs in generated suggestions and make behavior harder to reason about. Consider sorting repeated deterministically (e.g., by Count desc, then Theme asc) before returning.
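
The suggested ordering (Count descending, then Theme ascending) can be sketched as follows. `frictionPattern` is a trimmed, hypothetical copy of the PR's type with only the two fields needed for sorting.

```go
package main

import "sort"

// frictionPattern is a reduced stand-in for the PR's FrictionPattern.
type frictionPattern struct {
	Theme string
	Count int
}

// sortPatterns orders patterns by Count descending, breaking ties by
// Theme ascending, so output is stable regardless of map iteration order.
func sortPatterns(p []frictionPattern) {
	sort.Slice(p, func(i, j int) bool {
		if p[i].Count != p[j].Count {
			return p[i].Count > p[j].Count
		}
		return p[i].Theme < p[j].Theme
	})
}
```

The session ID slice in the snippet above is also filled from a map, so it would presumably need a `sort.Strings` call as well for fully reproducible output.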

Comment on lines +192 to +194
row.InputTokens = meta.TokenUsage.InputTokens + meta.TokenUsage.CacheCreationTokens + meta.TokenUsage.CacheReadTokens
row.OutputTokens = meta.TokenUsage.OutputTokens
row.TotalTokens = row.InputTokens + row.OutputTokens

Copilot AI Mar 25, 2026

metadataToSessionRow populates InputTokens with input+cache tokens but never sets CacheTokens, even though the DB schema and SessionRow struct have a dedicated cache_tokens column. This makes cached token breakdown inconsistent and prevents showing cache token usage accurately. Suggest setting InputTokens to just input tokens, CacheTokens to cache creation+read, and TotalTokens to input+cache+output.

Suggested change
- row.InputTokens = meta.TokenUsage.InputTokens + meta.TokenUsage.CacheCreationTokens + meta.TokenUsage.CacheReadTokens
- row.OutputTokens = meta.TokenUsage.OutputTokens
- row.TotalTokens = row.InputTokens + row.OutputTokens
+ // Keep token categories distinct so cache usage can be reported accurately.
+ row.InputTokens = meta.TokenUsage.InputTokens
+ row.CacheTokens = meta.TokenUsage.CacheCreationTokens + meta.TokenUsage.CacheReadTokens
+ row.OutputTokens = meta.TokenUsage.OutputTokens
+ row.TotalTokens = row.InputTokens + row.CacheTokens + row.OutputTokens

Comment on lines +208 to +213
// nullableFloat converts a zero float to a SQL NULL value.
// Non-zero floats are passed through as-is.
func nullableFloat(f float64) interface{} {
	if f == 0 {
		return nil
	}

Copilot AI Mar 25, 2026

nullableFloat converts 0.0 to SQL NULL. For score fields, 0 can be a valid computed value (e.g., severe friction), so this loses information in the DB and can change behavior for any future SQL aggregates/filters. Prefer inserting 0 as 0 and using an explicit “score computed” flag/column if you need to represent “unknown”.

Suggested change
- // nullableFloat converts a zero float to a SQL NULL value.
- // Non-zero floats are passed through as-is.
- func nullableFloat(f float64) interface{} {
- 	if f == 0 {
- 		return nil
- 	}
+ // nullableFloat passes floats through as-is so that 0.0 is preserved as a valid value.
+ func nullableFloat(f float64) interface{} {

Comment on lines +74 to +80
// QuerySessionsWithFriction returns checkpoint IDs of sessions containing
// friction matching the given SQL LIKE pattern (e.g., "%tool call failed%").
func (idb *InsightsDB) QuerySessionsWithFriction(ctx context.Context, pattern string) ([]string, error) {
	rows, err := idb.db.QueryContext(ctx,
		"SELECT DISTINCT checkpoint_id FROM friction WHERE text LIKE ?",
		pattern,
	)

Copilot AI Mar 25, 2026

QuerySessionsWithFriction returns only checkpoint IDs, dropping session_index even though the friction table is keyed by (checkpoint_id, session_index). Callers (e.g., transcript deep-read) can end up reading the wrong session (or always index 0) and miss the friction evidence. Consider returning (checkpoint_id, session_index) pairs (or a small struct) and adjusting callers accordingly.

Comment on lines +100 to +114
// Phase 2: Deep-read transcripts for top friction themes.
patterns := buildFrictionPatterns(frictionThemes)
if err = attachTranscriptExcerpts(ctx, idb, patterns, worktreeRoot); err != nil {
	// Non-fatal: proceed without transcript excerpts.
	_ = err
}

// Phase 3: Detect context files.
contextFiles := improve.DetectContextFiles(worktreeRoot)

// Phase 4: Build analysis from session data + friction patterns, then generate.
summaries := sessionRowsToSummaries(rows)
analysis := improve.AnalyzePatterns(summaries)
// Overlay the transcript excerpts we fetched into the analysis.
applyExcerpts(analysis.RepeatedFriction, patterns)

Copilot AI Mar 25, 2026

Phase 2/4 excerpt wiring looks broken: patterns := buildFrictionPatterns(frictionThemes) sets Theme to the raw friction text, but AnalyzePatterns produces RepeatedFriction themes like "lint", "test", etc. applyExcerpts matches by Theme, so transcript excerpts will almost never attach to the analysis passed into Generator.Generate. Consider running attachTranscriptExcerpts directly on analysis.RepeatedFriction (or ensuring both phases use the same theme key).

Suggested change
- // Phase 2: Deep-read transcripts for top friction themes.
- patterns := buildFrictionPatterns(frictionThemes)
- if err = attachTranscriptExcerpts(ctx, idb, patterns, worktreeRoot); err != nil {
- 	// Non-fatal: proceed without transcript excerpts.
- 	_ = err
- }
- // Phase 3: Detect context files.
- contextFiles := improve.DetectContextFiles(worktreeRoot)
- // Phase 4: Build analysis from session data + friction patterns, then generate.
- summaries := sessionRowsToSummaries(rows)
- analysis := improve.AnalyzePatterns(summaries)
- // Overlay the transcript excerpts we fetched into the analysis.
- applyExcerpts(analysis.RepeatedFriction, patterns)
+ // Phase 2: (deprecated wiring) Deep-read transcripts will be attached after analysis is built.
+ // Phase 3: Detect context files.
+ contextFiles := improve.DetectContextFiles(worktreeRoot)
+ // Phase 4: Build analysis from session data, then attach transcript excerpts and generate.
+ summaries := sessionRowsToSummaries(rows)
+ analysis := improve.AnalyzePatterns(summaries)
+ // Attach transcript excerpts directly to the repeated friction patterns used in generation.
+ if err = attachTranscriptExcerpts(ctx, idb, analysis.RepeatedFriction, worktreeRoot); err != nil {
+ 	// Non-fatal: proceed without transcript excerpts.
+ 	_ = err
+ }

Comment on lines +246 to +252
// truncateString truncates s to at most maxLen bytes, appending "..." if truncated.
func truncateString(s string, maxLen int) string {
	if len(s) <= maxLen {
		return s
	}
	return s[:maxLen] + "..."
}

Copilot AI Mar 25, 2026

truncateString slices by bytes (s[:maxLen]), which can cut multi-byte UTF-8 sequences (common in transcripts) and produce invalid text. Consider truncating by runes (or using utf8.ValidString / []rune), and ensure the ellipsis doesn’t exceed the requested limit if that matters.

func (g *Generator) Generate(ctx context.Context, analysis PatternAnalysis, contextFiles []ContextFile) ([]Suggestion, error) {
	prompt := buildPrompt(analysis, contextFiles)

	raw, err := g.Runner.Execute(ctx, prompt)

Copilot AI Mar 25, 2026

Generator.Generate assumes g.Runner is non-nil and will panic if a caller constructs Generator{}. Since Generator is exported, consider defaulting g.Runner to &llmcli.Runner{} when nil (or returning a clear error) to avoid panics in other packages/tests.

Suggested change
- raw, err := g.Runner.Execute(ctx, prompt)
+ runner := g.Runner
+ if runner == nil {
+ 	runner = &llmcli.Runner{}
+ }
+ raw, err := runner.Execute(ctx, prompt)

alishakawaguchi and others added 17 commits March 25, 2026 15:12
The ireturn linter now flags these regardless of nolint directives.
These are type-assertion helpers that must return interfaces by design.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 47ce155f7bff
- Replace byte-based truncateString with UTF-8 safe stringutil.TruncateRunes
- Fix TotalTokens in insights_cmd to include subagent tokens via termstyle.TotalTokens
- Deduplicate totalTokensFromUsage in strategy by delegating to termstyle.TotalTokens
- Replace hand-rolled splitCSV with strings.Split
- DRY up Accept/Reject in evolve tracker via shared resolve method
- Cap friction examples at 10 per theme to prevent unbounded LLM prompt growth
- Remove unnecessary WHAT comments in evolve/tracker.go
- Restore nolint:ireturn directives removed in prior commit (fixes lint)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d63aba52b8ce
Parse total_cost_usd and usage fields from Claude CLI JSON response.
llmcli.Execute now returns UsageInfo alongside the result, and
entire improve shows a cost/token summary line after the report.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8e74831eb583
Keep session scoring from agent-improvements and writeOpts variable
with v2 dual-write from main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 71977e30b301
- Generate summaries on-demand for sessions that lack them during
  insights/improve runs, scoped to --last N sessions requested
- Fix token efficiency scoring: replace exp decay (always 0 for real
  data) with sigmoid centered at 500k tokens/turn
- Fix first-pass scoring: raise baseline to 90, reduce penalties
  (friction 10→5, turns 3→2, open items 5→3)
- Fix friction scoring: reduce penalty from 20→15 per item
- Fix focus scoring: use gaussian on turns/files ratio instead of
  flat band
- Add has_summary column to sessions table for tracking
- Add UpdateSessionSummary for backfill cache updates
- Remove nullableFloat (stored valid 0.0 scores as NULL)
- Always compute partial scores even without summaries
- Show "(no summary)" indicator in insights output
- Restore nolint:ireturn directives on capabilities.go
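
For illustration, a logistic curve centered at 500k tokens per turn could be written as below. Only the center value comes from the commit message; the steepness constant, the 0 to 100 range, the direction (fewer tokens per turn scores higher), and the name `tokenEfficiencyScore` are all assumptions.

```go
package main

import "math"

const (
	// centerTokensPerTurn is the midpoint named in the commit message.
	centerTokensPerTurn = 500_000.0
	// steepness controls how quickly the score falls off around the
	// center. This value is an assumption chosen for illustration.
	steepness = 100_000.0
)

// tokenEfficiencyScore maps tokens per turn onto a 0-100 score with a
// decreasing logistic curve: exactly 50 at the center, approaching 100
// for very small inputs and 0 for very large ones.
func tokenEfficiencyScore(tokensPerTurn float64) float64 {
	return 100.0 / (1.0 + math.Exp((tokensPerTurn-centerTokensPerTurn)/steepness))
}
```

Unlike a plain exponential decay with a small scale, which is effectively 0 for inputs in the hundreds of thousands, the logistic form spreads scores across the range where real sessions fall.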

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: dea7491a6bda
Entire-Checkpoint: faa75c2dab8f
Entire-Checkpoint: b4d20b476fee
Entire-Checkpoint: 89347a654b11
Entire-Checkpoint: 74a14a71f350
Scaffold the memorylooptui package (styles, keys, messages, render helpers,
root model with tab switching) and implement the memories tab with bubbles/table
for record listing, status filter cycling, search mode, detail pane toggle,
and lifecycle action keybindings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0f0d39f4f655
- Fix truncate() to slice by rune instead of byte to prevent UTF-8 corruption
- Replace hand-rolled ANSI width parser with lipgloss.Width()
- Distinguish empty state messages: no store vs no records vs no filter matches
- Refactor updateSearch to use key.Matches() and msg.Runes instead of msg.String()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c63d65192881
…d-memory form

- Injection tab: log table with prompt tester and match results
- History tab: refresh history table with R trigger
- Settings tab: mode/policy/max cycling with auto-save
- Add-memory form: inline textinput fields on memories tab (n key)
- Root model: input capture coordination, refresh placeholder

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0b5a0dcac774
…ail pane on by default

- Change accent from cyan (6) to orange/amber (214) across all tabs
- Make inactive tabs use lighter gray (245) so they're visible on dark terminals
- Show detail pane by default instead of hiding behind Enter toggle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 45aeb0cc713e
- Increase maxWidth from 80 to 120 for wider terminal support
- Add rounded-border card to memories detail pane
- Add injection log detail view with bordered card for selected entries
- Add bordered card around prompt tester match results
- Make prompt tester input more visible with orange arrow prompt
- Inline pushState calls in handler methods for correct state propagation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 1b62f97ce468
…tions

- Border color changed from dark gray (8) to light gray (245) for visibility
- Added blank line between tab bar and filter bar
- Added spacing between memory table and detail card
- Fixed card width calculation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 14ef2cb624b1
…ngs key fix

- Settings tab: each setting in a bordered card with proper layout
- Injection tab: constrain log table height to prevent overflow
- Settings update: capture changed msg in local struct to fix closure scope
- Stats line condensed to single row

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9872a596d85e
- Selected setting options shown with orange background + black text chip
- Unselected options in light gray
- Max injected number in bold orange
- Removed debug file logging from settings and root

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9393df2b476e
alishakawaguchi and others added 6 commits March 26, 2026 18:05
Entire-Checkpoint: 55da86a28935
Entire-Checkpoint: d0d138a8b16d
…ILS header

Redesign the TUI navigation to look like proper tabs with an amber underline
under the active tab and a MEMORY LOOP app title. Convert filter labels to
uppercase chip buttons matching the Settings tab pattern. Add DETAILS section
header above the memory detail card.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: abe1cf53d792
Add blank lines between title, metadata, body, why, and stats sections.
Increase card padding and add spacing between filter chips and table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a68a4e85fda9
…larity

Injection tab: use section headers, increase card padding, add blank lines
between sections, align detail card fields.

History tab: add descriptive header explaining what refreshes do, widen scope
column from 8 to 16 chars.

Increase prompt preview storage limit from 120 to 500 chars so injection
detail cards show more of the original prompt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7c16b9136913
Add `entire skill` command that discovers skill files across agent config
directories (.claude/skills/, .gemini/agents/, etc.), tracks their usage
from insightsdb session data in a dedicated skill-analytics.db, and
generates AI-powered improvement suggestions via an interactive Bubbletea
TUI with picker screen and 3-tab dashboard (Stats, Friction, Improve).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12bbd28d76e5

2 participants