LCORE-1314: conversation compaction spike — design docs, PoC, and 50-… #1328

max-svistunov wants to merge 2 commits into lightspeed-core:main
Walkthrough

Adds conversation-history compaction: design docs and PoC results, a Python compaction module using Llama Stack summarization, integration into response handling to optionally compact conversations, and comprehensive unit tests covering compaction logic and flows.
Sequence Diagram(s)

sequenceDiagram
participant Client
participant ResponseHandler as Response Handler
participant Compaction as Compaction Logic
participant ConvStore as Conversation Storage
participant LlamaStack as Llama Stack Backend
Client->>ResponseHandler: user message + existing conversation_id
ResponseHandler->>Compaction: compact_conversation_if_needed(la_id, model, message_count)
Compaction->>ConvStore: fetch conversation items for la_id
ConvStore-->>Compaction: items list
alt item count > threshold
Compaction->>Compaction: split items -> old_items + recent_items
Compaction->>LlamaStack: create temp conversation for summarization
LlamaStack-->>Compaction: temp_conv_id
Compaction->>LlamaStack: send old_items + summarization prompt
LlamaStack-->>Compaction: summary_text
Compaction->>ConvStore: create new conversation seeded with [summary_text] + recent_items
ConvStore-->>Compaction: new_conv_id
Compaction-->>ResponseHandler: return new_conv_id
else below threshold
Compaction-->>ResponseHandler: return original conversation_id
end
ResponseHandler-->>Client: continue with (possibly new) conversation_id
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 7
🧹 Nitpick comments (2)
tests/unit/utils/test_compaction.py (1)
245-258: Consider adding `assert_not_called` for consistency.

Unlike `test_skips_below_threshold` (Line 243), this test doesn't verify that `conversations.items.list` was not called. Adding this assertion would make the tests consistent and more explicitly verify the early-return behavior.

Suggested addition:

      assert result == "conv_original"
+     mock_client.conversations.items.list.assert_not_called()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/utils/test_compaction.py` around lines 245 - 258, In test_skips_at_threshold, add an assertion that the client list method was not invoked to mirror test_skips_below_threshold: after calling compact_conversation_if_needed (in test_skips_at_threshold) and asserting the return equals "conv_original", call mock_client.conversations.items.list.assert_not_called() (or mock_client.conversations.items.list.assert_not_awaited() if you prefer to specifically assert it wasn't awaited) to explicitly verify the early-return behavior of compact_conversation_if_needed.

src/utils/compaction.py (1)
49-64: Text extraction differs from canonical pattern in `responses.py`.

The canonical `_extract_text_from_content` in `src/utils/responses.py` (lines 1127-1148) checks `part.type` for `input_text`, `output_text`, and `refusal` content types. This simpler implementation works for the PoC but may miss content in production if parts have different structures.

For production, consider aligning with or reusing the canonical extraction logic.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/compaction.py` around lines 49 - 64, The _extract_message_text implementation is a simplified extractor that can miss parts handled by the canonical _extract_text_from_content; update _extract_message_text to reuse or mirror that canonical logic by checking each part's type (handle 'input_text', 'output_text', 'refusal') as well as existing cases (string, dict with "text", object with .text), concatenating extracted texts into a single string and falling back to str(content) when nothing matches; specifically modify the function named _extract_message_text to call or inline the same extraction rules as _extract_text_from_content so all content shapes are covered.
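As a hedged sketch of what an aligned extractor could look like, based only on the part types named in the comment above (`input_text`, `output_text`, `refusal`); the real `_extract_text_from_content` in `responses.py` may handle more shapes, and `extract_message_text` here is illustrative:

```python
from typing import Any

def extract_message_text(content: Any) -> str:
    """Best-effort text extraction across the part types named in the review."""
    if isinstance(content, str):
        return content
    parts: list[str] = []
    for part in content or []:
        # Support both dict-shaped and attribute-shaped parts.
        if isinstance(part, dict):
            part_type, text = part.get("type"), part.get("text")
        else:
            part_type, text = getattr(part, "type", None), getattr(part, "text", None)
        # Part types taken from the review comment; real code may cover more.
        if part_type in ("input_text", "output_text", "refusal") and text:
            parts.append(text)
    # Fall back to str(content) when nothing matched, as the prompt suggests.
    return "".join(parts) or str(content)
```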
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/conversation-compaction/conversation-compaction-spike.md`:
- Line 11: The reviewer handle in the document is inconsistent: replace all
occurrences of "@ptisnovs" with the requested reviewer "@tisnik" (notably the
mentions currently at the header and the decision entries such as the mention
near the top and the one around the decision block referenced on lines ~11 and
~86) so the requested reviewers match the PR objectives; ensure you update every
instance of the old handle to the new handle to avoid routing confusion.
- Line 25: The intra-document markdown links using fragments like
`#design-alternatives`, `#llama-stack-upstream`, and `#decisions-technical` are
not matching the generated heading IDs; search the document for those exact
fragment strings and replace each with the precise generated anchor created by
your markdown renderer (match capitalization, spacing-to-hyphen rules, and any
punctuation handling) so links point to the actual headings (e.g., update the
link target to the exact slug produced for the "Design alternatives", "Llama
Stack (upstream)", and "Decisions - Technical" headings); also fix the same
fragments used elsewhere in the file to ensure all intra-doc links resolve.
In `@docs/design/conversation-compaction/conversation-compaction.md`:
- Line 8: Update the broken spike-doc link in conversation-compaction.md:
replace the existing reference to docs/design/conversation-compaction-spike.md
with the correct path under the conversation-compaction folder
(docs/design/conversation-compaction/conversation-compaction-spike.md) so the
table row that currently contains the link text "Spike doc:
`docs/design/conversation-compaction-spike.md`" points to the actual file
location.
- Around line 348-349: The guidance contains malformed code/markdown for config
classes: replace the broken fragments so it reads that all config classes should
extend ConfigurationBase which sets extra="forbid", use Field(...) with
defaults, title, and description, and add `@model_validator`(mode="after") for
cross-field validation when needed; update references to ConfigurationBase,
extra, Field, and `@model_validator` to use correct punctuation and valid syntax
so the instructions are unambiguous.
In `@docs/design/conversation-compaction/poc-results/06-probe-responses.txt`:
- Around line 19-20: The "Network Policies" sentence in Turn 11 is truncated
("communicate with each") and leaves the artifact unreliable; edit the Turn 11
probe response to either finish the sentence (e.g., "communicate with each other
or external services, limiting traffic between pods and namespaces") or
explicitly mark the paragraph as intentionally truncated (e.g., add
"[TRUNCATED]" and a brief note), updating the "Network Policies" line in the
Turn 11 section so the document is complete and unambiguous.
In `@src/utils/compaction.py`:
- Around line 109-145: The _summarize_conversation function must catch Llama
Stack API errors around client.conversations.create and client.responses.create;
wrap those calls in a try/except that handles APIConnectionError and
APIStatusError (import them from llama_stack_client), log the error with context
(e.g., "Failed to create conversation" / "Failed to create response for
summarization") and re-raise or return a fallback as appropriate; ensure you
avoid leaving an orphaned conversation by deleting the temporary conversation
(use client.conversations.delete(summary_conv.id)) if response creation fails
(perform delete in the except block or a finally that checks summary_conv and
whether response was created); keep the existing summary extraction from
response.output unchanged.
In `@src/utils/responses.py`:
- Around line 300-307: The call to compact_conversation_if_needed (when
user_conversation is truthy) can raise Llama Stack errors but is not translated
to the API's HTTPException like other Llama Stack operations; wrap the
compact_conversation_if_needed call in a try/except that catches the Llama Stack
error types used elsewhere (e.g., APIConnectionError or the project's
LlamaStackError wrapper) and re-raise an appropriate fastapi.HTTPException with
the same status code/message mapping used in this module; ensure you reference
compact_conversation_if_needed, llama_stack_conv_id, and user_conversation so
the behavior matches other error handling in responses.py.
---
Nitpick comments:
In `@src/utils/compaction.py`:
- Around line 49-64: The _extract_message_text implementation is a simplified
extractor that can miss parts handled by the canonical
_extract_text_from_content; update _extract_message_text to reuse or mirror that
canonical logic by checking each part's type (handle 'input_text',
'output_text', 'refusal') as well as existing cases (string, dict with "text",
object with .text), concatenating extracted texts into a single string and
falling back to str(content) when nothing matches; specifically modify the
function named _extract_message_text to call or inline the same extraction rules
as _extract_text_from_content so all content shapes are covered.
In `@tests/unit/utils/test_compaction.py`:
- Around line 245-258: In test_skips_at_threshold, add an assertion that the
client list method was not invoked to mirror test_skips_below_threshold: after
calling compact_conversation_if_needed (in test_skips_at_threshold) and
asserting the return equals "conv_original", call
mock_client.conversations.items.list.assert_not_called() (or
mock_client.conversations.items.list.assert_not_awaited() if you prefer to
specifically assert it wasn't awaited) to explicitly verify the early-return
behavior of compact_conversation_if_needed.
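The pattern this prompt describes, reduced to a self-contained sketch: `mock_client` and the `conversations.items.list` path mirror the mocks named in the review, while `early_return_path` is a stand-in for the below-threshold branch of `compact_conversation_if_needed`, not the real function.

```python
from unittest.mock import AsyncMock, MagicMock

# Stub client shaped like the test's mock, not the real Llama Stack client.
mock_client = MagicMock()
mock_client.conversations.items.list = AsyncMock()

def early_return_path(conv_id: str) -> str:
    # Stand-in for the early-return (at/below threshold) branch:
    # the original conversation id comes back and no items are listed.
    return conv_id

result = early_return_path("conv_original")
assert result == "conv_original"
# The explicit negative assertion the review asks for:
mock_client.conversations.items.list.assert_not_called()
```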
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5bb2811f-d69d-4f62-951a-a2cbb2af3edd
📒 Files selected for processing (13)
docs/design/conversation-compaction/conversation-compaction-spike.md
docs/design/conversation-compaction/conversation-compaction.md
docs/design/conversation-compaction/poc-results/01-analysis.txt
docs/design/conversation-compaction/poc-results/02-conversation-log.txt
docs/design/conversation-compaction/poc-results/03-token-usage.txt
docs/design/conversation-compaction/poc-results/04-compaction-events.json
docs/design/conversation-compaction/poc-results/05-summaries-extracted.txt
docs/design/conversation-compaction/poc-results/06-probe-responses.txt
docs/design/conversation-compaction/poc-results/07-conversation-items-per-compaction.txt
docs/design/conversation-compaction/poc-results/08-all-responses-raw.json
src/utils/compaction.py
src/utils/responses.py
tests/unit/utils/test_compaction.py
| **PoC validation**: A working proof-of-concept was built and tested with 50 queries across 4 compaction cycles. Results in [PoC results](#poc-results). |

| # Decisions for @ptisnovs and @sbunciak |
Reviewer handle appears inconsistent with requested reviewers.
Line 11 and Line 86 reference @ptisnovs; the PR objectives request confirmations from @tisnik. Please align naming to avoid decision-routing confusion.
Also applies to: 86-86
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/conversation-compaction/conversation-compaction-spike.md` at line
11, The reviewer handle in the document is inconsistent: replace all occurrences
of "@ptisnovs" with the requested reviewer "@tisnik" (notably the mentions
currently at the header and the decision entries such as the mention near the
top and the one around the decision block referenced on lines ~11 and ~86) so
the requested reviewers match the PR objectives; ensure you update every
instance of the old handle to the new handle to avoid routing confusion.
| All config classes extend `ConfigurationBase` which sets `extra`"forbid"`. |
| Use =Field()` with defaults, title, and description. Add `@model_validator(mode`"after")= for cross-field validation if needed. |
Correct malformed config-pattern guidance snippet.
Line 348 and Line 349 contain broken markdown/code (extra"forbid", =Field(), @model_validator(mode"after")=). Please fix these to valid syntax so implementation instructions are unambiguous.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/conversation-compaction/conversation-compaction.md` around lines
348 - 349, The guidance contains malformed code/markdown for config classes:
replace the broken fragments so it reads that all config classes should extend
ConfigurationBase which sets extra="forbid", use Field(...) with defaults,
title, and description, and add `@model_validator`(mode="after") for cross-field
validation when needed; update references to ConfigurationBase, extra, Field,
and `@model_validator` to use correct punctuation and valid syntax so the
instructions are unambiguous.
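For concreteness, here is one runnable rendering of the corrected guidance, assuming Pydantic v2; `ConfigurationBase` matches the name in the doc, while `CompactionConfiguration` and its fields are hypothetical examples, not the project's actual config classes.

```python
from pydantic import BaseModel, ConfigDict, Field, model_validator

class ConfigurationBase(BaseModel):
    # The base class forbids unknown fields, as the guidance intends.
    model_config = ConfigDict(extra="forbid")

class CompactionConfiguration(ConfigurationBase):
    # Field(...) with defaults, title, and description, per the guidance.
    enabled: bool = Field(
        default=False, title="Enabled", description="Turn compaction on or off."
    )
    threshold: int = Field(
        default=20, title="Threshold", description="Item count that triggers compaction."
    )
    keep_recent: int = Field(
        default=5, title="Keep recent", description="Items carried over verbatim."
    )

    @model_validator(mode="after")
    def check_counts(self) -> "CompactionConfiguration":
        # Cross-field validation, as the guidance allows when needed.
        if self.keep_recent >= self.threshold:
            raise ValueError("keep_recent must be smaller than threshold")
        return self
```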
| 5. **Network Policies**: Kubernetes allows the enforcement of network policies to control how pods communicate with each |
Complete the truncated probe response in Turn 11.
Line 19 ends mid-sentence (“communicate with each”), and no continuation follows. Please restore the missing text or explicitly mark this section as intentionally truncated so the artifact remains reliable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/conversation-compaction/poc-results/06-probe-responses.txt`
around lines 19 - 20, The "Network Policies" sentence in Turn 11 is truncated
("communicate with each") and leaves the artifact unreliable; edit the Turn 11
probe response to either finish the sentence (e.g., "communicate with each other
or external services, limiting traffic between pods and namespaces") or
explicitly mark the paragraph as intentionally truncated (e.g., add
"[TRUNCATED]" and a brief note), updating the "Network Policies" line in the
Turn 11 section so the document is complete and unambiguous.
async def _summarize_conversation(
    client: AsyncLlamaStackClient,
    model: str,
    items: list[Any],
) -> str:
    """Use the LLM to produce a concise summary of *items*.

    Parameters:
        client: Llama Stack client.
        model: Full model ID (``provider/model``).
        items: Conversation items to summarize.

    Returns:
        Summary text produced by the LLM.
    """
    conversation_text = _format_conversation_for_summary(items)
    prompt = SUMMARIZATION_PROMPT.format(conversation_text=conversation_text)

    summary_conv = await client.conversations.create(metadata={})
    response = await client.responses.create(
        input=prompt,
        model=model,
        conversation=summary_conv.id,
        store=False,
    )

    summary_parts: list[str] = []
    for output_item in response.output:
        content = getattr(output_item, "content", None)
        if content is None:
            continue
        for content_part in content:
            text = getattr(content_part, "text", None)
            if text:
                summary_parts.append(text)

    return "".join(summary_parts) or "No summary generated."
Missing error handling for Llama Stack API calls.
Per coding guidelines, APIConnectionError from Llama Stack should be handled in API operations. The calls to client.conversations.create (Line 127) and client.responses.create (Lines 128-133) could fail, leaving the caller without graceful error handling.
Additionally, if responses.create fails, the temporary conversation created at Line 127 becomes orphaned.
Suggested error handling pattern (from responses.py):

from llama_stack_client import APIConnectionError, APIStatusError

async def _summarize_conversation(...) -> str:
    # ... format prompt ...
    try:
        summary_conv = await client.conversations.create(metadata={})
        response = await client.responses.create(
            input=prompt,
            model=model,
            conversation=summary_conv.id,
            store=False,
        )
    except APIConnectionError as e:
        logger.error("Failed to connect to Llama Stack for summarization: %s", e)
        raise  # or return a fallback / re-raise wrapped
    except APIStatusError as e:
        logger.error("Llama Stack returned error during summarization: %s", e)
        raise
    # ... extract summary ...

As per coding guidelines: "Handle APIConnectionError from Llama Stack in API operations".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/compaction.py` around lines 109 - 145, The _summarize_conversation
function must catch Llama Stack API errors around client.conversations.create
and client.responses.create; wrap those calls in a try/except that handles
APIConnectionError and APIStatusError (import them from llama_stack_client), log
the error with context (e.g., "Failed to create conversation" / "Failed to
create response for summarization") and re-raise or return a fallback as
appropriate; ensure you avoid leaving an orphaned conversation by deleting the
temporary conversation (use client.conversations.delete(summary_conv.id)) if
response creation fails (perform delete in the except block or a finally that
checks summary_conv and whether response was created); keep the existing summary
extraction from response.output unchanged.
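The orphan-cleanup flow this prompt asks for can be sketched with a stub client; plain `Exception` stands in for `APIConnectionError`/`APIStatusError` so the shape stays runnable without `llama_stack_client` installed, and `summarize_with_cleanup` plus `FakeClient` are illustrative names, not the PR's code.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def summarize_with_cleanup(client, model: str, prompt: str) -> str:
    """Create a temp conversation, summarize into it, delete it on failure."""
    summary_conv = await client.conversations.create(metadata={})
    try:
        response = await client.responses.create(
            input=prompt, model=model, conversation=summary_conv.id, store=False
        )
    except Exception:
        logger.error("Failed to create response for summarization")
        # Avoid orphaning the temporary conversation.
        await client.conversations.delete(summary_conv.id)
        raise
    return response.text

# Minimal in-memory stub standing in for the Llama Stack client.
class FakeClient:
    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.deleted: list[str] = []
        self.conversations = self  # route conversations.* to self
        self.responses = self      # route responses.* to self

    async def create(self, **kwargs):
        if "input" in kwargs:  # responses.create path
            if self.fail:
                raise RuntimeError("summarization backend down")
            return type("R", (), {"text": "summary"})()
        return type("Conv", (), {"id": "tmp-conv"})()  # conversations.create path

    async def delete(self, conversation_id: str) -> None:
        self.deleted.append(conversation_id)

ok_client = FakeClient(fail=False)
summary = asyncio.run(summarize_with_cleanup(ok_client, "m", "p"))

bad_client = FakeClient(fail=True)
try:
    asyncio.run(summarize_with_cleanup(bad_client, "m", "p"))
except RuntimeError:
    pass
```

On the failure path the temporary conversation id is deleted before the error propagates, which is the behavior the prompt requires.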
# Check if conversation needs compaction (PoC)
if user_conversation:
    llama_stack_conv_id = await compact_conversation_if_needed(
        client=client,
        llama_stack_conv_id=llama_stack_conv_id,
        model=model,
        message_count=user_conversation.message_count,
    )
Wrap compaction call with the same API error translation used elsewhere.
Lines 302-307 can raise Llama Stack errors through compact_conversation_if_needed(), but this branch currently does not map them to your standard HTTPException responses.
💡 Suggested fix:

      # Check if conversation needs compaction (PoC)
      if user_conversation:
-         llama_stack_conv_id = await compact_conversation_if_needed(
-             client=client,
-             llama_stack_conv_id=llama_stack_conv_id,
-             model=model,
-             message_count=user_conversation.message_count,
-         )
+         try:
+             llama_stack_conv_id = await compact_conversation_if_needed(
+                 client=client,
+                 llama_stack_conv_id=llama_stack_conv_id,
+                 model=model,
+                 message_count=user_conversation.message_count,
+             )
+         except APIConnectionError as e:
+             error_response = ServiceUnavailableResponse(
+                 backend_name="Llama Stack",
+                 cause=str(e),
+             )
+             raise HTTPException(**error_response.model_dump()) from e
+         except APIStatusError as e:
+             error_response = handle_known_apistatus_errors(e, model)
+             raise HTTPException(**error_response.model_dump()) from e

As per coding guidelines "Handle APIConnectionError from Llama Stack in API operations".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 300 - 307, The call to
compact_conversation_if_needed (when user_conversation is truthy) can raise
Llama Stack errors but is not translated to the API's HTTPException like other
Llama Stack operations; wrap the compact_conversation_if_needed call in a
try/except that catches the Llama Stack error types used elsewhere (e.g.,
APIConnectionError or the project's LlamaStackError wrapper) and re-raise an
appropriate fastapi.HTTPException with the same status code/message mapping used
in this module; ensure you reference compact_conversation_if_needed,
llama_stack_conv_id, and user_conversation so the behavior matches other error
handling in responses.py.
Actionable comments posted: 3
🧹 Nitpick comments (2)
docs/design/conversation-compaction/conversation-compaction-spike.md (2)
661-661: Significant architectural change warrants emphasis.

Line 661 states "Call Llama Stack with explicit input (NOT =conversation_id=)" and Line 671 notes "lightspeed-stack stops using Llama Stack's `conversation` parameter." This is a major architectural shift from the current design, in which Llama Stack manages conversation history retrieval.

Consider adding a dedicated subsection or callout highlighting this change, its rationale (full control over LLM input), and implications (lightspeed-stack becomes responsible for conversation assembly). This will help reviewers and implementers understand the scope of the change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/conversation-compaction/conversation-compaction-spike.md` at line 661, Add a dedicated subsection or highlighted callout in the document that explicitly calls out the architectural change described by "Call Llama Stack with explicit input (NOT =conversation_id=)" and "lightspeed-stack stops using Llama Stack's `conversation` parameter"; in that subsection state the rationale (lightspeed-stack takes full control over LLM input/assembly) and list concrete implications (responsibility for conversation assembly, retrieval, ordering, and any API/contract changes) so implementers understand scope and migration steps.
726-731: Consider highlighting cost implications more prominently.

The example shows ~74,000 tokens per compaction call (70K input + 2-4K output). At typical LLM pricing, this represents material cost per compaction event. For a frequently-compacting conversation, costs could accumulate significantly.
Consider adding a concrete cost estimate (e.g., "$0.74 per compaction at $10/1M tokens for GPT-4") and noting that high-usage conversations might trigger compaction multiple times, making cost monitoring important for deployments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/conversation-compaction/conversation-compaction-spike.md` around lines 726 - 731, Update the "Example for a 128K context window at 70% threshold" block to call out cost implications explicitly: compute a concrete cost estimate for the ~74,000-token compaction (e.g., "$0.74 per compaction at $10/1M tokens for GPT-4") and add a short note that repeated compactions can accumulate costs and should be monitored or throttled in high-usage conversations; place this cost estimate immediately after the bullet list for the 128K example and ensure the text references the same example wording ("128K context window at 70% threshold" and the "~74,000 total tokens" summary) so readers can see cost per compaction and the recommendation to monitor compaction frequency.
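The arithmetic behind the suggested estimate, using the review's illustrative $10-per-million-token rate (an assumed figure for the example, not a real price list):

```python
# Cost of one compaction call at the review's example rate.
TOKENS_PER_COMPACTION = 74_000          # ~70K input + 2-4K output, per the review
PRICE_PER_MILLION_TOKENS = 10.00        # USD, assumed flat rate for simplicity

cost_per_compaction = TOKENS_PER_COMPACTION / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"${cost_per_compaction:.2f} per compaction")  # $0.74 per compaction
```

A conversation that compacts several times would multiply this accordingly, which is why the review recommends monitoring compaction frequency.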
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/conversation-compaction/conversation-compaction-spike.md`:
- Line 388: The doc claim mismatches the actual change: verify the insertion in
src/utils/responses.py is the same as described in the PR objectives (the hook
in prepare_responses_params()); either update the doc line to say "8-line
insertion" if the code contains 8 new lines, or change the code to include the
full 10-line hook as described — then update the text in
conversation-compaction-spike.md to reflect the accurate line count and ensure
the symbol prepare_responses_params() is referenced consistently.
- Line 153: Change the seven JIRA headings from third-level to second-level
headings so the document follows proper markdown hierarchy: replace each "###
LCORE-????: Add token estimation to lightspeed-stack" style heading (and the
other six JIRA headings in the same pattern) with "##" instead of "###" so they
sit under the top-level "Proposed JIRAs" heading; ensure all instances of the
JIRA headings (the exact heading texts at lines shown in the review) are updated
consistently.
- Line 25: Update the broken intra-document markdown links: replace any link
pointing to "#design-alternatives" with
"#design-alternatives-for-lightspeed-stack", change "#llama-stack-upstream"
occurrences to "#appendix-a-llama-stack-upstream-status" (including the repeated
instance near the appendix), and remove or replace the non-existent
"#decisions-technical" reference with the correct existing heading anchor or add
the missing "Decisions — Technical" heading to the document so the link
resolves.
---
Nitpick comments:
In `@docs/design/conversation-compaction/conversation-compaction-spike.md`:
- Line 661: Add a dedicated subsection or highlighted callout in the document
that explicitly calls out the architectural change described by "Call Llama
Stack with explicit input (NOT =conversation_id=)" and "lightspeed-stack stops
using Llama Stack's `conversation` parameter"; in that subsection state the
rationale (lightspeed-stack takes full control over LLM input/assembly) and list
concrete implications (responsibility for conversation assembly, retrieval,
ordering, and any API/contract changes) so implementers understand scope and
migration steps.
- Around line 726-731: Update the "Example for a 128K context window at 70%
threshold" block to call out cost implications explicitly: compute a concrete
cost estimate for the ~74,000-token compaction (e.g., "$0.74 per compaction at
$10/1M tokens for GPT-4") and add a short note that repeated compactions can
accumulate costs and should be monitored or throttled in high-usage
conversations; place this cost estimate immediately after the bullet list for
the 128K example and ensure the text references the same example wording ("128K
context window at 70% threshold" and the "~74,000 total tokens" summary) so
readers can see cost per compaction and the recommendation to monitor compaction
frequency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7f012f04-87ce-475a-8eea-260918df70b4
📒 Files selected for processing (1)
docs/design/conversation-compaction/conversation-compaction-spike.md
| B | Tiered memory (MemGPT-style) | High | Excellent |
| C | Delegate to provider-native APIs | Low-Med | Varies |

See [Design alternatives](#design-alternatives) for full pros/cons of each.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify that markdown link fragments match the generated heading IDs
# Extract all internal links and headings from the document
cat docs/design/conversation-compaction/conversation-compaction-spike.md | \
gawk '
/^#+ / {
    heading = tolower($0)
    gsub(/^#+ /, "", heading)
    gsub(/[^a-z0-9- ]/, "", heading)
    gsub(/ /, "-", heading)
    print "HEADING: " heading
}
/]\(#[^)]+\)/ {
    match($0, /]\(#([^)]+)\)/, arr)
    if (arr[1]) print "LINK: " arr[1]
}
'

Repository: lightspeed-core/lightspeed-stack
Length of output: 134
🏁 Script executed:
#!/bin/bash
# Verify markdown link fragments match heading IDs
# Read the file and use Python to extract headings and links
python3 << 'EOF'
import re
import os

file_path = "docs/design/conversation-compaction/conversation-compaction-spike.md"
if not os.path.exists(file_path):
    print(f"File not found: {file_path}")
    exit(1)

with open(file_path, 'r') as f:
    content = f.read()

# Extract all headings and generate their fragment IDs
headings = {}
for match in re.finditer(r'^(#{1,6})\s+(.+)$', content, re.MULTILINE):
    heading_text = match.group(2).strip()
    # Standard markdown ID generation: lowercase, spaces to hyphens, remove special chars
    fragment_id = re.sub(r'[^\w\s-]', '', heading_text.lower())
    fragment_id = re.sub(r'\s+', '-', fragment_id)
    line_num = content[:match.start()].count('\n') + 1
    headings[fragment_id] = {'text': heading_text, 'line': line_num}
    print(f"HEADING: Line {line_num} -> #{fragment_id} (text: '{heading_text}')")

print("\n" + "="*80 + "\n")

# Extract all internal links
links = {}
for match in re.finditer(r'\[([^\]]+)\]\(#([^)]+)\)', content):
    link_text = match.group(1)
    link_target = match.group(2)
    line_num = content[:match.start()].count('\n') + 1
    links[line_num] = {'text': link_text, 'target': link_target}
    # Check if link exists in headings
    exists = link_target in headings
    status = "✓ FOUND" if exists else "✗ NOT FOUND"
    print(f"LINK: Line {line_num} -> #{link_target} [{status}]")
    if not exists and link_target:
        # Find similar headings
        similar = [h for h in headings.keys() if link_target.lower() in h or h in link_target.lower()]
        if similar:
            print(f"    Possible matches: {similar}")
EOF

Repository: lightspeed-core/lightspeed-stack
Length of output: 6831
Fix broken intra-document links in markdown.
Line 25: #design-alternatives should be #design-alternatives-for-lightspeed-stack
Line 84: #llama-stack-upstream should be #appendix-a-llama-stack-upstream-status
Additionally, Line 586 has the same issue as Line 84. Line 648 references #decisions-technical which does not exist in the document.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 25-25: Link fragments should be valid
(MD051, link-fragments)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/conversation-compaction/conversation-compaction-spike.md` at line
25, Update the broken intra-document markdown links: replace any link pointing
to "#design-alternatives" with "#design-alternatives-for-lightspeed-stack",
change "#llama-stack-upstream" occurrences to
"#appendix-a-llama-stack-upstream-status" (including the repeated instance near
the appendix), and remove or replace the non-existent "#decisions-technical"
reference with the correct existing heading anchor or add the missing "Decisions
— Technical" heading to the document so the link resolves.
Each JIRA includes an agentic tool instruction that an assignee can optionally feed to Claude Code or similar.

### LCORE-????: Add token estimation to lightspeed-stack
Fix heading hierarchy.
The heading structure jumps from h1 (# Proposed JIRAs) directly to h3 (### LCORE-????), skipping h2. This violates standard markdown heading hierarchy (MD001). Consider using h2 (##) for individual JIRA entries to maintain proper document structure.
📝 Suggested fix
# Proposed JIRAs
Each JIRA includes an agentic tool instruction that an assignee can optionally feed to Claude Code or similar.
-### LCORE-????: Add token estimation to lightspeed-stack
+## LCORE-????: Add token estimation to lightspeed-stack

Apply the same change to all 7 JIRA headings (lines 153, 179, 207, 234, 262, 286, 313).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 # Proposed JIRAs
 Each JIRA includes an agentic tool instruction that an assignee can optionally feed to Claude Code or similar.
-### LCORE-????: Add token estimation to lightspeed-stack
+## LCORE-????: Add token estimation to lightspeed-stack
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 153-153: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/conversation-compaction/conversation-compaction-spike.md` at line
153, Change the seven JIRA headings from third-level to second-level headings so
the document follows proper markdown hierarchy: replace each "### LCORE-????:
Add token estimation to lightspeed-stack" style heading (and the other six JIRA
headings in the same pattern) with "##" instead of "###" so they sit under the
top-level "Proposed JIRAs" heading; ensure all instances of the JIRA headings
(the exact heading texts at lines shown in the review) are updated consistently.
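The MD001 rule flagged here is easy to check mechanically. A minimal, hypothetical checker sketch — not part of the PR, and markdownlint itself handles many more cases:

```python
import re

def heading_jumps(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line_no, prev_level, level) for every ATX heading that
    increments by more than one level at a time (markdownlint MD001)."""
    jumps = []
    prev = 0
    for i, line in enumerate(markdown.splitlines(), start=1):
        m = re.match(r"^(#{1,6})\s", line)
        if not m:
            continue
        level = len(m.group(1))
        if prev and level > prev + 1:
            jumps.append((i, prev, level))
        prev = level
    return jumps

doc = "# Proposed JIRAs\n\n### LCORE-????: Add token estimation\n"
print(heading_jumps(doc))  # [(3, 1, 3)] — h1 followed directly by h3
```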
> ## PoC code
>
> - `src/utils/compaction.py` — compaction logic (trigger, split, summarize, new conversation).
> - `src/utils/responses.py` — 8-line insertion calling `compact_conversation_if_needed()`.
Minor discrepancy in line count.
This line states "8-line insertion", but the PR objectives describe a "10-line hook in prepare_responses_params()". Please verify the correct line count for consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/design/conversation-compaction/conversation-compaction-spike.md` at line
388, The doc claim mismatches the actual change: verify the insertion in
src/utils/responses.py is the same as described in the PR objectives (the hook
in prepare_responses_params()); either update the doc line to say "8-line
insertion" if the code contains 8 new lines, or change the code to include the
full 10-line hook as described — then update the text in
conversation-compaction-spike.md to reflect the accurate line count and ensure
the symbol prepare_responses_params() is referenced consistently.
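For context, the compaction flow this hook triggers (compact past a threshold, split items into old and recent, summarize the old ones, seed a new conversation) can be sketched roughly as follows. All names, the threshold, and the buffer size are illustrative assumptions; the PR's actual `compact_conversation_if_needed()` also calls Llama Stack to produce the summary, which is omitted here:

```python
# Hypothetical sketch of the split step only. The threshold and buffer
# values are made up for illustration, not taken from the PR.
COMPACTION_THRESHOLD = 20  # assumed trigger: compact past this many items
KEEP_RECENT = 5            # assumed buffer of recent items kept verbatim

def split_for_compaction(items: list[str]) -> tuple[list[str], list[str]]:
    """Split conversation items into (old_items, recent_items).

    Below the threshold nothing is summarized; above it, everything but
    the most recent KEEP_RECENT items becomes summarization input."""
    if len(items) <= COMPACTION_THRESHOLD:
        return [], items
    return items[:-KEEP_RECENT], items[-KEEP_RECENT:]

items = [f"msg-{i}" for i in range(25)]
old, recent = split_for_compaction(items)
print(len(old), len(recent))  # 20 5
```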
LCORE-1314: Conversation compaction spike
Spike deliverable for LCORE-1311 (Conversation History Summarization).
What's in this PR
Design docs (`docs/design/conversation-compaction/`):

- `conversation-compaction-spike.md` (use the "Outline" button) — the spike: research, design alternatives, PoC results, proposed JIRAs
- `conversation-compaction.md` (use the "Outline" button) — feature spec (requirements, use cases, architecture, implementation suggestions) assuming the recommended options are chosen -- changeable based on final decisions, will be kept in the repo long-term
- `poc-results/` — evidence from a 50-query experiment validating the PoC

PoC code:

- `src/utils/compaction.py` — compaction logic (trigger, split, summarize, new conversation)
- `src/utils/responses.py` — 10-line hook in `prepare_responses_params()`
- `tests/unit/utils/test_compaction.py` — 19 unit tests, all passing

Main findings
For reviewers
- `conversation-compaction-spike.md`: @tisnik Please review and confirm the 4 technical-detail decisions (conversation_id handling, truncated field, storage location, buffer zone calculation) in the "Technical decisions for @ptisnovs" section.
- @tisnik @sbunciak Please review the "Proposed JIRAs" section -- I will file the tickets once decisions are confirmed.

Doc structure note: The first three sections (Decisions for @ptisnovs and @sbunciak, Technical decisions for @ptisnovs, Proposed JIRAs) of the spike doc are where your input is needed. They link to background sections later in the doc (current architecture, API comparison, design alternatives, PoC results) — read those if you need more context on a specific point, but it is optional.

Closes # LCORE-1314
Related Issue # LCORE-1311
Summary by CodeRabbit
New Features
Documentation
Tests